Skip to content
Advertisement

MySQL – Performance issue while joining most recent

I have two tables, markets (27 records) and histories (~1.75M records, ~67K per market).

I need to get every market with its most recent histories record. The solutions I tried work but are incredibly slow.

Tables DDL

What I tried

1 – Uncorrelated subquery

I started with this solution since I used it other times, it takes ~7.5s:

EXPLAIN result:

2 – Using WITH

I tried to run the subquery using WITH but with no improvement, still ~7.5s:

EXPLAIN result (identical to the previous one)

3 – Using WITH and window functions

I then upgraded from 5.7 to 8.0.22 in order to try this other suggested method, which takes even more: ~11s

EXPLAIN result:

Additional info

I then saw that the subquery alone (MAX and GROUP BY), needed in solutions 1 and 2, takes ~7.5s!

So I believe that there should be something fundamentally wrong with histories structure/indexes, rather than the way I’m joining markets with it. To be clear, that’s the one I’m referring to:

EXPLAIN result:

time is an int representing a Unix timestamp, id could be used but it does not improve performance.

Of course I tried to look at other questions/posts regarding the performance issue with this group by, but I’m still not able to fix it.

Advertisement

Answer

One alternative would be to filter with a subquery:

For performance you want an index on histories(market_id, is_final, time desc).

Since you want just two columns from the histories table, you might also want to consider using two subqueries:

The same index would help the query – we could even add close at the end of the index, so: histories(market_id, is_final, time desc, close).

FInally: un very recent versions of MySQL, you could try a lateral join:

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement