I have about 5 million records in a database table (MySQL 5.6), and I wanted to get the last 2 dates, grouped by ID. Looking around the web, I found examples that allowed me to cobble together the following…
SELECT id, date FROM ( SELECT *, @id_rank := IF(@current_id = id, @id_rank + 1, 1) AS id_rank, @current_id := id FROM `data` ORDER BY id DESC, date DESC ) ranked WHERE id_rank <= 2 ORDER BY id ASC, date DESC
Running this code from MySQL Workbench, returned 5,700 rows, which is what I expected. I then tried to call this SQL as-is from PHP, using the following SQL string…
$subSql = "SELECT *, " . " @id_rank := IF(@current_id = id, @id_rank + 1, 1) AS id_rank, " . " @current_id := id " . "FROM `data` " . "ORDER BY id DESC, date DESC"; $sql = "SELECT id, date " . "FROM ($subSql) ranked " . "WHERE id_rank <= 2 " . "ORDER BY id ASC, date DESC";
However, running this code resulted in out-of-memory. I them modified the code to return a count of the number of records expected, and instead of the expected 5,700 rows, it returned 4,925,479. So my question is “what do I have to change in my PHP SQL above, to get the correct results that I was getting from MySQL Workbench”.
Advertisement
Answer
Assigning variables in MySQL has been deprecated. You should be using window functions:
SELECT id, date FROM (SELECT d.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) as id_rank FROM `data` d ) d WHERE id_rank <= 2 ORDER BY id ASC, date DESC;
Even in older versions your code is not guaranteed to work because MySQL does not guarantee the order of evaluation of expressions. You are assigning the variable in one expression and using it in another — and those can go in either order.
If your date
s are unique per id
, you can do this without variables:
select id, date from (select d.*, (select d2.date from data d2 where d2.id = d.id and d2.date >= d.date order by d2.date desc limit 1 offset 1 ) as date_2 from data d ) d where date >= date_2 or date_2 is null;
For performance, you want an index on data(id, date)
.