I have about 5 million records in a database table (MySQL 5.6), and I wanted to get the last 2 dates, grouped by ID. Looking around the web, I found examples that allowed me to cobble together the following…
SELECT id, date
FROM
(
SELECT *,
@id_rank := IF(@current_id = id, @id_rank + 1, 1) AS id_rank,
@current_id := id
FROM `data`
ORDER BY id DESC, date DESC
) ranked
WHERE id_rank <= 2
ORDER BY id ASC, date DESC
Running this code from MySQL Workbench, returned 5,700 rows, which is what I expected. I then tried to call this SQL as-is from PHP, using the following SQL string…
$subSql =
"SELECT *, " .
" @id_rank := IF(@current_id = id, @id_rank + 1, 1) AS id_rank, " .
" @current_id := id " .
"FROM `data` " .
"ORDER BY id DESC, date DESC";
$sql =
"SELECT id, date " .
"FROM ($subSql) ranked " .
"WHERE id_rank <= 2 " .
"ORDER BY id ASC, date DESC";
However, running this code resulted in out-of-memory. I them modified the code to return a count of the number of records expected, and instead of the expected 5,700 rows, it returned 4,925,479. So my question is “what do I have to change in my PHP SQL above, to get the correct results that I was getting from MySQL Workbench”.
Advertisement
Answer
Assigning variables in MySQL has been deprecated. You should be using window functions:
SELECT id, date
FROM (SELECT d.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) as id_rank
FROM `data` d
) d
WHERE id_rank <= 2
ORDER BY id ASC, date DESC;
Even in older versions your code is not guaranteed to work because MySQL does not guarantee the order of evaluation of expressions. You are assigning the variable in one expression and using it in another — and those can go in either order.
If your date
s are unique per id
, you can do this without variables:
select id, date
from (select d.*,
(select d2.date
from data d2
where d2.id = d.id and d2.date >= d.date
order by d2.date desc
limit 1 offset 1
) as date_2
from data d
) d
where date >= date_2 or date_2 is null;
For performance, you want an index on data(id, date)
.