Skip to content
Advertisement

SQL script behaves differently when run from PHP

I have about 5 million records in a database table (MySQL 5.6), and I wanted to get the last 2 dates, grouped by ID. Looking around the web, I found examples that allowed me to cobble together the following…

SELECT id, date 
FROM 
(
    SELECT *,   
    @id_rank := IF(@current_id = id, @id_rank + 1, 1) AS id_rank,   
    @current_id := id 
    FROM `data` 
    ORDER BY id DESC, date DESC
) ranked 
WHERE id_rank <= 2 
ORDER BY id ASC, date DESC

Running this code from MySQL Workbench, returned 5,700 rows, which is what I expected. I then tried to call this SQL as-is from PHP, using the following SQL string…

$subSql =
    "SELECT *, " .
    "  @id_rank := IF(@current_id = id, @id_rank + 1, 1) AS id_rank, " .
    "  @current_id := id " .
    "FROM `data` " .
    "ORDER BY id DESC, date DESC";

$sql =
    "SELECT id, date " .
    "FROM ($subSql) ranked " .
    "WHERE id_rank <= 2 " .
    "ORDER BY id ASC, date DESC";

However, running this code resulted in out-of-memory. I them modified the code to return a count of the number of records expected, and instead of the expected 5,700 rows, it returned 4,925,479. So my question is “what do I have to change in my PHP SQL above, to get the correct results that I was getting from MySQL Workbench”.

Advertisement

Answer

Assigning variables in MySQL has been deprecated. You should be using window functions:

SELECT id, date 
FROM (SELECT d.*,  
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) as id_rank 
      FROM `data` d
     ) d 
WHERE id_rank <= 2 
ORDER BY id ASC, date DESC;

Even in older versions your code is not guaranteed to work because MySQL does not guarantee the order of evaluation of expressions. You are assigning the variable in one expression and using it in another — and those can go in either order.

If your dates are unique per id, you can do this without variables:

select id, date
from (select d.*,
             (select d2.date
              from data d2
              where d2.id = d.id and d2.date >= d.date
              order by d2.date desc
              limit 1 offset 1
             ) as date_2
      from data d
     ) d
where date >= date_2 or date_2 is null;

For performance, you want an index on data(id, date).

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement