Skip to content
Advertisement

Find number of rows with each property value, taking into account only the most recent rows in SQL

I have a database with tables that represents “edits” to “pages”. Every edit has an ID and a timestamp and a “status”, which has certain discrete values. Pages have IDs and also have “categories”.

I wish to find the number of pages with each status within a given category, taking into account only the state as of the most recent edit.

Edits:

+---------+---------+-----------+--------+
| edit_id | page_id | edit_time | status |
+---------+---------+-----------+--------+
| 1       | 10      | 20210502  | 90     |
| 2       | 10      | 20210503  | 91     |
| 3       | 20      | 20210504  | 91     |
| 4       | 30      | 20210504  | 90     |
| 5       | 30      | 20210505  | 92     |
| 6       | 40      | 20210505  | 90     |
| 7       | 50      | 20210503  | 90     |
+---------+---------+-----------+--------+

Pages:

+---------+--------+
| page_id | cat_id |
+---------+--------+
| 10      | 100    |
| 20      | 100    |
| 30      | 100    |
| 40      | 200    |
+---------+--------+

I want to get, for category 100:

+--------+-------+
| stat   | count |
+--------+-------+
| 90     | 1     |
| 91     | 2     |
| 92     | 1     |
+--------+-------+

Page 10 and 30 have two edits, but the later one “overrides” the first one, so only the edits with status 91 and 92 are counted. Pages 20 and 40 account for one of 91 and 90 each and page 50 is in the wrong category so it doesn’t feature.

I have tried the following, but it doesn’t seem to work. The idea was to select the max (i.e. latest) edit for each page with the right category. Then join that to the edit table and group by the status and count the rows:

SELECT stat, COUNT(*)
FROM edits as out_e
INNER JOIN (
    SELECT edit_id, page_id, max(edit_time) as last_edit
    FROM edits
    INNER JOIN pages on edit_page_id = page_id
    WHERE cat_id = 100
    GROUP BY page_id
) in_e ON out_e.edit_id = in_e.edit_id
GROUP BY stat
ORDER BY stat;
"""

For example in this fiddle: http://sqlfiddle.com/#!9/42f2ed/1

The result is:

+--------+-------+
| stat   | count |
+--------+-------+
| 90     | 3     |
| 91     | 1     |
+--------+-------+

What is the correct way to get this information?

Advertisement

Answer

SELECT cat_id, stat, COUNT(*) cnt
FROM pages
JOIN edits ON pages.page_id = edits.edit_page_id
JOIN ( SELECT edit_page_id, MAX(edit_time) edit_time
       FROM edits
       GROUP BY edit_page_id ) last_time ON edits.edit_page_id = last_time.edit_page_id
                                        AND edits.edit_time = last_time.edit_time
GROUP BY cat_id, stat

Output:

cat_id stat cnt
100 90 1
100 91 2
100 92 1
200 90 1

https://dbfiddle.uk/?rdbms=mysql_5.6&fiddle=7592c7853481f6b5a9626c8d111c1d3b (the query is applicable to MariaDB 10.1).


Is it possible to join on the edit_id (which is unique key for each edit)? – Inductiveload

No, this is impossible. cnt=2 counts two different edit_id values – what value must be used?

But you may obtain concatenated values list – simply add GROUP_CONCAT(edit_id) into the output list.

https://dbfiddle.uk/?rdbms=mysql_5.6&fiddle=b2391972c3f7c4be4254e47514d0f1da

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement