I have a database with tables that represents “edits” to “pages”. Every edit has an ID and a timestamp and a “status”, which has certain discrete values. Pages have IDs and also have “categories”.
I wish to find the number of pages with each status within a given category, taking into account only the state as of the most recent edit.
Edits:
+---------+---------+-----------+--------+ | edit_id | page_id | edit_time | status | +---------+---------+-----------+--------+ | 1 | 10 | 20210502 | 90 | | 2 | 10 | 20210503 | 91 | | 3 | 20 | 20210504 | 91 | | 4 | 30 | 20210504 | 90 | | 5 | 30 | 20210505 | 92 | | 6 | 40 | 20210505 | 90 | | 7 | 50 | 20210503 | 90 | +---------+---------+-----------+--------+
Pages:
+---------+--------+ | page_id | cat_id | +---------+--------+ | 10 | 100 | | 20 | 100 | | 30 | 100 | | 40 | 200 | +---------+--------+
I want to get, for category 100
:
+--------+-------+ | stat | count | +--------+-------+ | 90 | 1 | | 91 | 2 | | 92 | 1 | +--------+-------+
Page 10
and 30
have two edits, but the later one “overrides” the first one, so only the edits with status 91
and 92
are counted. Pages 20
and 40
account for one of 91
and 90
each and page 50
is in the wrong category so it doesn’t feature.
I have tried the following, but it doesn’t seem to work. The idea was to select the max (i.e. latest) edit for each page with the right category. Then join that to the edit table and group by the status and count the rows:
SELECT stat, COUNT(*) FROM edits as out_e INNER JOIN ( SELECT edit_id, page_id, max(edit_time) as last_edit FROM edits INNER JOIN pages on edit_page_id = page_id WHERE cat_id = 100 GROUP BY page_id ) in_e ON out_e.edit_id = in_e.edit_id GROUP BY stat ORDER BY stat; """
For example in this fiddle: http://sqlfiddle.com/#!9/42f2ed/1
The result is:
+--------+-------+ | stat | count | +--------+-------+ | 90 | 3 | | 91 | 1 | +--------+-------+
What is the correct way to get this information?
Advertisement
Answer
SELECT cat_id, stat, COUNT(*) cnt FROM pages JOIN edits ON pages.page_id = edits.edit_page_id JOIN ( SELECT edit_page_id, MAX(edit_time) edit_time FROM edits GROUP BY edit_page_id ) last_time ON edits.edit_page_id = last_time.edit_page_id AND edits.edit_time = last_time.edit_time GROUP BY cat_id, stat
Output:
cat_id | stat | cnt |
---|---|---|
100 | 90 | 1 |
100 | 91 | 2 |
100 | 92 | 1 |
200 | 90 | 1 |
https://dbfiddle.uk/?rdbms=mysql_5.6&fiddle=7592c7853481f6b5a9626c8d111c1d3b (the query is applicable to MariaDB 10.1).
Is it possible to join on the edit_id (which is unique key for each edit)? – Inductiveload
No, this is impossible. cnt=2
counts two different edit_id
values – what value must be used?
But you may obtain concatenated values list – simply add GROUP_CONCAT(edit_id)
into the output list.
https://dbfiddle.uk/?rdbms=mysql_5.6&fiddle=b2391972c3f7c4be4254e47514d0f1da