Given that I’ve got a table with the following, very simple content:
# select * from messages; id | verbosity ----+----------- 1 | 20 2 | 20 3 | 20 4 | 30 5 | 100 (5 rows)
I would like to select N messages, which sum of verbosity is lower than Y (for testing purposes let’s say it should be 70, then correct results will be messages with id 1,2,3). It’s really important to me, that solution should be database independent (it should work at least on Postgres and SQLite).
I was trying with something like:
SELECT * FROM messages GROUP BY id HAVING SUM(verbosity) < 70;
However it doesn’t seem to work as expected, because it doesn’t actually sum all values from verbosity column.
I would be very grateful for any hints/help.
Advertisement
Answer
SELECT m.id, sum(m1.verbosity) AS total FROM messages m JOIN messages m1 ON m1.id <= m.id WHERE m.verbosity < 70 -- optional, to avoid pointless evaluation GROUP BY m.id HAVING SUM(m1.verbosity) < 70 ORDER BY total DESC LIMIT 1;
This assumes a unique, ascending id
like you have in your example.
In modern Postgres – or generally with modern standard SQL (but not in SQLite):
Simple CTE
WITH cte AS ( SELECT *, sum(verbosity) OVER (ORDER BY id) AS total FROM messages ) SELECT * FROM cte WHERE total < 70 ORDER BY id;
Recursive CTE
Should be faster for big tables where you only retrieve a small set.
WITH RECURSIVE cte AS ( ( -- parentheses required SELECT id, verbosity, verbosity AS total FROM messages ORDER BY id LIMIT 1 ) UNION ALL SELECT c1.id, c1.verbosity, c.total + c1.verbosity FROM cte c JOIN LATERAL ( SELECT * FROM messages WHERE id > c.id ORDER BY id LIMIT 1 ) c1 ON c1.verbosity < 70 - c.total WHERE c.total < 70 ) SELECT * FROM cte ORDER BY id;
All standard SQL, except for LIMIT
.
Strictly speaking, there is no such thing as “database-independent”. There are various SQL-standards, but no RDBMS complies completely. LIMIT
works for PostgreSQL and SQLite (and some others). Use TOP 1
for SQL Server, rownum
for Oracle. Here’s a comprehensive list on Wikipedia.
The SQL:2008 standard would be:
... FETCH FIRST 1 ROWS ONLY
… which PostgreSQL supports – but hardly any other RDBMS.
The pure alternative that works with more systems would be to wrap it in a subquery and
SELECT max(total) FROM <subquery>
But that is slow and unwieldy.