Skip to content
Advertisement

Is there an idiom for querying sequences in SQL?

I’m studying to interview for a job involving a lot of SQL. I’ve noticed a few exercises that revolve around returning values based on a sequence across rows, and I would love to know if there’s a standard way of doing this. Something akin to the subquery below I’ve found useful for selecting a max/min value:

( SELECT column FROM table ... ORDER BY column [DESC] LIMIT 1 )

Here’s a relevant example from LeetCode:

Table: Stadium

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| visit_date    | date    |
| people        | int     |
+---------------+---------+
visit_date is the primary key for this table.

Each row of this table contains the visit date and visit id to the stadium with the number of people during the visit. No two rows will have the same visit_date, and as the id increases, the dates increase as well.

Write an SQL query to display the records with three or more rows with consecutive id’s, and the number of people is greater than or equal to 100 for each.

Return the result table ordered by visit_date in ascending order.

The query result format is in the following example.

Stadium table:
+------+------------+-----------+
| id   | visit_date | people    |
+------+------------+-----------+
| 1    | 2017-01-01 | 10        |
| 2    | 2017-01-02 | 109       |
| 3    | 2017-01-03 | 150       |
| 4    | 2017-01-04 | 99        |
| 5    | 2017-01-05 | 145       |
| 6    | 2017-01-06 | 1455      |
| 7    | 2017-01-07 | 199       |
| 8    | 2017-01-09 | 188       |
+------+------------+-----------+

Result table:
+------+------------+-----------+
| id   | visit_date | people    |
+------+------------+-----------+
| 5    | 2017-01-05 | 145       |
| 6    | 2017-01-06 | 1455      |
| 7    | 2017-01-07 | 199       |
| 8    | 2017-01-09 | 188       |
+------+------------+-----------+

My attempt to solve the problem used a user variable. The code below was my best shot at a solution:

SET @rowIndex = 0;

SELECT s1.id, s1.visit_date, s1.people
FROM ( SELECT @rowIndex:=@rowIndex+1 as rowIndex, s.id, s.visit_date, s.people
    FROM Stadium as s
    WHERE s.people >=100 ) as s1
GROUP BY rowIndex - s1.id, s1.id, s1.visit_date, s1.people
HAVING COUNT(s.id) >= 3
ORDER BY s1.visit_date

The query above has a syntax error somewhere. Message:

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘SELECT s1.id, s1.visit_date, s1.people FROM ( SELECT @rowIndex := @rowIndex+1 as’ at line 4

Does anyone have a favorite way of selecting rows based on a sequence? Perhaps less importantly, can anyone spot the error in my query above?

Answer

I would address this as a gaps-and-island problem. Here is an approach using window functions:

select id, visit_date, people
from (
    select s.*, count(*) over(partition by id - rn) cnt
    from (
        select s.*, row_number() over(order by id) rn
        from stadium s
        where people > 100
    ) s
) s
where cnt >= 3

The idea is to filter out days that have less than 100 visits, and then use the difference between id and a monotonically increasing rank to identify the islands (consecutive days with more than 100 visits). We can then retain groups that have more than 3 rows only.