I’m looking for a way to filter out row sequences with the same column value while keeping the first element of the sequence.
Given the following table:
| id | location | timestamp |
|---|---|---|
| 1 | A | 1001 |
| 2 | B | 1002 |
| 3 | A | 1003 |
| 4 | A <— | 1004 |
| 5 | A <— | 1005 |
| 6 | B | 1006 |
I want to produce the following output:
| id | location | timestamp |
|---|---|---|
| 1 | A | 1001 |
| 2 | B | 1002 |
| 3 | A | 1003 |
| 6 | B | 1006 |
The goal is to remove row 4 and 5 because they both follow another location A. Only the 1st occurrence of the sequence should remain. I’m working with BigQuery SQL.
Advertisement
Answer
Just use lag():
select t.*
from (select t.*,
lag(location) over (order by timestamp) as prev_location
from t
) t
where prev_location is null or prev_location <> location;