I’m looking for a way to filter out row sequences with the same column value while keeping the first element of the sequence.
Given the following table:
id | location | timestamp |
---|---|---|
1 | A | 1001 |
2 | B | 1002 |
3 | A | 1003 |
4 | A <— | 1004 |
5 | A <— | 1005 |
6 | B | 1006 |
I want to produce the following output:
id | location | timestamp |
---|---|---|
1 | A | 1001 |
2 | B | 1002 |
3 | A | 1003 |
6 | B | 1006 |
The goal is to remove row 4 and 5 because they both follow another location A
. Only the 1st occurrence of the sequence should remain. I’m working with BigQuery SQL.
Advertisement
Answer
Just use lag()
:
select t.* from (select t.*, lag(location) over (order by timestamp) as prev_location from t ) t where prev_location is null or prev_location <> location;