Skip to content
Advertisement

Filter out Duplicates that follow each other in Sequence with BigQuery SQL

I’m looking for a way to filter out row sequences with the same column value while keeping the first element of the sequence.

Given the following table:

id location timestamp
1 A 1001
2 B 1002
3 A 1003
4 A <— 1004
5 A <— 1005
6 B 1006

I want to produce the following output:

id location timestamp
1 A 1001
2 B 1002
3 A 1003
6 B 1006

The goal is to remove row 4 and 5 because they both follow another location A. Only the 1st occurrence of the sequence should remain. I’m working with BigQuery SQL.

Advertisement

Answer

Just use lag():

select t.*
from (select t.*,
             lag(location) over (order by timestamp) as prev_location
      from t
     ) t
where prev_location is null or prev_location <> location;
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement