Filter out Duplicates that follow each other in Se…

I’m looking for a way to filter out row sequences with the same column value while keeping the first element of the sequence.

Given the following table:

id	location	timestamp
1	A	1001
2	B	1002
3	A	1003
4	A <—	1004
5	A <—	1005
6	B	1006

I want to produce the following output:

id	location	timestamp
1	A	1001
2	B	1002
3	A	1003
6	B	1006

The goal is to remove row 4 and 5 because they both follow another location A. Only the 1st occurrence of the sequence should remain. I’m working with BigQuery SQL.

Answer

Just use lag():

select t.*
from (select t.*,
             lag(location) over (order by timestamp) as prev_location
      from t
     ) t
where prev_location is null or prev_location <> location;

Filter out Duplicates that follow each other in Sequence with BigQuery SQL

Advertisement

Answer