I have “duplicated” rows in Bigquery and I need to keep just the last occurrence grouped by id of element.
As you can see, these are not duplicated rows, those are duplicated Timestamps with different values. I need to keep one registry per Timestamp.
I run this query to get the example:
select oid_asset_element, TimeStamp_600s, AmplifierTemperature_S0 , AmplifierTemperature_S1 from `XXX.YYY.ZZZ` where TimeStamp_600s = '2020-03-29T03:50:00' and oid_asset_element in (5483, 5490) order by oid_asset_element;
Table with data example:
Advertisement
Answer
In your sample data, the timestamps all look the same. Assuming they are really different, you can use qualify
:
select t.* from `table` t where 1=1 qualify row_number() over (partition by oid_asset_element order by timestamp_600s desc) = 1;
If you want to actually delete rows (and I would instead suggest putting the data into a new table), you could do:
delete from `table` t where timestamp_600s < (select max(t2.timestamp_600s) from `table` t2 where t2.oid_asset_element = t.oid_asset_element );