Skip to content
Advertisement

Remove duplicated rows with same Timestamp but different values

I have “duplicated” rows in Bigquery and I need to keep just the last occurrence grouped by id of element.

As you can see, these are not duplicated rows, those are duplicated Timestamps with different values. I need to keep one registry per Timestamp.

I run this query to get the example:

select 
    oid_asset_element, 
    TimeStamp_600s, 
    AmplifierTemperature_S0 , 
    AmplifierTemperature_S1  
from 
    `XXX.YYY.ZZZ` 
where TimeStamp_600s = '2020-03-29T03:50:00' 
  and oid_asset_element in (5483, 5490) 
order by oid_asset_element;

Table with data example:

enter link description here

Advertisement

Answer

In your sample data, the timestamps all look the same. Assuming they are really different, you can use qualify:

select t.*
from `table` t
where 1=1
qualify row_number() over (partition by oid_asset_element order by timestamp_600s desc) = 1;

If you want to actually delete rows (and I would instead suggest putting the data into a new table), you could do:

delete from `table` t
    where timestamp_600s < (select max(t2.timestamp_600s)
                            from `table` t2
                            where t2.oid_asset_element = t.oid_asset_element
                           );
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement