I’m trying to output all columns while have certain rows de-duplicated. Everything I’ve tried doesn’t seem to get anywhere close.
SELECT * FROM `project.dataset.table` ??
- de-duplicate based on Name
- where the chosen row to keep when de-duplicating Column1 uses the most recent date in LastUpdateDate
- the row to keep will always be the last row (with the duplicate value at Name) as it would have been appended to the table later
Advertisement
Answer
Consider using ARRAY_AGG
:
with TestData as ( select 'Tom' as Name, '1' as Phone, timestamp('2020-01-01 00:00:00') as LastUpdateDate union all select 'Tom' as Name, '2' as Phone, timestamp('2020-01-02 00:00:00') as LastUpdateDate union all select 'Eva' as Name, '3' as Phone, timestamp('2020-01-03 00:00:00') as LastUpdateDate union all select 'Eva' as Name, '4' as Phone, timestamp('2020-01-04 00:00:00') as LastUpdateDate ) SELECT deduplicated.* FROM ( SELECT ARRAY_AGG(t ORDER BY t.LastUpdateDate DESC LIMIT 1)[OFFSET(0)] as deduplicated FROM TestData as t GROUP BY Name )