Skip to content
Advertisement

Replace aggregated duplicate value in unique rows

I have a table in which each row represents a different event, with one of the fields providing a value aggregated across all the unique events that occur within that id. For example:

+-----------------------------+-----------+---------+--------+--------+
|          timestamp          |    id1    |   id2   |  id3   | value1 |
+-----------------------------+-----------+---------+--------+--------+
| 2019-09-09 18:00:00.000 UTC | 123456789 | abcdefg | 1a2b3c |  25.10 |
| 2019-09-09 19:00:00.000 UTC | 123456789 | abcdefg | 1a2b3c |  25.10 |
+-----------------------------+-----------+---------+--------+--------+

In the table above, 2 events occur, one at 18:00 and one at 19:00. The field value1 is a sum of the values that occur at each event, so the 18:00 event may have a value of 10 and the 19:00 event a value of 15.10, which combine to give the value of 25.10. How would one go about replacing the duplicate value1 with an empty string (not a null) such that the row is maintained:

+-----------------------------+-----------+---------+--------+--------+
|          timestamp          |    id1    |   id2   |  id3   | value1 |
+-----------------------------+-----------+---------+--------+--------+
| 2019-09-09 18:00:00.000 UTC | 123456789 | abcdefg | 1a2b3c |  25.10 |
| 2019-09-09 19:00:00.000 UTC | 123456789 | abcdefg | 1a2b3c |        |
+-----------------------------+-----------+---------+--------+--------+

Advertisement

Answer

Because you mentioned unique events – I assume timestamp is unique across partition. in this case – below is good option for you

#standardSQL
SELECT * EXCEPT(new_value1) REPLACE(new_value1 AS value1) 
FROM (
  SELECT *, IF(timestamp = MIN(timestamp) OVER(PARTITION BY id1, id2, id3), value1, 0) new_value1
  FROM `project.dataset.table`
) 

if to apply to sample data from your question result is

Row timestamp               id1         id2     id3     value1   
1   2019-09-09 18:00:00 UTC 123456789   abcdefg 1a2b3c  25.1     
2   2019-09-09 19:00:00 UTC 123456789   abcdefg 1a2b3c  0.0    

The data type of value1 is most likely FLOAT64 so the options you have to present de-duped values is either 0 or NULL. If for some reason you need to present them as empty string you need to CAST whole field to STRING as in example below

#standardSQL
SELECT * EXCEPT(new_value1) REPLACE(new_value1 AS value1) 
FROM (
  SELECT *, IF(timestamp = MIN(timestamp) OVER(PARTITION BY id1, id2, id3), CAST(value1 AS STRING), '') new_value1
  FROM `project.dataset.table`
)   

in this case result will be

Row timestamp               id1         id2     id3     value1   
1   2019-09-09 18:00:00 UTC 123456789   abcdefg 1a2b3c  25.1     
2   2019-09-09 19:00:00 UTC 123456789   abcdefg 1a2b3c       
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement