I have a table in which each row represents a different event, with one of the fields providing a value aggregated across all the unique events that occur within that id. For example:
+-----------------------------+-----------+---------+--------+--------+ | timestamp | id1 | id2 | id3 | value1 | +-----------------------------+-----------+---------+--------+--------+ | 2019-09-09 18:00:00.000 UTC | 123456789 | abcdefg | 1a2b3c | 25.10 | | 2019-09-09 19:00:00.000 UTC | 123456789 | abcdefg | 1a2b3c | 25.10 | +-----------------------------+-----------+---------+--------+--------+
In the table above, 2 events occur, one at 18:00 and one at 19:00. The field value1
is a sum of the values that occur at each event, so the 18:00 event may have a value of 10 and the 19:00 event a value of 15.10, which combine to give the value of 25.10. How would one go about replacing the duplicate value1
with an empty string (not a null) such that the row is maintained:
+-----------------------------+-----------+---------+--------+--------+ | timestamp | id1 | id2 | id3 | value1 | +-----------------------------+-----------+---------+--------+--------+ | 2019-09-09 18:00:00.000 UTC | 123456789 | abcdefg | 1a2b3c | 25.10 | | 2019-09-09 19:00:00.000 UTC | 123456789 | abcdefg | 1a2b3c | | +-----------------------------+-----------+---------+--------+--------+
Advertisement
Answer
Because you mentioned unique events – I assume timestamp
is unique across partition. in this case – below is good option for you
#standardSQL SELECT * EXCEPT(new_value1) REPLACE(new_value1 AS value1) FROM ( SELECT *, IF(timestamp = MIN(timestamp) OVER(PARTITION BY id1, id2, id3), value1, 0) new_value1 FROM `project.dataset.table` )
if to apply to sample data from your question result is
Row timestamp id1 id2 id3 value1 1 2019-09-09 18:00:00 UTC 123456789 abcdefg 1a2b3c 25.1 2 2019-09-09 19:00:00 UTC 123456789 abcdefg 1a2b3c 0.0
The data type of value1
is most likely FLOAT64 so the options you have to present de-duped values is either 0 or NULL. If for some reason you need to present them as empty string you need to CAST whole field to STRING as in example below
#standardSQL SELECT * EXCEPT(new_value1) REPLACE(new_value1 AS value1) FROM ( SELECT *, IF(timestamp = MIN(timestamp) OVER(PARTITION BY id1, id2, id3), CAST(value1 AS STRING), '') new_value1 FROM `project.dataset.table` )
in this case result will be
Row timestamp id1 id2 id3 value1 1 2019-09-09 18:00:00 UTC 123456789 abcdefg 1a2b3c 25.1 2 2019-09-09 19:00:00 UTC 123456789 abcdefg 1a2b3c