
I want to de-dupe records in BigQuery, keeping the maximum column values, where duplicates are matched on a specific column using an expression

I want to drop the company name with "CO LLC" and keep "Amar CO" instead, but take all the column values from "Amar CO LLC", since it has the fewest NULL values (the most column data).

In short: de-dupe the records and remove the company name that ends with or matches "LLC" (case-insensitive), but keep the column values from whichever of the two records has the most information (the most populated columns).
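The matching rule described above (a name ending with "LLC", compared case-insensitively, collapses onto the same company) can be sketched as a normalization key. This is a minimal local Python illustration, not BigQuery SQL; the helper names `dedupe_key` and `is_llc_name`, the exact regex (tolerating an optional trailing period and whitespace), and lowercasing the whole name are all assumptions.

```python
import re

# Assumed rule: a trailing word "LLC" (any case, optional ".") is dropped
# before comparing names, so "Amar CO LLC" and "Amar CO" share one key.
LLC_SUFFIX = re.compile(r"\s*\bllc\.?\s*$", re.IGNORECASE)


def dedupe_key(company: str) -> str:
    """Hypothetical helper: normalized key used to group duplicate rows."""
    return LLC_SUFFIX.sub("", company.strip()).lower()


def is_llc_name(company: str) -> bool:
    """Hypothetical helper: True if the name ends with the LLC suffix."""
    return LLC_SUFFIX.search(company.strip()) is not None


print(dedupe_key("Amar CO LLC"))   # amar co
print(dedupe_key("Amar CO"))       # amar co
print(is_llc_name("Amar co llc"))  # True
```

The `\b` word boundary keeps names such as "Amar COLLC" from being treated as LLC variants.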

Expected output


Answer

To give precedence to the record with the fewest NULL values …

Below is for BigQuery Standard SQL (query #1):

If applied to the sample data from your question, the output is:

[Screenshot: query #1 output on the question's sample data]
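The idea behind query #1 (group rows by the LLC-stripped name, keep the row with the fewest NULLs, but report the non-LLC company name) can be illustrated locally in Python. This is a rough sketch over made-up rows, not the answer's SQL; the column names `company`, `city`, `phone`, `email`, the helper names, and the tie-breaking (first row with the fewest NULLs wins) are assumptions.

```python
import re
from collections import defaultdict

LLC_SUFFIX = re.compile(r"\s*\bllc\.?\s*$", re.IGNORECASE)


def dedupe_key(company):
    # Group key: name without a trailing "LLC", compared case-insensitively.
    return LLC_SUFFIX.sub("", company.strip()).lower()


def null_count(row):
    # Number of missing (None) values; fewer means more information.
    return sum(value is None for value in row.values())


def dedupe_fewest_nulls(rows, name_col="company"):
    groups = defaultdict(list)
    for row in rows:
        groups[dedupe_key(row[name_col])].append(row)

    result = []
    for group in groups.values():
        best = dict(min(group, key=null_count))  # first row with fewest NULLs
        # Report a name that does not end with LLC, if the group has one.
        for row in group:
            if not LLC_SUFFIX.search(row[name_col].strip()):
                best[name_col] = row[name_col]
                break
        result.append(best)
    return result


rows = [  # made-up sample rows, not the question's data
    {"company": "Amar CO", "city": "Pune", "phone": None, "email": None},
    {"company": "Amar CO LLC", "city": "Pune", "phone": "555-0100", "email": None},
]
print(dedupe_fewest_nulls(rows))
# [{'company': 'Amar CO', 'city': 'Pune', 'phone': '555-0100', 'email': None}]
```

Note that the surviving row's other NULLs (here `email`) stay NULL; only the name is taken from the non-LLC record.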

If you want to fill in all fields using values from all the records, you can use the query below (query #2):
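A local Python sketch of the "fill all fields from all records" idea, assuming it amounts to a per-column MAX within each de-dupe group (NULLs ignored, as SQL `MAX` does), with the non-LLC name kept. The rows, column names, and helper names are made up for illustration, and every row is assumed to have the same columns.

```python
import re
from collections import defaultdict

LLC_SUFFIX = re.compile(r"\s*\bllc\.?\s*$", re.IGNORECASE)


def dedupe_key(company):
    # Group key: name without a trailing "LLC", compared case-insensitively.
    return LLC_SUFFIX.sub("", company.strip()).lower()


def dedupe_max_per_column(rows, name_col="company"):
    groups = defaultdict(list)
    for row in rows:
        groups[dedupe_key(row[name_col])].append(row)

    result = []
    for group in groups.values():
        merged = {}
        for col in group[0]:  # assumes every row has the same columns
            values = [row[col] for row in group if row[col] is not None]
            # Like SQL MAX(): NULLs are ignored; an all-NULL column stays NULL.
            merged[col] = max(values) if values else None
        names = [row[name_col] for row in group
                 if not LLC_SUFFIX.search(row[name_col].strip())]
        if names:
            merged[name_col] = names[0]  # keep the non-LLC company name
        result.append(merged)
    return result


rows = [  # made-up sample rows
    {"company": "Amar CO", "city": "Pune", "phone": None, "email": "info@amar.example"},
    {"company": "Amar CO LLC", "city": "Pune", "phone": "555-0100", "email": None},
]
print(dedupe_max_per_column(rows))
# [{'company': 'Amar CO', 'city': 'Pune', 'phone': '555-0100', 'email': 'info@amar.example'}]
```

The name override matters: a plain MAX over `company` would pick "Amar CO LLC", since it sorts after "Amar CO".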

Finally, if you still want to give precedence to the record with the fewest NULL values, but replace its remaining NULLs with values from the other rows, use the query below (query #3):
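A Python sketch of the "precedence, then fill" idea: rows in a group are ordered by NULL count, and each column takes the first non-NULL value in that order (COALESCE-style). The fill order for the remaining NULLs (next most complete row first) is an assumption; the answer's query may fill them differently. Rows and names are made up.

```python
import re
from collections import defaultdict

LLC_SUFFIX = re.compile(r"\s*\bllc\.?\s*$", re.IGNORECASE)


def dedupe_key(company):
    # Group key: name without a trailing "LLC", compared case-insensitively.
    return LLC_SUFFIX.sub("", company.strip()).lower()


def null_count(row):
    return sum(value is None for value in row.values())


def dedupe_prefer_then_fill(rows, name_col="company"):
    groups = defaultdict(list)
    for row in rows:
        groups[dedupe_key(row[name_col])].append(row)

    result = []
    for group in groups.values():
        # Most complete row first; sorted() is stable, so ties keep input order.
        ordered = sorted(group, key=null_count)
        merged = {}
        for col in ordered[0]:
            # COALESCE-style: first non-NULL value in preference order.
            merged[col] = next(
                (row[col] for row in ordered if row[col] is not None), None)
        names = [row[name_col] for row in group
                 if not LLC_SUFFIX.search(row[name_col].strip())]
        if names:
            merged[name_col] = names[0]  # keep the non-LLC company name
        result.append(merged)
    return result


rows = [  # made-up sample rows
    {"company": "Amar CO", "city": "Thane", "phone": None,
     "email": "info@amar.example", "zip": None},
    {"company": "Amar CO LLC", "city": "Pune", "phone": "555-0100",
     "email": None, "zip": "411001"},
]
print(dedupe_prefer_then_fill(rows))
# [{'company': 'Amar CO', 'city': 'Pune', 'phone': '555-0100', 'email': 'info@amar.example', 'zip': '411001'}]
```

Here `city` comes out as "Pune" from the preferred LLC row, whereas a per-column MAX would have picked "Thane".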

You can check the difference between this option and the previous one by applying both to the dummy data below:

The last query (query #3) gives:

[Screenshot: query #3 output on the dummy data]

while the previous one (query #2) simply gives the max of each column across all rows:

[Screenshot: query #2 output on the dummy data]
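The difference between the two options can be seen on a single, already-grouped set of rows. This Python sketch uses made-up numeric values (not the answer's dummy data) and assumes that query #3 fills remaining NULLs from the next most complete rows first.

```python
def null_count(row):
    return sum(value is None for value in row.values())


def column_max(group):
    # query #2 idea: per-column MAX, ignoring NULLs.
    return {c: max((r[c] for r in group if r[c] is not None), default=None)
            for c in group[0]}


def prefer_then_fill(group):
    # query #3 idea (assumed fill order): the most complete row wins and
    # its remaining NULLs come from the next most complete rows.
    ordered = sorted(group, key=null_count)
    return {c: next((r[c] for r in ordered if r[c] is not None), None)
            for c in ordered[0]}


group = [  # made-up dummy rows for one company
    {"a": 1, "b": 2, "c": None},     # fewest NULLs -> preferred
    {"a": 9, "b": None, "c": None},
    {"a": None, "b": None, "c": 7},
]
print(column_max(group))        # {'a': 9, 'b': 2, 'c': 7}
print(prefer_then_fill(group))  # {'a': 1, 'b': 2, 'c': 7}
```

Column `a` shows the difference: the max picks 9 from a less complete row, while the precedence approach keeps the preferred row's 1 and only borrows `c`.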

User contributions licensed under: CC BY-SA