I am ingesting data from two different source (source A and source B). Although data may be coming from two different sources, there is a possibility that an ID for a record may exist in both sources.
The problem is, there are cases when one source is more complete than the other. In the example below, Source A has a NULL for column C, however Source B has a complete record.
I would like to enrich the data where I get a complete record back.
- Note: the ‘Source’ field below is just for illustration purposes, and not an actual field in the solution.
ID | A | B | C | D | Source |
---|---|---|---|---|---|
asc123 | 1 | 2 | NULL | 4 | Source A |
asc123 | 1 | 2 | 3 | 4 | Source B |
The end result should be as follows:
ID | A | B | C | D |
---|---|---|---|---|
asc123 | 1 | 2 | 3 | 4 |
I have attempting something like this, but have not been successful:
SELECT ID ,A ,B ,COALESCE(A.C, B.C) AS C ,D ,Source FROM( SELECT * FROM ( SELECT 'asc123' AS ID, 1 AS 'A', 2 AS 'B', NULL AS 'C', 4 AS 'D', 'Source A' AS 'Source' ) A UNION SELECT * FROM ( SELECT 'asc123' AS ID, 1 AS 'A', 2 AS 'B', 3 AS 'C', 4 AS 'D', 'Source B' AS 'Source' ) B ) X
Advertisement
Answer
You can just aggregate to remove the NULL values:
with sampledata as ( select 'asc123' AS ID, 1 AS A, 2 AS B, NULL AS C, 4 as D, 'Source A' AS Source union all select 'asc123' AS ID, 1 AS A, 2 AS B, 3 AS C, 4 AS D, 'Source B' AS Source ) select Id, max(A)A, Max(B)B, Max(C)C, Max(D)D from sampledata group by Id
Note also that for column aliases, if you need to delimit them use []
or ""
, single quotes are for literal values.