I have to compare 2 different sources and identify all the mismatches for all IDs
Source_excel
table
+-----+-------------+------+----------+ | id | name | City | flag | +-----+-------------+------+----------+ | 101 | Plate | NY | Ready | | 102 | Back washer | NY | Sold | | 103 | Ring | MC | Planning | | 104 | Glass | NMC | Ready | | 107 | Cover | PR | Ready | +-----+-------------+------+----------+
Source_dw
table
+-----+----------+------+----------+ | id | name | City | flag | +-----+----------+------+----------+ | 101 | Plate | NY | Planning | | 102 | Nut | TN | Expired | | 103 | Ring | MC | Planning | | 104 | Top Wire | NY | Ready | | 105 | Bolt | MC | Expired | +-----+----------+------+----------+
Expected result
+-----+-------------+----------+------------+----------+------------+---------+------------------+ | ID | excel_name | dw_name | excel_flag | dw_flag | excel_city | dw_city | RESULT | +-----+-------------+----------+------------+----------+------------+---------+------------------+ | 101 | Plate | Plate | Ready | Planning | NY | NY | FLAG_MISMATCH | | 102 | Back washer | Nut | Sold | Expired | NY | TN | NAME_MISMATCH | | 102 | Back washer | Nut | Sold | Expired | NY | TN | FLAG_MISMATCH | | 102 | Back washer | Nut | Sold | Expired | NY | TN | CITY_MISMATCH | | 103 | Ring | Ring | Planning | Planning | MC | MC | ALL_MATCH | | 104 | Glass | Top Wire | Ready | Ready | NMC | NY | NAME_MISMATCH | | 104 | Glass | Top Wire | Ready | Ready | NMC | NY | CITY_MISMATCH | | 107 | Cover | | Ready | | PR | | MISSING IN DW | | 105 | | Bolt | | Expired | | MC | MISSING IN EXCEL | +-----+-------------+----------+------------+----------+------------+---------+------------------+
I have tried the below query but it is giving only one mismatch.
select ISNULL(EXCEL.ID,DW.ID) ID, excel.name as excel_name,dw.name as dw_name, excel.flag as excel_flag,dw.flag as dw_flag, excel.city as excel_city,dw.city as dw_city, RESULT = CASE WHEN excel.ID IS NULL THEN 'MISSING IN EXCEL' WHEN dw.ID IS NULL THEN 'MISSING IN DW' WHEN excel.NAME<>dw.NAME THEN 'NAME_MISMATCH' WHEN excel.CITY<>dw.CITY THEN 'CITY_MISMATCH' WHEN excel.FLAG <> dw.FLAG THEN 'FLAG_MISMATCH' ELSE 'ALL_MATCH' END from source_excel excel FULL OUTER JOIN source_dw dw ON excel.id=dw.id
Actual output
+-----+-------------+----------+------------+----------+------------+---------+------------------+ | ID | excel_name | dw_name | excel_flag | dw_flag | excel_city | dw_city | RESULT | +-----+-------------+----------+------------+----------+------------+---------+------------------+ | 101 | Plate | Plate | Ready | Planning | NY | NY | FLAG_MISMATCH | | 102 | Back washer | Nut | Sold | Expired | NY | TN | NAME_MISMATCH | | 103 | Ring | Ring | Planning | Planning | MC | MC | ALL_MATCH | | 104 | Glass | Top Wire | Ready | Ready | NMC | NY | NAME_MISMATCH | | 107 | Cover | | Ready | | PR | | MISSING IN DW | | 105 | | Bolt | | Expired | | MC | MISSING IN EXCEL | +-----+-------------+----------+------------+----------+------------+---------+------------------+
I understand that case
expression will only check the first satisfying condition. Is there any other way I can check all the condition?
Advertisement
Answer
If I follow you correctly, you want one row per mismatch, or one row indicating that everything matches.
You can use cross apply
to generate the rows, like so:
SELECT COALESCE(xl.ID, dw.ID) ID, xl.name as excel_name,dw.name as dw_name, xl.flag as excel_flag,dw.flag as dw_flag, xl.city as excel_city,dw.city as dw_city, x.result FROM source_excel xl FULL OUTER JOIN source_dw dw ON xl.id = dw.id CROSS APPLY (VALUES (CASE WHEN xl.ID IS NULL THEN 'MISSING IN EXCEL' END), (CASE WHEN dw.ID IS NULL THEN 'MISSING IN DW' END), (CASE WHEN WHEN xl.NAME <> dw.NAME THEN 'NAME_MISMATCH' END), (CASE WHEN xl.CITY <> dw.CITY THEN 'CITY_MISMATCH' END), (CASE WHEN xl.FLAG <> dw.FLAG THEN 'FLAG_MISMATCH' END), (CASE WHEN xl.ID = dw.ID AND xl.NAME = dw.NAME AND xl.CITY = dw.CITY AND xl.FLAG = dw.FLAG THEN 'ALL_MATCH' END) ) x(result) WHERE x.result IS NOT NULL