I’m doing validations between source tables and destination tables in different processes in a Data Lake.
I work with StandardSQL in BigQuery, and I do the comparisons between both tables with this query:
SELECT SALESDATE ,'table_destination_Y - table_source_X' PROCESS ,'SOURCE' TABLE ,SUM(SALESMNT) SALESMNT ,SUM(SALESQTY) SALESQTY ,COUNT(DISTINCT TRX) COUNTTRX FROM data-set.table_source_X WHERE <CONDITIONS> GROUP BY SALESDATE UNION ALL SELECT SALESDATE ,'table_destination_Y - table_source_X' PROCESS ,'DESTINY' TABLE ,SUM(SALESMNT) SALESMNT ,SUM(SALESQTY) SALESQTY ,COUNT(DISTINCT TRX) COUNTTRX FROM data-set.table_destination_Y GROUP BY SALESDATE
And this is the result:
Is there a way to normalize and denormalize this information as follows?
I have to evaluate several processes that have more or less measures, and I would need to pivot the query to compare more easily and in a standardized way for all the processes.
Thanks and regards.
UPDATE: Without creating tables or views. I need it to be with a query.
UPDATE2:
In case of evaluating 3 or more tables, I would need the query to be able to join the process column (with the name of the table) according to the evaluated measures.
SELECT SALESDATE ,'table_destination_Y' PROCESS ,'DESTINATION' TABLE ,SUM(SALESMNT) SALESMNT ,SUM(SALESQTY) SALESQTY ,COUNT(DISTINCT TRX) COUNTTRX ,SUM(SALESCOST) SALESCOST FROM data-set.table_destination_Y GROUP BY SALESDATE UNION ALL SELECT SALESDATE ,'table_source_X' PROCESS ,'SOURCE' TABLE ,SUM(SALESMNT) SALESMNT ,SUM(SALESQTY) SALESQTY ,COUNT(DISTINCT TRX) COUNTTRX ,0 FROM data-set.table_source_X WHERE <CONDITIONS> GROUP BY SALESDATE UNION ALL SELECT SALESDATE ,'table_source_Z' ,'SOURCE' ,0 ,0 ,0 ,SUM(SALESCOST) SALESCOST FROM data-set.table_source_Z WHERE <CONDITIONS> GROUP BY SALESDATE
This would be the result of the previous query:
and this should be the result:
Is this possible?
Thanks and regards.
Advertisement
Answer
Use CROSS JOIN to ‘explode’ the rows, so you have a row for each measure.
Use ‘conditional aggregation’ to ‘collapse’ pairs of rows in to one row with two columns.
SELECT src.SALESDATE, src.PROCESS, pvt.MEASURE, MAX( CASE WHEN src.TABLE != 'SOURCE' THEN NULL WHEN pvt.MEASURE = 'SALESMNT' THEN src.SALESMNT WHEN pvt.MEASURE = 'SALESQTY' THEN src.SALESQTY WHEN pvt.MEASURE = 'COUNTTRX' THEN src.COUNTTRX END ) AS SOURCE, MAX( CASE WHEN src.TABLE != 'DESTINATION' THEN NULL WHEN pvt.MEASURE = 'SALESMNT' THEN src.SALESMNT WHEN pvt.MEASURE = 'SALESQTY' THEN src.SALESQTY WHEN pvt.MEASURE = 'COUNTTRX' THEN src.COUNTTRX END ) AS DESTINATION FROM ( yourQuery ) AS src CROSS JOIN ( SELECT 'SALESMNT' AS MEASURE UNION ALL SELECT 'SALESQTY' UNION ALL SELECT 'COUNTTRX' ) AS pvt GROUP BY src.SALESDATE, src.PROCESS, pvt.MEASURE