I’m doing validations between source tables and destination tables in different processes in a Data Lake.
I work with StandardSQL in BigQuery, and I do the comparisons between both tables with this query:
SELECT
SALESDATE
,'table_destination_Y - table_source_X' PROCESS
,'SOURCE' TABLE
,SUM(SALESMNT) SALESMNT
,SUM(SALESQTY) SALESQTY
,COUNT(DISTINCT TRX) COUNTTRX
FROM data-set.table_source_X
WHERE <CONDITIONS>
GROUP BY SALESDATE
UNION ALL
SELECT
SALESDATE
,'table_destination_Y - table_source_X' PROCESS
,'DESTINY' TABLE
,SUM(SALESMNT) SALESMNT
,SUM(SALESQTY) SALESQTY
,COUNT(DISTINCT TRX) COUNTTRX
FROM data-set.table_destination_Y
GROUP BY SALESDATE
And this is the result:
Is there a way to normalize and denormalize this information as follows?
I have to evaluate several processes that have more or less measures, and I would need to pivot the query to compare more easily and in a standardized way for all the processes.
Thanks and regards.
UPDATE: Without creating tables or views. I need it to be with a query.
UPDATE2:
In case of evaluating 3 or more tables, I would need the query to be able to join the process column (with the name of the table) according to the evaluated measures.
SELECT
SALESDATE
,'table_destination_Y' PROCESS
,'DESTINATION' TABLE
,SUM(SALESMNT) SALESMNT
,SUM(SALESQTY) SALESQTY
,COUNT(DISTINCT TRX) COUNTTRX
,SUM(SALESCOST) SALESCOST
FROM data-set.table_destination_Y
GROUP BY SALESDATE
UNION ALL
SELECT
SALESDATE
,'table_source_X' PROCESS
,'SOURCE' TABLE
,SUM(SALESMNT) SALESMNT
,SUM(SALESQTY) SALESQTY
,COUNT(DISTINCT TRX) COUNTTRX
,0
FROM data-set.table_source_X
WHERE <CONDITIONS>
GROUP BY SALESDATE
UNION ALL
SELECT
SALESDATE
,'table_source_Z'
,'SOURCE'
,0
,0
,0
,SUM(SALESCOST) SALESCOST
FROM data-set.table_source_Z
WHERE <CONDITIONS>
GROUP BY SALESDATE
This would be the result of the previous query:
and this should be the result:
Is this possible?
Thanks and regards.
Advertisement
Answer
Use CROSS JOIN to ‘explode’ the rows, so you have a row for each measure.
Use ‘conditional aggregation’ to ‘collapse’ pairs of rows in to one row with two columns.
SELECT
src.SALESDATE,
src.PROCESS,
pvt.MEASURE,
MAX(
CASE
WHEN src.TABLE != 'SOURCE' THEN NULL
WHEN pvt.MEASURE = 'SALESMNT' THEN src.SALESMNT
WHEN pvt.MEASURE = 'SALESQTY' THEN src.SALESQTY
WHEN pvt.MEASURE = 'COUNTTRX' THEN src.COUNTTRX
END
)
AS SOURCE,
MAX(
CASE
WHEN src.TABLE != 'DESTINATION' THEN NULL
WHEN pvt.MEASURE = 'SALESMNT' THEN src.SALESMNT
WHEN pvt.MEASURE = 'SALESQTY' THEN src.SALESQTY
WHEN pvt.MEASURE = 'COUNTTRX' THEN src.COUNTTRX
END
)
AS DESTINATION
FROM
(
yourQuery
)
AS src
CROSS JOIN
(
SELECT 'SALESMNT' AS MEASURE
UNION ALL
SELECT 'SALESQTY'
UNION ALL
SELECT 'COUNTTRX'
)
AS pvt
GROUP BY
src.SALESDATE,
src.PROCESS,
pvt.MEASURE