Skip to content
Advertisement

How to pivot in BigQuery?

I’m doing validations between source tables and destination tables in different processes in a Data Lake.

I work with StandardSQL in BigQuery, and I do the comparisons between both tables with this query:

SELECT
  SALESDATE
  ,'table_destination_Y - table_source_X' PROCESS
  ,'SOURCE' TABLE
  ,SUM(SALESMNT) SALESMNT
  ,SUM(SALESQTY) SALESQTY
  ,COUNT(DISTINCT TRX) COUNTTRX
FROM data-set.table_source_X
WHERE <CONDITIONS>
GROUP BY SALESDATE
UNION ALL
SELECT
  SALESDATE
  ,'table_destination_Y - table_source_X' PROCESS
  ,'DESTINY' TABLE
  ,SUM(SALESMNT) SALESMNT
  ,SUM(SALESQTY) SALESQTY
  ,COUNT(DISTINCT TRX) COUNTTRX
FROM data-set.table_destination_Y
GROUP BY SALESDATE

And this is the result:

enter image description here

Is there a way to normalize and denormalize this information as follows?

enter image description here

I have to evaluate several processes that have more or less measures, and I would need to pivot the query to compare more easily and in a standardized way for all the processes.

Thanks and regards.

UPDATE: Without creating tables or views. I need it to be with a query.


UPDATE2:

In case of evaluating 3 or more tables, I would need the query to be able to join the process column (with the name of the table) according to the evaluated measures.

SELECT
  SALESDATE
  ,'table_destination_Y' PROCESS
  ,'DESTINATION' TABLE
  ,SUM(SALESMNT) SALESMNT
  ,SUM(SALESQTY) SALESQTY
  ,COUNT(DISTINCT TRX) COUNTTRX
  ,SUM(SALESCOST) SALESCOST
FROM data-set.table_destination_Y
GROUP BY SALESDATE
UNION ALL
SELECT
  SALESDATE
  ,'table_source_X' PROCESS
  ,'SOURCE' TABLE
  ,SUM(SALESMNT) SALESMNT
  ,SUM(SALESQTY) SALESQTY
  ,COUNT(DISTINCT TRX) COUNTTRX
  ,0 
FROM data-set.table_source_X
WHERE <CONDITIONS>
GROUP BY SALESDATE
UNION ALL
SELECT
  SALESDATE
  ,'table_source_Z'
  ,'SOURCE'
  ,0
  ,0
  ,0
  ,SUM(SALESCOST) SALESCOST
FROM data-set.table_source_Z
WHERE <CONDITIONS>
GROUP BY SALESDATE

This would be the result of the previous query:

enter image description here

and this should be the result:

enter image description here

Is this possible?

Thanks and regards.

Advertisement

Answer

Use CROSS JOIN to ‘explode’ the rows, so you have a row for each measure.

Use ‘conditional aggregation’ to ‘collapse’ pairs of rows in to one row with two columns.

SELECT
  src.SALESDATE,
  src.PROCESS,
  pvt.MEASURE,
  MAX(
    CASE
      WHEN src.TABLE  != 'SOURCE'      THEN NULL
      WHEN pvt.MEASURE = 'SALESMNT'    THEN src.SALESMNT
      WHEN pvt.MEASURE = 'SALESQTY'    THEN src.SALESQTY
      WHEN pvt.MEASURE = 'COUNTTRX'    THEN src.COUNTTRX
    END
  )
    AS SOURCE,
  MAX(
    CASE
      WHEN src.TABLE  != 'DESTINATION' THEN NULL
      WHEN pvt.MEASURE = 'SALESMNT'    THEN src.SALESMNT
      WHEN pvt.MEASURE = 'SALESQTY'    THEN src.SALESQTY
      WHEN pvt.MEASURE = 'COUNTTRX'    THEN src.COUNTTRX
    END
  )
    AS DESTINATION
FROM
(
  yourQuery
)
  AS src
CROSS JOIN
(
  SELECT 'SALESMNT' AS MEASURE
  UNION ALL
  SELECT 'SALESQTY'
  UNION ALL
  SELECT 'COUNTTRX'
)
  AS pvt
GROUP BY
  src.SALESDATE,
  src.PROCESS,
  pvt.MEASURE
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement