Skip to content
Advertisement

How to transpose the vertical data into horizontal by using BigQuery?

I am currently working on the employees’ benefits data. However, the spreadsheet data is totally in a mess. I would like to format it as easy to capture the information. The current formatting is as below:

Relationship EmployeeName  BenefitCode  BenefitOption  Name  
               Alice          DEN         EEC           
  CHL          Alice          DEN         EEC          John
  SPS          Alice          MED                      Lee
               Lily           VIS                      
  SPS          Lily           VIS                       Tom

I would like to transfer it like this:

Relationship    Name     MED    DEN    VIS 
Employee        Alice           EEC
CHL             John            EEC
SPS             Lee      MED
Employee        Lily                   VIS
SPS             Tom                    VIS

I tried grouping the data by names and BenefitCode, but I just got so confused about it.

My codes are as below:

SELECT   RelationshipCode, EmployeeName, 
         MAX(IF(BenefitCode = "DEN", BenefitOptionCode , NULL)) AS DEN,
         MAX(IF(BenefitCode = "MED", BenefitOptionCode , NULL)) AS MEDICAL,
         MAX(IF(BenefitCode = "VIS", BenefitOptionCode , NULL)) AS VISION
FROM `TableXXX` 
WHERE RelationshipCode = 'Employee'
GROUP BY EmployeeName, RelationshipCode

But it seems like not a good idea to lose the dependents’ relationship to the employee. Can anyone tell me how to transfer the vertical data into horizontal? Or do you have any good idea to solve it?

Advertisement

Answer

Below is for BigQuery Standard SQL

#standardSQL
SELECT 
  EmployeeName,
  IF(Relationship IS NULL, 'Self', Relationship) Relationship, 
  IFNULL(Name, EmployeeName) Name, 
  MAX(IF(BenefitCode = 'DEN', IFNULL(BenefitOption, BenefitCode), NULL)) AS DEN,
  MAX(IF(BenefitCode = 'MED', IFNULL(BenefitOption, BenefitCode), NULL)) AS MEDICAL,
  MAX(IF(BenefitCode = 'VIS', IFNULL(BenefitOption, BenefitCode), NULL)) AS VISION  
FROM `project.dataset.table`
GROUP BY Name, EmployeeName, Relationship 
-- ORDER BY Name, Relationship

If to apply to sample data from your question – result is

Row EmployeeName    Relationship    Name    DEN     MEDICAL VISION   
1   Alice           Self            Alice   EEC     null    null     
2   Alice           CHL             John    EEC     null    null     
3   Alice           SPS             Lee     null    MED     null     
4   Lily            Self            Lily    null    null    VIS  
5   Lily            SPS             Tom     null    null    VIS    

Another option would be extend above flatten version into “hierarchical”

#standardSQL
SELECT EmployeeName,
  ARRAY_AGG(STRUCT(Name, Relationship, DEN, MEDICAL, VISION)) benefits
FROM (
  SELECT 
    EmployeeName,
    IF(Relationship IS NULL, 'Self', Relationship) Relationship, 
    IFNULL(Name, EmployeeName) Name, 
    MAX(IF(BenefitCode = 'DEN', IFNULL(BenefitOption, BenefitCode), NULL)) AS DEN,
    MAX(IF(BenefitCode = 'MED', IFNULL(BenefitOption, BenefitCode), NULL)) AS MEDICAL,
    MAX(IF(BenefitCode = 'VIS', IFNULL(BenefitOption, BenefitCode), NULL)) AS VISION  
  FROM `project.dataset.table`
  GROUP BY Name, EmployeeName, Relationship 
) 
GROUP BY EmployeeName
-- ORDER BY EmployeeName

In this case, result will be

Row EmployeeName    benefits.Name   benefits.Relationship   benefits.DEN    benefits.MEDICAL    benefits.VISION  
1   Alice           Alice           Self                    EEC             null                null     
                    John            CHL                     EEC             null                null     
                    Lee             SPS                     null            MED                 null       
2   Lily            Lily            Self                    null            null                VIS  
                    Tom             SPS                     null            null                VIS  
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement