I am currently working on the employees’ benefits data. However, the spreadsheet data is totally in a mess. I would like to format it as easy to capture the information. The current formatting is as below:
Relationship EmployeeName BenefitCode BenefitOption Name Alice DEN EEC CHL Alice DEN EEC John SPS Alice MED Lee Lily VIS SPS Lily VIS Tom
I would like to transfer it like this:
Relationship Name MED DEN VIS Employee Alice EEC CHL John EEC SPS Lee MED Employee Lily VIS SPS Tom VIS
I tried grouping the data by names and BenefitCode, but I just got so confused about it.
My codes are as below:
SELECT RelationshipCode, EmployeeName, MAX(IF(BenefitCode = "DEN", BenefitOptionCode , NULL)) AS DEN, MAX(IF(BenefitCode = "MED", BenefitOptionCode , NULL)) AS MEDICAL, MAX(IF(BenefitCode = "VIS", BenefitOptionCode , NULL)) AS VISION FROM `TableXXX` WHERE RelationshipCode = 'Employee' GROUP BY EmployeeName, RelationshipCode
But it seems like not a good idea to lose the dependents’ relationship to the employee. Can anyone tell me how to transfer the vertical data into horizontal? Or do you have any good idea to solve it?
Advertisement
Answer
Below is for BigQuery Standard SQL
#standardSQL SELECT EmployeeName, IF(Relationship IS NULL, 'Self', Relationship) Relationship, IFNULL(Name, EmployeeName) Name, MAX(IF(BenefitCode = 'DEN', IFNULL(BenefitOption, BenefitCode), NULL)) AS DEN, MAX(IF(BenefitCode = 'MED', IFNULL(BenefitOption, BenefitCode), NULL)) AS MEDICAL, MAX(IF(BenefitCode = 'VIS', IFNULL(BenefitOption, BenefitCode), NULL)) AS VISION FROM `project.dataset.table` GROUP BY Name, EmployeeName, Relationship -- ORDER BY Name, Relationship
If to apply to sample data from your question – result is
Row EmployeeName Relationship Name DEN MEDICAL VISION 1 Alice Self Alice EEC null null 2 Alice CHL John EEC null null 3 Alice SPS Lee null MED null 4 Lily Self Lily null null VIS 5 Lily SPS Tom null null VIS
Another option would be extend above flatten version into “hierarchical”
#standardSQL SELECT EmployeeName, ARRAY_AGG(STRUCT(Name, Relationship, DEN, MEDICAL, VISION)) benefits FROM ( SELECT EmployeeName, IF(Relationship IS NULL, 'Self', Relationship) Relationship, IFNULL(Name, EmployeeName) Name, MAX(IF(BenefitCode = 'DEN', IFNULL(BenefitOption, BenefitCode), NULL)) AS DEN, MAX(IF(BenefitCode = 'MED', IFNULL(BenefitOption, BenefitCode), NULL)) AS MEDICAL, MAX(IF(BenefitCode = 'VIS', IFNULL(BenefitOption, BenefitCode), NULL)) AS VISION FROM `project.dataset.table` GROUP BY Name, EmployeeName, Relationship ) GROUP BY EmployeeName -- ORDER BY EmployeeName
In this case, result will be
Row EmployeeName benefits.Name benefits.Relationship benefits.DEN benefits.MEDICAL benefits.VISION 1 Alice Alice Self EEC null null John CHL EEC null null Lee SPS null MED null 2 Lily Lily Self null null VIS Tom SPS null null VIS