I am currently working on the employees’ benefits data. However, the spreadsheet data is totally in a mess. I would like to format it as easy to capture the information. The current formatting is as below:
Relationship EmployeeName BenefitCode BenefitOption Name
Alice DEN EEC
CHL Alice DEN EEC John
SPS Alice MED Lee
Lily VIS
SPS Lily VIS Tom
I would like to transfer it like this:
Relationship Name MED DEN VIS
Employee Alice EEC
CHL John EEC
SPS Lee MED
Employee Lily VIS
SPS Tom VIS
I tried grouping the data by names and BenefitCode, but I just got so confused about it.
My codes are as below:
SELECT RelationshipCode, EmployeeName,
MAX(IF(BenefitCode = "DEN", BenefitOptionCode , NULL)) AS DEN,
MAX(IF(BenefitCode = "MED", BenefitOptionCode , NULL)) AS MEDICAL,
MAX(IF(BenefitCode = "VIS", BenefitOptionCode , NULL)) AS VISION
FROM `TableXXX`
WHERE RelationshipCode = 'Employee'
GROUP BY EmployeeName, RelationshipCode
But it seems like not a good idea to lose the dependents’ relationship to the employee. Can anyone tell me how to transfer the vertical data into horizontal? Or do you have any good idea to solve it?
Advertisement
Answer
Below is for BigQuery Standard SQL
#standardSQL
SELECT
EmployeeName,
IF(Relationship IS NULL, 'Self', Relationship) Relationship,
IFNULL(Name, EmployeeName) Name,
MAX(IF(BenefitCode = 'DEN', IFNULL(BenefitOption, BenefitCode), NULL)) AS DEN,
MAX(IF(BenefitCode = 'MED', IFNULL(BenefitOption, BenefitCode), NULL)) AS MEDICAL,
MAX(IF(BenefitCode = 'VIS', IFNULL(BenefitOption, BenefitCode), NULL)) AS VISION
FROM `project.dataset.table`
GROUP BY Name, EmployeeName, Relationship
-- ORDER BY Name, Relationship
If to apply to sample data from your question – result is
Row EmployeeName Relationship Name DEN MEDICAL VISION
1 Alice Self Alice EEC null null
2 Alice CHL John EEC null null
3 Alice SPS Lee null MED null
4 Lily Self Lily null null VIS
5 Lily SPS Tom null null VIS
Another option would be extend above flatten version into “hierarchical”
#standardSQL
SELECT EmployeeName,
ARRAY_AGG(STRUCT(Name, Relationship, DEN, MEDICAL, VISION)) benefits
FROM (
SELECT
EmployeeName,
IF(Relationship IS NULL, 'Self', Relationship) Relationship,
IFNULL(Name, EmployeeName) Name,
MAX(IF(BenefitCode = 'DEN', IFNULL(BenefitOption, BenefitCode), NULL)) AS DEN,
MAX(IF(BenefitCode = 'MED', IFNULL(BenefitOption, BenefitCode), NULL)) AS MEDICAL,
MAX(IF(BenefitCode = 'VIS', IFNULL(BenefitOption, BenefitCode), NULL)) AS VISION
FROM `project.dataset.table`
GROUP BY Name, EmployeeName, Relationship
)
GROUP BY EmployeeName
-- ORDER BY EmployeeName
In this case, result will be
Row EmployeeName benefits.Name benefits.Relationship benefits.DEN benefits.MEDICAL benefits.VISION
1 Alice Alice Self EEC null null
John CHL EEC null null
Lee SPS null MED null
2 Lily Lily Self null null VIS
Tom SPS null null VIS