I have a table like as shown below
I would like to create two new binary columns
indicating whether the subject had steroids
and aspirin
. I am looking to implement this in Postgresql and google bigquery
I tried the below but it doesn’t work
select subject_id case when lower(drug) like ('%cortisol%','%cortisone%','%dexamethasone%') then 1 else 0 end as steroids, case when lower(drug) like ('%peptide%','%paracetamol%') then 1 else 0 end as aspirin, from db.Team01.Table_1 SELECT db.Team01.Table_1.drug FROM `table_1`, UNNEST(table_1.drug) drug WHERE REGEXP_CONTAINS( db.Team01.Table_1.drug,r'%cortisol%','%cortisone%','%dexamethasone%')
I expect my output to be like as shown below
Advertisement
Answer
Below is for BigQuery Standard SQL
#standardSQL SELECT subject_id, SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'cortisol|cortisone|dexamethasone') THEN 1 ELSE 0 END) AS steroids, SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'peptide|paracetamol') THEN 1 ELSE 0 END) AS aspirin FROM `db.Team01.Table_1` GROUP BY subject_id
if to apply to sample data from your question – result is
Row subject_id steroids aspirin 1 1 3 1 2 2 1 1
Note: instead of simple LIKE ending with lengthy and redundant text – I am using LIKE on steroids
– which is REGEXP_CONTAINS