I have a table like as shown below
I would like to create two new binary columns
indicating whether the subject had steroids
and aspirin
. I am looking to implement this in Postgresql and google bigquery
I tried the below but it doesn’t work
x
select subject_id
case when lower(drug) like ('%cortisol%','%cortisone%','%dexamethasone%')
then 1 else 0 end as steroids,
case when lower(drug) like ('%peptide%','%paracetamol%')
then 1 else 0 end as aspirin,
from db.Team01.Table_1
SELECT
db.Team01.Table_1.drug
FROM `table_1`,
UNNEST(table_1.drug) drug
WHERE REGEXP_CONTAINS( db.Team01.Table_1.drug,r'%cortisol%','%cortisone%','%dexamethasone%')
I expect my output to be like as shown below
Advertisement
Answer
Below is for BigQuery Standard SQL
#standardSQL
SELECT
subject_id,
SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'cortisol|cortisone|dexamethasone') THEN 1 ELSE 0 END) AS steroids,
SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'peptide|paracetamol') THEN 1 ELSE 0 END) AS aspirin
FROM `db.Team01.Table_1`
GROUP BY subject_id
if to apply to sample data from your question – result is
Row subject_id steroids aspirin
1 1 3 1
2 2 1 1
Note: instead of simple LIKE ending with lengthy and redundant text – I am using LIKE on steroids
– which is REGEXP_CONTAINS