Skip to content
Advertisement

Equivalent of string contains in google bigquery

I have a table like as shown below

enter image description here

I would like to create two new binary columns indicating whether the subject had steroids and aspirin. I am looking to implement this in Postgresql and google bigquery

I tried the below but it doesn’t work

select subject_id
case when lower(drug) like ('%cortisol%','%cortisone%','%dexamethasone%') 
then 1 else 0 end as steroids,
case when lower(drug) like ('%peptide%','%paracetamol%') 
then 1 else 0 end as aspirin,
from db.Team01.Table_1


SELECT 
db.Team01.Table_1.drug
FROM `table_1`,
UNNEST(table_1.drug) drug
WHERE REGEXP_CONTAINS( db.Team01.Table_1.drug,r'%cortisol%','%cortisone%','%dexamethasone%')

I expect my output to be like as shown below

enter image description here

Advertisement

Answer

Below is for BigQuery Standard SQL

#standardSQL
SELECT 
  subject_id,
  SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'cortisol|cortisone|dexamethasone') THEN 1 ELSE 0 END) AS steroids,
  SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'peptide|paracetamol') THEN 1 ELSE 0 END) AS aspirin
FROM `db.Team01.Table_1`
GROUP BY subject_id   

if to apply to sample data from your question – result is

Row subject_id  steroids    aspirin  
1   1           3           1    
2   2           1           1     

Note: instead of simple LIKE ending with lengthy and redundant text – I am using LIKE on steroids – which is REGEXP_CONTAINS

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement