Let’s imagine that I have a website that sells electronics. In order to attract customers, I invested some money in online ad campaigns.
The following table has three columns: user_id (id that identifies each user), event_date (the date when the user clicked on the ad) and mkt campaign (the name of the ad campaign):
|user_id| event_date|mkt campaign | 1 | 25-11-2021|FB - Black Friday | 2 | 20-12-2020|G Display - Christmas campaign | 1 | 22-12-2019|G Display - Christmas campaign | 3 | 12-10-2020|Whatsapp - July - campaign | 2 | 01-08-2018|IG Post - Holidays campaign | 3 | 30-08-2020|FB - Holidays campaign
I want to select only the rows that contain the first campaign that attracted each user to my website. In other words, I need to identify how each user got to my website for the first time.
The expected result would be like this:
|user_id| event_date|mkt campaign | 1 | 22-12-2019|G Display - Christmas campaign | 2 | 01-08-2018|IG Post - Holidays campaign | 3 | 30-08-2020|FB - Holidays campaign
I know that I could use “ORDER BY event_date ASC” to order the rows by date. However, I need the result to show only one result per user_id (the one with the oldest campaign).
Please, do you know how to do this in BigQuery (BQ Standard SQL)?
Thank you very much.
Advertisement
Answer
Consider below approach
select * from your_table where true qualify 1 = row_number() over(partition by user_id order by parse_date('%d-%m-%Y', event_date))
if apply to sample data in your question as
with your_table as ( select 1 user_id, '25-11-2021' event_date, 'FB - Black Friday' mkt_campaign union all select 2, '20-12-2020', 'G Display - Christmas campaign' union all select 1, '22-12-2019', 'G Display - Christmas campaign ' union all select 3, '12-10-2020', 'Whatsapp - July - campaign' union all select 2, '01-08-2018', 'IG Post - Holidays campaign' union all select 3, '30-08-2020', 'FB - Holidays campaign' )
output is