DATA Explanation
I have two data tables, one (PAGE VIEWS) which represents user events (CV 1,2,3 etc) and associated timestamp with member ID. The second table (ORDERS) represents the orders made – event time & order value. Membership ID is available on each table.
Table 1 – PAGE VIEWS (1,000 Rows in Total)
Event_Day | Member ID | CV1 | CV2 | CV3 | CV4 |
---|---|---|---|---|---|
11/5/2021 | 115126 | APP | camp1 | Trigger | APP-camp1-Trigger |
11/14/2021 | 189192 | SEARCH | camp4 | Search | SEARCH-camp4-Search |
11/5/2021 | 193320 | SEARCH | camp5 | Search | SEARCH-camp5-Search |
Table 2 – ORDERS (249 rows in total)
Date | Purchase Order ID | Membership Number | Order Value |
---|---|---|---|
7/12/2021 | 0088 | 183300 | 29.34 |
18/12/2021 | 0180 | 132159 | 132.51 |
4/12/2021 | 0050 | 141542 | 24.35 |
What I’m trying to answer
I’d like to attribute the CV columns (PAGE VIEWS) with the (ORDERS) order value, by the earliest event date in (PAGE VIEWS). This would be a simple attribution use case.
Visual explanation of the two data tables
Issues
I’ve spent the weekend result and scrolling through a variety of online articles but the closest is using the following query
Select min (event_day) As "first date",member_id,cv2,order_value,purchase_order_id from mta_app_allpages,mta_app_orders where member_id = membership_number group by member_id,cv2,order_value,purchase_order_id;
The resulting data is correct using the DISTINCT function as Row 2 is different to Row 1, but I’d like to associate the result to Row 1 for member_id 113290, and row 3 for member_id 170897 etc.
Date | member_id | cv2 | Order Value |
---|---|---|---|
2021-11-01 | 113290 | camp5 | 58.81 |
2021-11-05 | 113290 | camp4 | 58.51 |
2021-11-03 | 170897 | camp3 | 36.26 |
2021-11-09 | 170897 | camp5 | 36.26 |
2021-11-24 | 170897 | camp1 | 36.26 |
Image showing the results table
I’ve tried using partition and sub query functions will little success. The correct call should return a maximum of 249 rows as that is as many rows as I have in the ORDERS table.
First-time poster so hopefully I have the format right. Many thanks.
Advertisement
Answer
Using RANK()
is the best approach:
select * from ( select *, RANK()OVER(partition by membership_number order by Event_Day) as rnk from page_views as pv INNER JOIN orders as o ON pv.Member_ID=o.Membership_Number ) as q where rnk=1
This will only fetch the minimum event_day
.
However, you can use MIN()
to achieve the same (but with complex sub-query
):
select * from (select pv.* from page_views as pv inner join ( select Member_ID, min(event_day) as mn_dt from page_views group by member_id ) as mn ON mn.Member_ID=pv.Member_ID and mn.mn_dt=pv.event_day )as sq INNER JOIN orders as o ON sq.Member_ID=o.Membership_Number
Both the queries will get us the same answer.
See the demo in db<>fiddle