Skip to content
Advertisement

How to retrieve the earliest date from columns that have different dimensions?

DATA Explanation

I have two data tables, one (PAGE VIEWS) which represents user events (CV 1,2,3 etc) and associated timestamp with member ID. The second table (ORDERS) represents the orders made – event time & order value. Membership ID is available on each table.

Table 1 – PAGE VIEWS (1,000 Rows in Total)

Event_Day Member ID CV1 CV2 CV3 CV4
11/5/2021 115126 APP camp1 Trigger APP-camp1-Trigger
11/14/2021 189192 SEARCH camp4 Search SEARCH-camp4-Search
11/5/2021 193320 SEARCH camp5 Search SEARCH-camp5-Search

Table 2 – ORDERS (249 rows in total)

Date Purchase Order ID Membership Number Order Value
7/12/2021 0088 183300 29.34
18/12/2021 0180 132159 132.51
4/12/2021 0050 141542 24.35

What I’m trying to answer

I’d like to attribute the CV columns (PAGE VIEWS) with the (ORDERS) order value, by the earliest event date in (PAGE VIEWS). This would be a simple attribution use case.

Visual explanation of the two data tables

Issues

I’ve spent the weekend result and scrolling through a variety of online articles but the closest is using the following query

Select min (event_day) As "first date",member_id,cv2,order_value,purchase_order_id
from mta_app_allpages,mta_app_orders
where member_id = membership_number
group by member_id,cv2,order_value,purchase_order_id;

The resulting data is correct using the DISTINCT function as Row 2 is different to Row 1, but I’d like to associate the result to Row 1 for member_id 113290, and row 3 for member_id 170897 etc.

Date member_id cv2 Order Value
2021-11-01 113290 camp5 58.81
2021-11-05 113290 camp4 58.51
2021-11-03 170897 camp3 36.26
2021-11-09 170897 camp5 36.26
2021-11-24 170897 camp1 36.26

Image showing the results table

I’ve tried using partition and sub query functions will little success. The correct call should return a maximum of 249 rows as that is as many rows as I have in the ORDERS table.

First-time poster so hopefully I have the format right. Many thanks.

Advertisement

Answer

Using RANK() is the best approach:

select * from
     (
       select *, RANK()OVER(partition by membership_number order by Event_Day) as rnk
       from page_views as pv
       INNER JOIN orders as o
       ON pv.Member_ID=o.Membership_Number
     ) as q
where rnk=1

This will only fetch the minimum event_day.

However, you can use MIN() to achieve the same (but with complex sub-query):

select *
from 
(select pv.* 
from page_views as pv
inner join
       (
       select Member_ID, min(event_day) as mn_dt
       from page_views
       group by member_id
       ) as mn
   ON mn.Member_ID=pv.Member_ID and mn.mn_dt=pv.event_day
)as sq
INNER JOIN orders as o
ON sq.Member_ID=o.Membership_Number

Both the queries will get us the same answer.

See the demo in db<>fiddle

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement