Tables:
Sessions
- session_ts
- visitor_id
- vertical
- session_id
Transactions
- session_ts
- session_id
- rev_bucket
- revenue
Currently have the following query (using SQLite):
SELECT s.visitor_id, sub.session_id, month, year, total_rev, CASE WHEN (row_num IN (1,2) >= total_rev >= 500) THEN 'Yes' ELSE 'No' END AS High_Value_Transactions, sub.row_num FROM sessions s JOIN ( SELECT s.visitor_id, t.session_id, strftime('%m',t.session_ts) as month, strftime('%Y',t.session_ts) as year, SUM(t.revenue) as total_rev, row_number() OVER(PARTITION BY s.visitor_id ORDER BY s.session_ts) as row_num FROM Transactions t JOIN sessions s ON s.session_id = t.session_id WHERE strftime('%m',t.session_ts) = '01' AND strftime('%Y',t.session_ts) = '2020' GROUP BY 1,2 ) sub ON s.session_id = sub.session_id WHERE sub.row_num IN (1,2) ORDER BY 1
I’m having trouble identifying the first two sessions that combine for $500. Open to any feedback and simplifying of query. Thanks!
Advertisement
Answer
You can use window functions and aggregation:
select visitor_id, sum(t.revenue) total_revenue from ( select s.visitor_id, t.revenue, row_number() over(partition by s.visitor_id order by t.session_ts) rn from transactions t inner join sessions s on s.session_id = t.session_id where t.session_ts >= '2020-01-01' and t.session_ts < '2020-02-01' ) t where rn <= 2 group by visitor_id having sum(t.revenue) >= 500
The subquery joins the two tables, filters on the target month (note that using half-open interval predicates is more efficient than applying date functions on the date column), and ranks each row within groups of visits of the same customer.
Then, the outer query filters on the first two visits per visitor, aggregates by visitor, computes the corresponding revenue, and filters it with a having
clause.