Data source
User ID | Visit Date |
---|---|
1 | 2020-01-01 12:29:15 |
1 | 2020-01-02 12:30:11 |
1 | 2020-04-01 12:31:01 |
2 | 2020-05-01 12:31:14 |
Problem
I need advice im trying to do sub query for this result to mark user as retention if they havent visit back for 3 month. i using this query for the data to get user’s latest visit each month includes null
select u.user_id, gs.yyyymm, s.last_visit_date from (select distinct user_id from source s) u cross join generate_series('2021-01-01'::timestamp, '2021-12-01'::timestamp, interval '1 month' ) gs(yyyymm) left join lateral (select max(s.visit_date) as last_visit_date from source s where s.user_id = u.user_id and s.visit_date >= gs.yyyymm and s.visit_date < gs.yyyymm + interval '1 month' ) s on 1=1;
but i think its really affect to performance if user keep increasing, do you guys have any advice to achieve result like below?
Expected Result
Month | User ID | Type |
---|---|---|
1 | 1 | FIRST |
2 | 1 | RETENTION |
3 | 1 | RETENTION |
4 | 1 | REACTIVATE |
…. | ||
12 | 1 | null |
1 | 2 | null |
… | ||
5 | 2 | FIRST |
6 | 2 | RETENTION |
7 | 2 | RETENTION |
8 | 2 | RETENTION |
9 | 2 | null |
… and so on |
or it could be like this
Month | First | Retention | Reactiavate |
---|---|---|---|
1 | 1 | 0 | 0 |
2 | 0 | 1 | 0 |
3 | 0 | 1 | 0 |
4 | 0 | 0 | 1 |
5 | 1 | 0 | 0 |
6 | 0 | 1 | 0 |
7 | 0 | 1 | 0 |
8 | 0 | 1 | 0 |
9 | 0 | 0 | 0 |
… and so on |
Advertisement
Answer
My solution have some reasonable requirements, but I can work without it (the price is some performance).
I built some helper tables what is autofilled with trigger.
The requirement is, that UPDATE
or DELETE
isn’t allowed on visit table.
mst_user table stores distinct user_id-s and it’s first_visit. user_monthly_visit table stores the last and the first visit_date pro user_id and month.
TABLES
CREATE TABLE mst_user ( id BIGINT, first_visit TIMESTAMP, CONSTRAINT pk_mst_user PRIMARY KEY (id) ); CREATE TABLE visit ( user_id BIGINT, visit_date TIMESTAMP, CONSTRAINT visit_user_fkey FOREIGN KEY (user_id) REFERENCES mst_user (id) ); CREATE TABLE user_monthly_visit ( user_id BIGINT, month DATE, first_visit_this_month TIMESTAMP, last_visit_this_month TIMESTAMP, CONSTRAINT pk_user_monthly_visit PRIMARY KEY (user_id, month), CONSTRAINT user_monthly_visit_user_fkey FOREIGN KEY (user_id) REFERENCES mst_user (id) ); CREATE INDEX ix_user_monthly_visit_month ON user_monthly_visit(month);
TRIGGER
CREATE OR REPLACE FUNCTION trf_visit() RETURNS trigger VOLATILE AS $xx$ DECLARE l_user_id BIGINT; l_row RECORD; l_user_monthly_visit user_monthly_visit; BEGIN IF (tg_op = 'INSERT') THEN l_row := NEW; INSERT INTO mst_user(id, first_visit) VALUES (l_row.user_id, l_row.visit_date) ON CONFLICT(id) DO UPDATE SET first_visit = LEAST(mst_user.first_visit, l_row.visit_date); INSERT INTO user_monthly_visit(user_id,month,first_visit_this_month,last_visit_this_month) VALUES (l_row.user_id,date_trunc('month',l_row.visit_date),l_row.visit_date,l_row.visit_date) ON CONFLICT(user_id,month) DO UPDATE SET first_visit_this_month = LEAST(user_monthly_visit.first_visit_this_month,l_row.visit_date), last_visit_this_month = GREATEST(user_monthly_visit.last_visit_this_month,l_row.visit_date); ELSE RAISE EXCEPTION 'UPDATE and DELETE arent allowed!'; END IF; RETURN l_row; END; $xx$ LANGUAGE plpgsql; CREATE TRIGGER trig_visit BEFORE INSERT OR DELETE OR UPDATE ON visit FOR EACH ROW EXECUTE PROCEDURE trf_visit();
TESTDATA
INSERT INTO visit (user_id, visit_date) VALUES (1, '20200101 122915'); INSERT INTO visit (user_id, visit_date) VALUES (1, '20200102 123011'); INSERT INTO visit (user_id, visit_date) VALUES (1, '20200401 123101'); INSERT INTO visit (user_id, visit_date) VALUES (2, '20200501 123114');
QUERY
SELECT mnt AS month, user_id, CASE WHEN first_visit IS NULL OR first_visit> yyyymm + INTERVAL '1 month' THEN NULL WHEN first_visit_this_month = first_visit THEN 'FIRST' WHEN first_visit_this_month IS NULL AND last_three_month + INTERVAL '3 month' >= yyyymm THEN 'RETENTION' WHEN first_visit_this_month IS NOT NULL THEN 'REACTIVATE' ELSE NULL END user_type FROM (SELECT date_part('month', gs.yyyymm)::INTEGER AS mnt, gs.yyyymm, u.id user_id, umv.first_visit_this_month, umv.last_visit_this_month, u.first_visit, GREATEST( LAG(last_visit_this_month) OVER w, LAG(last_visit_this_month,2) OVER w, LAG(last_visit_this_month,3) OVER w ) last_three_month FROM generate_series('2020-01-01'::TIMESTAMP, '2020-12-01'::TIMESTAMP, INTERVAL '1 month') gs(yyyymm) CROSS JOIN mst_user u LEFT JOIN user_monthly_visit umv on (umv.user_id=u.id AND umv.month = gs.yyyymm) WINDOW w AS (PARTITION BY u.id ORDER BY gs.yyyymm) ) monthly_visit ORDER BY user_id,mnt;
RESULT
month | user_id | user_type |
---|---|---|
1 | 1 | FIRST |
2 | 1 | RETENTION |
3 | 1 | RETENTION |
4 | 1 | REACTIVATE |
5 | 1 | RETENTION |
6 | 1 | RETENTION |
7 | 1 | RETENTION |
8 | 1 | (null) |
9 | 1 | (null) |
10 | 1 | (null) |
11 | 1 | (null) |
12 | 1 | (null) |
1 | 2 | (null) |
2 | 2 | (null) |
3 | 2 | (null) |
4 | 2 | (null) |
5 | 2 | FIRST |
6 | 2 | RETENTION |
7 | 2 | RETENTION |
8 | 2 | RETENTION |
9 | 2 | (null) |
10 | 2 | (null) |
11 | 2 | (null) |
12 | 2 | (null) |