I have a table that has an activity date of when things change such as
2020-08-13 123 Upgrade 2020-08-17 123 Downgrade 2020-08-21 123 Upgrade
Basically this in relation to a line there are 3 activities happening on this account. They have a basic account then they downgrade but then they upgrade again
I would like to have these happen in steps such as
2020-08-13 123 Upgrade1 2020-08-14 123 Upgrade1 2020-08-15 123 Upgrade1 2020-08-16 123 Upgrade1 2020-08-17 123 Downgrade1 2020-08-18 123 Downgrade1 2020-08-19 123 Downgrade1 2020-08-20 123 Downgrade1 2020-08-21 123 Upgrade2 . . . 2020-09-09 123 Upgrade2
Then I would like to partition them by their activity and see this in the end results so I can calculate how many users stayed in their downgraded state for more than 30 days to see their behavior compared to a upgraded change.
2020-08-13 123 Upgrade1. 1 2020-08-14 123 Upgrade1. 2 2020-08-15 123 Upgrade1. 3 2020-08-16 123 Upgrade1. 4 2020-08-17 123 Downgrade1. 1 2020-08-18 123 Downgrade1. 2 2020-08-19 123 Downgrade1. 3 2020-08-20 123 Downgrade1. 4 2020-08-21 123 Upgrade2. 1 . . . 2020-09-09 123 Upgrade2. 18
I have tried doing Coalesce then row_num but I can’t wrap my head around how to partition out each activity based on when they changed their account status.
Advertisement
Answer
Generate rows using posexplode(split(space(datediff(next_date,activity_date)-1),’ ‘)). Calculate new_group flag when previous activity<>current activity. Use analytic sum() to calculate group(partition) number. See comments in the code:
with mydata as ( select stack(3, '2020-08-13', 123, 'Upgrade', '2020-08-17', 123, 'Downgrade', '2020-08-21', 123, 'Upgrade' ) as (activity_date, account, activity) ) --calculate row_number in account, activity select activity_date, account, activity, activity_partition, row_number() over(partition by account, activity_partition order by activity_date ) activity_partition_rn, count(*) over(partition by account, activity_partition ) days_on_activity from (--Calculate partition select activity_date, account, activity, concat(activity, sum(new_group_flag) over(partition by account, activity order by activity_date rows between unbounded preceding and current row) ) activity_partition from (--Calculate new group flag select activity_date, account, activity, case when lag(activity) over (partition by account order by activity_date) = activity then 0 else 1 end as new_group_flag from ( --generate Date range select date_add(activity_date,i) as activity_date, account, activity from ( --Get next_date to generate date range select activity_date, account, activity, lead(activity_date,1, activity_date) over (partition by account order by activity_date) next_date from mydata d ) s lateral view posexplode(split(space(datediff(next_date,activity_date)-1),' ')) e as i,x --generate rows )s )s )s order by activity_date;
Result:
activity_date account activity activity_partition activity_partition_rn days_on_activity 2020-08-13 123 Upgrade Upgrade1 1 4 2020-08-14 123 Upgrade Upgrade1 2 4 2020-08-15 123 Upgrade Upgrade1 3 4 2020-08-16 123 Upgrade Upgrade1 4 4 2020-08-17 123 Downgrade Downgrade1 1 4 2020-08-18 123 Downgrade Downgrade1 2 4 2020-08-19 123 Downgrade Downgrade1 3 4 2020-08-20 123 Downgrade Downgrade1 4 4 2020-08-21 123 Upgrade Upgrade2 1 1