Create Missing Data Hive SQL

I have a table with an activity date that records when an account's status changes, such as:

Basically, for a single line there are 3 activities happening on this account: they have a basic account, then they downgrade, and then they upgrade again.

I would like these to appear in steps, such as:

Then I would like to partition the rows by activity and see that partition in the end result, so I can calculate how many users stayed in their downgraded state for more than 30 days and compare their behavior with that of users who upgraded.

I have tried using coalesce and then row_number, but I can't wrap my head around how to partition out each activity based on when the account status changed.

Answer

Generate the missing rows using posexplode(split(space(datediff(next_date,activity_date)-1),' ')). Set a new_group flag when the previous activity differs from the current one (previous activity <> current activity). Use an analytic sum() over that flag to calculate the group (partition) number. See comments in the code:
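
A minimal sketch of these three steps in Hive SQL. The table name activity_log, the column account_id, and the three sample rows are hypothetical, chosen only to illustrate the technique; next_date is computed here with lead(), which is an assumption about how it would be derived.

```sql
-- Hypothetical sample data: one account with three status changes.
with activity_log as (
  select stack(3,
           1, '2020-01-01', 'basic',
           1, '2020-01-05', 'downgrade',
           1, '2020-01-08', 'upgrade'
         ) as (account_id, activity_date, activity)
),
-- Attach the date of the next change to each row.
-- The last row of an account defaults to its own date.
with_next as (
  select account_id, activity_date, activity,
         lead(activity_date, 1, activity_date)
           over (partition by account_id order by activity_date) as next_date
  from activity_log
),
-- Step 1: generate one row per day from activity_date up to the day
-- before next_date. space(n-1) split on ' ' yields n array elements,
-- and posexplode returns each element's position i (0..n-1).
filled as (
  select w.account_id,
         date_add(w.activity_date, s.i) as activity_date,
         w.activity
  from with_next w
  lateral view posexplode(
    split(space(datediff(w.next_date, w.activity_date) - 1), ' ')
  ) s as i, x
),
-- Fetch the previous day's activity so it can be compared below.
with_prev as (
  select account_id, activity_date, activity,
         lag(activity)
           over (partition by account_id order by activity_date) as prev_activity
  from filled
)
-- Step 2: new_group flag is 1 when the activity differs from the previous
-- row (or there is no previous row), else 0.
-- Step 3: a running sum of that flag numbers the groups 1, 2, 3, ...
select account_id, activity_date, activity,
       sum(case when prev_activity = activity then 0 else 1 end)
         over (partition by account_id order by activity_date
               rows between unbounded preceding and current row) as grp
from with_prev;
```

With one row per day and a group number on each row, counting the rows per account_id, activity, and grp gives the number of days spent in each state, which can then be filtered for downgraded groups lasting more than 30 days.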

Result:

User contributions licensed under: CC BY-SA