I have a data set with more than 2 million rows in a Oracle SQL table and want to do some association analysis on this. To apply a sequence mining algorithm to this data, I need a column named ‘sequenceId’ and a column ‘eventId’.
The table structure looks like this:
- time
- pId
- uId
Now I need an id that increments every time the uId changes. How can I do that in Oracle SQL? I tried it in R but it takes more than 12 hours there…
Sample data:
time | pId | uId |
---|---|---|
2019-10-01 12:12:24 | 3806 | 535447446 |
2019-10-01 19:51:55 | 3762 | 535447446 |
2019-10-02 18:09:34 | 3806 | 552286734 |
2019-10-02 17:54:01 | 3928 | 493964166 |
Expected result:
time | pId | uId | sequence id |
---|---|---|---|
2019-10-01 12:12:24 | 3806 | 535447446 | 1 |
2019-10-01 19:51:55 | 3762 | 535447446 | 1 |
2019-10-02 18:09:34 | 3806 | 552286734 | 2 |
2019-10-02 17:54:01 | 3928 | 493964166 | 3 |
The id should increment when the user_id changes
Advertisement
Answer
WITH source_data AS ( SELECT '2019-10-01 12:12:24' AS timestamp, 3806 AS product_id, 535447446 AS user_id FROM DUAL UNION ALL SELECT '2019-10-01 19:51:55', 3762, 535447446 FROM DUAL UNION ALL SELECT '2019-10-02 18:09:34', 3806, 552286734 FROM DUAL UNION ALL SELECT '2019-10-02 17:54:01', 3928, 493964166 FROM DUAL ), cte AS ( SELECT timestamp, product_id, user_id, CASE WHEN user_id = LAG(user_id) OVER (ORDER BY timestamp) THEN 0 ELSE 1 END new_user FROM source_data ) SELECT timestamp, product_id, user_id, SUM(new_user) OVER (ORDER BY timestamp) sequence_id FROM cte;
TIMESTAMP PRODUCT_ID USER_ID SEQUENCE_ID 2019-10-01 12:12:24 3806 535447446 1 2019-10-01 19:51:55 3762 535447446 1 2019-10-02 17:54:01 3928 493964166 2 2019-10-02 18:09:34 3806 552286734 3