I am attempting to create a chatbot in Python 3.9 using my own text messages, but am having trouble formatting my data correctly.
I have a table of text messages that looks like this:
row_id | Type | Date | Text |
---|---|---|---|
1 | Incoming | 2020-08-10 08:09:18 | Hi |
2 | Outgoing | 2020-08-10 08:11:04 | Hello |
3 | Incoming | 2020-08-10 08:11:12 | For tomorrow |
4 | Incoming | 2020-08-10 08:11:20 | Are we still on for dinner? |
5 | Outgoing | 2020-08-10 08:11:31 | Let me check. |
6 | Outgoing | 2020-08-10 08:11:43 | Yes |
7 | Incoming | 2020-08-10 08:11:45 | Great! |
What I need to do is combine all the Incoming Texts between the last and next Outgoing, and all the Outgoing Texts between the last and next Incoming into a column.
For example, the above table should look like this:
Incoming | Outgoing |
---|---|
Hi | Hello |
For Tomorrow Are we still on for dinner? | Let me check. Yes |
Great |
The conversation is over 17,000 records long. I am running it in Python 3.9 using sqlite3.
How would I go about completing this task?
Advertisement
Answer
This is a gaps and islands problem. We can use the difference in row numbers method here combined with string aggregation:
WITH cte AS ( SELECT *, ROW_NUMBER() OVER (ORDER BY Date) rn1, ROW_NUMBER() OVER (PARTITION BY Type ORDER BY Date) rn2 FROM yourTable ), cte2 AS ( SELECT *, rn1-rn2 AS grp, DENSE_RANK() OVER (PARTITION BY Type ORDER BY rn1-rn2) rnk FROM cte ORDER BY grp, rnk ) SELECT GROUP_CONCAT(CASE WHEN Type = 'Incoming' THEN TEXT END, ' ') AS Incoming, GROUP_CONCAT(CASE WHEN Type = 'Outgoing' THEN TEXT END, ' ') AS Outgoing FROM cte2 GROUP BY rnk ORDER BY rnk;