I am attempting to create a chatbot in Python 3.9 using my own text messages, but am having trouble formatting my data correctly.
I have a table of text messages that looks like this:
| row_id | Type | Date | Text | 
|---|---|---|---|
| 1 | Incoming | 2020-08-10 08:09:18 | Hi | 
| 2 | Outgoing | 2020-08-10 08:11:04 | Hello | 
| 3 | Incoming | 2020-08-10 08:11:12 | For tomorrow | 
| 4 | Incoming | 2020-08-10 08:11:20 | Are we still on for dinner? | 
| 5 | Outgoing | 2020-08-10 08:11:31 | Let me check. | 
| 6 | Outgoing | 2020-08-10 08:11:43 | Yes | 
| 7 | Incoming | 2020-08-10 08:11:45 | Great! | 
What I need to do is combine all the Incoming Texts between the last and next Outgoing, and all the Outgoing Texts between the last and next Incoming into a column.
For example, the above table should look like this:
| Incoming | Outgoing | 
|---|---|
| Hi | Hello | 
| For Tomorrow Are we still on for dinner? | Let me check. Yes | 
| Great | 
The conversation is over 17,000 records long. I am running it in Python 3.9 using sqlite3.
How would I go about completing this task?
Advertisement
Answer
This is a gaps and islands problem. We can use the difference in row numbers method here combined with string aggregation:
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY Date) rn1,
              ROW_NUMBER() OVER (PARTITION BY Type ORDER BY Date) rn2
    FROM yourTable
),
cte2 AS (
    SELECT *, rn1-rn2 AS grp, DENSE_RANK() OVER (PARTITION BY Type ORDER BY rn1-rn2) rnk
    FROM cte
    ORDER BY grp, rnk
)
SELECT
    GROUP_CONCAT(CASE WHEN Type = 'Incoming' THEN TEXT END, ' ') AS Incoming,
    GROUP_CONCAT(CASE WHEN Type = 'Outgoing' THEN TEXT END, ' ') AS Outgoing
FROM cte2
GROUP BY
    rnk
ORDER BY
    rnk;
