The context of this problem is a Live Chat service. Each conversation consists of multiple messages of which the timestamp is registred. Conversations are held in one or more channels, depending on the difficulty of the inquiry.
The picture below shows an example of the data. For testing purposes the sheet can be found here.
In the picture you see the initial difficulty is determined at row 5 (based on the content of the messages). Rows 2 till 4 should have the same difficulty, as these messages form a group. After that it gets easier, as all messages between two difficulties belong to the one earlier in time (basically messages are of difficulty X until difficulty Y is introduced). The colors in the picture show which difficulty should belong to which messages.
So far I have succeeded in writing a query which can assign the correct difficulty to rows 5 till 29. Using the same logic and a subquery, I could probably also find a way to assign the correct difficulty to messages before the first channel ID, but I was wondering of any of you see a cleaner, more intuitive way to solve this.
WITH channelMessage AS ( SELECT conversation, channel, timestamp FROM MessagesTable WHERE channel IS NOT NULL AND channel != '' ), messages AS ( SELECT a.message, b.channel, a.conversation, a.timestamp, CASE WHEN b.timestamp IS NULL THEN 0 ELSE TIMESTAMP_DIFF(b.timestamp, a.timestamp, MILLISECOND) END AS diff, CASE WHEN b.timestamp IS NULL THEN 0 ELSE MAX(TIMESTAMP_DIFF(b.timestamp, a.timestamp, MILLISECOND)) OVER (PARTITION BY a.message ORDER BY a.timestamp) END AS diff2 FROM MessagesTable AS a LEFT JOIN (SELECT * FROM channelMessage) AS b ON a.conversation = b.conversation AND a.timestamp >= b.timestamp ) SELECT timestamp, conversation, message, channel, diff, diff2, FROM messages WHERE diff = diff2 ORDER BY timestamp
Advertisement
Answer
The channel for each message that doesn’t already have one is the channel of the most recent earlier message in the same conversation that does have a channel. To handle rows 2-4 we can say that if there is no earlier message with a channel, we’ll take the earliest message in the conversation that does have a channel.
I’ve renamed your table to “Messages” for clarity.
SELECT a.timestamp, a.conversation, COALESCE( -- Message already has a channel a.channel, -- Channel from most recent earlier message (SELECT MAX(c.channel) FROM Messages c WHERE c.conversation = a.conversation AND c.timestamp = (SELECT MAX(c2.timestamp) FROM Messages c2 WHERE c2.conversation = a.conversation AND c2.channel IS NOT NULL AND c2.timestamp < a.timestamp)), -- Channel of earliest message (SELECT MAX(c.channel) FROM Messages c WHERE c.conversation = a.conversation AND c.timestamp = (SELECT MIN(c2.timestamp) FROM Messages c2 WHERE c2.conversation = a.conversation AND c2.channel IS NOT NULL))) AS channel FROM Messages a;
There’s another solution that involves assigning a number to each row of the original table, then using a recursive CTE to find the previous non-NULL value. You’d need to figure out how to handle rows 2-4 in this case. I think my solution is a more straightforward implementation of the channel selection logic you described.