BigQuery: need a clever solution for a difficult query

Question

The context of this problem is a Live Chat service. Each conversation consists of multiple messages, and the timestamp of each message is registered. Conversations are held in one or more channels, depending on the difficulty of the inquiry. The picture below shows an example of the data. For testing purposes the sheet ca…

Accepted Answer

The channel for each message that doesn’t already have one is the channel of the most recent earlier message in the same conversation that does have a channel. To handle rows 2-4 we can say that if there is no earlier message with a channel, we’ll take the earliest message in the conversation that does have a channel.

I’ve renamed your table to “Messages” for clarity.

    SELECT a.timestamp, a.conversation,
      COALESCE(
        -- Message already has a channel
        a.channel,
        -- Channel from most recent earlier message
        (SELECT MAX(c.channel) FROM Messages c
          WHERE c.conversation = a.conversation
            AND c.timestamp =
              (SELECT MAX(c2.timestamp) FROM Messages c2
                WHERE c2.conversation = a.conversation
                  AND c2.channel IS NOT NULL
                  AND c2.timestamp < a.timestamp)),
        -- Channel of earliest message
        (SELECT MAX(c.channel) FROM Messages c
          WHERE c.conversation = a.conversation
            AND c.timestamp =
              (SELECT MIN(c2.timestamp) FROM Messages c2
                WHERE c2.conversation = a.conversation
                  AND c2.channel IS NOT NULL))) AS channel
    FROM Messages a;

There’s another solution that involves assigning a number to each row of the original table, then using a recursive CTE to find the previous non-NULL value. You’d need to figure out how to handle rows 2-4 in this case. I think my solution is a more straightforward implementation of the channel selection logic you described.
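To sanity-check the channel selection logic without a BigQuery project, the query can be run against a small in-memory table. The following is only a sketch under assumptions: it uses Python's built-in sqlite3 as a stand-in for BigQuery (the query uses only standard SQL, but the two dialects are not identical), the sample rows and integer timestamps are hypothetical and not taken from the question's sheet, the helper name fill_channels is made up for illustration, and an ORDER BY is added only so the output is deterministic.

```python
import sqlite3

# The query from the answer, plus an ORDER BY for deterministic output.
QUERY = """
SELECT a.timestamp, a.conversation,
  COALESCE(
    -- Message already has a channel
    a.channel,
    -- Channel from most recent earlier message
    (SELECT MAX(c.channel) FROM Messages c
      WHERE c.conversation = a.conversation
        AND c.timestamp =
          (SELECT MAX(c2.timestamp) FROM Messages c2
            WHERE c2.conversation = a.conversation
              AND c2.channel IS NOT NULL
              AND c2.timestamp < a.timestamp)),
    -- Channel of earliest message
    (SELECT MAX(c.channel) FROM Messages c
      WHERE c.conversation = a.conversation
        AND c.timestamp =
          (SELECT MIN(c2.timestamp) FROM Messages c2
            WHERE c2.conversation = a.conversation
              AND c2.channel IS NOT NULL))) AS channel
FROM Messages a
ORDER BY a.conversation, a.timestamp;
"""

# Hypothetical sample data: (timestamp, conversation, channel).
# Conversation A starts with messages that have no channel yet.
SAMPLE_ROWS = [
    (1, "A", None),
    (2, "A", None),
    (3, "A", "chat"),
    (4, "A", None),
    (5, "A", "email"),
    (6, "A", None),
    (1, "B", "phone"),
    (2, "B", None),
]


def fill_channels(rows):
    """Load rows into an in-memory Messages table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Messages (timestamp INTEGER, conversation TEXT, channel TEXT)"
    )
    conn.executemany("INSERT INTO Messages VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in fill_channels(SAMPLE_ROWS):
        print(row)
```

With these sample rows, the first two messages of conversation A have no earlier message with a channel, so they fall through to the third COALESCE branch and pick up "chat" from the earliest message that has one. Message 4 takes "chat" from message 3, and message 6 takes "email" from message 5.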


Answer