Skip to content
Advertisement

BigQuery: need a clever solution for a difficult query

The context of this problem is a Live Chat service. Each conversation consists of multiple messages of which the timestamp is registred. Conversations are held in one or more channels, depending on the difficulty of the inquiry.

The picture below shows an example of the data. For testing purposes the sheet can be found here.

enter image description here

In the picture you see the initial difficulty is determined at row 5 (based on the content of the messages). Rows 2 till 4 should have the same difficulty, as these messages form a group. After that it gets easier, as all messages between two difficulties belong to the one earlier in time (basically messages are of difficulty X until difficulty Y is introduced). The colors in the picture show which difficulty should belong to which messages.

So far I have succeeded in writing a query which can assign the correct difficulty to rows 5 till 29. Using the same logic and a subquery, I could probably also find a way to assign the correct difficulty to messages before the first channel ID, but I was wondering of any of you see a cleaner, more intuitive way to solve this.

Advertisement

Answer

The channel for each message that doesn’t already have one is the channel of the most recent earlier message in the same conversation that does have a channel. To handle rows 2-4 we can say that if there is no earlier message with a channel, we’ll take the earliest message in the conversation that does have a channel.

I’ve renamed your table to “Messages” for clarity.

There’s another solution that involves assigning a number to each row of the original table, then using a recursive CTE to find the previous non-NULL value. You’d need to figure out how to handle rows 2-4 in this case. I think my solution is a more straightforward implementation of the channel selection logic you described.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement