Skip to content
Advertisement

Group rows based on column values in SQL / BigQuery

Is it possible to “group” rows within BigQuery/SQL depending on column values? Let’s say I want to assign a string/id for all rows between stream_start_init and stream_start and then do the same for the rows between stream_resume and the last stream_ad.

The amount of stream_ad event can differ hence I can’t use a RANK() or ROW() to group them be based on those values.

How I wish the table to be

Advertisement

Answer

I wouldn’t assign a string. I would assign a number. This appears to be a cumulative sum. I think a sum of the number of “stream_start_init” and “stream_resume” does what you want:

Note that this produces 0 for the first group — which seems like a good thing. You can convert that to a NULL using NULLIF().

If you really want strings, you can use CONCAT().

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement