Skip to content
Advertisement

How to merge different rows in SQL

I have a table with playid, userid, created, and stopped. I want to merge certain rows by keeping the earliest created and the latest stopped within 120 minutes from the earliest created.

See the following table:

playid userid created stopped
1 a01 2021-01-30 02:29:58.559858 2021-01-30 02:40:56.558436
2 a01 2021-01-30 02:41:24.023358 2021-01-30 02:55:24.112713
3 a01 2021-01-30 02:57:30.178579 2021-01-30 03:11:14.866678
4 a01 2021-01-30 03:11:41.098424 2021-01-30 03:22:50.155918
5 a01 2021-01-30 03:23:20.545288 2021-01-30 03:36:37.027486
6 a01 2021-01-30 03:46:10.237971 2021-01-30 03:59:17.526151
7 a01 2021-01-30 03:59:57.020326 2021-01-30 04:16:25.948311
8 a01 2021-01-30 04:26:28.968568 2021-01-30 04:30:11.863079
9 a01 2021-01-30 04:30:14.43786 2021-01-30 04:41:42.231915
10 a01 2021-01-30 04:43:17.669945 2021-01-30 04:45:10.443101
11 a01 2021-01-30 04:45:14.50346 2021-01-30 04:47:07.082114
12 a01 2021-01-30 04:47:10.334574 2021-01-30 04:49:30.96017
13 a01 2021-01-30 04:49:34.146011 2021-01-30 04:50:54.45988
14 a01 2021-01-30 04:50:57.948305 2021-01-30 05:11:01.246284
15 a01 2021-01-30 05:39:29.387396 2021-01-30 05:41:39.508654
16 a01 2021-01-30 05:41:44.524951 2021-01-30 05:43:38.231266
17 a01 2021-01-30 05:43:40.785809 2021-01-30 05:54:40.711381
18 a01 2021-01-30 05:55:10.851725 2021-01-30 05:58:24.262351
19 a01 2021-01-30 05:58:29.43821 2021-01-30 06:00:50.870644
20 a01 2021-01-30 06:00:54.168696 2021-01-30 06:12:37.210234
21 a01 2021-01-30 14:41:01.529666 2021-01-30 14:53:01.349479
22 a01 2021-01-30 14:53:48.053136 2021-01-30 15:05:09.532786
23 a01 2021-01-31 15:11:08.547111 2021-01-31 15:11:37.564802
24 a01 2021-01-31 15:11:40.248499 2021-01-31 15:13:40.21787
25 a01 2021-01-31 15:13:59.373145 2021-01-31 15:31:54.099898
26 a01 2021-01-31 15:32:23.20448 2021-01-31 15:46:33.993751
27 a01 2021-01-31 16:55:19.141051 2021-01-31 17:03:43.464444

I want the query to give this table (see below). Please note that the next created is after the previous stopped, meaning the created with playid “8” is after stopped with playid “1”:

playid userid created stopped
1 a01 2021-01-30 02:29:58.559858 2021-01-30 04:16:25.948311
8 a01 2021-01-30 04:26:28.968568 2021-01-30 06:12:37.210234
21 a01 2021-01-30 14:41:01.529666 2021-01-30 15:05:09.532786
23 a01 2021-01-31 15:11:08.547111 2021-01-31 17:03:43.464444

Here is the query I have tried:

and querying the previous table using this query:

This is the (wrong) table I get when I run the first query:

playid userid created stopped
1 a01 2021-01-30 02:29:58.559858 2021-01-30 04:16:25.948311
2 a01 2021-01-30 02:41:24.023358 2021-01-30 04:30:11.863079
3 a01 2021-01-30 02:57:30.178579 2021-01-30 04:50:54.45988
4 a01 2021-01-30 03:11:41.098424 2021-01-30 05:11:01.246284
5 a01 2021-01-30 03:23:20.545288 2021-01-30 05:11:01.246284
6 a01 2021-01-30 03:46:10.237971 2021-01-30 05:43:38.231266
7 a01 2021-01-30 03:59:57.020326 2021-01-30 05:58:24.262351
8 a01 2021-01-30 04:26:28.968568 2021-01-30 06:12:37.210234
9 a01 2021-01-30 04:30:14.43786 2021-01-30 06:12:37.210234
10 a01 2021-01-30 04:43:17.669945 2021-01-30 06:12:37.210234
11 a01 2021-01-30 04:45:14.50346 2021-01-30 06:12:37.210234
12 a01 2021-01-30 04:47:10.334574 2021-01-30 06:12:37.210234
13 a01 2021-01-30 04:49:34.146011 2021-01-30 06:12:37.210234
14 a01 2021-01-30 04:50:57.948305 2021-01-30 06:12:37.210234
15 a01 2021-01-30 05:39:29.387396 2021-01-30 06:12:37.210234
16 a01 2021-01-30 05:41:44.524951 2021-01-30 06:12:37.210234
17 a01 2021-01-30 05:43:40.785809 2021-01-30 06:12:37.210234
18 a01 2021-01-30 05:55:10.851725 2021-01-30 06:12:37.210234
19 a01 2021-01-30 05:58:29.43821 2021-01-30 06:12:37.210234
20 a01 2021-01-30 06:00:54.168696 2021-01-30 06:12:37.210234
21 a01 2021-01-30 14:41:01.529666 2021-01-30 15:05:09.532786
22 a01 2021-01-30 14:53:48.053136 2021-01-30 15:05:09.532786
23 a01 2021-01-31 15:11:08.547111 2021-01-31 17:03:43.464444
24 a01 2021-01-31 15:11:40.248499 2021-01-31 17:03:43.464444
25 a01 2021-01-31 15:13:59.373145 2021-01-31 17:03:43.464444
26 a01 2021-01-31 15:32:23.20448 2021-01-31 17:03:43.464444
27 a01 2021-01-31 16:55:19.141051 2021-01-31 17:03:43.464444

The problem here is that the column created in the next row does not jump to the next playid with a created value which is larger/later than the stopped value of the current playid. Instead it just iterates through the table. Please see the tables above and use the playid column to make it easier to understand.

PS: I am very new to SQL. Please help me optimize this if this is very ineffective. Would be best to only have one query, of course.

Thank you!

Advertisement

Answer

This needs loops/recursion which is not supported natively in BigQuery SQL. Try user-defined functions, but they are not very scalable:

enter image description here

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement