I’m working in bigquery with this mock data:
create schema if not exists dbo; create table if not exists dbo.player_history(team_id INT, player_id INT, active_from TIMESTAMP, active_to TIMESTAMP); truncate table dbo.player_history; INSERT INTO dbo.player_history VALUES(1,1,'2020-01-01', '2020-01-08'); INSERT INTO dbo.player_history VALUES(1,2,'2020-06-01', '2020-09-08'); INSERT INTO dbo.player_history VALUES(1,3,'2020-06-10', '2020-10-01'); INSERT INTO dbo.player_history VALUES(1,4,'2020-02-01', '2020-02-15'); INSERT INTO dbo.player_history VALUES(1,5,'2021-01-01', '2021-01-08'); INSERT INTO dbo.player_history VALUES(1,6,'2021-01-02', '2021-06-08'); INSERT INTO dbo.player_history VALUES(1,7,'2021-01-03', '2021-06-08'); INSERT INTO dbo.player_history VALUES(1,8,'2021-01-04', '2021-06-08'); INSERT INTO dbo.player_history VALUES(1,9,'2020-01-02', '2021-02-05'); INSERT INTO dbo.player_history VALUES(1,10,'2020-10-01', '2021-04-08'); INSERT INTO dbo.player_history VALUES(1,11,'2020-11-01', '2021-05-08'); select * from dbo.player_history order by 3, 4
and what I want to get out is the active lineups. The output would look like so:
With the logic behind it being:
I’ve almost cracked it using some sort of lead(valid_from) between valid_to and valid_from and, doing a case when to make it 1 if its a new lineup 0 otherwise, and then doing some sort of cumulative sum on that to get the ID but I’m not able to solve it 100%… I’m very desperate, don’t know where to look anymore.
**correction: lineup 4 & 5 should actually just be one lineup.
Advertisement
Answer
Given that a player can belong to multiple line up as we discussed in the comment section, you might try the approach below using JOIN
:
WITH LINEUPS AS (SELECT a.*,b.player_id as b_player_id FROM `dbo.player_history` a INNER JOIN `dbo.player_history` b on b.active_from BETWEEN a.active_from AND a.active_to ORDER BY 3, 4) SELECT team_id, ROW_NUMBER () OVER (PARTITION BY team_id ORDER BY active_from, active_to) AS lineup_id, active_from, active_to, ARRAY_AGG(DISTINCT b_player_id) as player_ids FROM LINEUPS GROUP BY team_id, active_from, active_to ORDER BY active_from, active_to
Since the output is too long for me to show you via screenshot in Bigquery console, I extracted the results to Google sheets. See below screenshot of output: