Dont know how to solve the problem.May be you can show right direction or give a link. I have a table:
id Date 23 01.01.2020 23 03.01.2020 23 04.01.2020 56 07.01.2020 56 08.01.2020 87 11.01.2020 23 12.01.2020 23 18.01.2020
I want to aggregate data (id, Date_min) and add new column like this one:
id Date_min Date_new 23 01.01.2020 07.01.2020 56 07.01.2020 11.01.2020 87 11.01.2020 12.01.2020 23 12.01.2020 18.01.2020
In column Data_new I want to see next user’s first date. If there is no next user, add user`s max date
Advertisement
Answer
LEAD will give you the next date, but we also have the slight sticking problem that your ID repeats, so we need something to make the second 23 distinct from the first. For that I guess we can establish a counter that ticks up every time the ID changes:
with a as( select '23' as id, '01.01.2020' as "date" union all select '23' as id, '03.01.2020' as "date" union all select '23' as id, '04.01.2020' as "date" union all select '56' as id, '07.01.2020' as "date" union all select '56' as id, '08.01.2020' as "date" union all select '87' as id, '11.01.2020' as "date" union all select '23' as id, '12.01.2020' as "date" union all select '23' as id, '18.01.2020' as "date" ), b as ( SELECT *, LAG(id) OVER(ORDER BY "date") as last_id FROM a ), c AS( SELECT *, LEAD("date") OVER(ORDER BY "date") as next_date, SUM(CASE WHEN last_id <> id THEN 1 ELSE 0 END) OVER(ORDER BY "date" ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) id_ctr FROM b ) SELECT id, MIN("date"), MAX(next_date) FROM c GROUP BY id, id_ctr
I haven’t got a PG instance to test this on, but it works in SQLS and I’m pretty sure that PG supports everything that SQLS does – there isn’t any SQLS specific stuff here
a
takes the place of your table – you can drop it from your query and just straight d awith b as (select... from yourtablenamehere)
b
calculates the previous ID; we’ll use this to detect if the id has changed between current row and prev row. If it changes we’ll put a 1 otherwise a 0. When these are summed as a running total it effectively means the counter ticks up every time the ID changes, so we can group by this counter as well as the ID to split our two23
s apart. We need to do this separately because window functions can’t be nestedc
takes the last_id and does the running total. It also does thenext_date
with a simple window function that pulls the date from the following row (rows ordered by date). theROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
is techincally unnecessary as it’s the default action for a SUM OVER ORDERBY, but I find being explicit helps document/change if needed- then all that is required is to select the id, min date and max next_date, but throw the counter in there too to split the
23
s up – you’re allowed to group by more columns than you select but not the other way round