I have a table with measurements of weather here is a simplified version of it:
"station_id","measured_at","rainy" ------------------------------------------------------------------------- "b6b53561-dab5-4b9a-8d28-a7de1e4d1485","2020-01-31 18:35:35.27+00",FALSE "b6b53561-dab5-4b9a-8d28-a7de1e4d1485","2020-01-31 18:36:33.976+00",FALSE "b6b53561-dab5-4b9a-8d28-a7de1e4d1485","2020-01-31 18:37:33.864+00",FALSE "b6b53561-dab5-4b9a-8d28-a7de1e4d1485","2020-01-31 18:38:34.767+00",TRUE "b6b53561-dab5-4b9a-8d28-a7de1e4d1485","2020-01-31 18:39:36.076+00",TRUE "b6b53561-dab5-4b9a-8d28-a7de1e4d1485","2020-01-31 18:40:29.776+00",FALSE "b6b53561-dab5-4b9a-8d28-a7de1e4d1485","2020-01-31 18:41:35.579+00",FALSE "b6b53561-dab5-4b9a-8d28-a7de1e4d1485","2020-01-31 18:42:34.274+00",TRUE "b6b53561-dab5-4b9a-8d28-a7de1e4d1485","2020-01-31 18:43:23.842+00",TRUE "b6b53561-dab5-4b9a-8d28-a7de1e4d1485","2020-01-31 18:44:35.08+00",FALSE
and I need to calculate how much time is been sunny. I know I need to use window functions but I’m stuck. Because I need this interval to be calculated in some range for example for last day. I have been able to make a query to this stage.
SELECT prev.station_id, prev.rainy, prev.measured_at AS started_at, COALESCE(LEAD(prev.measured_at) OVER (ORDER BY prev.measured_at ASC), NOW()) AS ended_at, (COALESCE(LEAD(prev.measured_at) OVER (ORDER BY prev.measured_at ASC), NOW()) - prev.measured_at) AS diff FROM ( SELECT m.station_id, m.measured_at, m.rainy, COALESCE(LEAD(m.rainy) OVER (ORDER BY m.measured_at ASC), m.rainy) AS prev_rainy FROM z_measurements m WHERE m.measured_at >= '2020-01-30T00:00:00.000Z' ORDER BY m.measured_at ASC ) prev WHERE prev.rainy IS DISTINCT FROM prev.prev_rainy ORDER BY prev.measured_at ASC;
this query results in to:
"station_id","rainy","started_at","ended_at","diff" --------------------------------------------------- "b6b53561-dab5-4b9a-8d28-a7de1e4d1485",FALSE,"2020-01-31 18:37:33.864","2020-01-31 18:39:36.076+00","00:02:02.212" "b6b53561-dab5-4b9a-8d28-a7de1e4d1485",TRUE,"2020-01-31 18:39:36.076","2020-01-31 18:41:35.579+00","00:01:59.503" "b6b53561-dab5-4b9a-8d28-a7de1e4d1485",FALSE,"2020-01-31 18:41:35.579","2020-01-31 18:43:23.842+00","00:01:48.263" "b6b53561-dab5-4b9a-8d28-a7de1e4d1485",TRUE,"2020-01-31 18:43:23.842","2020-01-31 21:18:04.89333+00","02:34:41.05133"
but if the first lane of this result is rainy = False
the start_date
should be the same as one in query 2020-01-30T00:00:00.000Z
(I’m assuming that if it’s sunny on the first result it has been sunny for whole time between my range start and this first recorded measurement) and also there is missing the last row because last measured state is sunny. So started_at
for this last row should be 2020-01-31 18:44:35.08+00
and end_at
should be NOW()
.
Can somebody please help me?
I’m using postgresql 12.1.
Advertisement
Answer
SELECT t.station_id, t.rainy, t.started_at, t.ended_at, t.ended_at - t.started_at AS diff FROM ( SELECT prev.station_id, prev.rainy, CASE WHEN LAG(prev.measured_at) OVER measured_at_by_station_id IS NULL THEN '2020-01-30T00:00:00.000Z' ELSE prev.measured_at END AS started_at, LEAD(prev.measured_at, 1, NOW()) OVER measured_at_by_station_id AS ended_at FROM ( SELECT m.station_id, m.measured_at, m.rainy, LAG(m.rainy, 1, NOT(m.rainy)) OVER (PARTITION BY m.station_id ORDER BY m.measured_at ASC) AS prev_rainy FROM z_measurements m WHERE m.measured_at >= '2020-01-30T00:00:00.000Z' ORDER BY m.station_id ASC, m.measured_at ASC ) prev WHERE prev.rainy IS DISTINCT FROM prev.prev_rainy WINDOW measured_at_by_station_id AS (PARTITION BY prev.station_id ORDER BY prev.measured_at ASC) ) t ORDER BY t.station_id ASC, t.started_at ASC