I’m attempting to generate a series of dates in Presto SQL (Athena) using unnest
and sequence
something similair to generate_series
in postgres.
my table looks like
job_name | run_date A | '2021-08-21' A | '2021-08-25' B | '2021-08-07' B | '2021-08-24' SELECT d.job_name, d.run_date FROM ( VALUES ('A', '2021-08-21'), ('A', '2021-08-25'), ('B', '2021-08-07'), ('B', '2021-08-24') ) d(job_name, run_date)
I’m aiming for an output as follows
job_name | run_date A | 2021-08-21 A | 2021-08-22 A | 2021-08-23 A | 2021-08-24 A | 2021-08-25 B | 2021-08-07 B | 2021-08-08 B | 2021-08-09 B | 2021-08-10 B | 2021-08-11 B | 2021-08-12 B | 2021-08-13 B | 2021-08-14 B | 2021-08-15 B | 2021-08-16 B | 2021-08-17 B | 2021-08-18 B | 2021-08-19 B | 2021-08-20 B | 2021-08-21 B | 2021-08-22 B | 2021-08-23 B | 2021-08-24
I’ve attempted to use the following query to achieve this – however I get an error when trying to unnest my date sequence
SELECT t.job_name, d.dte FROM (SELECT job_name , min(run_date) as mind , max(run_date) as maxd , SEQUENCE(min(run_date), max(run_date)) as date_arr FROM job_log_table t GROUP BY job_name ) jd CROSS JOIN UNNEST(jd.date_arr) d(dte) LEFT JOIN job_log_table t ON t.job_name = jd.job_name AND t.latest_date = d.dte;
which yields the following error :
[HY000][100071] [Simba][AthenaJDBC](100071) An error has been thrown from the AWS Athena client. [ErrorCategory:USER_ERROR, ErrorCode:SYNTAX_ERROR], Detail:SYNTAX_ERROR: line 5:14: Unexpected parameters (date, date) for function sequence. Expected: sequence(bigint, bigint, bigint) , sequence(bigint, bigint) , sequence(timestamp, timestamp, interval day to second) , sequence(timestamp, timestamp, interval year to month)
Is this a limitation of Athena’s flavour of Presto SQL or have I made a school boy error somewhere?
Advertisement
Answer
You need to provide interval
to generate date sequence (in this case interval '1' day
):
WITH dataset AS ( SELECT * FROM ( VALUES ('A', DATE '2021-08-21'), ('A', DATE '2021-08-25'), ('B', DATE '2021-08-07'), ('B', DATE '2021-08-24') ) AS d (job_name, run_date) ) select job_name, sequence(min(run_date), max(run_date), interval '1' day) seq from dataset group by job_name
Output:
job_name | seq |
---|---|
A | [2021-08-21 00:00:00.000, 2021-08-22 00:00:00.000, 2021-08-23 00:00:00.000, 2021-08-24 00:00:00.000, 2021-08-25 00:00:00.000] |
B | [2021-08-07 00:00:00.000, 2021-08-08 00:00:00.000, 2021-08-09 00:00:00.000, 2021-08-10 00:00:00.000, 2021-08-11 00:00:00.000, 2021-08-12 00:00:00.000, 2021-08-13 00:00:00.000, 2021-08-14 00:00:00.000, 2021-08-15 00:00:00.000, 2021-08-16 00:00:00.000, 2021-08-17 00:00:00.000, 2021-08-18 00:00:00.000, 2021-08-19 00:00:00.000, 2021-08-20 00:00:00.000, 2021-08-21 00:00:00.000, 2021-08-22 00:00:00.000, 2021-08-23 00:00:00.000, 2021-08-24 00:00:00.000] |