I have a database that I am querying with athena.
I am using subqueries to select a subset of the data like so
(select item , max(date) from (select * from tablename where x=y) groupby 1 ) as p LEFT JOIN (select item ,date , max(date2) from (select * from tablename where x=y) groupby 1,2 ) as q ON p.item=q.item ;
can I save the query results of
select * from tablename where x=y
in a variable VAR
so that we need not query it again and again and also to make query look cleaner?
Advertisement
Answer
There is no such concept as variable in Presto. You can using the query results from S3 as the source for the new table but I would argue that it can be brittle and cumbersome. As for “reusing” the code – you can achieve that with WITH
clause designed especially for that:
with cte as (select * from tablename where x=y), p as ( select item, max(date) from cte group by 1 ), q as( select item, date, max(date2) from cte group by 1,2 ) select * from p LEFT JOIN q as q ON p.item = q.item;
Another approach which can be worth looking at is using grouping sets (see corresponding point of “Complex Grouping Operations” in the GROUP BY
Clause docs):
with dataset(item, date, date2) as ( VALUES ('id1', 1, 1), ('id1', 1, 2), ('id1', 2, 1), ('id2', 1, 1) ) select item, date, max(date) max_date, max(date2) max_date2 from dataset where true -- x=y GROUP BY GROUPING SETS ( (item), (item, date)) order by item, date
Output
item | date | max_date | max_date2 |
---|---|---|---|
id1 | 1 | 1 | 2 |
id1 | 2 | 2 | 1 |
id1 | 2 | 2 | |
id2 | 1 | 1 | 1 |
id2 | 1 | 1 |
date
will be equal to max_date
for rows which are grouped both by item
and date