I am trying to get the number of request by hour of my CloudFront distribution using athena query.
I created cloudfront_logs table guided by this Link in my sample_db
Below is the query I made to get the number of request by hour
SELECT date_trunc('hour',from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time))) as TIME, count(*) as CNT FROM "sample_db"."cloudfront_logs" WHERE from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)) >= from_iso8601_timestamp('2019-05-29T00:00:00') AND from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)) < from_iso8601_timestamp('2019-05-30T00:00:00') GROUP BY TIME ORDER BY TIME ASC;
However it returns error like this
SYNTAX_ERROR: line 2:8: '"date_trunc"('hour', "from_iso8601_timestamp"("concat"("concat"("date_format"("date", '%Y-%m-%d'), 'T'), "time")))' must be an aggregate expression or appear in GROUP BY clause
Therefore, I replaced the TIME
after GROUP BY
to date_trunc('hour',from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)))
, and tried again.
SELECT date_trunc('hour',from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time))) as TIME, count(*) as CNT FROM "sample_db"."cloudfront_logs" WHERE from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)) >= from_iso8601_timestamp('2019-05-29T00:00:00') AND from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)) < from_iso8601_timestamp('2019-05-30T00:00:00') GROUP BY date_trunc('hour',from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time))) ORDER BY TIME ASC;
Finally, I got result.
I think, it should work with the first query also. Can I get any advice on the first query? Because It looks more simple.
Advertisement
Answer
This just appears that Athena SQL does not make an alias used in the SELECT
clause available in the GROUP BY
clause at the same level. Many versions of SQL do allow this, though. You may try using a CTE here to make things less verbose:
WITH cte AS ( SELECT from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)) AS ts FROM "sample_db"."cloudfront_logs" ) SELECT DATE_TRUNC('hour', ts) AS TIME, COUNT(*) AS CNT FROM cte WHERE ts >= from_iso8601_timestamp('2019-05-29T00:00:00') AND ts < from_iso8601_timestamp('2019-05-30T00:00:00') GROUP BY DATE_TRUNC('hour', ts) ORDER BY DATE_TRUNC('hour', ts);