Skip to content
Advertisement

How to fix ‘must be an aggregate expression or appear in GROUP BY clause’ with ‘as’ clause

I am trying to get the number of request by hour of my CloudFront distribution using athena query.

I created cloudfront_logs table guided by this Link in my sample_db

Below is the query I made to get the number of request by hour

SELECT date_trunc('hour',from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time))) as TIME, count(*) as CNT
FROM "sample_db"."cloudfront_logs" 
WHERE 
    from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)) >= from_iso8601_timestamp('2019-05-29T00:00:00')
AND 
    from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)) < from_iso8601_timestamp('2019-05-30T00:00:00')
GROUP BY TIME
ORDER BY TIME ASC;

However it returns error like this

SYNTAX_ERROR: line 2:8: '"date_trunc"('hour', "from_iso8601_timestamp"("concat"("concat"("date_format"("date", '%Y-%m-%d'), 'T'), "time")))' must be an aggregate expression or appear in GROUP BY clause

Therefore, I replaced the TIME after GROUP BY to date_trunc('hour',from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time))), and tried again.

SELECT date_trunc('hour',from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time))) as TIME, count(*) as CNT
FROM "sample_db"."cloudfront_logs" 
WHERE 
    from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)) >= from_iso8601_timestamp('2019-05-29T00:00:00')
AND 
    from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)) < from_iso8601_timestamp('2019-05-30T00:00:00')
GROUP BY date_trunc('hour',from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)))
ORDER BY TIME ASC;

Finally, I got result.

I think, it should work with the first query also. Can I get any advice on the first query? Because It looks more simple.

Advertisement

Answer

This just appears that Athena SQL does not make an alias used in the SELECT clause available in the GROUP BY clause at the same level. Many versions of SQL do allow this, though. You may try using a CTE here to make things less verbose:

WITH cte AS (
    SELECT from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)) AS ts
    FROM "sample_db"."cloudfront_logs"
)

SELECT
    DATE_TRUNC('hour', ts) AS TIME,
    COUNT(*) AS CNT
FROM cte
WHERE
    ts >= from_iso8601_timestamp('2019-05-29T00:00:00') AND
    ts <  from_iso8601_timestamp('2019-05-30T00:00:00')
GROUP BY
    DATE_TRUNC('hour', ts)
ORDER BY
    DATE_TRUNC('hour', ts);
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement