Skip to content
Advertisement

SQL: Expression Not in GROUP BY Key

I have a transaction table t1 in Hive that looks like this:

store_id   cust_id    zip_code   transaction_count   spend  
    1000       100      123      3                    50
    2000       200      456      2                    20

I’m trying to compute some metrics using this table as follows:

SELECT
    store_id,
    zip_code,
    SUM(transaction_count) OVER (PARTITION BY store_id, zip_code) / COUNT(*) OVER(PARTITION BY store_id, zip_code) AS avg_transactions,
    SUM(spend) OVER(PARTITION BY store_id, zip_code)/SUM(transaction_count) OVER(PARTITION BY store_id, zip_code) AS avg_purchase
FROM 
    t1
GROUP BY 
    1, 2

Hive is throwing an error

SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Expression not in GROUP BY key ‘spend’

But the column spend is being used with an aggregation function (sum), so I cannot group by this column. How can I fix this error?

Advertisement

Answer

Window functions do not need group by. You either want:

SELECT
    store_id,
    zip_code,
    AVG(transaction_count) OVER(PARTITION BY store_id, zip_code) AS avg_transactions,
    SUM(spend) OVER(PARTITION BY store_id, zip_code)
        / SUM(transaction_count) OVER(PARTITION BY store_id, zip_code) AS avg_purchase
FROM t1

Note that the first expression can be shortened with AVG().

This brings as many rows as in the original table. If, on the other hand, you want just one row per store and zip code, then use regular aggregate functions with GROUP BY:

SELECT
    store_id,
    zip_code,
    AVG(transaction_count) AS avg_transactions,
    SUM(spend) / SUM(transaction_count) AS avg_purchase
FROM t1
GROUP BY store_id, zip_code
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement