Skip to content
Advertisement

Invalid group by expression error when using any_value with max and window function in Snowflake

I was given a query and I am attempting to modify it in order to get the most recent version of each COMP_ID. The original query:

SELECT 
    ANY_VALUE(DATA_INDEX)::string AS DATA_INDEX, 
    COMP_ID::string AS COMP_ID, 
    ANY_VALUE(ACCOUNT_ID)::string AS ACCOUNT_ID, 
    ANY_VALUE(COMP_VERSION)::string AS COMP_VERSION, 
    ANY_VALUE(NAME)::string AS NAME, 
    ANY_VALUE(DESCRIPTION)::string AS DESCRIPTION,
    MAX(OBJECT_DICT:"startshape-type")[0]::string AS STARTSHAPE_TYPE,
    MAX(OBJECT_DICT:"startshape-connector-type")[0]::string AS STARTSHAPE_CONNECTOR_TYPE ,
    MAX(OBJECT_DICT:"startshape-action-type")[0]::string AS STATSHAPE_ACTION_TYPE,
    MAX(OBJECT_DICT:"overrides-enabled")[0]::string AS OVERRIDES_ENABLED
FROM COMP_DATA
GROUP BY COMP_ID
ORDER BY COMP_ID;

I then attempted to use a window function to grab only the highest version for each comp_id. This is the modified query:

SELECT 
    ANY_VALUE(DATA_INDEX)::string AS DATA_INDEX, 
    COMP_ID::string AS COMP_ID, 
    ANY_VALUE(ACCOUNT_ID)::string AS ACCOUNT_ID, 
    ANY_VALUE(COMP_VERSION)::string AS COMP_VERSION, 
    ANY_VALUE(NAME)::string AS NAME, 
    ANY_VALUE(DESCRIPTION)::string AS DESCRIPTION,
    MAX(OBJECT_DICT:"startshape-type")[0]::string AS STARTSHAPE_TYPE,
    MAX(OBJECT_DICT:"startshape-connector-type")[0]::string AS STARTSHAPE_CONNECTOR_TYPE ,
    MAX(OBJECT_DICT:"startshape-action-type")[0]::string AS STATSHAPE_ACTION_TYPE,
    MAX(OBJECT_DICT:"overrides-enabled")[0]::string AS OVERRIDES_ENABLED,
    ROW_NUMBER() OVER (PARTITION BY COMP_ID ORDER BY COMP_VERSION DESC) AS ROW_NUM
FROM COMP_DATA
QUALIFY 1 = ROW_NUM;

When attempting to compile the below error is given:

SQL compilation error: [COMP_DATA.COMP_ID] is not a valid group by expression

I had originally thought the issue was the ANY_VALUE on COMP_VERSION, but after removing the ANY_VALUE the same error was given. The only way I found to not get an error was removing the 4 MAX fields and all of the ANY_VALUE()’s, as shown below:

SELECT 
    DATA_INDEX::string AS DATA_INDEX, 
    COMP_ID::string AS COMP_ID, 
    ACCOUNT_ID::string AS ACCOUNT_ID, 
    COMP_VERSION::string AS COMP_VERSION, 
    NAME::string AS NAME, 
    DESCRIPTION::string AS DESCRIPTION,
    ROW_NUMBER() OVER (PARTITION BY COMP_ID ORDER BY COMP_VERSION DESC) AS ROW_NUM
FROM COMP_DATA
QUALIFY 1 = ROW_NUM;

Of course this is not at all sufficient since I need the 4 max fields.

I have also tried creating the table with the max fields and from that new table using the window function to select the highest COMP_VERSION of each COMP_ID, but the same error was given.

Advertisement

Answer

When you added your QUALIFY clause you dropped the GROUP BY clause from your SQL, aggregate function like MAX, need all selections to be aggregate function OR to have a GROUP BY clause.

So if you only want the best row per the grouping clause, which you note, you aggregate functions need to be explicitly windowed. Thus

SELECT 
    data_index::string AS data_index, 
    comp_id::string AS comp_id, 
    account_id::string AS account_id, 
    comp_version::string AS comp_version, 
    name::string AS name, 
    description::string AS description,
    MAX(object_dict:"startshape-type")OVER(PARTITION BY comp_id)[0]::string  AS startshape_type,
    MAX(object_dict:"startshape-connector-type")OVER (PARTITION BY comp_id)[0]::string AS startshape_connector_type ,
    MAX(object_dict:"startshape-action-type")OVER (PARTITION BY comp_id)[0]::string AS statshape_action_type,
    MAX(object_dict:"overrides-enabled")OVER(PARTITION BY comp_id)[0]::string AS overrides_enabled,
FROM COMP_DATA
QUALIFY 1 = ROW_NUMBER() OVER (PARTITION BY comp_id ORDER BY comp_version DESC);

There is a small chance you will need to add a set of brackets around those MAX’s like

(MAX(object_dict:"overrides-enabled")OVER(PARTITION BY comp_id))[0]::string AS overrides_enabled,

But I suspect it will work out of the box. And I assumed you don’t want the row_number so pushed it into the qualify (because it will always be the value 1)

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement