I was given a query and I am attempting to modify it in order to get the most recent version of each COMP_ID. The original query:
SELECT ANY_VALUE(DATA_INDEX)::string AS DATA_INDEX, COMP_ID::string AS COMP_ID, ANY_VALUE(ACCOUNT_ID)::string AS ACCOUNT_ID, ANY_VALUE(COMP_VERSION)::string AS COMP_VERSION, ANY_VALUE(NAME)::string AS NAME, ANY_VALUE(DESCRIPTION)::string AS DESCRIPTION, MAX(OBJECT_DICT:"startshape-type")[0]::string AS STARTSHAPE_TYPE, MAX(OBJECT_DICT:"startshape-connector-type")[0]::string AS STARTSHAPE_CONNECTOR_TYPE , MAX(OBJECT_DICT:"startshape-action-type")[0]::string AS STATSHAPE_ACTION_TYPE, MAX(OBJECT_DICT:"overrides-enabled")[0]::string AS OVERRIDES_ENABLED FROM COMP_DATA GROUP BY COMP_ID ORDER BY COMP_ID;
I then attempted to use a window function to grab only the highest version for each comp_id. This is the modified query:
SELECT ANY_VALUE(DATA_INDEX)::string AS DATA_INDEX, COMP_ID::string AS COMP_ID, ANY_VALUE(ACCOUNT_ID)::string AS ACCOUNT_ID, ANY_VALUE(COMP_VERSION)::string AS COMP_VERSION, ANY_VALUE(NAME)::string AS NAME, ANY_VALUE(DESCRIPTION)::string AS DESCRIPTION, MAX(OBJECT_DICT:"startshape-type")[0]::string AS STARTSHAPE_TYPE, MAX(OBJECT_DICT:"startshape-connector-type")[0]::string AS STARTSHAPE_CONNECTOR_TYPE , MAX(OBJECT_DICT:"startshape-action-type")[0]::string AS STATSHAPE_ACTION_TYPE, MAX(OBJECT_DICT:"overrides-enabled")[0]::string AS OVERRIDES_ENABLED, ROW_NUMBER() OVER (PARTITION BY COMP_ID ORDER BY COMP_VERSION DESC) AS ROW_NUM FROM COMP_DATA QUALIFY 1 = ROW_NUM;
When attempting to compile the below error is given:
SQL compilation error: [COMP_DATA.COMP_ID] is not a valid group by expression
I had originally thought the issue was the ANY_VALUE on COMP_VERSION, but after removing the ANY_VALUE the same error was given. The only way I found to not get an error was removing the 4 MAX fields and all of the ANY_VALUE()’s, as shown below:
SELECT DATA_INDEX::string AS DATA_INDEX, COMP_ID::string AS COMP_ID, ACCOUNT_ID::string AS ACCOUNT_ID, COMP_VERSION::string AS COMP_VERSION, NAME::string AS NAME, DESCRIPTION::string AS DESCRIPTION, ROW_NUMBER() OVER (PARTITION BY COMP_ID ORDER BY COMP_VERSION DESC) AS ROW_NUM FROM COMP_DATA QUALIFY 1 = ROW_NUM;
Of course this is not at all sufficient since I need the 4 max fields.
I have also tried creating the table with the max fields and from that new table using the window function to select the highest COMP_VERSION of each COMP_ID, but the same error was given.
Advertisement
Answer
When you added your QUALIFY clause you dropped the GROUP BY clause from your SQL, aggregate function like MAX, need all selections to be aggregate function OR to have a GROUP BY clause.
So if you only want the best row per the grouping clause, which you note, you aggregate functions need to be explicitly windowed. Thus
SELECT data_index::string AS data_index, comp_id::string AS comp_id, account_id::string AS account_id, comp_version::string AS comp_version, name::string AS name, description::string AS description, MAX(object_dict:"startshape-type")OVER(PARTITION BY comp_id)[0]::string AS startshape_type, MAX(object_dict:"startshape-connector-type")OVER (PARTITION BY comp_id)[0]::string AS startshape_connector_type , MAX(object_dict:"startshape-action-type")OVER (PARTITION BY comp_id)[0]::string AS statshape_action_type, MAX(object_dict:"overrides-enabled")OVER(PARTITION BY comp_id)[0]::string AS overrides_enabled, FROM COMP_DATA QUALIFY 1 = ROW_NUMBER() OVER (PARTITION BY comp_id ORDER BY comp_version DESC);
There is a small chance you will need to add a set of brackets around those MAX’s like
(MAX(object_dict:"overrides-enabled")OVER(PARTITION BY comp_id))[0]::string AS overrides_enabled,
But I suspect it will work out of the box. And I assumed you don’t want the row_number so pushed it into the qualify (because it will always be the value 1)