Flatten data source in Snowflake from Array

Question

I am trying to fix an array in a dataset. Currently, I have a dataset in which one reference number maps to multiple different UUIDs. I would like to flatten this out in Snowflake so that the reference number has a separate row for each UUID. For example Should end up looking like: I just started

Accepted Answer

While FLATTEN is the right approach for exploding an array, the UUID column value shown in the original description is invalid when interpreted as JSON: "[""val1"", ""val2""]". That will need correcting before a LATERAL FLATTEN can be applied to it as a VARIANT type.

If the data sample in the original description is literal and representative of every value in the column, the following query transforms it into valid JSON and then applies a lateral flatten to yield the desired result:

    SELECT
      T.REFERENCE,
      X.VALUE AS UUID
    FROM (
      SELECT
        REFERENCE,
        -- Attempts to transform an invalid JSON array syntax such as "[""a"", ""b""]"
        -- to valid JSON: ["a", "b"] by stripping away unnecessary quotes:
        -- first collapse "" to ", then drop the quote before [ and the quote after ]
        PARSE_JSON(REPLACE(REPLACE(REPLACE(UUID, '""', '"'), '"[', '['), ']"', ']')) AS UUID_ARR_CLEANED
      FROM TABLENAME
    ) T,
    LATERAL FLATTEN(INPUT => T.UUID_ARR_CLEANED) X

If your data is already a valid VARIANT, with a successful PARSE_JSON done on the UUID column during ingest, and the example in the description looks invalid only because of how the post was formatted, then a simpler version of the same query will suffice:

    SELECT REFERENCE, X.VALUE AS UUID
    FROM TABLENAME,
    LATERAL FLATTEN(INPUT => TABLENAME.UUID) X
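To check the quote cleanup outside Snowflake, the same idea can be traced in a few lines of Python. This is only a sketch of the intended transformation, not Snowflake code: it collapses doubled quotes, drops the quote before `[` and the quote after `]`, parses the result as JSON, and emits one row per array element. The function names and the reference values REF1 and REF2 are made up for illustration; val1 and val2 come from the sample in the question.

```python
import json


def clean_uuid_array(raw):
    """Turn a string like "[""a"", ""b""]" (outer quotes included) into a list."""
    cleaned = (
        raw.replace('""', '"')   # collapse doubled quotes
           .replace('"[', '[')   # drop the quote before the opening bracket
           .replace(']"', ']')   # drop the quote after the closing bracket
    )
    return json.loads(cleaned)


def flatten_rows(rows):
    """Emit one (reference, uuid) pair per array element."""
    return [(ref, value) for ref, raw in rows for value in clean_uuid_array(raw)]


# Hypothetical sample rows shaped like the value quoted in the answer.
rows = [
    ("REF1", '"[""val1"", ""val2""]"'),
    ("REF2", '"[""val3""]"'),
]

flat = flatten_rows(rows)
for ref, uuid in flat:
    print(ref, uuid)
```

If a value in the column does not follow this exact quoting pattern, `json.loads` raises an error here, much as PARSE_JSON would fail on it in Snowflake, so it is worth checking a few real values first.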

Answer