I am trying to fix an array in a dataset. Currently, I have a data set that has a reference number to multiple different uuids. What I would like to do is flatten this out in Snowflake to make it so the reference number has separate row for each uuid. For example
Reference UUID 1) 9f823c2a-ced5-4dbe-be65-869311462f75 "[ ""05554f65-6aa9-4dd1-6271-8ce2d60f10c4"", ""df662812-7f97-0b43-9d3e-12f64f504fbb"", ""08644a69-76ed-ce2d-afff-b236a22efa69"", ""f1162c2e-eeb5-83f6-5307-2ed644e6b9eb"", ]"
Should end up looking like:
Reference UUID 1) 9f823c2a-ced5-4dbe-be65-869311462f75 05554f65-6aa9-4dd1-6271-8ce2d60f10c4 2) 9f823c2a-ced5-4dbe-be65-869311462f75 df662812-7f97-0b43-9d3e-12f64f504fbb 3) 9f823c2a-ced5-4dbe-be65-869311462f75 08644a69-76ed-ce2d-afff-b236a22efa69 4) 9f823c2a-ced5-4dbe-be65-869311462f75 f1162c2e-eeb5-83f6-5307-2ed644e6b9eb
I just started working in Snowflake so I am new to it. It looks like there is a lateral flatten, but this is either not working on telling me that I have all sorts of errors with it. The documentation from snowflake is a bit perplexing when it comes to this.
Advertisement
Answer
While FLATTEN
is the right approach when exploding an array, the UUID
column value shown in the original description is invalid if interpreted as JSON syntax: "[""val1"", ""val2""]"
and that’ll need correction before a LATERAL FLATTEN
approach can be applied by treating it as a VARIANT
type.
If your data sample in the original description is a literal one and applies for all columnar values, then the following query will help transform it into a valid JSON syntax and then apply a lateral flatten to yield the desired result:
SELECT T.REFERENCE, X.VALUE AS UUID FROM ( SELECT REFERENCE, -- Attempts to transform an invalid JSON array syntax such as "[""a"", ""b""]" -- to valid JSON: ["a", "b"] by stripping away unnecessary quotes PARSE_JSON(REPLACE(REPLACE(REPLACE(UUID, '""', '"'), '["', '['), ']"', ']')) AS UUID_ARR_CLEANED FROM TABLENAME) T, LATERAL FLATTEN(T.UUID_ARR_CLEANED) X
If your data is already in a valid VARIANT
type with a successful PARSE_JSON
done for the UUID
column during ingest, and the example provided in the description was just a formatting issue that only displays the JSON invalid in the post, then the simpler version of the same query as above will suffice:
SELECT REFERENCE, X.VALUE AS UUID FROM TABLENAME, LATERAL FLATTEN(TABLENAME.UUID) X