I’ve got a column with data in the format:
array(row(action varchar,actor varchar,special_notes varchar,timestamp bigint))
where the array is guaranteed to have 1 or more elements. The arrays are not guaranteed to be the same length.
Let’s call it “my_array_row_column”. Here’s what one row of this column looks like, as an example:
[{action=cast_role, actor=Morgan.Freeman, special_notes=null, timestamp=1611616961958}, {action=note_create, actor=employee@example.com, special_notes=null, timestamp=1611617308492}, {action=dismissed, actor=newhire@example.com, special_notes=NA, timestamp=1611617308512}]
I’ve tried using CROSS JOIN UNNEST(my_array_row_column)
but it ends up only returning the first row()
in the array. Here’s the most recent query I tried:
SELECT unnested FROM athena.movies CROSS JOIN UNNEST(my_array_row_column) AS t(unnested)
which, much to my dismay, only returns
unnested --------------------------------------------------------- {action=cast_role, actor=MorganFreeman, special_notes=null, timestamp=1611616961958}
when I want it to return all three rows (Morgan.Freeman, employee, newhire) as separate rows in the result, like this:
unnested --------------------------------------------------------- {action=cast_role, actor=Morgan.Freeman, special_notes=null, timestamp=1611616961958} --------------------------------------------------------- {action=note_create, actor=employee@example.com, special_notes=null, timestamp=1611617308492} --------------------------------------------------------- {action=dismissed, actor=newhire@example.com, special_notes=NA, timestamp=1611617308512}
Any ideas of how I might accomplish this?
Advertisement
Answer
According to the documentation here, this is the correct pattern:
WITH dataset AS ( SELECT ARRAY[ CAST(ROW('Bob', 38) AS ROW(name VARCHAR, age INTEGER)), CAST(ROW('Alice', 35) AS ROW(name VARCHAR, age INTEGER)), CAST(ROW('Jane', 27) AS ROW(name VARCHAR, age INTEGER)) ] AS users ) SELECT * FROM dataset
which returns
+-----------------------------------------------------------------+ | users | +-----------------------------------------------------------------+ | [{NAME=Bob, AGE=38}, {NAME=Alice, AGE=35}, {NAME=Jane, AGE=27}] | +-----------------------------------------------------------------+
To UNNEST this would be:
WITH dataset AS ( SELECT ARRAY[ CAST(ROW('Bob', 38) AS ROW(name VARCHAR, age INTEGER)), CAST(ROW('Alice', 35) AS ROW(name VARCHAR, age INTEGER)), CAST(ROW('Jane', 27) AS ROW(name VARCHAR, age INTEGER)) ] AS users ) SELECT unnested FROM dataset, UNNEST(users) t(unnested)
Which should return
+---------------------+ | unnested | +---------------------+ | {NAME=Bob, AGE=38} | | {NAME=Alice, AGE=35}| | {NAME=Jane, AGE=27} | +---------------------+
In your case
WITH dataset AS ( SELECT my_array_row_column AS movieDetail from athena.movies ) SELECT unnested FROM dataset, UNNEST(movieDetail) AS t(unnested)
Or something like that. I think your cross join is unnecessary because you are not introducing any of the fields on the same grain as the key of the athena.movies table so there is nothing to multiply with.