Skip to content
Advertisement

Is there a way to unnest or return all the elements in an ARRAY(ROW()) in Athena?

I’ve got a column with data in the format:

array(row(action varchar,actor varchar,special_notes varchar,timestamp bigint))

where the array is guaranteed to have 1 or more elements. The arrays are not guaranteed to be the same length.

Let’s call it “my_array_row_column”. Here’s what one row of this column looks like, as an example:

[{action=cast_role, actor=Morgan.Freeman, special_notes=null, timestamp=1611616961958},
{action=note_create, actor=employee@example.com, special_notes=null, timestamp=1611617308492},
{action=dismissed, actor=newhire@example.com, special_notes=NA, timestamp=1611617308512}]

I’ve tried using CROSS JOIN UNNEST(my_array_row_column) but it ends up only returning the first row() in the array. Here’s the most recent query I tried:

SELECT unnested
FROM athena.movies
CROSS JOIN UNNEST(my_array_row_column) AS t(unnested)

which, much to my dismay, only returns

unnested
---------------------------------------------------------
{action=cast_role, actor=MorganFreeman, special_notes=null, timestamp=1611616961958}

when I want it to return all three rows (Morgan.Freeman, employee, newhire) as separate rows in the result, like this:

unnested
---------------------------------------------------------
{action=cast_role, actor=Morgan.Freeman, special_notes=null, timestamp=1611616961958}
---------------------------------------------------------
{action=note_create, actor=employee@example.com, special_notes=null, timestamp=1611617308492}
---------------------------------------------------------
{action=dismissed, actor=newhire@example.com, special_notes=NA, timestamp=1611617308512}

Any ideas of how I might accomplish this?

Advertisement

Answer

According to the documentation here, this is the correct pattern:

WITH dataset AS (
  SELECT ARRAY[
    CAST(ROW('Bob', 38) AS ROW(name VARCHAR, age INTEGER)),
    CAST(ROW('Alice', 35) AS ROW(name VARCHAR, age INTEGER)),
    CAST(ROW('Jane', 27) AS ROW(name VARCHAR, age INTEGER))
  ] AS users
)
SELECT * FROM dataset

which returns

+-----------------------------------------------------------------+
| users                                                           |
+-----------------------------------------------------------------+
| [{NAME=Bob, AGE=38}, {NAME=Alice, AGE=35}, {NAME=Jane, AGE=27}] |
+-----------------------------------------------------------------+

To UNNEST this would be:

WITH dataset AS (
  SELECT ARRAY[
    CAST(ROW('Bob', 38) AS ROW(name VARCHAR, age INTEGER)),
    CAST(ROW('Alice', 35) AS ROW(name VARCHAR, age INTEGER)),
    CAST(ROW('Jane', 27) AS ROW(name VARCHAR, age INTEGER))
  ] AS users
)
SELECT unnested
FROM dataset, UNNEST(users) t(unnested)

Which should return

+---------------------+
| unnested            |
+---------------------+
| {NAME=Bob, AGE=38}  |
| {NAME=Alice, AGE=35}| 
| {NAME=Jane, AGE=27} |
+---------------------+

In your case

WITH dataset AS (
  SELECT my_array_row_column AS movieDetail
  from athena.movies 
)

SELECT unnested
FROM dataset, UNNEST(movieDetail) AS t(unnested)

Or something like that. I think your cross join is unnecessary because you are not introducing any of the fields on the same grain as the key of the athena.movies table so there is nothing to multiply with.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement