Skip to content
Advertisement

Athena/Presto : Unnest 2 arrays with left join

So I have 2 Json arrays that need unnesting, and joining based on a key within the json structure. In theory is easy, but without having a ‘left join unnest’ functionality, it all becomes messy.

I have achieved what I want, by grouping the results; but I also have concerns that it is doing 2 cross joins, effectively generating many thousands of superfluous rows (in a live environment) before filtering them back out again.

Hence, my question here, is really looking for a much more efficient strategy to do the same logic. I’m well aware that my Presto experience & knowledge is n its infancy !

Thanks for any guidance !

Workings:

Basic logic : Each item in the ‘left’ array has an $.id value. For some of the ‘left’ items, there will be a matching right item with $.a.id value

Examples :

  1. The first SQL & results below show the setup, if not the desired result.
  2. The second set, shows my current solution.

(1) Raw results of Cross Join

Output:

record_id left right left_id right_id
123 {“id”:”01″,”key1″:[“val1”]} {“a”:{“id”:”02″,”key1″:[“apples”]},”b”:{“lala”:”bananas”}} “01” “02”
123 {“id”:”01″,”key1″:[“val1”]} {“a”:{“id”:”01″,”key1″:[“one”]},”b”:{“lala”:”oneone”}} “01” “01”
123 {“id”:”02″,”key1″:[“val2”]} {“a”:{“id”:”02″,”key1″:[“apples”]},”b”:{“lala”:”bananas”}} “02” “02”
123 {“id”:”02″,”key1″:[“val2”]} {“a”:{“id”:”01″,”key1″:[“one”]},”b”:{“lala”:”oneone”}} “02” “01”
123 {“id”:”03″,”key1″:[“val3”]} {“a”:{“id”:”02″,”key1″:[“apples”]},”b”:{“lala”:”bananas”}} “03” “02”
123 {“id”:”03″,”key1″:[“val3”]} {“a”:{“id”:”01″,”key1″:[“one”]},”b”:{“lala”:”oneone”}} “03” “01”

(2) Current Solution

record_id left match
123 {“id”:”01″,”key1″:[“val1”]} {“a”:{“id”:”01″,”key1″:[“one”]},”b”:{“lala”:”oneone”}}
123 {“id”:”02″,”key1″:[“val2”]} {“a”:{“id”:”02″,”key1″:[“apples”]},”b”:{“lala”:”bananas”}}
123 {“id”:”03″,”key1″:[“val3”]}

Advertisement

Answer

Unnest both arrays in CTEs and left join CTEs, in this case you will eliminate cross join, but the code is a bit longer:

Result:

record_id left right left_id right_id
123 {“id”:”01″,”key1″:[“val1”]} {“a”:{“id”:”01″,”key1″:[“one”]},”b”:{“lala”:”oneone”}} “01” “01”
123 {“id”:”02″,”key1″:[“val2”]} {“a”:{“id”:”02″,”key1″:[“apples”]},”b”:{“lala”:”bananas”}} “02” “02”
123 {“id”:”03″,”key1″:[“val3”]} N “03” N
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement