Skip to content
Advertisement

How to replace exist in Hive with two correlated subqueries

I have a query that looks like this

I researched and read that in Hive IN or EXIST are not supported statements. I read that a workaround for this would be to use a LEFT JOIN. I have tried this but I am having trouble with the GROUP BY u.id. I read that this needs to be paired always with an aggregation function like COUNT() however I’m not sure how I can rewrite this query so as to get it to work. All the other examples I have seen online do not seem to be as complicated as this one.

Advertisement

Answer

Like you said, you can convert them to left join or may be left join since they uses exists in both subquery. Simply convert your subqueries to inline view and join them with original tables.

With left join, there may be chance of duplicates, you can use group by in subqry2 to avoid it.

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement