Array operation on hive collect_set

I am working in Hive on a large dataset. I have a table with an array column, and the content of the column is as follows.

I need a set ordered by the ascending date of prod, i.e. I need to trim the date from each array element and apply collect_set to get the result below.

Answer

Explode the array, remove the date (the digits at the beginning of each string), and aggregate using collect_set:
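A minimal sketch of these three steps. The sample rows, the `yyyyMMdd_product` element format, the regex and the aliases `myarray`, `elem` and `prod_set` are assumptions made for illustration, so adjust the pattern to the real date format. The `sort_array` call is an extra step, not part of the steps above, added because the question asks for ascending date order.

```sql
-- Hypothetical sample data: each element is assumed to look like 'yyyyMMdd_product'.
WITH mydata AS (
  SELECT array('20200103_prodC', '20200101_prodA',
               '20200102_prodB', '20200104_prodA') AS myarray
)
SELECT
  -- Strip the leading digits and the underscore, then keep distinct values.
  collect_set(regexp_replace(elem, '^[0-9]+_', '')) AS prod_set
FROM mydata
-- sort_array orders the strings lexicographically, which matches date order
-- for a fixed-width yyyyMMdd prefix; explode then emits one row per element.
LATERAL VIEW explode(sort_array(myarray)) e AS elem;
```

With this sample data the distinct values are prodA, prodB and prodC. Hive does not document an ordering guarantee for collect_set, so the order in which they come back should be verified on the target cluster rather than relied upon.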

Result:

Another possible method is to concatenate the array first, remove the dates from the resulting string, and split it to get an array again. Unfortunately, we still need to explode in order to apply collect_set and remove duplicates (example using the same WITH mydata CTE):
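A sketch of this variant under the same assumptions: hypothetical sample data, elements shaped like `yyyyMMdd_product`, and a comma separator that is assumed never to occur inside a product name. The CTE is repeated so that the query runs on its own.

```sql
-- Hypothetical sample data, same shape as in the first query.
WITH mydata AS (
  SELECT array('20200103_prodC', '20200101_prodA',
               '20200102_prodB', '20200104_prodA') AS myarray
)
SELECT collect_set(elem) AS prod_set
FROM mydata
LATERAL VIEW explode(
  split(
    -- Join the elements into one string, then drop every date prefix that
    -- starts the string or follows a comma; '$1' puts the comma back.
    regexp_replace(concat_ws(',', myarray), '(^|,)[0-9]+_', '$1'),
    ','
  )
) e AS elem;
```

The regular expression runs once per row instead of once per element, but the explode step remains because collect_set is an aggregate over rows and cannot deduplicate an array in place.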

User contributions licensed under: CC BY-SA