Skip to content
Advertisement

Extract multiple string elements in table column using Impala or Hive

I have a column A which elements consist of lists with multiple string elements (not arrays), such as:

The amount of list elements varies across records. The list elements however are always of the same length and are separated by comma.

I need to extract these string elements to a separate table/list B and get the distinct elements such as:

How can I do that using SQL? I’m using Impala or Hive I’m using Cloudera Hue for analysis.

Thanks,

KS

Advertisement

Answer

First use lateral view explode to create a column with array elements, then dictinct, e.g.

See Hive’s manual for more details on lateral view

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement