I have a column A which elements consist of lists with multiple string elements (not arrays), such as:
**A** 1 'A123','B234','C123' 2 'A123','B234' 3 'A124',
The amount of list elements varies across records. The list elements however are always of the same length and are separated by comma.
I need to extract these string elements to a separate table/list B and get the distinct elements such as:
**B** 1 'A123' 2 'A124' 3 'B234' 4 'C123'
How can I do that using SQL? I’m using Impala or Hive I’m using Cloudera Hue for analysis.
Thanks,
KS
Advertisement
Answer
First use lateral view explode
to create a column with array elements, then dictinct, e.g.
SELECT distinct elem as col1 FROM tableA LATERAL VIEW explode(split(array_column, ',')) aTable AS elem
See Hive’s manual for more details on lateral view