Skip to content
Advertisement

Extract multiple string elements in table column using Impala or Hive

I have a column A which elements consist of lists with multiple string elements (not arrays), such as:

   **A**
1 'A123','B234','C123'
2 'A123','B234'
3 'A124',

The amount of list elements varies across records. The list elements however are always of the same length and are separated by comma.

I need to extract these string elements to a separate table/list B and get the distinct elements such as:

  **B**
1 'A123'
2 'A124'
3 'B234'
4 'C123'
 

How can I do that using SQL? I’m using Impala or Hive I’m using Cloudera Hue for analysis.

Thanks,

KS

Advertisement

Answer

First use lateral view explode to create a column with array elements, then dictinct, e.g.

SELECT distinct elem  as col1
  FROM tableA LATERAL VIEW explode(split(array_column, ',')) aTable AS elem

See Hive’s manual for more details on lateral view

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement