Split column in hive

I am new to Hive and the Hadoop framework. I am trying to write a Hive query that splits a column delimited by the pipe ‘|’ character. I then want to group each pair of adjacent values and put each pair on its own row.

For example, I have a table

I am able to split the column by using split(mapper, "\|") which gives me the array
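As a minimal sketch of this step, the query below splits a literal string instead of the real column. The sample value is made up for illustration and is not from my table. The backslash is doubled on the assumption that the Hive string literal consumes one level of escaping; adjust it if your client handles escaping differently.

```sql
-- Hypothetical sample value standing in for one row of the mapper column.
-- '\\|' is intended to reach the regex engine as \| (a literal pipe).
SELECT split('abc|0.5|xyz|0.25', '\\|') AS parts;
```

With that input the result should be a four-element array of strings, one element per pipe-delimited token.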

I then tried to use a lateral view to split the mapper array into separate rows, but that puts every value on its own row, whereas I want one row per group of two adjacent values.
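Roughly, the attempt looks like the sketch below. The table name scores and the id column are hypothetical placeholders; only mapper is the real column name.

```sql
-- Hypothetical table: scores(id, mapper).
-- explode() emits one row per array element, so every token
-- (both halves of each pair) ends up on its own row.
SELECT s.id, t.val
FROM scores s
LATERAL VIEW explode(split(s.mapper, '\\|')) t AS val;
```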

Expected:

Actual:

How can I achieve this?

Answer

I would suggest first splitting the string into pairs with split(mapper, '(?<=\d)\|(?=\w)'), which splits only on a pipe that follows a digit and precedes a word character, e.g.

results in

Then explode the resulting array and split each element by |.
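Put together, the two steps might look like the sketch below. The table scores, the id column, and the output aliases label and score are hypothetical names chosen for illustration. It assumes the first value of each pair does not end in a digit (otherwise the lookbehind would also match the pipe inside a pair), and that backslashes need doubling inside the Hive string literal.

```sql
-- Hypothetical table: scores(id, mapper), e.g. mapper = 'abc|0.5|xyz|0.25'.
-- Step 1: split only on pipes preceded by a digit and followed by a word
--         character, so each array element is one "label|score" pair.
-- Step 2: explode the pairs into rows, then split each pair on the pipe.
SELECT s.id,
       split(p.pair, '\\|')[0] AS label,
       split(p.pair, '\\|')[1] AS score
FROM scores s
LATERAL VIEW explode(split(s.mapper, '(?<=\\d)\\|(?=\\w)')) p AS pair;
```

For the sample value in the comment this should give one row for abc with 0.5 and one row for xyz with 0.25.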

Update:

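If the first value of each pair can also end in a digit, and your float numbers have exactly one digit after the decimal point, then the regex should be extended to split(mapper, '(?<=\.\d)\|(?=\w|\d)'), so that only a pipe preceded by a dot and one digit is treated as a pair boundary.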
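A quick way to check the extended regex is to run it on a literal. The sample value below is invented for illustration, and the doubled backslashes again assume that the Hive string literal strips one level of escaping.

```sql
-- Hypothetical sample value whose labels end in a digit.
-- Only the pipe preceded by ".<digit>" is a split point, so the pipe
-- after C1 and the pipe after C2 stay inside their pairs.
SELECT split('C1|0.9|C2|0.1', '(?<=\\.\\d)\\|(?=\\w|\\d)') AS pairs;
```

Note that this still depends on every float having exactly one decimal digit; a value such as 0.95 would not be followed by a split point.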

Update 2:

OK, the best way is to split on every second | as follows

e.g.

results in
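One possible way to split on every second pipe, offered as a sketch under my own assumptions and not necessarily the exact expression meant above, is to first rewrite every second pipe to a marker character with regexp_replace and then split on that marker. The marker '#' is an assumption: it must not occur anywhere in the data. The table scores, the id column, and the aliases label and score are hypothetical.

```sql
-- Hypothetical table: scores(id, mapper).
-- regexp_replace turns "a|b|" into "a|b#", i.e. every second pipe
-- becomes '#'; the last pair has no trailing pipe and is left as is.
-- Assumes '#' never appears in the data.
SELECT s.id,
       split(p.pair, '\\|')[0] AS label,
       split(p.pair, '\\|')[1] AS score
FROM scores s
LATERAL VIEW explode(
  split(regexp_replace(s.mapper, '([^|]*\\|[^|]*)\\|', '$1#'), '#')
) p AS pair;
```

This version does not depend on what the values look like, only on the pipes coming in pairs.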

User contributions licensed under: CC BY-SA