Skip to content
Advertisement

Deleting Part of a string in HIVE

I am trying to delete a part of a string in HIVE. I want to delete the last eleven characters for all records in a column. The data looks like:

1018492743|0001-01-01

I want it to look like:

1018492743

The code I have tried looks like:

Select right(a.ord_id, len(a.ord_id)-ll)

It isn’t working because len isnt a function in HIVE

Another issue I have is that some of the records are already in the correct format. Does this mean I need to create a case statement that checks for this?

Advertisement

Answer

You can extract digits before | character using regexp:

hive> select regexp_extract('1018492743|0001-01-01','([0-9]*)\|',1);
OK
1018492743

Or use substr to get first 10 characters:

hive> select substr('1018492743|0001-01-01',1,10);
OK
1018492743

Or exactly like you described using length and substr to get substring without last 11 characters:

hive> select substr('1018492743|0001-01-01',1,length('1018492743|0001-01-01')-11);
OK
1018492743

One more solution using split():

hive> select split('1018492743|0001-01-01','\|')[0];
OK
1018492743

See docs here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement