I am trying to delete a part of a string in HIVE. I want to delete the last eleven characters for all records in a column. The data looks like:
1018492743|0001-01-01
I want it to look like:
1018492743
The code I have tried looks like:
Select right(a.ord_id, len(a.ord_id)-ll)
It isn’t working because len isnt a function in HIVE
Another issue I have is that some of the records are already in the correct format. Does this mean I need to create a case statement that checks for this?
Advertisement
Answer
You can extract digits before |
character using regexp:
hive> select regexp_extract('1018492743|0001-01-01','([0-9]*)\|',1); OK 1018492743
Or use substr to get first 10 characters:
hive> select substr('1018492743|0001-01-01',1,10); OK 1018492743
Or exactly like you described using length and substr to get substring without last 11 characters:
hive> select substr('1018492743|0001-01-01',1,length('1018492743|0001-01-01')-11); OK 1018492743
One more solution using split()
:
hive> select split('1018492743|0001-01-01','\|')[0]; OK 1018492743
See docs here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF