Skip to content
Advertisement

HIVE converting unix timestamp for calculation

I’m trying to perform subtraction among timestamps and would like to convert the timestamps in forms that can be converted to minutes.

I used regexp_replace to convert timestamp in such form:

2020-06-20T17:25:59:378Z

The following code will convert it to seconds

unix_timestamp(regexp_replace(value,'(.*?)T(.*?):([^:]*?)Z$','$1 $2\.$3'))  

I have other two timestamps that I wish to convert to seconds, such as:

2020-03-19 15:45:33
03-19-2020 11:07:25:103

How should I convert these two timestamp to seconds by using regexp_replace() or any other functions?

Thank you!

Advertisement

Answer

First of all, unix_timestamp returns seconds passed from unix epoch. It does ignore milliseconds. This is why if you want epoch time in seconds, you can extract only 'yyyy-MM-dd HH:mm:ss'.

Second, If you have all these different formats in single dataset and you want to convert them all, you can use case statement to check pattern and convert accordingly:

with your_data as ( --This is your data example
select stack(3,
             '2020-06-20T17:25:59:378Z',
             '2020-03-19 15:45:33',
             '03-19-2020 11:07:25:103'
            ) as str
)

select case when str rlike '^(\d{4}-\d{2}-\d{2})[T ](\d{2}:\d{2}:\d{2})' --matches first two strings
             then unix_timestamp(regexp_replace(str,'^(\d{4}-\d{2}-\d{2})[T ](\d{2}:\d{2}:\d{2})','$1 $2'))
            when str rlike '^(\d{2})-(\d{2})-(\d{4})[T ](\d{2}:\d{2}:\d{2})' --matches third string, allows T or space after date
             then unix_timestamp(regexp_replace(str,'^(\d{2})-(\d{2})-(\d{4})[T ](\d{2}:\d{2}:\d{2})','$3-$1-$2 $4'))
        end result_unix_timestamp
from your_data

Returns:

result_unix_timestamp
1592673959
1584632733
1584616045

You can add more patterns to the case with corresponding conversion and in such way convert all possible cases. And of course, not necessarily all cases should use regex_replace for conversion. Though regex allows to identify and parse most complex strings.

Also you can try to convert using one pattern and if it returns null then try to convert using another pattern and so on:

coalesce(unix_timestamp(regexp_replace(str,'^(\d{4}-\d{2}-\d{2})[T ](\d{2}:\d{2}:\d{2})','$1 $2')),
         unix_timestamp(regexp_replace(str,'^(\d{2})-(\d{2})-(\d{4})[T ](\d{2}:\d{2}:\d{2})','$3-$1-$2 $4'))
        )
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement