I am trying to get this out out, but i am experiencing that the substr i am using is incorrect ,
For an example , all my columns are displaying hdfs://asdasda/asdas/fdsfdsfd/received_files/asdasd_20191231_11122333_123456789_CO.dat
some of which has more character so in order for me to get the exact date in the column is inconsistent if i am using subsring
some will return 20191230 but some will return _2020123
How do we tackle this problem ?
i am trying to display only data , this is using sql language or hue , when i input my script in ,
select SUBSTR(input_file_name, LENGTH(input_file_name) - 44, 9) from th_ingestion_status limit 100
i feel my script for Like and substr statement is incorrect
Advertisement
Answer
I you want the first sequence of 8 digits surrounded by underscores, use regexp_extract()
:
select regexp_extract(filename, '_([0-9]{8})_', 1)
If you need this after the last /
, then:
select regexp_extract(filename, '_([0-9]{8})_[^/]*$', 1)