I have a dataset with a column with processes and the path. I am trying to use regex with Impala to strip off the executable. The dataset looks like this:
C:\Windows\System32\svchost.exe C:\Windows\System32\conhost.exe C:\Windows\System32\net1.exe C:\Windows\System32\schtasks.exe C:\Program Files (x86)\Citrix\ICA Client\SelfServicePlugin\SelfService.exe C:\Windows\System32\backgroundTaskHost.exe C:\Windows\System32\net.exe C:\Windows\System32\conhost.exe C:\Program Files (x86)\Wireless AutoSwitch\wrlssw.exe
Desired output:
svchost.exe conhost.exe net1.exe schtasks.exe SelfService.exe backgroundTaskHost.exe net.exe conhost.exe wrlssw.exe
I have tried many queries like the two below but keep running into errors
select regexp_extract(w.destinationprocessname, '([^\]+)$') from winworkstations_realtime w where w.externalid = '4688' limit 10
Error:
AnalysisException: No matching function with signature: regexp_replace(STRING, STRING). select regexp_extract(w.destinationprocessname, '\(?:.(?!\))+$',0) from winworkstations_realtime w where w.externalid = '4688' limit 10
Error:
Could not compile regexp pattern: (?:.(?!))+$ Error: invalid perl operator: (?!
Looking for some guidance from anyone that is good with impala or regex.
Advertisement
Answer
Not being a regex expert, I am sure there is a better way but this does the work
select regexp_replace(regexp_extract("C:\Windows\\System32\\svchost.exe", ".+(\\.+)$", 1), "\\", "");