I have a dataset with a column with processes and the path. I am trying to use regex with Impala to strip off the executable. The dataset looks like this:
x
C:\Windows\System32\svchost.exe
C:\Windows\System32\conhost.exe
C:\Windows\System32\net1.exe
C:\Windows\System32\schtasks.exe
C:\Program Files (x86)\Citrix\ICA Client\SelfServicePlugin\SelfService.exe
C:\Windows\System32\backgroundTaskHost.exe
C:\Windows\System32\net.exe
C:\Windows\System32\conhost.exe
C:\Program Files (x86)\Wireless AutoSwitch\wrlssw.exe
Desired output:
svchost.exe
conhost.exe
net1.exe
schtasks.exe
SelfService.exe
backgroundTaskHost.exe
net.exe
conhost.exe
wrlssw.exe
I have tried many queries like the two below but keep running into errors
select regexp_extract(w.destinationprocessname, '([^\]+)$')
from winworkstations_realtime w
where w.externalid = '4688'
limit 10
Error:
AnalysisException: No matching function with signature: regexp_replace(STRING, STRING).
select regexp_extract(w.destinationprocessname, '\(?:.(?!\))+$',0)
from winworkstations_realtime w
where w.externalid = '4688'
limit 10
Error:
Could not compile regexp pattern: (?:.(?!))+$ Error: invalid perl operator: (?!
Looking for some guidance from anyone that is good with impala or regex.
Advertisement
Answer
Not being a regex expert, I am sure there is a better way but this does the work
select regexp_replace(regexp_extract("C:\Windows\\System32\\svchost.exe", ".+(\\.+)$", 1), "\\", "");