Skip to content
Advertisement

Selecting string after the last \ using regex with Impala SQL

I have a dataset with a column with processes and the path. I am trying to use regex with Impala to strip off the executable. The dataset looks like this:

C:\Windows\System32\svchost.exe
C:\Windows\System32\conhost.exe
C:\Windows\System32\net1.exe
C:\Windows\System32\schtasks.exe
C:\Program Files (x86)\Citrix\ICA Client\SelfServicePlugin\SelfService.exe
C:\Windows\System32\backgroundTaskHost.exe
C:\Windows\System32\net.exe
C:\Windows\System32\conhost.exe
C:\Program Files (x86)\Wireless AutoSwitch\wrlssw.exe

Desired output:

svchost.exe
conhost.exe
net1.exe
schtasks.exe
SelfService.exe
backgroundTaskHost.exe
net.exe
conhost.exe
wrlssw.exe

I have tried many queries like the two below but keep running into errors

select regexp_extract(w.destinationprocessname, '([^\]+)$')
from winworkstations_realtime w
where w.externalid = '4688'
limit 10

Error:

AnalysisException: No matching function with signature: regexp_replace(STRING, STRING).

select regexp_extract(w.destinationprocessname, '\(?:.(?!\))+$',0)
from winworkstations_realtime w
where w.externalid = '4688'
limit 10

Error:

Could not compile regexp pattern: (?:.(?!))+$ Error: invalid perl operator: (?!

Looking for some guidance from anyone that is good with impala or regex.

Advertisement

Answer

Not being a regex expert, I am sure there is a better way but this does the work

select regexp_replace(regexp_extract("C:\Windows\\System32\\svchost.exe", ".+(\\.+)$", 1), "\\", "");
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement