Skip to content
Advertisement

Join two tables on id fields using Impala

I have two tables in in HDFS that I want to join using Impala. One is Employee_Logs the other is HR_Data.

Queries:

select e.employee_id, e.action from Employee_Logs e where e.employment_status_desc = 'Active'
select h.employee_id, h.name from HR_Data h

Employee_Logs:

employee_id  action
2325255b     login     
51666164     login
51666164v    login
r1211        logoff
r18552421    login

HR_Data:

employee_id  name
2325255      Rob    
51666164     Tom
r1211        Tammy
r18552421    Ron

I want to join them so that the data looks like this:

employee_id  action  name
2325255b     login   Rob  
51666164     login   Tom
51666164v    login   Tom
r1211        logoff  Tammy
r18552421    login   Ron

I could do an easy join if the employee_id field matched up on both tables, but the same user can have a “b” or a “v” after their employee id to specify if the account is elevated like an admin account. Some user accounts have an “r” in front of the id but that is the case in both tables.

Is there a way where I can do some where actions and create a new field in the Employee_Logs table like strip the “v” and “b” off of the end of the employee id and then join or is there a better way?

Advertisement

Answer

Probably the safest method is multiple left joins:

select el.*,
       coalesce(h.name, hv.name, hb.name) as name
from employee_logs el left join
     hr_data h
     on el.employee_id = h.employee_id left join
     hr_data hv
     on el.employee_id = concat(h.employee_id, 'v') left join
     hr_data hb
     on el.employee_id = concat(h.employee_id, 'b');
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement