I have two tables in in HDFS that I want to join using Impala. One is Employee_Logs the other is HR_Data.
Queries:
select e.employee_id, e.action from Employee_Logs e where e.employment_status_desc = 'Active' select h.employee_id, h.name from HR_Data h
Employee_Logs:
employee_id action 2325255b login 51666164 login 51666164v login r1211 logoff r18552421 login
HR_Data:
employee_id name 2325255 Rob 51666164 Tom r1211 Tammy r18552421 Ron
I want to join them so that the data looks like this:
employee_id action name 2325255b login Rob 51666164 login Tom 51666164v login Tom r1211 logoff Tammy r18552421 login Ron
I could do an easy join if the employee_id field matched up on both tables, but the same user can have a “b” or a “v” after their employee id to specify if the account is elevated like an admin account. Some user accounts have an “r” in front of the id but that is the case in both tables.
Is there a way where I can do some where actions and create a new field in the Employee_Logs table like strip the “v” and “b” off of the end of the employee id and then join or is there a better way?
Advertisement
Answer
Probably the safest method is multiple left
joins:
select el.*, coalesce(h.name, hv.name, hb.name) as name from employee_logs el left join hr_data h on el.employee_id = h.employee_id left join hr_data hv on el.employee_id = concat(h.employee_id, 'v') left join hr_data hb on el.employee_id = concat(h.employee_id, 'b');