Skip to content
Advertisement

PostgreSQL equivalent of Pandas outer merge

I am trying to do in Postgres the equivalent of Pandas outer merge, in order to outer merge two tables.

Table df_1 contains these data:

enter image description here

Table df_2 contains these data:

enter image description here

So Table df_1 has one extra column (random_id) than df_2. Also, job_id 1711418 and worker_id 45430 exist in both df_1 and df_2.

If I use the “outer merge” method in Pandas: df_1.merge(df_2, on=['job_id', 'worker_id'], how='outer'), I would get the ideal result as shown below:

enter image description here

However, I wasn’t able to find a SQL (postgres) equivalent of that Pandas outer merge method.

I’ve tried running the following query:

select *
from df_1
full outer join df_2
   on df_1.job_id = df_2.job_id 
  and df_1.worker_id = df_2.worker_id

However, the result generated 2 extra duplicate columns called job_id_duplicate_column_name_1 and worker_id_duplicate_column_name_1 which I don’t want: enter image description here

Could anyone help suggest a query method that can achieve the same as outer merge method in Pandas?

Advertisement

Answer

This looks like a UNION to me, not a JOIN:

select job_id, worker_id, random_id
from df_1
union all
select job_id, worker_id, 'NaN'
from df_2 d2
where not exists (select *
                  from df_1 d1
                  where d1.job_id = d2.job_id
                    and d1.worker_id = d2.worker_id)

The second part of the union returns all rows from df_2 that don’t exist in df_1

An alternative would be to use EXCEPT to get the rows from df_2 that don’t exist in df_1

select job_id, worker_id, random_id
from df_1
union all
select job_id, worker_id, 'NaN'
from 
(
  select job_id, worker_id
  from df_2 d2
  except
  select job_id, worker_id
  from df_1
) t

Online example

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement