I am trying to do in Postgres the equivalent of Pandas outer merge, in order to outer merge two tables.
Table df_1
contains these data:
Table df_2
contains these data:
So Table df_1
has one extra column (random_id
) than df_2
. Also, job_id 1711418 and worker_id 45430 exist in both df_1
and df_2
.
If I use the “outer merge” method in Pandas:
df_1.merge(df_2, on=['job_id', 'worker_id'], how='outer')
, I would get the ideal result as shown below:
However, I wasn’t able to find a SQL (postgres) equivalent of that Pandas outer merge method.
I’ve tried running the following query:
select * from df_1 full outer join df_2 on df_1.job_id = df_2.job_id and df_1.worker_id = df_2.worker_id
However, the result generated 2 extra duplicate columns called job_id_duplicate_column_name_1
and worker_id_duplicate_column_name_1
which I don’t want:
Could anyone help suggest a query method that can achieve the same as outer merge method in Pandas?
Advertisement
Answer
This looks like a UNION to me, not a JOIN:
select job_id, worker_id, random_id from df_1 union all select job_id, worker_id, 'NaN' from df_2 d2 where not exists (select * from df_1 d1 where d1.job_id = d2.job_id and d1.worker_id = d2.worker_id)
The second part of the union returns all rows from df_2
that don’t exist in df_1
An alternative would be to use EXCEPT
to get the rows from df_2 that don’t exist in df_1
select job_id, worker_id, random_id from df_1 union all select job_id, worker_id, 'NaN' from ( select job_id, worker_id from df_2 d2 except select job_id, worker_id from df_1 ) t