How to replace duplicated columns with a foreign key?

Given that I have the following tables (datasets) made by python:

import pandas as pd

dic ={

    "ID":[1,2,3],
    "patient_ID":[100,200,300],
    "Year":[2007,2008,2012],
    "month":[8,6,3]
}

paitent = pd.DataFrame(dic)

​x
 
import pandas as pd​dic ={​    "ID":[1,2,3],    "patient_ID":[100,200,300],    "Year":[2007,2008,2012],    "month":[8,6,3]}​paitent = pd.DataFrame(dic)​

So the paitent table is:

  ID  patient_ID  Year  month
   1         100  2007      8
   2         200  2008      6
   3         300  2012      3

 
  ID  patient_ID  Year  month   1         100  2007      8   2         200  2008      6   3         300  2012      3​

and then the other table has the same columns patient_ID, Year and the month. These 3 columns are duplicated from patient table.

dic ={

    "patient_ID":[100,200,100,300],
    "Year":[2007,2008,2007,2012],
    "month":[8,6,8,3],
    "Polyp":[4,5,6,8]
}

paitentPolyp = pd.DataFrame(dic)

 
dic ={​    "patient_ID":[100,200,100,300],    "Year":[2007,2008,2007,2012],    "month":[8,6,8,3],    "Polyp":[4,5,6,8]}​paitentPolyp = pd.DataFrame(dic)​

So, it is its final look of paitentPolyp

  patient_ID  Year  month  Polyp
         100  2007      8      4
         200  2008      6      5
         100  2007      8      6
         300  2012      3      8

 
  patient_ID  Year  month  Polyp         100  2007      8      4         200  2008      6      5         100  2007      8      6         300  2012      3      8​

The question is:

How may i obtain a view from paitentPolyp which instead of patient_ID, Year and the month only has the ID of paitent?

So, my favorit output is:

  patient_ID  Polyp
           1      4
           2      5
           1      6
           3      8

 
  patient_ID  Polyp           1      4           2      5           1      6           3      8​

Although, i am working with python, but a SQL solution is more welcome.

Answer

In Python:

p = paitentPolyp.merge(paitent, on = ['patient_ID', 'Year', 'month'], how = 'left')
p = p.drop(['patient_ID', 'Year', 'month'], axis =1)
p = (p[['ID', 'Polyp']]).sort_values('ID')

 
p = paitentPolyp.merge(paitent, on = ['patient_ID', 'Year', 'month'], how = 'left')p = p.drop(['patient_ID', 'Year', 'month'], axis =1)p = (p[['ID', 'Polyp']]).sort_values('ID')​

In SQL:

CREATE TABLE patient(
ID int,
patient_id INT,
Years INT,
months int
);

CREATE TABLE patientpoly(
patient_id int,
Years INT,
months INT,
polyp int
);

INSERT INTO patient VALUES(1, 100, 2007, 8);
INSERT INTO patient VALUES(2, 200, 2008, 6);
INSERT INTO patient VALUES(3, 300, 2012, 3);
INSERT INTO patientpoly VALUES(100, 2007, 8, 4);
INSERT INTO patientpoly VALUES(200, 2008, 8, 5);
INSERT INTO patientpoly VALUES(100, 2007, 8, 6);
INSERT INTO patientpoly VALUES(300, 2012, 8, 8);

SELECT patient.ID, patientpoly.polyp
FROM patient
JOIN patientpoly ON patient.patient_id = patientpoly.patient_id
ORDER BY patient.id;

 
CREATE TABLE patient(ID int,patient_id INT,Years INT,months int);​CREATE TABLE patientpoly(patient_id int,Years INT,months INT,polyp int);​INSERT INTO patient VALUES(1, 100, 2007, 8);INSERT INTO patient VALUES(2, 200, 2008, 6);INSERT INTO patient VALUES(3, 300, 2012, 3);INSERT INTO patientpoly VALUES(100, 2007, 8, 4);INSERT INTO patientpoly VALUES(200, 2008, 8, 5);INSERT INTO patientpoly VALUES(100, 2007, 8, 6);INSERT INTO patientpoly VALUES(300, 2012, 8, 8);​SELECT patient.ID, patientpoly.polypFROM patientJOIN patientpoly ON patient.patient_id = patientpoly.patient_idORDER BY patient.id;​

Gives the following output:

ID      polyp
1         4
1         6
2         5
3         8

 
ID      polyp1         41         62         53         8​

Advertisement

Answer