Given that I have the following tables (datasets) made by python:
x
import pandas as pd
dic ={
"ID":[1,2,3],
"patient_ID":[100,200,300],
"Year":[2007,2008,2012],
"month":[8,6,3]
}
paitent = pd.DataFrame(dic)
So the paitent table is:
ID patient_ID Year month
1 100 2007 8
2 200 2008 6
3 300 2012 3
and then the other table has the same columns patient_ID
, Year
and the month
. These 3 columns are duplicated from patient
table.
dic ={
"patient_ID":[100,200,100,300],
"Year":[2007,2008,2007,2012],
"month":[8,6,8,3],
"Polyp":[4,5,6,8]
}
paitentPolyp = pd.DataFrame(dic)
So, it is its final look of paitentPolyp
patient_ID Year month Polyp
100 2007 8 4
200 2008 6 5
100 2007 8 6
300 2012 3 8
The question is:
How may i obtain a view from paitentPolyp
which instead of patient_ID
, Year
and the month
only has the ID of paitent
?
So, my favorit output is:
patient_ID Polyp
1 4
2 5
1 6
3 8
Although, i am working with python, but a SQL
solution is more welcome.
Advertisement
Answer
In Python:
p = paitentPolyp.merge(paitent, on = ['patient_ID', 'Year', 'month'], how = 'left')
p = p.drop(['patient_ID', 'Year', 'month'], axis =1)
p = (p[['ID', 'Polyp']]).sort_values('ID')
In SQL:
CREATE TABLE patient(
ID int,
patient_id INT,
Years INT,
months int
);
CREATE TABLE patientpoly(
patient_id int,
Years INT,
months INT,
polyp int
);
INSERT INTO patient VALUES(1, 100, 2007, 8);
INSERT INTO patient VALUES(2, 200, 2008, 6);
INSERT INTO patient VALUES(3, 300, 2012, 3);
INSERT INTO patientpoly VALUES(100, 2007, 8, 4);
INSERT INTO patientpoly VALUES(200, 2008, 8, 5);
INSERT INTO patientpoly VALUES(100, 2007, 8, 6);
INSERT INTO patientpoly VALUES(300, 2012, 8, 8);
SELECT patient.ID, patientpoly.polyp
FROM patient
JOIN patientpoly ON patient.patient_id = patientpoly.patient_id
ORDER BY patient.id;
Gives the following output:
ID polyp
1 4
1 6
2 5
3 8