What I am trying to achieve is group them by id
and create a column for the date
as well as data
.
The background of the dataset are it is lab result taken by participant and some test are not able to be taken on same day due to fasting restrictions n etc. The database I am using is SQL Server.
Below are my DataSet as well as the desired output.
Sample dataset:
create table Sample ( Id int, LAB_DATE date, A_CRE_1 varchar(100), B_GLUH_1 varchar(100), C_LDL_1 varchar(100), D_TG_1 varchar(100), E_CHOL_1 varchar(100), F_HDL_1 varchar(100), G_CRPH_1 varchar(100), H_HBA1C_1 varchar(100), I_GLU120_1 varchar(100), J_GLUF_1 varchar(100), K_HCR_1 varchar(100) ) insert into Sample(Id, LAB_DATE,A_CRE_1, B_GLUH_1,C_LDL_1,E_CHOL_1,F_HDL_1,H_HBA1C_1,K_HCR_1) values (01, '2017-11-21', '74', '6.4', '2.04', '4.17', '1.64', '6.1', '2.54') insert into sample (Id, LAB_DATE, I_GLU120_1) values (01, '2017-11-22','8.8') insert into sample (Id, LAB_DATE, D_TG_1) values (01, '2017-11-23','0.56') insert into sample (Id,LAB_DATE,A_CRE_1,B_GLUH_1,C_LDL_1,D_TG_1,E_CHOL_1,F_HDL_1,K_HCR_1) values (2,'2018-10-02','57','8.91','2.43','1.28','3.99','1.25','3.19') insert into sample (Id,LAB_DATE,H_HBA1C_1) values (2,'2018-10-03','8.6') insert into sample (Id,LAB_DATE,J_GLUF_1) values (2,'2018-10-04','7.8') insert into sample (Id,LAB_DATE,A_CRE_1,B_GLUH_1,C_LDL_1,D_TG_1,E_CHOL_1,F_HDL_1,G_CRPH_1,H_HBA1C_1,K_HCR_1) values (3,'2016-10-01','100','6.13','3.28','0.94','5.07','1.19','0.27','5.8','4.26')
Desired output:
ID|LAB_DATE|A_CRE_1|B_GLUH_1|C_LDL_1|Date_TG_1|D_TG_1|E_CHOL_1|F_HDL_1|G_CRPH_1|H_HBA1C_1|Date_GLU120_1|I_GLU120_1|J_GLUF_1|K_HCR_1 1|2017-11-21|74|6.4|2.04|2017-11-23|0.56|4.17|1.64|||6.1|2017-11-22|8.8|||2.54 2|02/10/2018|57|8.91|2.43||1.28|3.99|1.25||03/10/2018|8.6|||04/10/2018|7.8|3.19 3|01/10/2016|100|6.13|3.28||0.94|5.07|1.19|0.27||5.8|||||4.26
Advertisement
Answer
Here’s a solution (that cannot cope with multiple rows of the same id/sample type – you haven’t said what to do with those)
select * from (select Id, LAB_DATE,A_CRE_1, B_GLUH_1,C_LDL_1,E_CHOL_1,F_HDL_1,H_HBA1C_1,K_HCR_1 from sample) s1 INNER JOIN (select Id, LAB_DATE as glu120date, I_GLU120_1 from sample) s2 ON s1.id = s2.id (select Id, LAB_DATE as dtgdate, D_TG_1 from sample) s3 ON s1.id = s3.id
Hopefully you get the idea with this pattern; if you have other sample types with their own dates, break them out of s1 and into their own subquery in a similar way (eg make an s4 for e_chol_1, s5 for k_hcr_1 etc). Note that if any sample type is missing it will cause the whole row to disappear from the results. If this is not desired and you accept NULL for missing samples, use LEFT JOIN instead of INNER
If there will be multiple samples for patient 01 and you only want the latest, the pattern becomes:
select * from (select Id, LAB_DATE,A_CRE_1, B_GLUH_1,C_LDL_1,E_CHOL_1,F_HDL_1,H_HBA1C_1,K_HCR_1, row_number() over(partition by id order by lab_date desc) rn from sample) s1 INNER JOIN (select Id, LAB_DATE as glu120date, I_GLU120_1, row_number() over(partition by id order by lab_date desc) rn from sample) s2 ON s1.id = s2.id and s1.rn = s2.rn WHERE s1.rn = 1
Note the addition of row_number() over(partition by id order by lab_date desc) rn
– this establishes an incrementing counter in descending date order(latest record = 1, older = 2 …) that restarts from 1 for every different id. We join on it too then say where rn = 1 to pick only the latest records for each sample type