How to de-duplicate SQL table rows by multiple columns with hierarchy?

I have a table with multiple records for each patient.

My end goal is a table that is 1-to-1 between Patient_id and Value.

I would like to de-duplicate my rows (with respect to patient_id) based on “a hierarchical series of aggregate functions” (if someone has a better way to phrase this, I’d appreciate that as well).

For each patient_id, I would like to keep the row(s) with the MAX(Date). If a patient_id is still duplicated after that, I would like to keep the row(s) with the MIN(Priority). If duplicates still remain, I would like to keep the row(s) with the MIN(Date2).

The way I’ve approached this problem is to run a series of queries that de-duplicate on the columns one at a time.
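For illustration only, a one-column-at-a-time approach could look roughly like the sketch below. These are not the original queries: the table name patient_records, the lowercase column names, the sample rows, and the step1/step2/step3 names are all assumptions, and the SQL is wrapped in Python's sqlite3 module only so that it can be run as-is.

```python
import sqlite3

# Hypothetical table and sample data; the question only names the columns.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE patient_records (
    patient_id INTEGER, date TEXT, priority INTEGER, date2 TEXT, value TEXT)""")
conn.executemany(
    "INSERT INTO patient_records VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2020-01-01", 1, "2019-01-01", "a"),  # dropped in step 1: older date
        (1, "2020-02-01", 2, "2019-01-01", "b"),  # dropped in step 2: higher priority
        (1, "2020-02-01", 1, "2019-03-01", "c"),  # dropped in step 3: later date2
        (1, "2020-02-01", 1, "2019-02-01", "d"),  # survives all three steps
        (2, "2020-05-01", 3, "2019-01-01", "e"),  # only row for patient 2
    ],
)

# Each step joins the previous result to one aggregate per patient.
stepwise_query = """
WITH step1 AS (
    SELECT r.*
    FROM patient_records AS r
    JOIN (SELECT patient_id, MAX(date) AS max_date
          FROM patient_records GROUP BY patient_id) AS m
      ON r.patient_id = m.patient_id AND r.date = m.max_date
),
step2 AS (
    SELECT s.*
    FROM step1 AS s
    JOIN (SELECT patient_id, MIN(priority) AS min_priority
          FROM step1 GROUP BY patient_id) AS m
      ON s.patient_id = m.patient_id AND s.priority = m.min_priority
),
step3 AS (
    SELECT s.*
    FROM step2 AS s
    JOIN (SELECT patient_id, MIN(date2) AS min_date2
          FROM step2 GROUP BY patient_id) AS m
      ON s.patient_id = m.patient_id AND s.date2 = m.min_date2
)
SELECT patient_id, value FROM step3 ORDER BY patient_id
"""
stepwise_result = conn.execute(stepwise_query).fetchall()
print(stepwise_result)  # → [(1, 'd'), (2, 'e')]
```

Every step rescans the previous result to compute one aggregate, which is why this shape tends to feel repetitive as more tie-breaking columns are added.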

Is there a way to do this that allows me to de-dup on multiple columns at once? Is there a more elegant way to do this?

I’m able to get my results, but my solution feels very inefficient, and I keep running into this. Thank you for any input.

Answer

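You could use row_number(), if your RDBMS supports it: partition by patient_id, order each partition by Date descending, then Priority ascending, then Date2 ascending, and keep only the first row of each partition.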
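A minimal sketch of the row_number() approach, wrapped in Python's sqlite3 module so that it can be run as-is. The table name patient_records, the lowercase column names, and the sample rows are assumptions made for illustration, and window functions require SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical table and sample data; the question only names the columns.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE patient_records (
    patient_id INTEGER, date TEXT, priority INTEGER, date2 TEXT, value TEXT)""")
conn.executemany(
    "INSERT INTO patient_records VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2020-01-01", 1, "2019-01-01", "a"),  # loses: older date
        (1, "2020-02-01", 2, "2019-01-01", "b"),  # loses: higher priority
        (1, "2020-02-01", 1, "2019-03-01", "c"),  # loses: later date2
        (1, "2020-02-01", 1, "2019-02-01", "d"),  # wins every tie-break
        (2, "2020-05-01", 3, "2019-01-01", "e"),  # only row for patient 2
    ],
)

# The ORDER BY inside OVER() encodes the whole hierarchy in one place:
# latest date first, then lowest priority, then earliest date2.
ranked_query = """
SELECT patient_id, value
FROM (
    SELECT r.*,
           ROW_NUMBER() OVER (
               PARTITION BY patient_id
               ORDER BY date DESC, priority ASC, date2 ASC
           ) AS rn
    FROM patient_records AS r
) AS ranked
WHERE rn = 1
ORDER BY patient_id
"""
deduped = conn.execute(ranked_query).fetchall()
print(deduped)  # → [(1, 'd'), (2, 'e')]
```

Because row_number() never assigns the same number twice within a partition, this returns exactly one row per patient_id even when two rows tie on all three columns; which of the tied rows is kept is then arbitrary.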

Another option is to filter with a correlated subquery that sorts each patient's records according to your criteria and keeps only the top one.
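A sketch of the correlated-subquery variant, again run through Python's sqlite3 module. It assumes a unique id column that the question does not mention (any primary key would do), plus a hypothetical patient_records table and sample rows.

```python
import sqlite3

# Hypothetical table; the id primary key is an assumption, needed so the
# outer query can match exactly one row chosen by the subquery.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE patient_records (
    id INTEGER PRIMARY KEY,
    patient_id INTEGER, date TEXT, priority INTEGER, date2 TEXT, value TEXT)""")
conn.executemany(
    "INSERT INTO patient_records (patient_id, date, priority, date2, value) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2020-01-01", 1, "2019-01-01", "a"),  # loses: older date
        (1, "2020-02-01", 2, "2019-01-01", "b"),  # loses: higher priority
        (1, "2020-02-01", 1, "2019-03-01", "c"),  # loses: later date2
        (1, "2020-02-01", 1, "2019-02-01", "d"),  # wins every tie-break
        (2, "2020-05-01", 3, "2019-01-01", "e"),  # only row for patient 2
    ],
)

# For each outer row, the subquery finds the id of the best row for the same
# patient; the outer row is kept only if it is that row.
subquery_filter = """
SELECT r.patient_id, r.value
FROM patient_records AS r
WHERE r.id = (
    SELECT r1.id
    FROM patient_records AS r1
    WHERE r1.patient_id = r.patient_id
    ORDER BY r1.date DESC, r1.priority ASC, r1.date2 ASC
    LIMIT 1
)
ORDER BY r.patient_id
"""
kept_rows = conn.execute(subquery_filter).fetchall()
print(kept_rows)  # → [(1, 'd'), (2, 'e')]
```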

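The actual syntax for limit varies across RDBMS: for example, LIMIT 1 in MySQL, PostgreSQL, and SQLite, TOP 1 in SQL Server, and FETCH FIRST 1 ROW ONLY in standard SQL.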

User contributions licensed under: CC BY-SA