Skip to content
Advertisement

Select Distinct Rows with complex criteria

I have a table generated from a CRM software and it has a lot of repited individuals in different rows but with different fields completed in each repetition, something like this:

id birth_date sex postal_code customer smoker
001 NULL NULL 00067 Yes 1
001 NULL Male 00067 NULL 1
001 21/03/1994 NULL 00067 NULL NULL
002 NULL Female NULL NULL NULL
002 NULL NULL 09986 No 0
003 13/01/1986 NULL NULL No 1

The desired table extracted from this one should be this:

id birth_date sex postal_code customer smoker
001 21/03/1994 Male 00067 Yes 1
002 NULL Female 09986 No 0
003 13/01/1986 NULL NULL No 1

So as you can see, there are some columns completed in some repetitions, so the idea is to avoid repetitions (here we see a repetition because there’s the same id for more than one row, the id column should be the primary key) and get the maximum information about that id.

Any idea of how to achieve that?

Advertisement

Answer

You can aggregate the rows and take the max non-null value, as in:

select
  id,
  max(birth_date) as birth_date,
  max(sex) as sex,
  max(postal_code) as postal_code,
  max(customer) as customer,
  max(smoker) as smoker
from t
group by id

Now, this solution won’t account for inconsistent data when, for example, the same person has two different birth dates.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement