Filtering out duplicates yet showing non-duplicate…

I have a table with multiple duplicate values. What I’d like to do is use some sort of SELECT query that will filter the duplicates by name and yet keep all the appropriate values from the “expertise” column and show them next to each other in one row separated by a coma.

In other words I want to turn a table that looks like this:

 names   | organization | education_years | expertise 
---------+--------------+-----------------+------------------------------------------------------------------------------------------------------------------------------------------
 John    | University   | 2016 – 2020     | Programming
 John    | University   | 2016 – 2020     | machine Learning
 John    | University   | 2016 – 2020     | AI
 John    | University   | 2016 – 2020     | Robotics
 John    | University   | 2016 – 2020     | Symbolic Reasoning
 Gabriel | Company      | 2015 – Present  | models/networks
 Gabriel | Company      | 2015 – Present  | AI
 Gabriel | Company      | 2015 – Present  | Speech/language
 Gabriel | Company      | 2015 – Present  | machine learning
 Gabriel | Company      | 2015 – Present  | Programming
 Nicolas | Company      | Present         | machine learning, factorization, learning theory, supervised learning

​x
 
 names   | organization | education_years | expertise ---------+--------------+-----------------+------------------------------------------------------------------------------------------------------------------------------------------ John    | University   | 2016 – 2020     | Programming John    | University   | 2016 – 2020     | machine Learning John    | University   | 2016 – 2020     | AI John    | University   | 2016 – 2020     | Robotics John    | University   | 2016 – 2020     | Symbolic Reasoning Gabriel | Company      | 2015 – Present  | models/networks Gabriel | Company      | 2015 – Present  | AI Gabriel | Company      | 2015 – Present  | Speech/language Gabriel | Company      | 2015 – Present  | machine learning Gabriel | Company      | 2015 – Present  | Programming Nicolas | Company      | Present         | machine learning, factorization, learning theory, supervised learning​

Into This:

 names   | organization | education_years | expertise 
---------+--------------+-----------------+------------------------------------------------------------------------------------------------------------------------------------------
 John    | University   | 2016 – 2020     | Programming, machine Learning, AI, Robotics, Symbolic Reasoning
 Gabriel | Company      | 2015 – Present  | models/networks, AI, Speech/language, machine learning, Programming
 Nicolas | Company      | Present         | machine learning, factorization, learning theory, supervised learning

 
 names   | organization | education_years | expertise ---------+--------------+-----------------+------------------------------------------------------------------------------------------------------------------------------------------ John    | University   | 2016 – 2020     | Programming, machine Learning, AI, Robotics, Symbolic Reasoning Gabriel | Company      | 2015 – Present  | models/networks, AI, Speech/language, machine learning, Programming Nicolas | Company      | Present         | machine learning, factorization, learning theory, supervised learning​

I’m using PostgreSQL as my database.

Can anyone help? Haven’t done queries for so long, and couldn’t find a similar question that addresses this.

Answer

Use aggregation:

select name, organization, education_years,
       string_agg(expertise, ', ')
from t
group by name, organization, education_years;

 
select name, organization, education_years,       string_agg(expertise, ', ')from tgroup by name, organization, education_years;​

The above uses string_agg() because the result looks like a string. I would advise you to use arrays instead:

select name, organization, education_years,
       array_agg(expertise)
from t
group by name, organization, education_years;

 
select name, organization, education_years,       array_agg(expertise)from tgroup by name, organization, education_years;​

Filtering out duplicates yet showing non-duplicate rows from another column in one row

Advertisement

Answer