Skip to content
Advertisement

similar to groupByKey() in Spark but using SQL queries

I trying to make

 ID    CATEGORY  VALUE
'AAA'    'X'      123
'AAA'    'Y'      456
'BBB'    'X'      321
'BBB'    'Y'      654

into

 ID     VALUE_X   VALUE_Y
'AAA'     123       456
'BBB'     321       654

using only SQL queries. It is kind of similar to using groupByKey() in pyspark.

Is there a way to do this?

Advertisement

Answer

Just use conditional aggregation. One method is:

select id,
       max(case when category = 'X' then value end) as x_value,
       max(case when category = 'Y' then value end) as y_value
from t
group by id;

In Postgres, this would be phrased using the standard filter clause:

select id,
       max(value) filter (where category = 'X'),
       max(value) filter (where category = 'Y')
from t
group by id;
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement