Skip to content
Advertisement

Break out nested data within SQL, criteria across multiple rows (similar to dcast in R)

I’m trying to write a simple query to take a data set that looks like this:

    ID  |  Col2
    X      B 
    X      C
    Y      B
    Y      D

and return this:

    ID | Col2 | Col3
    X    B      C
    Y    B      D

Essentially, I have an ID column that can have either B, C, or D in Col2. I am trying to identify which IDs only have B and D. I have a query to find both, but not only that combination. Query:

select ID, Col2
from Table1
where ID in (
  select ID from Table1
  group by ID
  having count(distinct Col2) = 2) 
order by ID

Alternatively, I could use help in finding a way to filter that query on B and D and leave off B and C. I have seen perhaps a self join, but am not sure how to implement that.

Thanks!

EDIT: Most of the data set has, for a given ID, all three of B, C, and D. The goal here is to isolate the IDs that are missing one, namely missing C.

Advertisement

Answer

I am trying to identify which IDs only have B and D. I have a query to find both

If this is what you want, you don’t need multiple columns:

select id
from table1
where col2 in ('B', 'D')
group by id
having count(distinct col2) = 2;

If you want only 'B' and 'D' and no others, then:

select id
from table1
group by id
having sum(case when col2 = 'B' then 1 else 0 end) > 0 AND
       sum(case when col2 = 'C' then 1 else 0 end) > 0 AND
       sum(case when col2 not in ('B', 'D') then 1 else 0 end) = 0;

If there are only two columns, you can also easily pivot the values using aggregation:

select id, min(col2), nullif(max(col2), min(col2))
from table1
group by id;
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement