Skip to content
Advertisement

How to query in DB2 for distinct column value, rather than a distinct row?

I’m trying to construct a DB2 query that returns distinct rows, but for the distinction to be based on only one column’s values.

The query below will allow multiple rows with the same order number if, say, itemColor is different.

select distinct
    orderNumber,
    itemColor,
    orderDate
from ORDER_TABLE

returns…

order_number      item_color      order_date
    0001      |      red      |   2018-06-19
    0001      |     green     |   2018-06-19 // DUPLICATE ORDER_NUMBER
    0002      |      red      |   2018-06-19
    0003      |      red      |   2018-06-19
    0004      |      red      |   2018-06-19

What I’d like is for my query to do something like this:

select
    orderNumber,
    itemColor,
    orderDate
from ORDER_TABLE
where orderNumber is distinct

That way I can be assured that only one record will be returned for every order number. It seems that DB2 does not allow for distinct column values, but only distinct full rows.

A suggestion I saw said to do a sub query, like so:

select
    orderNumber,
    itemColor,
    orderDate
from ORDER_TABLE
where orderNumber in (select distinct(orderNumber) from ORDER_TABLE)

… but sadly this didn’t work, I still saw records with the same orderNumber in my result set.

I see the official docs say to use the group by operator – but this seems to be for aggregating functions, and I don’t want to aggregate any records! I just want ONE row per orderNumber. Maybe I’m misunderstanding the docs.

Thanks.

Advertisement

Answer

You can use window functions:

select orderNumber, itemColor, orderDate
from (select o.*, row_number() over (partition by orderNumber order by orderNumber) as seqnum
      from ORDER_TABLE
     ) o
where seqnum = 1;
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement