Skip to content
Advertisement

How distinct works in sql?

I tried to understand how the distinct (and therefore distinct count) work. I have a data with:

Sessionid, User_solici, code

While within 1 SessionID there is 1 User_solici. However, there could be multiple SessionID for the same User_solici.

For User_solici of “a”, I have Session = 1, 55 and null.

I want to find distinct SessionID where (code = 7 and code = 1) OR (code = 7 and code = 2) for the same User_solici.

My query is complete.

create table mytable 
(
    sessionid varchar(255),
    User_solici varchar(255),
    code varchar(255)
);

insert into mytable (sessionid, User_solici, code)
values 
('1', 'a', '7'),
('1', 'a', '1'),
('1', 'a', '2'),
('1', 'a', '999'),
('2', 'b', '7'),
('2', 'b', '1'),
('2', 'b', '2'),
('2', 'b', '999'),
('55', 'a', '7'),
('55', 'a', '1'),
('55', 'a', '2'),
('', 'a', '7'),
('', 'a', '1'),
('', 'a', '2'),
('', 'c', '7'),
('', 'c', '1'),
('', 'c', '2');

select distinct 
    T1.sessionid
from 
    mytable T1 
join 
    mytable T2 on T1.User_solici = T2.User_solici
               and T1.code = '7' 
               and T2.code in ('1', '2');

select 
    T1.sessionid, 
    T1.User_solici, 
    T1.code as T1code, 
    T2.code as T2code
from 
    mytable T1 
join 
    mytable T2 on T1.User_solici = T2.User_solici
               and T1.code = '7' 
               and T2.code in ('1', '2');

But I try to understand what SQL does.

Look at the raw table, I see 5 sessionid (1,a), (2,b), (55,a), (,a) (,c) that satisfy.

OR I can argue only 4 sessionid works (1),(2),(55), and (blank)

I thought that the software will go to row 1, if code=7, then join on User_solici, when done, select distinct (in this case will be session 1 and blank), put this distinct to count (or final file). Done with row 1, go to row 2. This case, it will create 5 count.

Can anyone help me to understand how SQL handle the join (or null) in this case?

Thank you,

Harry

Advertisement

Answer

First, the DISTINCT clause will only return a single instance of all columns in your select regardless of how many records qualify the criteria.

Your first query actually works given your second assessment of 1, 2, 55 and blank. But because the blank session has multiple users, THAT is the break in your count. So, if you did DISTINCT SessionID AND User_Solici you will get the 5 entries I THINK you are looking for. I would just shift your T1 criteria into the WHERE clause as that has no bearing on the JOIN to T2 table

select distinct 
      T1.sessionid,
      T1.User_solici
   from 
      mytable T1 
         join mytable T2
            on T1.User_solici = T2.User_solici
           and T2.code in ('1','2')
   where
      T1.code = '7' 
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement