Skip to content
Advertisement

How distinct works in sql?

I tried to understand how the distinct (and therefore distinct count) work. I have a data with:

While within 1 SessionID there is 1 User_solici. However, there could be multiple SessionID for the same User_solici.

For User_solici of “a”, I have Session = 1, 55 and null.

I want to find distinct SessionID where (code = 7 and code = 1) OR (code = 7 and code = 2) for the same User_solici.

My query is complete.

But I try to understand what SQL does.

Look at the raw table, I see 5 sessionid (1,a), (2,b), (55,a), (,a) (,c) that satisfy.

OR I can argue only 4 sessionid works (1),(2),(55), and (blank)

I thought that the software will go to row 1, if code=7, then join on User_solici, when done, select distinct (in this case will be session 1 and blank), put this distinct to count (or final file). Done with row 1, go to row 2. This case, it will create 5 count.

Can anyone help me to understand how SQL handle the join (or null) in this case?

Thank you,

Harry

Advertisement

Answer

First, the DISTINCT clause will only return a single instance of all columns in your select regardless of how many records qualify the criteria.

Your first query actually works given your second assessment of 1, 2, 55 and blank. But because the blank session has multiple users, THAT is the break in your count. So, if you did DISTINCT SessionID AND User_Solici you will get the 5 entries I THINK you are looking for. I would just shift your T1 criteria into the WHERE clause as that has no bearing on the JOIN to T2 table

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement