In a SQL Server statement there is SELECT id, book, acnt, prod, category from Table1 UNION SELECT id, book, acnt, prod, category from Table2 The first …

UNION two SELECT queries but result set is smaller…

In a SQL Server statement there is

SELECT id, book, acnt, prod, category from Table1 <where clause...>
UNION
SELECT id, book, acnt, prod, category from Table2 <where clause...>

​x
 
SELECT id, book, acnt, prod, category from Table1 <where clause...>UNIONSELECT id, book, acnt, prod, category from Table2 <where clause...>​

The first query returned 131,972 lines of data; the 2nd one, 147,692 lines. I didn’t notice there is any commonly shared line of data from these two tables, so I expect the result set after UNION should be the same as the sum of 131972 + 147692 = 279,384.

However the result set after UNION is 133,857. Even though they might have overlapped lines that I accidently missed, the result should be at least the same as the larger result set of those two. I can’t figure how the number 133,857 came from.

Is my understanding about SQL UNION correct? I use SQL server in this case.

Answer

To expand comment given under the question, which I think states what you already know:

UNION takes care of duplicates also within one table as well.

Just take a look at a example:

SETUP:

create table tbl1 (col1 int, col2 int);
insert into tbl1 values
(1,2),
(3,4);
create table tbl2 (col1 int, col2 int);
insert into tbl1 values
(1,2),
(1,2),
(1,2),
(3,4);

 
create table tbl1 (col1 int, col2 int);insert into tbl1 values(1,2),(3,4);create table tbl2 (col1 int, col2 int);insert into tbl1 values(1,2),(1,2),(1,2),(3,4);​

Query

select * from tbl1
union 
select * from tbl2;

 
select * from tbl1union select * from tbl2;​

will produce output

col1 | col2
-----|------
1    | 2
3    | 4

 
col1 | col2-----|------1    | 23    | 4​

DB fiddle

UNION two SELECT queries but result set is smaller than one of them

Advertisement

Answer