In a SQL Server statement there is
SELECT id, book, acnt, prod, category from Table1 <where clause...> UNION SELECT id, book, acnt, prod, category from Table2 <where clause...>
The first query returned 131,972 lines of data; the 2nd one, 147,692 lines. I didn’t notice there is any commonly shared line of data from these two tables, so I expect the result set after UNION should be the same as the sum of 131972 + 147692 = 279,384.
However the result set after UNION is 133,857. Even though they might have overlapped lines that I accidently missed, the result should be at least the same as the larger result set of those two. I can’t figure how the number 133,857 came from.
Is my understanding about SQL UNION correct? I use SQL server in this case.
Advertisement
Answer
To expand comment given under the question, which I think states what you already know:
UNION
takes care of duplicates also within one table as well.
Just take a look at a example:
SETUP:
create table tbl1 (col1 int, col2 int); insert into tbl1 values (1,2), (3,4); create table tbl2 (col1 int, col2 int); insert into tbl1 values (1,2), (1,2), (1,2), (3,4);
Query
select * from tbl1 union select * from tbl2;
will produce output
col1 | col2 -----|------ 1 | 2 3 | 4