The tables created and the queries made are not the primary focus of this question, what confuses me is that why the first query and the second query returns different numbers of rows
drop table Reserves; drop table Sailors; drop table Boats; create table Sailors ( sid char(1) not null, sname char(1) not null, rating int, age int not null, primary key (sid) ); create table Boats ( bid char(1) not null, bname char(1) not null, color varchar(5), primary key (bid) ); create table Reserves ( sid char(1) not null, bid char(1) not null, rdate int not null, primary key (sid, bid, rdate), foreign key (sid) references Sailors(sid) on delete cascade, foreign key (bid) references Boats(bid) on delete cascade ); ------------------------------------------------------------------------------ -- Insert values insert into Sailors values ('1', 'q', 90, 24); insert into Sailors values ('0', 's', 60, 22); insert into Sailors values ('2', 'd', 80, 20); insert into Sailors values ('3', 'w', 70, 18); insert into Sailors values ('4', 'a', 60, 19); insert into Sailors values ('5', 'l', 80, 17); insert into Sailors values ('6', 'o', 90, 18); insert into Sailors values ('7', 'q', 70, 20); insert into Sailors values ('8', 'd', 60, 16); insert into Sailors values ('9', 'i', 80, 22); insert into Boats values ('0', 'U', 'red'); insert into Boats values ('1', 'P', 'red'); insert into Boats values ('2', 'Q', 'blue'); insert into Boats values ('3', 'C', 'green'); insert into Boats values ('4', 'L', 'blue'); insert into Boats values ('5', 'O', 'blue'); insert into Boats values ('6', 'A', 'red'); insert into Boats values ('7', 'C', 'red'); insert into Boats values ('8', 'Y', 'green'); insert into Boats values ('9', 'N', 'blue'); insert into Reserves values ('0', '0', 3); insert into Reserves values ('0', '1', 2); insert into Reserves values ('0', '2', 1); insert into Reserves values ('0', '2', 3); insert into Reserves values ('1', '0', 4); insert into Reserves values ('3', '2', 2); insert into Reserves values ('4', '0', 3); insert into Reserves values ('4', '0', 1); insert into Reserves values ('4', '1', 3); insert into Reserves values ('4', '6', 4); insert into Reserves values ('4', '7', 1); insert into Reserves values ('5', '8', 2); insert into Reserves values ('5', '9', 2); insert into Reserves values ('7', '4', 4); insert into Reserves values ('7', '5', 1); insert into Reserves values ('8', '3', 2); insert into Reserves values ('9', '3', 3); insert into Reserves values ('9', '0', 1); insert into Reserves values ('9', '6', 1); insert into Reserves values ('9', '8', 2); commit; select * from Sailors join Boats on color='red' natural left outer join Reserves where rdate is null; select sid from Sailors join Boats on color='red' natural left outer join Reserves where rdate is null;
I want to find the sid of the sailors who have not ordered all the red boats, the first query above returns the correct rows I am expecting, nonethless the second query returns only rows with sid=2 and sid=6, despite the two queries are identical. sailors with sid 2 and 6 are the only sailors who have not booked any boat.
Advertisement
Answer
As far as I can tell, this looks like a bug in Oracle’s implementation of natural [...] join
. I will do some testing to see if it affects inner joins too.
Instead of natural
join, one can use the syntax left|right|inner join USING(...)
and giving the list of column names in the using
clause. The list of columns should be the list of ALL columns that have the same name in the two members of the join.
Very simple experimentation with the data you provided (+1 even for that alone) shows that the results are the same as if we had written using(sid, bid)
in the first query, but only using(sid)
in the second. If in either query – regardless of what it selects, whether *
or sid
– you use the using
syntax, you get the same number of rows in the output as either your first or your second query, depending on what you put in the using
clause.
So, what Oracle does for the second query is simply wrong. I can only speculate, but I believe Oracle looks at the SELECT
clause first, and perhaps at other clauses, to see what columns it needs to retrieve from each table. (For example, this tells the optimizer what indexes it could use, etc.) And at this step, Oracle decides – in your second query – that it doesn’t need bid
. Then when it translates “natural” join to its own internal code, it doesn’t throw in bid
as a join column. Which is wrong – and which is why I called this a “bug”.
IMPORTANT NOTE: Others have commented that both queries are “wrong” in that they do not solve the problem you were trying to solve. That may be entirely true. I didn’t even look at your problem specification; here I am answering your question, which is valid regardless of the problem – which is, why do the two queries produce different numbers of rows. Even if they are “wrong” for your use case, they should essentially give “the same” wrong answer, not “different” wrong answers. That is the only thing I discussed above.