Skip to content
Advertisement

Why selecting a single attribute returns less rows than selecting all columns in oracle SQL

The tables created and the queries made are not the primary focus of this question, what confuses me is that why the first query and the second query returns different numbers of rows

drop table Reserves;
drop table Sailors;
drop table Boats;

create table Sailors (
    sid      char(1) not null,
    sname    char(1) not null,
    rating   int,
    age      int not null,
    primary key (sid)
);

create table Boats (
    bid      char(1) not null,
    bname    char(1) not null,
    color    varchar(5),
    primary key (bid)
);

create table Reserves (
    sid      char(1) not null,
    bid      char(1) not null,
    rdate    int not null,
    primary key (sid, bid, rdate),
    foreign key (sid) references Sailors(sid)
        on delete cascade,
    foreign key (bid) references Boats(bid)
        on delete cascade
);


------------------------------------------------------------------------------

-- Insert values

insert into Sailors values ('1', 'q', 90, 24);
insert into Sailors values ('0', 's', 60, 22);
insert into Sailors values ('2', 'd', 80, 20);
insert into Sailors values ('3', 'w', 70, 18);
insert into Sailors values ('4', 'a', 60, 19);
insert into Sailors values ('5', 'l', 80, 17);
insert into Sailors values ('6', 'o', 90, 18);
insert into Sailors values ('7', 'q', 70, 20);
insert into Sailors values ('8', 'd', 60, 16);
insert into Sailors values ('9', 'i', 80, 22);

insert into Boats values ('0', 'U', 'red');
insert into Boats values ('1', 'P', 'red');
insert into Boats values ('2', 'Q', 'blue');
insert into Boats values ('3', 'C', 'green');
insert into Boats values ('4', 'L', 'blue');
insert into Boats values ('5', 'O', 'blue');
insert into Boats values ('6', 'A', 'red');
insert into Boats values ('7', 'C', 'red');
insert into Boats values ('8', 'Y', 'green');
insert into Boats values ('9', 'N', 'blue');

insert into Reserves values ('0', '0', 3);
insert into Reserves values ('0', '1', 2);
insert into Reserves values ('0', '2', 1);
insert into Reserves values ('0', '2', 3);
insert into Reserves values ('1', '0', 4);
insert into Reserves values ('3', '2', 2);
insert into Reserves values ('4', '0', 3);
insert into Reserves values ('4', '0', 1);
insert into Reserves values ('4', '1', 3);
insert into Reserves values ('4', '6', 4);
insert into Reserves values ('4', '7', 1);
insert into Reserves values ('5', '8', 2);
insert into Reserves values ('5', '9', 2);
insert into Reserves values ('7', '4', 4);
insert into Reserves values ('7', '5', 1);
insert into Reserves values ('8', '3', 2);
insert into Reserves values ('9', '3', 3);
insert into Reserves values ('9', '0', 1);
insert into Reserves values ('9', '6', 1);
insert into Reserves values ('9', '8', 2);

commit;


select *
from Sailors join Boats on color='red' natural left outer join Reserves
where rdate is null;

select sid
from Sailors join Boats on color='red' natural left outer join Reserves
where rdate is null;

I want to find the sid of the sailors who have not ordered all the red boats, the first query above returns the correct rows I am expecting, nonethless the second query returns only rows with sid=2 and sid=6, despite the two queries are identical. sailors with sid 2 and 6 are the only sailors who have not booked any boat.

Advertisement

Answer

As far as I can tell, this looks like a bug in Oracle’s implementation of natural [...] join. I will do some testing to see if it affects inner joins too.

Instead of natural join, one can use the syntax left|right|inner join USING(...) and giving the list of column names in the using clause. The list of columns should be the list of ALL columns that have the same name in the two members of the join.

Very simple experimentation with the data you provided (+1 even for that alone) shows that the results are the same as if we had written using(sid, bid) in the first query, but only using(sid) in the second. If in either query – regardless of what it selects, whether * or sid – you use the using syntax, you get the same number of rows in the output as either your first or your second query, depending on what you put in the using clause.

So, what Oracle does for the second query is simply wrong. I can only speculate, but I believe Oracle looks at the SELECT clause first, and perhaps at other clauses, to see what columns it needs to retrieve from each table. (For example, this tells the optimizer what indexes it could use, etc.) And at this step, Oracle decides – in your second query – that it doesn’t need bid. Then when it translates “natural” join to its own internal code, it doesn’t throw in bid as a join column. Which is wrong – and which is why I called this a “bug”.

IMPORTANT NOTE: Others have commented that both queries are “wrong” in that they do not solve the problem you were trying to solve. That may be entirely true. I didn’t even look at your problem specification; here I am answering your question, which is valid regardless of the problem – which is, why do the two queries produce different numbers of rows. Even if they are “wrong” for your use case, they should essentially give “the same” wrong answer, not “different” wrong answers. That is the only thing I discussed above.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement