I haven’t understood the difference between count(*)
and count having an attribute as argument.
Here I make an example: there are two relationships:
project (id, name, number); employee (ssn, name, surname); working_on (employee_ssn, project_id);
With employee_ssn
which references employee(ssn)
and project_id
referencing project(id)
.
Primary keys: project(id)
, employee(ssn)
, working_on (employee_ssn
, project_id
).
Now I have to find for each project, the id of the project, the name and the number of employees working on it.
A correct solution (found on the book) is this one:
select id, name, count (*) from working_on join project on id=project_id group by name, id
I have,’t understood why this solution is valid, shouldn’t count (*)
count all tuples? Why this way it gets the exact number of employees working on it?
I have written this solution instead:
select id, name, count (employee_ssn) from working_on join project on id=project_id group by name, id
Are the solutions equivalent?
And in general, there is a difference using count (*)
and count(attribute)
? Could you provide an example where these two syntaxes produce different results?
Advertisement
Answer
I figured out two differences till now –
as eric mentioned above, count(col) doesn’t count null values whereas count (*) does.
It behaves differently in a join query.
Consider a situation where i need to join employee and department table, and i need to find count of employees in each department.
2.a.
SELECT DPT.ID , COUNT(*) EMP_COUNT FROM DEPARTMENT DPT INNER JOIN EMPLOYEE E ON DPT.ID = E.DEPARTMENT_ID GROUP BY DPT.ID;
In the above query, it will not return those records where employee table count=0.
2.b.
SELECT DPT.ID , COUNT(E.DEPARTMENT_ID) EMP_COUNT FROM DEPARTMENT DPT INNER JOIN EMPLOYEE E ON DPT.ID = E.DEPARTMENT_ID GROUP BY DPT.ID;
If you use count(E.DEPARTMENT_ID) instead of count(*), this time it will give records even when employee table count=0.
2.c.
SELECT DPT.ID , COUNT(DPT.ID) EMP_COUNT FROM DEPARTMENT DPT INNER JOIN EMPLOYEE E ON DPT.ID = E.DEPARTMENT_ID GROUP BY DPT.ID;
This is the interesting part, when you use COUNT(DPT.ID), since DPT.ID is column of department, here again records where employee table count=0 will NOT be considered.