I’m learning SQL (Postgres)
and I’m slightly confused about something I’m trying to do.
I have the following tables. Employee:
fname | minit | lname | ssn | bdate | address | sex | salary | super_ssn | dno
----------+-------+---------+-----------+------------+-------------------------+-----+----------+-----------+-----
James | E | Borg | 888665555 | 1937-11-10 | 450 Stone, Houston TX | M | 55000.00 | | 1
John | B | Smith | 123456789 | 1965-01-09 | 731 Fondren, Houston TX | M | 30000.00 | 333445555 | 5
Franklin | T | Wong | 333445555 | 1955-12-08 | 638 Voss, Houston TX | M | 40000.00 | 888665555 | 5
Alicia | J | Zelaya | 999887777 | 1968-01-19 | 3321 Castle, Spring TX | F | 25000.00 | 987654321 | 4
Jennifer | S | Wallace | 987654321 | 1941-06-20 | 291 Berry, Bellaire TX | F | 43000.00 | 888665555 | 4
Ramesh | K | Narayan | 666884444 | 1962-09-15 | 975 Fire Oak, Humble TX | M | 38000.00 | 333445555 | 5
Joyce | A | English | 453453453 | 1972-07-31 | 5631 Rice, Houston TX | F | 25000.00 | 333445555 | 5
Ahmad | V | Jabbar | 987987987 | 1969-03-29 | 980 Dallas, Houston TX | M | 25000.00 | 987654321 | 4
Project:
pname | pnumber | plocation | dnum
-----------------+---------+-----------+------
ProductX | 1 | Bellaire | 5
ProductY | 2 | Sugarland | 5
ProductZ | 3 | Houston | 5
Computerization | 10 | Stafford | 4
Reorganization | 20 | Houston | 1
Newbenefits | 30 | Stafford | 4
Department:
dname | dnumber | mgr_ssn | mgr_start
----------------+---------+-----------+------------
Research | 5 | 333445555 | 1988-05-22
Administration | 4 | 987654321 | 1995-01-01
Headquarters | 1 | 888665555 | 1981-06-19
And works_on:
essn | pno | hours
-----------+-----+-------
123456789 | 1 | 32.5
123456789 | 2 | 7.5
666884444 | 3 | 40.0
453453453 | 1 | 20.0
453453453 | 2 | 20.0
333445555 | 2 | 10.0
333445555 | 3 | 10.0
333445555 | 10 | 10.0
333445555 | 20 | 10.0
999887777 | 30 | 30.0
999887777 | 10 | 10.0
987987987 | 10 | 35.0
987987987 | 30 | 5.0
987654321 | 30 | 20.0
987654321 | 20 | 15.0
888665555 | 20 | 0.0
For this, I was trying to do something slightly more complex: To create a view that has project name, department name, number of employees, and total hours worked on each project that has the criteria of more than one employee on it. I got some reading and did some basic count(*)
and sum(*)
but the syntax here is killing me.
This is the furthest I got after many hours trying to understand how this syntax operates:
SELECT
p.pname AS "Project",
d.dname AS "Department",
count(w.essn) AS "# of Employees",
sum(w.hours) AS "Total Hours"
FROM project p JOIN works_on w
ON p.pnumber = w.pno
JOIN department d
ON d.dnumber = p.dnum
WHERE w.essn > 1
That WHERE
clause is obviously wrong. I think I’ve got the JOINS
correct. How do I put these aggregate functions correctly together with all these joins? Do I need nested queries? I’ve found some examples using HAVING
but I heard it’s legacy, and would like to use things that are newer and fresher, but found no other examples except using GROUP BY
and HAVING
. Are those fine to use? If not, what should I use?
Please and thank you.
Advertisement
Answer
First, you need a GROUP BY
clause to make your query a valid aggregation query: that clause should include all non-aggreated columns (I also added the project number, in case two projects have the same name).
Then, you can use a HAVING
clause to filter on projects that have more than one employee:
SELECT
p.pname AS "Project",
d.dname AS "Department",
COUNT(DISTINCT w.essn) AS "# of Employees",
SUM(w.hours) AS "Total Hours"
FROM project p
JOIN works_on w ON p.pnumber = w.pno
JOIN department d ON d.dnumber = p.dnum
GROUP BY p.pname, d.dname, p.pnumber
HAVING COUNT(DISTINCT w.essn) > 1
If (essn, number)
tuples are unique in the works
table, then you don’t need DISTINCT
in the COUNT()
:
SELECT
p.pname AS "Project",
d.dname AS "Department",
COUNT(*) AS "# of Employees",
SUM(w.hours) AS "Total Hours"
FROM project p
JOIN works_on w ON p.pnumber = w.pno
JOIN department d ON d.dnumber = p.dnum
GROUP BY p.pname, d.dname, p.pnumber
HAVING COUNT(*) > 1