I’m learning SQL (Postgres)
and I’m slightly confused about something I’m trying to do.
I have the following tables. Employee:
fname | minit | lname | ssn | bdate | address | sex | salary | super_ssn | dno ----------+-------+---------+-----------+------------+-------------------------+-----+----------+-----------+----- James | E | Borg | 888665555 | 1937-11-10 | 450 Stone, Houston TX | M | 55000.00 | | 1 John | B | Smith | 123456789 | 1965-01-09 | 731 Fondren, Houston TX | M | 30000.00 | 333445555 | 5 Franklin | T | Wong | 333445555 | 1955-12-08 | 638 Voss, Houston TX | M | 40000.00 | 888665555 | 5 Alicia | J | Zelaya | 999887777 | 1968-01-19 | 3321 Castle, Spring TX | F | 25000.00 | 987654321 | 4 Jennifer | S | Wallace | 987654321 | 1941-06-20 | 291 Berry, Bellaire TX | F | 43000.00 | 888665555 | 4 Ramesh | K | Narayan | 666884444 | 1962-09-15 | 975 Fire Oak, Humble TX | M | 38000.00 | 333445555 | 5 Joyce | A | English | 453453453 | 1972-07-31 | 5631 Rice, Houston TX | F | 25000.00 | 333445555 | 5 Ahmad | V | Jabbar | 987987987 | 1969-03-29 | 980 Dallas, Houston TX | M | 25000.00 | 987654321 | 4
Project:
pname | pnumber | plocation | dnum -----------------+---------+-----------+------ ProductX | 1 | Bellaire | 5 ProductY | 2 | Sugarland | 5 ProductZ | 3 | Houston | 5 Computerization | 10 | Stafford | 4 Reorganization | 20 | Houston | 1 Newbenefits | 30 | Stafford | 4
Department:
dname | dnumber | mgr_ssn | mgr_start ----------------+---------+-----------+------------ Research | 5 | 333445555 | 1988-05-22 Administration | 4 | 987654321 | 1995-01-01 Headquarters | 1 | 888665555 | 1981-06-19
And works_on:
essn | pno | hours -----------+-----+------- 123456789 | 1 | 32.5 123456789 | 2 | 7.5 666884444 | 3 | 40.0 453453453 | 1 | 20.0 453453453 | 2 | 20.0 333445555 | 2 | 10.0 333445555 | 3 | 10.0 333445555 | 10 | 10.0 333445555 | 20 | 10.0 999887777 | 30 | 30.0 999887777 | 10 | 10.0 987987987 | 10 | 35.0 987987987 | 30 | 5.0 987654321 | 30 | 20.0 987654321 | 20 | 15.0 888665555 | 20 | 0.0
For this, I was trying to do something slightly more complex: To create a view that has project name, department name, number of employees, and total hours worked on each project that has the criteria of more than one employee on it. I got some reading and did some basic count(*)
and sum(*)
but the syntax here is killing me.
This is the furthest I got after many hours trying to understand how this syntax operates:
SELECT p.pname AS "Project", d.dname AS "Department", count(w.essn) AS "# of Employees", sum(w.hours) AS "Total Hours" FROM project p JOIN works_on w ON p.pnumber = w.pno JOIN department d ON d.dnumber = p.dnum WHERE w.essn > 1
That WHERE
clause is obviously wrong. I think I’ve got the JOINS
correct. How do I put these aggregate functions correctly together with all these joins? Do I need nested queries? I’ve found some examples using HAVING
but I heard it’s legacy, and would like to use things that are newer and fresher, but found no other examples except using GROUP BY
and HAVING
. Are those fine to use? If not, what should I use?
Please and thank you.
Advertisement
Answer
First, you need a GROUP BY
clause to make your query a valid aggregation query: that clause should include all non-aggreated columns (I also added the project number, in case two projects have the same name).
Then, you can use a HAVING
clause to filter on projects that have more than one employee:
SELECT p.pname AS "Project", d.dname AS "Department", COUNT(DISTINCT w.essn) AS "# of Employees", SUM(w.hours) AS "Total Hours" FROM project p JOIN works_on w ON p.pnumber = w.pno JOIN department d ON d.dnumber = p.dnum GROUP BY p.pname, d.dname, p.pnumber HAVING COUNT(DISTINCT w.essn) > 1
If (essn, number)
tuples are unique in the works
table, then you don’t need DISTINCT
in the COUNT()
:
SELECT p.pname AS "Project", d.dname AS "Department", COUNT(*) AS "# of Employees", SUM(w.hours) AS "Total Hours" FROM project p JOIN works_on w ON p.pnumber = w.pno JOIN department d ON d.dnumber = p.dnum GROUP BY p.pname, d.dname, p.pnumber HAVING COUNT(*) > 1