I'm learning SQL (Postgres) and I'm slightly confused about something I'm trying to do. I have the following tables. Employee: Project: Department: And works_on: For this, I was trying to do something slightly more complex: To create a view that has project name, department name, number of employees, and total hours worked on each project that has the criteria of

Syntax for counting greater than and summing a value with multiple joins without the keyword having

I’m learning SQL (Postgres) and I’m slightly confused about something I’m trying to do.

I have the following tables. Employee:

  fname   | minit |  lname  |    ssn    |   bdate    |         address         | sex |  salary  | super_ssn | dno 
----------+-------+---------+-----------+------------+-------------------------+-----+----------+-----------+-----
 James    | E     | Borg    | 888665555 | 1937-11-10 | 450 Stone, Houston TX   | M   | 55000.00 |           |   1
 John     | B     | Smith   | 123456789 | 1965-01-09 | 731 Fondren, Houston TX | M   | 30000.00 | 333445555 |   5
 Franklin | T     | Wong    | 333445555 | 1955-12-08 | 638 Voss, Houston TX    | M   | 40000.00 | 888665555 |   5
 Alicia   | J     | Zelaya  | 999887777 | 1968-01-19 | 3321 Castle, Spring TX  | F   | 25000.00 | 987654321 |   4
 Jennifer | S     | Wallace | 987654321 | 1941-06-20 | 291 Berry, Bellaire TX  | F   | 43000.00 | 888665555 |   4
 Ramesh   | K     | Narayan | 666884444 | 1962-09-15 | 975 Fire Oak, Humble TX | M   | 38000.00 | 333445555 |   5
 Joyce    | A     | English | 453453453 | 1972-07-31 | 5631 Rice, Houston TX   | F   | 25000.00 | 333445555 |   5
 Ahmad    | V     | Jabbar  | 987987987 | 1969-03-29 | 980 Dallas, Houston TX  | M   | 25000.00 | 987654321 |   4

Project:

      pname      | pnumber | plocation | dnum 
-----------------+---------+-----------+------
 ProductX        |       1 | Bellaire  |    5
 ProductY        |       2 | Sugarland |    5
 ProductZ        |       3 | Houston   |    5
 Computerization |      10 | Stafford  |    4
 Reorganization  |      20 | Houston   |    1
 Newbenefits     |      30 | Stafford  |    4

Department:

     dname      | dnumber |  mgr_ssn  | mgr_start  
----------------+---------+-----------+------------
 Research       |       5 | 333445555 | 1988-05-22
 Administration |       4 | 987654321 | 1995-01-01
 Headquarters   |       1 | 888665555 | 1981-06-19

And works_on:

   essn    | pno | hours 
-----------+-----+-------
 123456789 |   1 |  32.5
 123456789 |   2 |   7.5
 666884444 |   3 |  40.0
 453453453 |   1 |  20.0
 453453453 |   2 |  20.0
 333445555 |   2 |  10.0
 333445555 |   3 |  10.0
 333445555 |  10 |  10.0
 333445555 |  20 |  10.0
 999887777 |  30 |  30.0
 999887777 |  10 |  10.0
 987987987 |  10 |  35.0
 987987987 |  30 |   5.0
 987654321 |  30 |  20.0
 987654321 |  20 |  15.0
 888665555 |  20 |   0.0

For this, I was trying to do something slightly more complex: To create a view that has project name, department name, number of employees, and total hours worked on each project that has the criteria of more than one employee on it. I got some reading and did some basic count(*) and sum(*) but the syntax here is killing me.

This is the furthest I got after many hours trying to understand how this syntax operates:

SELECT 
p.pname AS "Project",
d.dname AS "Department",
count(w.essn) AS "# of Employees",
sum(w.hours) AS "Total Hours"
FROM project p JOIN works_on w
ON p.pnumber = w.pno 
JOIN department d
ON d.dnumber = p.dnum
WHERE w.essn > 1

That WHERE clause is obviously wrong. I think I’ve got the JOINS correct. How do I put these aggregate functions correctly together with all these joins? Do I need nested queries? I’ve found some examples using HAVING but I heard it’s legacy, and would like to use things that are newer and fresher, but found no other examples except using GROUP BY and HAVING. Are those fine to use? If not, what should I use?

Please and thank you.

Answer

First, you need a GROUP BY clause to make your query a valid aggregation query: that clause should include all non-aggreated columns (I also added the project number, in case two projects have the same name).

Then, you can use a HAVING clause to filter on projects that have more than one employee:

SELECT 
    p.pname AS "Project",
    d.dname AS "Department",
    COUNT(DISTINCT w.essn) AS "# of Employees",
    SUM(w.hours) AS "Total Hours"
FROM project p 
JOIN works_on w ON p.pnumber = w.pno 
JOIN department d ON d.dnumber = p.dnum
GROUP BY p.pname, d.dname, p.pnumber
HAVING COUNT(DISTINCT w.essn) > 1

If (essn, number) tuples are unique in the works table, then you don’t need DISTINCT in the COUNT():

SELECT 
    p.pname AS "Project",
    d.dname AS "Department",
    COUNT(*) AS "# of Employees",
    SUM(w.hours) AS "Total Hours"
FROM project p 
JOIN works_on w ON p.pnumber = w.pno 
JOIN department d ON d.dnumber = p.dnum
GROUP BY p.pname, d.dname, p.pnumber
HAVING COUNT(*) > 1

Advertisement

Answer