Skip to content
Advertisement

INNER JOIN with aggregate functions in my SELECT

I’m trying to join a new column to my current query that uses aggregate functions. I create this column with a new query that also uses an aggregate function from a different table but I’m not sure if a JOIN will work for me since I need to join it to its respective row.

TABLE A (employees that are enrolled or were enrolled in a project)

ID DEPARTMENT ENROLLED PROJECT
1 MARKETING Yes ARQ
2 MARKETING Yes TC
3 MARKETING No ARQ
4 MARKETING No TC
5 FINANCE Yes ARQ
6 FINANCE Yes TC
7 FINANCE No ARQ
8 FINANCE Yes TC

This table has more departments and more projects, but I simplified.

TABLE B (relation with departments and employees)

ID DEPARTMENT TOTAL_EMPLOYEES
1 MARKETING 2
2 MARKETING 3
3 FINANCE 4
4 FINANCE 8

In my first query I was asked to achieve the following result – using only table A:

             (employees enrolled)               (employees not enrolled)     
DEPARTMENT ARQ_E TC_E TOTAL_ENROLLED ARQ_N TC_N TOTAL_NOT_ENROLLED TOTAL
MARKETING 1 1 2 1 1 2 4
FINANCE 1 1 2 1 1 2 4

Using the following query:

SELECT  tableA.department, 
        sum(case when enrolled = 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_E,
        sum(case when enrolled = 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_E,
        sum(case when enrolled = 'Yes' then 1 else 0 end) as TOTAL_ENROLLED,
        sum(case when enrolled != 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_N,
        sum(case when enrolled != 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_N,
        sum(case when enrolled != 'Yes' then 1 else 0 end) as TOTAL_NOT_ENROLLED,
        count (*) AS Total               
FROM    tableA
GROUP BY tableA.department;

My second query gets departments and their total employees from table B:

DEPARTMENT TOTAL_EMPLOYEES
MARKETING 5
FINANCE 12

Using the following query:

SELECT  tableB.department,
        sum(tableB.total_employees) AS TOTAL_EMPLOYEES
FROM tableB
GROUP BY tableB.department;

I need to add the column TOTAL_EMPLOYEES to my first query, next to TOTAL will be TOTAL_EMPLOYEES. But it has to be placed with its respective department row. I need this to compare this 2 columns and see how many employees were not assigned to any project.

This is my expected result.

                  (employees enrolled)               (employees not enrolled)     
DEPARTMENT ARQ_E TC_E TOTAL_ENROLLED ARQ_N TC_N TOTAL_NOT_ENROLLED TOTAL T_EMPL
MARKETING 1 1 2 1 1 2 4 5
FINANCE 1 1 2 1 1 2 4 12

I have tried to achieve this using the following query:

SELECT  tableA.department, 
        sum(case when enrolled  = 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_E,
        sum(case when enrolled  = 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_E,
        sum(case when enrolled  = 'Yes' then 1 else 0 end) as TOTAL_ENROLLED,
        sum(case when enrolled  != 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_N,
        sum(case when enrolled  != 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_N,
        sum(case when enrolled  != 'Yes' then 1 else 0 end) as TOTAL_NOT_ENROLLED,
        count (*) AS Total,
        sum (tableB.total_employees) AS T_EMPL               
FROM    tableA
JOIN    tableB
ON tableA.department = tableB.department
GROUP BY tableA.department;

But the numbers I get in my query are completely wrong since the JOINS repeat my rows and my SUMS duplicate.

I don’t know if I really need to use a join or a subquery to place my sum(tableB.department) in its respective row.

I’m using PostgreSQL but since I’m using Standard 92 any SQL solution will help.

Advertisement

Answer

Your main issue stemmed from inadvertently multiplying rows with the join, and has already been addressed. See:

But use the standard SQL aggregate FILTER clause. It’s shorter, cleaner, and noticeably faster. See:

SELECT *
FROM  (
   SELECT department
        , count(*) FILTER (WHERE enrolled AND project = 'ARQ')     AS arq_e
        , count(*) FILTER (WHERE enrolled AND project = 'TC')      AS tc_e
        , count(*) FILTER (WHERE enrolled)                         AS total_enrolled
        , count(*) FILTER (WHERE NOT enrolled AND project = 'ARQ') AS arq_n
        , count(*) FILTER (WHERE NOT enrolled AND project = 'TC')  AS tc_n
        , count(*) FILTER (WHERE NOT enrolled)                     AS total_not_enrolled
        , count(*) AS total
   FROM   tableA a
   GROUP  BY 1
   ) a
LEFT JOIN  (  -- !
   SELECT department
        , sum(total_employees) AS total_employees
   FROM   tableB b
   GROUP  BY 1
   ) b USING (department);

enrolled should be a boolean column. Make it so if it isn’t. Then you can use it directly. Smaller, faster, cleaner, shorter code.

I replaced the [INNER] JOIN with a LEFT [OUTER] JOIN on a suspicion. Typically, you want to keep all results, even if the same department is not found in the other table. Maybe even a FULL [OUTER] JOIN?

Also, USING (department) as join condition conveniently outputs that column only once, so we can make do with SELECT * in the outer SELECT.

Finally, subqueries are shorter and faster than CTEs. Not much since Postgres 12, but still. See:

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement