Skip to content
Advertisement

Postgresql – access/use joined subquery result in another joined subquery

I have a database with tables for

  • equipment we service (table e, field e_id)
  • contracts on the equipment (table c, fields c_id, e_id, c_start, c_end)
  • maintenance we have performed in the past (table m, e_id, m_id, m_date)

I am trying to build a query that will show me all equipment records, if it is currently in contract with the start/end date, and a count of any maintenance performed since the start date of the contract.

I have a subquery to get the current contract (this table is large and has a new line for each contract revision), but I can’t work out how to use the result of the contract subquery to return the maintenance visits since that date without returning multiple lines.

select
  e.e_id,
  c2.c_id,
  c2.c_start,
  c2.c_end,
  m2.count
from e
left join (
  select
  c_id,
  c_start,
  c_end,
  e_id
  ...other things and filtering by joining the table to itself
  from c
) as c2 on c2.e_id = e.e_id

I would also like to be able to add this
m-subquery v1

left join (
  select
  count(*),
  e_id
  from m
  where m.m_date >= c2.start
) as m2 on m2.e_id = e.e_id

But I’m unable to access c2.C_start from within the second subquery.

I am able to return this table by joining outside the subquery, but this returns multiple lines.
m-subquery v2

left join (
  select
  e_id,
  m_date,
  from m
) as m2 on m2.e_id = e.e_id and m.m_date >= c2.start

Is there a way to:

  • Get the subquery field c2.start into the m-subquery v1?
  • Aggregate the result of the m-subquery v2 without using group by (there are a lot of columns in the main select query)?
  • Do this differently?

I’ve seen lateral which I kind of think might be what I need but I have tried the keyword in front of both subqueries individually and together and it didn’t work to let me use c2.c_start inside at any point.

I am a little averse to using group by, mainly as the BI analyst at work says “slap a group by on it” when there are duplicates in reports rather than trying to understand the business process/database properly. I feel like having a group by on the main query shouldn’t be needed when I know for certain that the e table has one record per e_id, and the mess that having probably 59 out of 60 columns named in the group by would cause might make the query less maintainable.

Thanks, Sam

Advertisement

Answer

Since not all RDBMS support lateral, I would like to present you the following general solution. You can make use of CTEs (WITH queries) to help structuring the query and reuse partial results. E.g. in the following code, you can think of current_contracts as a kind of virtual table existing only during query execution.

Part 1: DDLs and test data

DROP TABLE IF EXISTS e;
CREATE TABLE e
(
  e_id INTEGER
);

DROP TABLE IF EXISTS c;
CREATE TABLE c
(
  c_id INTEGER,
  e_id INTEGER,
  c_start DATE,
  c_end DATE
);

DROP TABLE IF EXISTS m;
CREATE TABLE m
(
  e_id INTEGER,
  m_id INTEGER,
  m_date DATE
);

INSERT INTO e VALUES (101),(102),(103);
INSERT INTO c VALUES (201, 101, DATE '2021-01-01', DATE '2021-12-31'), (202, 102, DATE '2021-03-01', DATE '2021-04-15'), (203, 102, DATE '2021-04-16', DATE '2021-04-30'), (204, 103, DATE '2003-01-01', DATE '2003-12-31'), (205, 103, DATE '2021-04-01', DATE '2021-04-30');
INSERT INTO m VALUES (101, 301, DATE '2021-01-01'), (101, 302, DATE '2021-02-01'), (101, 303, DATE '2021-03-01'), (102, 304, DATE '2021-04-02'), (102, 305, DATE '2021-04-03'), (103, 306, DATE '2021-04-03');

Part 2: the actual query

WITH
-- find currently active contracts per equipment:
-- we assume there is 0 or 1 contract active per equipment at any time
current_contracts AS
(
  SELECT *
  FROM c
  WHERE c.c_start <= CURRENT_DATE  -- only active contracts
    AND c.c_end   >= CURRENT_DATE  -- only active contracts
),

-- count maintenance visits during the (single) active contract per equipment, if any:
current_maintenance AS
(
  SELECT m.e_id, COUNT(*) AS count_m_per_e  -- a count of maintenance visits per equipment
  FROM m
  INNER JOIN current_contracts cc
     ON cc.e_id = m.e_id        -- match maintenance to current contracts via equipment
    AND cc.c_start <= m.m_date  -- only maintenance that was done during the current contract
  GROUP BY m.e_id
)

-- bring the parts together for our result:
-- we start with equipment and use LEFT JOINs to assure we retain all equipment
SELECT 
  e.*, 
  cc.c_start, cc.c_end, 
  CASE WHEN cc.e_id IS NOT NULL THEN 'yes' ELSE 'no' END AS has_contract,
  COALESCE(cm.count_m_per_e, 0)  -- to replace NULL when no contract is active
FROM e
LEFT JOIN current_contracts cc
   ON cc.e_id = e.e_id
LEFT JOIN current_maintenance cm
   ON cm.e_id = e.e_id
ORDER BY e.e_id;

Please note that your real pre-processing logic for contracts and maintenance visits may be more complex, e.g. due to overlapping periods of active contracts per equipment.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement