
The difference between two values from different rows grouping by column

Let’s say there is a table jd_log:

job_name | duration
-------------------
A        | 10
B        | 2
B        | 3
C        | 15
A        | 12
D        | 100
C        | 9

What I'm trying to do is find the difference between the two durations (as duration_diff) for each unique job_name. It is guaranteed that each unique job_name has no more than 2 entries in the table. If there is only 1 entry for a given job_name, duration_diff should be null.

So the desired result set should look like the following:

job_name | duration_diff
------------------------
A        | -2
B        | -1
C        | 6
D        | null

A: 10 - 12 = -2

B: 2 - 3 = -1

C: 15 - 9 = 6

D: null (as there is only 1 entry with such job_name)

The solution I came up with looks like this:

WITH unique_job_name AS (SELECT DISTINCT job_name FROM jd_log)
SELECT ujn.job_name, duration_diff
FROM unique_job_name ujn
LEFT JOIN (
  SELECT *
  FROM (
    SELECT job_name, (LAG(duration) OVER (PARTITION BY job_name) - duration) AS duration_diff
    FROM jd_log
  )
  WHERE duration_diff IS NOT NULL
) jndd
ON ujn.job_name = jndd.job_name

My solution seems overcomplicated to me and perhaps not entirely correct.

So I would like to know whether there is a simpler solution to this problem. Thank you in advance.


Answer

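Your question cannot really be answered unless there are one or more columns that define the ordering of the duration values. Without such a column, there is no way to say which of a job's two rows comes first, so the sign of the difference is undefined. Assuming such an id column exists, your data might look like this: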

id | job_name | duration
------------------------
1  | A        | 10
2  | B        | 2
3  | B        | 3
4  | C        | 15
5  | A        | 12
6  | D        | 100
7  | C        | 9
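
To try this out, the sample data can be loaded with something like the script below. This is only a sketch: the column types are assumptions, since the question does not state them, and the table keeps the name jd_log from the question, whereas the query further down uses the placeholder name yourTable.

```sql
-- Hypothetical setup for the sample data; column types are assumptions.
CREATE TABLE jd_log (
    id       INTEGER PRIMARY KEY,   -- assumed ordering column
    job_name VARCHAR(10) NOT NULL,
    duration INTEGER NOT NULL
);

INSERT INTO jd_log (id, job_name, duration) VALUES
    (1, 'A', 10),
    (2, 'B', 2),
    (3, 'B', 3),
    (4, 'C', 15),
    (5, 'A', 12),
    (6, 'D', 100),
    (7, 'C', 9);
```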

Then we could write a solution to your question using ROW_NUMBER:

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY job_name ORDER BY id) rn
    FROM yourTable
)

SELECT
    job_name,
    CASE WHEN MAX(rn) > 1 THEN
         SUM(CASE rn WHEN 1 THEN duration WHEN 2 THEN -duration END) END AS duration_diff
FROM cte
GROUP BY
    job_name;
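
In the query above, the row with rn = 1 contributes +duration and the row with rn = 2 contributes -duration, so the SUM is "first minus second"; the MAX(rn) > 1 guard returns NULL for jobs with a single row. If you would rather stay close to your original LAG approach, a possible variant is sketched below. It assumes the same hypothetical id column for ordering, the table name jd_log from the question, and a database with window function support; it relies on the guarantee of at most two rows per job_name.

```sql
-- Sketch of a LAG-based variant; id is the assumed ordering column.
SELECT
    job_name,
    -- Each job has at most one non-NULL difference (on its second row).
    -- MAX ignores NULLs, and returns NULL when every value is NULL,
    -- which covers jobs with a single row.
    MAX(duration_diff) AS duration_diff
FROM (
    SELECT
        job_name,
        LAG(duration) OVER (PARTITION BY job_name ORDER BY id) - duration AS duration_diff
    FROM jd_log
) AS t
GROUP BY
    job_name
ORDER BY
    job_name;
```

The ORDER BY id inside the window is what makes the result deterministic; the outer aggregation replaces the DISTINCT plus LEFT JOIN from the question.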


User contributions licensed under: CC BY-SA