SQL GROUP BY and kind of rereduce afterwards

I’m using PostgreSQL v.11. I have a table with 3 columns. My goal is to find redundancy inside data.

First of all, I do a simple GROUP BY:

SELECT client, block, "date"
FROM lines
GROUP BY client, block, "date"
ORDER BY client, block

​x
 
SELECT client, block, "date"FROM linesGROUP BY client, block, "date"ORDER BY client, block​

The result looks like this:

1 | P10001 | 2020-01-01
1 | P10002 | 2020-04-17
1 | P10002 | 2020-05-04
1 | P10003 | 2020-05-05

 
1 | P10001 | 2020-01-011 | P10002 | 2020-04-171 | P10002 | 2020-05-041 | P10003 | 2020-05-05​

Now I would like to identify rows that have the same “block”, but different “date”. In this example, this is the case for row 2 and 3 (block = P10002)

For now, I use javascript to parse the complete resultset and find these 2 redundant rows (I use result.reduce(...))

But is there a way, in pure SQL, to extract this “rereduced” result?

The expected result is this:

1 | P10002 | 2020-04-17
1 | P10002 | 2020-05-04

 
1 | P10002 | 2020-04-171 | P10002 | 2020-05-04​

It should be great to keep the two dates in the resultset, because I need to change the “block” for only one of them. If I don’t have the date column in the resultset, I will need to do a second request to find all the affected rows.

Here is a script for the table and data

CREATE TABLE lines (
  "client" integer NOT NULL,
  "block" text NOT NULL,
  "date" date NOT NULL
);

INSERT INTO lines ("client", "block", "date") VALUES
  (1, 'P10001', '2020-01-01'),
  (1, 'P10002', '2020-04-17'),
  (1, 'P10002', '2020-05-04'),
  (1, 'P10003', '2020-05-05');

 
CREATE TABLE lines (  "client" integer NOT NULL,  "block" text NOT NULL,  "date" date NOT NULL);​INSERT INTO lines ("client", "block", "date") VALUES  (1, 'P10001', '2020-01-01'),  (1, 'P10002', '2020-04-17'),  (1, 'P10002', '2020-05-04'),  (1, 'P10003', '2020-05-05');​

Many thanks

Answer

You can do it with EXISTS:

select l.* from lines l
where exists (
  select 1 from lines
  where client = l.client and block = l.block and date <> l.date
)

 
select l.* from lines lwhere exists (  select 1 from lines  where client = l.client and block = l.block and date <> l.date)​

If there is no case of duplicate dates for each client, block you could also use COUNT(*) window function:

select client, block, date
from (
  select *, count(*) over (partition by client, block) counter
  from lines
) t
where counter > 1

 
select client, block, datefrom (  select *, count(*) over (partition by client, block) counter  from lines) twhere counter > 1​

See the demo.
Results:

> client | block  | date      
> -----: | :----- | :---------
>      1 | P10002 | 2020-04-17
>      1 | P10002 | 2020-05-04

 
> client | block  | date      > -----: | :----- | :--------->      1 | P10002 | 2020-04-17>      1 | P10002 | 2020-05-04​

Advertisement

Answer