I’m using PostgreSQL v.11. I have a table with 3 columns. My goal is to find redundancy inside data.
First of all, I do a simple GROUP BY:
SELECT client, block, "date" FROM lines GROUP BY client, block, "date" ORDER BY client, block
The result looks like this:
1 | P10001 | 2020-01-01 1 | P10002 | 2020-04-17 1 | P10002 | 2020-05-04 1 | P10003 | 2020-05-05
Now I would like to identify rows that have the same “block”, but different “date”. In this example, this is the case for row 2 and 3 (block = P10002)
For now, I use javascript to parse the complete resultset and find these 2 redundant rows (I use result.reduce(...)
)
But is there a way, in pure SQL, to extract this “rereduced” result?
The expected result is this:
1 | P10002 | 2020-04-17 1 | P10002 | 2020-05-04
It should be great to keep the two dates in the resultset, because I need to change the “block” for only one of them. If I don’t have the date column in the resultset, I will need to do a second request to find all the affected rows.
Here is a script for the table and data
CREATE TABLE lines ( "client" integer NOT NULL, "block" text NOT NULL, "date" date NOT NULL ); INSERT INTO lines ("client", "block", "date") VALUES (1, 'P10001', '2020-01-01'), (1, 'P10002', '2020-04-17'), (1, 'P10002', '2020-05-04'), (1, 'P10003', '2020-05-05');
Many thanks
Advertisement
Answer
You can do it with EXISTS
:
select l.* from lines l where exists ( select 1 from lines where client = l.client and block = l.block and date <> l.date )
If there is no case of duplicate dates for each client, block
you could also use COUNT(*)
window function:
select client, block, date from ( select *, count(*) over (partition by client, block) counter from lines ) t where counter > 1
See the demo.
Results:
> client | block | date > -----: | :----- | :--------- > 1 | P10002 | 2020-04-17 > 1 | P10002 | 2020-05-04