I want to find all duplicates in a column and display the primary key so I can clean them up.
So after a bunch of tutorials I have this which is supposed to work and returns no rows which is great because it means there’s nothing I need to clean up. ‘equipment_id’ is the primary key on the table
SELECT
equipment_id, tent_id, COUNT(*)
FROM
equipment
GROUP BY
equipment_id, tent_id
HAVING
COUNT(*) > 1
But when I just look for the dupes like this without involving the primary key column
SELECT
tent_id, COUNT(*)
FROM
equipment
GROUP BY
tent_id
HAVING
COUNT(*) > 1
I get 900+ results, which is also fine but I need the primary key so I know which ones to edit.
TABLE equipment
ROWS
equipment_id tent type tent_id note
100 large 66666 out
101 small 66666 in
102 small 55555 in
103 large 44444 in
104 small 33333 out
The result I want will look like
100 66666
101 66666
Advertisement
Answer
If the first query be returning any records and the equipment_id
really be a primary key column, then it must be the case that a single tent_id
value is associated with multiple records/multiple equipment_id
values. To find this tent_id
values, try asserting the distinct count of the equipment_id
:
SELECT tent_id
FROM equipment
GROUP BY tent_id
HAVING COUNT(equipment_id) > 1;
To generate the output you want, you could try:
WITH cte AS (
SELECT equipment_id, tent_id, COUNT(*) OVER (PARTITION BY tent_id) cnt
FROM equipment
)
SELECT
equipment_id,
tent_id
FROM cte
WHERE cnt > 1;