Skip to content
Advertisement

Select rows based on duplicates in a column

I want to find all duplicates in a column and display the primary key so I can clean them up.

So after a bunch of tutorials I have this which is supposed to work and returns no rows which is great because it means there’s nothing I need to clean up. ‘equipment_id’ is the primary key on the table

SELECT
    equipment_id, tent_id, COUNT(*)
FROM
    equipment
GROUP BY
    equipment_id, tent_id
HAVING 
    COUNT(*) > 1

But when I just look for the dupes like this without involving the primary key column

SELECT
    tent_id, COUNT(*)
FROM
    equipment
GROUP BY
    tent_id
HAVING 
COUNT(*) > 1

I get 900+ results, which is also fine but I need the primary key so I know which ones to edit.

TABLE equipment

ROWS
equipment_id    tent type    tent_id    note
100             large        66666      out
101             small        66666      in
102             small        55555      in
103             large        44444      in
104             small        33333      out

The result I want will look like

100      66666
101      66666

Advertisement

Answer

If the first query be returning any records and the equipment_id really be a primary key column, then it must be the case that a single tent_id value is associated with multiple records/multiple equipment_id values. To find this tent_id values, try asserting the distinct count of the equipment_id:

SELECT tent_id
FROM equipment
GROUP BY tent_id
HAVING COUNT(equipment_id) > 1;

To generate the output you want, you could try:

WITH cte AS (
    SELECT equipment_id, tent_id, COUNT(*) OVER (PARTITION BY tent_id) cnt
    FROM equipment
)

SELECT
    equipment_id,
    tent_id
FROM cte
WHERE cnt > 1;
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement