    date    | window | points | actual_bool | previous_bool |       creation_time        | source
------------+--------+--------+-------------+---------------+----------------------------+--------
 2021-02-11 |    110 |    0.6 |           0 |             0 | 2021-02-14 09:20:57.51966  | bldgh
 2021-02-11 |    150 |    0.7 |           1 |             0 | 2021-02-14 09:20:57.51966  | fiata
 2021-02-11 |    110 |    0.7 |           1 |             0 | 2021-02-14 09:20:57.51966  | nfiws
 2021-02-11 |    150 |    0.7 |           1 |             0 | 2021-02-14 09:20:57.51966  | fiata
 2021-02-11 |    110 |    0.6 |           0 |             0 | 2021-02-14 09:20:57.51966  | bldgh
 2021-02-11 |    110 |    0.3 |           0 |             1 | 2021-02-14 09:22:22.969014 | asdg1
 2021-02-11 |    110 |    0.6 |           0 |             0 | 2021-02-14 09:22:22.969014 | j
 2021-02-11 |    110 |    0.3 |           0 |             1 | 2021-02-14 09:22:22.969014 | aba
 2021-02-11 |    110 |    0.5 |           0 |             1 | 2021-02-14 09:22:22.969014 | fg
 2021-02-11 |    110 |    0.6 |           1 |             0 | 2021-02-14 09:22:22.969014 | wdda
 2021-02-11 |    110 |    0.7 |           1 |             1 | 2021-02-14 09:23:21.977685 | dda
 2021-02-11 |    110 |    0.5 |           1 |             0 | 2021-02-14 09:23:21.977685 | dd
 2021-02-11 |    110 |    0.6 |           1 |             1 | 2021-02-14 09:23:21.977685 | so
 2021-02-11 |    110 |    0.5 |           1 |             1 | 2021-02-14 09:23:21.977685 | dar
 2021-02-11 |    110 |    0.6 |           1 |             1 | 2021-02-14 09:23:21.977685 | firr
 2021-02-11 |    110 |    0.8 |           1 |             1 | 2021-02-14 09:24:15.831411 | xim
 2021-02-11 |    110 |    0.8 |           1 |             1 | 2021-02-14 09:24:15.831411 | cxyy
 2021-02-11 |    110 |    0.3 |           0 |             1 | 2021-02-14 09:24:15.831411 | bisd
 2021-02-11 |    110 |    0.1 |           0 |             1 | 2021-02-14 09:24:15.831411 | cope
 2021-02-11 |    110 |    0.2 |           0 |             1 | 2021-02-14 09:24:15.831411 | sand
 ...
I have the above dataset in a PostgreSQL table called testtable in testdb.
I have accidentally copied over the database and duplicated rows.
How can I delete the duplicates?
Rows 1 and 5 are copies in this frame, and rows 2 and 4 are copies too.
I have never used SQL to drop duplicates before, so I have no idea where to start.
I tried:

select creation_time, count(creation_time)
from classification
group by creation_time
having count(creation_time) > 1
order by creation_time;
But all that did was show me how many rows I had for each timestamp, like this:
       creation_time        | count
----------------------------+-------
 2021-02-14 09:20:57.51966  |    10
 2021-02-14 09:22:22.969014 |    10
 2021-02-14 09:23:21.977685 |    10
 2021-02-14 09:24:15.831411 |    10
 2021-02-14 09:24:27.733763 |    10
 2021-02-14 09:24:38.41793  |    10
 2021-02-14 09:27:04.432466 |    10
 2021-02-14 09:27:21.62256  |    10
 2021-02-14 09:27:22.677763 |    10
 2021-02-14 09:27:37.996054 |    10
 2021-02-14 09:28:09.275041 |    10
 2021-02-14 09:28:22.649391 |    10
 ...
There should only be 5 unique records for each creation_time.
It doesn't show me the duplicate rows themselves, and even if it did, I would have no idea how to drop them.
Answer
That is a lot of rows to delete. I would suggest just recreating the table:
create table new_classification as
    select distinct c.*
    from classification c;
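To validate, one quick sanity check is to compare row counts between the two tables (a sketch; given 10 rows per timestamp where there should be 5, the distinct copy should come out at about half the size):

```sql
-- Compare sizes before reloading anything.
select
    (select count(*) from classification)     as original_rows,
    (select count(*) from new_classification) as distinct_rows;
```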
After you have validated the data, you can reload it if you really want:
truncate table classification;

insert into classification
    select * from new_classification;
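If you would rather not truncate, another option is to swap the tables by renaming them. This is a sketch that assumes no views, foreign keys, or other objects depend on the old table:

```sql
begin;
-- Keep the old table around until you are sure nothing is missing.
alter table classification rename to classification_old;
alter table new_classification rename to classification;
commit;

-- Later, once you are satisfied:
-- drop table classification_old;
```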
This process should be much faster than deleting half of the rows in place.
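For reference, on a smaller table you could also delete in place. Every PostgreSQL row carries a hidden ctid system column identifying its physical location, which distinguishes otherwise-identical rows, so a self-join can keep one copy per group. A sketch, assuming none of the columns contain NULLs (plain = never matches NULL):

```sql
-- For each group of identical rows, keep the one with the lowest ctid
-- and delete the rest. "window" is quoted because it is a reserved word.
delete from classification a
using classification b
where a.ctid > b.ctid
  and a.date          = b.date
  and a."window"      = b."window"
  and a.points        = b.points
  and a.actual_bool   = b.actual_bool
  and a.previous_bool = b.previous_bool
  and a.creation_time = b.creation_time
  and a.source        = b.source;
```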