Skip to content
Advertisement

Deleting duplicate rows from sqlite database

I have a huge table – 36 million rows – in SQLite3. In this very large table, there are two columns:

  • hash – text
  • d – real

Some of the rows are duplicates. That is, both hash and d have the same values. If two hashes are identical, then so are the values of d. However, two identical d‘s does not imply two identical hash‘es.

I want to delete the duplicate rows. I don’t have a primary key column.

What’s the fastest way to do this?

Advertisement

Answer

You need a way to distinguish the rows. Based on your comment, you could use the special rowid column for that.

To delete duplicates by keeping the lowest rowid per (hash,d):

delete   from YourTable
where    rowid not in
         (
         select  min(rowid)
         from    YourTable
         group by
                 hash
         ,       d
         )
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement