Is there a way of removing near identical values from a table in mysql? My table has records more than 10K out of which one of the company looks like this:
id name 123 Vianet Group Inc 5214 Vianet Group, Inc.
on using describe tablename
I get this:
Field Type Null Key Default Extra id int NO PRI auto_increment name varchar(150) NO UNI
the names of the company are same however I would like to delete the second instance from table, thereby keeping just a single instance of the name in the table. This is just one instance and there are others like these.. Is there a swift way of removing identical values from the column? Please help.
Advertisement
Answer
You could try using soundex to find the “near identical” values –
SELECT * FROM tablename t1 INNER JOIN tablename t2 ON t1.id < t2.id AND SOUNDEX(t1.name) = SOUNDEX(t2.name)
You will need to test it with some of your example “near identical” values to see what it does and does not work for. As suggested by Akina you will probably need to go for some kind of normalisation process (stored function) or the Levenshtein distance function linked by Slava.