Main Content

Removing duplicates from a MySQL table

Archive - Originally posted on "The Horse's Mouth" - 2010-02-22 11:26:39 - Graham Ellis

Surely there's got to be an easy way to remove duplicate records from a MySQL table? With a well designed table that has a unique primary key, you can simply get rid of a few duplicates, but if there are a lot of 'em, it's time consuming. You could write some sort of complex delete based on a select. You could write an application to do it (it wouldn't be more that a few lines of Perl, for example) but surely there's got to be an easy way!

A neat trick for which I take no credit ... thanks to a Paul Swarthout who gave it as an answer [here] on Database Journal.

1. Create a Unique Index on the column that should be unique but isn't, using the ignore keyword to suppress error messages ... it will zap the fault records instead.

2. Optionally (if you decide you didn't really want the index ;-) ) take the index away again.

Let's see an example of that:



OOOOOOoops!

alter ignore table demo add unique index dwarf_name (name(20))



Features of each Dwarf from Pub Quiz Help - it's not the sort of thing I know much about myself.

If you want to try this for yourself, the data may be found [here].

Aside - before you remove duplicates, be very careful to define what you actually want to do. Do you want to remove records with duplicates in a particular column (which is what this example does) or do you want to remove rows where the complete row (except for the unique ID field?) is identical?