Tuesday 23 January, 2007

Dear friends,

It is much complex task to delete the duplicated data from the database. We generall work in many steps to delete all these including creating temparory table and so on.

This is the single statement which will work for you.

All you need to do is compare the table to itself to find out which candidates are duplicates. Do this by assigning aliases to the table so you can use it twice, once as A and again as B, like this:

delete from jobs where job_desc in
( select a.job_desc from jobs a, jobs b where a.job_desc = b.job_desc
group by a.job_desc
having count(a.job_desc) >1
)


When you do this you’ll get a count based on the column value you think is duplicated. I used “desc” because the IDs will be different, so the description is the thing that is the candidate for repetition. Join the table to itself on that candidate to find matches of it. Everything will match to itself at least once that’s why you group by the thing you think is a duplicate. Applying the HAVING clause to it squeezes out all the “ones” or singletons, leaving only the rows that have counts that are more than one in other words, your duplicate rows.

By the way, this code trashes all the records that are duplicates. If you want to save one, add a comparison for the IDs to be different in the WHERE clause.

Hits4Pay