Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
2.0k views
in Technique[技术] by (71.8m points)

mysql - Remove duplicate rows leaving oldest row Only?

I have a table of data and there are many duplicate entries from user submissions.

I want to delete all duplicates rows based on the field subscriberEmail, leaving only the original submission.

In other words, I want to search for all duplicate emails, and delete those rows, leaving only the original.

How can I do this without swapping tables?
My table contains unique IDs for each row.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Since you're using the id column as an indicator of which record is 'original':

delete x 
from myTable x
 join myTable z on x.subscriberEmail = z.subscriberEmail
where x.id > z.id

This will leave one record per email address.

edit to add:

To explain the query above...

The idea here is to join the table against itself. Pretend that you have two copies of the table, each named something different. Then you could compare them to each other, and find the lowest id or for each email address. You'd then see the duplicate records that were created later on and could delete them. (I was visualizing Excel when thinking about this.)

In order to do that operation on a table, compare it to itself and be able to identify each side, you use table aliases. x is a table alias. It is assigned in the from clause like so: from <table> <alias>. x can now be used elsewhere in the same query to refer to that table as a shortcut.

delete x starts the query off with our action and target. We're going to perform a query to select records from multiple tables, and we want to delete records that appear in x.

Aliases are used to refer to both 'instances' of the table. from myTable x join myTable z on x.subscriberEmail = z.subscriberEmail bumps the table up against itself where the emails match. Without the where clause that follows, every record would be selected as it could be joined up against itself.

The where clause limits the records that are selected. where x.id > z.id allows the 'instance' aliased x to contain only the records that match emails but have a higher id value. The data that you really want in the table, unique email addresses (with the lowest id) will not be part of x and will not be deleted. The only records in x will be duplicate records (email addresses) that have a higher id than the original record for that email address.

The join and where clauses could be combined in this case:

delete x 
  from myTable x 
  join myTable z
    on x.subscriberEmail = z.subscriberEmail
      and x.id > z.id

For preventing duplicates, consider making the subscriberEmail column a UNIQUE indexed column.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...