Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
842 views
in Technique[技术] by (71.8m points)

mysql - Delete huge amounts of data from huge table

I have two tables. Let's call them KEY and VALUE.
KEY is small, somewhere around 1.000.000 records.
VALUE is huge, say 1.000.000.000 records.

Between them there is a connection such that each KEY might have many VALUES. It's not a foreign key but basically the same meaning.

The DDL looks like this

create table KEY (
 key_id int,
 primary key (key_id)
);

create table VALUE (
 key_id int,
 value_id int,
 primary key (key_id, value_id)
);

Now, my problem. About half of all key_ids in VALUE have been deleted from KEY and I need to delete them in a orderly fashion while both tables are still under high load.

It would be easy to do

delete v 
  from VALUE v
  left join KEY k using (key_id)
 where k.key_id is null;

However, as it's not allowed to have a limit on multi table delete I don't like this approach. Such a delete would take hours to run and that makes it impossible to throttle the deletes.

Another approach is to create cursor to find all missing key_ids and delete them one by one with a limit. That seems very slow and kind of backwards.

Are there any other options? Some nice tricks that could help?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Any solution that tries to delete so much data in one transaction is going to overwhelm the rollback segment and cause a lot of performance problems.

A good tool to help is pt-archiver. It performs incremental operations on moderate-sized batches of rows, as efficiently as possible. pt-archiver can copy, move, or delete rows depending on options.

The documentation includes an example of deleting orphaned rows, which is exactly your scenario:

pt-archiver --source h=host,D=db,t=VALUE --purge 
  --where 'NOT EXISTS(SELECT * FROM `KEY` WHERE key_id=`VALUE`.key_id)' 
  --limit 1000 --commit-each

Executing this will take significantly longer to delete the data, but it won't use too many resources, and without interrupting service on your existing database. I have used it successfully to purge hundreds of millions of rows of outdated data.

pt-archiver is part of the Percona Toolkit for MySQL, a free (GPL) set of scripts that help common tasks with MySQL and compatible databases.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...