
android - SQLite: efficient way to drop lots of rows

SQLite, Android, true story. I have a table which I use as a cache:

CREATE TABLE cache(key TEXT, ts TIMESTAMP, size INTEGER, data BLOB);
CREATE UNIQUE INDEX by_key ON cache(key);
CREATE INDEX by_ts ON cache(ts);

During the app's lifetime I fill the cache, and at some point I want to clear it out and drop N records. Typically this table will contain ~25000 blobs of ~100-500 KB each, with a total blob size of 600-800 MB in the DB, but for now I am testing with ~2000 blobs totalling about 60 MB (the numbers below are for this case). Clearing removes 90% of the cache entries.

I tried different ways to do it; here is a brief description of each:

[1] Worst and simplest. First select, then remove rows one by one, walking the cursor. Terribly slow.

[2] Make SQLite do it with a single query (delete the oldest blobs totalling N bytes):

DELETE FROM cache WHERE
  ROWID IN (SELECT ROWID FROM cache WHERE
             (SELECT SUM(size) FROM cache AS _ WHERE _.ts <= cache.ts) <= N);

This is faster, but still terribly slow: ~15 sec. It also seems to have quadratic complexity, since the correlated subquery recomputes the running total from scratch for every row.
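A roughly linear alternative, as a sketch assuming a SQLite with window-function support (3.25+, which only recent Android versions bundle), computes the running total in a single pass:

DELETE FROM cache WHERE
  ROWID IN (SELECT id FROM
             (SELECT ROWID AS id, SUM(size) OVER (ORDER BY ts) AS running
                FROM cache)
            WHERE running <= N);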

[3] Select a cutoff row (using the average blob size to compute the offset) and delete with a simple WHERE clause:

-- Find the cutoff row; let its timestamp be T0:
SELECT ts FROM cache ORDER BY ts LIMIT 1 OFFSET count;
-- Delete
DELETE FROM cache WHERE ts < T0;

This is much better, but takes ~7 sec.

[4] Create a new table, copy the rows I need to keep, and drop the old one. Note that I create the indices on the new table AFTER copying all the data:

  -- Insert only the rows I want to keep
  INSERT INTO temp(key, ts, size, data) SELECT key, ts, size, data
    FROM cache ORDER BY ts LIMIT count;
  -- Drop the old table and its indices.
  DROP INDEX by_key;
  DROP INDEX by_ts;
  DROP TABLE cache;
  -- Rename the temp table and recreate the indices, for example:
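  -- (Sketch of the elided step; the index definitions come from the schema above.)
  ALTER TABLE temp RENAME TO cache;
  CREATE UNIQUE INDEX by_key ON cache(key);
  CREATE INDEX by_ts ON cache(ts);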

Copying ~6 MB of blobs takes ~300 ms, but DROP TABLE takes about 8 sec.

Note that in all cases I also run VACUUM, which takes another ~1 sec. How can I make this fast? Why are DROP TABLE and deletion so slow? I think it might be because of the indices: when I dropped the key index before the DELETE, it ran faster. How can I make SQLite delete fast?


1 Reply


You are working on a database with "big" data, i.e., each blob spans multiple database pages.

At some point, near optimal performance, you will reach a limit you can't improve past.

Checking all your choices, I see different behaviors, not just different algorithms.

[1] This one shouldn't be terribly slow as long as you use a transaction. You are doing two operations per row, a query (to get the blob size) and a delete.
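As a minimal sketch in raw SQL (the key values are hypothetical), wrapping the per-row deletes in one transaction means a single commit and a single sync to disk, instead of one per DELETE:

BEGIN;
-- one DELETE per row walked by the cursor
DELETE FROM cache WHERE key = 'entry-0001';
DELETE FROM cache WHERE key = 'entry-0002';
-- ...
COMMIT;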

[2] This is a good approach: two queries and a delete, all in a single command, so the SQLite engine can optimize.

[3] This behaves differently from all the previous ones. It is the same as:

DELETE FROM cache WHERE ts < (SELECT ts FROM cache ORDER BY ts LIMIT 1 OFFSET count);

The query is less expensive than the previous one, but I bet the number of rows deleted is far smaller too! The expensive part of query-plus-delete will be the delete. Query optimization is important, but things will always get slower on the delete.

[4] This is a very bad approach!!! Copying all your data to a new table (maybe even to another database) will be VERY expensive. I see only one advantage here: if you copy the data to a new database, you can skip the VACUUM, since the new database is built from scratch and is already clean.

About VACUUM... worse than DELETE is VACUUM. It is not supposed to be run often on a database. I understand this operation is supposed to "clean" your database, but cleaning shouldn't be a frequent task: databases are optimized for select/insert/delete/update, not for keeping all data at a minimal size.
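If the goal is just to keep the file size bounded without frequent full VACUUMs, one option (a sketch, assuming the cache lives in its own database file) is SQLite's incremental auto-vacuum:

-- Must be set before the first table is created, or be followed by one
-- full VACUUM to rebuild the file; otherwise it has no effect.
PRAGMA auto_vacuum = INCREMENTAL;
-- Later, after deleting rows, reclaim up to 500 free pages cheaply:
PRAGMA incremental_vacuum(500);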

My choice would be a single DELETE ... WHERE ROWID IN (SELECT ...) operation, according to predefined criteria. VACUUM wouldn't be used, at least not that often. One good option is to monitor the database size: when it runs over a limit, run the admittedly expensive cleanup to trim the database.
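Monitoring the size can be done from SQLite itself; a sketch using standard pragmas (the threshold check itself would live in application code):

PRAGMA page_count;      -- total pages in the database file
PRAGMA page_size;       -- bytes per page; file size ~ page_count * page_size
PRAGMA freelist_count;  -- pages inside the file that are already free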

Lastly, when using multiple commands, never forget to wrap them in a transaction!

