Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
805 views
in Technique[技术] by (71.8m points)

postgresql - SQL query for index/primary key ordinal

In our on-line contest system, there is a frequently changing table standings with integer columns (user_id, score). Both are indexed with a unique constraint. Two kinds of queries are required:

  1. Given a score not in the table, return the 1-based position that the score would occupy if it were inserted.
  2. Given a user_id in the table, return the position of the corresponding score.

In both cases, position is with respect to score ascending: a new score smaller than all currently in the table will have position 1.

Here's the tough part: we probably can't afford a table scan. The table may have up to 10 million records, and we need to handle at least 40 queries per second.

How to do this in PostgreSQL?

I have a non-SQL solution in Berkeley DB that uses its logical record number-enabled B-trees. It easily has good enough performance. But we would like to get rid of the BDB by re-implementing with a PostgreSQL query. I have tried the obvious

select 1+count(*) from standings where score < ? limit 1;

This causes a table scan.

I expect the answer to be "no way" because the logical record number facility of BDB requires locking the entire B-Tree for each edit. To get O(log N) performance, it relies on leaf counts in each node. All these counts in the path to root must change with every edit; hence, the locking. Such locking is against the design principles of PostgreSQL and probably any multi-user database.

So if the problem can't be solved with PostgreSQL, confirmation of this is the next best result of this question.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

With a regular table, there is not much you can do in PostgreSQL 9.1. count() results in a table scan, because indexes do not have visibility information. To verify the rows are not deleted in the meantime, PostgreSQL has to visit the table.

If the table is read-only (or rarely updated), you could add a row number to the table. Then a query like:

SELECT rownumber+1
FROM   standings
WHERE  score < ?
ORDER  BY score DESC
LIMIT  1;

With an index:

CREATE INDEX standings_score_idx ON standings (score DESC);

Would get the result almost instantly. However, that's not an option for a table with write load for obvious reasons. So not for you.


The good news: one of the major new features of the upcoming PostgreSQL 9.2 is just right for you: "Covering index" or "index-only scan". I quote the 9.2 release notes here:

Allow queries to retrieve data only from indexes, avoiding heap access (Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane)

This is often called "index-only scans" or "covering indexes". This is possible for heap pages with exclusively all-visible tuples, as reported by the visibility map. The visibility map was made crash-safe as a necessary part of implementing this feature.

This blog post by Robert Haas has more details how this affects count performance. It helps performance even with a WHERE clause, like in your case.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...