In our on-line contest system, there is a frequently changing table standings
with integer columns (user_id, score)
. Both are indexed with a unique constraint. Two kinds of queries are required:
- Given a
score
not in the table, return the 1-based position that the score would occupy if it were inserted.
- Given a
user_id
in the table, return the position of the corresponding score.
In both cases, position is with respect to score ascending: a new score smaller than all currently in the table will have position 1.
Here's the tough part: we probably can't afford a table scan. The table may have up to 10 million records, and we need to handle at least 40 queries per second.
How to do this in PostgreSQL?
I have a non-SQL solution in Berkeley DB that uses its logical record number-enabled B-trees. It easily has good enough performance. But we would like to get rid of the BDB by re-implementing with a PostgreSQL query. I have tried the obvious
select 1+count(*) from standings where score < ? limit 1;
This causes a table scan.
I expect the answer to be "no way" because the logical record number facility of BDB requires locking the entire B-Tree for each edit. To get O(log N) performance, it relies on leaf counts in each node. All these counts in the path to root must change with every edit; hence, the locking. Such locking is against the design principles of PostgreSQL and probably any multi-user database.
So if the problem can't be solved with PostgreSQL, confirmation of this is the next best result of this question.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…