A single disk seek takes about 15ms, maybe a little less with server-grade disks. A response time of less than 500ms therefore limits you to about 30 random disk accesses (500ms / 15ms ≈ 33). That is not a lot.
On my tiny laptop, I have a development database with
root@localhost [kris]> select @@innodb_buffer_pool_size/1024/1024 as pool_mb;
+--------------+
| pool_mb      |
+--------------+
| 128.00000000 |
+--------------+
1 row in set (0.00 sec)
and a slow laptop disk. I created a score table with
root@localhost [kris]> show create table score\G
*************************** 1. row ***************************
Table: score
Create Table: CREATE TABLE `score` (
`player_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`score` int(11) NOT NULL,
PRIMARY KEY (`player_id`),
KEY `score` (`score`)
) ENGINE=InnoDB AUTO_INCREMENT=2490316 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
with random integer scores and sequential player_id values. We have
root@localhost [kris]> select count(*)/1000/1000 as mrows from score\G
*************************** 1. row ***************************
mrows: 2.09715200
1 row in set (0.39 sec)
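For reference, a table like this can be filled by seeding one row and then repeatedly doubling it with INSERT ... SELECT of random scores. The following is just a sketch of one way to produce such test data, not necessarily how the table above was built; 21 doublings give 2^21 = 2097152 rows, matching the count above.

INSERT INTO score (score) VALUES (FLOOR(RAND() * 100000));
-- Each repetition doubles the row count; repeating it 21 times
-- yields 2^21 = 2097152 rows with random scores.
INSERT INTO score (score) SELECT FLOOR(RAND() * 100000) FROM score;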
The database maintains the pair (score, player_id) in score order in the index score: data in an InnoDB index is stored in a BTREE, and the row pointer (data pointer) is the primary key value, so the definition KEY (score) ends up being KEY (score, player_id) internally. We can prove that by looking at the query plan for a score retrieval:
root@localhost [kris]> explain select * from score where score = 17\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: score
type: ref
possible_keys: score
key: score
key_len: 4
ref: const
rows: 29
Extra: Using index
1 row in set (0.00 sec)
As you can see, the key score is being used with Using index, meaning that no data access outside the index is necessary.
The ranking query for a given constant player_id takes precisely 500ms on my laptop:
root@localhost [kris]> select p.*, count(*) as rank
from score as p join score as s on p.score < s.score
where p.player_id = 479269\G
*************************** 1. row ***************************
player_id: 479269
score: 99901
rank: 2074
1 row in set (0.50 sec)
With more memory and on a faster box it can be quicker, but it is still a comparatively expensive operation, because the plan sucks:
root@localhost [kris]> explain select p.*, count(*) as rank from score as p join score as s on p.score < s.score where p.player_id = 479269;
+----+-------------+-------+-------+---------------+---------+---------+-------+---------+--------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows    | Extra                    |
+----+-------------+-------+-------+---------------+---------+---------+-------+---------+--------------------------+
|  1 | SIMPLE      | p     | const | PRIMARY,score | PRIMARY | 4       | const |       1 |                          |
|  1 | SIMPLE      | s     | index | score         | score   | 4       | NULL  | 2097979 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+-------+---------+--------------------------+
2 rows in set (0.00 sec)
As you can see, the second table in the plan is an index scan, so the query slows down linearly with the number of players.
If you want a full leaderboard, you have to leave off the WHERE clause, and then you get two scans and quadratic execution time, so this plan implodes completely.
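For illustration, the full-leaderboard variant of that join could look like the sketch below. It is not taken from the measurements above; it uses a LEFT JOIN so that players nobody beats still show up with rank 0.

-- Sketch: rank every player by counting the players with a strictly
-- better score. Without a constant player_id, both sides of the
-- self-join cover the whole table, so the work grows roughly
-- quadratically with the number of players.
SELECT p.player_id, p.score, COUNT(s.player_id) AS rank
FROM score AS p
LEFT JOIN score AS s ON p.score < s.score
GROUP BY p.player_id, p.score
ORDER BY rank;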
Time to go procedural here:
root@localhost [kris]> set @count = 0;
select *, @count := @count + 1 as rank from score where score >= 99901 order by score desc ;
...
|   2353218 | 99901 | 2075 |
|   2279992 | 99901 | 2076 |
|   2264334 | 99901 | 2077 |
|   2239927 | 99901 | 2078 |
|   2158161 | 99901 | 2079 |
|   2076159 | 99901 | 2080 |
|   2027538 | 99901 | 2081 |
|   1908971 | 99901 | 2082 |
|   1887127 | 99901 | 2083 |
|   1848119 | 99901 | 2084 |
|   1692727 | 99901 | 2085 |
|   1658223 | 99901 | 2086 |
|   1581427 | 99901 | 2087 |
|   1469315 | 99901 | 2088 |
|   1466122 | 99901 | 2089 |
|   1387171 | 99901 | 2090 |
|   1286378 | 99901 | 2091 |
|    666050 | 99901 | 2092 |
|    633419 | 99901 | 2093 |
|    479269 | 99901 | 2094 |
|    329168 | 99901 | 2095 |
|    299189 | 99901 | 2096 |
|    290436 | 99901 | 2097 |
...
Because this is a procedural plan, it is unstable:
- You cannot use LIMIT, because that will offset the counter. Instead you have to download all this data.
- You cannot really sort. This ORDER BY clause works because it does not actually sort, but reads the rows in index order. As soon as you see Using filesort, the counter values will be wildly off.
It is the solution that comes closest to what a NoSQL (read: procedural) database will do as an execution plan, though.
We can, however, stabilize this procedural approach inside a subquery and then slice out the part that is of interest to us:
root@localhost [kris]> set @count = 0;
select * from (
select *, @count := @count + 1 as rank
from score
where score >= 99901
order by score desc
) as t
where player_id = 479269;
Query OK, 0 rows affected (0.00 sec)
+-----------+-------+------+
| player_id | score | rank |
+-----------+-------+------+
|    479269 | 99901 | 2094 |
+-----------+-------+------+
1 row in set (0.00 sec)
root@localhost [kris]> set @count = 0;
select * from (
select *, @count := @count + 1 as rank
from score
where score >= 99901
order by score desc
) as t
where rank between 2090 and 2100;
Query OK, 0 rows affected (0.00 sec)
+-----------+-------+------+
| player_id | score | rank |
+-----------+-------+------+
|   1387171 | 99901 | 2090 |
|   1286378 | 99901 | 2091 |
|    666050 | 99901 | 2092 |
|    633419 | 99901 | 2093 |
|    479269 | 99901 | 2094 |
|    329168 | 99901 | 2095 |
|    299189 | 99901 | 2096 |
|    290436 | 99901 | 2097 |
+-----------+-------+------+
8 rows in set (0.01 sec)
The subquery materializes the former result set as an ad-hoc table named t, which we can then access in the outer query. Because it is an ad-hoc table, it has no index in MySQL, which limits what the outer query can do efficiently.
Note how both queries satisfy your timing constraint, though. Here is the plan:
root@localhost [kris]> set @count = 0; explain select * from ( select *, @count := @count + 1 as rank from score where score >= 99901 order by score desc ) as t where rank between 2090 and 2100\G
Query OK, 0 rows affected (0.00 sec)
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: <derived2>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 2097
Extra: Using where
*************************** 2. row ***************************
id: 2
select_type: DERIVED
table: score
type: range
possible_keys: score
key: score
key_len: 4
ref: NULL
rows: 3750
Extra: Using where; Using index
2 rows in set (0.00 sec)
Both query components (the inner DERIVED query and the outer BETWEEN constraint) will get slower for badly ranked players, though, and then grossly violate your timing constraints.
root@localhost [kris]> set @count = 0; select * from ( select *, @count := @count + 1 as rank from score where score >= 0 order by score desc ) as t;
...
2097152 rows in set (3.56 sec)
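If you do need the full ranking, or repeated slices of it, one workaround for the missing index on the derived table is to materialize the ranks into a real temporary table with the indexes you need. This is only a sketch of the idea, not part of the measurements above (on MySQL 8.0 and later the name rank would also have to be quoted, since it became a reserved word there):

SET @count = 0;
-- Materialize the ranking once, with indexes, so subsequent lookups and
-- slices can use indexed access instead of scanning a derived table.
CREATE TEMPORARY TABLE ranked (PRIMARY KEY (player_id), KEY (rank))
  SELECT player_id, score, @count := @count + 1 AS rank
    FROM score
   ORDER BY score DESC;
SELECT * FROM ranked WHERE player_id = 479269;          -- indexed point lookup
SELECT * FROM ranked WHERE rank BETWEEN 2090 AND 2100;  -- indexed rank slice

Building that table still costs the full index scan shown above, so it only pays off when the materialized ranking is queried more than once.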
The execution time for the descriptive approach is stable (dependent only on table size):
root@localhost [kris]> select p.*, count(*) as rank
from score as p join score as s on p.score < s.score
where p.player_id = 1134026;
+-----------+-------+---------+
| player_id | score | rank    |
+-----------+-------+---------+
|   1134026 |     0 | 2097135 |
+-----------+-------+---------+
1 row in set (0.53 sec)
Your call.