Performance comparison
suffixtree
vs. pytrie
vs. trie
vs. datrie
vs. startswith
-functions
Setup
The recorded time is a minimum time among 3 repetitions of 1000 searches. A trie construction time is included and spread among all searches. The search is performed on collections of hostnames from 1 to 1000000 items.
Three types of a search string:
non_existent_key
- there is no match for the string
rare_key
- around 20 in a million
frequent_key
- number of occurrences is comparable to the collection size
Results
Maximum memory consumption for a million urls:
| function | memory, | ratio |
| | GiB | |
|-------------+---------+-------|
| suffix_tree | 0.853 | 1.0 |
| pytrie | 3.383 | 4.0 |
| trie | 3.803 | 4.5 |
| datrie | 0.194 | 0.2 |
| startswith | 0.069 | 0.1 |
#+TBLFM: $3=$2/@3$2;%.1f
To reproduce the results, run the trie benchmark code.
rare_key/nonexistent_key case
if number of urls is less than 10000 then datrie is the fastest, for
N>10000 - suffixtree
is faster, startwith
is significally slower on average.
axes:
- vertical (time) scale is ~1 second (2**20 microseconds)
- horizontal axis shows total number of urls in each case: N= 1, 10, 100, 1000, 10000, 100000, and 1000000 (a million).
frequent_key
Upto N=100000 datrie
is the fastest (for a million urls the time is
dominated by the trie construction time).
The most time is taken by finding the longest match among found matches. So all functions behave similar as expected.
startswith
- time performance is independent from type of key.
trie
and pytrie
behave similar to each other.
Performance without trie construction time
datrie
- the fastest, decent memory consumption
startswith
is even more at disadvantage here because other approaches are not penalized by the time it takes to build a trie.
datrie
, pytrie
, trie
- almost O(1) (constant time) for rare/non_existent key
Fitting (approximating) polynoms of known functions for comparison (same log/log scale as in figures):
| Fitting polynom | Function |
|------------------------------+-------------------|
| 0.15 log2(N) + 1.583 | log2(N) |
| 0.30 log2(N) + 3.167 | log2(N)*log2(N) |
| 0.50 log2(N) + 1.111e-15 | sqrt(N) |
| 0.80 log2(N) + 7.943e-16 | N**0.8 |
| 1.00 log2(N) + 2.223e-15 | N |
| 2.00 log2(N) + 4.446e-15 | N*N |
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…