I have a simple table in Postgres with a bit over 8 million rows. The column of interest holds short text strings, typically one or more words with a total length under 100 characters. It is declared as 'character varying (100)', and the column is indexed. A simple lookup like the one below takes more than 3000 ms:
SELECT a, b, c FROM t WHERE a LIKE '?%'
Yes, for now I simply need to find the rows where "a" starts with the entered text. I want to bring the lookup time down to under 100 ms, so that it appears instantaneous. Any suggestions? It seems to me that full text search won't help here because my text values are too short, but I would be happy to try it if it is worthwhile.
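Since the entered text ends up inside a LIKE pattern, any % or _ the user types would act as a wildcard rather than a literal character. A minimal Python sketch of building a safe "starts with" pattern, assuming the text is sent as a bound parameter and that Postgres's default LIKE escape character (backslash) is in effect; the function name like_prefix_pattern is hypothetical:

```python
def like_prefix_pattern(user_text: str) -> str:
    """Build a LIKE pattern meaning "starts with user_text" (hypothetical helper).

    Assumes backslash is the LIKE escape character (the Postgres default).
    Backslash is escaped first so the escapes added below are not doubled.
    """
    escaped = (
        user_text.replace("\\", "\\\\")  # literal backslash  -> \\
        .replace("%", "\\%")             # literal percent    -> \%
        .replace("_", "\\_")             # literal underscore -> \_
    )
    return escaped + "%"  # trailing wildcard: anything may follow the prefix


if __name__ == "__main__":
    print(like_prefix_pattern("abcd"))     # abcd%
    print(like_prefix_pattern("50%_off"))  # 50\%\_off%
```

The result would then be passed as the parameter of a placeholder (for example WHERE a LIKE %s with a driver such as psycopg2) instead of being spliced into '?%' by hand.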
Oh, by the way, I also loaded the exact same data into MongoDB and indexed column "a". Loading the data into MongoDB was amazingly quick (MongoDB++). Both MongoDB and Postgres are practically instantaneous for exact lookups. But Postgres actually shines on trailing-wildcard searches like the one above, consistently taking about a third as long as MongoDB. I would be happy to pursue MongoDB if I could speed that up, since this is a read-only operation.
Update: first, a couple of EXPLAIN ANALYZE outputs:
EXPLAIN ANALYZE SELECT a, b, c FROM t WHERE a LIKE 'abcd%'
"Seq Scan on t (cost=0.00..282075.55 rows=802 width=40)
(actual time=1220.132..1220.132 rows=0 loops=1)"
" Filter: ((a)::text ~~ 'abcd%'::text)"
"Total runtime: 1220.153 ms"
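The Seq Scan here, despite the index on a, is presumably because that index uses the default operator class in a non-C locale, which Postgres cannot use for LIKE. With a suitable operator class (or the C locale), the planner turns a prefix pattern into a plain range, so a LIKE 'abcd%' becomes roughly a >= 'abcd' AND a < 'abce'. A small Python model of that rewrite, where Python's code point ordering stands in for byte-wise ordering (the two agree for UTF-8); prefix_range is a hypothetical helper, not a Postgres function:

```python
from typing import Optional, Tuple


def prefix_range(prefix: str) -> Tuple[str, Optional[str]]:
    """Return (low, high) with low <= s < high exactly when s.startswith(prefix).

    Valid under code point (byte-wise) ordering only. high is None when there is
    no finite upper bound, e.g. for an empty prefix.
    """
    chars = list(prefix)
    # Characters already at the maximum code point cannot be incremented; drop them.
    while chars and ord(chars[-1]) == 0x10FFFF:
        chars.pop()
    if not chars:
        return prefix, None
    chars[-1] = chr(ord(chars[-1]) + 1)  # 'abcd' -> 'abce'
    return prefix, "".join(chars)


if __name__ == "__main__":
    words = ["abc", "abcd", "abcdef", "abce", "abcz", "abd", "ABCD"]
    low, high = prefix_range("abcd")
    print(low, high)  # abcd abce
    in_range = [w for w in words if low <= w < high]        # what a range scan returns
    by_prefix = [w for w in words if w.startswith("abcd")]  # what LIKE 'abcd%' means
    print(in_range == by_prefix)  # True
```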
I actually want to compare Lower(a) with the search term, which is always at least 4 characters long, so:
EXPLAIN ANALYZE SELECT a, b, c FROM t WHERE Lower(a) LIKE 'abcd%'
"Seq Scan on t (cost=0.00..302680.04 rows=40612 width=40)
(actual time=4.681..3321.387 rows=788 loops=1)"
" Filter: (lower((a)::text) ~~ 'abcd%'::text)"
"Total runtime: 3321.504 ms"
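Wrapping the column in Lower() means that even an index on a itself cannot help: the planner only considers an index whose key is the same expression the query filters on, i.e. an expression index on Lower(a) (with a pattern operator class for LIKE). A Python model of such an index, as a sorted list of (lower(a), row id) pairs searched with bisect; the names are hypothetical, and Python's str.lower() may differ from Postgres's lower() for some non-ASCII characters:

```python
import bisect
from typing import List, Tuple


def build_lower_index(values: List[str]) -> List[Tuple[str, int]]:
    """Model of an expression index on lower(a): (lower(a), row_id) pairs in key order."""
    return sorted((v.lower(), row_id) for row_id, v in enumerate(values))


def prefix_lookup(index: List[Tuple[str, int]], prefix: str) -> List[int]:
    """Row ids whose lower(a) starts with lower(prefix), found by a range scan."""
    p = prefix.lower()
    i = bisect.bisect_left(index, (p,))  # (p,) sorts before every (p, row_id)
    rows = []
    # Matching keys form one contiguous run, so stop at the first non-match.
    while i < len(index) and index[i][0].startswith(p):
        rows.append(index[i][1])
        i += 1
    return sorted(rows)


if __name__ == "__main__":
    values = ["Abcdef", "abcz", "ABCD", "xabcd", "abcd street"]
    index = build_lower_index(values)
    print(prefix_lookup(index, "abcd"))  # [0, 2, 4]
```

Because matching keys are contiguous in this ordering, a lookup costs one binary search plus one step per matching row, instead of a pass over every row.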
So I created an index:
CREATE INDEX idx_t ON t USING btree (Lower(Substring(a, 1, 4)));
after which the query plan was:
"Seq Scan on t (cost=0.00..302680.04 rows=40612 width=40)
(actual time=3243.841..3243.841 rows=0 loops=1)"
" Filter: (lower((a)::text) = 'abcd%'::text)"
"Total runtime: 3243.860 ms"
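Two things in this plan stand out. The filter reads lower((a)::text) = 'abcd%', so the query apparently compared with = instead of LIKE, which only matches a value that is literally 'abcd%'. And the filter never mentions Lower(Substring(a, 1, 4)), so the planner had no reason to consider idx_t: an expression index is only usable when the query contains that same expression. A query aimed at this index would presumably need both conditions, along the lines of WHERE Lower(Substring(a, 1, 4)) = 'abcd' AND Lower(a) LIKE 'abcde%' (untested). The same two-step idea, modeled in Python with a hypothetical dict of first-four-character buckets:

```python
from collections import defaultdict
from typing import Dict, List


def build_bucket_index(values: List[str]) -> Dict[str, List[int]]:
    """Model of an index on lower(substring(a, 1, 4)): 4-char key -> row ids."""
    buckets: Dict[str, List[int]] = defaultdict(list)
    for row_id, v in enumerate(values):
        buckets[v[:4].lower()].append(row_id)
    return dict(buckets)


def bucket_lookup(values: List[str], buckets: Dict[str, List[int]], term: str) -> List[int]:
    """Rows whose lower(a) starts with lower(term); term must be >= 4 characters."""
    if len(term) < 4:
        raise ValueError("search term must be at least 4 characters long")
    # Step 1: equality probe on the indexed expression,
    #         like lower(substring(a, 1, 4)) = 'abcd'.
    candidates = buckets.get(term[:4].lower(), [])
    # Step 2: recheck the full prefix, like lower(a) LIKE 'abcde%',
    #         because the term may be longer than the 4-character key.
    p = term.lower()
    return [r for r in candidates if values[r].lower().startswith(p)]


if __name__ == "__main__":
    values = ["Abcdef", "abcz", "ABCD", "xabcd", "abcd street"]
    buckets = build_bucket_index(values)
    print(bucket_lookup(values, buckets, "abcd"))   # [0, 2, 4]
    print(bucket_lookup(values, buckets, "abcde"))  # [0]
```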
It seems the index is only used when I look for an exact match:
EXPLAIN ANALYZE SELECT a, b, c FROM t WHERE a = 'abcd'
"Index Scan using idx_t on geonames (cost=0.00..57.89 rows=13 width=40)
(actual time=40.831..40.923 rows=17 loops=1)"
" Index Cond: ((ascii_name)::text = 'Abcd'::text)"
"Total runtime: 40.940 ms"
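Equality works because the default B-tree is ordered by the database's collation, and equal values sit next to each other under any ordering. A prefix match needs more: all strings sharing the prefix must form one contiguous run, and linguistic collations do not guarantee that, which is why the Postgres documentation says a non-C locale needs a special operator class for LIKE. A Python illustration, using a hypothetical case-insensitive-first sort key as a much simplified stand-in for a real collation:

```python
from typing import List, Tuple


def is_contiguous(sorted_values: List[str], prefix: str) -> bool:
    """True if every value starting with prefix sits in one unbroken run."""
    hits = [i for i, v in enumerate(sorted_values) if v.startswith(prefix)]
    return not hits or hits[-1] - hits[0] + 1 == len(hits)


def locale_like_key(s: str) -> Tuple[str, str]:
    # Hypothetical stand-in for a linguistic collation: compare ignoring case
    # first, then break ties on the exact string. Real collations are more complex.
    return (s.casefold(), s)


if __name__ == "__main__":
    values = ["abcd", "ABCE", "abcz", "Abd", "abce"]
    by_locale = sorted(values, key=locale_like_key)
    by_codepoint = sorted(values)  # byte-wise, like the C locale or *_pattern_ops
    print(by_locale)     # ['abcd', 'ABCE', 'abce', 'abcz', 'Abd']
    print(by_codepoint)  # ['ABCE', 'Abd', 'abcd', 'abce', 'abcz']
    # Case-sensitive LIKE 'abc%': one range in byte order, but split by 'ABCE' here.
    print(is_contiguous(by_locale, "abc"), is_contiguous(by_codepoint, "abc"))  # False True
```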
I found a solution by creating an index with varchar_pattern_ops, and am now looking for even quicker lookups.
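For a pattern-ops index to also serve the Lower(a) query, it has to be built on the lower(a) expression itself, and the query has to use that exact expression. A hedged Python sketch that keeps the two side by side, assuming a DB-API driver with %s placeholders such as psycopg2; the index name idx_t_lower_a_pattern and the helper prefix_query are hypothetical, and text_pattern_ops is used because lower() returns text (varchar_pattern_ops should behave the same here):

```python
from typing import List, Tuple

# Hypothetical index name. The key must be the same lower(a) expression the
# query filters on. lower() returns text, hence text_pattern_ops here;
# varchar_pattern_ops is in the same operator family and should also work.
INDEX_DDL = "CREATE INDEX idx_t_lower_a_pattern ON t (lower(a) text_pattern_ops);"

QUERY = "SELECT a, b, c FROM t WHERE lower(a) LIKE %s"


def prefix_query(term: str) -> Tuple[str, List[str]]:
    """Return (sql, params) for a case-insensitive "starts with" lookup."""
    # Lowercase to match lower(a); escape LIKE metacharacters, backslash first.
    p = term.lower().replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return QUERY, [p + "%"]


if __name__ == "__main__":
    print(prefix_query("AbCd"))  # ('SELECT a, b, c FROM t WHERE lower(a) LIKE %s', ['abcd%'])
```

With psycopg2 this would be used as cur.execute(*prefix_query("abcd")). psycopg2 merges parameters on the client, so the server sees a literal pattern; that matters because the planner needs the pattern's value to turn LIKE into an index range, and a generic plan for a server-side prepared lower(a) LIKE $1 may not use the index.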