This is a case of relational-division. I added the tag.
Indexes
Assuming a PK or UNIQUE constraint on USER_PROPERTY_MAP(property_value_id, user_id)
- columns in this order to make my queries fast. Related:
You should also have an index on PROPERTY_VALUE(value, property_name_id, id)
. Again, columns in this order. Add the the last column id
only if you get index-only scans out of it.
For a given number of properties
There are many ways to solve it. This should be one of the simplest and fastest for exactly two properties:
SELECT u.*
FROM users u
JOIN user_property_map up1 ON up1.user_id = u.id
JOIN user_property_map up2 USING (user_id)
WHERE up1.property_value_id =
(SELECT id FROM property_value WHERE property_name_id = 1 AND value = '101')
AND up2.property_value_id =
(SELECT id FROM property_value WHERE property_name_id = 2 AND value = '102')
-- AND u.user_name = 'user1' -- more filters?
-- AND u.city = 'city1'
Not visiting table PROPERTY_NAME
, since you seem to have resolved property names to IDs already, according to your example query. Else you could add a join to PROPERTY_NAME
in each subquery.
We have assembled an arsenal of techniques under this related question:
For an unknown number of properties
@Mike and @Valera have very useful queries in their respective answers. To make this even more dynamic:
WITH input(property_name_id, value) AS (
VALUES -- provide n rows with input parameters here
(1, '101')
, (2, '102')
-- more?
)
SELECT *
FROM users u
JOIN (
SELECT up.user_id AS id
FROM input
JOIN property_value pv USING (property_name_id, value)
JOIN user_property_map up ON up.property_value_id = pv.id
GROUP BY 1
HAVING count(*) = (SELECT count(*) FROM input)
) sub USING (id);
Only add / remove rows from the VALUES
expression. Or remove the WITH
clause and the JOIN
for no property filters at all.
The problem with this class of queries (counting all partial matches) is performance. My first query is less dynamic, but typically considerably faster. (Just test with EXPLAIN ANALYZE
.) Especially for bigger tables and a growing number of properties.
Best of both worlds?
This solution with a recursive CTE should be a good compromise: fast and dynamic:
WITH RECURSIVE input AS (
SELECT count(*) OVER () AS ct
, row_number() OVER () AS rn
, *
FROM (
VALUES -- provide n rows with input parameters here
(1, '101')
, (2, '102')
-- more?
) i (property_name_id, value)
)
, rcte AS (
SELECT i.ct, i.rn, up.user_id AS id
FROM input i
JOIN property_value pv USING (property_name_id, value)
JOIN user_property_map up ON up.property_value_id = pv.id
WHERE i.rn = 1
UNION ALL
SELECT i.ct, i.rn, up.user_id
FROM rcte r
JOIN input i ON i.rn = r.rn + 1
JOIN property_value pv USING (property_name_id, value)
JOIN user_property_map up ON up.property_value_id = pv.id
AND up.user_id = r.id
)
SELECT u.*
FROM rcte r
JOIN users u USING (id)
WHERE r.ct = r.rn; -- has all matches
dbfiddle here
The manual about recursive CTEs.
The added complexity does not pay for small tables where the additional overhead outweighs any benefit or the difference is negligible to begin with. But it scales much better and is increasingly superior to "counting" techniques with growing tables and a growing number of property filters.
Counting techniques have to visit all rows in user_property_map
for all given property filters, while this query (as well as the 1st query) can eliminate irrelevant users early.
Optimizing performance
With current table statistics (reasonable settings, autovacuum
running), Postgres has knowledge about "most common values" in each column and will reorder joins in the 1st query to evaluate the most selective property filters first (or at least not the least selective ones). Up to a certain limit: join_collapse_limit
. Related:
This "deus-ex-machina" intervention is not possible with the 3rd query (recursive CTE). To help performance (possibly a lot) you have to place more selective filters first yourself. But even with the worst-case ordering it will still outperform counting queries.
Related:
Much more gory details:
More explanation in the manual: