Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
236 views
in Technique[技术] by (71.8m points)

postgresql - Two SQL LEFT JOINS produce incorrect result

I have 3 tables:

users(id, account_balance)
grocery(user_id, date, amount_paid)
fishmarket(user_id, date, amount_paid)

Both fishmarket and grocery tables may have multiple occurrences for the same user_id with different dates and amounts paid or have nothing at all for any given user. When I try the following query:

SELECT
     t1."id" AS "User ID",
     t1.account_balance AS "Account Balance",
     count(t2.user_id) AS "# of grocery visits",
     count(t3.user_id) AS "# of fishmarket visits"
FROM users t1
LEFT OUTER JOIN grocery t2 ON (t2.user_id=t1."id") 
LEFT OUTER JOIN fishmarket t3 ON (t3.user_id=t1."id") 
GROUP BY t1.account_balance,t1.id
ORDER BY t1.id

It produces an incorrect results: "1", "12", "12".
But when I try to LEFT JOIN to just one table it produces a correct results for either grocery or fishmarket visits, which are "1", "3", "4".

What am I doing wrong here?
I am using PostgreSQL 9.1.

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Joins are processed left to right (unless parentheses dictate otherwise). If you LEFT JOIN (or just JOIN, similar effect) three groceries to one user you get 3 rows (1 x 3). If you then join 4 fishmarkets for the same user, you get 12 (3 x 4) rows, multiplying the previous count in the result, not adding to it, like you may have hoped for.
Thereby multiplying the visits for groceries and fishmarkets alike.

You can make it work like this:

SELECT u.id
     , u.account_balance
     , g.grocery_visits
     , f.fishmarket_visits
FROM   users u
LEFT   JOIN (
   SELECT user_id, count(*) AS grocery_visits
   FROM   grocery
   GROUP  BY user_id
   ) g ON g.user_id = u.id
LEFT   JOIN (
   SELECT user_id, count(*) AS fishmarket_visits
   FROM   fishmarket
   GROUP  BY user_id
   ) f ON f.user_id = u.id
ORDER  BY u.id;

To get aggregated values for one or few users, correlated subqueries like @Vince provided are just fine. For a whole table or major parts of it, it is (much) more efficient to aggregate the n-tables and join to the result once. This way, we also do not need another GROUP BY in the outer query.

grocery_visits and fishmarket_visits are NULL for users without any related entries in the respective tables. If you need 0 instead (or any arbitrary number), use COALESCE in the outer SELECT:

SELECT u.id
     , u.account_balance
     , COALESCE(g.grocery_visits   , 0) AS grocery_visits
     , COALESCE(f.fishmarket_visits, 0) AS fishmarket_visits
FROM   ...

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...