This is an example of the greatest-n-per-group
problem that has appeared regularly on StackOverflow.
Here's how I usually recommend solving it:
SELECT c.*, p1.*
FROM customer c
JOIN purchase p1 ON (c.id = p1.customer_id)
LEFT OUTER JOIN purchase p2 ON (c.id = p2.customer_id AND
(p1.date < p2.date OR (p1.date = p2.date AND p1.id < p2.id)))
WHERE p2.id IS NULL;
Explanation: given a row p1
, there should be no row p2
with the same customer and a later date (or in the case of ties, a later id
). When we find that to be true, then p1
is the most recent purchase for that customer.
Regarding indexes, I'd create a compound index in purchase
over the columns (customer_id
, date
, id
). That may allow the outer join to be done using a covering index. Be sure to test on your platform, because optimization is implementation-dependent. Use the features of your RDBMS to analyze the optimization plan. E.g. EXPLAIN
on MySQL.
Some people use subqueries instead of the solution I show above, but I find my solution makes it easier to resolve ties.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…