I'm tracking views of different pages, and I want to know the highest page per session so I can tell how far a user has clicked through in any given session (users are required to view every page in order, all the way to the end).
Ordering before grouping is a highly unreliable way to do this.
MySQL extends the GROUP BY syntax: you can use ungrouped and unaggregated fields in the SELECT and ORDER BY clauses. In this case, an arbitrary value of page is output for each session.
The documentation explicitly states that you should never make any assumptions about which value it will be:
Do not use this feature if the columns you omit from the GROUP BY part are not constant in the group. The server is free to return any value from the group, so the results are indeterminate unless all values are the same.
However, in practice, the values from the first row scanned are returned. Since you are using ORDER BY page DESC in your subquery, that row happens to be the row with the maximal page per session.
You shouldn't rely on it: this behaviour is undocumented, and if a different row is returned in a future version, it will not be considered a bug.
But you don't even have to do such nasty tricks.
Just use aggregate functions:
SELECT  MAX(page)
FROM    views
WHERE   user_id = '1'
GROUP BY
        session
This is a documented and clean way to do what you want.
Create a composite index on (user_id, session, page) for the query to run faster.
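As a quick sanity check of the aggregate approach, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows, the column types, and the index name ix_views_user_session_page are made up for illustration; the table and column names follow the query above. The session column is added to the select list only so that each maximum can be matched to its session.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE views (
        id      INTEGER PRIMARY KEY,
        user_id TEXT,
        session TEXT,
        page    INTEGER
    );
    -- Hypothetical index name; the column order follows the suggestion above.
    CREATE INDEX ix_views_user_session_page
        ON views (user_id, session, page);
""")

# Made-up sample data: user '1' has two sessions, user '2' has one.
conn.executemany(
    "INSERT INTO views (user_id, session, page) VALUES (?, ?, ?)",
    [
        ("1", "a", 1), ("1", "a", 3), ("1", "a", 2),
        ("1", "b", 5), ("1", "b", 4),
        ("2", "c", 9),
    ],
)

def max_page_per_session(conn, user_id):
    """Return {session: highest page} for one user via MAX() and GROUP BY."""
    rows = conn.execute(
        """
        SELECT  session, MAX(page)
        FROM    views
        WHERE   user_id = ?
        GROUP BY
                session
        """,
        (user_id,),
    ).fetchall()
    return dict(rows)

result = max_page_per_session(conn, "1")
print(result)
```

With this sample data, session a maps to page 3 and session b to page 5, regardless of the order in which the rows were inserted.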
If you need all columns from your table, not only the aggregated ones, use this syntax:
SELECT  v.*
FROM    (
        -- one row per (user_id, session) pair
        SELECT  DISTINCT user_id, session
        FROM    views
        ) vo
JOIN    views v
ON      v.id =
        (
        -- id of the row with the highest page in that session
        SELECT  id
        FROM    views vi
        WHERE   vi.user_id = vo.user_id
                AND vi.session = vo.session
        ORDER BY
                page DESC
        LIMIT 1
        )
This assumes that id is a PRIMARY KEY on views.
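The full-row query can be tried out the same way. This is only a sketch under stated assumptions: it runs against Python's built-in sqlite3 module rather than MySQL, and the sample rows and column types are invented for the example. Only the table name, the column names, and the query itself come from the answer above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE views (
        id      INTEGER PRIMARY KEY,
        user_id TEXT,
        session TEXT,
        page    INTEGER
    )
""")

# Made-up sample data with explicit ids so the expected rows are easy to spot.
conn.executemany(
    "INSERT INTO views (id, user_id, session, page) VALUES (?, ?, ?, ?)",
    [
        (1, "1", "a", 1), (2, "1", "a", 3), (3, "1", "a", 2),
        (4, "1", "b", 5), (5, "1", "b", 4),
        (6, "2", "c", 9),
    ],
)

def top_rows_per_session(conn):
    """Return the complete row holding the highest page of every session."""
    rows = conn.execute(
        """
        SELECT  v.*
        FROM    (
                SELECT  DISTINCT user_id, session
                FROM    views
                ) vo
        JOIN    views v
        ON      v.id =
                (
                SELECT  id
                FROM    views vi
                WHERE   vi.user_id = vo.user_id
                        AND vi.session = vo.session
                ORDER BY
                        page DESC
                LIMIT 1
                )
        """
    ).fetchall()
    # The query does not specify an output order, so sort in Python.
    return sorted(rows)

top_rows = top_rows_per_session(conn)
for row in top_rows:
    print(row)
```

Each session contributes exactly one row here, namely the one with ids 2, 4 and 6 in the sample data. If two rows in a session shared the same highest page, LIMIT 1 would pick one of them without a documented preference; adding id to the ORDER BY would make that choice deterministic.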