The specific difficulty here: Queries with one or more aggregate functions in the SELECT
list and no GROUP BY
clause produce exactly one row, even if no row is found in the underlying table.
There is nothing you can do in the WHERE
clause to suppress that row. You have to exclude such a row after the fact, i.e. in the HAVING
clause, or in an outer query.
Per documentation:
If a query contains aggregate function calls, but no GROUP BY
clause,
grouping still occurs: the result is a single group row (or perhaps no
rows at all, if the single row is then eliminated by HAVING
). The same
is true if it contains a HAVING
clause, even without any aggregate
function calls or GROUP BY
clause.
It should be noted that adding a GROUP BY
clause with only a constant expression (which is otherwise completely pointless!) works, too. See example below. But I'd rather not use that trick, even if it's short, cheap and simple, because it's hardly obvious what it does.
The following query only needs a single table scan and returns the top 7 categories ordered by count. If (and only if) there are more categories, the rest is summarized into 'Others':
WITH cte AS (
SELECT categoryid, count(*) AS data
, row_number() OVER (ORDER BY count(*) DESC, categoryid) AS rn
FROM contents
GROUP BY 1
)
( -- parentheses required again
SELECT categoryid, COALESCE(ca.name, 'Unknown') AS label, data
FROM cte
LEFT JOIN category ca ON ca.id = cte.categoryid
WHERE rn <= 7
ORDER BY rn
)
UNION ALL
SELECT NULL, 'Others', sum(data)
FROM cte
WHERE rn > 7 -- only take the rest
HAVING count(*) > 0; -- only if there actually is a rest
-- or: HAVING sum(data) > 0
You need to break ties if multiple categories can have the same count across the 7th / 8th rank. In my example, categories with the smaller categoryid
win such a race.
Parentheses are required to include a LIMIT
or ORDER BY
clause to an individual leg of a UNION
query.
You only need to join to table category
for the top 7 categories. And it's generally cheaper to aggregate first and join later in this scenario. So don't join in the the base query in the CTE (common table expression) named cte
, only join in the first SELECT
of the UNION
query, that's cheaper.
Not sure why you need the COALESCE
. If you have a foreign key in place from contents.categoryid
to category.id
and both contents.categoryid
and category.name
are defined NOT NULL
(like they probably should be), then you don't need it.
The odd GROUP BY true
This would work, too:
...
UNION ALL
SELECT NULL , 'Others', sum(data)
FROM cte
WHERE rn > 7
GROUP BY true;
And I even get slightly faster query plans. But it's a rather odd hack ...
SQL Fiddle demonstrating all.
Related answer with more explanation for the UNION ALL
/ LIMIT
technique:
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…