This isn't the problem with Django ORM, this is just the way relational databases work. When you're constructing simple querysets like
Player.objects.annotate(weapon_count=Count('unit_set__weapon_set'))
or
Player.objects.annotate(rarity_sum=Sum('unit_set__rarity'))
ORM does exactly what you expect it to do - join Player
with Weapon
SELECT "sandbox_player"."id", "sandbox_player"."name", COUNT("sandbox_weapon"."id") AS "weapon_count"
FROM "sandbox_player"
LEFT OUTER JOIN "sandbox_unit"
ON ("sandbox_player"."id" = "sandbox_unit"."player_id")
LEFT OUTER JOIN "sandbox_weapon"
ON ("sandbox_unit"."id" = "sandbox_weapon"."unit_id")
GROUP BY "sandbox_player"."id", "sandbox_player"."name"
or Player
with Unit
SELECT "sandbox_player"."id", "sandbox_player"."name", SUM("sandbox_unit"."rarity") AS "rarity_sum"
FROM "sandbox_player"
LEFT OUTER JOIN "sandbox_unit" ON ("sandbox_player"."id" = "sandbox_unit"."player_id")
GROUP BY "sandbox_player"."id", "sandbox_player"."name"
and perform either COUNT
or SUM
aggregation on them.
Note that although the first query has two joins between three tables, the intermediate table Unit
is neither in columns referenced in SELECT
, nor in the GROUP BY
clause. The only role that Unit
plays here is to join Player
with Weapon
.
Now if you look at your third queryset, things get more complicated. Again, as in the first query the joins are between three tables, but now Unit
is referenced in SELECT
as there is SUM
aggregation for Unit.rarity
:
SELECT "sandbox_player"."id",
"sandbox_player"."name",
COUNT(DISTINCT "sandbox_weapon"."id") AS "weapon_count",
SUM("sandbox_unit"."rarity") AS "rarity_sum"
FROM "sandbox_player"
LEFT OUTER JOIN "sandbox_unit" ON ("sandbox_player"."id" = "sandbox_unit"."player_id")
LEFT OUTER JOIN "sandbox_weapon" ON ("sandbox_unit"."id" = "sandbox_weapon"."unit_id")
GROUP BY "sandbox_player"."id", "sandbox_player"."name"
And this is the crucial difference between the second and the third queries. In the second query, you're joining Player
to Unit
, so a single Unit
will be listed once for each player that it references.
But in the third query you're joining Player
to Unit
and then Unit
to Weapon
, so not only a single Unit
will be listed once for each player that it references, but also for each weapon that references Unit
.
Let's take a look at the simple example:
insert into sandbox_player values (1, "player_1");
insert into sandbox_unit values(1, 10, 1);
insert into sandbox_weapon values (1, 1), (2, 1);
One player, one unit and two weapons that reference the same unit.
Confirm that the problem exists:
>>> from sandbox.models import Player
>>> from django.db.models import Count, Sum
>>> Player.objects.annotate(weapon_count=Count('unit_set__weapon_set')).values()
<QuerySet [{'id': 1, 'name': 'player_1', 'weapon_count': 2}]>
>>> Player.objects.annotate(rarity_sum=Sum('unit_set__rarity')).values()
<QuerySet [{'id': 1, 'name': 'player_1', 'rarity_sum': 10}]>
>>> Player.objects.annotate(
... weapon_count=Count('unit_set__weapon_set', distinct=True),
... rarity_sum=Sum('unit_set__rarity')).values()
<QuerySet [{'id': 1, 'name': 'player_1', 'weapon_count': 2, 'rarity_sum': 20}]>
From this example it's easy to see that the problem is that in the combined query the unit will be listed twice, one time for each of the weapons referencing it:
sqlite> SELECT "sandbox_player"."id",
...> "sandbox_player"."name",
...> "sandbox_weapon"."id",
...> "sandbox_unit"."rarity"
...> FROM "sandbox_player"
...> LEFT OUTER JOIN "sandbox_unit" ON ("sandbox_player"."id" = "sandbox_unit"."player_id")
...> LEFT OUTER JOIN "sandbox_weapon" ON ("sandbox_unit"."id" = "sandbox_weapon"."unit_id");
id name id rarity
---------- ---------- ---------- ----------
1 player_1 1 10
1 player_1 2 10
What should you do?
As @ivissani mentioned, one of the easiest solutions would be to write subqueries for each of the aggregations:
>>> from django.db.models import Count, IntegerField, OuterRef, Subquery, Sum
>>> weapon_count = Player.objects.annotate(weapon_count=Count('unit_set__weapon_set')).filter(pk=OuterRef('pk'))
>>> rarity_sum = Player.objects.annotate(rarity_sum=Sum('unit_set__rarity')).filter(pk=OuterRef('pk'))
>>> qs = Player.objects.annotate(
... weapon_count=Subquery(weapon_count.values('weapon_count'), output_field=IntegerField()),
... rarity_sum=Subquery(rarity_sum.values('rarity_sum'), output_field=IntegerField())
... )
>>> qs.values()
<QuerySet [{'id': 1, 'name': 'player_1', 'weapon_count': 2, 'rarity_sum': 10}]>
which produces the following SQL
SELECT "sandbox_player"."id", "sandbox_player"."name",
(
SELECT COUNT(U2."id") AS "weapon_count"
FROM "sandbox_player" U0
LEFT OUTER JOIN "sandbox_unit" U1
ON (U0."id" = U1."player_id")
LEFT OUTER JOIN "sandbox_weapon" U2
ON (U1."id" = U2."unit_id")
WHERE U0."id" = ("sandbox_player"."id")
GROUP BY U0."id", U0."name"
) AS "weapon_count",
(
SELECT SUM(U1."rarity") AS "rarity_sum"
FROM "sandbox_player" U0
LEFT OUTER JOIN "sandbox_unit" U1
ON (U0."id" = U1."player_id")
WHERE U0."id" = ("sandbox_player"."id")
GROUP BY U0."id", U0."name") AS "rarity_sum"
FROM "sandbox_player"