If you don't need to STORE the data (which you shouldn't, because you'd have to refresh the running totals any time a row is changed, added, or deleted), and if you don't trust the quirky update (which you shouldn't, because it isn't guaranteed to work and its behavior could change with a hotfix, service pack, upgrade, or even an underlying index or statistics change), you can try this type of query at runtime. This is a method fellow MVP Hugo Kornelis coined "set-based iteration" (he posted something similar in one of his chapters of SQL Server MVP Deep Dives). Computing running totals typically requires a cursor over the entire set, a quirky update over the entire set, or a single non-linear self-join that becomes more and more expensive as the row count increases. The trick here is to loop over some finite element of the set - in this case, the "rank" of each row in terms of month, for each user. Each rank is processed only once for all user/month combinations at that rank, so instead of looping through 200,000 rows, you loop up to 24 times.
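In case you want to play along, here is a minimal setup that matches the results shown further down; the exact schema of dbo.my_table is an assumption on my part, so adjust the data types to match your real table:
-- assumed shape of dbo.my_table, reverse-engineered from the question;
-- adjust column types to match your actual table
CREATE TABLE dbo.my_table
(
    [user_id]        INT,
    [month]          TINYINT,
    total            DECIMAL(10,1),
    cumulative_total DECIMAL(10,1)
);

INSERT dbo.my_table ([user_id], [month], total)
VALUES
    (1, 1, 2.0), (1, 2, 1.0), (1, 3, 3.5),
    (2, 1, 0.5), (2, 2, 1.5), (2, 3, 2.0);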
DECLARE @t TABLE
(
    [user_id]    INT,
    [month]      TINYINT,
    total        DECIMAL(10,1),
    RunningTotal DECIMAL(10,1),
    Rnk          INT
);

-- seed each row's running total with its own value, and rank the rows
-- per user by month so we can walk the set one rank at a time
INSERT @t
  SELECT [user_id], [month], total, total,
    RANK() OVER (PARTITION BY [user_id] ORDER BY [month])
  FROM dbo.my_table;

DECLARE @rnk INT = 1, @rc INT = 1;

-- each pass adds the previous rank's running total into the current rank,
-- for every user at once, so the loop runs at most (number of months) times
WHILE @rc > 0
BEGIN
  SET @rnk += 1;

  UPDATE c
    SET RunningTotal = p.RunningTotal + c.total
    FROM @t AS c
    INNER JOIN @t AS p
    ON c.[user_id] = p.[user_id]
    AND p.Rnk = @rnk - 1
    AND c.Rnk = @rnk;

  SET @rc = @@ROWCOUNT;
END

SELECT [user_id], [month], total, RunningTotal
  FROM @t
  ORDER BY [user_id], Rnk;
Results:
user_id  month  total  RunningTotal
-------  -----  -----  ------------
1        1      2.0    2.0
1        2      1.0    3.0
1        3      3.5    6.5   -- I think your calculation is off
2        1      0.5    0.5
2        2      1.5    2.0
2        3      2.0    4.0
Of course you can update the base table from this table variable, but why bother, since those stored values are only good until the next time the table is touched by any DML statement?
UPDATE mt
  SET cumulative_total = t.RunningTotal
  FROM dbo.my_table AS mt
  INNER JOIN @t AS t
  ON mt.[user_id] = t.[user_id]
  AND mt.[month] = t.[month];
Since we're not relying on implicit ordering of any kind, this approach is 100% supported and deserves a performance comparison against the unsupported quirky update. Even if it doesn't beat the quirky update but comes close, IMHO you should consider using it anyway.
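If you want to measure that yourself, a minimal sketch (just an illustration, not a full test harness) is to turn on I/O and timing statistics around each version and compare the numbers in the Messages tab:
-- rough comparison harness; run each approach as its own batch
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- batch A: the set-based iteration above
-- batch B: your current quirky update
-- batch C: the SQL Server 2012 windowed SUM shown below

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;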
As for the SQL Server 2012 solution, Matt mentions RANGE, but since this method uses an on-disk spool you should also test with ROWS instead of just running with RANGE. Here is a quick example for your case:
SELECT
    [user_id],
    [month],
    total,
    RunningTotal = SUM(total) OVER
    (
        PARTITION BY [user_id]
        ORDER BY [month] ROWS UNBOUNDED PRECEDING
    )
FROM dbo.my_table
ORDER BY [user_id], [month];
Compare this with RANGE UNBOUNDED PRECEDING, or with no ROWS/RANGE clause at all (which will also use the RANGE on-disk spool). The above will have lower overall duration and way less I/O, even though the plan looks slightly more complex (an additional Sequence Project operator).
I've recently published a blog post outlining some performance differences I observed for a specific running totals scenario:
http://www.sqlperformance.com/2012/07/t-sql-queries/running-totals