It can be surprisingly difficult to time function calls like these accurately and meaningfully. Here's a modification of your program that illustrates the difficulties:
```c
#include <stdio.h>
#include <time.h>
#include <math.h>

long exponent(int a, int b);
long exponentFast(int a, int b);
void tester(long (*)(int, int));

#define NTRIALS 1000000000

int main(void)
{
    clock_t begin;
    clock_t end;
    double time_spent;

    begin = clock();
    tester(exponentFast);
    end = clock();
    time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("exponentFast: Time: %.9f = %.10f/call\n", time_spent, time_spent / NTRIALS);

    begin = clock();
    tester(exponent);
    end = clock();
    time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("exponent: Time: %.9f = %.10f/call\n", time_spent, time_spent / NTRIALS);
}

void tester(long (*func)(int, int))
{
    int base = 2;
    int power = 25;
    int i;
    unsigned long accum = 0;
    for(i = 0; i < NTRIALS; i++) {
        accum += (*func)(base, power);
        base = (base + 1) % 5;
        power = (power + 7) % 16;
    }
    printf("(accum = %lu)\n", accum);
}

long exponent(int a, int b)
{
    return pow(a, b);
}

long exponentFast(int a, int b)
{
    long ret = 1;
    int i;
    for(i = 0; i < b; i++)
        ret *= a;
    return ret;
}
```
You will notice that:
- I've arranged to perform multiple trials, which involved adding a new function `tester()` to do this. (`tester()` therefore takes a pointer to the function being tested, which is a technique you may not have been familiar with yet.)
- I've arranged to vary the arguments to the function under test, between calls.
- I've arranged to do something with the return values of the functions under test, namely add them all up.
(The second and third bullets follow suggestions by Jonathan Leffler, and are intended to ensure that a too-clever compiler doesn't optimize out some or all of the interesting work.)
When I ran this on my computer (an ordinary consumer-grade laptop), these were the results I got:
```
(accum = 18165558496053920)
exponentFast: Time: 20.954286000 = 0.0000000210/call
(accum = 18165558496053920)
exponent: Time: 23.409001000 = 0.0000000234/call
```
There are two huge things to notice here.
- I ran each function one billion times. That's right, a billion, a thousand million. And yet this only took about 20 seconds.
- Even with that many trials, there's still almost no visible difference between the "regular" and "fast" versions. On average, one took 21 nanoseconds (nanoseconds!); the other took 23 nanoseconds. Big whoop.
(Actually, however, this first trial was significantly misleading. More on that in a bit.)
Before continuing, it's worth asking a couple of questions.
- Do these results even make sense? Is it actually possible for these functions to be taking mere tens of nanoseconds to do their work?
- Presuming they're accurate, do these results tell us that we're wasting our time, that the difference between the "regular" and "fast" versions is so slight as to not be worth the effort to write and debug the "fast" one?
To the first point, let's do a quick back-of-the-envelope analysis. My machine claims to have a 2.2 GHz CPU. That means (roughly speaking) that it can do 2.2 billion things per second, or about 0.45 nanoseconds per thing. So a function that's taking 21 nanoseconds can be doing roughly 21 ÷ 0.45 ≈ 46 things. And since my sample `exponentFast` function is doing roughly as many multiplications as the value of the exponent, it looks like we're probably in the right ballpark.
The other thing I did to confirm that I was getting at least quasi-reasonable results was to vary the number of trials. With `NTRIALS` reduced to 100000000, the overall program took just about exactly a tenth of the time to run, meaning that the time per call was consistent.
Now, to point 2: I still remember one of my formative experiences as a programmer. I had written a new and improved version of a standard function which I just knew was going to be gobs faster, and after several hours spent debugging it to get it to work at all, I discovered that it wasn't measurably faster, until I increased the number of trials up into the millions, and the penny (as they say) dropped.
But as I said, the results I've presented so far were, by a funny coincidence, misleading. When I first threw together some simple code to vary the arguments given to the function calls, as shown above, I had:
```c
int base = 2;
int power = 25;
```
and then, within the loop
```c
base = (base + 1) % 5;
power = (power + 7) % 16;
```
This was intended to allow `base` to run from 0 to 4, and `power` from 0 to 15, with the numbers chosen to ensure that the result wouldn't overflow even when `base` was 4. But this means that `power` was, on average, only 8, meaning that my simpleminded `exponentFast` call was only having to make 8 trips through its loop, on average, not 25 as in your original post.
When I changed the iteration step to
```c
power = 25 + (power - 25 + 1) % 5;
```
-- that is, not varying `base` (and therefore allowing it to remain as the constant 2) and allowing `power` to vary between 25 and 29, now the time per call for `exponentFast` rose to about 63 nanoseconds. The good news is that this makes sense (roughly three times as many iterations, on average, made it about three times slower), but the bad news is that it looks like my `exponentFast` function isn't very fast! (Obviously, though, I didn't expect it to be, with its simpleminded, brute-force loop. If I wanted to make it faster, the first thing I'd do, at little additional cost in complexity, would be to apply "binary exponentiation".)
There's at least one more thing to worry about, though, which is that if we call these functions one billion times, we're not only counting one billion times the amount of time each function takes to do its work, but also one billion times the function call overhead. If the function call overhead is on a par with the amount of work the function is doing, we will (a) have a hard time measuring the actual work time, but also (b) have a hard time speeding things up! (We could get rid of the function call overhead by inlining the functions for our test, but that obviously wouldn't be meaningful if the actual use of the functions in the end program were going to involve real function calls.)
And then yet one more inaccuracy is that we're introducing a timing artifact by computing new and different values of `base` and/or `power` for each call, and adding up all the results, so that the amortized time to do that work goes into what we've been calling "time per call". (This problem, at least, since it affects either exponentiation function equally, won't detract from our ability to assess which one, if either, is faster.)
Addendum: Since my initial version of `exponentFast` really was pretty embarrassingly simpleminded, and since binary exponentiation is so easy and elegant, I performed one more test, rewriting `exponentFast` as
```c
long exponentFast(int a, int b)
{
    long ret = 1;
    long fac = a;
    while(1) {
        if(b & 1) ret *= fac;
        b >>= 1;
        if(b == 0) break;
        fac *= fac;
    }
    return ret;
}
```
Now -- Hooray! -- the average call to `exponentFast` goes down to about 16 ns per call on my machine. But the "Hooray!" is qualified. It's evidently about 25% faster than calling `pow()`, and that's nicely significant, but not an order of magnitude or anything. If the program where I'm using this is spending all its time exponentiating, I'll have made that program 25% faster, too, but if not, the improvement will be less. And there are cases where the improvement (the time saved over all anticipated runs of the program) will be less than the time I spent writing and testing my own version. And I haven't yet spent any time doing proper regression tests on my improved `exponentFast` function, but if this were anything other than a Stack Overflow post, I'd have to. It's got several sets of edge cases, and might well contain lurking bugs.