Yes, that code sits and busy-waits for an entire second, which keeps that core 100% busy for a second. One second is more than enough time for dynamic clocking algorithms to detect load and kick the CPU frequency up out of power-saving states. I wouldn't be surprised if processors with boost actually show you a frequency above the labelled frequency.
The concept isn't bad, however. What you have to do is sleep for an interval of about one second. Then, instead of assuming the RDTSC invocations were exactly one second apart, divide by the actual time indicated by QueryPerformanceCounter.
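The arithmetic behind that division can be sketched as a small pure function. The name `TicksToMHz` and its parameters are illustrative and do not appear in the original code; `qpcFreq` stands for whatever QueryPerformanceFrequency reports on the machine.

```cpp
#include <cstdint>

// Hypothetical helper (name and signature are illustrative only).
// tscDelta: difference between two RDTSC readings
// qpcDelta: difference between two QueryPerformanceCounter readings
// qpcFreq:  counts per second, as reported by QueryPerformanceFrequency
// Returns the average TSC rate in MHz, or 0 if the inputs are unusable.
double TicksToMHz(std::uint64_t tscDelta, std::int64_t qpcDelta, std::int64_t qpcFreq)
{
    if (qpcDelta <= 0 || qpcFreq <= 0) return 0.0;
    // Elapsed time is measured, not assumed to be exactly one second.
    const double elapsedSeconds =
        static_cast<double>(qpcDelta) / static_cast<double>(qpcFreq);
    return static_cast<double>(tscDelta) / elapsedSeconds / 1.0e6;
}
```

For example, if the sleep really lasted 1.015 s (10,150,000 counts on a 10 MHz counter) and the TSC advanced by 3,045,000,000, the result is roughly 3000 MHz, whereas assuming exactly one second would have reported 3045 MHz.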
Also, I recommend checking RDTSC both before and after the QueryPerformanceCounter call, to detect whether there was a context switch between RDTSC and QueryPerformanceCounter, which would mess up your results.
Unfortunately, RDTSC on new processors doesn't actually count CPU clock cycles. So this doesn't reflect the dynamically changing CPU clock rate (it does measure the nominal rate without busy-waiting, though, so it is a big improvement over the code provided in the question).
So it looks like you'd need to access model-specific registers after all, which can't be done from user mode. The OpenHardwareMonitor project has both a driver that can be used and code for the frequency calculations.
For the nominal rate, the RDTSC and QueryPerformanceCounter approach described above looks like this:
float ProcSpeedCalc()
{
    /*
      RdTSC:
      It's the Pentium instruction "ReaD Time Stamp Counter". It measures the
      number of clock cycles that have passed since the processor was reset, as a
      64-bit number. That's what the _emit lines do.
    */
    // Microsoft inline assembler knows the rdtsc instruction. No need for emit.

    // variables for the CPU cycle counter (unknown rate):
    __int64 tscBefore, tscAfter, tscCheck;
    // variables for the Performance Counter (steady known rate):
    LARGE_INTEGER hpetFreq, hpetBefore, hpetAfter;

    // retrieve performance-counter frequency per second:
    if (!QueryPerformanceFrequency(&hpetFreq)) return 0;

    int retryLimit = 10;
    do {
        // read CPU cycle count
        _asm
        {
            rdtsc
            mov DWORD PTR tscBefore, eax
            mov DWORD PTR [tscBefore + 4], edx
        }

        // retrieve the current value of the performance counter:
        QueryPerformanceCounter(&hpetBefore);

        // read CPU cycle count again, to detect context switch
        _asm
        {
            rdtsc
            mov DWORD PTR tscCheck, eax
            mov DWORD PTR [tscCheck + 4], edx
        }
    } while ((tscCheck - tscBefore) > 800 && (--retryLimit) > 0);

    Sleep(1000);

    // fresh retry budget, so the second reading does not depend on the first
    retryLimit = 10;
    do {
        // read CPU cycle count
        _asm
        {
            rdtsc
            mov DWORD PTR tscAfter, eax
            mov DWORD PTR [tscAfter + 4], edx
        }

        // retrieve the current value of the performance counter:
        QueryPerformanceCounter(&hpetAfter);

        // read CPU cycle count again, to detect context switch
        _asm
        {
            rdtsc
            mov DWORD PTR tscCheck, eax
            mov DWORD PTR [tscCheck + 4], edx
        }
    } while ((tscCheck - tscAfter) > 800 && (--retryLimit) > 0);

    // TSC ticks divided by elapsed seconds is the speed in Hz;
    // divided by 1,000,000 it is the speed in MHz
    return (float)((double)(tscAfter - tscBefore) / (double)(hpetAfter.QuadPart - hpetBefore.QuadPart) * (double)hpetFreq.QuadPart / 1.0e6);
}
Most compilers provide an __rdtsc() intrinsic, in which case you could use tscBefore = __rdtsc(); instead of the __asm block. Both methods are platform- and compiler-specific, unfortunately.
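As a rough sketch of the intrinsic variant, the paired read (RDTSC, reference clock, RDTSC again) could be wrapped in a helper like the one below. It assumes an x86 or x64 target, and the header differs by compiler: `<intrin.h>` for MSVC, `<x86intrin.h>` for GCC and Clang. The names `TscSample` and `ReadTscPair` and the callable parameter are hypothetical; the 800-tick threshold and the retry count of 10 are simply carried over from the function above.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>      // __rdtsc on MSVC
#else
#include <x86intrin.h>   // __rdtsc on GCC and Clang (x86/x64 only)
#endif

// Hypothetical result type: one TSC reading paired with one clock reading.
struct TscSample {
    std::uint64_t tsc;    // RDTSC value read just before the reference clock
    std::int64_t  clock;  // value returned by the reference clock
    bool          clean;  // false if every attempt looked interrupted
};

// readClock is any callable returning the reference counter, for example
// a lambda wrapping QueryPerformanceCounter. A large gap between the two
// RDTSC readings suggests a context switch, so the pair is read again.
template <typename ReadClock>
TscSample ReadTscPair(ReadClock readClock,
                      std::uint64_t maxGap = 800, int retries = 10)
{
    TscSample s{0, 0, false};
    int attemptsLeft = retries;
    do {
        s.tsc = __rdtsc();
        s.clock = readClock();
        const std::uint64_t check = __rdtsc();
        s.clean = (check - s.tsc) <= maxGap;
    } while (!s.clean && --attemptsLeft > 0);
    return s;
}
```

On Windows the callable might be `[]{ LARGE_INTEGER t; QueryPerformanceCounter(&t); return t.QuadPart; }`. Calling the helper once before and once after the Sleep gives the two TSC and counter readings whose differences feed the frequency calculation. Returning the `clean` flag lets the caller decide whether to discard a measurement that never got an uninterrupted read.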