I wrote this quite a while ago when I had the same basic question (along with another that will be obvious). I've updated it to show a little more about not only how long it takes to create threads, but how long it takes for the threads to start executing:
#include <windows.h>
#include <iostream>
#include <time.h>
#include <vector>

const int num_threads = 32;
const int switches_per_thread = 100000;

// Each thread records the moment it starts running, then yields repeatedly.
DWORD __stdcall ThreadProc(void *start) {
    QueryPerformanceCounter((LARGE_INTEGER *) start);
    for (int i = 0; i < switches_per_thread; i++)
        Sleep(0);   // give up the rest of this thread's time slice
    return 0;
}

int main(void) {
    HANDLE threads[num_threads];
    DWORD junk;
    std::vector<LARGE_INTEGER> start_times(num_threads);

    // Baseline against which the per-thread start times are measured.
    LARGE_INTEGER l;
    QueryPerformanceCounter(&l);

    // Time how long it takes to create all the threads.
    clock_t create_start = clock();
    for (int i = 0; i < num_threads; i++)
        threads[i] = CreateThread(NULL,
                                  0,
                                  ThreadProc,
                                  (void *)&start_times[i],
                                  0,
                                  &junk);
    clock_t create_end = clock();

    // Time how long the threads take to finish all their switches.
    clock_t wait_start = clock();
    WaitForMultipleObjects(num_threads, threads, TRUE, INFINITE);
    clock_t wait_end = clock();

    double create_millis = 1000.0 * (create_end - create_start) / CLOCKS_PER_SEC / num_threads;
    std::cout << "Milliseconds to create thread: " << create_millis << "\n";

    double wait_clocks = (wait_end - wait_start);
    double switches = switches_per_thread * num_threads;
    double us_per_switch = wait_clocks / CLOCKS_PER_SEC * 1000000 / switches;
    std::cout << "Microseconds per thread switch: " << us_per_switch << "\n";

    // Print each thread's start time relative to the baseline, in milliseconds.
    LARGE_INTEGER f;
    QueryPerformanceFrequency(&f);
    for (auto s : start_times)
        std::cout << 1000.0 * (s.QuadPart - l.QuadPart) / f.QuadPart << " ms\n";

    return 0;
}
Sample results:
Milliseconds to create thread: 0.015625
Microseconds per thread switch: 0.0479687
The first few thread start times look like this:
0.0632517 ms
0.117348 ms
0.143703 ms
0.18282 ms
0.209174 ms
0.232478 ms
0.263826 ms
0.315149 ms
0.324026 ms
0.331516 ms
0.3956 ms
0.408639 ms
0.4214 ms
Note that although these happen to be monotonically increasing, that's not guaranteed (though there is definitely a trend in that general direction).
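If you want to check that ordering rather than eyeball it, something like the following would do. This is only a sketch: the function name `started_in_creation_order` is made up for illustration, and it assumes the start times have already been copied into a plain vector of tick counts.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not part of the program above): returns true if every
// thread's recorded start tick is no earlier than that of the thread created
// just before it, i.e. the threads started running in creation order.
bool started_in_creation_order(const std::vector<long long>& start_ticks) {
    return std::is_sorted(start_ticks.begin(), start_ticks.end());
}
```

In the program above, that would mean copying each `s.QuadPart` from `start_times` into the vector before calling it.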
When I first wrote this, the units I used made more sense -- on a 33 MHz 486, those results weren't tiny fractions like this. :-) I suppose someday when I'm feeling ambitious, I should rewrite this to use std::async to create the threads and std::chrono to do the timing, but...
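For what it's worth, a rough sketch of what such a rewrite might look like follows. It is an approximation, not a drop-in replacement: it uses std::thread rather than std::async so that each call explicitly creates a thread, and std::this_thread::yield() stands in for Sleep(0), which may or may not behave the same way on a given platform. The names `ThreadTimings` and `measure_threads` are made up for this example.

```cpp
#include <chrono>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical result type for this sketch.
struct ThreadTimings {
    double create_ms_per_thread;         // average time spent creating one thread
    double us_per_yield;                 // wait-phase wall time divided by total yields
    std::vector<double> start_delay_ms;  // per thread: start time minus baseline
};

ThreadTimings measure_threads(int num_threads, int yields_per_thread) {
    // One slot per thread; the vector is never resized while threads run,
    // and each thread writes only its own element.
    std::vector<Clock::time_point> started(num_threads);
    std::vector<std::thread> threads;
    threads.reserve(num_threads);

    const auto baseline = Clock::now();
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back([&started, i, yields_per_thread] {
            started[i] = Clock::now();           // record when this thread began running
            for (int j = 0; j < yields_per_thread; ++j)
                std::this_thread::yield();       // assumption: rough analogue of Sleep(0)
        });
    }
    const auto created = Clock::now();

    for (auto& t : threads)
        t.join();
    const auto joined = Clock::now();

    using ms = std::chrono::duration<double, std::milli>;
    using us = std::chrono::duration<double, std::micro>;

    ThreadTimings r;
    r.create_ms_per_thread = ms(created - baseline).count() / num_threads;
    r.us_per_yield = us(joined - created).count()
                     / (static_cast<double>(num_threads) * yields_per_thread);
    for (const auto& s : started)
        r.start_delay_ms.push_back(ms(s - baseline).count());
    return r;
}
```

Calling `measure_threads(32, 100000)` and printing the three fields would roughly mirror the output of the Windows version, though a yield is only a hint to the scheduler, so the per-yield figure shouldn't be read as a true context-switch cost.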