Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c++ - How do I retrieve the processor time in Linux without function calls?

I need to calculate the running time of a portion of (C++) code and want to do this by finding the number of clock ticks elapsed during the execution of the code.

I want to find the processor time at the beginning of the code and the processor time at the end and then subtract them to find the number of elapsed ticks.

This can be done with the clock function. However, the measurement needs to be very precise, and a function call proved too intrusive: the register allocator spilled many variables held in caller-saved registers around each call.

Therefore, I cannot use any function calls and need to retrieve the processor time myself. Assembly code is fine.

I am using Debian and an i7 Intel processor. I can't use a profiler because it's too intrusive.

1 Reply

You should read time(7). Be aware that even if it is written in assembler, your program will be rescheduled at arbitrary moments (perhaps a context switch every millisecond; look also into /proc/interrupts and see proc(5)). A raw hardware timer reading then tells you little. Even reading the hardware timestamp counter with the RDTSC x86-64 machine instruction is unreliable for this purpose: the counter keeps running across a context switch, so the measured interval includes time spent in other tasks, and the Linux kernel schedules preemptively, which can happen at any time.
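To see how often the rescheduling described above actually hits a measured region, one option is to compare the context-switch counters reported by getrusage(2) before and after it. This is only a sketch: the helper names context_switches and ran_undisturbed are made up for illustration, and RUSAGE_SELF counts switches for the whole process (all threads), not just the calling thread.

```cpp
#include <cassert>
#include <sys/resource.h>  // getrusage, struct rusage (POSIX)

// Hypothetical helper: total context switches (voluntary + involuntary)
// charged to this process so far.
long context_switches() {
    struct rusage usage{};
    int rc = getrusage(RUSAGE_SELF, &usage);
    assert(rc == 0);
    (void)rc;  // silence "unused" warning when asserts are compiled out
    return usage.ru_nvcsw + usage.ru_nivcsw;
}

// Hypothetical helper: runs `work` and returns true if the counters did not
// change, i.e. no context switch was recorded while it ran.
template <typename F>
bool ran_undisturbed(F&& work) {
    long before = context_switches();
    work();
    long after = context_switches();
    return after == before;
}
```

A timing sample taken while ran_undisturbed(...) returns false could be discarded and the measurement repeated. Note that this adds function calls around the region, so it belongs outside the timed part.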

You should consider clock_gettime(2). It is really fast (about 3.5 to 4 nanoseconds per call on my i5-4690S, measured over thousands of calls) because of vdso(7): the call is usually served in user space without entering the kernel. BTW, it is nominally a system call, so you could code the assembler instructions for the raw syscall yourself, but I don't think that is worth the trouble (and it would likely be slower than the vdso call).
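A minimal sketch of timing a region with clock_gettime follows. The helper names now_ns and elapsed_ns are illustrative, and CLOCK_MONOTONIC is my choice of clock here (it measures elapsed wall-clock time, including any time the process was descheduled).

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>  // clock_gettime, timespec, CLOCK_MONOTONIC (POSIX)

// Hypothetical helper: current CLOCK_MONOTONIC reading in nanoseconds.
inline std::int64_t now_ns() {
    timespec ts{};
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;  // silence "unused" warning when asserts are compiled out
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Hypothetical helper: elapsed wall-clock nanoseconds spent running `work`.
template <typename F>
std::int64_t elapsed_ns(F&& work) {
    std::int64_t start = now_ns();
    work();
    return now_ns() - start;
}
```

For example, elapsed_ns([&] { compute(); }) would time a call to some function compute() of yours. Since the template is inlined at -O2 or higher, the lambda itself usually adds no call overhead.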

BTW, any kind of profiling or benchmarking is somewhat intrusive.

Finally, if your benchmarked function runs very quickly (much less than a microsecond), cache misses become significant and even dominant (remember that an L3 cache miss requiring an actual access to the DRAM modules can take a hundred nanoseconds or more, enough to run hundreds of machine instructions from the L1 I-cache). You might (and probably should) benchmark several hundred consecutive calls instead. Even then, you won't be able to measure a single call both precisely and accurately.
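The idea of benchmarking many consecutive calls can be sketched as below. The name average_ns_per_call is made up, the warm-up call is my own addition, and std::chrono::steady_clock is used for brevity (on Linux with libstdc++ it is typically implemented on top of clock_gettime).

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: average nanoseconds per call of `work`, measured over
// `iterations` consecutive calls so the timer overhead is amortized.
template <typename F>
double average_ns_per_call(F&& work, int iterations) {
    assert(iterations > 0);
    work();  // warm-up call (an assumption of this sketch): keeps the first,
             // cold-cache run out of the measurement
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> total = stop - start;
    return total.count() / iterations;
}
```

The result is an average over warm caches, so it says little about the cost of one isolated cold call. Make sure `work` has an observable side effect (for instance a write to a volatile variable), otherwise the optimizer may remove the loop.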

Hence I believe that you cannot do much better than clock_gettime, and I don't understand why it is not good enough for your case. BTW, clock(3) is implemented (in recent glibc) by calling clock_gettime with CLOCK_PROCESS_CPUTIME_ID, so IMHO it should be enough, and simpler.
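To make the relation between the two calls concrete, here is a small sketch that reads the process CPU time both ways. The function names are illustrative only. Both count CPU time consumed by the process rather than wall-clock time.

```cpp
#include <cassert>
#include <time.h>  // clock, CLOCKS_PER_SEC, clock_gettime, CLOCK_PROCESS_CPUTIME_ID

// Hypothetical helper: CPU time used by the process so far, in seconds,
// read with clock_gettime(2) at nanosecond granularity.
double cpu_seconds_clock_gettime() {
    timespec ts{};
    int rc = clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
    assert(rc == 0);
    (void)rc;  // silence "unused" warning when asserts are compiled out
    return static_cast<double>(ts.tv_sec) + static_cast<double>(ts.tv_nsec) * 1e-9;
}

// Hypothetical helper: the same quantity read with clock(3), which reports
// it in units of 1/CLOCKS_PER_SEC seconds.
double cpu_seconds_clock() {
    return static_cast<double>(clock()) / CLOCKS_PER_SEC;
}
```

Taking either reading before and after the code of interest and subtracting gives the CPU time spent in between; the clock_gettime variant simply keeps the finer resolution.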

In other words, I believe that avoiding any function calls is a misconception on your part. Remember that function call overhead is a lot cheaper than cache misses!

See this answer to a related question (as unclear as yours); consider also using perf(1), gprof(1), oprofile(1), time(1). See this.

Finally, you should consider asking your compiler for more optimizations. Have you considered compiling and linking with g++ -O3 -flto -march=native (that is, with link-time optimization)?

If your code is numerical and vectorizable (so obviously and massively parallelisable), you could even consider spending months of development time porting its core (the numerical compute kernels) to your GPGPU in OpenCL or CUDA. But are you sure it is worth such an effort? You'll need to tune and redevelop your code when changing hardware!

You could also redesign your application to use multi-threading, JIT compilation and partial evaluation and metaprogramming techniques, multiprocessing or cloud-computing (with inter-process communication, such as socket(7)-s, maybe using 0mq or other messaging libraries). This could take years of development. There is No Silver Bullet.

(Don't forget to take development costs into account; prefer algorithmic improvements when possible.)

