Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
489 views
in Technique[技术] by (71.8m points)

c - faster equivalent of gettimeofday

In trying to build a very latency sensitive application, that needs to send 100s of messages a seconds, each message having the time field, we wanted to consider optimizing gettimeofday. Out first thought was rdtsc based optimization. Any thoughts ? Any other pointers ? Required accurancy of the time value returned is in milliseconds, but it isn't a big deal if the value is occasionally out of sync with the receiver for 1-2 milliseconds. Trying to do better than the 62 nanoseconds gettimeofday takes

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

POSIX Clocks

I wrote a benchmark for POSIX clock sources:

  • time (s) => 3 cycles
  • ftime (ms) => 54 cycles
  • gettimeofday (us) => 42 cycles
  • clock_gettime (ns) => 9 cycles (CLOCK_MONOTONIC_COARSE)
  • clock_gettime (ns) => 9 cycles (CLOCK_REALTIME_COARSE)
  • clock_gettime (ns) => 42 cycles (CLOCK_MONOTONIC)
  • clock_gettime (ns) => 42 cycles (CLOCK_REALTIME)
  • clock_gettime (ns) => 173 cycles (CLOCK_MONOTONIC_RAW)
  • clock_gettime (ns) => 179 cycles (CLOCK_BOOTTIME)
  • clock_gettime (ns) => 349 cycles (CLOCK_THREAD_CPUTIME_ID)
  • clock_gettime (ns) => 370 cycles (CLOCK_PROCESS_CPUTIME_ID)
  • rdtsc (cycles) => 24 cycles

These numbers are from an Intel Core i7-4771 CPU @ 3.50GHz on Linux 4.0. These measurements were taken using the TSC register and running each clock method thousands of times and taking the minimum cost value.

You'll want to test on the machines you intend to run on though as how these are implemented varies from hardware and kernel version. The code can be found here. It relies on the TSC register for cycle counting, which is in the same repo (tsc.h).

TSC

Access the TSC (processor time-stamp counter) is the most accurate and cheapest way to time things. Generally, this is what the kernel is using itself. It's also quite straight-forward on modern Intel chips as the TSC is synchronized across cores and unaffected by frequency scaling. So it provides a simple, global time source. You can see an example of using it here with a walkthrough of the assembly code here.

The main issue with this (other than portability) is that there doesn't seem to be a good way to go from cycles to nanoseconds. The Intel docs as far as I can find state that the TSC runs at a fixed frequency, but that this frequency may differ from the processors stated frequency. Intel doesn't appear to provide a reliable way to figure out the TSC frequency. The Linux kernel appears to solve this by testing how many TSC cycles occur between two hardware timers (see here).

Memcached

Memcached bothers to do the cache method. It may simply be to make sure the performance is more predictable across platforms, or scale better with multiple cores. It may also no be a worthwhile optimization.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...