Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c - Is there any way to write "mod 31" without modulus/division operators?

Getting the modulus of a number can easily be done without the modulus operator or division if the second operand is a power of 2. In that case, the following formula holds: x % y == (x & (y - 1)). This is often more performant on many architectures. Can the same be done for mod 31?

int mod31(int a) { return a % 31; }
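For comparison, the power-of-two case described above can be sketched as follows. The helper name mod_pow2 and its parameter k (the exponent, assumed to be less than 32) are made up for illustration, and the trick only applies to unsigned values.

```c
#include <stdint.h>

/* Hypothetical helper: remainder of x by 2^k (k < 32) using a mask
   instead of a division. Only valid for unsigned x. */
uint32_t mod_pow2(uint32_t x, unsigned k) {
  return x & ((UINT32_C(1) << k) - 1u);
}
```

With k = 5 this computes x % 32; there is no equally direct mask for 31, which is what the question is about.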

1 Reply

Here are two ways to approach this problem. The first uses a common bit-twiddling technique and, if carefully optimized, can beat hardware division. The second substitutes a multiply for the divide, similar to the optimization performed by gcc, and is by far the fastest. The bottom line is that there's not much point in trying to avoid the % operator when the second operand is a constant, because gcc has it covered. (Other compilers probably do too.)

The following function is based on the fact that x is the same (mod 31) as the sum of the base-32 digits of x. That's true because 32 is 1 mod 31, and consequently any power of 32 is 1 mod 31. So each "digit" position in a base-32 number contributes the digit * 1 to the mod 31 sum. And it's easy to get the base-32 representation: we just take the bits five at a time.

(Like the rest of the functions in this answer, it will only work for non-negative x).

unsigned mod31(unsigned x) {
  unsigned tmp;
  for (tmp = 0; x; x >>= 5) {
    tmp += x & 31;
  }
  // Here we assume that there are at most 160 bits in x, so tmp <= 31 * 32
  // and one fold of tmp's own base-32 digits leaves a value below 62
  tmp = (tmp >> 5) + (tmp & 31);
  return tmp >= 31 ? tmp - 31 : tmp;
}
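As a small worked case, 1000 = 31 * 32 + 8 has the base-32 digits 31 and 8, whose sum is 39; folding 39 gives 1 + 7 = 8, which is 1000 % 31. A minimal sketch for checking the function against the % operator over a range follows. The helper count_mismatches is hypothetical (not part of the original answer), and mod31 is repeated only so the block compiles on its own.

```c
/* Copy of the digit-sum function above, repeated so this sketch stands alone. */
unsigned mod31(unsigned x) {
  unsigned tmp;
  for (tmp = 0; x; x >>= 5) {
    tmp += x & 31;
  }
  tmp = (tmp >> 5) + (tmp & 31);
  return tmp >= 31 ? tmp - 31 : tmp;
}

/* Hypothetical helper: counts the inputs in [first, last] for which
   mod31 disagrees with the % operator. Assumes first <= last. */
unsigned long long count_mismatches(unsigned first, unsigned last) {
  unsigned long long bad = 0;
  unsigned x = first;
  for (;;) {
    if (mod31(x) != x % 31) bad++;
    if (x == last) break;  /* tested before incrementing, so last == UINT_MAX is safe */
    x++;
  }
  return bad;
}
```

Calling count_mismatches(0, UINT_MAX) (with UINT_MAX from limits.h) would cover every input; a result of 0 means the two agree everywhere in the range.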

For a specific integer size, you could unroll the loop and quite possibly beat division. (See @chux's answer for a way to convert the loop into O(log bits) operations instead of O(bits).) It's more difficult to beat gcc, which avoids division when the divisor is a constant known at compile time.

In a very quick benchmark using unsigned 32-bit integers, the naive unrolled loop took 19 seconds and a version based on @chux's answer took only 13 seconds, but gcc's x%31 took 9.7 seconds. Forcing gcc to use a hardware divide (by making the divisor non-constant) took 23.4 seconds, and the loop version shown above took 25.6 seconds. Those figures should be taken with several grains of salt. The times are for computing i%31 for all possible values of i, on my laptop using -O3 -march=native.
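The benchmark harness itself isn't shown, so the following is only a guess at its general shape, not the code that produced the numbers above. The names sum_mod31 and time_sum_mod31 are made up; the loop accumulates the results so the compiler cannot discard the work, and clock() measures CPU time.

```c
#include <time.h>

/* Sums i % 31 for i in [0, limit). Returning the sum keeps the compiler
   from removing the loop as dead code. */
unsigned long long sum_mod31(unsigned limit) {
  unsigned long long sum = 0;
  for (unsigned i = 0; i < limit; i++) {
    sum += i % 31;  /* swap in the function under test here */
  }
  return sum;
}

/* Returns the CPU seconds spent in sum_mod31(limit); stores the sum in *out. */
double time_sum_mod31(unsigned limit, unsigned long long *out) {
  clock_t start = clock();
  *out = sum_mod31(limit);
  return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Because limit is exclusive, this sketch stops one short of the largest unsigned value; an optimizing compiler may also transform such a simple loop, which is one more reason to treat timings like these with caution.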

gcc avoids 32-bit division by a constant by replacing it with what is essentially a 64-bit multiplication by the inverse of the constant followed by a right shift. (The actual algorithm does a bit more work to avoid overflows.) The procedure was implemented more than 20 years ago in gcc v2.6, and the paper which describes the algorithm is available on the gmp site. (GMP also uses this trick.)

Here's a simplified version: Say we want to compute n // 31 for some unsigned 32-bit integer n (using the pythonic // to indicate truncated integer division). We use the "magic constant" m = 2^32 // 31, which is 138547332. Now it's clear that for any n:

m * n <= 2^32 * n / 31 < m * n + n  ⇒  m * n // 2^32 <= n // 31 <= (m * n + n) // 2^32

(Here we make use of the fact that if a < b then floor(a) <= floor(b).)

Furthermore, since n < 2^32, m * n // 2^32 and (m * n + n) // 2^32 are either the same integer or two consecutive integers. Consequently, one (or both) of those two is the actual value of n//31.
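To make the two candidates concrete, here is a small sketch that computes both bounds with 64-bit arithmetic. The names MAGIC31, quot_lo and quot_hi are made up for illustration, and a 32-bit n is assumed.

```c
#include <stdint.h>

/* m = 2^32 // 31, the "magic constant" from the text. */
#define MAGIC31 UINT64_C(138547332)

/* Lower candidate: (m * n) // 2^32. */
uint32_t quot_lo(uint32_t n) {
  return (uint32_t)((MAGIC31 * n) >> 32);
}

/* Upper candidate: (m * n + n) // 2^32. */
uint32_t quot_hi(uint32_t n) {
  return (uint32_t)((MAGIC31 * n + n) >> 32);
}
```

For n = 1000 both candidates are 32, which is 1000 // 31. For n = 31 they are 0 and 1, and the true quotient 1 is the upper one.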

Now, we really want to compute n%31. So we need to multiply the (presumed) quotient by 31, and subtract that from n. If we use the smaller of the two possible quotients, it may turn out that the computed modulo value is too big, but it can only be too big by 31.

Or, to put it in code:

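static const unsigned long long magic = 138547332;  /* floor(2^32 / 31) */
unsigned mod31g(unsigned x) {
  // x is promoted to unsigned long long, so the product is 64 bits wide
  unsigned q = (x * magic) >> 32;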
  // To multiply by 31, we multiply by 32 and subtract
  unsigned mod = x - ((q << 5) - q);
  return mod < 31 ? mod : mod - 31;
}

The actual algorithm used by gcc avoids the test at the end by using a slightly more accurate computation based on multiplying by 2^37//31 + 1. That always produces the correct quotient, but at the cost of some extra shifts and adds to avoid integer overflow. As it turns out, the version above is slightly faster: in the same benchmark as above, it took only 6.3 seconds.
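The following is a sketch of how such an exact computation can be arranged, assuming 32-bit inputs; it is not a transcription of gcc's output, and the instructions gcc actually emits may differ. The multiplier 2^37 // 31 + 1 = 4433514629 needs 33 bits, so only its low 32 bits are used in the multiply and the implicit top bit is restored by an add-and-shift step that cannot overflow. The names MAGIC31_LOW, div31_exact and mod31_exact are made up for illustration.

```c
#include <stdint.h>

/* Low 32 bits of 2^37 // 31 + 1 = 4433514629 (that is, 4433514629 - 2^32). */
#define MAGIC31_LOW UINT32_C(138547333)

uint32_t div31_exact(uint32_t x) {
  /* t = floor(x * MAGIC31_LOW / 2^32); note that t <= x */
  uint32_t t = (uint32_t)(((uint64_t)x * MAGIC31_LOW) >> 32);
  /* floor((x + t) / 2), computed without overflowing 32 bits */
  uint32_t avg = ((x - t) >> 1) + t;
  /* remaining shift: 37 - 32 - 1 = 4 */
  return avg >> 4;
}

uint32_t mod31_exact(uint32_t x) {
  uint32_t q = div31_exact(x);
  return x - ((q << 5) - q);  /* x - 31 * q; no final comparison needed */
}
```

The quotient is exact for every 32-bit x because 31 * 4433514629 exceeds 2^37 by only 27, and 27 * x stays below 2^37 for all x < 2^32.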


Other benchmarked functions, for completeness:

Naive unrolled loop

unsigned mod31b(unsigned x) {
  unsigned tmp = x & 31; x >>= 5;
  tmp += x & 31; x >>= 5;
  tmp += x & 31; x >>= 5;
  tmp += x & 31; x >>= 5;
  tmp += x & 31; x >>= 5;
  tmp += x & 31; x >>= 5;
  tmp += x & 31;

  tmp = (tmp >> 5) + (tmp & 31);
  return tmp >= 31 ? tmp - 31 : tmp;
}

@chux's improvement, slightly optimized

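// Assumes a 32-bit unsigned. mask1 selects base-32 digits 0, 2, 4 and 6
// (digit 6 is only two bits wide); mask2 selects digits 1, 3 and 5.
static const unsigned mask1 = (31U << 0) | (31U << 10) | (31U << 20) | (31U << 30);
static const unsigned mask2 = (31U << 5) | (31U << 15) | (31U << 25);
unsigned mod31c(unsigned x) {
  // Add each odd digit to the even digit below it; every 10-bit field
  // now holds a sum of at most 62, so no field overflows
  x = (x & mask1) + ((x & mask2) >> 5);
  // Fold the fields together; the low 10 bits end up holding the full
  // digit sum (at most 189)
  x += x >> 20;
  x += x >> 10;

  // Add the two base-32 digits of that sum; the result is below 62
  x = (x & 31) + ((x >> 5) & 31);
  return x >= 31 ? x - 31 : x;
}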
