Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
809 views
in Technique[技术] by (71.8m points)

visual c++ - _addcarry_u64 and _addcarryx_u64 with MSVC and ICC

MSVC and ICC both support the intrinsics _addcarry_u64 and _addcarryx_u64.

According to Intel's Intrinsic Guide and white paper these should map to adcx and adox respectively. However, by looking at the generated assembly it's clear they map to adc and adcx respectively and there is no intrinsic which maps to adox.

Additionally, telling the compiler to enable AVX2 with /arch:AVX2 in MSVC or -march=core-avx2 with ICC on Linux makes no difference. I'm not sure how to enable ADX with MSVC and ICC.

The documentation for MSVC lists _addcarryx_u64 with the technology of ADX whereas _addcarry_u64 has no listed technology. However, the link in MSVC's documentation for these intrinsics goes directly to the Intel Intrinsic guide which contradicts MSVC's own documentation and the generated assembly.

From this I conclude that Intel's Intrinsic guide and white paper are wrong.

This makes some sense for MSVC sense it does not allow inline assembly it should provide a way to use adc which it does with _addcarry_u64.

One of the big advantages of adcx and adox is that they operate on different flags (carry CF and overflow OF) and this allows two independent parallel carry chains. However, since there is no intrinsic for adox how is this possible? With ICC at least one can use inline assembly but this is not possible with MSVC in 64-bit mode.


Microsoft and Intel's documentation (both the white paper and the intrinsic guide online) both agree now.

The _addcarry_u64 intrinsic documentation says produces only adc. The _addcarryx_u64 intrinsic can produce either adcx or adox. With MSVC 2013 and 2015, however, _addcarryx_u64 only produces adcx. ICC produces both.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

They map to adc, adcx AND adox. The compiler decides which instructions to use, based on how you use them. If you perform two big-int additions in parallel the compiler will use adcx and adox, for higher throughput. For example:

unsigned char c1 = 0, c2 = 0
for(i=0; i< 100; i++){ 
    c1 = _addcarry_u64(c1, res[i], a[i], &res[i]);
    c2 = _addcarry_u64(c2, res[i], b[i], &res[i]);
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...