Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


x86 - What's the difference between __popcnt() and _mm_popcnt_u32()?

MS Visual C++ offers two intrinsics for the popcnt instruction on CPUs with SSE4.2:

  1. __popcnt()
  2. _mm_popcnt_u32()

The only difference I found was that the docs for __popcnt() are marked as "Microsoft Specific", and _mm_popcnt_u32() seems to be an intrinsic command name (non-MS-specific).

Is this the only difference, where the MS __popcnt() just calls the HW _mm_popcnt_u32()?



1 Reply


These are two different intrinsic names for the same machine instruction, thanks to Intel and AMD. The instruction is the same on all CPUs that support it, and the two intrinsics behave identically in C and C++.


The __popcnt*() builtins are for AMD's Advanced Bit Manipulation (ABM) instructions. See http://blogs.amd.com/developer/2007/09/26/barcelona-processor-feature-advanced-bit-manipulation-abm/

The _mm_popcnt_u*() intrinsics are for Intel's implementation, which isn't part of SSE4.2 per se but was introduced around the same time. See http://en.wikipedia.org/wiki/SSE4#POPCNT_and_LZCNT

According to https://www.chessprogramming.org/Population_Count , both implementations are binary compatible, in spite of their different intrinsic names.
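A minimal sketch of how the two spellings could be compared side by side. The helper name popcnt_spellings_agree is hypothetical. The non-MSVC branch is my own assumption, not part of the original answer: __popcnt() is MSVC-specific, so GCC/Clang's __builtin_popcount stands in for it there, only to keep the snippet compilable outside Visual C++. The function must only be called on a CPU that supports POPCNT.

```c
#include <stdbool.h>
#include <nmmintrin.h>   /* _mm_popcnt_u32 */

#if defined(_MSC_VER)
#include <intrin.h>      /* __popcnt */
#define POPCNT_TARGET
#else
/* Assumption: GCC/Clang need POPCNT enabled for this one function. */
#define POPCNT_TARGET __attribute__((target("popcnt")))
#endif

/* Hypothetical helper: counts the set bits of x with both spellings
   and reports whether the two results are equal. */
POPCNT_TARGET
static bool popcnt_spellings_agree(unsigned int x)
{
    unsigned int intel_style = (unsigned int)_mm_popcnt_u32(x);
#if defined(_MSC_VER)
    unsigned int ms_style = __popcnt(x);
#else
    /* Stand-in for the MSVC-only __popcnt (assumption, see above). */
    unsigned int ms_style = (unsigned int)__builtin_popcount(x);
#endif
    return intel_style == ms_style;
}
```

Under MSVC, calling popcnt_spellings_agree() with any 32-bit value should return true if both intrinsics really do map to the same instruction, as the answer states.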

Intel's architecture manual states that:

Before an application attempts to use the POPCNT instruction, it must check that the processor supports SSE4.2 (if CPUID.01H:ECX.SSE4_2[bit 20] = 1) and POPCNT (if CPUID.01H:ECX.POPCNT[bit 23] = 1).

AMD's AMD64 Architecture Programmer's Manual Volume 3: General Purpose and System Instructions says:

Support for the POPCNT instruction is indicated by ECX bit 23 (POPCNT) as returned by CPUID function 0000_0001h. Software MUST check the CPUID bit once per program or library initialization before using the POPCNT instruction, or inconsistent behavior may result.

I can't see any reason why popcnt would require the presence of SSE4.2, so I think that checking bit 23 of ECX is sufficient to determine popcnt's presence.
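The bit-23 check described above might look roughly like this. It is a sketch only: cpu_has_popcnt is a hypothetical helper name, and the GCC/Clang branch using __get_cpuid from <cpuid.h> is my addition so the snippet also builds outside Visual C++, where __cpuid from <intrin.h> would be used instead.

```c
#include <stdbool.h>

#if defined(_MSC_VER)
#include <intrin.h>   /* __cpuid */
#else
#include <cpuid.h>    /* __get_cpuid (GCC/Clang, x86 only) */
#endif

/* Hypothetical helper: returns true when CPUID.01H:ECX bit 23 (POPCNT)
   is set. SSE4.2 (bit 20) is deliberately not checked. */
static bool cpu_has_popcnt(void)
{
#if defined(_MSC_VER)
    int regs[4];                      /* EAX, EBX, ECX, EDX */
    __cpuid(regs, 1);                 /* CPUID function 0000_0001h */
    return (((unsigned int)regs[2] >> 23) & 1u) != 0;
#else
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                 /* leaf 1 is not available */
    return ((ecx >> 23) & 1u) != 0;
#endif
}
```

Following AMD's wording, a program would call cpu_has_popcnt() once during initialization, cache the result, and use a software bit count when it returns false.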


AMD's Barcelona, the first AMD CPU to have popcnt, didn't fully implement SSE4, so it's possible that Intel's architecture manual suggests a method for determining popcnt's presence that works on Intel CPUs but fails on AMD CPUs that do support the instruction.

Intel's current documentation for popcnt in its vol. 2 instruction-set reference manual only says "#UD If CPUID.01H:ECX.POPCNT [Bit 23] = 0", so the anti-competitive suggestion, which would lead software to skip popcnt on AMD CPUs that lack SSE4.2, is gone.



...