The cmpeq/cmpgt instructions produce a per-element mask that is either all ones or all zeros. To add entropy to w only in the elements where w is nonzero, the overall process goes as follows:
auto mask = _mm_cmpeq_ps(_mm_setzero_ps(), w); // all ones where w == 0
mask = _mm_andnot_ps(mask, entropy);           // entropy where w != 0, else 0
w = _mm_add_ps(w, mask);                       // adds entropy only where w != 0
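As a self-contained illustration of the masked add above, here is a minimal sketch that works on four floats in memory. The helper name add_where_nonzero and the unaligned load/store wrapper are assumptions made for the example, not part of the original snippet.

```cpp
#include <immintrin.h>

// Hypothetical helper: w[i] += entropy[i] only where w[i] != 0 (4 floats).
static inline void add_where_nonzero(float* w, const float* entropy) {
    __m128 vw = _mm_loadu_ps(w);
    __m128 ve = _mm_loadu_ps(entropy);
    __m128 is_zero = _mm_cmpeq_ps(_mm_setzero_ps(), vw); // all ones where w == 0
    __m128 addend = _mm_andnot_ps(is_zero, ve);          // entropy where w != 0, else 0
    _mm_storeu_ps(w, _mm_add_ps(vw, addend));
}
```

Only SSE instructions are used here, so no extra compiler flags should be needed on x86-64.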
Another option is to accumulate unconditionally, then use blendv (SSE4.1) to select between the summed and the original value.
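auto w2 = _mm_add_ps(e, w);                    // unconditional sum; e is the value to add
auto mask = _mm_cmpeq_ps(_mm_setzero_ps(), w); // all ones where w == 0
w = _mm_blendv_ps(w2, w, mask);                // keep the old w where w == 0, else take w2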
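The blendv variant can be sketched as a complete function too. The helper names select_ps and add_where_nonzero_blend are hypothetical, and the and/andnot/or fallback for builds without SSE4.1 is an addition of this sketch rather than something the snippet above uses.

```cpp
#include <immintrin.h>

// Hypothetical helper: per element, returns b where mask is all ones, else a.
static inline __m128 select_ps(__m128 a, __m128 b, __m128 mask) {
#ifdef __SSE4_1__
    return _mm_blendv_ps(a, b, mask); // picks b where the mask sign bit is set
#else
    // SSE fallback; valid because compare masks are all ones or all zeros.
    return _mm_or_ps(_mm_and_ps(mask, b), _mm_andnot_ps(mask, a));
#endif
}

// Hypothetical helper: w[i] += e[i] only where w[i] != 0 (4 floats).
static inline void add_where_nonzero_blend(float* w, const float* e) {
    __m128 vw = _mm_loadu_ps(w);
    __m128 sum = _mm_add_ps(_mm_loadu_ps(e), vw);        // accumulate anyway
    __m128 is_zero = _mm_cmpeq_ps(_mm_setzero_ps(), vw); // all ones where w == 0
    _mm_storeu_ps(w, select_ps(sum, vw, is_zero));       // old w where w == 0
}
```

The fallback relies on the mask being a full compare result. blendv itself only looks at the sign bit of each mask element, so the two paths agree only for masks of that form.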
A third option uses the fact that the result must stay 0 wherever w = 0, so the addition can be undone in those elements simply by clearing them:
m=(w==0); // build the mask as above
w+=e; // add unconditionally
w&=~m; // clear the elements where w was 0, which reverts the add there
(I'm using cmpeq instead of cmpneq so that the same approach is usable for integers as well, since SSE has no integer cmpneq.)
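To illustrate the integer case, the third option might look like this for four 32-bit integers. This is only a sketch: the function name add_then_clear_epi32 and the array-based interface are assumptions made for the example.

```cpp
#include <immintrin.h>
#include <cstdint>

// Hypothetical helper: w[i] += e[i] only where w[i] != 0 (4 x int32).
static inline void add_then_clear_epi32(std::int32_t* w, const std::int32_t* e) {
    __m128i vw = _mm_loadu_si128(reinterpret_cast<const __m128i*>(w));
    __m128i ve = _mm_loadu_si128(reinterpret_cast<const __m128i*>(e));
    __m128i m = _mm_cmpeq_epi32(vw, _mm_setzero_si128()); // m = (w == 0)
    vw = _mm_add_epi32(vw, ve);                           // w += e
    vw = _mm_andnot_si128(m, vw);                         // w &= ~m
    _mm_storeu_si128(reinterpret_cast<__m128i*>(w), vw);
}
```

All three intrinsics are SSE2, so this form needs neither blendv nor a not-equal compare.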