If you don't mind using SSE intrinsics, then _mm_movemask_epi8 is an excellent fit. It uses 16 bytes, but you can just set the others to zero.
For example (not tested)
__m128i values = _mm_loadl_epi64((__m128i*)array);
__m128i order = _mm_set_epi8(0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
0, 1, 2, 3, 4, 5, 6, 7);
values = _mm_shuffle_epi8(values, order);
int result = _mm_movemask_epi8(_mm_slli_epi32(values, 7));
This assumes the array is an array of chars. If you can't make that happen, it takes some more loads and packs and it becomes a bit annoying.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…