Here's a solution with only 4 main steps:
const uint64_t column_mask = 0x8080808080808080ull;
const uint64_t magic = 0x2040810204081ull;
int get_col(uint64_t board, int col) {
uint64_t column = (board << col) & column_mask;
column *= magic;
return (column >> 56) & 0xff;
}
It works like this:
- the board is shifted to align the column with the left side
- it's masked to only contain the required column (0..8)
- it's multiplied by a magic number which results in all the original bits pushed to the left side
- the left-most byte is shifted to the right
The magic number is chosen to copy only the needed bits and let the rest fall into unused places / overflow over the number. The process looks like this (digits are bit "IDs", rather than the number itself):
original column: ...1.......2.......3.......4.......5.......6.......7.......8....
aligned column: 1.......2.......3.......4.......5.......6.......7.......8.......
multiplied: 123456782345678.345678..45678...5678....678.....78......8.......
shifted to right:........................................................12345678
If you add the const
keywords, assembly becomes quite nice actually:
get_col:
.LFB7:
.cfi_startproc
movl %esi, %ecx
movabsq $-9187201950435737472, %rax
salq %cl, %rdi
andq %rax, %rdi
movabsq $567382630219905, %rax
imulq %rax, %rdi
shrq $56, %rdi
movl %edi, %eax
ret
No branching, no external data, around 0.4ns per calculation.
Edit: takes around 6th of the time using NPE's solution as baseline (next fastest one)
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…