I work with two computers: one without AVX support and one with AVX. It would be convenient to have my code detect the instruction set supported by my CPU at run-time and choose the appropriate code path.
I've followed the suggestions by Agner Fog to make a CPU dispatcher (http://www.agner.org/optimize/#vectorclass). However, on my machine without AVX, compiling and linking the code in Visual Studio with AVX enabled causes it to crash when I run it.
For example, I have two source files: one compiled for the SSE2 instruction set and containing some SSE2 instructions, and another compiled for the AVX instruction set and containing some AVX instructions. Even if my main function only references the SSE2 functions, the code still crashes merely because the build includes a source file compiled with AVX enabled and containing AVX instructions. Any clues as to how I can fix this?
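To make the intended run-time dispatch concrete, here is a minimal sketch of the detection and selection step. It is not Agner Fog's dispatcher: cpu_has_avx, pick_sum, dispatch_sum8, and the two sum8_* functions are hypothetical names made up for illustration, and the "AVX" function is only a stub so that the sketch builds without any /arch flag. The check assumes the usual CPUID convention (leaf 1, ECX bit 27 for OSXSAVE and bit 28 for AVX) plus an XGETBV check that the OS saves the YMM registers.

```cpp
#include <cassert>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  #include <intrin.h>     // __cpuid
  #include <immintrin.h>  // _xgetbv
  #define DISPATCH_X86_MSVC 1
#elif defined(__i386__) || defined(__x86_64__)
  #include <cpuid.h>      // __get_cpuid (GCC/Clang)
  #define DISPATCH_X86_GNU 1
#endif

// True only if the CPU reports AVX and the OS saves the YMM registers.
bool cpu_has_avx() {
#if defined(DISPATCH_X86_MSVC)
    int r[4];
    __cpuid(r, 1);
    const unsigned ecx = static_cast<unsigned>(r[2]);
#elif defined(DISPATCH_X86_GNU)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
#else
    const unsigned ecx = 0;  // not an x86 build: report no AVX
#endif
    const bool osxsave = ((ecx >> 27) & 1u) != 0;
    const bool avx     = ((ecx >> 28) & 1u) != 0;
    if (!osxsave || !avx) return false;  // XGETBV would fault without OSXSAVE

#if defined(DISPATCH_X86_MSVC)
    const unsigned long long xcr0 = _xgetbv(0);
#elif defined(DISPATCH_X86_GNU)
    unsigned lo = 0, hi = 0;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0u));
    const unsigned long long xcr0 =
        (static_cast<unsigned long long>(hi) << 32) | lo;
#else
    const unsigned long long xcr0 = 0;
#endif
    return (xcr0 & 0x6u) == 0x6u;  // XMM (bit 1) and YMM (bit 2) state enabled
}

typedef float (*sum_fn)(const float*);

// Hypothetical stand-in for the SSE2 code path: sums 8 floats.
float sum8_baseline(const float* a) {
    float s = 0.0f;
    for (int i = 0; i < 8; ++i) s += a[i];
    return s;
}

// Hypothetical stand-in for the AVX code path. In a real build this would
// live in its own source file compiled with /arch:AVX.
float sum8_avx_stub(const float* a) {
    return sum8_baseline(a);
}

// Pure selection step, kept separate from detection so it is easy to test.
sum_fn pick_sum(bool has_avx) {
    return has_avx ? sum8_avx_stub : sum8_baseline;
}

// Detects once on first use, then always calls the chosen function.
float dispatch_sum8(const float* a) {
    assert(a != nullptr);
    static const sum_fn fn = pick_sum(cpu_has_avx());
    return fn(a);
}
```

Calling dispatch_sum8(a) from main would then replace a hard-coded instruction-set level. Keeping pick_sum separate from cpu_has_avx means the selection can be exercised on a machine without AVX.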
Edit:
Okay, I think I isolated the problem. I'm using Agner Fog's vector class and I have defined three source files as:
//file sse2.cpp - compiled with /arch:SSE2
#include "vectorclass.h"
float func_sse2(const float* a) {
    Vec8f v1 = Vec8f().load(a);
    float sum = horizontal_add(v1);
    return sum;
}

//file avx.cpp - compiled with /arch:AVX
#include "vectorclass.h"
float func_avx(const float* a) {
    Vec8f v1 = Vec8f().load(a);
    float sum = horizontal_add(v1);
    return sum;
}

//file foo.cpp - compiled with /arch:SSE2
#include <stdio.h>
extern float func_sse2(const float* a);
extern float func_avx(const float* a);
int main() {
    float (*fp)(const float*a);
    float a[] = {1,2,3,4,5,6,7,8};
    int iset = 6;
    if(iset>=7) {
        fp = func_avx;
    }
    else {
        fp = func_sse2;
    }
    float sum = (*fp)(a);
    printf("sum %f\n", sum);
}
This crashes. If I instead use Vec4f in func_sse2, it does not crash. I don't understand this: I can use Vec8f with SSE2 by itself as long as I don't have another source file compiled with AVX. Agner Fog's manual says:
"There is no advantage in using the 256-bit floating point vector classes (Vec8f,
Vec4d) unless the AVX instruction set is specified, but it can be convenient to use
these classes anyway if the same source code is used with and without AVX.
Each 256-bit vector will simply be split up into two 128-bit vectors when compiling
without AVX."
However, when I have two source files that use Vec8f, one compiled with SSE2 and one compiled with AVX, I get a crash.
Edit2:
I can get it to work from the command line
>cl -c sse2.cpp
>cl -c /arch:AVX avx.cpp
>cl foo.cpp sse2.obj avx.obj
>foo.exe
Edit3:
This, however, crashes
>cl -c sse2.cpp
>cl -c /arch:AVX avx.cpp
>cl foo.cpp avx.obj sse2.obj
>foo.exe
Another clue. Apparently, the order of linking matters. It crashes if avx.obj is before sse2.obj but if sse2.obj is before avx.obj it does not crash. I'm not sure if it chooses the correct code path (I don't have access to my AVX system right now) but at least it does not crash.
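One hypothesis that would fit the link-order clue (I have not confirmed it): vectorclass.h is header-only, so inline functions defined in it (for example Vec8f member functions) may be compiled into both sse2.obj and avx.obj under the same symbol name. The linker then keeps only one copy, plausibly the one from whichever object file comes first, so the SSE2 path could end up calling code generated with AVX. If that is the cause, a common workaround is to put each instruction-set build in its own namespace so that the symbols can no longer be merged. The sketch below shows only the naming idea in a single file. ns_sse2, ns_avx, helper_isa, and sum8 are hypothetical names, the plain loops stand in for the real vector code, and the return values 2 and 7 are just labels (7 mirrors the iset threshold used for AVX in foo.cpp).

```cpp
#include <cassert>

namespace ns_sse2 {
    // Inline helper: its symbol is ns_sse2::helper_isa, so the linker cannot
    // merge it with the AVX build's helper of the same unqualified name.
    inline int helper_isa() { return 2; }

    // Stand-in for the SSE2 code path: sums 8 floats.
    float sum8(const float* a) {
        assert(a != nullptr);
        float s = 0.0f;
        for (int i = 0; i < 8; ++i) s += a[i];
        return s;
    }
}

namespace ns_avx {
    // Same unqualified name, different symbol: ns_avx::helper_isa.
    inline int helper_isa() { return 7; }

    // Stand-in for the AVX code path. In a real build this namespace would
    // live in a separate source file compiled with /arch:AVX.
    float sum8(const float* a) {
        assert(a != nullptr);
        float s = 0.0f;
        for (int i = 0; i < 8; ++i) s += a[i];
        return s;
    }
}
```

With this layout, foo.cpp would declare and select between ns_sse2::sum8 and ns_avx::sum8 instead of func_sse2 and func_avx. Whether the library's own classes can be placed in per-build namespaces this way depends on the library version, which I have not checked.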