Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


glibc - "Illegal instruction" when running a precompiled program on another machine

I have to build my program on CentOS 7 and deploy it on another Linux machine. The program requires a newer version of glibc and some libraries that are not (and will not be) installed on the target machine, so I decided to ship the executable with its dynamic libraries. I used patchelf to patch the interpreter and rpath.

I tested the executable on my machine and it works (I also checked with ldd to make sure the new rpath is used). But when I copy it to the other machine along with the libs, the program fails to run. Only this line is printed:

Illegal instruction

Here is the backtrace from gdb: [image: gdb backtrace]

Update: [image: disassembly of the binary] So the SIGILL was caused by the shlx instruction in the __tls_init() function. I don't know which library provides this function; I'm not sure it is from glibc.

I removed the glibc that I had copied from another computer and used the glibc already installed on the target computer, but the problem was not fixed.


1 Reply


"I used patchelf to patch interpreter and rpath"

Your question is very unclear: you changed the interpreter and the rpath to what?

I think what you did is:

  1. Built a new GLIBC version in a non-standard path
  2. Used patchelf to change your binary to point to the non-standard path
  3. Copied the binary and the non-standard GLIBC to the target machine
  4. Observed SIGILL.
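To settle what the interpreter and rpath were actually changed to, they can be read back from the patched binary. Below is a minimal sketch, assuming patchelf is installed and on PATH; the function names and the ./myprog path are hypothetical and only illustrate the idea.

```python
import subprocess


def patchelf_query_commands(binary):
    """Build the patchelf invocations that print a binary's interpreter and rpath."""
    return [
        ["patchelf", "--print-interpreter", binary],
        ["patchelf", "--print-rpath", binary],
    ]


def describe_binary(binary):
    """Run both queries and return (interpreter, rpath) as stripped strings.

    Raises CalledProcessError if patchelf fails (e.g. not an ELF file).
    """
    outputs = []
    for cmd in patchelf_query_commands(binary):
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        outputs.append(result.stdout.strip())
    return tuple(outputs)


# Example (hypothetical path):
# interpreter, rpath = describe_binary("./myprog")
# print("interpreter:", interpreter)
# print("rpath:", rpath)
```

Running this on both the build machine and the target machine shows whether the binary really points at the shipped loader and libraries or at the system ones.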

Most likely cause: the non-standard GLIBC you built is not configured for your target processor, which is different from the processor used on the build machine.

GCC does not use -march=native by default, but many build setups add it to CFLAGS. If it was used on e.g. a Haswell machine, then the binary may use AVX2 instructions, which are not supported by the target machine.

To fix this, you will need to add -march=x86-64 (the generic baseline; "generic" is only accepted by -mtune, not -march) or -march=$target_architecture to CFLAGS (and CXXFLAGS), and rebuild both GLIBC and the main program.
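Whether a given -march value turns on AVX2 or BMI2 code generation can be checked by asking GCC for its predefined macros. Below is a minimal sketch, assuming gcc is on PATH and /dev/null exists; the helper names are hypothetical, while __AVX2__ and __BMI2__ are the macros GCC defines when those extensions are enabled.

```python
import subprocess


def gcc_macro_command(march):
    """Build a gcc command that dumps the predefined macros for a -march value."""
    return ["gcc", f"-march={march}", "-dM", "-E", "-x", "c", "/dev/null"]


def enabled_isa_macros(preprocessor_output, wanted=("__AVX2__", "__BMI2__")):
    """Return the macros from `wanted` that appear as #define lines in the dump."""
    defined = set()
    for line in preprocessor_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "#define":
            defined.add(parts[1])
    return [macro for macro in wanted if macro in defined]


def isa_macros_for(march):
    """Run gcc and report which of the wanted ISA macros the -march value enables."""
    result = subprocess.run(
        gcc_macro_command(march), check=True, capture_output=True, text=True
    )
    return enabled_isa_macros(result.stdout)


# Example:
# print(isa_macros_for("x86-64"))
# print(isa_macros_for("native"))
```

An empty list for the flags actually used in the build means the compiler was not asked to emit AVX2 or BMI2 instructions on its own; hand-written assembly and prebuilt libraries are not covered by this check.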

On the other hand, your GDB backtrace shows standard paths to GLIBC: /lib64/ld-linux-x86-64.so.2 and /lib64/libc.so.6, so maybe I didn't understand the steps you took at all.

Update:

"I didn't build a new glibc but copied it from my machine to the target machine. My machine uses an E5-2690v4, but the target machine uses an E5-2470."

The E5-2690v4 is a Broadwell. The E5-2470 is a Sandy Bridge (the Ivy Bridge part is the E5-2470 v2).

The former supports AVX2, but the latter doesn't. Copying a GLIBC built with AVX2 to the E5-2470 machine is likely to fail with exactly the symptoms you described (and in fact should render that machine completely non-working; I am surprised anything works on it at all).
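The difference between the two processors can be confirmed from the feature flags the Linux kernel reports. Below is a minimal sketch, assuming an x86 Linux system where /proc/cpuinfo has a "flags" line and uses the names avx2 and bmi2; the function names are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, required=("avx2", "bmi2")):
    """Return the required features that the CPU does not advertise."""
    flags = cpu_flags(cpuinfo_text)
    return [feature for feature in required if feature not in flags]


# Example, to be run on each machine:
# with open("/proc/cpuinfo") as f:
#     print(missing_features(f.read()))
```

If the target machine reports a feature as missing while the build machine does not, any code path that uses that feature unconditionally will raise SIGILL on the target.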

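Using the GDB x/i $pc command, you can see which instruction generates the SIGILL. If it's an AVX2 instruction, that's likely the answer. The shlx reported in your update is a BMI2 instruction; BMI2 arrived with Haswell alongside AVX2, so the E5-2470 lacks it as well.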
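Beyond inspecting the single faulting instruction, the whole binary or a shipped library can be scanned for instructions that first appeared with Haswell. Below is a minimal sketch that parses the text produced by objdump -d (tab-separated address, bytes, instruction). The mnemonic list is a small, deliberately incomplete sample of BMI2 and AVX2 instructions, and the function name and the ./myprog path are hypothetical.

```python
# BMI2 instructions plus a few AVX2-only mnemonics; not an exhaustive list.
HASWELL_MNEMONICS = {
    "bzhi", "mulx", "pdep", "pext", "rorx", "sarx", "shlx", "shrx",
    "vpbroadcastb", "vpbroadcastw", "vpbroadcastd", "vpbroadcastq",
    "vinserti128", "vextracti128", "vperm2i128", "vpermd", "vpermq",
}


def find_suspect_instructions(disassembly):
    """Return (address, mnemonic) pairs for listed instructions in objdump -d output."""
    hits = []
    for line in disassembly.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # headers, labels, and byte-only continuation lines
        words = fields[2].split()
        if not words:
            continue
        mnemonic = words[0].lower()
        if mnemonic in HASWELL_MNEMONICS:
            address = fields[0].strip().rstrip(":")
            hits.append((address, mnemonic))
    return hits


# Example (hypothetical path):
# import subprocess
# text = subprocess.run(["objdump", "-d", "./myprog"],
#                       check=True, capture_output=True, text=True).stdout
# for address, mnemonic in find_suspect_instructions(text):
#     print(address, mnemonic)
```

A hit only shows that the instruction exists somewhere in the file; libraries that dispatch at run time may contain such code without ever executing it on an older CPU, so the GDB check of the faulting instruction remains the decisive one.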

