For instruction-level profiling, use Valgrind's Callgrind. (Cachegrind can also simulate the caches and the branch predictor, which is quite nice.)
For timing measurements, use Google's CPU profiler (part of gperftools); it gives much better results than gprof. You can set the sampling frequency, and it can render the output as a nicely annotated call graph.