There is a performance difference, especially for enter
. On modern processors this decodes to some 10 to 20 μops, while the three instruction sequence is about 4 to 6, depending on the architecture. For details consult Agner Fog's instruction tables.
Additionally the enter
instruction usually has a quite high latency, for example 8 clocks on a core2, compared to the 3 clocks dependency chain of the three instruction sequence.
Furthermore the three instruction sequence may be spread out by the compiler for scheduling purposes, depending on the surrounding code of course, to allow more parallel execution of instructions.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…