I've been writing in x86 assembly lately (for fun) and was wondering whether or not rep prefixed string instructions actually have a performance edge on modern processors or if they're just implemented for back compatibility.
I can understand why Intel would have originally implemented the rep instructions back when processors only ran one instruction at a time, but is there a benefit to using them now?
With a loop that compiles to more instructions, there is more to fill up the pipeline and/or be issued out-of-order. Are modern processors built to optimize for these rep-prefixed instructions, or are rep instructions used so rarely in modern code that they're not important to the manufacturers?
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…