You are correct that the third instruction (`sub`) has already read an incorrect (i.e. stale) value in the decode stage, and thus requires mitigation such as forwarding.
In fact, that `sub` instruction has read two incorrect (stale) values: one for the first operand, `t0`, and one for the second operand, `t3`, as that register is updated by the immediately prior instruction.
The first actual register update (of `t0` by `add`) is available in cycle 5 (1-based counting), yet the decode of the `sub` happens in cycle 4. A forward is required: here it could be from the W stage of the `add` to the ALU stage of the `sub`, or it could be done from the M stage of the `add` to the D stage of the `sub`.
Only one cycle later (for a 4th instruction, not shown) could the decode stage obtain the proper up-to-date value from the earlier instruction's W stage: if the W stage overlaps with a subsequent instruction's D stage, no forward is necessary, since the W stage finishes early in the cycle and the D stage is able to pick up that result later in the same cycle.
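
The cycle arithmetic above can be checked with a minimal sketch. It assumes a classic five-stage pipeline (F, D, X, M, W) that issues one instruction per cycle with no stalls, and a register file that is written early in a cycle and read later in the same cycle. The function names `stage_cycle` and `needs_forward` are made up for illustration.

```python
# Assumed stage order of a classic five-stage pipeline.
STAGES = ["F", "D", "X", "M", "W"]

def stage_cycle(instr_index, stage):
    """1-based cycle in which instruction `instr_index` (1-based) is in `stage`,
    assuming one instruction enters the pipeline per cycle and nothing stalls."""
    return instr_index + STAGES.index(stage)

def needs_forward(writer, reader):
    """True if the reader decodes before the writer's W stage, so the register
    file read in D returns a stale value. If D and W share a cycle, the read is
    assumed to see the write (write early, read late)."""
    return stage_cycle(reader, "D") < stage_cycle(writer, "W")

print(stage_cycle(1, "W"))   # 5: the add writes t0 in cycle 5
print(stage_cycle(3, "D"))   # 4: the sub decodes in cycle 4
print(needs_forward(1, 3))   # True: the sub needs a forward for t0
print(needs_forward(1, 4))   # False: a 4th instruction decodes in cycle 5
```

Under these assumptions the two forwarding options mentioned above also line up: the `add` is in W in the same cycle the `sub` is in X (cycle 5), and in M in the same cycle the `sub` is in D (cycle 4).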
There is also a straightforward ALU-ALU dependency, a read-after-write (RAW) hazard, on `t3` between instruction 2 (the writer) and instruction 3 (the reader) that the diagram does not call out, which is good evidence that the diagram is incomplete with respect to showing all the hazards.
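
As a rough illustration, both RAW hazards can be found mechanically by comparing each instruction's source registers with the destinations of the one or two instructions before it. The sketch below is hypothetical: only the `add` writing `t0`, the second instruction writing `t3`, and the `sub` reading `t0` and `t3` come from the discussion above. The `or` opcode and the registers `t1`, `t2`, `t4`, `t5`, `t6` are placeholders. The window of 2 assumes the same five-stage pipeline with a register file that is written early and read late in a cycle.

```python
from typing import NamedTuple, Tuple

class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]

def raw_hazards(program, window=2):
    """Return (writer_no, reader_no, register) triples, 1-based, for every
    source register written by one of the `window` preceding instructions."""
    hazards = []
    for r, reader in enumerate(program):
        for w in range(max(0, r - window), r):
            writer = program[w]
            for reg in reader.srcs:
                if reg == writer.dest:
                    hazards.append((w + 1, r + 1, reg))
    return hazards

# Placeholder program: only the t0 and t3 dependencies reflect the question.
program = [
    Instr("add", "t0", ("t1", "t2")),
    Instr("or",  "t3", ("t4", "t5")),   # opcode and sources are assumed
    Instr("sub", "t6", ("t0", "t3")),
]

print(raw_hazards(program))  # [(1, 3, 't0'), (2, 3, 't3')]
```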
Sometimes educators show only the clearest example of a read-after-write hazard. There are many other hazards that are often overlooked.
One such case involves load hazards. Normally, a load hazard is seen as requiring both a forward and a stall; this is the case when the load result is used by the very next instruction at the ALU. However, if a load instruction is immediately followed by a store instruction that stores the loaded data, a forward from M (of the load) to M (of the store) can mitigate this hazard without a stall, in much the same way that an X-to-X forward can mitigate an ALU dependency hazard.
So we might note that a store instruction has two register sources, but the register holding the value being stored isn't actually needed until the M stage, whereas the register for the base address computation is needed in the X (ALU) stage. (That makes store somewhat different from, say, `add`, which also has two register sources, in that both are needed in the X stage.)
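
The difference between these cases comes down to the stage in which a value is produced versus the stage in which it is needed. A minimal sketch of that bookkeeping follows. It assumes full forwarding in a five-stage pipeline, that a value is ready at the end of the producing stage, and that it must be present at the start of the consuming stage. The names `PRODUCED_IN` and `stalls_with_forwarding` are made up for illustration.

```python
STAGES = ["F", "D", "X", "M", "W"]

# Stage at whose end the result exists (assumed): ALU ops in X, loads in M.
PRODUCED_IN = {"alu": "X", "load": "M"}

def stalls_with_forwarding(producer_kind, needed_in, distance=1):
    """Stall cycles needed when a consumer `distance` instructions after the
    producer needs the value at the start of stage `needed_in`."""
    produced = STAGES.index(PRODUCED_IN[producer_kind])
    needed = STAGES.index(needed_in)
    return max(0, produced - needed - distance + 1)

print(stalls_with_forwarding("alu", "X"))   # 0: X-to-X forward, no stall
print(stalls_with_forwarding("load", "X"))  # 1: load-use (ALU or store base address)
print(stalls_with_forwarding("load", "M"))  # 0: loaded value is the store's data
```

The last two lines show why the store is special: the same preceding load costs a stall if it feeds the store's base address (needed in X) but not if it feeds the value being stored (needed in M).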