io - How do Intel CPUs that use the ring bus topology decode and handle port I/O operations

Question

Welcome To Ask or Share your Answers For Others

io - How do Intel CPUs that use the ring bus topology decode and handle port I/O operations

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

io - How do Intel CPUs that use the ring bus topology decode and handle port I/O operations

I understand Port I/O from a hardware abstraction level (i.e. asserts a pin that indicates to devices on the bus that the address is a port address, which makes sense on earlier CPUs with a simple address bus model) but I'm not really sure how it's implemented on modern CPUs microarchitecturally but also particularly how the Port I/O operation appears on the ring bus.

Firstly. Where does the IN/OUT instruction get allocated to, the reservation station or the load/store buffer? My initial thoughts were that it would be allocated in the load/store buffer and the memory scheduler recognises it, sends it to the L1d indicating that it is a port-mapped operation. A line fill buffer is allocated and it gets sent to L2 and then to the ring. I'm guessing that the message on the ring has some port-mapped indicator which only the system agent accepts and then it checks its internal components and relays the port-mapped indicated request to them; i.e. PCIe root bridge would pick up CF8h and CFCh. I'm guessing the DMI controller is fixed to pick up all the standardised ports that will appear on the PCH, such as the one for the legacy DMA controller.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T21:42:51+0000

Yes, I assume the message over the ring bus has some kind of tag that flags it as being to I/O space, not a physical memory address, and that the system agent sorts this out.

If anyone knows more details, that might be interesting, but this simple mental model is probably fine.

I don't know how port I/O turns into PCIe messages, but I think PCIe devices can have I/O ports in I/O space, not just MMIO.

IN/OUT are pretty close to serializing (but not officially defined using that term for some reason How many memory barriers instructions does an x86 CPU have?). They do drain the store buffer before executing, and are full memory barriers.

the reservation station or the load/store buffer?

Both. For normal loads/stores, the front-end allocates a load buffer entry for a load, or a store buffer entry for a store, and issues the uop into the ROB and RS.

For example, when the RS dispatches a store-address or store-data uop to port 4 (store-data) or p2/p3 (load or store-address), that execution unit will use the store-buffer entry as the place where it writes the data, or where it writes the address.

Having the store-buffer entry allocated by the issue/allocate/rename logic means that either store-address or store-data can execute first, whichever one has its inputs ready first, and free its space in the RS after completing successfully. The ROB entry stays allocated until the store retires. The store buffer entry stays allocated until some time after that, when the store commits to L1d cache. (Or for a store to uncacheable memory, commits to an LFB or something to be send out the memory hierarchy where the system agent will pick it up if it's to a MMIO region.)

Obviously IN/OUT are micro-coded as multiple uops, and all those uops are allocated in the ROB and reservation station as they issue from the front-end, like any other uop. (Well, some of them might not need a back-end execution unit, in which case they'd only be allocated in the ROB in an already-executed state. e.g. the uops for lfence are like this on Skylake.)

I'd assume they use the normal store buffer / load buffer mechanism for communicating off-core, but since they're more or less serializing there's no real performance implication to how they're implemented. (Later instructions can't start executing until after the "data phase" of the I/O transaction, and they drain the store buffer before executing.)

Categories

io - How do Intel CPUs that use the ring bus topology decode and handle port I/O operations

io - How do Intel CPUs that use the ring bus topology decode and handle port I/O operations

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags