Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


performance - asynchronous IO io_submit latency in Ubuntu Linux

I am looking for advice on how to get efficient and high performance asynchronous IO working for my application that runs on Ubuntu Linux 14.04.

My app processes transactions and creates a file on disk/flash. As the app progresses through transactions, it creates additional blocks that must be appended to that file. The app also needs to frequently read blocks of this file while processing new transactions: each transaction might need to read a different block from the file in addition to creating a new block that has to be appended to it. There is an incoming queue of transactions, and the app can keep processing transactions from the queue to build a pipeline of IO ops deep enough to hide the latency of read accesses or write completions on disk or flash. For a read of a block that has not yet been written to disk/flash (because a previous transaction put it in the write queue), the app will stall until the corresponding write completes.

I have an important performance objective – the app should incur the lowest possible latency to issue the IO operation. My app takes approximately 10 microseconds to process each transaction and be ready to issue a write to or a read from the file on disk/flash. The additional latency to issue an asynchronous read or write should be as small as possible so that the app can complete processing each transaction at a rate as close to 10 usecs per transaction as possible, when only a file write is needed.

We are experimenting with an implementation that uses io_submit to issue write and read requests. I would appreciate any suggestions or feedback on the best approach for our requirement. Is io_submit going to give us the best performance to meet our objective? What should I expect for the latency of each write io_submit and the latency of each read io_submit?

Using our experimental code (running on a 2.3 GHz Haswell MacBook Pro, Ubuntu Linux 14.04), we are measuring about 50 usecs for a write io_submit when extending the output file. This is too long, and we aren't even close to our performance requirements. Any guidance to help me launch a write request with the least latency will be greatly appreciated.
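One way to check a figure like the 50 usecs above is to time the io_submit() call on its own, separately from waiting for the completion. The following is a minimal sketch of that measurement, assuming raw Linux syscalls rather than libaio. The names timed_aio_write and sys_io_* are made up for this illustration and are not part of any library.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Thin wrappers over the raw AIO syscalls (hypothetical helper names). */
static long sys_io_setup(unsigned nr, aio_context_t *ctx)
{
    return syscall(SYS_io_setup, nr, ctx);
}
static long sys_io_submit(aio_context_t ctx, long n, struct iocb **cbs)
{
    return syscall(SYS_io_submit, ctx, n, cbs);
}
static long sys_io_getevents(aio_context_t ctx, long min_nr, long max_nr,
                             struct io_event *ev)
{
    return syscall(SYS_io_getevents, ctx, min_nr, max_nr, ev, NULL);
}
static long sys_io_destroy(aio_context_t ctx)
{
    return syscall(SYS_io_destroy, ctx);
}

/* Submits one write of len bytes at offset off and waits for it.
 * Returns the number of bytes written, or a negative value on error.
 * *submit_ns receives the time spent inside io_submit() alone. */
long timed_aio_write(int fd, const void *buf, size_t len, long long off,
                     long long *submit_ns)
{
    assert(submit_ns != NULL);

    aio_context_t ctx = 0;
    if (sys_io_setup(1, &ctx) < 0)
        return -1;

    struct iocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = (uint32_t)fd;
    cb.aio_lio_opcode = IOCB_CMD_PWRITE;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;
    struct iocb *list[1] = { &cb };

    /* Time only the submission, not the completion. */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    long rc = sys_io_submit(ctx, 1, list);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *submit_ns = (t1.tv_sec - t0.tv_sec) * 1000000000LL
               + (t1.tv_nsec - t0.tv_nsec);

    long written = -1;
    struct io_event ev;
    if (rc == 1 && sys_io_getevents(ctx, 1, 1, &ev) == 1)
        written = (long)ev.res; /* negative errno if the write failed */

    sys_io_destroy(ctx);
    return written;
}
```

If the descriptor was opened without O_DIRECT, the write is likely to be carried out inside io_submit() itself, so the measured time then reflects a synchronous buffered write rather than a true asynchronous submission. A real application would also create the AIO context once and reuse it instead of setting it up per write as this sketch does.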


1 Reply


Linux AIO (sometimes known as KAIO or libaio) is something of a black art where experienced practitioners know the pitfalls but for some reason it's taboo to tell someone about gotchas they don't already know. From scratching around on the web and experience I've come up with a few examples where Linux's asynchronous I/O submission via io_submit() may become (silently) synchronous, thereby turning it into a blocking (i.e. no longer fast) call:

  1. You're submitting buffered (aka non-direct) I/O. You're at the mercy of Linux's caching and your submit can go synchronous when:
    • What you're reading isn't already in the "read cache".
    • The "write cache" is full and the new write request can't be accepted until some existing writeback has been completed.
  2. You're asking for direct I/O to a file in a filesystem but for whatever reason the filesystem decides to ignore the O_DIRECT "hint" (e.g. how you submitted the I/O didn't meet O_DIRECT alignment constraints, filesystem or particular filesystem's configuration doesn't support O_DIRECT) and it chooses to silently perform buffered I/O instead, resulting in the case above.
  3. You're doing direct I/O to a file in a filesystem but the filesystem has to do a synchronous operation (such as reading metadata/updating metadata via writeback) in order to fulfill your I/O. A common example of this is issuing an "allocating write" (e.g. because you're appending/extending the end of a file or filling in an unallocated hole) and this sounds like what the questioner is doing ("appended to the file"). Some filesystems such as XFS try harder to provide good AIO behaviour but even there a user has to be careful to avoid sending certain operations to the filesystem in parallel otherwise io_submit() again will turn into a blocking call while the other operation completes. The Seastar framework contains a small lookup table of filesystem specific cases.
  4. You're submitting too much outstanding I/O. Your disk/disk controller will have a maximum number of I/O requests that can be processed at the same time and there are maximum request queue sizes for each specific device (see the /sys/block/[disk]/queue/nr_requests documentation and the un(der)documented /sys/block/[disk]/device/queue_depth) within the kernel. Making I/O requests back up and exceed the size of the kernel queues leads to blocking.
    • If you submit I/Os that are "too large" (e.g. bigger than /sys/block/[disk]/queue/max_sectors_kb but the true limit may be something smaller like 512 KiB) they will be split up within the block layer and go on to chew up more than one request.
    • The system global maximum number of concurrent AIO requests (see the /proc/sys/fs/aio-max-nr documentation) can also have an impact but the result will be seen in io_setup() rather than io_submit().
  5. A layer in the Linux block device stack between the request and the submission to the disk has to block. For example, things like Linux software RAID (md) can make I/O requests passing through it stall while updating RAID 1 metadata on individual disks.
  6. Your submission causes the kernel to wait because:
    • It needs to take a particular lock (e.g. i_rwsem) that is in use.
    • It needs to allocate some extra memory or page something in.
  7. You're submitting I/O to a file descriptor that's not a "regular" file or a block device (e.g. your descriptor is a pipe or a socket).
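
Some of the cases above (items 1, 2 and 7) can be screened for in user space before submitting. The following is a rough sketch under stated assumptions: the function names and the enum are invented for illustration, and the align parameter stands for whatever alignment your device and filesystem actually require (often the logical block size, e.g. 512 or 4096 bytes). Passing these checks does not guarantee that a filesystem will honour O_DIRECT.

```c
#define _GNU_SOURCE /* for O_DIRECT */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

enum aio_preflight {
    AIO_OK = 0,
    AIO_BAD_FD_TYPE, /* item 7: not a regular file or block device */
    AIO_NOT_DIRECT,  /* item 1: descriptor would do buffered I/O */
    AIO_MISALIGNED   /* item 2: request breaks the alignment assumption */
};

/* Returns 1 when buffer address, length and offset are all multiples of
 * align (which must be a power of two), otherwise 0. */
int aio_is_aligned(const void *buf, size_t len, off_t off, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t bits = (uintptr_t)buf | (uintptr_t)len | (uintptr_t)off;
    return (bits & (align - 1)) == 0;
}

/* Screens one request before it is handed to io_submit(). */
enum aio_preflight aio_preflight_check(int fd, const void *buf, size_t len,
                                       off_t off, size_t align)
{
    struct stat st;
    if (fstat(fd, &st) != 0 ||
        !(S_ISREG(st.st_mode) || S_ISBLK(st.st_mode)))
        return AIO_BAD_FD_TYPE;

    /* O_DIRECT is a file status flag, so F_GETFL reports it. */
    int fl = fcntl(fd, F_GETFL);
    if (fl < 0 || !(fl & O_DIRECT))
        return AIO_NOT_DIRECT;

    if (!aio_is_aligned(buf, len, off, align))
        return AIO_MISALIGNED;

    return AIO_OK;
}
```

Buffers for direct I/O are typically obtained from posix_memalign() so that the address check passes.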

The list above is not exhaustive.
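
For item 4, the limits live in small text files that each hold a single number, so they are easy to read at start-up. This is a minimal sketch; read_ulong_file and requests_needed are hypothetical helper names, and the splitting estimate assumes max_sectors_kb is the only limit in play, which the list above notes may not be true.

```c
#include <assert.h>
#include <stdio.h>

/* Reads one unsigned integer from a sysfs/procfs-style file.
 * Returns 0 on success and -1 if the file is missing or unparsable. */
int read_ulong_file(const char *path, unsigned long *out)
{
    assert(path != NULL && out != NULL);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = (fscanf(f, "%lu", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Estimates how many block-layer requests a single I/O of io_bytes
 * occupies if it is split at max_sectors_kb KiB. Other limits may split
 * it further, so treat the result as a minimum. */
unsigned long requests_needed(unsigned long io_bytes,
                              unsigned long max_sectors_kb)
{
    unsigned long chunk = max_sectors_kb * 1024UL;
    if (chunk == 0)
        return 0;
    return (io_bytes + chunk - 1) / chunk;
}
```

For example, read_ulong_file("/sys/block/sda/queue/nr_requests", &n) would fetch the queue size for a disk named sda (a placeholder; substitute your own device), and the same helper works for /proc/sys/fs/aio-max-nr.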

With >= 4.14 kernels the RWF_NOWAIT flag can be used to make some of the blocking scenarios above noisy. For example, when using buffered I/O and trying to read data not yet in the page cache, the RWF_NOWAIT flag will cause the request to fail with EAGAIN when blocking would otherwise occur. Obviously you still a) need a 4.14 (or later) kernel that supports this flag and b) have to be aware of the cases it doesn't cover. I notice there are patches that have been accepted or are being proposed to return EAGAIN in more scenarios that would otherwise block but at the time of writing (2019) RWF_NOWAIT is not supported for buffered filesystem writes.
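
As a sketch of how the flag is requested: in the kernel headers it is spelled RWF_NOWAIT and goes in the aio_rw_flags field of the iocb. According to the io_submit(2) man page, a request that would have blocked is handed back with -EAGAIN in the res field of its io_event. The helper names below are invented for illustration, and the fallback #define is only there for older user-space headers.

```c
#include <assert.h>
#include <errno.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT 0x00000008 /* value used by the kernel UAPI headers */
#endif

/* Fills in an iocb for a read that should fail rather than block. */
void prep_pread_nowait(struct iocb *cb, int fd, void *buf, size_t len,
                       long long off)
{
    assert(cb != NULL);
    memset(cb, 0, sizeof *cb);
    cb->aio_fildes = (uint32_t)fd;
    cb->aio_lio_opcode = IOCB_CMD_PREAD;
    cb->aio_buf = (uint64_t)(uintptr_t)buf;
    cb->aio_nbytes = len;
    cb->aio_offset = off;
    cb->aio_rw_flags = RWF_NOWAIT; /* requires Linux 4.14 or later */
}

/* Returns 1 if a completion event reports that the request was rejected
 * because it would have blocked, otherwise 0. */
int event_would_block(const struct io_event *ev)
{
    assert(ev != NULL);
    return ev->res == -EAGAIN;
}
```

A caller would pass the prepared iocb to io_submit() as usual and, on seeing event_would_block() return 1, retry the request from a context where blocking is acceptable (for example a helper thread).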

Alternatives

If your kernel is >=5.1, you could try using io_uring which does far better at not blocking on submission (it's an entirely different interface that first appeared in kernel 5.1, released in 2019).
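
Rather than parsing the kernel version, a program can simply try to create a ring and see whether that works. This is a minimal probe sketch, assuming the raw syscall instead of liburing; io_uring_usable is a made-up name, and the fallback syscall number is the one used on most architectures.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/io_uring.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_io_uring_setup
#define __NR_io_uring_setup 425 /* same number on most architectures */
#endif

/* Returns 1 if a small ring can be created, 0 otherwise.
 * On failure errno is left as set by the syscall (for example ENOSYS on
 * kernels older than 5.1, or EPERM when a sandbox policy blocks it). */
int io_uring_usable(void)
{
    struct io_uring_params p;
    memset(&p, 0, sizeof p);

    long fd = syscall(__NR_io_uring_setup, 4, &p);
    if (fd < 0)
        return 0;

    close((int)fd); /* the ring was only needed for the probe */
    return 1;
}
```

A result of 0 means the application should fall back to io_submit() or to a thread pool doing blocking I/O.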


Hopefully this post helps someone (and if it does help you, could you upvote it? Thanks!).

