From: Suparna Bhattacharya

This patch modifies do_generic_mapping_read to read ahead up to
ra_pages pages of the requested range upfront for AIO reads, before
it starts waiting for any of the pages to become uptodate. This leads
to sane readahead behaviour and I/O ordering for the kind of I/O
patterns generated by streaming AIO reads, by ensuring that I/O for
as many consecutive blocks as possible in the first request is issued
before submission of the next request (notice that unlike sync I/O,
AIO can't wait for completion of the first request before submitting
the next). The patch also takes care not to repeatedly issue
readaheads for subsequent AIO retries of the same request.

Upfront readahead is clipped to ra_pages (typically 128KB) to maintain
pipelined behaviour for very large requests, e.g. sendfile of a large
file. The tradeoff is that in cases where individual request sizes
exceed ra_pages, I/O ordering won't be optimal for streaming AIOs.

There's a good reason why these changes are limited to AIO. For
sendfile with O_NONBLOCK in a loop, the extra upfront readahead issued
on every iteration disturbs the sequentiality of the readahead
pattern, resulting in non-optimal behaviour (this showed up as a
regression in O_NONBLOCK sendfile of a large file). This isn't likely
to be a problem for AIO sendfile when it is implemented, since that
wouldn't be likely to use O_NONBLOCK.

 filemap.c |   37 ++++++++++++++++++++++++++++++++++++-
 1 files changed, 36 insertions(+), 1 deletion(-)

--- 2.6.2-rc3/mm/filemap.c	2004-01-23 14:30:19.000000000 +0530
+++ 2.6.2-rc3-fsaio/mm/filemap.c	2004-02-02 13:10:36.000000000 +0530
@@ -664,6 +664,34 @@
 	index = *ppos >> PAGE_CACHE_SHIFT;
 	offset = *ppos & ~PAGE_CACHE_MASK;
 
+	if (unlikely(in_aio())) {
+		unsigned long i, last, nr;
+		/*
+		 * Let the readahead logic know upfront about all
+		 * the pages we'll need to satisfy this request while
+		 * taking care to avoid repeat readaheads during retries.
+		 * Required for reasonable IO ordering with multipage
+		 * streaming AIO requests.
+		 */
+		if ((!is_retried_kiocb(io_wait_to_kiocb(current->io_wait)))
+			|| (ra->prev_page + 1 == index)) {
+
+			last = (*ppos + desc->count - 1) >> PAGE_CACHE_SHIFT;
+			nr = max_sane_readahead(last - index + 1);
+
+			for (i = 0; (i < nr) && ((i == 0)||(i < ra->ra_pages));
+				i++) {
+				page_cache_readahead(mapping, ra, filp,
+					index + i);
+				if (bdi_read_congested(
+					mapping->backing_dev_info)) {
+					//printk("AIO readahead congestion\n");
+					break;
+				}
+			}
+		}
+	}
+
 	for (;;) {
 		struct page *page;
 		unsigned long end_index, nr, ret;
@@ -681,7 +709,14 @@
 		}
 		cond_resched();
-		page_cache_readahead(mapping, ra, filp, index);
+		/*
+		 * Take care to avoid disturbing the existing readahead
+		 * window (concurrent reads may be active for the same fd,
+		 * in the AIO case)
+		 */
+		if (!in_aio() || (ra->prev_page + 1 == index))
+			page_cache_readahead(mapping, ra, filp, index);
+
 		nr = nr - offset;
 
 find_page: