From: Suparna Bhattacharya DESC aio O_DIRECT no readahead EDESC From: Daniel McNeil More testing on AIO with O_DIRECT and /dev/raw/. Doing AIO reads was using a lot more cpu time than using dd on the raw partition even with the io_queue_wait patch. Found out that aio is doing readahead even for O_DIRECT. Here's a patch that fixes it. DESC Unified page range readahead for aio and regular reads EDESC From: Suparna Bhattacharya The open coded readahead logic which was added in aio_pread is best avoided if possible. Duplicating similar checks across sync and aio paths (e.g. checking for O_DIRECT) and the divergence of logic between these paths isn't good from a long term maintainability standpoint. Secondly, this logic really belongs in the generic fops methods for aio_read rather than in the high level aio handlers; it should be possible for a filesystem to override the logic with its own if suitable. So, this patch moves the readahead out of aio_pread, and instead modifies do_generic_mapping_read to readahead _all_ the pages in the range requested upfront before it starts waiting for any of the pages to become uptodate. This leads to sane readahead behaviour for the kind of i/o patterns generated by streaming aio reads. It also takes care not to repeatedly issue readaheads for subsequent AIO retries for the same request. fs/aio.c | 1 + include/linux/aio.h | 3 +++ mm/filemap.c | 29 +++++++++++++++++++++++------ 3 files changed, 27 insertions(+), 6 deletions(-) diff -puN fs/aio.c~aio-12-readahead fs/aio.c --- 25/fs/aio.c~aio-12-readahead 2003-10-03 22:17:49.000000000 -0700 +++ 25-akpm/fs/aio.c 2003-10-03 22:17:50.000000000 -0700 @@ -28,6 +28,7 @@ #include #include #include +#include #include #include diff -puN mm/filemap.c~aio-12-readahead mm/filemap.c --- 25/mm/filemap.c~aio-12-readahead 2003-10-03 22:17:50.000000000 -0700 +++ 25-akpm/mm/filemap.c 2003-10-03 22:17:50.000000000 -0700 @@ -660,21 +660,39 @@ void do_generic_mapping_read(struct addr read_actor_t actor) { struct inode *inode = mapping->host; - unsigned long index, offset; + unsigned long index, offset, last, end_index; struct page *cached_page; + loff_t isize = i_size_read(inode); int error; cached_page = NULL; index = *ppos >> PAGE_CACHE_SHIFT; offset = *ppos & ~PAGE_CACHE_MASK; + last = (*ppos + desc->count) >> PAGE_CACHE_SHIFT; + end_index = isize >> PAGE_CACHE_SHIFT; + if (last > end_index) + last = end_index; + + /* Don't repeat the readahead if we are executing aio retries */ + if (in_aio()) { + if (is_retried_kiocb(io_wait_to_kiocb(current->io_wait))) + goto done_readahead; + } + + /* + * Let the readahead logic know upfront about all + * the pages we'll need to satisfy this request + */ + for (; index < last; index++) + page_cache_readahead(mapping, ra, filp, index); + index = *ppos >> PAGE_CACHE_SHIFT; + +done_readahead: for (;;) { struct page *page; - unsigned long end_index, nr, ret; - loff_t isize = i_size_read(inode); + unsigned long nr, ret; - end_index = isize >> PAGE_CACHE_SHIFT; - if (index > end_index) break; nr = PAGE_CACHE_SIZE; @@ -685,7 +703,6 @@ void do_generic_mapping_read(struct addr } cond_resched(); - page_cache_readahead(mapping, ra, filp, index); nr = nr - offset; find_page: diff -puN include/linux/aio.h~aio-12-readahead include/linux/aio.h --- 25/include/linux/aio.h~aio-12-readahead 2003-10-03 22:17:50.000000000 -0700 +++ 25-akpm/include/linux/aio.h 2003-10-03 22:17:50.000000000 -0700 @@ -184,6 +184,9 @@ int FASTCALL(io_submit_one(struct kioctx dump_stack(); \ } +#define io_wait_to_kiocb(wait) container_of(wait, struct kiocb, ki_wait) +#define is_retried_kiocb(iocb) ((iocb)->ki_retried > 1) + #include static inline struct kiocb *list_kiocb(struct list_head *h) _