From: Suparna Bhattacharya

DESC
aio O_DIRECT no readahead
EDESC
From: Daniel McNeil

More testing on AIO with O_DIRECT and /dev/raw/.  Doing AIO reads was using
a lot more cpu time than using dd on the raw partition even with the
io_queue_wait patch.  Found out that aio is doing readahead even for
O_DIRECT.  Here's a patch that fixes it.

DESC
Unified page range readahead for aio and regular reads
EDESC
From: Suparna Bhattacharya

The open coded readahead logic which was added in aio_pread is best avoided
if possible.  Duplicating similar checks across sync and aio paths (e.g.
checking for O_DIRECT) and the divergence of logic between these paths isn't
good from a long term maintainability standpoint.

Secondly, this logic really belongs in the generic fops methods for aio_read
rather than in the high level aio handlers; it should be possible for a
filesystem to override the logic with its own if suitable.

So, this patch moves the readahead out of aio_pread, and instead modifies
do_generic_mapping_read to readahead _all_ the pages in the range requested
upfront before it starts waiting for any of the pages to become uptodate.
This leads to sane readahead behaviour for the kind of i/o patterns
generated by streaming aio reads.  It also takes care not to repeatedly
issue readaheads for subsequent AIO retries for the same request.

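For readers following the index arithmetic, here is a rough userspace sketch of the range computation described above. It is not the kernel code: the names sketch_readahead_range, sketch_readahead_pages, struct ra_range and SKETCH_PAGE_SHIFT are made up for illustration, a 4 KiB page size is assumed in place of PAGE_CACHE_SHIFT, and the readahead loop is assumed to cover pages from index up to but not including last.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, standing in for PAGE_CACHE_SHIFT. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical result type: the half-open page range [first, last). */
struct ra_range {
	unsigned long first;
	unsigned long last;
};

/*
 * Work out which pages an upfront readahead would touch for a read of
 * `count` bytes at byte offset `pos` in a file of `isize` bytes.
 * `last` is clamped to the page index of EOF, as the patch does with
 * end_index.  On an AIO retry the range comes back empty, because the
 * readahead was already issued on the first pass.
 */
static struct ra_range sketch_readahead_range(uint64_t pos, uint64_t count,
					      uint64_t isize, bool is_retry)
{
	struct ra_range r;
	unsigned long end_index = (unsigned long)(isize >> SKETCH_PAGE_SHIFT);

	r.first = (unsigned long)(pos >> SKETCH_PAGE_SHIFT);
	r.last = (unsigned long)((pos + count) >> SKETCH_PAGE_SHIFT);
	if (r.last > end_index)
		r.last = end_index;

	/* Retries, and reads starting past EOF, get no readahead. */
	if (is_retry || r.last < r.first)
		r.last = r.first;

	return r;
}

/* Number of pages the upfront readahead loop would visit. */
static unsigned long sketch_readahead_pages(struct ra_range r)
{
	assert(r.last >= r.first);
	return r.last - r.first;
}
```

With these assumptions, a 32 KiB read starting at page 8 of a 10-page file is clamped to pages 8 and 9, and the same request seen again as a retry covers no pages at all.
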

 fs/aio.c            |    1 +
 include/linux/aio.h |    3 +++
 mm/filemap.c        |   18 +++++++++++++-----
 3 files changed, 17 insertions(+), 5 deletions(-)

diff -puN fs/aio.c~aio-12-readahead fs/aio.c
--- 25/fs/aio.c~aio-12-readahead	2004-01-05 18:58:15.000000000 -0800
+++ 25-akpm/fs/aio.c	2004-01-05 18:58:15.000000000 -0800
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
diff -puN mm/filemap.c~aio-12-readahead mm/filemap.c
--- 25/mm/filemap.c~aio-12-readahead	2004-01-05 18:58:15.000000000 -0800
+++ 25-akpm/mm/filemap.c	2004-01-05 18:58:15.000000000 -0800
@@ -665,7 +665,8 @@ void do_generic_mapping_read(struct addr
 			     read_actor_t actor)
 {
 	struct inode *inode = mapping->host;
-	unsigned long index, offset, last;
+	unsigned long index, offset, last, end_index;
+	loff_t isize = i_size_read(inode);
 	struct page *cached_page;
 	int error;
 
@@ -673,6 +674,15 @@ void do_generic_mapping_read(struct addr
 	index = *ppos >> PAGE_CACHE_SHIFT;
 	offset = *ppos & ~PAGE_CACHE_MASK;
 	last = (*ppos + desc->count) >> PAGE_CACHE_SHIFT;
+	end_index = isize >> PAGE_CACHE_SHIFT;
+	if (last > end_index)
+		last = end_index;
+
+	/* Don't repeat the readahead if we are executing aio retries */
+	if (in_aio()) {
+		if (is_retried_kiocb(io_wait_to_kiocb(current->io_wait)))
+			goto done_readahead;
+	}
 
 	/*
 	 * Let the readahead logic know upfront about all
@@ -682,13 +692,11 @@ void do_generic_mapping_read(struct addr
 		page_cache_readahead(mapping, ra, filp, index);
 	index = *ppos >> PAGE_CACHE_SHIFT;
 
+done_readahead:
 	for (;;) {
 		struct page *page;
-		unsigned long end_index, nr, ret;
-		loff_t isize = i_size_read(inode);
+		unsigned long nr, ret;
 
-		end_index = isize >> PAGE_CACHE_SHIFT;
-
 		if (index > end_index)
 			break;
 		nr = PAGE_CACHE_SIZE;
diff -puN include/linux/aio.h~aio-12-readahead include/linux/aio.h
--- 25/include/linux/aio.h~aio-12-readahead	2004-01-05 18:58:15.000000000 -0800
+++ 25-akpm/include/linux/aio.h	2004-01-05 18:58:15.000000000 -0800
@@ -184,6 +184,9 @@ int FASTCALL(io_submit_one(struct kioctx
 		dump_stack();					\
 	}
 
+#define io_wait_to_kiocb(wait) container_of(wait, struct kiocb, ki_wait)
+#define is_retried_kiocb(iocb) ((iocb)->ki_retried > 1)
+
 #include
 
 static inline struct kiocb *list_kiocb(struct list_head *h)

_
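
The two helper macros added to aio.h rely on the usual container_of idiom: given a pointer to the wait entry embedded in a kiocb, step back to the enclosing kiocb and inspect its retry counter. The following is only a userspace approximation. struct sketch_wait and struct sketch_kiocb are stand-ins with invented layouts, not the kernel structures, and the reading of ki_retried as a counter that exceeds 1 once the request is on a second or later pass is an assumption drawn from the `> 1` test in the patch.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the wait queue entry embedded in a kiocb. */
struct sketch_wait {
	int flags;
};

/* Hypothetical stand-in for struct kiocb; only the fields used here. */
struct sketch_kiocb {
	long ki_retried;		/* assumed: number of passes so far */
	struct sketch_wait ki_wait;	/* embedded wait entry */
};

/* Mirrors io_wait_to_kiocb(): recover the request from its wait entry. */
static struct sketch_kiocb *sketch_wait_to_kiocb(struct sketch_wait *wait)
{
	return sketch_container_of(wait, struct sketch_kiocb, ki_wait);
}

/* Mirrors is_retried_kiocb(): true from the second pass onwards. */
static int sketch_is_retried(struct sketch_wait *wait)
{
	return sketch_wait_to_kiocb(wait)->ki_retried > 1;
}
```

The point of going through the wait entry is that do_generic_mapping_read only has current->io_wait to hand, not the kiocb itself, so the embedded member is the route back to the per-request state.
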