From: "Chen, Kenneth W"

We hit a memory-ordering race on the AIO ring buffer tail pointer between aio_complete() and aio_read_evt(). On an architecture with a relaxed memory-ordering model, such as IPF (ia64), an explicit memory barrier is required in an SMP environment.

Consider the following case: one CPU is executing a tight aio_read_evt() loop, pulling events off the ring buffer. Meanwhile, another CPU is executing aio_complete(), which puts an event into the ring buffer and then updates the tail pointer. Because of the relaxed memory ordering, the new tail pointer can become visible before the event data itself. The reading CPU then sees the updated tail pointer but picks up stale event data. A memory barrier is required between the event-data update and the tail-pointer update.

The same is true for the head pointer: the reader must finish reading an event before the updated head allows that slot to be reused. The race window there is effectively nil, but for correctness it is fixed here as well.

By the way, this bug has been fixed in the major distributors' 2.4.x kernels for a while, but somehow hasn't been propagated to the 2.5 kernel yet.
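The tail-pointer race described above is the classic message-passing pattern. The sketch below is a user-space analogue, not kernel code: it assumes C11 atomics and POSIX threads, and every name in it (`events`, `tail`, `stale`, `producer`, `consumer`, `N_EVENTS`) is hypothetical. A C11 release fence stands in for the kernel's smp_wmb(), and the reader pairs it with an acquire fence once it observes a new tail. To keep the focus on the tail pointer, slots are never reused, so no head pointer is needed.

```c
#include <pthread.h>
#include <stdatomic.h>

#define N_EVENTS 100000UL /* hypothetical number of events to pass */

/* Hypothetical "ring": slots are never reused, so only the tail matters. */
static unsigned long events[N_EVENTS]; /* plain, non-atomic event data */
static atomic_ulong tail;              /* number of published events */
static unsigned long stale;            /* events read with the wrong contents */

/* Writer, playing the role of aio_complete(). */
static void *producer(void *arg)
{
    (void)arg;
    for (unsigned long i = 0; i < N_EVENTS; i++) {
        events[i] = i * 3 + 1;                     /* 1: write the event data */
        atomic_thread_fence(memory_order_release); /* 2: stand-in for smp_wmb() */
        atomic_store_explicit(&tail, i + 1, memory_order_relaxed); /* 3: publish tail */
    }
    return NULL;
}

/* Reader, playing the role of a tight aio_read_evt() loop. */
static void *consumer(void *arg)
{
    (void)arg;
    unsigned long head = 0;

    while (head < N_EVENTS) {
        unsigned long t = atomic_load_explicit(&tail, memory_order_relaxed);
        if (t == head)
            continue;                              /* nothing new yet; keep spinning */
        atomic_thread_fence(memory_order_acquire); /* pairs with the producer's release fence */
        for (; head < t; head++)
            if (events[head] != head * 3 + 1)      /* stale data would be caught here */
                stale++;
    }
    return NULL;
}
```

Build with something like `cc -std=c11 -pthread`, run `producer` and `consumer` on separate threads, and `stale` should be 0 after joining them. A clean run does not prove the ordering is right: a strongly ordered CPU such as x86 may never reorder these stores in hardware, so it is the fence pairing, not the test result, that provides the guarantee.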
 25-akpm/fs/aio.c | 7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

diff -puN fs/aio.c~aio_complete-barrier-fix fs/aio.c
--- 25/fs/aio.c~aio_complete-barrier-fix	Tue Jul 8 16:06:56 2003
+++ 25-akpm/fs/aio.c	Tue Jul 8 16:06:56 2003
@@ -679,12 +679,11 @@ int aio_complete(struct kiocb *iocb, lon
 	/* after flagging the request as done, we
 	 * must never even look at it again
 	 */
-	barrier();
+	smp_wmb();	/* make event visible before updating tail */
 
 	info->tail = tail;
 	ring->tail = tail;
 
-	wmb();
 	put_aio_ring_event(event, KM_IRQ0);
 	kunmap_atomic(ring, KM_IRQ1);
 
@@ -721,7 +720,7 @@ static int aio_read_evt(struct kioctx *i
 	dprintk("in aio_read_evt h%lu t%lu m%lu\n",
 		 (unsigned long)ring->head, (unsigned long)ring->tail,
 		 (unsigned long)ring->nr);
-	barrier();
+
 	if (ring->head == ring->tail)
 		goto out;
 
@@ -732,7 +731,7 @@ static int aio_read_evt(struct kioctx *i
 		struct io_event *evp = aio_ring_event(info, head, KM_USER1);
 		*ent = *evp;
 		head = (head + 1) % info->nr;
-		barrier();
+		smp_mb(); /* finish reading the event before updating the head */
 		ring->head = head;
 		ret = 1;
 		put_aio_ring_event(evp, KM_USER1);

_
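The patch places one barrier on each side of the ring: smp_wmb() in aio_complete() before the tail update, and smp_mb() in aio_read_evt() between copying the event out and updating the head. A minimal single-producer, single-consumer ring written with C11 atomics might place the equivalent fences as below. This is a hedged user-space sketch, not the kernel's implementation; `struct ring`, `struct event`, `RING_NR`, `ring_put()` and `ring_get()` are hypothetical names. In this sketch the reader's release fence only has to order the earlier loads of the event before the later store of the head, a weaker requirement than the full barrier smp_mb() provides.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical types; not the kernel's struct aio_ring / struct io_event. */
struct event {
    unsigned long data;
    long res;
};

#define RING_NR 8 /* slots; one is kept empty to tell "full" from "empty" */

struct ring {
    struct event events[RING_NR];
    atomic_ulong head; /* next slot to read; written only by the consumer */
    atomic_ulong tail; /* next slot to fill; written only by the producer */
};

/* Producer side (cf. aio_complete()): write the event, then publish tail. */
static bool ring_put(struct ring *r, struct event ev)
{
    unsigned long tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    /* acquire: the consumer's reads of a freed slot happen before we overwrite it */
    unsigned long head = atomic_load_explicit(&r->head, memory_order_acquire);
    unsigned long next = (tail + 1) % RING_NR;

    if (next == head)
        return false;                          /* ring full */
    r->events[tail] = ev;                      /* write the event data */
    atomic_thread_fence(memory_order_release); /* role of smp_wmb() */
    atomic_store_explicit(&r->tail, next, memory_order_relaxed);
    return true;
}

/* Consumer side (cf. aio_read_evt()): copy the event, then free the slot. */
static bool ring_get(struct ring *r, struct event *out)
{
    unsigned long head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned long tail = atomic_load_explicit(&r->tail, memory_order_relaxed);

    if (head == tail)
        return false;                          /* ring empty */
    atomic_thread_fence(memory_order_acquire); /* pairs with ring_put()'s release fence */
    *out = r->events[head];                    /* copy the event out */
    atomic_thread_fence(memory_order_release); /* finish reading before updating head (role of smp_mb()) */
    atomic_store_explicit(&r->head, (head + 1) % RING_NR, memory_order_relaxed);
    return true;
}
```

A `static struct ring` starts zeroed, which is an empty ring. With `RING_NR` set to 8, at most 7 events can be queued at once, because one slot always stays empty so that `head == tail` unambiguously means "empty".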