From: Miquel van Smoorenburg

The VM and VFS use the address_space's backing_dev_info to track the
real-time status of the device which backs the mapping.  The
BDI_read_congested and BDI_write_congested state bits are used to
determine whether a read or write against that device may block.

We use this infrastructure to

a) allow pdflush to service many queues in parallel (by not getting
   stuck on any particular one),

b) avoid undesirable and uncontrolled latencies in places such as page
   reclaim, and

c) avoid blocking in readahead operations.

The current code only supports simple disk queues (and I have a patch
here for NFS).  Stacked queues (MD and DM) don't get this information
right, and problems were expected.  Efficiency problems have now been
observed, so it's time to fix this.

This patch lays down the infrastructure which permits the queue
implementation to get control when someone at a higher level queries the
queue's congestion state.  So DM (for example) can go and examine all
the queues which contribute to the higher-level queue.

It also adds bdi_rw_congested() for code in xfs and ext2 which calls
both bdi_read_congested() and bdi_write_congested() in a row; with the
new bdi_congested() helper it comes for free anyway.

---

 include/linux/backing-dev.h |   21 +++++++++++++++++++--
 1 files changed, 19 insertions(+), 2 deletions(-)

diff -puN include/linux/backing-dev.h~queue-congestion-callout include/linux/backing-dev.h
--- 25/include/linux/backing-dev.h~queue-congestion-callout	2004-03-07 15:54:02.000000000 -0800
+++ 25-akpm/include/linux/backing-dev.h	2004-03-07 15:54:02.000000000 -0800
@@ -20,10 +20,14 @@ enum bdi_state {
 	BDI_unused,		/* Available bits start here */
 };
 
+typedef int (congested_fn)(void *, int);
+
 struct backing_dev_info {
 	unsigned long ra_pages;	/* max readahead in PAGE_CACHE_SIZE units */
 	unsigned long state;	/* Always use atomic bitops on this */
 	int memory_backed;	/* Cannot clean pages with writepage */
+	congested_fn *congested_fn; /* Function pointer if device is md/dm */
+	void *congested_data;	/* Pointer to aux data for congested func */
 };
 
 extern struct backing_dev_info default_backing_dev_info;
@@ -32,14 +36,27 @@ int writeback_acquire(struct backing_dev
 int writeback_in_progress(struct backing_dev_info *bdi);
 void writeback_release(struct backing_dev_info *bdi);
 
+static inline int bdi_congested(struct backing_dev_info *bdi, int bdi_bits)
+{
+	if (bdi->congested_fn)
+		return bdi->congested_fn(bdi->congested_data, bdi_bits);
+	return (bdi->state & bdi_bits);
+}
+
 static inline int bdi_read_congested(struct backing_dev_info *bdi)
 {
-	return test_bit(BDI_read_congested, &bdi->state);
+	return bdi_congested(bdi, 1 << BDI_read_congested);
 }
 
 static inline int bdi_write_congested(struct backing_dev_info *bdi)
 {
-	return test_bit(BDI_write_congested, &bdi->state);
+	return bdi_congested(bdi, 1 << BDI_write_congested);
+}
+
+static inline int bdi_rw_congested(struct backing_dev_info *bdi)
+{
+	return bdi_congested(bdi, (1 << BDI_read_congested)|
+				  (1 << BDI_write_congested));
 }
 
 #endif	/* _LINUX_BACKING_DEV_H */
_
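
For illustration only, here is a minimal sketch (not part of the patch
above) of how a stacked driver might wire up the callout.  The names
stacked_dev, stacked_congested() and stacked_setup_bdi() are hypothetical
and only show the intended plumbing; the real MD/DM code will differ.

#include <linux/backing-dev.h>

/* Hypothetical stacked device holding the BDIs of its component queues. */
struct stacked_dev {
	struct backing_dev_info *component_bdi[8];
	int nr_components;
};

/*
 * Called through bdi->congested_fn.  Ask every component queue whether
 * it is congested for the bits in bdi_bits and OR the answers together.
 */
static int stacked_congested(void *congested_data, int bdi_bits)
{
	struct stacked_dev *sd = congested_data;
	int r = 0;
	int i;

	for (i = 0; i < sd->nr_components; i++)
		r |= bdi_congested(sd->component_bdi[i], bdi_bits);
	return r;
}

/* Hook the callout into the stacked device's own backing_dev_info. */
static void stacked_setup_bdi(struct stacked_dev *sd,
			      struct backing_dev_info *bdi)
{
	bdi->congested_fn = stacked_congested;
	bdi->congested_data = sd;
}

With congested_fn set, bdi_read_congested(), bdi_write_congested() and
bdi_rw_congested() on the stacked device reflect the state of the
underlying queues rather than the device's own (unused) state bits.
Whether a stacked device should report congestion when any component is
congested or only when all of them are is a per-driver policy decision;
the sketch simply ORs the per-queue results.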