From: NeilBrown
From: "J. Bruce Fields"

Currently we count the number of threads already asleep waiting on an idmap
upcall, and return an immediate NFS4ERR_DELAY (== JUKEBOX) error if more than
half of the threads are already asleep.  This patch removes that logic;
instead we return NFS4ERR_DELAY only if an upcall times out (i.e., takes more
than a second to return).

With the thread counting there is a risk that, even when all the relevant
subsystems are responsive, the client may still see occasional NFS4ERR_DELAY
returns just because, by coincidence, several upcalls were initiated at the
same time.  I expect clients will delay several seconds before retrying after
NFS4ERR_DELAY, so this will be quite noticeable to users.  Sporadic long
delays like this are likely to lead users to suspect a problem somewhere,
when in fact there is none.

The current scheme ensures that we can still process requests that do not
depend on upcalls, even when all threads would otherwise be tied up waiting
on upcalls.  However, this is not something that should happen under normal
circumstances; if a server spends a significant portion of its time with all
threads waiting for upcalls, that is a sign that something is seriously
wrong.  In such a circumstance (e.g., an LDAP server dies), we can at least
bound the waiting time to a second without the need for counting threads.

In short, removing the thread counting will allow us to behave predictably
when things are working, while still allowing some progress when they aren't.

It would be a worthwhile project to measure the amount of time threads spend
waiting for upcalls (or for reads, for that matter); if a significant portion
of the time they spend handling requests is spent sleeping, then there is an
opportunity to improve nfsd performance: if we can break the one-to-one
mapping between requests and threads, we can lower the number of threads
required to keep the NFS server busy.  However, both of the currently
available options for doing this are problematic: returning JUKEBOX/DELAY
errors at random times leads to unpredictable performance, and saving a copy
of the request to be processed from scratch later is wasteful and makes it
difficult to provide correct semantics, especially in the NFSv4 case.  So for
now I believe waits with short timeouts are the best option.

---

 25-akpm/fs/nfsd/nfs4idmap.c |   27 ++-------------------------
 1 files changed, 2 insertions(+), 25 deletions(-)

diff -puN fs/nfsd/nfs4idmap.c~knfsd-9-of-10-remove-check-on-number-of-threads-waiting-on-user-space fs/nfsd/nfs4idmap.c
--- 25/fs/nfsd/nfs4idmap.c~knfsd-9-of-10-remove-check-on-number-of-threads-waiting-on-user-space	Tue May 18 15:27:53 2004
+++ 25-akpm/fs/nfsd/nfs4idmap.c	Tue May 18 15:27:53 2004
@@ -454,30 +454,6 @@ do_idmap_lookup(struct ent *(*lookup_fn)
 	return cache_check(detail, &(*item)->h, &mdr->req);
 }
 
-static int threads_waiting = 0;
-
-static inline int
-idmap_lookup_wait(struct idmap_defer_req *mdr, struct svc_rqst *rqstp,
-		struct ent *item)
-{
-	DEFINE_WAIT(wait);
-
-	prepare_to_wait(&mdr->waitq, &wait, TASK_INTERRUPTIBLE);
-	/* XXX: Does it matter that threads_waiting isn't per-server? */
-	/* Note: BKL prevents races with nfsd_svc and other lookups */
-	lock_kernel();
-	if (!test_bit(CACHE_VALID, &item->h.flags)) {
-		if (2 * threads_waiting > rqstp->rq_server->sv_nrthreads)
-			goto out;
-		threads_waiting++;
-		schedule_timeout(1 * HZ);
-		threads_waiting--;
-	}
-out:
-	unlock_kernel();
-	finish_wait(&mdr->waitq, &wait);
-}
-
 static inline int
 do_idmap_lookup_nowait(struct ent *(*lookup_fn)(struct ent *, int),
 			struct ent *key, struct cache_detail *detail,
@@ -521,7 +497,8 @@ idmap_lookup(struct svc_rqst *rqstp,
 	mdr->req.defer = idmap_defer;
 	ret = do_idmap_lookup(lookup_fn, key, detail, item, mdr);
 	if (ret == -EAGAIN) {
-		idmap_lookup_wait(mdr, rqstp, *item);
+		wait_event_interruptible_timeout(mdr->waitq,
+			test_bit(CACHE_VALID, &(*item)->h.flags), 1 * HZ);
 		ret = do_idmap_lookup_nowait(lookup_fn, key, detail, item);
 	}
 	put_mdr(mdr);
_
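An aside for readers who don't have the kernel wait primitives paged in: the
change above replaces "count the sleepers and bail out early" with "always
sleep, but never for more than a second".  wait_event_interruptible_timeout()
sleeps until the condition becomes true, the task is interrupted, or the
timeout (here 1 * HZ, i.e. one second) expires.  The sketch below is a
hypothetical userspace analog, not part of the patch and not an nfsd
interface; the names cache_valid, cache_lock, cache_cond and
wait_for_cache_valid() are invented for illustration, with
pthread_cond_timedwait() standing in for the kernel wait queue.

/*
 * Userspace analog of the bounded wait used above: wait for a flag to
 * become true, but give up after one second instead of tracking how many
 * other waiters exist.  All names here are invented for illustration.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cache_cond = PTHREAD_COND_INITIALIZER;
static bool cache_valid;		/* stands in for CACHE_VALID */

/* Returns true if the entry became valid, false if we gave up after 1s. */
static bool wait_for_cache_valid(void)
{
	struct timespec deadline;
	int err = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += 1;		/* bound the wait to one second */

	pthread_mutex_lock(&cache_lock);
	while (!cache_valid && err == 0)
		err = pthread_cond_timedwait(&cache_cond, &cache_lock,
					     &deadline);
	pthread_mutex_unlock(&cache_lock);
	return cache_valid;
}

/* The upcall-completion path would do the equivalent of this wake-up. */
static void mark_cache_valid(void)
{
	pthread_mutex_lock(&cache_lock);
	cache_valid = true;
	pthread_cond_broadcast(&cache_cond);
	pthread_mutex_unlock(&cache_lock);
}

int main(void)
{
	/* No one completes the upcall, so this times out after ~1 second. */
	printf("valid after wait: %d\n", wait_for_cache_valid());
	mark_cache_valid();
	/* Entry is now valid, so this returns immediately. */
	printf("valid after wait: %d\n", wait_for_cache_valid());
	return 0;
}

Built with something like "cc -pthread wait_sketch.c" (the file name is just
an example), the first call gives up after about a second, which corresponds
to the path that now yields NFS4ERR_DELAY; the second returns immediately
once the entry is valid.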