From: NeilBrown
From: "J. Bruce Fields"

Currently we count the number of threads already asleep waiting on an idmap
upcall, and return an immediate NFS4ERR_DELAY (== JUKEBOX) error if more than
half of the threads are already asleep.  This patch removes that logic;
instead we return NFS4ERR_DELAY only if an upcall times out (i.e., takes more
than a second to return).

With the thread counting there is a risk that, even when all the relevant
subsystems are responsive, the client may still see occasional NFS4ERR_DELAY
returns just because, by coincidence, several upcalls were initiated at the
same time.  I expect clients will delay several seconds before retrying after
NFS4ERR_DELAY, so this will be quite noticeable to users.  Sporadic long
delays like this are likely to lead users to suspect a problem somewhere,
when in fact there is none.

The current scheme ensures that we can still process requests that do not
depend on upcalls, even when all threads would otherwise be tied up waiting
on upcalls.  However, this is not something that should happen under normal
circumstances; if a server spends a significant portion of its time with all
threads waiting for upcalls, that is a sign that something is seriously
wrong.  In such a circumstance (e.g., an LDAP server dies), we can at least
bound the waiting time to a second without the need for counting threads.

In short, removing the thread counting will allow us to behave predictably
when things are working, while still allowing some progress when they aren't.

It would be a worthwhile project to measure the amount of time threads spend
waiting for upcalls (or for reads, for that matter); if a significant portion
of the time they spend handling requests is spent sleeping, then there is an
opportunity to improve nfsd performance: if we can break the one-to-one
mapping between requests and threads, we can lower the number of threads
required to keep the NFS server busy.  However, both of the currently
available options for doing this are problematic: returning JUKEBOX/DELAY
errors at random times leads to unpredictable performance, and saving a copy
of the request to be processed from scratch later is wasteful and makes it
difficult to provide correct semantics, especially in the NFSv4 case.  So for
now I believe waits with short timeouts are the best option.

---

 25-akpm/fs/nfsd/nfs4idmap.c |   27 ++-------------------------
 1 files changed, 2 insertions(+), 25 deletions(-)

diff -puN fs/nfsd/nfs4idmap.c~knfsd-9-of-10-remove-check-on-number-of-threads-waiting-on-user-space fs/nfsd/nfs4idmap.c
--- 25/fs/nfsd/nfs4idmap.c~knfsd-9-of-10-remove-check-on-number-of-threads-waiting-on-user-space	Tue May 18 15:27:53 2004
+++ 25-akpm/fs/nfsd/nfs4idmap.c	Tue May 18 15:27:53 2004
@@ -454,30 +454,6 @@ do_idmap_lookup(struct ent *(*lookup_fn)
 	return cache_check(detail, &(*item)->h, &mdr->req);
 }
 
-static int threads_waiting = 0;
-
-static inline int
-idmap_lookup_wait(struct idmap_defer_req *mdr, struct svc_rqst *rqstp,
-		struct ent *item)
-{
-	DEFINE_WAIT(wait);
-
-	prepare_to_wait(&mdr->waitq, &wait, TASK_INTERRUPTIBLE);
-	/* XXX: Does it matter that threads_waiting isn't per-server? */
-	/* Note: BKL prevents races with nfsd_svc and other lookups */
-	lock_kernel();
-	if (!test_bit(CACHE_VALID, &item->h.flags)) {
-		if (2 * threads_waiting > rqstp->rq_server->sv_nrthreads)
-			goto out;
-		threads_waiting++;
-		schedule_timeout(1 * HZ);
-		threads_waiting--;
-	}
-out:
-	unlock_kernel();
-	finish_wait(&mdr->waitq, &wait);
-}
-
 static inline int
 do_idmap_lookup_nowait(struct ent *(*lookup_fn)(struct ent *, int),
 			struct ent *key, struct cache_detail *detail,
@@ -521,7 +497,8 @@ idmap_lookup(struct svc_rqst *rqstp,
 	mdr->req.defer = idmap_defer;
 	ret = do_idmap_lookup(lookup_fn, key, detail, item, mdr);
 	if (ret == -EAGAIN) {
-		idmap_lookup_wait(mdr, rqstp, *item);
+		wait_event_interruptible_timeout(mdr->waitq,
+			test_bit(CACHE_VALID, &(*item)->h.flags), 1 * HZ);
 		ret = do_idmap_lookup_nowait(lookup_fn, key, detail, item);
 	}
 	put_mdr(mdr);
_
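An aside for readers who don't have the kernel wait primitives paged in: the
change above replaces "count the sleepers and bail out early" with "always
sleep, but never for more than a second".  wait_event_interruptible_timeout()
sleeps until the condition becomes true, the task is interrupted, or the
timeout (here 1 * HZ, i.e. one second) expires.  The sketch below is a
hypothetical userspace analog, not part of the patch and not an nfsd
interface; the names cache_valid, cache_lock, cache_cond and
wait_for_cache_valid() are invented for illustration, with
pthread_cond_timedwait() standing in for the kernel wait queue.

/*
 * Userspace analog of the bounded wait used above: wait for a flag to
 * become true, but give up after one second instead of tracking how many
 * other waiters exist.  All names here are invented for illustration.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cache_cond = PTHREAD_COND_INITIALIZER;
static bool cache_valid;		/* stands in for CACHE_VALID */

/* Returns true if the entry became valid, false if we gave up after 1s. */
static bool wait_for_cache_valid(void)
{
	struct timespec deadline;
	int err = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += 1;		/* bound the wait to one second */

	pthread_mutex_lock(&cache_lock);
	while (!cache_valid && err == 0)
		err = pthread_cond_timedwait(&cache_cond, &cache_lock,
					     &deadline);
	pthread_mutex_unlock(&cache_lock);
	return cache_valid;
}

/* The upcall-completion path would do the equivalent of this wake-up. */
static void mark_cache_valid(void)
{
	pthread_mutex_lock(&cache_lock);
	cache_valid = true;
	pthread_cond_broadcast(&cache_cond);
	pthread_mutex_unlock(&cache_lock);
}

int main(void)
{
	/* No one completes the upcall, so this times out after ~1 second. */
	printf("valid after wait: %d\n", wait_for_cache_valid());
	mark_cache_valid();
	/* Entry is now valid, so this returns immediately. */
	printf("valid after wait: %d\n", wait_for_cache_valid());
	return 0;
}

Built with something like "cc -pthread wait_sketch.c" (the file name is just
an example), the first call gives up after about a second, which corresponds
to the path that now yields NFS4ERR_DELAY; the second returns immediately
once the entry is valid.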