author | Nick Piggin <nickpiggin@yahoo.com.au> | 2004-06-04 20:58:39 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@ppc970.osdl.org> | 2004-06-04 20:58:39 -0700 |
commit | a2ea2d4ce970a2a59b3d3a7ef5f69d314db69298 (patch) | |
tree | 57fc5d7ba1cc329a1a59ee106a05fc77bcf2fbdc /kernel | |
parent | a9b7c3577aa273dba2d44a6286cb39dc2b5d3d0e (diff) | |
[PATCH] sched: improve wakeup-affinity
David Mosberger noticed bw_pipe was way down on sched-domains kernels on
SMP systems.
That is due to two things: first, the previous wake-affine logic would
*always* move a pipe wakee onto the waker's CPU. With the scheduler
rework, this was toned down a lot (but extended to all types of wakeups).
One of the ways this was damped was with the logic: don't move the wakee if
its CPU is relatively idle compared to the waker's CPU. Without this, some
workloads would pile everything up onto a few CPUs and get lots of idle
time.
However, the fix was a bit of a blunt hack: if the wakee runqueue was below
50% busy, and the waker's was above 50% busy, we wouldn't do the move. I
think a better way to capture it is what this patch does: if the wakee
runqueue is below 100% busy, and the sum of the two runqueues' loads is
above 100% busy, and the wakee runqueue is less busy than the waker
runqueue (i.e. CPU utilisation would drop if we do the move), then we don't
do the move.
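To make the difference between the two tests concrete, here is a minimal standalone sketch of both predicates. It is not kernel code: the helper names `old_skip_move` and `new_skip_move` are made up for illustration, and the value 128 for SCHED_LOAD_SCALE (meaning "one CPU 100% busy") is an assumption made so the example compiles on its own. As in the patch, `load` is the wakee runqueue's load and `this_load` is the waker's.

```c
#include <stdbool.h>

/* Assumption for this sketch: 128 represents one fully busy CPU. */
#define SCHED_LOAD_SCALE 128UL

/* Old test (hypothetical helper name): leave the wakee where it is
 * if its CPU is under 50% busy and the waker's CPU is over 50% busy. */
static inline bool old_skip_move(unsigned long load, unsigned long this_load)
{
	return load < SCHED_LOAD_SCALE / 2 && this_load > SCHED_LOAD_SCALE / 2;
}

/* New test (hypothetical helper name): leave the wakee where it is
 * if its CPU is under 100% busy, the two CPUs together carry more than
 * 100% of one CPU's worth of load, and the wakee's CPU is the less busy
 * of the two. */
static inline bool new_skip_move(unsigned long load, unsigned long this_load)
{
	return load < SCHED_LOAD_SCALE
		&& load + this_load > SCHED_LOAD_SCALE
		&& this_load > load;
}
```

With these definitions, a lightly loaded pair such as load = 10, this_load = 70 is held back by the old test but allowed to move by the new one, because both tasks fit on one CPU. A pair such as load = 70, this_load = 100 is moved by the old test but held back by the new one, because the move would overload the waker's CPU.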
After I fixed this, I found things were still getting bounced around quite
a bit. The reason is that we were attempting very aggressive idle
balancing in order to cut down idle time in a dbt2-pgsql workload, which is
particularly sensitive to idle.
After having Mark Wong (markw@osdl.org) retest this load with this patch,
it looks like we don't need to be so aggressive. I'm glad to be rid of
this because it never sat too well with me. We should see slightly lower
cost of schedule and slightly improved cache impact with this change too.
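The change to idle balancing can be sketched the same way. The two helpers below restate the old and new `out_balanced` conditions from the diff as standalone predicates. The helper names, the `have_busiest` flag standing in for the `busiest` pointer, the local `enum idle_type`, and the value 128 for SCHED_LOAD_SCALE are all assumptions made so the example is self-contained.

```c
#include <stdbool.h>

/* Assumption for this sketch: 128 represents one fully busy CPU. */
#define SCHED_LOAD_SCALE 128UL

/* Local stand-in for the scheduler's idle states. */
enum idle_type { IDLE, NOT_IDLE, NEWLY_IDLE };

/* Old condition (hypothetical helper name): a newly idle CPU always
 * pulls from the busiest group, whatever that group's load is. */
static inline bool old_force_pull(bool have_busiest, enum idle_type idle,
				  unsigned long max_load)
{
	return have_busiest && (idle == NEWLY_IDLE ||
			(idle == IDLE && max_load > SCHED_LOAD_SCALE));
}

/* New condition (hypothetical helper name): any idle CPU pulls only
 * when the busiest group has more than one CPU's worth of load. */
static inline bool new_force_pull(bool have_busiest, enum idle_type idle,
				  unsigned long max_load)
{
	return have_busiest && idle != NOT_IDLE && max_load > SCHED_LOAD_SCALE;
}
```

The two conditions differ only for a newly idle CPU when max_load is at or below SCHED_LOAD_SCALE: the old condition still forces a pull there, the new one does not.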
Mark said:
---
This looks pretty good:
metric kernel
2334 2.6.7-rc2
2298 2.6.7-rc2-mm2
2329 2.6.7-rc2-mm2-sched-more-wakeaffine
---
i.e. within the noise.
David said:
---
Oooh, me likeee!
Host      OS            Pipe AF
                             UNIX
--------- ------------- ---- ----
caldera.h Linux 2.6.6   3424 2057 (plain 2.6.6)
caldera.h Linux 2.6.7-r 333. 1402 (original 2.6.7-rc1)
caldera.h Linux 2.6.7-r 3086 4301 (2.6.7-rc1 with your patch)
Pipe-bandwidth is still down about 10% but that may be due to
unrelated changes (or perhaps warmup effects?). The AF UNIX bandwidth
is just mindboggling. Moreover, with your patch 2.6.7-rc1 shows
better context-switch times and lower communication latencies (more
like the numbers you're getting on UP).
So it seems like the overall balance of keeping things on the same CPU
vs. distributing them across CPUs is improved.
---
I also ran some tests on the NUMAQ. kernbench, dbench, hackbench, reaim
were much the same. tbench was improved, very much so when clients < NR_CPU.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Diffstat (limited to 'kernel')
-rw-r--r-- | kernel/sched.c | 6 |
1 files changed, 3 insertions, 3 deletions
diff --git a/kernel/sched.c b/kernel/sched.c
index 019e0db3e0bf93..e632c92fd80f50 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -763,7 +763,8 @@ static int try_to_wake_up(task_t * p, unsigned int state, int sync)
 	this_load = target_load(this_cpu);
 
 	/* Don't pull the task off an idle CPU to a busy one */
-	if (load < SCHED_LOAD_SCALE/2 && this_load > SCHED_LOAD_SCALE/2)
+	if (load < SCHED_LOAD_SCALE && load + this_load > SCHED_LOAD_SCALE
+				&& this_load > load)
 		goto out_set_cpu;
 
 	new_cpu = this_cpu; /* Wake to this CPU if we can */
@@ -1625,8 +1626,7 @@ nextgroup:
 	return busiest;
 
 out_balanced:
-	if (busiest && (idle == NEWLY_IDLE ||
-			(idle == IDLE && max_load > SCHED_LOAD_SCALE)) ) {
+	if (busiest && idle != NOT_IDLE && max_load > SCHED_LOAD_SCALE) {
 		*imbalance = 1;
 		return busiest;
 	}