author	Andrew Morton <akpm@osdl.org>	2004-05-09 23:23:54 -0700
committer	Linus Torvalds <torvalds@ppc970.osdl.org>	2004-05-09 23:23:54 -0700
commit	8c136f71934b05aebb7ffb9631415b74f0906bad (patch)
tree	768560d3a765886c3d8fec71fc3365a9687638f7 /init
parent	067e0480fda41baaf5b6b2d8c2066848e775d457 (diff)
[PATCH] sched: scheduler domain support
From: Nick Piggin <piggin@cyberone.com.au>
This is the core sched domains patch. It can handle any number of levels
in a scheduling hierarchy, and allows architectures to easily customize how
the scheduler behaves. It also provides progressive balancing backoff
needed by SGI on their large systems (although they have not yet tested
it).
It is built on top of (well, uses ideas from) my previous SMP/NUMA work, and
gets results very similar to it when using the default scheduling
description.
Benchmarks
==========
Martin was seeing I think 10-20% better system times in kernbench on the
32-way. I was seeing improvements in dbench, tbench, kernbench, reaim,
hackbench on a 16-way NUMAQ. Hackbench in fact had a non-linear element
which is all but eliminated. Large improvements in volanomark.
Cross node task migration was decreased in all above benchmarks, sometimes by
a factor of 100!! Cross CPU migration was also generally decreased. See
this post:
http://groups.google.com.au/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&frame=right&th=a406c910b30cbac4&seekm=UAdQ.3hj.5%40gated-at.bofh.it#link2
Results on a hyperthreading P4 are equivalent to Ingo's shared runqueues
patch (which is a big improvement).
Some examples on the 16-way NUMAQ (this is slightly older sched domain code):
http://www.kerneltrap.org/~npiggin/w26/hbench.png
http://www.kerneltrap.org/~npiggin/w26/vmark.html
From: Jes Sorensen <jes@wildopensource.com>
Tiny patch to make -mm3 compile on a NUMA box with NR_CPUS >
BITS_PER_LONG.
From: "Martin J. Bligh" <mbligh@aracnet.com>
Fix a minor nit with the find_busiest_group code. No functional change,
but makes the code simpler and clearer. This patch does two things ...
adds some more expansive comments, and removes this if clause:
	if (*imbalance < SCHED_LOAD_SCALE
			&& max_load - this_load > SCHED_LOAD_SCALE)
		*imbalance = SCHED_LOAD_SCALE;
If we remove the scaling factor, we're basically conditionally doing:
if (*imbalance < 1)
*imbalance = 1;
Which is pointless, as the very next thing we do is to remove the
scaling factor, rounding up to the nearest integer as we do:
*imbalance = (*imbalance + SCHED_LOAD_SCALE - 1) >> SCHED_LOAD_SHIFT;
Thus the if statement is redundant, and only makes the code harder to
read ;-)
From: Rick Lindsley <ricklind@us.ibm.com>
In find_busiest_group(), after we exit the do/while, we select our
imbalance. But max_load, avg_load, and this_load are all unsigned, so
min(x,y) will make a bad choice if max_load < avg_load < this_load (that
is, a choice between two negative [very large] numbers).
Unfortunately, there is also a bug when max_load never gets changed from
zero (look in the loop and think about what happens if the only load on the
machine is being created by cpu groups of which we are a member), and you
have a recipe for some really bogus values for imbalance.
Even if you fix the max_load == 0 bug, there will still be times when
avg_load - this_load will be negative (thus very large) and you'll make the
decision to move stuff when you shouldn't have.
This patch allows for this_load to set max_load, which if I understand
the logic properly is correct. With this patch applied, the algorithm is
*much* more conservative ... maybe *too* conservative but that's for
another round of testing ...
From: Ingo Molnar <mingo@elte.hu>
sched-find-busiest-fix
Diffstat (limited to 'init')
-rw-r--r--	init/main.c	2
1 files changed, 1 insertions, 1 deletions

diff --git a/init/main.c b/init/main.c
index 77e962511616e6..6f2ccbff72a901 100644
--- a/init/main.c
+++ b/init/main.c
@@ -567,7 +567,6 @@ static void do_pre_smp_initcalls(void)
 	migration_init();
 #endif
-	node_nr_running_init();
 	spawn_ksoftirqd();
 }
@@ -596,6 +595,7 @@ static int init(void * unused)
 	do_pre_smp_initcalls();
 	smp_init();
+	sched_init_smp();
 	/*
 	 * Do this before initcalls, because some drivers want to access