author	Andrew Morton <akpm@osdl.org>	2004-05-09 23:23:54 -0700
committer	Linus Torvalds <torvalds@ppc970.osdl.org>	2004-05-09 23:23:54 -0700
commit	8c136f71934b05aebb7ffb9631415b74f0906bad (patch)
tree	768560d3a765886c3d8fec71fc3365a9687638f7 /init
parent	067e0480fda41baaf5b6b2d8c2066848e775d457 (diff)
[PATCH] sched: scheduler domain support
From: Nick Piggin <piggin@cyberone.com.au>
This is the core sched domains patch. It can handle any number of levels
in a scheduling hierarchy, and allows architectures to easily customize how
the scheduler behaves. It also provides progressive balancing backoff
needed by SGI on their large systems (although they have not yet tested
it).
It is built on top of (well, uses ideas from) my previous SMP/NUMA work, and
gets results very similar to it when using the default scheduling
description.
Benchmarks
==========
Martin was seeing I think 10-20% better system times in kernbench on the
32-way. I was seeing improvements in dbench, tbench, kernbench, reaim,
hackbench on a 16-way NUMAQ. Hackbench in fact had a non-linear element
which is all but eliminated. Large improvements in volanomark.
Cross node task migration was decreased in all above benchmarks, sometimes by
a factor of 100!! Cross CPU migration was also generally decreased. See
this post:
http://groups.google.com.au/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&frame=right&th=a406c910b30cbac4&seekm=UAdQ.3hj.5%40gated-at.bofh.it#link2
Results on a hyperthreading P4 are equivalent to Ingo's shared runqueues
patch (which is a big improvement).
Some examples on the 16-way NUMAQ (this is slightly older sched domain code):
http://www.kerneltrap.org/~npiggin/w26/hbench.png
http://www.kerneltrap.org/~npiggin/w26/vmark.html
From: Jes Sorensen <jes@wildopensource.com>
Tiny patch to make -mm3 compile on a NUMA box with NR_CPUS >
BITS_PER_LONG.
From: "Martin J. Bligh" <mbligh@aracnet.com>
Fix a minor nit with the find_busiest_group code. No functional change,
but makes the code simpler and clearer. This patch does two things ...
adds some more expansive comments, and removes this if clause:
	if (*imbalance < SCHED_LOAD_SCALE
			&& max_load - this_load > SCHED_LOAD_SCALE)
		*imbalance = SCHED_LOAD_SCALE;
If we remove the scaling factor, we're basically conditionally doing:
if (*imbalance < 1)
*imbalance = 1;
Which is pointless, as the very next thing we do is to remove the
scaling factor, rounding up to the nearest integer as we do:
*imbalance = (*imbalance + SCHED_LOAD_SCALE - 1) >> SCHED_LOAD_SHIFT;
Thus the if statement is redundant, and only makes the code harder to
read ;-)
From: Rick Lindsley <ricklind@us.ibm.com>
In find_busiest_group(), after we exit the do/while, we select our
imbalance. But max_load, avg_load, and this_load are all unsigned, so
min(x,y) will make a bad choice if max_load < avg_load < this_load (that
is, a choice between two negative [very large] numbers).
Unfortunately, there is also a bug when max_load never gets changed from
zero (look in the loop and think about what happens if the only load on the
machine is being created by cpu groups of which we are a member), and you
have a recipe for some really bogus values for imbalance.
Even if you fix the max_load == 0 bug, there will still be times when
avg_load - this_load will be negative (thus very large) and you'll make the
decision to move stuff when you shouldn't have.
This patch allows for this_load to set max_load, which if I understand
the logic properly is correct. With this patch applied, the algorithm is
*much* more conservative ... maybe *too* conservative but that's for
another round of testing ...
From: Ingo Molnar <mingo@elte.hu>
sched-find-busiest-fix
Diffstat (limited to 'init')
-rw-r--r--	init/main.c	2
1 files changed, 1 insertions, 1 deletions

diff --git a/init/main.c b/init/main.c
index 77e962511616e6..6f2ccbff72a901 100644
--- a/init/main.c
+++ b/init/main.c
@@ -567,7 +567,6 @@ static void do_pre_smp_initcalls(void)
 	migration_init();
 #endif
-	node_nr_running_init();
 	spawn_ksoftirqd();
 }
@@ -596,6 +595,7 @@ static int init(void * unused)
 	do_pre_smp_initcalls();
 	smp_init();
+	sched_init_smp();
 	/*
 	 * Do this before initcalls, because some drivers want to access