From: Rusty Russell

Stephen Rothwell noted a case where one CPU was sitting in userspace and one
was in stop_machine(), waiting for everyone to enter stopmachine(). This can
happen if migration occurs at exactly the wrong time with more than 2 CPUs.
Say we have 4 CPUs:

1) stop_machine() on CPU 0 creates stopmachine() threads for CPUs 1, 2 and 3,
   and yields waiting for them to migrate to their CPUs and ack.

2) stopmachine(2) gets rebalanced (probably on exec) to CPU 1.

3) stopmachine(2) calls set_cpus_allowed() on CPU 1, sleeps awaiting the
   migration thread.

4) stopmachine(1) calls set_cpus_allowed() on CPU 0, moves onto CPU 1 and
   starts spinning.

Now the migration thread never runs, and we deadlock. The simplest solution
is for stopmachine() to yield until they are all in place.

Signed-off-by: Rusty Russell
Signed-off-by: Andrew Morton
---

 25-akpm/kernel/stop_machine.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletion(-)

diff -puN kernel/stop_machine.c~fix-occasional-stop_machine-lockup-with-2-cpus kernel/stop_machine.c
--- 25/kernel/stop_machine.c~fix-occasional-stop_machine-lockup-with-2-cpus	2004-11-29 21:44:09.977833880 -0800
+++ 25-akpm/kernel/stop_machine.c	2004-11-29 21:44:09.981833272 -0800
@@ -52,7 +52,12 @@ static int stopmachine(void *cpu)
 			mb(); /* Must read state first. */
 			atomic_inc(&stopmachine_thread_ack);
 		}
-		cpu_relax();
+		/* Yield in first stage: migration threads need to
+		 * help our sisters onto their CPUs. */
+		if (!prepared && !irqs_disabled)
+			yield();
+		else
+			cpu_relax();
 	}
 
 	/* Ack: we are exiting. */
_