| author | Andrew Morton <akpm@osdl.org> | 2004-05-09 23:26:19 -0700 |
|---|---|---|
| committer | Linus Torvalds <torvalds@ppc970.osdl.org> | 2004-05-09 23:26:19 -0700 |
| commit | 91bc0bf78432c2f0350adf355324bdbc822bc35f (patch) | |
| tree | 0f07870209357097b5c57a67a7da7f2c3b9cf173 /kernel | |
| parent | 850f7d78d3912cbba31e4f6c266f1182a7ae8a20 (diff) | |
| download | history-91bc0bf78432c2f0350adf355324bdbc822bc35f.tar.gz | |
[PATCH] Hotplug CPU sched_balance_exec Fix
From: Rusty Russell <rusty@rustcorp.com.au>
From: Srivatsa Vaddagiri <vatsa@in.ibm.com>
From: Andrew Morton <akpm@osdl.org>
From: Rusty Russell <rusty@rustcorp.com.au>
We want to get rid of lock_cpu_hotplug() in sched_migrate_task. I found
that lockless migration of an execing task is _extremely_ racy. The
races I hit are described below, along with probable solutions.
Task migrations done elsewhere should be safe (?) since they are either
done under the lock (sys_sched_setaffinity) or entirely with preemption
disabled (load_balance).
sched_balance_exec does:
a. disables preemption
b. finds new_cpu for current
c. enables preemption
d. calls sched_migrate_task to migrate current to new_cpu
and sched_migrate_task does:
e. task_rq_lock(p)
f. migrate_task(p, dest_cpu ..)
(if we have to wait for migration thread)
g. task_rq_unlock()
h. wake_up_process(rq->migration_thread)
i. wait_for_completion()
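To make the race windows easier to see, here is a condensed sketch of the lockless flow described in steps a-i above (i.e. with lock_cpu_hotplug() already dropped), with comments marking where the races enumerated below can bite. This is not the literal 2.6.6-rc2-mm1 source: the find_idlest_cpu() helper name is illustrative and error handling is elided.

```c
/* Condensed sketch of the lockless flow; labels match steps a-i above. */
static void sched_migrate_task(task_t *p, int dest_cpu)
{
	migration_req_t req;
	unsigned long flags;
	runqueue_t *rq;

	rq = task_rq_lock(p, &flags);			/* e */
	if (!cpu_isset(dest_cpu, p->cpus_allowed))
		goto out;
	if (migrate_task(p, dest_cpu, &req)) {		/* f: must wait for the migration thread */
		task_rq_unlock(rq, &flags);		/* g: races 1 and 3 open up from here */
		wake_up_process(rq->migration_thread);	/* h: thread (or its CPU) may already be gone */
		wait_for_completion(&req.done);		/* i: race 4 can leave us stuck here */
		return;
	}
out:
	task_rq_unlock(rq, &flags);
}

static void sched_balance_exec(void)
{
	int new_cpu;

	preempt_disable();				/* a */
	new_cpu = find_idlest_cpu(current);		/* b: helper name illustrative */
	preempt_enable();				/* c: race 2's window opens here */
	sched_migrate_task(current, new_cpu);		/* d */
}
```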
Several things can happen here:
1. new_cpu can go down after h and before the migration thread has
got around to handling the request
==> we need to add a cpu_is_offline check in __migrate_task
2. new_cpu can go down between c and d, or before f.
==> Even though this case is automatically handled by the above
change (migrate_task being called on a running task, current,
will delegate migration to the migration thread), it would be
good practice to avoid calling migrate_task in the first place
when dest_cpu is offline. This means adding another
cpu_is_offline check after e in sched_migrate_task.
3. The 'current' task can get preempted _immediately_ after
g, and when it comes back, task_cpu(p) can be dead, in which
case it is invalid to do a wake_up on a non-existent migration
thread (rq->migration_thread can be NULL).
==> We should disable preemption through g and h
4. Before the migration thread gets around to handling the request, its
CPU goes dead. This will leave unhandled migration requests on the dead
CPU.
==> We need to wake up sleeping requestors (if any) in the CPU_DEAD
notification.
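The reason race 4 matters is the completion pairing: the requestor sleeps in wait_for_completion(&req.done) until someone calls complete() on that request. Below is a minimal sketch of the fix, assuming the migration_req_t layout used in the patch at the end (an embedded list node and completion); the helper name drain_migration_queue() is illustrative, and the patch itself open-codes this in migration_call().

```c
/* Sketch: drain requests left queued on a dead CPU's runqueue so that
 * their requestors are not stuck in wait_for_completion() forever. */
static void drain_migration_queue(runqueue_t *rq)
{
	migration_req_t *req;

	spin_lock_irq(&rq->lock);
	while (!list_empty(&rq->migration_queue)) {
		req = list_entry(rq->migration_queue.next,
				 migration_req_t, list);
		list_del_init(&req->list);
		/* The migration itself is abandoned (it was best-effort);
		 * just wake the task sleeping on req->done. */
		complete(&req->done);
	}
	spin_unlock_irq(&rq->lock);
}
```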
I really wonder if we can get rid of these issues by avoiding balancing at
exec time altogether and instead leaving it to load_balance. Alternatively,
if exec-time balancing is valuable and we want to retain it, I think we still
need to consider a read/write sem, with sched_migrate_task doing
down_read_trylock. This may eliminate the deadlock I hit between cpu_up and
the CPU_UP_PREPARE notification, which had forced me away from the r/w sem.
Anyway, the patch below addresses the above races. It is against 2.6.6-rc2-mm1
and has been tested on a 4-way Intel Pentium SMP machine.
Rusty sez:
Two other changes:
1) I grabbed a reference to the thread, rather than using
preempt_disable(). It's the more obvious way, I think (see the snippet
after these notes).
2) Why the wait_to_die code? It might be needed if we move tasks after
stop_machine, but for now I don't see the problem with the migration
thread running on the wrong CPU for a bit: nothing is on this runqueue,
so active_load_balance is safe, and __migrate_task will be a no-op (due
to the cpu_is_offline() check). If there is a problem, your fix is racy,
because we could be preempted immediately afterwards.
So I just stop the kthread, then wake up any remaining...
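For (1), the idea is simply to pin the migration thread's task_struct before dropping the runqueue lock, so the wakeup target cannot be freed underneath us even if the kthread exits in the meantime. This snippet matches the sched_migrate_task() hunk in the diff below:

```c
	/* Need to wait for migration thread (might exit: take ref). */
	struct task_struct *mt = rq->migration_thread;

	get_task_struct(mt);		/* hold a reference across the unlock */
	task_rq_unlock(rq, &flags);
	wake_up_process(mt);		/* safe even if the kthread is exiting */
	put_task_struct(mt);
	wait_for_completion(&req.done);
```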
Diffstat (limited to 'kernel')
-rw-r--r-- | kernel/sched.c | 37 |
1 files changed, 28 insertions, 9 deletions
```diff
diff --git a/kernel/sched.c b/kernel/sched.c
index 6a1c9a4df37e60..22c8bd8a9e27de 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1124,16 +1124,19 @@ static void sched_migrate_task(task_t *p, int dest_cpu)
 	runqueue_t *rq;
 	unsigned long flags;
 
-	lock_cpu_hotplug();
 	rq = task_rq_lock(p, &flags);
-	if (!cpu_isset(dest_cpu, p->cpus_allowed))
+	if (!cpu_isset(dest_cpu, p->cpus_allowed)
+	    || unlikely(cpu_is_offline(dest_cpu)))
 		goto out;
 
 	/* force the process onto the specified CPU */
 	if (migrate_task(p, dest_cpu, &req)) {
-		/* Need to wait for migration thread. */
+		/* Need to wait for migration thread (might exit: take ref). */
+		struct task_struct *mt = rq->migration_thread;
+		get_task_struct(mt);
 		task_rq_unlock(rq, &flags);
-		wake_up_process(rq->migration_thread);
+		wake_up_process(mt);
+		put_task_struct(mt);
 		wait_for_completion(&req.done);
 
 		/*
@@ -1142,13 +1145,11 @@ static void sched_migrate_task(task_t *p, int dest_cpu)
 		 * the migration.
 		 */
 		tlb_migrate_prepare(current->mm);
-		unlock_cpu_hotplug();
 		return;
 	}
 out:
 	task_rq_unlock(rq, &flags);
-	unlock_cpu_hotplug();
 }
 
 /*
@@ -3191,6 +3192,9 @@ static void __migrate_task(struct task_struct *p, int dest_cpu)
 {
 	runqueue_t *rq_dest;
 
+	if (unlikely(cpu_is_offline(dest_cpu)))
+		return;
+
 	rq_dest = cpu_rq(dest_cpu);
 
 	double_rq_lock(this_rq(), rq_dest);
@@ -3349,9 +3353,24 @@ static int migration_call(struct notifier_block *nfb, unsigned long action,
 		/* Unbind it from offline cpu so it can run. Fall thru. */
 		kthread_bind(cpu_rq(cpu)->migration_thread,smp_processor_id());
 	case CPU_DEAD:
-		kthread_stop(cpu_rq(cpu)->migration_thread);
-		cpu_rq(cpu)->migration_thread = NULL;
-		BUG_ON(cpu_rq(cpu)->nr_running != 0);
+		rq = cpu_rq(cpu);
+		kthread_stop(rq->migration_thread);
+		rq->migration_thread = NULL;
+		BUG_ON(rq->nr_running != 0);
+
+		/* No need to migrate the tasks: it was best-effort if
+		 * they didn't do lock_cpu_hotplug().  Just wake up
+		 * the requestors. */
+		spin_lock_irq(&rq->lock);
+		while (!list_empty(&rq->migration_queue)) {
+			migration_req_t *req;
+			req = list_entry(rq->migration_queue.next,
+					 migration_req_t, list);
+			BUG_ON(req->type != REQ_MOVE_TASK);
+			list_del_init(&req->list);
+			complete(&req->done);
+		}
+		spin_unlock_irq(&rq->lock);
 		break;
 #endif
 	}
```