author | Preeti U Murthy <preeti@linux.vnet.ibm.com> | 2014-04-14 13:42:54 +0530 |
---|---|---|
committer | Wang Sen <wangsen@linux.vnet.ibm.com> | 2014-04-14 17:02:54 +0800 |
commit | 6c71c41f52f9b763f52c333b781d939e9fbdba5f (patch) | |
tree | 5ce344d2f9ef7ca0b4ea5581580fc889012ee1ba | |
parent | aaed5aa6b25ab969dac816449adeaba07d92882f (diff) | |
tick, broadcast: Keep the cpu_online_mask and broadcast masks in sync with each other
It's possible that the tick_broadcast_force_mask contains cpus which are not
in cpu_online_mask when a broadcast tick occurs. This could happen under the
following circumstances, assuming CPU1 is among the CPUs waiting for the
broadcast and is also the cpu being hotplugged out.
| CPU0 | CPU1 |
|---|---|
| Run CPU_DOWN_PREPARE notifiers | |
| Start stop_machine | Gets woken up by IPI to run stop_machine, sets itself in tick_broadcast_force_mask if the time of broadcast interrupt is around the same time as this IPI. |
| | Start stop_machine: set_cpu_online(cpu1, false) |
| End stop_machine | End stop_machine |
| Broadcast interrupt: finds that cpu1 in tick_broadcast_force_mask is offline and triggers the WARN_ON in tick_handle_oneshot_broadcast() | |
| Clears all broadcast masks in CPU_DEAD stage. | |
While the hotplugged cpu clears its bit in the tick_broadcast_oneshot_mask
and tick_broadcast_pending_mask during BROADCAST_EXIT, it *sets* its bit
in the tick_broadcast_force_mask if the broadcast interrupt is found to be
due at around the present time. Today we clear all the broadcast masks and
shut down tick devices only in the CPU_DEAD stage. But, as shown above, the
broadcast interrupt could occur before this stage is reached, and the
WARN_ON() gets triggered when tick_broadcast_force_mask is found to contain
an offline cpu.
Please note that a scenario such as the above will occur *only if the
broadcast interrupt is delayed under some circumstance*. Ideally the broadcast
interrupt in the above scenario should have occurred before we reach the
irq-disabled stage of stop_machine and should have seen a valid broadcast mask.
But for some reason that is yet to be understood it is getting delayed, leading
to the above scenario.
Besides this, note that for a short window between the CPU_DYING stage, where
the hotplugged cpu clears its bit from the cpu_online_mask, and the CPU_DEAD
stage, where the same bit is cleared from the tick_broadcast_force_mask, the
two masks are out of sync with each other. A broadcast interrupt handled in
that window triggers the above scenario.
The temporary solution to this is to move the clearing of the broadcast masks
to the CPU_DYING notification stage. The reason is that it is during this
stage that the hotplugged cpu clears itself from the cpu_online_mask and runs
the notifications relevant to this stage, including (with this patch) those
that clear the broadcast masks.
All this happens while the rest of the cpus are busy spinning in stop_machine
and cannot observe the change. By the time this stage ends and all cpus resume
work, the hotplugged cpu will have cleared itself from both the cpu_online_mask
and the broadcast cpu masks, keeping them in sync with each other whenever the
rest of the cpus can read these masks.
Since the above-mentioned delay in the broadcast interrupt has not triggered
any soft lockups so far, we are assuming it's a non-fatal issue; this patch
only prevents the warning from popping up in this case.
Suggested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@au1.ibm.com>
-rw-r--r-- | kernel/time/tick-common.c | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 086216c433fa3d..fdd53f8a6e1d27 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -386,10 +386,10 @@ void tick_notify(unsigned long reason, void *dev)
 
 	case CLOCK_EVT_NOTIFY_CPU_DYING:
 		tick_handover_do_timer(dev);
+		tick_shutdown_broadcast_oneshot(dev);
 		break;
 
 	case CLOCK_EVT_NOTIFY_CPU_DEAD:
-		tick_shutdown_broadcast_oneshot(dev);
 		tick_shutdown_broadcast(dev);
 		tick_shutdown(dev);
 		break;