author Preeti U Murthy <preeti@linux.vnet.ibm.com> 2014-04-14 13:42:54 +0530
committer Wang Sen <wangsen@linux.vnet.ibm.com> 2014-04-14 17:02:54 +0800
commit 6c71c41f52f9b763f52c333b781d939e9fbdba5f (patch)
tree 5ce344d2f9ef7ca0b4ea5581580fc889012ee1ba
parent aaed5aa6b25ab969dac816449adeaba07d92882f (diff)
tick, broadcast: Keep the cpu_online_mask and broadcast masks in sync with each other
It's possible that the tick_broadcast_force_mask contains cpus which are not in cpu_online_mask when a broadcast tick occurs. This could happen under the following circumstance, assuming CPU1 is among the CPUs waiting for broadcast and is the cpu being hotplugged out.

CPU0                                    CPU1

Run CPU_DOWN_PREPARE notifiers

Start stop_machine                      Gets woken up by IPI to run
                                        stop_machine, sets itself in
                                        tick_broadcast_force_mask if the
                                        time of broadcast interrupt is around
                                        the same time as this IPI.

                                        Start stop_machine
                                          set_cpu_online(cpu1, false)
End stop_machine                        End stop_machine

Broadcast interrupt
  Finds that cpu1 in
  tick_broadcast_force_mask is offline
  and triggers the WARN_ON in
  tick_handle_oneshot_broadcast()

Clears all broadcast masks
in CPU_DEAD stage.

While the hotplugged cpu clears its bit in the tick_broadcast_oneshot_mask and tick_broadcast_pending_mask during BROADCAST_EXIT, it *sets* its bit in the tick_broadcast_force_mask if the broadcast interrupt is found to be around the same time as the present time. Today we clear all the broadcast masks and shut down tick devices in the CPU_DEAD stage. But as shown above, the broadcast interrupt could occur before this stage is reached, and the WARN_ON() gets triggered when it is found that the tick_broadcast_force_mask contains an offline cpu.

Please note that a scenario such as the above will occur *only if the broadcast interrupt is delayed under some circumstance*. Ideally the broadcast interrupt in the above scenario should have occurred before we reach the irq_disabled stage of stop_machine and should have seen a valid broadcast mask. But for some reason that is yet to be understood, it is getting delayed, leading to the above scenario.

Besides this, another point to notice is that there is a small window between the CPU_DYING stage, where the hotplugged cpu clears its bit from the cpu_online_mask, and the CPU_DEAD stage, where the same bit gets cleared from the tick_broadcast_force_mask. During that window the two masks are out of sync with each other, which is what makes the above scenario possible.

The temporary solution to this is to move the clearing of broadcast masks to the CPU_DYING notification stage. The reason is that it is during this stage that the hotplugged cpu clears itself from the cpu_online_mask and runs the notifications relevant to this stage, including (with this patch) those that clear the broadcast masks. All this happens while the rest of the cpus are busy spinning in stop_machine and cannot notice the change. By the time this stage ends and all cpus resume work, the hotplugged cpu will have cleared itself from both the cpu_online_mask and the broadcast cpu mask, thus keeping them in sync with each other whenever the rest of the cpus can read these masks.

Since the above mentioned delay in the broadcast interrupt has not triggered any soft lockups so far, we are assuming it's a non-fatal issue and have this patch to prevent the warning from popping up in this case.

Suggested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@au1.ibm.com>
-rw-r--r-- kernel/time/tick-common.c 2
1 files changed, 1 insertions, 1 deletions
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 086216c433fa3d..fdd53f8a6e1d27 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -386,10 +386,10 @@ void tick_notify(unsigned long reason, void *dev)
 
 	case CLOCK_EVT_NOTIFY_CPU_DYING:
 		tick_handover_do_timer(dev);
+		tick_shutdown_broadcast_oneshot(dev);
 		break;
 
 	case CLOCK_EVT_NOTIFY_CPU_DEAD:
-		tick_shutdown_broadcast_oneshot(dev);
 		tick_shutdown_broadcast(dev);
 		tick_shutdown(dev);
 		break;