author     Sebastian Andrzej Siewior <bigeasy@linutronix.de>    2015-07-14 15:03:44 +0200
committer  Sebastian Andrzej Siewior <bigeasy@linutronix.de>    2015-01-01 00:00:00 +0100
commit     96f1ac50158c44831efd5e16de84a912a80492b4
tree       2ae9a85fe612503b4773bb0db5b7b4a736d9948c
download   4.12-rt-patches-96f1ac50158c44831efd5e16de84a912a80492b4.tar.gz
[ANNOUNCE] 4.1.2-rt1
Dear RT folks!

I'm pleased to announce the v4.1.2-rt1 patch set. The move from 4.0 to 4.1 was rather smooth, so we took the time for some overdue cleanups and restructuring of the patch queue:

1) Patch folding
   - Fold all fixlets into the proper patches.
   - Consolidate the patches which change the same piece of code over and over (e.g. add/revert/redo). These patches were mostly kept separate so they could easily be picked up for stable.

2) Dropping obsolete patches
   Some patches have been superseded by different upstream changes, so the RT variant is redundant.

3) Changelogs
   Quite a few patches had missing or unhelpful changelogs; we updated them all. Each patch now has From, Subject and Date fields, which means "git quiltimport" will now produce the same commit id for each patch (as long as the commit author and date remain unchanged).

4) Reordering
   The patches were reordered by topic, so patches related to the same subsystem or problem space are grouped together.

5) Ability to build and boot
   Each step in the queue now builds with RT=n and RT=y. All steps boot with RT=n. With RT=y the functionality obviously depends on all patches, so boot bisectability cannot be achieved.

As of now we also provide a git tree with the RT changes. The tree is structured similarly to Steven's stable RT tree. For each kernel version we provide three branches:

   linux-m.n.y-rt
      This branch starts when we move to a new kernel version. After the first release it only receives incremental updates (either from the mainline stable tree or from updates to the RT patch queue).

   linux-m.n.y-rt-rebase
      This branch is rebased when a new stable version or a new RT patch queue is available. The RT patch queue is applied on top of the latest mainline stable version.

   linux-m.n.y-queue
      This branch contains the revisions of the RT patch queue - the patches and the series file.

Known issues:

   - My AMD box throws a lot of "cpufreq_stat_notifier_trans: No policy found" warnings after boot. They go away after manually setting the policy (to something other than the reported one).

   - bcache is disabled.

   - CPU hotplug works in general. Steven's test script, however, usually deadlocks on the second invocation.

   - xor / raid_pq
     I saw the max latency jumping up to 67563us on one CPU while the next lower max was 58us. I tracked it down to the module init code of xor and raid_pq: both disable preemption while measuring the performance of the individual implementations.

The git URLs for this release are:

   git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git linux-4.1.y-rt
   git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git linux-4.1.y-rt-rebase
   git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git linux-4.1.y-rt-queue

The RT patch against 4.1.2 can be found here:

   https://www.kernel.org/pub/linux/kernel/projects/rt/4.1/patch-4.1.2-rt1.patch.xz

The split quilt queue is available at:

   https://www.kernel.org/pub/linux/kernel/projects/rt/4.1/patches-4.1.2-rt1.tar.xz

Sebastian

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
-rw-r--r--patches/0001-arm64-Mark-PMU-interrupt-IRQF_NO_THREAD.patch26
-rw-r--r--patches/0001-sched-Implement-lockless-wake-queues.patch166
-rw-r--r--patches/0001-uaccess-count-pagefault_disable-levels-in-pagefault_.patch119
-rw-r--r--patches/0002-arm64-Allow-forced-irq-threading.patch26
-rw-r--r--patches/0002-futex-Implement-lockless-wakeups.patch181
-rw-r--r--patches/0002-mm-uaccess-trigger-might_sleep-in-might_fault-with-d.patch100
-rw-r--r--patches/0003-uaccess-clarify-that-uaccess-may-only-sleep-if-pagef.patch641
-rw-r--r--patches/0004-ipc-mqueue-Implement-lockless-pipelined-wakeups.patch183
-rw-r--r--patches/0004-mm-explicitly-disable-enable-preemption-in-kmap_atom.patch367
-rw-r--r--patches/0005-futex-Ensure-lock-unlock-symetry-versus-pi_lock-and-.patch42
-rw-r--r--patches/0005-mips-kmap_coherent-relies-on-disabled-preemption.patch40
-rw-r--r--patches/0006-mm-use-pagefault_disable-to-check-for-disabled-pagef.patch646
-rw-r--r--patches/0007-drm-i915-use-pagefault_disabled-to-check-for-disable.patch32
-rw-r--r--patches/0008-futex-UP-futex_atomic_op_inuser-relies-on-disabled-p.patch45
-rw-r--r--patches/0009-futex-UP-futex_atomic_cmpxchg_inatomic-relies-on-dis.patch36
-rw-r--r--patches/0010-arm-futex-UP-futex_atomic_cmpxchg_inatomic-relies-on.patch36
-rw-r--r--patches/0011-arm-futex-UP-futex_atomic_op_inuser-relies-on-disabl.patch47
-rw-r--r--patches/0012-futex-clarify-that-preemption-doesn-t-have-to-be-dis.patch85
-rw-r--r--patches/0013-mips-properly-lock-access-to-the-fpu.patch33
-rw-r--r--patches/0014-uaccess-decouple-preemption-from-the-pagefault-logic.patch60
-rw-r--r--patches/ARM-cmpxchg-define-__HAVE_ARCH_CMPXCHG-for-armv6-and.patch39
-rw-r--r--patches/ARM-enable-irq-in-translation-section-permission-fau.patch85
-rw-r--r--patches/ASoC-Intel-sst-use-instead-of-at-the-of-a-C-statemen.patch26
-rw-r--r--patches/HACK-printk-drop-the-logbuf_lock-more-often.patch76
-rw-r--r--patches/KVM-lapic-mark-LAPIC-timer-handler-as-irqsafe.patch100
-rw-r--r--patches/KVM-use-simple-waitqueue-for-vcpu-wq.patch341
-rw-r--r--patches/acpi-rt-Convert-acpi_gbl_hardware-lock-back-to-a-raw.patch173
-rw-r--r--patches/arch-arm64-Add-lazy-preempt-support.patch103
-rw-r--r--patches/arm-at91-pit-remove-irq-handler-when-clock-is-unused.patch56
-rw-r--r--patches/arm-at91-tclib-default-to-tclib-timer-for-rt.patch32
-rw-r--r--patches/arm-convert-boot-lock-to-raw.patch465
-rw-r--r--patches/arm-enable-highmem-for-rt.patch148
-rw-r--r--patches/arm-highmem-flush-tlb-on-unmap.patch27
-rw-r--r--patches/arm-preempt-lazy-support.patch105
-rw-r--r--patches/arm-unwind-use_raw_lock.patch83
-rw-r--r--patches/ata-disable-interrupts-if-non-rt.patch64
-rw-r--r--patches/blk-mq-revert-raw-locks-post-pone-notifier-to-POST_D.patch83
-rw-r--r--patches/block-blk-mq-use-swait.patch114
-rw-r--r--patches/block-mq-don-t-complete-requests-via-IPI.patch101
-rw-r--r--patches/block-mq-drop-per-ctx-cpu_lock.patch124
-rw-r--r--patches/block-mq-drop-preempt-disable.patch51
-rw-r--r--patches/block-mq-use-cpu_light.patch89
-rw-r--r--patches/block-shorten-interrupt-disabled-regions.patch96
-rw-r--r--patches/block-use-cpu-chill.patch45
-rw-r--r--patches/bug-rt-dependend-variants.patch36
-rw-r--r--patches/cgroups-scheduling-while-atomic-in-cgroup-code.patch64
-rw-r--r--patches/cgroups-use-simple-wait-in-css_release.patch86
-rw-r--r--patches/clocksource-tclib-allow-higher-clockrates.patch160
-rw-r--r--patches/completion-use-simple-wait-queues.patch224
-rw-r--r--patches/cond-resched-lock-rt-tweak.patch23
-rw-r--r--patches/cond-resched-softirq-rt.patch52
-rw-r--r--patches/cpu-hotplug-Document-why-PREEMPT_RT-uses-a-spinlock.patch55
-rw-r--r--patches/cpu-rt-make-hotplug-lock-a-sleeping-spinlock-on-rt.patch130
-rw-r--r--patches/cpu-rt-rework-cpu-down.patch561
-rw-r--r--patches/cpu_chill-Add-a-UNINTERRUPTIBLE-hrtimer_nanosleep.patch106
-rw-r--r--patches/cpu_down_move_migrate_enable_back.patch52
-rw-r--r--patches/cpufreq-drop-K8-s-driver-from-beeing-selected.patch32
-rw-r--r--patches/cpumask-disable-offstack-on-rt.patch34
-rw-r--r--patches/crypto-Reduce-preempt-disabled-regions-more-algos.patch241
-rw-r--r--patches/debugobjects-rt.patch25
-rw-r--r--patches/dm-make-rt-aware.patch26
-rw-r--r--patches/drivers-net-8139-disable-irq-nosync.patch25
-rw-r--r--patches/drivers-net-fix-livelock-issues.patch126
-rw-r--r--patches/drivers-net-vortex-fix-locking-issues.patch48
-rw-r--r--patches/drivers-random-reduce-preempt-disabled-region.patch32
-rw-r--r--patches/drivers-tty-fix-omap-lock-crap.patch42
-rw-r--r--patches/drivers-tty-pl011-irq-disable-madness.patch47
-rw-r--r--patches/drm-i915-drop-trace_i915_gem_ring_dispatch-onrt.patch58
-rw-r--r--patches/epoll-use-get-cpu-light.patch30
-rw-r--r--patches/fix-rt-int3-x86_32-3.2-rt.patch101
-rw-r--r--patches/fs-aio-simple-simple-work.patch106
-rw-r--r--patches/fs-block-rt-support.patch22
-rw-r--r--patches/fs-dcache-use-cpu-chill-in-trylock-loops.patch85
-rw-r--r--patches/fs-jbd-pull-plug-when-waiting-for-space.patch29
-rw-r--r--patches/fs-jbd-replace-bh_state-lock.patch100
-rw-r--r--patches/fs-jbd2-pull-your-plug-when-waiting-for-space.patch31
-rw-r--r--patches/fs-namespace-preemption-fix.patch30
-rw-r--r--patches/fs-ntfs-disable-interrupt-non-rt.patch59
-rw-r--r--patches/fs-replace-bh_uptodate_lock-for-rt.patch161
-rw-r--r--patches/ftrace-migrate-disable-tracing.patch73
-rw-r--r--patches/futex-avoid-double-wake-up-in-PI-futex-wait-wake-on-.patch223
-rw-r--r--patches/futex-requeue-pi-fix.patch113
-rw-r--r--patches/genirq-disable-irqpoll-on-rt.patch37
-rw-r--r--patches/genirq-do-not-invoke-the-affinity-callback-via-a-wor.patch144
-rw-r--r--patches/genirq-force-threading.patch48
-rw-r--r--patches/gpio-omap-use-raw-locks-for-locking.patch316
-rw-r--r--patches/hotplug-Use-set_cpus_allowed_ptr-in-sync_unplug_thre.patch46
-rw-r--r--patches/hotplug-light-get-online-cpus.patch204
-rw-r--r--patches/hotplug-sync_unplug-no-27-5cn-27-in-task-name.patch24
-rw-r--r--patches/hotplug-use-migrate-disable.patch39
-rw-r--r--patches/hrtimer-Move-schedule_work-call-to-helper-thread.patch117
-rw-r--r--patches/hrtimer-fixup-hrtimer-callback-changes-for-preempt-r.patch462
-rw-r--r--patches/hrtimer-raise-softirq-if-hrtimer-irq-stalled.patch37
-rw-r--r--patches/hrtimers-prepare-full-preemption.patch195
-rw-r--r--patches/hwlat-detector-Don-t-ignore-threshold-module-paramet.patch25
-rw-r--r--patches/hwlat-detector-Update-hwlat_detector-to-add-outer-lo.patch125
-rw-r--r--patches/hwlat-detector-Use-thread-instead-of-stop-machine.patch183
-rw-r--r--patches/hwlat-detector-Use-trace_clock_local-if-available.patch92
-rw-r--r--patches/hwlatdetect.patch1347
-rw-r--r--patches/i2c-omap-drop-the-lock-hard-irq-context.patch33
-rw-r--r--patches/i915-bogus-warning-from-i915-when-running-on-PREEMPT.patch29
-rw-r--r--patches/i915_compile_fix.patch23
-rw-r--r--patches/ide-use-nort-local-irq-variants.patch169
-rw-r--r--patches/idr-use-local-lock-for-protection.patch96
-rw-r--r--patches/infiniband-mellanox-ib-use-nort-irq.patch40
-rw-r--r--patches/inpt-gameport-use-local-irq-nort.patch44
-rw-r--r--patches/introduce_migrate_disable_cpu_light.patch340
-rw-r--r--patches/ipc-make-rt-aware.patch67
-rw-r--r--patches/ipc-sem-rework-semaphore-wakeups.patch69
-rw-r--r--patches/irq-allow-disabling-of-softirq-processing-in-irq-thread-context.patch146
-rw-r--r--patches/irqwork-push_most_work_into_softirq_context.patch197
-rw-r--r--patches/jump-label-rt.patch35
-rw-r--r--patches/kconfig-disable-a-few-options-rt.patch33
-rw-r--r--patches/kconfig-preempt-rt-full.patch58
-rw-r--r--patches/kernel-SRCU-provide-a-static-initializer.patch124
-rw-r--r--patches/kernel-cpu-fix-cpu-down-problem-if-kthread-s-cpu-is-.patch85
-rw-r--r--patches/kernel-hotplug-restore-original-cpu-mask-oncpu-down.patch58
-rw-r--r--patches/kgb-serial-hackaround.patch101
-rw-r--r--patches/latency-hist.patch1808
-rw-r--r--patches/leds-trigger-disable-CPU-trigger-on-RT.patch35
-rw-r--r--patches/lglocks-rt.patch182
-rw-r--r--patches/list_bl.h-make-list-head-locking-RT-safe.patch114
-rw-r--r--patches/local-irq-rt-depending-variants.patch52
-rw-r--r--patches/localversion.patch15
-rw-r--r--patches/lockdep-no-softirq-accounting-on-rt.patch58
-rw-r--r--patches/lockdep-selftest-fix-warnings-due-to-missing-PREEMPT.patch141
-rw-r--r--patches/lockdep-selftest-only-do-hardirq-context-test-for-raw-spinlock.patch56
-rw-r--r--patches/md-disable-bcache.patch31
-rw-r--r--patches/md-raid5-percpu-handling-rt-aware.patch61
-rw-r--r--patches/mips-disable-highmem-on-rt.patch22
-rw-r--r--patches/mm-bounce-local-irq-save-nort.patch27
-rw-r--r--patches/mm-convert-swap-to-percpu-locked.patch134
-rw-r--r--patches/mm-disable-sloub-rt.patch31
-rw-r--r--patches/mm-enable-slub.patch394
-rw-r--r--patches/mm-make-vmstat-rt-aware.patch88
-rw-r--r--patches/mm-memcontrol-Don-t-call-schedule_work_on-in-preempt.patch68
-rw-r--r--patches/mm-memcontrol-do_not_disable_irq.patch137
-rw-r--r--patches/mm-page-alloc-use-local-lock-on-target-cpu.patch27
-rw-r--r--patches/mm-page_alloc-reduce-lock-sections-further.patch192
-rw-r--r--patches/mm-page_alloc-rt-friendly-per-cpu-pages.patch201
-rw-r--r--patches/mm-protect-activate-switch-mm.patch71
-rw-r--r--patches/mm-rt-kmap-atomic-scheduling.patch288
-rw-r--r--patches/mm-scatterlist-dont-disable-irqs-on-RT.patch43
-rw-r--r--patches/mm-slub-move-slab-initialization-into-irq-enabled-region.patch162
-rw-r--r--patches/mm-vmalloc-use-get-cpu-light.patch65
-rw-r--r--patches/mm-workingset-do-not-protect-workingset_shadow_nodes.patch150
-rw-r--r--patches/mmc-sdhci-don-t-provide-hard-irq-handler.patch73
-rw-r--r--patches/mmci-remove-bogus-irq-save.patch39
-rw-r--r--patches/move_sched_delayed_work_to_helper.patch88
-rw-r--r--patches/mutex-no-spin-on-rt.patch28
-rw-r--r--patches/net-another-local-irq-disable-alloc-atomic-headache.patch41
-rw-r--r--patches/net-fix-iptable-xt-write-recseq-begin-rt-fallout.patch73
-rw-r--r--patches/net-gianfar-do-not-disable-interrupts.patch76
-rw-r--r--patches/net-make-devnet_rename_seq-a-mutex.patch106
-rw-r--r--patches/net-prevent-abba-deadlock.patch111
-rw-r--r--patches/net-sched-dev_deactivate_many-use-msleep-1-instead-o.patch57
-rw-r--r--patches/net-tx-action-avoid-livelock-on-rt.patch92
-rw-r--r--patches/net-use-cpu-chill.patch62
-rw-r--r--patches/net-wireless-warn-nort.patch23
-rw-r--r--patches/oleg-signal-rt-fix.patch143
-rw-r--r--patches/panic-disable-random-on-rt.patch26
-rw-r--r--patches/patch-to-introduce-rcu-bh-qs-where-safe-from-softirq.patch111
-rw-r--r--patches/pci-access-use-__wake_up_all_locked.patch25
-rw-r--r--patches/percpu_ida-use-locklocks.patch101
-rw-r--r--patches/perf-make-swevent-hrtimer-irqsafe.patch68
-rw-r--r--patches/peter_zijlstra-frob-rcu.patch166
-rw-r--r--patches/peterz-srcu-crypto-chain.patch182
-rw-r--r--patches/ping-sysrq.patch121
-rw-r--r--patches/posix-timers-no-broadcast.patch33
-rw-r--r--patches/posix-timers-thread-posix-cpu-timers-on-rt.patch315
-rw-r--r--patches/power-disable-highmem-on-rt.patch22
-rw-r--r--patches/power-use-generic-rwsem-on-rt.patch26
-rw-r--r--patches/powerpc-kvm-Disable-in-kernel-MPIC-emulation-for-PRE.patch37
-rw-r--r--patches/powerpc-preempt-lazy-support.patch173
-rw-r--r--patches/powerpc-ps3-device-init.c-adapt-to-completions-using.patch31
-rw-r--r--patches/preempt-lazy-support.patch589
-rw-r--r--patches/preempt-nort-rt-variants.patch47
-rw-r--r--patches/printk-27force_early_printk-27-boot-param-to-help-with-debugging.patch31
-rw-r--r--patches/printk-kill.patch162
-rw-r--r--patches/printk-rt-aware.patch100
-rw-r--r--patches/ptrace-fix-ptrace-vs-tasklist_lock-race.patch160
-rw-r--r--patches/radix-tree-rt-aware.patch72
-rw-r--r--patches/random-make-it-work-on-rt.patch115
-rw-r--r--patches/rcu-Eliminate-softirq-processing-from-rcutree.patch422
-rw-r--r--patches/rcu-disable-rcu-fast-no-hz-on-rt.patch24
-rw-r--r--patches/rcu-make-RCU_BOOST-default-on-RT.patch26
-rw-r--r--patches/rcu-merge-rcu-bh-into-rcu-preempt-for-rt.patch271
-rw-r--r--patches/rcu-more-swait-conversions.patch174
-rw-r--r--patches/rcutree-rcu_bh_qs-disable-irq-while-calling-rcu_pree.patch48
-rw-r--r--patches/re-migrate_disable-race-with-cpu-hotplug-3f.patch34
-rw-r--r--patches/re-preempt_rt_full-arm-coredump-fails-for-cpu-3e-3d-4.patch68
-rw-r--r--patches/relay-fix-timer-madness.patch52
-rw-r--r--patches/rt-add-rt-locks.patch1982
-rw-r--r--patches/rt-introduce-cpu-chill.patch128
-rw-r--r--patches/rt-local-irq-lock.patch323
-rw-r--r--patches/rt-preempt-base-config.patch53
-rw-r--r--patches/rt-serial-warn-fix.patch37
-rw-r--r--patches/rtmutex-add-a-first-shot-of-ww_mutex.patch423
-rw-r--r--patches/rtmutex-avoid-include-hell.patch23
-rw-r--r--patches/rtmutex-futex-prepare-rt.patch238
-rw-r--r--patches/rtmutex-lock-killable.patch51
-rw-r--r--patches/sas-ata-isci-dont-t-disable-interrupts-in-qc_issue-h.patch78
-rw-r--r--patches/sched-deadline-dl_task_timer-has-to-be-irqsafe.patch22
-rw-r--r--patches/sched-delay-put-task.patch81
-rw-r--r--patches/sched-disable-rt-group-sched-on-rt.patch28
-rw-r--r--patches/sched-disable-ttwu-queue.patch31
-rw-r--r--patches/sched-limit-nr-migrate.patch26
-rw-r--r--patches/sched-might-sleep-do-not-account-rcu-depth.patch48
-rw-r--r--patches/sched-mmdrop-delayed.patch133
-rw-r--r--patches/sched-rt-mutex-wakeup.patch93
-rw-r--r--patches/sched-ttwu-ensure-success-return-is-correct.patch34
-rw-r--r--patches/sched-workqueue-Only-wake-up-idle-workers-if-not-blo.patch37
-rw-r--r--patches/scsi-fcoe-rt-aware.patch114
-rw-r--r--patches/scsi-qla2xxx-fix-bug-sleeping-function-called-from-invalid-context.patch47
-rw-r--r--patches/seqlock-prevent-rt-starvation.patch190
-rw-r--r--patches/series571
-rw-r--r--patches/signal-fix-up-rcu-wreckage.patch38
-rw-r--r--patches/signal-revert-ptrace-preempt-magic.patch31
-rw-r--r--patches/signals-allow-rt-tasks-to-cache-one-sigqueue-struct.patch213
-rw-r--r--patches/skbufhead-raw-lock.patch113
-rw-r--r--patches/slub-disable-SLUB_CPU_PARTIAL.patch47
-rw-r--r--patches/slub-enable-irqs-for-no-wait.patch46
-rw-r--r--patches/snd-pcm-fix-snd_pcm_stream_lock-irqs_disabled-splats.patch69
-rw-r--r--patches/softirq-disable-softirq-stacks-for-rt.patch156
-rw-r--r--patches/softirq-preempt-fix-3-re.patch153
-rw-r--r--patches/softirq-split-locks.patch826
-rw-r--r--patches/sparc64-use-generic-rwsem-spinlocks-rt.patch28
-rw-r--r--patches/spinlock-types-separate-raw.patch208
-rw-r--r--patches/stomp-machine-create-lg_global_trylock_relax-primiti.patch86
-rw-r--r--patches/stomp-machine-use-lg_global_trylock_relax-to-dead-wi.patch100
-rw-r--r--patches/stop-machine-raw-lock.patch196
-rw-r--r--patches/stop_machine-convert-stop_machine_run-to-PREEMPT_RT.patch34
-rw-r--r--patches/sunrpc-make-svc_xprt_do_enqueue-use-get_cpu_light.patch62
-rw-r--r--patches/suspend-prevernt-might-sleep-splats.patch106
-rw-r--r--patches/sysfs-realtime-entry.patch47
-rw-r--r--patches/tasklet-rt-prevent-tasklets-from-going-into-infinite-spin-in-rt.patch391
-rw-r--r--patches/tasklist-lock-fix-section-conflict.patch55
-rw-r--r--patches/thermal-Defer-thermal-wakups-to-threads.patch132
-rw-r--r--patches/timekeeping-split-jiffies-lock.patch156
-rw-r--r--patches/timer-delay-waking-softirqs-from-the-jiffy-tick.patch75
-rw-r--r--patches/timer-fd-avoid-live-lock.patch30
-rw-r--r--patches/timers-avoid-the-base-null-otptimization-on-rt.patch68
-rw-r--r--patches/timers-preempt-rt-support.patch54
-rw-r--r--patches/timers-prepare-for-full-preemption.patch145
-rw-r--r--patches/tracing-account-for-preempt-off-in-preempt_schedule.patch46
-rw-r--r--patches/upstream-net-rt-remove-preemption-disabling-in-netif_rx.patch65
-rw-r--r--patches/usb-use-_nort-in-giveback.patch57
-rw-r--r--patches/user-use-local-irq-nort.patch29
-rw-r--r--patches/vtime-split-lock-and-seqcount.patch205
-rw-r--r--patches/wait-simple-implementation.patch362
-rw-r--r--patches/wait.h-include-atomic.h.patch32
-rw-r--r--patches/work-queue-work-around-irqsafe-timer-optimization.patch132
-rw-r--r--patches/work-simple-Simple-work-queue-implemenation.patch232
-rw-r--r--patches/workqueue-distangle-from-rq-lock.patch260
-rw-r--r--patches/workqueue-prevent-deadlock-stall.patch200
-rw-r--r--patches/workqueue-use-locallock.patch144
-rw-r--r--patches/workqueue-use-rcu.patch342
-rw-r--r--patches/x86-UV-raw_spinlock-conversion.patch244
-rw-r--r--patches/x86-crypto-reduce-preempt-disabled-regions.patch112
-rw-r--r--patches/x86-highmem-add-a-already-used-pte-check.patch22
-rw-r--r--patches/x86-io-apic-migra-no-unmask.patch26
-rw-r--r--patches/x86-kvm-require-const-tsc-for-rt.patch30
-rw-r--r--patches/x86-mce-timer-hrtimer.patch179
-rw-r--r--patches/x86-mce-use-swait-queue-for-mce-wakeups.patch159
-rw-r--r--patches/x86-preempt-lazy.patch170
-rw-r--r--patches/x86-stackprot-no-random-on-rt.patch47
-rw-r--r--patches/x86-use-gen-rwsem-spinlocks-rt.patch28
267 files changed, 35249 insertions, 0 deletions
diff --git a/patches/0001-arm64-Mark-PMU-interrupt-IRQF_NO_THREAD.patch b/patches/0001-arm64-Mark-PMU-interrupt-IRQF_NO_THREAD.patch
new file mode 100644
index 00000000000000..1abc7c53698776
--- /dev/null
+++ b/patches/0001-arm64-Mark-PMU-interrupt-IRQF_NO_THREAD.patch
@@ -0,0 +1,26 @@
+From: Anders Roxell <anders.roxell@linaro.org>
+Date: Mon, 27 Apr 2015 22:53:08 +0200
+Subject: arm64: Mark PMU interrupt IRQF_NO_THREAD
+
+Mark the PMU interrupts as non-threadable, as is the case with
+arch/arm: d9c3365 ARM: 7813/1: Mark pmu interupt IRQF_NO_THREAD
+
+[ upstream commit: 96045ed486b0 ]
+
+Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
+---
+ arch/arm64/kernel/perf_event.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/arch/arm64/kernel/perf_event.c
++++ b/arch/arm64/kernel/perf_event.c
+@@ -488,7 +488,7 @@ armpmu_reserve_hardware(struct arm_pmu *
+ }
+
+ err = request_irq(irq, armpmu->handle_irq,
+- IRQF_NOBALANCING,
++ IRQF_NOBALANCING | IRQF_NO_THREAD,
+ "arm-pmu", armpmu);
+ if (err) {
+ pr_err("unable to request IRQ%d for ARM PMU counters\n",
diff --git a/patches/0001-sched-Implement-lockless-wake-queues.patch b/patches/0001-sched-Implement-lockless-wake-queues.patch
new file mode 100644
index 00000000000000..23931132a606ef
--- /dev/null
+++ b/patches/0001-sched-Implement-lockless-wake-queues.patch
@@ -0,0 +1,166 @@
+From: Peter Zijlstra <peterz@infradead.org>
+Date: Fri, 1 May 2015 08:27:50 -0700
+Subject: sched: Implement lockless wake-queues
+
+This is useful for locking primitives that can effect multiple
+wakeups per operation and want to avoid lock internal lock contention
+by delaying the wakeups until we've released the lock internal locks.
+
+Alternatively it can be used to avoid issuing multiple wakeups, and
+thus save a few cycles, in packet processing. Queue all target tasks
+and wakeup once you've processed all packets. That way you avoid
+waking the target task multiple times if there were multiple packets
+for the same task.
+
+Properties of a wake_q are:
+- Lockless, as queue head must reside on the stack.
+- Being a queue, maintains wakeup order passed by the callers. This can
+ be important for otherwise, in scenarios where highly contended locks
+ could affect any reliance on lock fairness.
+- A queued task cannot be added again until it is woken up.
+
+This patch adds the needed infrastructure into the scheduler code
+and uses the new wake_list to delay the futex wakeups until
+after we've released the hash bucket locks.
+
+[upstream commit 7675104990ed255b9315a82ae827ff312a2a88a2]
+
+Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
+[tweaks, adjustments, comments, etc.]
+Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
+Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
+Acked-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: Borislav Petkov <bp@alien8.de>
+Cc: Chris Mason <clm@fb.com>
+Cc: Davidlohr Bueso <dave@stgolabs.net>
+Cc: George Spelvin <linux@horizon.com>
+Cc: H. Peter Anvin <hpa@zytor.com>
+Cc: Linus Torvalds <torvalds@linux-foundation.org>
+Cc: Manfred Spraul <manfred@colorfullife.com>
+Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Cc: Steven Rostedt <rostedt@goodmis.org>
+Link: http://lkml.kernel.org/r/1430494072-30283-2-git-send-email-dave@stgolabs.net
+Signed-off-by: Ingo Molnar <mingo@kernel.org>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ include/linux/sched.h | 46 ++++++++++++++++++++++++++++++++++++++++++++++
+ kernel/sched/core.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
+ 2 files changed, 92 insertions(+)
+
+--- a/include/linux/sched.h
++++ b/include/linux/sched.h
+@@ -900,6 +900,50 @@ enum cpu_idle_type {
+ #define SCHED_CAPACITY_SCALE (1L << SCHED_CAPACITY_SHIFT)
+
+ /*
++ * Wake-queues are lists of tasks with a pending wakeup, whose
++ * callers have already marked the task as woken internally,
++ * and can thus carry on. A common use case is being able to
++ * do the wakeups once the corresponding user lock as been
++ * released.
++ *
++ * We hold reference to each task in the list across the wakeup,
++ * thus guaranteeing that the memory is still valid by the time
++ * the actual wakeups are performed in wake_up_q().
++ *
++ * One per task suffices, because there's never a need for a task to be
++ * in two wake queues simultaneously; it is forbidden to abandon a task
++ * in a wake queue (a call to wake_up_q() _must_ follow), so if a task is
++ * already in a wake queue, the wakeup will happen soon and the second
++ * waker can just skip it.
++ *
++ * The WAKE_Q macro declares and initializes the list head.
++ * wake_up_q() does NOT reinitialize the list; it's expected to be
++ * called near the end of a function, where the fact that the queue is
++ * not used again will be easy to see by inspection.
++ *
++ * Note that this can cause spurious wakeups. schedule() callers
++ * must ensure the call is done inside a loop, confirming that the
++ * wakeup condition has in fact occurred.
++ */
++struct wake_q_node {
++ struct wake_q_node *next;
++};
++
++struct wake_q_head {
++ struct wake_q_node *first;
++ struct wake_q_node **lastp;
++};
++
++#define WAKE_Q_TAIL ((struct wake_q_node *) 0x01)
++
++#define WAKE_Q(name) \
++ struct wake_q_head name = { WAKE_Q_TAIL, &name.first }
++
++extern void wake_q_add(struct wake_q_head *head,
++ struct task_struct *task);
++extern void wake_up_q(struct wake_q_head *head);
++
++/*
+ * sched-domains (multiprocessor balancing) declarations:
+ */
+ #ifdef CONFIG_SMP
+@@ -1511,6 +1555,8 @@ struct task_struct {
+ /* Protection of the PI data structures: */
+ raw_spinlock_t pi_lock;
+
++ struct wake_q_node wake_q;
++
+ #ifdef CONFIG_RT_MUTEXES
+ /* PI waiters blocked on a rt_mutex held by this task */
+ struct rb_root pi_waiters;
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -541,6 +541,52 @@ static bool set_nr_if_polling(struct tas
+ #endif
+ #endif
+
++void wake_q_add(struct wake_q_head *head, struct task_struct *task)
++{
++ struct wake_q_node *node = &task->wake_q;
++
++ /*
++ * Atomically grab the task, if ->wake_q is !nil already it means
++ * its already queued (either by us or someone else) and will get the
++ * wakeup due to that.
++ *
++ * This cmpxchg() implies a full barrier, which pairs with the write
++ * barrier implied by the wakeup in wake_up_list().
++ */
++ if (cmpxchg(&node->next, NULL, WAKE_Q_TAIL))
++ return;
++
++ get_task_struct(task);
++
++ /*
++ * The head is context local, there can be no concurrency.
++ */
++ *head->lastp = node;
++ head->lastp = &node->next;
++}
++
++void wake_up_q(struct wake_q_head *head)
++{
++ struct wake_q_node *node = head->first;
++
++ while (node != WAKE_Q_TAIL) {
++ struct task_struct *task;
++
++ task = container_of(node, struct task_struct, wake_q);
++ BUG_ON(!task);
++ /* task can safely be re-inserted now */
++ node = node->next;
++ task->wake_q.next = NULL;
++
++ /*
++ * wake_up_process() implies a wmb() to pair with the queueing
++ * in wake_q_add() so as not to miss wakeups.
++ */
++ wake_up_process(task);
++ put_task_struct(task);
++ }
++}
++
+ /*
+ * resched_curr - mark rq's current task 'to be rescheduled now'.
+ *
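The hunks above only add the wake_q infrastructure; the commit message describes the intended calling pattern without showing a caller. Below is a minimal, hypothetical sketch of such a caller (the my_waiter structure, the lock and the waiter list are made up for illustration); it relies only on the WAKE_Q(), wake_q_add() and wake_up_q() interfaces introduced here: wakeups are queued while the internal spinlock is held and issued lockless after it has been dropped.

#include <linux/list.h>
#include <linux/sched.h>
#include <linux/spinlock.h>

struct my_waiter {				/* made-up waiter bookkeeping */
	struct list_head	list;
	struct task_struct	*task;
};

static void my_wake_all(spinlock_t *lock, struct list_head *waiters)
{
	struct my_waiter *w, *tmp;
	WAKE_Q(wake_q);				/* on-stack queue head */

	spin_lock(lock);
	list_for_each_entry_safe(w, tmp, waiters, list) {
		list_del_init(&w->list);
		wake_q_add(&wake_q, w->task);	/* takes a task reference */
	}
	spin_unlock(lock);

	wake_up_q(&wake_q);			/* wakeups happen with the lock dropped */
}

This is the shape the futex patch below gives to the wake paths: mark_wake_futex() fills the queue under hb->lock and wake_up_q() runs after spin_unlock().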
diff --git a/patches/0001-uaccess-count-pagefault_disable-levels-in-pagefault_.patch b/patches/0001-uaccess-count-pagefault_disable-levels-in-pagefault_.patch
new file mode 100644
index 00000000000000..784d0ab15c7615
--- /dev/null
+++ b/patches/0001-uaccess-count-pagefault_disable-levels-in-pagefault_.patch
@@ -0,0 +1,119 @@
+From: David Hildenbrand <dahi@linux.vnet.ibm.com>
+Date: Mon, 11 May 2015 17:52:06 +0200
+Subject: sched/preempt, mm/fault: Count pagefault_disable() levels in pagefault_disabled
+
+Until now, pagefault_disable()/pagefault_enabled() used the preempt
+count to track whether in an environment with pagefaults disabled (can
+be queried via in_atomic()).
+
+This patch introduces a separate counter in task_struct to count the
+level of pagefault_disable() calls. We'll keep manipulating the preempt
+count to retain compatibility to existing pagefault handlers.
+
+It is now possible to verify whether in a pagefault_disable() environment
+by calling pagefault_disabled(). In contrast to in_atomic() it will not
+be influenced by preempt_enable()/preempt_disable().
+
+This patch is based on a patch from Ingo Molnar.
+
+[upstream commit 8bcbde5480f9777f8b74d71493722c663e22c21b]
+Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
+---
+ include/linux/sched.h | 1 +
+ include/linux/uaccess.h | 36 +++++++++++++++++++++++++++++-------
+ kernel/fork.c | 3 +++
+ 3 files changed, 33 insertions(+), 7 deletions(-)
+
+--- a/include/linux/sched.h
++++ b/include/linux/sched.h
+@@ -1724,6 +1724,7 @@ struct task_struct {
+ #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+ unsigned long task_state_change;
+ #endif
++ int pagefault_disabled;
+ };
+
+ /* Future-safe accessor for struct task_struct's cpus_allowed. */
+--- a/include/linux/uaccess.h
++++ b/include/linux/uaccess.h
+@@ -2,20 +2,36 @@
+ #define __LINUX_UACCESS_H__
+
+ #include <linux/preempt.h>
++#include <linux/sched.h>
+ #include <asm/uaccess.h>
+
++static __always_inline void pagefault_disabled_inc(void)
++{
++ current->pagefault_disabled++;
++}
++
++static __always_inline void pagefault_disabled_dec(void)
++{
++ current->pagefault_disabled--;
++ WARN_ON(current->pagefault_disabled < 0);
++}
++
+ /*
+- * These routines enable/disable the pagefault handler in that
+- * it will not take any locks and go straight to the fixup table.
++ * These routines enable/disable the pagefault handler. If disabled, it will
++ * not take any locks and go straight to the fixup table.
++ *
++ * We increase the preempt and the pagefault count, to be able to distinguish
++ * whether we run in simple atomic context or in a real pagefault_disable()
++ * context.
++ *
++ * For now, after pagefault_disabled() has been called, we run in atomic
++ * context. User access methods will not sleep.
+ *
+- * They have great resemblance to the preempt_disable/enable calls
+- * and in fact they are identical; this is because currently there is
+- * no other way to make the pagefault handlers do this. So we do
+- * disable preemption but we don't necessarily care about that.
+ */
+ static inline void pagefault_disable(void)
+ {
+ preempt_count_inc();
++ pagefault_disabled_inc();
+ /*
+ * make sure to have issued the store before a pagefault
+ * can hit.
+@@ -25,18 +41,24 @@ static inline void pagefault_disable(voi
+
+ static inline void pagefault_enable(void)
+ {
+-#ifndef CONFIG_PREEMPT
+ /*
+ * make sure to issue those last loads/stores before enabling
+ * the pagefault handler again.
+ */
+ barrier();
++ pagefault_disabled_dec();
++#ifndef CONFIG_PREEMPT
+ preempt_count_dec();
+ #else
+ preempt_enable();
+ #endif
+ }
+
++/*
++ * Is the pagefault handler disabled? If so, user access methods will not sleep.
++ */
++#define pagefault_disabled() (current->pagefault_disabled != 0)
++
+ #ifndef ARCH_HAS_NOCACHE_UACCESS
+
+ static inline unsigned long __copy_from_user_inatomic_nocache(void *to,
+--- a/kernel/fork.c
++++ b/kernel/fork.c
+@@ -1396,6 +1396,9 @@ static struct task_struct *copy_process(
+ p->hardirq_context = 0;
+ p->softirq_context = 0;
+ #endif
++
++ p->pagefault_disabled = 0;
++
+ #ifdef CONFIG_LOCKDEP
+ p->lockdep_depth = 0; /* no locks held yet */
+ p->curr_chain_key = 0;
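To make the new counter concrete, here is a small hypothetical sketch (the helper name is made up): pagefault_disable() still bumps the preempt count for compatibility, but pagefault_disabled() now reports the pagefault-disabled state on its own, independent of preempt_disable()/preempt_enable().

#include <linux/kernel.h>
#include <linux/uaccess.h>

/* Made-up helper: peek at user memory without ever faulting in pages. */
static int my_peek_user(void *dst, const void __user *src, size_t len)
{
	unsigned long left;

	pagefault_disable();		/* increments preempt count and pagefault_disabled */
	left = __copy_from_user_inatomic(dst, src, len);
	pagefault_enable();

	WARN_ON(pagefault_disabled());	/* counter is balanced again at this point */
	return left ? -EFAULT : 0;
}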
diff --git a/patches/0002-arm64-Allow-forced-irq-threading.patch b/patches/0002-arm64-Allow-forced-irq-threading.patch
new file mode 100644
index 00000000000000..5b450f04b794b6
--- /dev/null
+++ b/patches/0002-arm64-Allow-forced-irq-threading.patch
@@ -0,0 +1,26 @@
+From: Anders Roxell <anders.roxell@linaro.org>
+Date: Mon, 27 Apr 2015 22:53:09 +0200
+Subject: arm64: Allow forced irq threading
+
+Now its safe to allow forced interrupt threading for arm64,
+all timer interrupts and the perf interrupt are marked NO_THREAD, as is
+the case with arch/arm: da0ec6f ARM: 7814/2: Allow forced irq threading
+
+[ upstream commit: e8557d1f0c4d ]
+
+Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
+---
+ arch/arm64/Kconfig | 1 +
+ 1 file changed, 1 insertion(+)
+
+--- a/arch/arm64/Kconfig
++++ b/arch/arm64/Kconfig
+@@ -71,6 +71,7 @@ config ARM64
+ select HAVE_RCU_TABLE_FREE
+ select HAVE_SYSCALL_TRACEPOINTS
+ select IRQ_DOMAIN
++ select IRQ_FORCED_THREADING
+ select MODULES_USE_ELF_RELA
+ select NO_BOOTMEM
+ select OF
diff --git a/patches/0002-futex-Implement-lockless-wakeups.patch b/patches/0002-futex-Implement-lockless-wakeups.patch
new file mode 100644
index 00000000000000..9aaa32d24e70fe
--- /dev/null
+++ b/patches/0002-futex-Implement-lockless-wakeups.patch
@@ -0,0 +1,181 @@
+From: Davidlohr Bueso <dave@stgolabs.net>
+Date: Fri, 1 May 2015 08:27:51 -0700
+Subject: futex: Implement lockless wakeups
+
+Given the overall futex architecture, any chance of reducing
+hb->lock contention is welcome. In this particular case, using
+wake-queues to enable lockless wakeups addresses very much real
+world performance concerns, even cases of soft-lockups in cases
+of large amounts of blocked tasks (which is not hard to find in
+large boxes, using but just a handful of futex).
+
+At the lowest level, this patch can reduce latency of a single thread
+attempting to acquire hb->lock in highly contended scenarios by a
+up to 2x. At lower counts of nr_wake there are no regressions,
+confirming, of course, that the wake_q handling overhead is practically
+non existent. For instance, while a fair amount of variation,
+the extended pef-bench wakeup benchmark shows for a 20 core machine
+the following avg per-thread time to wakeup its share of tasks:
+
+ nr_thr ms-before ms-after
+ 16 0.0590 0.0215
+ 32 0.0396 0.0220
+ 48 0.0417 0.0182
+ 64 0.0536 0.0236
+ 80 0.0414 0.0097
+ 96 0.0672 0.0152
+
+Naturally, this can cause spurious wakeups. However there is no core code
+that cannot handle them afaict, and furthermore tglx does have the point
+that other events can already trigger them anyway.
+
+[upstream commit 1d0dcb3ad9d336e6d6ee020a750a7f8d907e28de]
+
+Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
+Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
+Acked-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: Andrew Morton <akpm@linux-foundation.org>
+Cc: Borislav Petkov <bp@alien8.de>
+Cc: Chris Mason <clm@fb.com>
+Cc: Davidlohr Bueso <dave@stgolabs.net>
+Cc: George Spelvin <linux@horizon.com>
+Cc: H. Peter Anvin <hpa@zytor.com>
+Cc: Linus Torvalds <torvalds@linux-foundation.org>
+Cc: Manfred Spraul <manfred@colorfullife.com>
+Cc: Peter Zijlstra <peterz@infradead.org>
+Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Cc: Steven Rostedt <rostedt@goodmis.org>
+Link: http://lkml.kernel.org/r/1430494072-30283-3-git-send-email-dave@stgolabs.net
+Signed-off-by: Ingo Molnar <mingo@kernel.org>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ kernel/futex.c | 33 +++++++++++++++++----------------
+ 1 file changed, 17 insertions(+), 16 deletions(-)
+
+--- a/kernel/futex.c
++++ b/kernel/futex.c
+@@ -1090,9 +1090,11 @@ static void __unqueue_futex(struct futex
+
+ /*
+ * The hash bucket lock must be held when this is called.
+- * Afterwards, the futex_q must not be accessed.
++ * Afterwards, the futex_q must not be accessed. Callers
++ * must ensure to later call wake_up_q() for the actual
++ * wakeups to occur.
+ */
+-static void wake_futex(struct futex_q *q)
++static void mark_wake_futex(struct wake_q_head *wake_q, struct futex_q *q)
+ {
+ struct task_struct *p = q->task;
+
+@@ -1100,14 +1102,10 @@ static void wake_futex(struct futex_q *q
+ return;
+
+ /*
+- * We set q->lock_ptr = NULL _before_ we wake up the task. If
+- * a non-futex wake up happens on another CPU then the task
+- * might exit and p would dereference a non-existing task
+- * struct. Prevent this by holding a reference on p across the
+- * wake up.
++ * Queue the task for later wakeup for after we've released
++ * the hb->lock. wake_q_add() grabs reference to p.
+ */
+- get_task_struct(p);
+-
++ wake_q_add(wake_q, p);
+ __unqueue_futex(q);
+ /*
+ * The waiting task can free the futex_q as soon as
+@@ -1117,9 +1115,6 @@ static void wake_futex(struct futex_q *q
+ */
+ smp_wmb();
+ q->lock_ptr = NULL;
+-
+- wake_up_state(p, TASK_NORMAL);
+- put_task_struct(p);
+ }
+
+ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this)
+@@ -1217,6 +1212,7 @@ futex_wake(u32 __user *uaddr, unsigned i
+ struct futex_q *this, *next;
+ union futex_key key = FUTEX_KEY_INIT;
+ int ret;
++ WAKE_Q(wake_q);
+
+ if (!bitset)
+ return -EINVAL;
+@@ -1244,13 +1240,14 @@ futex_wake(u32 __user *uaddr, unsigned i
+ if (!(this->bitset & bitset))
+ continue;
+
+- wake_futex(this);
++ mark_wake_futex(&wake_q, this);
+ if (++ret >= nr_wake)
+ break;
+ }
+ }
+
+ spin_unlock(&hb->lock);
++ wake_up_q(&wake_q);
+ out_put_key:
+ put_futex_key(&key);
+ out:
+@@ -1269,6 +1266,7 @@ futex_wake_op(u32 __user *uaddr1, unsign
+ struct futex_hash_bucket *hb1, *hb2;
+ struct futex_q *this, *next;
+ int ret, op_ret;
++ WAKE_Q(wake_q);
+
+ retry:
+ ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1, VERIFY_READ);
+@@ -1320,7 +1318,7 @@ futex_wake_op(u32 __user *uaddr1, unsign
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+- wake_futex(this);
++ mark_wake_futex(&wake_q, this);
+ if (++ret >= nr_wake)
+ break;
+ }
+@@ -1334,7 +1332,7 @@ futex_wake_op(u32 __user *uaddr1, unsign
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+- wake_futex(this);
++ mark_wake_futex(&wake_q, this);
+ if (++op_ret >= nr_wake2)
+ break;
+ }
+@@ -1344,6 +1342,7 @@ futex_wake_op(u32 __user *uaddr1, unsign
+
+ out_unlock:
+ double_unlock_hb(hb1, hb2);
++ wake_up_q(&wake_q);
+ out_put_keys:
+ put_futex_key(&key2);
+ out_put_key1:
+@@ -1503,6 +1502,7 @@ static int futex_requeue(u32 __user *uad
+ struct futex_pi_state *pi_state = NULL;
+ struct futex_hash_bucket *hb1, *hb2;
+ struct futex_q *this, *next;
++ WAKE_Q(wake_q);
+
+ if (requeue_pi) {
+ /*
+@@ -1679,7 +1679,7 @@ static int futex_requeue(u32 __user *uad
+ * woken by futex_unlock_pi().
+ */
+ if (++task_count <= nr_wake && !requeue_pi) {
+- wake_futex(this);
++ mark_wake_futex(&wake_q, this);
+ continue;
+ }
+
+@@ -1719,6 +1719,7 @@ static int futex_requeue(u32 __user *uad
+ out_unlock:
+ free_pi_state(pi_state);
+ double_unlock_hb(hb1, hb2);
++ wake_up_q(&wake_q);
+ hb_waiters_dec(hb2);
+
+ /*
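Both the wake_q comment added earlier and this commit message point out that lockless wakeups may be spurious. The waiter side must therefore recheck its condition in a loop, as in this minimal, hypothetical sketch (the done flag is made up for illustration):

#include <linux/sched.h>

static bool done;			/* made-up wait condition */

static void my_wait_for_done(void)
{
	set_current_state(TASK_INTERRUPTIBLE);
	while (!READ_ONCE(done)) {	/* recheck: a spurious wakeup is harmless */
		schedule();
		set_current_state(TASK_INTERRUPTIBLE);
	}
	__set_current_state(TASK_RUNNING);
}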
diff --git a/patches/0002-mm-uaccess-trigger-might_sleep-in-might_fault-with-d.patch b/patches/0002-mm-uaccess-trigger-might_sleep-in-might_fault-with-d.patch
new file mode 100644
index 00000000000000..52eedd81987539
--- /dev/null
+++ b/patches/0002-mm-uaccess-trigger-might_sleep-in-might_fault-with-d.patch
@@ -0,0 +1,100 @@
+From: David Hildenbrand <dahi@linux.vnet.ibm.com>
+Date: Mon, 11 May 2015 17:52:07 +0200
+Subject: mm, uaccess: trigger might_sleep() in might_fault() with disabled pagefaults
+
+Commit 662bbcb2747c ("mm, sched: Allow uaccess in atomic with
+pagefault_disable()") removed might_sleep() checks for all user access
+code (that uses might_fault()).
+
+The reason was to disable wrong "sleep in atomic" warnings in the
+following scenario:
+ pagefault_disable()
+ rc = copy_to_user(...)
+ pagefault_enable()
+
+Which is valid, as pagefault_disable() increments the preempt counter
+and therefore disables the pagefault handler. copy_to_user() will not
+sleep and return an error code if a page is not available.
+
+However, as all might_sleep() checks are removed,
+CONFIG_DEBUG_ATOMIC_SLEEP would no longer detect the following scenario:
+ spin_lock(&lock);
+ rc = copy_to_user(...)
+ spin_unlock(&lock)
+
+If the kernel is compiled with preemption turned on, preempt_disable()
+will make in_atomic() detect disabled preemption. The fault handler would
+correctly never sleep on user access.
+However, with preemption turned off, preempt_disable() is usually a NOP
+(with !CONFIG_PREEMPT_COUNT), therefore in_atomic() will not be able to
+detect disabled preemption nor disabled pagefaults. The fault handler
+could sleep.
+We really want to enable CONFIG_DEBUG_ATOMIC_SLEEP checks for user access
+functions again, otherwise we can end up with horrible deadlocks.
+
+Root of all evil is that pagefault_disable() acts almost as
+preempt_disable(), depending on preemption being turned on/off.
+
+As we now have pagefault_disabled(), we can use it to distinguish
+whether user access functions might sleep.
+
+Convert might_fault() into a macro that calls __might_fault(), to
+allow proper file + line messages in case of a might_sleep() warning.
+
+[upstream commit 9ec23531fd48031d1b6ca5366f5f967d17a8bc28]
+Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
+---
+ include/linux/kernel.h | 3 ++-
+ mm/memory.c | 18 ++++++------------
+ 2 files changed, 8 insertions(+), 13 deletions(-)
+
+--- a/include/linux/kernel.h
++++ b/include/linux/kernel.h
+@@ -244,7 +244,8 @@ static inline u32 reciprocal_scale(u32 v
+
+ #if defined(CONFIG_MMU) && \
+ (defined(CONFIG_PROVE_LOCKING) || defined(CONFIG_DEBUG_ATOMIC_SLEEP))
+-void might_fault(void);
++#define might_fault() __might_fault(__FILE__, __LINE__)
++void __might_fault(const char *file, int line);
+ #else
+ static inline void might_fault(void) { }
+ #endif
+--- a/mm/memory.c
++++ b/mm/memory.c
+@@ -3737,7 +3737,7 @@ void print_vma_addr(char *prefix, unsign
+ }
+
+ #if defined(CONFIG_PROVE_LOCKING) || defined(CONFIG_DEBUG_ATOMIC_SLEEP)
+-void might_fault(void)
++void __might_fault(const char *file, int line)
+ {
+ /*
+ * Some code (nfs/sunrpc) uses socket ops on kernel memory while
+@@ -3747,21 +3747,15 @@ void might_fault(void)
+ */
+ if (segment_eq(get_fs(), KERNEL_DS))
+ return;
+-
+- /*
+- * it would be nicer only to annotate paths which are not under
+- * pagefault_disable, however that requires a larger audit and
+- * providing helpers like get_user_atomic.
+- */
+- if (in_atomic())
++ if (pagefault_disabled())
+ return;
+-
+- __might_sleep(__FILE__, __LINE__, 0);
+-
++ __might_sleep(file, line, 0);
++#if defined(CONFIG_DEBUG_ATOMIC_SLEEP)
+ if (current->mm)
+ might_lock_read(&current->mm->mmap_sem);
++#endif
+ }
+-EXPORT_SYMBOL(might_fault);
++EXPORT_SYMBOL(__might_fault);
+ #endif
+
+ #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
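The two scenarios from the commit message, side by side, as a hypothetical sketch (my_lock and my_demo are made up): the first is valid because the pagefault handler is disabled and copy_to_user() fails instead of sleeping; the second can sleep under a spinlock and, with this change, is reported again by CONFIG_DEBUG_ATOMIC_SLEEP because might_fault() now checks pagefault_disabled() rather than bailing out whenever in_atomic() is true.

#include <linux/spinlock.h>
#include <linux/uaccess.h>

static DEFINE_SPINLOCK(my_lock);	/* made-up lock for illustration */

static void my_demo(int __user *uptr, int *val)
{
	/* Valid: pagefaults disabled, copy_to_user() returns the number of
	 * bytes it could not copy instead of sleeping. */
	pagefault_disable();
	if (copy_to_user(uptr, val, sizeof(*val)))
		*val = -EFAULT;		/* note the fault, don't sleep */
	pagefault_enable();

	/* Buggy: the fault handler may sleep under my_lock. With this patch
	 * might_fault() no longer returns early just because in_atomic() is
	 * true, so CONFIG_DEBUG_ATOMIC_SLEEP flags this again. */
	spin_lock(&my_lock);
	if (copy_to_user(uptr, val, sizeof(*val)))
		*val = -EFAULT;
	spin_unlock(&my_lock);
}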
diff --git a/patches/0003-uaccess-clarify-that-uaccess-may-only-sleep-if-pagef.patch b/patches/0003-uaccess-clarify-that-uaccess-may-only-sleep-if-pagef.patch
new file mode 100644
index 00000000000000..81c503208b7f36
--- /dev/null
+++ b/patches/0003-uaccess-clarify-that-uaccess-may-only-sleep-if-pagef.patch
@@ -0,0 +1,641 @@
+From: David Hildenbrand <dahi@linux.vnet.ibm.com>
+Date: Mon, 11 May 2015 17:52:08 +0200
+Subject: [PATCH] sched/preempt, futex: Update comments to clarify that preemption doesn't have to be disabled
+
+In general, non-atomic variants of user access functions must not sleep
+if pagefaults are disabled.
+
+Let's update all relevant comments in uaccess code. This also reflects
+the might_sleep() checks in might_fault().
+
+[upstream commit 2f09b227eeed4b3a072fe818c82a4c773b778cde]
+Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
+---
+ arch/avr32/include/asm/uaccess.h | 12 ++++++---
+ arch/hexagon/include/asm/uaccess.h | 3 +-
+ arch/m32r/include/asm/uaccess.h | 30 +++++++++++++++-------
+ arch/microblaze/include/asm/uaccess.h | 6 +++-
+ arch/mips/include/asm/uaccess.h | 45 ++++++++++++++++++++++------------
+ arch/s390/include/asm/uaccess.h | 15 +++++++----
+ arch/score/include/asm/uaccess.h | 15 +++++++----
+ arch/tile/include/asm/uaccess.h | 18 +++++++++----
+ arch/x86/include/asm/uaccess.h | 15 +++++++----
+ arch/x86/include/asm/uaccess_32.h | 6 +++-
+ arch/x86/lib/usercopy_32.c | 6 +++-
+ lib/strnlen_user.c | 6 +++-
+ 12 files changed, 118 insertions(+), 59 deletions(-)
+
+--- a/arch/avr32/include/asm/uaccess.h
++++ b/arch/avr32/include/asm/uaccess.h
+@@ -97,7 +97,8 @@ static inline __kernel_size_t __copy_fro
+ * @x: Value to copy to user space.
+ * @ptr: Destination address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple value from kernel space to user
+ * space. It supports simple types like char and int, but not larger
+@@ -116,7 +117,8 @@ static inline __kernel_size_t __copy_fro
+ * @x: Variable to store result.
+ * @ptr: Source address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple variable from user space to kernel
+ * space. It supports simple types like char and int, but not larger
+@@ -136,7 +138,8 @@ static inline __kernel_size_t __copy_fro
+ * @x: Value to copy to user space.
+ * @ptr: Destination address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple value from kernel space to user
+ * space. It supports simple types like char and int, but not larger
+@@ -158,7 +161,8 @@ static inline __kernel_size_t __copy_fro
+ * @x: Variable to store result.
+ * @ptr: Source address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple variable from user space to kernel
+ * space. It supports simple types like char and int, but not larger
+--- a/arch/hexagon/include/asm/uaccess.h
++++ b/arch/hexagon/include/asm/uaccess.h
+@@ -36,7 +36,8 @@
+ * @addr: User space pointer to start of block to check
+ * @size: Size of block to check
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Checks if a pointer to a block of memory in user space is valid.
+ *
+--- a/arch/m32r/include/asm/uaccess.h
++++ b/arch/m32r/include/asm/uaccess.h
+@@ -91,7 +91,8 @@ static inline void set_fs(mm_segment_t s
+ * @addr: User space pointer to start of block to check
+ * @size: Size of block to check
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Checks if a pointer to a block of memory in user space is valid.
+ *
+@@ -155,7 +156,8 @@ extern int fixup_exception(struct pt_reg
+ * @x: Variable to store result.
+ * @ptr: Source address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple variable from user space to kernel
+ * space. It supports simple types like char and int, but not larger
+@@ -175,7 +177,8 @@ extern int fixup_exception(struct pt_reg
+ * @x: Value to copy to user space.
+ * @ptr: Destination address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple value from kernel space to user
+ * space. It supports simple types like char and int, but not larger
+@@ -194,7 +197,8 @@ extern int fixup_exception(struct pt_reg
+ * @x: Variable to store result.
+ * @ptr: Source address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple variable from user space to kernel
+ * space. It supports simple types like char and int, but not larger
+@@ -274,7 +278,8 @@ do { \
+ * @x: Value to copy to user space.
+ * @ptr: Destination address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple value from kernel space to user
+ * space. It supports simple types like char and int, but not larger
+@@ -568,7 +573,8 @@ unsigned long __generic_copy_from_user(v
+ * @from: Source address, in kernel space.
+ * @n: Number of bytes to copy.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Copy data from kernel space to user space. Caller must check
+ * the specified block with access_ok() before calling this function.
+@@ -588,7 +594,8 @@ unsigned long __generic_copy_from_user(v
+ * @from: Source address, in kernel space.
+ * @n: Number of bytes to copy.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Copy data from kernel space to user space.
+ *
+@@ -606,7 +613,8 @@ unsigned long __generic_copy_from_user(v
+ * @from: Source address, in user space.
+ * @n: Number of bytes to copy.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Copy data from user space to kernel space. Caller must check
+ * the specified block with access_ok() before calling this function.
+@@ -626,7 +634,8 @@ unsigned long __generic_copy_from_user(v
+ * @from: Source address, in user space.
+ * @n: Number of bytes to copy.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Copy data from user space to kernel space.
+ *
+@@ -677,7 +686,8 @@ unsigned long clear_user(void __user *me
+ * strlen_user: - Get the size of a string in user space.
+ * @str: The string to measure.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Get the size of a NUL-terminated string in user space.
+ *
+--- a/arch/microblaze/include/asm/uaccess.h
++++ b/arch/microblaze/include/asm/uaccess.h
+@@ -178,7 +178,8 @@ extern long __user_bad(void);
+ * @x: Variable to store result.
+ * @ptr: Source address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple variable from user space to kernel
+ * space. It supports simple types like char and int, but not larger
+@@ -290,7 +291,8 @@ extern long __user_bad(void);
+ * @x: Value to copy to user space.
+ * @ptr: Destination address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple value from kernel space to user
+ * space. It supports simple types like char and int, but not larger
+--- a/arch/mips/include/asm/uaccess.h
++++ b/arch/mips/include/asm/uaccess.h
+@@ -103,7 +103,8 @@ extern u64 __ua_limit;
+ * @addr: User space pointer to start of block to check
+ * @size: Size of block to check
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Checks if a pointer to a block of memory in user space is valid.
+ *
+@@ -138,7 +139,8 @@ extern u64 __ua_limit;
+ * @x: Value to copy to user space.
+ * @ptr: Destination address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple value from kernel space to user
+ * space. It supports simple types like char and int, but not larger
+@@ -157,7 +159,8 @@ extern u64 __ua_limit;
+ * @x: Variable to store result.
+ * @ptr: Source address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple variable from user space to kernel
+ * space. It supports simple types like char and int, but not larger
+@@ -177,7 +180,8 @@ extern u64 __ua_limit;
+ * @x: Value to copy to user space.
+ * @ptr: Destination address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple value from kernel space to user
+ * space. It supports simple types like char and int, but not larger
+@@ -199,7 +203,8 @@ extern u64 __ua_limit;
+ * @x: Variable to store result.
+ * @ptr: Source address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple variable from user space to kernel
+ * space. It supports simple types like char and int, but not larger
+@@ -498,7 +503,8 @@ extern void __put_user_unknown(void);
+ * @x: Value to copy to user space.
+ * @ptr: Destination address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple value from kernel space to user
+ * space. It supports simple types like char and int, but not larger
+@@ -517,7 +523,8 @@ extern void __put_user_unknown(void);
+ * @x: Variable to store result.
+ * @ptr: Source address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple variable from user space to kernel
+ * space. It supports simple types like char and int, but not larger
+@@ -537,7 +544,8 @@ extern void __put_user_unknown(void);
+ * @x: Value to copy to user space.
+ * @ptr: Destination address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple value from kernel space to user
+ * space. It supports simple types like char and int, but not larger
+@@ -559,7 +567,8 @@ extern void __put_user_unknown(void);
+ * @x: Variable to store result.
+ * @ptr: Source address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple variable from user space to kernel
+ * space. It supports simple types like char and int, but not larger
+@@ -815,7 +824,8 @@ extern size_t __copy_user(void *__to, co
+ * @from: Source address, in kernel space.
+ * @n: Number of bytes to copy.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Copy data from kernel space to user space. Caller must check
+ * the specified block with access_ok() before calling this function.
+@@ -888,7 +898,8 @@ extern size_t __copy_user_inatomic(void
+ * @from: Source address, in kernel space.
+ * @n: Number of bytes to copy.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Copy data from kernel space to user space.
+ *
+@@ -1075,7 +1086,8 @@ extern size_t __copy_in_user_eva(void *_
+ * @from: Source address, in user space.
+ * @n: Number of bytes to copy.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Copy data from user space to kernel space. Caller must check
+ * the specified block with access_ok() before calling this function.
+@@ -1107,7 +1119,8 @@ extern size_t __copy_in_user_eva(void *_
+ * @from: Source address, in user space.
+ * @n: Number of bytes to copy.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Copy data from user space to kernel space.
+ *
+@@ -1329,7 +1342,8 @@ strncpy_from_user(char *__to, const char
+ * strlen_user: - Get the size of a string in user space.
+ * @str: The string to measure.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Get the size of a NUL-terminated string in user space.
+ *
+@@ -1398,7 +1412,8 @@ static inline long __strnlen_user(const
+ * strnlen_user: - Get the size of a string in user space.
+ * @str: The string to measure.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Get the size of a NUL-terminated string in user space.
+ *
+--- a/arch/s390/include/asm/uaccess.h
++++ b/arch/s390/include/asm/uaccess.h
+@@ -98,7 +98,8 @@ static inline unsigned long extable_fixu
+ * @from: Source address, in user space.
+ * @n: Number of bytes to copy.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Copy data from user space to kernel space. Caller must check
+ * the specified block with access_ok() before calling this function.
+@@ -118,7 +119,8 @@ unsigned long __must_check __copy_from_u
+ * @from: Source address, in kernel space.
+ * @n: Number of bytes to copy.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Copy data from kernel space to user space. Caller must check
+ * the specified block with access_ok() before calling this function.
+@@ -264,7 +266,8 @@ int __get_user_bad(void) __attribute__((
+ * @from: Source address, in kernel space.
+ * @n: Number of bytes to copy.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Copy data from kernel space to user space.
+ *
+@@ -290,7 +293,8 @@ void copy_from_user_overflow(void)
+ * @from: Source address, in user space.
+ * @n: Number of bytes to copy.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Copy data from user space to kernel space.
+ *
+@@ -348,7 +352,8 @@ static inline unsigned long strnlen_user
+ * strlen_user: - Get the size of a string in user space.
+ * @str: The string to measure.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Get the size of a NUL-terminated string in user space.
+ *
+--- a/arch/score/include/asm/uaccess.h
++++ b/arch/score/include/asm/uaccess.h
+@@ -36,7 +36,8 @@
+ * @addr: User space pointer to start of block to check
+ * @size: Size of block to check
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Checks if a pointer to a block of memory in user space is valid.
+ *
+@@ -61,7 +62,8 @@
+ * @x: Value to copy to user space.
+ * @ptr: Destination address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple value from kernel space to user
+ * space. It supports simple types like char and int, but not larger
+@@ -79,7 +81,8 @@
+ * @x: Variable to store result.
+ * @ptr: Source address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple variable from user space to kernel
+ * space. It supports simple types like char and int, but not larger
+@@ -98,7 +101,8 @@
+ * @x: Value to copy to user space.
+ * @ptr: Destination address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple value from kernel space to user
+ * space. It supports simple types like char and int, but not larger
+@@ -119,7 +123,8 @@
+ * @x: Variable to store result.
+ * @ptr: Source address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple variable from user space to kernel
+ * space. It supports simple types like char and int, but not larger
+--- a/arch/tile/include/asm/uaccess.h
++++ b/arch/tile/include/asm/uaccess.h
+@@ -78,7 +78,8 @@ int __range_ok(unsigned long addr, unsig
+ * @addr: User space pointer to start of block to check
+ * @size: Size of block to check
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Checks if a pointer to a block of memory in user space is valid.
+ *
+@@ -192,7 +193,8 @@ extern int __get_user_bad(void)
+ * @x: Variable to store result.
+ * @ptr: Source address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple variable from user space to kernel
+ * space. It supports simple types like char and int, but not larger
+@@ -274,7 +276,8 @@ extern int __put_user_bad(void)
+ * @x: Value to copy to user space.
+ * @ptr: Destination address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple value from kernel space to user
+ * space. It supports simple types like char and int, but not larger
+@@ -330,7 +333,8 @@ extern int __put_user_bad(void)
+ * @from: Source address, in kernel space.
+ * @n: Number of bytes to copy.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Copy data from kernel space to user space. Caller must check
+ * the specified block with access_ok() before calling this function.
+@@ -366,7 +370,8 @@ copy_to_user(void __user *to, const void
+ * @from: Source address, in user space.
+ * @n: Number of bytes to copy.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Copy data from user space to kernel space. Caller must check
+ * the specified block with access_ok() before calling this function.
+@@ -437,7 +442,8 @@ static inline unsigned long __must_check
+ * @from: Source address, in user space.
+ * @n: Number of bytes to copy.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Copy data from user space to user space. Caller must check
+ * the specified blocks with access_ok() before calling this function.
+--- a/arch/x86/include/asm/uaccess.h
++++ b/arch/x86/include/asm/uaccess.h
+@@ -74,7 +74,8 @@ static inline bool __chk_range_not_ok(un
+ * @addr: User space pointer to start of block to check
+ * @size: Size of block to check
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Checks if a pointer to a block of memory in user space is valid.
+ *
+@@ -145,7 +146,8 @@ extern int __get_user_bad(void);
+ * @x: Variable to store result.
+ * @ptr: Source address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple variable from user space to kernel
+ * space. It supports simple types like char and int, but not larger
+@@ -240,7 +242,8 @@ extern void __put_user_8(void);
+ * @x: Value to copy to user space.
+ * @ptr: Destination address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple value from kernel space to user
+ * space. It supports simple types like char and int, but not larger
+@@ -455,7 +458,8 @@ struct __large_struct { unsigned long bu
+ * @x: Variable to store result.
+ * @ptr: Source address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple variable from user space to kernel
+ * space. It supports simple types like char and int, but not larger
+@@ -479,7 +483,8 @@ struct __large_struct { unsigned long bu
+ * @x: Value to copy to user space.
+ * @ptr: Destination address, in user space.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * This macro copies a single simple value from kernel space to user
+ * space. It supports simple types like char and int, but not larger
+--- a/arch/x86/include/asm/uaccess_32.h
++++ b/arch/x86/include/asm/uaccess_32.h
+@@ -70,7 +70,8 @@ static __always_inline unsigned long __m
+ * @from: Source address, in kernel space.
+ * @n: Number of bytes to copy.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Copy data from kernel space to user space. Caller must check
+ * the specified block with access_ok() before calling this function.
+@@ -117,7 +118,8 @@ static __always_inline unsigned long
+ * @from: Source address, in user space.
+ * @n: Number of bytes to copy.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Copy data from user space to kernel space. Caller must check
+ * the specified block with access_ok() before calling this function.
+--- a/arch/x86/lib/usercopy_32.c
++++ b/arch/x86/lib/usercopy_32.c
+@@ -647,7 +647,8 @@ EXPORT_SYMBOL(__copy_from_user_ll_nocach
+ * @from: Source address, in kernel space.
+ * @n: Number of bytes to copy.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Copy data from kernel space to user space.
+ *
+@@ -668,7 +669,8 @@ EXPORT_SYMBOL(_copy_to_user);
+ * @from: Source address, in user space.
+ * @n: Number of bytes to copy.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Copy data from user space to kernel space.
+ *
+--- a/lib/strnlen_user.c
++++ b/lib/strnlen_user.c
+@@ -85,7 +85,8 @@ static inline long do_strnlen_user(const
+ * @str: The string to measure.
+ * @count: Maximum count (including NUL character)
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Get the size of a NUL-terminated string in user space.
+ *
+@@ -121,7 +122,8 @@ EXPORT_SYMBOL(strnlen_user);
+ * strlen_user: - Get the size of a user string INCLUDING final NUL.
+ * @str: The string to measure.
+ *
+- * Context: User context only. This function may sleep.
++ * Context: User context only. This function may sleep if pagefaults are
++ * enabled.
+ *
+ * Get the size of a NUL-terminated string in user space.
+ *
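
The hunks above only reword kerneldoc, but the distinction matters to callers. Below is a hedged illustration, not part of the patch (helper names are invented), of the two cases the new wording separates: a plain copy_from_user() in user context may fault the page in and sleep, while the same access under pagefault_disable() must not sleep and instead reports how many bytes it could not copy.

#include <linux/errno.h>
#include <linux/types.h>
#include <linux/uaccess.h>

/* user context, pagefaults enabled: may sleep to fault in src */
static int read_arg_sleeping(void *dst, const void __user *src, size_t len)
{
	return copy_from_user(dst, src, len) ? -EFAULT : 0;
}

/* pagefaults disabled: the access must not sleep; a fault fails fast */
static int read_arg_atomic(void *dst, const void __user *src, size_t len)
{
	unsigned long left;

	pagefault_disable();
	left = __copy_from_user_inatomic(dst, src, len);
	pagefault_enable();

	return left ? -EFAULT : 0;
}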
diff --git a/patches/0004-ipc-mqueue-Implement-lockless-pipelined-wakeups.patch b/patches/0004-ipc-mqueue-Implement-lockless-pipelined-wakeups.patch
new file mode 100644
index 00000000000000..4a5b153bd6f0c0
--- /dev/null
+++ b/patches/0004-ipc-mqueue-Implement-lockless-pipelined-wakeups.patch
@@ -0,0 +1,183 @@
+From: Davidlohr Bueso <dave@stgolabs.net>
+Date: Mon, 4 May 2015 07:02:46 -0700
+Subject: ipc/mqueue: Implement lockless pipelined wakeups
+
+This patch moves the wake_up_process() invocation so it is not done under
+the info->lock by making use of a lockless wake_q. With this change, the
+waiter is woken up once it is STATE_READY and it does not need to loop
+on SMP if it is still in STATE_PENDING. In the timeout case we still need
+to grab the info->lock to verify the state.
+
+This change should also avoid the introduction of preempt_disable() in -rt,
+which would otherwise be needed to prevent a busy loop that polls for the
+STATE_PENDING -> STATE_READY change if the waiter has a higher priority
+than the waker.
+
+Additionally, this patch micro-optimizes wq_sleep() by using the cheaper
+cousin of set_current_state(TASK_INTERRUPTIBLE), since we will block no
+matter what, thus getting rid of the implied barrier.
+
+[upstream commit fa6004ad4528153b699a4d5ce5ea6b33acce74cc]
+
+Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
+Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
+Acked-by: George Spelvin <linux@horizon.com>
+Acked-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: Andrew Morton <akpm@linux-foundation.org>
+Cc: Borislav Petkov <bp@alien8.de>
+Cc: Chris Mason <clm@fb.com>
+Cc: H. Peter Anvin <hpa@zytor.com>
+Cc: Linus Torvalds <torvalds@linux-foundation.org>
+Cc: Manfred Spraul <manfred@colorfullife.com>
+Cc: Peter Zijlstra <peterz@infradead.org>
+Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Cc: Steven Rostedt <rostedt@goodmis.org>
+Cc: dave@stgolabs.net
+Link: http://lkml.kernel.org/r/1430748166.1940.17.camel@stgolabs.net
+Signed-off-by: Ingo Molnar <mingo@kernel.org>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ ipc/mqueue.c | 54 +++++++++++++++++++++++++++++++++---------------------
+ 1 file changed, 33 insertions(+), 21 deletions(-)
+
+--- a/ipc/mqueue.c
++++ b/ipc/mqueue.c
+@@ -47,8 +47,7 @@
+ #define RECV 1
+
+ #define STATE_NONE 0
+-#define STATE_PENDING 1
+-#define STATE_READY 2
++#define STATE_READY 1
+
+ struct posix_msg_tree_node {
+ struct rb_node rb_node;
+@@ -571,15 +570,12 @@ static int wq_sleep(struct mqueue_inode_
+ wq_add(info, sr, ewp);
+
+ for (;;) {
+- set_current_state(TASK_INTERRUPTIBLE);
++ __set_current_state(TASK_INTERRUPTIBLE);
+
+ spin_unlock(&info->lock);
+ time = schedule_hrtimeout_range_clock(timeout, 0,
+ HRTIMER_MODE_ABS, CLOCK_REALTIME);
+
+- while (ewp->state == STATE_PENDING)
+- cpu_relax();
+-
+ if (ewp->state == STATE_READY) {
+ retval = 0;
+ goto out;
+@@ -907,11 +903,15 @@ SYSCALL_DEFINE1(mq_unlink, const char __
+ * list of waiting receivers. A sender checks that list before adding the new
+ * message into the message array. If there is a waiting receiver, then it
+ * bypasses the message array and directly hands the message over to the
+- * receiver.
+- * The receiver accepts the message and returns without grabbing the queue
+- * spinlock. Therefore an intermediate STATE_PENDING state and memory barriers
+- * are necessary. The same algorithm is used for sysv semaphores, see
+- * ipc/sem.c for more details.
++ * receiver. The receiver accepts the message and returns without grabbing the
++ * queue spinlock:
++ *
++ * - Set pointer to message.
++ * - Queue the receiver task for later wakeup (without the info->lock).
++ * - Update its state to STATE_READY. Now the receiver can continue.
++ * - Wake up the process after the lock is dropped. Should the process wake up
++ * before this wakeup (due to a timeout or a signal) it will either see
++ * STATE_READY and continue or acquire the lock to check the state again.
+ *
+ * The same algorithm is used for senders.
+ */
+@@ -919,21 +919,29 @@ SYSCALL_DEFINE1(mq_unlink, const char __
+ /* pipelined_send() - send a message directly to the task waiting in
+ * sys_mq_timedreceive() (without inserting message into a queue).
+ */
+-static inline void pipelined_send(struct mqueue_inode_info *info,
++static inline void pipelined_send(struct wake_q_head *wake_q,
++ struct mqueue_inode_info *info,
+ struct msg_msg *message,
+ struct ext_wait_queue *receiver)
+ {
+ receiver->msg = message;
+ list_del(&receiver->list);
+- receiver->state = STATE_PENDING;
+- wake_up_process(receiver->task);
+- smp_wmb();
++ wake_q_add(wake_q, receiver->task);
++ /*
++ * Rely on the implicit cmpxchg barrier from wake_q_add such
++ * that we can ensure that updating receiver->state is the last
++ * write operation: As once set, the receiver can continue,
++ * and if we don't have the reference count from the wake_q,
++ * yet, at that point we can later have a use-after-free
++ * condition and bogus wakeup.
++ */
+ receiver->state = STATE_READY;
+ }
+
+ /* pipelined_receive() - if there is task waiting in sys_mq_timedsend()
+ * gets its message and put to the queue (we have one free place for sure). */
+-static inline void pipelined_receive(struct mqueue_inode_info *info)
++static inline void pipelined_receive(struct wake_q_head *wake_q,
++ struct mqueue_inode_info *info)
+ {
+ struct ext_wait_queue *sender = wq_get_first_waiter(info, SEND);
+
+@@ -944,10 +952,9 @@ static inline void pipelined_receive(str
+ }
+ if (msg_insert(sender->msg, info))
+ return;
++
+ list_del(&sender->list);
+- sender->state = STATE_PENDING;
+- wake_up_process(sender->task);
+- smp_wmb();
++ wake_q_add(wake_q, sender->task);
+ sender->state = STATE_READY;
+ }
+
+@@ -965,6 +972,7 @@ SYSCALL_DEFINE5(mq_timedsend, mqd_t, mqd
+ struct timespec ts;
+ struct posix_msg_tree_node *new_leaf = NULL;
+ int ret = 0;
++ WAKE_Q(wake_q);
+
+ if (u_abs_timeout) {
+ int res = prepare_timeout(u_abs_timeout, &expires, &ts);
+@@ -1049,7 +1057,7 @@ SYSCALL_DEFINE5(mq_timedsend, mqd_t, mqd
+ } else {
+ receiver = wq_get_first_waiter(info, RECV);
+ if (receiver) {
+- pipelined_send(info, msg_ptr, receiver);
++ pipelined_send(&wake_q, info, msg_ptr, receiver);
+ } else {
+ /* adds message to the queue */
+ ret = msg_insert(msg_ptr, info);
+@@ -1062,6 +1070,7 @@ SYSCALL_DEFINE5(mq_timedsend, mqd_t, mqd
+ }
+ out_unlock:
+ spin_unlock(&info->lock);
++ wake_up_q(&wake_q);
+ out_free:
+ if (ret)
+ free_msg(msg_ptr);
+@@ -1149,14 +1158,17 @@ SYSCALL_DEFINE5(mq_timedreceive, mqd_t,
+ msg_ptr = wait.msg;
+ }
+ } else {
++ WAKE_Q(wake_q);
++
+ msg_ptr = msg_get(info);
+
+ inode->i_atime = inode->i_mtime = inode->i_ctime =
+ CURRENT_TIME;
+
+ /* There is now free space in queue. */
+- pipelined_receive(info);
++ pipelined_receive(&wake_q, info);
+ spin_unlock(&info->lock);
++ wake_up_q(&wake_q);
+ ret = 0;
+ }
+ if (ret == 0) {
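
For reference, a minimal sketch of the wake_q pattern the mqueue patch above switches to, using the names that appear in it (WAKE_Q(), wake_q_add(), wake_up_q()); the helper itself is hypothetical. Wakeups are only queued while the lock is held; the actual wake_up_process() calls happen after the lock has been dropped.

#include <linux/sched.h>
#include <linux/spinlock.h>

static void wake_one_waiter(spinlock_t *lock, struct task_struct *waiter)
{
	WAKE_Q(wake_q);			/* on-stack wake queue */

	spin_lock(lock);
	wake_q_add(&wake_q, waiter);	/* grabs a task reference, no wakeup yet */
	/* last step under the lock: publish the state the waiter checks */
	spin_unlock(lock);

	wake_up_q(&wake_q);		/* deferred wake_up_process() happens here */
}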
diff --git a/patches/0004-mm-explicitly-disable-enable-preemption-in-kmap_atom.patch b/patches/0004-mm-explicitly-disable-enable-preemption-in-kmap_atom.patch
new file mode 100644
index 00000000000000..c74a65d9272f01
--- /dev/null
+++ b/patches/0004-mm-explicitly-disable-enable-preemption-in-kmap_atom.patch
@@ -0,0 +1,367 @@
+From: David Hildenbrand <dahi@linux.vnet.ibm.com>
+Date: Mon, 11 May 2015 17:52:09 +0200
+Subject: sched/preempt, mm/kmap: Explicitly disable/enable preemption in kmap_atomic_*
+
+The existing code relies on pagefault_disable() implicitly disabling
+preemption, so that no schedule will happen between kmap_atomic() and
+kunmap_atomic().
+
+Let's make this explicit, to prepare for pagefault_disable() not
+touching preemption anymore.
+
+[upstream commit 2cb7c9cb426660b5ed58b643d9e7dd5d50ba901f]
+Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
+---
+ arch/arm/mm/highmem.c | 3 +++
+ arch/frv/mm/highmem.c | 2 ++
+ arch/metag/mm/highmem.c | 4 +++-
+ arch/microblaze/mm/highmem.c | 4 +++-
+ arch/mips/mm/highmem.c | 5 ++++-
+ arch/mn10300/include/asm/highmem.h | 3 +++
+ arch/parisc/include/asm/cacheflush.h | 2 ++
+ arch/powerpc/mm/highmem.c | 4 +++-
+ arch/sparc/mm/highmem.c | 4 +++-
+ arch/tile/mm/highmem.c | 3 ++-
+ arch/x86/mm/highmem_32.c | 3 ++-
+ arch/x86/mm/iomap_32.c | 2 ++
+ arch/xtensa/mm/highmem.c | 2 ++
+ include/linux/highmem.h | 2 ++
+ include/linux/io-mapping.h | 2 ++
+ 15 files changed, 38 insertions(+), 7 deletions(-)
+
+--- a/arch/arm/mm/highmem.c
++++ b/arch/arm/mm/highmem.c
+@@ -59,6 +59,7 @@ void *kmap_atomic(struct page *page)
+ void *kmap;
+ int type;
+
++ preempt_disable();
+ pagefault_disable();
+ if (!PageHighMem(page))
+ return page_address(page);
+@@ -121,6 +122,7 @@ void __kunmap_atomic(void *kvaddr)
+ kunmap_high(pte_page(pkmap_page_table[PKMAP_NR(vaddr)]));
+ }
+ pagefault_enable();
++ preempt_enable();
+ }
+ EXPORT_SYMBOL(__kunmap_atomic);
+
+@@ -130,6 +132,7 @@ void *kmap_atomic_pfn(unsigned long pfn)
+ int idx, type;
+ struct page *page = pfn_to_page(pfn);
+
++ preempt_disable();
+ pagefault_disable();
+ if (!PageHighMem(page))
+ return page_address(page);
+--- a/arch/frv/mm/highmem.c
++++ b/arch/frv/mm/highmem.c
+@@ -42,6 +42,7 @@ void *kmap_atomic(struct page *page)
+ unsigned long paddr;
+ int type;
+
++ preempt_disable();
+ pagefault_disable();
+ type = kmap_atomic_idx_push();
+ paddr = page_to_phys(page);
+@@ -85,5 +86,6 @@ void __kunmap_atomic(void *kvaddr)
+ }
+ kmap_atomic_idx_pop();
+ pagefault_enable();
++ preempt_enable();
+ }
+ EXPORT_SYMBOL(__kunmap_atomic);
+--- a/arch/metag/mm/highmem.c
++++ b/arch/metag/mm/highmem.c
+@@ -43,7 +43,7 @@ void *kmap_atomic(struct page *page)
+ unsigned long vaddr;
+ int type;
+
+- /* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
++ preempt_disable();
+ pagefault_disable();
+ if (!PageHighMem(page))
+ return page_address(page);
+@@ -82,6 +82,7 @@ void __kunmap_atomic(void *kvaddr)
+ }
+
+ pagefault_enable();
++ preempt_enable();
+ }
+ EXPORT_SYMBOL(__kunmap_atomic);
+
+@@ -95,6 +96,7 @@ void *kmap_atomic_pfn(unsigned long pfn)
+ unsigned long vaddr;
+ int type;
+
++ preempt_disable();
+ pagefault_disable();
+
+ type = kmap_atomic_idx_push();
+--- a/arch/microblaze/mm/highmem.c
++++ b/arch/microblaze/mm/highmem.c
+@@ -37,7 +37,7 @@ void *kmap_atomic_prot(struct page *page
+ unsigned long vaddr;
+ int idx, type;
+
+- /* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
++ preempt_disable();
+ pagefault_disable();
+ if (!PageHighMem(page))
+ return page_address(page);
+@@ -63,6 +63,7 @@ void __kunmap_atomic(void *kvaddr)
+
+ if (vaddr < __fix_to_virt(FIX_KMAP_END)) {
+ pagefault_enable();
++ preempt_enable();
+ return;
+ }
+
+@@ -84,5 +85,6 @@ void __kunmap_atomic(void *kvaddr)
+ #endif
+ kmap_atomic_idx_pop();
+ pagefault_enable();
++ preempt_enable();
+ }
+ EXPORT_SYMBOL(__kunmap_atomic);
+--- a/arch/mips/mm/highmem.c
++++ b/arch/mips/mm/highmem.c
+@@ -47,7 +47,7 @@ void *kmap_atomic(struct page *page)
+ unsigned long vaddr;
+ int idx, type;
+
+- /* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
++ preempt_disable();
+ pagefault_disable();
+ if (!PageHighMem(page))
+ return page_address(page);
+@@ -72,6 +72,7 @@ void __kunmap_atomic(void *kvaddr)
+
+ if (vaddr < FIXADDR_START) { // FIXME
+ pagefault_enable();
++ preempt_enable();
+ return;
+ }
+
+@@ -92,6 +93,7 @@ void __kunmap_atomic(void *kvaddr)
+ #endif
+ kmap_atomic_idx_pop();
+ pagefault_enable();
++ preempt_enable();
+ }
+ EXPORT_SYMBOL(__kunmap_atomic);
+
+@@ -104,6 +106,7 @@ void *kmap_atomic_pfn(unsigned long pfn)
+ unsigned long vaddr;
+ int idx, type;
+
++ preempt_disable();
+ pagefault_disable();
+
+ type = kmap_atomic_idx_push();
+--- a/arch/mn10300/include/asm/highmem.h
++++ b/arch/mn10300/include/asm/highmem.h
+@@ -75,6 +75,7 @@ static inline void *kmap_atomic(struct p
+ unsigned long vaddr;
+ int idx, type;
+
++ preempt_disable();
+ pagefault_disable();
+ if (page < highmem_start_page)
+ return page_address(page);
+@@ -98,6 +99,7 @@ static inline void __kunmap_atomic(unsig
+
+ if (vaddr < FIXADDR_START) { /* FIXME */
+ pagefault_enable();
++ preempt_enable();
+ return;
+ }
+
+@@ -122,6 +124,7 @@ static inline void __kunmap_atomic(unsig
+
+ kmap_atomic_idx_pop();
+ pagefault_enable();
++ preempt_enable();
+ }
+ #endif /* __KERNEL__ */
+
+--- a/arch/parisc/include/asm/cacheflush.h
++++ b/arch/parisc/include/asm/cacheflush.h
+@@ -142,6 +142,7 @@ static inline void kunmap(struct page *p
+
+ static inline void *kmap_atomic(struct page *page)
+ {
++ preempt_disable();
+ pagefault_disable();
+ return page_address(page);
+ }
+@@ -150,6 +151,7 @@ static inline void __kunmap_atomic(void
+ {
+ flush_kernel_dcache_page_addr(addr);
+ pagefault_enable();
++ preempt_enable();
+ }
+
+ #define kmap_atomic_prot(page, prot) kmap_atomic(page)
+--- a/arch/powerpc/mm/highmem.c
++++ b/arch/powerpc/mm/highmem.c
+@@ -34,7 +34,7 @@ void *kmap_atomic_prot(struct page *page
+ unsigned long vaddr;
+ int idx, type;
+
+- /* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
++ preempt_disable();
+ pagefault_disable();
+ if (!PageHighMem(page))
+ return page_address(page);
+@@ -59,6 +59,7 @@ void __kunmap_atomic(void *kvaddr)
+
+ if (vaddr < __fix_to_virt(FIX_KMAP_END)) {
+ pagefault_enable();
++ preempt_enable();
+ return;
+ }
+
+@@ -82,5 +83,6 @@ void __kunmap_atomic(void *kvaddr)
+
+ kmap_atomic_idx_pop();
+ pagefault_enable();
++ preempt_enable();
+ }
+ EXPORT_SYMBOL(__kunmap_atomic);
+--- a/arch/sparc/mm/highmem.c
++++ b/arch/sparc/mm/highmem.c
+@@ -53,7 +53,7 @@ void *kmap_atomic(struct page *page)
+ unsigned long vaddr;
+ long idx, type;
+
+- /* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
++ preempt_disable();
+ pagefault_disable();
+ if (!PageHighMem(page))
+ return page_address(page);
+@@ -91,6 +91,7 @@ void __kunmap_atomic(void *kvaddr)
+
+ if (vaddr < FIXADDR_START) { // FIXME
+ pagefault_enable();
++ preempt_enable();
+ return;
+ }
+
+@@ -126,5 +127,6 @@ void __kunmap_atomic(void *kvaddr)
+
+ kmap_atomic_idx_pop();
+ pagefault_enable();
++ preempt_enable();
+ }
+ EXPORT_SYMBOL(__kunmap_atomic);
+--- a/arch/tile/mm/highmem.c
++++ b/arch/tile/mm/highmem.c
+@@ -201,7 +201,7 @@ void *kmap_atomic_prot(struct page *page
+ int idx, type;
+ pte_t *pte;
+
+- /* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
++ preempt_disable();
+ pagefault_disable();
+
+ /* Avoid icache flushes by disallowing atomic executable mappings. */
+@@ -259,6 +259,7 @@ void __kunmap_atomic(void *kvaddr)
+ }
+
+ pagefault_enable();
++ preempt_enable();
+ }
+ EXPORT_SYMBOL(__kunmap_atomic);
+
+--- a/arch/x86/mm/highmem_32.c
++++ b/arch/x86/mm/highmem_32.c
+@@ -35,7 +35,7 @@ void *kmap_atomic_prot(struct page *page
+ unsigned long vaddr;
+ int idx, type;
+
+- /* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
++ preempt_disable();
+ pagefault_disable();
+
+ if (!PageHighMem(page))
+@@ -100,6 +100,7 @@ void __kunmap_atomic(void *kvaddr)
+ #endif
+
+ pagefault_enable();
++ preempt_enable();
+ }
+ EXPORT_SYMBOL(__kunmap_atomic);
+
+--- a/arch/x86/mm/iomap_32.c
++++ b/arch/x86/mm/iomap_32.c
+@@ -59,6 +59,7 @@ void *kmap_atomic_prot_pfn(unsigned long
+ unsigned long vaddr;
+ int idx, type;
+
++ preempt_disable();
+ pagefault_disable();
+
+ type = kmap_atomic_idx_push();
+@@ -117,5 +118,6 @@ iounmap_atomic(void __iomem *kvaddr)
+ }
+
+ pagefault_enable();
++ preempt_enable();
+ }
+ EXPORT_SYMBOL_GPL(iounmap_atomic);
+--- a/arch/xtensa/mm/highmem.c
++++ b/arch/xtensa/mm/highmem.c
+@@ -42,6 +42,7 @@ void *kmap_atomic(struct page *page)
+ enum fixed_addresses idx;
+ unsigned long vaddr;
+
++ preempt_disable();
+ pagefault_disable();
+ if (!PageHighMem(page))
+ return page_address(page);
+@@ -79,6 +80,7 @@ void __kunmap_atomic(void *kvaddr)
+ }
+
+ pagefault_enable();
++ preempt_enable();
+ }
+ EXPORT_SYMBOL(__kunmap_atomic);
+
+--- a/include/linux/highmem.h
++++ b/include/linux/highmem.h
+@@ -65,6 +65,7 @@ static inline void kunmap(struct page *p
+
+ static inline void *kmap_atomic(struct page *page)
+ {
++ preempt_disable();
+ pagefault_disable();
+ return page_address(page);
+ }
+@@ -73,6 +74,7 @@ static inline void *kmap_atomic(struct p
+ static inline void __kunmap_atomic(void *addr)
+ {
+ pagefault_enable();
++ preempt_enable();
+ }
+
+ #define kmap_atomic_pfn(pfn) kmap_atomic(pfn_to_page(pfn))
+--- a/include/linux/io-mapping.h
++++ b/include/linux/io-mapping.h
+@@ -141,6 +141,7 @@ static inline void __iomem *
+ io_mapping_map_atomic_wc(struct io_mapping *mapping,
+ unsigned long offset)
+ {
++ preempt_disable();
+ pagefault_disable();
+ return ((char __force __iomem *) mapping) + offset;
+ }
+@@ -149,6 +150,7 @@ static inline void
+ io_mapping_unmap_atomic(void __iomem *vaddr)
+ {
+ pagefault_enable();
++ preempt_enable();
+ }
+
+ /* Non-atomic map/unmap */
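
A short sketch of what a kmap_atomic() section relies on (hypothetical caller, not taken from the patch): the mapping slot is per-CPU, so the task must not be preempted or migrated between map and unmap. After the change above that guarantee comes from the explicit preempt_disable() in kmap_atomic() itself, rather than from pagefault_disable() implying it.

#include <linux/highmem.h>
#include <linux/string.h>

static void zero_page_chunk(struct page *page, size_t off, size_t len)
{
	void *kaddr = kmap_atomic(page);	/* disables preemption and pagefaults */

	memset(kaddr + off, 0, len);		/* no sleeping allowed in this section */

	kunmap_atomic(kaddr);			/* re-enables both in reverse order */
}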
diff --git a/patches/0005-futex-Ensure-lock-unlock-symetry-versus-pi_lock-and-.patch b/patches/0005-futex-Ensure-lock-unlock-symetry-versus-pi_lock-and-.patch
new file mode 100644
index 00000000000000..caabe01f3fd389
--- /dev/null
+++ b/patches/0005-futex-Ensure-lock-unlock-symetry-versus-pi_lock-and-.patch
@@ -0,0 +1,42 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Fri, 1 Mar 2013 11:17:42 +0100
+Subject: futex: Ensure lock/unlock symetry versus pi_lock and hash bucket lock
+
+In exit_pi_state_list() we have the following locking construct:
+
+ spin_lock(&hb->lock);
+ raw_spin_lock_irq(&curr->pi_lock);
+
+ ...
+ spin_unlock(&hb->lock);
+
+In !RT this works, but on RT the migrate_enable() function which is
+called from spin_unlock() sees atomic context due to the held pi_lock
+and just decrements the migrate_disable_atomic counter of the
+task. Now the next call to migrate_disable() sees the counter being
+negative and issues a warning. That check should be in
+migrate_enable() already.
+
+Fix this by dropping pi_lock before unlocking hb->lock and reacquiring
+pi_lock after that again. This is safe as the loop code reevaluates
+head again under the pi_lock.
+
+Reported-by: Yong Zhang <yong.zhang@windriver.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ kernel/futex.c | 2 ++
+ 1 file changed, 2 insertions(+)
+
+--- a/kernel/futex.c
++++ b/kernel/futex.c
+@@ -738,7 +738,9 @@ void exit_pi_state_list(struct task_stru
+ * task still owns the PI-state:
+ */
+ if (head->next != next) {
++ raw_spin_unlock_irq(&curr->pi_lock);
+ spin_unlock(&hb->lock);
++ raw_spin_lock_irq(&curr->pi_lock);
+ continue;
+ }
+
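
The ordering the fix enforces, as a standalone sketch (hypothetical helper, not the kernel's futex code): on RT the migrate_enable() called from spin_unlock() must not run while the raw pi_lock is still held, so the inner raw lock is dropped first and retaken once the sleeping lock has been released. The caller then revalidates any state it derived earlier, as the loop in exit_pi_state_list() does.

#include <linux/spinlock.h>

static void drop_and_retake(spinlock_t *hb_lock, raw_spinlock_t *pi_lock)
{
	raw_spin_unlock_irq(pi_lock);	/* drop the inner raw lock first ... */
	spin_unlock(hb_lock);		/* ... so migrate_enable() runs outside atomic context */

	raw_spin_lock_irq(pi_lock);	/* retake; the caller re-checks its state */
}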
diff --git a/patches/0005-mips-kmap_coherent-relies-on-disabled-preemption.patch b/patches/0005-mips-kmap_coherent-relies-on-disabled-preemption.patch
new file mode 100644
index 00000000000000..694fad25ad8017
--- /dev/null
+++ b/patches/0005-mips-kmap_coherent-relies-on-disabled-preemption.patch
@@ -0,0 +1,40 @@
+From: David Hildenbrand <dahi@linux.vnet.ibm.com>
+Date: Mon, 11 May 2015 17:52:10 +0200
+Subject: sched/preempt, mm/kmap, MIPS: Disable preemption in kmap_coherent() explicitly
+
+k(un)map_coherent relies on pagefault_disable() to also disable
+preemption.
+
+Let's make this explicit, to prepare for pagefault_disable() not
+touching preemption anymore.
+
+This patch is based on a patch by Yang Shi on the -rt tree:
+"k{un}map_coherent are just called when cpu_has_dc_aliases == 1 with VIPT
+cache. However, actually, the most modern MIPS processors have PIPT dcache
+without dcache alias issue. In such case, k{un}map_atomic will be called
+with preempt enabled."
+
+[upstream commit ce01948eb85da733558fa77c2a554144a57ab0fb]
+Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
+---
+ arch/mips/mm/init.c | 2 ++
+ 1 file changed, 2 insertions(+)
+
+--- a/arch/mips/mm/init.c
++++ b/arch/mips/mm/init.c
+@@ -90,6 +90,7 @@ static void *__kmap_pgprot(struct page *
+
+ BUG_ON(Page_dcache_dirty(page));
+
++ preempt_disable();
+ pagefault_disable();
+ idx = (addr >> PAGE_SHIFT) & (FIX_N_COLOURS - 1);
+ idx += in_interrupt() ? FIX_N_COLOURS : 0;
+@@ -152,6 +153,7 @@ void kunmap_coherent(void)
+ write_c0_entryhi(old_ctx);
+ local_irq_restore(flags);
+ pagefault_enable();
++ preempt_enable();
+ }
+
+ void copy_user_highpage(struct page *to, struct page *from,
diff --git a/patches/0006-mm-use-pagefault_disable-to-check-for-disabled-pagef.patch b/patches/0006-mm-use-pagefault_disable-to-check-for-disabled-pagef.patch
new file mode 100644
index 00000000000000..b3c7d3760cfa5c
--- /dev/null
+++ b/patches/0006-mm-use-pagefault_disable-to-check-for-disabled-pagef.patch
@@ -0,0 +1,646 @@
+From: David Hildenbrand <dahi@linux.vnet.ibm.com>
+Date: Mon, 11 May 2015 17:52:11 +0200
+Subject: mm/fault, arch: Use pagefault_disable() to check for disabled pagefaults in the handler
+
+Introduce faulthandler_disabled() and use it to check for irq context and
+disabled pagefaults (via pagefault_disable()) in the pagefault handlers.
+
+Please note that we keep the in_atomic() checks in place - to detect
+whether in irq context (in which case preemption is always properly
+disabled).
+
+In contrast, preempt_disable() should never be used to disable pagefaults.
+With !CONFIG_PREEMPT_COUNT, preempt_disable() doesn't modify the preempt
+counter, and therefore the result of in_atomic() differs.
+We validate that condition by using might_fault() checks when calling
+might_sleep().
+
+Therefore, add a comment to faulthandler_disabled(), describing why this
+is needed.
+
+faulthandler_disabled() and pagefault_disable() are defined in
+linux/uaccess.h, so let's properly add that include to all relevant files.
+
+This patch is based on a patch from Thomas Gleixner.
+
+[upstream commit 70ffdb9393a7264a069265edded729078dcf0425]
+Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
+---
+ arch/alpha/mm/fault.c | 5 ++---
+ arch/arc/mm/fault.c | 2 +-
+ arch/arm/mm/fault.c | 2 +-
+ arch/arm64/mm/fault.c | 2 +-
+ arch/avr32/mm/fault.c | 4 ++--
+ arch/cris/mm/fault.c | 6 +++---
+ arch/frv/mm/fault.c | 4 ++--
+ arch/ia64/mm/fault.c | 4 ++--
+ arch/m32r/mm/fault.c | 8 ++++----
+ arch/m68k/mm/fault.c | 4 ++--
+ arch/metag/mm/fault.c | 2 +-
+ arch/microblaze/mm/fault.c | 8 ++++----
+ arch/mips/mm/fault.c | 4 ++--
+ arch/mn10300/mm/fault.c | 4 ++--
+ arch/nios2/mm/fault.c | 2 +-
+ arch/parisc/kernel/traps.c | 4 ++--
+ arch/parisc/mm/fault.c | 4 ++--
+ arch/powerpc/mm/fault.c | 9 +++++----
+ arch/s390/mm/fault.c | 2 +-
+ arch/score/mm/fault.c | 3 ++-
+ arch/sh/mm/fault.c | 5 +++--
+ arch/sparc/mm/fault_32.c | 4 ++--
+ arch/sparc/mm/fault_64.c | 4 ++--
+ arch/sparc/mm/init_64.c | 2 +-
+ arch/tile/mm/fault.c | 4 ++--
+ arch/um/kernel/trap.c | 4 ++--
+ arch/unicore32/mm/fault.c | 2 +-
+ arch/x86/mm/fault.c | 5 +++--
+ arch/xtensa/mm/fault.c | 4 ++--
+ include/linux/uaccess.h | 12 ++++++++++++
+ 30 files changed, 72 insertions(+), 57 deletions(-)
+
+--- a/arch/alpha/mm/fault.c
++++ b/arch/alpha/mm/fault.c
+@@ -23,8 +23,7 @@
+ #include <linux/smp.h>
+ #include <linux/interrupt.h>
+ #include <linux/module.h>
+-
+-#include <asm/uaccess.h>
++#include <linux/uaccess.h>
+
+ extern void die_if_kernel(char *,struct pt_regs *,long, unsigned long *);
+
+@@ -107,7 +106,7 @@ do_page_fault(unsigned long address, uns
+
+ /* If we're in an interrupt context, or have no user context,
+ we must not take the fault. */
+- if (!mm || in_atomic())
++ if (!mm || faulthandler_disabled())
+ goto no_context;
+
+ #ifdef CONFIG_ALPHA_LARGE_VMALLOC
+--- a/arch/arc/mm/fault.c
++++ b/arch/arc/mm/fault.c
+@@ -86,7 +86,7 @@ void do_page_fault(unsigned long address
+ * If we're in an interrupt or have no user
+ * context, we must not take the fault..
+ */
+- if (in_atomic() || !mm)
++ if (faulthandler_disabled() || !mm)
+ goto no_context;
+
+ if (user_mode(regs))
+--- a/arch/arm/mm/fault.c
++++ b/arch/arm/mm/fault.c
+@@ -276,7 +276,7 @@ do_page_fault(unsigned long addr, unsign
+ * If we're in an interrupt or have no user
+ * context, we must not take the fault..
+ */
+- if (in_atomic() || !mm)
++ if (faulthandler_disabled() || !mm)
+ goto no_context;
+
+ if (user_mode(regs))
+--- a/arch/arm64/mm/fault.c
++++ b/arch/arm64/mm/fault.c
+@@ -211,7 +211,7 @@ static int __kprobes do_page_fault(unsig
+ * If we're in an interrupt or have no user context, we must not take
+ * the fault.
+ */
+- if (in_atomic() || !mm)
++ if (faulthandler_disabled() || !mm)
+ goto no_context;
+
+ if (user_mode(regs))
+--- a/arch/avr32/mm/fault.c
++++ b/arch/avr32/mm/fault.c
+@@ -14,11 +14,11 @@
+ #include <linux/pagemap.h>
+ #include <linux/kdebug.h>
+ #include <linux/kprobes.h>
++#include <linux/uaccess.h>
+
+ #include <asm/mmu_context.h>
+ #include <asm/sysreg.h>
+ #include <asm/tlb.h>
+-#include <asm/uaccess.h>
+
+ #ifdef CONFIG_KPROBES
+ static inline int notify_page_fault(struct pt_regs *regs, int trap)
+@@ -81,7 +81,7 @@ asmlinkage void do_page_fault(unsigned l
+ * If we're in an interrupt or have no user context, we must
+ * not take the fault...
+ */
+- if (in_atomic() || !mm || regs->sr & SYSREG_BIT(GM))
++ if (faulthandler_disabled() || !mm || regs->sr & SYSREG_BIT(GM))
+ goto no_context;
+
+ local_irq_enable();
+--- a/arch/cris/mm/fault.c
++++ b/arch/cris/mm/fault.c
+@@ -8,7 +8,7 @@
+ #include <linux/interrupt.h>
+ #include <linux/module.h>
+ #include <linux/wait.h>
+-#include <asm/uaccess.h>
++#include <linux/uaccess.h>
+ #include <arch/system.h>
+
+ extern int find_fixup_code(struct pt_regs *);
+@@ -109,11 +109,11 @@ do_page_fault(unsigned long address, str
+ info.si_code = SEGV_MAPERR;
+
+ /*
+- * If we're in an interrupt or "atomic" operation or have no
++ * If we're in an interrupt, have pagefaults disabled or have no
+ * user context, we must not take the fault.
+ */
+
+- if (in_atomic() || !mm)
++ if (faulthandler_disabled() || !mm)
+ goto no_context;
+
+ if (user_mode(regs))
+--- a/arch/frv/mm/fault.c
++++ b/arch/frv/mm/fault.c
+@@ -19,9 +19,9 @@
+ #include <linux/kernel.h>
+ #include <linux/ptrace.h>
+ #include <linux/hardirq.h>
++#include <linux/uaccess.h>
+
+ #include <asm/pgtable.h>
+-#include <asm/uaccess.h>
+ #include <asm/gdb-stub.h>
+
+ /*****************************************************************************/
+@@ -78,7 +78,7 @@ asmlinkage void do_page_fault(int datamm
+ * If we're in an interrupt or have no user
+ * context, we must not take the fault..
+ */
+- if (in_atomic() || !mm)
++ if (faulthandler_disabled() || !mm)
+ goto no_context;
+
+ if (user_mode(__frame))
+--- a/arch/ia64/mm/fault.c
++++ b/arch/ia64/mm/fault.c
+@@ -11,10 +11,10 @@
+ #include <linux/kprobes.h>
+ #include <linux/kdebug.h>
+ #include <linux/prefetch.h>
++#include <linux/uaccess.h>
+
+ #include <asm/pgtable.h>
+ #include <asm/processor.h>
+-#include <asm/uaccess.h>
+
+ extern int die(char *, struct pt_regs *, long);
+
+@@ -96,7 +96,7 @@ ia64_do_page_fault (unsigned long addres
+ /*
+ * If we're in an interrupt or have no user context, we must not take the fault..
+ */
+- if (in_atomic() || !mm)
++ if (faulthandler_disabled() || !mm)
+ goto no_context;
+
+ #ifdef CONFIG_VIRTUAL_MEM_MAP
+--- a/arch/m32r/mm/fault.c
++++ b/arch/m32r/mm/fault.c
+@@ -24,9 +24,9 @@
+ #include <linux/vt_kern.h> /* For unblank_screen() */
+ #include <linux/highmem.h>
+ #include <linux/module.h>
++#include <linux/uaccess.h>
+
+ #include <asm/m32r.h>
+-#include <asm/uaccess.h>
+ #include <asm/hardirq.h>
+ #include <asm/mmu_context.h>
+ #include <asm/tlbflush.h>
+@@ -111,10 +111,10 @@ asmlinkage void do_page_fault(struct pt_
+ mm = tsk->mm;
+
+ /*
+- * If we're in an interrupt or have no user context or are running in an
+- * atomic region then we must not take the fault..
++ * If we're in an interrupt or have no user context or have pagefaults
++ * disabled then we must not take the fault.
+ */
+- if (in_atomic() || !mm)
++ if (faulthandler_disabled() || !mm)
+ goto bad_area_nosemaphore;
+
+ if (error_code & ACE_USERMODE)
+--- a/arch/m68k/mm/fault.c
++++ b/arch/m68k/mm/fault.c
+@@ -10,10 +10,10 @@
+ #include <linux/ptrace.h>
+ #include <linux/interrupt.h>
+ #include <linux/module.h>
++#include <linux/uaccess.h>
+
+ #include <asm/setup.h>
+ #include <asm/traps.h>
+-#include <asm/uaccess.h>
+ #include <asm/pgalloc.h>
+
+ extern void die_if_kernel(char *, struct pt_regs *, long);
+@@ -81,7 +81,7 @@ int do_page_fault(struct pt_regs *regs,
+ * If we're in an interrupt or have no user
+ * context, we must not take the fault..
+ */
+- if (in_atomic() || !mm)
++ if (faulthandler_disabled() || !mm)
+ goto no_context;
+
+ if (user_mode(regs))
+--- a/arch/metag/mm/fault.c
++++ b/arch/metag/mm/fault.c
+@@ -105,7 +105,7 @@ int do_page_fault(struct pt_regs *regs,
+
+ mm = tsk->mm;
+
+- if (in_atomic() || !mm)
++ if (faulthandler_disabled() || !mm)
+ goto no_context;
+
+ if (user_mode(regs))
+--- a/arch/microblaze/mm/fault.c
++++ b/arch/microblaze/mm/fault.c
+@@ -107,14 +107,14 @@ void do_page_fault(struct pt_regs *regs,
+ if ((error_code & 0x13) == 0x13 || (error_code & 0x11) == 0x11)
+ is_write = 0;
+
+- if (unlikely(in_atomic() || !mm)) {
++ if (unlikely(faulthandler_disabled() || !mm)) {
+ if (kernel_mode(regs))
+ goto bad_area_nosemaphore;
+
+- /* in_atomic() in user mode is really bad,
++ /* faulthandler_disabled() in user mode is really bad,
+ as is current->mm == NULL. */
+- pr_emerg("Page fault in user mode with in_atomic(), mm = %p\n",
+- mm);
++ pr_emerg("Page fault in user mode with faulthandler_disabled(), mm = %p\n",
++ mm);
+ pr_emerg("r15 = %lx MSR = %lx\n",
+ regs->r15, regs->msr);
+ die("Weird page fault", regs, SIGSEGV);
+--- a/arch/mips/mm/fault.c
++++ b/arch/mips/mm/fault.c
+@@ -21,10 +21,10 @@
+ #include <linux/module.h>
+ #include <linux/kprobes.h>
+ #include <linux/perf_event.h>
++#include <linux/uaccess.h>
+
+ #include <asm/branch.h>
+ #include <asm/mmu_context.h>
+-#include <asm/uaccess.h>
+ #include <asm/ptrace.h>
+ #include <asm/highmem.h> /* For VMALLOC_END */
+ #include <linux/kdebug.h>
+@@ -94,7 +94,7 @@ static void __kprobes __do_page_fault(st
+ * If we're in an interrupt or have no user
+ * context, we must not take the fault..
+ */
+- if (in_atomic() || !mm)
++ if (faulthandler_disabled() || !mm)
+ goto bad_area_nosemaphore;
+
+ if (user_mode(regs))
+--- a/arch/mn10300/mm/fault.c
++++ b/arch/mn10300/mm/fault.c
+@@ -23,8 +23,8 @@
+ #include <linux/interrupt.h>
+ #include <linux/init.h>
+ #include <linux/vt_kern.h> /* For unblank_screen() */
++#include <linux/uaccess.h>
+
+-#include <asm/uaccess.h>
+ #include <asm/pgalloc.h>
+ #include <asm/hardirq.h>
+ #include <asm/cpu-regs.h>
+@@ -168,7 +168,7 @@ asmlinkage void do_page_fault(struct pt_
+ * If we're in an interrupt or have no user
+ * context, we must not take the fault..
+ */
+- if (in_atomic() || !mm)
++ if (faulthandler_disabled() || !mm)
+ goto no_context;
+
+ if ((fault_code & MMUFCR_xFC_ACCESS) == MMUFCR_xFC_ACCESS_USR)
+--- a/arch/nios2/mm/fault.c
++++ b/arch/nios2/mm/fault.c
+@@ -77,7 +77,7 @@ asmlinkage void do_page_fault(struct pt_
+ * If we're in an interrupt or have no user
+ * context, we must not take the fault..
+ */
+- if (in_atomic() || !mm)
++ if (faulthandler_disabled() || !mm)
+ goto bad_area_nosemaphore;
+
+ if (user_mode(regs))
+--- a/arch/parisc/kernel/traps.c
++++ b/arch/parisc/kernel/traps.c
+@@ -26,9 +26,9 @@
+ #include <linux/console.h>
+ #include <linux/bug.h>
+ #include <linux/ratelimit.h>
++#include <linux/uaccess.h>
+
+ #include <asm/assembly.h>
+-#include <asm/uaccess.h>
+ #include <asm/io.h>
+ #include <asm/irq.h>
+ #include <asm/traps.h>
+@@ -800,7 +800,7 @@ void notrace handle_interruption(int cod
+ * unless pagefault_disable() was called before.
+ */
+
+- if (fault_space == 0 && !in_atomic())
++ if (fault_space == 0 && !faulthandler_disabled())
+ {
+ pdc_chassis_send_status(PDC_CHASSIS_DIRECT_PANIC);
+ parisc_terminate("Kernel Fault", regs, code, fault_address);
+--- a/arch/parisc/mm/fault.c
++++ b/arch/parisc/mm/fault.c
+@@ -15,8 +15,8 @@
+ #include <linux/sched.h>
+ #include <linux/interrupt.h>
+ #include <linux/module.h>
++#include <linux/uaccess.h>
+
+-#include <asm/uaccess.h>
+ #include <asm/traps.h>
+
+ /* Various important other fields */
+@@ -207,7 +207,7 @@ void do_page_fault(struct pt_regs *regs,
+ int fault;
+ unsigned int flags;
+
+- if (in_atomic())
++ if (pagefault_disabled())
+ goto no_context;
+
+ tsk = current;
+--- a/arch/powerpc/mm/fault.c
++++ b/arch/powerpc/mm/fault.c
+@@ -33,13 +33,13 @@
+ #include <linux/ratelimit.h>
+ #include <linux/context_tracking.h>
+ #include <linux/hugetlb.h>
++#include <linux/uaccess.h>
+
+ #include <asm/firmware.h>
+ #include <asm/page.h>
+ #include <asm/pgtable.h>
+ #include <asm/mmu.h>
+ #include <asm/mmu_context.h>
+-#include <asm/uaccess.h>
+ #include <asm/tlbflush.h>
+ #include <asm/siginfo.h>
+ #include <asm/debug.h>
+@@ -272,15 +272,16 @@ int __kprobes do_page_fault(struct pt_re
+ if (!arch_irq_disabled_regs(regs))
+ local_irq_enable();
+
+- if (in_atomic() || mm == NULL) {
++ if (faulthandler_disabled() || mm == NULL) {
+ if (!user_mode(regs)) {
+ rc = SIGSEGV;
+ goto bail;
+ }
+- /* in_atomic() in user mode is really bad,
++ /* faulthandler_disabled() in user mode is really bad,
+ as is current->mm == NULL. */
+ printk(KERN_EMERG "Page fault in user mode with "
+- "in_atomic() = %d mm = %p\n", in_atomic(), mm);
++ "faulthandler_disabled() = %d mm = %p\n",
++ faulthandler_disabled(), mm);
+ printk(KERN_EMERG "NIP = %lx MSR = %lx\n",
+ regs->nip, regs->msr);
+ die("Weird page fault", regs, SIGSEGV);
+--- a/arch/s390/mm/fault.c
++++ b/arch/s390/mm/fault.c
+@@ -399,7 +399,7 @@ static inline int do_exception(struct pt
+ * user context.
+ */
+ fault = VM_FAULT_BADCONTEXT;
+- if (unlikely(!user_space_fault(regs) || in_atomic() || !mm))
++ if (unlikely(!user_space_fault(regs) || faulthandler_disabled() || !mm))
+ goto out;
+
+ address = trans_exc_code & __FAIL_ADDR_MASK;
+--- a/arch/score/mm/fault.c
++++ b/arch/score/mm/fault.c
+@@ -34,6 +34,7 @@
+ #include <linux/string.h>
+ #include <linux/types.h>
+ #include <linux/ptrace.h>
++#include <linux/uaccess.h>
+
+ /*
+ * This routine handles page faults. It determines the address,
+@@ -73,7 +74,7 @@ asmlinkage void do_page_fault(struct pt_
+ * If we're in an interrupt or have no user
+ * context, we must not take the fault..
+ */
+- if (in_atomic() || !mm)
++ if (pagefault_disabled() || !mm)
+ goto bad_area_nosemaphore;
+
+ if (user_mode(regs))
+--- a/arch/sh/mm/fault.c
++++ b/arch/sh/mm/fault.c
+@@ -17,6 +17,7 @@
+ #include <linux/kprobes.h>
+ #include <linux/perf_event.h>
+ #include <linux/kdebug.h>
++#include <linux/uaccess.h>
+ #include <asm/io_trapped.h>
+ #include <asm/mmu_context.h>
+ #include <asm/tlbflush.h>
+@@ -438,9 +439,9 @@ asmlinkage void __kprobes do_page_fault(
+
+ /*
+ * If we're in an interrupt, have no user context or are running
+- * in an atomic region then we must not take the fault:
++ * with pagefaults disabled then we must not take the fault:
+ */
+- if (unlikely(in_atomic() || !mm)) {
++ if (unlikely(faulthandler_disabled() || !mm)) {
+ bad_area_nosemaphore(regs, error_code, address);
+ return;
+ }
+--- a/arch/sparc/mm/fault_32.c
++++ b/arch/sparc/mm/fault_32.c
+@@ -21,6 +21,7 @@
+ #include <linux/perf_event.h>
+ #include <linux/interrupt.h>
+ #include <linux/kdebug.h>
++#include <linux/uaccess.h>
+
+ #include <asm/page.h>
+ #include <asm/pgtable.h>
+@@ -29,7 +30,6 @@
+ #include <asm/setup.h>
+ #include <asm/smp.h>
+ #include <asm/traps.h>
+-#include <asm/uaccess.h>
+
+ #include "mm_32.h"
+
+@@ -196,7 +196,7 @@ asmlinkage void do_sparc_fault(struct pt
+ * If we're in an interrupt or have no user
+ * context, we must not take the fault..
+ */
+- if (in_atomic() || !mm)
++ if (pagefault_disabled() || !mm)
+ goto no_context;
+
+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
+--- a/arch/sparc/mm/fault_64.c
++++ b/arch/sparc/mm/fault_64.c
+@@ -22,12 +22,12 @@
+ #include <linux/kdebug.h>
+ #include <linux/percpu.h>
+ #include <linux/context_tracking.h>
++#include <linux/uaccess.h>
+
+ #include <asm/page.h>
+ #include <asm/pgtable.h>
+ #include <asm/openprom.h>
+ #include <asm/oplib.h>
+-#include <asm/uaccess.h>
+ #include <asm/asi.h>
+ #include <asm/lsu.h>
+ #include <asm/sections.h>
+@@ -330,7 +330,7 @@ asmlinkage void __kprobes do_sparc64_fau
+ * If we're in an interrupt or have no user
+ * context, we must not take the fault..
+ */
+- if (in_atomic() || !mm)
++ if (faulthandler_disabled() || !mm)
+ goto intr_or_no_mm;
+
+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
+--- a/arch/sparc/mm/init_64.c
++++ b/arch/sparc/mm/init_64.c
+@@ -2738,7 +2738,7 @@ void hugetlb_setup(struct pt_regs *regs)
+ struct mm_struct *mm = current->mm;
+ struct tsb_config *tp;
+
+- if (in_atomic() || !mm) {
++ if (faulthandler_disabled() || !mm) {
+ const struct exception_table_entry *entry;
+
+ entry = search_exception_tables(regs->tpc);
+--- a/arch/tile/mm/fault.c
++++ b/arch/tile/mm/fault.c
+@@ -354,9 +354,9 @@ static int handle_page_fault(struct pt_r
+
+ /*
+ * If we're in an interrupt, have no user context or are running in an
+- * atomic region then we must not take the fault.
++ * region with pagefaults disabled then we must not take the fault.
+ */
+- if (in_atomic() || !mm) {
++ if (pagefault_disabled() || !mm) {
+ vma = NULL; /* happy compiler */
+ goto bad_area_nosemaphore;
+ }
+--- a/arch/um/kernel/trap.c
++++ b/arch/um/kernel/trap.c
+@@ -35,10 +35,10 @@ int handle_page_fault(unsigned long addr
+ *code_out = SEGV_MAPERR;
+
+ /*
+- * If the fault was during atomic operation, don't take the fault, just
++ * If the fault was with pagefaults disabled, don't take the fault, just
+ * fail.
+ */
+- if (in_atomic())
++ if (faulthandler_disabled())
+ goto out_nosemaphore;
+
+ if (is_user)
+--- a/arch/unicore32/mm/fault.c
++++ b/arch/unicore32/mm/fault.c
+@@ -218,7 +218,7 @@ static int do_pf(unsigned long addr, uns
+ * If we're in an interrupt or have no user
+ * context, we must not take the fault..
+ */
+- if (in_atomic() || !mm)
++ if (faulthandler_disabled() || !mm)
+ goto no_context;
+
+ if (user_mode(regs))
+--- a/arch/x86/mm/fault.c
++++ b/arch/x86/mm/fault.c
+@@ -13,6 +13,7 @@
+ #include <linux/hugetlb.h> /* hstate_index_to_shift */
+ #include <linux/prefetch.h> /* prefetchw */
+ #include <linux/context_tracking.h> /* exception_enter(), ... */
++#include <linux/uaccess.h> /* faulthandler_disabled() */
+
+ #include <asm/traps.h> /* dotraplinkage, ... */
+ #include <asm/pgalloc.h> /* pgd_*(), ... */
+@@ -1126,9 +1127,9 @@ static noinline void
+
+ /*
+ * If we're in an interrupt, have no user context or are running
+- * in an atomic region then we must not take the fault:
++ * in a region with pagefaults disabled then we must not take the fault
+ */
+- if (unlikely(in_atomic() || !mm)) {
++ if (unlikely(faulthandler_disabled() || !mm)) {
+ bad_area_nosemaphore(regs, error_code, address);
+ return;
+ }
+--- a/arch/xtensa/mm/fault.c
++++ b/arch/xtensa/mm/fault.c
+@@ -15,10 +15,10 @@
+ #include <linux/mm.h>
+ #include <linux/module.h>
+ #include <linux/hardirq.h>
++#include <linux/uaccess.h>
+ #include <asm/mmu_context.h>
+ #include <asm/cacheflush.h>
+ #include <asm/hardirq.h>
+-#include <asm/uaccess.h>
+ #include <asm/pgalloc.h>
+
+ DEFINE_PER_CPU(unsigned long, asid_cache) = ASID_USER_FIRST;
+@@ -57,7 +57,7 @@ void do_page_fault(struct pt_regs *regs)
+ /* If we're in an interrupt or have no user
+ * context, we must not take the fault..
+ */
+- if (in_atomic() || !mm) {
++ if (faulthandler_disabled() || !mm) {
+ bad_page_fault(regs, address, SIGSEGV);
+ return;
+ }
+--- a/include/linux/uaccess.h
++++ b/include/linux/uaccess.h
+@@ -59,6 +59,18 @@ static inline void pagefault_enable(void
+ */
+ #define pagefault_disabled() (current->pagefault_disabled != 0)
+
++/*
++ * The pagefault handler is in general disabled by pagefault_disable() or
++ * when in irq context (via in_atomic()).
++ *
++ * This function should only be used by the fault handlers. Other users should
++ * stick to pagefault_disabled().
++ * Please NEVER use preempt_disable() to disable the fault handler. With
++ * !CONFIG_PREEMPT_COUNT, this is like a NOP. So the handler won't be disabled.
++ * in_atomic() will report different values based on !CONFIG_PREEMPT_COUNT.
++ */
++#define faulthandler_disabled() (pagefault_disabled() || in_atomic())
++
+ #ifndef ARCH_HAS_NOCACHE_UACCESS
+
+ static inline unsigned long __copy_from_user_inatomic_nocache(void *to,
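
A hedged illustration of the point the changelog makes about preempt_disable() (helper names invented for the example): with !CONFIG_PREEMPT_COUNT the preempt counter is compiled away, so disabling preemption is invisible to in_atomic() and does not keep the fault handler from sleeping; only pagefault_disable() is reliably seen by faulthandler_disabled().

#include <linux/preempt.h>
#include <linux/uaccess.h>

static int peek_user_wrong(const long __user *uaddr, long *out)
{
	int err;

	preempt_disable();		/* no-op for in_atomic() with !CONFIG_PREEMPT_COUNT */
	err = __get_user(*out, uaddr);	/* a fault here may still enter the full handler and sleep */
	preempt_enable();

	return err;
}

static int peek_user_right(const long __user *uaddr, long *out)
{
	int err;

	pagefault_disable();		/* always visible via faulthandler_disabled() */
	err = __get_user(*out, uaddr);	/* the fault handler bails out early, never sleeps */
	pagefault_enable();

	return err;
}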
diff --git a/patches/0007-drm-i915-use-pagefault_disabled-to-check-for-disable.patch b/patches/0007-drm-i915-use-pagefault_disabled-to-check-for-disable.patch
new file mode 100644
index 00000000000000..fd0ac3bc496a5b
--- /dev/null
+++ b/patches/0007-drm-i915-use-pagefault_disabled-to-check-for-disable.patch
@@ -0,0 +1,32 @@
+From: David Hildenbrand <dahi@linux.vnet.ibm.com>
+Date: Mon, 11 May 2015 17:52:12 +0200
+Subject: mm/fault, drm/i915: Use pagefault_disabled() to check for disabled pagefaults
+
+Now that the pagefault disabled counter is in place, we can replace
+the in_atomic() check by a pagefault_disabled() checks.
+
+[upstream commit 32d8206725bcf6e3ce7832ac39e61a6ecfd558db]
+Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
+---
+ drivers/gpu/drm/i915/i915_gem_execbuffer.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
++++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+@@ -32,6 +32,7 @@
+ #include "i915_trace.h"
+ #include "intel_drv.h"
+ #include <linux/dma_remapping.h>
++#include <linux/uaccess.h>
+
+ #define __EXEC_OBJECT_HAS_PIN (1<<31)
+ #define __EXEC_OBJECT_HAS_FENCE (1<<30)
+@@ -465,7 +466,7 @@ i915_gem_execbuffer_relocate_entry(struc
+ }
+
+ /* We can't wait for rendering with pagefaults disabled */
+- if (obj->active && in_atomic())
++ if (obj->active && pagefault_disabled())
+ return -EFAULT;
+
+ if (use_cpu_reloc(obj))
diff --git a/patches/0008-futex-UP-futex_atomic_op_inuser-relies-on-disabled-p.patch b/patches/0008-futex-UP-futex_atomic_op_inuser-relies-on-disabled-p.patch
new file mode 100644
index 00000000000000..1701eac2505f87
--- /dev/null
+++ b/patches/0008-futex-UP-futex_atomic_op_inuser-relies-on-disabled-p.patch
@@ -0,0 +1,45 @@
+From: David Hildenbrand <dahi@linux.vnet.ibm.com>
+Date: Mon, 11 May 2015 17:52:13 +0200
+Subject: sched/preempt, futex: Disable preemption in UP futex_atomic_op_inuser() explicitly
+
+Let's explicitly disable/enable preemption in the !CONFIG_SMP version
+of futex_atomic_op_inuser, to prepare for pagefault_disable() not
+touching preemption anymore.
+
+Otherwise we might break mutual exclusion when relying on a get_user()/
+put_user() implementation.
+
+[upstream commit f3dae07e442a8131a5485b6a38db2aa22a7a48cf]
+Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
+---
+ include/asm-generic/futex.h | 5 +++--
+ 1 file changed, 3 insertions(+), 2 deletions(-)
+
+--- a/include/asm-generic/futex.h
++++ b/include/asm-generic/futex.h
+@@ -8,8 +8,7 @@
+ #ifndef CONFIG_SMP
+ /*
+ * The following implementation only for uniprocessor machines.
+- * For UP, it's relies on the fact that pagefault_disable() also disables
+- * preemption to ensure mutual exclusion.
++ * It relies on preempt_disable() ensuring mutual exclusion.
+ *
+ */
+
+@@ -38,6 +37,7 @@ futex_atomic_op_inuser(int encoded_op, u
+ if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
+ oparg = 1 << oparg;
+
++ preempt_disable();
+ pagefault_disable();
+
+ ret = -EFAULT;
+@@ -72,6 +72,7 @@ futex_atomic_op_inuser(int encoded_op, u
+
+ out_pagefault_enable:
+ pagefault_enable();
++ preempt_enable();
+
+ if (ret == 0) {
+ switch (cmp) {
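
The UP argument made in this and the following futex patches, as a self-contained sketch (hypothetical helper, not the kernel's futex code): on a single CPU, disabling preemption is enough to make an emulated read-modify-write of a user word appear atomic, since no other task can run between the load and the store; pagefault_disable() on top keeps the user accesses from sleeping.

#include <linux/errno.h>
#include <linux/preempt.h>
#include <linux/types.h>
#include <linux/uaccess.h>

static int up_futex_add(u32 __user *uaddr, u32 add)
{
	u32 val;
	int ret = -EFAULT;

	preempt_disable();	/* UP mutual exclusion for the RMW below */
	pagefault_disable();	/* the user accesses must not sleep */

	if (__get_user(val, uaddr))
		goto out;
	if (__put_user(val + add, uaddr))
		goto out;
	ret = 0;
out:
	pagefault_enable();
	preempt_enable();
	return ret;
}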
diff --git a/patches/0009-futex-UP-futex_atomic_cmpxchg_inatomic-relies-on-dis.patch b/patches/0009-futex-UP-futex_atomic_cmpxchg_inatomic-relies-on-dis.patch
new file mode 100644
index 00000000000000..ed74ddae816aa5
--- /dev/null
+++ b/patches/0009-futex-UP-futex_atomic_cmpxchg_inatomic-relies-on-dis.patch
@@ -0,0 +1,36 @@
+From: David Hildenbrand <dahi@linux.vnet.ibm.com>
+Date: Mon, 11 May 2015 17:52:14 +0200
+Subject: sched/preempt, futex: Disable preemption in UP futex_atomic_cmpxchg_inatomic() explicitly
+
+Let's explicitly disable/enable preemption in the !CONFIG_SMP version
+of futex_atomic_cmpxchg_inatomic, to prepare for pagefault_disable() not
+touching preemption anymore. This is needed for this function to be
+callable from both, atomic and non-atomic context.
+
+Otherwise we might break mutual exclusion when relying on a get_user()/
+put_user() implementation.
+
+[upstream commit f3dae07e442a8131a5485b6a38db2aa22a7a48cf]
+Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
+---
+ include/asm-generic/futex.h | 2 ++
+ 1 file changed, 2 insertions(+)
+
+--- a/include/asm-generic/futex.h
++++ b/include/asm-generic/futex.h
+@@ -107,6 +107,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval,
+ {
+ u32 val;
+
++ preempt_disable();
+ if (unlikely(get_user(val, uaddr) != 0))
+ return -EFAULT;
+
+@@ -114,6 +115,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval,
+ return -EFAULT;
+
+ *uval = val;
++ preempt_enable();
+
+ return 0;
+ }
diff --git a/patches/0010-arm-futex-UP-futex_atomic_cmpxchg_inatomic-relies-on.patch b/patches/0010-arm-futex-UP-futex_atomic_cmpxchg_inatomic-relies-on.patch
new file mode 100644
index 00000000000000..48aebb2d4152e9
--- /dev/null
+++ b/patches/0010-arm-futex-UP-futex_atomic_cmpxchg_inatomic-relies-on.patch
@@ -0,0 +1,36 @@
+From: David Hildenbrand <dahi@linux.vnet.ibm.com>
+Date: Mon, 11 May 2015 17:52:15 +0200
+Subject: sched/preempt, arm/futex: Disable preemption in UP futex_atomic_cmpxchg_inatomic() explicitly
+
+The !CONFIG_SMP implementation of futex_atomic_cmpxchg_inatomic()
+requires preemption to be disabled to guarantee mutual exclusion.
+Let's make this explicit.
+
+This patch is based on a patch by Sebastian Andrzej Siewior on the
+-rt branch.
+
+[upstream commit 39919b01ae4c1949736b40b79e27178d0c0bc406]
+Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
+---
+ arch/arm/include/asm/futex.h | 3 +++
+ 1 file changed, 3 insertions(+)
+
+--- a/arch/arm/include/asm/futex.h
++++ b/arch/arm/include/asm/futex.h
+@@ -93,6 +93,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval,
+ if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
+ return -EFAULT;
+
++ preempt_disable();
+ __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
+ "1: " TUSER(ldr) " %1, [%4]\n"
+ " teq %1, %2\n"
+@@ -104,6 +105,8 @@ futex_atomic_cmpxchg_inatomic(u32 *uval,
+ : "cc", "memory");
+
+ *uval = val;
++ preempt_enable();
++
+ return ret;
+ }
+
diff --git a/patches/0011-arm-futex-UP-futex_atomic_op_inuser-relies-on-disabl.patch b/patches/0011-arm-futex-UP-futex_atomic_op_inuser-relies-on-disabl.patch
new file mode 100644
index 00000000000000..e837433a78789b
--- /dev/null
+++ b/patches/0011-arm-futex-UP-futex_atomic_op_inuser-relies-on-disabl.patch
@@ -0,0 +1,47 @@
+From: David Hildenbrand <dahi@linux.vnet.ibm.com>
+Date: Mon, 11 May 2015 17:52:16 +0200
+Subject: sched/preempt, arm/futex: Disable preemption in UP futex_atomic_op_inuser() explicitly
+
+The !CONFIG_SMP implementation of futex_atomic_op_inuser() seems to rely
+on disabled preemption to guarantee mutual exclusion.
+
+From commit e589ed23dd27:
+ "For UP it's enough to disable preemption to ensure mutual exclusion..."
+From the code itself:
+ "!SMP, we can work around lack of atomic ops by disabling preemption"
+
+Let's make this explicit, to prepare for pagefault_disable() not
+touching preemption anymore.
+
+[upstream commit 388b0e0adbc98a1b12a077dc92851a3ce016db42]
+Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
+---
+ arch/arm/include/asm/futex.h | 10 ++++++++--
+ 1 file changed, 8 insertions(+), 2 deletions(-)
+
+--- a/arch/arm/include/asm/futex.h
++++ b/arch/arm/include/asm/futex.h
+@@ -127,7 +127,10 @@ futex_atomic_op_inuser (int encoded_op,
+ if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
+ return -EFAULT;
+
+- pagefault_disable(); /* implies preempt_disable() */
++#ifndef CONFIG_SMP
++ preempt_disable();
++#endif
++ pagefault_disable();
+
+ switch (op) {
+ case FUTEX_OP_SET:
+@@ -149,7 +152,10 @@ futex_atomic_op_inuser (int encoded_op,
+ ret = -ENOSYS;
+ }
+
+- pagefault_enable(); /* subsumes preempt_enable() */
++ pagefault_enable();
++#ifndef CONFIG_SMP
++ preempt_enable();
++#endif
+
+ if (!ret) {
+ switch (cmp) {
diff --git a/patches/0012-futex-clarify-that-preemption-doesn-t-have-to-be-dis.patch b/patches/0012-futex-clarify-that-preemption-doesn-t-have-to-be-dis.patch
new file mode 100644
index 00000000000000..42dde8e91c0a0c
--- /dev/null
+++ b/patches/0012-futex-clarify-that-preemption-doesn-t-have-to-be-dis.patch
@@ -0,0 +1,85 @@
+From: David Hildenbrand <dahi@linux.vnet.ibm.com>
+Date: Mon, 11 May 2015 17:52:17 +0200
+Subject: sched/preempt, futex: Update comments to clarify that preemption doesn't have to be disabled
+
+As arm64 and arc have no special implementations for !CONFIG_SMP, mutual
+exclusion doesn't seem to rely on preemption.
+
+Let's make it clear in the comments that preemption doesn't have to be
+disabled when accessing user space in the futex code, so we can remove
+preempt_disable() from pagefault_disable().
+
+[upstream commit 2f09b227eeed4b3a072fe818c82a4c773b778cde]
+Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
+---
+ arch/arc/include/asm/futex.h | 10 +++++-----
+ arch/arm64/include/asm/futex.h | 4 ++--
+ 2 files changed, 7 insertions(+), 7 deletions(-)
+
+--- a/arch/arc/include/asm/futex.h
++++ b/arch/arc/include/asm/futex.h
+@@ -53,7 +53,7 @@ static inline int futex_atomic_op_inuser
+ if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int)))
+ return -EFAULT;
+
+- pagefault_disable(); /* implies preempt_disable() */
++ pagefault_disable();
+
+ switch (op) {
+ case FUTEX_OP_SET:
+@@ -75,7 +75,7 @@ static inline int futex_atomic_op_inuser
+ ret = -ENOSYS;
+ }
+
+- pagefault_enable(); /* subsumes preempt_enable() */
++ pagefault_enable();
+
+ if (!ret) {
+ switch (cmp) {
+@@ -104,7 +104,7 @@ static inline int futex_atomic_op_inuser
+ return ret;
+ }
+
+-/* Compare-xchg with preemption disabled.
++/* Compare-xchg with pagefaults disabled.
+ * Notes:
+ * -Best-Effort: Exchg happens only if compare succeeds.
+ * If compare fails, returns; leaving retry/looping to upper layers
+@@ -121,7 +121,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval,
+ if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int)))
+ return -EFAULT;
+
+- pagefault_disable(); /* implies preempt_disable() */
++ pagefault_disable();
+
+ /* TBD : can use llock/scond */
+ __asm__ __volatile__(
+@@ -142,7 +142,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval,
+ : "r"(oldval), "r"(newval), "r"(uaddr), "ir"(-EFAULT)
+ : "cc", "memory");
+
+- pagefault_enable(); /* subsumes preempt_enable() */
++ pagefault_enable();
+
+ *uval = val;
+ return val;
+--- a/arch/arm64/include/asm/futex.h
++++ b/arch/arm64/include/asm/futex.h
+@@ -58,7 +58,7 @@ futex_atomic_op_inuser (int encoded_op,
+ if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
+ return -EFAULT;
+
+- pagefault_disable(); /* implies preempt_disable() */
++ pagefault_disable();
+
+ switch (op) {
+ case FUTEX_OP_SET:
+@@ -85,7 +85,7 @@ futex_atomic_op_inuser (int encoded_op,
+ ret = -ENOSYS;
+ }
+
+- pagefault_enable(); /* subsumes preempt_enable() */
++ pagefault_enable();
+
+ if (!ret) {
+ switch (cmp) {
diff --git a/patches/0013-mips-properly-lock-access-to-the-fpu.patch b/patches/0013-mips-properly-lock-access-to-the-fpu.patch
new file mode 100644
index 00000000000000..5acfb89ffb0dd8
--- /dev/null
+++ b/patches/0013-mips-properly-lock-access-to-the-fpu.patch
@@ -0,0 +1,33 @@
+From: David Hildenbrand <dahi@linux.vnet.ibm.com>
+Date: Mon, 11 May 2015 17:52:19 +0200
+Subject: sched/preempt, MIPS: Properly lock access to the FPU
+
+Let's always disable preemption and pagefaults when locking the fpu,
+so we can be sure that the owner won't change in between.
+
+This is a preparation for pagefault_disable() not touching preemption
+anymore.
+
+[upstream commit 76deabd1867d6d2895152f31fdec819e3505738b]
+Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
+---
+ arch/mips/kernel/signal-common.h | 9 ++-------
+ 1 file changed, 2 insertions(+), 7 deletions(-)
+
+--- a/arch/mips/kernel/signal-common.h
++++ b/arch/mips/kernel/signal-common.h
+@@ -28,12 +28,7 @@ extern void __user *get_sigframe(struct
+ extern int fpcsr_pending(unsigned int __user *fpcsr);
+
+ /* Make sure we will not lose FPU ownership */
+-#ifdef CONFIG_PREEMPT
+-#define lock_fpu_owner() preempt_disable()
+-#define unlock_fpu_owner() preempt_enable()
+-#else
+-#define lock_fpu_owner() pagefault_disable()
+-#define unlock_fpu_owner() pagefault_enable()
+-#endif
++#define lock_fpu_owner() ({ preempt_disable(); pagefault_disable(); })
++#define unlock_fpu_owner() ({ pagefault_enable(); preempt_enable(); })
+
+ #endif /* __SIGNAL_COMMON_H */
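(Aside: a brief usage sketch of the macros above; the copy helper name is hypothetical. The point is that neither preemption nor a page fault can hand the FPU to another task while the live registers are copied out.)

    lock_fpu_owner();                 /* preempt_disable() + pagefault_disable() */
    err = copy_fp_to_sigframe(sc);    /* hypothetical __put_user()-based copy */
    unlock_fpu_owner();               /* pagefault_enable() + preempt_enable() */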
diff --git a/patches/0014-uaccess-decouple-preemption-from-the-pagefault-logic.patch b/patches/0014-uaccess-decouple-preemption-from-the-pagefault-logic.patch
new file mode 100644
index 00000000000000..e9b148bf5d9687
--- /dev/null
+++ b/patches/0014-uaccess-decouple-preemption-from-the-pagefault-logic.patch
@@ -0,0 +1,60 @@
+From: David Hildenbrand <dahi@linux.vnet.ibm.com>
+Date: Mon, 11 May 2015 17:52:20 +0200
+Subject: sched/preempt, mm/fault: Decouple preemption from the page fault logic
+
+As the fault handlers now all rely on the pagefault_disabled() checks
+and implicit preempt_disable() calls by pagefault_disable() have been
+made explicit, we can completely rely on the pagefault_disabled counter.
+
+So let's no longer touch the preempt count when disabling/enabling
+pagefaults. After a call to pagefault_disable(), pagefault_disabled()
+will return true, but in_atomic() won't.
+
+[upstream commit 8222dbe21e79338de92d5e1956cd1e3994cc9f93]
+Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
+---
+ include/linux/uaccess.h | 16 ++--------------
+ 1 file changed, 2 insertions(+), 14 deletions(-)
+
+--- a/include/linux/uaccess.h
++++ b/include/linux/uaccess.h
+@@ -1,7 +1,6 @@
+ #ifndef __LINUX_UACCESS_H__
+ #define __LINUX_UACCESS_H__
+
+-#include <linux/preempt.h>
+ #include <linux/sched.h>
+ #include <asm/uaccess.h>
+
+@@ -20,17 +19,11 @@ static __always_inline void pagefault_di
+ * These routines enable/disable the pagefault handler. If disabled, it will
+ * not take any locks and go straight to the fixup table.
+ *
+- * We increase the preempt and the pagefault count, to be able to distinguish
+- * whether we run in simple atomic context or in a real pagefault_disable()
+- * context.
+- *
+- * For now, after pagefault_disabled() has been called, we run in atomic
+- * context. User access methods will not sleep.
+- *
++ * User access methods will not sleep when called from a pagefault_disabled()
++ * environment.
+ */
+ static inline void pagefault_disable(void)
+ {
+- preempt_count_inc();
+ pagefault_disabled_inc();
+ /*
+ * make sure to have issued the store before a pagefault
+@@ -47,11 +40,6 @@ static inline void pagefault_enable(void
+ */
+ barrier();
+ pagefault_disabled_dec();
+-#ifndef CONFIG_PREEMPT
+- preempt_count_dec();
+-#else
+- preempt_enable();
+-#endif
+ }
+
+ /*
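(Aside: a small sketch of what the decoupling above enables, with an illustrative function name. pagefault_disable() can now be used from fully preemptible context, e.g. for an opportunistic fast-path copy that falls back to a sleeping copy when the page is not resident.)

    #include <linux/uaccess.h>

    static int copy_opportunistic(void *dst, const void __user *src, size_t len)
    {
            unsigned long left;

            pagefault_disable();                     /* no longer touches preemption */
            left = __copy_from_user_inatomic(dst, src, len);
            pagefault_enable();

            if (!left)
                    return 0;                        /* fast path succeeded */

            /* slow path: may fault and sleep, which is fine here */
            return copy_from_user(dst, src, len) ? -EFAULT : 0;
    }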
diff --git a/patches/ARM-cmpxchg-define-__HAVE_ARCH_CMPXCHG-for-armv6-and.patch b/patches/ARM-cmpxchg-define-__HAVE_ARCH_CMPXCHG-for-armv6-and.patch
new file mode 100644
index 00000000000000..8c3abb11a6189a
--- /dev/null
+++ b/patches/ARM-cmpxchg-define-__HAVE_ARCH_CMPXCHG-for-armv6-and.patch
@@ -0,0 +1,39 @@
+From: Yong Zhang <yong.zhang@windriver.com>
+Date: Thu, 29 Jan 2015 12:56:18 -0600
+Subject: ARM: cmpxchg: define __HAVE_ARCH_CMPXCHG for armv6 and later
+
+Both pi_stress and sigwaittest in rt-tests show a performance gain with
+__HAVE_ARCH_CMPXCHG. Testing result on coretile_express_a9x4:
+
+pi_stress -p 99 --duration=300 (on linux-3.4-rc5; bigger is better)
+ vanilla: Total inversion performed: 5493381
+ patched: Total inversion performed: 5621746
+
+sigwaittest -p 99 -l 100000 (on linux-3.4-rc5-rt6; less is better)
+ 3.4-rc5-rt6: Min 24, Cur 27, Avg 30, Max 98
+ patched: Min 19, Cur 21, Avg 23, Max 96
+
+Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
+Cc: Russell King <rmk+kernel@arm.linux.org.uk>
+Cc: Nicolas Pitre <nico@linaro.org>
+Cc: Will Deacon <will.deacon@arm.com>
+Cc: Catalin Marinas <catalin.marinas@arm.com>
+Cc: Thomas Gleixner <tglx@linutronix.de>
+Cc: linux-arm-kernel@lists.infradead.org
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ arch/arm/include/asm/cmpxchg.h | 2 ++
+ 1 file changed, 2 insertions(+)
+
+--- a/arch/arm/include/asm/cmpxchg.h
++++ b/arch/arm/include/asm/cmpxchg.h
+@@ -129,6 +129,8 @@ static inline unsigned long __xchg(unsig
+
+ #else /* min ARCH >= ARMv6 */
+
++#define __HAVE_ARCH_CMPXCHG 1
++
+ extern void __bad_cmpxchg(volatile void *ptr, int size);
+
+ /*
diff --git a/patches/ARM-enable-irq-in-translation-section-permission-fau.patch b/patches/ARM-enable-irq-in-translation-section-permission-fau.patch
new file mode 100644
index 00000000000000..c1db6bee2360f6
--- /dev/null
+++ b/patches/ARM-enable-irq-in-translation-section-permission-fau.patch
@@ -0,0 +1,85 @@
+From: "Yadi.hu" <yadi.hu@windriver.com>
+Date: Wed, 10 Dec 2014 10:32:09 +0800
+Subject: ARM: enable irq in translation/section permission fault handlers
+
+Probably happens on all ARM, with
+CONFIG_PREEMPT_RT_FULL
+CONFIG_DEBUG_ATOMIC_SLEEP
+
+This simple program....
+
+int main() {
+ *((char*)0xc0001000) = 0;
+};
+
+[ 512.742724] BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
+[ 512.743000] in_atomic(): 0, irqs_disabled(): 128, pid: 994, name: a
+[ 512.743217] INFO: lockdep is turned off.
+[ 512.743360] irq event stamp: 0
+[ 512.743482] hardirqs last enabled at (0): [< (null)>] (null)
+[ 512.743714] hardirqs last disabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
+[ 512.744013] softirqs last enabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
+[ 512.744303] softirqs last disabled at (0): [< (null)>] (null)
+[ 512.744631] [<c041872c>] (unwind_backtrace+0x0/0x104)
+[ 512.745001] [<c09af0c4>] (dump_stack+0x20/0x24)
+[ 512.745355] [<c0462490>] (__might_sleep+0x1dc/0x1e0)
+[ 512.745717] [<c09b6770>] (rt_spin_lock+0x34/0x6c)
+[ 512.746073] [<c0441bf0>] (do_force_sig_info+0x34/0xf0)
+[ 512.746457] [<c0442668>] (force_sig_info+0x18/0x1c)
+[ 512.746829] [<c041d880>] (__do_user_fault+0x9c/0xd8)
+[ 512.747185] [<c041d938>] (do_bad_area+0x7c/0x94)
+[ 512.747536] [<c041d990>] (do_sect_fault+0x40/0x48)
+[ 512.747898] [<c040841c>] (do_DataAbort+0x40/0xa0)
+[ 512.748181] Exception stack(0xecaa1fb0 to 0xecaa1ff8)
+
+0xc0000000 belongs to the kernel address space and a user task is not
+allowed to access it. Under the above conditions the correct result is
+that the test case receives a "segmentation fault" and exits instead of producing the splat above.
+
+The root cause is commit 02fe2845d6a8 ("avoid enabling interrupts in
+prefetch/data abort handlers"): it removes the irq enable block from the
+data abort assembly code and moves it into the page/breakpoint/alignment
+fault handlers instead, but does not enable irqs in the translation/
+section permission fault handlers. The ARM core disables irqs when it
+enters exception/interrupt mode, so unless the kernel re-enables them they
+stay disabled during translation/section permission faults.
+
+We see the above splat because do_force_sig_info is still called with
+IRQs off, and that code eventually does a:
+
+ spin_lock_irqsave(&t->sighand->siglock, flags);
+
+As this is architecture-independent code, and we have not seen any other
+architecture needing the siglock converted to a raw lock, we conclude
+that irqs should be enabled in the ARM translation/section permission
+fault handlers.
+
+
+Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ arch/arm/mm/fault.c | 6 ++++++
+ 1 file changed, 6 insertions(+)
+
+--- a/arch/arm/mm/fault.c
++++ b/arch/arm/mm/fault.c
+@@ -430,6 +430,9 @@ do_translation_fault(unsigned long addr,
+ if (addr < TASK_SIZE)
+ return do_page_fault(addr, fsr, regs);
+
++ if (interrupts_enabled(regs))
++ local_irq_enable();
++
+ if (user_mode(regs))
+ goto bad_area;
+
+@@ -497,6 +500,9 @@ do_translation_fault(unsigned long addr,
+ static int
+ do_sect_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
+ {
++ if (interrupts_enabled(regs))
++ local_irq_enable();
++
+ do_bad_area(addr, fsr, regs);
+ return 0;
+ }
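(Aside, a conceptual sketch only, not the actual rtmutex code: on PREEMPT_RT_FULL a spinlock_t is a sleeping lock, so the siglock acquisition in the signal path effectively does the following, which is why the splat fires once it runs with interrupts still disabled.)

    /* conceptual shape of rt_spin_lock() on PREEMPT_RT_FULL */
    static void rt_spin_lock_sketch(spinlock_t *lock)
    {
            might_sleep();          /* warns when called with irqs disabled */
            /* ... block on the underlying rt_mutex if the lock is contended ... */
    }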
diff --git a/patches/ASoC-Intel-sst-use-instead-of-at-the-of-a-C-statemen.patch b/patches/ASoC-Intel-sst-use-instead-of-at-the-of-a-C-statemen.patch
new file mode 100644
index 00000000000000..53e65cde292712
--- /dev/null
+++ b/patches/ASoC-Intel-sst-use-instead-of-at-the-of-a-C-statemen.patch
@@ -0,0 +1,26 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Thu, 11 Jun 2015 14:17:06 +0200
+Subject: ASoC: Intel: sst: use ; instead of , at the end of a C statement
+
+This was spotted by Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU>
+while he tried to compile a -RT kernel with this driver enabled.
+"make C=2" would also warn about this. This is is based on his patch.
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ sound/soc/intel/atom/sst/sst.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/sound/soc/intel/atom/sst/sst.c
++++ b/sound/soc/intel/atom/sst/sst.c
+@@ -368,8 +368,8 @@ static inline void sst_restore_shim64(st
+ * initialize by FW or driver when firmware is loaded
+ */
+ spin_lock_irqsave(&ctx->ipc_spin_lock, irq_flags);
+- sst_shim_write64(shim, SST_IMRX, shim_regs->imrx),
+- sst_shim_write64(shim, SST_CSR, shim_regs->csr),
++ sst_shim_write64(shim, SST_IMRX, shim_regs->imrx);
++ sst_shim_write64(shim, SST_CSR, shim_regs->csr);
+ spin_unlock_irqrestore(&ctx->ipc_spin_lock, irq_flags);
+ }
+
diff --git a/patches/HACK-printk-drop-the-logbuf_lock-more-often.patch b/patches/HACK-printk-drop-the-logbuf_lock-more-often.patch
new file mode 100644
index 00000000000000..b9da129f8558ee
--- /dev/null
+++ b/patches/HACK-printk-drop-the-logbuf_lock-more-often.patch
@@ -0,0 +1,76 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Thu, 21 Mar 2013 19:01:05 +0100
+Subject: printk: Drop the logbuf_lock more often
+
+The lock is held with irqs off. The latency drops by 500us+ on my ARM boxes
+with a "full" buffer after executing "dmesg" on the shell.
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ kernel/printk/printk.c | 27 ++++++++++++++++++++++++++-
+ 1 file changed, 26 insertions(+), 1 deletion(-)
+
+--- a/kernel/printk/printk.c
++++ b/kernel/printk/printk.c
+@@ -1162,6 +1162,7 @@ static int syslog_print_all(char __user
+ {
+ char *text;
+ int len = 0;
++ int attempts = 0;
+
+ text = kmalloc(LOG_LINE_MAX + PREFIX_MAX, GFP_KERNEL);
+ if (!text)
+@@ -1173,7 +1174,14 @@ static int syslog_print_all(char __user
+ u64 seq;
+ u32 idx;
+ enum log_flags prev;
+-
++ int num_msg;
++try_again:
++ attempts++;
++ if (attempts > 10) {
++ len = -EBUSY;
++ goto out;
++ }
++ num_msg = 0;
+ if (clear_seq < log_first_seq) {
+ /* messages are gone, move to first available one */
+ clear_seq = log_first_seq;
+@@ -1194,6 +1202,14 @@ static int syslog_print_all(char __user
+ prev = msg->flags;
+ idx = log_next(idx);
+ seq++;
++ num_msg++;
++ if (num_msg > 5) {
++ num_msg = 0;
++ raw_spin_unlock_irq(&logbuf_lock);
++ raw_spin_lock_irq(&logbuf_lock);
++ if (clear_seq < log_first_seq)
++ goto try_again;
++ }
+ }
+
+ /* move first record forward until length fits into the buffer */
+@@ -1207,6 +1223,14 @@ static int syslog_print_all(char __user
+ prev = msg->flags;
+ idx = log_next(idx);
+ seq++;
++ num_msg++;
++ if (num_msg > 5) {
++ num_msg = 0;
++ raw_spin_unlock_irq(&logbuf_lock);
++ raw_spin_lock_irq(&logbuf_lock);
++ if (clear_seq < log_first_seq)
++ goto try_again;
++ }
+ }
+
+ /* last message fitting into this dump */
+@@ -1247,6 +1271,7 @@ static int syslog_print_all(char __user
+ clear_seq = log_next_seq;
+ clear_idx = log_next_idx;
+ }
++out:
+ raw_spin_unlock_irq(&logbuf_lock);
+
+ kfree(text);
diff --git a/patches/KVM-lapic-mark-LAPIC-timer-handler-as-irqsafe.patch b/patches/KVM-lapic-mark-LAPIC-timer-handler-as-irqsafe.patch
new file mode 100644
index 00000000000000..875dc538bc9a0a
--- /dev/null
+++ b/patches/KVM-lapic-mark-LAPIC-timer-handler-as-irqsafe.patch
@@ -0,0 +1,100 @@
+From: Marcelo Tosatti <mtosatti@redhat.com>
+Date: Wed, 8 Apr 2015 20:33:25 -0300
+Subject: KVM: lapic: mark LAPIC timer handler as irqsafe
+
+Since the LAPIC timer handler only wakes up a simple waitqueue, it can
+be executed from hardirq context.
+
+Also handle the case where hrtimer_start_expires fails due to -ETIME,
+by injecting the interrupt to the guest immediately.
+
+Reduces average cyclictest latency by 3us.
+
+Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ arch/x86/kvm/lapic.c | 40 +++++++++++++++++++++++++++++++++++++---
+ 1 file changed, 37 insertions(+), 3 deletions(-)
+
+--- a/arch/x86/kvm/lapic.c
++++ b/arch/x86/kvm/lapic.c
+@@ -1167,8 +1167,36 @@ void wait_lapic_expire(struct kvm_vcpu *
+ __delay(tsc_deadline - guest_tsc);
+ }
+
++static enum hrtimer_restart apic_timer_fn(struct hrtimer *data);
++
++static void __apic_timer_expired(struct hrtimer *data)
++{
++ int ret, i = 0;
++ enum hrtimer_restart r;
++ struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
++
++ r = apic_timer_fn(data);
++
++ if (r == HRTIMER_RESTART) {
++ do {
++ ret = hrtimer_start_expires(data, HRTIMER_MODE_ABS);
++ if (ret == -ETIME)
++ hrtimer_add_expires_ns(&ktimer->timer,
++ ktimer->period);
++ i++;
++ } while (ret == -ETIME && i < 10);
++
++ if (ret == -ETIME) {
++ printk_once(KERN_ERR "%s: failed to reprogram timer\n",
++ __func__);
++ WARN_ON_ONCE(1);
++ }
++ }
++}
++
+ static void start_apic_timer(struct kvm_lapic *apic)
+ {
++ int ret;
+ ktime_t now;
+
+ atomic_set(&apic->lapic_timer.pending, 0);
+@@ -1199,9 +1227,11 @@ static void start_apic_timer(struct kvm_
+ }
+ }
+
+- hrtimer_start(&apic->lapic_timer.timer,
++ ret = hrtimer_start(&apic->lapic_timer.timer,
+ ktime_add_ns(now, apic->lapic_timer.period),
+ HRTIMER_MODE_ABS);
++ if (ret == -ETIME)
++ __apic_timer_expired(&apic->lapic_timer.timer);
+
+ apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
+ PRIx64 ", "
+@@ -1233,8 +1263,10 @@ static void start_apic_timer(struct kvm_
+ do_div(ns, this_tsc_khz);
+ expire = ktime_add_ns(now, ns);
+ expire = ktime_sub_ns(expire, lapic_timer_advance_ns);
+- hrtimer_start(&apic->lapic_timer.timer,
++ ret = hrtimer_start(&apic->lapic_timer.timer,
+ expire, HRTIMER_MODE_ABS);
++ if (ret == -ETIME)
++ __apic_timer_expired(&apic->lapic_timer.timer);
+ } else
+ apic_timer_expired(apic);
+
+@@ -1707,6 +1739,7 @@ int kvm_create_lapic(struct kvm_vcpu *vc
+ hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
+ HRTIMER_MODE_ABS);
+ apic->lapic_timer.timer.function = apic_timer_fn;
++ apic->lapic_timer.timer.irqsafe = 1;
+
+ /*
+ * APIC is created enabled. This will prevent kvm_lapic_set_base from
+@@ -1834,7 +1867,8 @@ void __kvm_migrate_apic_timer(struct kvm
+
+ timer = &vcpu->arch.apic->lapic_timer.timer;
+ if (hrtimer_cancel(timer))
+- hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
++ if (hrtimer_start_expires(timer, HRTIMER_MODE_ABS) == -ETIME)
++ __apic_timer_expired(timer);
+ }
+
+ /*
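(Aside: the -rt-specific convention the patch above relies on, in condensed form. The irqsafe field exists only in the RT patch set, and the timer/handler names here are illustrative. A callback that takes no sleeping locks and only wakes a simple waitqueue may be flagged to expire directly in hard interrupt context instead of the hrtimer softirq.)

    hrtimer_init(&my_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
    my_timer.function = my_timer_fn;   /* must not take sleeping locks */
    my_timer.irqsafe = 1;              /* -rt only: expire from hardirq context */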
diff --git a/patches/KVM-use-simple-waitqueue-for-vcpu-wq.patch b/patches/KVM-use-simple-waitqueue-for-vcpu-wq.patch
new file mode 100644
index 00000000000000..9ec1315628bcef
--- /dev/null
+++ b/patches/KVM-use-simple-waitqueue-for-vcpu-wq.patch
@@ -0,0 +1,341 @@
+From: Marcelo Tosatti <mtosatti@redhat.com>
+Date: Wed, 8 Apr 2015 20:33:24 -0300
+Subject: KVM: use simple waitqueue for vcpu->wq
+
+The problem:
+
+On -RT, an emulated LAPIC timer instance has the following path:
+
+1) hard interrupt
+2) ksoftirqd is scheduled
+3) ksoftirqd wakes up vcpu thread
+4) vcpu thread is scheduled
+
+This extra context switch introduces unnecessary latency in the
+LAPIC path for a KVM guest.
+
+The solution:
+
+Allow waking up vcpu thread from hardirq context,
+thus avoiding the need for ksoftirqd to be scheduled.
+
+Normal waitqueues make use of spinlocks, which on -RT
+are sleepable locks. Therefore, waking up a waitqueue
+waiter involves locking a sleeping lock, which
+is not allowed from hard interrupt context.
+
+cyclictest command line:
+# cyclictest -m -n -q -p99 -l 1000000 -h60 -D 1m
+
+This patch reduces the average latency in my tests from 14us to 11us.
+
+Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ arch/arm/kvm/arm.c | 4 ++--
+ arch/arm/kvm/psci.c | 4 ++--
+ arch/powerpc/include/asm/kvm_host.h | 4 ++--
+ arch/powerpc/kvm/book3s_hv.c | 23 +++++++++++------------
+ arch/s390/include/asm/kvm_host.h | 2 +-
+ arch/s390/kvm/interrupt.c | 8 ++++----
+ arch/x86/kvm/lapic.c | 6 +++---
+ include/linux/kvm_host.h | 4 ++--
+ virt/kvm/async_pf.c | 4 ++--
+ virt/kvm/kvm_main.c | 16 ++++++++--------
+ 10 files changed, 37 insertions(+), 38 deletions(-)
+
+--- a/arch/arm/kvm/arm.c
++++ b/arch/arm/kvm/arm.c
+@@ -474,9 +474,9 @@ bool kvm_arch_intc_initialized(struct kv
+
+ static void vcpu_pause(struct kvm_vcpu *vcpu)
+ {
+- wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu);
++ struct swait_head *wq = kvm_arch_vcpu_wq(vcpu);
+
+- wait_event_interruptible(*wq, !vcpu->arch.pause);
++ swait_event_interruptible(*wq, !vcpu->arch.pause);
+ }
+
+ static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
+--- a/arch/arm/kvm/psci.c
++++ b/arch/arm/kvm/psci.c
+@@ -68,7 +68,7 @@ static unsigned long kvm_psci_vcpu_on(st
+ {
+ struct kvm *kvm = source_vcpu->kvm;
+ struct kvm_vcpu *vcpu = NULL;
+- wait_queue_head_t *wq;
++ struct swait_head *wq;
+ unsigned long cpu_id;
+ unsigned long context_id;
+ phys_addr_t target_pc;
+@@ -117,7 +117,7 @@ static unsigned long kvm_psci_vcpu_on(st
+ smp_mb(); /* Make sure the above is visible */
+
+ wq = kvm_arch_vcpu_wq(vcpu);
+- wake_up_interruptible(wq);
++ swait_wake_interruptible(wq);
+
+ return PSCI_RET_SUCCESS;
+ }
+--- a/arch/powerpc/include/asm/kvm_host.h
++++ b/arch/powerpc/include/asm/kvm_host.h
+@@ -280,7 +280,7 @@ struct kvmppc_vcore {
+ u8 in_guest;
+ struct list_head runnable_threads;
+ spinlock_t lock;
+- wait_queue_head_t wq;
++ struct swait_head wq;
+ spinlock_t stoltb_lock; /* protects stolen_tb and preempt_tb */
+ u64 stolen_tb;
+ u64 preempt_tb;
+@@ -613,7 +613,7 @@ struct kvm_vcpu_arch {
+ u8 prodded;
+ u32 last_inst;
+
+- wait_queue_head_t *wqp;
++ struct swait_head *wqp;
+ struct kvmppc_vcore *vcore;
+ int ret;
+ int trap;
+--- a/arch/powerpc/kvm/book3s_hv.c
++++ b/arch/powerpc/kvm/book3s_hv.c
+@@ -115,11 +115,11 @@ static bool kvmppc_ipi_thread(int cpu)
+ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
+ {
+ int cpu = vcpu->cpu;
+- wait_queue_head_t *wqp;
++ struct swait_head *wqp;
+
+ wqp = kvm_arch_vcpu_wq(vcpu);
+- if (waitqueue_active(wqp)) {
+- wake_up_interruptible(wqp);
++ if (swaitqueue_active(wqp)) {
++ swait_wake_interruptible(wqp);
+ ++vcpu->stat.halt_wakeup;
+ }
+
+@@ -686,8 +686,8 @@ int kvmppc_pseries_do_hcall(struct kvm_v
+ tvcpu->arch.prodded = 1;
+ smp_mb();
+ if (vcpu->arch.ceded) {
+- if (waitqueue_active(&vcpu->wq)) {
+- wake_up_interruptible(&vcpu->wq);
++ if (swaitqueue_active(&vcpu->wq)) {
++ swait_wake_interruptible(&vcpu->wq);
+ vcpu->stat.halt_wakeup++;
+ }
+ }
+@@ -1426,7 +1426,7 @@ static struct kvmppc_vcore *kvmppc_vcore
+ INIT_LIST_HEAD(&vcore->runnable_threads);
+ spin_lock_init(&vcore->lock);
+ spin_lock_init(&vcore->stoltb_lock);
+- init_waitqueue_head(&vcore->wq);
++ init_swait_head(&vcore->wq);
+ vcore->preempt_tb = TB_NIL;
+ vcore->lpcr = kvm->arch.lpcr;
+ vcore->first_vcpuid = core * threads_per_subcore;
+@@ -2073,10 +2073,9 @@ static void kvmppc_vcore_blocked(struct
+ {
+ struct kvm_vcpu *vcpu;
+ int do_sleep = 1;
++ DEFINE_SWAITER(wait);
+
+- DEFINE_WAIT(wait);
+-
+- prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
++ swait_prepare(&vc->wq, &wait, TASK_INTERRUPTIBLE);
+
+ /*
+ * Check one last time for pending exceptions and ceded state after
+@@ -2090,7 +2089,7 @@ static void kvmppc_vcore_blocked(struct
+ }
+
+ if (!do_sleep) {
+- finish_wait(&vc->wq, &wait);
++ swait_finish(&vc->wq, &wait);
+ return;
+ }
+
+@@ -2098,7 +2097,7 @@ static void kvmppc_vcore_blocked(struct
+ trace_kvmppc_vcore_blocked(vc, 0);
+ spin_unlock(&vc->lock);
+ schedule();
+- finish_wait(&vc->wq, &wait);
++ swait_finish(&vc->wq, &wait);
+ spin_lock(&vc->lock);
+ vc->vcore_state = VCORE_INACTIVE;
+ trace_kvmppc_vcore_blocked(vc, 1);
+@@ -2142,7 +2141,7 @@ static int kvmppc_run_vcpu(struct kvm_ru
+ kvmppc_start_thread(vcpu);
+ trace_kvm_guest_enter(vcpu);
+ } else if (vc->vcore_state == VCORE_SLEEPING) {
+- wake_up(&vc->wq);
++ swait_wake(&vc->wq);
+ }
+
+ }
+--- a/arch/s390/include/asm/kvm_host.h
++++ b/arch/s390/include/asm/kvm_host.h
+@@ -419,7 +419,7 @@ struct kvm_s390_irq_payload {
+ struct kvm_s390_local_interrupt {
+ spinlock_t lock;
+ struct kvm_s390_float_interrupt *float_int;
+- wait_queue_head_t *wq;
++ struct swait_head *wq;
+ atomic_t *cpuflags;
+ DECLARE_BITMAP(sigp_emerg_pending, KVM_MAX_VCPUS);
+ struct kvm_s390_irq_payload irq;
+--- a/arch/s390/kvm/interrupt.c
++++ b/arch/s390/kvm/interrupt.c
+@@ -875,13 +875,13 @@ int kvm_s390_handle_wait(struct kvm_vcpu
+
+ void kvm_s390_vcpu_wakeup(struct kvm_vcpu *vcpu)
+ {
+- if (waitqueue_active(&vcpu->wq)) {
++ if (swaitqueue_active(&vcpu->wq)) {
+ /*
+ * The vcpu gave up the cpu voluntarily, mark it as a good
+ * yield-candidate.
+ */
+ vcpu->preempted = true;
+- wake_up_interruptible(&vcpu->wq);
++ swait_wake_interruptible(&vcpu->wq);
+ vcpu->stat.halt_wakeup++;
+ }
+ }
+@@ -987,7 +987,7 @@ int kvm_s390_inject_program_int(struct k
+ spin_lock(&li->lock);
+ irq.u.pgm.code = code;
+ __inject_prog(vcpu, &irq);
+- BUG_ON(waitqueue_active(li->wq));
++ BUG_ON(swaitqueue_active(li->wq));
+ spin_unlock(&li->lock);
+ return 0;
+ }
+@@ -1006,7 +1006,7 @@ int kvm_s390_inject_prog_irq(struct kvm_
+ spin_lock(&li->lock);
+ irq.u.pgm = *pgm_info;
+ rc = __inject_prog(vcpu, &irq);
+- BUG_ON(waitqueue_active(li->wq));
++ BUG_ON(swaitqueue_active(li->wq));
+ spin_unlock(&li->lock);
+ return rc;
+ }
+--- a/arch/x86/kvm/lapic.c
++++ b/arch/x86/kvm/lapic.c
+@@ -1104,7 +1104,7 @@ static void apic_update_lvtt(struct kvm_
+ static void apic_timer_expired(struct kvm_lapic *apic)
+ {
+ struct kvm_vcpu *vcpu = apic->vcpu;
+- wait_queue_head_t *q = &vcpu->wq;
++ struct swait_head *q = &vcpu->wq;
+ struct kvm_timer *ktimer = &apic->lapic_timer;
+
+ if (atomic_read(&apic->lapic_timer.pending))
+@@ -1113,8 +1113,8 @@ static void apic_timer_expired(struct kv
+ atomic_inc(&apic->lapic_timer.pending);
+ kvm_set_pending_timer(vcpu);
+
+- if (waitqueue_active(q))
+- wake_up_interruptible(q);
++ if (swaitqueue_active(q))
++ swait_wake_interruptible(q);
+
+ if (apic_lvtt_tscdeadline(apic))
+ ktimer->expired_tscdeadline = ktimer->tscdeadline;
+--- a/include/linux/kvm_host.h
++++ b/include/linux/kvm_host.h
+@@ -230,7 +230,7 @@ struct kvm_vcpu {
+
+ int fpu_active;
+ int guest_fpu_loaded, guest_xcr0_loaded;
+- wait_queue_head_t wq;
++ struct swait_head wq;
+ struct pid *pid;
+ int sigset_active;
+ sigset_t sigset;
+@@ -690,7 +690,7 @@ static inline bool kvm_arch_has_noncoher
+ }
+ #endif
+
+-static inline wait_queue_head_t *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu)
++static inline struct swait_head *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu)
+ {
+ #ifdef __KVM_HAVE_ARCH_WQP
+ return vcpu->arch.wqp;
+--- a/virt/kvm/async_pf.c
++++ b/virt/kvm/async_pf.c
+@@ -94,8 +94,8 @@ static void async_pf_execute(struct work
+
+ trace_kvm_async_pf_completed(addr, gva);
+
+- if (waitqueue_active(&vcpu->wq))
+- wake_up_interruptible(&vcpu->wq);
++ if (swaitqueue_active(&vcpu->wq))
++ swait_wake_interruptible(&vcpu->wq);
+
+ mmput(mm);
+ kvm_put_kvm(vcpu->kvm);
+--- a/virt/kvm/kvm_main.c
++++ b/virt/kvm/kvm_main.c
+@@ -218,7 +218,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu,
+ vcpu->kvm = kvm;
+ vcpu->vcpu_id = id;
+ vcpu->pid = NULL;
+- init_waitqueue_head(&vcpu->wq);
++ init_swait_head(&vcpu->wq);
+ kvm_async_pf_vcpu_init(vcpu);
+
+ page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+@@ -1779,7 +1779,7 @@ static int kvm_vcpu_check_block(struct k
+ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
+ {
+ ktime_t start, cur;
+- DEFINE_WAIT(wait);
++ DEFINE_SWAITER(wait);
+ bool waited = false;
+
+ start = cur = ktime_get();
+@@ -1800,7 +1800,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcp
+ }
+
+ for (;;) {
+- prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
++ swait_prepare(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
+
+ if (kvm_vcpu_check_block(vcpu) < 0)
+ break;
+@@ -1809,7 +1809,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcp
+ schedule();
+ }
+
+- finish_wait(&vcpu->wq, &wait);
++ swait_finish(&vcpu->wq, &wait);
+ cur = ktime_get();
+
+ out:
+@@ -1825,11 +1825,11 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu
+ {
+ int me;
+ int cpu = vcpu->cpu;
+- wait_queue_head_t *wqp;
++ struct swait_head *wqp;
+
+ wqp = kvm_arch_vcpu_wq(vcpu);
+- if (waitqueue_active(wqp)) {
+- wake_up_interruptible(wqp);
++ if (swaitqueue_active(wqp)) {
++ swait_wake_interruptible(wqp);
+ ++vcpu->stat.halt_wakeup;
+ }
+
+@@ -1930,7 +1930,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *m
+ continue;
+ if (vcpu == me)
+ continue;
+- if (waitqueue_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
++ if (swaitqueue_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
+ continue;
+ if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
+ continue;
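(Aside: a minimal sketch of the simple-waitqueue API used in the conversion above. This is the RT queue's wait-simple API, not the later mainline swait.h; the condition flag is illustrative.)

    static struct swait_head wq;
    static bool done;

    init_swait_head(&wq);

    /* waiter side: sleeps until 'done' becomes true */
    swait_event_interruptible(wq, done);

    /* waker side: raw-lock based internally, hence safe from hardirq context */
    done = true;
    swait_wake_interruptible(&wq);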
diff --git a/patches/acpi-rt-Convert-acpi_gbl_hardware-lock-back-to-a-raw.patch b/patches/acpi-rt-Convert-acpi_gbl_hardware-lock-back-to-a-raw.patch
new file mode 100644
index 00000000000000..67dbcae408a4df
--- /dev/null
+++ b/patches/acpi-rt-Convert-acpi_gbl_hardware-lock-back-to-a-raw.patch
@@ -0,0 +1,173 @@
+From: Steven Rostedt <rostedt@goodmis.org>
+Date: Wed, 13 Feb 2013 09:26:05 -0500
+Subject: acpi/rt: Convert acpi_gbl_hardware lock back to a raw_spinlock_t
+
+We hit the following bug with 3.6-rt:
+
+[ 5.898990] BUG: scheduling while atomic: swapper/3/0/0x00000002
+[ 5.898991] no locks held by swapper/3/0.
+[ 5.898993] Modules linked in:
+[ 5.898996] Pid: 0, comm: swapper/3 Not tainted 3.6.11-rt28.19.el6rt.x86_64.debug #1
+[ 5.898997] Call Trace:
+[ 5.899011] [<ffffffff810804e7>] __schedule_bug+0x67/0x90
+[ 5.899028] [<ffffffff81577923>] __schedule+0x793/0x7a0
+[ 5.899032] [<ffffffff810b4e40>] ? debug_rt_mutex_print_deadlock+0x50/0x200
+[ 5.899034] [<ffffffff81577b89>] schedule+0x29/0x70
+[ 5.899036] BUG: scheduling while atomic: swapper/7/0/0x00000002
+[ 5.899037] no locks held by swapper/7/0.
+[ 5.899039] [<ffffffff81578525>] rt_spin_lock_slowlock+0xe5/0x2f0
+[ 5.899040] Modules linked in:
+[ 5.899041]
+[ 5.899045] [<ffffffff81579a58>] ? _raw_spin_unlock_irqrestore+0x38/0x90
+[ 5.899046] Pid: 0, comm: swapper/7 Not tainted 3.6.11-rt28.19.el6rt.x86_64.debug #1
+[ 5.899047] Call Trace:
+[ 5.899049] [<ffffffff81578bc6>] rt_spin_lock+0x16/0x40
+[ 5.899052] [<ffffffff810804e7>] __schedule_bug+0x67/0x90
+[ 5.899054] [<ffffffff8157d3f0>] ? notifier_call_chain+0x80/0x80
+[ 5.899056] [<ffffffff81577923>] __schedule+0x793/0x7a0
+[ 5.899059] [<ffffffff812f2034>] acpi_os_acquire_lock+0x1f/0x23
+[ 5.899062] [<ffffffff810b4e40>] ? debug_rt_mutex_print_deadlock+0x50/0x200
+[ 5.899068] [<ffffffff8130be64>] acpi_write_bit_register+0x33/0xb0
+[ 5.899071] [<ffffffff81577b89>] schedule+0x29/0x70
+[ 5.899072] [<ffffffff8130be13>] ? acpi_read_bit_register+0x33/0x51
+[ 5.899074] [<ffffffff81578525>] rt_spin_lock_slowlock+0xe5/0x2f0
+[ 5.899077] [<ffffffff8131d1fc>] acpi_idle_enter_bm+0x8a/0x28e
+[ 5.899079] [<ffffffff81579a58>] ? _raw_spin_unlock_irqrestore+0x38/0x90
+[ 5.899081] [<ffffffff8107e5da>] ? this_cpu_load+0x1a/0x30
+[ 5.899083] [<ffffffff81578bc6>] rt_spin_lock+0x16/0x40
+[ 5.899087] [<ffffffff8144c759>] cpuidle_enter+0x19/0x20
+[ 5.899088] [<ffffffff8157d3f0>] ? notifier_call_chain+0x80/0x80
+[ 5.899090] [<ffffffff8144c777>] cpuidle_enter_state+0x17/0x50
+[ 5.899092] [<ffffffff812f2034>] acpi_os_acquire_lock+0x1f/0x23
+[ 5.899094] [<ffffffff8144d1a1>] cpuidle899101] [<ffffffff8130be13>] ?
+
+As the acpi code disables interrupts in acpi_idle_enter_bm and calls
+code that grabs the acpi lock, it causes issues because that lock is
+currently a sleeping lock on RT.
+
+The lock was converted from a raw to a sleeping lock due to some
+previous issues, and tests at the time showed it didn't seem to matter.
+Unfortunately, it did matter for one of our boxes.
+
+This patch converts the lock back to a raw lock. I've run this code on a
+few of my own machines, one being my laptop, which uses ACPI quite
+extensively. I've been able to suspend and resume without issues.
+
+[ tglx: Made the change exclusive for acpi_gbl_hardware_lock ]
+
+Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
+Cc: John Kacur <jkacur@gmail.com>
+Cc: Clark Williams <clark@redhat.com>
+Link: http://lkml.kernel.org/r/1360765565.23152.5.camel@gandalf.local.home
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ drivers/acpi/acpica/acglobal.h | 2 +-
+ drivers/acpi/acpica/hwregs.c | 4 ++--
+ drivers/acpi/acpica/hwxface.c | 4 ++--
+ drivers/acpi/acpica/utmutex.c | 4 ++--
+ include/acpi/platform/aclinux.h | 15 +++++++++++++++
+ 5 files changed, 22 insertions(+), 7 deletions(-)
+
+--- a/drivers/acpi/acpica/acglobal.h
++++ b/drivers/acpi/acpica/acglobal.h
+@@ -112,7 +112,7 @@ ACPI_GLOBAL(u8, acpi_gbl_global_lock_pen
+ * interrupt level
+ */
+ ACPI_GLOBAL(acpi_spinlock, acpi_gbl_gpe_lock); /* For GPE data structs and registers */
+-ACPI_GLOBAL(acpi_spinlock, acpi_gbl_hardware_lock); /* For ACPI H/W except GPE registers */
++ACPI_GLOBAL(acpi_raw_spinlock, acpi_gbl_hardware_lock); /* For ACPI H/W except GPE registers */
+ ACPI_GLOBAL(acpi_spinlock, acpi_gbl_reference_count_lock);
+
+ /* Mutex for _OSI support */
+--- a/drivers/acpi/acpica/hwregs.c
++++ b/drivers/acpi/acpica/hwregs.c
+@@ -269,14 +269,14 @@ acpi_status acpi_hw_clear_acpi_status(vo
+ ACPI_BITMASK_ALL_FIXED_STATUS,
+ ACPI_FORMAT_UINT64(acpi_gbl_xpm1a_status.address)));
+
+- lock_flags = acpi_os_acquire_lock(acpi_gbl_hardware_lock);
++ raw_spin_lock_irqsave(acpi_gbl_hardware_lock, lock_flags);
+
+ /* Clear the fixed events in PM1 A/B */
+
+ status = acpi_hw_register_write(ACPI_REGISTER_PM1_STATUS,
+ ACPI_BITMASK_ALL_FIXED_STATUS);
+
+- acpi_os_release_lock(acpi_gbl_hardware_lock, lock_flags);
++ raw_spin_unlock_irqrestore(acpi_gbl_hardware_lock, lock_flags);
+
+ if (ACPI_FAILURE(status)) {
+ goto exit;
+--- a/drivers/acpi/acpica/hwxface.c
++++ b/drivers/acpi/acpica/hwxface.c
+@@ -374,7 +374,7 @@ acpi_status acpi_write_bit_register(u32
+ return_ACPI_STATUS(AE_BAD_PARAMETER);
+ }
+
+- lock_flags = acpi_os_acquire_lock(acpi_gbl_hardware_lock);
++ raw_spin_lock_irqsave(acpi_gbl_hardware_lock, lock_flags);
+
+ /*
+ * At this point, we know that the parent register is one of the
+@@ -435,7 +435,7 @@ acpi_status acpi_write_bit_register(u32
+
+ unlock_and_exit:
+
+- acpi_os_release_lock(acpi_gbl_hardware_lock, lock_flags);
++ raw_spin_unlock_irqrestore(acpi_gbl_hardware_lock, lock_flags);
+ return_ACPI_STATUS(status);
+ }
+
+--- a/drivers/acpi/acpica/utmutex.c
++++ b/drivers/acpi/acpica/utmutex.c
+@@ -88,7 +88,7 @@ acpi_status acpi_ut_mutex_initialize(voi
+ return_ACPI_STATUS (status);
+ }
+
+- status = acpi_os_create_lock (&acpi_gbl_hardware_lock);
++ status = acpi_os_create_raw_lock (&acpi_gbl_hardware_lock);
+ if (ACPI_FAILURE (status)) {
+ return_ACPI_STATUS (status);
+ }
+@@ -141,7 +141,7 @@ void acpi_ut_mutex_terminate(void)
+ /* Delete the spinlocks */
+
+ acpi_os_delete_lock(acpi_gbl_gpe_lock);
+- acpi_os_delete_lock(acpi_gbl_hardware_lock);
++ acpi_os_delete_raw_lock(acpi_gbl_hardware_lock);
+ acpi_os_delete_lock(acpi_gbl_reference_count_lock);
+
+ /* Delete the reader/writer lock */
+--- a/include/acpi/platform/aclinux.h
++++ b/include/acpi/platform/aclinux.h
+@@ -123,6 +123,7 @@
+
+ #define acpi_cache_t struct kmem_cache
+ #define acpi_spinlock spinlock_t *
++#define acpi_raw_spinlock raw_spinlock_t *
+ #define acpi_cpu_flags unsigned long
+
+ /* Use native linux version of acpi_os_allocate_zeroed */
+@@ -141,6 +142,20 @@
+ #define ACPI_USE_ALTERNATE_PROTOTYPE_acpi_os_get_thread_id
+ #define ACPI_USE_ALTERNATE_PROTOTYPE_acpi_os_create_lock
+
++#define acpi_os_create_raw_lock(__handle) \
++({ \
++ raw_spinlock_t *lock = ACPI_ALLOCATE(sizeof(*lock)); \
++ \
++ if (lock) { \
++ *(__handle) = lock; \
++ raw_spin_lock_init(*(__handle)); \
++ } \
++ lock ? AE_OK : AE_NO_MEMORY; \
++ })
++
++#define acpi_os_delete_raw_lock(__handle) kfree(__handle)
++
++
+ /*
+ * OSL interfaces used by debugger/disassembler
+ */
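(Aside: the rule the conversion above applies, as a short sketch with an illustrative lock name. On PREEMPT_RT_FULL a spinlock_t becomes a sleeping rtmutex while a raw_spinlock_t stays a real busy-waiting lock, so a lock taken with interrupts hard-disabled, as in the ACPI idle path, must be the raw variant.)

    static DEFINE_RAW_SPINLOCK(hw_lock);   /* remains a spinning lock on RT */

    unsigned long flags;

    raw_spin_lock_irqsave(&hw_lock, flags);
    /* ... program ACPI hardware registers ... */
    raw_spin_unlock_irqrestore(&hw_lock, flags);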
diff --git a/patches/arch-arm64-Add-lazy-preempt-support.patch b/patches/arch-arm64-Add-lazy-preempt-support.patch
new file mode 100644
index 00000000000000..5952089d3dde01
--- /dev/null
+++ b/patches/arch-arm64-Add-lazy-preempt-support.patch
@@ -0,0 +1,103 @@
+From: Anders Roxell <anders.roxell@linaro.org>
+Date: Thu, 14 May 2015 17:52:17 +0200
+Subject: arch/arm64: Add lazy preempt support
+
+arm64 is missing support for PREEMPT_RT. The main feature which is
+lacking is support for lazy preemption. The arch-specific entry code,
+thread information structure definitions, and associated data tables
+have to be extended to provide this support. Then the Kconfig file has
+to be extended to indicate the support is available, and also to
+indicate that support for full RT preemption is now available.
+
+Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
+---
+ arch/arm64/Kconfig | 1 +
+ arch/arm64/include/asm/thread_info.h | 3 +++
+ arch/arm64/kernel/asm-offsets.c | 1 +
+ arch/arm64/kernel/entry.S | 13 ++++++++++---
+ 4 files changed, 15 insertions(+), 3 deletions(-)
+
+--- a/arch/arm64/Kconfig
++++ b/arch/arm64/Kconfig
+@@ -69,6 +69,7 @@ config ARM64
+ select HAVE_PERF_REGS
+ select HAVE_PERF_USER_STACK_DUMP
+ select HAVE_RCU_TABLE_FREE
++ select HAVE_PREEMPT_LAZY
+ select HAVE_SYSCALL_TRACEPOINTS
+ select IRQ_DOMAIN
+ select IRQ_FORCED_THREADING
+--- a/arch/arm64/include/asm/thread_info.h
++++ b/arch/arm64/include/asm/thread_info.h
+@@ -47,6 +47,7 @@ struct thread_info {
+ mm_segment_t addr_limit; /* address limit */
+ struct task_struct *task; /* main task structure */
+ int preempt_count; /* 0 => preemptable, <0 => bug */
++ int preempt_lazy_count; /* 0 => preemptable, <0 => bug */
+ int cpu; /* cpu */
+ };
+
+@@ -101,6 +102,7 @@ static inline struct thread_info *curren
+ #define TIF_NEED_RESCHED 1
+ #define TIF_NOTIFY_RESUME 2 /* callback before returning to user */
+ #define TIF_FOREIGN_FPSTATE 3 /* CPU's FP state is not current's */
++#define TIF_NEED_RESCHED_LAZY 4
+ #define TIF_NOHZ 7
+ #define TIF_SYSCALL_TRACE 8
+ #define TIF_SYSCALL_AUDIT 9
+@@ -117,6 +119,7 @@ static inline struct thread_info *curren
+ #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
+ #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
+ #define _TIF_FOREIGN_FPSTATE (1 << TIF_FOREIGN_FPSTATE)
++#define _TIF_NEED_RESCHED_LAZY (1 << TIF_NEED_RESCHED_LAZY)
+ #define _TIF_NOHZ (1 << TIF_NOHZ)
+ #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
+ #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
+--- a/arch/arm64/kernel/asm-offsets.c
++++ b/arch/arm64/kernel/asm-offsets.c
+@@ -35,6 +35,7 @@ int main(void)
+ BLANK();
+ DEFINE(TI_FLAGS, offsetof(struct thread_info, flags));
+ DEFINE(TI_PREEMPT, offsetof(struct thread_info, preempt_count));
++ DEFINE(TI_PREEMPT_LAZY, offsetof(struct thread_info, preempt_lazy_count));
+ DEFINE(TI_ADDR_LIMIT, offsetof(struct thread_info, addr_limit));
+ DEFINE(TI_TASK, offsetof(struct thread_info, task));
+ DEFINE(TI_CPU, offsetof(struct thread_info, cpu));
+--- a/arch/arm64/kernel/entry.S
++++ b/arch/arm64/kernel/entry.S
+@@ -367,11 +367,16 @@ ENDPROC(el1_sync)
+ #ifdef CONFIG_PREEMPT
+ get_thread_info tsk
+ ldr w24, [tsk, #TI_PREEMPT] // get preempt count
+- cbnz w24, 1f // preempt count != 0
++ cbnz w24, 2f // preempt count != 0
+ ldr x0, [tsk, #TI_FLAGS] // get flags
+- tbz x0, #TIF_NEED_RESCHED, 1f // needs rescheduling?
+- bl el1_preempt
++ tbnz x0, #TIF_NEED_RESCHED, 1f // needs rescheduling?
++
++ ldr w24, [tsk, #TI_PREEMPT_LAZY] // get preempt lazy count
++ cbnz w24, 2f // preempt lazy count != 0
++ tbz x0, #TIF_NEED_RESCHED_LAZY, 2f // needs rescheduling?
+ 1:
++ bl el1_preempt
++2:
+ #endif
+ #ifdef CONFIG_TRACE_IRQFLAGS
+ bl trace_hardirqs_on
+@@ -385,6 +390,7 @@ ENDPROC(el1_irq)
+ 1: bl preempt_schedule_irq // irq en/disable is done inside
+ ldr x0, [tsk, #TI_FLAGS] // get new tasks TI_FLAGS
+ tbnz x0, #TIF_NEED_RESCHED, 1b // needs rescheduling?
++ tbnz x0, #TIF_NEED_RESCHED_LAZY, 1b // needs rescheduling?
+ ret x24
+ #endif
+
+@@ -621,6 +627,7 @@ ENDPROC(cpu_switch_to)
+ str x0, [sp, #S_X0] // returned x0
+ work_pending:
+ tbnz x1, #TIF_NEED_RESCHED, work_resched
++ tbnz x1, #TIF_NEED_RESCHED_LAZY, work_resched
+ /* TIF_SIGPENDING, TIF_NOTIFY_RESUME or TIF_FOREIGN_FPSTATE case */
+ ldr x2, [sp, #S_PSTATE]
+ mov x0, sp // 'regs'
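(Aside: in C terms the modified el1_irq return path above makes roughly the following decision; this is a conceptual sketch of the assembly, not actual kernel code.)

    if (ti->preempt_count)
            return;                                  /* preemption hard-disabled */
    if (ti->flags & _TIF_NEED_RESCHED)
            el1_preempt();                           /* mandatory reschedule */
    else if (!ti->preempt_lazy_count &&
             (ti->flags & _TIF_NEED_RESCHED_LAZY))
            el1_preempt();                           /* lazy reschedule allowed */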
diff --git a/patches/arm-at91-pit-remove-irq-handler-when-clock-is-unused.patch b/patches/arm-at91-pit-remove-irq-handler-when-clock-is-unused.patch
new file mode 100644
index 00000000000000..7cf425a86288c7
--- /dev/null
+++ b/patches/arm-at91-pit-remove-irq-handler-when-clock-is-unused.patch
@@ -0,0 +1,56 @@
+From: Benedikt Spranger <b.spranger@linutronix.de>
+Date: Sat, 6 Mar 2010 17:47:10 +0100
+Subject: ARM: AT91: PIT: Remove irq handler when clock event is unused
+
+Setup and remove the interrupt handler in clock event mode selection.
+This avoids calling the (shared) interrupt handler when the device is
+not used.
+
+Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+[bigeasy: redo the patch with NR_IRQS_LEGACY, which is probably required
+since commit 8fe82a55 ("ARM: at91: sparse irq support"), included since v3.6.
+Patch based on what Sami Pietikäinen <Sami.Pietikainen@wapice.com> suggested].
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ drivers/clocksource/timer-atmel-pit.c | 4 ++++
+ drivers/clocksource/timer-atmel-st.c | 1 +
+ 2 files changed, 5 insertions(+)
+
+--- a/drivers/clocksource/timer-atmel-pit.c
++++ b/drivers/clocksource/timer-atmel-pit.c
+@@ -90,6 +90,7 @@ static cycle_t read_pit_clk(struct clock
+ return elapsed;
+ }
+
++static struct irqaction at91sam926x_pit_irq;
+ /*
+ * Clockevent device: interrupts every 1/HZ (== pit_cycles * MCK/16)
+ */
+@@ -100,6 +101,8 @@ pit_clkevt_mode(enum clock_event_mode mo
+
+ switch (mode) {
+ case CLOCK_EVT_MODE_PERIODIC:
++ /* Set up irq handler */
++ setup_irq(at91sam926x_pit_irq.irq, &at91sam926x_pit_irq);
+ /* update clocksource counter */
+ data->cnt += data->cycle * PIT_PICNT(pit_read(data->base, AT91_PIT_PIVR));
+ pit_write(data->base, AT91_PIT_MR,
+@@ -113,6 +116,7 @@ pit_clkevt_mode(enum clock_event_mode mo
+ /* disable irq, leaving the clocksource active */
+ pit_write(data->base, AT91_PIT_MR,
+ (data->cycle - 1) | AT91_PIT_PITEN);
++ remove_irq(at91sam926x_pit_irq.irq, &at91sam926x_pit_irq);
+ break;
+ case CLOCK_EVT_MODE_RESUME:
+ break;
+--- a/drivers/clocksource/timer-atmel-st.c
++++ b/drivers/clocksource/timer-atmel-st.c
+@@ -131,6 +131,7 @@ clkevt32k_mode(enum clock_event_mode mod
+ break;
+ case CLOCK_EVT_MODE_SHUTDOWN:
+ case CLOCK_EVT_MODE_UNUSED:
++ remove_irq(NR_IRQS_LEGACY + AT91_ID_SYS, &at91rm9200_timer_irq);
+ case CLOCK_EVT_MODE_RESUME:
+ irqmask = 0;
+ break;
diff --git a/patches/arm-at91-tclib-default-to-tclib-timer-for-rt.patch b/patches/arm-at91-tclib-default-to-tclib-timer-for-rt.patch
new file mode 100644
index 00000000000000..f81c2037b22e4b
--- /dev/null
+++ b/patches/arm-at91-tclib-default-to-tclib-timer-for-rt.patch
@@ -0,0 +1,32 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Sat, 1 May 2010 18:29:35 +0200
+Subject: ARM: at91: tclib: Default to tclib timer for RT
+
+RT is not too happy about the shared timer interrupt in AT91
+devices. Default to tclib timer for RT.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ drivers/misc/Kconfig | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+--- a/drivers/misc/Kconfig
++++ b/drivers/misc/Kconfig
+@@ -54,6 +54,7 @@ config AD525X_DPOT_SPI
+ config ATMEL_TCLIB
+ bool "Atmel AT32/AT91 Timer/Counter Library"
+ depends on (AVR32 || ARCH_AT91)
++ default y if PREEMPT_RT_FULL
+ help
+ Select this if you want a library to allocate the Timer/Counter
+ blocks found on many Atmel processors. This facilitates using
+@@ -86,7 +87,7 @@ config ATMEL_TCB_CLKSRC_BLOCK
+ config ATMEL_TCB_CLKSRC_USE_SLOW_CLOCK
+ bool "TC Block use 32 KiHz clock"
+ depends on ATMEL_TCB_CLKSRC
+- default y
++ default y if !PREEMPT_RT_FULL
+ help
+ Select this to use 32 KiHz base clock rate as TC block clock
+ source for clock events.
diff --git a/patches/arm-convert-boot-lock-to-raw.patch b/patches/arm-convert-boot-lock-to-raw.patch
new file mode 100644
index 00000000000000..aa79108db45fe6
--- /dev/null
+++ b/patches/arm-convert-boot-lock-to-raw.patch
@@ -0,0 +1,465 @@
+From: Frank Rowand <frank.rowand@am.sony.com>
+Date: Mon, 19 Sep 2011 14:51:14 -0700
+Subject: arm: Convert arm boot_lock to raw
+
+The arm boot_lock is used by the secondary processor startup code. The locking
+task is the idle thread, which has idle->sched_class == &idle_sched_class.
+idle_sched_class->enqueue_task == NULL, so if the idle task blocks on the
+lock, the attempt to wake it when the lock becomes available will fail:
+
+try_to_wake_up()
+ ...
+ activate_task()
+ enqueue_task()
+ p->sched_class->enqueue_task(rq, p, flags)
+
+Fix by converting boot_lock to a raw spin lock.
+
+Signed-off-by: Frank Rowand <frank.rowand@am.sony.com>
+Link: http://lkml.kernel.org/r/4E77B952.3010606@am.sony.com
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ arch/arm/mach-exynos/platsmp.c | 12 ++++++------
+ arch/arm/mach-hisi/platmcpm.c | 26 +++++++++++++-------------
+ arch/arm/mach-omap2/omap-smp.c | 10 +++++-----
+ arch/arm/mach-prima2/platsmp.c | 10 +++++-----
+ arch/arm/mach-qcom/platsmp.c | 10 +++++-----
+ arch/arm/mach-spear/platsmp.c | 10 +++++-----
+ arch/arm/mach-sti/platsmp.c | 10 +++++-----
+ arch/arm/mach-ux500/platsmp.c | 10 +++++-----
+ arch/arm/plat-versatile/platsmp.c | 10 +++++-----
+ 9 files changed, 54 insertions(+), 54 deletions(-)
+
+--- a/arch/arm/mach-exynos/platsmp.c
++++ b/arch/arm/mach-exynos/platsmp.c
+@@ -231,7 +231,7 @@ static void __iomem *scu_base_addr(void)
+ return (void __iomem *)(S5P_VA_SCU);
+ }
+
+-static DEFINE_SPINLOCK(boot_lock);
++static DEFINE_RAW_SPINLOCK(boot_lock);
+
+ static void exynos_secondary_init(unsigned int cpu)
+ {
+@@ -244,8 +244,8 @@ static void exynos_secondary_init(unsign
+ /*
+ * Synchronise with the boot thread.
+ */
+- spin_lock(&boot_lock);
+- spin_unlock(&boot_lock);
++ raw_spin_lock(&boot_lock);
++ raw_spin_unlock(&boot_lock);
+ }
+
+ static int exynos_boot_secondary(unsigned int cpu, struct task_struct *idle)
+@@ -259,7 +259,7 @@ static int exynos_boot_secondary(unsigne
+ * Set synchronisation state between this boot processor
+ * and the secondary one
+ */
+- spin_lock(&boot_lock);
++ raw_spin_lock(&boot_lock);
+
+ /*
+ * The secondary processor is waiting to be released from
+@@ -286,7 +286,7 @@ static int exynos_boot_secondary(unsigne
+
+ if (timeout == 0) {
+ printk(KERN_ERR "cpu1 power enable failed");
+- spin_unlock(&boot_lock);
++ raw_spin_unlock(&boot_lock);
+ return -ETIMEDOUT;
+ }
+ }
+@@ -342,7 +342,7 @@ static int exynos_boot_secondary(unsigne
+ * calibrations, then wait for it to finish
+ */
+ fail:
+- spin_unlock(&boot_lock);
++ raw_spin_unlock(&boot_lock);
+
+ return pen_release != -1 ? ret : 0;
+ }
+--- a/arch/arm/mach-hisi/platmcpm.c
++++ b/arch/arm/mach-hisi/platmcpm.c
+@@ -57,7 +57,7 @@
+
+ static void __iomem *sysctrl, *fabric;
+ static int hip04_cpu_table[HIP04_MAX_CLUSTERS][HIP04_MAX_CPUS_PER_CLUSTER];
+-static DEFINE_SPINLOCK(boot_lock);
++static DEFINE_RAW_SPINLOCK(boot_lock);
+ static u32 fabric_phys_addr;
+ /*
+ * [0]: bootwrapper physical address
+@@ -104,7 +104,7 @@ static int hip04_mcpm_power_up(unsigned
+ if (cluster >= HIP04_MAX_CLUSTERS || cpu >= HIP04_MAX_CPUS_PER_CLUSTER)
+ return -EINVAL;
+
+- spin_lock_irq(&boot_lock);
++ raw_spin_lock_irq(&boot_lock);
+
+ if (hip04_cpu_table[cluster][cpu])
+ goto out;
+@@ -133,7 +133,7 @@ static int hip04_mcpm_power_up(unsigned
+ udelay(20);
+ out:
+ hip04_cpu_table[cluster][cpu]++;
+- spin_unlock_irq(&boot_lock);
++ raw_spin_unlock_irq(&boot_lock);
+
+ return 0;
+ }
+@@ -149,7 +149,7 @@ static void hip04_mcpm_power_down(void)
+
+ __mcpm_cpu_going_down(cpu, cluster);
+
+- spin_lock(&boot_lock);
++ raw_spin_lock(&boot_lock);
+ BUG_ON(__mcpm_cluster_state(cluster) != CLUSTER_UP);
+ hip04_cpu_table[cluster][cpu]--;
+ if (hip04_cpu_table[cluster][cpu] == 1) {
+@@ -162,7 +162,7 @@ static void hip04_mcpm_power_down(void)
+
+ last_man = hip04_cluster_is_down(cluster);
+ if (last_man && __mcpm_outbound_enter_critical(cpu, cluster)) {
+- spin_unlock(&boot_lock);
++ raw_spin_unlock(&boot_lock);
+ /* Since it's Cortex A15, disable L2 prefetching. */
+ asm volatile(
+ "mcr p15, 1, %0, c15, c0, 3 \n\t"
+@@ -173,7 +173,7 @@ static void hip04_mcpm_power_down(void)
+ hip04_set_snoop_filter(cluster, 0);
+ __mcpm_outbound_leave_critical(cluster, CLUSTER_DOWN);
+ } else {
+- spin_unlock(&boot_lock);
++ raw_spin_unlock(&boot_lock);
+ v7_exit_coherency_flush(louis);
+ }
+
+@@ -192,7 +192,7 @@ static int hip04_mcpm_wait_for_powerdown
+ cpu >= HIP04_MAX_CPUS_PER_CLUSTER);
+
+ count = TIMEOUT_MSEC / POLL_MSEC;
+- spin_lock_irq(&boot_lock);
++ raw_spin_lock_irq(&boot_lock);
+ for (tries = 0; tries < count; tries++) {
+ if (hip04_cpu_table[cluster][cpu]) {
+ ret = -EBUSY;
+@@ -202,10 +202,10 @@ static int hip04_mcpm_wait_for_powerdown
+ data = readl_relaxed(sysctrl + SC_CPU_RESET_STATUS(cluster));
+ if (data & CORE_WFI_STATUS(cpu))
+ break;
+- spin_unlock_irq(&boot_lock);
++ raw_spin_unlock_irq(&boot_lock);
+ /* Wait for clean L2 when the whole cluster is down. */
+ msleep(POLL_MSEC);
+- spin_lock_irq(&boot_lock);
++ raw_spin_lock_irq(&boot_lock);
+ }
+ if (tries >= count)
+ goto err;
+@@ -220,10 +220,10 @@ static int hip04_mcpm_wait_for_powerdown
+ }
+ if (tries >= count)
+ goto err;
+- spin_unlock_irq(&boot_lock);
++ raw_spin_unlock_irq(&boot_lock);
+ return 0;
+ err:
+- spin_unlock_irq(&boot_lock);
++ raw_spin_unlock_irq(&boot_lock);
+ return ret;
+ }
+
+@@ -235,10 +235,10 @@ static void hip04_mcpm_powered_up(void)
+ cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
+ cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
+
+- spin_lock(&boot_lock);
++ raw_spin_lock(&boot_lock);
+ if (!hip04_cpu_table[cluster][cpu])
+ hip04_cpu_table[cluster][cpu] = 1;
+- spin_unlock(&boot_lock);
++ raw_spin_unlock(&boot_lock);
+ }
+
+ static void __naked hip04_mcpm_power_up_setup(unsigned int affinity_level)
+--- a/arch/arm/mach-omap2/omap-smp.c
++++ b/arch/arm/mach-omap2/omap-smp.c
+@@ -43,7 +43,7 @@
+ /* SCU base address */
+ static void __iomem *scu_base;
+
+-static DEFINE_SPINLOCK(boot_lock);
++static DEFINE_RAW_SPINLOCK(boot_lock);
+
+ void __iomem *omap4_get_scu_base(void)
+ {
+@@ -74,8 +74,8 @@ static void omap4_secondary_init(unsigne
+ /*
+ * Synchronise with the boot thread.
+ */
+- spin_lock(&boot_lock);
+- spin_unlock(&boot_lock);
++ raw_spin_lock(&boot_lock);
++ raw_spin_unlock(&boot_lock);
+ }
+
+ static int omap4_boot_secondary(unsigned int cpu, struct task_struct *idle)
+@@ -89,7 +89,7 @@ static int omap4_boot_secondary(unsigned
+ * Set synchronisation state between this boot processor
+ * and the secondary one
+ */
+- spin_lock(&boot_lock);
++ raw_spin_lock(&boot_lock);
+
+ /*
+ * Update the AuxCoreBoot0 with boot state for secondary core.
+@@ -166,7 +166,7 @@ static int omap4_boot_secondary(unsigned
+ * Now the secondary core is starting up let it run its
+ * calibrations, then wait for it to finish
+ */
+- spin_unlock(&boot_lock);
++ raw_spin_unlock(&boot_lock);
+
+ return 0;
+ }
+--- a/arch/arm/mach-prima2/platsmp.c
++++ b/arch/arm/mach-prima2/platsmp.c
+@@ -22,7 +22,7 @@
+
+ static void __iomem *clk_base;
+
+-static DEFINE_SPINLOCK(boot_lock);
++static DEFINE_RAW_SPINLOCK(boot_lock);
+
+ static void sirfsoc_secondary_init(unsigned int cpu)
+ {
+@@ -36,8 +36,8 @@ static void sirfsoc_secondary_init(unsig
+ /*
+ * Synchronise with the boot thread.
+ */
+- spin_lock(&boot_lock);
+- spin_unlock(&boot_lock);
++ raw_spin_lock(&boot_lock);
++ raw_spin_unlock(&boot_lock);
+ }
+
+ static const struct of_device_id clk_ids[] = {
+@@ -75,7 +75,7 @@ static int sirfsoc_boot_secondary(unsign
+ /* make sure write buffer is drained */
+ mb();
+
+- spin_lock(&boot_lock);
++ raw_spin_lock(&boot_lock);
+
+ /*
+ * The secondary processor is waiting to be released from
+@@ -107,7 +107,7 @@ static int sirfsoc_boot_secondary(unsign
+ * now the secondary core is starting up let it run its
+ * calibrations, then wait for it to finish
+ */
+- spin_unlock(&boot_lock);
++ raw_spin_unlock(&boot_lock);
+
+ return pen_release != -1 ? -ENOSYS : 0;
+ }
+--- a/arch/arm/mach-qcom/platsmp.c
++++ b/arch/arm/mach-qcom/platsmp.c
+@@ -46,7 +46,7 @@
+
+ extern void secondary_startup_arm(void);
+
+-static DEFINE_SPINLOCK(boot_lock);
++static DEFINE_RAW_SPINLOCK(boot_lock);
+
+ #ifdef CONFIG_HOTPLUG_CPU
+ static void __ref qcom_cpu_die(unsigned int cpu)
+@@ -60,8 +60,8 @@ static void qcom_secondary_init(unsigned
+ /*
+ * Synchronise with the boot thread.
+ */
+- spin_lock(&boot_lock);
+- spin_unlock(&boot_lock);
++ raw_spin_lock(&boot_lock);
++ raw_spin_unlock(&boot_lock);
+ }
+
+ static int scss_release_secondary(unsigned int cpu)
+@@ -284,7 +284,7 @@ static int qcom_boot_secondary(unsigned
+ * set synchronisation state between this boot processor
+ * and the secondary one
+ */
+- spin_lock(&boot_lock);
++ raw_spin_lock(&boot_lock);
+
+ /*
+ * Send the secondary CPU a soft interrupt, thereby causing
+@@ -297,7 +297,7 @@ static int qcom_boot_secondary(unsigned
+ * now the secondary core is starting up let it run its
+ * calibrations, then wait for it to finish
+ */
+- spin_unlock(&boot_lock);
++ raw_spin_unlock(&boot_lock);
+
+ return ret;
+ }
+--- a/arch/arm/mach-spear/platsmp.c
++++ b/arch/arm/mach-spear/platsmp.c
+@@ -32,7 +32,7 @@ static void write_pen_release(int val)
+ sync_cache_w(&pen_release);
+ }
+
+-static DEFINE_SPINLOCK(boot_lock);
++static DEFINE_RAW_SPINLOCK(boot_lock);
+
+ static void __iomem *scu_base = IOMEM(VA_SCU_BASE);
+
+@@ -47,8 +47,8 @@ static void spear13xx_secondary_init(uns
+ /*
+ * Synchronise with the boot thread.
+ */
+- spin_lock(&boot_lock);
+- spin_unlock(&boot_lock);
++ raw_spin_lock(&boot_lock);
++ raw_spin_unlock(&boot_lock);
+ }
+
+ static int spear13xx_boot_secondary(unsigned int cpu, struct task_struct *idle)
+@@ -59,7 +59,7 @@ static int spear13xx_boot_secondary(unsi
+ * set synchronisation state between this boot processor
+ * and the secondary one
+ */
+- spin_lock(&boot_lock);
++ raw_spin_lock(&boot_lock);
+
+ /*
+ * The secondary processor is waiting to be released from
+@@ -84,7 +84,7 @@ static int spear13xx_boot_secondary(unsi
+ * now the secondary core is starting up let it run its
+ * calibrations, then wait for it to finish
+ */
+- spin_unlock(&boot_lock);
++ raw_spin_unlock(&boot_lock);
+
+ return pen_release != -1 ? -ENOSYS : 0;
+ }
+--- a/arch/arm/mach-sti/platsmp.c
++++ b/arch/arm/mach-sti/platsmp.c
+@@ -34,7 +34,7 @@ static void write_pen_release(int val)
+ sync_cache_w(&pen_release);
+ }
+
+-static DEFINE_SPINLOCK(boot_lock);
++static DEFINE_RAW_SPINLOCK(boot_lock);
+
+ static void sti_secondary_init(unsigned int cpu)
+ {
+@@ -49,8 +49,8 @@ static void sti_secondary_init(unsigned
+ /*
+ * Synchronise with the boot thread.
+ */
+- spin_lock(&boot_lock);
+- spin_unlock(&boot_lock);
++ raw_spin_lock(&boot_lock);
++ raw_spin_unlock(&boot_lock);
+ }
+
+ static int sti_boot_secondary(unsigned int cpu, struct task_struct *idle)
+@@ -61,7 +61,7 @@ static int sti_boot_secondary(unsigned i
+ * set synchronisation state between this boot processor
+ * and the secondary one
+ */
+- spin_lock(&boot_lock);
++ raw_spin_lock(&boot_lock);
+
+ /*
+ * The secondary processor is waiting to be released from
+@@ -92,7 +92,7 @@ static int sti_boot_secondary(unsigned i
+ * now the secondary core is starting up let it run its
+ * calibrations, then wait for it to finish
+ */
+- spin_unlock(&boot_lock);
++ raw_spin_unlock(&boot_lock);
+
+ return pen_release != -1 ? -ENOSYS : 0;
+ }
+--- a/arch/arm/mach-ux500/platsmp.c
++++ b/arch/arm/mach-ux500/platsmp.c
+@@ -51,7 +51,7 @@ static void __iomem *scu_base_addr(void)
+ return NULL;
+ }
+
+-static DEFINE_SPINLOCK(boot_lock);
++static DEFINE_RAW_SPINLOCK(boot_lock);
+
+ static void ux500_secondary_init(unsigned int cpu)
+ {
+@@ -64,8 +64,8 @@ static void ux500_secondary_init(unsigne
+ /*
+ * Synchronise with the boot thread.
+ */
+- spin_lock(&boot_lock);
+- spin_unlock(&boot_lock);
++ raw_spin_lock(&boot_lock);
++ raw_spin_unlock(&boot_lock);
+ }
+
+ static int ux500_boot_secondary(unsigned int cpu, struct task_struct *idle)
+@@ -76,7 +76,7 @@ static int ux500_boot_secondary(unsigned
+ * set synchronisation state between this boot processor
+ * and the secondary one
+ */
+- spin_lock(&boot_lock);
++ raw_spin_lock(&boot_lock);
+
+ /*
+ * The secondary processor is waiting to be released from
+@@ -97,7 +97,7 @@ static int ux500_boot_secondary(unsigned
+ * now the secondary core is starting up let it run its
+ * calibrations, then wait for it to finish
+ */
+- spin_unlock(&boot_lock);
++ raw_spin_unlock(&boot_lock);
+
+ return pen_release != -1 ? -ENOSYS : 0;
+ }
+--- a/arch/arm/plat-versatile/platsmp.c
++++ b/arch/arm/plat-versatile/platsmp.c
+@@ -30,7 +30,7 @@ static void write_pen_release(int val)
+ sync_cache_w(&pen_release);
+ }
+
+-static DEFINE_SPINLOCK(boot_lock);
++static DEFINE_RAW_SPINLOCK(boot_lock);
+
+ void versatile_secondary_init(unsigned int cpu)
+ {
+@@ -43,8 +43,8 @@ void versatile_secondary_init(unsigned i
+ /*
+ * Synchronise with the boot thread.
+ */
+- spin_lock(&boot_lock);
+- spin_unlock(&boot_lock);
++ raw_spin_lock(&boot_lock);
++ raw_spin_unlock(&boot_lock);
+ }
+
+ int versatile_boot_secondary(unsigned int cpu, struct task_struct *idle)
+@@ -55,7 +55,7 @@ int versatile_boot_secondary(unsigned in
+ * Set synchronisation state between this boot processor
+ * and the secondary one
+ */
+- spin_lock(&boot_lock);
++ raw_spin_lock(&boot_lock);
+
+ /*
+ * This is really belt and braces; we hold unintended secondary
+@@ -85,7 +85,7 @@ int versatile_boot_secondary(unsigned in
+ * now the secondary core is starting up let it run its
+ * calibrations, then wait for it to finish
+ */
+- spin_unlock(&boot_lock);
++ raw_spin_unlock(&boot_lock);
+
+ return pen_release != -1 ? -ENOSYS : 0;
+ }
diff --git a/patches/arm-enable-highmem-for-rt.patch b/patches/arm-enable-highmem-for-rt.patch
new file mode 100644
index 00000000000000..fe8d45a62817d2
--- /dev/null
+++ b/patches/arm-enable-highmem-for-rt.patch
@@ -0,0 +1,148 @@
+Subject: arm: Enable highmem for rt
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 13 Feb 2013 11:03:11 +0100
+
+Fix up highmem for ARM: with kmap_atomic() no longer disabling
+preemption on RT, the per-task atomic kmaps have to be saved and
+restored across context switches (switch_kmaps()).
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ arch/arm/include/asm/switch_to.h | 8 ++++++
+ arch/arm/mm/highmem.c | 46 ++++++++++++++++++++++++++++++++++-----
+ include/linux/highmem.h | 1
+ 3 files changed, 50 insertions(+), 5 deletions(-)
+
+--- a/arch/arm/include/asm/switch_to.h
++++ b/arch/arm/include/asm/switch_to.h
+@@ -3,6 +3,13 @@
+
+ #include <linux/thread_info.h>
+
++#if defined CONFIG_PREEMPT_RT_FULL && defined CONFIG_HIGHMEM
++void switch_kmaps(struct task_struct *prev_p, struct task_struct *next_p);
++#else
++static inline void
++switch_kmaps(struct task_struct *prev_p, struct task_struct *next_p) { }
++#endif
++
+ /*
+ * For v7 SMP cores running a preemptible kernel we may be pre-empted
+ * during a TLB maintenance operation, so execute an inner-shareable dsb
+@@ -22,6 +29,7 @@ extern struct task_struct *__switch_to(s
+
+ #define switch_to(prev,next,last) \
+ do { \
++ switch_kmaps(prev, next); \
+ last = __switch_to(prev,task_thread_info(prev), task_thread_info(next)); \
+ } while (0)
+
+--- a/arch/arm/mm/highmem.c
++++ b/arch/arm/mm/highmem.c
+@@ -54,12 +54,13 @@ EXPORT_SYMBOL(kunmap);
+
+ void *kmap_atomic(struct page *page)
+ {
++ pte_t pte = mk_pte(page, kmap_prot);
+ unsigned int idx;
+ unsigned long vaddr;
+ void *kmap;
+ int type;
+
+- preempt_disable();
++ preempt_disable_nort();
+ pagefault_disable();
+ if (!PageHighMem(page))
+ return page_address(page);
+@@ -93,7 +94,10 @@ void *kmap_atomic(struct page *page)
+ * in place, so the contained TLB flush ensures the TLB is updated
+ * with the new mapping.
+ */
+- set_fixmap_pte(idx, mk_pte(page, kmap_prot));
++#ifdef CONFIG_PREEMPT_RT_FULL
++ current->kmap_pte[type] = pte;
++#endif
++ set_fixmap_pte(idx, pte);
+
+ return (void *)vaddr;
+ }
+@@ -110,6 +114,9 @@ void __kunmap_atomic(void *kvaddr)
+
+ if (cache_is_vivt())
+ __cpuc_flush_dcache_area((void *)vaddr, PAGE_SIZE);
++#ifdef CONFIG_PREEMPT_RT_FULL
++ current->kmap_pte[type] = __pte(0);
++#endif
+ #ifdef CONFIG_DEBUG_HIGHMEM
+ BUG_ON(vaddr != __fix_to_virt(idx));
+ #else
+@@ -122,17 +129,18 @@ void __kunmap_atomic(void *kvaddr)
+ kunmap_high(pte_page(pkmap_page_table[PKMAP_NR(vaddr)]));
+ }
+ pagefault_enable();
+- preempt_enable();
++ preempt_enable_nort();
+ }
+ EXPORT_SYMBOL(__kunmap_atomic);
+
+ void *kmap_atomic_pfn(unsigned long pfn)
+ {
++ pte_t pte = pfn_pte(pfn, kmap_prot);
+ unsigned long vaddr;
+ int idx, type;
+ struct page *page = pfn_to_page(pfn);
+
+- preempt_disable();
++ preempt_disable_nort();
+ pagefault_disable();
+ if (!PageHighMem(page))
+ return page_address(page);
+@@ -143,7 +151,10 @@ void *kmap_atomic_pfn(unsigned long pfn)
+ #ifdef CONFIG_DEBUG_HIGHMEM
+ BUG_ON(!pte_none(get_fixmap_pte(vaddr)));
+ #endif
+- set_fixmap_pte(idx, pfn_pte(pfn, kmap_prot));
++#ifdef CONFIG_PREEMPT_RT_FULL
++ current->kmap_pte[type] = pte;
++#endif
++ set_fixmap_pte(idx, pte);
+
+ return (void *)vaddr;
+ }
+@@ -157,3 +168,28 @@ struct page *kmap_atomic_to_page(const v
+
+ return pte_page(get_fixmap_pte(vaddr));
+ }
++
++#if defined CONFIG_PREEMPT_RT_FULL
++void switch_kmaps(struct task_struct *prev_p, struct task_struct *next_p)
++{
++ int i;
++
++ /*
++ * Clear @prev's kmap_atomic mappings
++ */
++ for (i = 0; i < prev_p->kmap_idx; i++) {
++ int idx = i + KM_TYPE_NR * smp_processor_id();
++
++ set_fixmap_pte(idx, __pte(0));
++ }
++ /*
++ * Restore @next_p's kmap_atomic mappings
++ */
++ for (i = 0; i < next_p->kmap_idx; i++) {
++ int idx = i + KM_TYPE_NR * smp_processor_id();
++
++ if (!pte_none(next_p->kmap_pte[i]))
++ set_fixmap_pte(idx, next_p->kmap_pte[i]);
++ }
++}
++#endif
+--- a/include/linux/highmem.h
++++ b/include/linux/highmem.h
+@@ -7,6 +7,7 @@
+ #include <linux/mm.h>
+ #include <linux/uaccess.h>
+ #include <linux/hardirq.h>
++#include <linux/sched.h>
+
+ #include <asm/cacheflush.h>
+
diff --git a/patches/arm-highmem-flush-tlb-on-unmap.patch b/patches/arm-highmem-flush-tlb-on-unmap.patch
new file mode 100644
index 00000000000000..08e17cab7ea81f
--- /dev/null
+++ b/patches/arm-highmem-flush-tlb-on-unmap.patch
@@ -0,0 +1,27 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Mon, 11 Mar 2013 21:37:27 +0100
+Subject: arm/highmem: Flush tlb on unmap
+
+The TLB should be flushed on unmap, thus invalidating the mapping
+entry. Currently this is done only in the non-debug case, which does
+not look right.
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ arch/arm/mm/highmem.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/arch/arm/mm/highmem.c
++++ b/arch/arm/mm/highmem.c
+@@ -112,10 +112,10 @@ void __kunmap_atomic(void *kvaddr)
+ __cpuc_flush_dcache_area((void *)vaddr, PAGE_SIZE);
+ #ifdef CONFIG_DEBUG_HIGHMEM
+ BUG_ON(vaddr != __fix_to_virt(idx));
+- set_fixmap_pte(idx, __pte(0));
+ #else
+ (void) idx; /* to kill a warning */
+ #endif
++ set_fixmap_pte(idx, __pte(0));
+ kmap_atomic_idx_pop();
+ } else if (vaddr >= PKMAP_ADDR(0) && vaddr < PKMAP_ADDR(LAST_PKMAP)) {
+ /* this address was obtained through kmap_high_get() */
diff --git a/patches/arm-preempt-lazy-support.patch b/patches/arm-preempt-lazy-support.patch
new file mode 100644
index 00000000000000..0cf8b5d539d590
--- /dev/null
+++ b/patches/arm-preempt-lazy-support.patch
@@ -0,0 +1,105 @@
+Subject: arm: Add support for lazy preemption
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 31 Oct 2012 12:04:11 +0100
+
+Implement the arm pieces for lazy preempt.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ arch/arm/Kconfig | 1 +
+ arch/arm/include/asm/thread_info.h | 3 +++
+ arch/arm/kernel/asm-offsets.c | 1 +
+ arch/arm/kernel/entry-armv.S | 13 +++++++++++--
+ arch/arm/kernel/signal.c | 3 ++-
+ 5 files changed, 18 insertions(+), 3 deletions(-)
+
+--- a/arch/arm/Kconfig
++++ b/arch/arm/Kconfig
+@@ -66,6 +66,7 @@ config ARM
+ select HAVE_PERF_EVENTS
+ select HAVE_PERF_REGS
+ select HAVE_PERF_USER_STACK_DUMP
++ select HAVE_PREEMPT_LAZY
+ select HAVE_RCU_TABLE_FREE if (SMP && ARM_LPAE)
+ select HAVE_REGS_AND_STACK_ACCESS_API
+ select HAVE_SYSCALL_TRACEPOINTS
+--- a/arch/arm/include/asm/thread_info.h
++++ b/arch/arm/include/asm/thread_info.h
+@@ -50,6 +50,7 @@ struct cpu_context_save {
+ struct thread_info {
+ unsigned long flags; /* low level flags */
+ int preempt_count; /* 0 => preemptable, <0 => bug */
++ int preempt_lazy_count; /* 0 => preemptable, <0 => bug */
+ mm_segment_t addr_limit; /* address limit */
+ struct task_struct *task; /* main task structure */
+ __u32 cpu; /* cpu */
+@@ -147,6 +148,7 @@ extern int vfp_restore_user_hwstate(stru
+ #define TIF_SIGPENDING 0
+ #define TIF_NEED_RESCHED 1
+ #define TIF_NOTIFY_RESUME 2 /* callback before returning to user */
++#define TIF_NEED_RESCHED_LAZY 3
+ #define TIF_UPROBE 7
+ #define TIF_SYSCALL_TRACE 8
+ #define TIF_SYSCALL_AUDIT 9
+@@ -160,6 +162,7 @@ extern int vfp_restore_user_hwstate(stru
+ #define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
+ #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
+ #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
++#define _TIF_NEED_RESCHED_LAZY (1 << TIF_NEED_RESCHED_LAZY)
+ #define _TIF_UPROBE (1 << TIF_UPROBE)
+ #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
+ #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
+--- a/arch/arm/kernel/asm-offsets.c
++++ b/arch/arm/kernel/asm-offsets.c
+@@ -65,6 +65,7 @@ int main(void)
+ BLANK();
+ DEFINE(TI_FLAGS, offsetof(struct thread_info, flags));
+ DEFINE(TI_PREEMPT, offsetof(struct thread_info, preempt_count));
++ DEFINE(TI_PREEMPT_LAZY, offsetof(struct thread_info, preempt_lazy_count));
+ DEFINE(TI_ADDR_LIMIT, offsetof(struct thread_info, addr_limit));
+ DEFINE(TI_TASK, offsetof(struct thread_info, task));
+ DEFINE(TI_CPU, offsetof(struct thread_info, cpu));
+--- a/arch/arm/kernel/entry-armv.S
++++ b/arch/arm/kernel/entry-armv.S
+@@ -208,11 +208,18 @@ ENDPROC(__dabt_svc)
+ #ifdef CONFIG_PREEMPT
+ get_thread_info tsk
+ ldr r8, [tsk, #TI_PREEMPT] @ get preempt count
+- ldr r0, [tsk, #TI_FLAGS] @ get flags
+ teq r8, #0 @ if preempt count != 0
++ bne 1f @ return from exeption
++ ldr r0, [tsk, #TI_FLAGS] @ get flags
++ tst r0, #_TIF_NEED_RESCHED @ if NEED_RESCHED is set
++ blne svc_preempt @ preempt!
++
++ ldr r8, [tsk, #TI_PREEMPT_LAZY] @ get preempt lazy count
++ teq r8, #0 @ if preempt lazy count != 0
+ movne r0, #0 @ force flags to 0
+- tst r0, #_TIF_NEED_RESCHED
++ tst r0, #_TIF_NEED_RESCHED_LAZY
+ blne svc_preempt
++1:
+ #endif
+
+ svc_exit r5, irq = 1 @ return from exception
+@@ -227,6 +234,8 @@ ENDPROC(__irq_svc)
+ 1: bl preempt_schedule_irq @ irq en/disable is done inside
+ ldr r0, [tsk, #TI_FLAGS] @ get new tasks TI_FLAGS
+ tst r0, #_TIF_NEED_RESCHED
++ bne 1b
++ tst r0, #_TIF_NEED_RESCHED_LAZY
+ reteq r8 @ go again
+ b 1b
+ #endif
+--- a/arch/arm/kernel/signal.c
++++ b/arch/arm/kernel/signal.c
+@@ -563,7 +563,8 @@ asmlinkage int
+ do_work_pending(struct pt_regs *regs, unsigned int thread_flags, int syscall)
+ {
+ do {
+- if (likely(thread_flags & _TIF_NEED_RESCHED)) {
++ if (likely(thread_flags & (_TIF_NEED_RESCHED |
++ _TIF_NEED_RESCHED_LAZY))) {
+ schedule();
+ } else {
+ if (unlikely(!user_mode(regs)))
diff --git a/patches/arm-unwind-use_raw_lock.patch b/patches/arm-unwind-use_raw_lock.patch
new file mode 100644
index 00000000000000..9c10dd91c29535
--- /dev/null
+++ b/patches/arm-unwind-use_raw_lock.patch
@@ -0,0 +1,83 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Fri, 20 Sep 2013 14:31:54 +0200
+Subject: arm/unwind: use a raw_spin_lock
+
+Unwinding is mostly done with IRQs enabled; however, SLUB may call it
+with IRQs disabled while creating a new SLUB cache.
+
+I had a system freeze while loading a module which called
+kmem_cache_create() on init. That means SLUB's __slab_alloc() disabled
+interrupts and then called
+
+->new_slab_objects()
+ ->new_slab()
+ ->setup_object()
+ ->setup_object_debug()
+ ->init_tracking()
+ ->set_track()
+ ->save_stack_trace()
+ ->save_stack_trace_tsk()
+ ->walk_stackframe()
+ ->unwind_frame()
+ ->unwind_find_idx()
+ =>spin_lock_irqsave(&unwind_lock);
+
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ arch/arm/kernel/unwind.c | 14 +++++++-------
+ 1 file changed, 7 insertions(+), 7 deletions(-)
+
+--- a/arch/arm/kernel/unwind.c
++++ b/arch/arm/kernel/unwind.c
+@@ -93,7 +93,7 @@ extern const struct unwind_idx __start_u
+ static const struct unwind_idx *__origin_unwind_idx;
+ extern const struct unwind_idx __stop_unwind_idx[];
+
+-static DEFINE_SPINLOCK(unwind_lock);
++static DEFINE_RAW_SPINLOCK(unwind_lock);
+ static LIST_HEAD(unwind_tables);
+
+ /* Convert a prel31 symbol to an absolute address */
+@@ -201,7 +201,7 @@ static const struct unwind_idx *unwind_f
+ /* module unwind tables */
+ struct unwind_table *table;
+
+- spin_lock_irqsave(&unwind_lock, flags);
++ raw_spin_lock_irqsave(&unwind_lock, flags);
+ list_for_each_entry(table, &unwind_tables, list) {
+ if (addr >= table->begin_addr &&
+ addr < table->end_addr) {
+@@ -213,7 +213,7 @@ static const struct unwind_idx *unwind_f
+ break;
+ }
+ }
+- spin_unlock_irqrestore(&unwind_lock, flags);
++ raw_spin_unlock_irqrestore(&unwind_lock, flags);
+ }
+
+ pr_debug("%s: idx = %p\n", __func__, idx);
+@@ -529,9 +529,9 @@ struct unwind_table *unwind_table_add(un
+ tab->begin_addr = text_addr;
+ tab->end_addr = text_addr + text_size;
+
+- spin_lock_irqsave(&unwind_lock, flags);
++ raw_spin_lock_irqsave(&unwind_lock, flags);
+ list_add_tail(&tab->list, &unwind_tables);
+- spin_unlock_irqrestore(&unwind_lock, flags);
++ raw_spin_unlock_irqrestore(&unwind_lock, flags);
+
+ return tab;
+ }
+@@ -543,9 +543,9 @@ void unwind_table_del(struct unwind_tabl
+ if (!tab)
+ return;
+
+- spin_lock_irqsave(&unwind_lock, flags);
++ raw_spin_lock_irqsave(&unwind_lock, flags);
+ list_del(&tab->list);
+- spin_unlock_irqrestore(&unwind_lock, flags);
++ raw_spin_unlock_irqrestore(&unwind_lock, flags);
+
+ kfree(tab);
+ }
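
The conversion above follows the usual RT rule: a lock that may be taken
with interrupts already disabled has to be a raw spinlock, because on RT
a plain spinlock_t is a sleeping rtmutex. A minimal sketch of the
pattern, with illustrative structure and variable names rather than the
kernel's:

  #include <linux/spinlock.h>
  #include <linux/list.h>

  static DEFINE_RAW_SPINLOCK(table_lock);   /* never sleeps, even on RT */
  static LIST_HEAD(tables);

  struct example_tab {
          struct list_head list;
  };

  /* Safe to call from paths that already run with IRQs off. */
  static void example_table_add(struct example_tab *tab)
  {
          unsigned long flags;

          raw_spin_lock_irqsave(&table_lock, flags);
          list_add_tail(&tab->list, &tables);
          raw_spin_unlock_irqrestore(&table_lock, flags);
  }
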
diff --git a/patches/ata-disable-interrupts-if-non-rt.patch b/patches/ata-disable-interrupts-if-non-rt.patch
new file mode 100644
index 00000000000000..d3cd3c3fd5dfc1
--- /dev/null
+++ b/patches/ata-disable-interrupts-if-non-rt.patch
@@ -0,0 +1,64 @@
+From: Steven Rostedt <srostedt@redhat.com>
+Date: Fri, 3 Jul 2009 08:44:29 -0500
+Subject: ata: Do not disable interrupts in ide code for preempt-rt
+
+Use the local_irq_*_nort variants.
+
+Signed-off-by: Steven Rostedt <srostedt@redhat.com>
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ drivers/ata/libata-sff.c | 12 ++++++------
+ 1 file changed, 6 insertions(+), 6 deletions(-)
+
+--- a/drivers/ata/libata-sff.c
++++ b/drivers/ata/libata-sff.c
+@@ -678,9 +678,9 @@ unsigned int ata_sff_data_xfer_noirq(str
+ unsigned long flags;
+ unsigned int consumed;
+
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+ consumed = ata_sff_data_xfer32(dev, buf, buflen, rw);
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+
+ return consumed;
+ }
+@@ -719,7 +719,7 @@ static void ata_pio_sector(struct ata_qu
+ unsigned long flags;
+
+ /* FIXME: use a bounce buffer */
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+ buf = kmap_atomic(page);
+
+ /* do the actual data transfer */
+@@ -727,7 +727,7 @@ static void ata_pio_sector(struct ata_qu
+ do_write);
+
+ kunmap_atomic(buf);
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+ } else {
+ buf = page_address(page);
+ ap->ops->sff_data_xfer(qc->dev, buf + offset, qc->sect_size,
+@@ -864,7 +864,7 @@ static int __atapi_pio_bytes(struct ata_
+ unsigned long flags;
+
+ /* FIXME: use bounce buffer */
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+ buf = kmap_atomic(page);
+
+ /* do the actual data transfer */
+@@ -872,7 +872,7 @@ static int __atapi_pio_bytes(struct ata_
+ count, rw);
+
+ kunmap_atomic(buf);
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+ } else {
+ buf = page_address(page);
+ consumed = ap->ops->sff_data_xfer(dev, buf + offset,
diff --git a/patches/blk-mq-revert-raw-locks-post-pone-notifier-to-POST_D.patchto-POST_D.patch b/patches/blk-mq-revert-raw-locks-post-pone-notifier-to-POST_D.patchto-POST_D.patch
new file mode 100644
index 00000000000000..ff11c7c9677b45
--- /dev/null
+++ b/patches/blk-mq-revert-raw-locks-post-pone-notifier-to-POST_D.patchto-POST_D.patch
@@ -0,0 +1,83 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Sat, 3 May 2014 11:00:29 +0200
+Subject: blk-mq: revert raw locks, post pone notifier to POST_DEAD
+
+The blk_mq_cpu_notify_lock should be raw because some CPU-down levels
+are called with interrupts off. However, the notifier currently calls
+only one function, blk_mq_hctx_notify().
+That function acquires ctx->lock, which is a sleeping lock, and I would
+prefer to keep it that way. It only moves IO requests from the CPU that
+is going offline to another CPU, and it is currently the only callback.
+Therefore I revert the list lock back to a sleeping spinlock and let
+the notifier run at POST_DEAD time.
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ block/blk-mq-cpu.c | 17 ++++++++++-------
+ block/blk-mq.c | 2 +-
+ 2 files changed, 11 insertions(+), 8 deletions(-)
+
+--- a/block/blk-mq-cpu.c
++++ b/block/blk-mq-cpu.c
+@@ -16,7 +16,7 @@
+ #include "blk-mq.h"
+
+ static LIST_HEAD(blk_mq_cpu_notify_list);
+-static DEFINE_RAW_SPINLOCK(blk_mq_cpu_notify_lock);
++static DEFINE_SPINLOCK(blk_mq_cpu_notify_lock);
+
+ static int blk_mq_main_cpu_notify(struct notifier_block *self,
+ unsigned long action, void *hcpu)
+@@ -25,7 +25,10 @@ static int blk_mq_main_cpu_notify(struct
+ struct blk_mq_cpu_notifier *notify;
+ int ret = NOTIFY_OK;
+
+- raw_spin_lock(&blk_mq_cpu_notify_lock);
++ if (action != CPU_POST_DEAD)
++ return NOTIFY_OK;
++
++ spin_lock(&blk_mq_cpu_notify_lock);
+
+ list_for_each_entry(notify, &blk_mq_cpu_notify_list, list) {
+ ret = notify->notify(notify->data, action, cpu);
+@@ -33,7 +36,7 @@ static int blk_mq_main_cpu_notify(struct
+ break;
+ }
+
+- raw_spin_unlock(&blk_mq_cpu_notify_lock);
++ spin_unlock(&blk_mq_cpu_notify_lock);
+ return ret;
+ }
+
+@@ -41,16 +44,16 @@ void blk_mq_register_cpu_notifier(struct
+ {
+ BUG_ON(!notifier->notify);
+
+- raw_spin_lock(&blk_mq_cpu_notify_lock);
++ spin_lock(&blk_mq_cpu_notify_lock);
+ list_add_tail(&notifier->list, &blk_mq_cpu_notify_list);
+- raw_spin_unlock(&blk_mq_cpu_notify_lock);
++ spin_unlock(&blk_mq_cpu_notify_lock);
+ }
+
+ void blk_mq_unregister_cpu_notifier(struct blk_mq_cpu_notifier *notifier)
+ {
+- raw_spin_lock(&blk_mq_cpu_notify_lock);
++ spin_lock(&blk_mq_cpu_notify_lock);
+ list_del(&notifier->list);
+- raw_spin_unlock(&blk_mq_cpu_notify_lock);
++ spin_unlock(&blk_mq_cpu_notify_lock);
+ }
+
+ void blk_mq_init_cpu_notifier(struct blk_mq_cpu_notifier *notifier,
+--- a/block/blk-mq.c
++++ b/block/blk-mq.c
+@@ -1612,7 +1612,7 @@ static int blk_mq_hctx_notify(void *data
+ {
+ struct blk_mq_hw_ctx *hctx = data;
+
+- if (action == CPU_DEAD || action == CPU_DEAD_FROZEN)
++ if (action == CPU_POST_DEAD)
+ return blk_mq_hctx_cpu_offline(hctx, cpu);
+
+ /*
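
The point of the revert above is the notification level: the changelog
notes that some CPU-down levels run with interrupts off, while
CPU_POST_DEAD is delivered after the CPU is completely gone, from a
context where sleeping locks are fine, so ctx->lock can stay a normal
spinlock. A sketch of a notifier restricted to POST_DEAD, using the
pre-4.10 notifier interface of this tree and hypothetical names:

  #include <linux/cpu.h>
  #include <linux/notifier.h>
  #include <linux/printk.h>

  static int example_cpu_notify(struct notifier_block *self,
                                unsigned long action, void *hcpu)
  {
          unsigned int cpu = (unsigned long)hcpu;

          /* Ignore the earlier, more restricted hotplug steps. */
          if (action != CPU_POST_DEAD)
                  return NOTIFY_OK;

          /* May sleep here, e.g. while moving pending requests away. */
          pr_info("cleaning up after CPU %u\n", cpu);
          return NOTIFY_OK;
  }
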
diff --git a/patches/block-blk-mq-use-swait.patch b/patches/block-blk-mq-use-swait.patch
new file mode 100644
index 00000000000000..0792dc7d084490
--- /dev/null
+++ b/patches/block-blk-mq-use-swait.patch
@@ -0,0 +1,114 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Fri, 13 Feb 2015 11:01:26 +0100
+Subject: block: blk-mq: Use swait
+
+| BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:914
+| in_atomic(): 1, irqs_disabled(): 0, pid: 255, name: kworker/u257:6
+| 5 locks held by kworker/u257:6/255:
+| #0: ("events_unbound"){.+.+.+}, at: [<ffffffff8108edf1>] process_one_work+0x171/0x5e0
+| #1: ((&entry->work)){+.+.+.}, at: [<ffffffff8108edf1>] process_one_work+0x171/0x5e0
+| #2: (&shost->scan_mutex){+.+.+.}, at: [<ffffffffa000faa3>] __scsi_add_device+0xa3/0x130 [scsi_mod]
+| #3: (&set->tag_list_lock){+.+...}, at: [<ffffffff812f09fa>] blk_mq_init_queue+0x96a/0xa50
+| #4: (rcu_read_lock_sched){......}, at: [<ffffffff8132887d>] percpu_ref_kill_and_confirm+0x1d/0x120
+| Preemption disabled at:[<ffffffff812eff76>] blk_mq_freeze_queue_start+0x56/0x70
+|
+| CPU: 2 PID: 255 Comm: kworker/u257:6 Not tainted 3.18.7-rt0+ #1
+| Workqueue: events_unbound async_run_entry_fn
+| 0000000000000003 ffff8800bc29f998 ffffffff815b3a12 0000000000000000
+| 0000000000000000 ffff8800bc29f9b8 ffffffff8109aa16 ffff8800bc29fa28
+| ffff8800bc5d1bc8 ffff8800bc29f9e8 ffffffff815b8dd4 ffff880000000000
+| Call Trace:
+| [<ffffffff815b3a12>] dump_stack+0x4f/0x7c
+| [<ffffffff8109aa16>] __might_sleep+0x116/0x190
+| [<ffffffff815b8dd4>] rt_spin_lock+0x24/0x60
+| [<ffffffff810b6089>] __wake_up+0x29/0x60
+| [<ffffffff812ee06e>] blk_mq_usage_counter_release+0x1e/0x20
+| [<ffffffff81328966>] percpu_ref_kill_and_confirm+0x106/0x120
+| [<ffffffff812eff76>] blk_mq_freeze_queue_start+0x56/0x70
+| [<ffffffff812f0000>] blk_mq_update_tag_set_depth+0x40/0xd0
+| [<ffffffff812f0a1c>] blk_mq_init_queue+0x98c/0xa50
+| [<ffffffffa000dcf0>] scsi_mq_alloc_queue+0x20/0x60 [scsi_mod]
+| [<ffffffffa000ea35>] scsi_alloc_sdev+0x2f5/0x370 [scsi_mod]
+| [<ffffffffa000f494>] scsi_probe_and_add_lun+0x9e4/0xdd0 [scsi_mod]
+| [<ffffffffa000fb26>] __scsi_add_device+0x126/0x130 [scsi_mod]
+| [<ffffffffa013033f>] ata_scsi_scan_host+0xaf/0x200 [libata]
+| [<ffffffffa012b5b6>] async_port_probe+0x46/0x60 [libata]
+| [<ffffffff810978fb>] async_run_entry_fn+0x3b/0xf0
+| [<ffffffff8108ee81>] process_one_work+0x201/0x5e0
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ block/blk-core.c | 2 +-
+ block/blk-mq.c | 10 +++++-----
+ include/linux/blkdev.h | 2 +-
+ 3 files changed, 7 insertions(+), 7 deletions(-)
+
+--- a/block/blk-core.c
++++ b/block/blk-core.c
+@@ -664,7 +664,7 @@ struct request_queue *blk_alloc_queue_no
+ q->bypass_depth = 1;
+ __set_bit(QUEUE_FLAG_BYPASS, &q->queue_flags);
+
+- init_waitqueue_head(&q->mq_freeze_wq);
++ init_swait_head(&q->mq_freeze_wq);
+
+ if (blkcg_init_queue(q))
+ goto fail_bdi;
+--- a/block/blk-mq.c
++++ b/block/blk-mq.c
+@@ -88,7 +88,7 @@ static int blk_mq_queue_enter(struct req
+ if (!(gfp & __GFP_WAIT))
+ return -EBUSY;
+
+- ret = wait_event_interruptible(q->mq_freeze_wq,
++ ret = swait_event_interruptible(q->mq_freeze_wq,
+ !q->mq_freeze_depth || blk_queue_dying(q));
+ if (blk_queue_dying(q))
+ return -ENODEV;
+@@ -107,7 +107,7 @@ static void blk_mq_usage_counter_release
+ struct request_queue *q =
+ container_of(ref, struct request_queue, mq_usage_counter);
+
+- wake_up_all(&q->mq_freeze_wq);
++ swait_wake_all(&q->mq_freeze_wq);
+ }
+
+ void blk_mq_freeze_queue_start(struct request_queue *q)
+@@ -127,7 +127,7 @@ EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_st
+
+ static void blk_mq_freeze_queue_wait(struct request_queue *q)
+ {
+- wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->mq_usage_counter));
++ swait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->mq_usage_counter));
+ }
+
+ /*
+@@ -151,7 +151,7 @@ void blk_mq_unfreeze_queue(struct reques
+ spin_unlock_irq(q->queue_lock);
+ if (wake) {
+ percpu_ref_reinit(&q->mq_usage_counter);
+- wake_up_all(&q->mq_freeze_wq);
++ swait_wake_all(&q->mq_freeze_wq);
+ }
+ }
+ EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
+@@ -170,7 +170,7 @@ void blk_mq_wake_waiters(struct request_
+ * dying, we need to ensure that processes currently waiting on
+ * the queue are notified as well.
+ */
+- wake_up_all(&q->mq_freeze_wq);
++ swait_wake_all(&q->mq_freeze_wq);
+ }
+
+ bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
+--- a/include/linux/blkdev.h
++++ b/include/linux/blkdev.h
+@@ -483,7 +483,7 @@ struct request_queue {
+ struct throtl_data *td;
+ #endif
+ struct rcu_head rcu_head;
+- wait_queue_head_t mq_freeze_wq;
++ struct swait_head mq_freeze_wq;
+ struct percpu_ref mq_usage_counter;
+ struct list_head all_q_node;
+
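
The splat above is wake_up_all() taking the waitqueue lock (a sleeping
lock on RT) while percpu_ref still has preemption disabled. The simple
waitqueues from this queue (wait-simple.h) exist precisely so that the
wake side does not need a sleeping lock. A sketch of the swait API as
provided here, with illustrative variable names:

  #include <linux/types.h>
  #include <linux/wait-simple.h>        /* added by this patch queue */

  static struct swait_head example_wq;
  static bool example_done;

  static void example_init(void)
  {
          init_swait_head(&example_wq);
  }

  /* Waiter: sleeps until example_done becomes true. */
  static void example_wait(void)
  {
          swait_event(example_wq, example_done);
  }

  /* Waker: does not take a sleeping lock, so it may be called with
   * preemption disabled, which is the point of the conversion. */
  static void example_wake(void)
  {
          example_done = true;
          swait_wake_all(&example_wq);
  }
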
diff --git a/patches/block-mq-don-t-complete-requests-via-IPI.patch b/patches/block-mq-don-t-complete-requests-via-IPI.patch
new file mode 100644
index 00000000000000..7f9f91ff18b706
--- /dev/null
+++ b/patches/block-mq-don-t-complete-requests-via-IPI.patch
@@ -0,0 +1,101 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Thu, 29 Jan 2015 15:10:08 +0100
+Subject: block/mq: don't complete requests via IPI
+
+The IPI runs in hardirq context, and on RT the completion path takes
+sleeping locks. This patch moves the completion into a workqueue.
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ block/blk-core.c | 3 +++
+ block/blk-mq.c | 20 ++++++++++++++++++++
+ include/linux/blk-mq.h | 1 +
+ include/linux/blkdev.h | 1 +
+ 4 files changed, 25 insertions(+)
+
+--- a/block/blk-core.c
++++ b/block/blk-core.c
+@@ -100,6 +100,9 @@ void blk_rq_init(struct request_queue *q
+
+ INIT_LIST_HEAD(&rq->queuelist);
+ INIT_LIST_HEAD(&rq->timeout_list);
++#ifdef CONFIG_PREEMPT_RT_FULL
++ INIT_WORK(&rq->work, __blk_mq_complete_request_remote_work);
++#endif
+ rq->cpu = -1;
+ rq->q = q;
+ rq->__sector = (sector_t) -1;
+--- a/block/blk-mq.c
++++ b/block/blk-mq.c
+@@ -217,6 +217,9 @@ static void blk_mq_rq_ctx_init(struct re
+ rq->resid_len = 0;
+ rq->sense = NULL;
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++ INIT_WORK(&rq->work, __blk_mq_complete_request_remote_work);
++#endif
+ INIT_LIST_HEAD(&rq->timeout_list);
+ rq->timeout = 0;
+
+@@ -346,6 +349,17 @@ void blk_mq_end_request(struct request *
+ }
+ EXPORT_SYMBOL(blk_mq_end_request);
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++
++void __blk_mq_complete_request_remote_work(struct work_struct *work)
++{
++ struct request *rq = container_of(work, struct request, work);
++
++ rq->q->softirq_done_fn(rq);
++}
++
++#else
++
+ static void __blk_mq_complete_request_remote(void *data)
+ {
+ struct request *rq = data;
+@@ -353,6 +367,8 @@ static void __blk_mq_complete_request_re
+ rq->q->softirq_done_fn(rq);
+ }
+
++#endif
++
+ static void blk_mq_ipi_complete_request(struct request *rq)
+ {
+ struct blk_mq_ctx *ctx = rq->mq_ctx;
+@@ -369,10 +385,14 @@ static void blk_mq_ipi_complete_request(
+ shared = cpus_share_cache(cpu, ctx->cpu);
+
+ if (cpu != ctx->cpu && !shared && cpu_online(ctx->cpu)) {
++#ifdef CONFIG_PREEMPT_RT_FULL
++ schedule_work_on(ctx->cpu, &rq->work);
++#else
+ rq->csd.func = __blk_mq_complete_request_remote;
+ rq->csd.info = rq;
+ rq->csd.flags = 0;
+ smp_call_function_single_async(ctx->cpu, &rq->csd);
++#endif
+ } else {
+ rq->q->softirq_done_fn(rq);
+ }
+--- a/include/linux/blk-mq.h
++++ b/include/linux/blk-mq.h
+@@ -202,6 +202,7 @@ static inline u16 blk_mq_unique_tag_to_t
+
+ struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *, const int ctx_index);
+ struct blk_mq_hw_ctx *blk_mq_alloc_single_hw_queue(struct blk_mq_tag_set *, unsigned int, int);
++void __blk_mq_complete_request_remote_work(struct work_struct *work);
+
+ int blk_mq_request_started(struct request *rq);
+ void blk_mq_start_request(struct request *rq);
+--- a/include/linux/blkdev.h
++++ b/include/linux/blkdev.h
+@@ -101,6 +101,7 @@ struct request {
+ struct list_head queuelist;
+ union {
+ struct call_single_data csd;
++ struct work_struct work;
+ unsigned long fifo_time;
+ };
+
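
The idea above: the completion callback may take sleeping locks, which
is not allowed in the hardirq context of the cross-CPU IPI on RT, so
the request grows a work item and the completion is punted to a worker
on the target CPU instead. A rough sketch with hypothetical request and
function names:

  #include <linux/kernel.h>
  #include <linux/workqueue.h>

  struct example_req {
          struct work_struct work;      /* added to the request on RT */
          int target_cpu;               /* CPU that issued the request */
  };

  /* Runs in a worker, i.e. schedulable context, so the sleeping locks
   * of the completion path are fine on RT. */
  static void example_complete_work(struct work_struct *work)
  {
          struct example_req *rq = container_of(work, struct example_req, work);

          (void)rq;     /* real code would invoke the done callback here */
  }

  /* RT replacement for smp_call_function_single_async(): no IPI, no
   * hardirq context, but the completion still runs on target_cpu. */
  static void example_defer_completion(struct example_req *rq)
  {
          INIT_WORK(&rq->work, example_complete_work);
          schedule_work_on(rq->target_cpu, &rq->work);
  }
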
diff --git a/patches/block-mq-drop-per-ctx-cpu_lock.patch b/patches/block-mq-drop-per-ctx-cpu_lock.patch
new file mode 100644
index 00000000000000..66cea706d8f54f
--- /dev/null
+++ b/patches/block-mq-drop-per-ctx-cpu_lock.patch
@@ -0,0 +1,124 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Wed, 18 Feb 2015 18:37:26 +0100
+Subject: block/mq: drop per ctx cpu_lock
+
+While converting get_cpu() to get_cpu_light() I added a per-CPU lock
+to ensure the same code is not invoked twice on the same CPU. Now I ran
+into this:
+
+| kernel BUG at kernel/locking/rtmutex.c:996!
+| invalid opcode: 0000 [#1] PREEMPT SMP
+| CPU0: 13 PID: 75 Comm: kworker/u258:0 Tainted: G I 3.18.7-rt1.5+ #12
+| Workqueue: writeback bdi_writeback_workfn (flush-8:0)
+| task: ffff88023742a620 ti: ffff88023743c000 task.ti: ffff88023743c000
+| RIP: 0010:[<ffffffff81523cc0>] [<ffffffff81523cc0>] rt_spin_lock_slowlock+0x280/0x2d0
+| Call Trace:
+| [<ffffffff815254e7>] rt_spin_lock+0x27/0x60
+taking the same lock again
+|
+| [<ffffffff8127c771>] blk_mq_insert_requests+0x51/0x130
+| [<ffffffff8127d4a9>] blk_mq_flush_plug_list+0x129/0x140
+| [<ffffffff81272461>] blk_flush_plug_list+0xd1/0x250
+| [<ffffffff81522075>] schedule+0x75/0xa0
+| [<ffffffff8152474d>] do_nanosleep+0xdd/0x180
+| [<ffffffff810c8312>] __hrtimer_nanosleep+0xd2/0x1c0
+| [<ffffffff810c8456>] cpu_chill+0x56/0x80
+| [<ffffffff8107c13d>] try_to_grab_pending+0x1bd/0x390
+| [<ffffffff8107c431>] cancel_delayed_work+0x21/0x170
+| [<ffffffff81279a98>] blk_mq_stop_hw_queue+0x18/0x40
+| [<ffffffffa000ac6f>] scsi_queue_rq+0x7f/0x830 [scsi_mod]
+| [<ffffffff8127b0de>] __blk_mq_run_hw_queue+0x1ee/0x360
+| [<ffffffff8127b528>] blk_mq_map_request+0x108/0x190
+take the lock ^^^
+|
+| [<ffffffff8127c8d2>] blk_sq_make_request+0x82/0x350
+| [<ffffffff8126f6c0>] generic_make_request+0xd0/0x120
+| [<ffffffff8126f788>] submit_bio+0x78/0x190
+| [<ffffffff811bd537>] _submit_bh+0x117/0x180
+| [<ffffffff811bf528>] __block_write_full_page.constprop.38+0x138/0x3f0
+| [<ffffffff811bf880>] block_write_full_page+0xa0/0xe0
+| [<ffffffff811c02b3>] blkdev_writepage+0x13/0x20
+| [<ffffffff81127b25>] __writepage+0x15/0x40
+| [<ffffffff8112873b>] write_cache_pages+0x1fb/0x440
+| [<ffffffff811289be>] generic_writepages+0x3e/0x60
+| [<ffffffff8112a17c>] do_writepages+0x1c/0x30
+| [<ffffffff811b3603>] __writeback_single_inode+0x33/0x140
+| [<ffffffff811b462d>] writeback_sb_inodes+0x2bd/0x490
+| [<ffffffff811b4897>] __writeback_inodes_wb+0x97/0xd0
+| [<ffffffff811b4a9b>] wb_writeback+0x1cb/0x210
+| [<ffffffff811b505b>] bdi_writeback_workfn+0x25b/0x380
+| [<ffffffff8107b50b>] process_one_work+0x1bb/0x490
+| [<ffffffff8107c7ab>] worker_thread+0x6b/0x4f0
+| [<ffffffff81081863>] kthread+0xe3/0x100
+| [<ffffffff8152627c>] ret_from_fork+0x7c/0xb0
+
+After looking at this for a while it seems that it is safe if
+blk_mq_ctx is used multiple times: the lock embedded in the struct
+protects the access.
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ block/blk-mq.c | 4 ----
+ block/blk-mq.h | 8 --------
+ 2 files changed, 12 deletions(-)
+
+--- a/block/blk-mq.c
++++ b/block/blk-mq.c
+@@ -1386,9 +1386,7 @@ static void blk_sq_make_request(struct r
+ if (list_empty(&plug->mq_list))
+ trace_block_plug(q);
+ else if (request_count >= BLK_MAX_REQUEST_COUNT) {
+- spin_unlock(&data.ctx->cpu_lock);
+ blk_flush_plug_list(plug, false);
+- spin_lock(&data.ctx->cpu_lock);
+ trace_block_plug(q);
+ }
+ list_add_tail(&rq->queuelist, &plug->mq_list);
+@@ -1581,7 +1579,6 @@ static int blk_mq_hctx_cpu_offline(struc
+ blk_mq_hctx_clear_pending(hctx, ctx);
+ }
+ spin_unlock(&ctx->lock);
+- __blk_mq_put_ctx(ctx);
+
+ if (list_empty(&tmp))
+ return NOTIFY_OK;
+@@ -1775,7 +1772,6 @@ static void blk_mq_init_cpu_queues(struc
+ memset(__ctx, 0, sizeof(*__ctx));
+ __ctx->cpu = i;
+ spin_lock_init(&__ctx->lock);
+- spin_lock_init(&__ctx->cpu_lock);
+ INIT_LIST_HEAD(&__ctx->rq_list);
+ __ctx->queue = q;
+
+--- a/block/blk-mq.h
++++ b/block/blk-mq.h
+@@ -9,7 +9,6 @@ struct blk_mq_ctx {
+ struct list_head rq_list;
+ } ____cacheline_aligned_in_smp;
+
+- spinlock_t cpu_lock;
+ unsigned int cpu;
+ unsigned int index_hw;
+
+@@ -80,7 +79,6 @@ static inline struct blk_mq_ctx *__blk_m
+ struct blk_mq_ctx *ctx;
+
+ ctx = per_cpu_ptr(q->queue_ctx, cpu);
+- spin_lock(&ctx->cpu_lock);
+ return ctx;
+ }
+
+@@ -95,14 +93,8 @@ static inline struct blk_mq_ctx *blk_mq_
+ return __blk_mq_get_ctx(q, get_cpu_light());
+ }
+
+-static void __blk_mq_put_ctx(struct blk_mq_ctx *ctx)
+-{
+- spin_unlock(&ctx->cpu_lock);
+-}
+-
+ static inline void blk_mq_put_ctx(struct blk_mq_ctx *ctx)
+ {
+- __blk_mq_put_ctx(ctx);
+ put_cpu_light();
+ }
+
diff --git a/patches/block-mq-drop-preempt-disable.patch b/patches/block-mq-drop-preempt-disable.patch
new file mode 100644
index 00000000000000..3f711e18a55569
--- /dev/null
+++ b/patches/block-mq-drop-preempt-disable.patch
@@ -0,0 +1,51 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Tue, 14 Jul 2015 14:26:34 +0200
+Subject: block/mq: do not invoke preempt_disable()
+
+preempt_disable() and get_cpu() don't play well with the sleeping
+locks taken later in the same code path.
+It seems to be enough to replace them with get_cpu_light() and
+migrate_disable().
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ block/blk-mq.c | 10 +++++-----
+ 1 file changed, 5 insertions(+), 5 deletions(-)
+
+--- a/block/blk-mq.c
++++ b/block/blk-mq.c
+@@ -364,7 +364,7 @@ static void blk_mq_ipi_complete_request(
+ return;
+ }
+
+- cpu = get_cpu();
++ cpu = get_cpu_light();
+ if (!test_bit(QUEUE_FLAG_SAME_FORCE, &rq->q->queue_flags))
+ shared = cpus_share_cache(cpu, ctx->cpu);
+
+@@ -376,7 +376,7 @@ static void blk_mq_ipi_complete_request(
+ } else {
+ rq->q->softirq_done_fn(rq);
+ }
+- put_cpu();
++ put_cpu_light();
+ }
+
+ void __blk_mq_complete_request(struct request *rq)
+@@ -905,14 +905,14 @@ void blk_mq_run_hw_queue(struct blk_mq_h
+ return;
+
+ if (!async) {
+- int cpu = get_cpu();
++ int cpu = get_cpu_light();
+ if (cpumask_test_cpu(cpu, hctx->cpumask)) {
+ __blk_mq_run_hw_queue(hctx);
+- put_cpu();
++ put_cpu_light();
+ return;
+ }
+
+- put_cpu();
++ put_cpu_light();
+ }
+
+ kblockd_schedule_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
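
get_cpu() hands back the CPU id with preemption disabled, so anything
that might sleep before put_cpu() splats on RT. get_cpu_light() and
put_cpu_light(), introduced earlier in this queue, only disable
migration: the task stays on one CPU but remains schedulable. A minimal
sketch of the substitution (the function name is illustrative):

  /* get_cpu_light()/put_cpu_light() come from this patch queue. */
  static void example_pinned_section(void)
  {
          int cpu;

          cpu = get_cpu_light();        /* migrate_disable(), still schedulable */
          /*
           * Code that needs a stable CPU id but may take sleeping
           * locks on RT goes here.
           */
          (void)cpu;
          put_cpu_light();
  }
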
diff --git a/patches/block-mq-use-cpu_light.patch b/patches/block-mq-use-cpu_light.patch
new file mode 100644
index 00000000000000..ac403d37865ad1
--- /dev/null
+++ b/patches/block-mq-use-cpu_light.patch
@@ -0,0 +1,89 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Wed, 9 Apr 2014 10:37:23 +0200
+Subject: block: mq: use cpu_light()
+
+There is a might_sleep splat because get_cpu() disables preemption and
+later we grab a lock. As a workaround we use get_cpu_light() and an
+additional lock to prevent taking the same ctx twice.
+
+There is already a lock member in the ctx, but some functions do ++ on
+that member; this works with IRQs off, but on RT we need the extra
+lock.
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ block/blk-mq.c | 4 ++++
+ block/blk-mq.h | 17 ++++++++++++++---
+ 2 files changed, 18 insertions(+), 3 deletions(-)
+
+--- a/block/blk-mq.c
++++ b/block/blk-mq.c
+@@ -1366,7 +1366,9 @@ static void blk_sq_make_request(struct r
+ if (list_empty(&plug->mq_list))
+ trace_block_plug(q);
+ else if (request_count >= BLK_MAX_REQUEST_COUNT) {
++ spin_unlock(&data.ctx->cpu_lock);
+ blk_flush_plug_list(plug, false);
++ spin_lock(&data.ctx->cpu_lock);
+ trace_block_plug(q);
+ }
+ list_add_tail(&rq->queuelist, &plug->mq_list);
+@@ -1559,6 +1561,7 @@ static int blk_mq_hctx_cpu_offline(struc
+ blk_mq_hctx_clear_pending(hctx, ctx);
+ }
+ spin_unlock(&ctx->lock);
++ __blk_mq_put_ctx(ctx);
+
+ if (list_empty(&tmp))
+ return NOTIFY_OK;
+@@ -1752,6 +1755,7 @@ static void blk_mq_init_cpu_queues(struc
+ memset(__ctx, 0, sizeof(*__ctx));
+ __ctx->cpu = i;
+ spin_lock_init(&__ctx->lock);
++ spin_lock_init(&__ctx->cpu_lock);
+ INIT_LIST_HEAD(&__ctx->rq_list);
+ __ctx->queue = q;
+
+--- a/block/blk-mq.h
++++ b/block/blk-mq.h
+@@ -9,6 +9,7 @@ struct blk_mq_ctx {
+ struct list_head rq_list;
+ } ____cacheline_aligned_in_smp;
+
++ spinlock_t cpu_lock;
+ unsigned int cpu;
+ unsigned int index_hw;
+
+@@ -76,7 +77,11 @@ struct blk_align_bitmap {
+ static inline struct blk_mq_ctx *__blk_mq_get_ctx(struct request_queue *q,
+ unsigned int cpu)
+ {
+- return per_cpu_ptr(q->queue_ctx, cpu);
++ struct blk_mq_ctx *ctx;
++
++ ctx = per_cpu_ptr(q->queue_ctx, cpu);
++ spin_lock(&ctx->cpu_lock);
++ return ctx;
+ }
+
+ /*
+@@ -87,12 +92,18 @@ static inline struct blk_mq_ctx *__blk_m
+ */
+ static inline struct blk_mq_ctx *blk_mq_get_ctx(struct request_queue *q)
+ {
+- return __blk_mq_get_ctx(q, get_cpu());
++ return __blk_mq_get_ctx(q, get_cpu_light());
++}
++
++static void __blk_mq_put_ctx(struct blk_mq_ctx *ctx)
++{
++ spin_unlock(&ctx->cpu_lock);
+ }
+
+ static inline void blk_mq_put_ctx(struct blk_mq_ctx *ctx)
+ {
+- put_cpu();
++ __blk_mq_put_ctx(ctx);
++ put_cpu_light();
+ }
+
+ struct blk_mq_alloc_data {
diff --git a/patches/block-shorten-interrupt-disabled-regions.patch b/patches/block-shorten-interrupt-disabled-regions.patch
new file mode 100644
index 00000000000000..c849c258d15c2b
--- /dev/null
+++ b/patches/block-shorten-interrupt-disabled-regions.patch
@@ -0,0 +1,96 @@
+Subject: block: Shorten interrupt disabled regions
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 22 Jun 2011 19:47:02 +0200
+
+Moving the blk_sched_flush_plug() call out of the interrupt/preempt
+disabled region in the scheduler allows us to replace
+local_irq_save/restore(flags) by local_irq_disable/enable() in
+blk_flush_plug().
+
+Now instead of doing this we disable interrupts explicitly when we
+lock the request_queue and reenable them when we drop the lock. That
+allows interrupts to be handled when the plug list contains requests
+for more than one queue.
+
+Aside of that this change makes the scope of the irq disabled region
+more obvious. The current code confused the hell out of me when
+looking at:
+
+ local_irq_save(flags);
+ spin_lock(q->queue_lock);
+ ...
+ queue_unplugged(q...);
+ scsi_request_fn();
+ spin_unlock(q->queue_lock);
+ spin_lock(shost->host_lock);
+ spin_unlock_irq(shost->host_lock);
+
+-------------------^^^ ????
+
+ spin_lock_irq(q->queue_lock);
+ spin_unlock(q->lock);
+ local_irq_restore(flags);
+
+Also add a comment to __blk_run_queue() documenting that
+q->request_fn() can drop q->queue_lock and reenable interrupts, but
+must return with q->queue_lock held and interrupts disabled.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: Peter Zijlstra <peterz@infradead.org>
+Cc: Tejun Heo <tj@kernel.org>
+Cc: Jens Axboe <axboe@kernel.dk>
+Cc: Linus Torvalds <torvalds@linux-foundation.org>
+Link: http://lkml.kernel.org/r/20110622174919.025446432@linutronix.de
+---
+ block/blk-core.c | 12 ++----------
+ 1 file changed, 2 insertions(+), 10 deletions(-)
+
+--- a/block/blk-core.c
++++ b/block/blk-core.c
+@@ -3077,7 +3077,7 @@ static void queue_unplugged(struct reque
+ blk_run_queue_async(q);
+ else
+ __blk_run_queue(q);
+- spin_unlock(q->queue_lock);
++ spin_unlock_irq(q->queue_lock);
+ }
+
+ static void flush_plug_callbacks(struct blk_plug *plug, bool from_schedule)
+@@ -3125,7 +3125,6 @@ EXPORT_SYMBOL(blk_check_plugged);
+ void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
+ {
+ struct request_queue *q;
+- unsigned long flags;
+ struct request *rq;
+ LIST_HEAD(list);
+ unsigned int depth;
+@@ -3145,11 +3144,6 @@ void blk_flush_plug_list(struct blk_plug
+ q = NULL;
+ depth = 0;
+
+- /*
+- * Save and disable interrupts here, to avoid doing it for every
+- * queue lock we have to take.
+- */
+- local_irq_save(flags);
+ while (!list_empty(&list)) {
+ rq = list_entry_rq(list.next);
+ list_del_init(&rq->queuelist);
+@@ -3162,7 +3156,7 @@ void blk_flush_plug_list(struct blk_plug
+ queue_unplugged(q, depth, from_schedule);
+ q = rq->q;
+ depth = 0;
+- spin_lock(q->queue_lock);
++ spin_lock_irq(q->queue_lock);
+ }
+
+ /*
+@@ -3189,8 +3183,6 @@ void blk_flush_plug_list(struct blk_plug
+ */
+ if (q)
+ queue_unplugged(q, depth, from_schedule);
+-
+- local_irq_restore(flags);
+ }
+
+ void blk_finish_plug(struct blk_plug *plug)
diff --git a/patches/block-use-cpu-chill.patch b/patches/block-use-cpu-chill.patch
new file mode 100644
index 00000000000000..47fb1412057e2f
--- /dev/null
+++ b/patches/block-use-cpu-chill.patch
@@ -0,0 +1,45 @@
+Subject: block: Use cpu_chill() for retry loops
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Thu, 20 Dec 2012 18:28:26 +0100
+
+Retry loops on RT might loop forever when the modifying side was
+preempted. Steven also observed a live lock while concurrent priority
+boosting was going on.
+
+Use cpu_chill() instead of cpu_relax() to let the system
+make progress.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ block/blk-ioc.c | 5 +++--
+ 1 file changed, 3 insertions(+), 2 deletions(-)
+
+--- a/block/blk-ioc.c
++++ b/block/blk-ioc.c
+@@ -7,6 +7,7 @@
+ #include <linux/bio.h>
+ #include <linux/blkdev.h>
+ #include <linux/slab.h>
++#include <linux/delay.h>
+
+ #include "blk.h"
+
+@@ -109,7 +110,7 @@ static void ioc_release_fn(struct work_s
+ spin_unlock(q->queue_lock);
+ } else {
+ spin_unlock_irqrestore(&ioc->lock, flags);
+- cpu_relax();
++ cpu_chill();
+ spin_lock_irqsave_nested(&ioc->lock, flags, 1);
+ }
+ }
+@@ -187,7 +188,7 @@ void put_io_context_active(struct io_con
+ spin_unlock(icq->q->queue_lock);
+ } else {
+ spin_unlock_irqrestore(&ioc->lock, flags);
+- cpu_relax();
++ cpu_chill();
+ goto retry;
+ }
+ }
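
cpu_relax() busy-waits, so if the task that has to make progress was
preempted by the spinner (or is stuck behind priority boosting, as in
the report above), an RT system can spin forever. cpu_chill(), provided
by this queue as a short sleep in linux/delay.h, lets the other side
run. A sketch of the retry-loop shape with illustrative locks:

  #include <linux/delay.h>              /* cpu_chill() on RT */
  #include <linux/spinlock.h>

  static void example_lock_both(spinlock_t *outer, spinlock_t *inner)
  {
  retry:
          spin_lock(outer);
          if (!spin_trylock(inner)) {
                  spin_unlock(outer);
                  cpu_chill();          /* sleep briefly instead of spinning */
                  goto retry;
          }
          /* both locks held here */
          spin_unlock(inner);
          spin_unlock(outer);
  }
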
diff --git a/patches/bug-rt-dependend-variants.patch b/patches/bug-rt-dependend-variants.patch
new file mode 100644
index 00000000000000..85d0be51e82cdb
--- /dev/null
+++ b/patches/bug-rt-dependend-variants.patch
@@ -0,0 +1,36 @@
+From: Ingo Molnar <mingo@elte.hu>
+Date: Fri, 3 Jul 2009 08:29:58 -0500
+Subject: bug: BUG_ON/WARN_ON variants dependend on RT/!RT
+
+Introduce RT/NON-RT WARN/BUG statements to avoid ifdefs in the code.
+
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ include/asm-generic/bug.h | 14 ++++++++++++++
+ 1 file changed, 14 insertions(+)
+
+--- a/include/asm-generic/bug.h
++++ b/include/asm-generic/bug.h
+@@ -206,6 +206,20 @@ extern void warn_slowpath_null(const cha
+ # define WARN_ON_SMP(x) ({0;})
+ #endif
+
++#ifdef CONFIG_PREEMPT_RT_BASE
++# define BUG_ON_RT(c) BUG_ON(c)
++# define BUG_ON_NONRT(c) do { } while (0)
++# define WARN_ON_RT(condition) WARN_ON(condition)
++# define WARN_ON_NONRT(condition) do { } while (0)
++# define WARN_ON_ONCE_NONRT(condition) do { } while (0)
++#else
++# define BUG_ON_RT(c) do { } while (0)
++# define BUG_ON_NONRT(c) BUG_ON(c)
++# define WARN_ON_RT(condition) do { } while (0)
++# define WARN_ON_NONRT(condition) WARN_ON(condition)
++# define WARN_ON_ONCE_NONRT(condition) WARN_ON_ONCE(condition)
++#endif
++
+ #endif /* __ASSEMBLY__ */
+
+ #endif
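
A typical use of these helpers is an assertion that is only valid for
one configuration, for instance a check that interrupts are off in a
path that becomes preemptible under RT. A hypothetical example:

  #include <linux/bug.h>
  #include <linux/irqflags.h>

  /* On !RT this path runs with interrupts disabled; on RT it is
   * preemptible, so the check is only meaningful for !RT. */
  static void example_assert_context(void)
  {
          WARN_ON_NONRT(!irqs_disabled());
  }
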
diff --git a/patches/cgroups-scheduling-while-atomic-in-cgroup-code.patch b/patches/cgroups-scheduling-while-atomic-in-cgroup-code.patch
new file mode 100644
index 00000000000000..eb6fa606628278
--- /dev/null
+++ b/patches/cgroups-scheduling-while-atomic-in-cgroup-code.patch
@@ -0,0 +1,64 @@
+From: Mike Galbraith <umgwanakikbuti@gmail.com>
+Date: Sat, 21 Jun 2014 10:09:48 +0200
+Subject: memcontrol: Prevent scheduling while atomic in cgroup code
+
+mm, memcg: make refill_stock() use get_cpu_light()
+
+Nikita reported the following memcg scheduling while atomic bug:
+
+Call Trace:
+[e22d5a90] [c0007ea8] show_stack+0x4c/0x168 (unreliable)
+[e22d5ad0] [c0618c04] __schedule_bug+0x94/0xb0
+[e22d5ae0] [c060b9ec] __schedule+0x530/0x550
+[e22d5bf0] [c060bacc] schedule+0x30/0xbc
+[e22d5c00] [c060ca24] rt_spin_lock_slowlock+0x180/0x27c
+[e22d5c70] [c00b39dc] res_counter_uncharge_until+0x40/0xc4
+[e22d5ca0] [c013ca88] drain_stock.isra.20+0x54/0x98
+[e22d5cc0] [c01402ac] __mem_cgroup_try_charge+0x2e8/0xbac
+[e22d5d70] [c01410d4] mem_cgroup_charge_common+0x3c/0x70
+[e22d5d90] [c0117284] __do_fault+0x38c/0x510
+[e22d5df0] [c011a5f4] handle_pte_fault+0x98/0x858
+[e22d5e50] [c060ed08] do_page_fault+0x42c/0x6fc
+[e22d5f40] [c000f5b4] handle_page_fault+0xc/0x80
+
+What happens:
+
+ refill_stock()
+ get_cpu_var()
+ drain_stock()
+ res_counter_uncharge()
+ res_counter_uncharge_until()
+ spin_lock() <== boom
+
+Fix it by replacing get/put_cpu_var() with get/put_cpu_light().
+
+
+Reported-by: Nikita Yushchenko <nyushchenko@dev.rtsoft.ru>
+Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ mm/memcontrol.c | 7 +++++--
+ 1 file changed, 5 insertions(+), 2 deletions(-)
+
+--- a/mm/memcontrol.c
++++ b/mm/memcontrol.c
+@@ -2127,14 +2127,17 @@ static void drain_local_stock(struct wor
+ */
+ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
+ {
+- struct memcg_stock_pcp *stock = &get_cpu_var(memcg_stock);
++ struct memcg_stock_pcp *stock;
++ int cpu = get_cpu_light();
++
++ stock = &per_cpu(memcg_stock, cpu);
+
+ if (stock->cached != memcg) { /* reset if necessary */
+ drain_stock(stock);
+ stock->cached = memcg;
+ }
+ stock->nr_pages += nr_pages;
+- put_cpu_var(memcg_stock);
++ put_cpu_light();
+ }
+
+ /*
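
get_cpu_var() implies get_cpu(), i.e. preemption off, which is what
makes the later res_counter spinlock (a sleeping lock on RT) illegal.
The fix splits that into get_cpu_light() plus an explicit per-CPU
lookup, so the task stays on one CPU but remains schedulable. A sketch
of the split with an illustrative per-CPU structure:

  #include <linux/percpu.h>

  struct example_stock {
          unsigned int nr_pages;
  };
  static DEFINE_PER_CPU(struct example_stock, example_stock);

  static void example_refill(unsigned int nr_pages)
  {
          struct example_stock *stock;
          int cpu;

          /* get_cpu_var() would disable preemption here; pin migration
           * only and do the per-CPU lookup by hand instead. */
          cpu = get_cpu_light();
          stock = &per_cpu(example_stock, cpu);
          stock->nr_pages += nr_pages;  /* may reach sleeping locks on RT */
          put_cpu_light();
  }
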
diff --git a/patches/cgroups-use-simple-wait-in-css_release.patch b/patches/cgroups-use-simple-wait-in-css_release.patch
new file mode 100644
index 00000000000000..b4e6bdc8b97fdb
--- /dev/null
+++ b/patches/cgroups-use-simple-wait-in-css_release.patch
@@ -0,0 +1,86 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Fri, 13 Feb 2015 15:52:24 +0100
+Subject: cgroups: use simple wait in css_release()
+
+To avoid:
+|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:914
+|in_atomic(): 1, irqs_disabled(): 0, pid: 92, name: rcuc/11
+|2 locks held by rcuc/11/92:
+| #0: (rcu_callback){......}, at: [<ffffffff810e037e>] rcu_cpu_kthread+0x3de/0x940
+| #1: (rcu_read_lock_sched){......}, at: [<ffffffff81328390>] percpu_ref_call_confirm_rcu+0x0/0xd0
+|Preemption disabled at:[<ffffffff813284e2>] percpu_ref_switch_to_atomic_rcu+0x82/0xc0
+|CPU: 11 PID: 92 Comm: rcuc/11 Not tainted 3.18.7-rt0+ #1
+| ffff8802398cdf80 ffff880235f0bc28 ffffffff815b3a12 0000000000000000
+| 0000000000000000 ffff880235f0bc48 ffffffff8109aa16 0000000000000000
+| ffff8802398cdf80 ffff880235f0bc78 ffffffff815b8dd4 000000000000df80
+|Call Trace:
+| [<ffffffff815b3a12>] dump_stack+0x4f/0x7c
+| [<ffffffff8109aa16>] __might_sleep+0x116/0x190
+| [<ffffffff815b8dd4>] rt_spin_lock+0x24/0x60
+| [<ffffffff8108d2cd>] queue_work_on+0x6d/0x1d0
+| [<ffffffff8110c881>] css_release+0x81/0x90
+| [<ffffffff8132844e>] percpu_ref_call_confirm_rcu+0xbe/0xd0
+| [<ffffffff813284e2>] percpu_ref_switch_to_atomic_rcu+0x82/0xc0
+| [<ffffffff810e03e5>] rcu_cpu_kthread+0x445/0x940
+| [<ffffffff81098a2d>] smpboot_thread_fn+0x18d/0x2d0
+| [<ffffffff810948d8>] kthread+0xe8/0x100
+| [<ffffffff815b9c3c>] ret_from_fork+0x7c/0xb0
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ include/linux/cgroup.h | 2 ++
+ kernel/cgroup.c | 9 +++++----
+ 2 files changed, 7 insertions(+), 4 deletions(-)
+
+--- a/include/linux/cgroup.h
++++ b/include/linux/cgroup.h
+@@ -22,6 +22,7 @@
+ #include <linux/seq_file.h>
+ #include <linux/kernfs.h>
+ #include <linux/wait.h>
++#include <linux/work-simple.h>
+
+ #ifdef CONFIG_CGROUPS
+
+@@ -91,6 +92,7 @@ struct cgroup_subsys_state {
+ /* percpu_ref killing and RCU release */
+ struct rcu_head rcu_head;
+ struct work_struct destroy_work;
++ struct swork_event destroy_swork;
+ };
+
+ /* bits in struct cgroup_subsys_state flags field */
+--- a/kernel/cgroup.c
++++ b/kernel/cgroup.c
+@@ -4423,10 +4423,10 @@ static void css_free_rcu_fn(struct rcu_h
+ queue_work(cgroup_destroy_wq, &css->destroy_work);
+ }
+
+-static void css_release_work_fn(struct work_struct *work)
++static void css_release_work_fn(struct swork_event *sev)
+ {
+ struct cgroup_subsys_state *css =
+- container_of(work, struct cgroup_subsys_state, destroy_work);
++ container_of(sev, struct cgroup_subsys_state, destroy_swork);
+ struct cgroup_subsys *ss = css->ss;
+ struct cgroup *cgrp = css->cgroup;
+
+@@ -4465,8 +4465,8 @@ static void css_release(struct percpu_re
+ struct cgroup_subsys_state *css =
+ container_of(ref, struct cgroup_subsys_state, refcnt);
+
+- INIT_WORK(&css->destroy_work, css_release_work_fn);
+- queue_work(cgroup_destroy_wq, &css->destroy_work);
++ INIT_SWORK(&css->destroy_swork, css_release_work_fn);
++ swork_queue(&css->destroy_swork);
+ }
+
+ static void init_and_link_css(struct cgroup_subsys_state *css,
+@@ -5070,6 +5070,7 @@ static int __init cgroup_wq_init(void)
+ */
+ cgroup_destroy_wq = alloc_workqueue("cgroup_destroy", 0, 1);
+ BUG_ON(!cgroup_destroy_wq);
++ BUG_ON(swork_get());
+
+ /*
+ * Used to destroy pidlists and separate to serve as flush domain.
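
css_release() is invoked from percpu_ref's RCU confirmation path with
preemption disabled, and on RT queue_work() takes a sleeping lock,
hence the splat above. The simple-work API from this queue
(work-simple.h) defers the callback to a helper thread through a path
that is safe from such contexts. A sketch of its use, with the names as
this queue defines them; treating swork_get() as returning an errno is
an assumption implied by the BUG_ON() in the patch:

  #include <linux/init.h>
  #include <linux/work-simple.h>        /* added by this patch queue */

  static struct swork_event example_event;

  /* Runs in the simple-work helper thread, fully preemptible. */
  static void example_event_fn(struct swork_event *sev)
  {
          (void)sev;
  }

  static int __init example_setup(void)
  {
          int ret;

          ret = swork_get();            /* make sure the helper is up */
          if (ret)
                  return ret;
          INIT_SWORK(&example_event, example_event_fn);
          return 0;
  }

  /* May be called where queue_work() is not safe on RT, e.g. from a
   * preempt-disabled RCU callback as in css_release() above. */
  static void example_trigger(void)
  {
          swork_queue(&example_event);
  }
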
diff --git a/patches/clocksource-tclib-allow-higher-clockrates.patch b/patches/clocksource-tclib-allow-higher-clockrates.patch
new file mode 100644
index 00000000000000..92c47ba80adb8a
--- /dev/null
+++ b/patches/clocksource-tclib-allow-higher-clockrates.patch
@@ -0,0 +1,160 @@
+From: Benedikt Spranger <b.spranger@linutronix.de>
+Date: Mon, 8 Mar 2010 18:57:04 +0100
+Subject: clocksource: TCLIB: Allow higher clock rates for clock events
+
+By default the TCLIB uses the 32 KiHz base clock rate for clock
+events. Add a compile-time selection to allow a higher clock
+resolution.
+
+(fixed up by Sami Pietikäinen <Sami.Pietikainen@wapice.com>)
+
+Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ drivers/clocksource/tcb_clksrc.c | 37 ++++++++++++++++++++++---------------
+ drivers/misc/Kconfig | 12 ++++++++++--
+ 2 files changed, 32 insertions(+), 17 deletions(-)
+
+--- a/drivers/clocksource/tcb_clksrc.c
++++ b/drivers/clocksource/tcb_clksrc.c
+@@ -23,8 +23,7 @@
+ * this 32 bit free-running counter. the second channel is not used.
+ *
+ * - The third channel may be used to provide a 16-bit clockevent
+- * source, used in either periodic or oneshot mode. This runs
+- * at 32 KiHZ, and can handle delays of up to two seconds.
++ * source, used in either periodic or oneshot mode.
+ *
+ * A boot clocksource and clockevent source are also currently needed,
+ * unless the relevant platforms (ARM/AT91, AVR32/AT32) are changed so
+@@ -74,6 +73,7 @@ static struct clocksource clksrc = {
+ struct tc_clkevt_device {
+ struct clock_event_device clkevt;
+ struct clk *clk;
++ u32 freq;
+ void __iomem *regs;
+ };
+
+@@ -82,13 +82,6 @@ static struct tc_clkevt_device *to_tc_cl
+ return container_of(clkevt, struct tc_clkevt_device, clkevt);
+ }
+
+-/* For now, we always use the 32K clock ... this optimizes for NO_HZ,
+- * because using one of the divided clocks would usually mean the
+- * tick rate can never be less than several dozen Hz (vs 0.5 Hz).
+- *
+- * A divided clock could be good for high resolution timers, since
+- * 30.5 usec resolution can seem "low".
+- */
+ static u32 timer_clock;
+
+ static void tc_mode(enum clock_event_mode m, struct clock_event_device *d)
+@@ -111,11 +104,12 @@ static void tc_mode(enum clock_event_mod
+ case CLOCK_EVT_MODE_PERIODIC:
+ clk_enable(tcd->clk);
+
+- /* slow clock, count up to RC, then irq and restart */
++ /* count up to RC, then irq and restart */
+ __raw_writel(timer_clock
+ | ATMEL_TC_WAVE | ATMEL_TC_WAVESEL_UP_AUTO,
+ regs + ATMEL_TC_REG(2, CMR));
+- __raw_writel((32768 + HZ/2) / HZ, tcaddr + ATMEL_TC_REG(2, RC));
++ __raw_writel((tcd->freq + HZ / 2) / HZ,
++ tcaddr + ATMEL_TC_REG(2, RC));
+
+ /* Enable clock and interrupts on RC compare */
+ __raw_writel(ATMEL_TC_CPCS, regs + ATMEL_TC_REG(2, IER));
+@@ -128,7 +122,7 @@ static void tc_mode(enum clock_event_mod
+ case CLOCK_EVT_MODE_ONESHOT:
+ clk_enable(tcd->clk);
+
+- /* slow clock, count up to RC, then irq and stop */
++ /* count up to RC, then irq and stop */
+ __raw_writel(timer_clock | ATMEL_TC_CPCSTOP
+ | ATMEL_TC_WAVE | ATMEL_TC_WAVESEL_UP_AUTO,
+ regs + ATMEL_TC_REG(2, CMR));
+@@ -157,8 +151,12 @@ static struct tc_clkevt_device clkevt =
+ .name = "tc_clkevt",
+ .features = CLOCK_EVT_FEAT_PERIODIC
+ | CLOCK_EVT_FEAT_ONESHOT,
++#ifdef CONFIG_ATMEL_TCB_CLKSRC_USE_SLOW_CLOCK
+ /* Should be lower than at91rm9200's system timer */
+ .rating = 125,
++#else
++ .rating = 200,
++#endif
+ .set_next_event = tc_next_event,
+ .set_mode = tc_mode,
+ },
+@@ -178,8 +176,9 @@ static irqreturn_t ch2_irq(int irq, void
+ return IRQ_NONE;
+ }
+
+-static int __init setup_clkevents(struct atmel_tc *tc, int clk32k_divisor_idx)
++static int __init setup_clkevents(struct atmel_tc *tc, int divisor_idx)
+ {
++ unsigned divisor = atmel_tc_divisors[divisor_idx];
+ int ret;
+ struct clk *t2_clk = tc->clk[2];
+ int irq = tc->irq[2];
+@@ -193,7 +192,11 @@ static int __init setup_clkevents(struct
+ clkevt.regs = tc->regs;
+ clkevt.clk = t2_clk;
+
+- timer_clock = clk32k_divisor_idx;
++ timer_clock = divisor_idx;
++ if (!divisor)
++ clkevt.freq = 32768;
++ else
++ clkevt.freq = clk_get_rate(t2_clk) / divisor;
+
+ clkevt.clkevt.cpumask = cpumask_of(0);
+
+@@ -203,7 +206,7 @@ static int __init setup_clkevents(struct
+ return ret;
+ }
+
+- clockevents_config_and_register(&clkevt.clkevt, 32768, 1, 0xffff);
++ clockevents_config_and_register(&clkevt.clkevt, clkevt.freq, 1, 0xffff);
+
+ return ret;
+ }
+@@ -340,7 +343,11 @@ static int __init tcb_clksrc_init(void)
+ goto err_disable_t1;
+
+ /* channel 2: periodic and oneshot timer support */
++#ifdef CONFIG_ATMEL_TCB_CLKSRC_USE_SLOW_CLOCK
+ ret = setup_clkevents(tc, clk32k_divisor_idx);
++#else
++ ret = setup_clkevents(tc, best_divisor_idx);
++#endif
+ if (ret)
+ goto err_unregister_clksrc;
+
+--- a/drivers/misc/Kconfig
++++ b/drivers/misc/Kconfig
+@@ -69,8 +69,7 @@ config ATMEL_TCB_CLKSRC
+ are combined to make a single 32-bit timer.
+
+ When GENERIC_CLOCKEVENTS is defined, the third timer channel
+- may be used as a clock event device supporting oneshot mode
+- (delays of up to two seconds) based on the 32 KiHz clock.
++ may be used as a clock event device supporting oneshot mode.
+
+ config ATMEL_TCB_CLKSRC_BLOCK
+ int
+@@ -84,6 +83,15 @@ config ATMEL_TCB_CLKSRC_BLOCK
+ TC can be used for other purposes, such as PWM generation and
+ interval timing.
+
++config ATMEL_TCB_CLKSRC_USE_SLOW_CLOCK
++ bool "TC Block use 32 KiHz clock"
++ depends on ATMEL_TCB_CLKSRC
++ default y
++ help
++ Select this to use 32 KiHz base clock rate as TC block clock
++ source for clock events.
++
++
+ config DUMMY_IRQ
+ tristate "Dummy IRQ handler"
+ default n
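
A quick worked example of the trade-off behind the new option; the
132 MHz master clock and the divide-by-32 input are illustrative
numbers, not taken from any particular board:

  32.768 kHz slow clock:  1 / 32768 Hz             ~ 30.5 us per tick
                          65536 ticks * 30.5 us    ~ 2.0 s  max delta
  divided clock (e.g. 132 MHz / 32 = 4.125 MHz):
                          1 / 4.125 MHz            ~ 0.24 us per tick
                          65536 ticks / 4.125 MHz  ~ 15.9 ms max delta

Keeping the 32 KiHz clock (the option's default y) preserves the long
2 s range that NO_HZ likes; a divided clock buys sub-microsecond
resolution at the price of a much shorter maximum programmable delay.
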
diff --git a/patches/completion-use-simple-wait-queues.patch b/patches/completion-use-simple-wait-queues.patch
new file mode 100644
index 00000000000000..1d4da0f360208d
--- /dev/null
+++ b/patches/completion-use-simple-wait-queues.patch
@@ -0,0 +1,224 @@
+Subject: completion: Use simple wait queues
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Fri, 11 Jan 2013 11:23:51 +0100
+
+Completions have no long-lasting callbacks and therefore do not need
+the complex waitqueue variant. Use simple waitqueues, which reduces the
+contention on the waitqueue lock.
+
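+The completion API seen by callers does not change with this conversion;
+only the waitqueue underneath becomes a simple one. A minimal usage sketch
+(illustrative only, not part of the patch; the variable name is arbitrary):
+
+	DECLARE_COMPLETION_ONSTACK(done);
+
+	/* waiter: sleeps on the (now simple) waitqueue */
+	wait_for_completion(&done);
+
+	/* completer: wakes one waiter under the raw lock */
+	complete(&done);
+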
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ drivers/net/wireless/orinoco/orinoco_usb.c | 2 -
+ drivers/usb/gadget/function/f_fs.c | 2 -
+ drivers/usb/gadget/legacy/inode.c | 4 +--
+ include/linux/completion.h | 9 +++-----
+ include/linux/uprobes.h | 1
+ kernel/sched/completion.c | 32 ++++++++++++++---------------
+ kernel/sched/core.c | 10 +++++++--
+ 7 files changed, 33 insertions(+), 27 deletions(-)
+
+--- a/drivers/net/wireless/orinoco/orinoco_usb.c
++++ b/drivers/net/wireless/orinoco/orinoco_usb.c
+@@ -697,7 +697,7 @@ static void ezusb_req_ctx_wait(struct ez
+ while (!ctx->done.done && msecs--)
+ udelay(1000);
+ } else {
+- wait_event_interruptible(ctx->done.wait,
++ swait_event_interruptible(ctx->done.wait,
+ ctx->done.done);
+ }
+ break;
+--- a/drivers/usb/gadget/function/f_fs.c
++++ b/drivers/usb/gadget/function/f_fs.c
+@@ -1403,7 +1403,7 @@ static void ffs_data_put(struct ffs_data
+ pr_info("%s(): freeing\n", __func__);
+ ffs_data_clear(ffs);
+ BUG_ON(waitqueue_active(&ffs->ev.waitq) ||
+- waitqueue_active(&ffs->ep0req_completion.wait));
++ swaitqueue_active(&ffs->ep0req_completion.wait));
+ kfree(ffs->dev_name);
+ kfree(ffs);
+ }
+--- a/drivers/usb/gadget/legacy/inode.c
++++ b/drivers/usb/gadget/legacy/inode.c
+@@ -345,7 +345,7 @@ ep_io (struct ep_data *epdata, void *buf
+ spin_unlock_irq (&epdata->dev->lock);
+
+ if (likely (value == 0)) {
+- value = wait_event_interruptible (done.wait, done.done);
++ value = swait_event_interruptible (done.wait, done.done);
+ if (value != 0) {
+ spin_lock_irq (&epdata->dev->lock);
+ if (likely (epdata->ep != NULL)) {
+@@ -354,7 +354,7 @@ ep_io (struct ep_data *epdata, void *buf
+ usb_ep_dequeue (epdata->ep, epdata->req);
+ spin_unlock_irq (&epdata->dev->lock);
+
+- wait_event (done.wait, done.done);
++ swait_event (done.wait, done.done);
+ if (epdata->status == -ECONNRESET)
+ epdata->status = -EINTR;
+ } else {
+--- a/include/linux/completion.h
++++ b/include/linux/completion.h
+@@ -7,8 +7,7 @@
+ * Atomic wait-for-completion handler data structures.
+ * See kernel/sched/completion.c for details.
+ */
+-
+-#include <linux/wait.h>
++#include <linux/wait-simple.h>
+
+ /*
+ * struct completion - structure used to maintain state for a "completion"
+@@ -24,11 +23,11 @@
+ */
+ struct completion {
+ unsigned int done;
+- wait_queue_head_t wait;
++ struct swait_head wait;
+ };
+
+ #define COMPLETION_INITIALIZER(work) \
+- { 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
++ { 0, SWAIT_HEAD_INITIALIZER((work).wait) }
+
+ #define COMPLETION_INITIALIZER_ONSTACK(work) \
+ ({ init_completion(&work); work; })
+@@ -73,7 +72,7 @@ struct completion {
+ static inline void init_completion(struct completion *x)
+ {
+ x->done = 0;
+- init_waitqueue_head(&x->wait);
++ init_swait_head(&x->wait);
+ }
+
+ /**
+--- a/include/linux/uprobes.h
++++ b/include/linux/uprobes.h
+@@ -27,6 +27,7 @@
+ #include <linux/errno.h>
+ #include <linux/rbtree.h>
+ #include <linux/types.h>
++#include <linux/wait.h>
+
+ struct vm_area_struct;
+ struct mm_struct;
+--- a/kernel/sched/completion.c
++++ b/kernel/sched/completion.c
+@@ -30,10 +30,10 @@ void complete(struct completion *x)
+ {
+ unsigned long flags;
+
+- spin_lock_irqsave(&x->wait.lock, flags);
++ raw_spin_lock_irqsave(&x->wait.lock, flags);
+ x->done++;
+- __wake_up_locked(&x->wait, TASK_NORMAL, 1);
+- spin_unlock_irqrestore(&x->wait.lock, flags);
++ __swait_wake_locked(&x->wait, TASK_NORMAL, 1);
++ raw_spin_unlock_irqrestore(&x->wait.lock, flags);
+ }
+ EXPORT_SYMBOL(complete);
+
+@@ -50,10 +50,10 @@ void complete_all(struct completion *x)
+ {
+ unsigned long flags;
+
+- spin_lock_irqsave(&x->wait.lock, flags);
++ raw_spin_lock_irqsave(&x->wait.lock, flags);
+ x->done += UINT_MAX/2;
+- __wake_up_locked(&x->wait, TASK_NORMAL, 0);
+- spin_unlock_irqrestore(&x->wait.lock, flags);
++ __swait_wake_locked(&x->wait, TASK_NORMAL, 0);
++ raw_spin_unlock_irqrestore(&x->wait.lock, flags);
+ }
+ EXPORT_SYMBOL(complete_all);
+
+@@ -62,20 +62,20 @@ do_wait_for_common(struct completion *x,
+ long (*action)(long), long timeout, int state)
+ {
+ if (!x->done) {
+- DECLARE_WAITQUEUE(wait, current);
++ DEFINE_SWAITER(wait);
+
+- __add_wait_queue_tail_exclusive(&x->wait, &wait);
++ swait_prepare_locked(&x->wait, &wait);
+ do {
+ if (signal_pending_state(state, current)) {
+ timeout = -ERESTARTSYS;
+ break;
+ }
+ __set_current_state(state);
+- spin_unlock_irq(&x->wait.lock);
++ raw_spin_unlock_irq(&x->wait.lock);
+ timeout = action(timeout);
+- spin_lock_irq(&x->wait.lock);
++ raw_spin_lock_irq(&x->wait.lock);
+ } while (!x->done && timeout);
+- __remove_wait_queue(&x->wait, &wait);
++ swait_finish_locked(&x->wait, &wait);
+ if (!x->done)
+ return timeout;
+ }
+@@ -89,9 +89,9 @@ static inline long __sched
+ {
+ might_sleep();
+
+- spin_lock_irq(&x->wait.lock);
++ raw_spin_lock_irq(&x->wait.lock);
+ timeout = do_wait_for_common(x, action, timeout, state);
+- spin_unlock_irq(&x->wait.lock);
++ raw_spin_unlock_irq(&x->wait.lock);
+ return timeout;
+ }
+
+@@ -277,12 +277,12 @@ bool try_wait_for_completion(struct comp
+ if (!READ_ONCE(x->done))
+ return 0;
+
+- spin_lock_irqsave(&x->wait.lock, flags);
++ raw_spin_lock_irqsave(&x->wait.lock, flags);
+ if (!x->done)
+ ret = 0;
+ else
+ x->done--;
+- spin_unlock_irqrestore(&x->wait.lock, flags);
++ raw_spin_unlock_irqrestore(&x->wait.lock, flags);
+ return ret;
+ }
+ EXPORT_SYMBOL(try_wait_for_completion);
+@@ -311,7 +311,7 @@ bool completion_done(struct completion *
+ * after it's acquired the lock.
+ */
+ smp_rmb();
+- spin_unlock_wait(&x->wait.lock);
++ raw_spin_unlock_wait(&x->wait.lock);
+ return true;
+ }
+ EXPORT_SYMBOL(completion_done);
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -2802,7 +2802,10 @@ void migrate_disable(void)
+ }
+
+ #ifdef CONFIG_SCHED_DEBUG
+- WARN_ON_ONCE(p->migrate_disable_atomic);
++ if (unlikely(p->migrate_disable_atomic)) {
++ tracing_off();
++ WARN_ON_ONCE(1);
++ }
+ #endif
+
+ if (p->migrate_disable) {
+@@ -2832,7 +2835,10 @@ void migrate_enable(void)
+ }
+
+ #ifdef CONFIG_SCHED_DEBUG
+- WARN_ON_ONCE(p->migrate_disable_atomic);
++ if (unlikely(p->migrate_disable_atomic)) {
++ tracing_off();
++ WARN_ON_ONCE(1);
++ }
+ #endif
+ WARN_ON_ONCE(p->migrate_disable <= 0);
+
diff --git a/patches/cond-resched-lock-rt-tweak.patch b/patches/cond-resched-lock-rt-tweak.patch
new file mode 100644
index 00000000000000..0e1c3af51198c3
--- /dev/null
+++ b/patches/cond-resched-lock-rt-tweak.patch
@@ -0,0 +1,23 @@
+Subject: sched: Use the proper LOCK_OFFSET for cond_resched()
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Sun, 17 Jul 2011 22:51:33 +0200
+
+RT does not increment preempt count when a 'sleeping' spinlock is
+locked. Update PREEMPT_LOCK_OFFSET for that case.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ include/linux/sched.h | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/include/linux/sched.h
++++ b/include/linux/sched.h
+@@ -2926,7 +2926,7 @@ extern int _cond_resched(void);
+
+ extern int __cond_resched_lock(spinlock_t *lock);
+
+-#ifdef CONFIG_PREEMPT_COUNT
++#if defined(CONFIG_PREEMPT_COUNT) && !defined(CONFIG_PREEMPT_RT_FULL)
+ #define PREEMPT_LOCK_OFFSET PREEMPT_OFFSET
+ #else
+ #define PREEMPT_LOCK_OFFSET 0
diff --git a/patches/cond-resched-softirq-rt.patch b/patches/cond-resched-softirq-rt.patch
new file mode 100644
index 00000000000000..c3ed89bf094faa
--- /dev/null
+++ b/patches/cond-resched-softirq-rt.patch
@@ -0,0 +1,52 @@
+Subject: sched: Take RT softirq semantics into account in cond_resched()
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Thu, 14 Jul 2011 09:56:44 +0200
+
+The softirq semantics work differently on -RT. There is no SOFTIRQ_MASK in
+the preemption counter, which trips the BUG_ON() statement in
+__cond_resched_softirq(). On -RT it is enough to perform a "normal"
+schedule.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ include/linux/sched.h | 4 ++++
+ kernel/sched/core.c | 2 ++
+ 2 files changed, 6 insertions(+)
+
+--- a/include/linux/sched.h
++++ b/include/linux/sched.h
+@@ -2937,12 +2937,16 @@ extern int __cond_resched_lock(spinlock_
+ __cond_resched_lock(lock); \
+ })
+
++#ifndef CONFIG_PREEMPT_RT_FULL
+ extern int __cond_resched_softirq(void);
+
+ #define cond_resched_softirq() ({ \
+ ___might_sleep(__FILE__, __LINE__, SOFTIRQ_DISABLE_OFFSET); \
+ __cond_resched_softirq(); \
+ })
++#else
++# define cond_resched_softirq() cond_resched()
++#endif
+
+ static inline void cond_resched_rcu(void)
+ {
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -4479,6 +4479,7 @@ int __cond_resched_lock(spinlock_t *lock
+ }
+ EXPORT_SYMBOL(__cond_resched_lock);
+
++#ifndef CONFIG_PREEMPT_RT_FULL
+ int __sched __cond_resched_softirq(void)
+ {
+ BUG_ON(!in_softirq());
+@@ -4492,6 +4493,7 @@ int __sched __cond_resched_softirq(void)
+ return 0;
+ }
+ EXPORT_SYMBOL(__cond_resched_softirq);
++#endif
+
+ /**
+ * yield - yield the current processor to other threads.
diff --git a/patches/cpu-hotplug-Document-why-PREEMPT_RT-uses-a-spinlock.patch b/patches/cpu-hotplug-Document-why-PREEMPT_RT-uses-a-spinlock.patch
new file mode 100644
index 00000000000000..5a6e1e5222caff
--- /dev/null
+++ b/patches/cpu-hotplug-Document-why-PREEMPT_RT-uses-a-spinlock.patch
@@ -0,0 +1,55 @@
+From: Steven Rostedt <rostedt@goodmis.org>
+Date: Thu, 5 Dec 2013 09:16:52 -0500
+Subject: cpu hotplug: Document why PREEMPT_RT uses a spinlock
+
+The patch:
+
+ cpu: Make hotplug.lock a "sleeping" spinlock on RT
+
+ Tasks can block on hotplug.lock in pin_current_cpu(), but their
+ state might be != RUNNING. So the mutex wakeup will set the state
+ unconditionally to RUNNING. That might cause spurious unexpected
+ wakeups. We could provide a state preserving mutex_lock() function,
+ but this is semantically backwards. So instead we convert the
+ hotplug.lock() to a spinlock for RT, which has the state preserving
+ semantics already.
+
+Fixed a bug where the hotplug lock on PREEMPT_RT could be taken after a
+task had set its state to TASK_UNINTERRUPTIBLE and before it called
+schedule. If the hotplug lock used a mutex, and there was contention,
+the current task's state would be reset to TASK_RUNNING and the
+schedule call would not sleep. This caused unexpected results.
+
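+A minimal sketch of that problematic sequence (illustrative only; the lock
+name is a placeholder for the hotplug lock reached via migrate_disable()):
+
+	set_current_state(TASK_UNINTERRUPTIBLE);
+	/* pin_current_cpu() hits contention on the hotplug lock ... */
+	mutex_lock(&hotplug_lock);	/* mutex wakeup sets TASK_RUNNING */
+	mutex_unlock(&hotplug_lock);
+	schedule();			/* state is RUNNING, does not sleep */
+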
+Although the patch had a description of the change, the code had no
+comments about it. This causes confusion for those who review the code,
+and as PREEMPT_RT is held in a quilt queue and not git, it's not as easy
+to see why a change was made. Even if it were in git, the code should
+still have a comment for something as subtle as this.
+
+Document the rationale for using a spinlock on PREEMPT_RT in the hotplug
+lock code.
+
+Reported-by: Nicholas Mc Guire <der.herr@hofr.at>
+Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ kernel/cpu.c | 8 ++++++++
+ 1 file changed, 8 insertions(+)
+
+--- a/kernel/cpu.c
++++ b/kernel/cpu.c
+@@ -109,6 +109,14 @@ struct hotplug_pcp {
+ int grab_lock;
+ struct completion synced;
+ #ifdef CONFIG_PREEMPT_RT_FULL
++ /*
++ * Note, on PREEMPT_RT, the hotplug lock must save the state of
++ * the task, otherwise the mutex will cause the task to fail
++ * to sleep when required. (Because it's called from migrate_disable())
++ *
++ * The spinlock_t on PREEMPT_RT is a mutex that saves the task's
++ * state.
++ */
+ spinlock_t lock;
+ #else
+ struct mutex mutex;
diff --git a/patches/cpu-rt-make-hotplug-lock-a-sleeping-spinlock-on-rt.patch b/patches/cpu-rt-make-hotplug-lock-a-sleeping-spinlock-on-rt.patch
new file mode 100644
index 00000000000000..3ddd623d38dd55
--- /dev/null
+++ b/patches/cpu-rt-make-hotplug-lock-a-sleeping-spinlock-on-rt.patch
@@ -0,0 +1,130 @@
+Subject: cpu: Make hotplug.lock a "sleeping" spinlock on RT
+From: Steven Rostedt <rostedt@goodmis.org>
+Date: Fri, 02 Mar 2012 10:36:57 -0500
+
+Tasks can block on hotplug.lock in pin_current_cpu(), but their state
+might be != RUNNING. So the mutex wakeup will set the state
+unconditionally to RUNNING. That might cause spurious unexpected
+wakeups. We could provide a state preserving mutex_lock() function,
+but this is semantically backwards. So instead we convert the
+hotplug.lock() to a spinlock for RT, which has the state preserving
+semantics already.
+
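+What the slow path in pin_current_cpu() becomes with this change is roughly
+(sketch; mirrors the hunk below, comments are mine):
+
+	preempt_enable();
+	hotplug_lock();		/* rt_spin_lock() on -RT: keeps task state */
+	hotplug_unlock();
+	preempt_disable();
+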
+Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
+Cc: Carsten Emde <C.Emde@osadl.org>
+Cc: John Kacur <jkacur@redhat.com>
+Cc: Peter Zijlstra <peterz@infradead.org>
+Cc: Clark Williams <clark.williams@gmail.com>
+
+Link: http://lkml.kernel.org/r/1330702617.25686.265.camel@gandalf.stny.rr.com
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ kernel/cpu.c | 38 +++++++++++++++++++++++++++++---------
+ 1 file changed, 29 insertions(+), 9 deletions(-)
+
+--- a/kernel/cpu.c
++++ b/kernel/cpu.c
+@@ -59,10 +59,16 @@ static int cpu_hotplug_disabled;
+
+ static struct {
+ struct task_struct *active_writer;
++
+ /* wait queue to wake up the active_writer */
+ wait_queue_head_t wq;
++#ifdef CONFIG_PREEMPT_RT_FULL
++ /* Makes the lock keep the task's state */
++ spinlock_t lock;
++#else
+ /* verifies that no writer will get active while readers are active */
+ struct mutex lock;
++#endif
+ /*
+ * Also blocks the new readers during
+ * an ongoing cpu hotplug operation.
+@@ -75,12 +81,26 @@ static struct {
+ } cpu_hotplug = {
+ .active_writer = NULL,
+ .wq = __WAIT_QUEUE_HEAD_INITIALIZER(cpu_hotplug.wq),
++#ifdef CONFIG_PREEMPT_RT_FULL
++ .lock = __SPIN_LOCK_UNLOCKED(cpu_hotplug.lock),
++#else
+ .lock = __MUTEX_INITIALIZER(cpu_hotplug.lock),
++#endif
+ #ifdef CONFIG_DEBUG_LOCK_ALLOC
+ .dep_map = {.name = "cpu_hotplug.lock" },
+ #endif
+ };
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++# define hotplug_lock() rt_spin_lock(&cpu_hotplug.lock)
++# define hotplug_trylock() rt_spin_trylock(&cpu_hotplug.lock)
++# define hotplug_unlock() rt_spin_unlock(&cpu_hotplug.lock)
++#else
++# define hotplug_lock() mutex_lock(&cpu_hotplug.lock)
++# define hotplug_trylock() mutex_trylock(&cpu_hotplug.lock)
++# define hotplug_unlock() mutex_unlock(&cpu_hotplug.lock)
++#endif
++
+ /* Lockdep annotations for get/put_online_cpus() and cpu_hotplug_begin/end() */
+ #define cpuhp_lock_acquire_read() lock_map_acquire_read(&cpu_hotplug.dep_map)
+ #define cpuhp_lock_acquire_tryread() \
+@@ -117,8 +137,8 @@ void pin_current_cpu(void)
+ return;
+ }
+ preempt_enable();
+- mutex_lock(&cpu_hotplug.lock);
+- mutex_unlock(&cpu_hotplug.lock);
++ hotplug_lock();
++ hotplug_unlock();
+ preempt_disable();
+ goto retry;
+ }
+@@ -191,9 +211,9 @@ void get_online_cpus(void)
+ if (cpu_hotplug.active_writer == current)
+ return;
+ cpuhp_lock_acquire_read();
+- mutex_lock(&cpu_hotplug.lock);
++ hotplug_lock();
+ atomic_inc(&cpu_hotplug.refcount);
+- mutex_unlock(&cpu_hotplug.lock);
++ hotplug_unlock();
+ }
+ EXPORT_SYMBOL_GPL(get_online_cpus);
+
+@@ -201,11 +221,11 @@ bool try_get_online_cpus(void)
+ {
+ if (cpu_hotplug.active_writer == current)
+ return true;
+- if (!mutex_trylock(&cpu_hotplug.lock))
++ if (!hotplug_trylock())
+ return false;
+ cpuhp_lock_acquire_tryread();
+ atomic_inc(&cpu_hotplug.refcount);
+- mutex_unlock(&cpu_hotplug.lock);
++ hotplug_unlock();
+ return true;
+ }
+ EXPORT_SYMBOL_GPL(try_get_online_cpus);
+@@ -259,11 +279,11 @@ void cpu_hotplug_begin(void)
+ cpuhp_lock_acquire();
+
+ for (;;) {
+- mutex_lock(&cpu_hotplug.lock);
++ hotplug_lock();
+ prepare_to_wait(&cpu_hotplug.wq, &wait, TASK_UNINTERRUPTIBLE);
+ if (likely(!atomic_read(&cpu_hotplug.refcount)))
+ break;
+- mutex_unlock(&cpu_hotplug.lock);
++ hotplug_unlock();
+ schedule();
+ }
+ finish_wait(&cpu_hotplug.wq, &wait);
+@@ -272,7 +292,7 @@ void cpu_hotplug_begin(void)
+ void cpu_hotplug_done(void)
+ {
+ cpu_hotplug.active_writer = NULL;
+- mutex_unlock(&cpu_hotplug.lock);
++ hotplug_unlock();
+ cpuhp_lock_release();
+ }
+
diff --git a/patches/cpu-rt-rework-cpu-down.patch b/patches/cpu-rt-rework-cpu-down.patch
new file mode 100644
index 00000000000000..a86d0ad657e3ea
--- /dev/null
+++ b/patches/cpu-rt-rework-cpu-down.patch
@@ -0,0 +1,561 @@
+From: Steven Rostedt <srostedt@redhat.com>
+Date: Mon, 16 Jul 2012 08:07:43 +0000
+Subject: cpu/rt: Rework cpu down for PREEMPT_RT
+
+Bringing a CPU down is a pain with the PREEMPT_RT kernel because
+tasks can be preempted in many more places than in non-RT. In
+order to handle per_cpu variables, tasks may be pinned to a CPU
+for a while, and even sleep. But these tasks need to be off the CPU
+if that CPU is going down.
+
+Several synchronization methods have been tried, but when stressed
+they failed. This is a new approach.
+
+A sync_tsk thread is still created and tasks may still block on a
+lock when the CPU is going down, but how that works is a bit different.
+When cpu_down() starts, it will create the sync_tsk and wait for it
+to report that the tasks currently pinned on the CPU are no longer
+pinned. But new tasks that are about to be pinned will still be allowed
+to do so at this time.
+
+Then the notifiers are called. Several notifiers will bring down tasks
+that will enter these pinned sections. Some of these tasks will take locks
+of other tasks that are on the CPU. If we don't let those other tasks
+continue, but make them block until CPU down is done, the tasks that
+the notifiers are waiting on will never complete, as they are waiting
+for the locks held by the tasks that are blocked.
+
+Thus we still let the task pin the CPU until the notifiers are done.
+After the notifiers run, we then make new tasks entering the pinned
+CPU sections grab a mutex and wait. This mutex is now a per CPU mutex
+in the hotplug_pcp descriptor.
+
+To help things along, a new function in the scheduler code is created
+called migrate_me(). This function will try to migrate the current task
+off the CPU that is going down if possible. When the sync_tsk is created,
+all tasks will then try to migrate off the CPU going down. There are
+several cases where this won't work, but it helps in most cases.
+
+After the notifiers are called and if a task can't migrate off but enters
+the pin CPU sections, it will be forced to wait on the hotplug_pcp mutex
+until the CPU down is complete. Then the scheduler will force the migration
+anyway.
+
+Also, I found that THREAD_BOUND tasks also need to be accounted for among
+the pinned tasks, and migrate_disable() no longer treats them specially.
+This helps fix issues with ksoftirqd and workqueues that unbind on CPU down.
+
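+The resulting ordering on the cpu_down() side is roughly (sketch; the
+helpers are the ones added or reworked by this patch, comments are mine):
+
+	cpu_unplug_begin(cpu);	/* create sync_tsk, wait for current pinners */
+	/* ... CPU_DOWN_PREPARE notifiers run, tasks may still pin ... */
+	cpu_unplug_sync(cpu);	/* set grab_lock: new pinners now block */
+	/* ... stop_machine() takes the CPU down ... */
+	cpu_unplug_done(cpu);	/* drop the lock, tell the scheduler */
+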
+Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ include/linux/sched.h | 7 +
+ kernel/cpu.c | 244 ++++++++++++++++++++++++++++++++++++++++----------
+ kernel/sched/core.c | 82 ++++++++++++++++
+ 3 files changed, 285 insertions(+), 48 deletions(-)
+
+--- a/include/linux/sched.h
++++ b/include/linux/sched.h
+@@ -2217,6 +2217,10 @@ extern void do_set_cpus_allowed(struct t
+
+ extern int set_cpus_allowed_ptr(struct task_struct *p,
+ const struct cpumask *new_mask);
++int migrate_me(void);
++void tell_sched_cpu_down_begin(int cpu);
++void tell_sched_cpu_down_done(int cpu);
++
+ #else
+ static inline void do_set_cpus_allowed(struct task_struct *p,
+ const struct cpumask *new_mask)
+@@ -2229,6 +2233,9 @@ static inline int set_cpus_allowed_ptr(s
+ return -EINVAL;
+ return 0;
+ }
++static inline int migrate_me(void) { return 0; }
++static inline void tell_sched_cpu_down_begin(int cpu) { }
++static inline void tell_sched_cpu_down_done(int cpu) { }
+ #endif
+
+ #ifdef CONFIG_NO_HZ_COMMON
+--- a/kernel/cpu.c
++++ b/kernel/cpu.c
+@@ -59,16 +59,10 @@ static int cpu_hotplug_disabled;
+
+ static struct {
+ struct task_struct *active_writer;
+-
+ /* wait queue to wake up the active_writer */
+ wait_queue_head_t wq;
+-#ifdef CONFIG_PREEMPT_RT_FULL
+- /* Makes the lock keep the task's state */
+- spinlock_t lock;
+-#else
+ /* verifies that no writer will get active while readers are active */
+ struct mutex lock;
+-#endif
+ /*
+ * Also blocks the new readers during
+ * an ongoing cpu hotplug operation.
+@@ -80,27 +74,13 @@ static struct {
+ #endif
+ } cpu_hotplug = {
+ .active_writer = NULL,
+- .wq = __WAIT_QUEUE_HEAD_INITIALIZER(cpu_hotplug.wq),
+-#ifdef CONFIG_PREEMPT_RT_FULL
+- .lock = __SPIN_LOCK_UNLOCKED(cpu_hotplug.lock),
+-#else
+ .lock = __MUTEX_INITIALIZER(cpu_hotplug.lock),
+-#endif
++ .wq = __WAIT_QUEUE_HEAD_INITIALIZER(cpu_hotplug.wq),
+ #ifdef CONFIG_DEBUG_LOCK_ALLOC
+ .dep_map = {.name = "cpu_hotplug.lock" },
+ #endif
+ };
+
+-#ifdef CONFIG_PREEMPT_RT_FULL
+-# define hotplug_lock() rt_spin_lock(&cpu_hotplug.lock)
+-# define hotplug_trylock() rt_spin_trylock(&cpu_hotplug.lock)
+-# define hotplug_unlock() rt_spin_unlock(&cpu_hotplug.lock)
+-#else
+-# define hotplug_lock() mutex_lock(&cpu_hotplug.lock)
+-# define hotplug_trylock() mutex_trylock(&cpu_hotplug.lock)
+-# define hotplug_unlock() mutex_unlock(&cpu_hotplug.lock)
+-#endif
+-
+ /* Lockdep annotations for get/put_online_cpus() and cpu_hotplug_begin/end() */
+ #define cpuhp_lock_acquire_read() lock_map_acquire_read(&cpu_hotplug.dep_map)
+ #define cpuhp_lock_acquire_tryread() \
+@@ -108,12 +88,42 @@ static struct {
+ #define cpuhp_lock_acquire() lock_map_acquire(&cpu_hotplug.dep_map)
+ #define cpuhp_lock_release() lock_map_release(&cpu_hotplug.dep_map)
+
++/**
++ * hotplug_pcp - per cpu hotplug descriptor
++ * @unplug: set when pin_current_cpu() needs to sync tasks
++ * @sync_tsk: the task that waits for tasks to finish pinned sections
++ * @refcount: counter of tasks in pinned sections
++ * @grab_lock: set when the tasks entering pinned sections should wait
++ * @synced: notifier for @sync_tsk to tell cpu_down it's finished
++ * @mutex: the mutex to make tasks wait (used when @grab_lock is true)
++ * @mutex_init: zero if the mutex hasn't been initialized yet.
++ *
++ * Although @unplug and @sync_tsk may point to the same task, the @unplug
++ * is used as a flag and still exists after @sync_tsk has exited and
++ * @sync_tsk set to NULL.
++ */
+ struct hotplug_pcp {
+ struct task_struct *unplug;
++ struct task_struct *sync_tsk;
+ int refcount;
++ int grab_lock;
+ struct completion synced;
++#ifdef CONFIG_PREEMPT_RT_FULL
++ spinlock_t lock;
++#else
++ struct mutex mutex;
++#endif
++ int mutex_init;
+ };
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++# define hotplug_lock(hp) rt_spin_lock(&(hp)->lock)
++# define hotplug_unlock(hp) rt_spin_unlock(&(hp)->lock)
++#else
++# define hotplug_lock(hp) mutex_lock(&(hp)->mutex)
++# define hotplug_unlock(hp) mutex_unlock(&(hp)->mutex)
++#endif
++
+ static DEFINE_PER_CPU(struct hotplug_pcp, hotplug_pcp);
+
+ /**
+@@ -127,18 +137,39 @@ static DEFINE_PER_CPU(struct hotplug_pcp
+ void pin_current_cpu(void)
+ {
+ struct hotplug_pcp *hp;
++ int force = 0;
+
+ retry:
+ hp = this_cpu_ptr(&hotplug_pcp);
+
+- if (!hp->unplug || hp->refcount || preempt_count() > 1 ||
++ if (!hp->unplug || hp->refcount || force || preempt_count() > 1 ||
+ hp->unplug == current) {
+ hp->refcount++;
+ return;
+ }
+- preempt_enable();
+- hotplug_lock();
+- hotplug_unlock();
++ if (hp->grab_lock) {
++ preempt_enable();
++ hotplug_lock(hp);
++ hotplug_unlock(hp);
++ } else {
++ preempt_enable();
++ /*
++ * Try to push this task off of this CPU.
++ */
++ if (!migrate_me()) {
++ preempt_disable();
++ hp = this_cpu_ptr(&hotplug_pcp);
++ if (!hp->grab_lock) {
++ /*
++ * Just let it continue it's already pinned
++ * or about to sleep.
++ */
++ force = 1;
++ goto retry;
++ }
++ preempt_enable();
++ }
++ }
+ preempt_disable();
+ goto retry;
+ }
+@@ -159,26 +190,84 @@ void unpin_current_cpu(void)
+ wake_up_process(hp->unplug);
+ }
+
+-/*
+- * FIXME: Is this really correct under all circumstances ?
+- */
++static void wait_for_pinned_cpus(struct hotplug_pcp *hp)
++{
++ set_current_state(TASK_UNINTERRUPTIBLE);
++ while (hp->refcount) {
++ schedule_preempt_disabled();
++ set_current_state(TASK_UNINTERRUPTIBLE);
++ }
++}
++
+ static int sync_unplug_thread(void *data)
+ {
+ struct hotplug_pcp *hp = data;
+
+ preempt_disable();
+ hp->unplug = current;
++ wait_for_pinned_cpus(hp);
++
++ /*
++ * This thread will synchronize the cpu_down() with threads
++ * that have pinned the CPU. When the pinned CPU count reaches
++ * zero, we inform the cpu_down code to continue to the next step.
++ */
+ set_current_state(TASK_UNINTERRUPTIBLE);
+- while (hp->refcount) {
+- schedule_preempt_disabled();
++ preempt_enable();
++ complete(&hp->synced);
++
++ /*
++ * If all succeeds, the next step will need tasks to wait till
++ * the CPU is offline before continuing. To do this, the grab_lock
++ * is set and tasks going into pin_current_cpu() will block on the
++ * mutex. But we still need to wait for those that are already in
++ * pinned CPU sections. If the cpu_down() failed, the kthread_should_stop()
++ * will kick this thread out.
++ */
++ while (!hp->grab_lock && !kthread_should_stop()) {
++ schedule();
++ set_current_state(TASK_UNINTERRUPTIBLE);
++ }
++
++ /* Make sure grab_lock is seen before we see a stale completion */
++ smp_mb();
++
++ /*
++ * Now just before cpu_down() enters stop machine, we need to make
++ * sure all tasks that are in pinned CPU sections are out, and new
++ * tasks will now grab the lock, keeping them from entering pinned
++ * CPU sections.
++ */
++ if (!kthread_should_stop()) {
++ preempt_disable();
++ wait_for_pinned_cpus(hp);
++ preempt_enable();
++ complete(&hp->synced);
++ }
++
++ set_current_state(TASK_UNINTERRUPTIBLE);
++ while (!kthread_should_stop()) {
++ schedule();
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ }
+ set_current_state(TASK_RUNNING);
+- preempt_enable();
+- complete(&hp->synced);
++
++ /*
++ * Force this thread off this CPU as it's going down and
++ * we don't want any more work on this CPU.
++ */
++ current->flags &= ~PF_NO_SETAFFINITY;
++ do_set_cpus_allowed(current, cpu_present_mask);
++ migrate_me();
+ return 0;
+ }
+
++static void __cpu_unplug_sync(struct hotplug_pcp *hp)
++{
++ wake_up_process(hp->sync_tsk);
++ wait_for_completion(&hp->synced);
++}
++
+ /*
+ * Start the sync_unplug_thread on the target cpu and wait for it to
+ * complete.
+@@ -186,23 +275,83 @@ static int sync_unplug_thread(void *data
+ static int cpu_unplug_begin(unsigned int cpu)
+ {
+ struct hotplug_pcp *hp = &per_cpu(hotplug_pcp, cpu);
+- struct task_struct *tsk;
++ int err;
++
++ /* Protected by cpu_hotplug.lock */
++ if (!hp->mutex_init) {
++#ifdef CONFIG_PREEMPT_RT_FULL
++ spin_lock_init(&hp->lock);
++#else
++ mutex_init(&hp->mutex);
++#endif
++ hp->mutex_init = 1;
++ }
++
++ /* Inform the scheduler to migrate tasks off this CPU */
++ tell_sched_cpu_down_begin(cpu);
+
+ init_completion(&hp->synced);
+- tsk = kthread_create(sync_unplug_thread, hp, "sync_unplug/%d", cpu);
+- if (IS_ERR(tsk))
+- return (PTR_ERR(tsk));
+- kthread_bind(tsk, cpu);
+- wake_up_process(tsk);
+- wait_for_completion(&hp->synced);
++
++ hp->sync_tsk = kthread_create(sync_unplug_thread, hp, "sync_unplug/%d", cpu);
++ if (IS_ERR(hp->sync_tsk)) {
++ err = PTR_ERR(hp->sync_tsk);
++ hp->sync_tsk = NULL;
++ return err;
++ }
++ kthread_bind(hp->sync_tsk, cpu);
++
++ /*
++ * Wait for tasks to get out of the pinned sections,
++ * it's still OK if new tasks enter. Some CPU notifiers will
++ * wait for tasks that are going to enter these sections and
++ * we must not have them block.
++ */
++ __cpu_unplug_sync(hp);
++
+ return 0;
+ }
+
++static void cpu_unplug_sync(unsigned int cpu)
++{
++ struct hotplug_pcp *hp = &per_cpu(hotplug_pcp, cpu);
++
++ init_completion(&hp->synced);
++ /* The completion needs to be initialzied before setting grab_lock */
++ smp_wmb();
++
++ /* Grab the mutex before setting grab_lock */
++ hotplug_lock(hp);
++ hp->grab_lock = 1;
++
++ /*
++ * The CPU notifiers have been completed.
++ * Wait for tasks to get out of pinned CPU sections and have new
++ * tasks block until the CPU is completely down.
++ */
++ __cpu_unplug_sync(hp);
++
++ /* All done with the sync thread */
++ kthread_stop(hp->sync_tsk);
++ hp->sync_tsk = NULL;
++}
++
+ static void cpu_unplug_done(unsigned int cpu)
+ {
+ struct hotplug_pcp *hp = &per_cpu(hotplug_pcp, cpu);
+
+ hp->unplug = NULL;
++ /* Let all tasks know cpu unplug is finished before cleaning up */
++ smp_wmb();
++
++ if (hp->sync_tsk)
++ kthread_stop(hp->sync_tsk);
++
++ if (hp->grab_lock) {
++ hotplug_unlock(hp);
++ /* protected by cpu_hotplug.lock */
++ hp->grab_lock = 0;
++ }
++ tell_sched_cpu_down_done(cpu);
+ }
+
+ void get_online_cpus(void)
+@@ -211,9 +360,9 @@ void get_online_cpus(void)
+ if (cpu_hotplug.active_writer == current)
+ return;
+ cpuhp_lock_acquire_read();
+- hotplug_lock();
++ mutex_lock(&cpu_hotplug.lock);
+ atomic_inc(&cpu_hotplug.refcount);
+- hotplug_unlock();
++ mutex_unlock(&cpu_hotplug.lock);
+ }
+ EXPORT_SYMBOL_GPL(get_online_cpus);
+
+@@ -221,11 +370,11 @@ bool try_get_online_cpus(void)
+ {
+ if (cpu_hotplug.active_writer == current)
+ return true;
+- if (!hotplug_trylock())
++ if (!mutex_trylock(&cpu_hotplug.lock))
+ return false;
+ cpuhp_lock_acquire_tryread();
+ atomic_inc(&cpu_hotplug.refcount);
+- hotplug_unlock();
++ mutex_unlock(&cpu_hotplug.lock);
+ return true;
+ }
+ EXPORT_SYMBOL_GPL(try_get_online_cpus);
+@@ -279,11 +428,11 @@ void cpu_hotplug_begin(void)
+ cpuhp_lock_acquire();
+
+ for (;;) {
+- hotplug_lock();
++ mutex_lock(&cpu_hotplug.lock);
+ prepare_to_wait(&cpu_hotplug.wq, &wait, TASK_UNINTERRUPTIBLE);
+ if (likely(!atomic_read(&cpu_hotplug.refcount)))
+ break;
+- hotplug_unlock();
++ mutex_unlock(&cpu_hotplug.lock);
+ schedule();
+ }
+ finish_wait(&cpu_hotplug.wq, &wait);
+@@ -292,7 +441,7 @@ void cpu_hotplug_begin(void)
+ void cpu_hotplug_done(void)
+ {
+ cpu_hotplug.active_writer = NULL;
+- hotplug_unlock();
++ mutex_unlock(&cpu_hotplug.lock);
+ cpuhp_lock_release();
+ }
+
+@@ -527,6 +676,9 @@ static int __ref _cpu_down(unsigned int
+
+ smpboot_park_threads(cpu);
+
++ /* Notifiers are done. Don't let any more tasks pin this CPU. */
++ cpu_unplug_sync(cpu);
++
+ /*
+ * So now all preempt/rcu users must observe !cpu_active().
+ */
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -2754,7 +2754,7 @@ void migrate_disable(void)
+ {
+ struct task_struct *p = current;
+
+- if (in_atomic() || p->flags & PF_NO_SETAFFINITY) {
++ if (in_atomic()) {
+ #ifdef CONFIG_SCHED_DEBUG
+ p->migrate_disable_atomic++;
+ #endif
+@@ -2787,7 +2787,7 @@ void migrate_enable(void)
+ unsigned long flags;
+ struct rq *rq;
+
+- if (in_atomic() || p->flags & PF_NO_SETAFFINITY) {
++ if (in_atomic()) {
+ #ifdef CONFIG_SCHED_DEBUG
+ p->migrate_disable_atomic--;
+ #endif
+@@ -4960,6 +4960,84 @@ void do_set_cpus_allowed(struct task_str
+ cpumask_copy(&p->cpus_allowed, new_mask);
+ }
+
++static DEFINE_PER_CPU(struct cpumask, sched_cpumasks);
++static DEFINE_MUTEX(sched_down_mutex);
++static cpumask_t sched_down_cpumask;
++
++void tell_sched_cpu_down_begin(int cpu)
++{
++ mutex_lock(&sched_down_mutex);
++ cpumask_set_cpu(cpu, &sched_down_cpumask);
++ mutex_unlock(&sched_down_mutex);
++}
++
++void tell_sched_cpu_down_done(int cpu)
++{
++ mutex_lock(&sched_down_mutex);
++ cpumask_clear_cpu(cpu, &sched_down_cpumask);
++ mutex_unlock(&sched_down_mutex);
++}
++
++/**
++ * migrate_me - try to move the current task off this cpu
++ *
++ * Used by the pin_current_cpu() code to try to get tasks
++ * to move off the current CPU as it is going down.
++ * It will only move the task if the task isn't pinned to
++ * the CPU (with migrate_disable, affinity or NO_SETAFFINITY)
++ * and the task has to be in a RUNNING state. Otherwise the
++ * movement of the task will wake it up (change its state
++ * to running) when the task did not expect it.
++ *
++ * Returns 1 if it succeeded in moving the current task
++ * 0 otherwise.
++ */
++int migrate_me(void)
++{
++ struct task_struct *p = current;
++ struct migration_arg arg;
++ struct cpumask *cpumask;
++ struct cpumask *mask;
++ unsigned long flags;
++ unsigned int dest_cpu;
++ struct rq *rq;
++
++ /*
++ * We can not migrate tasks bounded to a CPU or tasks not
++ * running. The movement of the task will wake it up.
++ */
++ if (p->flags & PF_NO_SETAFFINITY || p->state)
++ return 0;
++
++ mutex_lock(&sched_down_mutex);
++ rq = task_rq_lock(p, &flags);
++
++ cpumask = this_cpu_ptr(&sched_cpumasks);
++ mask = &p->cpus_allowed;
++
++ cpumask_andnot(cpumask, mask, &sched_down_cpumask);
++
++ if (!cpumask_weight(cpumask)) {
++ /* It's only on this CPU? */
++ task_rq_unlock(rq, p, &flags);
++ mutex_unlock(&sched_down_mutex);
++ return 0;
++ }
++
++ dest_cpu = cpumask_any_and(cpu_active_mask, cpumask);
++
++ arg.task = p;
++ arg.dest_cpu = dest_cpu;
++
++ task_rq_unlock(rq, p, &flags);
++
++ stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
++ tlb_migrate_finish(p->mm);
++ mutex_unlock(&sched_down_mutex);
++
++ return 1;
++}
++
+ /*
+ * This is how migration works:
+ *
diff --git a/patches/cpu_chill-Add-a-UNINTERRUPTIBLE-hrtimer_nanosleep.patch b/patches/cpu_chill-Add-a-UNINTERRUPTIBLE-hrtimer_nanosleep.patch
new file mode 100644
index 00000000000000..edf10aca00f4d9
--- /dev/null
+++ b/patches/cpu_chill-Add-a-UNINTERRUPTIBLE-hrtimer_nanosleep.patch
@@ -0,0 +1,106 @@
+From: Steven Rostedt <rostedt@goodmis.org>
+Date: Tue, 4 Mar 2014 12:28:32 -0500
+Subject: cpu_chill: Add a UNINTERRUPTIBLE hrtimer_nanosleep
+
+We hit another bug that was caused by switching cpu_chill() from
+msleep() to hrtimer_nanosleep().
+
+This time it is a livelock. The problem is that hrtimer_nanosleep()
+calls schedule() with the state set to TASK_INTERRUPTIBLE. This means
+that if a signal is pending, the scheduler won't schedule, and will
+simply change the current task's state back to TASK_RUNNING. This
+nullifies the whole point of cpu_chill() in the first place. That is,
+if a task is spinning on a try_lock() and has preempted the owner of the
+lock, a pending signal means it will never give up the CPU to let
+the owner of the lock run.
+
+I made a static function __hrtimer_nanosleep() that takes a fifth
+parameter "state", which determines the task state that the
+nanosleep will sleep in. The normal hrtimer_nanosleep() will act the
+same, but cpu_chill() will call __hrtimer_nanosleep() directly with
+the TASK_UNINTERRUPTIBLE state.
+
+cpu_chill() only cares that the first sleep happens, and does not care
+about the state of the restart schedule (in hrtimer_nanosleep_restart).
+
+
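+A typical cpu_chill() caller looks roughly like this (sketch; the trylock
+is a placeholder for e.g. the dcache trylock retry loops):
+
+	for (;;) {
+		if (some_trylock(obj))
+			break;
+		cpu_chill();	/* must really sleep, otherwise the preempted
+				 * lock owner never runs and we livelock */
+	}
+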
+Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
+Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ kernel/time/hrtimer.c | 25 ++++++++++++++++++-------
+ 1 file changed, 18 insertions(+), 7 deletions(-)
+
+--- a/kernel/time/hrtimer.c
++++ b/kernel/time/hrtimer.c
+@@ -1746,12 +1746,13 @@ void hrtimer_init_sleeper(struct hrtimer
+ }
+ EXPORT_SYMBOL_GPL(hrtimer_init_sleeper);
+
+-static int __sched do_nanosleep(struct hrtimer_sleeper *t, enum hrtimer_mode mode)
++static int __sched do_nanosleep(struct hrtimer_sleeper *t, enum hrtimer_mode mode,
++ unsigned long state)
+ {
+ hrtimer_init_sleeper(t, current);
+
+ do {
+- set_current_state(TASK_INTERRUPTIBLE);
++ set_current_state(state);
+ hrtimer_start_expires(&t->timer, mode);
+ if (!hrtimer_active(&t->timer))
+ t->task = NULL;
+@@ -1795,7 +1796,8 @@ long __sched hrtimer_nanosleep_restart(s
+ HRTIMER_MODE_ABS);
+ hrtimer_set_expires_tv64(&t.timer, restart->nanosleep.expires);
+
+- if (do_nanosleep(&t, HRTIMER_MODE_ABS))
++ /* cpu_chill() does not care about restart state. */
++ if (do_nanosleep(&t, HRTIMER_MODE_ABS, TASK_INTERRUPTIBLE))
+ goto out;
+
+ rmtp = restart->nanosleep.rmtp;
+@@ -1812,8 +1814,10 @@ long __sched hrtimer_nanosleep_restart(s
+ return ret;
+ }
+
+-long hrtimer_nanosleep(struct timespec *rqtp, struct timespec __user *rmtp,
+- const enum hrtimer_mode mode, const clockid_t clockid)
++static long
++__hrtimer_nanosleep(struct timespec *rqtp, struct timespec __user *rmtp,
++ const enum hrtimer_mode mode, const clockid_t clockid,
++ unsigned long state)
+ {
+ struct restart_block *restart;
+ struct hrtimer_sleeper t;
+@@ -1826,7 +1830,7 @@ long hrtimer_nanosleep(struct timespec *
+
+ hrtimer_init_on_stack(&t.timer, clockid, mode);
+ hrtimer_set_expires_range_ns(&t.timer, timespec_to_ktime(*rqtp), slack);
+- if (do_nanosleep(&t, mode))
++ if (do_nanosleep(&t, mode, state))
+ goto out;
+
+ /* Absolute timers do not update the rmtp value and restart: */
+@@ -1853,6 +1857,12 @@ long hrtimer_nanosleep(struct timespec *
+ return ret;
+ }
+
++long hrtimer_nanosleep(struct timespec *rqtp, struct timespec __user *rmtp,
++ const enum hrtimer_mode mode, const clockid_t clockid)
++{
++ return __hrtimer_nanosleep(rqtp, rmtp, mode, clockid, TASK_INTERRUPTIBLE);
++}
++
+ SYSCALL_DEFINE2(nanosleep, struct timespec __user *, rqtp,
+ struct timespec __user *, rmtp)
+ {
+@@ -1879,7 +1889,8 @@ void cpu_chill(void)
+ unsigned int freeze_flag = current->flags & PF_NOFREEZE;
+
+ current->flags |= PF_NOFREEZE;
+- hrtimer_nanosleep(&tu, NULL, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
++ __hrtimer_nanosleep(&tu, NULL, HRTIMER_MODE_REL, CLOCK_MONOTONIC,
++ TASK_UNINTERRUPTIBLE);
+ if (!freeze_flag)
+ current->flags &= ~PF_NOFREEZE;
+ }
diff --git a/patches/cpu_down_move_migrate_enable_back.patch b/patches/cpu_down_move_migrate_enable_back.patch
new file mode 100644
index 00000000000000..2d39eb47ee972b
--- /dev/null
+++ b/patches/cpu_down_move_migrate_enable_back.patch
@@ -0,0 +1,52 @@
+From: Tiejun Chen <tiejun.chen@windriver.com>
+Subject: cpu_down: move migrate_enable() back
+Date: Thu, 7 Nov 2013 10:06:07 +0800
+
+Commit 08c1ab68, "hotplug-use-migrate-disable.patch", intends to
+use migrate_enable()/migrate_disable() to replace the combination
+of preempt_enable() and preempt_disable(), but in the
+!CONFIG_PREEMPT_RT_FULL case migrate_enable()/migrate_disable()
+are still equal to preempt_enable()/preempt_disable(). So the
+following cpu_hotplug_begin()/cpu_unplug_begin(cpu) would call schedule()
+and trigger schedule_debug() like this:
+
+_cpu_down()
+ |
+ + migrate_disable() = preempt_disable()
+ |
+ + cpu_hotplug_begin() or cpu_unplug_begin()
+ |
+ + schedule()
+ |
+ + __schedule()
+ |
+ + preempt_disable();
+ |
+ + __schedule_bug() is true!
+
+So we should move migrate_enable() back to where it was in the original scheme.
+
+
+Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
+---
+ kernel/cpu.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/kernel/cpu.c
++++ b/kernel/cpu.c
+@@ -668,6 +668,7 @@ static int __ref _cpu_down(unsigned int
+ err = -EBUSY;
+ goto restore_cpus;
+ }
++ migrate_enable();
+
+ cpu_hotplug_begin();
+ err = cpu_unplug_begin(cpu);
+@@ -744,7 +745,6 @@ static int __ref _cpu_down(unsigned int
+ out_release:
+ cpu_unplug_done(cpu);
+ out_cancel:
+- migrate_enable();
+ cpu_hotplug_done();
+ if (!err)
+ cpu_notify_nofail(CPU_POST_DEAD | mod, hcpu);
diff --git a/patches/cpufreq-drop-K8-s-driver-from-beeing-selected.patch b/patches/cpufreq-drop-K8-s-driver-from-beeing-selected.patch
new file mode 100644
index 00000000000000..e3f833c1032e95
--- /dev/null
+++ b/patches/cpufreq-drop-K8-s-driver-from-beeing-selected.patch
@@ -0,0 +1,32 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Thu, 9 Apr 2015 15:23:01 +0200
+Subject: cpufreq: drop K8's driver from being selected
+
+Ralf posted a picture of a backtrace from
+
+| powernowk8_target_fn() -> transition_frequency_fidvid() and then at the
+| end:
+| 932 policy = cpufreq_cpu_get(smp_processor_id());
+| 933 cpufreq_cpu_put(policy);
+
+crashing the system on -RT. I assumed that policy was a NULL pointer, but
+that was ruled out. Since Ralf can't do any more investigation on this and
+I have no machine with this hardware, I simply switch the driver off.
+
+Reported-by: Ralf Mardorf <ralf.mardorf@alice-dsl.net>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ drivers/cpufreq/Kconfig.x86 | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/drivers/cpufreq/Kconfig.x86
++++ b/drivers/cpufreq/Kconfig.x86
+@@ -123,7 +123,7 @@ config X86_POWERNOW_K7_ACPI
+
+ config X86_POWERNOW_K8
+ tristate "AMD Opteron/Athlon64 PowerNow!"
+- depends on ACPI && ACPI_PROCESSOR && X86_ACPI_CPUFREQ
++ depends on ACPI && ACPI_PROCESSOR && X86_ACPI_CPUFREQ && !PREEMPT_RT_BASE
+ help
+ This adds the CPUFreq driver for K8/early Opteron/Athlon64 processors.
+ Support for K10 and newer processors is now in acpi-cpufreq.
diff --git a/patches/cpumask-disable-offstack-on-rt.patch b/patches/cpumask-disable-offstack-on-rt.patch
new file mode 100644
index 00000000000000..5fcdbbe512f5f4
--- /dev/null
+++ b/patches/cpumask-disable-offstack-on-rt.patch
@@ -0,0 +1,34 @@
+Subject: cpumask: Disable CONFIG_CPUMASK_OFFSTACK for RT
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 14 Dec 2011 01:03:49 +0100
+
+We can't deal with the cpumask allocations which happen in atomic
+context (see arch/x86/kernel/apic/io_apic.c) on RT right now.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ arch/x86/Kconfig | 2 +-
+ lib/Kconfig | 1 +
+ 2 files changed, 2 insertions(+), 1 deletion(-)
+
+--- a/arch/x86/Kconfig
++++ b/arch/x86/Kconfig
+@@ -841,7 +841,7 @@ config IOMMU_HELPER
+ config MAXSMP
+ bool "Enable Maximum number of SMP Processors and NUMA Nodes"
+ depends on X86_64 && SMP && DEBUG_KERNEL
+- select CPUMASK_OFFSTACK
++ select CPUMASK_OFFSTACK if !PREEMPT_RT_FULL
+ ---help---
+ Enable maximum number of CPUS and NUMA Nodes for this architecture.
+ If unsure, say N.
+--- a/lib/Kconfig
++++ b/lib/Kconfig
+@@ -391,6 +391,7 @@ config CHECK_SIGNATURE
+
+ config CPUMASK_OFFSTACK
+ bool "Force CPU masks off stack" if DEBUG_PER_CPU_MAPS
++ depends on !PREEMPT_RT_FULL
+ help
+ Use dynamic allocation for cpumask_var_t, instead of putting
+ them on the stack. This is a bit more expensive, but avoids
diff --git a/patches/crypto-Reduce-preempt-disabled-regions-more-algos.patch b/patches/crypto-Reduce-preempt-disabled-regions-more-algos.patch
new file mode 100644
index 00000000000000..52ebb7a3aee1cc
--- /dev/null
+++ b/patches/crypto-Reduce-preempt-disabled-regions-more-algos.patch
@@ -0,0 +1,241 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Fri, 21 Feb 2014 17:24:04 +0100
+Subject: crypto: Reduce preempt disabled regions, more algos
+
+Don Estabrook reported
+| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
+| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2462 migrate_enable+0x17b/0x200()
+| kernel: WARNING: CPU: 3 PID: 865 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
+
+and his backtrace showed some crypto functions which looked fine.
+
+The problem is the following sequence:
+
+glue_xts_crypt_128bit()
+{
+ blkcipher_walk_virt(); /* normal migrate_disable() */
+
+ glue_fpu_begin(); /* get atomic */
+
+ while (nbytes) {
+ __glue_xts_crypt_128bit();
+ blkcipher_walk_done(); /* with nbytes = 0, migrate_enable()
+ * while we are atomic */
+ };
+ glue_fpu_end() /* no longer atomic */
+}
+
+and this is why the counter gets out of sync and the warning is printed.
+The other problem is that we are non-preemptible between
+glue_fpu_begin() and glue_fpu_end(), so the latency grows. To fix this,
+I shorten the FPU (preempt-disabled) region and ensure blkcipher_walk_done()
+is called with preemption enabled. This might hurt performance because we
+now enable/disable the FPU state more often, but we gain lower latency and
+the bug is gone.
+
+
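+The fixed loop shape then looks like this (sketch; mirrors the cbc_decrypt()
+hunk below, comments are mine):
+
+	while ((nbytes = walk.nbytes)) {
+		fpu_enabled = cast5_fpu_begin(false, nbytes); /* preempt off */
+		nbytes = __cbc_decrypt(desc, &walk);
+		cast5_fpu_end(fpu_enabled);                   /* preempt on */
+		err = blkcipher_walk_done(desc, &walk, nbytes);
+	}
+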
+Reported-by: Don Estabrook <don.estabrook@gmail.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ arch/x86/crypto/cast5_avx_glue.c | 21 +++++++++------------
+ arch/x86/crypto/glue_helper.c | 31 +++++++++++++++----------------
+ 2 files changed, 24 insertions(+), 28 deletions(-)
+
+--- a/arch/x86/crypto/cast5_avx_glue.c
++++ b/arch/x86/crypto/cast5_avx_glue.c
+@@ -60,7 +60,7 @@ static inline void cast5_fpu_end(bool fp
+ static int ecb_crypt(struct blkcipher_desc *desc, struct blkcipher_walk *walk,
+ bool enc)
+ {
+- bool fpu_enabled = false;
++ bool fpu_enabled;
+ struct cast5_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
+ const unsigned int bsize = CAST5_BLOCK_SIZE;
+ unsigned int nbytes;
+@@ -76,7 +76,7 @@ static int ecb_crypt(struct blkcipher_de
+ u8 *wsrc = walk->src.virt.addr;
+ u8 *wdst = walk->dst.virt.addr;
+
+- fpu_enabled = cast5_fpu_begin(fpu_enabled, nbytes);
++ fpu_enabled = cast5_fpu_begin(false, nbytes);
+
+ /* Process multi-block batch */
+ if (nbytes >= bsize * CAST5_PARALLEL_BLOCKS) {
+@@ -104,10 +104,9 @@ static int ecb_crypt(struct blkcipher_de
+ } while (nbytes >= bsize);
+
+ done:
++ cast5_fpu_end(fpu_enabled);
+ err = blkcipher_walk_done(desc, walk, nbytes);
+ }
+-
+- cast5_fpu_end(fpu_enabled);
+ return err;
+ }
+
+@@ -228,7 +227,7 @@ static unsigned int __cbc_decrypt(struct
+ static int cbc_decrypt(struct blkcipher_desc *desc, struct scatterlist *dst,
+ struct scatterlist *src, unsigned int nbytes)
+ {
+- bool fpu_enabled = false;
++ bool fpu_enabled;
+ struct blkcipher_walk walk;
+ int err;
+
+@@ -237,12 +236,11 @@ static int cbc_decrypt(struct blkcipher_
+ desc->flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
+
+ while ((nbytes = walk.nbytes)) {
+- fpu_enabled = cast5_fpu_begin(fpu_enabled, nbytes);
++ fpu_enabled = cast5_fpu_begin(false, nbytes);
+ nbytes = __cbc_decrypt(desc, &walk);
++ cast5_fpu_end(fpu_enabled);
+ err = blkcipher_walk_done(desc, &walk, nbytes);
+ }
+-
+- cast5_fpu_end(fpu_enabled);
+ return err;
+ }
+
+@@ -312,7 +310,7 @@ static unsigned int __ctr_crypt(struct b
+ static int ctr_crypt(struct blkcipher_desc *desc, struct scatterlist *dst,
+ struct scatterlist *src, unsigned int nbytes)
+ {
+- bool fpu_enabled = false;
++ bool fpu_enabled;
+ struct blkcipher_walk walk;
+ int err;
+
+@@ -321,13 +319,12 @@ static int ctr_crypt(struct blkcipher_de
+ desc->flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
+
+ while ((nbytes = walk.nbytes) >= CAST5_BLOCK_SIZE) {
+- fpu_enabled = cast5_fpu_begin(fpu_enabled, nbytes);
++ fpu_enabled = cast5_fpu_begin(false, nbytes);
+ nbytes = __ctr_crypt(desc, &walk);
++ cast5_fpu_end(fpu_enabled);
+ err = blkcipher_walk_done(desc, &walk, nbytes);
+ }
+
+- cast5_fpu_end(fpu_enabled);
+-
+ if (walk.nbytes) {
+ ctr_crypt_final(desc, &walk);
+ err = blkcipher_walk_done(desc, &walk, 0);
+--- a/arch/x86/crypto/glue_helper.c
++++ b/arch/x86/crypto/glue_helper.c
+@@ -39,7 +39,7 @@ static int __glue_ecb_crypt_128bit(const
+ void *ctx = crypto_blkcipher_ctx(desc->tfm);
+ const unsigned int bsize = 128 / 8;
+ unsigned int nbytes, i, func_bytes;
+- bool fpu_enabled = false;
++ bool fpu_enabled;
+ int err;
+
+ err = blkcipher_walk_virt(desc, walk);
+@@ -49,7 +49,7 @@ static int __glue_ecb_crypt_128bit(const
+ u8 *wdst = walk->dst.virt.addr;
+
+ fpu_enabled = glue_fpu_begin(bsize, gctx->fpu_blocks_limit,
+- desc, fpu_enabled, nbytes);
++ desc, false, nbytes);
+
+ for (i = 0; i < gctx->num_funcs; i++) {
+ func_bytes = bsize * gctx->funcs[i].num_blocks;
+@@ -71,10 +71,10 @@ static int __glue_ecb_crypt_128bit(const
+ }
+
+ done:
++ glue_fpu_end(fpu_enabled);
+ err = blkcipher_walk_done(desc, walk, nbytes);
+ }
+
+- glue_fpu_end(fpu_enabled);
+ return err;
+ }
+
+@@ -194,7 +194,7 @@ int glue_cbc_decrypt_128bit(const struct
+ struct scatterlist *src, unsigned int nbytes)
+ {
+ const unsigned int bsize = 128 / 8;
+- bool fpu_enabled = false;
++ bool fpu_enabled;
+ struct blkcipher_walk walk;
+ int err;
+
+@@ -203,12 +203,12 @@ int glue_cbc_decrypt_128bit(const struct
+
+ while ((nbytes = walk.nbytes)) {
+ fpu_enabled = glue_fpu_begin(bsize, gctx->fpu_blocks_limit,
+- desc, fpu_enabled, nbytes);
++ desc, false, nbytes);
+ nbytes = __glue_cbc_decrypt_128bit(gctx, desc, &walk);
++ glue_fpu_end(fpu_enabled);
+ err = blkcipher_walk_done(desc, &walk, nbytes);
+ }
+
+- glue_fpu_end(fpu_enabled);
+ return err;
+ }
+ EXPORT_SYMBOL_GPL(glue_cbc_decrypt_128bit);
+@@ -277,7 +277,7 @@ int glue_ctr_crypt_128bit(const struct c
+ struct scatterlist *src, unsigned int nbytes)
+ {
+ const unsigned int bsize = 128 / 8;
+- bool fpu_enabled = false;
++ bool fpu_enabled;
+ struct blkcipher_walk walk;
+ int err;
+
+@@ -286,13 +286,12 @@ int glue_ctr_crypt_128bit(const struct c
+
+ while ((nbytes = walk.nbytes) >= bsize) {
+ fpu_enabled = glue_fpu_begin(bsize, gctx->fpu_blocks_limit,
+- desc, fpu_enabled, nbytes);
++ desc, false, nbytes);
+ nbytes = __glue_ctr_crypt_128bit(gctx, desc, &walk);
++ glue_fpu_end(fpu_enabled);
+ err = blkcipher_walk_done(desc, &walk, nbytes);
+ }
+
+- glue_fpu_end(fpu_enabled);
+-
+ if (walk.nbytes) {
+ glue_ctr_crypt_final_128bit(
+ gctx->funcs[gctx->num_funcs - 1].fn_u.ctr, desc, &walk);
+@@ -347,7 +346,7 @@ int glue_xts_crypt_128bit(const struct c
+ void *tweak_ctx, void *crypt_ctx)
+ {
+ const unsigned int bsize = 128 / 8;
+- bool fpu_enabled = false;
++ bool fpu_enabled;
+ struct blkcipher_walk walk;
+ int err;
+
+@@ -360,21 +359,21 @@ int glue_xts_crypt_128bit(const struct c
+
+ /* set minimum length to bsize, for tweak_fn */
+ fpu_enabled = glue_fpu_begin(bsize, gctx->fpu_blocks_limit,
+- desc, fpu_enabled,
++ desc, false,
+ nbytes < bsize ? bsize : nbytes);
+-
+ /* calculate first value of T */
+ tweak_fn(tweak_ctx, walk.iv, walk.iv);
++ glue_fpu_end(fpu_enabled);
+
+ while (nbytes) {
++ fpu_enabled = glue_fpu_begin(bsize, gctx->fpu_blocks_limit,
++ desc, false, nbytes);
+ nbytes = __glue_xts_crypt_128bit(gctx, crypt_ctx, desc, &walk);
+
++ glue_fpu_end(fpu_enabled);
+ err = blkcipher_walk_done(desc, &walk, nbytes);
+ nbytes = walk.nbytes;
+ }
+-
+- glue_fpu_end(fpu_enabled);
+-
+ return err;
+ }
+ EXPORT_SYMBOL_GPL(glue_xts_crypt_128bit);
diff --git a/patches/debugobjects-rt.patch b/patches/debugobjects-rt.patch
new file mode 100644
index 00000000000000..5a35a8eb17f096
--- /dev/null
+++ b/patches/debugobjects-rt.patch
@@ -0,0 +1,25 @@
+Subject: debugobjects: Make RT aware
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Sun, 17 Jul 2011 21:41:35 +0200
+
+Avoid filling the pool / allocating memory with irqs off.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ lib/debugobjects.c | 5 ++++-
+ 1 file changed, 4 insertions(+), 1 deletion(-)
+
+--- a/lib/debugobjects.c
++++ b/lib/debugobjects.c
+@@ -309,7 +309,10 @@ static void
+ struct debug_obj *obj;
+ unsigned long flags;
+
+- fill_pool();
++#ifdef CONFIG_PREEMPT_RT_FULL
++ if (preempt_count() == 0 && !irqs_disabled())
++#endif
++ fill_pool();
+
+ db = get_bucket((unsigned long) addr);
+
diff --git a/patches/dm-make-rt-aware.patch b/patches/dm-make-rt-aware.patch
new file mode 100644
index 00000000000000..07095808ba6c0c
--- /dev/null
+++ b/patches/dm-make-rt-aware.patch
@@ -0,0 +1,26 @@
+Subject: dm: Make rt aware
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Mon, 14 Nov 2011 23:06:09 +0100
+
+Use the BUG_ON_NONRT variant for the irqs_disabled() checks. RT has
+interrupts legitimately enabled here as we can't deadlock against the
+irq thread due to the "sleeping spinlocks" conversion.
+
+Reported-by: Luis Claudio R. Goncalves <lclaudio@uudg.org>
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ drivers/md/dm.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/drivers/md/dm.c
++++ b/drivers/md/dm.c
+@@ -2143,7 +2143,7 @@ static void dm_request_fn(struct request
+ /* Establish tio->ti before queuing work (map_tio_request) */
+ tio->ti = ti;
+ queue_kthread_work(&md->kworker, &tio->work);
+- BUG_ON(!irqs_disabled());
++ BUG_ON_NONRT(!irqs_disabled());
+ }
+
+ goto out;
diff --git a/patches/drivers-net-8139-disable-irq-nosync.patch b/patches/drivers-net-8139-disable-irq-nosync.patch
new file mode 100644
index 00000000000000..fd12422bc784eb
--- /dev/null
+++ b/patches/drivers-net-8139-disable-irq-nosync.patch
@@ -0,0 +1,25 @@
+From: Ingo Molnar <mingo@elte.hu>
+Date: Fri, 3 Jul 2009 08:29:24 -0500
+Subject: drivers/net: Use disable_irq_nosync() in 8139too
+
+Use disable_irq_nosync() instead of disable_irq() as this might be
+called in atomic context with netpoll.
+
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ drivers/net/ethernet/realtek/8139too.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/drivers/net/ethernet/realtek/8139too.c
++++ b/drivers/net/ethernet/realtek/8139too.c
+@@ -2229,7 +2229,7 @@ static void rtl8139_poll_controller(stru
+ struct rtl8139_private *tp = netdev_priv(dev);
+ const int irq = tp->pci_dev->irq;
+
+- disable_irq(irq);
++ disable_irq_nosync(irq);
+ rtl8139_interrupt(irq, dev);
+ enable_irq(irq);
+ }
diff --git a/patches/drivers-net-fix-livelock-issues.patch b/patches/drivers-net-fix-livelock-issues.patch
new file mode 100644
index 00000000000000..21af2b1b04a08b
--- /dev/null
+++ b/patches/drivers-net-fix-livelock-issues.patch
@@ -0,0 +1,126 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Sat, 20 Jun 2009 11:36:54 +0200
+Subject: drivers/net: fix livelock issues
+
+Preempt-RT runs into a livelock issue with the NETDEV_TX_LOCKED micro
+optimization. The reason is that the softirq thread is rescheduling
+itself on that return value. Depending on priorities it starts to
+monopolize the CPU and to livelock on UP systems.
+
+Remove it.
+
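+The pattern being removed is the trylock + NETDEV_TX_LOCKED dance; the
+replacement simply takes the lock (sketch; mirrors e.g. the atl1c hunk below):
+
+	/* before: tells the stack to requeue, the softirq spins on it */
+	if (!spin_trylock_irqsave(&adapter->tx_lock, flags))
+		return NETDEV_TX_LOCKED;
+
+	/* after: just acquire the lock */
+	spin_lock_irqsave(&adapter->tx_lock, flags);
+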
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 6 +-----
+ drivers/net/ethernet/atheros/atl1e/atl1e_main.c | 3 +--
+ drivers/net/ethernet/chelsio/cxgb/sge.c | 3 +--
+ drivers/net/ethernet/neterion/s2io.c | 7 +------
+ drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 6 ++----
+ drivers/net/ethernet/tehuti/tehuti.c | 9 ++-------
+ drivers/net/rionet.c | 6 +-----
+ 7 files changed, 9 insertions(+), 31 deletions(-)
+
+--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
++++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+@@ -2213,11 +2213,7 @@ static netdev_tx_t atl1c_xmit_frame(stru
+ }
+
+ tpd_req = atl1c_cal_tpd_req(skb);
+- if (!spin_trylock_irqsave(&adapter->tx_lock, flags)) {
+- if (netif_msg_pktdata(adapter))
+- dev_info(&adapter->pdev->dev, "tx locked\n");
+- return NETDEV_TX_LOCKED;
+- }
++ spin_lock_irqsave(&adapter->tx_lock, flags);
+
+ if (atl1c_tpd_avail(adapter, type) < tpd_req) {
+ /* no enough descriptor, just stop queue */
+--- a/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
++++ b/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
+@@ -1880,8 +1880,7 @@ static netdev_tx_t atl1e_xmit_frame(stru
+ return NETDEV_TX_OK;
+ }
+ tpd_req = atl1e_cal_tdp_req(skb);
+- if (!spin_trylock_irqsave(&adapter->tx_lock, flags))
+- return NETDEV_TX_LOCKED;
++ spin_lock_irqsave(&adapter->tx_lock, flags);
+
+ if (atl1e_tpd_avail(adapter) < tpd_req) {
+ /* no enough descriptor, just stop queue */
+--- a/drivers/net/ethernet/chelsio/cxgb/sge.c
++++ b/drivers/net/ethernet/chelsio/cxgb/sge.c
+@@ -1664,8 +1664,7 @@ static int t1_sge_tx(struct sk_buff *skb
+ struct cmdQ *q = &sge->cmdQ[qid];
+ unsigned int credits, pidx, genbit, count, use_sched_skb = 0;
+
+- if (!spin_trylock(&q->lock))
+- return NETDEV_TX_LOCKED;
++ spin_lock(&q->lock);
+
+ reclaim_completed_tx(sge, q);
+
+--- a/drivers/net/ethernet/neterion/s2io.c
++++ b/drivers/net/ethernet/neterion/s2io.c
+@@ -4084,12 +4084,7 @@ static netdev_tx_t s2io_xmit(struct sk_b
+ [skb->priority & (MAX_TX_FIFOS - 1)];
+ fifo = &mac_control->fifos[queue];
+
+- if (do_spin_lock)
+- spin_lock_irqsave(&fifo->tx_lock, flags);
+- else {
+- if (unlikely(!spin_trylock_irqsave(&fifo->tx_lock, flags)))
+- return NETDEV_TX_LOCKED;
+- }
++ spin_lock_irqsave(&fifo->tx_lock, flags);
+
+ if (sp->config.multiq) {
+ if (__netif_subqueue_stopped(dev, fifo->fifo_no)) {
+--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
++++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+@@ -2137,10 +2137,8 @@ static int pch_gbe_xmit_frame(struct sk_
+ struct pch_gbe_tx_ring *tx_ring = adapter->tx_ring;
+ unsigned long flags;
+
+- if (!spin_trylock_irqsave(&tx_ring->tx_lock, flags)) {
+- /* Collision - tell upper layer to requeue */
+- return NETDEV_TX_LOCKED;
+- }
++ spin_lock_irqsave(&tx_ring->tx_lock, flags);
++
+ if (unlikely(!PCH_GBE_DESC_UNUSED(tx_ring))) {
+ netif_stop_queue(netdev);
+ spin_unlock_irqrestore(&tx_ring->tx_lock, flags);
+--- a/drivers/net/ethernet/tehuti/tehuti.c
++++ b/drivers/net/ethernet/tehuti/tehuti.c
+@@ -1629,13 +1629,8 @@ static netdev_tx_t bdx_tx_transmit(struc
+ unsigned long flags;
+
+ ENTER;
+- local_irq_save(flags);
+- if (!spin_trylock(&priv->tx_lock)) {
+- local_irq_restore(flags);
+- DBG("%s[%s]: TX locked, returning NETDEV_TX_LOCKED\n",
+- BDX_DRV_NAME, ndev->name);
+- return NETDEV_TX_LOCKED;
+- }
++
++ spin_lock_irqsave(&priv->tx_lock, flags);
+
+ /* build tx descriptor */
+ BDX_ASSERT(f->m.wptr >= f->m.memsz); /* started with valid wptr */
+--- a/drivers/net/rionet.c
++++ b/drivers/net/rionet.c
+@@ -174,11 +174,7 @@ static int rionet_start_xmit(struct sk_b
+ unsigned long flags;
+ int add_num = 1;
+
+- local_irq_save(flags);
+- if (!spin_trylock(&rnet->tx_lock)) {
+- local_irq_restore(flags);
+- return NETDEV_TX_LOCKED;
+- }
++ spin_lock_irqsave(&rnet->tx_lock, flags);
+
+ if (is_multicast_ether_addr(eth->h_dest))
+ add_num = nets[rnet->mport->id].nact;
diff --git a/patches/drivers-net-vortex-fix-locking-issues.patch b/patches/drivers-net-vortex-fix-locking-issues.patch
new file mode 100644
index 00000000000000..0c6dde7e2729a3
--- /dev/null
+++ b/patches/drivers-net-vortex-fix-locking-issues.patch
@@ -0,0 +1,48 @@
+From: Steven Rostedt <rostedt@goodmis.org>
+Date: Fri, 3 Jul 2009 08:30:00 -0500
+Subject: drivers/net: vortex fix locking issues
+
+Argh, cut and paste wasn't enough...
+
+Use this patch instead. It needs an irq disable. But, believe it or not,
+on SMP this is actually better. If the irq is shared (as it is in Mark's
+case), we don't stop the irq of other devices from being handled on
+another CPU (unfortunately for Mark, he pinned all interrupts to one CPU).
+
+Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+ drivers/net/ethernet/3com/3c59x.c | 8 ++++----
+ 1 file changed, 4 insertions(+), 4 deletions(-)
+
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+
+--- a/drivers/net/ethernet/3com/3c59x.c
++++ b/drivers/net/ethernet/3com/3c59x.c
+@@ -842,9 +842,9 @@ static void poll_vortex(struct net_devic
+ {
+ struct vortex_private *vp = netdev_priv(dev);
+ unsigned long flags;
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+ (vp->full_bus_master_rx ? boomerang_interrupt:vortex_interrupt)(dev->irq,dev);
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+ }
+ #endif
+
+@@ -1916,12 +1916,12 @@ static void vortex_tx_timeout(struct net
+ * Block interrupts because vortex_interrupt does a bare spin_lock()
+ */
+ unsigned long flags;
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+ if (vp->full_bus_master_tx)
+ boomerang_interrupt(dev->irq, dev);
+ else
+ vortex_interrupt(dev->irq, dev);
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+ }
+ }
+
diff --git a/patches/drivers-random-reduce-preempt-disabled-region.patch b/patches/drivers-random-reduce-preempt-disabled-region.patch
new file mode 100644
index 00000000000000..31078915df1289
--- /dev/null
+++ b/patches/drivers-random-reduce-preempt-disabled-region.patch
@@ -0,0 +1,32 @@
+From: Ingo Molnar <mingo@elte.hu>
+Date: Fri, 3 Jul 2009 08:29:30 -0500
+Subject: drivers: random: Reduce preempt disabled region
+
+No need to keep preemption disabled across the whole function.
+
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ drivers/char/random.c | 3 ---
+ 1 file changed, 3 deletions(-)
+
+--- a/drivers/char/random.c
++++ b/drivers/char/random.c
+@@ -776,8 +776,6 @@ static void add_timer_randomness(struct
+ } sample;
+ long delta, delta2, delta3;
+
+- preempt_disable();
+-
+ sample.jiffies = jiffies;
+ sample.cycles = random_get_entropy();
+ sample.num = num;
+@@ -818,7 +816,6 @@ static void add_timer_randomness(struct
+ */
+ credit_entropy_bits(r, min_t(int, fls(delta>>1), 11));
+ }
+- preempt_enable();
+ }
+
+ void add_input_randomness(unsigned int type, unsigned int code,
diff --git a/patches/drivers-tty-fix-omap-lock-crap.patch b/patches/drivers-tty-fix-omap-lock-crap.patch
new file mode 100644
index 00000000000000..5c1d59ee9027d8
--- /dev/null
+++ b/patches/drivers-tty-fix-omap-lock-crap.patch
@@ -0,0 +1,42 @@
+Subject: tty/serial/omap: Make the locking RT aware
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Thu, 28 Jul 2011 13:32:57 +0200
+
+The lock is a sleeping lock and local_irq_save() is not the
+optimisation we are looking for. Redo it to make it work on -RT and
+non-RT.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ drivers/tty/serial/omap-serial.c | 12 ++++--------
+ 1 file changed, 4 insertions(+), 8 deletions(-)
+
+--- a/drivers/tty/serial/omap-serial.c
++++ b/drivers/tty/serial/omap-serial.c
+@@ -1282,13 +1282,10 @@ serial_omap_console_write(struct console
+
+ pm_runtime_get_sync(up->dev);
+
+- local_irq_save(flags);
+- if (up->port.sysrq)
+- locked = 0;
+- else if (oops_in_progress)
+- locked = spin_trylock(&up->port.lock);
++ if (up->port.sysrq || oops_in_progress)
++ locked = spin_trylock_irqsave(&up->port.lock, flags);
+ else
+- spin_lock(&up->port.lock);
++ spin_lock_irqsave(&up->port.lock, flags);
+
+ /*
+ * First save the IER then disable the interrupts
+@@ -1317,8 +1314,7 @@ serial_omap_console_write(struct console
+ pm_runtime_mark_last_busy(up->dev);
+ pm_runtime_put_autosuspend(up->dev);
+ if (locked)
+- spin_unlock(&up->port.lock);
+- local_irq_restore(flags);
++ spin_unlock_irqrestore(&up->port.lock, flags);
+ }
+
+ static int __init
diff --git a/patches/drivers-tty-pl011-irq-disable-madness.patch b/patches/drivers-tty-pl011-irq-disable-madness.patch
new file mode 100644
index 00000000000000..43404cb43d3187
--- /dev/null
+++ b/patches/drivers-tty-pl011-irq-disable-madness.patch
@@ -0,0 +1,47 @@
+Subject: tty/serial/pl011: Make the locking work on RT
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Tue, 08 Jan 2013 21:36:51 +0100
+
+The lock is a sleeping lock and local_irq_save() is not the optimisation
+we are looking for. Redo it to make it work on -RT and non-RT.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ drivers/tty/serial/amba-pl011.c | 15 ++++++++++-----
+ 1 file changed, 10 insertions(+), 5 deletions(-)
+
+--- a/drivers/tty/serial/amba-pl011.c
++++ b/drivers/tty/serial/amba-pl011.c
+@@ -2000,13 +2000,19 @@ pl011_console_write(struct console *co,
+
+ clk_enable(uap->clk);
+
+- local_irq_save(flags);
++ /*
++ * local_irq_save(flags);
++ *
++ * This local_irq_save() is nonsense. If we come in via sysrq
++ * handling then interrupts are already disabled. Aside of
++ * that the port.sysrq check is racy on SMP regardless.
++ */
+ if (uap->port.sysrq)
+ locked = 0;
+ else if (oops_in_progress)
+- locked = spin_trylock(&uap->port.lock);
++ locked = spin_trylock_irqsave(&uap->port.lock, flags);
+ else
+- spin_lock(&uap->port.lock);
++ spin_lock_irqsave(&uap->port.lock, flags);
+
+ /*
+ * First save the CR then disable the interrupts
+@@ -2028,8 +2034,7 @@ pl011_console_write(struct console *co,
+ writew(old_cr, uap->port.membase + UART011_CR);
+
+ if (locked)
+- spin_unlock(&uap->port.lock);
+- local_irq_restore(flags);
++ spin_unlock_irqrestore(&uap->port.lock, flags);
+
+ clk_disable(uap->clk);
+ }
diff --git a/patches/drm-i915-drop-trace_i915_gem_ring_dispatch-onrt.patch b/patches/drm-i915-drop-trace_i915_gem_ring_dispatch-onrt.patch
new file mode 100644
index 00000000000000..fb381dd566fb92
--- /dev/null
+++ b/patches/drm-i915-drop-trace_i915_gem_ring_dispatch-onrt.patch
@@ -0,0 +1,58 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Thu, 25 Apr 2013 18:12:52 +0200
+Subject: drm/i915: drop trace_i915_gem_ring_dispatch on rt
+
+This tracepoint is responsible for:
+
+|[<814cc358>] __schedule_bug+0x4d/0x59
+|[<814d24cc>] __schedule+0x88c/0x930
+|[<814d3b90>] ? _raw_spin_unlock_irqrestore+0x40/0x50
+|[<814d3b95>] ? _raw_spin_unlock_irqrestore+0x45/0x50
+|[<810b57b5>] ? task_blocks_on_rt_mutex+0x1f5/0x250
+|[<814d27d9>] schedule+0x29/0x70
+|[<814d3423>] rt_spin_lock_slowlock+0x15b/0x278
+|[<814d3786>] rt_spin_lock+0x26/0x30
+|[<a00dced9>] gen6_gt_force_wake_get+0x29/0x60 [i915]
+|[<a00e183f>] gen6_ring_get_irq+0x5f/0x100 [i915]
+|[<a00b2a33>] ftrace_raw_event_i915_gem_ring_dispatch+0xe3/0x100 [i915]
+|[<a00ac1b3>] i915_gem_do_execbuffer.isra.13+0xbd3/0x1430 [i915]
+|[<810f8943>] ? trace_buffer_unlock_commit+0x43/0x60
+|[<8113e8d2>] ? ftrace_raw_event_kmem_alloc+0xd2/0x180
+|[<8101d063>] ? native_sched_clock+0x13/0x80
+|[<a00acf29>] i915_gem_execbuffer2+0x99/0x280 [i915]
+|[<a00114a3>] drm_ioctl+0x4c3/0x570 [drm]
+|[<8101d0d9>] ? sched_clock+0x9/0x10
+|[<a00ace90>] ? i915_gem_execbuffer+0x480/0x480 [i915]
+|[<810f1c18>] ? rb_commit+0x68/0xa0
+|[<810f1c6c>] ? ring_buffer_unlock_commit+0x1c/0xa0
+|[<81197467>] do_vfs_ioctl+0x97/0x540
+|[<81021318>] ? ftrace_raw_event_sys_enter+0xd8/0x130
+|[<811979a1>] sys_ioctl+0x91/0xb0
+|[<814db931>] tracesys+0xe1/0xe6
+
+Chris Wilson does not like to move i915_trace_irq_get() out of the macro
+
+|No. This enables the IRQ, as well as making a number of
+|very expensively serialised read, unconditionally.
+
+so it is gone now on RT.
+
+
+Reported-by: Joakim Hernberg <jbh@alchemy.lu>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 ++
+ 1 file changed, 2 insertions(+)
+
+--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
++++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+@@ -1339,7 +1339,9 @@ i915_gem_ringbuffer_submission(struct dr
+ return ret;
+ }
+
++#ifndef CONFIG_PREEMPT_RT_BASE
+ trace_i915_gem_ring_dispatch(intel_ring_get_request(ring), dispatch_flags);
++#endif
+
+ i915_gem_execbuffer_move_to_active(vmas, ring);
+ i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
diff --git a/patches/epoll-use-get-cpu-light.patch b/patches/epoll-use-get-cpu-light.patch
new file mode 100644
index 00000000000000..f75d1701a8ab20
--- /dev/null
+++ b/patches/epoll-use-get-cpu-light.patch
@@ -0,0 +1,30 @@
+Subject: fs/epoll: Do not disable preemption on RT
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Fri, 08 Jul 2011 16:35:35 +0200
+
+ep_call_nested() takes a sleeping lock so we can't disable preemption.
+The light version is enough since ep_call_nested() doesn't mind being
+invoked twice on the same CPU.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ fs/eventpoll.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/fs/eventpoll.c
++++ b/fs/eventpoll.c
+@@ -505,12 +505,12 @@ static int ep_poll_wakeup_proc(void *pri
+ */
+ static void ep_poll_safewake(wait_queue_head_t *wq)
+ {
+- int this_cpu = get_cpu();
++ int this_cpu = get_cpu_light();
+
+ ep_call_nested(&poll_safewake_ncalls, EP_MAX_NESTS,
+ ep_poll_wakeup_proc, NULL, wq, (void *) (long) this_cpu);
+
+- put_cpu();
++ put_cpu_light();
+ }
+
+ static void ep_remove_wait_queue(struct eppoll_entry *pwq)
diff --git a/patches/fix-rt-int3-x86_32-3.2-rt.patch b/patches/fix-rt-int3-x86_32-3.2-rt.patch
new file mode 100644
index 00000000000000..8f052ac7564aea
--- /dev/null
+++ b/patches/fix-rt-int3-x86_32-3.2-rt.patch
@@ -0,0 +1,101 @@
+From: Steven Rostedt <rostedt@goodmis.org>
+Date: Tue, 14 Jul 2015 14:26:34 +0200
+Subject: x86: Do not disable preemption in int3 on 32bit
+
+Preemption must be disabled before enabling interrupts in do_trap
+on x86_64 because the stack in use for int3 and debug is a per CPU
+stack set by the IST. But 32bit does not have an IST and the stack
+still belongs to the current task and there is no problem in scheduling
+out the task.
+
+Keep preemption enabled on X86_32 when enabling interrupts for
+do_trap().
+
+The name of the function is changed from preempt_conditional_sti/cli()
+to conditional_sti/cli_ist(), to annotate that this function is used
+when the stack is on the IST.
+
+
+Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ arch/x86/kernel/traps.c | 28 +++++++++++++++++++++-------
+ 1 file changed, 21 insertions(+), 7 deletions(-)
+
+--- a/arch/x86/kernel/traps.c
++++ b/arch/x86/kernel/traps.c
+@@ -88,9 +88,21 @@ static inline void conditional_sti(struc
+ local_irq_enable();
+ }
+
+-static inline void preempt_conditional_sti(struct pt_regs *regs)
++static inline void conditional_sti_ist(struct pt_regs *regs)
+ {
++#ifdef CONFIG_X86_64
++ /*
++ * X86_64 uses a per CPU stack on the IST for certain traps
++ * like int3. The task can not be preempted when using one
++ * of these stacks, thus preemption must be disabled, otherwise
++ * the stack can be corrupted if the task is scheduled out,
++ * and another task comes in and uses this stack.
++ *
++ * On x86_32 the task keeps its own stack and it is OK if the
++ * task schedules out.
++ */
+ preempt_count_inc();
++#endif
+ if (regs->flags & X86_EFLAGS_IF)
+ local_irq_enable();
+ }
+@@ -101,11 +113,13 @@ static inline void conditional_cli(struc
+ local_irq_disable();
+ }
+
+-static inline void preempt_conditional_cli(struct pt_regs *regs)
++static inline void conditional_cli_ist(struct pt_regs *regs)
+ {
+ if (regs->flags & X86_EFLAGS_IF)
+ local_irq_disable();
++#ifdef CONFIG_X86_64
+ preempt_count_dec();
++#endif
+ }
+
+ enum ctx_state ist_enter(struct pt_regs *regs)
+@@ -536,9 +550,9 @@ dotraplinkage void notrace do_int3(struc
+ * as we may switch to the interrupt stack.
+ */
+ debug_stack_usage_inc();
+- preempt_conditional_sti(regs);
++ conditional_sti_ist(regs);
+ do_trap(X86_TRAP_BP, SIGTRAP, "int3", regs, error_code, NULL);
+- preempt_conditional_cli(regs);
++ conditional_cli_ist(regs);
+ debug_stack_usage_dec();
+ exit:
+ ist_exit(regs, prev_state);
+@@ -668,12 +682,12 @@ dotraplinkage void do_debug(struct pt_re
+ debug_stack_usage_inc();
+
+ /* It's safe to allow irq's after DR6 has been saved */
+- preempt_conditional_sti(regs);
++ conditional_sti_ist(regs);
+
+ if (v8086_mode(regs)) {
+ handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code,
+ X86_TRAP_DB);
+- preempt_conditional_cli(regs);
++ conditional_cli_ist(regs);
+ debug_stack_usage_dec();
+ goto exit;
+ }
+@@ -693,7 +707,7 @@ dotraplinkage void do_debug(struct pt_re
+ si_code = get_si_code(tsk->thread.debugreg6);
+ if (tsk->thread.debugreg6 & (DR_STEP | DR_TRAP_BITS) || user_icebp)
+ send_sigtrap(tsk, regs, error_code, si_code);
+- preempt_conditional_cli(regs);
++ conditional_cli_ist(regs);
+ debug_stack_usage_dec();
+
+ exit:
diff --git a/patches/fs-aio-simple-simple-work.patch b/patches/fs-aio-simple-simple-work.patch
new file mode 100644
index 00000000000000..b08c65f861482f
--- /dev/null
+++ b/patches/fs-aio-simple-simple-work.patch
@@ -0,0 +1,106 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Mon, 16 Feb 2015 18:49:10 +0100
+Subject: fs/aio: simple simple work
+
+|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:768
+|in_atomic(): 1, irqs_disabled(): 0, pid: 26, name: rcuos/2
+|2 locks held by rcuos/2/26:
+| #0: (rcu_callback){.+.+..}, at: [<ffffffff810b1a12>] rcu_nocb_kthread+0x1e2/0x380
+| #1: (rcu_read_lock_sched){.+.+..}, at: [<ffffffff812acd26>] percpu_ref_kill_rcu+0xa6/0x1c0
+|Preemption disabled at:[<ffffffff810b1a93>] rcu_nocb_kthread+0x263/0x380
+|Call Trace:
+| [<ffffffff81582e9e>] dump_stack+0x4e/0x9c
+| [<ffffffff81077aeb>] __might_sleep+0xfb/0x170
+| [<ffffffff81589304>] rt_spin_lock+0x24/0x70
+| [<ffffffff811c5790>] free_ioctx_users+0x30/0x130
+| [<ffffffff812ace34>] percpu_ref_kill_rcu+0x1b4/0x1c0
+| [<ffffffff810b1a93>] rcu_nocb_kthread+0x263/0x380
+| [<ffffffff8106e046>] kthread+0xd6/0xf0
+| [<ffffffff81591eec>] ret_from_fork+0x7c/0xb0
+
+Replace it with the preempt_disable() friendly swork queue, which defers the
+teardown to process context.
+
+Reported-By: Mike Galbraith <umgwanakikbuti@gmail.com>
+Suggested-by: Benjamin LaHaise <bcrl@kvack.org>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ fs/aio.c | 24 +++++++++++++++++-------
+ 1 file changed, 17 insertions(+), 7 deletions(-)
+
+--- a/fs/aio.c
++++ b/fs/aio.c
+@@ -40,6 +40,7 @@
+ #include <linux/ramfs.h>
+ #include <linux/percpu-refcount.h>
+ #include <linux/mount.h>
++#include <linux/work-simple.h>
+
+ #include <asm/kmap_types.h>
+ #include <asm/uaccess.h>
+@@ -115,7 +116,7 @@ struct kioctx {
+ struct page **ring_pages;
+ long nr_pages;
+
+- struct work_struct free_work;
++ struct swork_event free_work;
+
+ /*
+ * signals when all in-flight requests are done
+@@ -253,6 +254,7 @@ static int __init aio_setup(void)
+ .mount = aio_mount,
+ .kill_sb = kill_anon_super,
+ };
++ BUG_ON(swork_get());
+ aio_mnt = kern_mount(&aio_fs);
+ if (IS_ERR(aio_mnt))
+ panic("Failed to create aio fs mount.");
+@@ -559,9 +561,9 @@ static int kiocb_cancel(struct aio_kiocb
+ return cancel(&kiocb->common);
+ }
+
+-static void free_ioctx(struct work_struct *work)
++static void free_ioctx(struct swork_event *sev)
+ {
+- struct kioctx *ctx = container_of(work, struct kioctx, free_work);
++ struct kioctx *ctx = container_of(sev, struct kioctx, free_work);
+
+ pr_debug("freeing %p\n", ctx);
+
+@@ -580,8 +582,8 @@ static void free_ioctx_reqs(struct percp
+ if (ctx->rq_wait && atomic_dec_and_test(&ctx->rq_wait->count))
+ complete(&ctx->rq_wait->comp);
+
+- INIT_WORK(&ctx->free_work, free_ioctx);
+- schedule_work(&ctx->free_work);
++ INIT_SWORK(&ctx->free_work, free_ioctx);
++ swork_queue(&ctx->free_work);
+ }
+
+ /*
+@@ -589,9 +591,9 @@ static void free_ioctx_reqs(struct percp
+ * and ctx->users has dropped to 0, so we know no more kiocbs can be submitted -
+ * now it's safe to cancel any that need to be.
+ */
+-static void free_ioctx_users(struct percpu_ref *ref)
++static void free_ioctx_users_work(struct swork_event *sev)
+ {
+- struct kioctx *ctx = container_of(ref, struct kioctx, users);
++ struct kioctx *ctx = container_of(sev, struct kioctx, free_work);
+ struct aio_kiocb *req;
+
+ spin_lock_irq(&ctx->ctx_lock);
+@@ -610,6 +612,14 @@ static void free_ioctx_users(struct perc
+ percpu_ref_put(&ctx->reqs);
+ }
+
++static void free_ioctx_users(struct percpu_ref *ref)
++{
++ struct kioctx *ctx = container_of(ref, struct kioctx, users);
++
++ INIT_SWORK(&ctx->free_work, free_ioctx_users_work);
++ swork_queue(&ctx->free_work);
++}
++
+ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
+ {
+ unsigned i, new_nr;
diff --git a/patches/fs-block-rt-support.patch b/patches/fs-block-rt-support.patch
new file mode 100644
index 00000000000000..8f124097ccff8a
--- /dev/null
+++ b/patches/fs-block-rt-support.patch
@@ -0,0 +1,22 @@
+Subject: block: Turn off warning which is bogus on RT
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Tue, 14 Jun 2011 17:05:09 +0200
+
+On -RT this code runs with IRQs enabled, so the warning is bogus there. Ignore it on -RT.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ block/blk-core.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/block/blk-core.c
++++ b/block/blk-core.c
+@@ -194,7 +194,7 @@ EXPORT_SYMBOL(blk_delay_queue);
+ **/
+ void blk_start_queue(struct request_queue *q)
+ {
+- WARN_ON(!irqs_disabled());
++ WARN_ON_NONRT(!irqs_disabled());
+
+ queue_flag_clear(QUEUE_FLAG_STOPPED, q);
+ __blk_run_queue(q);
diff --git a/patches/fs-dcache-use-cpu-chill-in-trylock-loops.patch b/patches/fs-dcache-use-cpu-chill-in-trylock-loops.patch
new file mode 100644
index 00000000000000..8100b0eeacf3ff
--- /dev/null
+++ b/patches/fs-dcache-use-cpu-chill-in-trylock-loops.patch
@@ -0,0 +1,85 @@
+Subject: fs: dcache: Use cpu_chill() in trylock loops
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 07 Mar 2012 21:00:34 +0100
+
+Retry loops on RT might loop forever when the modifying side was
+preempted. Use cpu_chill() instead of cpu_relax() to let the system
+make progress.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
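+For illustration, the retry pattern used in dcache/autofs above looks roughly
+like this as a plain pthread sketch (names invented; nanosleep() stands in for
+cpu_chill(), which on -RT sleeps briefly instead of busy-waiting so the
+preempted lock holder can make progress):
+
+	#include <pthread.h>
+	#include <time.h>
+
+	pthread_mutex_t a = PTHREAD_MUTEX_INITIALIZER;
+	pthread_mutex_t b = PTHREAD_MUTEX_INITIALIZER;
+
+	/* take 'a' then 'b'; on contention drop 'a', chill and retry, instead of
+	 * spinning with cpu_relax() while the holder of 'b' cannot run */
+	void lock_both(void)
+	{
+		const struct timespec chill = { 0, 1000 * 1000 };	/* ~1ms */
+
+		for (;;) {
+			pthread_mutex_lock(&a);
+			if (pthread_mutex_trylock(&b) == 0)
+				return;			/* got both locks */
+			pthread_mutex_unlock(&a);
+			nanosleep(&chill, NULL);	/* cpu_chill() analogue */
+		}
+	}
+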
+ fs/autofs4/autofs_i.h | 1 +
+ fs/autofs4/expire.c | 2 +-
+ fs/dcache.c | 5 +++--
+ fs/namespace.c | 3 ++-
+ 4 files changed, 7 insertions(+), 4 deletions(-)
+
+--- a/fs/autofs4/autofs_i.h
++++ b/fs/autofs4/autofs_i.h
+@@ -34,6 +34,7 @@
+ #include <linux/sched.h>
+ #include <linux/mount.h>
+ #include <linux/namei.h>
++#include <linux/delay.h>
+ #include <asm/current.h>
+ #include <asm/uaccess.h>
+
+--- a/fs/autofs4/expire.c
++++ b/fs/autofs4/expire.c
+@@ -150,7 +150,7 @@ static struct dentry *get_next_positive_
+ parent = p->d_parent;
+ if (!spin_trylock(&parent->d_lock)) {
+ spin_unlock(&p->d_lock);
+- cpu_relax();
++ cpu_chill();
+ goto relock;
+ }
+ spin_unlock(&p->d_lock);
+--- a/fs/dcache.c
++++ b/fs/dcache.c
+@@ -19,6 +19,7 @@
+ #include <linux/mm.h>
+ #include <linux/fs.h>
+ #include <linux/fsnotify.h>
++#include <linux/delay.h>
+ #include <linux/slab.h>
+ #include <linux/init.h>
+ #include <linux/hash.h>
+@@ -589,7 +590,7 @@ static struct dentry *dentry_kill(struct
+
+ failed:
+ spin_unlock(&dentry->d_lock);
+- cpu_relax();
++ cpu_chill();
+ return dentry; /* try again with same dentry */
+ }
+
+@@ -2392,7 +2393,7 @@ void d_delete(struct dentry * dentry)
+ if (dentry->d_lockref.count == 1) {
+ if (!spin_trylock(&inode->i_lock)) {
+ spin_unlock(&dentry->d_lock);
+- cpu_relax();
++ cpu_chill();
+ goto again;
+ }
+ dentry->d_flags &= ~DCACHE_CANT_MOUNT;
+--- a/fs/namespace.c
++++ b/fs/namespace.c
+@@ -14,6 +14,7 @@
+ #include <linux/mnt_namespace.h>
+ #include <linux/user_namespace.h>
+ #include <linux/namei.h>
++#include <linux/delay.h>
+ #include <linux/security.h>
+ #include <linux/idr.h>
+ #include <linux/init.h> /* init_rootfs */
+@@ -355,7 +356,7 @@ int __mnt_want_write(struct vfsmount *m)
+ smp_mb();
+ while (ACCESS_ONCE(mnt->mnt.mnt_flags) & MNT_WRITE_HOLD) {
+ preempt_enable();
+- cpu_relax();
++ cpu_chill();
+ preempt_disable();
+ }
+ /*
diff --git a/patches/fs-jbd-pull-plug-when-waiting-for-space.patch b/patches/fs-jbd-pull-plug-when-waiting-for-space.patch
new file mode 100644
index 00000000000000..0dd4c95d3ec308
--- /dev/null
+++ b/patches/fs-jbd-pull-plug-when-waiting-for-space.patch
@@ -0,0 +1,29 @@
+From: Mike Galbraith <mgalbraith@suse.de>
+Date: Wed, 11 Jul 2012 22:05:20 +0000
+Subject: fs, jbd: pull your plug when waiting for space
+
+With an -rt kernel, and a heavy sync IO load, tasks can jam
+up on journal locks without unplugging, which can lead to
+terminal IO starvation. Unplug and schedule when waiting for space.
+
+Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
+Cc: Steven Rostedt <rostedt@goodmis.org>
+Cc: Theodore Tso <tytso@mit.edu>
+Link: http://lkml.kernel.org/r/1341812414.7370.73.camel@marge.simpson.net
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ fs/jbd/checkpoint.c | 2 ++
+ 1 file changed, 2 insertions(+)
+
+--- a/fs/jbd/checkpoint.c
++++ b/fs/jbd/checkpoint.c
+@@ -129,6 +129,8 @@ void __log_wait_for_space(journal_t *jou
+ if (journal->j_flags & JFS_ABORT)
+ return;
+ spin_unlock(&journal->j_state_lock);
++ if (current->plug)
++ io_schedule();
+ mutex_lock(&journal->j_checkpoint_mutex);
+
+ /*
diff --git a/patches/fs-jbd-replace-bh_state-lock.patch b/patches/fs-jbd-replace-bh_state-lock.patch
new file mode 100644
index 00000000000000..bd240b1081a8e4
--- /dev/null
+++ b/patches/fs-jbd-replace-bh_state-lock.patch
@@ -0,0 +1,100 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Fri, 18 Mar 2011 10:11:25 +0100
+Subject: fs: jbd/jbd2: Make state lock and journal head lock rt safe
+
+bit_spin_locks break under RT.
+
+Based on a previous patch from Steven Rostedt <rostedt@goodmis.org>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+--
+
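+For background (illustrative only, not the kernel implementation): a bit
+spinlock keeps its lock state in a single bit of an existing word and
+busy-waits on it, and the kernel variant also disables preemption while the
+lock is held, which is exactly what cannot be tolerated on RT when the
+protected sections may sleep. A minimal userspace model of the primitives:
+
+	#include <stdatomic.h>
+	#include <sched.h>
+
+	/* the lock is bit 'bit' of *word; spin until we are the one who set it */
+	void bit_lock(int bit, atomic_ulong *word)
+	{
+		unsigned long mask = 1UL << bit;
+
+		while (atomic_fetch_or(word, mask) & mask)
+			sched_yield();	/* the kernel spins here with preemption off */
+	}
+
+	void bit_unlock(int bit, atomic_ulong *word)
+	{
+		atomic_fetch_and(word, ~(1UL << bit));
+	}
+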
+ include/linux/buffer_head.h | 10 ++++++++++
+ include/linux/jbd_common.h | 24 ++++++++++++++++++++++++
+ 2 files changed, 34 insertions(+)
+
+--- a/include/linux/buffer_head.h
++++ b/include/linux/buffer_head.h
+@@ -77,6 +77,11 @@ struct buffer_head {
+ atomic_t b_count; /* users using this buffer_head */
+ #ifdef CONFIG_PREEMPT_RT_BASE
+ spinlock_t b_uptodate_lock;
++#if defined(CONFIG_JBD) || defined(CONFIG_JBD_MODULE) || \
++ defined(CONFIG_JBD2) || defined(CONFIG_JBD2_MODULE)
++ spinlock_t b_state_lock;
++ spinlock_t b_journal_head_lock;
++#endif
+ #endif
+ };
+
+@@ -108,6 +113,11 @@ static inline void buffer_head_init_lock
+ {
+ #ifdef CONFIG_PREEMPT_RT_BASE
+ spin_lock_init(&bh->b_uptodate_lock);
++#if defined(CONFIG_JBD) || defined(CONFIG_JBD_MODULE) || \
++ defined(CONFIG_JBD2) || defined(CONFIG_JBD2_MODULE)
++ spin_lock_init(&bh->b_state_lock);
++ spin_lock_init(&bh->b_journal_head_lock);
++#endif
+ #endif
+ }
+
+--- a/include/linux/jbd_common.h
++++ b/include/linux/jbd_common.h
+@@ -15,32 +15,56 @@ static inline struct journal_head *bh2jh
+
+ static inline void jbd_lock_bh_state(struct buffer_head *bh)
+ {
++#ifndef CONFIG_PREEMPT_RT_BASE
+ bit_spin_lock(BH_State, &bh->b_state);
++#else
++ spin_lock(&bh->b_state_lock);
++#endif
+ }
+
+ static inline int jbd_trylock_bh_state(struct buffer_head *bh)
+ {
++#ifndef CONFIG_PREEMPT_RT_BASE
+ return bit_spin_trylock(BH_State, &bh->b_state);
++#else
++ return spin_trylock(&bh->b_state_lock);
++#endif
+ }
+
+ static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
+ {
++#ifndef CONFIG_PREEMPT_RT_BASE
+ return bit_spin_is_locked(BH_State, &bh->b_state);
++#else
++ return spin_is_locked(&bh->b_state_lock);
++#endif
+ }
+
+ static inline void jbd_unlock_bh_state(struct buffer_head *bh)
+ {
++#ifndef CONFIG_PREEMPT_RT_BASE
+ bit_spin_unlock(BH_State, &bh->b_state);
++#else
++ spin_unlock(&bh->b_state_lock);
++#endif
+ }
+
+ static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
+ {
++#ifndef CONFIG_PREEMPT_RT_BASE
+ bit_spin_lock(BH_JournalHead, &bh->b_state);
++#else
++ spin_lock(&bh->b_journal_head_lock);
++#endif
+ }
+
+ static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
+ {
++#ifndef CONFIG_PREEMPT_RT_BASE
+ bit_spin_unlock(BH_JournalHead, &bh->b_state);
++#else
++ spin_unlock(&bh->b_journal_head_lock);
++#endif
+ }
+
+ #endif
diff --git a/patches/fs-jbd2-pull-your-plug-when-waiting-for-space.patch b/patches/fs-jbd2-pull-your-plug-when-waiting-for-space.patch
new file mode 100644
index 00000000000000..600ade5ddf9aeb
--- /dev/null
+++ b/patches/fs-jbd2-pull-your-plug-when-waiting-for-space.patch
@@ -0,0 +1,31 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Mon, 17 Feb 2014 17:30:03 +0100
+Subject: fs: jbd2: pull your plug when waiting for space
+
+Two cp processes running in parallel managed to stall the ext4 fs. It seems that the
+journal code is either waiting for locks or sleeping waiting for
+something to happen. This seems similar to what Mike observed on ext3,
+here is his description:
+
+|With an -rt kernel, and a heavy sync IO load, tasks can jam
+|up on journal locks without unplugging, which can lead to
+|terminal IO starvation. Unplug and schedule when waiting
+|for space.
+
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ fs/jbd2/checkpoint.c | 2 ++
+ 1 file changed, 2 insertions(+)
+
+--- a/fs/jbd2/checkpoint.c
++++ b/fs/jbd2/checkpoint.c
+@@ -116,6 +116,8 @@ void __jbd2_log_wait_for_space(journal_t
+ nblocks = jbd2_space_needed(journal);
+ while (jbd2_log_space_left(journal) < nblocks) {
+ write_unlock(&journal->j_state_lock);
++ if (current->plug)
++ io_schedule();
+ mutex_lock(&journal->j_checkpoint_mutex);
+
+ /*
diff --git a/patches/fs-namespace-preemption-fix.patch b/patches/fs-namespace-preemption-fix.patch
new file mode 100644
index 00000000000000..40511193367228
--- /dev/null
+++ b/patches/fs-namespace-preemption-fix.patch
@@ -0,0 +1,30 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Sun, 19 Jul 2009 08:44:27 -0500
+Subject: fs: namespace preemption fix
+
+On RT we cannot loop with preemption disabled here as
+mnt_make_readonly() might have been preempted. We can safely enable
+preemption while waiting for MNT_WRITE_HOLD to be cleared. Safe on !RT
+as well.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ fs/namespace.c | 5 ++++-
+ 1 file changed, 4 insertions(+), 1 deletion(-)
+
+--- a/fs/namespace.c
++++ b/fs/namespace.c
+@@ -353,8 +353,11 @@ int __mnt_want_write(struct vfsmount *m)
+ * incremented count after it has set MNT_WRITE_HOLD.
+ */
+ smp_mb();
+- while (ACCESS_ONCE(mnt->mnt.mnt_flags) & MNT_WRITE_HOLD)
++ while (ACCESS_ONCE(mnt->mnt.mnt_flags) & MNT_WRITE_HOLD) {
++ preempt_enable();
+ cpu_relax();
++ preempt_disable();
++ }
+ /*
+ * After the slowpath clears MNT_WRITE_HOLD, mnt_is_readonly will
+ * be set to match its requirements. So we must not load that until
diff --git a/patches/fs-ntfs-disable-interrupt-non-rt.patch b/patches/fs-ntfs-disable-interrupt-non-rt.patch
new file mode 100644
index 00000000000000..49203174e40860
--- /dev/null
+++ b/patches/fs-ntfs-disable-interrupt-non-rt.patch
@@ -0,0 +1,59 @@
+From: Mike Galbraith <efault@gmx.de>
+Date: Fri, 3 Jul 2009 08:44:12 -0500
+Subject: fs: ntfs: disable interrupt only on !RT
+
+On Sat, 2007-10-27 at 11:44 +0200, Ingo Molnar wrote:
+> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
+>
+> > > [10138.175796] [<c0105de3>] show_trace+0x12/0x14
+> > > [10138.180291] [<c0105dfb>] dump_stack+0x16/0x18
+> > > [10138.184769] [<c011609f>] native_smp_call_function_mask+0x138/0x13d
+> > > [10138.191117] [<c0117606>] smp_call_function+0x1e/0x24
+> > > [10138.196210] [<c012f85c>] on_each_cpu+0x25/0x50
+> > > [10138.200807] [<c0115c74>] flush_tlb_all+0x1e/0x20
+> > > [10138.205553] [<c016caaf>] kmap_high+0x1b6/0x417
+> > > [10138.210118] [<c011ec88>] kmap+0x4d/0x4f
+> > > [10138.214102] [<c026a9d8>] ntfs_end_buffer_async_read+0x228/0x2f9
+> > > [10138.220163] [<c01a0e9e>] end_bio_bh_io_sync+0x26/0x3f
+> > > [10138.225352] [<c01a2b09>] bio_endio+0x42/0x6d
+> > > [10138.229769] [<c02c2a08>] __end_that_request_first+0x115/0x4ac
+> > > [10138.235682] [<c02c2da7>] end_that_request_chunk+0x8/0xa
+> > > [10138.241052] [<c0365943>] ide_end_request+0x55/0x10a
+> > > [10138.246058] [<c036dae3>] ide_dma_intr+0x6f/0xac
+> > > [10138.250727] [<c0366d83>] ide_intr+0x93/0x1e0
+> > > [10138.255125] [<c015afb4>] handle_IRQ_event+0x5c/0xc9
+> >
+> > Looks like ntfs is kmap()ing from interrupt context. Should be using
+> > kmap_atomic instead, I think.
+>
+> it's not atomic interrupt context but irq thread context - and -rt
+> remaps kmap_atomic() to kmap() internally.
+
+Hm. Looking at the change to mm/bounce.c, perhaps I should do this
+instead?
+
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ fs/ntfs/aops.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/fs/ntfs/aops.c
++++ b/fs/ntfs/aops.c
+@@ -143,13 +143,13 @@ static void ntfs_end_buffer_async_read(s
+ recs = PAGE_CACHE_SIZE / rec_size;
+ /* Should have been verified before we got here... */
+ BUG_ON(!recs);
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+ kaddr = kmap_atomic(page);
+ for (i = 0; i < recs; i++)
+ post_read_mst_fixup((NTFS_RECORD*)(kaddr +
+ i * rec_size), rec_size);
+ kunmap_atomic(kaddr);
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+ flush_dcache_page(page);
+ if (likely(page_uptodate && !PageError(page)))
+ SetPageUptodate(page);
diff --git a/patches/fs-replace-bh_uptodate_lock-for-rt.patch b/patches/fs-replace-bh_uptodate_lock-for-rt.patch
new file mode 100644
index 00000000000000..aff9e498fbde07
--- /dev/null
+++ b/patches/fs-replace-bh_uptodate_lock-for-rt.patch
@@ -0,0 +1,161 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Fri, 18 Mar 2011 09:18:52 +0100
+Subject: buffer_head: Replace bh_uptodate_lock for -rt
+
+Wrap the bit_spin_lock calls into a separate inline and add the RT
+replacements with a real spinlock.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ fs/buffer.c | 21 +++++++--------------
+ fs/ntfs/aops.c | 10 +++-------
+ include/linux/buffer_head.h | 34 ++++++++++++++++++++++++++++++++++
+ 3 files changed, 44 insertions(+), 21 deletions(-)
+
+--- a/fs/buffer.c
++++ b/fs/buffer.c
+@@ -301,8 +301,7 @@ static void end_buffer_async_read(struct
+ * decide that the page is now completely done.
+ */
+ first = page_buffers(page);
+- local_irq_save(flags);
+- bit_spin_lock(BH_Uptodate_Lock, &first->b_state);
++ flags = bh_uptodate_lock_irqsave(first);
+ clear_buffer_async_read(bh);
+ unlock_buffer(bh);
+ tmp = bh;
+@@ -315,8 +314,7 @@ static void end_buffer_async_read(struct
+ }
+ tmp = tmp->b_this_page;
+ } while (tmp != bh);
+- bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
+- local_irq_restore(flags);
++ bh_uptodate_unlock_irqrestore(first, flags);
+
+ /*
+ * If none of the buffers had errors and they are all
+@@ -328,9 +326,7 @@ static void end_buffer_async_read(struct
+ return;
+
+ still_busy:
+- bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
+- local_irq_restore(flags);
+- return;
++ bh_uptodate_unlock_irqrestore(first, flags);
+ }
+
+ /*
+@@ -358,8 +354,7 @@ void end_buffer_async_write(struct buffe
+ }
+
+ first = page_buffers(page);
+- local_irq_save(flags);
+- bit_spin_lock(BH_Uptodate_Lock, &first->b_state);
++ flags = bh_uptodate_lock_irqsave(first);
+
+ clear_buffer_async_write(bh);
+ unlock_buffer(bh);
+@@ -371,15 +366,12 @@ void end_buffer_async_write(struct buffe
+ }
+ tmp = tmp->b_this_page;
+ }
+- bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
+- local_irq_restore(flags);
++ bh_uptodate_unlock_irqrestore(first, flags);
+ end_page_writeback(page);
+ return;
+
+ still_busy:
+- bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
+- local_irq_restore(flags);
+- return;
++ bh_uptodate_unlock_irqrestore(first, flags);
+ }
+ EXPORT_SYMBOL(end_buffer_async_write);
+
+@@ -3325,6 +3317,7 @@ struct buffer_head *alloc_buffer_head(gf
+ struct buffer_head *ret = kmem_cache_zalloc(bh_cachep, gfp_flags);
+ if (ret) {
+ INIT_LIST_HEAD(&ret->b_assoc_buffers);
++ buffer_head_init_locks(ret);
+ preempt_disable();
+ __this_cpu_inc(bh_accounting.nr);
+ recalc_bh_state();
+--- a/fs/ntfs/aops.c
++++ b/fs/ntfs/aops.c
+@@ -107,8 +107,7 @@ static void ntfs_end_buffer_async_read(s
+ "0x%llx.", (unsigned long long)bh->b_blocknr);
+ }
+ first = page_buffers(page);
+- local_irq_save(flags);
+- bit_spin_lock(BH_Uptodate_Lock, &first->b_state);
++ flags = bh_uptodate_lock_irqsave(first);
+ clear_buffer_async_read(bh);
+ unlock_buffer(bh);
+ tmp = bh;
+@@ -123,8 +122,7 @@ static void ntfs_end_buffer_async_read(s
+ }
+ tmp = tmp->b_this_page;
+ } while (tmp != bh);
+- bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
+- local_irq_restore(flags);
++ bh_uptodate_unlock_irqrestore(first, flags);
+ /*
+ * If none of the buffers had errors then we can set the page uptodate,
+ * but we first have to perform the post read mst fixups, if the
+@@ -159,9 +157,7 @@ static void ntfs_end_buffer_async_read(s
+ unlock_page(page);
+ return;
+ still_busy:
+- bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
+- local_irq_restore(flags);
+- return;
++ bh_uptodate_unlock_irqrestore(first, flags);
+ }
+
+ /**
+--- a/include/linux/buffer_head.h
++++ b/include/linux/buffer_head.h
+@@ -75,8 +75,42 @@ struct buffer_head {
+ struct address_space *b_assoc_map; /* mapping this buffer is
+ associated with */
+ atomic_t b_count; /* users using this buffer_head */
++#ifdef CONFIG_PREEMPT_RT_BASE
++ spinlock_t b_uptodate_lock;
++#endif
+ };
+
++static inline unsigned long bh_uptodate_lock_irqsave(struct buffer_head *bh)
++{
++ unsigned long flags;
++
++#ifndef CONFIG_PREEMPT_RT_BASE
++ local_irq_save(flags);
++ bit_spin_lock(BH_Uptodate_Lock, &bh->b_state);
++#else
++ spin_lock_irqsave(&bh->b_uptodate_lock, flags);
++#endif
++ return flags;
++}
++
++static inline void
++bh_uptodate_unlock_irqrestore(struct buffer_head *bh, unsigned long flags)
++{
++#ifndef CONFIG_PREEMPT_RT_BASE
++ bit_spin_unlock(BH_Uptodate_Lock, &bh->b_state);
++ local_irq_restore(flags);
++#else
++ spin_unlock_irqrestore(&bh->b_uptodate_lock, flags);
++#endif
++}
++
++static inline void buffer_head_init_locks(struct buffer_head *bh)
++{
++#ifdef CONFIG_PREEMPT_RT_BASE
++ spin_lock_init(&bh->b_uptodate_lock);
++#endif
++}
++
+ /*
+ * macro tricks to expand the set_buffer_foo(), clear_buffer_foo()
+ * and buffer_foo() functions.
diff --git a/patches/ftrace-migrate-disable-tracing.patch b/patches/ftrace-migrate-disable-tracing.patch
new file mode 100644
index 00000000000000..4a4412259955fc
--- /dev/null
+++ b/patches/ftrace-migrate-disable-tracing.patch
@@ -0,0 +1,73 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Sun, 17 Jul 2011 21:56:42 +0200
+Subject: trace: Add migrate-disabled counter to tracing output
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ include/linux/ftrace_event.h | 2 ++
+ kernel/trace/trace.c | 9 ++++++---
+ kernel/trace/trace_events.c | 2 ++
+ kernel/trace/trace_output.c | 5 +++++
+ 4 files changed, 15 insertions(+), 3 deletions(-)
+
+--- a/include/linux/ftrace_event.h
++++ b/include/linux/ftrace_event.h
+@@ -66,6 +66,8 @@ struct trace_entry {
+ unsigned char flags;
+ unsigned char preempt_count;
+ int pid;
++ unsigned short migrate_disable;
++ unsigned short padding;
+ };
+
+ #define FTRACE_MAX_EVENT \
+--- a/kernel/trace/trace.c
++++ b/kernel/trace/trace.c
+@@ -1641,6 +1641,8 @@ tracing_generic_entry_update(struct trac
+ ((pc & SOFTIRQ_MASK) ? TRACE_FLAG_SOFTIRQ : 0) |
+ (tif_need_resched() ? TRACE_FLAG_NEED_RESCHED : 0) |
+ (test_preempt_need_resched() ? TRACE_FLAG_PREEMPT_RESCHED : 0);
++
++ entry->migrate_disable = (tsk) ? __migrate_disabled(tsk) & 0xFF : 0;
+ }
+ EXPORT_SYMBOL_GPL(tracing_generic_entry_update);
+
+@@ -2563,9 +2565,10 @@ static void print_lat_help_header(struct
+ "# | / _----=> need-resched \n"
+ "# || / _---=> hardirq/softirq \n"
+ "# ||| / _--=> preempt-depth \n"
+- "# |||| / delay \n"
+- "# cmd pid ||||| time | caller \n"
+- "# \\ / ||||| \\ | / \n");
++ "# |||| / _--=> migrate-disable\n"
++ "# ||||| / delay \n"
++ "# cmd pid |||||| time | caller \n"
++ "# \\ / ||||| \\ | / \n");
+ }
+
+ static void print_event_info(struct trace_buffer *buf, struct seq_file *m)
+--- a/kernel/trace/trace_events.c
++++ b/kernel/trace/trace_events.c
+@@ -162,6 +162,8 @@ static int trace_define_common_fields(vo
+ __common_field(unsigned char, flags);
+ __common_field(unsigned char, preempt_count);
+ __common_field(int, pid);
++ __common_field(unsigned short, migrate_disable);
++ __common_field(unsigned short, padding);
+
+ return ret;
+ }
+--- a/kernel/trace/trace_output.c
++++ b/kernel/trace/trace_output.c
+@@ -472,6 +472,11 @@ int trace_print_lat_fmt(struct trace_seq
+ else
+ trace_seq_putc(s, '.');
+
++ if (entry->migrate_disable)
++ trace_seq_printf(s, "%x", entry->migrate_disable);
++ else
++ trace_seq_putc(s, '.');
++
+ return !trace_seq_has_overflowed(s);
+ }
+
diff --git a/patches/futex-avoid-double-wake-up-in-PI-futex-wait-wake-on-.patch b/patches/futex-avoid-double-wake-up-in-PI-futex-wait-wake-on-.patch
new file mode 100644
index 00000000000000..ec0496037c0125
--- /dev/null
+++ b/patches/futex-avoid-double-wake-up-in-PI-futex-wait-wake-on-.patch
@@ -0,0 +1,223 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 18 Feb 2015 20:17:31 +0100
+Subject: futex: avoid double wake up in PI futex wait / wake on -RT
+
+The boosted priority is reverted after the unlock but before the
+futex_hash_bucket (hb) has been accessed. The result is that we boost the
+task, deboost the task, boost again for the hb lock, deboost again.
+A sched trace of this scenario looks like the following:
+
+| med_prio-93 sched_wakeup: comm=high_prio pid=92 prio=9 success=1 target_cpu=000
+| med_prio-93 sched_switch: prev_comm=med_prio prev_pid=93 prev_prio=29 prev_state=R ==> next_comm=high_prio next_pid=92 next_prio=9
+|high_prio-92 sched_pi_setprio: comm=low_prio pid=91 oldprio=120 newprio=9
+|high_prio-92 sched_switch: prev_comm=high_prio prev_pid=92 prev_prio=9 prev_state=S ==> next_comm=low_prio next_pid=91 next_prio=9
+| low_prio-91 sched_wakeup: comm=high_prio pid=92 prio=9 success=1 target_cpu=000
+| low_prio-91 sched_pi_setprio: comm=low_prio pid=91 oldprio=9 newprio=120
+| low_prio-91 sched_switch: prev_comm=low_prio prev_pid=91 prev_prio=120 prev_state=R+ ==> next_comm=high_prio next_pid=92 next_prio=9
+|high_prio-92 sched_pi_setprio: comm=low_prio pid=91 oldprio=120 newprio=9
+|high_prio-92 sched_switch: prev_comm=high_prio prev_pid=92 prev_prio=9 prev_state=D ==> next_comm=low_prio next_pid=91 next_prio=9
+| low_prio-91 sched_wakeup: comm=high_prio pid=92 prio=9 success=1 target_cpu=000
+| low_prio-91 sched_pi_setprio: comm=low_prio pid=91 oldprio=9 newprio=120
+| low_prio-91 sched_switch: prev_comm=low_prio prev_pid=91 prev_prio=120 prev_state=R+ ==> next_comm=high_prio next_pid=92 next_prio=9
+
+We see four sched_pi_setprio() invocations but ideally two would be enough.
+The patch avoids the double wake up by deferring the deboost until the hb
+lock has been released. The same test case:
+
+| med_prio-21 sched_wakeup: comm=high_prio pid=20 prio=9 success=1 target_cpu=000
+| med_prio-21 sched_switch: prev_comm=med_prio prev_pid=21 prev_prio=29 prev_state=R ==> next_comm=high_prio next_pid=20 next_prio=9
+|high_prio-20 sched_pi_setprio: comm=low_prio pid=19 oldprio=120 newprio=9
+|high_prio-20 sched_switch: prev_comm=high_prio prev_pid=20 prev_prio=9 prev_state=S ==> next_comm=low_prio next_pid=19 next_prio=9
+| low_prio-19 sched_wakeup: comm=high_prio pid=20 prio=9 success=1 target_cpu=000
+| low_prio-19 sched_pi_setprio: comm=low_prio pid=19 oldprio=9 newprio=120
+| low_prio-19 sched_switch: prev_comm=low_prio prev_pid=19 prev_prio=120 prev_state=R+ ==> next_comm=high_prio next_pid=20 next_prio=9
+
+only two sched_pi_setprio() invocations as one would expect and see
+without -RT.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
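+Structurally the change boils down to this: the unlock path only reports
+whether a de-boost is still pending, and the caller performs it after dropping
+hb->lock. A hedged sketch of that ordering (plain C with stub functions, not
+the futex code itself):
+
+	#include <pthread.h>
+	#include <stdbool.h>
+
+	pthread_mutex_t hb_lock = PTHREAD_MUTEX_INITIALIZER;	/* hb->lock stand-in */
+
+	/* stand-in for rt_mutex_futex_unlock(): true if a de-boost is pending */
+	static bool unlock_pi_mutex(void) { return true; }
+	/* stand-in for rt_mutex_adjust_prio(current) */
+	static void deboost_self(void) { }
+
+	void unlock_path(void)
+	{
+		bool deboost;
+
+		pthread_mutex_lock(&hb_lock);
+		/* ... hand the PI mutex over to the top waiter ... */
+		deboost = unlock_pi_mutex();
+
+		/*
+		 * Drop the outer lock first, then undo the boost. De-boosting
+		 * while still holding hb->lock lets the woken waiter boost us
+		 * again for hb->lock, which is the extra pair of
+		 * sched_pi_setprio() events in the first trace above.
+		 */
+		pthread_mutex_unlock(&hb_lock);
+		if (deboost)
+			deboost_self();
+	}
+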
+ kernel/futex.c | 32 +++++++++++++++++++++++++++++---
+ kernel/locking/rtmutex.c | 40 +++++++++++++++++++++++++++++-----------
+ kernel/locking/rtmutex_common.h | 4 ++++
+ 3 files changed, 62 insertions(+), 14 deletions(-)
+
+--- a/kernel/futex.c
++++ b/kernel/futex.c
+@@ -1117,11 +1117,13 @@ static void mark_wake_futex(struct wake_
+ q->lock_ptr = NULL;
+ }
+
+-static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this)
++static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this,
++ struct futex_hash_bucket *hb)
+ {
+ struct task_struct *new_owner;
+ struct futex_pi_state *pi_state = this->pi_state;
+ u32 uninitialized_var(curval), newval;
++ bool deboost;
+ int ret = 0;
+
+ if (!pi_state)
+@@ -1173,7 +1175,17 @@ static int wake_futex_pi(u32 __user *uad
+ raw_spin_unlock_irq(&new_owner->pi_lock);
+
+ raw_spin_unlock(&pi_state->pi_mutex.wait_lock);
+- rt_mutex_unlock(&pi_state->pi_mutex);
++
++ deboost = rt_mutex_futex_unlock(&pi_state->pi_mutex);
++
++ /*
++ * We deboost after dropping hb->lock. That prevents a double
++ * wakeup on RT.
++ */
++ spin_unlock(&hb->lock);
++
++ if (deboost)
++ rt_mutex_adjust_prio(current);
+
+ return 0;
+ }
+@@ -2413,13 +2425,26 @@ static int futex_unlock_pi(u32 __user *u
+ */
+ match = futex_top_waiter(hb, &key);
+ if (match) {
+- ret = wake_futex_pi(uaddr, uval, match);
++ ret = wake_futex_pi(uaddr, uval, match, hb);
++
++ /*
++ * In case of success wake_futex_pi dropped the hash
++ * bucket lock.
++ */
++ if (!ret)
++ goto out_putkey;
++
+ /*
+ * The atomic access to the futex value generated a
+ * pagefault, so retry the user-access and the wakeup:
+ */
+ if (ret == -EFAULT)
+ goto pi_faulted;
++
++ /*
++ * wake_futex_pi has detected invalid state. Tell user
++ * space.
++ */
+ goto out_unlock;
+ }
+
+@@ -2440,6 +2465,7 @@ static int futex_unlock_pi(u32 __user *u
+
+ out_unlock:
+ spin_unlock(&hb->lock);
++out_putkey:
+ put_futex_key(&key);
+ return ret;
+
+--- a/kernel/locking/rtmutex.c
++++ b/kernel/locking/rtmutex.c
+@@ -300,7 +300,7 @@ static void __rt_mutex_adjust_prio(struc
+ * of task. We do not use the spin_xx_mutex() variants here as we are
+ * outside of the debug path.)
+ */
+-static void rt_mutex_adjust_prio(struct task_struct *task)
++void rt_mutex_adjust_prio(struct task_struct *task)
+ {
+ unsigned long flags;
+
+@@ -957,8 +957,9 @@ static int task_blocks_on_rt_mutex(struc
+ /*
+ * Wake up the next waiter on the lock.
+ *
+- * Remove the top waiter from the current tasks pi waiter list and
+- * wake it up.
++ * Remove the top waiter from the current tasks pi waiter list,
++ * wake it up and return whether the current task needs to undo
++ * a potential priority boosting.
+ *
+ * Called with lock->wait_lock held.
+ */
+@@ -1255,7 +1256,7 @@ static inline int rt_mutex_slowtrylock(s
+ /*
+ * Slow path to release a rt-mutex:
+ */
+-static void __sched
++static bool __sched
+ rt_mutex_slowunlock(struct rt_mutex *lock)
+ {
+ raw_spin_lock(&lock->wait_lock);
+@@ -1298,7 +1299,7 @@ rt_mutex_slowunlock(struct rt_mutex *loc
+ while (!rt_mutex_has_waiters(lock)) {
+ /* Drops lock->wait_lock ! */
+ if (unlock_rt_mutex_safe(lock) == true)
+- return;
++ return false;
+ /* Relock the rtmutex and try again */
+ raw_spin_lock(&lock->wait_lock);
+ }
+@@ -1311,8 +1312,7 @@ rt_mutex_slowunlock(struct rt_mutex *loc
+
+ raw_spin_unlock(&lock->wait_lock);
+
+- /* Undo pi boosting if necessary: */
+- rt_mutex_adjust_prio(current);
++ return true;
+ }
+
+ /*
+@@ -1363,12 +1363,14 @@ rt_mutex_fasttrylock(struct rt_mutex *lo
+
+ static inline void
+ rt_mutex_fastunlock(struct rt_mutex *lock,
+- void (*slowfn)(struct rt_mutex *lock))
++ bool (*slowfn)(struct rt_mutex *lock))
+ {
+- if (likely(rt_mutex_cmpxchg(lock, current, NULL)))
++ if (likely(rt_mutex_cmpxchg(lock, current, NULL))) {
+ rt_mutex_deadlock_account_unlock(current);
+- else
+- slowfn(lock);
++ } else if (slowfn(lock)) {
++ /* Undo pi boosting if necessary: */
++ rt_mutex_adjust_prio(current);
++ }
+ }
+
+ /**
+@@ -1463,6 +1465,22 @@ void __sched rt_mutex_unlock(struct rt_m
+ EXPORT_SYMBOL_GPL(rt_mutex_unlock);
+
+ /**
++ * rt_mutex_futex_unlock - Futex variant of rt_mutex_unlock
++ * @lock: the rt_mutex to be unlocked
++ *
++ * Returns: true/false indicating whether priority adjustment is
++ * required or not.
++ */
++bool __sched rt_mutex_futex_unlock(struct rt_mutex *lock)
++{
++ if (likely(rt_mutex_cmpxchg(lock, current, NULL))) {
++ rt_mutex_deadlock_account_unlock(current);
++ return false;
++ }
++ return rt_mutex_slowunlock(lock);
++}
++
++/**
+ * rt_mutex_destroy - mark a mutex unusable
+ * @lock: the mutex to be destroyed
+ *
+--- a/kernel/locking/rtmutex_common.h
++++ b/kernel/locking/rtmutex_common.h
+@@ -132,6 +132,10 @@ extern int rt_mutex_finish_proxy_lock(st
+ struct rt_mutex_waiter *waiter);
+ extern int rt_mutex_timed_futex_lock(struct rt_mutex *l, struct hrtimer_sleeper *to);
+
++extern bool rt_mutex_futex_unlock(struct rt_mutex *lock);
++
++extern void rt_mutex_adjust_prio(struct task_struct *task);
++
+ #ifdef CONFIG_DEBUG_RT_MUTEXES
+ # include "rtmutex-debug.h"
+ #else
diff --git a/patches/futex-requeue-pi-fix.patch b/patches/futex-requeue-pi-fix.patch
new file mode 100644
index 00000000000000..d2938959eaf7ff
--- /dev/null
+++ b/patches/futex-requeue-pi-fix.patch
@@ -0,0 +1,113 @@
+From: Steven Rostedt <rostedt@goodmis.org>
+Date: Tue, 14 Jul 2015 14:26:34 +0200
+Subject: futex: Fix bug on when a requeued RT task times out
+
+Requeue with timeout causes a bug with PREEMPT_RT_FULL.
+
+The bug comes from a timed out condition.
+
+
+ TASK 1 TASK 2
+ ------ ------
+ futex_wait_requeue_pi()
+ futex_wait_queue_me()
+ <timed out>
+
+ double_lock_hb();
+
+ raw_spin_lock(pi_lock);
+ if (current->pi_blocked_on) {
+ } else {
+ current->pi_blocked_on = PI_WAKE_INPROGRESS;
+ run_spin_unlock(pi_lock);
+ spin_lock(hb->lock); <-- blocked!
+
+
+ plist_for_each_entry_safe(this) {
+ rt_mutex_start_proxy_lock();
+ task_blocks_on_rt_mutex();
+ BUG_ON(task->pi_blocked_on)!!!!
+
+The BUG_ON() actually has a check for PI_WAKE_INPROGRESS, but the
+problem is that, after TASK 1 sets PI_WAKE_INPROGRESS, it then tries to
+grab the hb->lock, which it fails to do so. As the hb->lock is a mutex,
+it will block and set the "pi_blocked_on" to the hb->lock.
+
+When TASK 2 goes to requeue it, the check for PI_WAKE_INPROGRESS fails
+because TASK 1's pi_blocked_on is no longer set to that, but instead,
+set to the hb->lock.
+
+The fix:
+
+When calling rt_mutex_start_proxy_lock() a check is made to see
+if the proxy task's pi_blocked_on is set. If so, exit out early.
+Otherwise set it to a new flag PI_REQUEUE_INPROGRESS, which notifies
+the proxy task that it is being requeued, and will handle things
+appropriately.
+
+
+Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ kernel/locking/rtmutex.c | 32 +++++++++++++++++++++++++++++++-
+ kernel/locking/rtmutex_common.h | 1 +
+ 2 files changed, 32 insertions(+), 1 deletion(-)
+
+--- a/kernel/locking/rtmutex.c
++++ b/kernel/locking/rtmutex.c
+@@ -71,7 +71,8 @@ static void fixup_rt_mutex_waiters(struc
+
+ static int rt_mutex_real_waiter(struct rt_mutex_waiter *waiter)
+ {
+- return waiter && waiter != PI_WAKEUP_INPROGRESS;
++ return waiter && waiter != PI_WAKEUP_INPROGRESS &&
++ waiter != PI_REQUEUE_INPROGRESS;
+ }
+
+ /*
+@@ -1603,6 +1604,35 @@ int rt_mutex_start_proxy_lock(struct rt_
+ return 1;
+ }
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++ /*
++ * In PREEMPT_RT there's an added race.
++ * If the task, that we are about to requeue, times out,
++ * it can set the PI_WAKEUP_INPROGRESS. This tells the requeue
++ * to skip this task. But right after the task sets
++ * its pi_blocked_on to PI_WAKEUP_INPROGRESS it can then
++ * block on the spin_lock(&hb->lock), which in RT is an rtmutex.
++ * This will replace the PI_WAKEUP_INPROGRESS with the actual
++ * lock that it blocks on. We *must not* place this task
++ * on this proxy lock in that case.
++ *
++ * To prevent this race, we first take the task's pi_lock
++ * and check if it has updated its pi_blocked_on. If it has,
++ * we assume that it woke up and we return -EAGAIN.
++ * Otherwise, we set the task's pi_blocked_on to
++ * PI_REQUEUE_INPROGRESS, so that if the task is waking up
++ * it will know that we are in the process of requeuing it.
++ */
++ raw_spin_lock_irq(&task->pi_lock);
++ if (task->pi_blocked_on) {
++ raw_spin_unlock_irq(&task->pi_lock);
++ raw_spin_unlock(&lock->wait_lock);
++ return -EAGAIN;
++ }
++ task->pi_blocked_on = PI_REQUEUE_INPROGRESS;
++ raw_spin_unlock_irq(&task->pi_lock);
++#endif
++
+ /* We enforce deadlock detection for futexes */
+ ret = task_blocks_on_rt_mutex(lock, waiter, task,
+ RT_MUTEX_FULL_CHAINWALK);
+--- a/kernel/locking/rtmutex_common.h
++++ b/kernel/locking/rtmutex_common.h
+@@ -120,6 +120,7 @@ enum rtmutex_chainwalk {
+ * PI-futex support (proxy locking functions, etc.):
+ */
+ #define PI_WAKEUP_INPROGRESS ((struct rt_mutex_waiter *) 1)
++#define PI_REQUEUE_INPROGRESS ((struct rt_mutex_waiter *) 2)
+
+ extern struct task_struct *rt_mutex_next_owner(struct rt_mutex *lock);
+ extern void rt_mutex_init_proxy_locked(struct rt_mutex *lock,
diff --git a/patches/genirq-disable-irqpoll-on-rt.patch b/patches/genirq-disable-irqpoll-on-rt.patch
new file mode 100644
index 00000000000000..9aa92515d4d2a3
--- /dev/null
+++ b/patches/genirq-disable-irqpoll-on-rt.patch
@@ -0,0 +1,37 @@
+From: Ingo Molnar <mingo@elte.hu>
+Date: Fri, 3 Jul 2009 08:29:57 -0500
+Subject: genirq: Disable irqpoll on -rt
+
+Creates long latencies for no value
+
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ kernel/irq/spurious.c | 8 ++++++++
+ 1 file changed, 8 insertions(+)
+
+--- a/kernel/irq/spurious.c
++++ b/kernel/irq/spurious.c
+@@ -444,6 +444,10 @@ MODULE_PARM_DESC(noirqdebug, "Disable ir
+
+ static int __init irqfixup_setup(char *str)
+ {
++#ifdef CONFIG_PREEMPT_RT_BASE
++ pr_warn("irqfixup boot option not supported w/ CONFIG_PREEMPT_RT_BASE\n");
++ return 1;
++#endif
+ irqfixup = 1;
+ printk(KERN_WARNING "Misrouted IRQ fixup support enabled.\n");
+ printk(KERN_WARNING "This may impact system performance.\n");
+@@ -456,6 +460,10 @@ module_param(irqfixup, int, 0644);
+
+ static int __init irqpoll_setup(char *str)
+ {
++#ifdef CONFIG_PREEMPT_RT_BASE
++ pr_warn("irqpoll boot option not supported w/ CONFIG_PREEMPT_RT_BASE\n");
++ return 1;
++#endif
+ irqfixup = 2;
+ printk(KERN_WARNING "Misrouted IRQ fixup and polling support "
+ "enabled\n");
diff --git a/patches/genirq-do-not-invoke-the-affinity-callback-via-a-wor.patch b/patches/genirq-do-not-invoke-the-affinity-callback-via-a-wor.patch
new file mode 100644
index 00000000000000..bf5b7589303c9a
--- /dev/null
+++ b/patches/genirq-do-not-invoke-the-affinity-callback-via-a-wor.patch
@@ -0,0 +1,144 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Wed, 21 Aug 2013 17:48:46 +0200
+Subject: genirq: Do not invoke the affinity callback via a workqueue on RT
+
+Joe Korty reported that __irq_set_affinity_locked() schedules a
+workqueue while holding a rawlock which results in a might_sleep()
+warning.
+This patch moves the invocation into process context so that we only
+wakeup() a process while holding the lock.
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
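+The helper follows a common pattern: the atomic side only appends a request to
+a list under the raw lock and wakes the helper, and the helper thread drains
+the list in schedulable context. Roughly, as a userspace sketch (pthread
+mutex/condvar standing in for the raw spinlock plus kthread wakeup; names
+invented):
+
+	#include <pthread.h>
+	#include <stddef.h>
+
+	struct req { struct req *next; void (*fn)(struct req *); };
+
+	pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
+	pthread_cond_t  list_cond = PTHREAD_COND_INITIALIZER;
+	struct req *head;
+
+	/* "atomic" side: queue the request and wake the helper, never run fn() here */
+	void queue_notify(struct req *r)
+	{
+		pthread_mutex_lock(&list_lock);
+		r->next = head;
+		head = r;
+		pthread_cond_signal(&list_cond);
+		pthread_mutex_unlock(&list_lock);
+	}
+
+	/* helper thread: pop requests and run the callbacks in its own context */
+	void *notify_helper(void *unused)
+	{
+		(void)unused;
+		for (;;) {
+			struct req *r;
+
+			pthread_mutex_lock(&list_lock);
+			while (head == NULL)
+				pthread_cond_wait(&list_cond, &list_lock);
+			r = head;
+			head = r->next;
+			pthread_mutex_unlock(&list_lock);
+
+			r->fn(r);	/* the _irq_affinity_notify() equivalent */
+		}
+		return NULL;
+	}
+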
+ include/linux/interrupt.h | 1
+ kernel/irq/manage.c | 79 ++++++++++++++++++++++++++++++++++++++++++++--
+ 2 files changed, 77 insertions(+), 3 deletions(-)
+
+--- a/include/linux/interrupt.h
++++ b/include/linux/interrupt.h
+@@ -215,6 +215,7 @@ struct irq_affinity_notify {
+ unsigned int irq;
+ struct kref kref;
+ struct work_struct work;
++ struct list_head list;
+ void (*notify)(struct irq_affinity_notify *, const cpumask_t *mask);
+ void (*release)(struct kref *ref);
+ };
+--- a/kernel/irq/manage.c
++++ b/kernel/irq/manage.c
+@@ -181,6 +181,62 @@ static inline void
+ irq_get_pending(struct cpumask *mask, struct irq_desc *desc) { }
+ #endif
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++static void _irq_affinity_notify(struct irq_affinity_notify *notify);
++static struct task_struct *set_affinity_helper;
++static LIST_HEAD(affinity_list);
++static DEFINE_RAW_SPINLOCK(affinity_list_lock);
++
++static int set_affinity_thread(void *unused)
++{
++ while (1) {
++ struct irq_affinity_notify *notify;
++ int empty;
++
++ set_current_state(TASK_INTERRUPTIBLE);
++
++ raw_spin_lock_irq(&affinity_list_lock);
++ empty = list_empty(&affinity_list);
++ raw_spin_unlock_irq(&affinity_list_lock);
++
++ if (empty)
++ schedule();
++ if (kthread_should_stop())
++ break;
++ set_current_state(TASK_RUNNING);
++try_next:
++ notify = NULL;
++
++ raw_spin_lock_irq(&affinity_list_lock);
++ if (!list_empty(&affinity_list)) {
++ notify = list_first_entry(&affinity_list,
++ struct irq_affinity_notify, list);
++ list_del_init(&notify->list);
++ }
++ raw_spin_unlock_irq(&affinity_list_lock);
++
++ if (!notify)
++ continue;
++ _irq_affinity_notify(notify);
++ goto try_next;
++ }
++ return 0;
++}
++
++static void init_helper_thread(void)
++{
++ if (set_affinity_helper)
++ return;
++ set_affinity_helper = kthread_run(set_affinity_thread, NULL,
++ "affinity-cb");
++ WARN_ON(IS_ERR(set_affinity_helper));
++}
++#else
++
++static inline void init_helper_thread(void) { }
++
++#endif
++
+ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
+ bool force)
+ {
+@@ -220,7 +276,17 @@ int irq_set_affinity_locked(struct irq_d
+
+ if (desc->affinity_notify) {
+ kref_get(&desc->affinity_notify->kref);
++
++#ifdef CONFIG_PREEMPT_RT_FULL
++ raw_spin_lock(&affinity_list_lock);
++ if (list_empty(&desc->affinity_notify->list))
++ list_add_tail(&affinity_list,
++ &desc->affinity_notify->list);
++ raw_spin_unlock(&affinity_list_lock);
++ wake_up_process(set_affinity_helper);
++#else
+ schedule_work(&desc->affinity_notify->work);
++#endif
+ }
+ irqd_set(data, IRQD_AFFINITY_SET);
+
+@@ -258,10 +324,8 @@ int irq_set_affinity_hint(unsigned int i
+ }
+ EXPORT_SYMBOL_GPL(irq_set_affinity_hint);
+
+-static void irq_affinity_notify(struct work_struct *work)
++static void _irq_affinity_notify(struct irq_affinity_notify *notify)
+ {
+- struct irq_affinity_notify *notify =
+- container_of(work, struct irq_affinity_notify, work);
+ struct irq_desc *desc = irq_to_desc(notify->irq);
+ cpumask_var_t cpumask;
+ unsigned long flags;
+@@ -283,6 +347,13 @@ static void irq_affinity_notify(struct w
+ kref_put(&notify->kref, notify->release);
+ }
+
++static void irq_affinity_notify(struct work_struct *work)
++{
++ struct irq_affinity_notify *notify =
++ container_of(work, struct irq_affinity_notify, work);
++ _irq_affinity_notify(notify);
++}
++
+ /**
+ * irq_set_affinity_notifier - control notification of IRQ affinity changes
+ * @irq: Interrupt for which to enable/disable notification
+@@ -312,6 +383,8 @@ irq_set_affinity_notifier(unsigned int i
+ notify->irq = irq;
+ kref_init(&notify->kref);
+ INIT_WORK(&notify->work, irq_affinity_notify);
++ INIT_LIST_HEAD(&notify->list);
++ init_helper_thread();
+ }
+
+ raw_spin_lock_irqsave(&desc->lock, flags);
diff --git a/patches/genirq-force-threading.patch b/patches/genirq-force-threading.patch
new file mode 100644
index 00000000000000..a37ee6b250ae90
--- /dev/null
+++ b/patches/genirq-force-threading.patch
@@ -0,0 +1,48 @@
+Subject: genirq: Force interrupt thread on RT
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Sun, 03 Apr 2011 11:57:29 +0200
+
+Force threaded irq handlers and optimize the code by making force_irqthreads
+a compile time constant (true) on RT.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ include/linux/interrupt.h | 6 +++++-
+ kernel/irq/manage.c | 2 ++
+ 2 files changed, 7 insertions(+), 1 deletion(-)
+
+--- a/include/linux/interrupt.h
++++ b/include/linux/interrupt.h
+@@ -377,9 +377,13 @@ extern int irq_set_irqchip_state(unsigne
+ bool state);
+
+ #ifdef CONFIG_IRQ_FORCED_THREADING
++# ifndef CONFIG_PREEMPT_RT_BASE
+ extern bool force_irqthreads;
++# else
++# define force_irqthreads (true)
++# endif
+ #else
+-#define force_irqthreads (0)
++#define force_irqthreads (false)
+ #endif
+
+ #ifndef __ARCH_SET_SOFTIRQ_PENDING
+--- a/kernel/irq/manage.c
++++ b/kernel/irq/manage.c
+@@ -22,6 +22,7 @@
+ #include "internals.h"
+
+ #ifdef CONFIG_IRQ_FORCED_THREADING
++# ifndef CONFIG_PREEMPT_RT_BASE
+ __read_mostly bool force_irqthreads;
+
+ static int __init setup_forced_irqthreads(char *arg)
+@@ -30,6 +31,7 @@ static int __init setup_forced_irqthread
+ return 0;
+ }
+ early_param("threadirqs", setup_forced_irqthreads);
++# endif
+ #endif
+
+ static void __synchronize_hardirq(struct irq_desc *desc)
diff --git a/patches/gpio-omap-use-raw-locks-for-locking.patch b/patches/gpio-omap-use-raw-locks-for-locking.patch
new file mode 100644
index 00000000000000..48947fe69c8317
--- /dev/null
+++ b/patches/gpio-omap-use-raw-locks-for-locking.patch
@@ -0,0 +1,316 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Thu, 12 Feb 2015 16:01:13 +0100
+Subject: gpio: omap: use raw locks for locking
+
+This patch converts gpio_bank.lock from a spin_lock into a
+raw_spin_lock. The call path to access this lock is always under a
+raw_spin_lock, for instance:
+- __setup_irq() holds &desc->lock with irq off
+ + __irq_set_trigger()
+ + omap_gpio_irq_type()
+
+- handle_level_irq() (runs with irqs off and therefore under raw locks)
+ + mask_ack_irq()
+ + omap_gpio_mask_irq()
+
+This fixes the obvious backtrace on -RT. However the locking vs. context
+problem is not fixed, and it is not limited to -RT:
+- omap_gpio_irq_type() is called with IRQs off and has a conditional
+  call to pm_runtime_get_sync() which may sleep. Whether it happens in
+  practice or not, pm_runtime_get_sync() should not be called with
+  irqs off.
+
+- omap_gpio_debounce() is holding the lock with IRQs off.
+ + omap2_set_gpio_debounce()
+ + clk_prepare_enable()
+      + clk_prepare(), which might sleep.
+  The number of users of gpiod_set_debounce() / gpio_set_debounce()
+  looks low, but this is still not good.
+
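+As an illustration of the conversion pattern (a sketch with made-up
+struct and function names, not code taken from the driver): only the
+lock type and the lock/unlock helpers change, the irqsave discipline
+stays the same.
+
+  struct example_bank {
+          raw_spinlock_t lock;            /* was: spinlock_t lock; */
+          u32 irq_usage;
+  };
+
+  static void example_mask_irq(struct example_bank *bank, unsigned offset)
+  {
+          unsigned long flags;
+
+          /* raw lock: stays a busy lock on -RT, safe under irq-off callers */
+          raw_spin_lock_irqsave(&bank->lock, flags);
+          bank->irq_usage &= ~BIT(offset);
+          raw_spin_unlock_irqrestore(&bank->lock, flags);
+  }
+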
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ drivers/gpio/gpio-omap.c | 78 +++++++++++++++++++++++------------------------
+ 1 file changed, 39 insertions(+), 39 deletions(-)
+
+--- a/drivers/gpio/gpio-omap.c
++++ b/drivers/gpio/gpio-omap.c
+@@ -57,7 +57,7 @@ struct gpio_bank {
+ u32 saved_datain;
+ u32 level_mask;
+ u32 toggle_mask;
+- spinlock_t lock;
++ raw_spinlock_t lock;
+ struct gpio_chip chip;
+ struct clk *dbck;
+ u32 mod_usage;
+@@ -498,14 +498,14 @@ static int omap_gpio_irq_type(struct irq
+ (type & (IRQ_TYPE_LEVEL_LOW|IRQ_TYPE_LEVEL_HIGH)))
+ return -EINVAL;
+
+- spin_lock_irqsave(&bank->lock, flags);
++ raw_spin_lock_irqsave(&bank->lock, flags);
+ retval = omap_set_gpio_triggering(bank, offset, type);
+ omap_gpio_init_irq(bank, offset);
+ if (!omap_gpio_is_input(bank, offset)) {
+- spin_unlock_irqrestore(&bank->lock, flags);
++ raw_spin_unlock_irqrestore(&bank->lock, flags);
+ return -EINVAL;
+ }
+- spin_unlock_irqrestore(&bank->lock, flags);
++ raw_spin_unlock_irqrestore(&bank->lock, flags);
+
+ if (type & (IRQ_TYPE_LEVEL_LOW | IRQ_TYPE_LEVEL_HIGH))
+ __irq_set_handler_locked(d->irq, handle_level_irq);
+@@ -626,14 +626,14 @@ static int omap_set_gpio_wakeup(struct g
+ return -EINVAL;
+ }
+
+- spin_lock_irqsave(&bank->lock, flags);
++ raw_spin_lock_irqsave(&bank->lock, flags);
+ if (enable)
+ bank->context.wake_en |= gpio_bit;
+ else
+ bank->context.wake_en &= ~gpio_bit;
+
+ writel_relaxed(bank->context.wake_en, bank->base + bank->regs->wkup_en);
+- spin_unlock_irqrestore(&bank->lock, flags);
++ raw_spin_unlock_irqrestore(&bank->lock, flags);
+
+ return 0;
+ }
+@@ -668,7 +668,7 @@ static int omap_gpio_request(struct gpio
+ if (!BANK_USED(bank))
+ pm_runtime_get_sync(bank->dev);
+
+- spin_lock_irqsave(&bank->lock, flags);
++ raw_spin_lock_irqsave(&bank->lock, flags);
+ /* Set trigger to none. You need to enable the desired trigger with
+ * request_irq() or set_irq_type(). Only do this if the IRQ line has
+ * not already been requested.
+@@ -678,7 +678,7 @@ static int omap_gpio_request(struct gpio
+ omap_enable_gpio_module(bank, offset);
+ }
+ bank->mod_usage |= BIT(offset);
+- spin_unlock_irqrestore(&bank->lock, flags);
++ raw_spin_unlock_irqrestore(&bank->lock, flags);
+
+ return 0;
+ }
+@@ -688,11 +688,11 @@ static void omap_gpio_free(struct gpio_c
+ struct gpio_bank *bank = container_of(chip, struct gpio_bank, chip);
+ unsigned long flags;
+
+- spin_lock_irqsave(&bank->lock, flags);
++ raw_spin_lock_irqsave(&bank->lock, flags);
+ bank->mod_usage &= ~(BIT(offset));
+ omap_disable_gpio_module(bank, offset);
+ omap_reset_gpio(bank, offset);
+- spin_unlock_irqrestore(&bank->lock, flags);
++ raw_spin_unlock_irqrestore(&bank->lock, flags);
+
+ /*
+ * If this is the last gpio to be freed in the bank,
+@@ -794,9 +794,9 @@ static unsigned int omap_gpio_irq_startu
+ if (!BANK_USED(bank))
+ pm_runtime_get_sync(bank->dev);
+
+- spin_lock_irqsave(&bank->lock, flags);
++ raw_spin_lock_irqsave(&bank->lock, flags);
+ omap_gpio_init_irq(bank, offset);
+- spin_unlock_irqrestore(&bank->lock, flags);
++ raw_spin_unlock_irqrestore(&bank->lock, flags);
+ omap_gpio_unmask_irq(d);
+
+ return 0;
+@@ -808,11 +808,11 @@ static void omap_gpio_irq_shutdown(struc
+ unsigned long flags;
+ unsigned offset = d->hwirq;
+
+- spin_lock_irqsave(&bank->lock, flags);
++ raw_spin_lock_irqsave(&bank->lock, flags);
+ bank->irq_usage &= ~(BIT(offset));
+ omap_disable_gpio_module(bank, offset);
+ omap_reset_gpio(bank, offset);
+- spin_unlock_irqrestore(&bank->lock, flags);
++ raw_spin_unlock_irqrestore(&bank->lock, flags);
+
+ /*
+ * If this is the last IRQ to be freed in the bank,
+@@ -836,10 +836,10 @@ static void omap_gpio_mask_irq(struct ir
+ unsigned offset = d->hwirq;
+ unsigned long flags;
+
+- spin_lock_irqsave(&bank->lock, flags);
++ raw_spin_lock_irqsave(&bank->lock, flags);
+ omap_set_gpio_irqenable(bank, offset, 0);
+ omap_set_gpio_triggering(bank, offset, IRQ_TYPE_NONE);
+- spin_unlock_irqrestore(&bank->lock, flags);
++ raw_spin_unlock_irqrestore(&bank->lock, flags);
+ }
+
+ static void omap_gpio_unmask_irq(struct irq_data *d)
+@@ -849,7 +849,7 @@ static void omap_gpio_unmask_irq(struct
+ u32 trigger = irqd_get_trigger_type(d);
+ unsigned long flags;
+
+- spin_lock_irqsave(&bank->lock, flags);
++ raw_spin_lock_irqsave(&bank->lock, flags);
+ if (trigger)
+ omap_set_gpio_triggering(bank, offset, trigger);
+
+@@ -861,7 +861,7 @@ static void omap_gpio_unmask_irq(struct
+ }
+
+ omap_set_gpio_irqenable(bank, offset, 1);
+- spin_unlock_irqrestore(&bank->lock, flags);
++ raw_spin_unlock_irqrestore(&bank->lock, flags);
+ }
+
+ /*---------------------------------------------------------------------*/
+@@ -874,9 +874,9 @@ static int omap_mpuio_suspend_noirq(stru
+ OMAP_MPUIO_GPIO_MASKIT / bank->stride;
+ unsigned long flags;
+
+- spin_lock_irqsave(&bank->lock, flags);
++ raw_spin_lock_irqsave(&bank->lock, flags);
+ writel_relaxed(0xffff & ~bank->context.wake_en, mask_reg);
+- spin_unlock_irqrestore(&bank->lock, flags);
++ raw_spin_unlock_irqrestore(&bank->lock, flags);
+
+ return 0;
+ }
+@@ -889,9 +889,9 @@ static int omap_mpuio_resume_noirq(struc
+ OMAP_MPUIO_GPIO_MASKIT / bank->stride;
+ unsigned long flags;
+
+- spin_lock_irqsave(&bank->lock, flags);
++ raw_spin_lock_irqsave(&bank->lock, flags);
+ writel_relaxed(bank->context.wake_en, mask_reg);
+- spin_unlock_irqrestore(&bank->lock, flags);
++ raw_spin_unlock_irqrestore(&bank->lock, flags);
+
+ return 0;
+ }
+@@ -937,9 +937,9 @@ static int omap_gpio_get_direction(struc
+
+ bank = container_of(chip, struct gpio_bank, chip);
+ reg = bank->base + bank->regs->direction;
+- spin_lock_irqsave(&bank->lock, flags);
++ raw_spin_lock_irqsave(&bank->lock, flags);
+ dir = !!(readl_relaxed(reg) & BIT(offset));
+- spin_unlock_irqrestore(&bank->lock, flags);
++ raw_spin_unlock_irqrestore(&bank->lock, flags);
+ return dir;
+ }
+
+@@ -949,9 +949,9 @@ static int omap_gpio_input(struct gpio_c
+ unsigned long flags;
+
+ bank = container_of(chip, struct gpio_bank, chip);
+- spin_lock_irqsave(&bank->lock, flags);
++ raw_spin_lock_irqsave(&bank->lock, flags);
+ omap_set_gpio_direction(bank, offset, 1);
+- spin_unlock_irqrestore(&bank->lock, flags);
++ raw_spin_unlock_irqrestore(&bank->lock, flags);
+ return 0;
+ }
+
+@@ -973,10 +973,10 @@ static int omap_gpio_output(struct gpio_
+ unsigned long flags;
+
+ bank = container_of(chip, struct gpio_bank, chip);
+- spin_lock_irqsave(&bank->lock, flags);
++ raw_spin_lock_irqsave(&bank->lock, flags);
+ bank->set_dataout(bank, offset, value);
+ omap_set_gpio_direction(bank, offset, 0);
+- spin_unlock_irqrestore(&bank->lock, flags);
++ raw_spin_unlock_irqrestore(&bank->lock, flags);
+ return 0;
+ }
+
+@@ -988,9 +988,9 @@ static int omap_gpio_debounce(struct gpi
+
+ bank = container_of(chip, struct gpio_bank, chip);
+
+- spin_lock_irqsave(&bank->lock, flags);
++ raw_spin_lock_irqsave(&bank->lock, flags);
+ omap2_set_gpio_debounce(bank, offset, debounce);
+- spin_unlock_irqrestore(&bank->lock, flags);
++ raw_spin_unlock_irqrestore(&bank->lock, flags);
+
+ return 0;
+ }
+@@ -1001,9 +1001,9 @@ static void omap_gpio_set(struct gpio_ch
+ unsigned long flags;
+
+ bank = container_of(chip, struct gpio_bank, chip);
+- spin_lock_irqsave(&bank->lock, flags);
++ raw_spin_lock_irqsave(&bank->lock, flags);
+ bank->set_dataout(bank, offset, value);
+- spin_unlock_irqrestore(&bank->lock, flags);
++ raw_spin_unlock_irqrestore(&bank->lock, flags);
+ }
+
+ /*---------------------------------------------------------------------*/
+@@ -1199,7 +1199,7 @@ static int omap_gpio_probe(struct platfo
+ else
+ bank->set_dataout = omap_set_gpio_dataout_mask;
+
+- spin_lock_init(&bank->lock);
++ raw_spin_lock_init(&bank->lock);
+
+ /* Static mapping, never released */
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+@@ -1246,7 +1246,7 @@ static int omap_gpio_runtime_suspend(str
+ unsigned long flags;
+ u32 wake_low, wake_hi;
+
+- spin_lock_irqsave(&bank->lock, flags);
++ raw_spin_lock_irqsave(&bank->lock, flags);
+
+ /*
+ * Only edges can generate a wakeup event to the PRCM.
+@@ -1299,7 +1299,7 @@ static int omap_gpio_runtime_suspend(str
+ bank->get_context_loss_count(bank->dev);
+
+ omap_gpio_dbck_disable(bank);
+- spin_unlock_irqrestore(&bank->lock, flags);
++ raw_spin_unlock_irqrestore(&bank->lock, flags);
+
+ return 0;
+ }
+@@ -1314,7 +1314,7 @@ static int omap_gpio_runtime_resume(stru
+ unsigned long flags;
+ int c;
+
+- spin_lock_irqsave(&bank->lock, flags);
++ raw_spin_lock_irqsave(&bank->lock, flags);
+
+ /*
+ * On the first resume during the probe, the context has not
+@@ -1350,14 +1350,14 @@ static int omap_gpio_runtime_resume(stru
+ if (c != bank->context_loss_count) {
+ omap_gpio_restore_context(bank);
+ } else {
+- spin_unlock_irqrestore(&bank->lock, flags);
++ raw_spin_unlock_irqrestore(&bank->lock, flags);
+ return 0;
+ }
+ }
+ }
+
+ if (!bank->workaround_enabled) {
+- spin_unlock_irqrestore(&bank->lock, flags);
++ raw_spin_unlock_irqrestore(&bank->lock, flags);
+ return 0;
+ }
+
+@@ -1412,7 +1412,7 @@ static int omap_gpio_runtime_resume(stru
+ }
+
+ bank->workaround_enabled = false;
+- spin_unlock_irqrestore(&bank->lock, flags);
++ raw_spin_unlock_irqrestore(&bank->lock, flags);
+
+ return 0;
+ }
diff --git a/patches/hotplug-Use-set_cpus_allowed_ptr-in-sync_unplug_thre.patch b/patches/hotplug-Use-set_cpus_allowed_ptr-in-sync_unplug_thre.patch
new file mode 100644
index 00000000000000..78d70daec6e33f
--- /dev/null
+++ b/patches/hotplug-Use-set_cpus_allowed_ptr-in-sync_unplug_thre.patch
@@ -0,0 +1,46 @@
+From: Mike Galbraith <umgwanakikbuti@gmail.com>
+Date: Tue, 24 Mar 2015 08:14:49 +0100
+Subject: hotplug: Use set_cpus_allowed_ptr() in sync_unplug_thread()
+
+do_set_cpus_allowed() is not safe vs ->sched_class change.
+
+crash> bt
+PID: 11676 TASK: ffff88026f979da0 CPU: 22 COMMAND: "sync_unplug/22"
+ #0 [ffff880274d25bc8] machine_kexec at ffffffff8103b41c
+ #1 [ffff880274d25c18] crash_kexec at ffffffff810d881a
+ #2 [ffff880274d25cd8] oops_end at ffffffff81525818
+ #3 [ffff880274d25cf8] do_invalid_op at ffffffff81003096
+ #4 [ffff880274d25d90] invalid_op at ffffffff8152d3de
+ [exception RIP: set_cpus_allowed_rt+18]
+ RIP: ffffffff8109e012 RSP: ffff880274d25e48 RFLAGS: 00010202
+ RAX: ffffffff8109e000 RBX: ffff88026f979da0 RCX: ffff8802770cb6e8
+ RDX: 0000000000000000 RSI: ffffffff81add700 RDI: ffff88026f979da0
+ RBP: ffff880274d25e78 R8: ffffffff816112e0 R9: 0000000000000001
+ R10: 0000000000000001 R11: 0000000000011940 R12: ffff88026f979da0
+ R13: ffff8802770cb6d0 R14: ffff880274d25fd8 R15: 0000000000000000
+ ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
+ #5 [ffff880274d25e60] do_set_cpus_allowed at ffffffff8108e65f
+ #6 [ffff880274d25e80] sync_unplug_thread at ffffffff81058c08
+ #7 [ffff880274d25ed8] kthread at ffffffff8107cad6
+ #8 [ffff880274d25f50] ret_from_fork at ffffffff8152bbbc
+crash> task_struct ffff88026f979da0 | grep class
+ sched_class = 0xffffffff816111e0 <fair_sched_class+64>,
+
+Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ kernel/cpu.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/kernel/cpu.c
++++ b/kernel/cpu.c
+@@ -267,7 +267,7 @@ static int sync_unplug_thread(void *data
+ * we don't want any more work on this CPU.
+ */
+ current->flags &= ~PF_NO_SETAFFINITY;
+- do_set_cpus_allowed(current, cpu_present_mask);
++ set_cpus_allowed_ptr(current, cpu_present_mask);
+ migrate_me();
+ return 0;
+ }
diff --git a/patches/hotplug-light-get-online-cpus.patch b/patches/hotplug-light-get-online-cpus.patch
new file mode 100644
index 00000000000000..4ddd270805bf14
--- /dev/null
+++ b/patches/hotplug-light-get-online-cpus.patch
@@ -0,0 +1,204 @@
+Subject: hotplug: Lightweight get online cpus
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 15 Jun 2011 12:36:06 +0200
+
+get_online_cpus() is a heavyweight function which involves a global
+mutex. migrate_disable() wants a simpler construct which only prevents
+a CPU from going down while a task is in a migrate-disabled section.
+
+Implement a per-cpu lockless mechanism which serializes on a global
+mutex only in the real unplug case. That serialization affects only
+tasks on the CPU which is about to be brought down.
+
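+A rough usage sketch (illustration only, not taken from the patch) of
+how the new helpers are meant to be used around per-cpu work; the
+example function is invented:
+
+  static void example_touch_percpu_state(void)
+  {
+          preempt_disable();
+          pin_current_cpu();      /* cheap: bumps a per-cpu refcount unless
+                                   * an unplug of this CPU is in flight */
+          /* ... access per-cpu data that must not race with CPU removal ... */
+          unpin_current_cpu();    /* dropping the last pin wakes a waiting
+                                   * unplug thread */
+          preempt_enable();
+  }
+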
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ include/linux/cpu.h | 7 +--
+ kernel/cpu.c | 119 +++++++++++++++++++++++++++++++++++++++++++++++++++-
+ 2 files changed, 122 insertions(+), 4 deletions(-)
+
+--- a/include/linux/cpu.h
++++ b/include/linux/cpu.h
+@@ -221,9 +221,6 @@ static inline void smpboot_thread_init(v
+ #endif /* CONFIG_SMP */
+ extern struct bus_type cpu_subsys;
+
+-static inline void pin_current_cpu(void) { }
+-static inline void unpin_current_cpu(void) { }
+-
+ #ifdef CONFIG_HOTPLUG_CPU
+ /* Stop CPUs going up and down. */
+
+@@ -234,6 +231,8 @@ extern bool try_get_online_cpus(void);
+ extern void put_online_cpus(void);
+ extern void cpu_hotplug_disable(void);
+ extern void cpu_hotplug_enable(void);
++extern void pin_current_cpu(void);
++extern void unpin_current_cpu(void);
+ #define hotcpu_notifier(fn, pri) cpu_notifier(fn, pri)
+ #define __hotcpu_notifier(fn, pri) __cpu_notifier(fn, pri)
+ #define register_hotcpu_notifier(nb) register_cpu_notifier(nb)
+@@ -252,6 +251,8 @@ static inline void cpu_hotplug_done(void
+ #define put_online_cpus() do { } while (0)
+ #define cpu_hotplug_disable() do { } while (0)
+ #define cpu_hotplug_enable() do { } while (0)
++static inline void pin_current_cpu(void) { }
++static inline void unpin_current_cpu(void) { }
+ #define hotcpu_notifier(fn, pri) do { (void)(fn); } while (0)
+ #define __hotcpu_notifier(fn, pri) do { (void)(fn); } while (0)
+ /* These aren't inline functions due to a GCC bug. */
+--- a/kernel/cpu.c
++++ b/kernel/cpu.c
+@@ -88,6 +88,100 @@ static struct {
+ #define cpuhp_lock_acquire() lock_map_acquire(&cpu_hotplug.dep_map)
+ #define cpuhp_lock_release() lock_map_release(&cpu_hotplug.dep_map)
+
++struct hotplug_pcp {
++ struct task_struct *unplug;
++ int refcount;
++ struct completion synced;
++};
++
++static DEFINE_PER_CPU(struct hotplug_pcp, hotplug_pcp);
++
++/**
++ * pin_current_cpu - Prevent the current cpu from being unplugged
++ *
++ * Lightweight version of get_online_cpus() to prevent cpu from being
++ * unplugged when code runs in a migration disabled region.
++ *
++ * Must be called with preemption disabled (preempt_count = 1)!
++ */
++void pin_current_cpu(void)
++{
++ struct hotplug_pcp *hp = this_cpu_ptr(&hotplug_pcp);
++
++retry:
++ if (!hp->unplug || hp->refcount || preempt_count() > 1 ||
++ hp->unplug == current) {
++ hp->refcount++;
++ return;
++ }
++ preempt_enable();
++ mutex_lock(&cpu_hotplug.lock);
++ mutex_unlock(&cpu_hotplug.lock);
++ preempt_disable();
++ goto retry;
++}
++
++/**
++ * unpin_current_cpu - Allow unplug of current cpu
++ *
++ * Must be called with preemption or interrupts disabled!
++ */
++void unpin_current_cpu(void)
++{
++ struct hotplug_pcp *hp = this_cpu_ptr(&hotplug_pcp);
++
++ WARN_ON(hp->refcount <= 0);
++
++ /* This is safe. sync_unplug_thread is pinned to this cpu */
++ if (!--hp->refcount && hp->unplug && hp->unplug != current)
++ wake_up_process(hp->unplug);
++}
++
++/*
++ * FIXME: Is this really correct under all circumstances ?
++ */
++static int sync_unplug_thread(void *data)
++{
++ struct hotplug_pcp *hp = data;
++
++ preempt_disable();
++ hp->unplug = current;
++ set_current_state(TASK_UNINTERRUPTIBLE);
++ while (hp->refcount) {
++ schedule_preempt_disabled();
++ set_current_state(TASK_UNINTERRUPTIBLE);
++ }
++ set_current_state(TASK_RUNNING);
++ preempt_enable();
++ complete(&hp->synced);
++ return 0;
++}
++
++/*
++ * Start the sync_unplug_thread on the target cpu and wait for it to
++ * complete.
++ */
++static int cpu_unplug_begin(unsigned int cpu)
++{
++ struct hotplug_pcp *hp = &per_cpu(hotplug_pcp, cpu);
++ struct task_struct *tsk;
++
++ init_completion(&hp->synced);
++ tsk = kthread_create(sync_unplug_thread, hp, "sync_unplug/%d\n", cpu);
++ if (IS_ERR(tsk))
++ return (PTR_ERR(tsk));
++ kthread_bind(tsk, cpu);
++ wake_up_process(tsk);
++ wait_for_completion(&hp->synced);
++ return 0;
++}
++
++static void cpu_unplug_done(unsigned int cpu)
++{
++ struct hotplug_pcp *hp = &per_cpu(hotplug_pcp, cpu);
++
++ hp->unplug = NULL;
++}
+
+ void get_online_cpus(void)
+ {
+@@ -349,13 +443,14 @@ static int __ref take_cpu_down(void *_pa
+ /* Requires cpu_add_remove_lock to be held */
+ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
+ {
+- int err, nr_calls = 0;
++ int mycpu, err, nr_calls = 0;
+ void *hcpu = (void *)(long)cpu;
+ unsigned long mod = tasks_frozen ? CPU_TASKS_FROZEN : 0;
+ struct take_cpu_down_param tcd_param = {
+ .mod = mod,
+ .hcpu = hcpu,
+ };
++ cpumask_var_t cpumask;
+
+ if (num_online_cpus() == 1)
+ return -EBUSY;
+@@ -363,7 +458,27 @@ static int __ref _cpu_down(unsigned int
+ if (!cpu_online(cpu))
+ return -EINVAL;
+
++ /* Move the downtaker off the unplug cpu */
++ if (!alloc_cpumask_var(&cpumask, GFP_KERNEL))
++ return -ENOMEM;
++ cpumask_andnot(cpumask, cpu_online_mask, cpumask_of(cpu));
++ set_cpus_allowed_ptr(current, cpumask);
++ free_cpumask_var(cpumask);
++ preempt_disable();
++ mycpu = smp_processor_id();
++ if (mycpu == cpu) {
++ printk(KERN_ERR "Yuck! Still on unplug CPU\n!");
++ preempt_enable();
++ return -EBUSY;
++ }
++ preempt_enable();
++
+ cpu_hotplug_begin();
++ err = cpu_unplug_begin(cpu);
++ if (err) {
++ printk("cpu_unplug_begin(%d) failed\n", cpu);
++ goto out_cancel;
++ }
+
+ err = __cpu_notify(CPU_DOWN_PREPARE | mod, hcpu, -1, &nr_calls);
+ if (err) {
+@@ -427,6 +542,8 @@ static int __ref _cpu_down(unsigned int
+ check_for_tasks(cpu);
+
+ out_release:
++ cpu_unplug_done(cpu);
++out_cancel:
+ cpu_hotplug_done();
+ if (!err)
+ cpu_notify_nofail(CPU_POST_DEAD | mod, hcpu);
diff --git a/patches/hotplug-sync_unplug-no-27-5cn-27-in-task-name.patch b/patches/hotplug-sync_unplug-no-27-5cn-27-in-task-name.patch
new file mode 100644
index 00000000000000..cc7d94c5daab64
--- /dev/null
+++ b/patches/hotplug-sync_unplug-no-27-5cn-27-in-task-name.patch
@@ -0,0 +1,24 @@
+Subject: hotplug: sync_unplug: No "\n" in task name
+From: Yong Zhang <yong.zhang0@gmail.com>
+Date: Sun, 16 Oct 2011 18:56:43 +0800
+
+Otherwise the output will look a little odd.
+
+Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
+Link: http://lkml.kernel.org/r/1318762607-2261-2-git-send-email-yong.zhang0@gmail.com
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ kernel/cpu.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/kernel/cpu.c
++++ b/kernel/cpu.c
+@@ -167,7 +167,7 @@ static int cpu_unplug_begin(unsigned int
+ struct task_struct *tsk;
+
+ init_completion(&hp->synced);
+- tsk = kthread_create(sync_unplug_thread, hp, "sync_unplug/%d\n", cpu);
++ tsk = kthread_create(sync_unplug_thread, hp, "sync_unplug/%d", cpu);
+ if (IS_ERR(tsk))
+ return (PTR_ERR(tsk));
+ kthread_bind(tsk, cpu);
diff --git a/patches/hotplug-use-migrate-disable.patch b/patches/hotplug-use-migrate-disable.patch
new file mode 100644
index 00000000000000..25844ce8e17141
--- /dev/null
+++ b/patches/hotplug-use-migrate-disable.patch
@@ -0,0 +1,39 @@
+Subject: hotplug: Use migrate disable on unplug
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Sun, 17 Jul 2011 19:35:29 +0200
+
+Migration needs to be disabled across the unplug handling to make
+sure that the unplug thread is off the unplugged CPU.
+
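+A minimal sketch of the difference this makes (illustration only; the
+example function is a made-up placeholder): unlike preempt_disable(),
+migrate_disable() only forbids cross-CPU migration, the section may
+still be preempted, which matters on RT when it is not tiny.
+
+  static void example_unplug_section(unsigned int cpu)
+  {
+          migrate_disable();      /* stays on this CPU, preemption stays on */
+          if (smp_processor_id() == cpu)
+                  pr_err("still on the CPU being unplugged\n");
+          migrate_enable();
+  }
+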
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ kernel/cpu.c | 6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+
+--- a/kernel/cpu.c
++++ b/kernel/cpu.c
+@@ -466,14 +466,13 @@ static int __ref _cpu_down(unsigned int
+ cpumask_andnot(cpumask, cpu_online_mask, cpumask_of(cpu));
+ set_cpus_allowed_ptr(current, cpumask);
+ free_cpumask_var(cpumask);
+- preempt_disable();
++ migrate_disable();
+ mycpu = smp_processor_id();
+ if (mycpu == cpu) {
+ printk(KERN_ERR "Yuck! Still on unplug CPU\n!");
+- preempt_enable();
++ migrate_enable();
+ return -EBUSY;
+ }
+- preempt_enable();
+
+ cpu_hotplug_begin();
+ err = cpu_unplug_begin(cpu);
+@@ -546,6 +545,7 @@ static int __ref _cpu_down(unsigned int
+ out_release:
+ cpu_unplug_done(cpu);
+ out_cancel:
++ migrate_enable();
+ cpu_hotplug_done();
+ if (!err)
+ cpu_notify_nofail(CPU_POST_DEAD | mod, hcpu);
diff --git a/patches/hrtimer-Move-schedule_work-call-to-helper-thread.patch b/patches/hrtimer-Move-schedule_work-call-to-helper-thread.patch
new file mode 100644
index 00000000000000..ef0a92332bd471
--- /dev/null
+++ b/patches/hrtimer-Move-schedule_work-call-to-helper-thread.patch
@@ -0,0 +1,117 @@
+From: Yang Shi <yang.shi@windriver.com>
+Date: Mon, 16 Sep 2013 14:09:19 -0700
+Subject: hrtimer: Move schedule_work call to helper thread
+
+When running the LTP leapsec_timer test, the following call trace is caught:
+
+BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
+in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
+Preemption disabled at:[<ffffffff810857f3>] cpu_startup_entry+0x133/0x310
+
+CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.10-rt3 #2
+Hardware name: Intel Corporation Calpella platform/MATXM-CORE-411-B, BIOS 4.6.3 08/18/2010
+ffffffff81c2f800 ffff880076843e40 ffffffff8169918d ffff880076843e58
+ffffffff8106db31 ffff88007684b4a0 ffff880076843e70 ffffffff8169d9c0
+ffff88007684b4a0 ffff880076843eb0 ffffffff81059da1 0000001876851200
+Call Trace:
+<IRQ> [<ffffffff8169918d>] dump_stack+0x19/0x1b
+[<ffffffff8106db31>] __might_sleep+0xf1/0x170
+[<ffffffff8169d9c0>] rt_spin_lock+0x20/0x50
+[<ffffffff81059da1>] queue_work_on+0x61/0x100
+[<ffffffff81065aa1>] clock_was_set_delayed+0x21/0x30
+[<ffffffff810883be>] do_timer+0x40e/0x660
+[<ffffffff8108f487>] tick_do_update_jiffies64+0xf7/0x140
+[<ffffffff8108fe42>] tick_check_idle+0x92/0xc0
+[<ffffffff81044327>] irq_enter+0x57/0x70
+[<ffffffff816a040e>] smp_apic_timer_interrupt+0x3e/0x9b
+[<ffffffff8169f80a>] apic_timer_interrupt+0x6a/0x70
+<EOI> [<ffffffff8155ea1c>] ? cpuidle_enter_state+0x4c/0xc0
+[<ffffffff8155eb68>] cpuidle_idle_call+0xd8/0x2d0
+[<ffffffff8100b59e>] arch_cpu_idle+0xe/0x30
+[<ffffffff8108585e>] cpu_startup_entry+0x19e/0x310
+[<ffffffff8168efa2>] start_secondary+0x1ad/0x1b0
+
+clock_was_set_delayed() is called from the hard IRQ handler (timer interrupt),
+which calls schedule_work().
+
+Under PREEMPT_RT_FULL, schedule_work() takes spinlocks which could sleep, so it
+is not safe to call schedule_work() in interrupt context.
+
+Reference upstream commit b68d61c705ef02384c0538b8d9374545097899ca
+(rt,ntp: Move call to schedule_delayed_work() to helper thread)
+from git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git, which
+makes a similar change.
+
+Add a helper thread which does the call to schedule_work() and wake that
+thread up instead of calling schedule_work() directly.
+
+
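+A userspace analogy of the pattern (illustration only, not kernel code):
+the context that must not block merely sets a flag and wakes the helper,
+and the helper performs the blocking call. The kernel side uses a flag
+plus wake_up_process() instead of a mutex, since the caller runs in hard
+interrupt context; all names below are invented.
+
+  #include <pthread.h>
+  #include <stdbool.h>
+  #include <stdio.h>
+  #include <unistd.h>
+
+  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
+  static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
+  static bool pending;
+
+  static void *helper(void *arg)          /* stand-in for the kthread */
+  {
+          (void)arg;
+          pthread_mutex_lock(&lock);
+          for (;;) {
+                  while (!pending)
+                          pthread_cond_wait(&cond, &lock);
+                  pending = false;
+                  pthread_mutex_unlock(&lock);
+                  printf("deferred work runs here\n");  /* schedule_work() stand-in */
+                  pthread_mutex_lock(&lock);
+          }
+          return NULL;
+  }
+
+  static void cheap_trigger(void)         /* clock_was_set_delayed() stand-in */
+  {
+          pthread_mutex_lock(&lock);
+          pending = true;
+          pthread_cond_signal(&cond);
+          pthread_mutex_unlock(&lock);
+  }
+
+  int main(void)
+  {
+          pthread_t t;
+
+          pthread_create(&t, NULL, helper, NULL);
+          cheap_trigger();
+          sleep(1);
+          return 0;
+  }
+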
+Signed-off-by: Yang Shi <yang.shi@windriver.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ kernel/time/hrtimer.c | 40 ++++++++++++++++++++++++++++++++++++++++
+ 1 file changed, 40 insertions(+)
+
+--- a/kernel/time/hrtimer.c
++++ b/kernel/time/hrtimer.c
+@@ -48,6 +48,7 @@
+ #include <linux/sched/rt.h>
+ #include <linux/sched/deadline.h>
+ #include <linux/timer.h>
++#include <linux/kthread.h>
+ #include <linux/freezer.h>
+
+ #include <asm/uaccess.h>
+@@ -713,6 +714,44 @@ static void clock_was_set_work(struct wo
+
+ static DECLARE_WORK(hrtimer_work, clock_was_set_work);
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++/*
++ * RT can not call schedule_work from real interrupt context.
++ * Need to make a thread to do the real work.
++ */
++static struct task_struct *clock_set_delay_thread;
++static bool do_clock_set_delay;
++
++static int run_clock_set_delay(void *ignore)
++{
++ while (!kthread_should_stop()) {
++ set_current_state(TASK_INTERRUPTIBLE);
++ if (do_clock_set_delay) {
++ do_clock_set_delay = false;
++ schedule_work(&hrtimer_work);
++ }
++ schedule();
++ }
++ __set_current_state(TASK_RUNNING);
++ return 0;
++}
++
++void clock_was_set_delayed(void)
++{
++ do_clock_set_delay = true;
++ /* Make visible before waking up process */
++ smp_wmb();
++ wake_up_process(clock_set_delay_thread);
++}
++
++static __init int create_clock_set_delay_thread(void)
++{
++ clock_set_delay_thread = kthread_run(run_clock_set_delay, NULL, "kclksetdelayd");
++ BUG_ON(!clock_set_delay_thread);
++ return 0;
++}
++early_initcall(create_clock_set_delay_thread);
++#else /* PREEMPT_RT_FULL */
+ /*
+ * Called from timekeeping and resume code to reprogramm the hrtimer
+ * interrupt device on all cpus.
+@@ -721,6 +760,7 @@ void clock_was_set_delayed(void)
+ {
+ schedule_work(&hrtimer_work);
+ }
++#endif
+
+ #else
+
diff --git a/patches/hrtimer-fixup-hrtimer-callback-changes-for-preempt-r.patch b/patches/hrtimer-fixup-hrtimer-callback-changes-for-preempt-r.patch
new file mode 100644
index 00000000000000..7e46774ae69e7e
--- /dev/null
+++ b/patches/hrtimer-fixup-hrtimer-callback-changes-for-preempt-r.patch
@@ -0,0 +1,462 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Fri, 3 Jul 2009 08:44:31 -0500
+Subject: hrtimer: Fixup hrtimer callback changes for preempt-rt
+
+In preempt-rt we cannot call callbacks which take sleeping locks from
+timer interrupt context.
+
+Bring back the softirq split for now, until we have fixed the signal
+delivery problem for real.
+
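+A short usage sketch (illustrative; my_timer and my_callback are made
+up): callbacks that take no sleeping locks mark themselves irqsafe and
+keep expiring from the hard interrupt, everything else is deferred to
+HRTIMER_SOFTIRQ.
+
+  static struct hrtimer my_timer;               /* example timer */
+
+  static enum hrtimer_restart my_callback(struct hrtimer *t)
+  {
+          /* must not take sleeping locks when irqsafe is set */
+          return HRTIMER_NORESTART;
+  }
+
+  static void example_arm_timer(void)
+  {
+          hrtimer_init(&my_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+          my_timer.function = my_callback;
+          my_timer.irqsafe = 1;           /* expire from hrtimer_interrupt() */
+          hrtimer_start(&my_timer, ns_to_ktime(NSEC_PER_MSEC),
+                        HRTIMER_MODE_REL);
+  }
+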
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+
+---
+ include/linux/hrtimer.h | 3
+ kernel/sched/core.c | 1
+ kernel/sched/rt.c | 1
+ kernel/time/hrtimer.c | 219 +++++++++++++++++++++++++++++++++++++++++------
+ kernel/time/tick-sched.c | 1
+ kernel/watchdog.c | 1
+ 6 files changed, 200 insertions(+), 26 deletions(-)
+
+--- a/include/linux/hrtimer.h
++++ b/include/linux/hrtimer.h
+@@ -111,6 +111,8 @@ struct hrtimer {
+ enum hrtimer_restart (*function)(struct hrtimer *);
+ struct hrtimer_clock_base *base;
+ unsigned long state;
++ struct list_head cb_entry;
++ int irqsafe;
+ #ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST
+ ktime_t praecox;
+ #endif
+@@ -150,6 +152,7 @@ struct hrtimer_clock_base {
+ int index;
+ clockid_t clockid;
+ struct timerqueue_head active;
++ struct list_head expired;
+ ktime_t resolution;
+ ktime_t (*get_time)(void);
+ ktime_t softirq_time;
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -461,6 +461,7 @@ static void init_rq_hrtick(struct rq *rq
+
+ hrtimer_init(&rq->hrtick_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ rq->hrtick_timer.function = hrtick;
++ rq->hrtick_timer.irqsafe = 1;
+ }
+ #else /* CONFIG_SCHED_HRTICK */
+ static inline void hrtick_clear(struct rq *rq)
+--- a/kernel/sched/rt.c
++++ b/kernel/sched/rt.c
+@@ -44,6 +44,7 @@ void init_rt_bandwidth(struct rt_bandwid
+
+ hrtimer_init(&rt_b->rt_period_timer,
+ CLOCK_MONOTONIC, HRTIMER_MODE_REL);
++ rt_b->rt_period_timer.irqsafe = 1;
+ rt_b->rt_period_timer.function = sched_rt_period_timer;
+ }
+
+--- a/kernel/time/hrtimer.c
++++ b/kernel/time/hrtimer.c
+@@ -577,8 +577,7 @@ static int hrtimer_reprogram(struct hrti
+ * When the callback is running, we do not reprogram the clock event
+ * device. The timer callback is either running on a different CPU or
+ * the callback is executed in the hrtimer_interrupt context. The
+- * reprogramming is handled either by the softirq, which called the
+- * callback or at the end of the hrtimer_interrupt.
++ * reprogramming is handled at the end of the hrtimer_interrupt.
+ */
+ if (hrtimer_callback_running(timer))
+ return 0;
+@@ -622,6 +621,9 @@ static int hrtimer_reprogram(struct hrti
+ return res;
+ }
+
++static void __run_hrtimer(struct hrtimer *timer, ktime_t *now);
++static int hrtimer_rt_defer(struct hrtimer *timer);
++
+ /*
+ * Initialize the high resolution related parts of cpu_base
+ */
+@@ -631,6 +633,21 @@ static inline void hrtimer_init_hres(str
+ base->hres_active = 0;
+ }
+
++static inline int hrtimer_enqueue_reprogram(struct hrtimer *timer,
++ struct hrtimer_clock_base *base,
++ int wakeup)
++{
++ if (!hrtimer_reprogram(timer, base))
++ return 0;
++ if (!wakeup)
++ return -ETIME;
++#ifdef CONFIG_PREEMPT_RT_BASE
++ if (!hrtimer_rt_defer(timer))
++ return -ETIME;
++#endif
++ return 1;
++}
++
+ static inline ktime_t hrtimer_update_base(struct hrtimer_cpu_base *base)
+ {
+ ktime_t *offs_real = &base->clock_base[HRTIMER_BASE_REALTIME].offset;
+@@ -712,6 +729,13 @@ static inline int hrtimer_is_hres_enable
+ static inline int hrtimer_switch_to_hres(void) { return 0; }
+ static inline void
+ hrtimer_force_reprogram(struct hrtimer_cpu_base *base, int skip_equal) { }
++static inline int hrtimer_enqueue_reprogram(struct hrtimer *timer,
++ struct hrtimer_clock_base *base,
++ int wakeup)
++{
++ return 0;
++}
++
+ static inline int hrtimer_reprogram(struct hrtimer *timer,
+ struct hrtimer_clock_base *base)
+ {
+@@ -719,7 +743,6 @@ static inline int hrtimer_reprogram(stru
+ }
+ static inline void hrtimer_init_hres(struct hrtimer_cpu_base *base) { }
+ static inline void retrigger_next_event(void *arg) { }
+-
+ #endif /* CONFIG_HIGH_RES_TIMERS */
+
+ /*
+@@ -854,9 +877,9 @@ void hrtimer_wait_for_timer(const struct
+ {
+ struct hrtimer_clock_base *base = timer->base;
+
+- if (base && base->cpu_base && !hrtimer_hres_active())
++ if (base && base->cpu_base && !timer->irqsafe)
+ wait_event(base->cpu_base->wait,
+- !(timer->state & HRTIMER_STATE_CALLBACK));
++ !(timer->state & HRTIMER_STATE_CALLBACK));
+ }
+
+ #else
+@@ -906,6 +929,11 @@ static void __remove_hrtimer(struct hrti
+ if (!(timer->state & HRTIMER_STATE_ENQUEUED))
+ goto out;
+
++ if (unlikely(!list_empty(&timer->cb_entry))) {
++ list_del_init(&timer->cb_entry);
++ goto out;
++ }
++
+ next_timer = timerqueue_getnext(&base->active);
+ timerqueue_del(&base->active, &timer->node);
+ if (&timer->node == next_timer) {
+@@ -1016,15 +1044,26 @@ int __hrtimer_start_range_ns(struct hrti
+ * on dynticks target.
+ */
+ wake_up_nohz_cpu(new_base->cpu_base->cpu);
+- } else if (new_base->cpu_base == this_cpu_ptr(&hrtimer_bases) &&
+- hrtimer_reprogram(timer, new_base)) {
++ } else if (new_base->cpu_base == this_cpu_ptr(&hrtimer_bases)) {
++
++ ret = hrtimer_enqueue_reprogram(timer, new_base, wakeup);
++ if (ret < 0) {
++ /*
++ * In case we failed to reprogram the timer (mostly
++ * because out current timer is already elapsed),
++ * remove it again and report a failure. This avoids
++ * stale base->first entries.
++ */
++ debug_deactivate(timer);
++ __remove_hrtimer(timer, new_base,
++ timer->state & HRTIMER_STATE_CALLBACK, 0);
++ } else if (ret > 0) {
+ /*
+ * Only allow reprogramming if the new base is on this CPU.
+ * (it might still be on another CPU if the timer was pending)
+ *
+ * XXX send_remote_softirq() ?
+ */
+- if (wakeup) {
+ /*
+ * We need to drop cpu_base->lock to avoid a
+ * lock ordering issue vs. rq->lock.
+@@ -1032,9 +1071,7 @@ int __hrtimer_start_range_ns(struct hrti
+ raw_spin_unlock(&new_base->cpu_base->lock);
+ raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+ local_irq_restore(flags);
+- return ret;
+- } else {
+- __raise_softirq_irqoff(HRTIMER_SOFTIRQ);
++ return 0;
+ }
+ }
+
+@@ -1189,6 +1226,7 @@ static void __hrtimer_init(struct hrtime
+
+ base = hrtimer_clockid_to_base(clock_id);
+ timer->base = &cpu_base->clock_base[base];
++ INIT_LIST_HEAD(&timer->cb_entry);
+ timerqueue_init(&timer->node);
+
+ #ifdef CONFIG_TIMER_STATS
+@@ -1272,10 +1310,128 @@ static void __run_hrtimer(struct hrtimer
+ timer->state &= ~HRTIMER_STATE_CALLBACK;
+ }
+
+-#ifdef CONFIG_HIGH_RES_TIMERS
+-
+ static enum hrtimer_restart hrtimer_wakeup(struct hrtimer *timer);
+
++#ifdef CONFIG_PREEMPT_RT_BASE
++static void hrtimer_rt_reprogram(int restart, struct hrtimer *timer,
++ struct hrtimer_clock_base *base)
++{
++ /*
++ * Note, we clear the callback flag before we requeue the
++ * timer otherwise we trigger the callback_running() check
++ * in hrtimer_reprogram().
++ */
++ timer->state &= ~HRTIMER_STATE_CALLBACK;
++
++ if (restart != HRTIMER_NORESTART) {
++ BUG_ON(hrtimer_active(timer));
++ /*
++ * Enqueue the timer, if it's the leftmost timer then
++ * we need to reprogram it.
++ */
++ if (!enqueue_hrtimer(timer, base))
++ return;
++
++#ifndef CONFIG_HIGH_RES_TIMERS
++ }
++#else
++ if (base->cpu_base->hres_active &&
++ hrtimer_reprogram(timer, base))
++ goto requeue;
++
++ } else if (hrtimer_active(timer)) {
++ /*
++ * If the timer was rearmed on another CPU, reprogram
++ * the event device.
++ */
++ if (&timer->node == base->active.next &&
++ base->cpu_base->hres_active &&
++ hrtimer_reprogram(timer, base))
++ goto requeue;
++ }
++ return;
++
++requeue:
++ /*
++ * Timer is expired. Thus move it from tree to pending list
++ * again.
++ */
++ __remove_hrtimer(timer, base, timer->state, 0);
++ list_add_tail(&timer->cb_entry, &base->expired);
++#endif
++}
++
++/*
++ * The changes in mainline which removed the callback modes from
++ * hrtimer are not yet working with -rt. The non wakeup_process()
++ * based callbacks which involve sleeping locks need to be treated
++ * separately.
++ */
++static void hrtimer_rt_run_pending(void)
++{
++ enum hrtimer_restart (*fn)(struct hrtimer *);
++ struct hrtimer_cpu_base *cpu_base;
++ struct hrtimer_clock_base *base;
++ struct hrtimer *timer;
++ int index, restart;
++
++ local_irq_disable();
++ cpu_base = &per_cpu(hrtimer_bases, smp_processor_id());
++
++ raw_spin_lock(&cpu_base->lock);
++
++ for (index = 0; index < HRTIMER_MAX_CLOCK_BASES; index++) {
++ base = &cpu_base->clock_base[index];
++
++ while (!list_empty(&base->expired)) {
++ timer = list_first_entry(&base->expired,
++ struct hrtimer, cb_entry);
++
++ /*
++ * Same as the above __run_hrtimer function
++ * just we run with interrupts enabled.
++ */
++ debug_hrtimer_deactivate(timer);
++ __remove_hrtimer(timer, base, HRTIMER_STATE_CALLBACK, 0);
++ timer_stats_account_hrtimer(timer);
++ fn = timer->function;
++
++ raw_spin_unlock_irq(&cpu_base->lock);
++ restart = fn(timer);
++ raw_spin_lock_irq(&cpu_base->lock);
++
++ hrtimer_rt_reprogram(restart, timer, base);
++ }
++ }
++
++ raw_spin_unlock_irq(&cpu_base->lock);
++
++ wake_up_timer_waiters(cpu_base);
++}
++
++static int hrtimer_rt_defer(struct hrtimer *timer)
++{
++ if (timer->irqsafe)
++ return 0;
++
++ __remove_hrtimer(timer, timer->base, timer->state, 0);
++ list_add_tail(&timer->cb_entry, &timer->base->expired);
++ return 1;
++}
++
++#else
++
++static inline void hrtimer_rt_run_pending(void)
++{
++ hrtimer_peek_ahead_timers();
++}
++
++static inline int hrtimer_rt_defer(struct hrtimer *timer) { return 0; }
++
++#endif
++
++#ifdef CONFIG_HIGH_RES_TIMERS
++
+ /*
+ * High resolution timer interrupt
+ * Called with interrupts disabled
+@@ -1284,7 +1440,7 @@ void hrtimer_interrupt(struct clock_even
+ {
+ struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);
+ ktime_t expires_next, now, entry_time, delta;
+- int i, retries = 0;
++ int i, retries = 0, raise = 0;
+
+ BUG_ON(!cpu_base->hres_active);
+ cpu_base->nr_events++;
+@@ -1343,7 +1499,10 @@ void hrtimer_interrupt(struct clock_even
+ if (basenow.tv64 < hrtimer_get_softexpires_tv64(timer))
+ break;
+
+- __run_hrtimer(timer, &basenow);
++ if (!hrtimer_rt_defer(timer))
++ __run_hrtimer(timer, &basenow);
++ else
++ raise = 1;
+ }
+ }
+ /* Reevaluate the clock bases for the next expiry */
+@@ -1360,6 +1519,10 @@ void hrtimer_interrupt(struct clock_even
+ if (expires_next.tv64 == KTIME_MAX ||
+ !tick_program_event(expires_next, 0)) {
+ cpu_base->hang_detected = 0;
++
++ if (raise)
++ raise_softirq_irqoff(HRTIMER_SOFTIRQ);
++
+ return;
+ }
+
+@@ -1439,18 +1602,18 @@ void hrtimer_peek_ahead_timers(void)
+ __hrtimer_peek_ahead_timers();
+ local_irq_restore(flags);
+ }
+-
+-static void run_hrtimer_softirq(struct softirq_action *h)
+-{
+- hrtimer_peek_ahead_timers();
+-}
+-
+ #else /* CONFIG_HIGH_RES_TIMERS */
+
+ static inline void __hrtimer_peek_ahead_timers(void) { }
+
+ #endif /* !CONFIG_HIGH_RES_TIMERS */
+
++
++static void run_hrtimer_softirq(struct softirq_action *h)
++{
++ hrtimer_rt_run_pending();
++}
++
+ /*
+ * Called from timer softirq every jiffy, expire hrtimers:
+ *
+@@ -1483,7 +1646,7 @@ void hrtimer_run_queues(void)
+ struct timerqueue_node *node;
+ struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);
+ struct hrtimer_clock_base *base;
+- int index, gettime = 1;
++ int index, gettime = 1, raise = 0;
+
+ if (hrtimer_hres_active())
+ return;
+@@ -1508,12 +1671,16 @@ void hrtimer_run_queues(void)
+ hrtimer_get_expires_tv64(timer))
+ break;
+
+- __run_hrtimer(timer, &base->softirq_time);
++ if (!hrtimer_rt_defer(timer))
++ __run_hrtimer(timer, &base->softirq_time);
++ else
++ raise = 1;
+ }
+ raw_spin_unlock(&cpu_base->lock);
+ }
+
+- wake_up_timer_waiters(cpu_base);
++ if (raise)
++ raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+ }
+
+ /*
+@@ -1535,6 +1702,7 @@ static enum hrtimer_restart hrtimer_wake
+ void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, struct task_struct *task)
+ {
+ sl->timer.function = hrtimer_wakeup;
++ sl->timer.irqsafe = 1;
+ sl->task = task;
+ }
+ EXPORT_SYMBOL_GPL(hrtimer_init_sleeper);
+@@ -1671,6 +1839,7 @@ static void init_hrtimers_cpu(int cpu)
+ for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
+ cpu_base->clock_base[i].cpu_base = cpu_base;
+ timerqueue_init_head(&cpu_base->clock_base[i].active);
++ INIT_LIST_HEAD(&cpu_base->clock_base[i].expired);
+ }
+
+ cpu_base->cpu = cpu;
+@@ -1783,9 +1952,7 @@ void __init hrtimers_init(void)
+ hrtimer_cpu_notify(&hrtimers_nb, (unsigned long)CPU_UP_PREPARE,
+ (void *)(long)smp_processor_id());
+ register_cpu_notifier(&hrtimers_nb);
+-#ifdef CONFIG_HIGH_RES_TIMERS
+ open_softirq(HRTIMER_SOFTIRQ, run_hrtimer_softirq);
+-#endif
+ }
+
+ /**
+--- a/kernel/time/tick-sched.c
++++ b/kernel/time/tick-sched.c
+@@ -1159,6 +1159,7 @@ void tick_setup_sched_timer(void)
+ * Emulate tick processing via per-CPU hrtimers:
+ */
+ hrtimer_init(&ts->sched_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
++ ts->sched_timer.irqsafe = 1;
+ ts->sched_timer.function = tick_sched_timer;
+
+ /* Get the next period (per cpu) */
+--- a/kernel/watchdog.c
++++ b/kernel/watchdog.c
+@@ -454,6 +454,7 @@ static void watchdog_enable(unsigned int
+ /* kick off the timer for the hardlockup detector */
+ hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ hrtimer->function = watchdog_timer_fn;
++ hrtimer->irqsafe = 1;
+
+ /* Enable the perf event */
+ watchdog_nmi_enable(cpu);
diff --git a/patches/hrtimer-raise-softirq-if-hrtimer-irq-stalled.patch b/patches/hrtimer-raise-softirq-if-hrtimer-irq-stalled.patch
new file mode 100644
index 00000000000000..3b19d7e0e3f646
--- /dev/null
+++ b/patches/hrtimer-raise-softirq-if-hrtimer-irq-stalled.patch
@@ -0,0 +1,37 @@
+Subject: hrtimer: Raise softirq if hrtimer irq stalled
+From: Watanabe <shunsuke.watanabe@tel.com>
+Date: Sun, 28 Oct 2012 11:13:44 +0100
+
+When the hrtimer stall detection hits, the softirq is not raised.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ kernel/time/hrtimer.c | 9 ++++-----
+ 1 file changed, 4 insertions(+), 5 deletions(-)
+
+--- a/kernel/time/hrtimer.c
++++ b/kernel/time/hrtimer.c
+@@ -1519,11 +1519,7 @@ void hrtimer_interrupt(struct clock_even
+ if (expires_next.tv64 == KTIME_MAX ||
+ !tick_program_event(expires_next, 0)) {
+ cpu_base->hang_detected = 0;
+-
+- if (raise)
+- raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+-
+- return;
++ goto out;
+ }
+
+ /*
+@@ -1567,6 +1563,9 @@ void hrtimer_interrupt(struct clock_even
+ tick_program_event(expires_next, 1);
+ printk_once(KERN_WARNING "hrtimer: interrupt took %llu ns\n",
+ ktime_to_ns(delta));
++out:
++ if (raise)
++ raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+ }
+
+ /*
diff --git a/patches/hrtimers-prepare-full-preemption.patch b/patches/hrtimers-prepare-full-preemption.patch
new file mode 100644
index 00000000000000..3edbdde07a7bc2
--- /dev/null
+++ b/patches/hrtimers-prepare-full-preemption.patch
@@ -0,0 +1,195 @@
+From: Ingo Molnar <mingo@elte.hu>
+Date: Fri, 3 Jul 2009 08:29:34 -0500
+Subject: hrtimers: Prepare full preemption
+
+Make cancellation of a running callback in softirq context safe
+against preemption.
+
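+The resulting cancellation pattern, as a sketch (illustration only of
+what hrtimer_cancel() ends up doing with the hunk below):
+
+  static int example_cancel(struct hrtimer *timer)
+  {
+          for (;;) {
+                  int ret = hrtimer_try_to_cancel(timer);
+
+                  if (ret >= 0)
+                          return ret;
+                  /* callback is running: sleep on the base's waitqueue
+                   * instead of busy-waiting with cpu_relax() */
+                  hrtimer_wait_for_timer(timer);
+          }
+  }
+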
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ include/linux/hrtimer.h | 10 ++++++++++
+ kernel/time/hrtimer.c | 33 ++++++++++++++++++++++++++++++++-
+ kernel/time/itimer.c | 1 +
+ kernel/time/posix-timers.c | 33 +++++++++++++++++++++++++++++++++
+ 4 files changed, 76 insertions(+), 1 deletion(-)
+
+--- a/include/linux/hrtimer.h
++++ b/include/linux/hrtimer.h
+@@ -197,6 +197,9 @@ struct hrtimer_cpu_base {
+ unsigned long nr_hangs;
+ ktime_t max_hang_time;
+ #endif
++#ifdef CONFIG_PREEMPT_RT_BASE
++ wait_queue_head_t wait;
++#endif
+ struct hrtimer_clock_base clock_base[HRTIMER_MAX_CLOCK_BASES];
+ };
+
+@@ -384,6 +387,13 @@ static inline int hrtimer_restart(struct
+ return hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
+ }
+
++/* Softirq preemption could deadlock timer removal */
++#ifdef CONFIG_PREEMPT_RT_BASE
++ extern void hrtimer_wait_for_timer(const struct hrtimer *timer);
++#else
++# define hrtimer_wait_for_timer(timer) do { cpu_relax(); } while (0)
++#endif
++
+ /* Query timers: */
+ extern ktime_t hrtimer_get_remaining(const struct hrtimer *timer);
+ extern int hrtimer_get_res(const clockid_t which_clock, struct timespec *tp);
+--- a/kernel/time/hrtimer.c
++++ b/kernel/time/hrtimer.c
+@@ -837,6 +837,32 @@ u64 hrtimer_forward(struct hrtimer *time
+ }
+ EXPORT_SYMBOL_GPL(hrtimer_forward);
+
++#ifdef CONFIG_PREEMPT_RT_BASE
++# define wake_up_timer_waiters(b) wake_up(&(b)->wait)
++
++/**
++ * hrtimer_wait_for_timer - Wait for a running timer
++ *
++ * @timer: timer to wait for
++ *
++ * The function waits in case the timers callback function is
++ * currently executed on the waitqueue of the timer base. The
++ * waitqueue is woken up after the timer callback function has
++ * finished execution.
++ */
++void hrtimer_wait_for_timer(const struct hrtimer *timer)
++{
++ struct hrtimer_clock_base *base = timer->base;
++
++ if (base && base->cpu_base && !hrtimer_hres_active())
++ wait_event(base->cpu_base->wait,
++ !(timer->state & HRTIMER_STATE_CALLBACK));
++}
++
++#else
++# define wake_up_timer_waiters(b) do { } while (0)
++#endif
++
+ /*
+ * enqueue_hrtimer - internal function to (re)start a timer
+ *
+@@ -1099,7 +1125,7 @@ int hrtimer_cancel(struct hrtimer *timer
+
+ if (ret >= 0)
+ return ret;
+- cpu_relax();
++ hrtimer_wait_for_timer(timer);
+ }
+ }
+ EXPORT_SYMBOL_GPL(hrtimer_cancel);
+@@ -1486,6 +1512,8 @@ void hrtimer_run_queues(void)
+ }
+ raw_spin_unlock(&cpu_base->lock);
+ }
++
++ wake_up_timer_waiters(cpu_base);
+ }
+
+ /*
+@@ -1647,6 +1675,9 @@ static void init_hrtimers_cpu(int cpu)
+
+ cpu_base->cpu = cpu;
+ hrtimer_init_hres(cpu_base);
++#ifdef CONFIG_PREEMPT_RT_BASE
++ init_waitqueue_head(&cpu_base->wait);
++#endif
+ }
+
+ #ifdef CONFIG_HOTPLUG_CPU
+--- a/kernel/time/itimer.c
++++ b/kernel/time/itimer.c
+@@ -213,6 +213,7 @@ int do_setitimer(int which, struct itime
+ /* We are sharing ->siglock with it_real_fn() */
+ if (hrtimer_try_to_cancel(timer) < 0) {
+ spin_unlock_irq(&tsk->sighand->siglock);
++ hrtimer_wait_for_timer(&tsk->signal->real_timer);
+ goto again;
+ }
+ expires = timeval_to_ktime(value->it_value);
+--- a/kernel/time/posix-timers.c
++++ b/kernel/time/posix-timers.c
+@@ -821,6 +821,20 @@ SYSCALL_DEFINE1(timer_getoverrun, timer_
+ return overrun;
+ }
+
++/*
++ * Protected by RCU!
++ */
++static void timer_wait_for_callback(struct k_clock *kc, struct k_itimer *timr)
++{
++#ifdef CONFIG_PREEMPT_RT_FULL
++ if (kc->timer_set == common_timer_set)
++ hrtimer_wait_for_timer(&timr->it.real.timer);
++ else
++ /* FIXME: Whacky hack for posix-cpu-timers */
++ schedule_timeout(1);
++#endif
++}
++
+ /* Set a POSIX.1b interval timer. */
+ /* timr->it_lock is taken. */
+ static int
+@@ -898,6 +912,7 @@ SYSCALL_DEFINE4(timer_settime, timer_t,
+ if (!timr)
+ return -EINVAL;
+
++ rcu_read_lock();
+ kc = clockid_to_kclock(timr->it_clock);
+ if (WARN_ON_ONCE(!kc || !kc->timer_set))
+ error = -EINVAL;
+@@ -906,9 +921,12 @@ SYSCALL_DEFINE4(timer_settime, timer_t,
+
+ unlock_timer(timr, flag);
+ if (error == TIMER_RETRY) {
++ timer_wait_for_callback(kc, timr);
+ rtn = NULL; // We already got the old time...
++ rcu_read_unlock();
+ goto retry;
+ }
++ rcu_read_unlock();
+
+ if (old_setting && !error &&
+ copy_to_user(old_setting, &old_spec, sizeof (old_spec)))
+@@ -946,10 +964,15 @@ SYSCALL_DEFINE1(timer_delete, timer_t, t
+ if (!timer)
+ return -EINVAL;
+
++ rcu_read_lock();
+ if (timer_delete_hook(timer) == TIMER_RETRY) {
+ unlock_timer(timer, flags);
++ timer_wait_for_callback(clockid_to_kclock(timer->it_clock),
++ timer);
++ rcu_read_unlock();
+ goto retry_delete;
+ }
++ rcu_read_unlock();
+
+ spin_lock(&current->sighand->siglock);
+ list_del(&timer->list);
+@@ -975,8 +998,18 @@ static void itimer_delete(struct k_itime
+ retry_delete:
+ spin_lock_irqsave(&timer->it_lock, flags);
+
++ /* On RT we can race with a deletion */
++ if (!timer->it_signal) {
++ unlock_timer(timer, flags);
++ return;
++ }
++
+ if (timer_delete_hook(timer) == TIMER_RETRY) {
++ rcu_read_lock();
+ unlock_timer(timer, flags);
++ timer_wait_for_callback(clockid_to_kclock(timer->it_clock),
++ timer);
++ rcu_read_unlock();
+ goto retry_delete;
+ }
+ list_del(&timer->list);
diff --git a/patches/hwlat-detector-Don-t-ignore-threshold-module-paramet.patch b/patches/hwlat-detector-Don-t-ignore-threshold-module-paramet.patch
new file mode 100644
index 00000000000000..e4cba99d921c1b
--- /dev/null
+++ b/patches/hwlat-detector-Don-t-ignore-threshold-module-paramet.patch
@@ -0,0 +1,25 @@
+From: Mike Galbraith <bitbucket@online.de>
+Date: Fri, 30 Aug 2013 07:57:25 +0200
+Subject: hwlat-detector: Don't ignore threshold module parameter
+
+If the user specified a threshold at module load time, use it.
+
+
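+For reference, the GNU "?:" shorthand used in the fix below is
+equivalent to:
+
+  data.threshold = threshold ? threshold : DEFAULT_LAT_THRESHOLD;
+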
+Acked-by: Steven Rostedt <rostedt@goodmis.org>
+Signed-off-by: Mike Galbraith <bitbucket@online.de>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ drivers/misc/hwlat_detector.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/drivers/misc/hwlat_detector.c
++++ b/drivers/misc/hwlat_detector.c
+@@ -414,7 +414,7 @@ static int init_stats(void)
+ goto out;
+
+ __reset_stats();
+- data.threshold = DEFAULT_LAT_THRESHOLD; /* threshold us */
++ data.threshold = threshold ?: DEFAULT_LAT_THRESHOLD; /* threshold us */
+ data.sample_window = DEFAULT_SAMPLE_WINDOW; /* window us */
+ data.sample_width = DEFAULT_SAMPLE_WIDTH; /* width us */
+
diff --git a/patches/hwlat-detector-Update-hwlat_detector-to-add-outer-lo.patch b/patches/hwlat-detector-Update-hwlat_detector-to-add-outer-lo.patch
new file mode 100644
index 00000000000000..58f97a82de9c96
--- /dev/null
+++ b/patches/hwlat-detector-Update-hwlat_detector-to-add-outer-lo.patch
@@ -0,0 +1,125 @@
+From: Steven Rostedt <rostedt@goodmis.org>
+Date: Mon, 19 Aug 2013 17:33:25 -0400
+Subject: hwlat-detector: Update hwlat_detector to add outer loop detection
+
+The hwlat_detector reads two timestamps in a row, then reports any
+gap between those calls. The problem is, it misses everything between
+the second reading of the time stamp and the first reading of the time
+stamp in the next loop. That's where most of the time is spent, which
+means chances are high that it will miss all hardware latencies. This
+defeats the purpose.
+
+By also testing the delta between the previous loop's second time stamp
+and the current loop's first time stamp (the outer loop), we are more
+likely to find a latency.
+
+Setting the threshold to 1, here's what the report now looks like:
+
+1347415723.0232202770 0 2
+1347415725.0234202822 0 2
+1347415727.0236202875 0 2
+1347415729.0238202928 0 2
+1347415731.0240202980 0 2
+1347415734.0243203061 0 2
+1347415736.0245203113 0 2
+1347415738.0247203166 2 0
+1347415740.0249203219 0 3
+1347415742.0251203272 0 3
+1347415743.0252203299 0 3
+1347415745.0254203351 0 2
+1347415747.0256203404 0 2
+1347415749.0258203457 0 2
+1347415751.0260203510 0 2
+1347415754.0263203589 0 2
+1347415756.0265203642 0 2
+1347415758.0267203695 0 2
+1347415760.0269203748 0 2
+1347415762.0271203801 0 2
+1347415764.0273203853 2 0
+
+There's some hardware latency that takes 2 microseconds to run.
+
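+A self-contained userspace sketch of the inner/outer sampling idea
+(illustration only, not the detector's code): the inner delta is t2 - t1
+within one iteration, the outer delta is this iteration's t1 minus the
+previous iteration's t2.
+
+  #include <stdint.h>
+  #include <stdio.h>
+  #include <time.h>
+
+  static int64_t now_us(void)
+  {
+          struct timespec ts;
+
+          clock_gettime(CLOCK_MONOTONIC, &ts);
+          return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
+  }
+
+  int main(void)
+  {
+          int64_t last_t2 = 0, max_inner = 0, max_outer = 0;
+
+          for (int i = 0; i < 1000000; i++) {
+                  int64_t t1 = now_us();
+                  int64_t t2 = now_us();
+
+                  if (last_t2 && t1 - last_t2 > max_outer)
+                          max_outer = t1 - last_t2;     /* outer gap */
+                  if (t2 - t1 > max_inner)
+                          max_inner = t2 - t1;          /* inner gap */
+                  last_t2 = t2;
+          }
+          printf("max inner %lld us, max outer %lld us\n",
+                 (long long)max_inner, (long long)max_outer);
+          return 0;
+  }
+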
+Signed-off-by: Steven Rostedt <srostedt@redhat.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ drivers/misc/hwlat_detector.c | 32 ++++++++++++++++++++++++++------
+ 1 file changed, 26 insertions(+), 6 deletions(-)
+
+--- a/drivers/misc/hwlat_detector.c
++++ b/drivers/misc/hwlat_detector.c
+@@ -143,6 +143,7 @@ static void detector_exit(void);
+ struct sample {
+ u64 seqnum; /* unique sequence */
+ u64 duration; /* ktime delta */
++ u64 outer_duration; /* ktime delta (outer loop) */
+ struct timespec timestamp; /* wall time */
+ unsigned long lost;
+ };
+@@ -219,11 +220,13 @@ static struct sample *buffer_get_sample(
+ */
+ static int get_sample(void *unused)
+ {
+- ktime_t start, t1, t2;
++ ktime_t start, t1, t2, last_t2;
+ s64 diff, total = 0;
+ u64 sample = 0;
++ u64 outer_sample = 0;
+ int ret = 1;
+
++ last_t2.tv64 = 0;
+ start = ktime_get(); /* start timestamp */
+
+ do {
+@@ -231,7 +234,22 @@ static int get_sample(void *unused)
+ t1 = ktime_get(); /* we'll look for a discontinuity */
+ t2 = ktime_get();
+
++ if (last_t2.tv64) {
++ /* Check the delta from outer loop (t2 to next t1) */
++ diff = ktime_to_us(ktime_sub(t1, last_t2));
++ /* This shouldn't happen */
++ if (diff < 0) {
++ pr_err(BANNER "time running backwards\n");
++ goto out;
++ }
++ if (diff > outer_sample)
++ outer_sample = diff;
++ }
++ last_t2 = t2;
++
+ total = ktime_to_us(ktime_sub(t2, start)); /* sample width */
++
++ /* This checks the inner loop (t1 to t2) */
+ diff = ktime_to_us(ktime_sub(t2, t1)); /* current diff */
+
+ /* This shouldn't happen */
+@@ -246,12 +264,13 @@ static int get_sample(void *unused)
+ } while (total <= data.sample_width);
+
+ /* If we exceed the threshold value, we have found a hardware latency */
+- if (sample > data.threshold) {
++ if (sample > data.threshold || outer_sample > data.threshold) {
+ struct sample s;
+
+ data.count++;
+ s.seqnum = data.count;
+ s.duration = sample;
++ s.outer_duration = outer_sample;
+ s.timestamp = CURRENT_TIME;
+ __buffer_add_sample(&s);
+
+@@ -738,10 +757,11 @@ static ssize_t debug_sample_fread(struct
+ }
+ }
+
+- len = snprintf(buf, sizeof(buf), "%010lu.%010lu\t%llu\n",
+- sample->timestamp.tv_sec,
+- sample->timestamp.tv_nsec,
+- sample->duration);
++ len = snprintf(buf, sizeof(buf), "%010lu.%010lu\t%llu\t%llu\n",
++ sample->timestamp.tv_sec,
++ sample->timestamp.tv_nsec,
++ sample->duration,
++ sample->outer_duration);
+
+
+ /* handling partial reads is more trouble than it's worth */
diff --git a/patches/hwlat-detector-Use-thread-instead-of-stop-machine.patch b/patches/hwlat-detector-Use-thread-instead-of-stop-machine.patch
new file mode 100644
index 00000000000000..fe1a4358324422
--- /dev/null
+++ b/patches/hwlat-detector-Use-thread-instead-of-stop-machine.patch
@@ -0,0 +1,183 @@
+From: Steven Rostedt <rostedt@goodmis.org>
+Date: Mon, 19 Aug 2013 17:33:27 -0400
+Subject: hwlat-detector: Use thread instead of stop machine
+
+There's no reason to use stop_machine() to search for hardware latency.
+Simply disabling interrupts while running the sampling loop is enough to
+detect anything that sneaks in despite interrupts being off, which is all
+that stop_machine() provided here anyway.
+
+Instead of using stop_machine(), just have the thread disable interrupts
+while it checks for hardware latency.
+
+Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ drivers/misc/hwlat_detector.c | 60 ++++++++++++++++++------------------------
+ 1 file changed, 26 insertions(+), 34 deletions(-)
+
+--- a/drivers/misc/hwlat_detector.c
++++ b/drivers/misc/hwlat_detector.c
+@@ -41,7 +41,6 @@
+ #include <linux/module.h>
+ #include <linux/init.h>
+ #include <linux/ring_buffer.h>
+-#include <linux/stop_machine.h>
+ #include <linux/time.h>
+ #include <linux/hrtimer.h>
+ #include <linux/kthread.h>
+@@ -107,7 +106,6 @@ struct data; /* Global state */
+ /* Sampling functions */
+ static int __buffer_add_sample(struct sample *sample);
+ static struct sample *buffer_get_sample(struct sample *sample);
+-static int get_sample(void *unused);
+
+ /* Threading and state */
+ static int kthread_fn(void *unused);
+@@ -149,7 +147,7 @@ struct sample {
+ unsigned long lost;
+ };
+
+-/* keep the global state somewhere. Mostly used under stop_machine. */
++/* keep the global state somewhere. */
+ static struct data {
+
+ struct mutex lock; /* protect changes */
+@@ -172,7 +170,7 @@ static struct data {
+ * @sample: The new latency sample value
+ *
+ * This receives a new latency sample and records it in a global ring buffer.
+- * No additional locking is used in this case - suited for stop_machine use.
++ * No additional locking is used in this case.
+ */
+ static int __buffer_add_sample(struct sample *sample)
+ {
+@@ -229,18 +227,18 @@ static struct sample *buffer_get_sample(
+ #endif
+ /**
+ * get_sample - sample the CPU TSC and look for likely hardware latencies
+- * @unused: This is not used but is a part of the stop_machine API
+ *
+ * Used to repeatedly capture the CPU TSC (or similar), looking for potential
+- * hardware-induced latency. Called under stop_machine, with data.lock held.
++ * hardware-induced latency. Called with interrupts disabled and with
++ * data.lock held.
+ */
+-static int get_sample(void *unused)
++static int get_sample(void)
+ {
+ time_type start, t1, t2, last_t2;
+ s64 diff, total = 0;
+ u64 sample = 0;
+ u64 outer_sample = 0;
+- int ret = 1;
++ int ret = -1;
+
+ init_time(last_t2, 0);
+ start = time_get(); /* start timestamp */
+@@ -279,10 +277,14 @@ static int get_sample(void *unused)
+
+ } while (total <= data.sample_width);
+
++ ret = 0;
++
+ /* If we exceed the threshold value, we have found a hardware latency */
+ if (sample > data.threshold || outer_sample > data.threshold) {
+ struct sample s;
+
++ ret = 1;
++
+ data.count++;
+ s.seqnum = data.count;
+ s.duration = sample;
+@@ -295,7 +297,6 @@ static int get_sample(void *unused)
+ data.max_sample = sample;
+ }
+
+- ret = 0;
+ out:
+ return ret;
+ }
+@@ -305,32 +306,30 @@ static int get_sample(void *unused)
+ * @unused: A required part of the kthread API.
+ *
+ * Used to periodically sample the CPU TSC via a call to get_sample. We
+- * use stop_machine, whith does (intentionally) introduce latency since we
++ * disable interrupts, which does (intentionally) introduce latency since we
+ * need to ensure nothing else might be running (and thus pre-empting).
+ * Obviously this should never be used in production environments.
+ *
+- * stop_machine will schedule us typically only on CPU0 which is fine for
+- * almost every real-world hardware latency situation - but we might later
+- * generalize this if we find there are any actualy systems with alternate
+- * SMI delivery or other non CPU0 hardware latencies.
++ * Currently this runs on whichever CPU it was scheduled on, but most
++ * real-world hardware latency situations occur across several CPUs
++ * anyway. We might later generalize this if we find there are any
++ * actual systems with alternate SMI delivery or other hardware latencies.
+ */
+ static int kthread_fn(void *unused)
+ {
+- int err = 0;
+- u64 interval = 0;
++ int ret;
++ u64 interval;
+
+ while (!kthread_should_stop()) {
+
+ mutex_lock(&data.lock);
+
+- err = stop_machine(get_sample, unused, 0);
+- if (err) {
+- /* Houston, we have a problem */
+- mutex_unlock(&data.lock);
+- goto err_out;
+- }
++ local_irq_disable();
++ ret = get_sample();
++ local_irq_enable();
+
+- wake_up(&data.wq); /* wake up reader(s) */
++ if (ret > 0)
++ wake_up(&data.wq); /* wake up reader(s) */
+
+ interval = data.sample_window - data.sample_width;
+ do_div(interval, USEC_PER_MSEC); /* modifies interval value */
+@@ -338,15 +337,10 @@ static int kthread_fn(void *unused)
+ mutex_unlock(&data.lock);
+
+ if (msleep_interruptible(interval))
+- goto out;
++ break;
+ }
+- goto out;
+-err_out:
+- pr_err(BANNER "could not call stop_machine, disabling\n");
+- enabled = 0;
+-out:
+- return err;
+
++ return 0;
+ }
+
+ /**
+@@ -442,8 +436,7 @@ static int init_stats(void)
+ * This function provides a generic read implementation for the global state
+ * "data" structure debugfs filesystem entries. It would be nice to use
+ * simple_attr_read directly, but we need to make sure that the data.lock
+- * spinlock is held during the actual read (even though we likely won't ever
+- * actually race here as the updater runs under a stop_machine context).
++ * is held during the actual read.
+ */
+ static ssize_t simple_data_read(struct file *filp, char __user *ubuf,
+ size_t cnt, loff_t *ppos, const u64 *entry)
+@@ -478,8 +471,7 @@ static ssize_t simple_data_read(struct f
+ * This function provides a generic write implementation for the global state
+ * "data" structure debugfs filesystem entries. It would be nice to use
+ * simple_attr_write directly, but we need to make sure that the data.lock
+- * spinlock is held during the actual write (even though we likely won't ever
+- * actually race here as the updater runs under a stop_machine context).
++ * is held during the actual write.
+ */
+ static ssize_t simple_data_write(struct file *filp, const char __user *ubuf,
+ size_t cnt, loff_t *ppos, u64 *entry)
diff --git a/patches/hwlat-detector-Use-trace_clock_local-if-available.patch b/patches/hwlat-detector-Use-trace_clock_local-if-available.patch
new file mode 100644
index 00000000000000..a45adaaf7767fc
--- /dev/null
+++ b/patches/hwlat-detector-Use-trace_clock_local-if-available.patch
@@ -0,0 +1,92 @@
+From: Steven Rostedt <rostedt@goodmis.org>
+Date: Mon, 19 Aug 2013 17:33:26 -0400
+Subject: hwlat-detector: Use trace_clock_local if available
+
+As ktime_get() calls into the timing code which does a read_seq(), it
+may be affected by other CPUs that touch that lock. To remove this
+dependency, use trace_clock_local(), which is already exported
+for module use. If CONFIG_TRACING is enabled, use that as the clock,
+otherwise use ktime_get().
+
+Signed-off-by: Steven Rostedt <srostedt@redhat.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ drivers/misc/hwlat_detector.c | 34 +++++++++++++++++++++++++---------
+ 1 file changed, 25 insertions(+), 9 deletions(-)
+
+--- a/drivers/misc/hwlat_detector.c
++++ b/drivers/misc/hwlat_detector.c
+@@ -51,6 +51,7 @@
+ #include <linux/version.h>
+ #include <linux/delay.h>
+ #include <linux/slab.h>
++#include <linux/trace_clock.h>
+
+ #define BUF_SIZE_DEFAULT 262144UL /* 8K*(sizeof(entry)) */
+ #define BUF_FLAGS (RB_FL_OVERWRITE) /* no block on full */
+@@ -211,6 +212,21 @@ static struct sample *buffer_get_sample(
+ return sample;
+ }
+
++#ifndef CONFIG_TRACING
++#define time_type ktime_t
++#define time_get() ktime_get()
++#define time_to_us(x) ktime_to_us(x)
++#define time_sub(a, b) ktime_sub(a, b)
++#define init_time(a, b) (a).tv64 = b
++#define time_u64(a) ((a).tv64)
++#else
++#define time_type u64
++#define time_get() trace_clock_local()
++#define time_to_us(x) div_u64(x, 1000)
++#define time_sub(a, b) ((a) - (b))
++#define init_time(a, b) (a = b)
++#define time_u64(a) a
++#endif
+ /**
+ * get_sample - sample the CPU TSC and look for likely hardware latencies
+ * @unused: This is not used but is a part of the stop_machine API
+@@ -220,23 +236,23 @@ static struct sample *buffer_get_sample(
+ */
+ static int get_sample(void *unused)
+ {
+- ktime_t start, t1, t2, last_t2;
++ time_type start, t1, t2, last_t2;
+ s64 diff, total = 0;
+ u64 sample = 0;
+ u64 outer_sample = 0;
+ int ret = 1;
+
+- last_t2.tv64 = 0;
+- start = ktime_get(); /* start timestamp */
++ init_time(last_t2, 0);
++ start = time_get(); /* start timestamp */
+
+ do {
+
+- t1 = ktime_get(); /* we'll look for a discontinuity */
+- t2 = ktime_get();
++ t1 = time_get(); /* we'll look for a discontinuity */
++ t2 = time_get();
+
+- if (last_t2.tv64) {
++ if (time_u64(last_t2)) {
+ /* Check the delta from outer loop (t2 to next t1) */
+- diff = ktime_to_us(ktime_sub(t1, last_t2));
++ diff = time_to_us(time_sub(t1, last_t2));
+ /* This shouldn't happen */
+ if (diff < 0) {
+ pr_err(BANNER "time running backwards\n");
+@@ -247,10 +263,10 @@ static int get_sample(void *unused)
+ }
+ last_t2 = t2;
+
+- total = ktime_to_us(ktime_sub(t2, start)); /* sample width */
++ total = time_to_us(time_sub(t2, start)); /* sample width */
+
+ /* This checks the inner loop (t1 to t2) */
+- diff = ktime_to_us(ktime_sub(t2, t1)); /* current diff */
++ diff = time_to_us(time_sub(t2, t1)); /* current diff */
+
+ /* This shouldn't happen */
+ if (diff < 0) {
diff --git a/patches/hwlatdetect.patch b/patches/hwlatdetect.patch
new file mode 100644
index 00000000000000..b77f79ffff7bd2
--- /dev/null
+++ b/patches/hwlatdetect.patch
@@ -0,0 +1,1347 @@
+Subject: hwlatdetect.patch
+From: Carsten Emde <C.Emde@osadl.org>
+Date: Tue, 19 Jul 2011 13:53:12 +0100
+
+Jon Masters developed this wonderful SMI detector. For details please
+consult Documentation/hwlat_detector.txt. It could be ported to Linux
+3.0 RT without any major change.
+
+Signed-off-by: Carsten Emde <C.Emde@osadl.org>
+
+---
+ Documentation/hwlat_detector.txt | 64 ++
+ drivers/misc/Kconfig | 29
+ drivers/misc/Makefile | 1
+ drivers/misc/hwlat_detector.c | 1212 +++++++++++++++++++++++++++++++++++++++
+ 4 files changed, 1306 insertions(+)
+
+--- /dev/null
++++ b/Documentation/hwlat_detector.txt
+@@ -0,0 +1,64 @@
++Introduction:
++-------------
++
++The module hwlat_detector is a special purpose kernel module that is used to
++detect large system latencies induced by the behavior of certain underlying
++hardware or firmware, independent of Linux itself. The code was developed
++originally to detect SMIs (System Management Interrupts) on x86 systems,
++however there is nothing x86 specific about this patchset. It was
++originally written for use by the "RT" patch since the Real Time
++kernel is highly latency sensitive.
++
++SMIs are usually not serviced by the Linux kernel, which typically does not
++even know that they are occurring. SMIs are instead set up by BIOS code
++and are serviced by BIOS code, usually for "critical" events such as
++management of thermal sensors and fans. Sometimes though, SMIs are used for
++other tasks and those tasks can spend an inordinate amount of time in the
++handler (sometimes measured in milliseconds). Obviously this is a problem if
++you are trying to keep event service latencies down in the microsecond range.
++
++The hardware latency detector works by hogging all of the cpus for configurable
++amounts of time (by calling stop_machine()), polling the CPU Time Stamp Counter
++for some period, then looking for gaps in the TSC data. Any gap indicates a
++time when the polling was interrupted and since the machine is stopped and
++interrupts turned off the only thing that could do that would be an SMI.
++
++Note that the SMI detector should *NEVER* be used in a production environment.
++It is intended to be run manually to determine if the hardware platform has a
++problem with long system firmware service routines.
++
++Usage:
++------
++
++Loading the module hwlat_detector with the parameter "enabled=1" (or by toggling
++on the "enable" entry in the "hwlat_detector" debugfs directory) is the only step
++required to start the hwlat_detector. It is possible to redefine the
++threshold in microseconds (us) above which latency spikes will be taken
++into account (parameter "threshold=").
++
++Example:
++
++ # modprobe hwlat_detector enabled=1 threshold=100
++
++After the module is loaded, it creates a directory named "hwlat_detector" under
++the debugfs mountpoint, "/debug/hwlat_detector" for this text. It is necessary
++to have debugfs mounted, which might be on /sys/debug on your system.
++
++The /debug/hwlat_detector interface contains the following files:
++
++count - number of latency spikes observed since last reset
++enable - a global enable/disable toggle (0/1), resets count
++max - maximum hardware latency actually observed (usecs)
++sample - a pipe from which to read current raw sample data
++ in the format <timestamp> <latency observed usecs>
++ (can be opened O_NONBLOCK for a single sample)
++threshold - minimum latency value to be considered (usecs)
++width - time period to sample with CPUs held (usecs)
++ must be less than the total window size (enforced)
++window - total period of sampling, width being inside (usecs)
++
++By default we will set width to 500,000 and window to 1,000,000, meaning that
++we will sample every 1,000,000 usecs (1s) for 500,000 usecs (0.5s). If we
++observe any latencies that exceed the threshold (initially 100 usecs),
++then we write to a global sample ring buffer of 8K samples, which is
++consumed by reading from the "sample" (pipe) debugfs file interface.
+--- a/drivers/misc/Kconfig
++++ b/drivers/misc/Kconfig
+@@ -121,6 +121,35 @@ config IBM_ASM
+ for information on the specific driver level and support statement
+ for your IBM server.
+
++config HWLAT_DETECTOR
++ tristate "Testing module to detect hardware-induced latencies"
++ depends on DEBUG_FS
++ depends on RING_BUFFER
++ default m
++ ---help---
++ A simple hardware latency detector. Use this module to detect
++ large latencies introduced by the behavior of the underlying
++	  system firmware external to Linux. We do this through periodic
++ use of stop_machine to grab all available CPUs and measure
++ for unexplainable gaps in the CPU timestamp counter(s). By
++ default, the module is not enabled until the "enable" file
++ within the "hwlat_detector" debugfs directory is toggled.
++
++	  This module is often used to detect SMIs (System Management
++	  Interrupts) on x86 systems, though it is not x86 specific. To
++ this end, we default to using a sample window of 1 second,
++ during which we will sample for 0.5 seconds. If an SMI or
++ similar event occurs during that time, it is recorded
++	  into an 8K-sample global ring buffer until retrieved.
++
++ WARNING: This software should never be enabled (it can be built
++ but should not be turned on after it is loaded) in a production
++ environment where high latencies are a concern since the
++ sampling mechanism actually introduces latencies for
++ regular tasks while the CPU(s) are being held.
++
++ If unsure, say N
++
+ config PHANTOM
+ tristate "Sensable PHANToM (PCI)"
+ depends on PCI
+--- a/drivers/misc/Makefile
++++ b/drivers/misc/Makefile
+@@ -38,6 +38,7 @@ obj-$(CONFIG_C2PORT) += c2port/
+ obj-$(CONFIG_HMC6352) += hmc6352.o
+ obj-y += eeprom/
+ obj-y += cb710/
++obj-$(CONFIG_HWLAT_DETECTOR) += hwlat_detector.o
+ obj-$(CONFIG_SPEAR13XX_PCIE_GADGET) += spear13xx_pcie_gadget.o
+ obj-$(CONFIG_VMWARE_BALLOON) += vmw_balloon.o
+ obj-$(CONFIG_ARM_CHARLCD) += arm-charlcd.o
+--- /dev/null
++++ b/drivers/misc/hwlat_detector.c
+@@ -0,0 +1,1212 @@
++/*
++ * hwlat_detector.c - A simple Hardware Latency detector.
++ *
++ * Use this module to detect large system latencies induced by the behavior of
++ * certain underlying system hardware or firmware, independent of Linux itself.
++ * The code was developed originally to detect the presence of SMIs on Intel
++ * and AMD systems, although there is no dependency upon x86 herein.
++ *
++ * The classical example usage of this module is in detecting the presence of
++ * SMIs or System Management Interrupts on Intel and AMD systems. An SMI is a
++ * somewhat special form of hardware interrupt spawned from earlier CPU debug
++ * modes in which the (BIOS/EFI/etc.) firmware arranges for the South Bridge
++ * LPC (or other device) to generate a special interrupt under certain
++ * circumstances, for example, upon expiration of a special SMI timer device,
++ * due to certain external thermal readings, on certain I/O address accesses,
++ * and other situations. An SMI hits a special CPU pin, triggers a special
++ * SMI mode (complete with special memory map), and the OS is unaware.
++ *
++ * Although certain hardware-inducing latencies are necessary (for example,
++ * a modern system often requires an SMI handler for correct thermal control
++ * and remote management) they can wreak havoc upon any OS-level performance
++ * guarantees toward low-latency, especially when the OS is not even made
++ * aware of the presence of these interrupts. For this reason, we need a
++ * somewhat brute force mechanism to detect these interrupts. In this case,
++ * we do it by hogging all of the CPU(s) for configurable timer intervals,
++ * sampling the built-in CPU timer, looking for discontiguous readings.
++ *
++ * WARNING: This implementation necessarily introduces latencies. Therefore,
++ * you should NEVER use this module in a production environment
++ * requiring any kind of low-latency performance guarantee(s).
++ *
++ * Copyright (C) 2008-2009 Jon Masters, Red Hat, Inc. <jcm@redhat.com>
++ *
++ * Includes useful feedback from Clark Williams <clark@redhat.com>
++ *
++ * This file is licensed under the terms of the GNU General Public
++ * License version 2. This program is licensed "as is" without any
++ * warranty of any kind, whether express or implied.
++ */
++
++#include <linux/module.h>
++#include <linux/init.h>
++#include <linux/ring_buffer.h>
++#include <linux/stop_machine.h>
++#include <linux/time.h>
++#include <linux/hrtimer.h>
++#include <linux/kthread.h>
++#include <linux/debugfs.h>
++#include <linux/seq_file.h>
++#include <linux/uaccess.h>
++#include <linux/version.h>
++#include <linux/delay.h>
++#include <linux/slab.h>
++
++#define BUF_SIZE_DEFAULT 262144UL /* 8K*(sizeof(entry)) */
++#define BUF_FLAGS (RB_FL_OVERWRITE) /* no block on full */
++#define U64STR_SIZE 22 /* 20 digits max */
++
++#define VERSION "1.0.0"
++#define BANNER "hwlat_detector: "
++#define DRVNAME "hwlat_detector"
++#define DEFAULT_SAMPLE_WINDOW 1000000 /* 1s */
++#define DEFAULT_SAMPLE_WIDTH 500000 /* 0.5s */
++#define DEFAULT_LAT_THRESHOLD 10 /* 10us */
++
++/* Module metadata */
++
++MODULE_LICENSE("GPL");
++MODULE_AUTHOR("Jon Masters <jcm@redhat.com>");
++MODULE_DESCRIPTION("A simple hardware latency detector");
++MODULE_VERSION(VERSION);
++
++/* Module parameters */
++
++static int debug;
++static int enabled;
++static int threshold;
++
++module_param(debug, int, 0); /* enable debug */
++module_param(enabled, int, 0); /* enable detector */
++module_param(threshold, int, 0); /* latency threshold */
++
++/* Buffering and sampling */
++
++static struct ring_buffer *ring_buffer; /* sample buffer */
++static DEFINE_MUTEX(ring_buffer_mutex); /* lock changes */
++static unsigned long buf_size = BUF_SIZE_DEFAULT;
++static struct task_struct *kthread; /* sampling thread */
++
++/* DebugFS filesystem entries */
++
++static struct dentry *debug_dir; /* debugfs directory */
++static struct dentry *debug_max; /* maximum TSC delta */
++static struct dentry *debug_count; /* total detect count */
++static struct dentry *debug_sample_width; /* sample width us */
++static struct dentry *debug_sample_window; /* sample window us */
++static struct dentry *debug_sample; /* raw samples us */
++static struct dentry *debug_threshold; /* threshold us */
++static struct dentry *debug_enable; /* enable/disable */
++
++/* Individual samples and global state */
++
++struct sample; /* latency sample */
++struct data; /* Global state */
++
++/* Sampling functions */
++static int __buffer_add_sample(struct sample *sample);
++static struct sample *buffer_get_sample(struct sample *sample);
++static int get_sample(void *unused);
++
++/* Threading and state */
++static int kthread_fn(void *unused);
++static int start_kthread(void);
++static int stop_kthread(void);
++static void __reset_stats(void);
++static int init_stats(void);
++
++/* Debugfs interface */
++static ssize_t simple_data_read(struct file *filp, char __user *ubuf,
++ size_t cnt, loff_t *ppos, const u64 *entry);
++static ssize_t simple_data_write(struct file *filp, const char __user *ubuf,
++ size_t cnt, loff_t *ppos, u64 *entry);
++static int debug_sample_fopen(struct inode *inode, struct file *filp);
++static ssize_t debug_sample_fread(struct file *filp, char __user *ubuf,
++ size_t cnt, loff_t *ppos);
++static int debug_sample_release(struct inode *inode, struct file *filp);
++static int debug_enable_fopen(struct inode *inode, struct file *filp);
++static ssize_t debug_enable_fread(struct file *filp, char __user *ubuf,
++ size_t cnt, loff_t *ppos);
++static ssize_t debug_enable_fwrite(struct file *file,
++ const char __user *user_buffer,
++ size_t user_size, loff_t *offset);
++
++/* Initialization functions */
++static int init_debugfs(void);
++static void free_debugfs(void);
++static int detector_init(void);
++static void detector_exit(void);
++
++/* Individual latency samples are stored here when detected and packed into
++ * the ring_buffer circular buffer, where they are overwritten when
++ * more than buf_size/sizeof(sample) samples are received. */
++struct sample {
++ u64 seqnum; /* unique sequence */
++ u64 duration; /* ktime delta */
++ struct timespec timestamp; /* wall time */
++ unsigned long lost;
++};
++
++/* keep the global state somewhere. Mostly used under stop_machine. */
++static struct data {
++
++ struct mutex lock; /* protect changes */
++
++ u64 count; /* total since reset */
++ u64 max_sample; /* max hardware latency */
++ u64 threshold; /* sample threshold level */
++
++ u64 sample_window; /* total sampling window (on+off) */
++ u64 sample_width; /* active sampling portion of window */
++
++ atomic_t sample_open; /* whether the sample file is open */
++
++	wait_queue_head_t wq;		/* waitqueue for new sample values */
++
++} data;
++
++/**
++ * __buffer_add_sample - add a new latency sample recording to the ring buffer
++ * @sample: The new latency sample value
++ *
++ * This receives a new latency sample and records it in a global ring buffer.
++ * No additional locking is used in this case - suited for stop_machine use.
++ */
++static int __buffer_add_sample(struct sample *sample)
++{
++ return ring_buffer_write(ring_buffer,
++ sizeof(struct sample), sample);
++}
++
++/**
++ * buffer_get_sample - remove a hardware latency sample from the ring buffer
++ * @sample: Pre-allocated storage for the sample
++ *
++ * This retrieves a hardware latency sample from the global circular buffer
++ */
++static struct sample *buffer_get_sample(struct sample *sample)
++{
++ struct ring_buffer_event *e = NULL;
++ struct sample *s = NULL;
++ unsigned int cpu = 0;
++
++ if (!sample)
++ return NULL;
++
++ mutex_lock(&ring_buffer_mutex);
++ for_each_online_cpu(cpu) {
++ e = ring_buffer_consume(ring_buffer, cpu, NULL, &sample->lost);
++ if (e)
++ break;
++ }
++
++ if (e) {
++ s = ring_buffer_event_data(e);
++ memcpy(sample, s, sizeof(struct sample));
++ } else
++ sample = NULL;
++ mutex_unlock(&ring_buffer_mutex);
++
++ return sample;
++}
++
++/**
++ * get_sample - sample the CPU TSC and look for likely hardware latencies
++ * @unused: This is not used but is a part of the stop_machine API
++ *
++ * Used to repeatedly capture the CPU TSC (or similar), looking for potential
++ * hardware-induced latency. Called under stop_machine, with data.lock held.
++ */
++static int get_sample(void *unused)
++{
++ ktime_t start, t1, t2;
++ s64 diff, total = 0;
++ u64 sample = 0;
++ int ret = 1;
++
++ start = ktime_get(); /* start timestamp */
++
++ do {
++
++ t1 = ktime_get(); /* we'll look for a discontinuity */
++ t2 = ktime_get();
++
++ total = ktime_to_us(ktime_sub(t2, start)); /* sample width */
++ diff = ktime_to_us(ktime_sub(t2, t1)); /* current diff */
++
++ /* This shouldn't happen */
++ if (diff < 0) {
++ pr_err(BANNER "time running backwards\n");
++ goto out;
++ }
++
++ if (diff > sample)
++ sample = diff; /* only want highest value */
++
++ } while (total <= data.sample_width);
++
++ /* If we exceed the threshold value, we have found a hardware latency */
++ if (sample > data.threshold) {
++ struct sample s;
++
++ data.count++;
++ s.seqnum = data.count;
++ s.duration = sample;
++ s.timestamp = CURRENT_TIME;
++ __buffer_add_sample(&s);
++
++ /* Keep a running maximum ever recorded hardware latency */
++ if (sample > data.max_sample)
++ data.max_sample = sample;
++ }
++
++ ret = 0;
++out:
++ return ret;
++}
++
++/*
++ * kthread_fn - The CPU time sampling/hardware latency detection kernel thread
++ * @unused: A required part of the kthread API.
++ *
++ * Used to periodically sample the CPU TSC via a call to get_sample. We
++ * use stop_machine, whith does (intentionally) introduce latency since we
++ * need to ensure nothing else might be running (and thus pre-empting).
++ * Obviously this should never be used in production environments.
++ *
++ * stop_machine will schedule us typically only on CPU0 which is fine for
++ * almost every real-world hardware latency situation - but we might later
++ * generalize this if we find there are any actualy systems with alternate
++ * SMI delivery or other non CPU0 hardware latencies.
++ */
++static int kthread_fn(void *unused)
++{
++ int err = 0;
++ u64 interval = 0;
++
++ while (!kthread_should_stop()) {
++
++ mutex_lock(&data.lock);
++
++ err = stop_machine(get_sample, unused, 0);
++ if (err) {
++ /* Houston, we have a problem */
++ mutex_unlock(&data.lock);
++ goto err_out;
++ }
++
++ wake_up(&data.wq); /* wake up reader(s) */
++
++ interval = data.sample_window - data.sample_width;
++ do_div(interval, USEC_PER_MSEC); /* modifies interval value */
++
++ mutex_unlock(&data.lock);
++
++ if (msleep_interruptible(interval))
++ goto out;
++ }
++ goto out;
++err_out:
++ pr_err(BANNER "could not call stop_machine, disabling\n");
++ enabled = 0;
++out:
++ return err;
++
++}
++
++/**
++ * start_kthread - Kick off the hardware latency sampling/detector kthread
++ *
++ * This starts a kernel thread that will sit and sample the CPU timestamp
++ * counter (TSC or similar) and look for potential hardware latencies.
++ */
++static int start_kthread(void)
++{
++ kthread = kthread_run(kthread_fn, NULL,
++ DRVNAME);
++ if (IS_ERR(kthread)) {
++ pr_err(BANNER "could not start sampling thread\n");
++ enabled = 0;
++ return -ENOMEM;
++ }
++
++ return 0;
++}
++
++/**
++ * stop_kthread - Inform the hardware latency sampling/detector kthread to stop
++ *
++ * This kicks the running hardware latency sampling/detector kernel thread and
++ * tells it to stop sampling now. Use this on unload and at system shutdown.
++ */
++static int stop_kthread(void)
++{
++ int ret;
++
++ ret = kthread_stop(kthread);
++
++ return ret;
++}
++
++/**
++ * __reset_stats - Reset statistics for the hardware latency detector
++ *
++ * We use data to store various statistics and global state. We call this
++ * function in order to reset those when "enable" is toggled on or off, and
++ * also at initialization. Should be called with data.lock held.
++ */
++static void __reset_stats(void)
++{
++ data.count = 0;
++ data.max_sample = 0;
++ ring_buffer_reset(ring_buffer); /* flush out old sample entries */
++}
++
++/**
++ * init_stats - Setup global state statistics for the hardware latency detector
++ *
++ * We use data to store various statistics and global state. We also use
++ * a global ring buffer (ring_buffer) to keep raw samples of detected hardware
++ * induced system latencies. This function initializes these structures and
++ * allocates the global ring buffer also.
++ */
++static int init_stats(void)
++{
++ int ret = -ENOMEM;
++
++ mutex_init(&data.lock);
++ init_waitqueue_head(&data.wq);
++ atomic_set(&data.sample_open, 0);
++
++ ring_buffer = ring_buffer_alloc(buf_size, BUF_FLAGS);
++
++ if (WARN(!ring_buffer, KERN_ERR BANNER
++ "failed to allocate ring buffer!\n"))
++ goto out;
++
++ __reset_stats();
++ data.threshold = DEFAULT_LAT_THRESHOLD; /* threshold us */
++ data.sample_window = DEFAULT_SAMPLE_WINDOW; /* window us */
++ data.sample_width = DEFAULT_SAMPLE_WIDTH; /* width us */
++
++ ret = 0;
++
++out:
++ return ret;
++
++}
++
++/*
++ * simple_data_read - Wrapper read function for global state debugfs entries
++ * @filp: The active open file structure for the debugfs "file"
++ * @ubuf: The userspace provided buffer to read value into
++ * @cnt: The maximum number of bytes to read
++ * @ppos: The current "file" position
++ * @entry: The entry to read from
++ *
++ * This function provides a generic read implementation for the global state
++ * "data" structure debugfs filesystem entries. It would be nice to use
++ * simple_attr_read directly, but we need to make sure that the data.lock
++ * spinlock is held during the actual read (even though we likely won't ever
++ * actually race here as the updater runs under a stop_machine context).
++ */
++static ssize_t simple_data_read(struct file *filp, char __user *ubuf,
++ size_t cnt, loff_t *ppos, const u64 *entry)
++{
++ char buf[U64STR_SIZE];
++ u64 val = 0;
++ int len = 0;
++
++ memset(buf, 0, sizeof(buf));
++
++ if (!entry)
++ return -EFAULT;
++
++ mutex_lock(&data.lock);
++ val = *entry;
++ mutex_unlock(&data.lock);
++
++ len = snprintf(buf, sizeof(buf), "%llu\n", (unsigned long long)val);
++
++ return simple_read_from_buffer(ubuf, cnt, ppos, buf, len);
++
++}
++
++/*
++ * simple_data_write - Wrapper write function for global state debugfs entries
++ * @filp: The active open file structure for the debugfs "file"
++ * @ubuf: The userspace provided buffer to write value from
++ * @cnt: The maximum number of bytes to write
++ * @ppos: The current "file" position
++ * @entry: The entry to write to
++ *
++ * This function provides a generic write implementation for the global state
++ * "data" structure debugfs filesystem entries. It would be nice to use
++ * simple_attr_write directly, but we need to make sure that the data.lock
++ * spinlock is held during the actual write (even though we likely won't ever
++ * actually race here as the updater runs under a stop_machine context).
++ */
++static ssize_t simple_data_write(struct file *filp, const char __user *ubuf,
++ size_t cnt, loff_t *ppos, u64 *entry)
++{
++ char buf[U64STR_SIZE];
++ int csize = min(cnt, sizeof(buf));
++ u64 val = 0;
++ int err = 0;
++
++ memset(buf, '\0', sizeof(buf));
++ if (copy_from_user(buf, ubuf, csize))
++ return -EFAULT;
++
++ buf[U64STR_SIZE-1] = '\0'; /* just in case */
++ err = kstrtoull(buf, 10, &val);
++ if (err)
++ return -EINVAL;
++
++ mutex_lock(&data.lock);
++ *entry = val;
++ mutex_unlock(&data.lock);
++
++ return csize;
++}
++
++/**
++ * debug_count_fopen - Open function for "count" debugfs entry
++ * @inode: The in-kernel inode representation of the debugfs "file"
++ * @filp: The active open file structure for the debugfs "file"
++ *
++ * This function provides an open implementation for the "count" debugfs
++ * interface to the hardware latency detector.
++ */
++static int debug_count_fopen(struct inode *inode, struct file *filp)
++{
++ return 0;
++}
++
++/**
++ * debug_count_fread - Read function for "count" debugfs entry
++ * @filp: The active open file structure for the debugfs "file"
++ * @ubuf: The userspace provided buffer to read value into
++ * @cnt: The maximum number of bytes to read
++ * @ppos: The current "file" position
++ *
++ * This function provides a read implementation for the "count" debugfs
++ * interface to the hardware latency detector. Can be used to read the
++ * number of latency readings exceeding the configured threshold since
++ * the detector was last reset (e.g. by writing a zero into "count").
++ */
++static ssize_t debug_count_fread(struct file *filp, char __user *ubuf,
++ size_t cnt, loff_t *ppos)
++{
++ return simple_data_read(filp, ubuf, cnt, ppos, &data.count);
++}
++
++/**
++ * debug_count_fwrite - Write function for "count" debugfs entry
++ * @filp: The active open file structure for the debugfs "file"
++ * @ubuf: The user buffer that contains the value to write
++ * @cnt: The maximum number of bytes to write to "file"
++ * @ppos: The current position in the debugfs "file"
++ *
++ * This function provides a write implementation for the "count" debugfs
++ * interface to the hardware latency detector. Can be used to write a
++ * desired value, especially to zero the total count.
++ */
++static ssize_t debug_count_fwrite(struct file *filp,
++ const char __user *ubuf,
++ size_t cnt,
++ loff_t *ppos)
++{
++ return simple_data_write(filp, ubuf, cnt, ppos, &data.count);
++}
++
++/**
++ * debug_enable_fopen - Dummy open function for "enable" debugfs interface
++ * @inode: The in-kernel inode representation of the debugfs "file"
++ * @filp: The active open file structure for the debugfs "file"
++ *
++ * This function provides an open implementation for the "enable" debugfs
++ * interface to the hardware latency detector.
++ */
++static int debug_enable_fopen(struct inode *inode, struct file *filp)
++{
++ return 0;
++}
++
++/**
++ * debug_enable_fread - Read function for "enable" debugfs interface
++ * @filp: The active open file structure for the debugfs "file"
++ * @ubuf: The userspace provided buffer to read value into
++ * @cnt: The maximum number of bytes to read
++ * @ppos: The current "file" position
++ *
++ * This function provides a read implementation for the "enable" debugfs
++ * interface to the hardware latency detector. Can be used to determine
++ * whether the detector is currently enabled ("0\n" or "1\n" returned).
++ */
++static ssize_t debug_enable_fread(struct file *filp, char __user *ubuf,
++ size_t cnt, loff_t *ppos)
++{
++ char buf[4];
++
++ if ((cnt < sizeof(buf)) || (*ppos))
++ return 0;
++
++ buf[0] = enabled ? '1' : '0';
++ buf[1] = '\n';
++ buf[2] = '\0';
++ if (copy_to_user(ubuf, buf, strlen(buf)))
++ return -EFAULT;
++ return *ppos = strlen(buf);
++}
++
++/**
++ * debug_enable_fwrite - Write function for "enable" debugfs interface
++ * @filp: The active open file structure for the debugfs "file"
++ * @ubuf: The user buffer that contains the value to write
++ * @cnt: The maximum number of bytes to write to "file"
++ * @ppos: The current position in the debugfs "file"
++ *
++ * This function provides a write implementation for the "enable" debugfs
++ * interface to the hardware latency detector. Can be used to enable or
++ * disable the detector, which will have the side-effect of possibly
++ * also resetting the global stats and kicking off the measuring
++ * kthread (on an enable) or the converse (upon a disable).
++ */
++static ssize_t debug_enable_fwrite(struct file *filp,
++ const char __user *ubuf,
++ size_t cnt,
++ loff_t *ppos)
++{
++ char buf[4];
++ int csize = min(cnt, sizeof(buf));
++ long val = 0;
++ int err = 0;
++
++ memset(buf, '\0', sizeof(buf));
++ if (copy_from_user(buf, ubuf, csize))
++ return -EFAULT;
++
++ buf[sizeof(buf)-1] = '\0'; /* just in case */
++ err = kstrtoul(buf, 10, &val);
++ if (0 != err)
++ return -EINVAL;
++
++ if (val) {
++ if (enabled)
++ goto unlock;
++ enabled = 1;
++ __reset_stats();
++ if (start_kthread())
++ return -EFAULT;
++ } else {
++ if (!enabled)
++ goto unlock;
++ enabled = 0;
++ err = stop_kthread();
++ if (err) {
++ pr_err(BANNER "cannot stop kthread\n");
++ return -EFAULT;
++ }
++ wake_up(&data.wq); /* reader(s) should return */
++ }
++unlock:
++ return csize;
++}
++
++/**
++ * debug_max_fopen - Open function for "max" debugfs entry
++ * @inode: The in-kernel inode representation of the debugfs "file"
++ * @filp: The active open file structure for the debugfs "file"
++ *
++ * This function provides an open implementation for the "max" debugfs
++ * interface to the hardware latency detector.
++ */
++static int debug_max_fopen(struct inode *inode, struct file *filp)
++{
++ return 0;
++}
++
++/**
++ * debug_max_fread - Read function for "max" debugfs entry
++ * @filp: The active open file structure for the debugfs "file"
++ * @ubuf: The userspace provided buffer to read value into
++ * @cnt: The maximum number of bytes to read
++ * @ppos: The current "file" position
++ *
++ * This function provides a read implementation for the "max" debugfs
++ * interface to the hardware latency detector. Can be used to determine
++ * the maximum latency value observed since it was last reset.
++ */
++static ssize_t debug_max_fread(struct file *filp, char __user *ubuf,
++ size_t cnt, loff_t *ppos)
++{
++ return simple_data_read(filp, ubuf, cnt, ppos, &data.max_sample);
++}
++
++/**
++ * debug_max_fwrite - Write function for "max" debugfs entry
++ * @filp: The active open file structure for the debugfs "file"
++ * @ubuf: The user buffer that contains the value to write
++ * @cnt: The maximum number of bytes to write to "file"
++ * @ppos: The current position in the debugfs "file"
++ *
++ * This function provides a write implementation for the "max" debugfs
++ * interface to the hardware latency detector. Can be used to reset the
++ * maximum or set it to some other desired value - if, then, subsequent
++ * measurements exceed this value, the maximum will be updated.
++ */
++static ssize_t debug_max_fwrite(struct file *filp,
++ const char __user *ubuf,
++ size_t cnt,
++ loff_t *ppos)
++{
++ return simple_data_write(filp, ubuf, cnt, ppos, &data.max_sample);
++}
++
++
++/**
++ * debug_sample_fopen - An open function for "sample" debugfs interface
++ * @inode: The in-kernel inode representation of this debugfs "file"
++ * @filp: The active open file structure for the debugfs "file"
++ *
++ * This function handles opening the "sample" file within the hardware
++ * latency detector debugfs directory interface. This file is used to read
++ * raw samples from the global ring_buffer and allows the user to see a
++ * running latency history. Can be opened blocking or non-blocking,
++ * affecting whether it behaves as a buffer read pipe, or does not.
++ * Implements simple locking to prevent multiple simultaneous use.
++ */
++static int debug_sample_fopen(struct inode *inode, struct file *filp)
++{
++ if (!atomic_add_unless(&data.sample_open, 1, 1))
++ return -EBUSY;
++ else
++ return 0;
++}
++
++/**
++ * debug_sample_fread - A read function for "sample" debugfs interface
++ * @filp: The active open file structure for the debugfs "file"
++ * @ubuf: The user buffer that will contain the samples read
++ * @cnt: The maximum bytes to read from the debugfs "file"
++ * @ppos: The current position in the debugfs "file"
++ *
++ * This function handles reading from the "sample" file within the hardware
++ * latency detector debugfs directory interface. This file is used to read
++ * raw samples from the global ring_buffer and allows the user to see a
++ * running latency history. By default this will block pending a new
++ * value written into the sample buffer, unless there are already a
++ * number of value(s) waiting in the buffer, or the sample file was
++ * previously opened in a non-blocking mode of operation.
++ */
++static ssize_t debug_sample_fread(struct file *filp, char __user *ubuf,
++ size_t cnt, loff_t *ppos)
++{
++ int len = 0;
++ char buf[64];
++ struct sample *sample = NULL;
++
++ if (!enabled)
++ return 0;
++
++ sample = kzalloc(sizeof(struct sample), GFP_KERNEL);
++ if (!sample)
++ return -ENOMEM;
++
++ while (!buffer_get_sample(sample)) {
++
++ DEFINE_WAIT(wait);
++
++ if (filp->f_flags & O_NONBLOCK) {
++ len = -EAGAIN;
++ goto out;
++ }
++
++ prepare_to_wait(&data.wq, &wait, TASK_INTERRUPTIBLE);
++ schedule();
++ finish_wait(&data.wq, &wait);
++
++ if (signal_pending(current)) {
++ len = -EINTR;
++ goto out;
++ }
++
++ if (!enabled) { /* enable was toggled */
++ len = 0;
++ goto out;
++ }
++ }
++
++ len = snprintf(buf, sizeof(buf), "%010lu.%010lu\t%llu\n",
++ sample->timestamp.tv_sec,
++ sample->timestamp.tv_nsec,
++ sample->duration);
++
++
++ /* handling partial reads is more trouble than it's worth */
++ if (len > cnt)
++ goto out;
++
++ if (copy_to_user(ubuf, buf, len))
++ len = -EFAULT;
++
++out:
++ kfree(sample);
++ return len;
++}
++
++/**
++ * debug_sample_release - Release function for "sample" debugfs interface
++ * @inode: The in-kernel inode representation of the debugfs "file"
++ * @filp: The active open file structure for the debugfs "file"
++ *
++ * This function completes the close of the debugfs interface "sample" file.
++ * Frees the sample_open "lock" so that other users may open the interface.
++ */
++static int debug_sample_release(struct inode *inode, struct file *filp)
++{
++ atomic_dec(&data.sample_open);
++
++ return 0;
++}
++
++/**
++ * debug_threshold_fopen - Open function for "threshold" debugfs entry
++ * @inode: The in-kernel inode representation of the debugfs "file"
++ * @filp: The active open file structure for the debugfs "file"
++ *
++ * This function provides an open implementation for the "threshold" debugfs
++ * interface to the hardware latency detector.
++ */
++static int debug_threshold_fopen(struct inode *inode, struct file *filp)
++{
++ return 0;
++}
++
++/**
++ * debug_threshold_fread - Read function for "threshold" debugfs entry
++ * @filp: The active open file structure for the debugfs "file"
++ * @ubuf: The userspace provided buffer to read value into
++ * @cnt: The maximum number of bytes to read
++ * @ppos: The current "file" position
++ *
++ * This function provides a read implementation for the "threshold" debugfs
++ * interface to the hardware latency detector. It can be used to determine
++ * the current threshold level at which a latency will be recorded in the
++ * global ring buffer, typically on the order of 10us.
++ */
++static ssize_t debug_threshold_fread(struct file *filp, char __user *ubuf,
++ size_t cnt, loff_t *ppos)
++{
++ return simple_data_read(filp, ubuf, cnt, ppos, &data.threshold);
++}
++
++/**
++ * debug_threshold_fwrite - Write function for "threshold" debugfs entry
++ * @filp: The active open file structure for the debugfs "file"
++ * @ubuf: The user buffer that contains the value to write
++ * @cnt: The maximum number of bytes to write to "file"
++ * @ppos: The current position in the debugfs "file"
++ *
++ * This function provides a write implementation for the "threshold" debugfs
++ * interface to the hardware latency detector. It can be used to configure
++ * the threshold level at which any subsequently detected latencies will
++ * be recorded into the global ring buffer.
++ */
++static ssize_t debug_threshold_fwrite(struct file *filp,
++ const char __user *ubuf,
++ size_t cnt,
++ loff_t *ppos)
++{
++ int ret;
++
++ ret = simple_data_write(filp, ubuf, cnt, ppos, &data.threshold);
++
++ if (enabled)
++ wake_up_process(kthread);
++
++ return ret;
++}
++
++/**
++ * debug_width_fopen - Open function for "width" debugfs entry
++ * @inode: The in-kernel inode representation of the debugfs "file"
++ * @filp: The active open file structure for the debugfs "file"
++ *
++ * This function provides an open implementation for the "width" debugfs
++ * interface to the hardware latency detector.
++ */
++static int debug_width_fopen(struct inode *inode, struct file *filp)
++{
++ return 0;
++}
++
++/**
++ * debug_width_fread - Read function for "width" debugfs entry
++ * @filp: The active open file structure for the debugfs "file"
++ * @ubuf: The userspace provided buffer to read value into
++ * @cnt: The maximum number of bytes to read
++ * @ppos: The current "file" position
++ *
++ * This function provides a read implementation for the "width" debugfs
++ * interface to the hardware latency detector. It can be used to determine
++ * for how many us of the total window us we will actively sample for any
++ * hardware-induced latency periods. Obviously, it is not possible to
++ * sample constantly and have the system respond to a sample reader, or,
++ * worse, without having the system appear to have gone out to lunch.
++ */
++static ssize_t debug_width_fread(struct file *filp, char __user *ubuf,
++ size_t cnt, loff_t *ppos)
++{
++ return simple_data_read(filp, ubuf, cnt, ppos, &data.sample_width);
++}
++
++/**
++ * debug_width_fwrite - Write function for "width" debugfs entry
++ * @filp: The active open file structure for the debugfs "file"
++ * @ubuf: The user buffer that contains the value to write
++ * @cnt: The maximum number of bytes to write to "file"
++ * @ppos: The current position in the debugfs "file"
++ *
++ * This function provides a write implementation for the "width" debugfs
++ * interface to the hardware latency detector. It can be used to configure
++ * for how many us of the total window us we will actively sample for any
++ * hardware-induced latency periods. Obviously, it is not possible to
++ * sample constantly and have the system respond to a sample reader, or,
++ * worse, without having the system appear to have gone out to lunch. It
++ * is enforced that width is less than the total window size.
++ */
++static ssize_t debug_width_fwrite(struct file *filp,
++ const char __user *ubuf,
++ size_t cnt,
++ loff_t *ppos)
++{
++ char buf[U64STR_SIZE];
++ int csize = min(cnt, sizeof(buf));
++ u64 val = 0;
++ int err = 0;
++
++ memset(buf, '\0', sizeof(buf));
++ if (copy_from_user(buf, ubuf, csize))
++ return -EFAULT;
++
++ buf[U64STR_SIZE-1] = '\0'; /* just in case */
++ err = kstrtoull(buf, 10, &val);
++ if (0 != err)
++ return -EINVAL;
++
++ mutex_lock(&data.lock);
++ if (val < data.sample_window)
++ data.sample_width = val;
++ else {
++ mutex_unlock(&data.lock);
++ return -EINVAL;
++ }
++ mutex_unlock(&data.lock);
++
++ if (enabled)
++ wake_up_process(kthread);
++
++ return csize;
++}
++
++/**
++ * debug_window_fopen - Open function for "window" debugfs entry
++ * @inode: The in-kernel inode representation of the debugfs "file"
++ * @filp: The active open file structure for the debugfs "file"
++ *
++ * This function provides an open implementation for the "window" debugfs
++ * interface to the hardware latency detector. The window is the total time
++ * in us that will be considered one sample period. Conceptually, windows
++ * occur back-to-back and contain a sample width period during which
++ * actual sampling occurs.
++ */
++static int debug_window_fopen(struct inode *inode, struct file *filp)
++{
++ return 0;
++}
++
++/**
++ * debug_window_fread - Read function for "window" debugfs entry
++ * @filp: The active open file structure for the debugfs "file"
++ * @ubuf: The userspace provided buffer to read value into
++ * @cnt: The maximum number of bytes to read
++ * @ppos: The current "file" position
++ *
++ * This function provides a read implementation for the "window" debugfs
++ * interface to the hardware latency detector. The window is the total time
++ * in us that will be considered one sample period. Conceptually, windows
++ * occur back-to-back and contain a sample width period during which
++ * actual sampling occurs. Can be used to read the total window size.
++ */
++static ssize_t debug_window_fread(struct file *filp, char __user *ubuf,
++ size_t cnt, loff_t *ppos)
++{
++ return simple_data_read(filp, ubuf, cnt, ppos, &data.sample_window);
++}
++
++/**
++ * debug_window_fwrite - Write function for "window" debugfs entry
++ * @filp: The active open file structure for the debugfs "file"
++ * @ubuf: The user buffer that contains the value to write
++ * @cnt: The maximum number of bytes to write to "file"
++ * @ppos: The current position in the debugfs "file"
++ *
++ * This function provides a write implementation for the "window" debugfs
++ * interface to the hardware latency detector. The window is the total time
++ * in us that will be considered one sample period. Conceptually, windows
++ * occur back-to-back and contain a sample width period during which
++ * actual sampling occurs. Can be used to write a new total window size. It
++ * is enforced that any value written must be greater than the sample width
++ * size, or an error results.
++ */
++static ssize_t debug_window_fwrite(struct file *filp,
++ const char __user *ubuf,
++ size_t cnt,
++ loff_t *ppos)
++{
++ char buf[U64STR_SIZE];
++ int csize = min(cnt, sizeof(buf));
++ u64 val = 0;
++ int err = 0;
++
++ memset(buf, '\0', sizeof(buf));
++ if (copy_from_user(buf, ubuf, csize))
++ return -EFAULT;
++
++ buf[U64STR_SIZE-1] = '\0'; /* just in case */
++ err = kstrtoull(buf, 10, &val);
++ if (0 != err)
++ return -EINVAL;
++
++ mutex_lock(&data.lock);
++ if (data.sample_width < val)
++ data.sample_window = val;
++ else {
++ mutex_unlock(&data.lock);
++ return -EINVAL;
++ }
++ mutex_unlock(&data.lock);
++
++ return csize;
++}
++
++/*
++ * Function pointers for the "count" debugfs file operations
++ */
++static const struct file_operations count_fops = {
++ .open = debug_count_fopen,
++ .read = debug_count_fread,
++ .write = debug_count_fwrite,
++ .owner = THIS_MODULE,
++};
++
++/*
++ * Function pointers for the "enable" debugfs file operations
++ */
++static const struct file_operations enable_fops = {
++ .open = debug_enable_fopen,
++ .read = debug_enable_fread,
++ .write = debug_enable_fwrite,
++ .owner = THIS_MODULE,
++};
++
++/*
++ * Function pointers for the "max" debugfs file operations
++ */
++static const struct file_operations max_fops = {
++ .open = debug_max_fopen,
++ .read = debug_max_fread,
++ .write = debug_max_fwrite,
++ .owner = THIS_MODULE,
++};
++
++/*
++ * Function pointers for the "sample" debugfs file operations
++ */
++static const struct file_operations sample_fops = {
++ .open = debug_sample_fopen,
++ .read = debug_sample_fread,
++ .release = debug_sample_release,
++ .owner = THIS_MODULE,
++};
++
++/*
++ * Function pointers for the "threshold" debugfs file operations
++ */
++static const struct file_operations threshold_fops = {
++ .open = debug_threshold_fopen,
++ .read = debug_threshold_fread,
++ .write = debug_threshold_fwrite,
++ .owner = THIS_MODULE,
++};
++
++/*
++ * Function pointers for the "width" debugfs file operations
++ */
++static const struct file_operations width_fops = {
++ .open = debug_width_fopen,
++ .read = debug_width_fread,
++ .write = debug_width_fwrite,
++ .owner = THIS_MODULE,
++};
++
++/*
++ * Function pointers for the "window" debugfs file operations
++ */
++static const struct file_operations window_fops = {
++ .open = debug_window_fopen,
++ .read = debug_window_fread,
++ .write = debug_window_fwrite,
++ .owner = THIS_MODULE,
++};
++
++/**
++ * init_debugfs - A function to initialize the debugfs interface files
++ *
++ * This function creates entries in debugfs for "hwlat_detector", including
++ * files to read values from the detector, current samples, and the
++ * maximum sample that has been captured since the hardware latency
++ * detector was started.
++ */
++static int init_debugfs(void)
++{
++ int ret = -ENOMEM;
++
++ debug_dir = debugfs_create_dir(DRVNAME, NULL);
++ if (!debug_dir)
++ goto err_debug_dir;
++
++ debug_sample = debugfs_create_file("sample", 0444,
++ debug_dir, NULL,
++ &sample_fops);
++ if (!debug_sample)
++ goto err_sample;
++
++ debug_count = debugfs_create_file("count", 0444,
++ debug_dir, NULL,
++ &count_fops);
++ if (!debug_count)
++ goto err_count;
++
++ debug_max = debugfs_create_file("max", 0444,
++ debug_dir, NULL,
++ &max_fops);
++ if (!debug_max)
++ goto err_max;
++
++ debug_sample_window = debugfs_create_file("window", 0644,
++ debug_dir, NULL,
++ &window_fops);
++ if (!debug_sample_window)
++ goto err_window;
++
++ debug_sample_width = debugfs_create_file("width", 0644,
++ debug_dir, NULL,
++ &width_fops);
++ if (!debug_sample_width)
++ goto err_width;
++
++ debug_threshold = debugfs_create_file("threshold", 0644,
++ debug_dir, NULL,
++ &threshold_fops);
++ if (!debug_threshold)
++ goto err_threshold;
++
++ debug_enable = debugfs_create_file("enable", 0644,
++ debug_dir, &enabled,
++ &enable_fops);
++ if (!debug_enable)
++ goto err_enable;
++
++ else {
++ ret = 0;
++ goto out;
++ }
++
++err_enable:
++ debugfs_remove(debug_threshold);
++err_threshold:
++ debugfs_remove(debug_sample_width);
++err_width:
++ debugfs_remove(debug_sample_window);
++err_window:
++ debugfs_remove(debug_max);
++err_max:
++ debugfs_remove(debug_count);
++err_count:
++ debugfs_remove(debug_sample);
++err_sample:
++ debugfs_remove(debug_dir);
++err_debug_dir:
++out:
++ return ret;
++}
++
++/**
++ * free_debugfs - A function to cleanup the debugfs file interface
++ */
++static void free_debugfs(void)
++{
++ /* could also use a debugfs_remove_recursive */
++ debugfs_remove(debug_enable);
++ debugfs_remove(debug_threshold);
++ debugfs_remove(debug_sample_width);
++ debugfs_remove(debug_sample_window);
++ debugfs_remove(debug_max);
++ debugfs_remove(debug_count);
++ debugfs_remove(debug_sample);
++ debugfs_remove(debug_dir);
++}
++
++/**
++ * detector_init - Standard module initialization code
++ */
++static int detector_init(void)
++{
++ int ret = -ENOMEM;
++
++ pr_info(BANNER "version %s\n", VERSION);
++
++ ret = init_stats();
++ if (0 != ret)
++ goto out;
++
++ ret = init_debugfs();
++ if (0 != ret)
++ goto err_stats;
++
++ if (enabled)
++ ret = start_kthread();
++
++ goto out;
++
++err_stats:
++ ring_buffer_free(ring_buffer);
++out:
++ return ret;
++
++}
++
++/**
++ * detector_exit - Standard module cleanup code
++ */
++static void detector_exit(void)
++{
++ int err;
++
++ if (enabled) {
++ enabled = 0;
++ err = stop_kthread();
++ if (err)
++ pr_err(BANNER "cannot stop kthread\n");
++ }
++
++ free_debugfs();
++ ring_buffer_free(ring_buffer); /* free up the ring buffer */
++
++}
++
++module_init(detector_init);
++module_exit(detector_exit);
diff --git a/patches/i2c-omap-drop-the-lock-hard-irq-context.patch b/patches/i2c-omap-drop-the-lock-hard-irq-context.patch
new file mode 100644
index 00000000000000..d0d91940206821
--- /dev/null
+++ b/patches/i2c-omap-drop-the-lock-hard-irq-context.patch
@@ -0,0 +1,33 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Thu, 21 Mar 2013 11:35:49 +0100
+Subject: i2c/omap: drop the lock hard irq context
+
+The lock is taken while reading two registers. On RT the lock is taken
+both in the hard irq, where it might sleep, and in the threaded irq.
+The threaded irq runs in oneshot mode, so the hard irq does not run
+until the thread completes, so there is no reason to grab the lock.
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ drivers/i2c/busses/i2c-omap.c | 5 +----
+ 1 file changed, 1 insertion(+), 4 deletions(-)
+
+--- a/drivers/i2c/busses/i2c-omap.c
++++ b/drivers/i2c/busses/i2c-omap.c
+@@ -996,15 +996,12 @@ omap_i2c_isr(int irq, void *dev_id)
+ u16 mask;
+ u16 stat;
+
+- spin_lock(&dev->lock);
+- mask = omap_i2c_read_reg(dev, OMAP_I2C_IE_REG);
+ stat = omap_i2c_read_reg(dev, OMAP_I2C_STAT_REG);
++ mask = omap_i2c_read_reg(dev, OMAP_I2C_IE_REG);
+
+ if (stat & mask)
+ ret = IRQ_WAKE_THREAD;
+
+- spin_unlock(&dev->lock);
+-
+ return ret;
+ }
+
diff --git a/patches/i915-bogus-warning-from-i915-when-running-on-PREEMPT.patch b/patches/i915-bogus-warning-from-i915-when-running-on-PREEMPT.patch
new file mode 100644
index 00000000000000..931f5852465e03
--- /dev/null
+++ b/patches/i915-bogus-warning-from-i915-when-running-on-PREEMPT.patch
@@ -0,0 +1,29 @@
+From: Clark Williams <williams@redhat.com>
+Date: Tue, 26 May 2015 10:43:43 -0500
+Subject: i915: bogus warning from i915 when running on PREEMPT_RT
+
+The i915 driver has a 'WARN_ON(!in_interrupt())' in the display
+handler, which whines constantly on the RT kernel (since the interrupt
+is actually handled in a threaded handler and not actual interrupt
+context).
+
+Change the WARN_ON to WARN_ON_NONRT
+
+Tested-by: Joakim Hernberg <jhernberg@alchemy.lu>
+Signed-off-by: Clark Williams <williams@redhat.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ drivers/gpu/drm/i915/intel_display.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/drivers/gpu/drm/i915/intel_display.c
++++ b/drivers/gpu/drm/i915/intel_display.c
+@@ -10086,7 +10086,7 @@ void intel_check_page_flip(struct drm_de
+ struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pipe];
+ struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+
+- WARN_ON(!in_interrupt());
++ WARN_ON_NONRT(!in_interrupt());
+
+ if (crtc == NULL)
+ return;
diff --git a/patches/i915_compile_fix.patch b/patches/i915_compile_fix.patch
new file mode 100644
index 00000000000000..61ffbf809d9583
--- /dev/null
+++ b/patches/i915_compile_fix.patch
@@ -0,0 +1,23 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Tue, 14 Jul 2015 14:26:34 +0200
+Subject: gpu/i915: don't open code these things
+
+The open-coded part is gone since 1f83fee0 ("drm/i915: clear up wedged transitions");
+the owner check is still there.
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ drivers/gpu/drm/i915/i915_gem_shrinker.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
++++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
+@@ -39,7 +39,7 @@ static bool mutex_is_locked_by(struct mu
+ if (!mutex_is_locked(mutex))
+ return false;
+
+-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_MUTEXES)
++#if (defined(CONFIG_SMP) || defined(CONFIG_DEBUG_MUTEXES)) && !defined(CONFIG_PREEMPT_RT_BASE)
+ return mutex->owner == task;
+ #else
+ /* Since UP may be pre-empted, we cannot assume that we own the lock */
diff --git a/patches/ide-use-nort-local-irq-variants.patch b/patches/ide-use-nort-local-irq-variants.patch
new file mode 100644
index 00000000000000..11e3df947f0e14
--- /dev/null
+++ b/patches/ide-use-nort-local-irq-variants.patch
@@ -0,0 +1,169 @@
+From: Ingo Molnar <mingo@elte.hu>
+Date: Fri, 3 Jul 2009 08:30:16 -0500
+Subject: ide: Do not disable interrupts for PREEMPT-RT
+
+Use the local_irq_*_nort variants.
+
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ drivers/ide/alim15x3.c | 4 ++--
+ drivers/ide/hpt366.c | 4 ++--
+ drivers/ide/ide-io-std.c | 8 ++++----
+ drivers/ide/ide-io.c | 2 +-
+ drivers/ide/ide-iops.c | 4 ++--
+ drivers/ide/ide-probe.c | 4 ++--
+ drivers/ide/ide-taskfile.c | 6 +++---
+ 7 files changed, 16 insertions(+), 16 deletions(-)
+
+--- a/drivers/ide/alim15x3.c
++++ b/drivers/ide/alim15x3.c
+@@ -234,7 +234,7 @@ static int init_chipset_ali15x3(struct p
+
+ isa_dev = pci_get_device(PCI_VENDOR_ID_AL, PCI_DEVICE_ID_AL_M1533, NULL);
+
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+
+ if (m5229_revision < 0xC2) {
+ /*
+@@ -325,7 +325,7 @@ static int init_chipset_ali15x3(struct p
+ }
+ pci_dev_put(north);
+ pci_dev_put(isa_dev);
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+ return 0;
+ }
+
+--- a/drivers/ide/hpt366.c
++++ b/drivers/ide/hpt366.c
+@@ -1241,7 +1241,7 @@ static int init_dma_hpt366(ide_hwif_t *h
+
+ dma_old = inb(base + 2);
+
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+
+ dma_new = dma_old;
+ pci_read_config_byte(dev, hwif->channel ? 0x4b : 0x43, &masterdma);
+@@ -1252,7 +1252,7 @@ static int init_dma_hpt366(ide_hwif_t *h
+ if (dma_new != dma_old)
+ outb(dma_new, base + 2);
+
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+
+ printk(KERN_INFO " %s: BM-DMA at 0x%04lx-0x%04lx\n",
+ hwif->name, base, base + 7);
+--- a/drivers/ide/ide-io-std.c
++++ b/drivers/ide/ide-io-std.c
+@@ -175,7 +175,7 @@ void ide_input_data(ide_drive_t *drive,
+ unsigned long uninitialized_var(flags);
+
+ if ((io_32bit & 2) && !mmio) {
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+ ata_vlb_sync(io_ports->nsect_addr);
+ }
+
+@@ -186,7 +186,7 @@ void ide_input_data(ide_drive_t *drive,
+ insl(data_addr, buf, words);
+
+ if ((io_32bit & 2) && !mmio)
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+
+ if (((len + 1) & 3) < 2)
+ return;
+@@ -219,7 +219,7 @@ void ide_output_data(ide_drive_t *drive,
+ unsigned long uninitialized_var(flags);
+
+ if ((io_32bit & 2) && !mmio) {
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+ ata_vlb_sync(io_ports->nsect_addr);
+ }
+
+@@ -230,7 +230,7 @@ void ide_output_data(ide_drive_t *drive,
+ outsl(data_addr, buf, words);
+
+ if ((io_32bit & 2) && !mmio)
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+
+ if (((len + 1) & 3) < 2)
+ return;
+--- a/drivers/ide/ide-io.c
++++ b/drivers/ide/ide-io.c
+@@ -659,7 +659,7 @@ void ide_timer_expiry (unsigned long dat
+ /* disable_irq_nosync ?? */
+ disable_irq(hwif->irq);
+ /* local CPU only, as if we were handling an interrupt */
+- local_irq_disable();
++ local_irq_disable_nort();
+ if (hwif->polling) {
+ startstop = handler(drive);
+ } else if (drive_is_ready(drive)) {
+--- a/drivers/ide/ide-iops.c
++++ b/drivers/ide/ide-iops.c
+@@ -129,12 +129,12 @@ int __ide_wait_stat(ide_drive_t *drive,
+ if ((stat & ATA_BUSY) == 0)
+ break;
+
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+ *rstat = stat;
+ return -EBUSY;
+ }
+ }
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+ }
+ /*
+ * Allow status to settle, then read it again.
+--- a/drivers/ide/ide-probe.c
++++ b/drivers/ide/ide-probe.c
+@@ -196,10 +196,10 @@ static void do_identify(ide_drive_t *dri
+ int bswap = 1;
+
+ /* local CPU only; some systems need this */
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+ /* read 512 bytes of id info */
+ hwif->tp_ops->input_data(drive, NULL, id, SECTOR_SIZE);
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+
+ drive->dev_flags |= IDE_DFLAG_ID_READ;
+ #ifdef DEBUG
+--- a/drivers/ide/ide-taskfile.c
++++ b/drivers/ide/ide-taskfile.c
+@@ -250,7 +250,7 @@ void ide_pio_bytes(ide_drive_t *drive, s
+
+ page_is_high = PageHighMem(page);
+ if (page_is_high)
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+
+ buf = kmap_atomic(page) + offset;
+
+@@ -271,7 +271,7 @@ void ide_pio_bytes(ide_drive_t *drive, s
+ kunmap_atomic(buf);
+
+ if (page_is_high)
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+
+ len -= nr_bytes;
+ }
+@@ -414,7 +414,7 @@ static ide_startstop_t pre_task_out_intr
+ }
+
+ if ((drive->dev_flags & IDE_DFLAG_UNMASK) == 0)
+- local_irq_disable();
++ local_irq_disable_nort();
+
+ ide_set_handler(drive, &task_pio_intr, WAIT_WORSTCASE);
+
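For reference, the conversion above follows a single pattern. A minimal sketch of it in a
hypothetical driver (the example_* names are invented; the _nort helpers are provided
elsewhere in this patch queue and behave like the plain local_irq_save()/restore() on !RT
builds, while leaving interrupts enabled on PREEMPT_RT because these sections only need to
be CPU-local, not IRQ-atomic):

    #include <linux/io.h>
    #include <linux/types.h>

    static void example_poke_register(void __iomem *reg, u8 val)
    {
            unsigned long flags;

            /* IRQs stay enabled on PREEMPT_RT, disabled everywhere else */
            local_irq_save_nort(flags);
            writeb(val, reg);
            local_irq_restore_nort(flags);
    }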
diff --git a/patches/idr-use-local-lock-for-protection.patch b/patches/idr-use-local-lock-for-protection.patch
new file mode 100644
index 00000000000000..b5fec800707428
--- /dev/null
+++ b/patches/idr-use-local-lock-for-protection.patch
@@ -0,0 +1,96 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Tue, 14 Jul 2015 14:26:34 +0200
+Subject: idr: Use local lock instead of preempt enable/disable
+
+We need to protect the per cpu variable and prevent migration.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ include/linux/idr.h | 4 ++++
+ lib/idr.c | 36 +++++++++++++++++++++++++++++++++---
+ 2 files changed, 37 insertions(+), 3 deletions(-)
+
+--- a/include/linux/idr.h
++++ b/include/linux/idr.h
+@@ -95,10 +95,14 @@ bool idr_is_empty(struct idr *idp);
+ * Each idr_preload() should be matched with an invocation of this
+ * function. See idr_preload() for details.
+ */
++#ifdef CONFIG_PREEMPT_RT_FULL
++void idr_preload_end(void);
++#else
+ static inline void idr_preload_end(void)
+ {
+ preempt_enable();
+ }
++#endif
+
+ /**
+ * idr_find - return pointer for given id
+--- a/lib/idr.c
++++ b/lib/idr.c
+@@ -30,6 +30,7 @@
+ #include <linux/idr.h>
+ #include <linux/spinlock.h>
+ #include <linux/percpu.h>
++#include <linux/locallock.h>
+
+ #define MAX_IDR_SHIFT (sizeof(int) * 8 - 1)
+ #define MAX_IDR_BIT (1U << MAX_IDR_SHIFT)
+@@ -366,6 +367,35 @@ static void idr_fill_slot(struct idr *id
+ idr_mark_full(pa, id);
+ }
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++static DEFINE_LOCAL_IRQ_LOCK(idr_lock);
++
++static inline void idr_preload_lock(void)
++{
++ local_lock(idr_lock);
++}
++
++static inline void idr_preload_unlock(void)
++{
++ local_unlock(idr_lock);
++}
++
++void idr_preload_end(void)
++{
++ idr_preload_unlock();
++}
++EXPORT_SYMBOL(idr_preload_end);
++#else
++static inline void idr_preload_lock(void)
++{
++ preempt_disable();
++}
++
++static inline void idr_preload_unlock(void)
++{
++ preempt_enable();
++}
++#endif
+
+ /**
+ * idr_preload - preload for idr_alloc()
+@@ -401,7 +431,7 @@ void idr_preload(gfp_t gfp_mask)
+ WARN_ON_ONCE(in_interrupt());
+ might_sleep_if(gfp_mask & __GFP_WAIT);
+
+- preempt_disable();
++ idr_preload_lock();
+
+ /*
+ * idr_alloc() is likely to succeed w/o full idr_layer buffer and
+@@ -413,9 +443,9 @@ void idr_preload(gfp_t gfp_mask)
+ while (__this_cpu_read(idr_preload_cnt) < MAX_IDR_FREE) {
+ struct idr_layer *new;
+
+- preempt_enable();
++ idr_preload_unlock();
+ new = kmem_cache_zalloc(idr_layer_cache, gfp_mask);
+- preempt_disable();
++ idr_preload_lock();
+ if (!new)
+ break;
+
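Caller-visible usage does not change with this patch; a typical preload section still looks
like the sketch below (the example_* names are invented). Only the region between
idr_preload() and idr_preload_end() moves from a preempt-disabled section to the local lock
on RT:

    #include <linux/idr.h>
    #include <linux/gfp.h>
    #include <linux/spinlock.h>

    static DEFINE_IDR(example_idr);
    static DEFINE_SPINLOCK(example_idr_lock);

    /* Allocate an id for @obj; returns the id or a negative errno. */
    static int example_register(void *obj)
    {
            int id;

            idr_preload(GFP_KERNEL);        /* local_lock(idr_lock) on RT */
            spin_lock(&example_idr_lock);
            id = idr_alloc(&example_idr, obj, 1, 0, GFP_NOWAIT);
            spin_unlock(&example_idr_lock);
            idr_preload_end();              /* local_unlock(idr_lock) on RT */

            return id;
    }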
diff --git a/patches/infiniband-mellanox-ib-use-nort-irq.patch b/patches/infiniband-mellanox-ib-use-nort-irq.patch
new file mode 100644
index 00000000000000..b282436e6f6946
--- /dev/null
+++ b/patches/infiniband-mellanox-ib-use-nort-irq.patch
@@ -0,0 +1,40 @@
+From: Sven-Thorsten Dietrich <sdietrich@novell.com>
+Date: Fri, 3 Jul 2009 08:30:35 -0500
+Subject: infiniband: Mellanox IB driver patch use _nort() primitives
+
+Fixes an in_atomic() stack dump that occurs when the Mellanox module is
+loaded into the RT kernel.
+
+Michael S. Tsirkin <mst@dev.mellanox.co.il> sayeth:
+"Basically, if you just make spin_lock_irqsave (and spin_lock_irq) not disable
+interrupts for non-raw spinlocks, I think all of infiniband will be fine without
+changes."
+
+Signed-off-by: Sven-Thorsten Dietrich <sven@thebigcorporation.com>
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
++++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+@@ -821,7 +821,7 @@ void ipoib_mcast_restart_task(struct wor
+
+ ipoib_dbg_mcast(priv, "restarting multicast task\n");
+
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+ netif_addr_lock(dev);
+ spin_lock(&priv->lock);
+
+@@ -903,7 +903,7 @@ void ipoib_mcast_restart_task(struct wor
+
+ spin_unlock(&priv->lock);
+ netif_addr_unlock(dev);
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+
+ /*
+ * make sure the in-flight joins have finished before we attempt
diff --git a/patches/inpt-gameport-use-local-irq-nort.patch b/patches/inpt-gameport-use-local-irq-nort.patch
new file mode 100644
index 00000000000000..97e0e68484971d
--- /dev/null
+++ b/patches/inpt-gameport-use-local-irq-nort.patch
@@ -0,0 +1,44 @@
+From: Ingo Molnar <mingo@elte.hu>
+Date: Fri, 3 Jul 2009 08:30:16 -0500
+Subject: input: gameport: Do not disable interrupts on PREEMPT_RT
+
+Use the _nort() primitives.
+
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ drivers/input/gameport/gameport.c | 8 ++++----
+ 1 file changed, 4 insertions(+), 4 deletions(-)
+
+--- a/drivers/input/gameport/gameport.c
++++ b/drivers/input/gameport/gameport.c
+@@ -124,12 +124,12 @@ static int old_gameport_measure_speed(st
+ tx = 1 << 30;
+
+ for(i = 0; i < 50; i++) {
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+ GET_TIME(t1);
+ for (t = 0; t < 50; t++) gameport_read(gameport);
+ GET_TIME(t2);
+ GET_TIME(t3);
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+ udelay(i * 10);
+ if ((t = DELTA(t2,t1) - DELTA(t3,t2)) < tx) tx = t;
+ }
+@@ -148,11 +148,11 @@ static int old_gameport_measure_speed(st
+ tx = 1 << 30;
+
+ for(i = 0; i < 50; i++) {
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+ rdtscl(t1);
+ for (t = 0; t < 50; t++) gameport_read(gameport);
+ rdtscl(t2);
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+ udelay(i * 10);
+ if (t2 - t1 < tx) tx = t2 - t1;
+ }
diff --git a/patches/introduce_migrate_disable_cpu_light.patch b/patches/introduce_migrate_disable_cpu_light.patch
new file mode 100644
index 00000000000000..3e858e8148d578
--- /dev/null
+++ b/patches/introduce_migrate_disable_cpu_light.patch
@@ -0,0 +1,340 @@
+Subject: Introduce migrate_disable() + cpu_light()
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Fri, 17 Jun 2011 15:42:38 +0200
+
+Introduce migrate_disable(). The task can't be pushed to another CPU but can
+be preempted.
+
+From: Peter Zijlstra <a.p.zijlstra@chello.nl>:
+|Make migrate_disable() be a preempt_disable() for !rt kernels. This
+|allows generic code to use it but still enforces that these code
+|sections stay relatively small.
+|
+|A preemptible migrate_disable() accessible for general use would allow
+|people growing arbitrary per-cpu crap instead of clean these things
+|up.
+
+From: Steven Rostedt <rostedt@goodmis.org>
+| The migrate_disable() can cause a bit of a overhead to the RT kernel,
+| as changing the affinity is expensive to do at every lock encountered.
+| As a running task can not migrate, the actual disabling of migration
+| does not need to occur until the task is about to schedule out.
+|
+| In most cases, a task that disables migration will enable it before
+| it schedules making this change improve performance tremendously.
+
+On top of this, get/put_cpu_light() is provided. It is similar to get_cpu():
+it uses migrate_disable() instead of preempt_disable(). That means the user
+remains on the same CPU, but the function using it may be preempted and
+invoked again from another caller on the same CPU.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ include/linux/cpu.h | 3 +
+ include/linux/preempt.h | 9 +++
+ include/linux/sched.h | 29 +++++++++-
+ include/linux/smp.h | 3 +
+ kernel/sched/core.c | 131 ++++++++++++++++++++++++++++++++++++++++++++++--
+ kernel/sched/debug.c | 7 ++
+ lib/smp_processor_id.c | 5 +
+ 7 files changed, 178 insertions(+), 9 deletions(-)
+
+--- a/include/linux/cpu.h
++++ b/include/linux/cpu.h
+@@ -221,6 +221,9 @@ static inline void smpboot_thread_init(v
+ #endif /* CONFIG_SMP */
+ extern struct bus_type cpu_subsys;
+
++static inline void pin_current_cpu(void) { }
++static inline void unpin_current_cpu(void) { }
++
+ #ifdef CONFIG_HOTPLUG_CPU
+ /* Stop CPUs going up and down. */
+
+--- a/include/linux/preempt.h
++++ b/include/linux/preempt.h
+@@ -153,11 +153,20 @@ do { \
+ # define preempt_enable_rt() preempt_enable()
+ # define preempt_disable_nort() barrier()
+ # define preempt_enable_nort() barrier()
++# ifdef CONFIG_SMP
++ extern void migrate_disable(void);
++ extern void migrate_enable(void);
++# else /* CONFIG_SMP */
++# define migrate_disable() barrier()
++# define migrate_enable() barrier()
++# endif /* CONFIG_SMP */
+ #else
+ # define preempt_disable_rt() barrier()
+ # define preempt_enable_rt() barrier()
+ # define preempt_disable_nort() preempt_disable()
+ # define preempt_enable_nort() preempt_enable()
++# define migrate_disable() preempt_disable()
++# define migrate_enable() preempt_enable()
+ #endif
+
+ #ifdef CONFIG_PREEMPT_NOTIFIERS
+--- a/include/linux/sched.h
++++ b/include/linux/sched.h
+@@ -1371,6 +1371,12 @@ struct task_struct {
+ #endif
+
+ unsigned int policy;
++#ifdef CONFIG_PREEMPT_RT_FULL
++ int migrate_disable;
++# ifdef CONFIG_SCHED_DEBUG
++ int migrate_disable_atomic;
++# endif
++#endif
+ int nr_cpus_allowed;
+ cpumask_t cpus_allowed;
+
+@@ -1781,9 +1787,6 @@ struct task_struct {
+ int pagefault_disabled;
+ };
+
+-/* Future-safe accessor for struct task_struct's cpus_allowed. */
+-#define tsk_cpus_allowed(tsk) (&(tsk)->cpus_allowed)
+-
+ #define TNF_MIGRATED 0x01
+ #define TNF_NO_GROUP 0x02
+ #define TNF_SHARED 0x04
+@@ -3072,6 +3075,26 @@ static inline void set_task_cpu(struct t
+
+ #endif /* CONFIG_SMP */
+
++static inline int __migrate_disabled(struct task_struct *p)
++{
++#ifdef CONFIG_PREEMPT_RT_FULL
++ return p->migrate_disable;
++#else
++ return 0;
++#endif
++}
++
++/* Future-safe accessor for struct task_struct's cpus_allowed. */
++static inline const struct cpumask *tsk_cpus_allowed(struct task_struct *p)
++{
++#ifdef CONFIG_PREEMPT_RT_FULL
++ if (p->migrate_disable)
++ return cpumask_of(task_cpu(p));
++#endif
++
++ return &p->cpus_allowed;
++}
++
+ extern long sched_setaffinity(pid_t pid, const struct cpumask *new_mask);
+ extern long sched_getaffinity(pid_t pid, struct cpumask *mask);
+
+--- a/include/linux/smp.h
++++ b/include/linux/smp.h
+@@ -185,6 +185,9 @@ static inline void smp_init(void) { }
+ #define get_cpu() ({ preempt_disable(); smp_processor_id(); })
+ #define put_cpu() preempt_enable()
+
++#define get_cpu_light() ({ migrate_disable(); smp_processor_id(); })
++#define put_cpu_light() migrate_enable()
++
+ /*
+ * Callback to arch code if there's nosmp or maxcpus=0 on the
+ * boot command line:
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -2696,6 +2696,125 @@ static inline void schedule_debug(struct
+ schedstat_inc(this_rq(), sched_count);
+ }
+
++#if defined(CONFIG_PREEMPT_RT_FULL) && defined(CONFIG_SMP)
++#define MIGRATE_DISABLE_SET_AFFIN (1<<30) /* Can't make a negative */
++#define migrate_disabled_updated(p) ((p)->migrate_disable & MIGRATE_DISABLE_SET_AFFIN)
++#define migrate_disable_count(p) ((p)->migrate_disable & ~MIGRATE_DISABLE_SET_AFFIN)
++
++static inline void update_migrate_disable(struct task_struct *p)
++{
++ const struct cpumask *mask;
++
++ if (likely(!p->migrate_disable))
++ return;
++
++ /* Did we already update affinity? */
++ if (unlikely(migrate_disabled_updated(p)))
++ return;
++
++ /*
++ * Since this is always current we can get away with only locking
++ * rq->lock, the ->cpus_allowed value can normally only be changed
++ * while holding both p->pi_lock and rq->lock, but seeing that this
++ * is current, we cannot actually be waking up, so all code that
++ * relies on serialization against p->pi_lock is out of scope.
++ *
++ * Having rq->lock serializes us against things like
++ * set_cpus_allowed_ptr() that can still happen concurrently.
++ */
++ mask = tsk_cpus_allowed(p);
++
++ if (p->sched_class->set_cpus_allowed)
++ p->sched_class->set_cpus_allowed(p, mask);
++ /* mask==cpumask_of(task_cpu(p)) which has a cpumask_weight==1 */
++ p->nr_cpus_allowed = 1;
++
++ /* Let migrate_enable know to fix things back up */
++ p->migrate_disable |= MIGRATE_DISABLE_SET_AFFIN;
++}
++
++void migrate_disable(void)
++{
++ struct task_struct *p = current;
++
++ if (in_atomic() || p->flags & PF_NO_SETAFFINITY) {
++#ifdef CONFIG_SCHED_DEBUG
++ p->migrate_disable_atomic++;
++#endif
++ return;
++ }
++
++#ifdef CONFIG_SCHED_DEBUG
++ WARN_ON_ONCE(p->migrate_disable_atomic);
++#endif
++
++ if (p->migrate_disable) {
++ p->migrate_disable++;
++ return;
++ }
++
++ preempt_disable();
++ pin_current_cpu();
++ p->migrate_disable = 1;
++ preempt_enable();
++}
++EXPORT_SYMBOL(migrate_disable);
++
++void migrate_enable(void)
++{
++ struct task_struct *p = current;
++ const struct cpumask *mask;
++ unsigned long flags;
++ struct rq *rq;
++
++ if (in_atomic() || p->flags & PF_NO_SETAFFINITY) {
++#ifdef CONFIG_SCHED_DEBUG
++ p->migrate_disable_atomic--;
++#endif
++ return;
++ }
++
++#ifdef CONFIG_SCHED_DEBUG
++ WARN_ON_ONCE(p->migrate_disable_atomic);
++#endif
++ WARN_ON_ONCE(p->migrate_disable <= 0);
++
++ if (migrate_disable_count(p) > 1) {
++ p->migrate_disable--;
++ return;
++ }
++
++ preempt_disable();
++ if (unlikely(migrate_disabled_updated(p))) {
++ /*
++ * Undo whatever update_migrate_disable() did, also see there
++ * about locking.
++ */
++ rq = this_rq();
++ raw_spin_lock_irqsave(&rq->lock, flags);
++
++ /*
++ * Clearing migrate_disable causes tsk_cpus_allowed to
++ * show the tasks original cpu affinity.
++ */
++ p->migrate_disable = 0;
++ mask = tsk_cpus_allowed(p);
++ if (p->sched_class->set_cpus_allowed)
++ p->sched_class->set_cpus_allowed(p, mask);
++ p->nr_cpus_allowed = cpumask_weight(mask);
++ raw_spin_unlock_irqrestore(&rq->lock, flags);
++ } else
++ p->migrate_disable = 0;
++
++ unpin_current_cpu();
++ preempt_enable();
++}
++EXPORT_SYMBOL(migrate_enable);
++#else
++static inline void update_migrate_disable(struct task_struct *p) { }
++#define migrate_disabled_updated(p) 0
++#endif
++
+ /*
+ * Pick up the highest-prio task:
+ */
+@@ -2802,6 +2921,8 @@ static void __sched __schedule(void)
+ smp_mb__before_spinlock();
+ raw_spin_lock_irq(&rq->lock);
+
++ update_migrate_disable(prev);
++
+ rq->clock_skip_update <<= 1; /* promote REQ to ACT */
+
+ switch_count = &prev->nivcsw;
+@@ -4803,11 +4924,13 @@ static struct rq *move_queued_task(struc
+
+ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
+ {
+- if (p->sched_class->set_cpus_allowed)
+- p->sched_class->set_cpus_allowed(p, new_mask);
++ if (!migrate_disabled_updated(p)) {
++ if (p->sched_class->set_cpus_allowed)
++ p->sched_class->set_cpus_allowed(p, new_mask);
++ p->nr_cpus_allowed = cpumask_weight(new_mask);
++ }
+
+ cpumask_copy(&p->cpus_allowed, new_mask);
+- p->nr_cpus_allowed = cpumask_weight(new_mask);
+ }
+
+ /*
+@@ -4853,7 +4976,7 @@ int set_cpus_allowed_ptr(struct task_str
+ do_set_cpus_allowed(p, new_mask);
+
+ /* Can the task run on the task's current CPU? If so, we're done */
+- if (cpumask_test_cpu(task_cpu(p), new_mask))
++ if (cpumask_test_cpu(task_cpu(p), new_mask) || __migrate_disabled(p))
+ goto out;
+
+ dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
+--- a/kernel/sched/debug.c
++++ b/kernel/sched/debug.c
+@@ -260,6 +260,9 @@ void print_rt_rq(struct seq_file *m, int
+ P(rt_throttled);
+ PN(rt_time);
+ PN(rt_runtime);
++#ifdef CONFIG_SMP
++ P(rt_nr_migratory);
++#endif
+
+ #undef PN
+ #undef P
+@@ -648,6 +651,10 @@ void proc_sched_show_task(struct task_st
+ #endif
+ P(policy);
+ P(prio);
++#ifdef CONFIG_PREEMPT_RT_FULL
++ P(migrate_disable);
++#endif
++ P(nr_cpus_allowed);
+ #undef PN
+ #undef __PN
+ #undef P
+--- a/lib/smp_processor_id.c
++++ b/lib/smp_processor_id.c
+@@ -39,8 +39,9 @@ notrace static unsigned int check_preemp
+ if (!printk_ratelimit())
+ goto out_enable;
+
+- printk(KERN_ERR "BUG: using %s%s() in preemptible [%08x] code: %s/%d\n",
+- what1, what2, preempt_count() - 1, current->comm, current->pid);
++ printk(KERN_ERR "BUG: using %s%s() in preemptible [%08x %08x] code: %s/%d\n",
++ what1, what2, preempt_count() - 1, __migrate_disabled(current),
++ current->comm, current->pid);
+
+ print_symbol("caller is %s\n", (long)__builtin_return_address(0));
+ dump_stack();
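A minimal sketch of the intended get_cpu_light()/put_cpu_light() usage (the example_* names
are invented). Unlike get_cpu(), other tasks may still run on the same CPU, so data shared
with them still needs its own lock:

    #include <linux/smp.h>
    #include <linux/spinlock.h>
    #include <linux/list.h>
    #include <linux/percpu.h>

    struct example_queue {
            spinlock_t       lock;   /* serializes all users of the queue */
            struct list_head items;
    };
    static DEFINE_PER_CPU(struct example_queue, example_queues);

    static void example_enqueue(struct list_head *item)
    {
            struct example_queue *q;

            /* Stay on this CPU, but remain preemptible (unlike get_cpu()). */
            q = &per_cpu(example_queues, get_cpu_light());
            spin_lock(&q->lock);
            list_add_tail(item, &q->items);
            spin_unlock(&q->lock);
            put_cpu_light();
    }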
diff --git a/patches/ipc-make-rt-aware.patch b/patches/ipc-make-rt-aware.patch
new file mode 100644
index 00000000000000..78f3ed82b8d913
--- /dev/null
+++ b/patches/ipc-make-rt-aware.patch
@@ -0,0 +1,67 @@
+From: Ingo Molnar <mingo@elte.hu>
+Date: Fri, 3 Jul 2009 08:30:12 -0500
+Subject: ipc: Make the ipc code -rt aware
+
+RT serializes the code with the (rt)spinlock but keeps preemption
+enabled. Some parts of the code need to be atomic nevertheless.
+
+Protect them with preempt_disable_rt()/preempt_enable_rt() pairs.
+
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ ipc/msg.c | 16 +++++++++++++++-
+ 1 file changed, 15 insertions(+), 1 deletion(-)
+
+--- a/ipc/msg.c
++++ b/ipc/msg.c
+@@ -188,6 +188,12 @@ static void expunge_all(struct msg_queue
+ struct msg_receiver *msr, *t;
+
+ list_for_each_entry_safe(msr, t, &msq->q_receivers, r_list) {
++ /*
++ * Make sure that the wakeup doesnt preempt
++ * this CPU prematurely. (on PREEMPT_RT)
++ */
++ preempt_disable_rt();
++
+ msr->r_msg = NULL; /* initialize expunge ordering */
+ wake_up_process(msr->r_tsk);
+ /*
+@@ -198,6 +204,8 @@ static void expunge_all(struct msg_queue
+ */
+ smp_mb();
+ msr->r_msg = ERR_PTR(res);
++
++ preempt_enable_rt();
+ }
+ }
+
+@@ -574,6 +582,11 @@ static inline int pipelined_send(struct
+ if (testmsg(msg, msr->r_msgtype, msr->r_mode) &&
+ !security_msg_queue_msgrcv(msq, msg, msr->r_tsk,
+ msr->r_msgtype, msr->r_mode)) {
++ /*
++ * Make sure that the wakeup doesnt preempt
++ * this CPU prematurely. (on PREEMPT_RT)
++ */
++ preempt_disable_rt();
+
+ list_del(&msr->r_list);
+ if (msr->r_maxsize < msg->m_ts) {
+@@ -595,12 +608,13 @@ static inline int pipelined_send(struct
+ */
+ smp_mb();
+ msr->r_msg = msg;
++ preempt_enable_rt();
+
+ return 1;
+ }
++ preempt_enable_rt();
+ }
+ }
+-
+ return 0;
+ }
+
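The pattern added above, reduced to a sketch (struct example_waiter and the function are
invented): the waker briefly becomes non-preemptible on RT so the task it just woke cannot
preempt it before the result is published:

    #include <linux/sched.h>
    #include <linux/preempt.h>

    struct example_waiter {
            struct task_struct *task;
            void               *result;
    };

    static void example_complete(struct example_waiter *w, void *result)
    {
            preempt_disable_rt();   /* no-op on !RT kernels */
            wake_up_process(w->task);
            /* order the wakeup against publishing the result, as in expunge_all() */
            smp_mb();
            w->result = result;
            preempt_enable_rt();
    }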
diff --git a/patches/ipc-sem-rework-semaphore-wakeups.patch b/patches/ipc-sem-rework-semaphore-wakeups.patch
new file mode 100644
index 00000000000000..ef31fe76aebaaa
--- /dev/null
+++ b/patches/ipc-sem-rework-semaphore-wakeups.patch
@@ -0,0 +1,69 @@
+Subject: ipc/sem: Rework semaphore wakeups
+From: Peter Zijlstra <peterz@infradead.org>
+Date: Wed, 14 Sep 2011 11:57:04 +0200
+
+Current sysv sems have a weird ass wakeup scheme that involves keeping
+preemption disabled over a potential O(n^2) loop and busy waiting on
+that on other CPUs.
+
+Kill this and simply wake the task directly from under the sem_lock.
+
+This was discovered by a migrate_disable() debug feature that
+disallows:
+
+ spin_lock();
+ preempt_disable();
+ spin_unlock()
+ preempt_enable();
+
+Cc: Manfred Spraul <manfred@colorfullife.com>
+Suggested-by: Thomas Gleixner <tglx@linutronix.de>
+Reported-by: Mike Galbraith <efault@gmx.de>
+Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
+Cc: Manfred Spraul <manfred@colorfullife.com>
+Link: http://lkml.kernel.org/r/1315994224.5040.1.camel@twins
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ ipc/sem.c | 10 ++++++++++
+ 1 file changed, 10 insertions(+)
+
+--- a/ipc/sem.c
++++ b/ipc/sem.c
+@@ -680,6 +680,13 @@ static int perform_atomic_semop(struct s
+ static void wake_up_sem_queue_prepare(struct list_head *pt,
+ struct sem_queue *q, int error)
+ {
++#ifdef CONFIG_PREEMPT_RT_BASE
++ struct task_struct *p = q->sleeper;
++ get_task_struct(p);
++ q->status = error;
++ wake_up_process(p);
++ put_task_struct(p);
++#else
+ if (list_empty(pt)) {
+ /*
+ * Hold preempt off so that we don't get preempted and have the
+@@ -691,6 +698,7 @@ static void wake_up_sem_queue_prepare(st
+ q->pid = error;
+
+ list_add_tail(&q->list, pt);
++#endif
+ }
+
+ /**
+@@ -704,6 +712,7 @@ static void wake_up_sem_queue_prepare(st
+ */
+ static void wake_up_sem_queue_do(struct list_head *pt)
+ {
++#ifndef CONFIG_PREEMPT_RT_BASE
+ struct sem_queue *q, *t;
+ int did_something;
+
+@@ -716,6 +725,7 @@ static void wake_up_sem_queue_do(struct
+ }
+ if (did_something)
+ preempt_enable();
++#endif
+ }
+
+ static void unlink_queue(struct sem_array *sma, struct sem_queue *q)
diff --git a/patches/irq-allow-disabling-of-softirq-processing-in-irq-thread-context.patch b/patches/irq-allow-disabling-of-softirq-processing-in-irq-thread-context.patch
new file mode 100644
index 00000000000000..00eb5559e4d13d
--- /dev/null
+++ b/patches/irq-allow-disabling-of-softirq-processing-in-irq-thread-context.patch
@@ -0,0 +1,146 @@
+Subject: genirq: Allow disabling of softirq processing in irq thread context
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Tue, 31 Jan 2012 13:01:27 +0100
+
+The processing of softirqs in irq thread context is a performance gain
+for the non-rt workloads of a system, but it's counterproductive for
+interrupts which are explicitly related to the realtime
+workload. Allow such interrupts to prevent softirq processing in their
+thread context.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ include/linux/interrupt.h | 2 ++
+ include/linux/irq.h | 4 +++-
+ kernel/irq/manage.c | 13 ++++++++++++-
+ kernel/irq/settings.h | 12 ++++++++++++
+ kernel/softirq.c | 9 +++++++++
+ 5 files changed, 38 insertions(+), 2 deletions(-)
+
+--- a/include/linux/interrupt.h
++++ b/include/linux/interrupt.h
+@@ -61,6 +61,7 @@
+ * interrupt handler after suspending interrupts. For system
+ * wakeup devices users need to implement wakeup detection in
+ * their interrupt handlers.
++ * IRQF_NO_SOFTIRQ_CALL - Do not process softirqs in the irq thread context (RT)
+ */
+ #define IRQF_SHARED 0x00000080
+ #define IRQF_PROBE_SHARED 0x00000100
+@@ -74,6 +75,7 @@
+ #define IRQF_NO_THREAD 0x00010000
+ #define IRQF_EARLY_RESUME 0x00020000
+ #define IRQF_COND_SUSPEND 0x00040000
++#define IRQF_NO_SOFTIRQ_CALL 0x00080000
+
+ #define IRQF_TIMER (__IRQF_TIMER | IRQF_NO_SUSPEND | IRQF_NO_THREAD)
+
+--- a/include/linux/irq.h
++++ b/include/linux/irq.h
+@@ -72,6 +72,7 @@ enum irqchip_irq_state;
+ * IRQ_IS_POLLED - Always polled by another interrupt. Exclude
+ * it from the spurious interrupt detection
+ * mechanism and from core side polling.
++ * IRQ_NO_SOFTIRQ_CALL - No softirq processing in the irq thread context (RT)
+ */
+ enum {
+ IRQ_TYPE_NONE = 0x00000000,
+@@ -97,13 +98,14 @@ enum {
+ IRQ_NOTHREAD = (1 << 16),
+ IRQ_PER_CPU_DEVID = (1 << 17),
+ IRQ_IS_POLLED = (1 << 18),
++ IRQ_NO_SOFTIRQ_CALL = (1 << 19),
+ };
+
+ #define IRQF_MODIFY_MASK \
+ (IRQ_TYPE_SENSE_MASK | IRQ_NOPROBE | IRQ_NOREQUEST | \
+ IRQ_NOAUTOEN | IRQ_MOVE_PCNTXT | IRQ_LEVEL | IRQ_NO_BALANCING | \
+ IRQ_PER_CPU | IRQ_NESTED_THREAD | IRQ_NOTHREAD | IRQ_PER_CPU_DEVID | \
+- IRQ_IS_POLLED)
++ IRQ_IS_POLLED | IRQ_NO_SOFTIRQ_CALL)
+
+ #define IRQ_NO_BALANCING_MASK (IRQ_PER_CPU | IRQ_NO_BALANCING)
+
+--- a/kernel/irq/manage.c
++++ b/kernel/irq/manage.c
+@@ -900,7 +900,15 @@ irq_forced_thread_fn(struct irq_desc *de
+ local_bh_disable();
+ ret = action->thread_fn(action->irq, action->dev_id);
+ irq_finalize_oneshot(desc, action);
+- local_bh_enable();
++ /*
++ * Interrupts which have real time requirements can be set up
++ * to avoid softirq processing in the thread handler. This is
++ * safe as these interrupts do not raise soft interrupts.
++ */
++ if (irq_settings_no_softirq_call(desc))
++ _local_bh_enable();
++ else
++ local_bh_enable();
+ return ret;
+ }
+
+@@ -1296,6 +1304,9 @@ static int
+ irqd_set(&desc->irq_data, IRQD_NO_BALANCING);
+ }
+
++ if (new->flags & IRQF_NO_SOFTIRQ_CALL)
++ irq_settings_set_no_softirq_call(desc);
++
+ /* Set default affinity mask once everything is setup */
+ setup_affinity(irq, desc, mask);
+
+--- a/kernel/irq/settings.h
++++ b/kernel/irq/settings.h
+@@ -15,6 +15,7 @@ enum {
+ _IRQ_NESTED_THREAD = IRQ_NESTED_THREAD,
+ _IRQ_PER_CPU_DEVID = IRQ_PER_CPU_DEVID,
+ _IRQ_IS_POLLED = IRQ_IS_POLLED,
++ _IRQ_NO_SOFTIRQ_CALL = IRQ_NO_SOFTIRQ_CALL,
+ _IRQF_MODIFY_MASK = IRQF_MODIFY_MASK,
+ };
+
+@@ -28,6 +29,7 @@ enum {
+ #define IRQ_NESTED_THREAD GOT_YOU_MORON
+ #define IRQ_PER_CPU_DEVID GOT_YOU_MORON
+ #define IRQ_IS_POLLED GOT_YOU_MORON
++#define IRQ_NO_SOFTIRQ_CALL GOT_YOU_MORON
+ #undef IRQF_MODIFY_MASK
+ #define IRQF_MODIFY_MASK GOT_YOU_MORON
+
+@@ -38,6 +40,16 @@ irq_settings_clr_and_set(struct irq_desc
+ desc->status_use_accessors |= (set & _IRQF_MODIFY_MASK);
+ }
+
++static inline bool irq_settings_no_softirq_call(struct irq_desc *desc)
++{
++ return desc->status_use_accessors & _IRQ_NO_SOFTIRQ_CALL;
++}
++
++static inline void irq_settings_set_no_softirq_call(struct irq_desc *desc)
++{
++ desc->status_use_accessors |= _IRQ_NO_SOFTIRQ_CALL;
++}
++
+ static inline bool irq_settings_is_per_cpu(struct irq_desc *desc)
+ {
+ return desc->status_use_accessors & _IRQ_PER_CPU;
+--- a/kernel/softirq.c
++++ b/kernel/softirq.c
+@@ -606,6 +606,15 @@ void local_bh_enable_ip(unsigned long ip
+ }
+ EXPORT_SYMBOL(local_bh_enable_ip);
+
++void _local_bh_enable(void)
++{
++ if (WARN_ON(current->softirq_nestcnt == 0))
++ return;
++ if (--current->softirq_nestcnt == 0)
++ migrate_enable();
++}
++EXPORT_SYMBOL(_local_bh_enable);
++
+ int in_serving_softirq(void)
+ {
+ return current->flags & PF_IN_SOFTIRQ;
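A sketch of how a latency-critical driver would request its interrupt with the new flag
(the example_* names and the device string are placeholders):

    #include <linux/interrupt.h>

    static irqreturn_t example_rt_thread_fn(int irq, void *dev_id)
    {
            /* must not raise softirqs itself, see the comment in manage.c above */
            return IRQ_HANDLED;
    }

    static int example_setup_irq(unsigned int irq, void *dev)
    {
            return request_threaded_irq(irq, NULL, example_rt_thread_fn,
                                        IRQF_ONESHOT | IRQF_NO_SOFTIRQ_CALL,
                                        "example-rt-dev", dev);
    }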
diff --git a/patches/irqwork-push_most_work_into_softirq_context.patch b/patches/irqwork-push_most_work_into_softirq_context.patch
new file mode 100644
index 00000000000000..b4ffd72b23ed81
--- /dev/null
+++ b/patches/irqwork-push_most_work_into_softirq_context.patch
@@ -0,0 +1,197 @@
+Subject: irqwork: push most work into softirq context
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Tue, 23 Jun 2015 15:32:51 +0200
+
+Initially we deferred all irqwork into softirq because we didn't want the
+latency spikes if perf or another user was busy and delayed the RT task.
+The NOHZ trigger (nohz_full_kick_work) was the first user that did not work
+as expected if it did not run in the original irqwork context, so we had to
+bring it back somehow for it. push_irq_work_func is the second one that
+requires this.
+
+This patch adds the IRQ_WORK_HARD_IRQ flag which makes sure the callback runs
+in raw-irq context. Everything else is deferred into softirq context. Without
+-RT we have the original behavior.
+
+This patch incorporates tglx's original work, reworked a little to bring back
+arch_irq_work_raise() where possible, plus a few fixes from Steven Rostedt and
+Mike Galbraith.
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ include/linux/irq_work.h | 1 +
+ kernel/irq_work.c | 47 ++++++++++++++++++++++++++++++++++-------------
+ kernel/sched/rt.c | 1 +
+ kernel/time/tick-sched.c | 6 ++++++
+ kernel/time/timer.c | 6 +++++-
+ 5 files changed, 47 insertions(+), 14 deletions(-)
+
+--- a/include/linux/irq_work.h
++++ b/include/linux/irq_work.h
+@@ -16,6 +16,7 @@
+ #define IRQ_WORK_BUSY 2UL
+ #define IRQ_WORK_FLAGS 3UL
+ #define IRQ_WORK_LAZY 4UL /* Doesn't want IPI, wait for tick */
++#define IRQ_WORK_HARD_IRQ 8UL /* Run hard IRQ context, even on RT */
+
+ struct irq_work {
+ unsigned long flags;
+--- a/kernel/irq_work.c
++++ b/kernel/irq_work.c
+@@ -17,6 +17,7 @@
+ #include <linux/cpu.h>
+ #include <linux/notifier.h>
+ #include <linux/smp.h>
++#include <linux/interrupt.h>
+ #include <asm/processor.h>
+
+
+@@ -65,6 +66,8 @@ void __weak arch_irq_work_raise(void)
+ */
+ bool irq_work_queue_on(struct irq_work *work, int cpu)
+ {
++ struct llist_head *list;
++
+ /* All work should have been flushed before going offline */
+ WARN_ON_ONCE(cpu_is_offline(cpu));
+
+@@ -75,7 +78,12 @@ bool irq_work_queue_on(struct irq_work *
+ if (!irq_work_claim(work))
+ return false;
+
+- if (llist_add(&work->llnode, &per_cpu(raised_list, cpu)))
++ if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL) && !(work->flags & IRQ_WORK_HARD_IRQ))
++ list = &per_cpu(lazy_list, cpu);
++ else
++ list = &per_cpu(raised_list, cpu);
++
++ if (llist_add(&work->llnode, list))
+ arch_send_call_function_single_ipi(cpu);
+
+ return true;
+@@ -86,6 +94,9 @@ EXPORT_SYMBOL_GPL(irq_work_queue_on);
+ /* Enqueue the irq work @work on the current CPU */
+ bool irq_work_queue(struct irq_work *work)
+ {
++ struct llist_head *list;
++ bool lazy_work, realtime = IS_ENABLED(CONFIG_PREEMPT_RT_FULL);
++
+ /* Only queue if not already pending */
+ if (!irq_work_claim(work))
+ return false;
+@@ -93,13 +104,15 @@ bool irq_work_queue(struct irq_work *wor
+ /* Queue the entry and raise the IPI if needed. */
+ preempt_disable();
+
+- /* If the work is "lazy", handle it from next tick if any */
+- if (work->flags & IRQ_WORK_LAZY) {
+- if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
+- tick_nohz_tick_stopped())
+- arch_irq_work_raise();
+- } else {
+- if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
++ lazy_work = work->flags & IRQ_WORK_LAZY;
++
++ if (lazy_work || (realtime && !(work->flags & IRQ_WORK_HARD_IRQ)))
++ list = this_cpu_ptr(&lazy_list);
++ else
++ list = this_cpu_ptr(&raised_list);
++
++ if (llist_add(&work->llnode, list)) {
++ if (!lazy_work || tick_nohz_tick_stopped())
+ arch_irq_work_raise();
+ }
+
+@@ -116,9 +129,8 @@ bool irq_work_needs_cpu(void)
+ raised = this_cpu_ptr(&raised_list);
+ lazy = this_cpu_ptr(&lazy_list);
+
+- if (llist_empty(raised) || arch_irq_work_has_interrupt())
+- if (llist_empty(lazy))
+- return false;
++ if (llist_empty(raised) && llist_empty(lazy))
++ return false;
+
+ /* All work should have been flushed before going offline */
+ WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
+@@ -132,7 +144,7 @@ static void irq_work_run_list(struct lli
+ struct irq_work *work;
+ struct llist_node *llnode;
+
+- BUG_ON(!irqs_disabled());
++ BUG_ON_NONRT(!irqs_disabled());
+
+ if (llist_empty(list))
+ return;
+@@ -169,7 +181,16 @@ static void irq_work_run_list(struct lli
+ void irq_work_run(void)
+ {
+ irq_work_run_list(this_cpu_ptr(&raised_list));
+- irq_work_run_list(this_cpu_ptr(&lazy_list));
++ if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL)) {
++ /*
++ * NOTE: we raise softirq via IPI for safety,
++ * and execute in irq_work_tick() to move the
++ * overhead from hard to soft irq context.
++ */
++ if (!llist_empty(this_cpu_ptr(&lazy_list)))
++ raise_softirq(TIMER_SOFTIRQ);
++ } else
++ irq_work_run_list(this_cpu_ptr(&lazy_list));
+ }
+ EXPORT_SYMBOL_GPL(irq_work_run);
+
+--- a/kernel/sched/rt.c
++++ b/kernel/sched/rt.c
+@@ -90,6 +90,7 @@ void init_rt_rq(struct rt_rq *rt_rq)
+ rt_rq->push_cpu = nr_cpu_ids;
+ raw_spin_lock_init(&rt_rq->push_lock);
+ init_irq_work(&rt_rq->push_work, push_irq_work_func);
++ rt_rq->push_work.flags |= IRQ_WORK_HARD_IRQ;
+ #endif
+ #endif /* CONFIG_SMP */
+ /* We start is dequeued state, because no RT tasks are queued */
+--- a/kernel/time/tick-sched.c
++++ b/kernel/time/tick-sched.c
+@@ -181,6 +181,11 @@ static bool can_stop_full_tick(void)
+ return false;
+ }
+
++ if (!arch_irq_work_has_interrupt()) {
++ trace_tick_stop(0, "missing irq work interrupt\n");
++ return false;
++ }
++
+ /* sched_clock_tick() needs us? */
+ #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
+ /*
+@@ -227,6 +232,7 @@ static void nohz_full_kick_work_func(str
+
+ static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
+ .func = nohz_full_kick_work_func,
++ .flags = IRQ_WORK_HARD_IRQ,
+ };
+
+ /*
+--- a/kernel/time/timer.c
++++ b/kernel/time/timer.c
+@@ -1455,7 +1455,7 @@ void update_process_times(int user_tick)
+ scheduler_tick();
+ run_local_timers();
+ rcu_check_callbacks(user_tick);
+-#ifdef CONFIG_IRQ_WORK
++#if defined(CONFIG_IRQ_WORK) && !defined(CONFIG_PREEMPT_RT_FULL)
+ if (in_irq())
+ irq_work_tick();
+ #endif
+@@ -1471,6 +1471,10 @@ static void run_timer_softirq(struct sof
+
+ hrtimer_run_pending();
+
++#if defined(CONFIG_IRQ_WORK) && defined(CONFIG_PREEMPT_RT_FULL)
++ irq_work_tick();
++#endif
++
+ if (time_after_eq(jiffies, base->timer_jiffies))
+ __run_timers(base);
+ }
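Opting an irq_work out of the softirq deferral follows the rt.c hunk above; as a generic
sketch (the example_* names are invented):

    #include <linux/irq_work.h>

    static void example_hard_cb(struct irq_work *work)
    {
            /* queued on raised_list, i.e. runs in hard interrupt context even on RT */
    }

    static struct irq_work example_work;

    static void example_init(void)
    {
            init_irq_work(&example_work, example_hard_cb);
            example_work.flags |= IRQ_WORK_HARD_IRQ;
    }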
diff --git a/patches/jump-label-rt.patch b/patches/jump-label-rt.patch
new file mode 100644
index 00000000000000..b5ea36db24b1e7
--- /dev/null
+++ b/patches/jump-label-rt.patch
@@ -0,0 +1,35 @@
+Subject: jump-label: disable if stop_machine() is used
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 08 Jul 2015 17:14:48 +0200
+
+Some architectures are using stop_machine() while switching the opcode, which
+leads to latency spikes.
+The architectures which currently use stop_machine():
+- ARM stop machine
+- s390 stop machine
+
+The architectures which use other sorcery:
+- MIPS
+- X86
+- powerpc
+- sparc
+- arm64
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+[bigeasy: only ARM for now]
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ arch/arm/Kconfig | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/arch/arm/Kconfig
++++ b/arch/arm/Kconfig
+@@ -31,7 +31,7 @@ config ARM
+ select HARDIRQS_SW_RESEND
+ select HAVE_ARCH_AUDITSYSCALL if (AEABI && !OABI_COMPAT)
+ select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
+- select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL
++ select HAVE_ARCH_JUMP_LABEL if (!XIP_KERNEL && !PREEMPT_RT_BASE)
+ select HAVE_ARCH_KGDB
+ select HAVE_ARCH_SECCOMP_FILTER if (AEABI && !OABI_COMPAT)
+ select HAVE_ARCH_TRACEHOOK
diff --git a/patches/kconfig-disable-a-few-options-rt.patch b/patches/kconfig-disable-a-few-options-rt.patch
new file mode 100644
index 00000000000000..10df569c51a077
--- /dev/null
+++ b/patches/kconfig-disable-a-few-options-rt.patch
@@ -0,0 +1,33 @@
+Subject: kconfig: Disable config options which are not RT compatible
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Sun, 24 Jul 2011 12:11:43 +0200
+
+Disable stuff which is known to have issues on RT
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ arch/Kconfig | 1 +
+ mm/Kconfig | 2 +-
+ 2 files changed, 2 insertions(+), 1 deletion(-)
+
+--- a/arch/Kconfig
++++ b/arch/Kconfig
+@@ -6,6 +6,7 @@ config OPROFILE
+ tristate "OProfile system profiling"
+ depends on PROFILING
+ depends on HAVE_OPROFILE
++ depends on !PREEMPT_RT_FULL
+ select RING_BUFFER
+ select RING_BUFFER_ALLOW_SWAP
+ help
+--- a/mm/Kconfig
++++ b/mm/Kconfig
+@@ -409,7 +409,7 @@ config NOMMU_INITIAL_TRIM_EXCESS
+
+ config TRANSPARENT_HUGEPAGE
+ bool "Transparent Hugepage Support"
+- depends on HAVE_ARCH_TRANSPARENT_HUGEPAGE
++ depends on HAVE_ARCH_TRANSPARENT_HUGEPAGE && !PREEMPT_RT_FULL
+ select COMPACTION
+ help
+ Transparent Hugepages allows the kernel to use huge pages and
diff --git a/patches/kconfig-preempt-rt-full.patch b/patches/kconfig-preempt-rt-full.patch
new file mode 100644
index 00000000000000..b43504e381de8e
--- /dev/null
+++ b/patches/kconfig-preempt-rt-full.patch
@@ -0,0 +1,58 @@
+Subject: kconfig: Add PREEMPT_RT_FULL
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 29 Jun 2011 14:58:57 +0200
+
+Introduce the final symbol for PREEMPT_RT_FULL.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ init/Makefile | 2 +-
+ kernel/Kconfig.preempt | 8 ++++++++
+ scripts/mkcompile_h | 4 +++-
+ 3 files changed, 12 insertions(+), 2 deletions(-)
+
+--- a/init/Makefile
++++ b/init/Makefile
+@@ -33,4 +33,4 @@ mounts-$(CONFIG_BLK_DEV_MD) += do_mounts
+ include/generated/compile.h: FORCE
+ @$($(quiet)chk_compile.h)
+ $(Q)$(CONFIG_SHELL) $(srctree)/scripts/mkcompile_h $@ \
+- "$(UTS_MACHINE)" "$(CONFIG_SMP)" "$(CONFIG_PREEMPT)" "$(CC) $(KBUILD_CFLAGS)"
++ "$(UTS_MACHINE)" "$(CONFIG_SMP)" "$(CONFIG_PREEMPT)" "$(CONFIG_PREEMPT_RT_FULL)" "$(CC) $(KBUILD_CFLAGS)"
+--- a/kernel/Kconfig.preempt
++++ b/kernel/Kconfig.preempt
+@@ -67,6 +67,14 @@ config PREEMPT_RTB
+ enables changes which are preliminary for the full preemptible
+ RT kernel.
+
++config PREEMPT_RT_FULL
++ bool "Fully Preemptible Kernel (RT)"
++ depends on IRQ_FORCED_THREADING
++ select PREEMPT_RT_BASE
++ select PREEMPT_RCU
++ help
++ All and everything
++
+ endchoice
+
+ config PREEMPT_COUNT
+--- a/scripts/mkcompile_h
++++ b/scripts/mkcompile_h
+@@ -4,7 +4,8 @@ TARGET=$1
+ ARCH=$2
+ SMP=$3
+ PREEMPT=$4
+-CC=$5
++RT=$5
++CC=$6
+
+ vecho() { [ "${quiet}" = "silent_" ] || echo "$@" ; }
+
+@@ -57,6 +58,7 @@ UTS_VERSION="#$VERSION"
+ CONFIG_FLAGS=""
+ if [ -n "$SMP" ] ; then CONFIG_FLAGS="SMP"; fi
+ if [ -n "$PREEMPT" ] ; then CONFIG_FLAGS="$CONFIG_FLAGS PREEMPT"; fi
++if [ -n "$RT" ] ; then CONFIG_FLAGS="$CONFIG_FLAGS RT"; fi
+ UTS_VERSION="$UTS_VERSION $CONFIG_FLAGS $TIMESTAMP"
+
+ # Truncate to maximum length
diff --git a/patches/kernel-SRCU-provide-a-static-initializer.patch b/patches/kernel-SRCU-provide-a-static-initializer.patch
new file mode 100644
index 00000000000000..d63ac9a0f40f58
--- /dev/null
+++ b/patches/kernel-SRCU-provide-a-static-initializer.patch
@@ -0,0 +1,124 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Tue, 19 Mar 2013 14:44:30 +0100
+Subject: [PATCH] kernel/SRCU: provide a static initializer
+
+There are static initializer macros for three of the four possible
+notifier head types:
+ ATOMIC_NOTIFIER_HEAD()
+ BLOCKING_NOTIFIER_HEAD()
+ RAW_NOTIFIER_HEAD()
+
+This patch provides a static initializer for the fourth type to make the
+set complete.
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ include/linux/notifier.h | 34 +++++++++++++++++++++++++---------
+ include/linux/srcu.h | 6 +++---
+ 2 files changed, 28 insertions(+), 12 deletions(-)
+
+--- a/include/linux/notifier.h
++++ b/include/linux/notifier.h
+@@ -6,7 +6,7 @@
+ *
+ * Alan Cox <Alan.Cox@linux.org>
+ */
+-
++
+ #ifndef _LINUX_NOTIFIER_H
+ #define _LINUX_NOTIFIER_H
+ #include <linux/errno.h>
+@@ -42,9 +42,7 @@
+ * in srcu_notifier_call_chain(): no cache bounces and no memory barriers.
+ * As compensation, srcu_notifier_chain_unregister() is rather expensive.
+ * SRCU notifier chains should be used when the chain will be called very
+- * often but notifier_blocks will seldom be removed. Also, SRCU notifier
+- * chains are slightly more difficult to use because they require special
+- * runtime initialization.
++ * often but notifier_blocks will seldom be removed.
+ */
+
+ typedef int (*notifier_fn_t)(struct notifier_block *nb,
+@@ -88,7 +86,7 @@ struct srcu_notifier_head {
+ (name)->head = NULL; \
+ } while (0)
+
+-/* srcu_notifier_heads must be initialized and cleaned up dynamically */
++/* srcu_notifier_heads must be cleaned up dynamically */
+ extern void srcu_init_notifier_head(struct srcu_notifier_head *nh);
+ #define srcu_cleanup_notifier_head(name) \
+ cleanup_srcu_struct(&(name)->srcu);
+@@ -101,7 +99,13 @@ extern void srcu_init_notifier_head(stru
+ .head = NULL }
+ #define RAW_NOTIFIER_INIT(name) { \
+ .head = NULL }
+-/* srcu_notifier_heads cannot be initialized statically */
++
++#define SRCU_NOTIFIER_INIT(name, pcpu) \
++ { \
++ .mutex = __MUTEX_INITIALIZER(name.mutex), \
++ .head = NULL, \
++ .srcu = __SRCU_STRUCT_INIT(name.srcu, pcpu), \
++ }
+
+ #define ATOMIC_NOTIFIER_HEAD(name) \
+ struct atomic_notifier_head name = \
+@@ -113,6 +117,18 @@ extern void srcu_init_notifier_head(stru
+ struct raw_notifier_head name = \
+ RAW_NOTIFIER_INIT(name)
+
++#define _SRCU_NOTIFIER_HEAD(name, mod) \
++ static DEFINE_PER_CPU(struct srcu_struct_array, \
++ name##_head_srcu_array); \
++ mod struct srcu_notifier_head name = \
++ SRCU_NOTIFIER_INIT(name, name##_head_srcu_array)
++
++#define SRCU_NOTIFIER_HEAD(name) \
++ _SRCU_NOTIFIER_HEAD(name, )
++
++#define SRCU_NOTIFIER_HEAD_STATIC(name) \
++ _SRCU_NOTIFIER_HEAD(name, static)
++
+ #ifdef __KERNEL__
+
+ extern int atomic_notifier_chain_register(struct atomic_notifier_head *nh,
+@@ -182,12 +198,12 @@ static inline int notifier_to_errno(int
+
+ /*
+ * Declared notifiers so far. I can imagine quite a few more chains
+- * over time (eg laptop power reset chains, reboot chain (to clean
++ * over time (eg laptop power reset chains, reboot chain (to clean
+ * device units up), device [un]mount chain, module load/unload chain,
+- * low memory chain, screenblank chain (for plug in modular screenblankers)
++ * low memory chain, screenblank chain (for plug in modular screenblankers)
+ * VC switch chains (for loadable kernel svgalib VC switch helpers) etc...
+ */
+-
++
+ /* CPU notfiers are defined in include/linux/cpu.h. */
+
+ /* netdevice notifiers are defined in include/linux/netdevice.h */
+--- a/include/linux/srcu.h
++++ b/include/linux/srcu.h
+@@ -84,10 +84,10 @@ int init_srcu_struct(struct srcu_struct
+
+ void process_srcu(struct work_struct *work);
+
+-#define __SRCU_STRUCT_INIT(name) \
++#define __SRCU_STRUCT_INIT(name, pcpu_name) \
+ { \
+ .completed = -300, \
+- .per_cpu_ref = &name##_srcu_array, \
++ .per_cpu_ref = &pcpu_name, \
+ .queue_lock = __SPIN_LOCK_UNLOCKED(name.queue_lock), \
+ .running = false, \
+ .batch_queue = RCU_BATCH_INIT(name.batch_queue), \
+@@ -104,7 +104,7 @@ void process_srcu(struct work_struct *wo
+ */
+ #define __DEFINE_SRCU(name, is_static) \
+ static DEFINE_PER_CPU(struct srcu_struct_array, name##_srcu_array);\
+- is_static struct srcu_struct name = __SRCU_STRUCT_INIT(name)
++ is_static struct srcu_struct name = __SRCU_STRUCT_INIT(name, name##_srcu_array)
+ #define DEFINE_SRCU(name) __DEFINE_SRCU(name, /* not static */)
+ #define DEFINE_STATIC_SRCU(name) __DEFINE_SRCU(name, static)
+
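With the new initializer an SRCU notifier chain can be declared at file scope; a usage
sketch (the example_* names are invented; teardown via srcu_cleanup_notifier_head() is
still required, as the updated comment notes):

    #include <linux/notifier.h>

    SRCU_NOTIFIER_HEAD_STATIC(example_chain);

    static int example_notify(unsigned long event, void *data)
    {
            return srcu_notifier_call_chain(&example_chain, event, data);
    }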
diff --git a/patches/kernel-cpu-fix-cpu-down-problem-if-kthread-s-cpu-is-.patch b/patches/kernel-cpu-fix-cpu-down-problem-if-kthread-s-cpu-is-.patch
new file mode 100644
index 00000000000000..421b4966210bb6
--- /dev/null
+++ b/patches/kernel-cpu-fix-cpu-down-problem-if-kthread-s-cpu-is-.patch
@@ -0,0 +1,85 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Fri, 7 Jun 2013 22:37:06 +0200
+Subject: kernel/cpu: fix cpu down problem if kthread's cpu is going down
+
+If a kthread is pinned to CPUx and CPUx is going down then we get into
+trouble:
+- first the unplug thread is created
+- it will set itself to hp->unplug. As a result, every task that is
+ going to take a lock has to leave the CPU.
+- the CPU_DOWN_PREPARE notifiers are started. The worker thread will
+ start a new process for the "high priority worker".
+ Now the kthread would like to take a lock, but since it can't leave the
+ CPU it will never complete its task.
+
+We could fire the unplug thread after the notifiers, but then the CPU is
+no longer marked "online" and the unplug thread would run on CPU0, which
+was fixed before :)
+
+So instead the unplug thread is started and kept waiting until the
+notifiers complete their work.
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ kernel/cpu.c | 15 +++++++++++++--
+ 1 file changed, 13 insertions(+), 2 deletions(-)
+
+--- a/kernel/cpu.c
++++ b/kernel/cpu.c
+@@ -108,6 +108,7 @@ struct hotplug_pcp {
+ int refcount;
+ int grab_lock;
+ struct completion synced;
++ struct completion unplug_wait;
+ #ifdef CONFIG_PREEMPT_RT_FULL
+ /*
+ * Note, on PREEMPT_RT, the hotplug lock must save the state of
+@@ -211,6 +212,7 @@ static int sync_unplug_thread(void *data
+ {
+ struct hotplug_pcp *hp = data;
+
++ wait_for_completion(&hp->unplug_wait);
+ preempt_disable();
+ hp->unplug = current;
+ wait_for_pinned_cpus(hp);
+@@ -276,6 +278,14 @@ static void __cpu_unplug_sync(struct hot
+ wait_for_completion(&hp->synced);
+ }
+
++static void __cpu_unplug_wait(unsigned int cpu)
++{
++ struct hotplug_pcp *hp = &per_cpu(hotplug_pcp, cpu);
++
++ complete(&hp->unplug_wait);
++ wait_for_completion(&hp->synced);
++}
++
+ /*
+ * Start the sync_unplug_thread on the target cpu and wait for it to
+ * complete.
+@@ -299,6 +309,7 @@ static int cpu_unplug_begin(unsigned int
+ tell_sched_cpu_down_begin(cpu);
+
+ init_completion(&hp->synced);
++ init_completion(&hp->unplug_wait);
+
+ hp->sync_tsk = kthread_create(sync_unplug_thread, hp, "sync_unplug/%d", cpu);
+ if (IS_ERR(hp->sync_tsk)) {
+@@ -314,8 +325,7 @@ static int cpu_unplug_begin(unsigned int
+ * wait for tasks that are going to enter these sections and
+ * we must not have them block.
+ */
+- __cpu_unplug_sync(hp);
+-
++ wake_up_process(hp->sync_tsk);
+ return 0;
+ }
+
+@@ -682,6 +692,7 @@ static int __ref _cpu_down(unsigned int
+ #endif
+ synchronize_rcu();
+
++ __cpu_unplug_wait(cpu);
+ smpboot_park_threads(cpu);
+
+ /* Notifiers are done. Don't let any more tasks pin this CPU. */
diff --git a/patches/kernel-hotplug-restore-original-cpu-mask-oncpu-down.patch b/patches/kernel-hotplug-restore-original-cpu-mask-oncpu-down.patch
new file mode 100644
index 00000000000000..44da96651dfe05
--- /dev/null
+++ b/patches/kernel-hotplug-restore-original-cpu-mask-oncpu-down.patch
@@ -0,0 +1,58 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Fri, 14 Jun 2013 17:16:35 +0200
+Subject: kernel/hotplug: restore original cpu mask oncpu/down
+
+If a task which is allowed to run only on CPU X puts CPU Y down then it
+will be allowed to run on all CPUs but CPU Y after it comes back from the
+kernel. This patch ensures that we don't lose the initial setting unless
+the CPU the task is running on is going down.
+
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ kernel/cpu.c | 13 ++++++++++++-
+ 1 file changed, 12 insertions(+), 1 deletion(-)
+
+--- a/kernel/cpu.c
++++ b/kernel/cpu.c
+@@ -640,6 +640,7 @@ static int __ref _cpu_down(unsigned int
+ .hcpu = hcpu,
+ };
+ cpumask_var_t cpumask;
++ cpumask_var_t cpumask_org;
+
+ if (num_online_cpus() == 1)
+ return -EBUSY;
+@@ -650,6 +651,12 @@ static int __ref _cpu_down(unsigned int
+ /* Move the downtaker off the unplug cpu */
+ if (!alloc_cpumask_var(&cpumask, GFP_KERNEL))
+ return -ENOMEM;
++ if (!alloc_cpumask_var(&cpumask_org, GFP_KERNEL)) {
++ free_cpumask_var(cpumask);
++ return -ENOMEM;
++ }
++
++ cpumask_copy(cpumask_org, tsk_cpus_allowed(current));
+ cpumask_andnot(cpumask, cpu_online_mask, cpumask_of(cpu));
+ set_cpus_allowed_ptr(current, cpumask);
+ free_cpumask_var(cpumask);
+@@ -658,7 +665,8 @@ static int __ref _cpu_down(unsigned int
+ if (mycpu == cpu) {
+ printk(KERN_ERR "Yuck! Still on unplug CPU\n!");
+ migrate_enable();
+- return -EBUSY;
++ err = -EBUSY;
++ goto restore_cpus;
+ }
+
+ cpu_hotplug_begin();
+@@ -740,6 +748,9 @@ static int __ref _cpu_down(unsigned int
+ cpu_hotplug_done();
+ if (!err)
+ cpu_notify_nofail(CPU_POST_DEAD | mod, hcpu);
++restore_cpus:
++ set_cpus_allowed_ptr(current, cpumask_org);
++ free_cpumask_var(cpumask_org);
+ return err;
+ }
+
diff --git a/patches/kgb-serial-hackaround.patch b/patches/kgb-serial-hackaround.patch
new file mode 100644
index 00000000000000..95a205b90c588f
--- /dev/null
+++ b/patches/kgb-serial-hackaround.patch
@@ -0,0 +1,101 @@
+From: Jason Wessel <jason.wessel@windriver.com>
+Date: Thu, 28 Jul 2011 12:42:23 -0500
+Subject: kgdb/serial: Short term workaround
+
+On 07/27/2011 04:37 PM, Thomas Gleixner wrote:
+> - KGDB (not yet disabled) is reportedly unusable on -rt right now due
+> to missing hacks in the console locking which I dropped on purpose.
+>
+
+To work around this in the short term you can use this patch, in
+addition to the clocksource watchdog patch that Thomas brewed up.
+
+Comments are welcome of course. Ultimately the right solution is to
+change the separation between the console and the HW to have a polled mode
++ work queue so as not to introduce any kind of latency.
+
+Thanks,
+Jason.
+
+---
+ drivers/tty/serial/8250/8250_core.c | 3 ++-
+ include/linux/kdb.h | 2 ++
+ kernel/debug/kdb/kdb_io.c | 6 ++----
+ 3 files changed, 6 insertions(+), 5 deletions(-)
+
+--- a/drivers/tty/serial/8250/8250_core.c
++++ b/drivers/tty/serial/8250/8250_core.c
+@@ -36,6 +36,7 @@
+ #include <linux/nmi.h>
+ #include <linux/mutex.h>
+ #include <linux/slab.h>
++#include <linux/kdb.h>
+ #include <linux/uaccess.h>
+ #include <linux/pm_runtime.h>
+ #ifdef CONFIG_SPARC
+@@ -3373,7 +3374,7 @@ static void serial8250_console_write(str
+
+ if (port->sysrq)
+ locked = 0;
+- else if (oops_in_progress)
++ else if (oops_in_progress || in_kdb_printk())
+ locked = spin_trylock_irqsave(&port->lock, flags);
+ else
+ spin_lock_irqsave(&port->lock, flags);
+--- a/include/linux/kdb.h
++++ b/include/linux/kdb.h
+@@ -167,6 +167,7 @@ extern __printf(2, 0) int vkdb_printf(en
+ extern __printf(1, 2) int kdb_printf(const char *, ...);
+ typedef __printf(1, 2) int (*kdb_printf_t)(const char *, ...);
+
++#define in_kdb_printk() (kdb_trap_printk)
+ extern void kdb_init(int level);
+
+ /* Access to kdb specific polling devices */
+@@ -201,6 +202,7 @@ extern int kdb_register_flags(char *, kd
+ extern int kdb_unregister(char *);
+ #else /* ! CONFIG_KGDB_KDB */
+ static inline __printf(1, 2) int kdb_printf(const char *fmt, ...) { return 0; }
++#define in_kdb_printk() (0)
+ static inline void kdb_init(int level) {}
+ static inline int kdb_register(char *cmd, kdb_func_t func, char *usage,
+ char *help, short minlen) { return 0; }
+--- a/kernel/debug/kdb/kdb_io.c
++++ b/kernel/debug/kdb/kdb_io.c
+@@ -554,7 +554,6 @@ int vkdb_printf(enum kdb_msgsrc src, con
+ int linecount;
+ int colcount;
+ int logging, saved_loglevel = 0;
+- int saved_trap_printk;
+ int got_printf_lock = 0;
+ int retlen = 0;
+ int fnd, len;
+@@ -565,8 +564,6 @@ int vkdb_printf(enum kdb_msgsrc src, con
+ unsigned long uninitialized_var(flags);
+
+ preempt_disable();
+- saved_trap_printk = kdb_trap_printk;
+- kdb_trap_printk = 0;
+
+ /* Serialize kdb_printf if multiple cpus try to write at once.
+ * But if any cpu goes recursive in kdb, just print the output,
+@@ -855,7 +852,6 @@ int vkdb_printf(enum kdb_msgsrc src, con
+ } else {
+ __release(kdb_printf_lock);
+ }
+- kdb_trap_printk = saved_trap_printk;
+ preempt_enable();
+ return retlen;
+ }
+@@ -865,9 +861,11 @@ int kdb_printf(const char *fmt, ...)
+ va_list ap;
+ int r;
+
++ kdb_trap_printk++;
+ va_start(ap, fmt);
+ r = vkdb_printf(KDB_MSGSRC_INTERNAL, fmt, ap);
+ va_end(ap);
++ kdb_trap_printk--;
+
+ return r;
+ }
diff --git a/patches/latency-hist.patch b/patches/latency-hist.patch
new file mode 100644
index 00000000000000..a2d6273dac0cc7
--- /dev/null
+++ b/patches/latency-hist.patch
@@ -0,0 +1,1808 @@
+Subject: tracing: Add latency histograms
+From: Carsten Emde <C.Emde@osadl.org>
+Date: Tue, 19 Jul 2011 14:03:41 +0100
+
+This patch provides a recording mechanism to store data of potential
+sources of system latencies. The recordings separately determine the
+latency caused by a delayed timer expiration, by a delayed wakeup of the
+related user space program and by the sum of both. The histograms can be
+enabled and reset individually. The data are accessible via the debug
+filesystem. For details please consult Documentation/trace/histograms.txt.
+
+Signed-off-by: Carsten Emde <C.Emde@osadl.org>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ Documentation/trace/histograms.txt | 186 +++++
+ include/linux/hrtimer.h | 3
+ include/linux/sched.h | 6
+ include/trace/events/hist.h | 72 ++
+ include/trace/events/latency_hist.h | 29
+ kernel/time/hrtimer.c | 21
+ kernel/trace/Kconfig | 104 +++
+ kernel/trace/Makefile | 4
+ kernel/trace/latency_hist.c | 1178 ++++++++++++++++++++++++++++++++++++
+ kernel/trace/trace_irqsoff.c | 11
+ 10 files changed, 1614 insertions(+)
+
+--- /dev/null
++++ b/Documentation/trace/histograms.txt
+@@ -0,0 +1,186 @@
++ Using the Linux Kernel Latency Histograms
++
++
++This document gives a short explanation of how to enable, configure and
++latency histograms. Latency histograms are primarily relevant in the
++context of real-time enabled kernels (CONFIG_PREEMPT/CONFIG_PREEMPT_RT)
++and are used in the quality management of the Linux real-time
++capabilities.
++
++
++* Purpose of latency histograms
++
++A latency histogram continuously accumulates the frequencies of latency
++data. There are two types of histograms:
++- potential sources of latencies
++- effective latencies
++
++
++* Potential sources of latencies
++
++Potential sources of latencies are code segments where interrupts,
++preemption or both are disabled (aka critical sections). To create
++histograms of potential sources of latency, the kernel stores the time
++stamp at the start of a critical section, determines the time elapsed
++when the end of the section is reached, and increments the frequency
++counter of that latency value - irrespective of whether any concurrently
++running process is affected by latency or not.
++- Configuration items (in the Kernel hacking/Tracers submenu)
++ CONFIG_INTERRUPT_OFF_LATENCY
++ CONFIG_PREEMPT_OFF_LATENCY
++
++
++* Effective latencies
++
++Effective latencies actually occur during the wakeup of a process. To
++determine effective latencies, the kernel stores the time stamp when a
++process is scheduled to be woken up, and determines the duration of the
++wakeup time shortly before control is passed over to this process. Note
++that the apparent latency in user space may be somewhat longer, since the
++process may be interrupted after control is passed over to it but before
++the execution in user space takes place. Simply measuring the interval
++between enqueuing and wakeup may also not be appropriate in cases when a
++process is scheduled as a result of a timer expiration. The timer may have
++missed its deadline, e.g. due to disabled interrupts, but this latency
++would not be registered. Therefore, the offsets of missed timers are
++recorded in a separate histogram. If both wakeup latency and missed timer
++offsets are configured and enabled, a third histogram may be enabled that
++records the overall latency as a sum of the timer latency, if any, and the
++wakeup latency. This histogram is called "timerandwakeup".
++- Configuration items (in the Kernel hacking/Tracers submenu)
++ CONFIG_WAKEUP_LATENCY
++ CONFIG_MISSED_TIMER_OFSETS
++
++
++* Usage
++
++The interface to the administration of the latency histograms is located
++in the debugfs file system. To mount it, either enter
++
++mount -t sysfs nodev /sys
++mount -t debugfs nodev /sys/kernel/debug
++
++from shell command line level, or add
++
++nodev /sys sysfs defaults 0 0
++nodev /sys/kernel/debug debugfs defaults 0 0
++
++to the file /etc/fstab. All latency histogram related files are then
++available in the directory /sys/kernel/debug/tracing/latency_hist. A
++particular histogram type is enabled by writing non-zero to the related
++variable in the /sys/kernel/debug/tracing/latency_hist/enable directory.
++Select "preemptirqsoff" for the histograms of potential sources of
++latencies and "wakeup" for histograms of effective latencies etc. The
++histogram data - one per CPU - are available in the files
++
++/sys/kernel/debug/tracing/latency_hist/preemptoff/CPUx
++/sys/kernel/debug/tracing/latency_hist/irqsoff/CPUx
++/sys/kernel/debug/tracing/latency_hist/preemptirqsoff/CPUx
++/sys/kernel/debug/tracing/latency_hist/wakeup/CPUx
++/sys/kernel/debug/tracing/latency_hist/wakeup/sharedprio/CPUx
++/sys/kernel/debug/tracing/latency_hist/missed_timer_offsets/CPUx
++/sys/kernel/debug/tracing/latency_hist/timerandwakeup/CPUx
++
++The histograms are reset by writing non-zero to the file "reset" in a
++particular latency directory. To reset all latency data, use
++
++#!/bin/sh
++
++TRACINGDIR=/sys/kernel/debug/tracing
++HISTDIR=$TRACINGDIR/latency_hist
++
++if test -d $HISTDIR
++then
++ cd $HISTDIR
++ for i in `find . | grep /reset$`
++ do
++ echo 1 >$i
++ done
++fi
++
++
++* Data format
++
++Latency data are stored with a resolution of one microsecond. The
++maximum latency is 10,240 microseconds. The data are only valid if the
++overflow register is empty. Every output line contains the latency in
++microseconds in the first column and the number of samples in the
++second column. To display only lines with a positive latency count,
++use, for example,
++
++grep -v " 0$" /sys/kernel/debug/tracing/latency_hist/preemptoff/CPU0
++
++#Minimum latency: 0 microseconds.
++#Average latency: 0 microseconds.
++#Maximum latency: 25 microseconds.
++#Total samples: 3104770694
++#There are 0 samples greater or equal than 10240 microseconds
++#usecs samples
++ 0 2984486876
++ 1 49843506
++ 2 58219047
++ 3 5348126
++ 4 2187960
++ 5 3388262
++ 6 959289
++ 7 208294
++ 8 40420
++ 9 4485
++ 10 14918
++ 11 18340
++ 12 25052
++ 13 19455
++ 14 5602
++ 15 969
++ 16 47
++ 17 18
++ 18 14
++ 19 1
++ 20 3
++ 21 2
++ 22 5
++ 23 2
++ 25 1
++
++
++* Wakeup latency of a selected process
++
++To only collect wakeup latency data of a particular process, write the
++PID of the requested process to
++
++/sys/kernel/debug/tracing/latency_hist/wakeup/pid
++
++PIDs are not considered if this variable is set to 0.
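++
++For example, to record only the wakeup latencies of a single process
++(here an already running cyclictest instance is assumed) and to remove
++the restriction again afterwards:
++
++echo $(pidof cyclictest) >/sys/kernel/debug/tracing/latency_hist/wakeup/pid
++echo 0 >/sys/kernel/debug/tracing/latency_hist/wakeup/pid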
++
++
++* Details of the process with the highest wakeup latency so far
++
++Selected data of the process that suffered from the highest wakeup
++latency that occurred on a particular CPU are available in the file
++
++/sys/kernel/debug/tracing/latency_hist/wakeup/max_latency-CPUx.
++
++In addition, other relevant system data at the time when the
++latency occurred are given.
++
++The format of the data is (all in one line):
++<PID> <Priority> <Latency> (<Timeroffset>) <Command> \
++<- <PID> <Priority> <Command> <Timestamp>
++
++The value of <Timeroffset> is only relevant in the combined timer
++and wakeup latency recording. In the wakeup recording, it is
++always 0; in the missed_timer_offsets recording, it is the same
++as <Latency>.
++
++When retrospectively searching for the origin of a latency and
++tracing was not enabled, it may be helpful to know the name and
++some basic data of the task that (finally) switched to the late
++real-time task. In addition to the victim's data, the
++data of the possible culprit are therefore displayed after the
++"<-" symbol.
++
++Finally, the timestamp of when the latency occurred, given in
++<seconds>.<microseconds> since the most recent system boot,
++is provided.
++
++These data are also reset when the wakeup histogram is reset.
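++
++For example, the maximum latency records of all CPUs can be dumped with
++a small shell loop:
++
++for f in /sys/kernel/debug/tracing/latency_hist/wakeup/max_latency-CPU*
++do
++	echo "$f:"
++	cat "$f"
++done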
+--- a/include/linux/hrtimer.h
++++ b/include/linux/hrtimer.h
+@@ -111,6 +111,9 @@ struct hrtimer {
+ enum hrtimer_restart (*function)(struct hrtimer *);
+ struct hrtimer_clock_base *base;
+ unsigned long state;
++#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST
++ ktime_t praecox;
++#endif
+ #ifdef CONFIG_TIMER_STATS
+ int start_pid;
+ void *start_site;
+--- a/include/linux/sched.h
++++ b/include/linux/sched.h
+@@ -1753,6 +1753,12 @@ struct task_struct {
+ unsigned long trace;
+ /* bitmask and counter of trace recursion */
+ unsigned long trace_recursion;
++#ifdef CONFIG_WAKEUP_LATENCY_HIST
++ u64 preempt_timestamp_hist;
++#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST
++ long timer_offset;
++#endif
++#endif
+ #endif /* CONFIG_TRACING */
+ #ifdef CONFIG_MEMCG
+ struct memcg_oom_info {
+--- /dev/null
++++ b/include/trace/events/hist.h
+@@ -0,0 +1,72 @@
++#undef TRACE_SYSTEM
++#define TRACE_SYSTEM hist
++
++#if !defined(_TRACE_HIST_H) || defined(TRACE_HEADER_MULTI_READ)
++#define _TRACE_HIST_H
++
++#include "latency_hist.h"
++#include <linux/tracepoint.h>
++
++#if !defined(CONFIG_PREEMPT_OFF_HIST) && !defined(CONFIG_INTERRUPT_OFF_HIST)
++#define trace_preemptirqsoff_hist(a, b)
++#else
++TRACE_EVENT(preemptirqsoff_hist,
++
++ TP_PROTO(int reason, int starthist),
++
++ TP_ARGS(reason, starthist),
++
++ TP_STRUCT__entry(
++ __field(int, reason)
++ __field(int, starthist)
++ ),
++
++ TP_fast_assign(
++ __entry->reason = reason;
++ __entry->starthist = starthist;
++ ),
++
++ TP_printk("reason=%s starthist=%s", getaction(__entry->reason),
++ __entry->starthist ? "start" : "stop")
++);
++#endif
++
++#ifndef CONFIG_MISSED_TIMER_OFFSETS_HIST
++#define trace_hrtimer_interrupt(a, b, c, d)
++#else
++TRACE_EVENT(hrtimer_interrupt,
++
++ TP_PROTO(int cpu, long long offset, struct task_struct *curr,
++ struct task_struct *task),
++
++ TP_ARGS(cpu, offset, curr, task),
++
++ TP_STRUCT__entry(
++ __field(int, cpu)
++ __field(long long, offset)
++ __array(char, ccomm, TASK_COMM_LEN)
++ __field(int, cprio)
++ __array(char, tcomm, TASK_COMM_LEN)
++ __field(int, tprio)
++ ),
++
++ TP_fast_assign(
++ __entry->cpu = cpu;
++ __entry->offset = offset;
++ memcpy(__entry->ccomm, curr->comm, TASK_COMM_LEN);
++ __entry->cprio = curr->prio;
++ memcpy(__entry->tcomm, task != NULL ? task->comm : "<none>",
++ task != NULL ? TASK_COMM_LEN : 7);
++ __entry->tprio = task != NULL ? task->prio : -1;
++ ),
++
++ TP_printk("cpu=%d offset=%lld curr=%s[%d] thread=%s[%d]",
++ __entry->cpu, __entry->offset, __entry->ccomm,
++ __entry->cprio, __entry->tcomm, __entry->tprio)
++);
++#endif
++
++#endif /* _TRACE_HIST_H */
++
++/* This part must be outside protection */
++#include <trace/define_trace.h>
+--- /dev/null
++++ b/include/trace/events/latency_hist.h
+@@ -0,0 +1,29 @@
++#ifndef _LATENCY_HIST_H
++#define _LATENCY_HIST_H
++
++enum hist_action {
++ IRQS_ON,
++ PREEMPT_ON,
++ TRACE_STOP,
++ IRQS_OFF,
++ PREEMPT_OFF,
++ TRACE_START,
++};
++
++static char *actions[] = {
++ "IRQS_ON",
++ "PREEMPT_ON",
++ "TRACE_STOP",
++ "IRQS_OFF",
++ "PREEMPT_OFF",
++ "TRACE_START",
++};
++
++static inline char *getaction(int action)
++{
++ if (action >= 0 && action < sizeof(actions)/sizeof(actions[0]))
++ return actions[action];
++ return "unknown";
++}
++
++#endif /* _LATENCY_HIST_H */
+--- a/kernel/time/hrtimer.c
++++ b/kernel/time/hrtimer.c
+@@ -53,6 +53,7 @@
+ #include <asm/uaccess.h>
+
+ #include <trace/events/timer.h>
++#include <trace/events/hist.h>
+
+ #include "tick-internal.h"
+
+@@ -966,7 +967,16 @@ int __hrtimer_start_range_ns(struct hrti
+ new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);
+
+ timer_stats_hrtimer_set_start_info(timer);
++#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST
++ {
++ ktime_t now = new_base->get_time();
+
++ if (ktime_to_ns(tim) < ktime_to_ns(now))
++ timer->praecox = now;
++ else
++ timer->praecox = ktime_set(0, 0);
++ }
++#endif
+ leftmost = enqueue_hrtimer(timer, new_base);
+
+ if (!leftmost) {
+@@ -1238,6 +1248,8 @@ static void __run_hrtimer(struct hrtimer
+
+ #ifdef CONFIG_HIGH_RES_TIMERS
+
++static enum hrtimer_restart hrtimer_wakeup(struct hrtimer *timer);
++
+ /*
+ * High resolution timer interrupt
+ * Called with interrupts disabled
+@@ -1281,6 +1293,15 @@ void hrtimer_interrupt(struct clock_even
+
+ timer = container_of(node, struct hrtimer, node);
+
++ trace_hrtimer_interrupt(raw_smp_processor_id(),
++ ktime_to_ns(ktime_sub(ktime_to_ns(timer->praecox) ?
++ timer->praecox : hrtimer_get_expires(timer),
++ basenow)),
++ current,
++ timer->function == hrtimer_wakeup ?
++ container_of(timer, struct hrtimer_sleeper,
++ timer)->task : NULL);
++
+ /*
+ * The immediate goal for using the softexpires is
+ * minimizing wakeups, not running timers at the
+--- a/kernel/trace/Kconfig
++++ b/kernel/trace/Kconfig
+@@ -187,6 +187,24 @@ config IRQSOFF_TRACER
+ enabled. This option and the preempt-off timing option can be
+ used together or separately.)
+
++config INTERRUPT_OFF_HIST
++ bool "Interrupts-off Latency Histogram"
++ depends on IRQSOFF_TRACER
++ help
++ This option generates continuously updated histograms (one per cpu)
++ of the duration of time periods with interrupts disabled. The
++ histograms are disabled by default. To enable them, write a non-zero
++ number to
++
++ /sys/kernel/debug/tracing/latency_hist/enable/preemptirqsoff
++
++ If PREEMPT_OFF_HIST is also selected, additional histograms (one
++ per cpu) are generated that accumulate the duration of time periods
++ when both interrupts and preemption are disabled. The histogram data
++ will be located in the debug file system at
++
++ /sys/kernel/debug/tracing/latency_hist/irqsoff
++
+ config PREEMPT_TRACER
+ bool "Preemption-off Latency Tracer"
+ default n
+@@ -211,6 +229,24 @@ config PREEMPT_TRACER
+ enabled. This option and the irqs-off timing option can be
+ used together or separately.)
+
++config PREEMPT_OFF_HIST
++ bool "Preemption-off Latency Histogram"
++ depends on PREEMPT_TRACER
++ help
++ This option generates continuously updated histograms (one per cpu)
++ of the duration of time periods with preemption disabled. The
++ histograms are disabled by default. To enable them, write a non-zero
++ number to
++
++ /sys/kernel/debug/tracing/latency_hist/enable/preemptirqsoff
++
++ If INTERRUPT_OFF_HIST is also selected, additional histograms (one
++ per cpu) are generated that accumulate the duration of time periods
++ when both interrupts and preemption are disabled. The histogram data
++ will be located in the debug file system at
++
++ /sys/kernel/debug/tracing/latency_hist/preemptoff
++
+ config SCHED_TRACER
+ bool "Scheduling Latency Tracer"
+ select GENERIC_TRACER
+@@ -221,6 +257,74 @@ config SCHED_TRACER
+ This tracer tracks the latency of the highest priority task
+ to be scheduled in, starting from the point it has woken up.
+
++config WAKEUP_LATENCY_HIST
++ bool "Scheduling Latency Histogram"
++ depends on SCHED_TRACER
++ help
++ This option generates continuously updated histograms (one per cpu)
++ of the scheduling latency of the highest priority task.
++ The histograms are disabled by default. To enable them, write a
++ non-zero number to
++
++ /sys/kernel/debug/tracing/latency_hist/enable/wakeup
++
++ Two different algorithms are used, one to determine the latency of
++ processes that exclusively use the highest priority of the system and
++ another one to determine the latency of processes that share the
++ highest system priority with other processes. The former is used to
++ improve hardware and system software, the latter to optimize the
++ priority design of a given system. The histogram data will be
++ located in the debug file system at
++
++ /sys/kernel/debug/tracing/latency_hist/wakeup
++
++ and
++
++ /sys/kernel/debug/tracing/latency_hist/wakeup/sharedprio
++
++ If both Scheduling Latency Histogram and Missed Timer Offsets
++ Histogram are selected, additional histogram data will be collected
++ that contain, in addition to the wakeup latency, the timer latency, in
++ case the wakeup was triggered by an expired timer. These histograms
++ are available in the
++
++ /sys/kernel/debug/tracing/latency_hist/timerandwakeup
++
++ directory. They reflect the apparent interrupt and scheduling latency
++ and are best suited to determine the worst-case latency of a given
++ system. To enable these histograms, write a non-zero number to
++
++ /sys/kernel/debug/tracing/latency_hist/enable/timerandwakeup
++
++config MISSED_TIMER_OFFSETS_HIST
++ depends on HIGH_RES_TIMERS
++ select GENERIC_TRACER
++ bool "Missed Timer Offsets Histogram"
++ help
++ Generate a histogram of missed timer offsets in microseconds. The
++ histograms are disabled by default. To enable them, write a non-zero
++ number to
++
++ /sys/kernel/debug/tracing/latency_hist/enable/missed_timer_offsets
++
++ The histogram data will be located in the debug file system at
++
++ /sys/kernel/debug/tracing/latency_hist/missed_timer_offsets
++
++ If both Scheduling Latency Histogram and Missed Timer Offsets
++ Histogram are selected, additional histogram data will be collected
++ that contain, in addition to the wakeup latency, the timer latency, in
++ case the wakeup was triggered by an expired timer. These histograms
++ are available in the
++
++ /sys/kernel/debug/tracing/latency_hist/timerandwakeup
++
++ directory. They reflect the apparent interrupt and scheduling latency
++ and are best suited to determine the worst-case latency of a given
++ system. To enable these histograms, write a non-zero number to
++
++ /sys/kernel/debug/tracing/latency_hist/enable/timerandwakeup
++
+ config ENABLE_DEFAULT_TRACERS
+ bool "Trace process context switches and events"
+ depends on !GENERIC_TRACER
+--- a/kernel/trace/Makefile
++++ b/kernel/trace/Makefile
+@@ -36,6 +36,10 @@ obj-$(CONFIG_FUNCTION_TRACER) += trace_f
+ obj-$(CONFIG_IRQSOFF_TRACER) += trace_irqsoff.o
+ obj-$(CONFIG_PREEMPT_TRACER) += trace_irqsoff.o
+ obj-$(CONFIG_SCHED_TRACER) += trace_sched_wakeup.o
++obj-$(CONFIG_INTERRUPT_OFF_HIST) += latency_hist.o
++obj-$(CONFIG_PREEMPT_OFF_HIST) += latency_hist.o
++obj-$(CONFIG_WAKEUP_LATENCY_HIST) += latency_hist.o
++obj-$(CONFIG_MISSED_TIMER_OFFSETS_HIST) += latency_hist.o
+ obj-$(CONFIG_NOP_TRACER) += trace_nop.o
+ obj-$(CONFIG_STACK_TRACER) += trace_stack.o
+ obj-$(CONFIG_MMIOTRACE) += trace_mmiotrace.o
+--- /dev/null
++++ b/kernel/trace/latency_hist.c
+@@ -0,0 +1,1178 @@
++/*
++ * kernel/trace/latency_hist.c
++ *
++ * Add support for histograms of preemption-off latency and
++ * interrupt-off latency and wakeup latency, it depends on
++ * Real-Time Preemption Support.
++ *
++ * Copyright (C) 2005 MontaVista Software, Inc.
++ * Yi Yang <yyang@ch.mvista.com>
++ *
++ * Converted to work with the new latency tracer.
++ * Copyright (C) 2008 Red Hat, Inc.
++ * Steven Rostedt <srostedt@redhat.com>
++ *
++ */
++#include <linux/module.h>
++#include <linux/debugfs.h>
++#include <linux/seq_file.h>
++#include <linux/percpu.h>
++#include <linux/kallsyms.h>
++#include <linux/uaccess.h>
++#include <linux/sched.h>
++#include <linux/sched/rt.h>
++#include <linux/slab.h>
++#include <linux/atomic.h>
++#include <asm/div64.h>
++
++#include "trace.h"
++#include <trace/events/sched.h>
++
++#define NSECS_PER_USECS 1000L
++
++#define CREATE_TRACE_POINTS
++#include <trace/events/hist.h>
++
++enum {
++ IRQSOFF_LATENCY = 0,
++ PREEMPTOFF_LATENCY,
++ PREEMPTIRQSOFF_LATENCY,
++ WAKEUP_LATENCY,
++ WAKEUP_LATENCY_SHAREDPRIO,
++ MISSED_TIMER_OFFSETS,
++ TIMERANDWAKEUP_LATENCY,
++ MAX_LATENCY_TYPE,
++};
++
++#define MAX_ENTRY_NUM 10240
++
++struct hist_data {
++ atomic_t hist_mode; /* 0 log, 1 don't log */
++ long offset; /* set it to MAX_ENTRY_NUM/2 for a bipolar scale */
++ long min_lat;
++ long max_lat;
++ unsigned long long below_hist_bound_samples;
++ unsigned long long above_hist_bound_samples;
++ long long accumulate_lat;
++ unsigned long long total_samples;
++ unsigned long long hist_array[MAX_ENTRY_NUM];
++};
++
++struct enable_data {
++ int latency_type;
++ int enabled;
++};
++
++static char *latency_hist_dir_root = "latency_hist";
++
++#ifdef CONFIG_INTERRUPT_OFF_HIST
++static DEFINE_PER_CPU(struct hist_data, irqsoff_hist);
++static char *irqsoff_hist_dir = "irqsoff";
++static DEFINE_PER_CPU(cycles_t, hist_irqsoff_start);
++static DEFINE_PER_CPU(int, hist_irqsoff_counting);
++#endif
++
++#ifdef CONFIG_PREEMPT_OFF_HIST
++static DEFINE_PER_CPU(struct hist_data, preemptoff_hist);
++static char *preemptoff_hist_dir = "preemptoff";
++static DEFINE_PER_CPU(cycles_t, hist_preemptoff_start);
++static DEFINE_PER_CPU(int, hist_preemptoff_counting);
++#endif
++
++#if defined(CONFIG_PREEMPT_OFF_HIST) && defined(CONFIG_INTERRUPT_OFF_HIST)
++static DEFINE_PER_CPU(struct hist_data, preemptirqsoff_hist);
++static char *preemptirqsoff_hist_dir = "preemptirqsoff";
++static DEFINE_PER_CPU(cycles_t, hist_preemptirqsoff_start);
++static DEFINE_PER_CPU(int, hist_preemptirqsoff_counting);
++#endif
++
++#if defined(CONFIG_PREEMPT_OFF_HIST) || defined(CONFIG_INTERRUPT_OFF_HIST)
++static notrace void probe_preemptirqsoff_hist(void *v, int reason, int start);
++static struct enable_data preemptirqsoff_enabled_data = {
++ .latency_type = PREEMPTIRQSOFF_LATENCY,
++ .enabled = 0,
++};
++#endif
++
++#if defined(CONFIG_WAKEUP_LATENCY_HIST) || \
++ defined(CONFIG_MISSED_TIMER_OFFSETS_HIST)
++struct maxlatproc_data {
++ char comm[FIELD_SIZEOF(struct task_struct, comm)];
++ char current_comm[FIELD_SIZEOF(struct task_struct, comm)];
++ int pid;
++ int current_pid;
++ int prio;
++ int current_prio;
++ long latency;
++ long timeroffset;
++ cycle_t timestamp;
++};
++#endif
++
++#ifdef CONFIG_WAKEUP_LATENCY_HIST
++static DEFINE_PER_CPU(struct hist_data, wakeup_latency_hist);
++static DEFINE_PER_CPU(struct hist_data, wakeup_latency_hist_sharedprio);
++static char *wakeup_latency_hist_dir = "wakeup";
++static char *wakeup_latency_hist_dir_sharedprio = "sharedprio";
++static notrace void probe_wakeup_latency_hist_start(void *v,
++ struct task_struct *p, int success);
++static notrace void probe_wakeup_latency_hist_stop(void *v,
++ struct task_struct *prev, struct task_struct *next);
++static notrace void probe_sched_migrate_task(void *,
++ struct task_struct *task, int cpu);
++static struct enable_data wakeup_latency_enabled_data = {
++ .latency_type = WAKEUP_LATENCY,
++ .enabled = 0,
++};
++static DEFINE_PER_CPU(struct maxlatproc_data, wakeup_maxlatproc);
++static DEFINE_PER_CPU(struct maxlatproc_data, wakeup_maxlatproc_sharedprio);
++static DEFINE_PER_CPU(struct task_struct *, wakeup_task);
++static DEFINE_PER_CPU(int, wakeup_sharedprio);
++static unsigned long wakeup_pid;
++#endif
++
++#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST
++static DEFINE_PER_CPU(struct hist_data, missed_timer_offsets);
++static char *missed_timer_offsets_dir = "missed_timer_offsets";
++static notrace void probe_hrtimer_interrupt(void *v, int cpu,
++ long long offset, struct task_struct *curr, struct task_struct *task);
++static struct enable_data missed_timer_offsets_enabled_data = {
++ .latency_type = MISSED_TIMER_OFFSETS,
++ .enabled = 0,
++};
++static DEFINE_PER_CPU(struct maxlatproc_data, missed_timer_offsets_maxlatproc);
++static unsigned long missed_timer_offsets_pid;
++#endif
++
++#if defined(CONFIG_WAKEUP_LATENCY_HIST) && \
++ defined(CONFIG_MISSED_TIMER_OFFSETS_HIST)
++static DEFINE_PER_CPU(struct hist_data, timerandwakeup_latency_hist);
++static char *timerandwakeup_latency_hist_dir = "timerandwakeup";
++static struct enable_data timerandwakeup_enabled_data = {
++ .latency_type = TIMERANDWAKEUP_LATENCY,
++ .enabled = 0,
++};
++static DEFINE_PER_CPU(struct maxlatproc_data, timerandwakeup_maxlatproc);
++#endif
++
++void notrace latency_hist(int latency_type, int cpu, long latency,
++ long timeroffset, cycle_t stop,
++ struct task_struct *p)
++{
++ struct hist_data *my_hist;
++#if defined(CONFIG_WAKEUP_LATENCY_HIST) || \
++ defined(CONFIG_MISSED_TIMER_OFFSETS_HIST)
++ struct maxlatproc_data *mp = NULL;
++#endif
++
++ if (!cpu_possible(cpu) || latency_type < 0 ||
++ latency_type >= MAX_LATENCY_TYPE)
++ return;
++
++ switch (latency_type) {
++#ifdef CONFIG_INTERRUPT_OFF_HIST
++ case IRQSOFF_LATENCY:
++ my_hist = &per_cpu(irqsoff_hist, cpu);
++ break;
++#endif
++#ifdef CONFIG_PREEMPT_OFF_HIST
++ case PREEMPTOFF_LATENCY:
++ my_hist = &per_cpu(preemptoff_hist, cpu);
++ break;
++#endif
++#if defined(CONFIG_PREEMPT_OFF_HIST) && defined(CONFIG_INTERRUPT_OFF_HIST)
++ case PREEMPTIRQSOFF_LATENCY:
++ my_hist = &per_cpu(preemptirqsoff_hist, cpu);
++ break;
++#endif
++#ifdef CONFIG_WAKEUP_LATENCY_HIST
++ case WAKEUP_LATENCY:
++ my_hist = &per_cpu(wakeup_latency_hist, cpu);
++ mp = &per_cpu(wakeup_maxlatproc, cpu);
++ break;
++ case WAKEUP_LATENCY_SHAREDPRIO:
++ my_hist = &per_cpu(wakeup_latency_hist_sharedprio, cpu);
++ mp = &per_cpu(wakeup_maxlatproc_sharedprio, cpu);
++ break;
++#endif
++#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST
++ case MISSED_TIMER_OFFSETS:
++ my_hist = &per_cpu(missed_timer_offsets, cpu);
++ mp = &per_cpu(missed_timer_offsets_maxlatproc, cpu);
++ break;
++#endif
++#if defined(CONFIG_WAKEUP_LATENCY_HIST) && \
++ defined(CONFIG_MISSED_TIMER_OFFSETS_HIST)
++ case TIMERANDWAKEUP_LATENCY:
++ my_hist = &per_cpu(timerandwakeup_latency_hist, cpu);
++ mp = &per_cpu(timerandwakeup_maxlatproc, cpu);
++ break;
++#endif
++
++ default:
++ return;
++ }
++
++ latency += my_hist->offset;
++
++ if (atomic_read(&my_hist->hist_mode) == 0)
++ return;
++
++ if (latency < 0 || latency >= MAX_ENTRY_NUM) {
++ if (latency < 0)
++ my_hist->below_hist_bound_samples++;
++ else
++ my_hist->above_hist_bound_samples++;
++ } else
++ my_hist->hist_array[latency]++;
++
++ if (unlikely(latency > my_hist->max_lat ||
++ my_hist->min_lat == LONG_MAX)) {
++#if defined(CONFIG_WAKEUP_LATENCY_HIST) || \
++ defined(CONFIG_MISSED_TIMER_OFFSETS_HIST)
++ if (latency_type == WAKEUP_LATENCY ||
++ latency_type == WAKEUP_LATENCY_SHAREDPRIO ||
++ latency_type == MISSED_TIMER_OFFSETS ||
++ latency_type == TIMERANDWAKEUP_LATENCY) {
++ strncpy(mp->comm, p->comm, sizeof(mp->comm));
++ strncpy(mp->current_comm, current->comm,
++ sizeof(mp->current_comm));
++ mp->pid = task_pid_nr(p);
++ mp->current_pid = task_pid_nr(current);
++ mp->prio = p->prio;
++ mp->current_prio = current->prio;
++ mp->latency = latency;
++ mp->timeroffset = timeroffset;
++ mp->timestamp = stop;
++ }
++#endif
++ my_hist->max_lat = latency;
++ }
++ if (unlikely(latency < my_hist->min_lat))
++ my_hist->min_lat = latency;
++ my_hist->total_samples++;
++ my_hist->accumulate_lat += latency;
++}
++
++static void *l_start(struct seq_file *m, loff_t *pos)
++{
++ loff_t *index_ptr = NULL;
++ loff_t index = *pos;
++ struct hist_data *my_hist = m->private;
++
++ if (index == 0) {
++ char minstr[32], avgstr[32], maxstr[32];
++
++ atomic_dec(&my_hist->hist_mode);
++
++ if (likely(my_hist->total_samples)) {
++ long avg = (long) div64_s64(my_hist->accumulate_lat,
++ my_hist->total_samples);
++ snprintf(minstr, sizeof(minstr), "%ld",
++ my_hist->min_lat - my_hist->offset);
++ snprintf(avgstr, sizeof(avgstr), "%ld",
++ avg - my_hist->offset);
++ snprintf(maxstr, sizeof(maxstr), "%ld",
++ my_hist->max_lat - my_hist->offset);
++ } else {
++ strcpy(minstr, "<undef>");
++ strcpy(avgstr, minstr);
++ strcpy(maxstr, minstr);
++ }
++
++ seq_printf(m, "#Minimum latency: %s microseconds\n"
++ "#Average latency: %s microseconds\n"
++ "#Maximum latency: %s microseconds\n"
++ "#Total samples: %llu\n"
++ "#There are %llu samples lower than %ld"
++ " microseconds.\n"
++ "#There are %llu samples greater or equal"
++ " than %ld microseconds.\n"
++ "#usecs\t%16s\n",
++ minstr, avgstr, maxstr,
++ my_hist->total_samples,
++ my_hist->below_hist_bound_samples,
++ -my_hist->offset,
++ my_hist->above_hist_bound_samples,
++ MAX_ENTRY_NUM - my_hist->offset,
++ "samples");
++ }
++ if (index < MAX_ENTRY_NUM) {
++ index_ptr = kmalloc(sizeof(loff_t), GFP_KERNEL);
++ if (index_ptr)
++ *index_ptr = index;
++ }
++
++ return index_ptr;
++}
++
++static void *l_next(struct seq_file *m, void *p, loff_t *pos)
++{
++ loff_t *index_ptr = p;
++ struct hist_data *my_hist = m->private;
++
++ if (++*pos >= MAX_ENTRY_NUM) {
++ atomic_inc(&my_hist->hist_mode);
++ return NULL;
++ }
++ *index_ptr = *pos;
++ return index_ptr;
++}
++
++static void l_stop(struct seq_file *m, void *p)
++{
++ kfree(p);
++}
++
++static int l_show(struct seq_file *m, void *p)
++{
++ int index = *(loff_t *) p;
++ struct hist_data *my_hist = m->private;
++
++ seq_printf(m, "%6ld\t%16llu\n", index - my_hist->offset,
++ my_hist->hist_array[index]);
++ return 0;
++}
++
++static const struct seq_operations latency_hist_seq_op = {
++ .start = l_start,
++ .next = l_next,
++ .stop = l_stop,
++ .show = l_show
++};
++
++static int latency_hist_open(struct inode *inode, struct file *file)
++{
++ int ret;
++
++ ret = seq_open(file, &latency_hist_seq_op);
++ if (!ret) {
++ struct seq_file *seq = file->private_data;
++ seq->private = inode->i_private;
++ }
++ return ret;
++}
++
++static const struct file_operations latency_hist_fops = {
++ .open = latency_hist_open,
++ .read = seq_read,
++ .llseek = seq_lseek,
++ .release = seq_release,
++};
++
++#if defined(CONFIG_WAKEUP_LATENCY_HIST) || \
++ defined(CONFIG_MISSED_TIMER_OFFSETS_HIST)
++static void clear_maxlatprocdata(struct maxlatproc_data *mp)
++{
++ mp->comm[0] = mp->current_comm[0] = '\0';
++ mp->prio = mp->current_prio = mp->pid = mp->current_pid =
++ mp->latency = mp->timeroffset = -1;
++ mp->timestamp = 0;
++}
++#endif
++
++static void hist_reset(struct hist_data *hist)
++{
++ atomic_dec(&hist->hist_mode);
++
++ memset(hist->hist_array, 0, sizeof(hist->hist_array));
++ hist->below_hist_bound_samples = 0ULL;
++ hist->above_hist_bound_samples = 0ULL;
++ hist->min_lat = LONG_MAX;
++ hist->max_lat = LONG_MIN;
++ hist->total_samples = 0ULL;
++ hist->accumulate_lat = 0LL;
++
++ atomic_inc(&hist->hist_mode);
++}
++
++static ssize_t
++latency_hist_reset(struct file *file, const char __user *a,
++ size_t size, loff_t *off)
++{
++ int cpu;
++ struct hist_data *hist = NULL;
++#if defined(CONFIG_WAKEUP_LATENCY_HIST) || \
++ defined(CONFIG_MISSED_TIMER_OFFSETS_HIST)
++ struct maxlatproc_data *mp = NULL;
++#endif
++ off_t latency_type = (off_t) file->private_data;
++
++ for_each_online_cpu(cpu) {
++
++ switch (latency_type) {
++#ifdef CONFIG_PREEMPT_OFF_HIST
++ case PREEMPTOFF_LATENCY:
++ hist = &per_cpu(preemptoff_hist, cpu);
++ break;
++#endif
++#ifdef CONFIG_INTERRUPT_OFF_HIST
++ case IRQSOFF_LATENCY:
++ hist = &per_cpu(irqsoff_hist, cpu);
++ break;
++#endif
++#if defined(CONFIG_INTERRUPT_OFF_HIST) && defined(CONFIG_PREEMPT_OFF_HIST)
++ case PREEMPTIRQSOFF_LATENCY:
++ hist = &per_cpu(preemptirqsoff_hist, cpu);
++ break;
++#endif
++#ifdef CONFIG_WAKEUP_LATENCY_HIST
++ case WAKEUP_LATENCY:
++ hist = &per_cpu(wakeup_latency_hist, cpu);
++ mp = &per_cpu(wakeup_maxlatproc, cpu);
++ break;
++ case WAKEUP_LATENCY_SHAREDPRIO:
++ hist = &per_cpu(wakeup_latency_hist_sharedprio, cpu);
++ mp = &per_cpu(wakeup_maxlatproc_sharedprio, cpu);
++ break;
++#endif
++#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST
++ case MISSED_TIMER_OFFSETS:
++ hist = &per_cpu(missed_timer_offsets, cpu);
++ mp = &per_cpu(missed_timer_offsets_maxlatproc, cpu);
++ break;
++#endif
++#if defined(CONFIG_WAKEUP_LATENCY_HIST) && \
++ defined(CONFIG_MISSED_TIMER_OFFSETS_HIST)
++ case TIMERANDWAKEUP_LATENCY:
++ hist = &per_cpu(timerandwakeup_latency_hist, cpu);
++ mp = &per_cpu(timerandwakeup_maxlatproc, cpu);
++ break;
++#endif
++ }
++
++ hist_reset(hist);
++#if defined(CONFIG_WAKEUP_LATENCY_HIST) || \
++ defined(CONFIG_MISSED_TIMER_OFFSETS_HIST)
++ if (latency_type == WAKEUP_LATENCY ||
++ latency_type == WAKEUP_LATENCY_SHAREDPRIO ||
++ latency_type == MISSED_TIMER_OFFSETS ||
++ latency_type == TIMERANDWAKEUP_LATENCY)
++ clear_maxlatprocdata(mp);
++#endif
++ }
++
++ return size;
++}
++
++#if defined(CONFIG_WAKEUP_LATENCY_HIST) || \
++ defined(CONFIG_MISSED_TIMER_OFFSETS_HIST)
++static ssize_t
++show_pid(struct file *file, char __user *ubuf, size_t cnt, loff_t *ppos)
++{
++ char buf[64];
++ int r;
++ unsigned long *this_pid = file->private_data;
++
++ r = snprintf(buf, sizeof(buf), "%lu\n", *this_pid);
++ return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
++}
++
++static ssize_t do_pid(struct file *file, const char __user *ubuf,
++ size_t cnt, loff_t *ppos)
++{
++ char buf[64];
++ unsigned long pid;
++ unsigned long *this_pid = file->private_data;
++
++ if (cnt >= sizeof(buf))
++ return -EINVAL;
++
++ if (copy_from_user(&buf, ubuf, cnt))
++ return -EFAULT;
++
++ buf[cnt] = '\0';
++
++ if (kstrtoul(buf, 10, &pid))
++ return -EINVAL;
++
++ *this_pid = pid;
++
++ return cnt;
++}
++#endif
++
++#if defined(CONFIG_WAKEUP_LATENCY_HIST) || \
++ defined(CONFIG_MISSED_TIMER_OFFSETS_HIST)
++static ssize_t
++show_maxlatproc(struct file *file, char __user *ubuf, size_t cnt, loff_t *ppos)
++{
++ int r;
++ struct maxlatproc_data *mp = file->private_data;
++ int strmaxlen = (TASK_COMM_LEN * 2) + (8 * 8);
++ unsigned long long t;
++ unsigned long usecs, secs;
++ char *buf;
++
++ if (mp->pid == -1 || mp->current_pid == -1) {
++ buf = "(none)\n";
++ return simple_read_from_buffer(ubuf, cnt, ppos, buf,
++ strlen(buf));
++ }
++
++ buf = kmalloc(strmaxlen, GFP_KERNEL);
++ if (buf == NULL)
++ return -ENOMEM;
++
++ t = ns2usecs(mp->timestamp);
++ usecs = do_div(t, USEC_PER_SEC);
++ secs = (unsigned long) t;
++ r = snprintf(buf, strmaxlen,
++ "%d %d %ld (%ld) %s <- %d %d %s %lu.%06lu\n", mp->pid,
++ MAX_RT_PRIO-1 - mp->prio, mp->latency, mp->timeroffset, mp->comm,
++ mp->current_pid, MAX_RT_PRIO-1 - mp->current_prio, mp->current_comm,
++ secs, usecs);
++ r = simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
++ kfree(buf);
++ return r;
++}
++#endif
++
++static ssize_t
++show_enable(struct file *file, char __user *ubuf, size_t cnt, loff_t *ppos)
++{
++ char buf[64];
++ struct enable_data *ed = file->private_data;
++ int r;
++
++ r = snprintf(buf, sizeof(buf), "%d\n", ed->enabled);
++ return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
++}
++
++static ssize_t
++do_enable(struct file *file, const char __user *ubuf, size_t cnt, loff_t *ppos)
++{
++ char buf[64];
++ long enable;
++ struct enable_data *ed = file->private_data;
++
++ if (cnt >= sizeof(buf))
++ return -EINVAL;
++
++ if (copy_from_user(&buf, ubuf, cnt))
++ return -EFAULT;
++
++ buf[cnt] = 0;
++
++ if (kstrtoul(buf, 10, &enable))
++ return -EINVAL;
++
++ if ((enable && ed->enabled) || (!enable && !ed->enabled))
++ return cnt;
++
++ if (enable) {
++ int ret;
++
++ switch (ed->latency_type) {
++#if defined(CONFIG_INTERRUPT_OFF_HIST) || defined(CONFIG_PREEMPT_OFF_HIST)
++ case PREEMPTIRQSOFF_LATENCY:
++ ret = register_trace_preemptirqsoff_hist(
++ probe_preemptirqsoff_hist, NULL);
++ if (ret) {
++ pr_info("wakeup trace: Couldn't assign "
++ "probe_preemptirqsoff_hist "
++ "to trace_preemptirqsoff_hist\n");
++ return ret;
++ }
++ break;
++#endif
++#ifdef CONFIG_WAKEUP_LATENCY_HIST
++ case WAKEUP_LATENCY:
++ ret = register_trace_sched_wakeup(
++ probe_wakeup_latency_hist_start, NULL);
++ if (ret) {
++ pr_info("wakeup trace: Couldn't assign "
++ "probe_wakeup_latency_hist_start "
++ "to trace_sched_wakeup\n");
++ return ret;
++ }
++ ret = register_trace_sched_wakeup_new(
++ probe_wakeup_latency_hist_start, NULL);
++ if (ret) {
++ pr_info("wakeup trace: Couldn't assign "
++ "probe_wakeup_latency_hist_start "
++ "to trace_sched_wakeup_new\n");
++ unregister_trace_sched_wakeup(
++ probe_wakeup_latency_hist_start, NULL);
++ return ret;
++ }
++ ret = register_trace_sched_switch(
++ probe_wakeup_latency_hist_stop, NULL);
++ if (ret) {
++ pr_info("wakeup trace: Couldn't assign "
++ "probe_wakeup_latency_hist_stop "
++ "to trace_sched_switch\n");
++ unregister_trace_sched_wakeup(
++ probe_wakeup_latency_hist_start, NULL);
++ unregister_trace_sched_wakeup_new(
++ probe_wakeup_latency_hist_start, NULL);
++ return ret;
++ }
++ ret = register_trace_sched_migrate_task(
++ probe_sched_migrate_task, NULL);
++ if (ret) {
++ pr_info("wakeup trace: Couldn't assign "
++ "probe_sched_migrate_task "
++ "to trace_sched_migrate_task\n");
++ unregister_trace_sched_wakeup(
++ probe_wakeup_latency_hist_start, NULL);
++ unregister_trace_sched_wakeup_new(
++ probe_wakeup_latency_hist_start, NULL);
++ unregister_trace_sched_switch(
++ probe_wakeup_latency_hist_stop, NULL);
++ return ret;
++ }
++ break;
++#endif
++#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST
++ case MISSED_TIMER_OFFSETS:
++ ret = register_trace_hrtimer_interrupt(
++ probe_hrtimer_interrupt, NULL);
++ if (ret) {
++ pr_info("wakeup trace: Couldn't assign "
++ "probe_hrtimer_interrupt "
++ "to trace_hrtimer_interrupt\n");
++ return ret;
++ }
++ break;
++#endif
++#if defined(CONFIG_WAKEUP_LATENCY_HIST) && \
++ defined(CONFIG_MISSED_TIMER_OFFSETS_HIST)
++ case TIMERANDWAKEUP_LATENCY:
++ if (!wakeup_latency_enabled_data.enabled ||
++ !missed_timer_offsets_enabled_data.enabled)
++ return -EINVAL;
++ break;
++#endif
++ default:
++ break;
++ }
++ } else {
++ switch (ed->latency_type) {
++#if defined(CONFIG_INTERRUPT_OFF_HIST) || defined(CONFIG_PREEMPT_OFF_HIST)
++ case PREEMPTIRQSOFF_LATENCY:
++ {
++ int cpu;
++
++ unregister_trace_preemptirqsoff_hist(
++ probe_preemptirqsoff_hist, NULL);
++ for_each_online_cpu(cpu) {
++#ifdef CONFIG_INTERRUPT_OFF_HIST
++ per_cpu(hist_irqsoff_counting,
++ cpu) = 0;
++#endif
++#ifdef CONFIG_PREEMPT_OFF_HIST
++ per_cpu(hist_preemptoff_counting,
++ cpu) = 0;
++#endif
++#if defined(CONFIG_INTERRUPT_OFF_HIST) && defined(CONFIG_PREEMPT_OFF_HIST)
++ per_cpu(hist_preemptirqsoff_counting,
++ cpu) = 0;
++#endif
++ }
++ }
++ break;
++#endif
++#ifdef CONFIG_WAKEUP_LATENCY_HIST
++ case WAKEUP_LATENCY:
++ {
++ int cpu;
++
++ unregister_trace_sched_wakeup(
++ probe_wakeup_latency_hist_start, NULL);
++ unregister_trace_sched_wakeup_new(
++ probe_wakeup_latency_hist_start, NULL);
++ unregister_trace_sched_switch(
++ probe_wakeup_latency_hist_stop, NULL);
++ unregister_trace_sched_migrate_task(
++ probe_sched_migrate_task, NULL);
++
++ for_each_online_cpu(cpu) {
++ per_cpu(wakeup_task, cpu) = NULL;
++ per_cpu(wakeup_sharedprio, cpu) = 0;
++ }
++ }
++#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST
++ timerandwakeup_enabled_data.enabled = 0;
++#endif
++ break;
++#endif
++#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST
++ case MISSED_TIMER_OFFSETS:
++ unregister_trace_hrtimer_interrupt(
++ probe_hrtimer_interrupt, NULL);
++#ifdef CONFIG_WAKEUP_LATENCY_HIST
++ timerandwakeup_enabled_data.enabled = 0;
++#endif
++ break;
++#endif
++ default:
++ break;
++ }
++ }
++ ed->enabled = enable;
++ return cnt;
++}
++
++static const struct file_operations latency_hist_reset_fops = {
++ .open = tracing_open_generic,
++ .write = latency_hist_reset,
++};
++
++static const struct file_operations enable_fops = {
++ .open = tracing_open_generic,
++ .read = show_enable,
++ .write = do_enable,
++};
++
++#if defined(CONFIG_WAKEUP_LATENCY_HIST) || \
++ defined(CONFIG_MISSED_TIMER_OFFSETS_HIST)
++static const struct file_operations pid_fops = {
++ .open = tracing_open_generic,
++ .read = show_pid,
++ .write = do_pid,
++};
++
++static const struct file_operations maxlatproc_fops = {
++ .open = tracing_open_generic,
++ .read = show_maxlatproc,
++};
++#endif
++
++#if defined(CONFIG_INTERRUPT_OFF_HIST) || defined(CONFIG_PREEMPT_OFF_HIST)
++static notrace void probe_preemptirqsoff_hist(void *v, int reason,
++ int starthist)
++{
++ int cpu = raw_smp_processor_id();
++ int time_set = 0;
++
++ if (starthist) {
++ cycle_t uninitialized_var(start);
++
++ if (!preempt_count() && !irqs_disabled())
++ return;
++
++#ifdef CONFIG_INTERRUPT_OFF_HIST
++ if ((reason == IRQS_OFF || reason == TRACE_START) &&
++ !per_cpu(hist_irqsoff_counting, cpu)) {
++ per_cpu(hist_irqsoff_counting, cpu) = 1;
++ start = ftrace_now(cpu);
++ time_set++;
++ per_cpu(hist_irqsoff_start, cpu) = start;
++ }
++#endif
++
++#ifdef CONFIG_PREEMPT_OFF_HIST
++ if ((reason == PREEMPT_OFF || reason == TRACE_START) &&
++ !per_cpu(hist_preemptoff_counting, cpu)) {
++ per_cpu(hist_preemptoff_counting, cpu) = 1;
++ if (!(time_set++))
++ start = ftrace_now(cpu);
++ per_cpu(hist_preemptoff_start, cpu) = start;
++ }
++#endif
++
++#if defined(CONFIG_INTERRUPT_OFF_HIST) && defined(CONFIG_PREEMPT_OFF_HIST)
++ if (per_cpu(hist_irqsoff_counting, cpu) &&
++ per_cpu(hist_preemptoff_counting, cpu) &&
++ !per_cpu(hist_preemptirqsoff_counting, cpu)) {
++ per_cpu(hist_preemptirqsoff_counting, cpu) = 1;
++ if (!time_set)
++ start = ftrace_now(cpu);
++ per_cpu(hist_preemptirqsoff_start, cpu) = start;
++ }
++#endif
++ } else {
++ cycle_t uninitialized_var(stop);
++
++#ifdef CONFIG_INTERRUPT_OFF_HIST
++ if ((reason == IRQS_ON || reason == TRACE_STOP) &&
++ per_cpu(hist_irqsoff_counting, cpu)) {
++ cycle_t start = per_cpu(hist_irqsoff_start, cpu);
++
++ stop = ftrace_now(cpu);
++ time_set++;
++ if (start) {
++ long latency = ((long) (stop - start)) /
++ NSECS_PER_USECS;
++
++ latency_hist(IRQSOFF_LATENCY, cpu, latency, 0,
++ stop, NULL);
++ }
++ per_cpu(hist_irqsoff_counting, cpu) = 0;
++ }
++#endif
++
++#ifdef CONFIG_PREEMPT_OFF_HIST
++ if ((reason == PREEMPT_ON || reason == TRACE_STOP) &&
++ per_cpu(hist_preemptoff_counting, cpu)) {
++ cycle_t start = per_cpu(hist_preemptoff_start, cpu);
++
++ if (!(time_set++))
++ stop = ftrace_now(cpu);
++ if (start) {
++ long latency = ((long) (stop - start)) /
++ NSECS_PER_USECS;
++
++ latency_hist(PREEMPTOFF_LATENCY, cpu, latency,
++ 0, stop, NULL);
++ }
++ per_cpu(hist_preemptoff_counting, cpu) = 0;
++ }
++#endif
++
++#if defined(CONFIG_INTERRUPT_OFF_HIST) && defined(CONFIG_PREEMPT_OFF_HIST)
++ if ((!per_cpu(hist_irqsoff_counting, cpu) ||
++ !per_cpu(hist_preemptoff_counting, cpu)) &&
++ per_cpu(hist_preemptirqsoff_counting, cpu)) {
++ cycle_t start = per_cpu(hist_preemptirqsoff_start, cpu);
++
++ if (!time_set)
++ stop = ftrace_now(cpu);
++ if (start) {
++ long latency = ((long) (stop - start)) /
++ NSECS_PER_USECS;
++
++ latency_hist(PREEMPTIRQSOFF_LATENCY, cpu,
++ latency, 0, stop, NULL);
++ }
++ per_cpu(hist_preemptirqsoff_counting, cpu) = 0;
++ }
++#endif
++ }
++}
++#endif
++
++#ifdef CONFIG_WAKEUP_LATENCY_HIST
++static DEFINE_RAW_SPINLOCK(wakeup_lock);
++static notrace void probe_sched_migrate_task(void *v, struct task_struct *task,
++ int cpu)
++{
++ int old_cpu = task_cpu(task);
++
++ if (cpu != old_cpu) {
++ unsigned long flags;
++ struct task_struct *cpu_wakeup_task;
++
++ raw_spin_lock_irqsave(&wakeup_lock, flags);
++
++ cpu_wakeup_task = per_cpu(wakeup_task, old_cpu);
++ if (task == cpu_wakeup_task) {
++ put_task_struct(cpu_wakeup_task);
++ per_cpu(wakeup_task, old_cpu) = NULL;
++ cpu_wakeup_task = per_cpu(wakeup_task, cpu) = task;
++ get_task_struct(cpu_wakeup_task);
++ }
++
++ raw_spin_unlock_irqrestore(&wakeup_lock, flags);
++ }
++}
++
++static notrace void probe_wakeup_latency_hist_start(void *v,
++ struct task_struct *p, int success)
++{
++ unsigned long flags;
++ struct task_struct *curr = current;
++ int cpu = task_cpu(p);
++ struct task_struct *cpu_wakeup_task;
++
++ raw_spin_lock_irqsave(&wakeup_lock, flags);
++
++ cpu_wakeup_task = per_cpu(wakeup_task, cpu);
++
++ if (wakeup_pid) {
++ if ((cpu_wakeup_task && p->prio == cpu_wakeup_task->prio) ||
++ p->prio == curr->prio)
++ per_cpu(wakeup_sharedprio, cpu) = 1;
++ if (likely(wakeup_pid != task_pid_nr(p)))
++ goto out;
++ } else {
++ if (likely(!rt_task(p)) ||
++ (cpu_wakeup_task && p->prio > cpu_wakeup_task->prio) ||
++ p->prio > curr->prio)
++ goto out;
++ if ((cpu_wakeup_task && p->prio == cpu_wakeup_task->prio) ||
++ p->prio == curr->prio)
++ per_cpu(wakeup_sharedprio, cpu) = 1;
++ }
++
++ if (cpu_wakeup_task)
++ put_task_struct(cpu_wakeup_task);
++ cpu_wakeup_task = per_cpu(wakeup_task, cpu) = p;
++ get_task_struct(cpu_wakeup_task);
++ cpu_wakeup_task->preempt_timestamp_hist =
++ ftrace_now(raw_smp_processor_id());
++out:
++ raw_spin_unlock_irqrestore(&wakeup_lock, flags);
++}
++
++static notrace void probe_wakeup_latency_hist_stop(void *v,
++ struct task_struct *prev, struct task_struct *next)
++{
++ unsigned long flags;
++ int cpu = task_cpu(next);
++ long latency;
++ cycle_t stop;
++ struct task_struct *cpu_wakeup_task;
++
++ raw_spin_lock_irqsave(&wakeup_lock, flags);
++
++ cpu_wakeup_task = per_cpu(wakeup_task, cpu);
++
++ if (cpu_wakeup_task == NULL)
++ goto out;
++
++ /* Already running? */
++ if (unlikely(current == cpu_wakeup_task))
++ goto out_reset;
++
++ if (next != cpu_wakeup_task) {
++ if (next->prio < cpu_wakeup_task->prio)
++ goto out_reset;
++
++ if (next->prio == cpu_wakeup_task->prio)
++ per_cpu(wakeup_sharedprio, cpu) = 1;
++
++ goto out;
++ }
++
++ if (current->prio == cpu_wakeup_task->prio)
++ per_cpu(wakeup_sharedprio, cpu) = 1;
++
++ /*
++ * The task we are waiting for is about to be switched to.
++ * Calculate latency and store it in histogram.
++ */
++ stop = ftrace_now(raw_smp_processor_id());
++
++ latency = ((long) (stop - next->preempt_timestamp_hist)) /
++ NSECS_PER_USECS;
++
++ if (per_cpu(wakeup_sharedprio, cpu)) {
++ latency_hist(WAKEUP_LATENCY_SHAREDPRIO, cpu, latency, 0, stop,
++ next);
++ per_cpu(wakeup_sharedprio, cpu) = 0;
++ } else {
++ latency_hist(WAKEUP_LATENCY, cpu, latency, 0, stop, next);
++#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST
++ if (timerandwakeup_enabled_data.enabled) {
++ latency_hist(TIMERANDWAKEUP_LATENCY, cpu,
++ next->timer_offset + latency, next->timer_offset,
++ stop, next);
++ }
++#endif
++ }
++
++out_reset:
++#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST
++ next->timer_offset = 0;
++#endif
++ put_task_struct(cpu_wakeup_task);
++ per_cpu(wakeup_task, cpu) = NULL;
++out:
++ raw_spin_unlock_irqrestore(&wakeup_lock, flags);
++}
++#endif
++
++#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST
++static notrace void probe_hrtimer_interrupt(void *v, int cpu,
++ long long latency_ns, struct task_struct *curr,
++ struct task_struct *task)
++{
++ if (latency_ns <= 0 && task != NULL && rt_task(task) &&
++ (task->prio < curr->prio ||
++ (task->prio == curr->prio &&
++ !cpumask_test_cpu(cpu, &task->cpus_allowed)))) {
++ long latency;
++ cycle_t now;
++
++ if (missed_timer_offsets_pid) {
++ if (likely(missed_timer_offsets_pid !=
++ task_pid_nr(task)))
++ return;
++ }
++
++ now = ftrace_now(cpu);
++ latency = (long) div_s64(-latency_ns, NSECS_PER_USECS);
++ latency_hist(MISSED_TIMER_OFFSETS, cpu, latency, latency, now,
++ task);
++#ifdef CONFIG_WAKEUP_LATENCY_HIST
++ task->timer_offset = latency;
++#endif
++ }
++}
++#endif
++
++static __init int latency_hist_init(void)
++{
++ struct dentry *latency_hist_root = NULL;
++ struct dentry *dentry;
++#ifdef CONFIG_WAKEUP_LATENCY_HIST
++ struct dentry *dentry_sharedprio;
++#endif
++ struct dentry *entry;
++ struct dentry *enable_root;
++ int i = 0;
++ struct hist_data *my_hist;
++ char name[64];
++ char *cpufmt = "CPU%d";
++#if defined(CONFIG_WAKEUP_LATENCY_HIST) || \
++ defined(CONFIG_MISSED_TIMER_OFFSETS_HIST)
++ char *cpufmt_maxlatproc = "max_latency-CPU%d";
++ struct maxlatproc_data *mp = NULL;
++#endif
++
++ dentry = tracing_init_dentry();
++ latency_hist_root = debugfs_create_dir(latency_hist_dir_root, dentry);
++ enable_root = debugfs_create_dir("enable", latency_hist_root);
++
++#ifdef CONFIG_INTERRUPT_OFF_HIST
++ dentry = debugfs_create_dir(irqsoff_hist_dir, latency_hist_root);
++ for_each_possible_cpu(i) {
++ sprintf(name, cpufmt, i);
++ entry = debugfs_create_file(name, 0444, dentry,
++ &per_cpu(irqsoff_hist, i), &latency_hist_fops);
++ my_hist = &per_cpu(irqsoff_hist, i);
++ atomic_set(&my_hist->hist_mode, 1);
++ my_hist->min_lat = LONG_MAX;
++ }
++ entry = debugfs_create_file("reset", 0644, dentry,
++ (void *)IRQSOFF_LATENCY, &latency_hist_reset_fops);
++#endif
++
++#ifdef CONFIG_PREEMPT_OFF_HIST
++ dentry = debugfs_create_dir(preemptoff_hist_dir,
++ latency_hist_root);
++ for_each_possible_cpu(i) {
++ sprintf(name, cpufmt, i);
++ entry = debugfs_create_file(name, 0444, dentry,
++ &per_cpu(preemptoff_hist, i), &latency_hist_fops);
++ my_hist = &per_cpu(preemptoff_hist, i);
++ atomic_set(&my_hist->hist_mode, 1);
++ my_hist->min_lat = LONG_MAX;
++ }
++ entry = debugfs_create_file("reset", 0644, dentry,
++ (void *)PREEMPTOFF_LATENCY, &latency_hist_reset_fops);
++#endif
++
++#if defined(CONFIG_INTERRUPT_OFF_HIST) && defined(CONFIG_PREEMPT_OFF_HIST)
++ dentry = debugfs_create_dir(preemptirqsoff_hist_dir,
++ latency_hist_root);
++ for_each_possible_cpu(i) {
++ sprintf(name, cpufmt, i);
++ entry = debugfs_create_file(name, 0444, dentry,
++ &per_cpu(preemptirqsoff_hist, i), &latency_hist_fops);
++ my_hist = &per_cpu(preemptirqsoff_hist, i);
++ atomic_set(&my_hist->hist_mode, 1);
++ my_hist->min_lat = LONG_MAX;
++ }
++ entry = debugfs_create_file("reset", 0644, dentry,
++ (void *)PREEMPTIRQSOFF_LATENCY, &latency_hist_reset_fops);
++#endif
++
++#if defined(CONFIG_INTERRUPT_OFF_HIST) || defined(CONFIG_PREEMPT_OFF_HIST)
++ entry = debugfs_create_file("preemptirqsoff", 0644,
++ enable_root, (void *)&preemptirqsoff_enabled_data,
++ &enable_fops);
++#endif
++
++#ifdef CONFIG_WAKEUP_LATENCY_HIST
++ dentry = debugfs_create_dir(wakeup_latency_hist_dir,
++ latency_hist_root);
++ dentry_sharedprio = debugfs_create_dir(
++ wakeup_latency_hist_dir_sharedprio, dentry);
++ for_each_possible_cpu(i) {
++ sprintf(name, cpufmt, i);
++
++ entry = debugfs_create_file(name, 0444, dentry,
++ &per_cpu(wakeup_latency_hist, i),
++ &latency_hist_fops);
++ my_hist = &per_cpu(wakeup_latency_hist, i);
++ atomic_set(&my_hist->hist_mode, 1);
++ my_hist->min_lat = LONG_MAX;
++
++ entry = debugfs_create_file(name, 0444, dentry_sharedprio,
++ &per_cpu(wakeup_latency_hist_sharedprio, i),
++ &latency_hist_fops);
++ my_hist = &per_cpu(wakeup_latency_hist_sharedprio, i);
++ atomic_set(&my_hist->hist_mode, 1);
++ my_hist->min_lat = LONG_MAX;
++
++ sprintf(name, cpufmt_maxlatproc, i);
++
++ mp = &per_cpu(wakeup_maxlatproc, i);
++ entry = debugfs_create_file(name, 0444, dentry, mp,
++ &maxlatproc_fops);
++ clear_maxlatprocdata(mp);
++
++ mp = &per_cpu(wakeup_maxlatproc_sharedprio, i);
++ entry = debugfs_create_file(name, 0444, dentry_sharedprio, mp,
++ &maxlatproc_fops);
++ clear_maxlatprocdata(mp);
++ }
++ entry = debugfs_create_file("pid", 0644, dentry,
++ (void *)&wakeup_pid, &pid_fops);
++ entry = debugfs_create_file("reset", 0644, dentry,
++ (void *)WAKEUP_LATENCY, &latency_hist_reset_fops);
++ entry = debugfs_create_file("reset", 0644, dentry_sharedprio,
++ (void *)WAKEUP_LATENCY_SHAREDPRIO, &latency_hist_reset_fops);
++ entry = debugfs_create_file("wakeup", 0644,
++ enable_root, (void *)&wakeup_latency_enabled_data,
++ &enable_fops);
++#endif
++
++#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST
++ dentry = debugfs_create_dir(missed_timer_offsets_dir,
++ latency_hist_root);
++ for_each_possible_cpu(i) {
++ sprintf(name, cpufmt, i);
++ entry = debugfs_create_file(name, 0444, dentry,
++ &per_cpu(missed_timer_offsets, i), &latency_hist_fops);
++ my_hist = &per_cpu(missed_timer_offsets, i);
++ atomic_set(&my_hist->hist_mode, 1);
++ my_hist->min_lat = LONG_MAX;
++
++ sprintf(name, cpufmt_maxlatproc, i);
++ mp = &per_cpu(missed_timer_offsets_maxlatproc, i);
++ entry = debugfs_create_file(name, 0444, dentry, mp,
++ &maxlatproc_fops);
++ clear_maxlatprocdata(mp);
++ }
++ entry = debugfs_create_file("pid", 0644, dentry,
++ (void *)&missed_timer_offsets_pid, &pid_fops);
++ entry = debugfs_create_file("reset", 0644, dentry,
++ (void *)MISSED_TIMER_OFFSETS, &latency_hist_reset_fops);
++ entry = debugfs_create_file("missed_timer_offsets", 0644,
++ enable_root, (void *)&missed_timer_offsets_enabled_data,
++ &enable_fops);
++#endif
++
++#if defined(CONFIG_WAKEUP_LATENCY_HIST) && \
++ defined(CONFIG_MISSED_TIMER_OFFSETS_HIST)
++ dentry = debugfs_create_dir(timerandwakeup_latency_hist_dir,
++ latency_hist_root);
++ for_each_possible_cpu(i) {
++ sprintf(name, cpufmt, i);
++ entry = debugfs_create_file(name, 0444, dentry,
++ &per_cpu(timerandwakeup_latency_hist, i),
++ &latency_hist_fops);
++ my_hist = &per_cpu(timerandwakeup_latency_hist, i);
++ atomic_set(&my_hist->hist_mode, 1);
++ my_hist->min_lat = LONG_MAX;
++
++ sprintf(name, cpufmt_maxlatproc, i);
++ mp = &per_cpu(timerandwakeup_maxlatproc, i);
++ entry = debugfs_create_file(name, 0444, dentry, mp,
++ &maxlatproc_fops);
++ clear_maxlatprocdata(mp);
++ }
++ entry = debugfs_create_file("reset", 0644, dentry,
++ (void *)TIMERANDWAKEUP_LATENCY, &latency_hist_reset_fops);
++ entry = debugfs_create_file("timerandwakeup", 0644,
++ enable_root, (void *)&timerandwakeup_enabled_data,
++ &enable_fops);
++#endif
++ return 0;
++}
++
++device_initcall(latency_hist_init);
+--- a/kernel/trace/trace_irqsoff.c
++++ b/kernel/trace/trace_irqsoff.c
+@@ -13,6 +13,7 @@
+ #include <linux/uaccess.h>
+ #include <linux/module.h>
+ #include <linux/ftrace.h>
++#include <trace/events/hist.h>
+
+ #include "trace.h"
+
+@@ -433,11 +434,13 @@ void start_critical_timings(void)
+ {
+ if (preempt_trace() || irq_trace())
+ start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
++ trace_preemptirqsoff_hist(TRACE_START, 1);
+ }
+ EXPORT_SYMBOL_GPL(start_critical_timings);
+
+ void stop_critical_timings(void)
+ {
++ trace_preemptirqsoff_hist(TRACE_STOP, 0);
+ if (preempt_trace() || irq_trace())
+ stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
+ }
+@@ -447,6 +450,7 @@ EXPORT_SYMBOL_GPL(stop_critical_timings)
+ #ifdef CONFIG_PROVE_LOCKING
+ void time_hardirqs_on(unsigned long a0, unsigned long a1)
+ {
++ trace_preemptirqsoff_hist(IRQS_ON, 0);
+ if (!preempt_trace() && irq_trace())
+ stop_critical_timing(a0, a1);
+ }
+@@ -455,6 +459,7 @@ void time_hardirqs_off(unsigned long a0,
+ {
+ if (!preempt_trace() && irq_trace())
+ start_critical_timing(a0, a1);
++ trace_preemptirqsoff_hist(IRQS_OFF, 1);
+ }
+
+ #else /* !CONFIG_PROVE_LOCKING */
+@@ -480,6 +485,7 @@ inline void print_irqtrace_events(struct
+ */
+ void trace_hardirqs_on(void)
+ {
++ trace_preemptirqsoff_hist(IRQS_ON, 0);
+ if (!preempt_trace() && irq_trace())
+ stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
+ }
+@@ -489,11 +495,13 @@ void trace_hardirqs_off(void)
+ {
+ if (!preempt_trace() && irq_trace())
+ start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
++ trace_preemptirqsoff_hist(IRQS_OFF, 1);
+ }
+ EXPORT_SYMBOL(trace_hardirqs_off);
+
+ __visible void trace_hardirqs_on_caller(unsigned long caller_addr)
+ {
++ trace_preemptirqsoff_hist(IRQS_ON, 0);
+ if (!preempt_trace() && irq_trace())
+ stop_critical_timing(CALLER_ADDR0, caller_addr);
+ }
+@@ -503,6 +511,7 @@ EXPORT_SYMBOL(trace_hardirqs_on_caller);
+ {
+ if (!preempt_trace() && irq_trace())
+ start_critical_timing(CALLER_ADDR0, caller_addr);
++ trace_preemptirqsoff_hist(IRQS_OFF, 1);
+ }
+ EXPORT_SYMBOL(trace_hardirqs_off_caller);
+
+@@ -512,12 +521,14 @@ EXPORT_SYMBOL(trace_hardirqs_off_caller)
+ #ifdef CONFIG_PREEMPT_TRACER
+ void trace_preempt_on(unsigned long a0, unsigned long a1)
+ {
++ trace_preemptirqsoff_hist(PREEMPT_ON, 0);
+ if (preempt_trace() && !irq_trace())
+ stop_critical_timing(a0, a1);
+ }
+
+ void trace_preempt_off(unsigned long a0, unsigned long a1)
+ {
++ trace_preemptirqsoff_hist(PREEMPT_OFF, 1);
+ if (preempt_trace() && !irq_trace())
+ start_critical_timing(a0, a1);
+ }
diff --git a/patches/leds-trigger-disable-CPU-trigger-on-RT.patch b/patches/leds-trigger-disable-CPU-trigger-on-RT.patch
new file mode 100644
index 00000000000000..b275ed2a2c1296
--- /dev/null
+++ b/patches/leds-trigger-disable-CPU-trigger-on-RT.patch
@@ -0,0 +1,35 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Thu, 23 Jan 2014 14:45:59 +0100
+Subject: leds: trigger: disable CPU trigger on -RT
+
+as it triggers:
+|CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.8-rt10 #141
+|[<c0014aa4>] (unwind_backtrace+0x0/0xf8) from [<c0012788>] (show_stack+0x1c/0x20)
+|[<c0012788>] (show_stack+0x1c/0x20) from [<c043c8dc>] (dump_stack+0x20/0x2c)
+|[<c043c8dc>] (dump_stack+0x20/0x2c) from [<c004c5e8>] (__might_sleep+0x13c/0x170)
+|[<c004c5e8>] (__might_sleep+0x13c/0x170) from [<c043f270>] (__rt_spin_lock+0x28/0x38)
+|[<c043f270>] (__rt_spin_lock+0x28/0x38) from [<c043fa00>] (rt_read_lock+0x68/0x7c)
+|[<c043fa00>] (rt_read_lock+0x68/0x7c) from [<c036cf74>] (led_trigger_event+0x2c/0x5c)
+|[<c036cf74>] (led_trigger_event+0x2c/0x5c) from [<c036e0bc>] (ledtrig_cpu+0x54/0x5c)
+|[<c036e0bc>] (ledtrig_cpu+0x54/0x5c) from [<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c)
+|[<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c) from [<c00590b8>] (cpu_startup_entry+0xa8/0x234)
+|[<c00590b8>] (cpu_startup_entry+0xa8/0x234) from [<c043b2cc>] (rest_init+0xb8/0xe0)
+|[<c043b2cc>] (rest_init+0xb8/0xe0) from [<c061ebe0>] (start_kernel+0x2c4/0x380)
+
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ drivers/leds/trigger/Kconfig | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/drivers/leds/trigger/Kconfig
++++ b/drivers/leds/trigger/Kconfig
+@@ -61,7 +61,7 @@ config LEDS_TRIGGER_BACKLIGHT
+
+ config LEDS_TRIGGER_CPU
+ bool "LED CPU Trigger"
+- depends on LEDS_TRIGGERS
++ depends on LEDS_TRIGGERS && !PREEMPT_RT_BASE
+ help
+ This allows LEDs to be controlled by active CPUs. This shows
+ the active CPUs across an array of LEDs so you can see which
diff --git a/patches/lglocks-rt.patch b/patches/lglocks-rt.patch
new file mode 100644
index 00000000000000..2998b6da439637
--- /dev/null
+++ b/patches/lglocks-rt.patch
@@ -0,0 +1,182 @@
+Subject: lglocks: Provide a RT safe variant
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 15 Jun 2011 11:02:21 +0200
+
+lglocks by themselves will spin in order to get the lock. This will end up
+badly if a task with the highest priority keeps spinning while a task
+with the lowest priority owns the lock.
+
+Let's replace them with rt_mutex-based locks so they can sleep, track
+the owner and boost if needed.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ include/linux/lglock.h | 21 ++++++++++++++++--
+ kernel/locking/lglock.c | 54 ++++++++++++++++++++++++++++++++----------------
+ 2 files changed, 55 insertions(+), 20 deletions(-)
+
+--- a/include/linux/lglock.h
++++ b/include/linux/lglock.h
+@@ -34,22 +34,39 @@
+ #endif
+
+ struct lglock {
++#ifndef CONFIG_PREEMPT_RT_FULL
+ arch_spinlock_t __percpu *lock;
++#else
++ struct rt_mutex __percpu *lock;
++#endif
+ #ifdef CONFIG_DEBUG_LOCK_ALLOC
+ struct lock_class_key lock_key;
+ struct lockdep_map lock_dep_map;
+ #endif
+ };
+
+-#define DEFINE_LGLOCK(name) \
++#ifndef CONFIG_PREEMPT_RT_FULL
++# define DEFINE_LGLOCK(name) \
+ static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock) \
+ = __ARCH_SPIN_LOCK_UNLOCKED; \
+ struct lglock name = { .lock = &name ## _lock }
+
+-#define DEFINE_STATIC_LGLOCK(name) \
++# define DEFINE_STATIC_LGLOCK(name) \
+ static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock) \
+ = __ARCH_SPIN_LOCK_UNLOCKED; \
+ static struct lglock name = { .lock = &name ## _lock }
++#else
++
++# define DEFINE_LGLOCK(name) \
++ static DEFINE_PER_CPU(struct rt_mutex, name ## _lock) \
++ = __RT_MUTEX_INITIALIZER( name ## _lock); \
++ struct lglock name = { .lock = &name ## _lock }
++
++# define DEFINE_STATIC_LGLOCK(name) \
++ static DEFINE_PER_CPU(struct rt_mutex, name ## _lock) \
++ = __RT_MUTEX_INITIALIZER( name ## _lock); \
++ static struct lglock name = { .lock = &name ## _lock }
++#endif
+
+ void lg_lock_init(struct lglock *lg, char *name);
+ void lg_local_lock(struct lglock *lg);
+--- a/kernel/locking/lglock.c
++++ b/kernel/locking/lglock.c
+@@ -4,6 +4,15 @@
+ #include <linux/cpu.h>
+ #include <linux/string.h>
+
++#ifndef CONFIG_PREEMPT_RT_FULL
++# define lg_lock_ptr arch_spinlock_t
++# define lg_do_lock(l) arch_spin_lock(l)
++# define lg_do_unlock(l) arch_spin_unlock(l)
++#else
++# define lg_lock_ptr struct rt_mutex
++# define lg_do_lock(l) __rt_spin_lock(l)
++# define lg_do_unlock(l) __rt_spin_unlock(l)
++#endif
+ /*
+ * Note there is no uninit, so lglocks cannot be defined in
+ * modules (but it's fine to use them from there)
+@@ -12,51 +21,60 @@
+
+ void lg_lock_init(struct lglock *lg, char *name)
+ {
++#ifdef CONFIG_PREEMPT_RT_FULL
++ int i;
++
++ for_each_possible_cpu(i) {
++ struct rt_mutex *lock = per_cpu_ptr(lg->lock, i);
++
++ rt_mutex_init(lock);
++ }
++#endif
+ LOCKDEP_INIT_MAP(&lg->lock_dep_map, name, &lg->lock_key, 0);
+ }
+ EXPORT_SYMBOL(lg_lock_init);
+
+ void lg_local_lock(struct lglock *lg)
+ {
+- arch_spinlock_t *lock;
++ lg_lock_ptr *lock;
+
+- preempt_disable();
++ migrate_disable();
+ lock_acquire_shared(&lg->lock_dep_map, 0, 0, NULL, _RET_IP_);
+ lock = this_cpu_ptr(lg->lock);
+- arch_spin_lock(lock);
++ lg_do_lock(lock);
+ }
+ EXPORT_SYMBOL(lg_local_lock);
+
+ void lg_local_unlock(struct lglock *lg)
+ {
+- arch_spinlock_t *lock;
++ lg_lock_ptr *lock;
+
+ lock_release(&lg->lock_dep_map, 1, _RET_IP_);
+ lock = this_cpu_ptr(lg->lock);
+- arch_spin_unlock(lock);
+- preempt_enable();
++ lg_do_unlock(lock);
++ migrate_enable();
+ }
+ EXPORT_SYMBOL(lg_local_unlock);
+
+ void lg_local_lock_cpu(struct lglock *lg, int cpu)
+ {
+- arch_spinlock_t *lock;
++ lg_lock_ptr *lock;
+
+- preempt_disable();
++ preempt_disable_nort();
+ lock_acquire_shared(&lg->lock_dep_map, 0, 0, NULL, _RET_IP_);
+ lock = per_cpu_ptr(lg->lock, cpu);
+- arch_spin_lock(lock);
++ lg_do_lock(lock);
+ }
+ EXPORT_SYMBOL(lg_local_lock_cpu);
+
+ void lg_local_unlock_cpu(struct lglock *lg, int cpu)
+ {
+- arch_spinlock_t *lock;
++ lg_lock_ptr *lock;
+
+ lock_release(&lg->lock_dep_map, 1, _RET_IP_);
+ lock = per_cpu_ptr(lg->lock, cpu);
+- arch_spin_unlock(lock);
+- preempt_enable();
++ lg_do_unlock(lock);
++ preempt_enable_nort();
+ }
+ EXPORT_SYMBOL(lg_local_unlock_cpu);
+
+@@ -64,12 +82,12 @@ void lg_global_lock(struct lglock *lg)
+ {
+ int i;
+
+- preempt_disable();
++ preempt_disable_nort();
+ lock_acquire_exclusive(&lg->lock_dep_map, 0, 0, NULL, _RET_IP_);
+ for_each_possible_cpu(i) {
+- arch_spinlock_t *lock;
++ lg_lock_ptr *lock;
+ lock = per_cpu_ptr(lg->lock, i);
+- arch_spin_lock(lock);
++ lg_do_lock(lock);
+ }
+ }
+ EXPORT_SYMBOL(lg_global_lock);
+@@ -80,10 +98,10 @@ void lg_global_unlock(struct lglock *lg)
+
+ lock_release(&lg->lock_dep_map, 1, _RET_IP_);
+ for_each_possible_cpu(i) {
+- arch_spinlock_t *lock;
++ lg_lock_ptr *lock;
+ lock = per_cpu_ptr(lg->lock, i);
+- arch_spin_unlock(lock);
++ lg_do_unlock(lock);
+ }
+- preempt_enable();
++ preempt_enable_nort();
+ }
+ EXPORT_SYMBOL(lg_global_unlock);
diff --git a/patches/list_bl.h-make-list-head-locking-RT-safe.patch b/patches/list_bl.h-make-list-head-locking-RT-safe.patch
new file mode 100644
index 00000000000000..b5c3490b6f2cad
--- /dev/null
+++ b/patches/list_bl.h-make-list-head-locking-RT-safe.patch
@@ -0,0 +1,114 @@
+From: Paul Gortmaker <paul.gortmaker@windriver.com>
+Date: Fri, 21 Jun 2013 15:07:25 -0400
+Subject: list_bl: Make list head locking RT safe
+
+As per changes in include/linux/jbd_common.h for avoiding the
+bit_spin_locks on RT ("fs: jbd/jbd2: Make state lock and journal
+head lock rt safe"), we do the same thing here.
+
+We use the non-atomic __set_bit and __clear_bit inside the scope of
+the lock to preserve the ability of the existing LIST_DEBUG code to
+use the zeroth bit in the sanity checks.
+
+As a bit spinlock, we had no lockdep visibility into the usage
+of the list head locking. Now, if we were to implement it as a
+standard non-raw spinlock, we would see:
+
+BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
+in_atomic(): 1, irqs_disabled(): 0, pid: 122, name: udevd
+5 locks held by udevd/122:
+ #0: (&sb->s_type->i_mutex_key#7/1){+.+.+.}, at: [<ffffffff811967e8>] lock_rename+0xe8/0xf0
+ #1: (rename_lock){+.+...}, at: [<ffffffff811a277c>] d_move+0x2c/0x60
+ #2: (&dentry->d_lock){+.+...}, at: [<ffffffff811a0763>] dentry_lock_for_move+0xf3/0x130
+ #3: (&dentry->d_lock/2){+.+...}, at: [<ffffffff811a0734>] dentry_lock_for_move+0xc4/0x130
+ #4: (&dentry->d_lock/3){+.+...}, at: [<ffffffff811a0747>] dentry_lock_for_move+0xd7/0x130
+Pid: 122, comm: udevd Not tainted 3.4.47-rt62 #7
+Call Trace:
+ [<ffffffff810b9624>] __might_sleep+0x134/0x1f0
+ [<ffffffff817a24d4>] rt_spin_lock+0x24/0x60
+ [<ffffffff811a0c4c>] __d_shrink+0x5c/0xa0
+ [<ffffffff811a1b2d>] __d_drop+0x1d/0x40
+ [<ffffffff811a24be>] __d_move+0x8e/0x320
+ [<ffffffff811a278e>] d_move+0x3e/0x60
+ [<ffffffff81199598>] vfs_rename+0x198/0x4c0
+ [<ffffffff8119b093>] sys_renameat+0x213/0x240
+ [<ffffffff817a2de5>] ? _raw_spin_unlock+0x35/0x60
+ [<ffffffff8107781c>] ? do_page_fault+0x1ec/0x4b0
+ [<ffffffff817a32ca>] ? retint_swapgs+0xe/0x13
+ [<ffffffff813eb0e6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
+ [<ffffffff8119b0db>] sys_rename+0x1b/0x20
+ [<ffffffff817a3b96>] system_call_fastpath+0x1a/0x1f
+
+Since we are only taking the lock during short-lived list operations,
+let's assume for now that it being raw won't be a significant latency
+concern.
+
+
+Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ include/linux/list_bl.h | 28 ++++++++++++++++++++++++++--
+ 1 file changed, 26 insertions(+), 2 deletions(-)
+
+--- a/include/linux/list_bl.h
++++ b/include/linux/list_bl.h
+@@ -2,6 +2,7 @@
+ #define _LINUX_LIST_BL_H
+
+ #include <linux/list.h>
++#include <linux/spinlock.h>
+ #include <linux/bit_spinlock.h>
+
+ /*
+@@ -32,13 +33,22 @@
+
+ struct hlist_bl_head {
+ struct hlist_bl_node *first;
++#ifdef CONFIG_PREEMPT_RT_BASE
++ raw_spinlock_t lock;
++#endif
+ };
+
+ struct hlist_bl_node {
+ struct hlist_bl_node *next, **pprev;
+ };
+-#define INIT_HLIST_BL_HEAD(ptr) \
+- ((ptr)->first = NULL)
++
++static inline void INIT_HLIST_BL_HEAD(struct hlist_bl_head *h)
++{
++ h->first = NULL;
++#ifdef CONFIG_PREEMPT_RT_BASE
++ raw_spin_lock_init(&h->lock);
++#endif
++}
+
+ static inline void INIT_HLIST_BL_NODE(struct hlist_bl_node *h)
+ {
+@@ -117,12 +127,26 @@ static inline void hlist_bl_del_init(str
+
+ static inline void hlist_bl_lock(struct hlist_bl_head *b)
+ {
++#ifndef CONFIG_PREEMPT_RT_BASE
+ bit_spin_lock(0, (unsigned long *)b);
++#else
++ raw_spin_lock(&b->lock);
++#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
++ __set_bit(0, (unsigned long *)b);
++#endif
++#endif
+ }
+
+ static inline void hlist_bl_unlock(struct hlist_bl_head *b)
+ {
++#ifndef CONFIG_PREEMPT_RT_BASE
+ __bit_spin_unlock(0, (unsigned long *)b);
++#else
++#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
++ __clear_bit(0, (unsigned long *)b);
++#endif
++ raw_spin_unlock(&b->lock);
++#endif
+ }
+
+ static inline bool hlist_bl_is_locked(struct hlist_bl_head *b)
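For callers nothing changes: hlist_bl_lock()/hlist_bl_unlock() keep their signatures, only the serialization behind them differs on RT (raw spinlock plus a mirrored bit 0 so the LIST_DEBUG checks keep working). A hypothetical caller, sketched for illustration only:

#include <linux/list_bl.h>

static struct hlist_bl_head bucket;     /* hypothetical hash bucket */

static void bucket_init(void)
{
        /* Now a function: also initializes the embedded raw spinlock on RT. */
        INIT_HLIST_BL_HEAD(&bucket);
}

static void bucket_add(struct hlist_bl_node *n)
{
        hlist_bl_lock(&bucket);         /* bit_spin_lock(), or raw_spin_lock() + __set_bit() on RT */
        hlist_bl_add_head(n, &bucket);
        hlist_bl_unlock(&bucket);       /* __clear_bit() + raw_spin_unlock() on RT */
}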
diff --git a/patches/local-irq-rt-depending-variants.patch b/patches/local-irq-rt-depending-variants.patch
new file mode 100644
index 00000000000000..4fac6c69dfeb2c
--- /dev/null
+++ b/patches/local-irq-rt-depending-variants.patch
@@ -0,0 +1,52 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Tue, 21 Jul 2009 22:34:14 +0200
+Subject: rt: local_irq_* variants depending on RT/!RT
+
+Add local_irq_*_(no)rt variants which are mainly used to break
+interrupt-disabled sections on PREEMPT_RT or to explicitly disable
+interrupts on PREEMPT_RT.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ include/linux/interrupt.h | 2 +-
+ include/linux/irqflags.h | 19 +++++++++++++++++++
+ 2 files changed, 20 insertions(+), 1 deletion(-)
+
+--- a/include/linux/interrupt.h
++++ b/include/linux/interrupt.h
+@@ -184,7 +184,7 @@ extern void devm_free_irq(struct device
+ #ifdef CONFIG_LOCKDEP
+ # define local_irq_enable_in_hardirq() do { } while (0)
+ #else
+-# define local_irq_enable_in_hardirq() local_irq_enable()
++# define local_irq_enable_in_hardirq() local_irq_enable_nort()
+ #endif
+
+ extern void disable_irq_nosync(unsigned int irq);
+--- a/include/linux/irqflags.h
++++ b/include/linux/irqflags.h
+@@ -148,4 +148,23 @@
+
+ #define irqs_disabled_flags(flags) raw_irqs_disabled_flags(flags)
+
++/*
++ * local_irq* variants depending on RT/!RT
++ */
++#ifdef CONFIG_PREEMPT_RT_FULL
++# define local_irq_disable_nort() do { } while (0)
++# define local_irq_enable_nort() do { } while (0)
++# define local_irq_save_nort(flags) local_save_flags(flags)
++# define local_irq_restore_nort(flags) (void)(flags)
++# define local_irq_disable_rt() local_irq_disable()
++# define local_irq_enable_rt() local_irq_enable()
++#else
++# define local_irq_disable_nort() local_irq_disable()
++# define local_irq_enable_nort() local_irq_enable()
++# define local_irq_save_nort(flags) local_irq_save(flags)
++# define local_irq_restore_nort(flags) local_irq_restore(flags)
++# define local_irq_disable_rt() do { } while (0)
++# define local_irq_enable_rt() do { } while (0)
++#endif
++
+ #endif
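A hedged sketch of how these helpers are typically used (hypothetical driver code, not from this patch): the _nort variants keep a genuine interrupt-off section on mainline but degrade on RT, where a sleeping spinlock provides the serialization and hard interrupts may stay enabled:

#include <linux/irqflags.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(dev_lock);       /* becomes a sleeping lock on RT */

static void update_device_state(void)
{
        unsigned long flags;

        local_irq_save_nort(flags);     /* irqs off on !RT; only saves flags on RT */
        spin_lock(&dev_lock);
        /* ... touch state that !RT callers expect to see with irqs disabled ... */
        spin_unlock(&dev_lock);
        local_irq_restore_nort(flags);
}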
diff --git a/patches/localversion.patch b/patches/localversion.patch
new file mode 100644
index 00000000000000..d3b21a746fdb5b
--- /dev/null
+++ b/patches/localversion.patch
@@ -0,0 +1,15 @@
+Subject: localversion: Add RT specific localversion file
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Fri, 08 Jul 2011 20:25:16 +0200
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
+Link: http://lkml.kernel.org/n/tip-8vdw4bfcsds27cvox6rpb334@git.kernel.org
+---
+ localversion-rt | 1 +
+ 1 file changed, 1 insertion(+)
+
+--- /dev/null
++++ b/localversion-rt
+@@ -0,0 +1 @@
++-rt1
diff --git a/patches/lockdep-no-softirq-accounting-on-rt.patch b/patches/lockdep-no-softirq-accounting-on-rt.patch
new file mode 100644
index 00000000000000..9404763acf4a4f
--- /dev/null
+++ b/patches/lockdep-no-softirq-accounting-on-rt.patch
@@ -0,0 +1,58 @@
+Subject: lockdep: Make it RT aware
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Sun, 17 Jul 2011 18:51:23 +0200
+
+Teach lockdep that we don't really do softirqs on -RT.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ include/linux/irqflags.h | 10 +++++++---
+ kernel/locking/lockdep.c | 2 ++
+ 2 files changed, 9 insertions(+), 3 deletions(-)
+
+--- a/include/linux/irqflags.h
++++ b/include/linux/irqflags.h
+@@ -25,8 +25,6 @@
+ # define trace_softirqs_enabled(p) ((p)->softirqs_enabled)
+ # define trace_hardirq_enter() do { current->hardirq_context++; } while (0)
+ # define trace_hardirq_exit() do { current->hardirq_context--; } while (0)
+-# define lockdep_softirq_enter() do { current->softirq_context++; } while (0)
+-# define lockdep_softirq_exit() do { current->softirq_context--; } while (0)
+ # define INIT_TRACE_IRQFLAGS .softirqs_enabled = 1,
+ #else
+ # define trace_hardirqs_on() do { } while (0)
+@@ -39,9 +37,15 @@
+ # define trace_softirqs_enabled(p) 0
+ # define trace_hardirq_enter() do { } while (0)
+ # define trace_hardirq_exit() do { } while (0)
++# define INIT_TRACE_IRQFLAGS
++#endif
++
++#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_PREEMPT_RT_FULL)
++# define lockdep_softirq_enter() do { current->softirq_context++; } while (0)
++# define lockdep_softirq_exit() do { current->softirq_context--; } while (0)
++#else
+ # define lockdep_softirq_enter() do { } while (0)
+ # define lockdep_softirq_exit() do { } while (0)
+-# define INIT_TRACE_IRQFLAGS
+ #endif
+
+ #if defined(CONFIG_IRQSOFF_TRACER) || \
+--- a/kernel/locking/lockdep.c
++++ b/kernel/locking/lockdep.c
+@@ -3563,6 +3563,7 @@ static void check_flags(unsigned long fl
+ }
+ }
+
++#ifndef CONFIG_PREEMPT_RT_FULL
+ /*
+ * We dont accurately track softirq state in e.g.
+ * hardirq contexts (such as on 4KSTACKS), so only
+@@ -3577,6 +3578,7 @@ static void check_flags(unsigned long fl
+ DEBUG_LOCKS_WARN_ON(!current->softirqs_enabled);
+ }
+ }
++#endif
+
+ if (!debug_locks)
+ print_irqtrace_events(current);
diff --git a/patches/lockdep-selftest-fix-warnings-due-to-missing-PREEMPT.patch b/patches/lockdep-selftest-fix-warnings-due-to-missing-PREEMPT.patch
new file mode 100644
index 00000000000000..3f503ad0e7a3bf
--- /dev/null
+++ b/patches/lockdep-selftest-fix-warnings-due-to-missing-PREEMPT.patch
@@ -0,0 +1,141 @@
+From: Josh Cartwright <josh.cartwright@ni.com>
+Date: Wed, 28 Jan 2015 13:08:45 -0600
+Subject: lockdep: selftest: fix warnings due to missing PREEMPT_RT conditionals
+
+"lockdep: Selftest: Only do hardirq context test for raw spinlock"
+disabled the execution of certain tests with PREEMPT_RT_FULL, but did
+not prevent the tests from still being defined. This leads to warnings
+like:
+
+ ./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_12' defined but not used [-Wunused-function]
+ ./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_21' defined but not used [-Wunused-function]
+ ./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_12' defined but not used [-Wunused-function]
+ ./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_21' defined but not used [-Wunused-function]
+ ./linux/lib/locking-selftest.c:580:1: warning: 'irqsafe1_soft_spin_12' defined but not used [-Wunused-function]
+ ...
+
+Fixed by wrapping the test definitions in #ifndef CONFIG_PREEMPT_RT_FULL
+conditionals.
+
+
+Signed-off-by: Josh Cartwright <josh.cartwright@ni.com>
+Signed-off-by: Xander Huff <xander.huff@ni.com>
+Acked-by: Gratian Crisan <gratian.crisan@ni.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ lib/locking-selftest.c | 27 +++++++++++++++++++++++++++
+ 1 file changed, 27 insertions(+)
+
+--- a/lib/locking-selftest.c
++++ b/lib/locking-selftest.c
+@@ -590,6 +590,8 @@ GENERATE_TESTCASE(init_held_rsem)
+ #include "locking-selftest-spin-hardirq.h"
+ GENERATE_PERMUTATIONS_2_EVENTS(irqsafe1_hard_spin)
+
++#ifndef CONFIG_PREEMPT_RT_FULL
++
+ #include "locking-selftest-rlock-hardirq.h"
+ GENERATE_PERMUTATIONS_2_EVENTS(irqsafe1_hard_rlock)
+
+@@ -605,9 +607,12 @@ GENERATE_PERMUTATIONS_2_EVENTS(irqsafe1_
+ #include "locking-selftest-wlock-softirq.h"
+ GENERATE_PERMUTATIONS_2_EVENTS(irqsafe1_soft_wlock)
+
++#endif
++
+ #undef E1
+ #undef E2
+
++#ifndef CONFIG_PREEMPT_RT_FULL
+ /*
+ * Enabling hardirqs with a softirq-safe lock held:
+ */
+@@ -640,6 +645,8 @@ GENERATE_PERMUTATIONS_2_EVENTS(irqsafe2A
+ #undef E1
+ #undef E2
+
++#endif
++
+ /*
+ * Enabling irqs with an irq-safe lock held:
+ */
+@@ -663,6 +670,8 @@ GENERATE_PERMUTATIONS_2_EVENTS(irqsafe2A
+ #include "locking-selftest-spin-hardirq.h"
+ GENERATE_PERMUTATIONS_2_EVENTS(irqsafe2B_hard_spin)
+
++#ifndef CONFIG_PREEMPT_RT_FULL
++
+ #include "locking-selftest-rlock-hardirq.h"
+ GENERATE_PERMUTATIONS_2_EVENTS(irqsafe2B_hard_rlock)
+
+@@ -678,6 +687,8 @@ GENERATE_PERMUTATIONS_2_EVENTS(irqsafe2B
+ #include "locking-selftest-wlock-softirq.h"
+ GENERATE_PERMUTATIONS_2_EVENTS(irqsafe2B_soft_wlock)
+
++#endif
++
+ #undef E1
+ #undef E2
+
+@@ -709,6 +720,8 @@ GENERATE_PERMUTATIONS_2_EVENTS(irqsafe2B
+ #include "locking-selftest-spin-hardirq.h"
+ GENERATE_PERMUTATIONS_3_EVENTS(irqsafe3_hard_spin)
+
++#ifndef CONFIG_PREEMPT_RT_FULL
++
+ #include "locking-selftest-rlock-hardirq.h"
+ GENERATE_PERMUTATIONS_3_EVENTS(irqsafe3_hard_rlock)
+
+@@ -724,6 +737,8 @@ GENERATE_PERMUTATIONS_3_EVENTS(irqsafe3_
+ #include "locking-selftest-wlock-softirq.h"
+ GENERATE_PERMUTATIONS_3_EVENTS(irqsafe3_soft_wlock)
+
++#endif
++
+ #undef E1
+ #undef E2
+ #undef E3
+@@ -757,6 +772,8 @@ GENERATE_PERMUTATIONS_3_EVENTS(irqsafe3_
+ #include "locking-selftest-spin-hardirq.h"
+ GENERATE_PERMUTATIONS_3_EVENTS(irqsafe4_hard_spin)
+
++#ifndef CONFIG_PREEMPT_RT_FULL
++
+ #include "locking-selftest-rlock-hardirq.h"
+ GENERATE_PERMUTATIONS_3_EVENTS(irqsafe4_hard_rlock)
+
+@@ -772,10 +789,14 @@ GENERATE_PERMUTATIONS_3_EVENTS(irqsafe4_
+ #include "locking-selftest-wlock-softirq.h"
+ GENERATE_PERMUTATIONS_3_EVENTS(irqsafe4_soft_wlock)
+
++#endif
++
+ #undef E1
+ #undef E2
+ #undef E3
+
++#ifndef CONFIG_PREEMPT_RT_FULL
++
+ /*
+ * read-lock / write-lock irq inversion.
+ *
+@@ -838,6 +859,10 @@ GENERATE_PERMUTATIONS_3_EVENTS(irq_inver
+ #undef E2
+ #undef E3
+
++#endif
++
++#ifndef CONFIG_PREEMPT_RT_FULL
++
+ /*
+ * read-lock / write-lock recursion that is actually safe.
+ */
+@@ -876,6 +901,8 @@ GENERATE_PERMUTATIONS_3_EVENTS(irq_read_
+ #undef E2
+ #undef E3
+
++#endif
++
+ /*
+ * read-lock / write-lock recursion that is unsafe.
+ */
diff --git a/patches/lockdep-selftest-only-do-hardirq-context-test-for-raw-spinlock.patch b/patches/lockdep-selftest-only-do-hardirq-context-test-for-raw-spinlock.patch
new file mode 100644
index 00000000000000..9e71bad35c2cfa
--- /dev/null
+++ b/patches/lockdep-selftest-only-do-hardirq-context-test-for-raw-spinlock.patch
@@ -0,0 +1,56 @@
+Subject: lockdep: selftest: Only do hardirq context test for raw spinlock
+From: Yong Zhang <yong.zhang0@gmail.com>
+Date: Mon, 16 Apr 2012 15:01:56 +0800
+
+From: Yong Zhang <yong.zhang@windriver.com>
+
+On -rt there is no softirq context any more and rwlock is sleepable;
+disable the softirq context test and the rwlock+irq test.
+
+Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
+Cc: Yong Zhang <yong.zhang@windriver.com>
+Link: http://lkml.kernel.org/r/1334559716-18447-3-git-send-email-yong.zhang0@gmail.com
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ lib/locking-selftest.c | 23 +++++++++++++++++++++++
+ 1 file changed, 23 insertions(+)
+
+--- a/lib/locking-selftest.c
++++ b/lib/locking-selftest.c
+@@ -1858,6 +1858,7 @@ void locking_selftest(void)
+
+ printk(" --------------------------------------------------------------------------\n");
+
++#ifndef CONFIG_PREEMPT_RT_FULL
+ /*
+ * irq-context testcases:
+ */
+@@ -1870,6 +1871,28 @@ void locking_selftest(void)
+
+ DO_TESTCASE_6x2("irq read-recursion", irq_read_recursion);
+ // DO_TESTCASE_6x2B("irq read-recursion #2", irq_read_recursion2);
++#else
++ /* On -rt, we only do hardirq context test for raw spinlock */
++ DO_TESTCASE_1B("hard-irqs-on + irq-safe-A", irqsafe1_hard_spin, 12);
++ DO_TESTCASE_1B("hard-irqs-on + irq-safe-A", irqsafe1_hard_spin, 21);
++
++ DO_TESTCASE_1B("hard-safe-A + irqs-on", irqsafe2B_hard_spin, 12);
++ DO_TESTCASE_1B("hard-safe-A + irqs-on", irqsafe2B_hard_spin, 21);
++
++ DO_TESTCASE_1B("hard-safe-A + unsafe-B #1", irqsafe3_hard_spin, 123);
++ DO_TESTCASE_1B("hard-safe-A + unsafe-B #1", irqsafe3_hard_spin, 132);
++ DO_TESTCASE_1B("hard-safe-A + unsafe-B #1", irqsafe3_hard_spin, 213);
++ DO_TESTCASE_1B("hard-safe-A + unsafe-B #1", irqsafe3_hard_spin, 231);
++ DO_TESTCASE_1B("hard-safe-A + unsafe-B #1", irqsafe3_hard_spin, 312);
++ DO_TESTCASE_1B("hard-safe-A + unsafe-B #1", irqsafe3_hard_spin, 321);
++
++ DO_TESTCASE_1B("hard-safe-A + unsafe-B #2", irqsafe4_hard_spin, 123);
++ DO_TESTCASE_1B("hard-safe-A + unsafe-B #2", irqsafe4_hard_spin, 132);
++ DO_TESTCASE_1B("hard-safe-A + unsafe-B #2", irqsafe4_hard_spin, 213);
++ DO_TESTCASE_1B("hard-safe-A + unsafe-B #2", irqsafe4_hard_spin, 231);
++ DO_TESTCASE_1B("hard-safe-A + unsafe-B #2", irqsafe4_hard_spin, 312);
++ DO_TESTCASE_1B("hard-safe-A + unsafe-B #2", irqsafe4_hard_spin, 321);
++#endif
+
+ ww_tests();
+
diff --git a/patches/md-disable-bcache.patch b/patches/md-disable-bcache.patch
new file mode 100644
index 00000000000000..fdfe5933603df6
--- /dev/null
+++ b/patches/md-disable-bcache.patch
@@ -0,0 +1,31 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Thu, 29 Aug 2013 11:48:57 +0200
+Subject: md: disable bcache
+
+It uses anon semaphores
+|drivers/md/bcache/request.c: In function ‘cached_dev_write_complete’:
+|drivers/md/bcache/request.c:1007:2: error: implicit declaration of function ‘up_read_non_owner’ [-Werror=implicit-function-declaration]
+| up_read_non_owner(&dc->writeback_lock);
+| ^
+|drivers/md/bcache/request.c: In function ‘request_write’:
+|drivers/md/bcache/request.c:1033:2: error: implicit declaration of function ‘down_read_non_owner’ [-Werror=implicit-function-declaration]
+| down_read_non_owner(&dc->writeback_lock);
+| ^
+
+Either we get rid of those or we have to introduce them…
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ drivers/md/bcache/Kconfig | 1 +
+ 1 file changed, 1 insertion(+)
+
+--- a/drivers/md/bcache/Kconfig
++++ b/drivers/md/bcache/Kconfig
+@@ -1,6 +1,7 @@
+
+ config BCACHE
+ tristate "Block device as cache"
++ depends on !PREEMPT_RT_FULL
+ ---help---
+ Allows a block device to be used as cache for other devices; uses
+ a btree for indexing and the layout is optimized for SSDs.
diff --git a/patches/md-raid5-percpu-handling-rt-aware.patch b/patches/md-raid5-percpu-handling-rt-aware.patch
new file mode 100644
index 00000000000000..2476e3ef32d46f
--- /dev/null
+++ b/patches/md-raid5-percpu-handling-rt-aware.patch
@@ -0,0 +1,61 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Tue, 6 Apr 2010 16:51:31 +0200
+Subject: md: raid5: Make raid5_percpu handling RT aware
+
+__raid_run_ops() disables preemption with get_cpu() around the access
+to the raid5_percpu variables. That causes scheduling-while-atomic
+spews on RT.
+
+Serialize the access to the percpu data with a lock and keep the code
+preemptible.
+
+Reported-by: Udo van den Heuvel <udovdh@xs4all.nl>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Tested-by: Udo van den Heuvel <udovdh@xs4all.nl>
+
+---
+ drivers/md/raid5.c | 7 +++++--
+ drivers/md/raid5.h | 1 +
+ 2 files changed, 6 insertions(+), 2 deletions(-)
+
+--- a/drivers/md/raid5.c
++++ b/drivers/md/raid5.c
+@@ -1918,8 +1918,9 @@ static void raid_run_ops(struct stripe_h
+ struct raid5_percpu *percpu;
+ unsigned long cpu;
+
+- cpu = get_cpu();
++ cpu = get_cpu_light();
+ percpu = per_cpu_ptr(conf->percpu, cpu);
++ spin_lock(&percpu->lock);
+ if (test_bit(STRIPE_OP_BIOFILL, &ops_request)) {
+ ops_run_biofill(sh);
+ overlap_clear++;
+@@ -1975,7 +1976,8 @@ static void raid_run_ops(struct stripe_h
+ if (test_and_clear_bit(R5_Overlap, &dev->flags))
+ wake_up(&sh->raid_conf->wait_for_overlap);
+ }
+- put_cpu();
++ spin_unlock(&percpu->lock);
++ put_cpu_light();
+ }
+
+ static struct stripe_head *alloc_stripe(struct kmem_cache *sc, gfp_t gfp)
+@@ -6350,6 +6352,7 @@ static int raid5_alloc_percpu(struct r5c
+ __func__, cpu);
+ break;
+ }
++ spin_lock_init(&per_cpu_ptr(conf->percpu, cpu)->lock);
+ }
+ put_online_cpus();
+
+--- a/drivers/md/raid5.h
++++ b/drivers/md/raid5.h
+@@ -494,6 +494,7 @@ struct r5conf {
+ int recovery_disabled;
+ /* per cpu variables */
+ struct raid5_percpu {
++ spinlock_t lock; /* Protection for -RT */
+ struct page *spare_page; /* Used when checking P/Q in raid6 */
+ struct flex_array *scribble; /* space for constructing buffer
+ * lists and performing address
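The raid5 change is an instance of a recurring -RT pattern: get_cpu_light()/put_cpu_light() (provided by this queue; on RT they only disable migration) bracket the per-CPU access, and a per-CPU spinlock serializes it while staying preemptible. A generic, hypothetical sketch of the pattern:

#include <linux/percpu.h>
#include <linux/spinlock.h>

struct my_percpu {
        spinlock_t lock;                /* sleeping lock on RT, so no "scheduling while atomic" */
        int scratch;
};
static DEFINE_PER_CPU(struct my_percpu, my_pcpu);

static void my_percpu_init(void)
{
        int cpu;

        for_each_possible_cpu(cpu)
                spin_lock_init(&per_cpu(my_pcpu, cpu).lock);
}

static void use_percpu_scratch(void)
{
        struct my_percpu *p;
        int cpu;

        cpu = get_cpu_light();          /* get_cpu() on !RT, migrate_disable() on RT */
        p = per_cpu_ptr(&my_pcpu, cpu);
        spin_lock(&p->lock);
        p->scratch++;                   /* work on this CPU's data, preemptibly on RT */
        spin_unlock(&p->lock);
        put_cpu_light();
}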
diff --git a/patches/mips-disable-highmem-on-rt.patch b/patches/mips-disable-highmem-on-rt.patch
new file mode 100644
index 00000000000000..3b42cc91eff963
--- /dev/null
+++ b/patches/mips-disable-highmem-on-rt.patch
@@ -0,0 +1,22 @@
+Subject: mips: Disable highmem on RT
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Mon, 18 Jul 2011 17:10:12 +0200
+
+The current highmem handling on -RT is not compatible and needs fixups.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ arch/mips/Kconfig | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/arch/mips/Kconfig
++++ b/arch/mips/Kconfig
+@@ -2365,7 +2365,7 @@ config CPU_R4400_WORKAROUNDS
+ #
+ config HIGHMEM
+ bool "High Memory Support"
+- depends on 32BIT && CPU_SUPPORTS_HIGHMEM && SYS_SUPPORTS_HIGHMEM && !CPU_MIPS32_3_5_EVA
++ depends on 32BIT && CPU_SUPPORTS_HIGHMEM && SYS_SUPPORTS_HIGHMEM && !CPU_MIPS32_3_5_EVA && !PREEMPT_RT_FULL
+
+ config CPU_SUPPORTS_HIGHMEM
+ bool
diff --git a/patches/mm-bounce-local-irq-save-nort.patch b/patches/mm-bounce-local-irq-save-nort.patch
new file mode 100644
index 00000000000000..fb0f5b2b573a8d
--- /dev/null
+++ b/patches/mm-bounce-local-irq-save-nort.patch
@@ -0,0 +1,27 @@
+Subject: mm: bounce: Use local_irq_save_nort
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 09 Jan 2013 10:33:09 +0100
+
+kmap_atomic() is preemptible on RT.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ block/bounce.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/block/bounce.c
++++ b/block/bounce.c
+@@ -54,11 +54,11 @@ static void bounce_copy_vec(struct bio_v
+ unsigned long flags;
+ unsigned char *vto;
+
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+ vto = kmap_atomic(to->bv_page);
+ memcpy(vto + to->bv_offset, vfrom, to->bv_len);
+ kunmap_atomic(vto);
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+ }
+
+ #else /* CONFIG_HIGHMEM */
diff --git a/patches/mm-convert-swap-to-percpu-locked.patch b/patches/mm-convert-swap-to-percpu-locked.patch
new file mode 100644
index 00000000000000..aec16dd380badf
--- /dev/null
+++ b/patches/mm-convert-swap-to-percpu-locked.patch
@@ -0,0 +1,134 @@
+From: Ingo Molnar <mingo@elte.hu>
+Date: Fri, 3 Jul 2009 08:29:51 -0500
+Subject: mm/swap: Convert to percpu locked
+
+Replace global locks (get_cpu + local_irq_save) with "local_locks()".
+Currently there is one lock for "rotate" and one for "swap".
+
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ mm/swap.c | 34 ++++++++++++++++++++--------------
+ 1 file changed, 20 insertions(+), 14 deletions(-)
+
+--- a/mm/swap.c
++++ b/mm/swap.c
+@@ -32,6 +32,7 @@
+ #include <linux/gfp.h>
+ #include <linux/uio.h>
+ #include <linux/hugetlb.h>
++#include <linux/locallock.h>
+
+ #include "internal.h"
+
+@@ -45,6 +46,9 @@ static DEFINE_PER_CPU(struct pagevec, lr
+ static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
+ static DEFINE_PER_CPU(struct pagevec, lru_deactivate_file_pvecs);
+
++static DEFINE_LOCAL_IRQ_LOCK(rotate_lock);
++static DEFINE_LOCAL_IRQ_LOCK(swapvec_lock);
++
+ /*
+ * This path almost never happens for VM activity - pages are normally
+ * freed via pagevecs. But it gets used by networking.
+@@ -481,11 +485,11 @@ void rotate_reclaimable_page(struct page
+ unsigned long flags;
+
+ page_cache_get(page);
+- local_irq_save(flags);
++ local_lock_irqsave(rotate_lock, flags);
+ pvec = this_cpu_ptr(&lru_rotate_pvecs);
+ if (!pagevec_add(pvec, page))
+ pagevec_move_tail(pvec);
+- local_irq_restore(flags);
++ local_unlock_irqrestore(rotate_lock, flags);
+ }
+ }
+
+@@ -536,12 +540,13 @@ static bool need_activate_page_drain(int
+ void activate_page(struct page *page)
+ {
+ if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
+- struct pagevec *pvec = &get_cpu_var(activate_page_pvecs);
++ struct pagevec *pvec = &get_locked_var(swapvec_lock,
++ activate_page_pvecs);
+
+ page_cache_get(page);
+ if (!pagevec_add(pvec, page))
+ pagevec_lru_move_fn(pvec, __activate_page, NULL);
+- put_cpu_var(activate_page_pvecs);
++ put_locked_var(swapvec_lock, activate_page_pvecs);
+ }
+ }
+
+@@ -567,7 +572,7 @@ void activate_page(struct page *page)
+
+ static void __lru_cache_activate_page(struct page *page)
+ {
+- struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
++ struct pagevec *pvec = &get_locked_var(swapvec_lock, lru_add_pvec);
+ int i;
+
+ /*
+@@ -589,7 +594,7 @@ static void __lru_cache_activate_page(st
+ }
+ }
+
+- put_cpu_var(lru_add_pvec);
++ put_locked_var(swapvec_lock, lru_add_pvec);
+ }
+
+ /*
+@@ -628,13 +633,13 @@ EXPORT_SYMBOL(mark_page_accessed);
+
+ static void __lru_cache_add(struct page *page)
+ {
+- struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
++ struct pagevec *pvec = &get_locked_var(swapvec_lock, lru_add_pvec);
+
+ page_cache_get(page);
+ if (!pagevec_space(pvec))
+ __pagevec_lru_add(pvec);
+ pagevec_add(pvec, page);
+- put_cpu_var(lru_add_pvec);
++ put_locked_var(swapvec_lock, lru_add_pvec);
+ }
+
+ /**
+@@ -814,9 +819,9 @@ void lru_add_drain_cpu(int cpu)
+ unsigned long flags;
+
+ /* No harm done if a racing interrupt already did this */
+- local_irq_save(flags);
++ local_lock_irqsave(rotate_lock, flags);
+ pagevec_move_tail(pvec);
+- local_irq_restore(flags);
++ local_unlock_irqrestore(rotate_lock, flags);
+ }
+
+ pvec = &per_cpu(lru_deactivate_file_pvecs, cpu);
+@@ -844,18 +849,19 @@ void deactivate_file_page(struct page *p
+ return;
+
+ if (likely(get_page_unless_zero(page))) {
+- struct pagevec *pvec = &get_cpu_var(lru_deactivate_file_pvecs);
++ struct pagevec *pvec = &get_locked_var(swapvec_lock,
++ lru_deactivate_file_pvecs);
+
+ if (!pagevec_add(pvec, page))
+ pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL);
+- put_cpu_var(lru_deactivate_file_pvecs);
++ put_locked_var(swapvec_lock, lru_deactivate_file_pvecs);
+ }
+ }
+
+ void lru_add_drain(void)
+ {
+- lru_add_drain_cpu(get_cpu());
+- put_cpu();
++ lru_add_drain_cpu(local_lock_cpu(swapvec_lock));
++ local_unlock_cpu(swapvec_lock);
+ }
+
+ static void lru_add_drain_per_cpu(struct work_struct *dummy)
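The conversion above leans on the locallock primitives added earlier in this series (linux/locallock.h). A hypothetical sketch of the idiom, where one named per-CPU lock replaces the implicit get_cpu_var()/local_irq_save() protection and turns into a per-CPU sleeping lock on RT:

#include <linux/list.h>
#include <linux/locallock.h>
#include <linux/percpu.h>

static DEFINE_PER_CPU(struct list_head, pending);      /* INIT_LIST_HEAD() per CPU at init */
static DEFINE_LOCAL_IRQ_LOCK(pending_lock);

static void queue_item(struct list_head *item)
{
        unsigned long flags;

        /* local_irq_save() on !RT; per-CPU sleeping lock + migrate_disable() on RT */
        local_lock_irqsave(pending_lock, flags);
        list_add_tail(item, this_cpu_ptr(&pending));
        local_unlock_irqrestore(pending_lock, flags);
}

static bool have_pending(void)
{
        /* Drop-in for &get_cpu_var(pending), but backed by pending_lock. */
        struct list_head *head = &get_locked_var(pending_lock, pending);
        bool ret = !list_empty(head);

        put_locked_var(pending_lock, pending);
        return ret;
}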
diff --git a/patches/mm-disable-sloub-rt.patch b/patches/mm-disable-sloub-rt.patch
new file mode 100644
index 00000000000000..bbeab6257d117b
--- /dev/null
+++ b/patches/mm-disable-sloub-rt.patch
@@ -0,0 +1,31 @@
+From: Ingo Molnar <mingo@elte.hu>
+Date: Fri, 3 Jul 2009 08:44:03 -0500
+Subject: mm: Allow only slub on RT
+
+Disable SLAB and SLOB on -RT. Only SLUB is adapted to -RT needs.
+
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ init/Kconfig | 2 ++
+ 1 file changed, 2 insertions(+)
+
+--- a/init/Kconfig
++++ b/init/Kconfig
+@@ -1688,6 +1688,7 @@ choice
+
+ config SLAB
+ bool "SLAB"
++ depends on !PREEMPT_RT_FULL
+ help
+ The regular slab allocator that is established and known to work
+ well in all environments. It organizes cache hot objects in
+@@ -1706,6 +1707,7 @@ config SLUB
+ config SLOB
+ depends on EXPERT
+ bool "SLOB (Simple Allocator)"
++ depends on !PREEMPT_RT_FULL
+ help
+ SLOB replaces the stock allocator with a drastically simpler
+ allocator. SLOB is generally more space efficient but
diff --git a/patches/mm-enable-slub.patch b/patches/mm-enable-slub.patch
new file mode 100644
index 00000000000000..3a1e45bb4c687e
--- /dev/null
+++ b/patches/mm-enable-slub.patch
@@ -0,0 +1,394 @@
+Subject: mm: Enable SLUB for RT
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Thu, 25 Oct 2012 10:32:35 +0100
+
+Make SLUB RT aware by converting locks to raw and using free lists to
+move the freeing out of the lock-held region.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ mm/slab.h | 4 ++
+ mm/slub.c | 118 +++++++++++++++++++++++++++++++++++++++++++++++---------------
+ 2 files changed, 95 insertions(+), 27 deletions(-)
+
+--- a/mm/slab.h
++++ b/mm/slab.h
+@@ -330,7 +330,11 @@ static inline struct kmem_cache *cache_f
+ * The slab lists for all objects.
+ */
+ struct kmem_cache_node {
++#ifdef CONFIG_SLUB
++ raw_spinlock_t list_lock;
++#else
+ spinlock_t list_lock;
++#endif
+
+ #ifdef CONFIG_SLAB
+ struct list_head slabs_partial; /* partial list first, better asm code */
+--- a/mm/slub.c
++++ b/mm/slub.c
+@@ -1069,7 +1069,7 @@ static noinline struct kmem_cache_node *
+ {
+ struct kmem_cache_node *n = get_node(s, page_to_nid(page));
+
+- spin_lock_irqsave(&n->list_lock, *flags);
++ raw_spin_lock_irqsave(&n->list_lock, *flags);
+ slab_lock(page);
+
+ if (!check_slab(s, page))
+@@ -1116,7 +1116,7 @@ static noinline struct kmem_cache_node *
+
+ fail:
+ slab_unlock(page);
+- spin_unlock_irqrestore(&n->list_lock, *flags);
++ raw_spin_unlock_irqrestore(&n->list_lock, *flags);
+ slab_fix(s, "Object at 0x%p not freed", object);
+ return NULL;
+ }
+@@ -1242,6 +1242,12 @@ static inline void dec_slabs_node(struct
+
+ #endif /* CONFIG_SLUB_DEBUG */
+
++struct slub_free_list {
++ raw_spinlock_t lock;
++ struct list_head list;
++};
++static DEFINE_PER_CPU(struct slub_free_list, slub_free_list);
++
+ /*
+ * Hooks for other subsystems that check memory allocations. In a typical
+ * production configuration these hooks all should produce no code at all.
+@@ -1352,7 +1358,11 @@ static struct page *allocate_slab(struct
+
+ flags &= gfp_allowed_mask;
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++ if (system_state == SYSTEM_RUNNING)
++#else
+ if (flags & __GFP_WAIT)
++#endif
+ local_irq_enable();
+
+ flags |= s->allocflags;
+@@ -1421,7 +1431,11 @@ static struct page *allocate_slab(struct
+ page->frozen = 1;
+
+ out:
++#ifdef CONFIG_PREEMPT_RT_FULL
++ if (system_state == SYSTEM_RUNNING)
++#else
+ if (flags & __GFP_WAIT)
++#endif
+ local_irq_disable();
+ if (!page)
+ return NULL;
+@@ -1478,6 +1492,16 @@ static void __free_slab(struct kmem_cach
+ memcg_uncharge_slab(s, order);
+ }
+
++static void free_delayed(struct list_head *h)
++{
++ while(!list_empty(h)) {
++ struct page *page = list_first_entry(h, struct page, lru);
++
++ list_del(&page->lru);
++ __free_slab(page->slab_cache, page);
++ }
++}
++
+ #define need_reserve_slab_rcu \
+ (sizeof(((struct page *)NULL)->lru) < sizeof(struct rcu_head))
+
+@@ -1512,6 +1536,12 @@ static void free_slab(struct kmem_cache
+ }
+
+ call_rcu(head, rcu_free_slab);
++ } else if (irqs_disabled()) {
++ struct slub_free_list *f = this_cpu_ptr(&slub_free_list);
++
++ raw_spin_lock(&f->lock);
++ list_add(&page->lru, &f->list);
++ raw_spin_unlock(&f->lock);
+ } else
+ __free_slab(s, page);
+ }
+@@ -1625,7 +1655,7 @@ static void *get_partial_node(struct kme
+ if (!n || !n->nr_partial)
+ return NULL;
+
+- spin_lock(&n->list_lock);
++ raw_spin_lock(&n->list_lock);
+ list_for_each_entry_safe(page, page2, &n->partial, lru) {
+ void *t;
+
+@@ -1650,7 +1680,7 @@ static void *get_partial_node(struct kme
+ break;
+
+ }
+- spin_unlock(&n->list_lock);
++ raw_spin_unlock(&n->list_lock);
+ return object;
+ }
+
+@@ -1896,7 +1926,7 @@ static void deactivate_slab(struct kmem_
+ * that acquire_slab() will see a slab page that
+ * is frozen
+ */
+- spin_lock(&n->list_lock);
++ raw_spin_lock(&n->list_lock);
+ }
+ } else {
+ m = M_FULL;
+@@ -1907,7 +1937,7 @@ static void deactivate_slab(struct kmem_
+ * slabs from diagnostic functions will not see
+ * any frozen slabs.
+ */
+- spin_lock(&n->list_lock);
++ raw_spin_lock(&n->list_lock);
+ }
+ }
+
+@@ -1942,7 +1972,7 @@ static void deactivate_slab(struct kmem_
+ goto redo;
+
+ if (lock)
+- spin_unlock(&n->list_lock);
++ raw_spin_unlock(&n->list_lock);
+
+ if (m == M_FREE) {
+ stat(s, DEACTIVATE_EMPTY);
+@@ -1974,10 +2004,10 @@ static void unfreeze_partials(struct kme
+ n2 = get_node(s, page_to_nid(page));
+ if (n != n2) {
+ if (n)
+- spin_unlock(&n->list_lock);
++ raw_spin_unlock(&n->list_lock);
+
+ n = n2;
+- spin_lock(&n->list_lock);
++ raw_spin_lock(&n->list_lock);
+ }
+
+ do {
+@@ -2006,7 +2036,7 @@ static void unfreeze_partials(struct kme
+ }
+
+ if (n)
+- spin_unlock(&n->list_lock);
++ raw_spin_unlock(&n->list_lock);
+
+ while (discard_page) {
+ page = discard_page;
+@@ -2045,14 +2075,21 @@ static void put_cpu_partial(struct kmem_
+ pobjects = oldpage->pobjects;
+ pages = oldpage->pages;
+ if (drain && pobjects > s->cpu_partial) {
++ struct slub_free_list *f;
+ unsigned long flags;
++ LIST_HEAD(tofree);
+ /*
+ * partial array is full. Move the existing
+ * set to the per node partial list.
+ */
+ local_irq_save(flags);
+ unfreeze_partials(s, this_cpu_ptr(s->cpu_slab));
++ f = this_cpu_ptr(&slub_free_list);
++ raw_spin_lock(&f->lock);
++ list_splice_init(&f->list, &tofree);
++ raw_spin_unlock(&f->lock);
+ local_irq_restore(flags);
++ free_delayed(&tofree);
+ oldpage = NULL;
+ pobjects = 0;
+ pages = 0;
+@@ -2124,7 +2161,22 @@ static bool has_cpu_slab(int cpu, void *
+
+ static void flush_all(struct kmem_cache *s)
+ {
++ LIST_HEAD(tofree);
++ int cpu;
++
+ on_each_cpu_cond(has_cpu_slab, flush_cpu_slab, s, 1, GFP_ATOMIC);
++ for_each_online_cpu(cpu) {
++ struct slub_free_list *f;
++
++ if (!has_cpu_slab(cpu, s))
++ continue;
++
++ f = &per_cpu(slub_free_list, cpu);
++ raw_spin_lock_irq(&f->lock);
++ list_splice_init(&f->list, &tofree);
++ raw_spin_unlock_irq(&f->lock);
++ free_delayed(&tofree);
++ }
+ }
+
+ /*
+@@ -2160,10 +2212,10 @@ static unsigned long count_partial(struc
+ unsigned long x = 0;
+ struct page *page;
+
+- spin_lock_irqsave(&n->list_lock, flags);
++ raw_spin_lock_irqsave(&n->list_lock, flags);
+ list_for_each_entry(page, &n->partial, lru)
+ x += get_count(page);
+- spin_unlock_irqrestore(&n->list_lock, flags);
++ raw_spin_unlock_irqrestore(&n->list_lock, flags);
+ return x;
+ }
+ #endif /* CONFIG_SLUB_DEBUG || CONFIG_SYSFS */
+@@ -2300,9 +2352,11 @@ static inline void *get_freelist(struct
+ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
+ unsigned long addr, struct kmem_cache_cpu *c)
+ {
++ struct slub_free_list *f;
+ void *freelist;
+ struct page *page;
+ unsigned long flags;
++ LIST_HEAD(tofree);
+
+ local_irq_save(flags);
+ #ifdef CONFIG_PREEMPT
+@@ -2370,7 +2424,13 @@ static void *__slab_alloc(struct kmem_ca
+ VM_BUG_ON(!c->page->frozen);
+ c->freelist = get_freepointer(s, freelist);
+ c->tid = next_tid(c->tid);
++out:
++ f = this_cpu_ptr(&slub_free_list);
++ raw_spin_lock(&f->lock);
++ list_splice_init(&f->list, &tofree);
++ raw_spin_unlock(&f->lock);
+ local_irq_restore(flags);
++ free_delayed(&tofree);
+ return freelist;
+
+ new_slab:
+@@ -2387,8 +2447,7 @@ static void *__slab_alloc(struct kmem_ca
+
+ if (unlikely(!freelist)) {
+ slab_out_of_memory(s, gfpflags, node);
+- local_irq_restore(flags);
+- return NULL;
++ goto out;
+ }
+
+ page = c->page;
+@@ -2403,8 +2462,7 @@ static void *__slab_alloc(struct kmem_ca
+ deactivate_slab(s, page, get_freepointer(s, freelist));
+ c->page = NULL;
+ c->freelist = NULL;
+- local_irq_restore(flags);
+- return freelist;
++ goto out;
+ }
+
+ /*
+@@ -2588,7 +2646,7 @@ static void __slab_free(struct kmem_cach
+
+ do {
+ if (unlikely(n)) {
+- spin_unlock_irqrestore(&n->list_lock, flags);
++ raw_spin_unlock_irqrestore(&n->list_lock, flags);
+ n = NULL;
+ }
+ prior = page->freelist;
+@@ -2620,7 +2678,7 @@ static void __slab_free(struct kmem_cach
+ * Otherwise the list_lock will synchronize with
+ * other processors updating the list of slabs.
+ */
+- spin_lock_irqsave(&n->list_lock, flags);
++ raw_spin_lock_irqsave(&n->list_lock, flags);
+
+ }
+ }
+@@ -2662,7 +2720,7 @@ static void __slab_free(struct kmem_cach
+ add_partial(n, page, DEACTIVATE_TO_TAIL);
+ stat(s, FREE_ADD_PARTIAL);
+ }
+- spin_unlock_irqrestore(&n->list_lock, flags);
++ raw_spin_unlock_irqrestore(&n->list_lock, flags);
+ return;
+
+ slab_empty:
+@@ -2677,7 +2735,7 @@ static void __slab_free(struct kmem_cach
+ remove_full(s, n, page);
+ }
+
+- spin_unlock_irqrestore(&n->list_lock, flags);
++ raw_spin_unlock_irqrestore(&n->list_lock, flags);
+ stat(s, FREE_SLAB);
+ discard_slab(s, page);
+ }
+@@ -2876,7 +2934,7 @@ static void
+ init_kmem_cache_node(struct kmem_cache_node *n)
+ {
+ n->nr_partial = 0;
+- spin_lock_init(&n->list_lock);
++ raw_spin_lock_init(&n->list_lock);
+ INIT_LIST_HEAD(&n->partial);
+ #ifdef CONFIG_SLUB_DEBUG
+ atomic_long_set(&n->nr_slabs, 0);
+@@ -3458,7 +3516,7 @@ int __kmem_cache_shrink(struct kmem_cach
+ for (i = 0; i < SHRINK_PROMOTE_MAX; i++)
+ INIT_LIST_HEAD(promote + i);
+
+- spin_lock_irqsave(&n->list_lock, flags);
++ raw_spin_lock_irqsave(&n->list_lock, flags);
+
+ /*
+ * Build lists of slabs to discard or promote.
+@@ -3489,7 +3547,7 @@ int __kmem_cache_shrink(struct kmem_cach
+ for (i = SHRINK_PROMOTE_MAX - 1; i >= 0; i--)
+ list_splice(promote + i, &n->partial);
+
+- spin_unlock_irqrestore(&n->list_lock, flags);
++ raw_spin_unlock_irqrestore(&n->list_lock, flags);
+
+ /* Release empty slabs */
+ list_for_each_entry_safe(page, t, &discard, lru)
+@@ -3665,6 +3723,12 @@ void __init kmem_cache_init(void)
+ {
+ static __initdata struct kmem_cache boot_kmem_cache,
+ boot_kmem_cache_node;
++ int cpu;
++
++ for_each_possible_cpu(cpu) {
++ raw_spin_lock_init(&per_cpu(slub_free_list, cpu).lock);
++ INIT_LIST_HEAD(&per_cpu(slub_free_list, cpu).list);
++ }
+
+ if (debug_guardpage_minorder())
+ slub_max_order = 0;
+@@ -3907,7 +3971,7 @@ static int validate_slab_node(struct kme
+ struct page *page;
+ unsigned long flags;
+
+- spin_lock_irqsave(&n->list_lock, flags);
++ raw_spin_lock_irqsave(&n->list_lock, flags);
+
+ list_for_each_entry(page, &n->partial, lru) {
+ validate_slab_slab(s, page, map);
+@@ -3929,7 +3993,7 @@ static int validate_slab_node(struct kme
+ s->name, count, atomic_long_read(&n->nr_slabs));
+
+ out:
+- spin_unlock_irqrestore(&n->list_lock, flags);
++ raw_spin_unlock_irqrestore(&n->list_lock, flags);
+ return count;
+ }
+
+@@ -4117,12 +4181,12 @@ static int list_locations(struct kmem_ca
+ if (!atomic_long_read(&n->nr_slabs))
+ continue;
+
+- spin_lock_irqsave(&n->list_lock, flags);
++ raw_spin_lock_irqsave(&n->list_lock, flags);
+ list_for_each_entry(page, &n->partial, lru)
+ process_slab(&t, s, page, alloc, map);
+ list_for_each_entry(page, &n->full, lru)
+ process_slab(&t, s, page, alloc, map);
+- spin_unlock_irqrestore(&n->list_lock, flags);
++ raw_spin_unlock_irqrestore(&n->list_lock, flags);
+ }
+
+ for (i = 0; i < t.count; i++) {
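The SLUB rework repeats one trick: pages that become free while a raw lock is held (or interrupts are off) are parked on a per-CPU list and only handed back to the page allocator after the critical section ends, because freeing a page can take sleeping locks on RT. A stripped-down, hypothetical sketch of that deferral scheme (not the patch itself; __free_page() stands in for __free_slab()):

#include <linux/list.h>
#include <linux/mm.h>
#include <linux/percpu.h>
#include <linux/spinlock.h>

struct deferred_free {
        raw_spinlock_t lock;
        struct list_head list;
};
static DEFINE_PER_CPU(struct deferred_free, deferred_free);

/* Called with irqs off / a raw lock held: only park the page. */
static void defer_free(struct page *page)
{
        struct deferred_free *f = this_cpu_ptr(&deferred_free);

        raw_spin_lock(&f->lock);
        list_add(&page->lru, &f->list);
        raw_spin_unlock(&f->lock);
}

/* Called later, from a preemptible context: actually free the parked pages. */
static void flush_deferred_free(int cpu)
{
        struct deferred_free *f = &per_cpu(deferred_free, cpu);
        struct page *page, *tmp;
        LIST_HEAD(tofree);

        raw_spin_lock_irq(&f->lock);
        list_splice_init(&f->list, &tofree);
        raw_spin_unlock_irq(&f->lock);

        list_for_each_entry_safe(page, tmp, &tofree, lru) {
                list_del(&page->lru);
                __free_page(page);      /* stand-in for __free_slab() */
        }
}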
diff --git a/patches/mm-make-vmstat-rt-aware.patch b/patches/mm-make-vmstat-rt-aware.patch
new file mode 100644
index 00000000000000..34b5f818f923fa
--- /dev/null
+++ b/patches/mm-make-vmstat-rt-aware.patch
@@ -0,0 +1,88 @@
+From: Ingo Molnar <mingo@elte.hu>
+Date: Fri, 3 Jul 2009 08:30:13 -0500
+Subject: mm/vmstat: Protect per cpu variables with preempt disable on RT
+
+Disable preemption on -RT for the vmstat code. On vanilla kernels the code
+runs in IRQ-off regions, while on -RT it does not. "preempt_disable" ensures
+that the same resource is not updated in parallel due to preemption.
+
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ include/linux/vmstat.h | 4 ++++
+ mm/vmstat.c | 6 ++++++
+ 2 files changed, 10 insertions(+)
+
+--- a/include/linux/vmstat.h
++++ b/include/linux/vmstat.h
+@@ -33,7 +33,9 @@ DECLARE_PER_CPU(struct vm_event_state, v
+ */
+ static inline void __count_vm_event(enum vm_event_item item)
+ {
++ preempt_disable_rt();
+ raw_cpu_inc(vm_event_states.event[item]);
++ preempt_enable_rt();
+ }
+
+ static inline void count_vm_event(enum vm_event_item item)
+@@ -43,7 +45,9 @@ static inline void count_vm_event(enum v
+
+ static inline void __count_vm_events(enum vm_event_item item, long delta)
+ {
++ preempt_disable_rt();
+ raw_cpu_add(vm_event_states.event[item], delta);
++ preempt_enable_rt();
+ }
+
+ static inline void count_vm_events(enum vm_event_item item, long delta)
+--- a/mm/vmstat.c
++++ b/mm/vmstat.c
+@@ -226,6 +226,7 @@ void __mod_zone_page_state(struct zone *
+ long x;
+ long t;
+
++ preempt_disable_rt();
+ x = delta + __this_cpu_read(*p);
+
+ t = __this_cpu_read(pcp->stat_threshold);
+@@ -235,6 +236,7 @@ void __mod_zone_page_state(struct zone *
+ x = 0;
+ }
+ __this_cpu_write(*p, x);
++ preempt_enable_rt();
+ }
+ EXPORT_SYMBOL(__mod_zone_page_state);
+
+@@ -267,6 +269,7 @@ void __inc_zone_state(struct zone *zone,
+ s8 __percpu *p = pcp->vm_stat_diff + item;
+ s8 v, t;
+
++ preempt_disable_rt();
+ v = __this_cpu_inc_return(*p);
+ t = __this_cpu_read(pcp->stat_threshold);
+ if (unlikely(v > t)) {
+@@ -275,6 +278,7 @@ void __inc_zone_state(struct zone *zone,
+ zone_page_state_add(v + overstep, zone, item);
+ __this_cpu_write(*p, -overstep);
+ }
++ preempt_enable_rt();
+ }
+
+ void __inc_zone_page_state(struct page *page, enum zone_stat_item item)
+@@ -289,6 +293,7 @@ void __dec_zone_state(struct zone *zone,
+ s8 __percpu *p = pcp->vm_stat_diff + item;
+ s8 v, t;
+
++ preempt_disable_rt();
+ v = __this_cpu_dec_return(*p);
+ t = __this_cpu_read(pcp->stat_threshold);
+ if (unlikely(v < - t)) {
+@@ -297,6 +302,7 @@ void __dec_zone_state(struct zone *zone,
+ zone_page_state_add(v - overstep, zone, item);
+ __this_cpu_write(*p, overstep);
+ }
++ preempt_enable_rt();
+ }
+
+ void __dec_zone_page_state(struct page *page, enum zone_stat_item item)
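preempt_disable_rt()/preempt_enable_rt() are the mirror image of the _nort helpers: a no-op on mainline, where these paths already run with interrupts off, and a real preempt_disable() on RT, where the callers may only hold sleeping locks and can be preempted mid-update. A hypothetical per-CPU counter in the same style:

#include <linux/percpu.h>
#include <linux/preempt.h>

static DEFINE_PER_CPU(unsigned long, my_events);

static void count_my_event(void)
{
        /*
         * The non-atomic per-CPU read-modify-write below needs an explicit
         * preemption-off section on RT; on !RT this compiles away because
         * the surrounding code is already non-preemptible.
         */
        preempt_disable_rt();
        __this_cpu_inc(my_events);
        preempt_enable_rt();
}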
diff --git a/patches/mm-memcontrol-Don-t-call-schedule_work_on-in-preempt.patch b/patches/mm-memcontrol-Don-t-call-schedule_work_on-in-preempt.patch
new file mode 100644
index 00000000000000..1c6864305d47fa
--- /dev/null
+++ b/patches/mm-memcontrol-Don-t-call-schedule_work_on-in-preempt.patch
@@ -0,0 +1,68 @@
+From: Yang Shi <yang.shi@windriver.com>
+Subject: mm/memcontrol: Don't call schedule_work_on in preemption disabled context
+Date: Wed, 30 Oct 2013 11:48:33 -0700
+
+The following trace is triggered when running ltp oom test cases:
+
+BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
+in_atomic(): 1, irqs_disabled(): 0, pid: 17188, name: oom03
+Preemption disabled at:[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
+
+CPU: 2 PID: 17188 Comm: oom03 Not tainted 3.10.10-rt3 #2
+Hardware name: Intel Corporation Calpella platform/MATXM-CORE-411-B, BIOS 4.6.3 08/18/2010
+ffff88007684d730 ffff880070df9b58 ffffffff8169918d ffff880070df9b70
+ffffffff8106db31 ffff88007688b4a0 ffff880070df9b88 ffffffff8169d9c0
+ffff88007688b4a0 ffff880070df9bc8 ffffffff81059da1 0000000170df9bb0
+Call Trace:
+[<ffffffff8169918d>] dump_stack+0x19/0x1b
+[<ffffffff8106db31>] __might_sleep+0xf1/0x170
+[<ffffffff8169d9c0>] rt_spin_lock+0x20/0x50
+[<ffffffff81059da1>] queue_work_on+0x61/0x100
+[<ffffffff8112b361>] drain_all_stock+0xe1/0x1c0
+[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
+[<ffffffff8112beda>] __mem_cgroup_try_charge+0x41a/0xc40
+[<ffffffff810f1c91>] ? release_pages+0x1b1/0x1f0
+[<ffffffff8106f200>] ? sched_exec+0x40/0xb0
+[<ffffffff8112cc87>] mem_cgroup_charge_common+0x37/0x70
+[<ffffffff8112e2c6>] mem_cgroup_newpage_charge+0x26/0x30
+[<ffffffff8110af68>] handle_pte_fault+0x618/0x840
+[<ffffffff8103ecf6>] ? unpin_current_cpu+0x16/0x70
+[<ffffffff81070f94>] ? migrate_enable+0xd4/0x200
+[<ffffffff8110cde5>] handle_mm_fault+0x145/0x1e0
+[<ffffffff810301e1>] __do_page_fault+0x1a1/0x4c0
+[<ffffffff8169c9eb>] ? preempt_schedule_irq+0x4b/0x70
+[<ffffffff8169e3b7>] ? retint_kernel+0x37/0x40
+[<ffffffff8103053e>] do_page_fault+0xe/0x10
+[<ffffffff8169e4c2>] page_fault+0x22/0x30
+
+So, to prevent schedule_work_on from being called in preempt-disabled context,
+replace the pair of get/put_cpu() with get/put_cpu_light().
+
+
+Signed-off-by: Yang Shi <yang.shi@windriver.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+
+ mm/memcontrol.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/mm/memcontrol.c
++++ b/mm/memcontrol.c
+@@ -2147,7 +2147,7 @@ static void drain_all_stock(struct mem_c
+ return;
+ /* Notify other cpus that system-wide "drain" is running */
+ get_online_cpus();
+- curcpu = get_cpu();
++ curcpu = get_cpu_light();
+ for_each_online_cpu(cpu) {
+ struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
+ struct mem_cgroup *memcg;
+@@ -2164,7 +2164,7 @@ static void drain_all_stock(struct mem_c
+ schedule_work_on(cpu, &stock->work);
+ }
+ }
+- put_cpu();
++ put_cpu_light();
+ put_online_cpus();
+ mutex_unlock(&percpu_charge_mutex);
+ }
diff --git a/patches/mm-memcontrol-do_not_disable_irq.patch b/patches/mm-memcontrol-do_not_disable_irq.patch
new file mode 100644
index 00000000000000..9ac7155867b615
--- /dev/null
+++ b/patches/mm-memcontrol-do_not_disable_irq.patch
@@ -0,0 +1,137 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Subject: mm/memcontrol: Replace local_irq_disable with local locks
+Date: Wed, 28 Jan 2015 17:14:16 +0100
+
+There are a few local_irq_disable() sections which then take sleeping locks.
+This patch converts them to local locks.
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ include/linux/swap.h | 1 +
+ mm/compaction.c | 6 ++++--
+ mm/memcontrol.c | 18 ++++++++++++------
+ mm/swap.c | 2 +-
+ 4 files changed, 18 insertions(+), 9 deletions(-)
+
+--- a/include/linux/swap.h
++++ b/include/linux/swap.h
+@@ -298,6 +298,7 @@ extern unsigned long nr_free_pagecache_p
+
+
+ /* linux/mm/swap.c */
++DECLARE_LOCAL_IRQ_LOCK(swapvec_lock);
+ extern void lru_cache_add(struct page *);
+ extern void lru_cache_add_anon(struct page *page);
+ extern void lru_cache_add_file(struct page *page);
+--- a/mm/compaction.c
++++ b/mm/compaction.c
+@@ -1406,10 +1406,12 @@ static int compact_zone(struct zone *zon
+ cc->migrate_pfn & ~((1UL << cc->order) - 1);
+
+ if (last_migrated_pfn < current_block_start) {
+- cpu = get_cpu();
++ cpu = get_cpu_light();
++ local_lock_irq(swapvec_lock);
+ lru_add_drain_cpu(cpu);
++ local_unlock_irq(swapvec_lock);
+ drain_local_pages(zone);
+- put_cpu();
++ put_cpu_light();
+ /* No more flushing until we migrate again */
+ last_migrated_pfn = 0;
+ }
+--- a/mm/memcontrol.c
++++ b/mm/memcontrol.c
+@@ -66,6 +66,8 @@
+ #include <net/sock.h>
+ #include <net/ip.h>
+ #include <net/tcp_memcontrol.h>
++#include <linux/locallock.h>
++
+ #include "slab.h"
+
+ #include <asm/uaccess.h>
+@@ -85,6 +87,7 @@ int do_swap_account __read_mostly;
+ #define do_swap_account 0
+ #endif
+
++static DEFINE_LOCAL_IRQ_LOCK(event_lock);
+ static const char * const mem_cgroup_stat_names[] = {
+ "cache",
+ "rss",
+@@ -4801,12 +4804,12 @@ static int mem_cgroup_move_account(struc
+
+ ret = 0;
+
+- local_irq_disable();
++ local_lock_irq(event_lock);
+ mem_cgroup_charge_statistics(to, page, nr_pages);
+ memcg_check_events(to, page);
+ mem_cgroup_charge_statistics(from, page, -nr_pages);
+ memcg_check_events(from, page);
+- local_irq_enable();
++ local_unlock_irq(event_lock);
+ out_unlock:
+ unlock_page(page);
+ out:
+@@ -5543,10 +5546,10 @@ void mem_cgroup_commit_charge(struct pag
+ VM_BUG_ON_PAGE(!PageTransHuge(page), page);
+ }
+
+- local_irq_disable();
++ local_lock_irq(event_lock);
+ mem_cgroup_charge_statistics(memcg, page, nr_pages);
+ memcg_check_events(memcg, page);
+- local_irq_enable();
++ local_unlock_irq(event_lock);
+
+ if (do_swap_account && PageSwapCache(page)) {
+ swp_entry_t entry = { .val = page_private(page) };
+@@ -5602,14 +5605,14 @@ static void uncharge_batch(struct mem_cg
+ memcg_oom_recover(memcg);
+ }
+
+- local_irq_save(flags);
++ local_lock_irqsave(event_lock, flags);
+ __this_cpu_sub(memcg->stat->count[MEM_CGROUP_STAT_RSS], nr_anon);
+ __this_cpu_sub(memcg->stat->count[MEM_CGROUP_STAT_CACHE], nr_file);
+ __this_cpu_sub(memcg->stat->count[MEM_CGROUP_STAT_RSS_HUGE], nr_huge);
+ __this_cpu_add(memcg->stat->events[MEM_CGROUP_EVENTS_PGPGOUT], pgpgout);
+ __this_cpu_add(memcg->stat->nr_page_events, nr_pages);
+ memcg_check_events(memcg, dummy_page);
+- local_irq_restore(flags);
++ local_unlock_irqrestore(event_lock, flags);
+
+ if (!mem_cgroup_is_root(memcg))
+ css_put_many(&memcg->css, nr_pages);
+@@ -5813,6 +5816,7 @@ void mem_cgroup_swapout(struct page *pag
+ {
+ struct mem_cgroup *memcg;
+ unsigned short oldid;
++ unsigned long flags;
+
+ VM_BUG_ON_PAGE(PageLRU(page), page);
+ VM_BUG_ON_PAGE(page_count(page), page);
+@@ -5835,9 +5839,11 @@ void mem_cgroup_swapout(struct page *pag
+ if (!mem_cgroup_is_root(memcg))
+ page_counter_uncharge(&memcg->memory, 1);
+
++ local_lock_irqsave(event_lock, flags);
+ /* Caller disabled preemption with mapping->tree_lock */
+ mem_cgroup_charge_statistics(memcg, page, -1);
+ memcg_check_events(memcg, page);
++ local_unlock_irqrestore(event_lock, flags);
+ }
+
+ /**
+--- a/mm/swap.c
++++ b/mm/swap.c
+@@ -47,7 +47,7 @@ static DEFINE_PER_CPU(struct pagevec, lr
+ static DEFINE_PER_CPU(struct pagevec, lru_deactivate_file_pvecs);
+
+ static DEFINE_LOCAL_IRQ_LOCK(rotate_lock);
+-static DEFINE_LOCAL_IRQ_LOCK(swapvec_lock);
++DEFINE_LOCAL_IRQ_LOCK(swapvec_lock);
+
+ /*
+ * This path almost never happens for VM activity - pages are normally
diff --git a/patches/mm-page-alloc-use-local-lock-on-target-cpu.patch b/patches/mm-page-alloc-use-local-lock-on-target-cpu.patch
new file mode 100644
index 00000000000000..f3205d67aef3fd
--- /dev/null
+++ b/patches/mm-page-alloc-use-local-lock-on-target-cpu.patch
@@ -0,0 +1,27 @@
+Subject: mm: page_alloc: Use local_lock_on() instead of plain spinlock
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Thu, 27 Sep 2012 11:11:46 +0200
+
+The plain spinlock, while sufficient, does not update the local_lock
+internals. Use a proper local_lock function instead to ease debugging.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ mm/page_alloc.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/mm/page_alloc.c
++++ b/mm/page_alloc.c
+@@ -238,9 +238,9 @@ static DEFINE_LOCAL_IRQ_LOCK(pa_lock);
+
+ #ifdef CONFIG_PREEMPT_RT_BASE
+ # define cpu_lock_irqsave(cpu, flags) \
+- spin_lock_irqsave(&per_cpu(pa_lock, cpu).lock, flags)
++ local_lock_irqsave_on(pa_lock, flags, cpu)
+ # define cpu_unlock_irqrestore(cpu, flags) \
+- spin_unlock_irqrestore(&per_cpu(pa_lock, cpu).lock, flags)
++ local_unlock_irqrestore_on(pa_lock, flags, cpu)
+ #else
+ # define cpu_lock_irqsave(cpu, flags) local_irq_save(flags)
+ # define cpu_unlock_irqrestore(cpu, flags) local_irq_restore(flags)
diff --git a/patches/mm-page_alloc-reduce-lock-sections-further.patch b/patches/mm-page_alloc-reduce-lock-sections-further.patch
new file mode 100644
index 00000000000000..5991d464983f72
--- /dev/null
+++ b/patches/mm-page_alloc-reduce-lock-sections-further.patch
@@ -0,0 +1,192 @@
+From: Peter Zijlstra <peterz@infradead.org>
+Date: Fri, 3 Jul 2009 08:44:37 -0500
+Subject: mm: page_alloc: Reduce lock sections further
+
+Split out the pages which are to be freed into a separate list and
+call free_pages_bulk() outside of the percpu page allocator locks.
+
+Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ mm/page_alloc.c | 85 ++++++++++++++++++++++++++++++++++++++++----------------
+ 1 file changed, 61 insertions(+), 24 deletions(-)
+
+--- a/mm/page_alloc.c
++++ b/mm/page_alloc.c
+@@ -694,7 +694,7 @@ static inline int free_pages_check(struc
+ }
+
+ /*
+- * Frees a number of pages from the PCP lists
++ * Frees a number of pages which have been collected from the pcp lists.
+ * Assumes all pages on list are in same zone, and of same order.
+ * count is the number of pages to free.
+ *
+@@ -705,18 +705,51 @@ static inline int free_pages_check(struc
+ * pinned" detection logic.
+ */
+ static void free_pcppages_bulk(struct zone *zone, int count,
+- struct per_cpu_pages *pcp)
++ struct list_head *list)
+ {
+- int migratetype = 0;
+- int batch_free = 0;
+ int to_free = count;
+ unsigned long nr_scanned;
++ unsigned long flags;
++
++ spin_lock_irqsave(&zone->lock, flags);
+
+- spin_lock(&zone->lock);
+ nr_scanned = zone_page_state(zone, NR_PAGES_SCANNED);
+ if (nr_scanned)
+ __mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
+
++ while (!list_empty(list)) {
++ struct page *page = list_first_entry(list, struct page, lru);
++ int mt; /* migratetype of the to-be-freed page */
++
++ /* must delete as __free_one_page list manipulates */
++ list_del(&page->lru);
++
++ mt = get_freepage_migratetype(page);
++ if (unlikely(has_isolate_pageblock(zone)))
++ mt = get_pageblock_migratetype(page);
++
++ /* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
++ __free_one_page(page, page_to_pfn(page), zone, 0, mt);
++ trace_mm_page_pcpu_drain(page, 0, mt);
++ to_free--;
++ }
++ WARN_ON(to_free != 0);
++ spin_unlock_irqrestore(&zone->lock, flags);
++}
++
++/*
++ * Moves a number of pages from the PCP lists to free list which
++ * is freed outside of the locked region.
++ *
++ * Assumes all pages on list are in same zone, and of same order.
++ * count is the number of pages to free.
++ */
++static void isolate_pcp_pages(int to_free, struct per_cpu_pages *src,
++ struct list_head *dst)
++{
++ int migratetype = 0;
++ int batch_free = 0;
++
+ while (to_free) {
+ struct page *page;
+ struct list_head *list;
+@@ -732,7 +765,7 @@ static void free_pcppages_bulk(struct zo
+ batch_free++;
+ if (++migratetype == MIGRATE_PCPTYPES)
+ migratetype = 0;
+- list = &pcp->lists[migratetype];
++ list = &src->lists[migratetype];
+ } while (list_empty(list));
+
+ /* This is the only non-empty list. Free them all. */
+@@ -740,21 +773,11 @@ static void free_pcppages_bulk(struct zo
+ batch_free = to_free;
+
+ do {
+- int mt; /* migratetype of the to-be-freed page */
+-
+- page = list_entry(list->prev, struct page, lru);
+- /* must delete as __free_one_page list manipulates */
++ page = list_last_entry(list, struct page, lru);
+ list_del(&page->lru);
+- mt = get_freepage_migratetype(page);
+- if (unlikely(has_isolate_pageblock(zone)))
+- mt = get_pageblock_migratetype(page);
+-
+- /* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
+- __free_one_page(page, page_to_pfn(page), zone, 0, mt);
+- trace_mm_page_pcpu_drain(page, 0, mt);
++ list_add(&page->lru, dst);
+ } while (--to_free && --batch_free && !list_empty(list));
+ }
+- spin_unlock(&zone->lock);
+ }
+
+ static void free_one_page(struct zone *zone,
+@@ -763,7 +786,9 @@ static void free_one_page(struct zone *z
+ int migratetype)
+ {
+ unsigned long nr_scanned;
+- spin_lock(&zone->lock);
++ unsigned long flags;
++
++ spin_lock_irqsave(&zone->lock, flags);
+ nr_scanned = zone_page_state(zone, NR_PAGES_SCANNED);
+ if (nr_scanned)
+ __mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
+@@ -773,7 +798,7 @@ static void free_one_page(struct zone *z
+ migratetype = get_pfnblock_migratetype(page, pfn);
+ }
+ __free_one_page(page, pfn, zone, order, migratetype);
+- spin_unlock(&zone->lock);
++ spin_unlock_irqrestore(&zone->lock, flags);
+ }
+
+ static int free_tail_pages_check(struct page *head_page, struct page *page)
+@@ -1381,16 +1406,18 @@ static int rmqueue_bulk(struct zone *zon
+ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
+ {
+ unsigned long flags;
++ LIST_HEAD(dst);
+ int to_drain, batch;
+
+ local_lock_irqsave(pa_lock, flags);
+ batch = READ_ONCE(pcp->batch);
+ to_drain = min(pcp->count, batch);
+ if (to_drain > 0) {
+- free_pcppages_bulk(zone, to_drain, pcp);
++ isolate_pcp_pages(to_drain, pcp, &dst);
+ pcp->count -= to_drain;
+ }
+ local_unlock_irqrestore(pa_lock, flags);
++ free_pcppages_bulk(zone, to_drain, &dst);
+ }
+ #endif
+
+@@ -1406,16 +1433,21 @@ static void drain_pages_zone(unsigned in
+ unsigned long flags;
+ struct per_cpu_pageset *pset;
+ struct per_cpu_pages *pcp;
++ LIST_HEAD(dst);
++ int count;
+
+ cpu_lock_irqsave(cpu, flags);
+ pset = per_cpu_ptr(zone->pageset, cpu);
+
+ pcp = &pset->pcp;
+- if (pcp->count) {
+- free_pcppages_bulk(zone, pcp->count, pcp);
++ count = pcp->count;
++ if (count) {
++ isolate_pcp_pages(count, pcp, &dst);
+ pcp->count = 0;
+ }
+ cpu_unlock_irqrestore(cpu, flags);
++ if (count)
++ free_pcppages_bulk(zone, count, &dst);
+ }
+
+ /*
+@@ -1593,8 +1625,13 @@ void free_hot_cold_page(struct page *pag
+ pcp->count++;
+ if (pcp->count >= pcp->high) {
+ unsigned long batch = READ_ONCE(pcp->batch);
+- free_pcppages_bulk(zone, batch, pcp);
++ LIST_HEAD(dst);
++
++ isolate_pcp_pages(batch, pcp, &dst);
+ pcp->count -= batch;
++ local_unlock_irqrestore(pa_lock, flags);
++ free_pcppages_bulk(zone, batch, &dst);
++ return;
+ }
+
+ out:
diff --git a/patches/mm-page_alloc-rt-friendly-per-cpu-pages.patch b/patches/mm-page_alloc-rt-friendly-per-cpu-pages.patch
new file mode 100644
index 00000000000000..fb419e663796fc
--- /dev/null
+++ b/patches/mm-page_alloc-rt-friendly-per-cpu-pages.patch
@@ -0,0 +1,201 @@
+From: Ingo Molnar <mingo@elte.hu>
+Date: Fri, 3 Jul 2009 08:29:37 -0500
+Subject: mm: page_alloc: rt-friendly per-cpu pages
+
+rt-friendly per-cpu pages: convert the irqs-off per-cpu locking
+method into a preemptible, explicit-per-cpu-locks method.
+
+Contains fixes from:
+ Peter Zijlstra <a.p.zijlstra@chello.nl>
+ Thomas Gleixner <tglx@linutronix.de>
+
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ mm/page_alloc.c | 57 ++++++++++++++++++++++++++++++++++++++++----------------
+ 1 file changed, 41 insertions(+), 16 deletions(-)
+
+--- a/mm/page_alloc.c
++++ b/mm/page_alloc.c
+@@ -60,6 +60,7 @@
+ #include <linux/page_ext.h>
+ #include <linux/hugetlb.h>
+ #include <linux/sched/rt.h>
++#include <linux/locallock.h>
+ #include <linux/page_owner.h>
+
+ #include <asm/sections.h>
+@@ -233,6 +234,18 @@ EXPORT_SYMBOL(nr_node_ids);
+ EXPORT_SYMBOL(nr_online_nodes);
+ #endif
+
++static DEFINE_LOCAL_IRQ_LOCK(pa_lock);
++
++#ifdef CONFIG_PREEMPT_RT_BASE
++# define cpu_lock_irqsave(cpu, flags) \
++ spin_lock_irqsave(&per_cpu(pa_lock, cpu).lock, flags)
++# define cpu_unlock_irqrestore(cpu, flags) \
++ spin_unlock_irqrestore(&per_cpu(pa_lock, cpu).lock, flags)
++#else
++# define cpu_lock_irqsave(cpu, flags) local_irq_save(flags)
++# define cpu_unlock_irqrestore(cpu, flags) local_irq_restore(flags)
++#endif
++
+ int page_group_by_mobility_disabled __read_mostly;
+
+ void set_pageblock_migratetype(struct page *page, int migratetype)
+@@ -825,11 +838,11 @@ static void __free_pages_ok(struct page
+ return;
+
+ migratetype = get_pfnblock_migratetype(page, pfn);
+- local_irq_save(flags);
++ local_lock_irqsave(pa_lock, flags);
+ __count_vm_events(PGFREE, 1 << order);
+ set_freepage_migratetype(page, migratetype);
+ free_one_page(page_zone(page), page, pfn, order, migratetype);
+- local_irq_restore(flags);
++ local_unlock_irqrestore(pa_lock, flags);
+ }
+
+ void __init __free_pages_bootmem(struct page *page, unsigned int order)
+@@ -1370,14 +1383,14 @@ void drain_zone_pages(struct zone *zone,
+ unsigned long flags;
+ int to_drain, batch;
+
+- local_irq_save(flags);
++ local_lock_irqsave(pa_lock, flags);
+ batch = READ_ONCE(pcp->batch);
+ to_drain = min(pcp->count, batch);
+ if (to_drain > 0) {
+ free_pcppages_bulk(zone, to_drain, pcp);
+ pcp->count -= to_drain;
+ }
+- local_irq_restore(flags);
++ local_unlock_irqrestore(pa_lock, flags);
+ }
+ #endif
+
+@@ -1394,7 +1407,7 @@ static void drain_pages_zone(unsigned in
+ struct per_cpu_pageset *pset;
+ struct per_cpu_pages *pcp;
+
+- local_irq_save(flags);
++ cpu_lock_irqsave(cpu, flags);
+ pset = per_cpu_ptr(zone->pageset, cpu);
+
+ pcp = &pset->pcp;
+@@ -1402,7 +1415,7 @@ static void drain_pages_zone(unsigned in
+ free_pcppages_bulk(zone, pcp->count, pcp);
+ pcp->count = 0;
+ }
+- local_irq_restore(flags);
++ cpu_unlock_irqrestore(cpu, flags);
+ }
+
+ /*
+@@ -1488,8 +1501,17 @@ void drain_all_pages(struct zone *zone)
+ else
+ cpumask_clear_cpu(cpu, &cpus_with_pcps);
+ }
++#ifndef CONFIG_PREEMPT_RT_BASE
+ on_each_cpu_mask(&cpus_with_pcps, (smp_call_func_t) drain_local_pages,
+ zone, 1);
++#else
++ for_each_cpu(cpu, &cpus_with_pcps) {
++ if (zone)
++ drain_pages_zone(cpu, zone);
++ else
++ drain_pages(cpu);
++ }
++#endif
+ }
+
+ #ifdef CONFIG_HIBERNATION
+@@ -1545,7 +1567,7 @@ void free_hot_cold_page(struct page *pag
+
+ migratetype = get_pfnblock_migratetype(page, pfn);
+ set_freepage_migratetype(page, migratetype);
+- local_irq_save(flags);
++ local_lock_irqsave(pa_lock, flags);
+ __count_vm_event(PGFREE);
+
+ /*
+@@ -1576,7 +1598,7 @@ void free_hot_cold_page(struct page *pag
+ }
+
+ out:
+- local_irq_restore(flags);
++ local_unlock_irqrestore(pa_lock, flags);
+ }
+
+ /*
+@@ -1707,7 +1729,7 @@ struct page *buffered_rmqueue(struct zon
+ struct per_cpu_pages *pcp;
+ struct list_head *list;
+
+- local_irq_save(flags);
++ local_lock_irqsave(pa_lock, flags);
+ pcp = &this_cpu_ptr(zone->pageset)->pcp;
+ list = &pcp->lists[migratetype];
+ if (list_empty(list)) {
+@@ -1739,13 +1761,15 @@ struct page *buffered_rmqueue(struct zon
+ */
+ WARN_ON_ONCE(order > 1);
+ }
+- spin_lock_irqsave(&zone->lock, flags);
++ local_spin_lock_irqsave(pa_lock, &zone->lock, flags);
+ page = __rmqueue(zone, order, migratetype);
+- spin_unlock(&zone->lock);
+- if (!page)
++ if (!page) {
++ spin_unlock(&zone->lock);
+ goto failed;
++ }
+ __mod_zone_freepage_state(zone, -(1 << order),
+ get_freepage_migratetype(page));
++ spin_unlock(&zone->lock);
+ }
+
+ __mod_zone_page_state(zone, NR_ALLOC_BATCH, -(1 << order));
+@@ -1755,13 +1779,13 @@ struct page *buffered_rmqueue(struct zon
+
+ __count_zone_vm_events(PGALLOC, zone, 1 << order);
+ zone_statistics(preferred_zone, zone, gfp_flags);
+- local_irq_restore(flags);
++ local_unlock_irqrestore(pa_lock, flags);
+
+ VM_BUG_ON_PAGE(bad_range(zone, page), page);
+ return page;
+
+ failed:
+- local_irq_restore(flags);
++ local_unlock_irqrestore(pa_lock, flags);
+ return NULL;
+ }
+
+@@ -5650,6 +5674,7 @@ static int page_alloc_cpu_notify(struct
+ void __init page_alloc_init(void)
+ {
+ hotcpu_notifier(page_alloc_cpu_notify, 0);
++ local_irq_lock_init(pa_lock);
+ }
+
+ /*
+@@ -6544,7 +6569,7 @@ void zone_pcp_reset(struct zone *zone)
+ struct per_cpu_pageset *pset;
+
+ /* avoid races with drain_pages() */
+- local_irq_save(flags);
++ local_lock_irqsave(pa_lock, flags);
+ if (zone->pageset != &boot_pageset) {
+ for_each_online_cpu(cpu) {
+ pset = per_cpu_ptr(zone->pageset, cpu);
+@@ -6553,7 +6578,7 @@ void zone_pcp_reset(struct zone *zone)
+ free_percpu(zone->pageset);
+ zone->pageset = &boot_pageset;
+ }
+- local_irq_restore(flags);
++ local_unlock_irqrestore(pa_lock, flags);
+ }
+
+ #ifdef CONFIG_MEMORY_HOTREMOVE
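
The conversion above trades local_irq_save() for pa_lock, a local lock which on RT
can be taken as a sleeping per-CPU lock instead of shutting off interrupts. Very
loosely, per-CPU state guarded by irq-off sections can be pictured as per-slot
state guarded by an ordinary lock; the sketch below models just that idea with
pthreads and is not the kernel locallock API:

/* Rough analogy for a local lock: instead of disabling interrupts to
 * protect per-CPU state, each instance of the state carries its own
 * lock.  Userspace sketch, illustrative only. */
#include <pthread.h>
#include <stdio.h>

#define NR_SLOTS 4

struct pcp_slot {
	pthread_mutex_t lock;	/* plays the role of pa_lock for this slot */
	long count;		/* the "per-CPU" state it protects */
};

static struct pcp_slot slots[NR_SLOTS];

static void *worker(void *arg)
{
	long base = (long)arg;

	for (int i = 0; i < 100000; i++) {
		/* Pick "our" slot; being moved elsewhere afterwards is
		 * harmless because the slot's own lock serializes access. */
		struct pcp_slot *s = &slots[(base + i) % NR_SLOTS];

		pthread_mutex_lock(&s->lock);
		s->count++;
		pthread_mutex_unlock(&s->lock);
	}
	return NULL;
}

int main(void)
{
	pthread_t t[2];

	for (int i = 0; i < NR_SLOTS; i++)
		pthread_mutex_init(&slots[i].lock, NULL);
	for (long i = 0; i < 2; i++)
		pthread_create(&t[i], NULL, worker, (void *)i);
	for (int i = 0; i < 2; i++)
		pthread_join(t[i], NULL);
	for (int i = 0; i < NR_SLOTS; i++)
		printf("slot %d: %ld\n", i, slots[i].count);
	return 0;
}
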
diff --git a/patches/mm-protect-activate-switch-mm.patch b/patches/mm-protect-activate-switch-mm.patch
new file mode 100644
index 00000000000000..383e79fdb2651b
--- /dev/null
+++ b/patches/mm-protect-activate-switch-mm.patch
@@ -0,0 +1,71 @@
+From: Yong Zhang <yong.zhang0@gmail.com>
+Date: Tue, 15 May 2012 13:53:56 +0800
+Subject: mm: Protect activate_mm() by preempt_[disable&enable]_rt()
+
+Use preempt_*_rt instead of local_irq_*_rt, otherwise there will be a
+warning on ARM like the one below:
+
+WARNING: at build/linux/kernel/smp.c:459 smp_call_function_many+0x98/0x264()
+Modules linked in:
+[<c0013bb4>] (unwind_backtrace+0x0/0xe4) from [<c001be94>] (warn_slowpath_common+0x4c/0x64)
+[<c001be94>] (warn_slowpath_common+0x4c/0x64) from [<c001bec4>] (warn_slowpath_null+0x18/0x1c)
+[<c001bec4>] (warn_slowpath_null+0x18/0x1c) from [<c0053ff8>](smp_call_function_many+0x98/0x264)
+[<c0053ff8>] (smp_call_function_many+0x98/0x264) from [<c0054364>] (smp_call_function+0x44/0x6c)
+[<c0054364>] (smp_call_function+0x44/0x6c) from [<c0017d50>] (__new_context+0xbc/0x124)
+[<c0017d50>] (__new_context+0xbc/0x124) from [<c009e49c>] (flush_old_exec+0x460/0x5e4)
+[<c009e49c>] (flush_old_exec+0x460/0x5e4) from [<c00d61ac>] (load_elf_binary+0x2e0/0x11ac)
+[<c00d61ac>] (load_elf_binary+0x2e0/0x11ac) from [<c009d060>] (search_binary_handler+0x94/0x2a4)
+[<c009d060>] (search_binary_handler+0x94/0x2a4) from [<c009e8fc>] (do_execve+0x254/0x364)
+[<c009e8fc>] (do_execve+0x254/0x364) from [<c0010e84>] (sys_execve+0x34/0x54)
+[<c0010e84>] (sys_execve+0x34/0x54) from [<c000da00>] (ret_fast_syscall+0x0/0x30)
+---[ end trace 0000000000000002 ]---
+
+The reason is that ARM needs irqs enabled when doing activate_mm().
+According to mm-protect-activate-switch-mm.patch, actually
+preempt_[disable|enable]_rt() is sufficient.
+
+Inspired-by: Steven Rostedt <rostedt@goodmis.org>
+Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
+Cc: Steven Rostedt <rostedt@goodmis.org>
+Link: http://lkml.kernel.org/r/1337061236-1766-1-git-send-email-yong.zhang0@gmail.com
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ fs/exec.c | 2 ++
+ mm/mmu_context.c | 2 ++
+ 2 files changed, 4 insertions(+)
+
+--- a/fs/exec.c
++++ b/fs/exec.c
+@@ -859,12 +859,14 @@ static int exec_mmap(struct mm_struct *m
+ }
+ }
+ task_lock(tsk);
++ preempt_disable_rt();
+ active_mm = tsk->active_mm;
+ tsk->mm = mm;
+ tsk->active_mm = mm;
+ activate_mm(active_mm, mm);
+ tsk->mm->vmacache_seqnum = 0;
+ vmacache_flush(tsk);
++ preempt_enable_rt();
+ task_unlock(tsk);
+ if (old_mm) {
+ up_read(&old_mm->mmap_sem);
+--- a/mm/mmu_context.c
++++ b/mm/mmu_context.c
+@@ -23,6 +23,7 @@ void use_mm(struct mm_struct *mm)
+ struct task_struct *tsk = current;
+
+ task_lock(tsk);
++ preempt_disable_rt();
+ active_mm = tsk->active_mm;
+ if (active_mm != mm) {
+ atomic_inc(&mm->mm_count);
+@@ -30,6 +31,7 @@ void use_mm(struct mm_struct *mm)
+ }
+ tsk->mm = mm;
+ switch_mm(active_mm, mm, tsk);
++ preempt_enable_rt();
+ task_unlock(tsk);
+ #ifdef finish_arch_post_lock_switch
+ finish_arch_post_lock_switch();
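
preempt_disable_rt()/preempt_enable_rt() used above are not defined in this patch;
they come from elsewhere in the queue. As an assumption for illustration only, such
helpers typically follow a compile-time no-op-versus-real-call pattern along these
lines (the printf bodies are obviously stand-ins, not the RT implementation):

/* Sketch of the "_rt" helper pattern: a real operation in an RT build,
 * a no-op otherwise.  Names and bodies assumed for illustration. */
#include <stdio.h>

/* #define CONFIG_PREEMPT_RT_FULL 1 */	/* flip to model an RT build */

#ifdef CONFIG_PREEMPT_RT_FULL
# define preempt_disable_rt()	printf("preemption disabled\n")
# define preempt_enable_rt()	printf("preemption enabled\n")
#else
# define preempt_disable_rt()	do { } while (0)
# define preempt_enable_rt()	do { } while (0)
#endif

static void switch_mm_section(void)
{
	preempt_disable_rt();	/* only costs something on RT */
	printf("activate_mm()-like work runs here\n");
	preempt_enable_rt();
}

int main(void)
{
	switch_mm_section();
	return 0;
}
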
diff --git a/patches/mm-rt-kmap-atomic-scheduling.patch b/patches/mm-rt-kmap-atomic-scheduling.patch
new file mode 100644
index 00000000000000..ad4843a6c8baf5
--- /dev/null
+++ b/patches/mm-rt-kmap-atomic-scheduling.patch
@@ -0,0 +1,288 @@
+Subject: mm, rt: kmap_atomic scheduling
+From: Peter Zijlstra <peterz@infradead.org>
+Date: Thu, 28 Jul 2011 10:43:51 +0200
+
+In fact, with migrate_disable() existing one could play games with
+kmap_atomic. You could save/restore the kmap_atomic slots on context
+switch (if there are any in use, of course); this should be especially
+easy now that we have a kmap_atomic stack.
+
+Something like the below.. it wants replacing all the preempt_disable()
+stuff with pagefault_disable() && migrate_disable() of course, but then
+you can flip kmaps around like below.
+
+Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
+[dvhart@linux.intel.com: build fix]
+Link: http://lkml.kernel.org/r/1311842631.5890.208.camel@twins
+
+[tglx@linutronix.de: Get rid of the per cpu variable and store the idx
+ and the pte content right away in the task struct.
+ Shortens the context switch code. ]
+---
+ arch/x86/kernel/process_32.c | 32 ++++++++++++++++++++++++++++++++
+ arch/x86/mm/highmem_32.c | 13 ++++++++++---
+ arch/x86/mm/iomap_32.c | 9 ++++++++-
+ include/linux/highmem.h | 27 +++++++++++++++++++++++----
+ include/linux/sched.h | 7 +++++++
+ include/linux/uaccess.h | 2 ++
+ mm/highmem.c | 6 ++++--
+ 7 files changed, 86 insertions(+), 10 deletions(-)
+
+--- a/arch/x86/kernel/process_32.c
++++ b/arch/x86/kernel/process_32.c
+@@ -35,6 +35,7 @@
+ #include <linux/uaccess.h>
+ #include <linux/io.h>
+ #include <linux/kdebug.h>
++#include <linux/highmem.h>
+
+ #include <asm/pgtable.h>
+ #include <asm/ldt.h>
+@@ -210,6 +211,35 @@ start_thread(struct pt_regs *regs, unsig
+ }
+ EXPORT_SYMBOL_GPL(start_thread);
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++static void switch_kmaps(struct task_struct *prev_p, struct task_struct *next_p)
++{
++ int i;
++
++ /*
++ * Clear @prev's kmap_atomic mappings
++ */
++ for (i = 0; i < prev_p->kmap_idx; i++) {
++ int idx = i + KM_TYPE_NR * smp_processor_id();
++ pte_t *ptep = kmap_pte - idx;
++
++ kpte_clear_flush(ptep, __fix_to_virt(FIX_KMAP_BEGIN + idx));
++ }
++ /*
++ * Restore @next_p's kmap_atomic mappings
++ */
++ for (i = 0; i < next_p->kmap_idx; i++) {
++ int idx = i + KM_TYPE_NR * smp_processor_id();
++
++ if (!pte_none(next_p->kmap_pte[i]))
++ set_pte(kmap_pte - idx, next_p->kmap_pte[i]);
++ }
++}
++#else
++static inline void
++switch_kmaps(struct task_struct *prev_p, struct task_struct *next_p) { }
++#endif
++
+
+ /*
+ * switch_to(x,y) should switch tasks from x to y.
+@@ -292,6 +322,8 @@ EXPORT_SYMBOL_GPL(start_thread);
+ task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT))
+ __switch_to_xtra(prev_p, next_p, tss);
+
++ switch_kmaps(prev_p, next_p);
++
+ /*
+ * Leave lazy mode, flushing any hypercalls made here.
+ * This must be done before restoring TLS segments so
+--- a/arch/x86/mm/highmem_32.c
++++ b/arch/x86/mm/highmem_32.c
+@@ -32,10 +32,11 @@ EXPORT_SYMBOL(kunmap);
+ */
+ void *kmap_atomic_prot(struct page *page, pgprot_t prot)
+ {
++ pte_t pte = mk_pte(page, prot);
+ unsigned long vaddr;
+ int idx, type;
+
+- preempt_disable();
++ preempt_disable_nort();
+ pagefault_disable();
+
+ if (!PageHighMem(page))
+@@ -45,7 +46,10 @@ void *kmap_atomic_prot(struct page *page
+ idx = type + KM_TYPE_NR*smp_processor_id();
+ vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
+ BUG_ON(!pte_none(*(kmap_pte-idx)));
+- set_pte(kmap_pte-idx, mk_pte(page, prot));
++#ifdef CONFIG_PREEMPT_RT_FULL
++ current->kmap_pte[type] = pte;
++#endif
++ set_pte(kmap_pte-idx, pte);
+ arch_flush_lazy_mmu_mode();
+
+ return (void *)vaddr;
+@@ -88,6 +92,9 @@ void __kunmap_atomic(void *kvaddr)
+ * is a bad idea also, in case the page changes cacheability
+ * attributes or becomes a protected page in a hypervisor.
+ */
++#ifdef CONFIG_PREEMPT_RT_FULL
++ current->kmap_pte[type] = __pte(0);
++#endif
+ kpte_clear_flush(kmap_pte-idx, vaddr);
+ kmap_atomic_idx_pop();
+ arch_flush_lazy_mmu_mode();
+@@ -100,7 +107,7 @@ void __kunmap_atomic(void *kvaddr)
+ #endif
+
+ pagefault_enable();
+- preempt_enable();
++ preempt_enable_nort();
+ }
+ EXPORT_SYMBOL(__kunmap_atomic);
+
+--- a/arch/x86/mm/iomap_32.c
++++ b/arch/x86/mm/iomap_32.c
+@@ -56,6 +56,7 @@ EXPORT_SYMBOL_GPL(iomap_free);
+
+ void *kmap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot)
+ {
++ pte_t pte = pfn_pte(pfn, prot);
+ unsigned long vaddr;
+ int idx, type;
+
+@@ -65,7 +66,10 @@ void *kmap_atomic_prot_pfn(unsigned long
+ type = kmap_atomic_idx_push();
+ idx = type + KM_TYPE_NR * smp_processor_id();
+ vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
+- set_pte(kmap_pte - idx, pfn_pte(pfn, prot));
++#ifdef CONFIG_PREEMPT_RT_FULL
++ current->kmap_pte[type] = pte;
++#endif
++ set_pte(kmap_pte - idx, pte);
+ arch_flush_lazy_mmu_mode();
+
+ return (void *)vaddr;
+@@ -113,6 +117,9 @@ iounmap_atomic(void __iomem *kvaddr)
+ * is a bad idea also, in case the page changes cacheability
+ * attributes or becomes a protected page in a hypervisor.
+ */
++#ifdef CONFIG_PREEMPT_RT_FULL
++ current->kmap_pte[type] = __pte(0);
++#endif
+ kpte_clear_flush(kmap_pte-idx, vaddr);
+ kmap_atomic_idx_pop();
+ }
+--- a/include/linux/highmem.h
++++ b/include/linux/highmem.h
+@@ -87,32 +87,51 @@ static inline void __kunmap_atomic(void
+
+ #if defined(CONFIG_HIGHMEM) || defined(CONFIG_X86_32)
+
++#ifndef CONFIG_PREEMPT_RT_FULL
+ DECLARE_PER_CPU(int, __kmap_atomic_idx);
++#endif
+
+ static inline int kmap_atomic_idx_push(void)
+ {
++#ifndef CONFIG_PREEMPT_RT_FULL
+ int idx = __this_cpu_inc_return(__kmap_atomic_idx) - 1;
+
+-#ifdef CONFIG_DEBUG_HIGHMEM
++# ifdef CONFIG_DEBUG_HIGHMEM
+ WARN_ON_ONCE(in_irq() && !irqs_disabled());
+ BUG_ON(idx >= KM_TYPE_NR);
+-#endif
++# endif
+ return idx;
++#else
++ current->kmap_idx++;
++ BUG_ON(current->kmap_idx > KM_TYPE_NR);
++ return current->kmap_idx - 1;
++#endif
+ }
+
+ static inline int kmap_atomic_idx(void)
+ {
++#ifndef CONFIG_PREEMPT_RT_FULL
+ return __this_cpu_read(__kmap_atomic_idx) - 1;
++#else
++ return current->kmap_idx - 1;
++#endif
+ }
+
+ static inline void kmap_atomic_idx_pop(void)
+ {
+-#ifdef CONFIG_DEBUG_HIGHMEM
++#ifndef CONFIG_PREEMPT_RT_FULL
++# ifdef CONFIG_DEBUG_HIGHMEM
+ int idx = __this_cpu_dec_return(__kmap_atomic_idx);
+
+ BUG_ON(idx < 0);
+-#else
++# else
+ __this_cpu_dec(__kmap_atomic_idx);
++# endif
++#else
++ current->kmap_idx--;
++# ifdef CONFIG_DEBUG_HIGHMEM
++ BUG_ON(current->kmap_idx < 0);
++# endif
+ #endif
+ }
+
+--- a/include/linux/sched.h
++++ b/include/linux/sched.h
+@@ -26,6 +26,7 @@ struct sched_param {
+ #include <linux/nodemask.h>
+ #include <linux/mm_types.h>
+ #include <linux/preempt_mask.h>
++#include <asm/kmap_types.h>
+
+ #include <asm/page.h>
+ #include <asm/ptrace.h>
+@@ -1796,6 +1797,12 @@ struct task_struct {
+ int softirq_nestcnt;
+ unsigned int softirqs_raised;
+ #endif
++#ifdef CONFIG_PREEMPT_RT_FULL
++# if defined CONFIG_HIGHMEM || defined CONFIG_X86_32
++ int kmap_idx;
++ pte_t kmap_pte[KM_TYPE_NR];
++# endif
++#endif
+ #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+ unsigned long task_state_change;
+ #endif
+--- a/include/linux/uaccess.h
++++ b/include/linux/uaccess.h
+@@ -24,6 +24,7 @@ static __always_inline void pagefault_di
+ */
+ static inline void pagefault_disable(void)
+ {
++ migrate_disable();
+ pagefault_disabled_inc();
+ /*
+ * make sure to have issued the store before a pagefault
+@@ -40,6 +41,7 @@ static inline void pagefault_enable(void
+ */
+ barrier();
+ pagefault_disabled_dec();
++ migrate_enable();
+ }
+
+ /*
+--- a/mm/highmem.c
++++ b/mm/highmem.c
+@@ -29,10 +29,11 @@
+ #include <linux/kgdb.h>
+ #include <asm/tlbflush.h>
+
+-
++#ifndef CONFIG_PREEMPT_RT_FULL
+ #if defined(CONFIG_HIGHMEM) || defined(CONFIG_X86_32)
+ DEFINE_PER_CPU(int, __kmap_atomic_idx);
+ #endif
++#endif
+
+ /*
+ * Virtual_count is not a pure "count".
+@@ -107,8 +108,9 @@ static inline wait_queue_head_t *get_pkm
+ unsigned long totalhigh_pages __read_mostly;
+ EXPORT_SYMBOL(totalhigh_pages);
+
+-
++#ifndef CONFIG_PREEMPT_RT_FULL
+ EXPORT_PER_CPU_SYMBOL(__kmap_atomic_idx);
++#endif
+
+ unsigned int nr_free_highpages (void)
+ {
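
The core of the patch above is switch_kmaps(): tear down the previous task's atomic
kmap slots and re-install the next task's saved ptes. The toy program below models
that bookkeeping with plain arrays; slot contents are just numbers and none of this
is the real fixmap code:

/* Toy model of saving/restoring per-task kmap_atomic slots on a context
 * switch.  The "fixmap" is just an array; illustrative only. */
#include <stdio.h>

#define KM_TYPE_NR 4

struct task {
	const char *name;
	int kmap_idx;				/* slots currently in use */
	unsigned long kmap_pte[KM_TYPE_NR];	/* saved slot contents */
};

static unsigned long cpu_fixmap[KM_TYPE_NR];	/* the per-CPU mapping slots */

static void switch_kmaps(struct task *prev, struct task *next)
{
	/* Clear @prev's live mappings ... */
	for (int i = 0; i < prev->kmap_idx; i++)
		cpu_fixmap[i] = 0;
	/* ... and re-install @next's saved ones. */
	for (int i = 0; i < next->kmap_idx; i++)
		cpu_fixmap[i] = next->kmap_pte[i];
}

static void kmap_atomic(struct task *t, unsigned long pte)
{
	int idx = t->kmap_idx++;

	t->kmap_pte[idx] = pte;		/* remember it in the task ... */
	cpu_fixmap[idx] = pte;		/* ... and install it on the "CPU" */
}

int main(void)
{
	struct task a = { .name = "A" }, b = { .name = "B" };

	kmap_atomic(&a, 0x1000);
	kmap_atomic(&a, 0x2000);
	switch_kmaps(&a, &b);		/* A's slots vanish, B had none */
	printf("slot0 after switch to B: %#lx\n", cpu_fixmap[0]);
	switch_kmaps(&b, &a);		/* A's slots are back */
	printf("slot0 after switch to A: %#lx\n", cpu_fixmap[0]);
	return 0;
}
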
diff --git a/patches/mm-scatterlist-dont-disable-irqs-on-RT.patch b/patches/mm-scatterlist-dont-disable-irqs-on-RT.patch
new file mode 100644
index 00000000000000..6da9ab217b4c6b
--- /dev/null
+++ b/patches/mm-scatterlist-dont-disable-irqs-on-RT.patch
@@ -0,0 +1,43 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Fri, 3 Jul 2009 08:44:34 -0500
+Subject: mm/scatterlist: Do not disable irqs on RT
+
+The local_irq_save() is not only used to get things done "fast" but
+also to ensure that in case of SG_MITER_ATOMIC we are in "atomic"
+context for kmap_atomic(). For -RT it is enough to keep pagefault
+disabled (which is currently handled by kmap_atomic()).
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ lib/scatterlist.c | 6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+
+--- a/lib/scatterlist.c
++++ b/lib/scatterlist.c
+@@ -592,7 +592,7 @@ void sg_miter_stop(struct sg_mapping_ite
+ flush_kernel_dcache_page(miter->page);
+
+ if (miter->__flags & SG_MITER_ATOMIC) {
+- WARN_ON_ONCE(preemptible());
++ WARN_ON_ONCE(!pagefault_disabled());
+ kunmap_atomic(miter->addr);
+ } else
+ kunmap(miter->page);
+@@ -637,7 +637,7 @@ static size_t sg_copy_buffer(struct scat
+ if (!sg_miter_skip(&miter, skip))
+ return false;
+
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+
+ while (sg_miter_next(&miter) && offset < buflen) {
+ unsigned int len;
+@@ -654,7 +654,7 @@ static size_t sg_copy_buffer(struct scat
+
+ sg_miter_stop(&miter);
+
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+ return offset;
+ }
+
diff --git a/patches/mm-slub-move-slab-initialization-into-irq-enabled-region.patch b/patches/mm-slub-move-slab-initialization-into-irq-enabled-region.patch
new file mode 100644
index 00000000000000..eb1bdc93b9518a
--- /dev/null
+++ b/patches/mm-slub-move-slab-initialization-into-irq-enabled-region.patch
@@ -0,0 +1,162 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Tue, 14 Jul 2015 14:26:34 +0200
+Subject: mm/slub: move slab initialization into irq enabled region
+
+Initializing a new slab can introduce rather large latencies because most
+of the initialization always runs with interrupts disabled.
+
+There is no point in doing so. The newly allocated slab is not visible
+yet, so there is no reason to protect it against concurrent alloc/free.
+
+Move the expensive parts of the initialization into allocate_slab(), so
+for all allocations with GFP_WAIT set, interrupts are enabled.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Acked-by: Christoph Lameter <cl@linux.com>
+Cc: Pekka Enberg <penberg@kernel.org>
+Cc: David Rientjes <rientjes@google.com>
+Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
+Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Cc: Steven Rostedt <rostedt@goodmis.org>
+Cc: Peter Zijlstra <peterz@infradead.org>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+---
+
+ mm/slub.c | 89 +++++++++++++++++++++++++++++---------------------------------
+ 1 file changed, 42 insertions(+), 47 deletions(-)
+
+--- a/mm/slub.c
++++ b/mm/slub.c
+@@ -1306,6 +1306,17 @@ static inline void slab_free_hook(struct
+ kasan_slab_free(s, x);
+ }
+
++static void setup_object(struct kmem_cache *s, struct page *page,
++ void *object)
++{
++ setup_object_debug(s, page, object);
++ if (unlikely(s->ctor)) {
++ kasan_unpoison_object_data(s, object);
++ s->ctor(object);
++ kasan_poison_object_data(s, object);
++ }
++}
++
+ /*
+ * Slab allocation and freeing
+ */
+@@ -1336,6 +1347,8 @@ static struct page *allocate_slab(struct
+ struct page *page;
+ struct kmem_cache_order_objects oo = s->oo;
+ gfp_t alloc_gfp;
++ void *start, *p;
++ int idx, order;
+
+ flags &= gfp_allowed_mask;
+
+@@ -1359,13 +1372,13 @@ static struct page *allocate_slab(struct
+ * Try a lower order alloc if possible
+ */
+ page = alloc_slab_page(s, alloc_gfp, node, oo);
+-
+- if (page)
+- stat(s, ORDER_FALLBACK);
++ if (unlikely(!page))
++ goto out;
++ stat(s, ORDER_FALLBACK);
+ }
+
+- if (kmemcheck_enabled && page
+- && !(s->flags & (SLAB_NOTRACK | DEBUG_DEFAULT_FLAGS))) {
++ if (kmemcheck_enabled &&
++ !(s->flags & (SLAB_NOTRACK | DEBUG_DEFAULT_FLAGS))) {
+ int pages = 1 << oo_order(oo);
+
+ kmemcheck_alloc_shadow(page, oo_order(oo), alloc_gfp, node);
+@@ -1380,51 +1393,9 @@ static struct page *allocate_slab(struct
+ kmemcheck_mark_unallocated_pages(page, pages);
+ }
+
+- if (flags & __GFP_WAIT)
+- local_irq_disable();
+- if (!page)
+- return NULL;
+-
+ page->objects = oo_objects(oo);
+- mod_zone_page_state(page_zone(page),
+- (s->flags & SLAB_RECLAIM_ACCOUNT) ?
+- NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
+- 1 << oo_order(oo));
+-
+- return page;
+-}
+-
+-static void setup_object(struct kmem_cache *s, struct page *page,
+- void *object)
+-{
+- setup_object_debug(s, page, object);
+- if (unlikely(s->ctor)) {
+- kasan_unpoison_object_data(s, object);
+- s->ctor(object);
+- kasan_poison_object_data(s, object);
+- }
+-}
+-
+-static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
+-{
+- struct page *page;
+- void *start;
+- void *p;
+- int order;
+- int idx;
+-
+- if (unlikely(flags & GFP_SLAB_BUG_MASK)) {
+- pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK);
+- BUG();
+- }
+-
+- page = allocate_slab(s,
+- flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);
+- if (!page)
+- goto out;
+
+ order = compound_order(page);
+- inc_slabs_node(s, page_to_nid(page), page->objects);
+ page->slab_cache = s;
+ __SetPageSlab(page);
+ if (page->pfmemalloc)
+@@ -1448,10 +1419,34 @@ static struct page *new_slab(struct kmem
+ page->freelist = start;
+ page->inuse = page->objects;
+ page->frozen = 1;
++
+ out:
++ if (flags & __GFP_WAIT)
++ local_irq_disable();
++ if (!page)
++ return NULL;
++
++ mod_zone_page_state(page_zone(page),
++ (s->flags & SLAB_RECLAIM_ACCOUNT) ?
++ NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
++ 1 << oo_order(oo));
++
++ inc_slabs_node(s, page_to_nid(page), page->objects);
++
+ return page;
+ }
+
++static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
++{
++ if (unlikely(flags & GFP_SLAB_BUG_MASK)) {
++ pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK);
++ BUG();
++ }
++
++ return allocate_slab(s,
++ flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);
++}
++
+ static void __free_slab(struct kmem_cache *s, struct page *page)
+ {
+ int order = compound_order(page);
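
The reshuffle above relies on a simple rule: an object that is not yet visible to
anyone needs no locking, so all the expensive setup can run with interrupts enabled
and only the final publication is serialized. A minimal userspace version of that
rule (a mutex stands in for the irq-off region, the payload for the slab setup):

/* Sketch: do the expensive initialization before publication; only the
 * publication needs the lock, because the new object is invisible until
 * then.  Illustrative userspace code, not the slub allocator. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct slab {
	struct slab *next;
	char payload[256];
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct slab *slab_list;		/* shared, protected by list_lock */

static struct slab *new_slab(void)
{
	struct slab *s = malloc(sizeof(*s));

	if (!s)
		return NULL;
	/* Expensive setup runs with no lock held; nobody can see @s yet. */
	memset(s->payload, 0x5a, sizeof(s->payload));

	/* Only the publication is serialized. */
	pthread_mutex_lock(&list_lock);
	s->next = slab_list;
	slab_list = s;
	pthread_mutex_unlock(&list_lock);
	return s;
}

int main(void)
{
	if (new_slab())
		printf("slab published after lock-free init\n");
	return 0;
}
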
diff --git a/patches/mm-vmalloc-use-get-cpu-light.patch b/patches/mm-vmalloc-use-get-cpu-light.patch
new file mode 100644
index 00000000000000..fa6fad8e976fd2
--- /dev/null
+++ b/patches/mm-vmalloc-use-get-cpu-light.patch
@@ -0,0 +1,65 @@
+Subject: mm/vmalloc: Another preempt disable region which sucks
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Tue, 12 Jul 2011 11:39:36 +0200
+
+Avoid the preempt-disable version of get_cpu_var(). The inner lock should
+provide enough serialisation.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ mm/vmalloc.c | 13 ++++++++-----
+ 1 file changed, 8 insertions(+), 5 deletions(-)
+
+--- a/mm/vmalloc.c
++++ b/mm/vmalloc.c
+@@ -819,7 +819,7 @@ static void *new_vmap_block(unsigned int
+ struct vmap_block *vb;
+ struct vmap_area *va;
+ unsigned long vb_idx;
+- int node, err;
++ int node, err, cpu;
+ void *vaddr;
+
+ node = numa_node_id();
+@@ -862,11 +862,12 @@ static void *new_vmap_block(unsigned int
+ BUG_ON(err);
+ radix_tree_preload_end();
+
+- vbq = &get_cpu_var(vmap_block_queue);
++ cpu = get_cpu_light();
++ vbq = this_cpu_ptr(&vmap_block_queue);
+ spin_lock(&vbq->lock);
+ list_add_tail_rcu(&vb->free_list, &vbq->free);
+ spin_unlock(&vbq->lock);
+- put_cpu_var(vmap_block_queue);
++ put_cpu_light();
+
+ return vaddr;
+ }
+@@ -935,6 +936,7 @@ static void *vb_alloc(unsigned long size
+ struct vmap_block *vb;
+ void *vaddr = NULL;
+ unsigned int order;
++ int cpu;
+
+ BUG_ON(size & ~PAGE_MASK);
+ BUG_ON(size > PAGE_SIZE*VMAP_MAX_ALLOC);
+@@ -949,7 +951,8 @@ static void *vb_alloc(unsigned long size
+ order = get_order(size);
+
+ rcu_read_lock();
+- vbq = &get_cpu_var(vmap_block_queue);
++ cpu = get_cpu_light();
++ vbq = this_cpu_ptr(&vmap_block_queue);
+ list_for_each_entry_rcu(vb, &vbq->free, free_list) {
+ unsigned long pages_off;
+
+@@ -972,7 +975,7 @@ static void *vb_alloc(unsigned long size
+ break;
+ }
+
+- put_cpu_var(vmap_block_queue);
++ put_cpu_light();
+ rcu_read_unlock();
+
+ /* Allocate new block if nothing was found */
diff --git a/patches/mm-workingset-do-not-protect-workingset_shadow_nodes.patch b/patches/mm-workingset-do-not-protect-workingset_shadow_nodes.patch
new file mode 100644
index 00000000000000..9c388ef7fcd038
--- /dev/null
+++ b/patches/mm-workingset-do-not-protect-workingset_shadow_nodes.patch
@@ -0,0 +1,150 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Thu, 29 Jan 2015 17:19:44 +0100
+Subject: mm/workingset: Do not protect workingset_shadow_nodes with irq off
+
+workingset_shadow_nodes is protected by local_irq_disable(). Some users
+use spin_lock_irq().
+Replace the irq off/on with a local_lock(). Rename workingset_shadow_nodes
+so I catch any users of it which are introduced later.
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ include/linux/swap.h | 4 +++-
+ mm/filemap.c | 11 ++++++++---
+ mm/truncate.c | 7 +++++--
+ mm/workingset.c | 23 ++++++++++++-----------
+ 4 files changed, 28 insertions(+), 17 deletions(-)
+
+--- a/include/linux/swap.h
++++ b/include/linux/swap.h
+@@ -11,6 +11,7 @@
+ #include <linux/fs.h>
+ #include <linux/atomic.h>
+ #include <linux/page-flags.h>
++#include <linux/locallock.h>
+ #include <asm/page.h>
+
+ struct notifier_block;
+@@ -252,7 +253,8 @@ struct swap_info_struct {
+ void *workingset_eviction(struct address_space *mapping, struct page *page);
+ bool workingset_refault(void *shadow);
+ void workingset_activation(struct page *page);
+-extern struct list_lru workingset_shadow_nodes;
++extern struct list_lru __workingset_shadow_nodes;
++DECLARE_LOCAL_IRQ_LOCK(workingset_shadow_lock);
+
+ static inline unsigned int workingset_node_pages(struct radix_tree_node *node)
+ {
+--- a/mm/filemap.c
++++ b/mm/filemap.c
+@@ -167,7 +167,9 @@ static void page_cache_tree_delete(struc
+ if (!workingset_node_pages(node) &&
+ list_empty(&node->private_list)) {
+ node->private_data = mapping;
+- list_lru_add(&workingset_shadow_nodes, &node->private_list);
++ local_lock(workingset_shadow_lock);
++ list_lru_add(&__workingset_shadow_nodes, &node->private_list);
++ local_unlock(workingset_shadow_lock);
+ }
+ }
+
+@@ -533,9 +535,12 @@ static int page_cache_tree_insert(struct
+ * node->private_list is protected by
+ * mapping->tree_lock.
+ */
+- if (!list_empty(&node->private_list))
+- list_lru_del(&workingset_shadow_nodes,
++ if (!list_empty(&node->private_list)) {
++ local_lock(workingset_shadow_lock);
++ list_lru_del(&__workingset_shadow_nodes,
+ &node->private_list);
++ local_unlock(workingset_shadow_lock);
++ }
+ }
+ return 0;
+ }
+--- a/mm/truncate.c
++++ b/mm/truncate.c
+@@ -56,8 +56,11 @@ static void clear_exceptional_entry(stru
+ * protected by mapping->tree_lock.
+ */
+ if (!workingset_node_shadows(node) &&
+- !list_empty(&node->private_list))
+- list_lru_del(&workingset_shadow_nodes, &node->private_list);
++ !list_empty(&node->private_list)) {
++ local_lock(workingset_shadow_lock);
++ list_lru_del(&__workingset_shadow_nodes, &node->private_list);
++ local_unlock(workingset_shadow_lock);
++ }
+ __radix_tree_delete_node(&mapping->page_tree, node);
+ unlock:
+ spin_unlock_irq(&mapping->tree_lock);
+--- a/mm/workingset.c
++++ b/mm/workingset.c
+@@ -264,7 +264,8 @@ void workingset_activation(struct page *
+ * point where they would still be useful.
+ */
+
+-struct list_lru workingset_shadow_nodes;
++struct list_lru __workingset_shadow_nodes;
++DEFINE_LOCAL_IRQ_LOCK(workingset_shadow_lock);
+
+ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
+ struct shrink_control *sc)
+@@ -274,9 +275,9 @@ static unsigned long count_shadow_nodes(
+ unsigned long pages;
+
+ /* list_lru lock nests inside IRQ-safe mapping->tree_lock */
+- local_irq_disable();
+- shadow_nodes = list_lru_shrink_count(&workingset_shadow_nodes, sc);
+- local_irq_enable();
++ local_lock_irq(workingset_shadow_lock);
++ shadow_nodes = list_lru_shrink_count(&__workingset_shadow_nodes, sc);
++ local_unlock_irq(workingset_shadow_lock);
+
+ pages = node_present_pages(sc->nid);
+ /*
+@@ -363,9 +364,9 @@ static enum lru_status shadow_lru_isolat
+ spin_unlock(&mapping->tree_lock);
+ ret = LRU_REMOVED_RETRY;
+ out:
+- local_irq_enable();
++ local_unlock_irq(workingset_shadow_lock);
+ cond_resched();
+- local_irq_disable();
++ local_lock_irq(workingset_shadow_lock);
+ spin_lock(lru_lock);
+ return ret;
+ }
+@@ -376,10 +377,10 @@ static unsigned long scan_shadow_nodes(s
+ unsigned long ret;
+
+ /* list_lru lock nests inside IRQ-safe mapping->tree_lock */
+- local_irq_disable();
+- ret = list_lru_shrink_walk(&workingset_shadow_nodes, sc,
++ local_lock_irq(workingset_shadow_lock);
++ ret = list_lru_shrink_walk(&__workingset_shadow_nodes, sc,
+ shadow_lru_isolate, NULL);
+- local_irq_enable();
++ local_unlock_irq(workingset_shadow_lock);
+ return ret;
+ }
+
+@@ -400,7 +401,7 @@ static int __init workingset_init(void)
+ {
+ int ret;
+
+- ret = list_lru_init_key(&workingset_shadow_nodes, &shadow_nodes_key);
++ ret = list_lru_init_key(&__workingset_shadow_nodes, &shadow_nodes_key);
+ if (ret)
+ goto err;
+ ret = register_shrinker(&workingset_shadow_shrinker);
+@@ -408,7 +409,7 @@ static int __init workingset_init(void)
+ goto err_list_lru;
+ return 0;
+ err_list_lru:
+- list_lru_destroy(&workingset_shadow_nodes);
++ list_lru_destroy(&__workingset_shadow_nodes);
+ err:
+ return ret;
+ }
diff --git a/patches/mmc-sdhci-don-t-provide-hard-irq-handler.patch b/patches/mmc-sdhci-don-t-provide-hard-irq-handler.patch
new file mode 100644
index 00000000000000..7767df6b64a3c1
--- /dev/null
+++ b/patches/mmc-sdhci-don-t-provide-hard-irq-handler.patch
@@ -0,0 +1,73 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Thu, 26 Feb 2015 12:13:36 +0100
+Subject: mmc: sdhci: don't provide hard irq handler
+
+The sdhci code provides both irq handlers: the primary and the thread
+handler. Initially the primary handler was meant to be very
+short.
+The result is now that on -RT the primary handler grabs locks,
+which doesn't really work. As a hack for now I just push both
+handlers into threaded mode.
+
+
+Reported-By: Michal Šmucr <msmucr@gmail.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ drivers/mmc/host/sdhci.c | 32 +++++++++++++++++++++++++++-----
+ 1 file changed, 27 insertions(+), 5 deletions(-)
+
+--- a/drivers/mmc/host/sdhci.c
++++ b/drivers/mmc/host/sdhci.c
+@@ -2691,6 +2691,31 @@ static irqreturn_t sdhci_thread_irq(int
+ return isr ? IRQ_HANDLED : IRQ_NONE;
+ }
+
++#ifdef CONFIG_PREEMPT_RT_BASE
++static irqreturn_t sdhci_rt_irq(int irq, void *dev_id)
++{
++ irqreturn_t ret;
++
++ local_bh_disable();
++ ret = sdhci_irq(irq, dev_id);
++ local_bh_enable();
++ if (ret == IRQ_WAKE_THREAD)
++ ret = sdhci_thread_irq(irq, dev_id);
++ return ret;
++}
++#endif
++
++static int sdhci_req_irq(struct sdhci_host *host)
++{
++#ifdef CONFIG_PREEMPT_RT_BASE
++ return request_threaded_irq(host->irq, NULL, sdhci_rt_irq,
++ IRQF_SHARED, mmc_hostname(host->mmc), host);
++#else
++ return request_threaded_irq(host->irq, sdhci_irq, sdhci_thread_irq,
++ IRQF_SHARED, mmc_hostname(host->mmc), host);
++#endif
++}
++
+ /*****************************************************************************\
+ * *
+ * Suspend/resume *
+@@ -2758,9 +2783,7 @@ int sdhci_resume_host(struct sdhci_host
+ }
+
+ if (!device_may_wakeup(mmc_dev(host->mmc))) {
+- ret = request_threaded_irq(host->irq, sdhci_irq,
+- sdhci_thread_irq, IRQF_SHARED,
+- mmc_hostname(host->mmc), host);
++ ret = sdhci_req_irq(host);
+ if (ret)
+ return ret;
+ } else {
+@@ -3417,8 +3440,7 @@ int sdhci_add_host(struct sdhci_host *ho
+
+ sdhci_init(host, 0);
+
+- ret = request_threaded_irq(host->irq, sdhci_irq, sdhci_thread_irq,
+- IRQF_SHARED, mmc_hostname(mmc), host);
++ ret = sdhci_req_irq(host);
+ if (ret) {
+ pr_err("%s: Failed to request IRQ %d: %d\n",
+ mmc_hostname(mmc), host->irq, ret);
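
sdhci_rt_irq() above is just a dispatcher: run the old primary handler from thread
context and, if it returns IRQ_WAKE_THREAD, run the thread handler immediately
afterwards. Reduced to its control flow (handler names and bodies are invented for
the sketch, only the dispatch shape mirrors the patch):

/* Sketch of running a "primary" and a "threaded" handler back to back
 * from one thread.  Illustrative only. */
#include <stdio.h>

enum irqreturn { IRQ_NONE, IRQ_HANDLED, IRQ_WAKE_THREAD };

static enum irqreturn primary_handler(void *dev)
{
	printf("quick check for %s\n", (const char *)dev);
	return IRQ_WAKE_THREAD;		/* more work needed */
}

static enum irqreturn thread_handler(void *dev)
{
	printf("slow path for %s\n", (const char *)dev);
	return IRQ_HANDLED;
}

/* On RT this whole function would itself run in the irq thread. */
static enum irqreturn rt_irq(void *dev)
{
	enum irqreturn ret = primary_handler(dev);

	if (ret == IRQ_WAKE_THREAD)
		ret = thread_handler(dev);
	return ret;
}

int main(void)
{
	rt_irq("sdhci0");
	return 0;
}
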
diff --git a/patches/mmci-remove-bogus-irq-save.patch b/patches/mmci-remove-bogus-irq-save.patch
new file mode 100644
index 00000000000000..67523a158ec56e
--- /dev/null
+++ b/patches/mmci-remove-bogus-irq-save.patch
@@ -0,0 +1,39 @@
+Subject: mmci: Remove bogus local_irq_save()
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 09 Jan 2013 12:11:12 +0100
+
+On !RT the interrupt handler runs with interrupts disabled. On RT it runs in a
+thread, so no need to disable interrupts at all.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ drivers/mmc/host/mmci.c | 5 -----
+ 1 file changed, 5 deletions(-)
+
+--- a/drivers/mmc/host/mmci.c
++++ b/drivers/mmc/host/mmci.c
+@@ -1155,15 +1155,12 @@ static irqreturn_t mmci_pio_irq(int irq,
+ struct sg_mapping_iter *sg_miter = &host->sg_miter;
+ struct variant_data *variant = host->variant;
+ void __iomem *base = host->base;
+- unsigned long flags;
+ u32 status;
+
+ status = readl(base + MMCISTATUS);
+
+ dev_dbg(mmc_dev(host->mmc), "irq1 (pio) %08x\n", status);
+
+- local_irq_save(flags);
+-
+ do {
+ unsigned int remain, len;
+ char *buffer;
+@@ -1203,8 +1200,6 @@ static irqreturn_t mmci_pio_irq(int irq,
+
+ sg_miter_stop(sg_miter);
+
+- local_irq_restore(flags);
+-
+ /*
+ * If we have less than the fifo 'half-full' threshold to transfer,
+ * trigger a PIO interrupt as soon as any data is available.
diff --git a/patches/move_sched_delayed_work_to_helper.patch b/patches/move_sched_delayed_work_to_helper.patch
new file mode 100644
index 00000000000000..044382974da4ad
--- /dev/null
+++ b/patches/move_sched_delayed_work_to_helper.patch
@@ -0,0 +1,88 @@
+Date: Wed, 26 Jun 2013 15:28:11 -0400
+From: Steven Rostedt <rostedt@goodmis.org>
+Subject: rt,ntp: Move call to schedule_delayed_work() to helper thread
+
+The ntp code for notify_cmos_timer() is called from a hard interrupt
+context. schedule_delayed_work() under PREEMPT_RT_FULL takes spinlocks
+that have been converted to mutexes, thus calling schedule_delayed_work()
+from interrupt context is not safe.
+
+Add a helper thread that does the call to schedule_delayed_work and wake
+up that thread instead of calling schedule_delayed_work() directly.
+This is only for CONFIG_PREEMPT_RT_FULL, otherwise the code still calls
+schedule_delayed_work() directly in irq context.
+
+Note: There are a few places in the kernel that do this. Perhaps the RT
+code should have a dedicated thread that does the checks. Just register
+a notifier on boot up for your check and wake up the thread when
+needed. This will be a todo.
+
+Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
+
+---
+ kernel/time/ntp.c | 43 +++++++++++++++++++++++++++++++++++++++++++
+ 1 file changed, 43 insertions(+)
+
+--- a/kernel/time/ntp.c
++++ b/kernel/time/ntp.c
+@@ -10,6 +10,7 @@
+ #include <linux/workqueue.h>
+ #include <linux/hrtimer.h>
+ #include <linux/jiffies.h>
++#include <linux/kthread.h>
+ #include <linux/math64.h>
+ #include <linux/timex.h>
+ #include <linux/time.h>
+@@ -529,10 +530,52 @@ static void sync_cmos_clock(struct work_
+ &sync_cmos_work, timespec_to_jiffies(&next));
+ }
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++/*
++ * RT can not call schedule_delayed_work from real interrupt context.
++ * Need to make a thread to do the real work.
++ */
++static struct task_struct *cmos_delay_thread;
++static bool do_cmos_delay;
++
++static int run_cmos_delay(void *ignore)
++{
++ while (!kthread_should_stop()) {
++ set_current_state(TASK_INTERRUPTIBLE);
++ if (do_cmos_delay) {
++ do_cmos_delay = false;
++ queue_delayed_work(system_power_efficient_wq,
++ &sync_cmos_work, 0);
++ }
++ schedule();
++ }
++ __set_current_state(TASK_RUNNING);
++ return 0;
++}
++
++void ntp_notify_cmos_timer(void)
++{
++ do_cmos_delay = true;
++ /* Make visible before waking up process */
++ smp_wmb();
++ wake_up_process(cmos_delay_thread);
++}
++
++static __init int create_cmos_delay_thread(void)
++{
++ cmos_delay_thread = kthread_run(run_cmos_delay, NULL, "kcmosdelayd");
++ BUG_ON(!cmos_delay_thread);
++ return 0;
++}
++early_initcall(create_cmos_delay_thread);
++
++#else
++
+ void ntp_notify_cmos_timer(void)
+ {
+ queue_delayed_work(system_power_efficient_wq, &sync_cmos_work, 0);
+ }
++#endif /* CONFIG_PREEMPT_RT_FULL */
+
+ #else
+ void ntp_notify_cmos_timer(void) { }
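
The kcmosdelayd thread above converts a call that is illegal in hard interrupt
context into "set a flag, wake a thread". The sketch below shows the same hand-off
in userspace; it uses a mutex and condition variable to keep the example race-free,
whereas the kernel side deliberately uses only a flag, smp_wmb() and
wake_up_process() because it cannot take locks in that context. The deferred work
here is just a printf.

/* Sketch: defer work to a helper thread via a flag and a wakeup. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool do_deferred;
static bool stop;

static void *helper_thread(void *arg)
{
	pthread_mutex_lock(&lock);
	while (!stop) {
		while (!do_deferred && !stop)
			pthread_cond_wait(&cond, &lock);
		if (do_deferred) {
			do_deferred = false;
			/* the work the caller must not do itself */
			printf("deferred: schedule_delayed_work() equivalent\n");
		}
	}
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Called from the "forbidden" context: set the flag and wake the helper. */
static void notify(void)
{
	pthread_mutex_lock(&lock);
	do_deferred = true;
	pthread_cond_signal(&cond);
	pthread_mutex_unlock(&lock);
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, helper_thread, NULL);
	notify();

	pthread_mutex_lock(&lock);
	stop = true;
	pthread_cond_signal(&cond);
	pthread_mutex_unlock(&lock);
	pthread_join(t, NULL);
	return 0;
}
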
diff --git a/patches/mutex-no-spin-on-rt.patch b/patches/mutex-no-spin-on-rt.patch
new file mode 100644
index 00000000000000..e5308f6dd3df9f
--- /dev/null
+++ b/patches/mutex-no-spin-on-rt.patch
@@ -0,0 +1,28 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Sun, 17 Jul 2011 21:51:45 +0200
+Subject: locking: Disable spin on owner for RT
+
+Drop spin on owner for mutex / rwsem. We are most likely not using it
+but…
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ kernel/Kconfig.locks | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/kernel/Kconfig.locks
++++ b/kernel/Kconfig.locks
+@@ -225,11 +225,11 @@ config ARCH_SUPPORTS_ATOMIC_RMW
+
+ config MUTEX_SPIN_ON_OWNER
+ def_bool y
+- depends on SMP && !DEBUG_MUTEXES && ARCH_SUPPORTS_ATOMIC_RMW
++ depends on SMP && !DEBUG_MUTEXES && ARCH_SUPPORTS_ATOMIC_RMW && !PREEMPT_RT_FULL
+
+ config RWSEM_SPIN_ON_OWNER
+ def_bool y
+- depends on SMP && RWSEM_XCHGADD_ALGORITHM && ARCH_SUPPORTS_ATOMIC_RMW
++ depends on SMP && RWSEM_XCHGADD_ALGORITHM && ARCH_SUPPORTS_ATOMIC_RMW && !PREEMPT_RT_FULL
+
+ config LOCK_SPIN_ON_OWNER
+ def_bool y
diff --git a/patches/net-another-local-irq-disable-alloc-atomic-headache.patch b/patches/net-another-local-irq-disable-alloc-atomic-headache.patch
new file mode 100644
index 00000000000000..884c953c1aba73
--- /dev/null
+++ b/patches/net-another-local-irq-disable-alloc-atomic-headache.patch
@@ -0,0 +1,41 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 26 Sep 2012 16:21:08 +0200
+Subject: net: Another local_irq_disable/kmalloc headache
+
+Replace it by a local lock. Though that's pretty inefficient :(
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ net/core/skbuff.c | 6 ++++--
+ 1 file changed, 4 insertions(+), 2 deletions(-)
+
+--- a/net/core/skbuff.c
++++ b/net/core/skbuff.c
+@@ -63,6 +63,7 @@
+ #include <linux/errqueue.h>
+ #include <linux/prefetch.h>
+ #include <linux/if_vlan.h>
++#include <linux/locallock.h>
+
+ #include <net/protocol.h>
+ #include <net/dst.h>
+@@ -356,6 +357,7 @@ struct netdev_alloc_cache {
+ };
+ static DEFINE_PER_CPU(struct netdev_alloc_cache, netdev_alloc_cache);
+ static DEFINE_PER_CPU(struct netdev_alloc_cache, napi_alloc_cache);
++static DEFINE_LOCAL_IRQ_LOCK(netdev_alloc_lock);
+
+ static struct page *__page_frag_refill(struct netdev_alloc_cache *nc,
+ gfp_t gfp_mask)
+@@ -433,9 +435,9 @@ static void *__netdev_alloc_frag(unsigne
+ unsigned long flags;
+ void *data;
+
+- local_irq_save(flags);
++ local_lock_irqsave(netdev_alloc_lock, flags);
+ data = __alloc_page_frag(&netdev_alloc_cache, fragsz, gfp_mask);
+- local_irq_restore(flags);
++ local_unlock_irqrestore(netdev_alloc_lock, flags);
+ return data;
+ }
+
diff --git a/patches/net-fix-iptable-xt-write-recseq-begin-rt-fallout.patch b/patches/net-fix-iptable-xt-write-recseq-begin-rt-fallout.patch
new file mode 100644
index 00000000000000..45c32002b1c512
--- /dev/null
+++ b/patches/net-fix-iptable-xt-write-recseq-begin-rt-fallout.patch
@@ -0,0 +1,73 @@
+Subject: net: netfilter: Serialize xt_write_recseq sections on RT
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Sun, 28 Oct 2012 11:18:08 +0100
+
+The netfilter code relies only on the implicit semantics of
+local_bh_disable() for serializing xt_write_recseq sections. RT breaks
+that and needs explicit serialization here.
+
+Reported-by: Peter LaDow <petela@gocougs.wsu.edu>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ include/linux/netfilter/x_tables.h | 7 +++++++
+ net/netfilter/core.c | 6 ++++++
+ 2 files changed, 13 insertions(+)
+
+--- a/include/linux/netfilter/x_tables.h
++++ b/include/linux/netfilter/x_tables.h
+@@ -3,6 +3,7 @@
+
+
+ #include <linux/netdevice.h>
++#include <linux/locallock.h>
+ #include <uapi/linux/netfilter/x_tables.h>
+
+ /**
+@@ -282,6 +283,8 @@ void xt_free_table_info(struct xt_table_
+ */
+ DECLARE_PER_CPU(seqcount_t, xt_recseq);
+
++DECLARE_LOCAL_IRQ_LOCK(xt_write_lock);
++
+ /**
+ * xt_write_recseq_begin - start of a write section
+ *
+@@ -296,6 +299,9 @@ static inline unsigned int xt_write_recs
+ {
+ unsigned int addend;
+
++ /* RT protection */
++ local_lock(xt_write_lock);
++
+ /*
+ * Low order bit of sequence is set if we already
+ * called xt_write_recseq_begin().
+@@ -326,6 +332,7 @@ static inline void xt_write_recseq_end(u
+ /* this is kind of a write_seqcount_end(), but addend is 0 or 1 */
+ smp_wmb();
+ __this_cpu_add(xt_recseq.sequence, addend);
++ local_unlock(xt_write_lock);
+ }
+
+ /*
+--- a/net/netfilter/core.c
++++ b/net/netfilter/core.c
+@@ -22,11 +22,17 @@
+ #include <linux/proc_fs.h>
+ #include <linux/mutex.h>
+ #include <linux/slab.h>
++#include <linux/locallock.h>
+ #include <net/net_namespace.h>
+ #include <net/sock.h>
+
+ #include "nf_internals.h"
+
++#ifdef CONFIG_PREEMPT_RT_BASE
++DEFINE_LOCAL_IRQ_LOCK(xt_write_lock);
++EXPORT_PER_CPU_SYMBOL(xt_write_lock);
++#endif
++
+ static DEFINE_MUTEX(afinfo_mutex);
+
+ const struct nf_afinfo __rcu *nf_afinfo[NFPROTO_NUMPROTO] __read_mostly;
diff --git a/patches/net-gianfar-do-not-disable-interrupts.patch b/patches/net-gianfar-do-not-disable-interrupts.patch
new file mode 100644
index 00000000000000..2533c46fc0e959
--- /dev/null
+++ b/patches/net-gianfar-do-not-disable-interrupts.patch
@@ -0,0 +1,76 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Tue, 25 Mar 2014 18:34:20 +0100
+Subject: net: gianfar: Do not disable interrupts
+
+Each per-queue lock is taken with spin_lock_irqsave() except in the case
+where all of them are taken for some kind of serialisation. As an
+optimisation local_irq_save() is used so that lock_tx_qs() and
+lock_rx_qs() can use just the spin_lock() variant instead.
+On RT local_irq_save() behaves differently so we use the nort()
+variant.
+Lockdep screams easily when triggered by "ethtool -K eth0 rx off tx off".
+
+What remains is the missing lockdep annotation, without which lockdep thinks
+lock_tx_qs() may cause a deadlock.
+
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ drivers/net/ethernet/freescale/gianfar.c | 12 ++++++------
+ 1 file changed, 6 insertions(+), 6 deletions(-)
+
+--- a/drivers/net/ethernet/freescale/gianfar.c
++++ b/drivers/net/ethernet/freescale/gianfar.c
+@@ -1540,7 +1540,7 @@ static int gfar_suspend(struct device *d
+
+ if (netif_running(ndev)) {
+
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+ lock_tx_qs(priv);
+
+ gfar_halt_nodisable(priv);
+@@ -1556,7 +1556,7 @@ static int gfar_suspend(struct device *d
+ gfar_write(&regs->maccfg1, tempval);
+
+ unlock_tx_qs(priv);
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+
+ disable_napi(priv);
+
+@@ -1598,7 +1598,7 @@ static int gfar_resume(struct device *de
+ /* Disable Magic Packet mode, in case something
+ * else woke us up.
+ */
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+ lock_tx_qs(priv);
+
+ tempval = gfar_read(&regs->maccfg2);
+@@ -1608,7 +1608,7 @@ static int gfar_resume(struct device *de
+ gfar_start(priv);
+
+ unlock_tx_qs(priv);
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+
+ netif_device_attach(ndev);
+
+@@ -3418,14 +3418,14 @@ static irqreturn_t gfar_error(int irq, v
+ dev->stats.tx_dropped++;
+ atomic64_inc(&priv->extra_stats.tx_underrun);
+
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+ lock_tx_qs(priv);
+
+ /* Reactivate the Tx Queues */
+ gfar_write(&regs->tstat, gfargrp->tstat);
+
+ unlock_tx_qs(priv);
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+ }
+ netif_dbg(priv, tx_err, dev, "Transmit Error\n");
+ }
diff --git a/patches/net-make-devnet_rename_seq-a-mutex.patch b/patches/net-make-devnet_rename_seq-a-mutex.patch
new file mode 100644
index 00000000000000..550e389f49c9f7
--- /dev/null
+++ b/patches/net-make-devnet_rename_seq-a-mutex.patch
@@ -0,0 +1,106 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Wed, 20 Mar 2013 18:06:20 +0100
+Subject: net: Add a mutex around devnet_rename_seq
+
+On RT write_seqcount_begin() disables preemption and device_rename()
+allocates memory with GFP_KERNEL and later grabs the sysfs_mutex
+mutex. Serialize with a mutex and use the non-preemption-disabling
+__write_seqcount_begin().
+
+To avoid writer starvation, let the reader grab the mutex and release
+it when it detects a writer in progress. This keeps the normal case
+(no reader on the fly) fast.
+
+[ tglx: Instead of replacing the seqcount by a mutex, add the mutex ]
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ net/core/dev.c | 34 ++++++++++++++++++++--------------
+ 1 file changed, 20 insertions(+), 14 deletions(-)
+
+--- a/net/core/dev.c
++++ b/net/core/dev.c
+@@ -184,6 +184,7 @@ static unsigned int napi_gen_id;
+ static DEFINE_HASHTABLE(napi_hash, 8);
+
+ static seqcount_t devnet_rename_seq;
++static DEFINE_MUTEX(devnet_rename_mutex);
+
+ static inline void dev_base_seq_inc(struct net *net)
+ {
+@@ -856,7 +857,8 @@ int netdev_get_name(struct net *net, cha
+ strcpy(name, dev->name);
+ rcu_read_unlock();
+ if (read_seqcount_retry(&devnet_rename_seq, seq)) {
+- cond_resched();
++ mutex_lock(&devnet_rename_mutex);
++ mutex_unlock(&devnet_rename_mutex);
+ goto retry;
+ }
+
+@@ -1125,20 +1127,17 @@ int dev_change_name(struct net_device *d
+ if (dev->flags & IFF_UP)
+ return -EBUSY;
+
+- write_seqcount_begin(&devnet_rename_seq);
++ mutex_lock(&devnet_rename_mutex);
++ __raw_write_seqcount_begin(&devnet_rename_seq);
+
+- if (strncmp(newname, dev->name, IFNAMSIZ) == 0) {
+- write_seqcount_end(&devnet_rename_seq);
+- return 0;
+- }
++ if (strncmp(newname, dev->name, IFNAMSIZ) == 0)
++ goto outunlock;
+
+ memcpy(oldname, dev->name, IFNAMSIZ);
+
+ err = dev_get_valid_name(net, dev, newname);
+- if (err < 0) {
+- write_seqcount_end(&devnet_rename_seq);
+- return err;
+- }
++ if (err < 0)
++ goto outunlock;
+
+ if (oldname[0] && !strchr(oldname, '%'))
+ netdev_info(dev, "renamed from %s\n", oldname);
+@@ -1151,11 +1150,12 @@ int dev_change_name(struct net_device *d
+ if (ret) {
+ memcpy(dev->name, oldname, IFNAMSIZ);
+ dev->name_assign_type = old_assign_type;
+- write_seqcount_end(&devnet_rename_seq);
+- return ret;
++ err = ret;
++ goto outunlock;
+ }
+
+- write_seqcount_end(&devnet_rename_seq);
++ __raw_write_seqcount_end(&devnet_rename_seq);
++ mutex_unlock(&devnet_rename_mutex);
+
+ netdev_adjacent_rename_links(dev, oldname);
+
+@@ -1176,7 +1176,8 @@ int dev_change_name(struct net_device *d
+ /* err >= 0 after dev_alloc_name() or stores the first errno */
+ if (err >= 0) {
+ err = ret;
+- write_seqcount_begin(&devnet_rename_seq);
++ mutex_lock(&devnet_rename_mutex);
++ __raw_write_seqcount_begin(&devnet_rename_seq);
+ memcpy(dev->name, oldname, IFNAMSIZ);
+ memcpy(oldname, newname, IFNAMSIZ);
+ dev->name_assign_type = old_assign_type;
+@@ -1189,6 +1190,11 @@ int dev_change_name(struct net_device *d
+ }
+
+ return err;
++
++outunlock:
++ __raw_write_seqcount_end(&devnet_rename_seq);
++ mutex_unlock(&devnet_rename_mutex);
++ return err;
+ }
+
+ /**
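
The scheme above keeps the seqcount for readers but lets a reader that detects a
writer in flight block briefly on the writer's mutex instead of spinning. A
condensed userspace sketch of that retry-or-wait loop follows; the memory ordering
is simplified compared to the kernel seqcount API, and the deliberately unlocked
copy is exactly what the retry check is there to catch:

/* Sketch: sequence counter plus mutex, reader waits on the mutex when it
 * detects a writer instead of busy-retrying.  Illustrative only. */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static pthread_mutex_t rename_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned int rename_seq;		/* odd while a rename is in flight */
static char dev_name[16] = "eth0";

static void change_name(const char *newname)
{
	pthread_mutex_lock(&rename_mutex);
	__atomic_add_fetch(&rename_seq, 1, __ATOMIC_SEQ_CST);	/* now odd */
	strncpy(dev_name, newname, sizeof(dev_name) - 1);
	__atomic_add_fetch(&rename_seq, 1, __ATOMIC_SEQ_CST);	/* even again */
	pthread_mutex_unlock(&rename_mutex);
}

static void get_name(char *buf, size_t len)
{
	unsigned int seq;

retry:
	seq = __atomic_load_n(&rename_seq, __ATOMIC_SEQ_CST);
	memcpy(buf, dev_name, len);	/* may race; the retry catches it */
	if (seq != __atomic_load_n(&rename_seq, __ATOMIC_SEQ_CST) || (seq & 1)) {
		/* Writer in progress: wait for it by taking its mutex
		 * instead of burning CPU in the retry loop. */
		pthread_mutex_lock(&rename_mutex);
		pthread_mutex_unlock(&rename_mutex);
		goto retry;
	}
	buf[len - 1] = '\0';
}

int main(void)
{
	char buf[16];

	change_name("lan0");
	get_name(buf, sizeof(buf));
	printf("name: %s\n", buf);
	return 0;
}
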
diff --git a/patches/net-prevent-abba-deadlock.patch b/patches/net-prevent-abba-deadlock.patch
new file mode 100644
index 00000000000000..223c05704549ac
--- /dev/null
+++ b/patches/net-prevent-abba-deadlock.patch
@@ -0,0 +1,111 @@
+Subject: net-flip-lock-dep-thingy.patch
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Tue, 28 Jun 2011 10:59:58 +0200
+
+=======================================================
+[ INFO: possible circular locking dependency detected ]
+3.0.0-rc3+ #26
+-------------------------------------------------------
+ip/1104 is trying to acquire lock:
+ (local_softirq_lock){+.+...}, at: [<ffffffff81056d12>] __local_lock+0x25/0x68
+
+but task is already holding lock:
+ (sk_lock-AF_INET){+.+...}, at: [<ffffffff81433308>] lock_sock+0x10/0x12
+
+which lock already depends on the new lock.
+
+
+the existing dependency chain (in reverse order) is:
+
+-> #1 (sk_lock-AF_INET){+.+...}:
+ [<ffffffff810836e5>] lock_acquire+0x103/0x12e
+ [<ffffffff813e2781>] lock_sock_nested+0x82/0x92
+ [<ffffffff81433308>] lock_sock+0x10/0x12
+ [<ffffffff81433afa>] tcp_close+0x1b/0x355
+ [<ffffffff81453c99>] inet_release+0xc3/0xcd
+ [<ffffffff813dff3f>] sock_release+0x1f/0x74
+ [<ffffffff813dffbb>] sock_close+0x27/0x2b
+ [<ffffffff81129c63>] fput+0x11d/0x1e3
+ [<ffffffff81126577>] filp_close+0x70/0x7b
+ [<ffffffff8112667a>] sys_close+0xf8/0x13d
+ [<ffffffff814ae882>] system_call_fastpath+0x16/0x1b
+
+-> #0 (local_softirq_lock){+.+...}:
+ [<ffffffff81082ecc>] __lock_acquire+0xacc/0xdc8
+ [<ffffffff810836e5>] lock_acquire+0x103/0x12e
+ [<ffffffff814a7e40>] _raw_spin_lock+0x3b/0x4a
+ [<ffffffff81056d12>] __local_lock+0x25/0x68
+ [<ffffffff81056d8b>] local_bh_disable+0x36/0x3b
+ [<ffffffff814a7fc4>] _raw_write_lock_bh+0x16/0x4f
+ [<ffffffff81433c38>] tcp_close+0x159/0x355
+ [<ffffffff81453c99>] inet_release+0xc3/0xcd
+ [<ffffffff813dff3f>] sock_release+0x1f/0x74
+ [<ffffffff813dffbb>] sock_close+0x27/0x2b
+ [<ffffffff81129c63>] fput+0x11d/0x1e3
+ [<ffffffff81126577>] filp_close+0x70/0x7b
+ [<ffffffff8112667a>] sys_close+0xf8/0x13d
+ [<ffffffff814ae882>] system_call_fastpath+0x16/0x1b
+
+other info that might help us debug this:
+
+ Possible unsafe locking scenario:
+
+ CPU0 CPU1
+ ---- ----
+ lock(sk_lock-AF_INET);
+ lock(local_softirq_lock);
+ lock(sk_lock-AF_INET);
+ lock(local_softirq_lock);
+
+ *** DEADLOCK ***
+
+1 lock held by ip/1104:
+ #0: (sk_lock-AF_INET){+.+...}, at: [<ffffffff81433308>] lock_sock+0x10/0x12
+
+stack backtrace:
+Pid: 1104, comm: ip Not tainted 3.0.0-rc3+ #26
+Call Trace:
+ [<ffffffff81081649>] print_circular_bug+0x1f8/0x209
+ [<ffffffff81082ecc>] __lock_acquire+0xacc/0xdc8
+ [<ffffffff81056d12>] ? __local_lock+0x25/0x68
+ [<ffffffff810836e5>] lock_acquire+0x103/0x12e
+ [<ffffffff81056d12>] ? __local_lock+0x25/0x68
+ [<ffffffff81046c75>] ? get_parent_ip+0x11/0x41
+ [<ffffffff814a7e40>] _raw_spin_lock+0x3b/0x4a
+ [<ffffffff81056d12>] ? __local_lock+0x25/0x68
+ [<ffffffff81046c8c>] ? get_parent_ip+0x28/0x41
+ [<ffffffff81056d12>] __local_lock+0x25/0x68
+ [<ffffffff81056d8b>] local_bh_disable+0x36/0x3b
+ [<ffffffff81433308>] ? lock_sock+0x10/0x12
+ [<ffffffff814a7fc4>] _raw_write_lock_bh+0x16/0x4f
+ [<ffffffff81433c38>] tcp_close+0x159/0x355
+ [<ffffffff81453c99>] inet_release+0xc3/0xcd
+ [<ffffffff813dff3f>] sock_release+0x1f/0x74
+ [<ffffffff813dffbb>] sock_close+0x27/0x2b
+ [<ffffffff81129c63>] fput+0x11d/0x1e3
+ [<ffffffff81126577>] filp_close+0x70/0x7b
+ [<ffffffff8112667a>] sys_close+0xf8/0x13d
+ [<ffffffff814ae882>] system_call_fastpath+0x16/0x1b
+
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ net/core/sock.c | 3 +--
+ 1 file changed, 1 insertion(+), 2 deletions(-)
+
+--- a/net/core/sock.c
++++ b/net/core/sock.c
+@@ -2370,12 +2370,11 @@ void lock_sock_nested(struct sock *sk, i
+ if (sk->sk_lock.owned)
+ __lock_sock(sk);
+ sk->sk_lock.owned = 1;
+- spin_unlock(&sk->sk_lock.slock);
++ spin_unlock_bh(&sk->sk_lock.slock);
+ /*
+ * The sk_lock has mutex_lock() semantics here:
+ */
+ mutex_acquire(&sk->sk_lock.dep_map, subclass, 0, _RET_IP_);
+- local_bh_enable();
+ }
+ EXPORT_SYMBOL(lock_sock_nested);
+
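
The lockdep report above is the textbook AB-BA pattern: one path acquires sk_lock
and then the softirq local lock, the other path the reverse. Stripped of the
networking details it looks like the sketch below; the buggy ordering is kept only
for comparison and the program runs the consistent one:

/* Minimal AB-BA illustration: thread 1 takes A then B, a buggy second
 * path takes B then A; with unlucky timing both block forever. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;  /* "sk_lock" */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;  /* "softirq lock" */

static void *path_one(void *arg)
{
	pthread_mutex_lock(&lock_a);
	usleep(1000);			/* widen the race window */
	pthread_mutex_lock(&lock_b);	/* A -> B */
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
	return NULL;
}

static void *path_two_buggy(void *arg)
{
	pthread_mutex_lock(&lock_b);	/* B -> A: inverted, can deadlock */
	usleep(1000);
	pthread_mutex_lock(&lock_a);
	pthread_mutex_unlock(&lock_a);
	pthread_mutex_unlock(&lock_b);
	return NULL;
}

static void *path_two_fixed(void *arg)
{
	pthread_mutex_lock(&lock_a);	/* same A -> B order as path_one */
	pthread_mutex_lock(&lock_b);
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	(void)path_two_buggy;		/* kept only to show the bad order */
	pthread_create(&t1, NULL, path_one, NULL);
	pthread_create(&t2, NULL, path_two_fixed, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	printf("consistent lock order: no deadlock\n");
	return 0;
}
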
diff --git a/patches/net-sched-dev_deactivate_many-use-msleep-1-instead-o.patch b/patches/net-sched-dev_deactivate_many-use-msleep-1-instead-o.patch
new file mode 100644
index 00000000000000..92829e408434c6
--- /dev/null
+++ b/patches/net-sched-dev_deactivate_many-use-msleep-1-instead-o.patch
@@ -0,0 +1,57 @@
+From: Marc Kleine-Budde <mkl@pengutronix.de>
+Date: Wed, 5 Mar 2014 00:49:47 +0100
+Subject: net: sched: Use msleep() instead of yield()
+
+On PREEMPT_RT enabled systems the interrupt handlers run as threads at prio 50
+(by default). If a high priority userspace process tries to shut down a busy
+network interface it might spin in a yield loop waiting for the device to
+become idle. With the interrupt thread having a lower priority than the
+looping process it might never be scheduled and so result in a deadlock on UP
+systems.
+
+With Magic SysRq the following backtrace can be produced:
+
+> test_app R running 0 174 168 0x00000000
+> [<c02c7070>] (__schedule+0x220/0x3fc) from [<c02c7870>] (preempt_schedule_irq+0x48/0x80)
+> [<c02c7870>] (preempt_schedule_irq+0x48/0x80) from [<c0008fa8>] (svc_preempt+0x8/0x20)
+> [<c0008fa8>] (svc_preempt+0x8/0x20) from [<c001a984>] (local_bh_enable+0x18/0x88)
+> [<c001a984>] (local_bh_enable+0x18/0x88) from [<c025316c>] (dev_deactivate_many+0x220/0x264)
+> [<c025316c>] (dev_deactivate_many+0x220/0x264) from [<c023be04>] (__dev_close_many+0x64/0xd4)
+> [<c023be04>] (__dev_close_many+0x64/0xd4) from [<c023be9c>] (__dev_close+0x28/0x3c)
+> [<c023be9c>] (__dev_close+0x28/0x3c) from [<c023f7f0>] (__dev_change_flags+0x88/0x130)
+> [<c023f7f0>] (__dev_change_flags+0x88/0x130) from [<c023f904>] (dev_change_flags+0x10/0x48)
+> [<c023f904>] (dev_change_flags+0x10/0x48) from [<c024c140>] (do_setlink+0x370/0x7ec)
+> [<c024c140>] (do_setlink+0x370/0x7ec) from [<c024d2f0>] (rtnl_newlink+0x2b4/0x450)
+> [<c024d2f0>] (rtnl_newlink+0x2b4/0x450) from [<c024cfa0>] (rtnetlink_rcv_msg+0x158/0x1f4)
+> [<c024cfa0>] (rtnetlink_rcv_msg+0x158/0x1f4) from [<c0256740>] (netlink_rcv_skb+0xac/0xc0)
+> [<c0256740>] (netlink_rcv_skb+0xac/0xc0) from [<c024bbd8>] (rtnetlink_rcv+0x18/0x24)
+> [<c024bbd8>] (rtnetlink_rcv+0x18/0x24) from [<c02561b8>] (netlink_unicast+0x13c/0x198)
+> [<c02561b8>] (netlink_unicast+0x13c/0x198) from [<c025651c>] (netlink_sendmsg+0x264/0x2e0)
+> [<c025651c>] (netlink_sendmsg+0x264/0x2e0) from [<c022af98>] (sock_sendmsg+0x78/0x98)
+> [<c022af98>] (sock_sendmsg+0x78/0x98) from [<c022bb50>] (___sys_sendmsg.part.25+0x268/0x278)
+> [<c022bb50>] (___sys_sendmsg.part.25+0x268/0x278) from [<c022cf08>] (__sys_sendmsg+0x48/0x78)
+> [<c022cf08>] (__sys_sendmsg+0x48/0x78) from [<c0009320>] (ret_fast_syscall+0x0/0x2c)
+
+This patch works around the problem by replacing yield() by msleep(1), giving
+the interrupt thread time to finish, similar to other changes contained in the
+rt patch set. Using wait_for_completion() instead would probably be a better
+solution.
+
+
+Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ net/sched/sch_generic.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/net/sched/sch_generic.c
++++ b/net/sched/sch_generic.c
+@@ -894,7 +894,7 @@ void dev_deactivate_many(struct list_hea
+ /* Wait for outstanding qdisc_run calls. */
+ list_for_each_entry(dev, head, close_list)
+ while (some_qdisc_is_busy(dev))
+- yield();
++ msleep(1);
+ }
+
+ void dev_deactivate(struct net_device *dev)
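
The problem above is a priority-driven livelock: a SCHED_FIFO task that yield()s
never lets the lower-priority irq thread run on that CPU, while sleeping for a
millisecond does. A faithful reproduction needs RT priorities, so the sketch below
only contrasts the two wait styles; run as a normal user both variants finish:

/* Contrast of the two wait styles: a yield()-style busy loop vs sleeping
 * between polls.  Illustrative only. */
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

static _Atomic bool device_idle;

static void *worker(void *arg)
{
	/* stands in for the threaded interrupt handler finishing its work */
	struct timespec ts = { 0, 50 * 1000 * 1000 };

	nanosleep(&ts, NULL);
	device_idle = true;
	return NULL;
}

static void wait_busy(void)
{
	while (!device_idle)
		sched_yield();		/* hogs the CPU at high priority */
}

static void wait_sleeping(void)
{
	struct timespec ts = { 0, 1 * 1000 * 1000 };	/* roughly msleep(1) */

	while (!device_idle)
		nanosleep(&ts, NULL);	/* blocks, so the worker can run */
}

int main(void)
{
	pthread_t t;

	(void)wait_busy;		/* kept only for comparison */
	pthread_create(&t, NULL, worker, NULL);
	wait_sleeping();		/* the patch's choice */
	pthread_join(t, NULL);
	printf("device idle, interface shut down\n");
	return 0;
}
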
diff --git a/patches/net-tx-action-avoid-livelock-on-rt.patch b/patches/net-tx-action-avoid-livelock-on-rt.patch
new file mode 100644
index 00000000000000..9843dd7b21a487
--- /dev/null
+++ b/patches/net-tx-action-avoid-livelock-on-rt.patch
@@ -0,0 +1,92 @@
+Subject: net: Avoid livelock in net_tx_action() on RT
+From: Steven Rostedt <srostedt@redhat.com>
+Date: Thu, 06 Oct 2011 10:48:39 -0400
+
+qdisc_lock is taken w/o disabling interrupts or bottom halves. So code
+holding a qdisc_lock() can be interrupted and softirqs can run on the
+return of interrupt in !RT.
+
+The spin_trylock() in net_tx_action() makes sure that the softirq
+does not deadlock. When the lock can't be acquired q is requeued and
+the NET_TX softirq is raised. That causes the softirq to run over and
+over.
+
+That works in mainline as do_softirq() has a retry loop limit and
+leaves the softirq processing in the interrupt return path and
+schedules ksoftirqd. The task which holds qdisc_lock cannot be
+preempted, so the lock is released and either ksoftirqd or the next
+softirq in the return from interrupt path can proceed. Though it's a
+bit strange to actually run MAX_SOFTIRQ_RESTART (10) loops before it
+decides to bail out even if it's clear in the first iteration :)
+
+On RT all softirq processing is done in a FIFO thread and we don't
+have a loop limit, so ksoftirqd preempts the lock holder forever and
+unqueues and requeues until the reset button is hit.
+
+Due to the forced threading of ksoftirqd on RT we actually cannot
+deadlock on qdisc_lock because it's a "sleeping lock". So it's safe to
+replace the spin_trylock() with a spin_lock(). When contended,
+ksoftirqd is scheduled out and the lock holder can proceed.
+
+[ tglx: Massaged changelog and code comments ]
+
+Solved-by: Thomas Gleixner <tglx@linutronix.de>
+Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
+Tested-by: Carsten Emde <cbe@osadl.org>
+Cc: Clark Williams <williams@redhat.com>
+Cc: John Kacur <jkacur@redhat.com>
+Cc: Luis Claudio R. Goncalves <lclaudio@redhat.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ net/core/dev.c | 32 +++++++++++++++++++++++++++++++-
+ 1 file changed, 31 insertions(+), 1 deletion(-)
+
+--- a/net/core/dev.c
++++ b/net/core/dev.c
+@@ -3445,6 +3445,36 @@ int netif_rx_ni(struct sk_buff *skb)
+ }
+ EXPORT_SYMBOL(netif_rx_ni);
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++/*
++ * RT runs ksoftirqd as a real time thread and the root_lock is a
++ * "sleeping spinlock". If the trylock fails then we can go into an
++ * infinite loop when ksoftirqd preempted the task which actually
++ * holds the lock, because we requeue q and raise NET_TX softirq
++ * causing ksoftirqd to loop forever.
++ *
++ * It's safe to use spin_lock on RT here as softirqs run in thread
++ * context and cannot deadlock against the thread which is holding
++ * root_lock.
++ *
++ * On !RT the trylock might fail, but there we bail out from the
++ * softirq loop after 10 attempts which we can't do on RT. And the
++ * task holding root_lock cannot be preempted, so the only downside of
++ * that trylock is that we need 10 loops to decide that we should have
++ * given up in the first one :)
++ */
++static inline int take_root_lock(spinlock_t *lock)
++{
++ spin_lock(lock);
++ return 1;
++}
++#else
++static inline int take_root_lock(spinlock_t *lock)
++{
++ return spin_trylock(lock);
++}
++#endif
++
+ static void net_tx_action(struct softirq_action *h)
+ {
+ struct softnet_data *sd = this_cpu_ptr(&softnet_data);
+@@ -3486,7 +3516,7 @@ static void net_tx_action(struct softirq
+ head = head->next_sched;
+
+ root_lock = qdisc_lock(q);
+- if (spin_trylock(root_lock)) {
++ if (take_root_lock(root_lock)) {
+ smp_mb__before_atomic();
+ clear_bit(__QDISC_STATE_SCHED,
+ &q->state);
diff --git a/patches/net-use-cpu-chill.patch b/patches/net-use-cpu-chill.patch
new file mode 100644
index 00000000000000..b5e52e5c41b3a3
--- /dev/null
+++ b/patches/net-use-cpu-chill.patch
@@ -0,0 +1,62 @@
+Subject: net: Use cpu_chill() instead of cpu_relax()
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 07 Mar 2012 21:10:04 +0100
+
+Retry loops on RT might loop forever when the modifying side was
+preempted. Use cpu_chill() instead of cpu_relax() to let the system
+make progress.
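+
+For reference, cpu_chill() is introduced by a separate patch earlier in this
+queue; conceptually it boils down to something like the sketch below (a rough
+sketch of the idea, not the exact RT implementation): a real sleep on RT so
+the preempted updater can run, and the usual busy-wait hint otherwise.
+
+    /* sketch only -- see the cpu_chill() introduction patch in this queue */
+    #ifdef CONFIG_PREEMPT_RT_FULL
+    # define cpu_chill()    msleep(1)       /* sleep, let the writer make progress */
+    #else
+    # define cpu_chill()    cpu_relax()     /* plain busy-wait hint */
+    #endif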
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ net/packet/af_packet.c | 5 +++--
+ net/rds/ib_rdma.c | 3 ++-
+ 2 files changed, 5 insertions(+), 3 deletions(-)
+
+--- a/net/packet/af_packet.c
++++ b/net/packet/af_packet.c
+@@ -63,6 +63,7 @@
+ #include <linux/if_packet.h>
+ #include <linux/wireless.h>
+ #include <linux/kernel.h>
++#include <linux/delay.h>
+ #include <linux/kmod.h>
+ #include <linux/slab.h>
+ #include <linux/vmalloc.h>
+@@ -698,7 +699,7 @@ static void prb_retire_rx_blk_timer_expi
+ if (BLOCK_NUM_PKTS(pbd)) {
+ while (atomic_read(&pkc->blk_fill_in_prog)) {
+ /* Waiting for skb_copy_bits to finish... */
+- cpu_relax();
++ cpu_chill();
+ }
+ }
+
+@@ -960,7 +961,7 @@ static void prb_retire_current_block(str
+ if (!(status & TP_STATUS_BLK_TMO)) {
+ while (atomic_read(&pkc->blk_fill_in_prog)) {
+ /* Waiting for skb_copy_bits to finish... */
+- cpu_relax();
++ cpu_chill();
+ }
+ }
+ prb_close_block(pkc, pbd, po, status);
+--- a/net/rds/ib_rdma.c
++++ b/net/rds/ib_rdma.c
+@@ -34,6 +34,7 @@
+ #include <linux/slab.h>
+ #include <linux/rculist.h>
+ #include <linux/llist.h>
++#include <linux/delay.h>
+
+ #include "rds.h"
+ #include "ib.h"
+@@ -286,7 +287,7 @@ static inline void wait_clean_list_grace
+ for_each_online_cpu(cpu) {
+ flag = &per_cpu(clean_list_grace, cpu);
+ while (test_bit(CLEAN_LIST_BUSY_BIT, flag))
+- cpu_relax();
++ cpu_chill();
+ }
+ }
+
diff --git a/patches/net-wireless-warn-nort.patch b/patches/net-wireless-warn-nort.patch
new file mode 100644
index 00000000000000..4f98708a2dbf71
--- /dev/null
+++ b/patches/net-wireless-warn-nort.patch
@@ -0,0 +1,23 @@
+Subject: net/wireless: Use WARN_ON_NORT()
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Thu, 21 Jul 2011 21:05:33 +0200
+
+The softirq counter is meaningless on RT, so the check triggers a
+false positive.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ net/mac80211/rx.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/net/mac80211/rx.c
++++ b/net/mac80211/rx.c
+@@ -3554,7 +3554,7 @@ void ieee80211_rx(struct ieee80211_hw *h
+ struct ieee80211_supported_band *sband;
+ struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(skb);
+
+- WARN_ON_ONCE(softirq_count() == 0);
++ WARN_ON_ONCE_NONRT(softirq_count() == 0);
+
+ if (WARN_ON(status->band >= IEEE80211_NUM_BANDS))
+ goto drop;
diff --git a/patches/oleg-signal-rt-fix.patch b/patches/oleg-signal-rt-fix.patch
new file mode 100644
index 00000000000000..c0ec60493eb1df
--- /dev/null
+++ b/patches/oleg-signal-rt-fix.patch
@@ -0,0 +1,143 @@
+From: Oleg Nesterov <oleg@redhat.com>
+Date: Tue, 14 Jul 2015 14:26:34 +0200
+Subject: signal/x86: Delay calling signals in atomic
+
+On x86_64 we must disable preemption before we enable interrupts
+for stack faults, int3 and debugging, because the current task is using
+a per CPU debug stack defined by the IST. If we schedule out, another task
+can come in and use the same stack and cause the stack to be corrupted
+and crash the kernel on return.
+
+When CONFIG_PREEMPT_RT_FULL is enabled, spin_locks become mutexes, and
+one of these is the spin lock used in signal handling.
+
+Some of the debug code (int3) causes do_trap() to send a signal.
+This function takes a spin lock that has been converted to a mutex
+and may therefore sleep. If this happens, the stack corruption
+described above becomes possible.
+
+Instead of sending the signal right away, for PREEMPT_RT and x86_64,
+the signal information is stored in the task's task_struct and
+TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume
+code will send the signal when preemption is enabled.
+
+[ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT_FULL to
+ ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
+
+
+Signed-off-by: Oleg Nesterov <oleg@redhat.com>
+Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+
+ arch/x86/include/asm/signal.h | 13 +++++++++++++
+ arch/x86/kernel/signal.c | 8 ++++++++
+ include/linux/sched.h | 4 ++++
+ kernel/signal.c | 37 +++++++++++++++++++++++++++++++++++--
+ 4 files changed, 60 insertions(+), 2 deletions(-)
+
+--- a/arch/x86/include/asm/signal.h
++++ b/arch/x86/include/asm/signal.h
+@@ -23,6 +23,19 @@ typedef struct {
+ unsigned long sig[_NSIG_WORDS];
+ } sigset_t;
+
++/*
++ * Because some traps use the IST stack, we must keep preemption
++ * disabled while calling do_trap(), but do_trap() may call
++ * force_sig_info() which will grab the signal spin_locks for the
++ * task, which in PREEMPT_RT_FULL are mutexes. By defining
++ * ARCH_RT_DELAYS_SIGNAL_SEND the force_sig_info() will set
++ * TIF_NOTIFY_RESUME and set up the signal to be sent on exit of the
++ * trap.
++ */
++#if defined(CONFIG_PREEMPT_RT_FULL) && defined(CONFIG_X86_64)
++#define ARCH_RT_DELAYS_SIGNAL_SEND
++#endif
++
+ #ifndef CONFIG_COMPAT
+ typedef sigset_t compat_sigset_t;
+ #endif
+--- a/arch/x86/kernel/signal.c
++++ b/arch/x86/kernel/signal.c
+@@ -727,6 +727,14 @@ do_notify_resume(struct pt_regs *regs, v
+ {
+ user_exit();
+
++#ifdef ARCH_RT_DELAYS_SIGNAL_SEND
++ if (unlikely(current->forced_info.si_signo)) {
++ struct task_struct *t = current;
++ force_sig_info(t->forced_info.si_signo, &t->forced_info, t);
++ t->forced_info.si_signo = 0;
++ }
++#endif
++
+ if (thread_info_flags & _TIF_UPROBE)
+ uprobe_notify_resume(regs);
+
+--- a/include/linux/sched.h
++++ b/include/linux/sched.h
+@@ -1538,6 +1538,10 @@ struct task_struct {
+ sigset_t blocked, real_blocked;
+ sigset_t saved_sigmask; /* restored if set_restore_sigmask() was used */
+ struct sigpending pending;
++#ifdef CONFIG_PREEMPT_RT_FULL
++ /* TODO: move me into ->restart_block ? */
++ struct siginfo forced_info;
++#endif
+
+ unsigned long sas_ss_sp;
+ size_t sas_ss_size;
+--- a/kernel/signal.c
++++ b/kernel/signal.c
+@@ -1282,8 +1282,8 @@ int do_send_sig_info(int sig, struct sig
+ * We don't want to have recursive SIGSEGV's etc, for example,
+ * that is why we also clear SIGNAL_UNKILLABLE.
+ */
+-int
+-force_sig_info(int sig, struct siginfo *info, struct task_struct *t)
++static int
++do_force_sig_info(int sig, struct siginfo *info, struct task_struct *t)
+ {
+ unsigned long int flags;
+ int ret, blocked, ignored;
+@@ -1308,6 +1308,39 @@ force_sig_info(int sig, struct siginfo *
+ return ret;
+ }
+
++int force_sig_info(int sig, struct siginfo *info, struct task_struct *t)
++{
++/*
++ * On some archs, PREEMPT_RT has to delay sending a signal from a trap
++ * since it can not enable preemption, and the signal code's spin_locks
++ * turn into mutexes. Instead, it must set TIF_NOTIFY_RESUME which will
++ * send the signal on exit of the trap.
++ */
++#ifdef ARCH_RT_DELAYS_SIGNAL_SEND
++ if (in_atomic()) {
++ if (WARN_ON_ONCE(t != current))
++ return 0;
++ if (WARN_ON_ONCE(t->forced_info.si_signo))
++ return 0;
++
++ if (is_si_special(info)) {
++ WARN_ON_ONCE(info != SEND_SIG_PRIV);
++ t->forced_info.si_signo = sig;
++ t->forced_info.si_errno = 0;
++ t->forced_info.si_code = SI_KERNEL;
++ t->forced_info.si_pid = 0;
++ t->forced_info.si_uid = 0;
++ } else {
++ t->forced_info = *info;
++ }
++
++ set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
++ return 0;
++ }
++#endif
++ return do_force_sig_info(sig, info, t);
++}
++
+ /*
+ * Nuke all other threads in the group.
+ */
diff --git a/patches/panic-disable-random-on-rt.patch b/patches/panic-disable-random-on-rt.patch
new file mode 100644
index 00000000000000..3f2f7fd466ad00
--- /dev/null
+++ b/patches/panic-disable-random-on-rt.patch
@@ -0,0 +1,26 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Tue, 14 Jul 2015 14:26:34 +0200
+Subject: panic: skip get_random_bytes for RT_FULL in init_oops_id
+
+Disable on -RT. If this is invoked from irq context we will have problems
+acquiring the sleeping lock.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ kernel/panic.c | 2 ++
+ 1 file changed, 2 insertions(+)
+
+--- a/kernel/panic.c
++++ b/kernel/panic.c
+@@ -387,9 +387,11 @@ static u64 oops_id;
+
+ static int init_oops_id(void)
+ {
++#ifndef CONFIG_PREEMPT_RT_FULL
+ if (!oops_id)
+ get_random_bytes(&oops_id, sizeof(oops_id));
+ else
++#endif
+ oops_id++;
+
+ return 0;
diff --git a/patches/patch-to-introduce-rcu-bh-qs-where-safe-from-softirq.patch b/patches/patch-to-introduce-rcu-bh-qs-where-safe-from-softirq.patch
new file mode 100644
index 00000000000000..2c3aed8dae6ea2
--- /dev/null
+++ b/patches/patch-to-introduce-rcu-bh-qs-where-safe-from-softirq.patch
@@ -0,0 +1,111 @@
+Subject: rcu: Make ksoftirqd do RCU quiescent states
+From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
+Date: Wed, 5 Oct 2011 11:45:18 -0700
+
+Implementing RCU-bh in terms of RCU-preempt makes the system vulnerable
+to network-based denial-of-service attacks. This patch therefore
+makes __do_softirq() invoke rcu_bh_qs(), but only when __do_softirq()
+is running in ksoftirqd context. A wrapper layer is interposed so that
+other calls to __do_softirq() avoid invoking rcu_bh_qs(). The underlying
+function __do_softirq_common() does the actual work.
+
+The reason that rcu_bh_qs() is bad in these non-ksoftirqd contexts is
+that there might be a local_bh_enable() inside an RCU-preempt read-side
+critical section. This local_bh_enable() can invoke __do_softirq()
+directly, so if __do_softirq() were to invoke rcu_bh_qs() (which just
+calls rcu_preempt_qs() in the PREEMPT_RT_FULL case), there would be
+an illegal RCU-preempt quiescent state in the middle of an RCU-preempt
+read-side critical section. Therefore, quiescent states can only happen
+in cases where __do_softirq() is invoked directly from ksoftirqd.
+
+Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
+Link: http://lkml.kernel.org/r/20111005184518.GA21601@linux.vnet.ibm.com
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ include/linux/rcupdate.h | 6 ------
+ kernel/rcu/tree.c | 9 ++++++++-
+ kernel/rcu/tree_plugin.h | 9 +++++++--
+ 3 files changed, 15 insertions(+), 9 deletions(-)
+
+--- a/include/linux/rcupdate.h
++++ b/include/linux/rcupdate.h
+@@ -300,13 +300,7 @@ static inline int rcu_preempt_depth(void
+ void rcu_init(void);
+ void rcu_end_inkernel_boot(void);
+ void rcu_sched_qs(void);
+-
+-#ifdef CONFIG_PREEMPT_RT_FULL
+-static inline void rcu_bh_qs(void) { }
+-#else
+ void rcu_bh_qs(void);
+-#endif
+-
+ void rcu_check_callbacks(int user);
+ struct notifier_block;
+ void rcu_idle_enter(void);
+--- a/kernel/rcu/tree.c
++++ b/kernel/rcu/tree.c
+@@ -220,7 +220,14 @@ void rcu_sched_qs(void)
+ }
+ }
+
+-#ifndef CONFIG_PREEMPT_RT_FULL
++#ifdef CONFIG_PREEMPT_RT_FULL
++static void rcu_preempt_qs(void);
++
++void rcu_bh_qs(void)
++{
++ rcu_preempt_qs();
++}
++#else
+ void rcu_bh_qs(void)
+ {
+ if (!__this_cpu_read(rcu_bh_data.passed_quiesce)) {
+--- a/kernel/rcu/tree_plugin.h
++++ b/kernel/rcu/tree_plugin.h
+@@ -28,6 +28,7 @@
+ #include <linux/gfp.h>
+ #include <linux/oom.h>
+ #include <linux/smpboot.h>
++#include <linux/jiffies.h>
+ #include "../time/tick-internal.h"
+
+ #ifdef CONFIG_RCU_BOOST
+@@ -1356,7 +1357,7 @@ static void rcu_prepare_kthreads(int cpu
+
+ #endif /* #else #ifdef CONFIG_RCU_BOOST */
+
+-#if !defined(CONFIG_RCU_FAST_NO_HZ)
++#if !defined(CONFIG_RCU_FAST_NO_HZ) || defined(CONFIG_PREEMPT_RT_FULL)
+
+ /*
+ * Check to see if any future RCU-related work will need to be done
+@@ -1374,7 +1375,9 @@ int rcu_needs_cpu(unsigned long *delta_j
+ return rcu_cpu_has_callbacks(NULL);
+ }
+ #endif /* #ifndef CONFIG_RCU_NOCB_CPU_ALL */
++#endif /* !defined(CONFIG_RCU_FAST_NO_HZ) || defined(CONFIG_PREEMPT_RT_FULL) */
+
++#if !defined(CONFIG_RCU_FAST_NO_HZ)
+ /*
+ * Because we do not have RCU_FAST_NO_HZ, don't bother cleaning up
+ * after it.
+@@ -1472,6 +1475,8 @@ static bool __maybe_unused rcu_try_advan
+ return cbs_ready;
+ }
+
++#ifndef CONFIG_PREEMPT_RT_FULL
++
+ /*
+ * Allow the CPU to enter dyntick-idle mode unless it has callbacks ready
+ * to invoke. If the CPU has callbacks, try to advance them. Tell the
+@@ -1512,7 +1517,7 @@ int rcu_needs_cpu(unsigned long *dj)
+ return 0;
+ }
+ #endif /* #ifndef CONFIG_RCU_NOCB_CPU_ALL */
+-
++#endif /* #ifndef CONFIG_PREEMPT_RT_FULL */
+ /*
+ * Prepare a CPU for idle from an RCU perspective. The first major task
+ * is to sense whether nohz mode has been enabled or disabled via sysfs.
diff --git a/patches/pci-access-use-__wake_up_all_locked.patch b/patches/pci-access-use-__wake_up_all_locked.patch
new file mode 100644
index 00000000000000..271e1bc86c9067
--- /dev/null
+++ b/patches/pci-access-use-__wake_up_all_locked.patch
@@ -0,0 +1,25 @@
+Subject: pci: Use __wake_up_all_locked in pci_unblock_user_cfg_access()
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Thu, 01 Dec 2011 00:07:16 +0100
+
+The waitqueue is protected by the pci_lock, so we can just avoid
+taking the waitqueue lock itself. That prevents the
+might_sleep()/scheduling-while-atomic problem on RT.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ drivers/pci/access.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/drivers/pci/access.c
++++ b/drivers/pci/access.c
+@@ -521,7 +521,7 @@ void pci_cfg_access_unlock(struct pci_de
+ WARN_ON(!dev->block_cfg_access);
+
+ dev->block_cfg_access = 0;
+- wake_up_all(&pci_cfg_wait);
++ wake_up_all_locked(&pci_cfg_wait);
+ raw_spin_unlock_irqrestore(&pci_lock, flags);
+ }
+ EXPORT_SYMBOL_GPL(pci_cfg_access_unlock);
diff --git a/patches/percpu_ida-use-locklocks.patch b/patches/percpu_ida-use-locklocks.patch
new file mode 100644
index 00000000000000..c5edf437a4d042
--- /dev/null
+++ b/patches/percpu_ida-use-locklocks.patch
@@ -0,0 +1,101 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Wed, 9 Apr 2014 11:58:17 +0200
+Subject: percpu_ida: Use local locks
+
+The local_irq_save() + spin_lock() combination does not work that well on -RT.
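+
+For reference, the local lock used below (DEFINE_LOCAL_IRQ_LOCK /
+local_lock_irqsave()) is provided by the locallock patch earlier in this
+queue. Roughly -- a sketch of the idea, not the real linux/locallock.h:
+
+    #ifndef CONFIG_PREEMPT_RT_FULL
+    /* !RT: behaves exactly like the code it replaces */
+    # define local_lock_irqsave(lvar, flags)        local_irq_save(flags)
+    # define local_unlock_irqrestore(lvar, flags)   local_irq_restore(flags)
+    #else
+    /*
+     * RT: take a per-CPU sleeping lock instead; interrupts stay enabled
+     * and the critical section remains preemptible.
+     */
+    #endif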
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ lib/percpu_ida.c | 20 ++++++++++++--------
+ 1 file changed, 12 insertions(+), 8 deletions(-)
+
+--- a/lib/percpu_ida.c
++++ b/lib/percpu_ida.c
+@@ -26,6 +26,9 @@
+ #include <linux/string.h>
+ #include <linux/spinlock.h>
+ #include <linux/percpu_ida.h>
++#include <linux/locallock.h>
++
++static DEFINE_LOCAL_IRQ_LOCK(irq_off_lock);
+
+ struct percpu_ida_cpu {
+ /*
+@@ -148,13 +151,13 @@ int percpu_ida_alloc(struct percpu_ida *
+ unsigned long flags;
+ int tag;
+
+- local_irq_save(flags);
++ local_lock_irqsave(irq_off_lock, flags);
+ tags = this_cpu_ptr(pool->tag_cpu);
+
+ /* Fastpath */
+ tag = alloc_local_tag(tags);
+ if (likely(tag >= 0)) {
+- local_irq_restore(flags);
++ local_unlock_irqrestore(irq_off_lock, flags);
+ return tag;
+ }
+
+@@ -173,6 +176,7 @@ int percpu_ida_alloc(struct percpu_ida *
+
+ if (!tags->nr_free)
+ alloc_global_tags(pool, tags);
++
+ if (!tags->nr_free)
+ steal_tags(pool, tags);
+
+@@ -184,7 +188,7 @@ int percpu_ida_alloc(struct percpu_ida *
+ }
+
+ spin_unlock(&pool->lock);
+- local_irq_restore(flags);
++ local_unlock_irqrestore(irq_off_lock, flags);
+
+ if (tag >= 0 || state == TASK_RUNNING)
+ break;
+@@ -196,7 +200,7 @@ int percpu_ida_alloc(struct percpu_ida *
+
+ schedule();
+
+- local_irq_save(flags);
++ local_lock_irqsave(irq_off_lock, flags);
+ tags = this_cpu_ptr(pool->tag_cpu);
+ }
+ if (state != TASK_RUNNING)
+@@ -221,7 +225,7 @@ void percpu_ida_free(struct percpu_ida *
+
+ BUG_ON(tag >= pool->nr_tags);
+
+- local_irq_save(flags);
++ local_lock_irqsave(irq_off_lock, flags);
+ tags = this_cpu_ptr(pool->tag_cpu);
+
+ spin_lock(&tags->lock);
+@@ -253,7 +257,7 @@ void percpu_ida_free(struct percpu_ida *
+ spin_unlock(&pool->lock);
+ }
+
+- local_irq_restore(flags);
++ local_unlock_irqrestore(irq_off_lock, flags);
+ }
+ EXPORT_SYMBOL_GPL(percpu_ida_free);
+
+@@ -345,7 +349,7 @@ int percpu_ida_for_each_free(struct perc
+ struct percpu_ida_cpu *remote;
+ unsigned cpu, i, err = 0;
+
+- local_irq_save(flags);
++ local_lock_irqsave(irq_off_lock, flags);
+ for_each_possible_cpu(cpu) {
+ remote = per_cpu_ptr(pool->tag_cpu, cpu);
+ spin_lock(&remote->lock);
+@@ -367,7 +371,7 @@ int percpu_ida_for_each_free(struct perc
+ }
+ spin_unlock(&pool->lock);
+ out:
+- local_irq_restore(flags);
++ local_unlock_irqrestore(irq_off_lock, flags);
+ return err;
+ }
+ EXPORT_SYMBOL_GPL(percpu_ida_for_each_free);
diff --git a/patches/perf-make-swevent-hrtimer-irqsafe.patch b/patches/perf-make-swevent-hrtimer-irqsafe.patch
new file mode 100644
index 00000000000000..049e61ebe311dd
--- /dev/null
+++ b/patches/perf-make-swevent-hrtimer-irqsafe.patch
@@ -0,0 +1,68 @@
+From: Yong Zhang <yong.zhang@windriver.com>
+Date: Wed, 11 Jul 2012 22:05:21 +0000
+Subject: perf: Make swevent hrtimer run in irq instead of softirq
+
+Otherwise we get a deadlock like below:
+
+[ 1044.042749] BUG: scheduling while atomic: ksoftirqd/21/141/0x00010003
+[ 1044.042752] INFO: lockdep is turned off.
+[ 1044.042754] Modules linked in:
+[ 1044.042757] Pid: 141, comm: ksoftirqd/21 Tainted: G W 3.4.0-rc2-rt3-23676-ga723175-dirty #29
+[ 1044.042759] Call Trace:
+[ 1044.042761] <IRQ> [<ffffffff8107d8e5>] __schedule_bug+0x65/0x80
+[ 1044.042770] [<ffffffff8168978c>] __schedule+0x83c/0xa70
+[ 1044.042775] [<ffffffff8106bdd2>] ? prepare_to_wait+0x32/0xb0
+[ 1044.042779] [<ffffffff81689a5e>] schedule+0x2e/0xa0
+[ 1044.042782] [<ffffffff81071ebd>] hrtimer_wait_for_timer+0x6d/0xb0
+[ 1044.042786] [<ffffffff8106bb30>] ? wake_up_bit+0x40/0x40
+[ 1044.042790] [<ffffffff81071f20>] hrtimer_cancel+0x20/0x40
+[ 1044.042794] [<ffffffff8111da0c>] perf_swevent_cancel_hrtimer+0x3c/0x50
+[ 1044.042798] [<ffffffff8111da31>] task_clock_event_stop+0x11/0x40
+[ 1044.042802] [<ffffffff8111da6e>] task_clock_event_del+0xe/0x10
+[ 1044.042805] [<ffffffff8111c568>] event_sched_out+0x118/0x1d0
+[ 1044.042809] [<ffffffff8111c649>] group_sched_out+0x29/0x90
+[ 1044.042813] [<ffffffff8111ed7e>] __perf_event_disable+0x18e/0x200
+[ 1044.042817] [<ffffffff8111c343>] remote_function+0x63/0x70
+[ 1044.042821] [<ffffffff810b0aae>] generic_smp_call_function_single_interrupt+0xce/0x120
+[ 1044.042826] [<ffffffff81022bc7>] smp_call_function_single_interrupt+0x27/0x40
+[ 1044.042831] [<ffffffff8168d50c>] call_function_single_interrupt+0x6c/0x80
+[ 1044.042833] <EOI> [<ffffffff811275b0>] ? perf_event_overflow+0x20/0x20
+[ 1044.042840] [<ffffffff8168b970>] ? _raw_spin_unlock_irq+0x30/0x70
+[ 1044.042844] [<ffffffff8168b976>] ? _raw_spin_unlock_irq+0x36/0x70
+[ 1044.042848] [<ffffffff810702e2>] run_hrtimer_softirq+0xc2/0x200
+[ 1044.042853] [<ffffffff811275b0>] ? perf_event_overflow+0x20/0x20
+[ 1044.042857] [<ffffffff81045265>] __do_softirq_common+0xf5/0x3a0
+[ 1044.042862] [<ffffffff81045c3d>] __thread_do_softirq+0x15d/0x200
+[ 1044.042865] [<ffffffff81045dda>] run_ksoftirqd+0xfa/0x210
+[ 1044.042869] [<ffffffff81045ce0>] ? __thread_do_softirq+0x200/0x200
+[ 1044.042873] [<ffffffff81045ce0>] ? __thread_do_softirq+0x200/0x200
+[ 1044.042877] [<ffffffff8106b596>] kthread+0xb6/0xc0
+[ 1044.042881] [<ffffffff8168b97b>] ? _raw_spin_unlock_irq+0x3b/0x70
+[ 1044.042886] [<ffffffff8168d994>] kernel_thread_helper+0x4/0x10
+[ 1044.042889] [<ffffffff8107d98c>] ? finish_task_switch+0x8c/0x110
+[ 1044.042894] [<ffffffff8168b97b>] ? _raw_spin_unlock_irq+0x3b/0x70
+[ 1044.042897] [<ffffffff8168bd5d>] ? retint_restore_args+0xe/0xe
+[ 1044.042900] [<ffffffff8106b4e0>] ? kthreadd+0x1e0/0x1e0
+[ 1044.042902] [<ffffffff8168d990>] ? gs_change+0xb/0xb
+
+Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
+Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
+Cc: Steven Rostedt <rostedt@goodmis.org>
+Link: http://lkml.kernel.org/r/1341476476-5666-1-git-send-email-yong.zhang0@gmail.com
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
+
+---
+ kernel/events/core.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+--- a/kernel/events/core.c
++++ b/kernel/events/core.c
+@@ -6890,6 +6890,7 @@ static void perf_swevent_init_hrtimer(st
+
+ hrtimer_init(&hwc->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ hwc->hrtimer.function = perf_swevent_hrtimer;
++ hwc->hrtimer.irqsafe = 1;
+
+ /*
+ * Since hrtimers have a fixed rate, we can do a static freq->period
diff --git a/patches/peter_zijlstra-frob-rcu.patch b/patches/peter_zijlstra-frob-rcu.patch
new file mode 100644
index 00000000000000..695632b01453a5
--- /dev/null
+++ b/patches/peter_zijlstra-frob-rcu.patch
@@ -0,0 +1,166 @@
+Subject: rcu: Frob softirq test
+From: Peter Zijlstra <a.p.zijlstra@chello.nl>
+Date: Sat Aug 13 00:23:17 CEST 2011
+
+With RT_FULL we get the below wreckage:
+
+[ 126.060484] =======================================================
+[ 126.060486] [ INFO: possible circular locking dependency detected ]
+[ 126.060489] 3.0.1-rt10+ #30
+[ 126.060490] -------------------------------------------------------
+[ 126.060492] irq/24-eth0/1235 is trying to acquire lock:
+[ 126.060495] (&(lock)->wait_lock#2){+.+...}, at: [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55
+[ 126.060503]
+[ 126.060504] but task is already holding lock:
+[ 126.060506] (&p->pi_lock){-...-.}, at: [<ffffffff81074fdc>] try_to_wake_up+0x35/0x429
+[ 126.060511]
+[ 126.060511] which lock already depends on the new lock.
+[ 126.060513]
+[ 126.060514]
+[ 126.060514] the existing dependency chain (in reverse order) is:
+[ 126.060516]
+[ 126.060516] -> #1 (&p->pi_lock){-...-.}:
+[ 126.060519] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a
+[ 126.060524] [<ffffffff8150291e>] _raw_spin_lock_irqsave+0x4b/0x85
+[ 126.060527] [<ffffffff810b5aa4>] task_blocks_on_rt_mutex+0x36/0x20f
+[ 126.060531] [<ffffffff815019bb>] rt_mutex_slowlock+0xd1/0x15a
+[ 126.060534] [<ffffffff81501ae3>] rt_mutex_lock+0x2d/0x2f
+[ 126.060537] [<ffffffff810d9020>] rcu_boost+0xad/0xde
+[ 126.060541] [<ffffffff810d90ce>] rcu_boost_kthread+0x7d/0x9b
+[ 126.060544] [<ffffffff8109a760>] kthread+0x99/0xa1
+[ 126.060547] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10
+[ 126.060551]
+[ 126.060552] -> #0 (&(lock)->wait_lock#2){+.+...}:
+[ 126.060555] [<ffffffff810af1b8>] __lock_acquire+0x1157/0x1816
+[ 126.060558] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a
+[ 126.060561] [<ffffffff8150279e>] _raw_spin_lock+0x40/0x73
+[ 126.060564] [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55
+[ 126.060566] [<ffffffff81501ce7>] rt_mutex_unlock+0x27/0x29
+[ 126.060569] [<ffffffff810d9f86>] rcu_read_unlock_special+0x17e/0x1c4
+[ 126.060573] [<ffffffff810da014>] __rcu_read_unlock+0x48/0x89
+[ 126.060576] [<ffffffff8106847a>] select_task_rq_rt+0xc7/0xd5
+[ 126.060580] [<ffffffff8107511c>] try_to_wake_up+0x175/0x429
+[ 126.060583] [<ffffffff81075425>] wake_up_process+0x15/0x17
+[ 126.060585] [<ffffffff81080a51>] wakeup_softirqd+0x24/0x26
+[ 126.060590] [<ffffffff81081df9>] irq_exit+0x49/0x55
+[ 126.060593] [<ffffffff8150a3bd>] smp_apic_timer_interrupt+0x8a/0x98
+[ 126.060597] [<ffffffff81509793>] apic_timer_interrupt+0x13/0x20
+[ 126.060600] [<ffffffff810d5952>] irq_forced_thread_fn+0x1b/0x44
+[ 126.060603] [<ffffffff810d582c>] irq_thread+0xde/0x1af
+[ 126.060606] [<ffffffff8109a760>] kthread+0x99/0xa1
+[ 126.060608] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10
+[ 126.060611]
+[ 126.060612] other info that might help us debug this:
+[ 126.060614]
+[ 126.060615] Possible unsafe locking scenario:
+[ 126.060616]
+[ 126.060617] CPU0 CPU1
+[ 126.060619] ---- ----
+[ 126.060620] lock(&p->pi_lock);
+[ 126.060623] lock(&(lock)->wait_lock);
+[ 126.060625] lock(&p->pi_lock);
+[ 126.060627] lock(&(lock)->wait_lock);
+[ 126.060629]
+[ 126.060629] *** DEADLOCK ***
+[ 126.060630]
+[ 126.060632] 1 lock held by irq/24-eth0/1235:
+[ 126.060633] #0: (&p->pi_lock){-...-.}, at: [<ffffffff81074fdc>] try_to_wake_up+0x35/0x429
+[ 126.060638]
+[ 126.060638] stack backtrace:
+[ 126.060641] Pid: 1235, comm: irq/24-eth0 Not tainted 3.0.1-rt10+ #30
+[ 126.060643] Call Trace:
+[ 126.060644] <IRQ> [<ffffffff810acbde>] print_circular_bug+0x289/0x29a
+[ 126.060651] [<ffffffff810af1b8>] __lock_acquire+0x1157/0x1816
+[ 126.060655] [<ffffffff810ab3aa>] ? trace_hardirqs_off_caller+0x1f/0x99
+[ 126.060658] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55
+[ 126.060661] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a
+[ 126.060664] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55
+[ 126.060668] [<ffffffff8150279e>] _raw_spin_lock+0x40/0x73
+[ 126.060671] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55
+[ 126.060674] [<ffffffff810d9655>] ? rcu_report_qs_rsp+0x87/0x8c
+[ 126.060677] [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55
+[ 126.060680] [<ffffffff810d9ea3>] ? rcu_read_unlock_special+0x9b/0x1c4
+[ 126.060683] [<ffffffff81501ce7>] rt_mutex_unlock+0x27/0x29
+[ 126.060687] [<ffffffff810d9f86>] rcu_read_unlock_special+0x17e/0x1c4
+[ 126.060690] [<ffffffff810da014>] __rcu_read_unlock+0x48/0x89
+[ 126.060693] [<ffffffff8106847a>] select_task_rq_rt+0xc7/0xd5
+[ 126.060696] [<ffffffff810683da>] ? select_task_rq_rt+0x27/0xd5
+[ 126.060701] [<ffffffff810a852a>] ? clockevents_program_event+0x8e/0x90
+[ 126.060704] [<ffffffff8107511c>] try_to_wake_up+0x175/0x429
+[ 126.060708] [<ffffffff810a95dc>] ? tick_program_event+0x1f/0x21
+[ 126.060711] [<ffffffff81075425>] wake_up_process+0x15/0x17
+[ 126.060715] [<ffffffff81080a51>] wakeup_softirqd+0x24/0x26
+[ 126.060718] [<ffffffff81081df9>] irq_exit+0x49/0x55
+[ 126.060721] [<ffffffff8150a3bd>] smp_apic_timer_interrupt+0x8a/0x98
+[ 126.060724] [<ffffffff81509793>] apic_timer_interrupt+0x13/0x20
+[ 126.060726] <EOI> [<ffffffff81072855>] ? migrate_disable+0x75/0x12d
+[ 126.060733] [<ffffffff81080a61>] ? local_bh_disable+0xe/0x1f
+[ 126.060736] [<ffffffff81080a70>] ? local_bh_disable+0x1d/0x1f
+[ 126.060739] [<ffffffff810d5952>] irq_forced_thread_fn+0x1b/0x44
+[ 126.060742] [<ffffffff81502ac0>] ? _raw_spin_unlock_irq+0x3b/0x59
+[ 126.060745] [<ffffffff810d582c>] irq_thread+0xde/0x1af
+[ 126.060748] [<ffffffff810d5937>] ? irq_thread_fn+0x3a/0x3a
+[ 126.060751] [<ffffffff810d574e>] ? irq_finalize_oneshot+0xd1/0xd1
+[ 126.060754] [<ffffffff810d574e>] ? irq_finalize_oneshot+0xd1/0xd1
+[ 126.060757] [<ffffffff8109a760>] kthread+0x99/0xa1
+[ 126.060761] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10
+[ 126.060764] [<ffffffff81069ed7>] ? finish_task_switch+0x87/0x10a
+[ 126.060768] [<ffffffff81502ec4>] ? retint_restore_args+0xe/0xe
+[ 126.060771] [<ffffffff8109a6c7>] ? __init_kthread_worker+0x8c/0x8c
+[ 126.060774] [<ffffffff81509b10>] ? gs_change+0xb/0xb
+
+Because irq_exit() does:
+
+void irq_exit(void)
+{
+ account_system_vtime(current);
+ trace_hardirq_exit();
+ sub_preempt_count(IRQ_EXIT_OFFSET);
+ if (!in_interrupt() && local_softirq_pending())
+ invoke_softirq();
+
+ ...
+}
+
+This triggers a wakeup, which uses RCU. Now if the interrupted task has
+t->rcu_read_unlock_special set, the RCU usage from the wakeup will end
+up in rcu_read_unlock_special(). rcu_read_unlock_special() will test
+for in_irq(), which will fail as we just decremented preempt_count
+with IRQ_EXIT_OFFSET, and in_serving_softirq(), which for
+PREEMPT_RT_FULL reads:
+
+int in_serving_softirq(void)
+{
+ int res;
+
+ preempt_disable();
+ res = __get_cpu_var(local_softirq_runner) == current;
+ preempt_enable();
+ return res;
+}
+
+Which will thus also fail, resulting in the above wreckage.
+
+The 'somewhat' ugly solution is to open-code the preempt_count() test
+in rcu_read_unlock_special().
+
+Also, we're not at all sure how ->rcu_read_unlock_special gets set
+here... so this is very likely a bandaid and more thought is required.
+
+Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
+Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
+---
+ kernel/rcu/tree_plugin.h | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/kernel/rcu/tree_plugin.h
++++ b/kernel/rcu/tree_plugin.h
+@@ -291,7 +291,7 @@ void rcu_read_unlock_special(struct task
+ }
+
+ /* Hardware IRQ handlers cannot block, complain if they get here. */
+- if (in_irq() || in_serving_softirq()) {
++ if (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_OFFSET)) {
+ lockdep_rcu_suspicious(__FILE__, __LINE__,
+ "rcu_read_unlock() from irq or softirq with blocking in critical section!!!\n");
+ pr_alert("->rcu_read_unlock_special: %#x (b: %d, nq: %d)\n",
diff --git a/patches/peterz-srcu-crypto-chain.patch b/patches/peterz-srcu-crypto-chain.patch
new file mode 100644
index 00000000000000..77d83b184494f7
--- /dev/null
+++ b/patches/peterz-srcu-crypto-chain.patch
@@ -0,0 +1,182 @@
+Subject: crypto: Convert crypto notifier chain to SRCU
+From: Peter Zijlstra <peterz@infradead.org>
+Date: Fri, 05 Oct 2012 09:03:24 +0100
+
+The crypto notifier deadlocks on RT, though this can be a real deadlock
+on mainline as well due to FIFO-fair rwsems.
+
+The involved parties here are:
+
+[ 82.172678] swapper/0 S 0000000000000001 0 1 0 0x00000000
+[ 82.172682] ffff88042f18fcf0 0000000000000046 ffff88042f18fc80 ffffffff81491238
+[ 82.172685] 0000000000011cc0 0000000000011cc0 ffff88042f18c040 ffff88042f18ffd8
+[ 82.172688] 0000000000011cc0 0000000000011cc0 ffff88042f18ffd8 0000000000011cc0
+[ 82.172689] Call Trace:
+[ 82.172697] [<ffffffff81491238>] ? _raw_spin_unlock_irqrestore+0x6c/0x7a
+[ 82.172701] [<ffffffff8148fd3f>] schedule+0x64/0x66
+[ 82.172704] [<ffffffff8148ec6b>] schedule_timeout+0x27/0xd0
+[ 82.172708] [<ffffffff81043c0c>] ? unpin_current_cpu+0x1a/0x6c
+[ 82.172713] [<ffffffff8106e491>] ? migrate_enable+0x12f/0x141
+[ 82.172716] [<ffffffff8148fbbd>] wait_for_common+0xbb/0x11f
+[ 82.172719] [<ffffffff810709f2>] ? try_to_wake_up+0x182/0x182
+[ 82.172722] [<ffffffff8148fc96>] wait_for_completion_interruptible+0x1d/0x2e
+[ 82.172726] [<ffffffff811debfd>] crypto_wait_for_test+0x49/0x6b
+[ 82.172728] [<ffffffff811ded32>] crypto_register_alg+0x53/0x5a
+[ 82.172730] [<ffffffff811ded6c>] crypto_register_algs+0x33/0x72
+[ 82.172734] [<ffffffff81ad7686>] ? aes_init+0x12/0x12
+[ 82.172737] [<ffffffff81ad76ea>] aesni_init+0x64/0x66
+[ 82.172741] [<ffffffff81000318>] do_one_initcall+0x7f/0x13b
+[ 82.172744] [<ffffffff81ac4d34>] kernel_init+0x199/0x22c
+[ 82.172747] [<ffffffff81ac44ef>] ? loglevel+0x31/0x31
+[ 82.172752] [<ffffffff814987c4>] kernel_thread_helper+0x4/0x10
+[ 82.172755] [<ffffffff81491574>] ? retint_restore_args+0x13/0x13
+[ 82.172759] [<ffffffff81ac4b9b>] ? start_kernel+0x3ca/0x3ca
+[ 82.172761] [<ffffffff814987c0>] ? gs_change+0x13/0x13
+
+[ 82.174186] cryptomgr_test S 0000000000000001 0 41 2 0x00000000
+[ 82.174189] ffff88042c971980 0000000000000046 ffffffff81d74830 0000000000000292
+[ 82.174192] 0000000000011cc0 0000000000011cc0 ffff88042c96eb80 ffff88042c971fd8
+[ 82.174195] 0000000000011cc0 0000000000011cc0 ffff88042c971fd8 0000000000011cc0
+[ 82.174195] Call Trace:
+[ 82.174198] [<ffffffff8148fd3f>] schedule+0x64/0x66
+[ 82.174201] [<ffffffff8148ec6b>] schedule_timeout+0x27/0xd0
+[ 82.174204] [<ffffffff81043c0c>] ? unpin_current_cpu+0x1a/0x6c
+[ 82.174206] [<ffffffff8106e491>] ? migrate_enable+0x12f/0x141
+[ 82.174209] [<ffffffff8148fbbd>] wait_for_common+0xbb/0x11f
+[ 82.174212] [<ffffffff810709f2>] ? try_to_wake_up+0x182/0x182
+[ 82.174215] [<ffffffff8148fc96>] wait_for_completion_interruptible+0x1d/0x2e
+[ 82.174218] [<ffffffff811e4883>] cryptomgr_notify+0x280/0x385
+[ 82.174221] [<ffffffff814943de>] notifier_call_chain+0x6b/0x98
+[ 82.174224] [<ffffffff8108a11c>] ? rt_down_read+0x10/0x12
+[ 82.174227] [<ffffffff810677cd>] __blocking_notifier_call_chain+0x70/0x8d
+[ 82.174230] [<ffffffff810677fe>] blocking_notifier_call_chain+0x14/0x16
+[ 82.174234] [<ffffffff811dd272>] crypto_probing_notify+0x24/0x50
+[ 82.174236] [<ffffffff811dd7a1>] crypto_alg_mod_lookup+0x3e/0x74
+[ 82.174238] [<ffffffff811dd949>] crypto_alloc_base+0x36/0x8f
+[ 82.174241] [<ffffffff811e9408>] cryptd_alloc_ablkcipher+0x6e/0xb5
+[ 82.174243] [<ffffffff811dd591>] ? kzalloc.clone.5+0xe/0x10
+[ 82.174246] [<ffffffff8103085d>] ablk_init_common+0x1d/0x38
+[ 82.174249] [<ffffffff8103852a>] ablk_ecb_init+0x15/0x17
+[ 82.174251] [<ffffffff811dd8c6>] __crypto_alloc_tfm+0xc7/0x114
+[ 82.174254] [<ffffffff811e0caa>] ? crypto_lookup_skcipher+0x1f/0xe4
+[ 82.174256] [<ffffffff811e0dcf>] crypto_alloc_ablkcipher+0x60/0xa5
+[ 82.174258] [<ffffffff811e5bde>] alg_test_skcipher+0x24/0x9b
+[ 82.174261] [<ffffffff8106d96d>] ? finish_task_switch+0x3f/0xfa
+[ 82.174263] [<ffffffff811e6b8e>] alg_test+0x16f/0x1d7
+[ 82.174267] [<ffffffff811e45ac>] ? cryptomgr_probe+0xac/0xac
+[ 82.174269] [<ffffffff811e45d8>] cryptomgr_test+0x2c/0x47
+[ 82.174272] [<ffffffff81061161>] kthread+0x7e/0x86
+[ 82.174275] [<ffffffff8106d9dd>] ? finish_task_switch+0xaf/0xfa
+[ 82.174278] [<ffffffff814987c4>] kernel_thread_helper+0x4/0x10
+[ 82.174281] [<ffffffff81491574>] ? retint_restore_args+0x13/0x13
+[ 82.174284] [<ffffffff810610e3>] ? __init_kthread_worker+0x8c/0x8c
+[ 82.174287] [<ffffffff814987c0>] ? gs_change+0x13/0x13
+
+[ 82.174329] cryptomgr_probe D 0000000000000002 0 47 2 0x00000000
+[ 82.174332] ffff88042c991b70 0000000000000046 ffff88042c991bb0 0000000000000006
+[ 82.174335] 0000000000011cc0 0000000000011cc0 ffff88042c98ed00 ffff88042c991fd8
+[ 82.174338] 0000000000011cc0 0000000000011cc0 ffff88042c991fd8 0000000000011cc0
+[ 82.174338] Call Trace:
+[ 82.174342] [<ffffffff8148fd3f>] schedule+0x64/0x66
+[ 82.174344] [<ffffffff814901ad>] __rt_mutex_slowlock+0x85/0xbe
+[ 82.174347] [<ffffffff814902d2>] rt_mutex_slowlock+0xec/0x159
+[ 82.174351] [<ffffffff81089c4d>] rt_mutex_fastlock.clone.8+0x29/0x2f
+[ 82.174353] [<ffffffff81490372>] rt_mutex_lock+0x33/0x37
+[ 82.174356] [<ffffffff8108a0f2>] __rt_down_read+0x50/0x5a
+[ 82.174358] [<ffffffff8108a11c>] ? rt_down_read+0x10/0x12
+[ 82.174360] [<ffffffff8108a11c>] rt_down_read+0x10/0x12
+[ 82.174363] [<ffffffff810677b5>] __blocking_notifier_call_chain+0x58/0x8d
+[ 82.174366] [<ffffffff810677fe>] blocking_notifier_call_chain+0x14/0x16
+[ 82.174369] [<ffffffff811dd272>] crypto_probing_notify+0x24/0x50
+[ 82.174372] [<ffffffff811debd6>] crypto_wait_for_test+0x22/0x6b
+[ 82.174374] [<ffffffff811decd3>] crypto_register_instance+0xb4/0xc0
+[ 82.174377] [<ffffffff811e9b76>] cryptd_create+0x378/0x3b6
+[ 82.174379] [<ffffffff811de512>] ? __crypto_lookup_template+0x5b/0x63
+[ 82.174382] [<ffffffff811e4545>] cryptomgr_probe+0x45/0xac
+[ 82.174385] [<ffffffff811e4500>] ? crypto_alloc_pcomp+0x1b/0x1b
+[ 82.174388] [<ffffffff81061161>] kthread+0x7e/0x86
+[ 82.174391] [<ffffffff8106d9dd>] ? finish_task_switch+0xaf/0xfa
+[ 82.174394] [<ffffffff814987c4>] kernel_thread_helper+0x4/0x10
+[ 82.174398] [<ffffffff81491574>] ? retint_restore_args+0x13/0x13
+[ 82.174401] [<ffffffff810610e3>] ? __init_kthread_worker+0x8c/0x8c
+[ 82.174403] [<ffffffff814987c0>] ? gs_change+0x13/0x13
+
+cryptomgr_test spawns the cryptomgr_probe thread from the notifier
+call. The probe thread fires the same notifier as the test thread and
+deadlocks on the rwsem on RT.
+
+Now this is a potential deadlock in mainline as well, because we have
+FIFO-fair rwsems. If another thread blocks with a down_write() on the
+notifier chain before the probe thread issues the down_read(), it will
+block the probe thread and the whole party is deadlocked.
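+
+For reference, the SRCU notifier API used in the conversion below keeps the
+call side in an srcu_read_lock() section instead of down_read() on an rwsem,
+so a queued writer can no longer stall the callers. A minimal usage sketch
+(demo names only, not from the patch):
+
+    static SRCU_NOTIFIER_HEAD(demo_chain);
+
+    static int demo_cb(struct notifier_block *nb, unsigned long val, void *v)
+    {
+            return NOTIFY_DONE;
+    }
+
+    static struct notifier_block demo_nb = { .notifier_call = demo_cb };
+
+    /* registration serializes on a mutex, callers only enter SRCU */
+    srcu_notifier_chain_register(&demo_chain, &demo_nb);
+    srcu_notifier_call_chain(&demo_chain, 0, NULL);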
+
+Signed-off-by: Peter Zijlstra <peterz@infradead.org>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ crypto/algapi.c | 4 ++--
+ crypto/api.c | 6 +++---
+ crypto/internal.h | 4 ++--
+ 3 files changed, 7 insertions(+), 7 deletions(-)
+
+--- a/crypto/algapi.c
++++ b/crypto/algapi.c
+@@ -695,13 +695,13 @@ EXPORT_SYMBOL_GPL(crypto_spawn_tfm2);
+
+ int crypto_register_notifier(struct notifier_block *nb)
+ {
+- return blocking_notifier_chain_register(&crypto_chain, nb);
++ return srcu_notifier_chain_register(&crypto_chain, nb);
+ }
+ EXPORT_SYMBOL_GPL(crypto_register_notifier);
+
+ int crypto_unregister_notifier(struct notifier_block *nb)
+ {
+- return blocking_notifier_chain_unregister(&crypto_chain, nb);
++ return srcu_notifier_chain_unregister(&crypto_chain, nb);
+ }
+ EXPORT_SYMBOL_GPL(crypto_unregister_notifier);
+
+--- a/crypto/api.c
++++ b/crypto/api.c
+@@ -31,7 +31,7 @@ EXPORT_SYMBOL_GPL(crypto_alg_list);
+ DECLARE_RWSEM(crypto_alg_sem);
+ EXPORT_SYMBOL_GPL(crypto_alg_sem);
+
+-BLOCKING_NOTIFIER_HEAD(crypto_chain);
++SRCU_NOTIFIER_HEAD(crypto_chain);
+ EXPORT_SYMBOL_GPL(crypto_chain);
+
+ static struct crypto_alg *crypto_larval_wait(struct crypto_alg *alg);
+@@ -236,10 +236,10 @@ int crypto_probing_notify(unsigned long
+ {
+ int ok;
+
+- ok = blocking_notifier_call_chain(&crypto_chain, val, v);
++ ok = srcu_notifier_call_chain(&crypto_chain, val, v);
+ if (ok == NOTIFY_DONE) {
+ request_module("cryptomgr");
+- ok = blocking_notifier_call_chain(&crypto_chain, val, v);
++ ok = srcu_notifier_call_chain(&crypto_chain, val, v);
+ }
+
+ return ok;
+--- a/crypto/internal.h
++++ b/crypto/internal.h
+@@ -48,7 +48,7 @@ struct crypto_larval {
+
+ extern struct list_head crypto_alg_list;
+ extern struct rw_semaphore crypto_alg_sem;
+-extern struct blocking_notifier_head crypto_chain;
++extern struct srcu_notifier_head crypto_chain;
+
+ #ifdef CONFIG_PROC_FS
+ void __init crypto_init_proc(void);
+@@ -142,7 +142,7 @@ static inline int crypto_is_moribund(str
+
+ static inline void crypto_notify(unsigned long val, void *v)
+ {
+- blocking_notifier_call_chain(&crypto_chain, val, v);
++ srcu_notifier_call_chain(&crypto_chain, val, v);
+ }
+
+ #endif /* _CRYPTO_INTERNAL_H */
diff --git a/patches/ping-sysrq.patch b/patches/ping-sysrq.patch
new file mode 100644
index 00000000000000..276a332fcb299a
--- /dev/null
+++ b/patches/ping-sysrq.patch
@@ -0,0 +1,121 @@
+Subject: net: sysrq via icmp
+From: Carsten Emde <C.Emde@osadl.org>
+Date: Tue, 19 Jul 2011 13:51:17 +0100
+
+There are (probably rare) situations when a system has crashed and the system
+console becomes unresponsive but the network ICMP layer is still alive.
+Wouldn't it be wonderful if we could then submit a SysRq command via ping?
+
+This patch provides this facility. Please consult the updated documentation
+Documentation/sysrq.txt for details.
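+
+A quick usage sketch, matching the documentation added below: pick a 32bit
+cookie, write it to the new sysctl on the target beforehand, then ping the
+target with a 57 byte payload that repeats the cookie and appends the SysRq
+command key (0x68 is ASCII 'h', i.e. SysRq help):
+
+    # on the target, ahead of time:
+    echo 0x01020304 > /proc/sys/net/ipv4/icmp_echo_sysrq
+
+    # from another host, once the console is dead:
+    ping -c1 -s57 -p0102030468 <target>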
+
+Signed-off-by: Carsten Emde <C.Emde@osadl.org>
+
+---
+ Documentation/sysrq.txt | 11 +++++++++--
+ include/net/netns/ipv4.h | 1 +
+ net/ipv4/icmp.c | 30 ++++++++++++++++++++++++++++++
+ net/ipv4/sysctl_net_ipv4.c | 7 +++++++
+ 4 files changed, 47 insertions(+), 2 deletions(-)
+
+--- a/Documentation/sysrq.txt
++++ b/Documentation/sysrq.txt
+@@ -59,10 +59,17 @@ On PowerPC - Press 'ALT - Print Screen (
+ On other - If you know of the key combos for other architectures, please
+ let me know so I can add them to this section.
+
+-On all - write a character to /proc/sysrq-trigger. e.g.:
+-
++On all - write a character to /proc/sysrq-trigger, e.g.:
+ echo t > /proc/sysrq-trigger
+
++On all - Enable network SysRq by writing a cookie to icmp_echo_sysrq, e.g.
++ echo 0x01020304 >/proc/sys/net/ipv4/icmp_echo_sysrq
++ Send an ICMP echo request with this pattern plus the particular
++ SysRq command key. Example:
++ # ping -c1 -s57 -p0102030468
++ will trigger the SysRq-H (help) command.
++
++
+ * What are the 'command' keys?
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ 'b' - Will immediately reboot the system without syncing or unmounting
+--- a/include/net/netns/ipv4.h
++++ b/include/net/netns/ipv4.h
+@@ -69,6 +69,7 @@ struct netns_ipv4 {
+
+ int sysctl_icmp_echo_ignore_all;
+ int sysctl_icmp_echo_ignore_broadcasts;
++ int sysctl_icmp_echo_sysrq;
+ int sysctl_icmp_ignore_bogus_error_responses;
+ int sysctl_icmp_ratelimit;
+ int sysctl_icmp_ratemask;
+--- a/net/ipv4/icmp.c
++++ b/net/ipv4/icmp.c
+@@ -69,6 +69,7 @@
+ #include <linux/jiffies.h>
+ #include <linux/kernel.h>
+ #include <linux/fcntl.h>
++#include <linux/sysrq.h>
+ #include <linux/socket.h>
+ #include <linux/in.h>
+ #include <linux/inet.h>
+@@ -867,6 +868,30 @@ static bool icmp_redirect(struct sk_buff
+ }
+
+ /*
++ * 32bit and 64bit have different timestamp length, so we check for
++ * the cookie at offset 20 and verify it is repeated at offset 50
++ */
++#define CO_POS0 20
++#define CO_POS1 50
++#define CO_SIZE sizeof(int)
++#define ICMP_SYSRQ_SIZE 57
++
++/*
++ * We got a ICMP_SYSRQ_SIZE sized ping request. Check for the cookie
++ * pattern and if it matches send the next byte as a trigger to sysrq.
++ */
++static void icmp_check_sysrq(struct net *net, struct sk_buff *skb)
++{
++ int cookie = htonl(net->ipv4.sysctl_icmp_echo_sysrq);
++ char *p = skb->data;
++
++ if (!memcmp(&cookie, p + CO_POS0, CO_SIZE) &&
++ !memcmp(&cookie, p + CO_POS1, CO_SIZE) &&
++ p[CO_POS0 + CO_SIZE] == p[CO_POS1 + CO_SIZE])
++ handle_sysrq(p[CO_POS0 + CO_SIZE]);
++}
++
++/*
+ * Handle ICMP_ECHO ("ping") requests.
+ *
+ * RFC 1122: 3.2.2.6 MUST have an echo server that answers ICMP echo
+@@ -893,6 +918,11 @@ static bool icmp_echo(struct sk_buff *sk
+ icmp_param.data_len = skb->len;
+ icmp_param.head_len = sizeof(struct icmphdr);
+ icmp_reply(&icmp_param, skb);
++
++ if (skb->len == ICMP_SYSRQ_SIZE &&
++ net->ipv4.sysctl_icmp_echo_sysrq) {
++ icmp_check_sysrq(net, skb);
++ }
+ }
+ /* should there be an ICMP stat for ignored echos? */
+ return true;
+--- a/net/ipv4/sysctl_net_ipv4.c
++++ b/net/ipv4/sysctl_net_ipv4.c
+@@ -779,6 +779,13 @@ static struct ctl_table ipv4_net_table[]
+ .proc_handler = proc_dointvec
+ },
+ {
++ .procname = "icmp_echo_sysrq",
++ .data = &init_net.ipv4.sysctl_icmp_echo_sysrq,
++ .maxlen = sizeof(int),
++ .mode = 0644,
++ .proc_handler = proc_dointvec
++ },
++ {
+ .procname = "icmp_ignore_bogus_error_responses",
+ .data = &init_net.ipv4.sysctl_icmp_ignore_bogus_error_responses,
+ .maxlen = sizeof(int),
diff --git a/patches/posix-timers-no-broadcast.patch b/patches/posix-timers-no-broadcast.patch
new file mode 100644
index 00000000000000..21adaced5a86b6
--- /dev/null
+++ b/patches/posix-timers-no-broadcast.patch
@@ -0,0 +1,33 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Fri, 3 Jul 2009 08:29:20 -0500
+Subject: posix-timers: Prevent broadcast signals
+
+Posix timers should not send broadcast signals or kernel-only
+signals. Prevent it.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ kernel/time/posix-timers.c | 4 +++-
+ 1 file changed, 3 insertions(+), 1 deletion(-)
+
+--- a/kernel/time/posix-timers.c
++++ b/kernel/time/posix-timers.c
+@@ -499,6 +499,7 @@ static enum hrtimer_restart posix_timer_
+ static struct pid *good_sigevent(sigevent_t * event)
+ {
+ struct task_struct *rtn = current->group_leader;
++ int sig = event->sigev_signo;
+
+ if ((event->sigev_notify & SIGEV_THREAD_ID ) &&
+ (!(rtn = find_task_by_vpid(event->sigev_notify_thread_id)) ||
+@@ -507,7 +508,8 @@ static struct pid *good_sigevent(sigeven
+ return NULL;
+
+ if (((event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE) &&
+- ((event->sigev_signo <= 0) || (event->sigev_signo > SIGRTMAX)))
++ (sig <= 0 || sig > SIGRTMAX || sig_kernel_only(sig) ||
++ sig_kernel_coredump(sig)))
+ return NULL;
+
+ return task_pid(rtn);
diff --git a/patches/posix-timers-thread-posix-cpu-timers-on-rt.patch b/patches/posix-timers-thread-posix-cpu-timers-on-rt.patch
new file mode 100644
index 00000000000000..5d276e6e61eebb
--- /dev/null
+++ b/patches/posix-timers-thread-posix-cpu-timers-on-rt.patch
@@ -0,0 +1,315 @@
+From: John Stultz <johnstul@us.ibm.com>
+Date: Fri, 3 Jul 2009 08:29:58 -0500
+Subject: posix-timers: Thread posix-cpu-timers on -rt
+
+The posix-cpu-timer code takes non-RT-safe locks in hard irq
+context. Move it to a thread.
+
+[ 3.0 fixes from Peter Zijlstra <peterz@infradead.org> ]
+
+Signed-off-by: John Stultz <johnstul@us.ibm.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ include/linux/init_task.h | 7 +
+ include/linux/sched.h | 3
+ kernel/fork.c | 3
+ kernel/time/posix-cpu-timers.c | 198 +++++++++++++++++++++++++++++++++++++++--
+ 4 files changed, 205 insertions(+), 6 deletions(-)
+
+--- a/include/linux/init_task.h
++++ b/include/linux/init_task.h
+@@ -147,6 +147,12 @@ extern struct task_group root_task_group
+ # define INIT_PERF_EVENTS(tsk)
+ #endif
+
++#ifdef CONFIG_PREEMPT_RT_BASE
++# define INIT_TIMER_LIST .posix_timer_list = NULL,
++#else
++# define INIT_TIMER_LIST
++#endif
++
+ #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+ # define INIT_VTIME(tsk) \
+ .vtime_lock = __RAW_SPIN_LOCK_UNLOCKED(tsk.vtime_lock), \
+@@ -239,6 +245,7 @@ extern struct task_group root_task_group
+ .cpu_timers = INIT_CPU_TIMERS(tsk.cpu_timers), \
+ .pi_lock = __RAW_SPIN_LOCK_UNLOCKED(tsk.pi_lock), \
+ .timer_slack_ns = 50000, /* 50 usec default slack */ \
++ INIT_TIMER_LIST \
+ .pids = { \
+ [PIDTYPE_PID] = INIT_PID_LINK(PIDTYPE_PID), \
+ [PIDTYPE_PGID] = INIT_PID_LINK(PIDTYPE_PGID), \
+--- a/include/linux/sched.h
++++ b/include/linux/sched.h
+@@ -1501,6 +1501,9 @@ struct task_struct {
+
+ struct task_cputime cputime_expires;
+ struct list_head cpu_timers[3];
++#ifdef CONFIG_PREEMPT_RT_BASE
++ struct task_struct *posix_timer_list;
++#endif
+
+ /* process credentials */
+ const struct cred __rcu *real_cred; /* objective and real subjective task
+--- a/kernel/fork.c
++++ b/kernel/fork.c
+@@ -1214,6 +1214,9 @@ static void rt_mutex_init_task(struct ta
+ */
+ static void posix_cpu_timers_init(struct task_struct *tsk)
+ {
++#ifdef CONFIG_PREEMPT_RT_BASE
++ tsk->posix_timer_list = NULL;
++#endif
+ tsk->cputime_expires.prof_exp = 0;
+ tsk->cputime_expires.virt_exp = 0;
+ tsk->cputime_expires.sched_exp = 0;
+--- a/kernel/time/posix-cpu-timers.c
++++ b/kernel/time/posix-cpu-timers.c
+@@ -3,6 +3,7 @@
+ */
+
+ #include <linux/sched.h>
++#include <linux/sched/rt.h>
+ #include <linux/posix-timers.h>
+ #include <linux/errno.h>
+ #include <linux/math64.h>
+@@ -626,7 +627,7 @@ static int posix_cpu_timer_set(struct k_
+ /*
+ * Disarm any old timer after extracting its expiry time.
+ */
+- WARN_ON_ONCE(!irqs_disabled());
++ WARN_ON_ONCE_NONRT(!irqs_disabled());
+
+ ret = 0;
+ old_incr = timer->it.cpu.incr;
+@@ -1047,7 +1048,7 @@ void posix_cpu_timer_schedule(struct k_i
+ /*
+ * Now re-arm for the new expiry time.
+ */
+- WARN_ON_ONCE(!irqs_disabled());
++ WARN_ON_ONCE_NONRT(!irqs_disabled());
+ arm_timer(timer);
+ unlock_task_sighand(p, &flags);
+
+@@ -1113,10 +1114,11 @@ static inline int fastpath_timer_check(s
+ sig = tsk->signal;
+ if (sig->cputimer.running) {
+ struct task_cputime group_sample;
++ unsigned long flags;
+
+- raw_spin_lock(&sig->cputimer.lock);
++ raw_spin_lock_irqsave(&sig->cputimer.lock, flags);
+ group_sample = sig->cputimer.cputime;
+- raw_spin_unlock(&sig->cputimer.lock);
++ raw_spin_unlock_irqrestore(&sig->cputimer.lock, flags);
+
+ if (task_cputime_expired(&group_sample, &sig->cputime_expires))
+ return 1;
+@@ -1130,13 +1132,13 @@ static inline int fastpath_timer_check(s
+ * already updated our counts. We need to check if any timers fire now.
+ * Interrupts are disabled.
+ */
+-void run_posix_cpu_timers(struct task_struct *tsk)
++static void __run_posix_cpu_timers(struct task_struct *tsk)
+ {
+ LIST_HEAD(firing);
+ struct k_itimer *timer, *next;
+ unsigned long flags;
+
+- WARN_ON_ONCE(!irqs_disabled());
++ WARN_ON_ONCE_NONRT(!irqs_disabled());
+
+ /*
+ * The fast path checks that there are no expired thread or thread
+@@ -1194,6 +1196,190 @@ void run_posix_cpu_timers(struct task_st
+ }
+ }
+
++#ifdef CONFIG_PREEMPT_RT_BASE
++#include <linux/kthread.h>
++#include <linux/cpu.h>
++DEFINE_PER_CPU(struct task_struct *, posix_timer_task);
++DEFINE_PER_CPU(struct task_struct *, posix_timer_tasklist);
++
++static int posix_cpu_timers_thread(void *data)
++{
++ int cpu = (long)data;
++
++ BUG_ON(per_cpu(posix_timer_task,cpu) != current);
++
++ while (!kthread_should_stop()) {
++ struct task_struct *tsk = NULL;
++ struct task_struct *next = NULL;
++
++ if (cpu_is_offline(cpu))
++ goto wait_to_die;
++
++ /* grab task list */
++ raw_local_irq_disable();
++ tsk = per_cpu(posix_timer_tasklist, cpu);
++ per_cpu(posix_timer_tasklist, cpu) = NULL;
++ raw_local_irq_enable();
++
++ /* its possible the list is empty, just return */
++ if (!tsk) {
++ set_current_state(TASK_INTERRUPTIBLE);
++ schedule();
++ __set_current_state(TASK_RUNNING);
++ continue;
++ }
++
++ /* Process task list */
++ while (1) {
++ /* save next */
++ next = tsk->posix_timer_list;
++
++ /* run the task timers, clear its ptr and
++ * unreference it
++ */
++ __run_posix_cpu_timers(tsk);
++ tsk->posix_timer_list = NULL;
++ put_task_struct(tsk);
++
++ /* check if this is the last on the list */
++ if (next == tsk)
++ break;
++ tsk = next;
++ }
++ }
++ return 0;
++
++wait_to_die:
++ /* Wait for kthread_stop */
++ set_current_state(TASK_INTERRUPTIBLE);
++ while (!kthread_should_stop()) {
++ schedule();
++ set_current_state(TASK_INTERRUPTIBLE);
++ }
++ __set_current_state(TASK_RUNNING);
++ return 0;
++}
++
++static inline int __fastpath_timer_check(struct task_struct *tsk)
++{
++ /* tsk == current, ensure it is safe to use ->signal/sighand */
++ if (unlikely(tsk->exit_state))
++ return 0;
++
++ if (!task_cputime_zero(&tsk->cputime_expires))
++ return 1;
++
++ if (!task_cputime_zero(&tsk->signal->cputime_expires))
++ return 1;
++
++ return 0;
++}
++
++void run_posix_cpu_timers(struct task_struct *tsk)
++{
++ unsigned long cpu = smp_processor_id();
++ struct task_struct *tasklist;
++
++ BUG_ON(!irqs_disabled());
++ if(!per_cpu(posix_timer_task, cpu))
++ return;
++ /* get per-cpu references */
++ tasklist = per_cpu(posix_timer_tasklist, cpu);
++
++ /* check to see if we're already queued */
++ if (!tsk->posix_timer_list && __fastpath_timer_check(tsk)) {
++ get_task_struct(tsk);
++ if (tasklist) {
++ tsk->posix_timer_list = tasklist;
++ } else {
++ /*
++ * The list is terminated by a self-pointing
++ * task_struct
++ */
++ tsk->posix_timer_list = tsk;
++ }
++ per_cpu(posix_timer_tasklist, cpu) = tsk;
++
++ wake_up_process(per_cpu(posix_timer_task, cpu));
++ }
++}
++
++/*
++ * posix_cpu_thread_call - callback that gets triggered when a CPU is added.
++ * Here we can start up the necessary migration thread for the new CPU.
++ */
++static int posix_cpu_thread_call(struct notifier_block *nfb,
++ unsigned long action, void *hcpu)
++{
++ int cpu = (long)hcpu;
++ struct task_struct *p;
++ struct sched_param param;
++
++ switch (action) {
++ case CPU_UP_PREPARE:
++ p = kthread_create(posix_cpu_timers_thread, hcpu,
++ "posixcputmr/%d",cpu);
++ if (IS_ERR(p))
++ return NOTIFY_BAD;
++ p->flags |= PF_NOFREEZE;
++ kthread_bind(p, cpu);
++ /* Must be high prio to avoid getting starved */
++ param.sched_priority = MAX_RT_PRIO-1;
++ sched_setscheduler(p, SCHED_FIFO, &param);
++ per_cpu(posix_timer_task,cpu) = p;
++ break;
++ case CPU_ONLINE:
++ /* Strictly unneccessary, as first user will wake it. */
++ wake_up_process(per_cpu(posix_timer_task,cpu));
++ break;
++#ifdef CONFIG_HOTPLUG_CPU
++ case CPU_UP_CANCELED:
++ /* Unbind it from offline cpu so it can run. Fall thru. */
++ kthread_bind(per_cpu(posix_timer_task, cpu),
++ cpumask_any(cpu_online_mask));
++ kthread_stop(per_cpu(posix_timer_task,cpu));
++ per_cpu(posix_timer_task,cpu) = NULL;
++ break;
++ case CPU_DEAD:
++ kthread_stop(per_cpu(posix_timer_task,cpu));
++ per_cpu(posix_timer_task,cpu) = NULL;
++ break;
++#endif
++ }
++ return NOTIFY_OK;
++}
++
++/* Register at highest priority so that task migration (migrate_all_tasks)
++ * happens before everything else.
++ */
++static struct notifier_block posix_cpu_thread_notifier = {
++ .notifier_call = posix_cpu_thread_call,
++ .priority = 10
++};
++
++static int __init posix_cpu_thread_init(void)
++{
++ void *hcpu = (void *)(long)smp_processor_id();
++ /* Start one for boot CPU. */
++ unsigned long cpu;
++
++ /* init the per-cpu posix_timer_tasklets */
++ for_each_possible_cpu(cpu)
++ per_cpu(posix_timer_tasklist, cpu) = NULL;
++
++ posix_cpu_thread_call(&posix_cpu_thread_notifier, CPU_UP_PREPARE, hcpu);
++ posix_cpu_thread_call(&posix_cpu_thread_notifier, CPU_ONLINE, hcpu);
++ register_cpu_notifier(&posix_cpu_thread_notifier);
++ return 0;
++}
++early_initcall(posix_cpu_thread_init);
++#else /* CONFIG_PREEMPT_RT_BASE */
++void run_posix_cpu_timers(struct task_struct *tsk)
++{
++ __run_posix_cpu_timers(tsk);
++}
++#endif /* CONFIG_PREEMPT_RT_BASE */
++
+ /*
+ * Set one of the process-wide special case CPU timers or RLIMIT_CPU.
+ * The tsk->sighand->siglock must be held by the caller.
diff --git a/patches/power-disable-highmem-on-rt.patch b/patches/power-disable-highmem-on-rt.patch
new file mode 100644
index 00000000000000..725a8b4aa4e4d9
--- /dev/null
+++ b/patches/power-disable-highmem-on-rt.patch
@@ -0,0 +1,22 @@
+Subject: powerpc: Disable highmem on RT
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Mon, 18 Jul 2011 17:08:34 +0200
+
+The current highmem handling is not compatible with -RT and needs fixups.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ arch/powerpc/Kconfig | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/arch/powerpc/Kconfig
++++ b/arch/powerpc/Kconfig
+@@ -313,7 +313,7 @@ menu "Kernel options"
+
+ config HIGHMEM
+ bool "High memory support"
+- depends on PPC32
++ depends on PPC32 && !PREEMPT_RT_FULL
+
+ source kernel/Kconfig.hz
+ source kernel/Kconfig.preempt
diff --git a/patches/power-use-generic-rwsem-on-rt.patch b/patches/power-use-generic-rwsem-on-rt.patch
new file mode 100644
index 00000000000000..7b49387491271b
--- /dev/null
+++ b/patches/power-use-generic-rwsem-on-rt.patch
@@ -0,0 +1,26 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Tue, 14 Jul 2015 14:26:34 +0200
+Subject: powerpc: Use generic rwsem on RT
+
+Use the generic code, which uses rtmutex.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ arch/powerpc/Kconfig | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+--- a/arch/powerpc/Kconfig
++++ b/arch/powerpc/Kconfig
+@@ -60,10 +60,11 @@ config LOCKDEP_SUPPORT
+
+ config RWSEM_GENERIC_SPINLOCK
+ bool
++ default y if PREEMPT_RT_FULL
+
+ config RWSEM_XCHGADD_ALGORITHM
+ bool
+- default y
++ default y if !PREEMPT_RT_FULL
+
+ config GENERIC_LOCKBREAK
+ bool
diff --git a/patches/powerpc-kvm-Disable-in-kernel-MPIC-emulation-for-PRE.patch b/patches/powerpc-kvm-Disable-in-kernel-MPIC-emulation-for-PRE.patch
new file mode 100644
index 00000000000000..c9da5829921874
--- /dev/null
+++ b/patches/powerpc-kvm-Disable-in-kernel-MPIC-emulation-for-PRE.patch
@@ -0,0 +1,37 @@
+From: Bogdan Purcareata <bogdan.purcareata@freescale.com>
+Date: Fri, 24 Apr 2015 15:53:13 +0000
+Subject: powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT_FULL
+
+While converting the openpic emulation code to use a raw_spinlock_t enables
+guests to run on RT, there's still a performance issue. For interrupts sent in
+directed delivery mode with a multiple-CPU mask, the emulated openpic will loop
+through all of the VCPUs, and for each VCPU it calls IRQ_check, which will loop
+through all the pending interrupts for that VCPU. This is done while holding the
+raw_lock, meaning that during all this time interrupts and preemption are
+disabled on the host Linux. A malicious user app can maximize both of these
+numbers and cause a DoS.
+
+This temporary fix is sent for two reasons. First, so that users who want to
+use the in-kernel MPIC emulation are aware of the potential latencies and can
+make sure that their hardware MPIC usage scenario does not involve interrupts
+sent in directed delivery mode and that the number of possible pending
+interrupts is kept small. Second, it should incentivize the development of a
+proper openpic emulation that is better suited for RT.
+
+Acked-by: Scott Wood <scottwood@freescale.com>
+Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ arch/powerpc/kvm/Kconfig | 1 +
+ 1 file changed, 1 insertion(+)
+
+--- a/arch/powerpc/kvm/Kconfig
++++ b/arch/powerpc/kvm/Kconfig
+@@ -172,6 +172,7 @@ config KVM_E500MC
+ config KVM_MPIC
+ bool "KVM in-kernel MPIC emulation"
+ depends on KVM && E500
++ depends on !PREEMPT_RT_FULL
+ select HAVE_KVM_IRQCHIP
+ select HAVE_KVM_IRQFD
+ select HAVE_KVM_IRQ_ROUTING
diff --git a/patches/powerpc-preempt-lazy-support.patch b/patches/powerpc-preempt-lazy-support.patch
new file mode 100644
index 00000000000000..b12421a4c448dd
--- /dev/null
+++ b/patches/powerpc-preempt-lazy-support.patch
@@ -0,0 +1,173 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Thu, 1 Nov 2012 10:14:11 +0100
+Subject: powerpc: Add support for lazy preemption
+
+Implement the powerpc pieces for lazy preempt.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ arch/powerpc/Kconfig | 1 +
+ arch/powerpc/include/asm/thread_info.h | 11 ++++++++---
+ arch/powerpc/kernel/asm-offsets.c | 1 +
+ arch/powerpc/kernel/entry_32.S | 17 ++++++++++++-----
+ arch/powerpc/kernel/entry_64.S | 14 +++++++++++---
+ 5 files changed, 33 insertions(+), 11 deletions(-)
+
+--- a/arch/powerpc/Kconfig
++++ b/arch/powerpc/Kconfig
+@@ -139,6 +139,7 @@ config PPC
+ select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
+ select GENERIC_STRNCPY_FROM_USER
+ select GENERIC_STRNLEN_USER
++ select HAVE_PREEMPT_LAZY
+ select HAVE_MOD_ARCH_SPECIFIC
+ select MODULES_USE_ELF_RELA
+ select CLONE_BACKWARDS
+--- a/arch/powerpc/include/asm/thread_info.h
++++ b/arch/powerpc/include/asm/thread_info.h
+@@ -42,6 +42,8 @@ struct thread_info {
+ int cpu; /* cpu we're on */
+ int preempt_count; /* 0 => preemptable,
+ <0 => BUG */
++ int preempt_lazy_count; /* 0 => preemptable,
++ <0 => BUG */
+ unsigned long local_flags; /* private flags for thread */
+
+ /* low level flags - has atomic operations done on it */
+@@ -82,8 +84,7 @@ static inline struct thread_info *curren
+ #define TIF_SYSCALL_TRACE 0 /* syscall trace active */
+ #define TIF_SIGPENDING 1 /* signal pending */
+ #define TIF_NEED_RESCHED 2 /* rescheduling necessary */
+-#define TIF_POLLING_NRFLAG 3 /* true if poll_idle() is polling
+- TIF_NEED_RESCHED */
++#define TIF_NEED_RESCHED_LAZY 3 /* lazy rescheduling necessary */
+ #define TIF_32BIT 4 /* 32 bit binary */
+ #define TIF_RESTORE_TM 5 /* need to restore TM FP/VEC/VSX */
+ #define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */
+@@ -101,6 +102,8 @@ static inline struct thread_info *curren
+ #if defined(CONFIG_PPC64)
+ #define TIF_ELF2ABI 18 /* function descriptors must die! */
+ #endif
++#define TIF_POLLING_NRFLAG 19 /* true if poll_idle() is polling
++ TIF_NEED_RESCHED */
+
+ /* as above, but as bit values */
+ #define _TIF_SYSCALL_TRACE (1<<TIF_SYSCALL_TRACE)
+@@ -119,14 +122,16 @@ static inline struct thread_info *curren
+ #define _TIF_SYSCALL_TRACEPOINT (1<<TIF_SYSCALL_TRACEPOINT)
+ #define _TIF_EMULATE_STACK_STORE (1<<TIF_EMULATE_STACK_STORE)
+ #define _TIF_NOHZ (1<<TIF_NOHZ)
++#define _TIF_NEED_RESCHED_LAZY (1<<TIF_NEED_RESCHED_LAZY)
+ #define _TIF_SYSCALL_DOTRACE (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
+ _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT | \
+ _TIF_NOHZ)
+
+ #define _TIF_USER_WORK_MASK (_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
+ _TIF_NOTIFY_RESUME | _TIF_UPROBE | \
+- _TIF_RESTORE_TM)
++ _TIF_RESTORE_TM | _TIF_NEED_RESCHED_LAZY)
+ #define _TIF_PERSYSCALL_MASK (_TIF_RESTOREALL|_TIF_NOERROR)
++#define _TIF_NEED_RESCHED_MASK (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY)
+
+ /* Bits in local_flags */
+ /* Don't move TLF_NAPPING without adjusting the code in entry_32.S */
+--- a/arch/powerpc/kernel/asm-offsets.c
++++ b/arch/powerpc/kernel/asm-offsets.c
+@@ -160,6 +160,7 @@ int main(void)
+ DEFINE(TI_FLAGS, offsetof(struct thread_info, flags));
+ DEFINE(TI_LOCAL_FLAGS, offsetof(struct thread_info, local_flags));
+ DEFINE(TI_PREEMPT, offsetof(struct thread_info, preempt_count));
++ DEFINE(TI_PREEMPT_LAZY, offsetof(struct thread_info, preempt_lazy_count));
+ DEFINE(TI_TASK, offsetof(struct thread_info, task));
+ DEFINE(TI_CPU, offsetof(struct thread_info, cpu));
+
+--- a/arch/powerpc/kernel/entry_32.S
++++ b/arch/powerpc/kernel/entry_32.S
+@@ -813,7 +813,14 @@ user_exc_return: /* r10 contains MSR_KE
+ cmpwi 0,r0,0 /* if non-zero, just restore regs and return */
+ bne restore
+ andi. r8,r8,_TIF_NEED_RESCHED
++ bne+ 1f
++ lwz r0,TI_PREEMPT_LAZY(r9)
++ cmpwi 0,r0,0 /* if non-zero, just restore regs and return */
++ bne restore
++ lwz r0,TI_FLAGS(r9)
++ andi. r0,r0,_TIF_NEED_RESCHED_LAZY
+ beq+ restore
++1:
+ lwz r3,_MSR(r1)
+ andi. r0,r3,MSR_EE /* interrupts off? */
+ beq restore /* don't schedule if so */
+@@ -824,11 +831,11 @@ user_exc_return: /* r10 contains MSR_KE
+ */
+ bl trace_hardirqs_off
+ #endif
+-1: bl preempt_schedule_irq
++2: bl preempt_schedule_irq
+ CURRENT_THREAD_INFO(r9, r1)
+ lwz r3,TI_FLAGS(r9)
+- andi. r0,r3,_TIF_NEED_RESCHED
+- bne- 1b
++ andi. r0,r3,_TIF_NEED_RESCHED_MASK
++ bne- 2b
+ #ifdef CONFIG_TRACE_IRQFLAGS
+ /* And now, to properly rebalance the above, we tell lockdep they
+ * are being turned back on, which will happen when we return
+@@ -1149,7 +1156,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRE
+ #endif /* !(CONFIG_4xx || CONFIG_BOOKE) */
+
+ do_work: /* r10 contains MSR_KERNEL here */
+- andi. r0,r9,_TIF_NEED_RESCHED
++ andi. r0,r9,_TIF_NEED_RESCHED_MASK
+ beq do_user_signal
+
+ do_resched: /* r10 contains MSR_KERNEL here */
+@@ -1170,7 +1177,7 @@ do_resched: /* r10 contains MSR_KERNEL
+ MTMSRD(r10) /* disable interrupts */
+ CURRENT_THREAD_INFO(r9, r1)
+ lwz r9,TI_FLAGS(r9)
+- andi. r0,r9,_TIF_NEED_RESCHED
++ andi. r0,r9,_TIF_NEED_RESCHED_MASK
+ bne- do_resched
+ andi. r0,r9,_TIF_USER_WORK_MASK
+ beq restore_user
+--- a/arch/powerpc/kernel/entry_64.S
++++ b/arch/powerpc/kernel/entry_64.S
+@@ -636,7 +636,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_DSCR)
+ #else
+ beq restore
+ #endif
+-1: andi. r0,r4,_TIF_NEED_RESCHED
++1: andi. r0,r4,_TIF_NEED_RESCHED_MASK
+ beq 2f
+ bl restore_interrupts
+ SCHEDULE_USER
+@@ -698,10 +698,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_DSCR)
+
+ #ifdef CONFIG_PREEMPT
+ /* Check if we need to preempt */
++ lwz r8,TI_PREEMPT(r9)
++ cmpwi 0,r8,0 /* if non-zero, just restore regs and return */
++ bne restore
+ andi. r0,r4,_TIF_NEED_RESCHED
++ bne+ check_count
++
++ andi. r0,r4,_TIF_NEED_RESCHED_LAZY
+ beq+ restore
++ lwz r8,TI_PREEMPT_LAZY(r9)
++
+ /* Check that preempt_count() == 0 and interrupts are enabled */
+- lwz r8,TI_PREEMPT(r9)
++check_count:
+ cmpwi cr1,r8,0
+ ld r0,SOFTE(r1)
+ cmpdi r0,0
+@@ -718,7 +726,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_DSCR)
+ /* Re-test flags and eventually loop */
+ CURRENT_THREAD_INFO(r9, r1)
+ ld r4,TI_FLAGS(r9)
+- andi. r0,r4,_TIF_NEED_RESCHED
++ andi. r0,r4,_TIF_NEED_RESCHED_MASK
+ bne 1b
+
+ /*
diff --git a/patches/powerpc-ps3-device-init.c-adapt-to-completions-using.patch b/patches/powerpc-ps3-device-init.c-adapt-to-completions-using.patch
new file mode 100644
index 00000000000000..a153e1cf017e31
--- /dev/null
+++ b/patches/powerpc-ps3-device-init.c-adapt-to-completions-using.patch
@@ -0,0 +1,31 @@
+From: Paul Gortmaker <paul.gortmaker@windriver.com>
+Date: Sun, 31 May 2015 14:44:42 -0400
+Subject: powerpc: ps3/device-init.c - adapt to completions using swait vs wait
+
+To fix:
+
+ cc1: warnings being treated as errors
+ arch/powerpc/platforms/ps3/device-init.c: In function 'ps3_notification_read_write':
+ arch/powerpc/platforms/ps3/device-init.c:755:2: error: passing argument 1 of 'prepare_to_wait_event' from incompatible pointer type
+ arch/powerpc/platforms/ps3/device-init.c:755:2: error: passing argument 1 of 'abort_exclusive_wait' from incompatible pointer type
+ arch/powerpc/platforms/ps3/device-init.c:755:2: error: passing argument 1 of 'finish_wait' from incompatible pointer type
+ arch/powerpc/platforms/ps3/device-init.o] Error 1
+ make[3]: *** Waiting for unfinished jobs....
+
+Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ arch/powerpc/platforms/ps3/device-init.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/arch/powerpc/platforms/ps3/device-init.c
++++ b/arch/powerpc/platforms/ps3/device-init.c
+@@ -752,7 +752,7 @@ static int ps3_notification_read_write(s
+ }
+ pr_debug("%s:%u: notification %s issued\n", __func__, __LINE__, op);
+
+- res = wait_event_interruptible(dev->done.wait,
++ res = swait_event_interruptible(dev->done.wait,
+ dev->done.done || kthread_should_stop());
+ if (kthread_should_stop())
+ res = -EINTR;
diff --git a/patches/preempt-lazy-support.patch b/patches/preempt-lazy-support.patch
new file mode 100644
index 00000000000000..335b43e799dec4
--- /dev/null
+++ b/patches/preempt-lazy-support.patch
@@ -0,0 +1,589 @@
+Subject: sched: Add support for lazy preemption
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Fri, 26 Oct 2012 18:50:54 +0100
+
+It has become an obsession to mitigate the determinism vs. throughput
+loss of RT. Looking at the mainline semantics of preemption points
+gives a hint why RT sucks throughput-wise for ordinary SCHED_OTHER
+tasks. One major issue is the wakeup of tasks which right away preempt
+the waking task while the waking task holds a lock on which the woken
+task will block right after having preempted the waker (see the
+scenario sketch after this patch). In mainline this is prevented by the
+implicit preemption disable of spin/rw_lock held regions. On RT this is
+not possible due to the fully preemptible nature of sleeping spinlocks.
+
+For a SCHED_OTHER task preempting another SCHED_OTHER task, though,
+this is really not a correctness issue. RT folks are concerned about
+SCHED_FIFO/RR task preemption and not about the purely fairness-driven
+SCHED_OTHER preemption latencies.
+
+So I introduced a lazy preemption mechanism which only applies to
+SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside from the
+existing preempt_count, each task now sports a preempt_lazy_count
+which is manipulated on lock acquisition and release. This is slightly
+incorrect as, for laziness reasons, I coupled this to
+migrate_disable/enable, so some other mechanisms get the same treatment
+(e.g. get_cpu_light).
+
+Now, on the scheduler side, instead of setting NEED_RESCHED this sets
+NEED_RESCHED_LAZY in the case of a SCHED_OTHER/SCHED_OTHER preemption
+and therefore allows the waking task to exit the lock-held region
+before the woken task preempts. That also works better for cross-CPU
+wakeups, as the other side can stay in the adaptive spinning loop.
+
+For RT class preemption there is no change. This simply sets
+NEED_RESCHED and forgoes the lazy preemption counter.
+
+Initial tests do not expose any observable latency increase, but
+history shows that I've been proven wrong before :)
+
+The lazy preemption mode is on by default, but with
+CONFIG_SCHED_DEBUG enabled it can be disabled via:
+
+ # echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features
+
+and reenabled via
+
+ # echo PREEMPT_LAZY >/sys/kernel/debug/sched_features
+
+The test results so far are very machine- and workload-dependent, but
+there is a clear trend that it improves non-RT workload
+performance.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ arch/x86/include/asm/preempt.h | 18 +++++++++++++-
+ include/linux/ftrace_event.h | 1
+ include/linux/preempt.h | 29 ++++++++++++++++++++++-
+ include/linux/sched.h | 37 ++++++++++++++++++++++++++++++
+ include/linux/thread_info.h | 12 +++++++++
+ kernel/Kconfig.preempt | 6 ++++
+ kernel/sched/core.c | 50 ++++++++++++++++++++++++++++++++++++++++-
+ kernel/sched/fair.c | 16 ++++++-------
+ kernel/sched/features.h | 3 ++
+ kernel/sched/sched.h | 9 +++++++
+ kernel/trace/trace.c | 37 ++++++++++++++++++------------
+ kernel/trace/trace.h | 2 +
+ kernel/trace/trace_output.c | 13 +++++++++-
+ 13 files changed, 204 insertions(+), 29 deletions(-)
+
+--- a/arch/x86/include/asm/preempt.h
++++ b/arch/x86/include/asm/preempt.h
+@@ -82,17 +82,33 @@ static __always_inline void __preempt_co
+ * a decrement which hits zero means we have no preempt_count and should
+ * reschedule.
+ */
+-static __always_inline bool __preempt_count_dec_and_test(void)
++static __always_inline bool ____preempt_count_dec_and_test(void)
+ {
+ GEN_UNARY_RMWcc("decl", __preempt_count, __percpu_arg(0), "e");
+ }
+
++static __always_inline bool __preempt_count_dec_and_test(void)
++{
++ if (____preempt_count_dec_and_test())
++ return true;
++#ifdef CONFIG_PREEMPT_LAZY
++ return test_thread_flag(TIF_NEED_RESCHED_LAZY);
++#else
++ return false;
++#endif
++}
++
+ /*
+ * Returns true when we need to resched and can (barring IRQ state).
+ */
+ static __always_inline bool should_resched(void)
+ {
++#ifdef CONFIG_PREEMPT_LAZY
++ return unlikely(!raw_cpu_read_4(__preempt_count) || \
++ test_thread_flag(TIF_NEED_RESCHED_LAZY));
++#else
+ return unlikely(!raw_cpu_read_4(__preempt_count));
++#endif
+ }
+
+ #ifdef CONFIG_PREEMPT
+--- a/include/linux/ftrace_event.h
++++ b/include/linux/ftrace_event.h
+@@ -68,6 +68,7 @@ struct trace_entry {
+ int pid;
+ unsigned short migrate_disable;
+ unsigned short padding;
++ unsigned char preempt_lazy_count;
+ };
+
+ #define FTRACE_MAX_EVENT \
+--- a/include/linux/preempt.h
++++ b/include/linux/preempt.h
+@@ -33,6 +33,20 @@ extern void preempt_count_sub(int val);
+ #define preempt_count_inc() preempt_count_add(1)
+ #define preempt_count_dec() preempt_count_sub(1)
+
++#ifdef CONFIG_PREEMPT_LAZY
++#define add_preempt_lazy_count(val) do { preempt_lazy_count() += (val); } while (0)
++#define sub_preempt_lazy_count(val) do { preempt_lazy_count() -= (val); } while (0)
++#define inc_preempt_lazy_count() add_preempt_lazy_count(1)
++#define dec_preempt_lazy_count() sub_preempt_lazy_count(1)
++#define preempt_lazy_count() (current_thread_info()->preempt_lazy_count)
++#else
++#define add_preempt_lazy_count(val) do { } while (0)
++#define sub_preempt_lazy_count(val) do { } while (0)
++#define inc_preempt_lazy_count() do { } while (0)
++#define dec_preempt_lazy_count() do { } while (0)
++#define preempt_lazy_count() (0)
++#endif
++
+ #ifdef CONFIG_PREEMPT_COUNT
+
+ #define preempt_disable() \
+@@ -41,6 +55,12 @@ do { \
+ barrier(); \
+ } while (0)
+
++#define preempt_lazy_disable() \
++do { \
++ inc_preempt_lazy_count(); \
++ barrier(); \
++} while (0)
++
+ #define sched_preempt_enable_no_resched() \
+ do { \
+ barrier(); \
+@@ -69,6 +89,13 @@ do { \
+ __preempt_schedule(); \
+ } while (0)
+
++#define preempt_lazy_enable() \
++do { \
++ dec_preempt_lazy_count(); \
++ barrier(); \
++ preempt_check_resched(); \
++} while (0)
++
+ #else
+ #define preempt_enable() \
+ do { \
+@@ -147,7 +174,7 @@ do { \
+ } while (0)
+ #define preempt_fold_need_resched() \
+ do { \
+- if (tif_need_resched()) \
++ if (tif_need_resched_now()) \
+ set_preempt_need_resched(); \
+ } while (0)
+
+--- a/include/linux/sched.h
++++ b/include/linux/sched.h
+@@ -2898,6 +2898,43 @@ static inline int test_tsk_need_resched(
+ return unlikely(test_tsk_thread_flag(tsk,TIF_NEED_RESCHED));
+ }
+
++#ifdef CONFIG_PREEMPT_LAZY
++static inline void set_tsk_need_resched_lazy(struct task_struct *tsk)
++{
++ set_tsk_thread_flag(tsk,TIF_NEED_RESCHED_LAZY);
++}
++
++static inline void clear_tsk_need_resched_lazy(struct task_struct *tsk)
++{
++ clear_tsk_thread_flag(tsk,TIF_NEED_RESCHED_LAZY);
++}
++
++static inline int test_tsk_need_resched_lazy(struct task_struct *tsk)
++{
++ return unlikely(test_tsk_thread_flag(tsk,TIF_NEED_RESCHED_LAZY));
++}
++
++static inline int need_resched_lazy(void)
++{
++ return test_thread_flag(TIF_NEED_RESCHED_LAZY);
++}
++
++static inline int need_resched_now(void)
++{
++ return test_thread_flag(TIF_NEED_RESCHED);
++}
++
++#else
++static inline void clear_tsk_need_resched_lazy(struct task_struct *tsk) { }
++static inline int need_resched_lazy(void) { return 0; }
++
++static inline int need_resched_now(void)
++{
++ return test_thread_flag(TIF_NEED_RESCHED);
++}
++
++#endif
++
+ static inline int restart_syscall(void)
+ {
+ set_tsk_thread_flag(current, TIF_SIGPENDING);
+--- a/include/linux/thread_info.h
++++ b/include/linux/thread_info.h
+@@ -102,7 +102,17 @@ static inline int test_ti_thread_flag(st
+ #define test_thread_flag(flag) \
+ test_ti_thread_flag(current_thread_info(), flag)
+
+-#define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED)
++#ifdef CONFIG_PREEMPT_LAZY
++#define tif_need_resched() (test_thread_flag(TIF_NEED_RESCHED) || \
++ test_thread_flag(TIF_NEED_RESCHED_LAZY))
++#define tif_need_resched_now() (test_thread_flag(TIF_NEED_RESCHED))
++#define tif_need_resched_lazy() test_thread_flag(TIF_NEED_RESCHED_LAZY))
++
++#else
++#define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED)
++#define tif_need_resched_now() test_thread_flag(TIF_NEED_RESCHED)
++#define tif_need_resched_lazy() 0
++#endif
+
+ #if defined TIF_RESTORE_SIGMASK && !defined HAVE_SET_RESTORE_SIGMASK
+ /*
+--- a/kernel/Kconfig.preempt
++++ b/kernel/Kconfig.preempt
+@@ -6,6 +6,12 @@ config PREEMPT_RT_BASE
+ bool
+ select PREEMPT
+
++config HAVE_PREEMPT_LAZY
++ bool
++
++config PREEMPT_LAZY
++ def_bool y if HAVE_PREEMPT_LAZY && PREEMPT_RT_FULL
++
+ choice
+ prompt "Preemption Model"
+ default PREEMPT_NONE
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -623,6 +623,38 @@ void resched_curr(struct rq *rq)
+ trace_sched_wake_idle_without_ipi(cpu);
+ }
+
++#ifdef CONFIG_PREEMPT_LAZY
++void resched_curr_lazy(struct rq *rq)
++{
++ struct task_struct *curr = rq->curr;
++ int cpu;
++
++ if (!sched_feat(PREEMPT_LAZY)) {
++ resched_curr(rq);
++ return;
++ }
++
++ lockdep_assert_held(&rq->lock);
++
++ if (test_tsk_need_resched(curr))
++ return;
++
++ if (test_tsk_need_resched_lazy(curr))
++ return;
++
++ set_tsk_need_resched_lazy(curr);
++
++ cpu = cpu_of(rq);
++ if (cpu == smp_processor_id())
++ return;
++
++ /* NEED_RESCHED_LAZY must be visible before we test polling */
++ smp_mb();
++ if (!tsk_is_polling(curr))
++ smp_send_reschedule(cpu);
++}
++#endif
++
+ void resched_cpu(int cpu)
+ {
+ struct rq *rq = cpu_rq(cpu);
+@@ -2018,6 +2050,9 @@ int sched_fork(unsigned long clone_flags
+ p->on_cpu = 0;
+ #endif
+ init_task_preempt_count(p);
++#ifdef CONFIG_HAVE_PREEMPT_LAZY
++ task_thread_info(p)->preempt_lazy_count = 0;
++#endif
+ #ifdef CONFIG_SMP
+ plist_node_init(&p->pushable_tasks, MAX_PRIO);
+ RB_CLEAR_NODE(&p->pushable_dl_tasks);
+@@ -2774,6 +2809,7 @@ void migrate_disable(void)
+ }
+
+ preempt_disable();
++ preempt_lazy_disable();
+ pin_current_cpu();
+ p->migrate_disable = 1;
+ preempt_enable();
+@@ -2831,6 +2867,7 @@ void migrate_enable(void)
+
+ unpin_current_cpu();
+ preempt_enable();
++ preempt_lazy_enable();
+ }
+ EXPORT_SYMBOL(migrate_enable);
+ #else
+@@ -2964,6 +3001,7 @@ static void __sched __schedule(void)
+
+ next = pick_next_task(rq, prev);
+ clear_tsk_need_resched(prev);
++ clear_tsk_need_resched_lazy(prev);
+ clear_preempt_need_resched();
+ rq->clock_skip_update = 0;
+
+@@ -3108,6 +3146,14 @@ asmlinkage __visible void __sched notrac
+ if (likely(!preemptible()))
+ return;
+
++#ifdef CONFIG_PREEMPT_LAZY
++ /*
++ * Check for lazy preemption
++ */
++ if (current_thread_info()->preempt_lazy_count &&
++ !test_thread_flag(TIF_NEED_RESCHED))
++ return;
++#endif
+ do {
+ __preempt_count_add(PREEMPT_ACTIVE);
+ /*
+@@ -4831,7 +4877,9 @@ void init_idle(struct task_struct *idle,
+
+ /* Set the preempt count _outside_ the spinlocks! */
+ init_idle_preempt_count(idle, cpu);
+-
++#ifdef CONFIG_HAVE_PREEMPT_LAZY
++ task_thread_info(idle)->preempt_lazy_count = 0;
++#endif
+ /*
+ * The idle tasks have their own, simple scheduling class:
+ */
+--- a/kernel/sched/fair.c
++++ b/kernel/sched/fair.c
+@@ -3201,7 +3201,7 @@ check_preempt_tick(struct cfs_rq *cfs_rq
+ ideal_runtime = sched_slice(cfs_rq, curr);
+ delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
+ if (delta_exec > ideal_runtime) {
+- resched_curr(rq_of(cfs_rq));
++ resched_curr_lazy(rq_of(cfs_rq));
+ /*
+ * The current task ran long enough, ensure it doesn't get
+ * re-elected due to buddy favours.
+@@ -3225,7 +3225,7 @@ check_preempt_tick(struct cfs_rq *cfs_rq
+ return;
+
+ if (delta > ideal_runtime)
+- resched_curr(rq_of(cfs_rq));
++ resched_curr_lazy(rq_of(cfs_rq));
+ }
+
+ static void
+@@ -3366,7 +3366,7 @@ entity_tick(struct cfs_rq *cfs_rq, struc
+ * validating it and just reschedule.
+ */
+ if (queued) {
+- resched_curr(rq_of(cfs_rq));
++ resched_curr_lazy(rq_of(cfs_rq));
+ return;
+ }
+ /*
+@@ -3557,7 +3557,7 @@ static void __account_cfs_rq_runtime(str
+ * hierarchy can be throttled
+ */
+ if (!assign_cfs_rq_runtime(cfs_rq) && likely(cfs_rq->curr))
+- resched_curr(rq_of(cfs_rq));
++ resched_curr_lazy(rq_of(cfs_rq));
+ }
+
+ static __always_inline
+@@ -4180,7 +4180,7 @@ static void hrtick_start_fair(struct rq
+
+ if (delta < 0) {
+ if (rq->curr == p)
+- resched_curr(rq);
++ resched_curr_lazy(rq);
+ return;
+ }
+ hrtick_start(rq, delta);
+@@ -5076,7 +5076,7 @@ static void check_preempt_wakeup(struct
+ return;
+
+ preempt:
+- resched_curr(rq);
++ resched_curr_lazy(rq);
+ /*
+ * Only set the backward buddy when the current task is still
+ * on the rq. This can happen when a wakeup gets interleaved
+@@ -7866,7 +7866,7 @@ static void task_fork_fair(struct task_s
+ * 'current' within the tree based on its new key value.
+ */
+ swap(curr->vruntime, se->vruntime);
+- resched_curr(rq);
++ resched_curr_lazy(rq);
+ }
+
+ se->vruntime -= cfs_rq->min_vruntime;
+@@ -7891,7 +7891,7 @@ prio_changed_fair(struct rq *rq, struct
+ */
+ if (rq->curr == p) {
+ if (p->prio > oldprio)
+- resched_curr(rq);
++ resched_curr_lazy(rq);
+ } else
+ check_preempt_curr(rq, p, 0);
+ }
+--- a/kernel/sched/features.h
++++ b/kernel/sched/features.h
+@@ -52,6 +52,9 @@ SCHED_FEAT(NONTASK_CAPACITY, true)
+
+ #ifdef CONFIG_PREEMPT_RT_FULL
+ SCHED_FEAT(TTWU_QUEUE, false)
++# ifdef CONFIG_PREEMPT_LAZY
++SCHED_FEAT(PREEMPT_LAZY, true)
++# endif
+ #else
+
+ /*
+--- a/kernel/sched/sched.h
++++ b/kernel/sched/sched.h
+@@ -1290,6 +1290,15 @@ extern void init_sched_dl_class(void);
+ extern void resched_curr(struct rq *rq);
+ extern void resched_cpu(int cpu);
+
++#ifdef CONFIG_PREEMPT_LAZY
++extern void resched_curr_lazy(struct rq *rq);
++#else
++static inline void resched_curr_lazy(struct rq *rq)
++{
++ resched_curr(rq);
++}
++#endif
++
+ extern struct rt_bandwidth def_rt_bandwidth;
+ extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime);
+
+--- a/kernel/trace/trace.c
++++ b/kernel/trace/trace.c
+@@ -1630,6 +1630,7 @@ tracing_generic_entry_update(struct trac
+ struct task_struct *tsk = current;
+
+ entry->preempt_count = pc & 0xff;
++ entry->preempt_lazy_count = preempt_lazy_count();
+ entry->pid = (tsk) ? tsk->pid : 0;
+ entry->flags =
+ #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT
+@@ -1639,7 +1640,8 @@ tracing_generic_entry_update(struct trac
+ #endif
+ ((pc & HARDIRQ_MASK) ? TRACE_FLAG_HARDIRQ : 0) |
+ ((pc & SOFTIRQ_MASK) ? TRACE_FLAG_SOFTIRQ : 0) |
+- (tif_need_resched() ? TRACE_FLAG_NEED_RESCHED : 0) |
++ (tif_need_resched_now() ? TRACE_FLAG_NEED_RESCHED : 0) |
++ (need_resched_lazy() ? TRACE_FLAG_NEED_RESCHED_LAZY : 0) |
+ (test_preempt_need_resched() ? TRACE_FLAG_PREEMPT_RESCHED : 0);
+
+ entry->migrate_disable = (tsk) ? __migrate_disabled(tsk) & 0xFF : 0;
+@@ -2560,15 +2562,17 @@ get_total_entries(struct trace_buffer *b
+
+ static void print_lat_help_header(struct seq_file *m)
+ {
+- seq_puts(m, "# _------=> CPU# \n"
+- "# / _-----=> irqs-off \n"
+- "# | / _----=> need-resched \n"
+- "# || / _---=> hardirq/softirq \n"
+- "# ||| / _--=> preempt-depth \n"
+- "# |||| / _--=> migrate-disable\n"
+- "# ||||| / delay \n"
+- "# cmd pid |||||| time | caller \n"
+- "# \\ / ||||| \\ | / \n");
++ seq_puts(m, "# _--------=> CPU# \n"
++ "# / _-------=> irqs-off \n"
++ "# | / _------=> need-resched \n"
++ "# || / _-----=> need-resched_lazy \n"
++ "# ||| / _----=> hardirq/softirq \n"
++ "# |||| / _---=> preempt-depth \n"
++ "# ||||| / _--=> preempt-lazy-depth\n"
++ "# |||||| / _-=> migrate-disable \n"
++ "# ||||||| / delay \n"
++ "# cmd pid |||||||| time | caller \n"
++ "# \\ / |||||||| \\ | / \n");
+ }
+
+ static void print_event_info(struct trace_buffer *buf, struct seq_file *m)
+@@ -2594,11 +2598,14 @@ static void print_func_help_header_irq(s
+ print_event_info(buf, m);
+ seq_puts(m, "# _-----=> irqs-off\n"
+ "# / _----=> need-resched\n"
+- "# | / _---=> hardirq/softirq\n"
+- "# || / _--=> preempt-depth\n"
+- "# ||| / delay\n"
+- "# TASK-PID CPU# |||| TIMESTAMP FUNCTION\n"
+- "# | | | |||| | |\n");
++ "# |/ _-----=> need-resched_lazy\n"
++ "# || / _---=> hardirq/softirq\n"
++ "# ||| / _--=> preempt-depth\n"
++ "# |||| /_--=> preempt-lazy-depth\n"
++ "# ||||| _-=> migrate-disable \n"
++ "# ||||| / delay\n"
++ "# TASK-PID CPU# |||||| TIMESTAMP FUNCTION\n"
++ "# | | | |||||| | |\n");
+ }
+
+ void
+--- a/kernel/trace/trace.h
++++ b/kernel/trace/trace.h
+@@ -120,6 +120,7 @@ struct kretprobe_trace_entry_head {
+ * NEED_RESCHED - reschedule is requested
+ * HARDIRQ - inside an interrupt handler
+ * SOFTIRQ - inside a softirq handler
++ * NEED_RESCHED_LAZY - lazy reschedule is requested
+ */
+ enum trace_flag_type {
+ TRACE_FLAG_IRQS_OFF = 0x01,
+@@ -128,6 +129,7 @@ enum trace_flag_type {
+ TRACE_FLAG_HARDIRQ = 0x08,
+ TRACE_FLAG_SOFTIRQ = 0x10,
+ TRACE_FLAG_PREEMPT_RESCHED = 0x20,
++ TRACE_FLAG_NEED_RESCHED_LAZY = 0x40,
+ };
+
+ #define TRACE_BUF_SIZE 1024
+--- a/kernel/trace/trace_output.c
++++ b/kernel/trace/trace_output.c
+@@ -430,6 +430,7 @@ int trace_print_lat_fmt(struct trace_seq
+ {
+ char hardsoft_irq;
+ char need_resched;
++ char need_resched_lazy;
+ char irqs_off;
+ int hardirq;
+ int softirq;
+@@ -457,6 +458,8 @@ int trace_print_lat_fmt(struct trace_seq
+ need_resched = '.';
+ break;
+ }
++ need_resched_lazy =
++ (entry->flags & TRACE_FLAG_NEED_RESCHED_LAZY) ? 'L' : '.';
+
+ hardsoft_irq =
+ (hardirq && softirq) ? 'H' :
+@@ -464,14 +467,20 @@ int trace_print_lat_fmt(struct trace_seq
+ softirq ? 's' :
+ '.';
+
+- trace_seq_printf(s, "%c%c%c",
+- irqs_off, need_resched, hardsoft_irq);
++ trace_seq_printf(s, "%c%c%c%c",
++ irqs_off, need_resched, need_resched_lazy,
++ hardsoft_irq);
+
+ if (entry->preempt_count)
+ trace_seq_printf(s, "%x", entry->preempt_count);
+ else
+ trace_seq_putc(s, '.');
+
++ if (entry->preempt_lazy_count)
++ trace_seq_printf(s, "%x", entry->preempt_lazy_count);
++ else
++ trace_seq_putc(s, '.');
++
+ if (entry->migrate_disable)
+ trace_seq_printf(s, "%x", entry->migrate_disable);
+ else
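
A scenario sketch for the changelog above (not part of the patch; dev, its
lock, wait queue and flag are hypothetical): on RT a spinlock is a sleeping
lock, so without the lazy flag a SCHED_OTHER wakee can preempt the SCHED_OTHER
waker inside the critical section and immediately block on the very lock the
waker still holds.

    /* Waker side; both waker and wakee are SCHED_OTHER, kernel is RT. */
    spin_lock(&dev->lock);        /* sleeping lock on RT */
    dev->data_ready = true;
    wake_up(&dev->waitq);         /* Without PREEMPT_LAZY: NEED_RESCHED is set and the
                                   * wakee may preempt right here, only to block on
                                   * dev->lock again.
                                   * With PREEMPT_LAZY: only NEED_RESCHED_LAZY is set,
                                   * so the waker keeps running ...                    */
    spin_unlock(&dev->lock);      /* ... releases the lock, and the wakee gets to run
                                   * at the next preemption point without blocking.    */
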
diff --git a/patches/preempt-nort-rt-variants.patch b/patches/preempt-nort-rt-variants.patch
new file mode 100644
index 00000000000000..f607d428515fab
--- /dev/null
+++ b/patches/preempt-nort-rt-variants.patch
@@ -0,0 +1,47 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Fri, 24 Jul 2009 12:38:56 +0200
+Subject: preempt: Provide preempt_*_(no)rt variants
+
+RT needs a few preempt_disable/enable points which are not necessary
+otherwise. Implement variants to avoid #ifdeffery (a usage sketch
+follows after this patch).
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ include/linux/preempt.h | 18 +++++++++++++++++-
+ 1 file changed, 17 insertions(+), 1 deletion(-)
+
+--- a/include/linux/preempt.h
++++ b/include/linux/preempt.h
+@@ -47,7 +47,11 @@ do { \
+ preempt_count_dec(); \
+ } while (0)
+
+-#define preempt_enable_no_resched() sched_preempt_enable_no_resched()
++#ifdef CONFIG_PREEMPT_RT_BASE
++# define preempt_enable_no_resched() sched_preempt_enable_no_resched()
++#else
++# define preempt_enable_no_resched() preempt_enable()
++#endif
+
+ #ifdef CONFIG_PREEMPT
+ #define preempt_enable() \
+@@ -144,6 +148,18 @@ do { \
+ set_preempt_need_resched(); \
+ } while (0)
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++# define preempt_disable_rt() preempt_disable()
++# define preempt_enable_rt() preempt_enable()
++# define preempt_disable_nort() barrier()
++# define preempt_enable_nort() barrier()
++#else
++# define preempt_disable_rt() barrier()
++# define preempt_enable_rt() barrier()
++# define preempt_disable_nort() preempt_disable()
++# define preempt_enable_nort() preempt_enable()
++#endif
++
+ #ifdef CONFIG_PREEMPT_NOTIFIERS
+
+ struct preempt_notifier;
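
The intended semantics of the new variants, spelled out as a sketch (the
surrounding code is hypothetical):

    /* Section that must not be preempted on RT only. */
    preempt_disable_rt();      /* RT: preempt_disable()    !RT: barrier() */
    /* ... */
    preempt_enable_rt();       /* RT: preempt_enable()     !RT: barrier() */

    /* Section that needs preemption disabled on mainline, but may use
     * sleeping locks on RT and is protected by other means there. */
    preempt_disable_nort();    /* RT: barrier()            !RT: preempt_disable() */
    /* ... */
    preempt_enable_nort();     /* RT: barrier()            !RT: preempt_enable() */

The radix-tree patch later in this queue uses preempt_enable_nort() in
radix_tree_preload_end() for exactly the _nort case.
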
diff --git a/patches/printk-27force_early_printk-27-boot-param-to-help-with-debugging.patch b/patches/printk-27force_early_printk-27-boot-param-to-help-with-debugging.patch
new file mode 100644
index 00000000000000..38b2a0405bbac4
--- /dev/null
+++ b/patches/printk-27force_early_printk-27-boot-param-to-help-with-debugging.patch
@@ -0,0 +1,31 @@
+Subject: printk: Add "force_early_printk" boot param to help with debugging
+From: Peter Zijlstra <peterz@infradead.org>
+Date: Fri, 02 Sep 2011 14:41:29 +0200
+
+Gives me an option to screw printk and actually see what the machine
+says.
+
+Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
+Link: http://lkml.kernel.org/r/1314967289.1301.11.camel@twins
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Link: http://lkml.kernel.org/n/tip-ykb97nsfmobq44xketrxs977@git.kernel.org
+---
+ kernel/printk/printk.c | 7 +++++++
+ 1 file changed, 7 insertions(+)
+
+--- a/kernel/printk/printk.c
++++ b/kernel/printk/printk.c
+@@ -1640,6 +1640,13 @@ asmlinkage void early_printk(const char
+ */
+ static bool __read_mostly printk_killswitch;
+
++static int __init force_early_printk_setup(char *str)
++{
++ printk_killswitch = true;
++ return 0;
++}
++early_param("force_early_printk", force_early_printk_setup);
++
+ void printk_kill(void)
+ {
+ printk_killswitch = true;
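
Example command line for the new parameter (console device and baud rate are
placeholders); force_early_printk only has a visible effect when an early
console has been registered, e.g. via earlyprintk=:

    earlyprintk=serial,ttyS0,115200 force_early_printk
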
diff --git a/patches/printk-kill.patch b/patches/printk-kill.patch
new file mode 100644
index 00000000000000..45f71cdc95f00f
--- /dev/null
+++ b/patches/printk-kill.patch
@@ -0,0 +1,162 @@
+Subject: printk: Add a printk kill switch
+From: Ingo Molnar <mingo@elte.hu>
+Date: Fri, 22 Jul 2011 17:58:40 +0200
+
+Add a printk kill switch. This is used from the (NMI) watchdog to ensure
+that it does not deadlock with the early printk code.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ include/linux/printk.h | 2 +
+ kernel/printk/printk.c | 76 ++++++++++++++++++++++++++++++++++++-------------
+ kernel/watchdog.c | 14 +++++++--
+ 3 files changed, 70 insertions(+), 22 deletions(-)
+
+--- a/include/linux/printk.h
++++ b/include/linux/printk.h
+@@ -115,9 +115,11 @@ int no_printk(const char *fmt, ...)
+ #ifdef CONFIG_EARLY_PRINTK
+ extern asmlinkage __printf(1, 2)
+ void early_printk(const char *fmt, ...);
++extern void printk_kill(void);
+ #else
+ static inline __printf(1, 2) __cold
+ void early_printk(const char *s, ...) { }
++static inline void printk_kill(void) { }
+ #endif
+
+ typedef int(*printk_func_t)(const char *fmt, va_list args);
+--- a/kernel/printk/printk.c
++++ b/kernel/printk/printk.c
+@@ -1610,6 +1610,55 @@ static size_t cont_print_text(char *text
+ return textlen;
+ }
+
++#ifdef CONFIG_EARLY_PRINTK
++struct console *early_console;
++
++static void early_vprintk(const char *fmt, va_list ap)
++{
++ if (early_console) {
++ char buf[512];
++ int n = vscnprintf(buf, sizeof(buf), fmt, ap);
++
++ early_console->write(early_console, buf, n);
++ }
++}
++
++asmlinkage void early_printk(const char *fmt, ...)
++{
++ va_list ap;
++
++ va_start(ap, fmt);
++ early_vprintk(fmt, ap);
++ va_end(ap);
++}
++
++/*
++ * This is independent of any log levels - a global
++ * kill switch that turns off all of printk.
++ *
++ * Used by the NMI watchdog if early-printk is enabled.
++ */
++static bool __read_mostly printk_killswitch;
++
++void printk_kill(void)
++{
++ printk_killswitch = true;
++}
++
++static int forced_early_printk(const char *fmt, va_list ap)
++{
++ if (!printk_killswitch)
++ return 0;
++ early_vprintk(fmt, ap);
++ return 1;
++}
++#else
++static inline int forced_early_printk(const char *fmt, va_list ap)
++{
++ return 0;
++}
++#endif
++
+ asmlinkage int vprintk_emit(int facility, int level,
+ const char *dict, size_t dictlen,
+ const char *fmt, va_list args)
+@@ -1626,6 +1675,13 @@ asmlinkage int vprintk_emit(int facility
+ /* cpu currently holding logbuf_lock in this function */
+ static unsigned int logbuf_cpu = UINT_MAX;
+
++ /*
++ * Fall back to early_printk if a debugging subsystem has
++ * killed printk output
++ */
++ if (unlikely(forced_early_printk(fmt, args)))
++ return 1;
++
+ if (level == LOGLEVEL_SCHED) {
+ level = LOGLEVEL_DEFAULT;
+ in_sched = true;
+@@ -1905,26 +1961,6 @@ DEFINE_PER_CPU(printk_func_t, printk_fun
+
+ #endif /* CONFIG_PRINTK */
+
+-#ifdef CONFIG_EARLY_PRINTK
+-struct console *early_console;
+-
+-asmlinkage __visible void early_printk(const char *fmt, ...)
+-{
+- va_list ap;
+- char buf[512];
+- int n;
+-
+- if (!early_console)
+- return;
+-
+- va_start(ap, fmt);
+- n = vscnprintf(buf, sizeof(buf), fmt, ap);
+- va_end(ap);
+-
+- early_console->write(early_console, buf, n);
+-}
+-#endif
+-
+ static int __add_preferred_console(char *name, int idx, char *options,
+ char *brl_options)
+ {
+--- a/kernel/watchdog.c
++++ b/kernel/watchdog.c
+@@ -262,6 +262,8 @@ static int is_softlockup(unsigned long t
+
+ #ifdef CONFIG_HARDLOCKUP_DETECTOR
+
++static DEFINE_RAW_SPINLOCK(watchdog_output_lock);
++
+ static struct perf_event_attr wd_hw_attr = {
+ .type = PERF_TYPE_HARDWARE,
+ .config = PERF_COUNT_HW_CPU_CYCLES,
+@@ -295,13 +297,21 @@ static void watchdog_overflow_callback(s
+ /* only print hardlockups once */
+ if (__this_cpu_read(hard_watchdog_warn) == true)
+ return;
++ /*
++ * If early-printk is enabled then make sure we do not
++ * lock up in printk() and kill console logging:
++ */
++ printk_kill();
+
+- if (hardlockup_panic)
++ if (hardlockup_panic) {
+ panic("Watchdog detected hard LOCKUP on cpu %d",
+ this_cpu);
+- else
++ } else {
++ raw_spin_lock(&watchdog_output_lock);
+ WARN(1, "Watchdog detected hard LOCKUP on cpu %d",
+ this_cpu);
++ raw_spin_unlock(&watchdog_output_lock);
++ }
+
+ __this_cpu_write(hard_watchdog_warn, true);
+ return;
diff --git a/patches/printk-rt-aware.patch b/patches/printk-rt-aware.patch
new file mode 100644
index 00000000000000..772aadac5eccef
--- /dev/null
+++ b/patches/printk-rt-aware.patch
@@ -0,0 +1,100 @@
+Subject: printk: Make rt aware
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 19 Sep 2012 14:50:37 +0200
+
+Drop the lock before calling the console driver and do not disable
+interrupts while printing to a serial console.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ kernel/printk/printk.c | 26 +++++++++++++++++++++++---
+ 1 file changed, 23 insertions(+), 3 deletions(-)
+
+--- a/kernel/printk/printk.c
++++ b/kernel/printk/printk.c
+@@ -1404,6 +1404,7 @@ static void call_console_drivers(int lev
+ if (!console_drivers)
+ return;
+
++ migrate_disable();
+ for_each_console(con) {
+ if (exclusive_console && con != exclusive_console)
+ continue;
+@@ -1416,6 +1417,7 @@ static void call_console_drivers(int lev
+ continue;
+ con->write(con, text, len);
+ }
++ migrate_enable();
+ }
+
+ /*
+@@ -1476,6 +1478,15 @@ static inline int can_use_console(unsign
+ static int console_trylock_for_printk(void)
+ {
+ unsigned int cpu = smp_processor_id();
++#ifdef CONFIG_PREEMPT_RT_FULL
++ int lock = !early_boot_irqs_disabled && (preempt_count() == 0) &&
++ !irqs_disabled();
++#else
++ int lock = 1;
++#endif
++
++ if (!lock)
++ return 0;
+
+ if (!console_trylock())
+ return 0;
+@@ -1830,8 +1841,7 @@ asmlinkage int vprintk_emit(int facility
+ * console_sem which would prevent anyone from printing to
+ * console
+ */
+- preempt_disable();
+-
++ migrate_disable();
+ /*
+ * Try to acquire and then immediately release the console
+ * semaphore. The release will print out buffers and wake up
+@@ -1839,7 +1849,7 @@ asmlinkage int vprintk_emit(int facility
+ */
+ if (console_trylock_for_printk())
+ console_unlock();
+- preempt_enable();
++ migrate_enable();
+ lockdep_on();
+ }
+
+@@ -2189,11 +2199,16 @@ static void console_cont_flush(char *tex
+ goto out;
+
+ len = cont_print_text(text, size);
++#ifndef CONFIG_PREEMPT_RT_FULL
+ raw_spin_unlock(&logbuf_lock);
+ stop_critical_timings();
+ call_console_drivers(cont.level, text, len);
+ start_critical_timings();
+ local_irq_restore(flags);
++#else
++ raw_spin_unlock_irqrestore(&logbuf_lock, flags);
++ call_console_drivers(cont.level, text, len);
++#endif
+ return;
+ out:
+ raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+@@ -2281,12 +2296,17 @@ void console_unlock(void)
+ console_idx = log_next(console_idx);
+ console_seq++;
+ console_prev = msg->flags;
++#ifdef CONFIG_PREEMPT_RT_FULL
++ raw_spin_unlock_irqrestore(&logbuf_lock, flags);
++ call_console_drivers(level, text, len);
++#else
+ raw_spin_unlock(&logbuf_lock);
+
+ stop_critical_timings(); /* don't trace print latency */
+ call_console_drivers(level, text, len);
+ start_critical_timings();
+ local_irq_restore(flags);
++#endif
+ }
+ console_locked = 0;
+
diff --git a/patches/ptrace-fix-ptrace-vs-tasklist_lock-race.patch b/patches/ptrace-fix-ptrace-vs-tasklist_lock-race.patch
new file mode 100644
index 00000000000000..7592c4b3028774
--- /dev/null
+++ b/patches/ptrace-fix-ptrace-vs-tasklist_lock-race.patch
@@ -0,0 +1,160 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Thu, 29 Aug 2013 18:21:04 +0200
+Subject: ptrace: fix ptrace vs tasklist_lock race
+
+As explained by Alexander Fyodorov <halcy@yandex.ru>:
+
+|read_lock(&tasklist_lock) in ptrace_stop() is converted to mutex on RT kernel,
+|and it can remove __TASK_TRACED from task->state (by moving it to
+|task->saved_state). If parent does wait() on child followed by a sys_ptrace
+|call, the following race can happen:
+|
+|- child sets __TASK_TRACED in ptrace_stop()
+|- parent does wait() which eventually calls wait_task_stopped() and returns
+| child's pid
+|- child blocks on read_lock(&tasklist_lock) in ptrace_stop() and moves
+| __TASK_TRACED flag to saved_state
+|- parent calls sys_ptrace, which calls ptrace_check_attach() and wait_task_inactive()
+
+The patch is based on his initial patch, with an additional check added
+for the case where __TASK_TRACED has moved to ->saved_state. The pi_lock
+is taken in case the caller is interrupted between looking at ->state and
+->saved_state.
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ include/linux/sched.h | 48 +++++++++++++++++++++++++++++++++++++++++++++---
+ kernel/ptrace.c | 7 ++++++-
+ kernel/sched/core.c | 19 ++++++++++++++++---
+ 3 files changed, 67 insertions(+), 7 deletions(-)
+
+--- a/include/linux/sched.h
++++ b/include/linux/sched.h
+@@ -234,10 +234,7 @@ extern char ___assert_task_state[1 - 2*!
+ TASK_UNINTERRUPTIBLE | __TASK_STOPPED | \
+ __TASK_TRACED | EXIT_ZOMBIE | EXIT_DEAD)
+
+-#define task_is_traced(task) ((task->state & __TASK_TRACED) != 0)
+ #define task_is_stopped(task) ((task->state & __TASK_STOPPED) != 0)
+-#define task_is_stopped_or_traced(task) \
+- ((task->state & (__TASK_STOPPED | __TASK_TRACED)) != 0)
+ #define task_contributes_to_load(task) \
+ ((task->state & TASK_UNINTERRUPTIBLE) != 0 && \
+ (task->flags & PF_FROZEN) == 0)
+@@ -2918,6 +2915,51 @@ static inline int signal_pending_state(l
+ return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
+ }
+
++static inline bool __task_is_stopped_or_traced(struct task_struct *task)
++{
++ if (task->state & (__TASK_STOPPED | __TASK_TRACED))
++ return true;
++#ifdef CONFIG_PREEMPT_RT_FULL
++ if (task->saved_state & (__TASK_STOPPED | __TASK_TRACED))
++ return true;
++#endif
++ return false;
++}
++
++static inline bool task_is_stopped_or_traced(struct task_struct *task)
++{
++ bool traced_stopped;
++
++#ifdef CONFIG_PREEMPT_RT_FULL
++ unsigned long flags;
++
++ raw_spin_lock_irqsave(&task->pi_lock, flags);
++ traced_stopped = __task_is_stopped_or_traced(task);
++ raw_spin_unlock_irqrestore(&task->pi_lock, flags);
++#else
++ traced_stopped = __task_is_stopped_or_traced(task);
++#endif
++ return traced_stopped;
++}
++
++static inline bool task_is_traced(struct task_struct *task)
++{
++ bool traced = false;
++
++ if (task->state & __TASK_TRACED)
++ return true;
++#ifdef CONFIG_PREEMPT_RT_FULL
++ /* in case the task is sleeping on tasklist_lock */
++ raw_spin_lock_irq(&task->pi_lock);
++ if (task->state & __TASK_TRACED)
++ traced = true;
++ else if (task->saved_state & __TASK_TRACED)
++ traced = true;
++ raw_spin_unlock_irq(&task->pi_lock);
++#endif
++ return traced;
++}
++
+ /*
+ * cond_resched() and cond_resched_lock(): latency reduction via
+ * explicit rescheduling in places that are safe. The return
+--- a/kernel/ptrace.c
++++ b/kernel/ptrace.c
+@@ -129,7 +129,12 @@ static bool ptrace_freeze_traced(struct
+
+ spin_lock_irq(&task->sighand->siglock);
+ if (task_is_traced(task) && !__fatal_signal_pending(task)) {
+- task->state = __TASK_TRACED;
++ raw_spin_lock_irq(&task->pi_lock);
++ if (task->state & __TASK_TRACED)
++ task->state = __TASK_TRACED;
++ else
++ task->saved_state = __TASK_TRACED;
++ raw_spin_unlock_irq(&task->pi_lock);
+ ret = true;
+ }
+ spin_unlock_irq(&task->sighand->siglock);
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -1219,6 +1219,18 @@ struct migration_arg {
+
+ static int migration_cpu_stop(void *data);
+
++static bool check_task_state(struct task_struct *p, long match_state)
++{
++ bool match = false;
++
++ raw_spin_lock_irq(&p->pi_lock);
++ if (p->state == match_state || p->saved_state == match_state)
++ match = true;
++ raw_spin_unlock_irq(&p->pi_lock);
++
++ return match;
++}
++
+ /*
+ * wait_task_inactive - wait for a thread to unschedule.
+ *
+@@ -1263,7 +1275,7 @@ unsigned long wait_task_inactive(struct
+ * is actually now running somewhere else!
+ */
+ while (task_running(rq, p)) {
+- if (match_state && unlikely(p->state != match_state))
++ if (match_state && !check_task_state(p, match_state))
+ return 0;
+ cpu_relax();
+ }
+@@ -1278,7 +1290,8 @@ unsigned long wait_task_inactive(struct
+ running = task_running(rq, p);
+ queued = task_on_rq_queued(p);
+ ncsw = 0;
+- if (!match_state || p->state == match_state)
++ if (!match_state || p->state == match_state ||
++ p->saved_state == match_state)
+ ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
+ task_rq_unlock(rq, p, &flags);
+
+@@ -1833,7 +1846,7 @@ static void try_to_wake_up_local(struct
+ */
+ int wake_up_process(struct task_struct *p)
+ {
+- WARN_ON(task_is_stopped_or_traced(p));
++ WARN_ON(__task_is_stopped_or_traced(p));
+ return try_to_wake_up(p, TASK_NORMAL, 0);
+ }
+ EXPORT_SYMBOL(wake_up_process);
diff --git a/patches/radix-tree-rt-aware.patch b/patches/radix-tree-rt-aware.patch
new file mode 100644
index 00000000000000..78762efd6a5a62
--- /dev/null
+++ b/patches/radix-tree-rt-aware.patch
@@ -0,0 +1,72 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Sun, 17 Jul 2011 21:33:18 +0200
+Subject: radix-tree: Make RT aware
+
+Disable radix_tree_preload() on -RT. This function returns with
+preemption disabled, which may cause high latencies and breaks if the
+user tries to grab any locks (sleeping locks on RT) after invoking it
+(see the usage sketch after this patch).
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ include/linux/radix-tree.h | 7 ++++++-
+ lib/radix-tree.c | 5 ++++-
+ 2 files changed, 10 insertions(+), 2 deletions(-)
+
+--- a/include/linux/radix-tree.h
++++ b/include/linux/radix-tree.h
+@@ -277,8 +277,13 @@ radix_tree_gang_lookup(struct radix_tree
+ unsigned int radix_tree_gang_lookup_slot(struct radix_tree_root *root,
+ void ***results, unsigned long *indices,
+ unsigned long first_index, unsigned int max_items);
++#ifndef CONFIG_PREEMPT_RT_FULL
+ int radix_tree_preload(gfp_t gfp_mask);
+ int radix_tree_maybe_preload(gfp_t gfp_mask);
++#else
++static inline int radix_tree_preload(gfp_t gm) { return 0; }
++static inline int radix_tree_maybe_preload(gfp_t gfp_mask) { return 0; }
++#endif
+ void radix_tree_init(void);
+ void *radix_tree_tag_set(struct radix_tree_root *root,
+ unsigned long index, unsigned int tag);
+@@ -303,7 +308,7 @@ unsigned long radix_tree_locate_item(str
+
+ static inline void radix_tree_preload_end(void)
+ {
+- preempt_enable();
++ preempt_enable_nort();
+ }
+
+ /**
+--- a/lib/radix-tree.c
++++ b/lib/radix-tree.c
+@@ -195,12 +195,13 @@ radix_tree_node_alloc(struct radix_tree_
+ * succeed in getting a node here (and never reach
+ * kmem_cache_alloc)
+ */
+- rtp = this_cpu_ptr(&radix_tree_preloads);
++ rtp = &get_cpu_var(radix_tree_preloads);
+ if (rtp->nr) {
+ ret = rtp->nodes[rtp->nr - 1];
+ rtp->nodes[rtp->nr - 1] = NULL;
+ rtp->nr--;
+ }
++ put_cpu_var(radix_tree_preloads);
+ /*
+ * Update the allocation stack trace as this is more useful
+ * for debugging.
+@@ -240,6 +241,7 @@ radix_tree_node_free(struct radix_tree_n
+ call_rcu(&node->rcu_head, radix_tree_node_rcu_free);
+ }
+
++#ifndef CONFIG_PREEMPT_RT_FULL
+ /*
+ * Load up this CPU's radix_tree_node buffer with sufficient objects to
+ * ensure that the addition of a single element in the tree cannot fail. On
+@@ -305,6 +307,7 @@ int radix_tree_maybe_preload(gfp_t gfp_m
+ return 0;
+ }
+ EXPORT_SYMBOL(radix_tree_maybe_preload);
++#endif
+
+ /*
+ * Return the maximum key which can be store into a
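
For reference, the usual preload pattern that motivates this change, as a
sketch (my_root, my_lock, index and item are placeholders): mainline relies on
radix_tree_preload() returning with preemption disabled until
radix_tree_preload_end(), which cannot work on RT because the lock taken in
between is a sleeping lock.

    if (radix_tree_preload(GFP_KERNEL))   /* with this patch: returns 0 on RT */
        return -ENOMEM;
    spin_lock(&my_lock);                  /* sleeping lock on RT */
    ret = radix_tree_insert(&my_root, index, item);
    spin_unlock(&my_lock);
    radix_tree_preload_end();             /* preempt_enable_nort(): nop on RT */
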
diff --git a/patches/random-make-it-work-on-rt.patch b/patches/random-make-it-work-on-rt.patch
new file mode 100644
index 00000000000000..8da86e0e7102d5
--- /dev/null
+++ b/patches/random-make-it-work-on-rt.patch
@@ -0,0 +1,115 @@
+Subject: random: Make it work on rt
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Tue, 21 Aug 2012 20:38:50 +0200
+
+Delegate the random insertion to the forced threaded interrupt
+handler. Store the return IP of the hard interrupt handler in the irq
+descriptor and feed it into the random generator as a source of
+entropy.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ drivers/char/random.c | 11 +++++------
+ include/linux/irqdesc.h | 1 +
+ include/linux/random.h | 2 +-
+ kernel/irq/handle.c | 8 +++++++-
+ kernel/irq/manage.c | 6 ++++++
+ 5 files changed, 20 insertions(+), 8 deletions(-)
+
+--- a/drivers/char/random.c
++++ b/drivers/char/random.c
+@@ -868,28 +868,27 @@ static __u32 get_reg(struct fast_pool *f
+ return *(ptr + f->reg_idx++);
+ }
+
+-void add_interrupt_randomness(int irq, int irq_flags)
++void add_interrupt_randomness(int irq, int irq_flags, __u64 ip)
+ {
+ struct entropy_store *r;
+ struct fast_pool *fast_pool = this_cpu_ptr(&irq_randomness);
+- struct pt_regs *regs = get_irq_regs();
+ unsigned long now = jiffies;
+ cycles_t cycles = random_get_entropy();
+ __u32 c_high, j_high;
+- __u64 ip;
+ unsigned long seed;
+ int credit = 0;
+
+ if (cycles == 0)
+- cycles = get_reg(fast_pool, regs);
++ cycles = get_reg(fast_pool, NULL);
+ c_high = (sizeof(cycles) > 4) ? cycles >> 32 : 0;
+ j_high = (sizeof(now) > 4) ? now >> 32 : 0;
+ fast_pool->pool[0] ^= cycles ^ j_high ^ irq;
+ fast_pool->pool[1] ^= now ^ c_high;
+- ip = regs ? instruction_pointer(regs) : _RET_IP_;
++ if (!ip)
++ ip = _RET_IP_;
+ fast_pool->pool[2] ^= ip;
+ fast_pool->pool[3] ^= (sizeof(ip) > 4) ? ip >> 32 :
+- get_reg(fast_pool, regs);
++ get_reg(fast_pool, NULL);
+
+ fast_mix(fast_pool);
+ add_interrupt_bench(cycles);
+--- a/include/linux/irqdesc.h
++++ b/include/linux/irqdesc.h
+@@ -63,6 +63,7 @@ struct irq_desc {
+ unsigned int irqs_unhandled;
+ atomic_t threads_handled;
+ int threads_handled_last;
++ u64 random_ip;
+ raw_spinlock_t lock;
+ struct cpumask *percpu_enabled;
+ #ifdef CONFIG_SMP
+--- a/include/linux/random.h
++++ b/include/linux/random.h
+@@ -11,7 +11,7 @@
+ extern void add_device_randomness(const void *, unsigned int);
+ extern void add_input_randomness(unsigned int type, unsigned int code,
+ unsigned int value);
+-extern void add_interrupt_randomness(int irq, int irq_flags);
++extern void add_interrupt_randomness(int irq, int irq_flags, __u64 ip);
+
+ extern void get_random_bytes(void *buf, int nbytes);
+ extern void get_random_bytes_arch(void *buf, int nbytes);
+--- a/kernel/irq/handle.c
++++ b/kernel/irq/handle.c
+@@ -133,6 +133,8 @@ void __irq_wake_thread(struct irq_desc *
+ irqreturn_t
+ handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action)
+ {
++ struct pt_regs *regs = get_irq_regs();
++ u64 ip = regs ? instruction_pointer(regs) : 0;
+ irqreturn_t retval = IRQ_NONE;
+ unsigned int flags = 0, irq = desc->irq_data.irq;
+
+@@ -173,7 +175,11 @@ handle_irq_event_percpu(struct irq_desc
+ action = action->next;
+ } while (action);
+
+- add_interrupt_randomness(irq, flags);
++#ifndef CONFIG_PREEMPT_RT_FULL
++ add_interrupt_randomness(irq, flags, ip);
++#else
++ desc->random_ip = ip;
++#endif
+
+ if (!noirqdebug)
+ note_interrupt(irq, desc, retval);
+--- a/kernel/irq/manage.c
++++ b/kernel/irq/manage.c
+@@ -991,6 +991,12 @@ static int irq_thread(void *data)
+ if (action_ret == IRQ_HANDLED)
+ atomic_inc(&desc->threads_handled);
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++ migrate_disable();
++ add_interrupt_randomness(action->irq, 0,
++ desc->random_ip ^ (unsigned long) action);
++ migrate_enable();
++#endif
+ wake_threads_waitq(desc);
+ }
+
diff --git a/patches/rcu-Eliminate-softirq-processing-from-rcutree.patch b/patches/rcu-Eliminate-softirq-processing-from-rcutree.patch
new file mode 100644
index 00000000000000..ff4adc3db39b3d
--- /dev/null
+++ b/patches/rcu-Eliminate-softirq-processing-from-rcutree.patch
@@ -0,0 +1,422 @@
+From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
+Date: Mon, 4 Nov 2013 13:21:10 -0800
+Subject: rcu: Eliminate softirq processing from rcutree
+
+Running RCU out of softirq is a problem for some workloads that would
+like to manage RCU core processing independently of other softirq work,
+for example, setting kthread priority. This commit therefore moves the
+RCU core work from softirq to a per-CPU/per-flavor SCHED_OTHER kthread
+named rcuc. The SCHED_OTHER approach avoids the scalability problems
+that appeared with the earlier attempt to move RCU core processing
+from softirq to kthreads. That said, kernels built with RCU_BOOST=y
+will run the rcuc kthreads at the RCU-boosting priority.
+
+Reported-by: Thomas Gleixner <tglx@linutronix.de>
+Tested-by: Mike Galbraith <bitbucket@online.de>
+Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ kernel/rcu/tree.c | 110 +++++++++++++++++++++++++++++++++---
+ kernel/rcu/tree.h | 5 -
+ kernel/rcu/tree_plugin.h | 141 +++++------------------------------------------
+ 3 files changed, 116 insertions(+), 140 deletions(-)
+
+--- a/kernel/rcu/tree.c
++++ b/kernel/rcu/tree.c
+@@ -56,6 +56,11 @@
+ #include <linux/random.h>
+ #include <linux/ftrace_event.h>
+ #include <linux/suspend.h>
++#include <linux/delay.h>
++#include <linux/gfp.h>
++#include <linux/oom.h>
++#include <linux/smpboot.h>
++#include "../time/tick-internal.h"
+
+ #include "tree.h"
+ #include "rcu.h"
+@@ -2882,18 +2887,17 @@ static void
+ /*
+ * Do RCU core processing for the current CPU.
+ */
+-static void rcu_process_callbacks(struct softirq_action *unused)
++static void rcu_process_callbacks(void)
+ {
+ struct rcu_state *rsp;
+
+ if (cpu_is_offline(smp_processor_id()))
+ return;
+- trace_rcu_utilization(TPS("Start RCU core"));
+ for_each_rcu_flavor(rsp)
+ __rcu_process_callbacks(rsp);
+- trace_rcu_utilization(TPS("End RCU core"));
+ }
+
++static DEFINE_PER_CPU(struct task_struct *, rcu_cpu_kthread_task);
+ /*
+ * Schedule RCU callback invocation. If the specified type of RCU
+ * does not support RCU priority boosting, just do a direct call,
+@@ -2905,18 +2909,105 @@ static void invoke_rcu_callbacks(struct
+ {
+ if (unlikely(!ACCESS_ONCE(rcu_scheduler_fully_active)))
+ return;
+- if (likely(!rsp->boost)) {
+- rcu_do_batch(rsp, rdp);
++ rcu_do_batch(rsp, rdp);
++}
++
++static void rcu_wake_cond(struct task_struct *t, int status)
++{
++ /*
++ * If the thread is yielding, only wake it when this
++ * is invoked from idle
++ */
++ if (t && (status != RCU_KTHREAD_YIELDING || is_idle_task(current)))
++ wake_up_process(t);
++}
++
++/*
++ * Wake up this CPU's rcuc kthread to do RCU core processing.
++ */
++static void invoke_rcu_core(void)
++{
++ unsigned long flags;
++ struct task_struct *t;
++
++ if (!cpu_online(smp_processor_id()))
+ return;
++ local_irq_save(flags);
++ __this_cpu_write(rcu_cpu_has_work, 1);
++ t = __this_cpu_read(rcu_cpu_kthread_task);
++ if (t != NULL && current != t)
++ rcu_wake_cond(t, __this_cpu_read(rcu_cpu_kthread_status));
++ local_irq_restore(flags);
++}
++
++static void rcu_cpu_kthread_park(unsigned int cpu)
++{
++ per_cpu(rcu_cpu_kthread_status, cpu) = RCU_KTHREAD_OFFCPU;
++}
++
++static int rcu_cpu_kthread_should_run(unsigned int cpu)
++{
++ return __this_cpu_read(rcu_cpu_has_work);
++}
++
++/*
++ * Per-CPU kernel thread that invokes RCU callbacks. This replaces the
++ * RCU softirq used in flavors and configurations of RCU that do not
++ * support RCU priority boosting.
++ */
++static void rcu_cpu_kthread(unsigned int cpu)
++{
++ unsigned int *statusp = this_cpu_ptr(&rcu_cpu_kthread_status);
++ char work, *workp = this_cpu_ptr(&rcu_cpu_has_work);
++ int spincnt;
++
++ for (spincnt = 0; spincnt < 10; spincnt++) {
++ trace_rcu_utilization(TPS("Start CPU kthread@rcu_wait"));
++ local_bh_disable();
++ *statusp = RCU_KTHREAD_RUNNING;
++ this_cpu_inc(rcu_cpu_kthread_loops);
++ local_irq_disable();
++ work = *workp;
++ *workp = 0;
++ local_irq_enable();
++ if (work)
++ rcu_process_callbacks();
++ local_bh_enable();
++ if (*workp == 0) {
++ trace_rcu_utilization(TPS("End CPU kthread@rcu_wait"));
++ *statusp = RCU_KTHREAD_WAITING;
++ return;
++ }
+ }
+- invoke_rcu_callbacks_kthread();
++ *statusp = RCU_KTHREAD_YIELDING;
++ trace_rcu_utilization(TPS("Start CPU kthread@rcu_yield"));
++ schedule_timeout_interruptible(2);
++ trace_rcu_utilization(TPS("End CPU kthread@rcu_yield"));
++ *statusp = RCU_KTHREAD_WAITING;
+ }
+
+-static void invoke_rcu_core(void)
++static struct smp_hotplug_thread rcu_cpu_thread_spec = {
++ .store = &rcu_cpu_kthread_task,
++ .thread_should_run = rcu_cpu_kthread_should_run,
++ .thread_fn = rcu_cpu_kthread,
++ .thread_comm = "rcuc/%u",
++ .setup = rcu_cpu_kthread_setup,
++ .park = rcu_cpu_kthread_park,
++};
++
++/*
++ * Spawn per-CPU RCU core processing kthreads.
++ */
++static int __init rcu_spawn_core_kthreads(void)
+ {
+- if (cpu_online(smp_processor_id()))
+- raise_softirq(RCU_SOFTIRQ);
++ int cpu;
++
++ for_each_possible_cpu(cpu)
++ per_cpu(rcu_cpu_has_work, cpu) = 0;
++ BUG_ON(smpboot_register_percpu_thread(&rcu_cpu_thread_spec));
++ return 0;
+ }
++early_initcall(rcu_spawn_core_kthreads);
+
+ /*
+ * Handle any core-RCU processing required by a call_rcu() invocation.
+@@ -4148,7 +4239,6 @@ void __init rcu_init(void)
+ rcu_init_one(&rcu_bh_state, &rcu_bh_data);
+ rcu_init_one(&rcu_sched_state, &rcu_sched_data);
+ __rcu_init_preempt();
+- open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
+
+ /*
+ * We don't need protection against CPU-hotplug here because
+--- a/kernel/rcu/tree.h
++++ b/kernel/rcu/tree.h
+@@ -530,12 +530,10 @@ extern struct rcu_state rcu_preempt_stat
+ DECLARE_PER_CPU(struct rcu_data, rcu_preempt_data);
+ #endif /* #ifdef CONFIG_PREEMPT_RCU */
+
+-#ifdef CONFIG_RCU_BOOST
+ DECLARE_PER_CPU(unsigned int, rcu_cpu_kthread_status);
+ DECLARE_PER_CPU(int, rcu_cpu_kthread_cpu);
+ DECLARE_PER_CPU(unsigned int, rcu_cpu_kthread_loops);
+ DECLARE_PER_CPU(char, rcu_cpu_has_work);
+-#endif /* #ifdef CONFIG_RCU_BOOST */
+
+ #ifndef RCU_TREE_NONCORE
+
+@@ -554,10 +552,9 @@ void call_rcu(struct rcu_head *head, voi
+ static void __init __rcu_init_preempt(void);
+ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags);
+ static void rcu_preempt_boost_start_gp(struct rcu_node *rnp);
+-static void invoke_rcu_callbacks_kthread(void);
+ static bool rcu_is_callbacks_kthread(void);
++static void rcu_cpu_kthread_setup(unsigned int cpu);
+ #ifdef CONFIG_RCU_BOOST
+-static void rcu_preempt_do_callbacks(void);
+ static int rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
+ struct rcu_node *rnp);
+ #endif /* #ifdef CONFIG_RCU_BOOST */
+--- a/kernel/rcu/tree_plugin.h
++++ b/kernel/rcu/tree_plugin.h
+@@ -24,28 +24,20 @@
+ * Paul E. McKenney <paulmck@linux.vnet.ibm.com>
+ */
+
+-#include <linux/delay.h>
+-#include <linux/gfp.h>
+-#include <linux/oom.h>
+-#include <linux/smpboot.h>
+-#include <linux/jiffies.h>
+-#include "../time/tick-internal.h"
+-
+ #ifdef CONFIG_RCU_BOOST
+
+ #include "../locking/rtmutex_common.h"
+
++#endif /* #ifdef CONFIG_RCU_BOOST */
++
+ /*
+ * Control variables for per-CPU and per-rcu_node kthreads. These
+ * handle all flavors of RCU.
+ */
+-static DEFINE_PER_CPU(struct task_struct *, rcu_cpu_kthread_task);
+ DEFINE_PER_CPU(unsigned int, rcu_cpu_kthread_status);
+ DEFINE_PER_CPU(unsigned int, rcu_cpu_kthread_loops);
+ DEFINE_PER_CPU(char, rcu_cpu_has_work);
+
+-#endif /* #ifdef CONFIG_RCU_BOOST */
+-
+ #ifdef CONFIG_RCU_NOCB_CPU
+ static cpumask_var_t rcu_nocb_mask; /* CPUs to have callbacks offloaded. */
+ static bool have_rcu_nocb_mask; /* Was rcu_nocb_mask allocated? */
+@@ -497,15 +489,6 @@ static void rcu_preempt_check_callbacks(
+ t->rcu_read_unlock_special.b.need_qs = true;
+ }
+
+-#ifdef CONFIG_RCU_BOOST
+-
+-static void rcu_preempt_do_callbacks(void)
+-{
+- rcu_do_batch(&rcu_preempt_state, this_cpu_ptr(&rcu_preempt_data));
+-}
+-
+-#endif /* #ifdef CONFIG_RCU_BOOST */
+-
+ /*
+ * Queue a preemptible-RCU callback for invocation after a grace period.
+ */
+@@ -940,6 +923,19 @@ void exit_rcu(void)
+
+ #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
+
++/*
++ * If boosting, set rcuc kthreads to realtime priority.
++ */
++static void rcu_cpu_kthread_setup(unsigned int cpu)
++{
++#ifdef CONFIG_RCU_BOOST
++ struct sched_param sp;
++
++ sp.sched_priority = kthread_prio;
++ sched_setscheduler_nocheck(current, SCHED_FIFO, &sp);
++#endif /* #ifdef CONFIG_RCU_BOOST */
++}
++
+ #ifdef CONFIG_RCU_BOOST
+
+ #include "../locking/rtmutex_common.h"
+@@ -971,16 +967,6 @@ static void rcu_initiate_boost_trace(str
+
+ #endif /* #else #ifdef CONFIG_RCU_TRACE */
+
+-static void rcu_wake_cond(struct task_struct *t, int status)
+-{
+- /*
+- * If the thread is yielding, only wake it when this
+- * is invoked from idle
+- */
+- if (status != RCU_KTHREAD_YIELDING || is_idle_task(current))
+- wake_up_process(t);
+-}
+-
+ /*
+ * Carry out RCU priority boosting on the task indicated by ->exp_tasks
+ * or ->boost_tasks, advancing the pointer to the next task in the
+@@ -1126,23 +1112,6 @@ static void rcu_initiate_boost(struct rc
+ }
+
+ /*
+- * Wake up the per-CPU kthread to invoke RCU callbacks.
+- */
+-static void invoke_rcu_callbacks_kthread(void)
+-{
+- unsigned long flags;
+-
+- local_irq_save(flags);
+- __this_cpu_write(rcu_cpu_has_work, 1);
+- if (__this_cpu_read(rcu_cpu_kthread_task) != NULL &&
+- current != __this_cpu_read(rcu_cpu_kthread_task)) {
+- rcu_wake_cond(__this_cpu_read(rcu_cpu_kthread_task),
+- __this_cpu_read(rcu_cpu_kthread_status));
+- }
+- local_irq_restore(flags);
+-}
+-
+-/*
+ * Is the current CPU running the RCU-callbacks kthread?
+ * Caller must have preemption disabled.
+ */
+@@ -1197,67 +1166,6 @@ static int rcu_spawn_one_boost_kthread(s
+ return 0;
+ }
+
+-static void rcu_kthread_do_work(void)
+-{
+- rcu_do_batch(&rcu_sched_state, this_cpu_ptr(&rcu_sched_data));
+- rcu_do_batch(&rcu_bh_state, this_cpu_ptr(&rcu_bh_data));
+- rcu_preempt_do_callbacks();
+-}
+-
+-static void rcu_cpu_kthread_setup(unsigned int cpu)
+-{
+- struct sched_param sp;
+-
+- sp.sched_priority = kthread_prio;
+- sched_setscheduler_nocheck(current, SCHED_FIFO, &sp);
+-}
+-
+-static void rcu_cpu_kthread_park(unsigned int cpu)
+-{
+- per_cpu(rcu_cpu_kthread_status, cpu) = RCU_KTHREAD_OFFCPU;
+-}
+-
+-static int rcu_cpu_kthread_should_run(unsigned int cpu)
+-{
+- return __this_cpu_read(rcu_cpu_has_work);
+-}
+-
+-/*
+- * Per-CPU kernel thread that invokes RCU callbacks. This replaces the
+- * RCU softirq used in flavors and configurations of RCU that do not
+- * support RCU priority boosting.
+- */
+-static void rcu_cpu_kthread(unsigned int cpu)
+-{
+- unsigned int *statusp = this_cpu_ptr(&rcu_cpu_kthread_status);
+- char work, *workp = this_cpu_ptr(&rcu_cpu_has_work);
+- int spincnt;
+-
+- for (spincnt = 0; spincnt < 10; spincnt++) {
+- trace_rcu_utilization(TPS("Start CPU kthread@rcu_wait"));
+- local_bh_disable();
+- *statusp = RCU_KTHREAD_RUNNING;
+- this_cpu_inc(rcu_cpu_kthread_loops);
+- local_irq_disable();
+- work = *workp;
+- *workp = 0;
+- local_irq_enable();
+- if (work)
+- rcu_kthread_do_work();
+- local_bh_enable();
+- if (*workp == 0) {
+- trace_rcu_utilization(TPS("End CPU kthread@rcu_wait"));
+- *statusp = RCU_KTHREAD_WAITING;
+- return;
+- }
+- }
+- *statusp = RCU_KTHREAD_YIELDING;
+- trace_rcu_utilization(TPS("Start CPU kthread@rcu_yield"));
+- schedule_timeout_interruptible(2);
+- trace_rcu_utilization(TPS("End CPU kthread@rcu_yield"));
+- *statusp = RCU_KTHREAD_WAITING;
+-}
+-
+ /*
+ * Set the per-rcu_node kthread's affinity to cover all CPUs that are
+ * served by the rcu_node in question. The CPU hotplug lock is still
+@@ -1287,26 +1195,12 @@ static void rcu_boost_kthread_setaffinit
+ free_cpumask_var(cm);
+ }
+
+-static struct smp_hotplug_thread rcu_cpu_thread_spec = {
+- .store = &rcu_cpu_kthread_task,
+- .thread_should_run = rcu_cpu_kthread_should_run,
+- .thread_fn = rcu_cpu_kthread,
+- .thread_comm = "rcuc/%u",
+- .setup = rcu_cpu_kthread_setup,
+- .park = rcu_cpu_kthread_park,
+-};
+-
+ /*
+ * Spawn boost kthreads -- called as soon as the scheduler is running.
+ */
+ static void __init rcu_spawn_boost_kthreads(void)
+ {
+ struct rcu_node *rnp;
+- int cpu;
+-
+- for_each_possible_cpu(cpu)
+- per_cpu(rcu_cpu_has_work, cpu) = 0;
+- BUG_ON(smpboot_register_percpu_thread(&rcu_cpu_thread_spec));
+ rcu_for_each_leaf_node(rcu_state_p, rnp)
+ (void)rcu_spawn_one_boost_kthread(rcu_state_p, rnp);
+ }
+@@ -1329,11 +1223,6 @@ static void rcu_initiate_boost(struct rc
+ raw_spin_unlock_irqrestore(&rnp->lock, flags);
+ }
+
+-static void invoke_rcu_callbacks_kthread(void)
+-{
+- WARN_ON_ONCE(1);
+-}
+-
+ static bool rcu_is_callbacks_kthread(void)
+ {
+ return false;
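
The hunks above move RCU core callback processing out of RCU_SOFTIRQ and into
per-CPU "rcuc/%u" kthreads registered through the smpboot infrastructure. As a
minimal, hedged sketch of that smpboot pattern (not the patch itself; every
demo_* name is hypothetical, only the smpboot API is the one used above):

#include <linux/init.h>
#include <linux/percpu.h>
#include <linux/sched.h>
#include <linux/smpboot.h>

static DEFINE_PER_CPU(struct task_struct *, demo_thread);
static DEFINE_PER_CPU(unsigned int, demo_has_work);

static int demo_thread_should_run(unsigned int cpu)
{
	/* Non-zero tells smpboot to run demo_thread_fn() on this CPU. */
	return __this_cpu_read(demo_has_work);
}

static void demo_thread_fn(unsigned int cpu)
{
	/* Runs in the per-CPU kthread, preemptible, bound to @cpu. */
	__this_cpu_write(demo_has_work, 0);
	/* ... process the per-CPU work here ... */
}

static struct smp_hotplug_thread demo_thread_spec = {
	.store			= &demo_thread,
	.thread_should_run	= demo_thread_should_run,
	.thread_fn		= demo_thread_fn,
	.thread_comm		= "demo/%u",
};

static int __init demo_spawn_threads(void)
{
	return smpboot_register_percpu_thread(&demo_thread_spec);
}
early_initcall(demo_spawn_threads);

The rcuc threads additionally raise themselves to SCHED_FIFO in the .setup hook
(rcu_cpu_kthread_setup(), only when RCU_BOOST is enabled) and flag work via the
per-CPU rcu_cpu_has_work variable, as shown in the hunks above.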
diff --git a/patches/rcu-disable-rcu-fast-no-hz-on-rt.patch b/patches/rcu-disable-rcu-fast-no-hz-on-rt.patch
new file mode 100644
index 00000000000000..12053fabce9938
--- /dev/null
+++ b/patches/rcu-disable-rcu-fast-no-hz-on-rt.patch
@@ -0,0 +1,24 @@
+Subject: rcu: Disable RCU_FAST_NO_HZ on RT
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Sun, 28 Oct 2012 13:26:09 +0000
+
+This uses a timer_list timer from the irq disabled guts of the idle
+code. Disable it for now to prevent wreckage.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ init/Kconfig | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/init/Kconfig
++++ b/init/Kconfig
+@@ -637,7 +637,7 @@ config RCU_FANOUT_EXACT
+
+ config RCU_FAST_NO_HZ
+ bool "Accelerate last non-dyntick-idle CPU's grace periods"
+- depends on NO_HZ_COMMON && SMP
++ depends on NO_HZ_COMMON && SMP && !PREEMPT_RT_FULL
+ default n
+ help
+ This option permits CPUs to enter dynticks-idle state even if
diff --git a/patches/rcu-make-RCU_BOOST-default-on-RT.patch b/patches/rcu-make-RCU_BOOST-default-on-RT.patch
new file mode 100644
index 00000000000000..71fa55a7fa7769
--- /dev/null
+++ b/patches/rcu-make-RCU_BOOST-default-on-RT.patch
@@ -0,0 +1,26 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Fri, 21 Mar 2014 20:19:05 +0100
+Subject: rcu: make RCU_BOOST default on RT
+
+Since it is no longer invoked from the softirq, people run into OOM more
+often if the priority of the RCU thread is too low. Making boosting the
+default on RT should help in those cases, and it can be switched off if
+someone knows better.
+
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ init/Kconfig | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/init/Kconfig
++++ b/init/Kconfig
+@@ -664,7 +664,7 @@ config TREE_RCU_TRACE
+ config RCU_BOOST
+ bool "Enable RCU priority boosting"
+ depends on RT_MUTEXES && PREEMPT_RCU
+- default n
++ default y if PREEMPT_RT_FULL
+ help
+ This option boosts the priority of preempted RCU readers that
+ block the current preemptible RCU grace period for too long.
diff --git a/patches/rcu-merge-rcu-bh-into-rcu-preempt-for-rt.patch b/patches/rcu-merge-rcu-bh-into-rcu-preempt-for-rt.patch
new file mode 100644
index 00000000000000..1d3b4a2c4fc12a
--- /dev/null
+++ b/patches/rcu-merge-rcu-bh-into-rcu-preempt-for-rt.patch
@@ -0,0 +1,271 @@
+Subject: rcu: Merge RCU-bh into RCU-preempt
+Date: Wed, 5 Oct 2011 11:59:38 -0700
+From: Thomas Gleixner <tglx@linutronix.de>
+
+The Linux kernel has long RCU-bh read-side critical sections that
+intolerably increase scheduling latency under mainline's RCU-bh rules,
+which include RCU-bh read-side critical sections being non-preemptible.
+This patch therefore arranges for RCU-bh to be implemented in terms of
+RCU-preempt for CONFIG_PREEMPT_RT_FULL=y.
+
+This has the downside of defeating the purpose of RCU-bh, namely,
+handling the case where the system is subjected to a network-based
+denial-of-service attack that keeps at least one CPU doing full-time
+softirq processing. This issue will be fixed by a later commit.
+
+The current commit will need some work to make it appropriate for
+mainline use, for example, it needs to be extended to cover Tiny RCU.
+
+[ paulmck: Added a useful changelog ]
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
+Link: http://lkml.kernel.org/r/20111005185938.GA20403@linux.vnet.ibm.com
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ include/linux/rcupdate.h | 25 +++++++++++++++++++++++++
+ include/linux/rcutree.h | 18 ++++++++++++++++--
+ kernel/rcu/tree.c | 16 ++++++++++++++++
+ kernel/rcu/update.c | 2 ++
+ 4 files changed, 59 insertions(+), 2 deletions(-)
+
+--- a/include/linux/rcupdate.h
++++ b/include/linux/rcupdate.h
+@@ -167,6 +167,9 @@ void call_rcu(struct rcu_head *head,
+
+ #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++#define call_rcu_bh call_rcu
++#else
+ /**
+ * call_rcu_bh() - Queue an RCU for invocation after a quicker grace period.
+ * @head: structure to be used for queueing the RCU updates.
+@@ -190,6 +193,7 @@ void call_rcu(struct rcu_head *head,
+ */
+ void call_rcu_bh(struct rcu_head *head,
+ void (*func)(struct rcu_head *head));
++#endif
+
+ /**
+ * call_rcu_sched() - Queue an RCU for invocation after sched grace period.
+@@ -296,7 +300,13 @@ static inline int rcu_preempt_depth(void
+ void rcu_init(void);
+ void rcu_end_inkernel_boot(void);
+ void rcu_sched_qs(void);
++
++#ifdef CONFIG_PREEMPT_RT_FULL
++static inline void rcu_bh_qs(void) { }
++#else
+ void rcu_bh_qs(void);
++#endif
++
+ void rcu_check_callbacks(int user);
+ struct notifier_block;
+ void rcu_idle_enter(void);
+@@ -470,7 +480,14 @@ extern struct lockdep_map rcu_callback_m
+ int debug_lockdep_rcu_enabled(void);
+
+ int rcu_read_lock_held(void);
++#ifdef CONFIG_PREEMPT_RT_FULL
++static inline int rcu_read_lock_bh_held(void)
++{
++ return rcu_read_lock_held();
++}
++#else
+ int rcu_read_lock_bh_held(void);
++#endif
+
+ /**
+ * rcu_read_lock_sched_held() - might we be in RCU-sched read-side critical section?
+@@ -997,10 +1014,14 @@ static inline void rcu_read_unlock(void)
+ static inline void rcu_read_lock_bh(void)
+ {
+ local_bh_disable();
++#ifdef CONFIG_PREEMPT_RT_FULL
++ rcu_read_lock();
++#else
+ __acquire(RCU_BH);
+ rcu_lock_acquire(&rcu_bh_lock_map);
+ rcu_lockdep_assert(rcu_is_watching(),
+ "rcu_read_lock_bh() used illegally while idle");
++#endif
+ }
+
+ /*
+@@ -1010,10 +1031,14 @@ static inline void rcu_read_lock_bh(void
+ */
+ static inline void rcu_read_unlock_bh(void)
+ {
++#ifdef CONFIG_PREEMPT_RT_FULL
++ rcu_read_unlock();
++#else
+ rcu_lockdep_assert(rcu_is_watching(),
+ "rcu_read_unlock_bh() used illegally while idle");
+ rcu_lock_release(&rcu_bh_lock_map);
+ __release(RCU_BH);
++#endif
+ local_bh_enable();
+ }
+
+--- a/include/linux/rcutree.h
++++ b/include/linux/rcutree.h
+@@ -46,7 +46,11 @@ static inline void rcu_virt_note_context
+ rcu_note_context_switch();
+ }
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++# define synchronize_rcu_bh synchronize_rcu
++#else
+ void synchronize_rcu_bh(void);
++#endif
+ void synchronize_sched_expedited(void);
+ void synchronize_rcu_expedited(void);
+
+@@ -74,7 +78,11 @@ static inline void synchronize_rcu_bh_ex
+ }
+
+ void rcu_barrier(void);
++#ifdef CONFIG_PREEMPT_RT_FULL
++# define rcu_barrier_bh rcu_barrier
++#else
+ void rcu_barrier_bh(void);
++#endif
+ void rcu_barrier_sched(void);
+ unsigned long get_state_synchronize_rcu(void);
+ void cond_synchronize_rcu(unsigned long oldstate);
+@@ -85,12 +93,10 @@ unsigned long rcu_batches_started(void);
+ unsigned long rcu_batches_started_bh(void);
+ unsigned long rcu_batches_started_sched(void);
+ unsigned long rcu_batches_completed(void);
+-unsigned long rcu_batches_completed_bh(void);
+ unsigned long rcu_batches_completed_sched(void);
+ void show_rcu_gp_kthreads(void);
+
+ void rcu_force_quiescent_state(void);
+-void rcu_bh_force_quiescent_state(void);
+ void rcu_sched_force_quiescent_state(void);
+
+ void exit_rcu(void);
+@@ -100,6 +106,14 @@ extern int rcu_scheduler_active __read_m
+
+ bool rcu_is_watching(void);
+
++#ifndef CONFIG_PREEMPT_RT_FULL
++void rcu_bh_force_quiescent_state(void);
++unsigned long rcu_batches_completed_bh(void);
++#else
++# define rcu_bh_force_quiescent_state rcu_force_quiescent_state
++# define rcu_batches_completed_bh rcu_batches_completed
++#endif
++
+ void rcu_all_qs(void);
+
+ #endif /* __LINUX_RCUTREE_H */
+--- a/kernel/rcu/tree.c
++++ b/kernel/rcu/tree.c
+@@ -220,6 +220,7 @@ void rcu_sched_qs(void)
+ }
+ }
+
++#ifndef CONFIG_PREEMPT_RT_FULL
+ void rcu_bh_qs(void)
+ {
+ if (!__this_cpu_read(rcu_bh_data.passed_quiesce)) {
+@@ -229,6 +230,7 @@ void rcu_bh_qs(void)
+ __this_cpu_write(rcu_bh_data.passed_quiesce, 1);
+ }
+ }
++#endif
+
+ static DEFINE_PER_CPU(int, rcu_sched_qs_mask);
+
+@@ -404,6 +406,7 @@ unsigned long rcu_batches_completed_sche
+ }
+ EXPORT_SYMBOL_GPL(rcu_batches_completed_sched);
+
++#ifndef CONFIG_PREEMPT_RT_FULL
+ /*
+ * Return the number of RCU BH batches completed thus far for debug & stats.
+ */
+@@ -431,6 +434,13 @@ void rcu_bh_force_quiescent_state(void)
+ }
+ EXPORT_SYMBOL_GPL(rcu_bh_force_quiescent_state);
+
++#else
++void rcu_force_quiescent_state(void)
++{
++}
++EXPORT_SYMBOL_GPL(rcu_force_quiescent_state);
++#endif
++
+ /*
+ * Force a quiescent state for RCU-sched.
+ */
+@@ -3040,6 +3050,7 @@ void call_rcu_sched(struct rcu_head *hea
+ }
+ EXPORT_SYMBOL_GPL(call_rcu_sched);
+
++#ifndef CONFIG_PREEMPT_RT_FULL
+ /*
+ * Queue an RCU callback for invocation after a quicker grace period.
+ */
+@@ -3048,6 +3059,7 @@ void call_rcu_bh(struct rcu_head *head,
+ __call_rcu(head, func, &rcu_bh_state, -1, 0);
+ }
+ EXPORT_SYMBOL_GPL(call_rcu_bh);
++#endif
+
+ /*
+ * Queue an RCU callback for lazy invocation after a grace period.
+@@ -3139,6 +3151,7 @@ void synchronize_sched(void)
+ }
+ EXPORT_SYMBOL_GPL(synchronize_sched);
+
++#ifndef CONFIG_PREEMPT_RT_FULL
+ /**
+ * synchronize_rcu_bh - wait until an rcu_bh grace period has elapsed.
+ *
+@@ -3165,6 +3178,7 @@ void synchronize_rcu_bh(void)
+ wait_rcu_gp(call_rcu_bh);
+ }
+ EXPORT_SYMBOL_GPL(synchronize_rcu_bh);
++#endif
+
+ /**
+ * get_state_synchronize_rcu - Snapshot current RCU state
+@@ -3677,6 +3691,7 @@ static void _rcu_barrier(struct rcu_stat
+ mutex_unlock(&rsp->barrier_mutex);
+ }
+
++#ifndef CONFIG_PREEMPT_RT_FULL
+ /**
+ * rcu_barrier_bh - Wait until all in-flight call_rcu_bh() callbacks complete.
+ */
+@@ -3685,6 +3700,7 @@ void rcu_barrier_bh(void)
+ _rcu_barrier(&rcu_bh_state);
+ }
+ EXPORT_SYMBOL_GPL(rcu_barrier_bh);
++#endif
+
+ /**
+ * rcu_barrier_sched - Wait for in-flight call_rcu_sched() callbacks.
+--- a/kernel/rcu/update.c
++++ b/kernel/rcu/update.c
+@@ -227,6 +227,7 @@ int rcu_read_lock_held(void)
+ }
+ EXPORT_SYMBOL_GPL(rcu_read_lock_held);
+
++#ifndef CONFIG_PREEMPT_RT_FULL
+ /**
+ * rcu_read_lock_bh_held() - might we be in RCU-bh read-side critical section?
+ *
+@@ -253,6 +254,7 @@ int rcu_read_lock_bh_held(void)
+ return in_softirq() || irqs_disabled();
+ }
+ EXPORT_SYMBOL_GPL(rcu_read_lock_bh_held);
++#endif
+
+ #endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
+
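
For callers, the mapping above is source-compatible: code written against the
RCU-bh API keeps compiling and working, it simply ends up in preemptible RCU
when CONFIG_PREEMPT_RT_FULL=y. A hedged sketch under that assumption (all
demo_* names are hypothetical; the RCU calls are the stock API):

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct demo_entry {
	int value;
	struct rcu_head rcu;
};

static struct demo_entry __rcu *demo_ptr;

static void demo_free_cb(struct rcu_head *head)
{
	kfree(container_of(head, struct demo_entry, rcu));
}

static int demo_read(void)
{
	struct demo_entry *e;
	int val = -1;

	rcu_read_lock_bh();		/* plain rcu_read_lock() on RT */
	e = rcu_dereference_bh(demo_ptr);
	if (e)
		val = e->value;
	rcu_read_unlock_bh();
	return val;
}

static void demo_replace(int value)
{
	struct demo_entry *new, *old;

	new = kmalloc(sizeof(*new), GFP_KERNEL);
	if (!new)
		return;
	new->value = value;
	old = rcu_dereference_protected(demo_ptr, 1);
	rcu_assign_pointer(demo_ptr, new);
	if (old)
		call_rcu_bh(&old->rcu, demo_free_cb);	/* call_rcu() on RT */
}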
diff --git a/patches/rcu-more-swait-conversions.patch b/patches/rcu-more-swait-conversions.patch
new file mode 100644
index 00000000000000..3ec0dc8b7916e6
--- /dev/null
+++ b/patches/rcu-more-swait-conversions.patch
@@ -0,0 +1,174 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 31 Jul 2013 19:00:35 +0200
+Subject: rcu: use simple waitqueues
+
+Convert RCU's wait-queues into simple waitqueues.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+Merged Steven's
+
+ static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp) {
+- swait_wake(&rnp->nocb_gp_wq[rnp->completed & 0x1]);
++ wake_up_all(&rnp->nocb_gp_wq[rnp->completed & 0x1]);
+ }
+
+Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ kernel/rcu/tree.c | 8 ++++----
+ kernel/rcu/tree.h | 7 ++++---
+ kernel/rcu/tree_plugin.h | 18 +++++++++---------
+ 3 files changed, 17 insertions(+), 16 deletions(-)
+
+--- a/kernel/rcu/tree.c
++++ b/kernel/rcu/tree.c
+@@ -1567,7 +1567,7 @@ static void rcu_gp_kthread_wake(struct r
+ !ACCESS_ONCE(rsp->gp_flags) ||
+ !rsp->gp_kthread)
+ return;
+- wake_up(&rsp->gp_wq);
++ swait_wake(&rsp->gp_wq);
+ }
+
+ /*
+@@ -2008,7 +2008,7 @@ static int __noreturn rcu_gp_kthread(voi
+ ACCESS_ONCE(rsp->gpnum),
+ TPS("reqwait"));
+ rsp->gp_state = RCU_GP_WAIT_GPS;
+- wait_event_interruptible(rsp->gp_wq,
++ swait_event_interruptible(rsp->gp_wq,
+ ACCESS_ONCE(rsp->gp_flags) &
+ RCU_GP_FLAG_INIT);
+ /* Locking provides needed memory barrier. */
+@@ -2037,7 +2037,7 @@ static int __noreturn rcu_gp_kthread(voi
+ ACCESS_ONCE(rsp->gpnum),
+ TPS("fqswait"));
+ rsp->gp_state = RCU_GP_WAIT_FQS;
+- ret = wait_event_interruptible_timeout(rsp->gp_wq,
++ ret = swait_event_interruptible_timeout(rsp->gp_wq,
+ ((gf = ACCESS_ONCE(rsp->gp_flags)) &
+ RCU_GP_FLAG_FQS) ||
+ (!ACCESS_ONCE(rnp->qsmask) &&
+@@ -4049,7 +4049,7 @@ static void __init rcu_init_one(struct r
+ }
+ }
+
+- init_waitqueue_head(&rsp->gp_wq);
++ init_swait_head(&rsp->gp_wq);
+ rnp = rsp->level[rcu_num_lvls - 1];
+ for_each_possible_cpu(i) {
+ while (i > rnp->grphi)
+--- a/kernel/rcu/tree.h
++++ b/kernel/rcu/tree.h
+@@ -27,6 +27,7 @@
+ #include <linux/threads.h>
+ #include <linux/cpumask.h>
+ #include <linux/seqlock.h>
++#include <linux/wait-simple.h>
+
+ /*
+ * Define shape of hierarchy based on NR_CPUS, CONFIG_RCU_FANOUT, and
+@@ -210,7 +211,7 @@ struct rcu_node {
+ /* This can happen due to race conditions. */
+ #endif /* #ifdef CONFIG_RCU_BOOST */
+ #ifdef CONFIG_RCU_NOCB_CPU
+- wait_queue_head_t nocb_gp_wq[2];
++ struct swait_head nocb_gp_wq[2];
+ /* Place for rcu_nocb_kthread() to wait GP. */
+ #endif /* #ifdef CONFIG_RCU_NOCB_CPU */
+ int need_future_gp[2];
+@@ -349,7 +350,7 @@ struct rcu_data {
+ atomic_long_t nocb_q_count_lazy; /* invocation (all stages). */
+ struct rcu_head *nocb_follower_head; /* CBs ready to invoke. */
+ struct rcu_head **nocb_follower_tail;
+- wait_queue_head_t nocb_wq; /* For nocb kthreads to sleep on. */
++ struct swait_head nocb_wq; /* For nocb kthreads to sleep on. */
+ struct task_struct *nocb_kthread;
+ int nocb_defer_wakeup; /* Defer wakeup of nocb_kthread. */
+
+@@ -438,7 +439,7 @@ struct rcu_state {
+ unsigned long gpnum; /* Current gp number. */
+ unsigned long completed; /* # of last completed gp. */
+ struct task_struct *gp_kthread; /* Task for grace periods. */
+- wait_queue_head_t gp_wq; /* Where GP task waits. */
++ struct swait_head gp_wq; /* Where GP task waits. */
+ short gp_flags; /* Commands for GP task. */
+ short gp_state; /* GP kthread sleep state. */
+
+--- a/kernel/rcu/tree_plugin.h
++++ b/kernel/rcu/tree_plugin.h
+@@ -1864,7 +1864,7 @@ early_param("rcu_nocb_poll", parse_rcu_n
+ */
+ static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
+ {
+- wake_up_all(&rnp->nocb_gp_wq[rnp->completed & 0x1]);
++ swait_wake_all(&rnp->nocb_gp_wq[rnp->completed & 0x1]);
+ }
+
+ /*
+@@ -1882,8 +1882,8 @@ static void rcu_nocb_gp_set(struct rcu_n
+
+ static void rcu_init_one_nocb(struct rcu_node *rnp)
+ {
+- init_waitqueue_head(&rnp->nocb_gp_wq[0]);
+- init_waitqueue_head(&rnp->nocb_gp_wq[1]);
++ init_swait_head(&rnp->nocb_gp_wq[0]);
++ init_swait_head(&rnp->nocb_gp_wq[1]);
+ }
+
+ #ifndef CONFIG_RCU_NOCB_CPU_ALL
+@@ -1908,7 +1908,7 @@ static void wake_nocb_leader(struct rcu_
+ if (ACCESS_ONCE(rdp_leader->nocb_leader_sleep) || force) {
+ /* Prior smp_mb__after_atomic() orders against prior enqueue. */
+ ACCESS_ONCE(rdp_leader->nocb_leader_sleep) = false;
+- wake_up(&rdp_leader->nocb_wq);
++ swait_wake(&rdp_leader->nocb_wq);
+ }
+ }
+
+@@ -2121,7 +2121,7 @@ static void rcu_nocb_wait_gp(struct rcu_
+ */
+ trace_rcu_future_gp(rnp, rdp, c, TPS("StartWait"));
+ for (;;) {
+- wait_event_interruptible(
++ swait_event_interruptible(
+ rnp->nocb_gp_wq[c & 0x1],
+ (d = ULONG_CMP_GE(ACCESS_ONCE(rnp->completed), c)));
+ if (likely(d))
+@@ -2149,7 +2149,7 @@ static void nocb_leader_wait(struct rcu_
+ /* Wait for callbacks to appear. */
+ if (!rcu_nocb_poll) {
+ trace_rcu_nocb_wake(my_rdp->rsp->name, my_rdp->cpu, "Sleep");
+- wait_event_interruptible(my_rdp->nocb_wq,
++ swait_event_interruptible(my_rdp->nocb_wq,
+ !ACCESS_ONCE(my_rdp->nocb_leader_sleep));
+ /* Memory barrier handled by smp_mb() calls below and repoll. */
+ } else if (firsttime) {
+@@ -2224,7 +2224,7 @@ static void nocb_leader_wait(struct rcu_
+ * List was empty, wake up the follower.
+ * Memory barriers supplied by atomic_long_add().
+ */
+- wake_up(&rdp->nocb_wq);
++ swait_wake(&rdp->nocb_wq);
+ }
+ }
+
+@@ -2245,7 +2245,7 @@ static void nocb_follower_wait(struct rc
+ if (!rcu_nocb_poll) {
+ trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu,
+ "FollowerSleep");
+- wait_event_interruptible(rdp->nocb_wq,
++ swait_event_interruptible(rdp->nocb_wq,
+ ACCESS_ONCE(rdp->nocb_follower_head));
+ } else if (firsttime) {
+ /* Don't drown trace log with "Poll"! */
+@@ -2404,7 +2404,7 @@ void __init rcu_init_nohz(void)
+ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
+ {
+ rdp->nocb_tail = &rdp->nocb_head;
+- init_waitqueue_head(&rdp->nocb_wq);
++ init_swait_head(&rdp->nocb_wq);
+ rdp->nocb_follower_tail = &rdp->nocb_follower_head;
+ }
+
diff --git a/patches/rcutree-rcu_bh_qs-disable-irq-while-calling-rcu_pree.patch b/patches/rcutree-rcu_bh_qs-disable-irq-while-calling-rcu_pree.patch
new file mode 100644
index 00000000000000..b1c160a14230c9
--- /dev/null
+++ b/patches/rcutree-rcu_bh_qs-disable-irq-while-calling-rcu_pree.patch
@@ -0,0 +1,48 @@
+From: Tiejun Chen <tiejun.chen@windriver.com>
+Date: Wed, 18 Dec 2013 17:51:49 +0800
+Subject: rcutree/rcu_bh_qs: Disable irq while calling rcu_preempt_qs()
+
+Any callers to the function rcu_preempt_qs() must disable irqs in
+order to protect the assignment to ->rcu_read_unlock_special. In the
+RT case, rcu_bh_qs(), as the wrapper of rcu_preempt_qs(), is called
+in some scenarios where irqs are enabled, like this path:
+
+do_single_softirq()
+ |
+ + local_irq_enable();
+ + handle_softirq()
+ | |
+ | + rcu_bh_qs()
+ | |
+ | + rcu_preempt_qs()
+ |
+ + local_irq_disable()
+
+So disable irqs directly inside of rcu_bh_qs() to fix this; otherwise
+the kernel may occasionally freeze, as observed. This also keeps any
+potential future callers of rcu_bh_qs() safe.
+
+
+Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
+Signed-off-by: Bin Jiang <bin.jiang@windriver.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ kernel/rcu/tree.c | 5 +++++
+ 1 file changed, 5 insertions(+)
+
+--- a/kernel/rcu/tree.c
++++ b/kernel/rcu/tree.c
+@@ -225,7 +225,12 @@ static void rcu_preempt_qs(void);
+
+ void rcu_bh_qs(void)
+ {
++ unsigned long flags;
++
++ /* Callers to this function, rcu_preempt_qs(), must disable irqs. */
++ local_irq_save(flags);
+ rcu_preempt_qs();
++ local_irq_restore(flags);
+ }
+ #else
+ void rcu_bh_qs(void)
diff --git a/patches/re-migrate_disable-race-with-cpu-hotplug-3f.patch b/patches/re-migrate_disable-race-with-cpu-hotplug-3f.patch
new file mode 100644
index 00000000000000..dc5416f0b255a2
--- /dev/null
+++ b/patches/re-migrate_disable-race-with-cpu-hotplug-3f.patch
@@ -0,0 +1,34 @@
+From: Yong Zhang <yong.zhang0@gmail.com>
+Date: Thu, 28 Jul 2011 11:16:00 +0800
+Subject: hotplug: Reread hotplug_pcp on pin_current_cpu() retry
+
+When a retry happens, it is likely that the task has been migrated to
+another CPU (unless the unplug failed), but it still dereferences the
+original hotplug_pcp per-CPU data.
+
+Update the pointer to hotplug_pcp in the retry path, so it points to
+the current cpu.
+
+Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
+Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
+Link: http://lkml.kernel.org/r/20110728031600.GA338@windriver.com
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ kernel/cpu.c | 4 +++-
+ 1 file changed, 3 insertions(+), 1 deletion(-)
+
+--- a/kernel/cpu.c
++++ b/kernel/cpu.c
+@@ -106,9 +106,11 @@ static DEFINE_PER_CPU(struct hotplug_pcp
+ */
+ void pin_current_cpu(void)
+ {
+- struct hotplug_pcp *hp = this_cpu_ptr(&hotplug_pcp);
++ struct hotplug_pcp *hp;
+
+ retry:
++ hp = this_cpu_ptr(&hotplug_pcp);
++
+ if (!hp->unplug || hp->refcount || preempt_count() > 1 ||
+ hp->unplug == current) {
+ hp->refcount++;
diff --git a/patches/re-preempt_rt_full-arm-coredump-fails-for-cpu-3e-3d-4.patch b/patches/re-preempt_rt_full-arm-coredump-fails-for-cpu-3e-3d-4.patch
new file mode 100644
index 00000000000000..a89c4d0733237e
--- /dev/null
+++ b/patches/re-preempt_rt_full-arm-coredump-fails-for-cpu-3e-3d-4.patch
@@ -0,0 +1,68 @@
+Subject: ARM: Initialize split page table locks for vector page
+From: Frank Rowand <frank.rowand@am.sony.com>
+Date: Sat, 1 Oct 2011 18:58:13 -0700
+
+Without this patch, ARM can not use SPLIT_PTLOCK_CPUS if
+PREEMPT_RT_FULL=y because vectors_user_mapping() creates a
+VM_ALWAYSDUMP mapping of the vector page (address 0xffff0000), but no
+ptl->lock has been allocated for the page. An attempt to coredump
+that page will result in a kernel NULL pointer dereference when
+follow_page() attempts to lock the page.
+
+The call tree to the NULL pointer dereference is:
+
+ do_notify_resume()
+ get_signal_to_deliver()
+ do_coredump()
+ elf_core_dump()
+ get_dump_page()
+ __get_user_pages()
+ follow_page()
+ pte_offset_map_lock() <----- a #define
+ ...
+ rt_spin_lock()
+
+The underlying problem is exposed by mm-shrink-the-page-frame-to-rt-size.patch.
+
+Signed-off-by: Frank Rowand <frank.rowand@am.sony.com>
+Cc: Frank <Frank_Rowand@sonyusa.com>
+Cc: Peter Zijlstra <peterz@infradead.org>
+Link: http://lkml.kernel.org/r/4E87C535.2030907@am.sony.com
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ arch/arm/kernel/process.c | 24 ++++++++++++++++++++++++
+ 1 file changed, 24 insertions(+)
+
+--- a/arch/arm/kernel/process.c
++++ b/arch/arm/kernel/process.c
+@@ -290,6 +290,30 @@ unsigned long arch_randomize_brk(struct
+ }
+
+ #ifdef CONFIG_MMU
++/*
++ * CONFIG_SPLIT_PTLOCK_CPUS results in a page->ptl lock. If the lock is not
++ * initialized by pgtable_page_ctor() then a coredump of the vector page will
++ * fail.
++ */
++static int __init vectors_user_mapping_init_page(void)
++{
++ struct page *page;
++ unsigned long addr = 0xffff0000;
++ pgd_t *pgd;
++ pud_t *pud;
++ pmd_t *pmd;
++
++ pgd = pgd_offset_k(addr);
++ pud = pud_offset(pgd, addr);
++ pmd = pmd_offset(pud, addr);
++ page = pmd_page(*(pmd));
++
++ pgtable_page_ctor(page);
++
++ return 0;
++}
++late_initcall(vectors_user_mapping_init_page);
++
+ #ifdef CONFIG_KUSER_HELPERS
+ /*
+ * The vectors page is always readable from user space for the
diff --git a/patches/relay-fix-timer-madness.patch b/patches/relay-fix-timer-madness.patch
new file mode 100644
index 00000000000000..e872958b3a1c83
--- /dev/null
+++ b/patches/relay-fix-timer-madness.patch
@@ -0,0 +1,52 @@
+From: Ingo Molnar <mingo@elte.hu>
+Date: Fri, 3 Jul 2009 08:44:07 -0500
+Subject: relay: Fix timer madness
+
+Remove timer calls (!!!) from deep within the tracing infrastructure.
+This was totally bogus code that can cause lockups and worse. Poll
+the buffer every 2 jiffies for now.
+
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ kernel/relay.c | 14 +++++---------
+ 1 file changed, 5 insertions(+), 9 deletions(-)
+
+--- a/kernel/relay.c
++++ b/kernel/relay.c
+@@ -339,6 +339,10 @@ static void wakeup_readers(unsigned long
+ {
+ struct rchan_buf *buf = (struct rchan_buf *)data;
+ wake_up_interruptible(&buf->read_wait);
++ /*
++ * Stupid polling for now:
++ */
++ mod_timer(&buf->timer, jiffies + 1);
+ }
+
+ /**
+@@ -356,6 +360,7 @@ static void __relay_reset(struct rchan_b
+ init_waitqueue_head(&buf->read_wait);
+ kref_init(&buf->kref);
+ setup_timer(&buf->timer, wakeup_readers, (unsigned long)buf);
++ mod_timer(&buf->timer, jiffies + 1);
+ } else
+ del_timer_sync(&buf->timer);
+
+@@ -739,15 +744,6 @@ size_t relay_switch_subbuf(struct rchan_
+ else
+ buf->early_bytes += buf->chan->subbuf_size -
+ buf->padding[old_subbuf];
+- smp_mb();
+- if (waitqueue_active(&buf->read_wait))
+- /*
+- * Calling wake_up_interruptible() from here
+- * will deadlock if we happen to be logging
+- * from the scheduler (trying to re-grab
+- * rq->lock), so defer it.
+- */
+- mod_timer(&buf->timer, jiffies + 1);
+ }
+
+ old = buf->data;
diff --git a/patches/rt-add-rt-locks.patch b/patches/rt-add-rt-locks.patch
new file mode 100644
index 00000000000000..0f7e02028131d7
--- /dev/null
+++ b/patches/rt-add-rt-locks.patch
@@ -0,0 +1,1982 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Sun, 26 Jul 2009 19:39:56 +0200
+Subject: rt: Add the preempt-rt lock replacement APIs
+
+Map spinlocks, rwlocks, rw_semaphores and semaphores to the rt_mutex
+based locking functions for preempt-rt.
+This also introduces RT's sleeping locks.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ include/linux/kernel.h | 4
+ include/linux/locallock.h | 6
+ include/linux/mutex.h | 20 +
+ include/linux/mutex_rt.h | 84 ++++++
+ include/linux/rtmutex.h | 27 +-
+ include/linux/rwlock_rt.h | 99 ++++++++
+ include/linux/rwlock_types_rt.h | 33 ++
+ include/linux/rwsem.h | 6
+ include/linux/rwsem_rt.h | 140 +++++++++++
+ include/linux/sched.h | 5
+ include/linux/spinlock.h | 12
+ include/linux/spinlock_api_smp.h | 4
+ include/linux/spinlock_rt.h | 173 ++++++++++++++
+ include/linux/spinlock_types.h | 11
+ include/linux/spinlock_types_rt.h | 51 ++++
+ kernel/futex.c | 5
+ kernel/locking/Makefile | 9
+ kernel/locking/rt.c | 461 ++++++++++++++++++++++++++++++++++++++
+ kernel/locking/rtmutex.c | 395 ++++++++++++++++++++++++++++++--
+ kernel/locking/rtmutex_common.h | 11
+ kernel/locking/spinlock.c | 7
+ kernel/locking/spinlock_debug.c | 5
+ 22 files changed, 1525 insertions(+), 43 deletions(-)
+
+--- a/include/linux/kernel.h
++++ b/include/linux/kernel.h
+@@ -188,6 +188,9 @@ extern int _cond_resched(void);
+ */
+ # define might_sleep() \
+ do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
++
++# define might_sleep_no_state_check() \
++ do { ___might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
+ # define sched_annotate_sleep() (current->task_state_change = 0)
+ #else
+ static inline void ___might_sleep(const char *file, int line,
+@@ -195,6 +198,7 @@ extern int _cond_resched(void);
+ static inline void __might_sleep(const char *file, int line,
+ int preempt_offset) { }
+ # define might_sleep() do { might_resched(); } while (0)
++# define might_sleep_no_state_check() do { might_resched(); } while (0)
+ # define sched_annotate_sleep() do { } while (0)
+ #endif
+
+--- a/include/linux/locallock.h
++++ b/include/linux/locallock.h
+@@ -42,9 +42,15 @@ struct local_irq_lock {
+ * already takes care of the migrate_disable/enable
+ * for CONFIG_PREEMPT_BASE map to the normal spin_* calls.
+ */
++#ifdef CONFIG_PREEMPT_RT_FULL
++# define spin_lock_local(lock) rt_spin_lock(lock)
++# define spin_trylock_local(lock) rt_spin_trylock(lock)
++# define spin_unlock_local(lock) rt_spin_unlock(lock)
++#else
+ # define spin_lock_local(lock) spin_lock(lock)
+ # define spin_trylock_local(lock) spin_trylock(lock)
+ # define spin_unlock_local(lock) spin_unlock(lock)
++#endif
+
+ static inline void __local_lock(struct local_irq_lock *lv)
+ {
+--- a/include/linux/mutex.h
++++ b/include/linux/mutex.h
+@@ -19,6 +19,17 @@
+ #include <asm/processor.h>
+ #include <linux/osq_lock.h>
+
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++# define __DEP_MAP_MUTEX_INITIALIZER(lockname) \
++ , .dep_map = { .name = #lockname }
++#else
++# define __DEP_MAP_MUTEX_INITIALIZER(lockname)
++#endif
++
++#ifdef CONFIG_PREEMPT_RT_FULL
++# include <linux/mutex_rt.h>
++#else
++
+ /*
+ * Simple, straightforward mutexes with strict semantics:
+ *
+@@ -99,13 +110,6 @@ do { \
+ static inline void mutex_destroy(struct mutex *lock) {}
+ #endif
+
+-#ifdef CONFIG_DEBUG_LOCK_ALLOC
+-# define __DEP_MAP_MUTEX_INITIALIZER(lockname) \
+- , .dep_map = { .name = #lockname }
+-#else
+-# define __DEP_MAP_MUTEX_INITIALIZER(lockname)
+-#endif
+-
+ #define __MUTEX_INITIALIZER(lockname) \
+ { .count = ATOMIC_INIT(1) \
+ , .wait_lock = __SPIN_LOCK_UNLOCKED(lockname.wait_lock) \
+@@ -173,6 +177,8 @@ extern int __must_check mutex_lock_killa
+ extern int mutex_trylock(struct mutex *lock);
+ extern void mutex_unlock(struct mutex *lock);
+
++#endif /* !PREEMPT_RT_FULL */
++
+ extern int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
+
+ #endif /* __LINUX_MUTEX_H */
+--- /dev/null
++++ b/include/linux/mutex_rt.h
+@@ -0,0 +1,84 @@
++#ifndef __LINUX_MUTEX_RT_H
++#define __LINUX_MUTEX_RT_H
++
++#ifndef __LINUX_MUTEX_H
++#error "Please include mutex.h"
++#endif
++
++#include <linux/rtmutex.h>
++
++/* FIXME: Just for __lockfunc */
++#include <linux/spinlock.h>
++
++struct mutex {
++ struct rt_mutex lock;
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++ struct lockdep_map dep_map;
++#endif
++};
++
++#define __MUTEX_INITIALIZER(mutexname) \
++ { \
++ .lock = __RT_MUTEX_INITIALIZER(mutexname.lock) \
++ __DEP_MAP_MUTEX_INITIALIZER(mutexname) \
++ }
++
++#define DEFINE_MUTEX(mutexname) \
++ struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)
++
++extern void __mutex_do_init(struct mutex *lock, const char *name, struct lock_class_key *key);
++extern void __lockfunc _mutex_lock(struct mutex *lock);
++extern int __lockfunc _mutex_lock_interruptible(struct mutex *lock);
++extern int __lockfunc _mutex_lock_killable(struct mutex *lock);
++extern void __lockfunc _mutex_lock_nested(struct mutex *lock, int subclass);
++extern void __lockfunc _mutex_lock_nest_lock(struct mutex *lock, struct lockdep_map *nest_lock);
++extern int __lockfunc _mutex_lock_interruptible_nested(struct mutex *lock, int subclass);
++extern int __lockfunc _mutex_lock_killable_nested(struct mutex *lock, int subclass);
++extern int __lockfunc _mutex_trylock(struct mutex *lock);
++extern void __lockfunc _mutex_unlock(struct mutex *lock);
++
++#define mutex_is_locked(l) rt_mutex_is_locked(&(l)->lock)
++#define mutex_lock(l) _mutex_lock(l)
++#define mutex_lock_interruptible(l) _mutex_lock_interruptible(l)
++#define mutex_lock_killable(l) _mutex_lock_killable(l)
++#define mutex_trylock(l) _mutex_trylock(l)
++#define mutex_unlock(l) _mutex_unlock(l)
++#define mutex_destroy(l) rt_mutex_destroy(&(l)->lock)
++
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++# define mutex_lock_nested(l, s) _mutex_lock_nested(l, s)
++# define mutex_lock_interruptible_nested(l, s) \
++ _mutex_lock_interruptible_nested(l, s)
++# define mutex_lock_killable_nested(l, s) \
++ _mutex_lock_killable_nested(l, s)
++
++# define mutex_lock_nest_lock(lock, nest_lock) \
++do { \
++ typecheck(struct lockdep_map *, &(nest_lock)->dep_map); \
++ _mutex_lock_nest_lock(lock, &(nest_lock)->dep_map); \
++} while (0)
++
++#else
++# define mutex_lock_nested(l, s) _mutex_lock(l)
++# define mutex_lock_interruptible_nested(l, s) \
++ _mutex_lock_interruptible(l)
++# define mutex_lock_killable_nested(l, s) \
++ _mutex_lock_killable(l)
++# define mutex_lock_nest_lock(lock, nest_lock) mutex_lock(lock)
++#endif
++
++# define mutex_init(mutex) \
++do { \
++ static struct lock_class_key __key; \
++ \
++ rt_mutex_init(&(mutex)->lock); \
++ __mutex_do_init((mutex), #mutex, &__key); \
++} while (0)
++
++# define __mutex_init(mutex, name, key) \
++do { \
++ rt_mutex_init(&(mutex)->lock); \
++ __mutex_do_init((mutex), name, key); \
++} while (0)
++
++#endif
+--- a/include/linux/rtmutex.h
++++ b/include/linux/rtmutex.h
+@@ -18,6 +18,10 @@
+
+ extern int max_lock_depth; /* for sysctl */
+
++#ifdef CONFIG_DEBUG_MUTEXES
++#include <linux/debug_locks.h>
++#endif
++
+ /**
+ * The rt_mutex structure
+ *
+@@ -31,8 +35,8 @@ struct rt_mutex {
+ struct rb_root waiters;
+ struct rb_node *waiters_leftmost;
+ struct task_struct *owner;
+-#ifdef CONFIG_DEBUG_RT_MUTEXES
+ int save_state;
++#ifdef CONFIG_DEBUG_RT_MUTEXES
+ const char *name, *file;
+ int line;
+ void *magic;
+@@ -55,22 +59,33 @@ struct hrtimer_sleeper;
+ # define rt_mutex_debug_check_no_locks_held(task) do { } while (0)
+ #endif
+
++# define rt_mutex_init(mutex) \
++ do { \
++ raw_spin_lock_init(&(mutex)->wait_lock); \
++ __rt_mutex_init(mutex, #mutex); \
++ } while (0)
++
+ #ifdef CONFIG_DEBUG_RT_MUTEXES
+ # define __DEBUG_RT_MUTEX_INITIALIZER(mutexname) \
+ , .name = #mutexname, .file = __FILE__, .line = __LINE__
+-# define rt_mutex_init(mutex) __rt_mutex_init(mutex, __func__)
+ extern void rt_mutex_debug_task_free(struct task_struct *tsk);
+ #else
+ # define __DEBUG_RT_MUTEX_INITIALIZER(mutexname)
+-# define rt_mutex_init(mutex) __rt_mutex_init(mutex, NULL)
+ # define rt_mutex_debug_task_free(t) do { } while (0)
+ #endif
+
+-#define __RT_MUTEX_INITIALIZER(mutexname) \
+- { .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(mutexname.wait_lock) \
++#define __RT_MUTEX_INITIALIZER_PLAIN(mutexname) \
++ .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(mutexname.wait_lock) \
+ , .waiters = RB_ROOT \
+ , .owner = NULL \
+- __DEBUG_RT_MUTEX_INITIALIZER(mutexname)}
++ __DEBUG_RT_MUTEX_INITIALIZER(mutexname)
++
++#define __RT_MUTEX_INITIALIZER(mutexname) \
++ { __RT_MUTEX_INITIALIZER_PLAIN(mutexname) }
++
++#define __RT_MUTEX_INITIALIZER_SAVE_STATE(mutexname) \
++ { __RT_MUTEX_INITIALIZER_PLAIN(mutexname) \
++ , .save_state = 1 }
+
+ #define DEFINE_RT_MUTEX(mutexname) \
+ struct rt_mutex mutexname = __RT_MUTEX_INITIALIZER(mutexname)
+--- /dev/null
++++ b/include/linux/rwlock_rt.h
+@@ -0,0 +1,99 @@
++#ifndef __LINUX_RWLOCK_RT_H
++#define __LINUX_RWLOCK_RT_H
++
++#ifndef __LINUX_SPINLOCK_H
++#error Do not include directly. Use spinlock.h
++#endif
++
++#define rwlock_init(rwl) \
++do { \
++ static struct lock_class_key __key; \
++ \
++ rt_mutex_init(&(rwl)->lock); \
++ __rt_rwlock_init(rwl, #rwl, &__key); \
++} while (0)
++
++extern void __lockfunc rt_write_lock(rwlock_t *rwlock);
++extern void __lockfunc rt_read_lock(rwlock_t *rwlock);
++extern int __lockfunc rt_write_trylock(rwlock_t *rwlock);
++extern int __lockfunc rt_write_trylock_irqsave(rwlock_t *trylock, unsigned long *flags);
++extern int __lockfunc rt_read_trylock(rwlock_t *rwlock);
++extern void __lockfunc rt_write_unlock(rwlock_t *rwlock);
++extern void __lockfunc rt_read_unlock(rwlock_t *rwlock);
++extern unsigned long __lockfunc rt_write_lock_irqsave(rwlock_t *rwlock);
++extern unsigned long __lockfunc rt_read_lock_irqsave(rwlock_t *rwlock);
++extern void __rt_rwlock_init(rwlock_t *rwlock, char *name, struct lock_class_key *key);
++
++#define read_trylock(lock) __cond_lock(lock, rt_read_trylock(lock))
++#define write_trylock(lock) __cond_lock(lock, rt_write_trylock(lock))
++
++#define write_trylock_irqsave(lock, flags) \
++ __cond_lock(lock, rt_write_trylock_irqsave(lock, &flags))
++
++#define read_lock_irqsave(lock, flags) \
++ do { \
++ typecheck(unsigned long, flags); \
++ flags = rt_read_lock_irqsave(lock); \
++ } while (0)
++
++#define write_lock_irqsave(lock, flags) \
++ do { \
++ typecheck(unsigned long, flags); \
++ flags = rt_write_lock_irqsave(lock); \
++ } while (0)
++
++#define read_lock(lock) rt_read_lock(lock)
++
++#define read_lock_bh(lock) \
++ do { \
++ local_bh_disable(); \
++ rt_read_lock(lock); \
++ } while (0)
++
++#define read_lock_irq(lock) read_lock(lock)
++
++#define write_lock(lock) rt_write_lock(lock)
++
++#define write_lock_bh(lock) \
++ do { \
++ local_bh_disable(); \
++ rt_write_lock(lock); \
++ } while (0)
++
++#define write_lock_irq(lock) write_lock(lock)
++
++#define read_unlock(lock) rt_read_unlock(lock)
++
++#define read_unlock_bh(lock) \
++ do { \
++ rt_read_unlock(lock); \
++ local_bh_enable(); \
++ } while (0)
++
++#define read_unlock_irq(lock) read_unlock(lock)
++
++#define write_unlock(lock) rt_write_unlock(lock)
++
++#define write_unlock_bh(lock) \
++ do { \
++ rt_write_unlock(lock); \
++ local_bh_enable(); \
++ } while (0)
++
++#define write_unlock_irq(lock) write_unlock(lock)
++
++#define read_unlock_irqrestore(lock, flags) \
++ do { \
++ typecheck(unsigned long, flags); \
++ (void) flags; \
++ rt_read_unlock(lock); \
++ } while (0)
++
++#define write_unlock_irqrestore(lock, flags) \
++ do { \
++ typecheck(unsigned long, flags); \
++ (void) flags; \
++ rt_write_unlock(lock); \
++ } while (0)
++
++#endif
+--- /dev/null
++++ b/include/linux/rwlock_types_rt.h
+@@ -0,0 +1,33 @@
++#ifndef __LINUX_RWLOCK_TYPES_RT_H
++#define __LINUX_RWLOCK_TYPES_RT_H
++
++#ifndef __LINUX_SPINLOCK_TYPES_H
++#error "Do not include directly. Include spinlock_types.h instead"
++#endif
++
++/*
++ * rwlocks - rtmutex which allows single reader recursion
++ */
++typedef struct {
++ struct rt_mutex lock;
++ int read_depth;
++ unsigned int break_lock;
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++ struct lockdep_map dep_map;
++#endif
++} rwlock_t;
++
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++# define RW_DEP_MAP_INIT(lockname) .dep_map = { .name = #lockname }
++#else
++# define RW_DEP_MAP_INIT(lockname)
++#endif
++
++#define __RW_LOCK_UNLOCKED(name) \
++ { .lock = __RT_MUTEX_INITIALIZER_SAVE_STATE(name.lock), \
++ RW_DEP_MAP_INIT(name) }
++
++#define DEFINE_RWLOCK(name) \
++ rwlock_t name __cacheline_aligned_in_smp = __RW_LOCK_UNLOCKED(name)
++
++#endif
+--- a/include/linux/rwsem.h
++++ b/include/linux/rwsem.h
+@@ -18,6 +18,10 @@
+ #include <linux/osq_lock.h>
+ #endif
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++#include <linux/rwsem_rt.h>
++#else /* PREEMPT_RT_FULL */
++
+ struct rw_semaphore;
+
+ #ifdef CONFIG_RWSEM_GENERIC_SPINLOCK
+@@ -177,4 +181,6 @@ extern void up_read_non_owner(struct rw_
+ # define up_read_non_owner(sem) up_read(sem)
+ #endif
+
++#endif /* !PREEMPT_RT_FULL */
++
+ #endif /* _LINUX_RWSEM_H */
+--- /dev/null
++++ b/include/linux/rwsem_rt.h
+@@ -0,0 +1,140 @@
++#ifndef _LINUX_RWSEM_RT_H
++#define _LINUX_RWSEM_RT_H
++
++#ifndef _LINUX_RWSEM_H
++#error "Include rwsem.h"
++#endif
++
++/*
++ * RW-semaphores are a spinlock plus a reader-depth count.
++ *
++ * Note that the semantics are different from the usual
++ * Linux rw-sems, in PREEMPT_RT mode we do not allow
++ * multiple readers to hold the lock at once, we only allow
++ * a read-lock owner to read-lock recursively. This is
++ * better for latency, makes the implementation inherently
++ * fair and makes it simpler as well.
++ */
++
++#include <linux/rtmutex.h>
++
++struct rw_semaphore {
++ struct rt_mutex lock;
++ int read_depth;
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++ struct lockdep_map dep_map;
++#endif
++};
++
++#define __RWSEM_INITIALIZER(name) \
++ { .lock = __RT_MUTEX_INITIALIZER(name.lock), \
++ RW_DEP_MAP_INIT(name) }
++
++#define DECLARE_RWSEM(lockname) \
++ struct rw_semaphore lockname = __RWSEM_INITIALIZER(lockname)
++
++extern void __rt_rwsem_init(struct rw_semaphore *rwsem, const char *name,
++ struct lock_class_key *key);
++
++#define __rt_init_rwsem(sem, name, key) \
++ do { \
++ rt_mutex_init(&(sem)->lock); \
++ __rt_rwsem_init((sem), (name), (key));\
++ } while (0)
++
++#define __init_rwsem(sem, name, key) __rt_init_rwsem(sem, name, key)
++
++# define rt_init_rwsem(sem) \
++do { \
++ static struct lock_class_key __key; \
++ \
++ __rt_init_rwsem((sem), #sem, &__key); \
++} while (0)
++
++extern void rt_down_write(struct rw_semaphore *rwsem);
++extern void rt_down_read_nested(struct rw_semaphore *rwsem, int subclass);
++extern void rt_down_write_nested(struct rw_semaphore *rwsem, int subclass);
++extern void rt_down_write_nested_lock(struct rw_semaphore *rwsem,
++ struct lockdep_map *nest);
++extern void rt_down_read(struct rw_semaphore *rwsem);
++extern int rt_down_write_trylock(struct rw_semaphore *rwsem);
++extern int rt_down_read_trylock(struct rw_semaphore *rwsem);
++extern void __rt_up_read(struct rw_semaphore *rwsem);
++extern void rt_up_read(struct rw_semaphore *rwsem);
++extern void rt_up_write(struct rw_semaphore *rwsem);
++extern void rt_downgrade_write(struct rw_semaphore *rwsem);
++
++#define init_rwsem(sem) rt_init_rwsem(sem)
++#define rwsem_is_locked(s) rt_mutex_is_locked(&(s)->lock)
++
++static inline int rwsem_is_contended(struct rw_semaphore *sem)
++{
++ /* rt_mutex_has_waiters() */
++ return !RB_EMPTY_ROOT(&sem->lock.waiters);
++}
++
++static inline void down_read(struct rw_semaphore *sem)
++{
++ rt_down_read(sem);
++}
++
++static inline int down_read_trylock(struct rw_semaphore *sem)
++{
++ return rt_down_read_trylock(sem);
++}
++
++static inline void down_write(struct rw_semaphore *sem)
++{
++ rt_down_write(sem);
++}
++
++static inline int down_write_trylock(struct rw_semaphore *sem)
++{
++ return rt_down_write_trylock(sem);
++}
++
++static inline void __up_read(struct rw_semaphore *sem)
++{
++ __rt_up_read(sem);
++}
++
++static inline void up_read(struct rw_semaphore *sem)
++{
++ rt_up_read(sem);
++}
++
++static inline void up_write(struct rw_semaphore *sem)
++{
++ rt_up_write(sem);
++}
++
++static inline void downgrade_write(struct rw_semaphore *sem)
++{
++ rt_downgrade_write(sem);
++}
++
++static inline void down_read_nested(struct rw_semaphore *sem, int subclass)
++{
++ return rt_down_read_nested(sem, subclass);
++}
++
++static inline void down_write_nested(struct rw_semaphore *sem, int subclass)
++{
++ rt_down_write_nested(sem, subclass);
++}
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++static inline void down_write_nest_lock(struct rw_semaphore *sem,
++ struct rw_semaphore *nest_lock)
++{
++ rt_down_write_nested_lock(sem, &nest_lock->dep_map);
++}
++
++#else
++
++static inline void down_write_nest_lock(struct rw_semaphore *sem,
++ struct rw_semaphore *nest_lock)
++{
++ rt_down_write_nested_lock(sem, NULL);
++}
++#endif
++#endif
+--- a/include/linux/sched.h
++++ b/include/linux/sched.h
+@@ -302,6 +302,11 @@ extern char ___assert_task_state[1 - 2*!
+
+ #endif
+
++#define __set_current_state_no_track(state_value) \
++ do { current->state = (state_value); } while (0)
++#define set_current_state_no_track(state_value) \
++ set_mb(current->state, (state_value))
++
+ /* Task command name length */
+ #define TASK_COMM_LEN 16
+
+--- a/include/linux/spinlock.h
++++ b/include/linux/spinlock.h
+@@ -281,7 +281,11 @@ static inline void do_raw_spin_unlock(ra
+ #define raw_spin_can_lock(lock) (!raw_spin_is_locked(lock))
+
+ /* Include rwlock functions */
+-#include <linux/rwlock.h>
++#ifdef CONFIG_PREEMPT_RT_FULL
++# include <linux/rwlock_rt.h>
++#else
++# include <linux/rwlock.h>
++#endif
+
+ /*
+ * Pull the _spin_*()/_read_*()/_write_*() functions/declarations:
+@@ -292,6 +296,10 @@ static inline void do_raw_spin_unlock(ra
+ # include <linux/spinlock_api_up.h>
+ #endif
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++# include <linux/spinlock_rt.h>
++#else /* PREEMPT_RT_FULL */
++
+ /*
+ * Map the spin_lock functions to the raw variants for PREEMPT_RT=n
+ */
+@@ -426,4 +434,6 @@ extern int _atomic_dec_and_lock(atomic_t
+ #define atomic_dec_and_lock(atomic, lock) \
+ __cond_lock(lock, _atomic_dec_and_lock(atomic, lock))
+
++#endif /* !PREEMPT_RT_FULL */
++
+ #endif /* __LINUX_SPINLOCK_H */
+--- a/include/linux/spinlock_api_smp.h
++++ b/include/linux/spinlock_api_smp.h
+@@ -189,6 +189,8 @@ static inline int __raw_spin_trylock_bh(
+ return 0;
+ }
+
+-#include <linux/rwlock_api_smp.h>
++#ifndef CONFIG_PREEMPT_RT_FULL
++# include <linux/rwlock_api_smp.h>
++#endif
+
+ #endif /* __LINUX_SPINLOCK_API_SMP_H */
+--- /dev/null
++++ b/include/linux/spinlock_rt.h
+@@ -0,0 +1,173 @@
++#ifndef __LINUX_SPINLOCK_RT_H
++#define __LINUX_SPINLOCK_RT_H
++
++#ifndef __LINUX_SPINLOCK_H
++#error Do not include directly. Use spinlock.h
++#endif
++
++#include <linux/bug.h>
++
++extern void
++__rt_spin_lock_init(spinlock_t *lock, char *name, struct lock_class_key *key);
++
++#define spin_lock_init(slock) \
++do { \
++ static struct lock_class_key __key; \
++ \
++ rt_mutex_init(&(slock)->lock); \
++ __rt_spin_lock_init(slock, #slock, &__key); \
++} while (0)
++
++extern void __lockfunc rt_spin_lock(spinlock_t *lock);
++extern unsigned long __lockfunc rt_spin_lock_trace_flags(spinlock_t *lock);
++extern void __lockfunc rt_spin_lock_nested(spinlock_t *lock, int subclass);
++extern void __lockfunc rt_spin_unlock(spinlock_t *lock);
++extern void __lockfunc rt_spin_unlock_wait(spinlock_t *lock);
++extern int __lockfunc rt_spin_trylock_irqsave(spinlock_t *lock, unsigned long *flags);
++extern int __lockfunc rt_spin_trylock_bh(spinlock_t *lock);
++extern int __lockfunc rt_spin_trylock(spinlock_t *lock);
++extern int atomic_dec_and_spin_lock(atomic_t *atomic, spinlock_t *lock);
++
++/*
++ * lockdep-less calls, for derived types like rwlock:
++ * (for trylock they can use rt_mutex_trylock() directly.
++ */
++extern void __lockfunc __rt_spin_lock(struct rt_mutex *lock);
++extern void __lockfunc __rt_spin_unlock(struct rt_mutex *lock);
++
++#define spin_lock(lock) \
++ do { \
++ migrate_disable(); \
++ rt_spin_lock(lock); \
++ } while (0)
++
++#define spin_lock_bh(lock) \
++ do { \
++ local_bh_disable(); \
++ migrate_disable(); \
++ rt_spin_lock(lock); \
++ } while (0)
++
++#define spin_lock_irq(lock) spin_lock(lock)
++
++#define spin_do_trylock(lock) __cond_lock(lock, rt_spin_trylock(lock))
++
++#define spin_trylock(lock) \
++({ \
++ int __locked; \
++ migrate_disable(); \
++ __locked = spin_do_trylock(lock); \
++ if (!__locked) \
++ migrate_enable(); \
++ __locked; \
++})
++
++#ifdef CONFIG_LOCKDEP
++# define spin_lock_nested(lock, subclass) \
++ do { \
++ migrate_disable(); \
++ rt_spin_lock_nested(lock, subclass); \
++ } while (0)
++
++#define spin_lock_bh_nested(lock, subclass) \
++ do { \
++ local_bh_disable(); \
++ migrate_disable(); \
++ rt_spin_lock_nested(lock, subclass); \
++ } while (0)
++
++# define spin_lock_irqsave_nested(lock, flags, subclass) \
++ do { \
++ typecheck(unsigned long, flags); \
++ flags = 0; \
++ migrate_disable(); \
++ rt_spin_lock_nested(lock, subclass); \
++ } while (0)
++#else
++# define spin_lock_nested(lock, subclass) spin_lock(lock)
++# define spin_lock_bh_nested(lock, subclass) spin_lock_bh(lock)
++
++# define spin_lock_irqsave_nested(lock, flags, subclass) \
++ do { \
++ typecheck(unsigned long, flags); \
++ flags = 0; \
++ spin_lock(lock); \
++ } while (0)
++#endif
++
++#define spin_lock_irqsave(lock, flags) \
++ do { \
++ typecheck(unsigned long, flags); \
++ flags = 0; \
++ spin_lock(lock); \
++ } while (0)
++
++static inline unsigned long spin_lock_trace_flags(spinlock_t *lock)
++{
++ unsigned long flags = 0;
++#ifdef CONFIG_TRACE_IRQFLAGS
++ flags = rt_spin_lock_trace_flags(lock);
++#else
++ spin_lock(lock); /* lock_local */
++#endif
++ return flags;
++}
++
++/* FIXME: we need rt_spin_lock_nest_lock */
++#define spin_lock_nest_lock(lock, nest_lock) spin_lock_nested(lock, 0)
++
++#define spin_unlock(lock) \
++ do { \
++ rt_spin_unlock(lock); \
++ migrate_enable(); \
++ } while (0)
++
++#define spin_unlock_bh(lock) \
++ do { \
++ rt_spin_unlock(lock); \
++ migrate_enable(); \
++ local_bh_enable(); \
++ } while (0)
++
++#define spin_unlock_irq(lock) spin_unlock(lock)
++
++#define spin_unlock_irqrestore(lock, flags) \
++ do { \
++ typecheck(unsigned long, flags); \
++ (void) flags; \
++ spin_unlock(lock); \
++ } while (0)
++
++#define spin_trylock_bh(lock) __cond_lock(lock, rt_spin_trylock_bh(lock))
++#define spin_trylock_irq(lock) spin_trylock(lock)
++
++#define spin_trylock_irqsave(lock, flags) \
++ rt_spin_trylock_irqsave(lock, &(flags))
++
++#define spin_unlock_wait(lock) rt_spin_unlock_wait(lock)
++
++#ifdef CONFIG_GENERIC_LOCKBREAK
++# define spin_is_contended(lock) ((lock)->break_lock)
++#else
++# define spin_is_contended(lock) (((void)(lock), 0))
++#endif
++
++static inline int spin_can_lock(spinlock_t *lock)
++{
++ return !rt_mutex_is_locked(&lock->lock);
++}
++
++static inline int spin_is_locked(spinlock_t *lock)
++{
++ return rt_mutex_is_locked(&lock->lock);
++}
++
++static inline void assert_spin_locked(spinlock_t *lock)
++{
++ BUG_ON(!spin_is_locked(lock));
++}
++
++#define atomic_dec_and_lock(atomic, lock) \
++ atomic_dec_and_spin_lock(atomic, lock)
++
++#endif
+--- a/include/linux/spinlock_types.h
++++ b/include/linux/spinlock_types.h
+@@ -11,8 +11,13 @@
+
+ #include <linux/spinlock_types_raw.h>
+
+-#include <linux/spinlock_types_nort.h>
+-
+-#include <linux/rwlock_types.h>
++#ifndef CONFIG_PREEMPT_RT_FULL
++# include <linux/spinlock_types_nort.h>
++# include <linux/rwlock_types.h>
++#else
++# include <linux/rtmutex.h>
++# include <linux/spinlock_types_rt.h>
++# include <linux/rwlock_types_rt.h>
++#endif
+
+ #endif /* __LINUX_SPINLOCK_TYPES_H */
+--- /dev/null
++++ b/include/linux/spinlock_types_rt.h
+@@ -0,0 +1,51 @@
++#ifndef __LINUX_SPINLOCK_TYPES_RT_H
++#define __LINUX_SPINLOCK_TYPES_RT_H
++
++#ifndef __LINUX_SPINLOCK_TYPES_H
++#error "Do not include directly. Include spinlock_types.h instead"
++#endif
++
++#include <linux/cache.h>
++
++/*
++ * PREEMPT_RT: spinlocks - an RT mutex plus lock-break field:
++ */
++typedef struct spinlock {
++ struct rt_mutex lock;
++ unsigned int break_lock;
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++ struct lockdep_map dep_map;
++#endif
++} spinlock_t;
++
++#ifdef CONFIG_DEBUG_RT_MUTEXES
++# define __RT_SPIN_INITIALIZER(name) \
++ { \
++ .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(name.wait_lock), \
++ .save_state = 1, \
++ .file = __FILE__, \
++ .line = __LINE__ , \
++ }
++#else
++# define __RT_SPIN_INITIALIZER(name) \
++ { \
++ .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(name.wait_lock), \
++ .save_state = 1, \
++ }
++#endif
++
++/*
++.wait_list = PLIST_HEAD_INIT_RAW((name).lock.wait_list, (name).lock.wait_lock)
++*/
++
++#define __SPIN_LOCK_UNLOCKED(name) \
++ { .lock = __RT_SPIN_INITIALIZER(name.lock), \
++ SPIN_DEP_MAP_INIT(name) }
++
++#define __DEFINE_SPINLOCK(name) \
++ spinlock_t name = __SPIN_LOCK_UNLOCKED(name)
++
++#define DEFINE_SPINLOCK(name) \
++ spinlock_t name __cacheline_aligned_in_smp = __SPIN_LOCK_UNLOCKED(name)
++
++#endif
+--- a/kernel/futex.c
++++ b/kernel/futex.c
+@@ -2613,10 +2613,7 @@ static int futex_wait_requeue_pi(u32 __u
+ * The waiter is allocated on our stack, manipulated by the requeue
+ * code while we sleep on uaddr.
+ */
+- debug_rt_mutex_init_waiter(&rt_waiter);
+- RB_CLEAR_NODE(&rt_waiter.pi_tree_entry);
+- RB_CLEAR_NODE(&rt_waiter.tree_entry);
+- rt_waiter.task = NULL;
++ rt_mutex_init_waiter(&rt_waiter, false);
+
+ ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2, VERIFY_WRITE);
+ if (unlikely(ret != 0))
+--- a/kernel/locking/Makefile
++++ b/kernel/locking/Makefile
+@@ -1,5 +1,5 @@
+
+-obj-y += mutex.o semaphore.o rwsem.o
++obj-y += semaphore.o
+
+ ifdef CONFIG_FUNCTION_TRACER
+ CFLAGS_REMOVE_lockdep.o = $(CC_FLAGS_FTRACE)
+@@ -8,7 +8,11 @@ CFLAGS_REMOVE_mutex-debug.o = $(CC_FLAGS
+ CFLAGS_REMOVE_rtmutex-debug.o = $(CC_FLAGS_FTRACE)
+ endif
+
++ifneq ($(CONFIG_PREEMPT_RT_FULL),y)
++obj-y += mutex.o
+ obj-$(CONFIG_DEBUG_MUTEXES) += mutex-debug.o
++obj-y += rwsem.o
++endif
+ obj-$(CONFIG_LOCKDEP) += lockdep.o
+ ifeq ($(CONFIG_PROC_FS),y)
+ obj-$(CONFIG_LOCKDEP) += lockdep_proc.o
+@@ -22,8 +26,11 @@ obj-$(CONFIG_DEBUG_RT_MUTEXES) += rtmute
+ obj-$(CONFIG_RT_MUTEX_TESTER) += rtmutex-tester.o
+ obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock.o
+ obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock_debug.o
++ifneq ($(CONFIG_PREEMPT_RT_FULL),y)
+ obj-$(CONFIG_RWSEM_GENERIC_SPINLOCK) += rwsem-spinlock.o
+ obj-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem-xadd.o
++endif
+ obj-$(CONFIG_PERCPU_RWSEM) += percpu-rwsem.o
++obj-$(CONFIG_PREEMPT_RT_FULL) += rt.o
+ obj-$(CONFIG_QUEUE_RWLOCK) += qrwlock.o
+ obj-$(CONFIG_LOCK_TORTURE_TEST) += locktorture.o
+--- /dev/null
++++ b/kernel/locking/rt.c
+@@ -0,0 +1,461 @@
++/*
++ * kernel/rt.c
++ *
++ * Real-Time Preemption Support
++ *
++ * started by Ingo Molnar:
++ *
++ * Copyright (C) 2004-2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
++ * Copyright (C) 2006, Timesys Corp., Thomas Gleixner <tglx@timesys.com>
++ *
++ * historic credit for proving that Linux spinlocks can be implemented via
++ * RT-aware mutexes goes to many people: The Pmutex project (Dirk Grambow
++ * and others) who prototyped it on 2.4 and did lots of comparative
++ * research and analysis; TimeSys, for proving that you can implement a
++ * fully preemptible kernel via the use of IRQ threading and mutexes;
++ * Bill Huey for persuasively arguing on lkml that the mutex model is the
++ * right one; and to MontaVista, who ported pmutexes to 2.6.
++ *
++ * This code is a from-scratch implementation and is not based on pmutexes,
++ * but the idea of converting spinlocks to mutexes is used here too.
++ *
++ * lock debugging, locking tree, deadlock detection:
++ *
++ * Copyright (C) 2004, LynuxWorks, Inc., Igor Manyilov, Bill Huey
++ * Released under the General Public License (GPL).
++ *
++ * Includes portions of the generic R/W semaphore implementation from:
++ *
++ * Copyright (c) 2001 David Howells (dhowells@redhat.com).
++ * - Derived partially from idea by Andrea Arcangeli <andrea@suse.de>
++ * - Derived also from comments by Linus
++ *
++ * Pending ownership of locks and ownership stealing:
++ *
++ * Copyright (C) 2005, Kihon Technologies Inc., Steven Rostedt
++ *
++ * (also by Steven Rostedt)
++ * - Converted single pi_lock to individual task locks.
++ *
++ * By Esben Nielsen:
++ * Doing priority inheritance with help of the scheduler.
++ *
++ * Copyright (C) 2006, Timesys Corp., Thomas Gleixner <tglx@timesys.com>
++ * - major rework based on Esben Nielsen's initial patch
++ * - replaced thread_info references by task_struct refs
++ * - removed task->pending_owner dependency
++ * - BKL drop/reacquire for semaphore style locks to avoid deadlocks
++ * in the scheduler return path as discussed with Steven Rostedt
++ *
++ * Copyright (C) 2006, Kihon Technologies Inc.
++ * Steven Rostedt <rostedt@goodmis.org>
++ * - debugged and patched Thomas Gleixner's rework.
++ * - added back the cmpxchg to the rework.
++ * - turned atomic require back on for SMP.
++ */
++
++#include <linux/spinlock.h>
++#include <linux/rtmutex.h>
++#include <linux/sched.h>
++#include <linux/delay.h>
++#include <linux/module.h>
++#include <linux/kallsyms.h>
++#include <linux/syscalls.h>
++#include <linux/interrupt.h>
++#include <linux/plist.h>
++#include <linux/fs.h>
++#include <linux/futex.h>
++#include <linux/hrtimer.h>
++
++#include "rtmutex_common.h"
++
++/*
++ * struct mutex functions
++ */
++void __mutex_do_init(struct mutex *mutex, const char *name,
++ struct lock_class_key *key)
++{
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++ /*
++ * Make sure we are not reinitializing a held lock:
++ */
++ debug_check_no_locks_freed((void *)mutex, sizeof(*mutex));
++ lockdep_init_map(&mutex->dep_map, name, key, 0);
++#endif
++ mutex->lock.save_state = 0;
++}
++EXPORT_SYMBOL(__mutex_do_init);
++
++void __lockfunc _mutex_lock(struct mutex *lock)
++{
++ mutex_acquire(&lock->dep_map, 0, 0, _RET_IP_);
++ rt_mutex_lock(&lock->lock);
++}
++EXPORT_SYMBOL(_mutex_lock);
++
++int __lockfunc _mutex_lock_interruptible(struct mutex *lock)
++{
++ int ret;
++
++ mutex_acquire(&lock->dep_map, 0, 0, _RET_IP_);
++ ret = rt_mutex_lock_interruptible(&lock->lock);
++ if (ret)
++ mutex_release(&lock->dep_map, 1, _RET_IP_);
++ return ret;
++}
++EXPORT_SYMBOL(_mutex_lock_interruptible);
++
++int __lockfunc _mutex_lock_killable(struct mutex *lock)
++{
++ int ret;
++
++ mutex_acquire(&lock->dep_map, 0, 0, _RET_IP_);
++ ret = rt_mutex_lock_killable(&lock->lock);
++ if (ret)
++ mutex_release(&lock->dep_map, 1, _RET_IP_);
++ return ret;
++}
++EXPORT_SYMBOL(_mutex_lock_killable);
++
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++void __lockfunc _mutex_lock_nested(struct mutex *lock, int subclass)
++{
++ mutex_acquire_nest(&lock->dep_map, subclass, 0, NULL, _RET_IP_);
++ rt_mutex_lock(&lock->lock);
++}
++EXPORT_SYMBOL(_mutex_lock_nested);
++
++void __lockfunc _mutex_lock_nest_lock(struct mutex *lock, struct lockdep_map *nest)
++{
++ mutex_acquire_nest(&lock->dep_map, 0, 0, nest, _RET_IP_);
++ rt_mutex_lock(&lock->lock);
++}
++EXPORT_SYMBOL(_mutex_lock_nest_lock);
++
++int __lockfunc _mutex_lock_interruptible_nested(struct mutex *lock, int subclass)
++{
++ int ret;
++
++ mutex_acquire_nest(&lock->dep_map, subclass, 0, NULL, _RET_IP_);
++ ret = rt_mutex_lock_interruptible(&lock->lock);
++ if (ret)
++ mutex_release(&lock->dep_map, 1, _RET_IP_);
++ return ret;
++}
++EXPORT_SYMBOL(_mutex_lock_interruptible_nested);
++
++int __lockfunc _mutex_lock_killable_nested(struct mutex *lock, int subclass)
++{
++ int ret;
++
++ mutex_acquire(&lock->dep_map, subclass, 0, _RET_IP_);
++ ret = rt_mutex_lock_killable(&lock->lock);
++ if (ret)
++ mutex_release(&lock->dep_map, 1, _RET_IP_);
++ return ret;
++}
++EXPORT_SYMBOL(_mutex_lock_killable_nested);
++#endif
++
++int __lockfunc _mutex_trylock(struct mutex *lock)
++{
++ int ret = rt_mutex_trylock(&lock->lock);
++
++ if (ret)
++ mutex_acquire(&lock->dep_map, 0, 1, _RET_IP_);
++
++ return ret;
++}
++EXPORT_SYMBOL(_mutex_trylock);
++
++void __lockfunc _mutex_unlock(struct mutex *lock)
++{
++ mutex_release(&lock->dep_map, 1, _RET_IP_);
++ rt_mutex_unlock(&lock->lock);
++}
++EXPORT_SYMBOL(_mutex_unlock);
++
++/*
++ * rwlock_t functions
++ */
++int __lockfunc rt_write_trylock(rwlock_t *rwlock)
++{
++ int ret;
++
++ migrate_disable();
++ ret = rt_mutex_trylock(&rwlock->lock);
++ if (ret)
++ rwlock_acquire(&rwlock->dep_map, 0, 1, _RET_IP_);
++ else
++ migrate_enable();
++
++ return ret;
++}
++EXPORT_SYMBOL(rt_write_trylock);
++
++int __lockfunc rt_write_trylock_irqsave(rwlock_t *rwlock, unsigned long *flags)
++{
++ int ret;
++
++ *flags = 0;
++ ret = rt_write_trylock(rwlock);
++ return ret;
++}
++EXPORT_SYMBOL(rt_write_trylock_irqsave);
++
++int __lockfunc rt_read_trylock(rwlock_t *rwlock)
++{
++ struct rt_mutex *lock = &rwlock->lock;
++ int ret = 1;
++
++ /*
++ * recursive read locks succeed when current owns the lock,
++ * but not when read_depth == 0 which means that the lock is
++ * write locked.
++ */
++ if (rt_mutex_owner(lock) != current) {
++ migrate_disable();
++ ret = rt_mutex_trylock(lock);
++ if (ret)
++ rwlock_acquire(&rwlock->dep_map, 0, 1, _RET_IP_);
++ else
++ migrate_enable();
++
++ } else if (!rwlock->read_depth) {
++ ret = 0;
++ }
++
++ if (ret)
++ rwlock->read_depth++;
++
++ return ret;
++}
++EXPORT_SYMBOL(rt_read_trylock);
++
++void __lockfunc rt_write_lock(rwlock_t *rwlock)
++{
++ rwlock_acquire(&rwlock->dep_map, 0, 0, _RET_IP_);
++ migrate_disable();
++ __rt_spin_lock(&rwlock->lock);
++}
++EXPORT_SYMBOL(rt_write_lock);
++
++void __lockfunc rt_read_lock(rwlock_t *rwlock)
++{
++ struct rt_mutex *lock = &rwlock->lock;
++
++
++ /*
++ * recursive read locks succeed when current owns the lock
++ */
++ if (rt_mutex_owner(lock) != current) {
++ migrate_disable();
++ rwlock_acquire(&rwlock->dep_map, 0, 0, _RET_IP_);
++ __rt_spin_lock(lock);
++ }
++ rwlock->read_depth++;
++}
++
++EXPORT_SYMBOL(rt_read_lock);
++
++void __lockfunc rt_write_unlock(rwlock_t *rwlock)
++{
++ /* NOTE: we always pass in '1' for nested, for simplicity */
++ rwlock_release(&rwlock->dep_map, 1, _RET_IP_);
++ __rt_spin_unlock(&rwlock->lock);
++ migrate_enable();
++}
++EXPORT_SYMBOL(rt_write_unlock);
++
++void __lockfunc rt_read_unlock(rwlock_t *rwlock)
++{
++ /* Release the lock only when read_depth is down to 0 */
++ if (--rwlock->read_depth == 0) {
++ rwlock_release(&rwlock->dep_map, 1, _RET_IP_);
++ __rt_spin_unlock(&rwlock->lock);
++ migrate_enable();
++ }
++}
++EXPORT_SYMBOL(rt_read_unlock);
++
++unsigned long __lockfunc rt_write_lock_irqsave(rwlock_t *rwlock)
++{
++ rt_write_lock(rwlock);
++
++ return 0;
++}
++EXPORT_SYMBOL(rt_write_lock_irqsave);
++
++unsigned long __lockfunc rt_read_lock_irqsave(rwlock_t *rwlock)
++{
++ rt_read_lock(rwlock);
++
++ return 0;
++}
++EXPORT_SYMBOL(rt_read_lock_irqsave);
++
++void __rt_rwlock_init(rwlock_t *rwlock, char *name, struct lock_class_key *key)
++{
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++ /*
++ * Make sure we are not reinitializing a held lock:
++ */
++ debug_check_no_locks_freed((void *)rwlock, sizeof(*rwlock));
++ lockdep_init_map(&rwlock->dep_map, name, key, 0);
++#endif
++ rwlock->lock.save_state = 1;
++ rwlock->read_depth = 0;
++}
++EXPORT_SYMBOL(__rt_rwlock_init);
++
++/*
++ * rw_semaphores
++ */
++
++void rt_up_write(struct rw_semaphore *rwsem)
++{
++ rwsem_release(&rwsem->dep_map, 1, _RET_IP_);
++ rt_mutex_unlock(&rwsem->lock);
++}
++EXPORT_SYMBOL(rt_up_write);
++
++void __rt_up_read(struct rw_semaphore *rwsem)
++{
++ if (--rwsem->read_depth == 0)
++ rt_mutex_unlock(&rwsem->lock);
++}
++
++void rt_up_read(struct rw_semaphore *rwsem)
++{
++ rwsem_release(&rwsem->dep_map, 1, _RET_IP_);
++ __rt_up_read(rwsem);
++}
++EXPORT_SYMBOL(rt_up_read);
++
++/*
++ * downgrade a write lock into a read lock
++ * - just wake up any readers at the front of the queue
++ */
++void rt_downgrade_write(struct rw_semaphore *rwsem)
++{
++ BUG_ON(rt_mutex_owner(&rwsem->lock) != current);
++ rwsem->read_depth = 1;
++}
++EXPORT_SYMBOL(rt_downgrade_write);
++
++int rt_down_write_trylock(struct rw_semaphore *rwsem)
++{
++ int ret = rt_mutex_trylock(&rwsem->lock);
++
++ if (ret)
++ rwsem_acquire(&rwsem->dep_map, 0, 1, _RET_IP_);
++ return ret;
++}
++EXPORT_SYMBOL(rt_down_write_trylock);
++
++void rt_down_write(struct rw_semaphore *rwsem)
++{
++ rwsem_acquire(&rwsem->dep_map, 0, 0, _RET_IP_);
++ rt_mutex_lock(&rwsem->lock);
++}
++EXPORT_SYMBOL(rt_down_write);
++
++void rt_down_write_nested(struct rw_semaphore *rwsem, int subclass)
++{
++ rwsem_acquire(&rwsem->dep_map, subclass, 0, _RET_IP_);
++ rt_mutex_lock(&rwsem->lock);
++}
++EXPORT_SYMBOL(rt_down_write_nested);
++
++void rt_down_write_nested_lock(struct rw_semaphore *rwsem,
++ struct lockdep_map *nest)
++{
++ rwsem_acquire_nest(&rwsem->dep_map, 0, 0, nest, _RET_IP_);
++ rt_mutex_lock(&rwsem->lock);
++}
++EXPORT_SYMBOL(rt_down_write_nested_lock);
++
++int rt_down_read_trylock(struct rw_semaphore *rwsem)
++{
++ struct rt_mutex *lock = &rwsem->lock;
++ int ret = 1;
++
++ /*
++ * recursive read locks succeed when current owns the rwsem,
++ * but not when read_depth == 0 which means that the rwsem is
++ * write locked.
++ */
++ if (rt_mutex_owner(lock) != current)
++ ret = rt_mutex_trylock(&rwsem->lock);
++ else if (!rwsem->read_depth)
++ ret = 0;
++
++ if (ret) {
++ rwsem->read_depth++;
++ rwsem_acquire(&rwsem->dep_map, 0, 1, _RET_IP_);
++ }
++ return ret;
++}
++EXPORT_SYMBOL(rt_down_read_trylock);
++
++static void __rt_down_read(struct rw_semaphore *rwsem, int subclass)
++{
++ struct rt_mutex *lock = &rwsem->lock;
++
++ rwsem_acquire_read(&rwsem->dep_map, subclass, 0, _RET_IP_);
++
++ if (rt_mutex_owner(lock) != current)
++ rt_mutex_lock(&rwsem->lock);
++ rwsem->read_depth++;
++}
++
++void rt_down_read(struct rw_semaphore *rwsem)
++{
++ __rt_down_read(rwsem, 0);
++}
++EXPORT_SYMBOL(rt_down_read);
++
++void rt_down_read_nested(struct rw_semaphore *rwsem, int subclass)
++{
++ __rt_down_read(rwsem, subclass);
++}
++EXPORT_SYMBOL(rt_down_read_nested);
++
++void __rt_rwsem_init(struct rw_semaphore *rwsem, const char *name,
++ struct lock_class_key *key)
++{
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++ /*
++ * Make sure we are not reinitializing a held lock:
++ */
++ debug_check_no_locks_freed((void *)rwsem, sizeof(*rwsem));
++ lockdep_init_map(&rwsem->dep_map, name, key, 0);
++#endif
++ rwsem->read_depth = 0;
++ rwsem->lock.save_state = 0;
++}
++EXPORT_SYMBOL(__rt_rwsem_init);
++
++/**
++ * atomic_dec_and_mutex_lock - return holding mutex if we dec to 0
++ * @cnt: the atomic which we are to dec
++ * @lock: the mutex to return holding if we dec to 0
++ *
++ * return true and hold lock if we dec to 0, return false otherwise
++ */
++int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock)
++{
++ /* dec if we can't possibly hit 0 */
++ if (atomic_add_unless(cnt, -1, 1))
++ return 0;
++ /* we might hit 0, so take the lock */
++ mutex_lock(lock);
++ if (!atomic_dec_and_test(cnt)) {
++ /* when we actually did the dec, we didn't hit 0 */
++ mutex_unlock(lock);
++ return 0;
++ }
++ /* we hit 0, and we hold the lock */
++ return 1;
++}
++EXPORT_SYMBOL(atomic_dec_and_mutex_lock);
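A hedged usage sketch for atomic_dec_and_mutex_lock() above; the
refcounted object, list and lock names are made up for illustration.

#include <linux/atomic.h>
#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/slab.h>

struct my_obj {				/* hypothetical refcounted object */
	atomic_t		refcnt;
	struct list_head	node;
};

static DEFINE_MUTEX(my_obj_list_lock);

static void my_obj_put(struct my_obj *obj)
{
	/* Returns 1 with the mutex held only when the count dropped to 0 */
	if (!atomic_dec_and_mutex_lock(&obj->refcnt, &my_obj_list_lock))
		return;
	list_del(&obj->node);
	mutex_unlock(&my_obj_list_lock);
	kfree(obj);
}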
+--- a/kernel/locking/rtmutex.c
++++ b/kernel/locking/rtmutex.c
+@@ -7,6 +7,11 @@
+ * Copyright (C) 2005-2006 Timesys Corp., Thomas Gleixner <tglx@timesys.com>
+ * Copyright (C) 2005 Kihon Technologies Inc., Steven Rostedt
+ * Copyright (C) 2006 Esben Nielsen
++ * Adaptive Spinlocks:
++ * Copyright (C) 2008 Novell, Inc., Gregory Haskins, Sven Dietrich,
++ * and Peter Morreale,
++ * Adaptive Spinlocks simplification:
++ * Copyright (C) 2008 Red Hat, Inc., Steven Rostedt <srostedt@redhat.com>
+ *
+ * See Documentation/locking/rt-mutex-design.txt for details.
+ */
+@@ -341,6 +346,14 @@ static bool rt_mutex_cond_detect_deadloc
+ return debug_rt_mutex_detect_deadlock(waiter, chwalk);
+ }
+
++static void rt_mutex_wake_waiter(struct rt_mutex_waiter *waiter)
++{
++ if (waiter->savestate)
++ wake_up_lock_sleeper(waiter->task);
++ else
++ wake_up_process(waiter->task);
++}
++
+ /*
+ * Max number of times we'll walk the boosting chain:
+ */
+@@ -648,13 +661,16 @@ static int rt_mutex_adjust_prio_chain(st
+ * follow here. This is the end of the chain we are walking.
+ */
+ if (!rt_mutex_owner(lock)) {
++ struct rt_mutex_waiter *lock_top_waiter;
++
+ /*
+ * If the requeue [7] above changed the top waiter,
+ * then we need to wake the new top waiter up to try
+ * to get the lock.
+ */
+- if (prerequeue_top_waiter != rt_mutex_top_waiter(lock))
+- wake_up_process(rt_mutex_top_waiter(lock)->task);
++ lock_top_waiter = rt_mutex_top_waiter(lock);
++ if (prerequeue_top_waiter != lock_top_waiter)
++ rt_mutex_wake_waiter(lock_top_waiter);
+ raw_spin_unlock(&lock->wait_lock);
+ return 0;
+ }
+@@ -747,6 +763,25 @@ static int rt_mutex_adjust_prio_chain(st
+ return ret;
+ }
+
++
++#define STEAL_NORMAL 0
++#define STEAL_LATERAL 1
++
++/*
++ * Note that RT tasks are excluded from lateral-steals to prevent the
++ * introduction of an unbounded latency
++ */
++static inline int lock_is_stealable(struct task_struct *task,
++ struct task_struct *pendowner, int mode)
++{
++ if (mode == STEAL_NORMAL || rt_task(task)) {
++ if (task->prio >= pendowner->prio)
++ return 0;
++ } else if (task->prio > pendowner->prio)
++ return 0;
++ return 1;
++}
++
+ /*
+ * Try to take an rt-mutex
+ *
+@@ -757,8 +792,9 @@ static int rt_mutex_adjust_prio_chain(st
+ * @waiter: The waiter that is queued to the lock's wait list if the
+ * callsite called task_blocked_on_lock(), otherwise NULL
+ */
+-static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task,
+- struct rt_mutex_waiter *waiter)
++static int __try_to_take_rt_mutex(struct rt_mutex *lock,
++ struct task_struct *task,
++ struct rt_mutex_waiter *waiter, int mode)
+ {
+ unsigned long flags;
+
+@@ -797,8 +833,10 @@ static int try_to_take_rt_mutex(struct r
+ * If waiter is not the highest priority waiter of
+ * @lock, give up.
+ */
+- if (waiter != rt_mutex_top_waiter(lock))
++ if (waiter != rt_mutex_top_waiter(lock)) {
++ /* XXX lock_is_stealable() ? */
+ return 0;
++ }
+
+ /*
+ * We can acquire the lock. Remove the waiter from the
+@@ -816,14 +854,10 @@ static int try_to_take_rt_mutex(struct r
+ * not need to be dequeued.
+ */
+ if (rt_mutex_has_waiters(lock)) {
+- /*
+- * If @task->prio is greater than or equal to
+- * the top waiter priority (kernel view),
+- * @task lost.
+- */
+- if (task->prio >= rt_mutex_top_waiter(lock)->prio)
+- return 0;
++ struct task_struct *pown = rt_mutex_top_waiter(lock)->task;
+
++ if (task != pown && !lock_is_stealable(task, pown, mode))
++ return 0;
+ /*
+ * The current top waiter stays enqueued. We
+ * don't have to change anything in the lock
+@@ -872,6 +906,308 @@ static int try_to_take_rt_mutex(struct r
+ return 1;
+ }
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++/*
++ * preemptible spin_lock functions:
++ */
++static inline void rt_spin_lock_fastlock(struct rt_mutex *lock,
++ void (*slowfn)(struct rt_mutex *lock))
++{
++ might_sleep_no_state_check();
++
++ if (likely(rt_mutex_cmpxchg(lock, NULL, current)))
++ rt_mutex_deadlock_account_lock(lock, current);
++ else
++ slowfn(lock);
++}
++
++static inline void rt_spin_lock_fastunlock(struct rt_mutex *lock,
++ void (*slowfn)(struct rt_mutex *lock))
++{
++ if (likely(rt_mutex_cmpxchg(lock, current, NULL)))
++ rt_mutex_deadlock_account_unlock(current);
++ else
++ slowfn(lock);
++}
++#ifdef CONFIG_SMP
++/*
++ * Note that owner is a speculative pointer and dereferencing relies
++ * on rcu_read_lock() and the check against the lock owner.
++ */
++static int adaptive_wait(struct rt_mutex *lock,
++ struct task_struct *owner)
++{
++ int res = 0;
++
++ rcu_read_lock();
++ for (;;) {
++ if (owner != rt_mutex_owner(lock))
++ break;
++ /*
++ * Ensure that owner->on_cpu is dereferenced _after_
++ * checking the above to be valid.
++ */
++ barrier();
++ if (!owner->on_cpu) {
++ res = 1;
++ break;
++ }
++ cpu_relax();
++ }
++ rcu_read_unlock();
++ return res;
++}
++#else
++static int adaptive_wait(struct rt_mutex *lock,
++ struct task_struct *orig_owner)
++{
++ return 1;
++}
++#endif
++
++# define pi_lock(lock) raw_spin_lock_irq(lock)
++# define pi_unlock(lock) raw_spin_unlock_irq(lock)
++
++static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
++ struct rt_mutex_waiter *waiter,
++ struct task_struct *task,
++ enum rtmutex_chainwalk chwalk);
++/*
++ * Slow path lock function spin_lock style: this variant is very
++ * careful not to miss any non-lock wakeups.
++ *
++ * We store the current state under p->pi_lock in p->saved_state and
++ * the try_to_wake_up() code handles this accordingly.
++ */
++static void noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock)
++{
++ struct task_struct *lock_owner, *self = current;
++ struct rt_mutex_waiter waiter, *top_waiter;
++ int ret;
++
++ rt_mutex_init_waiter(&waiter, true);
++
++ raw_spin_lock(&lock->wait_lock);
++
++ if (__try_to_take_rt_mutex(lock, self, NULL, STEAL_LATERAL)) {
++ raw_spin_unlock(&lock->wait_lock);
++ return;
++ }
++
++ BUG_ON(rt_mutex_owner(lock) == self);
++
++ /*
++ * We save whatever state the task is in and we'll restore it
++ * after acquiring the lock taking real wakeups into account
++ * as well. We are serialized via pi_lock against wakeups. See
++ * try_to_wake_up().
++ */
++ pi_lock(&self->pi_lock);
++ self->saved_state = self->state;
++ __set_current_state_no_track(TASK_UNINTERRUPTIBLE);
++ pi_unlock(&self->pi_lock);
++
++ ret = task_blocks_on_rt_mutex(lock, &waiter, self, 0);
++ BUG_ON(ret);
++
++ for (;;) {
++ /* Try to acquire the lock again. */
++ if (__try_to_take_rt_mutex(lock, self, &waiter, STEAL_LATERAL))
++ break;
++
++ top_waiter = rt_mutex_top_waiter(lock);
++ lock_owner = rt_mutex_owner(lock);
++
++ raw_spin_unlock(&lock->wait_lock);
++
++ debug_rt_mutex_print_deadlock(&waiter);
++
++ if (top_waiter != &waiter || adaptive_wait(lock, lock_owner))
++ schedule_rt_mutex(lock);
++
++ raw_spin_lock(&lock->wait_lock);
++
++ pi_lock(&self->pi_lock);
++ __set_current_state_no_track(TASK_UNINTERRUPTIBLE);
++ pi_unlock(&self->pi_lock);
++ }
++
++ /*
++ * Restore the task state to current->saved_state. We set it
++ * to the original state above and the try_to_wake_up() code
++ * has possibly updated it when a real (non-rtmutex) wakeup
++ * happened while we were blocked. Clear saved_state so
++ * try_to_wake_up() does not get confused.
++ */
++ pi_lock(&self->pi_lock);
++ __set_current_state_no_track(self->saved_state);
++ self->saved_state = TASK_RUNNING;
++ pi_unlock(&self->pi_lock);
++
++ /*
++ * try_to_take_rt_mutex() sets the waiter bit
++ * unconditionally. We might have to fix that up:
++ */
++ fixup_rt_mutex_waiters(lock);
++
++ BUG_ON(rt_mutex_has_waiters(lock) && &waiter == rt_mutex_top_waiter(lock));
++ BUG_ON(!RB_EMPTY_NODE(&waiter.tree_entry));
++
++ raw_spin_unlock(&lock->wait_lock);
++
++ debug_rt_mutex_free_waiter(&waiter);
++}
++
++static void wakeup_next_waiter(struct rt_mutex *lock);
++/*
++ * Slow path to release a rt_mutex spin_lock style
++ */
++static void noinline __sched rt_spin_lock_slowunlock(struct rt_mutex *lock)
++{
++ raw_spin_lock(&lock->wait_lock);
++
++ debug_rt_mutex_unlock(lock);
++
++ rt_mutex_deadlock_account_unlock(current);
++
++ if (!rt_mutex_has_waiters(lock)) {
++ lock->owner = NULL;
++ raw_spin_unlock(&lock->wait_lock);
++ return;
++ }
++
++ wakeup_next_waiter(lock);
++
++ raw_spin_unlock(&lock->wait_lock);
++
++	/* Undo pi boosting when necessary */
++ rt_mutex_adjust_prio(current);
++}
++
++void __lockfunc rt_spin_lock(spinlock_t *lock)
++{
++ rt_spin_lock_fastlock(&lock->lock, rt_spin_lock_slowlock);
++ spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
++}
++EXPORT_SYMBOL(rt_spin_lock);
++
++void __lockfunc __rt_spin_lock(struct rt_mutex *lock)
++{
++ rt_spin_lock_fastlock(lock, rt_spin_lock_slowlock);
++}
++EXPORT_SYMBOL(__rt_spin_lock);
++
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++void __lockfunc rt_spin_lock_nested(spinlock_t *lock, int subclass)
++{
++ rt_spin_lock_fastlock(&lock->lock, rt_spin_lock_slowlock);
++ spin_acquire(&lock->dep_map, subclass, 0, _RET_IP_);
++}
++EXPORT_SYMBOL(rt_spin_lock_nested);
++#endif
++
++void __lockfunc rt_spin_unlock(spinlock_t *lock)
++{
++ /* NOTE: we always pass in '1' for nested, for simplicity */
++ spin_release(&lock->dep_map, 1, _RET_IP_);
++ rt_spin_lock_fastunlock(&lock->lock, rt_spin_lock_slowunlock);
++}
++EXPORT_SYMBOL(rt_spin_unlock);
++
++void __lockfunc __rt_spin_unlock(struct rt_mutex *lock)
++{
++ rt_spin_lock_fastunlock(lock, rt_spin_lock_slowunlock);
++}
++EXPORT_SYMBOL(__rt_spin_unlock);
++
++/*
++ * Wait for the lock to get unlocked: instead of polling for an unlock
++ * (like raw spinlocks do), we lock and unlock, to force the kernel to
++ * schedule if there's contention:
++ */
++void __lockfunc rt_spin_unlock_wait(spinlock_t *lock)
++{
++ spin_lock(lock);
++ spin_unlock(lock);
++}
++EXPORT_SYMBOL(rt_spin_unlock_wait);
++
++int __lockfunc rt_spin_trylock(spinlock_t *lock)
++{
++ int ret = rt_mutex_trylock(&lock->lock);
++
++ if (ret)
++ spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
++ return ret;
++}
++EXPORT_SYMBOL(rt_spin_trylock);
++
++int __lockfunc rt_spin_trylock_bh(spinlock_t *lock)
++{
++ int ret;
++
++ local_bh_disable();
++ ret = rt_mutex_trylock(&lock->lock);
++ if (ret) {
++ migrate_disable();
++ spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
++ } else
++ local_bh_enable();
++ return ret;
++}
++EXPORT_SYMBOL(rt_spin_trylock_bh);
++
++int __lockfunc rt_spin_trylock_irqsave(spinlock_t *lock, unsigned long *flags)
++{
++ int ret;
++
++ *flags = 0;
++ ret = rt_mutex_trylock(&lock->lock);
++ if (ret) {
++ migrate_disable();
++ spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
++ }
++ return ret;
++}
++EXPORT_SYMBOL(rt_spin_trylock_irqsave);
++
++int atomic_dec_and_spin_lock(atomic_t *atomic, spinlock_t *lock)
++{
++ /* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */
++ if (atomic_add_unless(atomic, -1, 1))
++ return 0;
++ migrate_disable();
++ rt_spin_lock(lock);
++ if (atomic_dec_and_test(atomic))
++ return 1;
++ rt_spin_unlock(lock);
++ migrate_enable();
++ return 0;
++}
++EXPORT_SYMBOL(atomic_dec_and_spin_lock);
++
++void
++__rt_spin_lock_init(spinlock_t *lock, char *name, struct lock_class_key *key)
++{
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++ /*
++ * Make sure we are not reinitializing a held lock:
++ */
++ debug_check_no_locks_freed((void *)lock, sizeof(*lock));
++ lockdep_init_map(&lock->dep_map, name, key, 0);
++#endif
++}
++EXPORT_SYMBOL(__rt_spin_lock_init);
++
++#endif /* PREEMPT_RT_FULL */
++
++static inline int
++try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task,
++ struct rt_mutex_waiter *waiter)
++{
++ return __try_to_take_rt_mutex(lock, task, waiter, STEAL_NORMAL);
++}
++
+ /*
+ * Task blocks on lock.
+ *
+@@ -1021,7 +1357,7 @@ static void wakeup_next_waiter(struct rt
+ * long as we hold lock->wait_lock. The waiter task needs to
+ * acquire it in order to dequeue the waiter.
+ */
+- wake_up_process(waiter->task);
++ rt_mutex_wake_waiter(waiter);
+ }
+
+ /*
+@@ -1103,11 +1439,11 @@ void rt_mutex_adjust_pi(struct task_stru
+ return;
+ }
+ next_lock = waiter->lock;
+- raw_spin_unlock_irqrestore(&task->pi_lock, flags);
+
+ /* gets dropped in rt_mutex_adjust_prio_chain()! */
+ get_task_struct(task);
+
++ raw_spin_unlock_irqrestore(&task->pi_lock, flags);
+ rt_mutex_adjust_prio_chain(task, RT_MUTEX_MIN_CHAINWALK, NULL,
+ next_lock, NULL, task);
+ }
+@@ -1193,9 +1529,7 @@ rt_mutex_slowlock(struct rt_mutex *lock,
+ struct rt_mutex_waiter waiter;
+ int ret = 0;
+
+- debug_rt_mutex_init_waiter(&waiter);
+- RB_CLEAR_NODE(&waiter.pi_tree_entry);
+- RB_CLEAR_NODE(&waiter.tree_entry);
++ rt_mutex_init_waiter(&waiter, false);
+
+ raw_spin_lock(&lock->wait_lock);
+
+@@ -1554,13 +1888,12 @@ EXPORT_SYMBOL_GPL(rt_mutex_destroy);
+ void __rt_mutex_init(struct rt_mutex *lock, const char *name)
+ {
+ lock->owner = NULL;
+- raw_spin_lock_init(&lock->wait_lock);
+ lock->waiters = RB_ROOT;
+ lock->waiters_leftmost = NULL;
+
+ debug_rt_mutex_init(lock, name);
+ }
+-EXPORT_SYMBOL_GPL(__rt_mutex_init);
++EXPORT_SYMBOL(__rt_mutex_init);
+
+ /**
+ * rt_mutex_init_proxy_locked - initialize and lock a rt_mutex on behalf of a
+@@ -1575,7 +1908,7 @@ EXPORT_SYMBOL_GPL(__rt_mutex_init);
+ void rt_mutex_init_proxy_locked(struct rt_mutex *lock,
+ struct task_struct *proxy_owner)
+ {
+- __rt_mutex_init(lock, NULL);
++ rt_mutex_init(lock);
+ debug_rt_mutex_proxy_lock(lock, proxy_owner);
+ rt_mutex_set_owner(lock, proxy_owner);
+ rt_mutex_deadlock_account_lock(lock, proxy_owner);
+@@ -1737,3 +2070,25 @@ int rt_mutex_finish_proxy_lock(struct rt
+
+ return ret;
+ }
++
++#ifdef CONFIG_PREEMPT_RT_FULL
++struct ww_mutex {
++};
++struct ww_acquire_ctx {
++};
++int __ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ww_ctx)
++{
++ BUG();
++}
++EXPORT_SYMBOL_GPL(__ww_mutex_lock);
++int __ww_mutex_lock_interruptible(struct ww_mutex *lock, struct ww_acquire_ctx *ww_ctx)
++{
++ BUG();
++}
++EXPORT_SYMBOL_GPL(__ww_mutex_lock_interruptible);
++void __sched ww_mutex_unlock(struct ww_mutex *lock)
++{
++ BUG();
++}
++EXPORT_SYMBOL_GPL(ww_mutex_unlock);
++#endif
+--- a/kernel/locking/rtmutex_common.h
++++ b/kernel/locking/rtmutex_common.h
+@@ -49,6 +49,7 @@ struct rt_mutex_waiter {
+ struct rb_node pi_tree_entry;
+ struct task_struct *task;
+ struct rt_mutex *lock;
++ bool savestate;
+ #ifdef CONFIG_DEBUG_RT_MUTEXES
+ unsigned long ip;
+ struct pid *deadlock_task_pid;
+@@ -145,4 +146,14 @@ extern void rt_mutex_adjust_prio(struct
+ # include "rtmutex.h"
+ #endif
+
++static inline void
++rt_mutex_init_waiter(struct rt_mutex_waiter *waiter, bool savestate)
++{
++ debug_rt_mutex_init_waiter(waiter);
++ waiter->task = NULL;
++ waiter->savestate = savestate;
++ RB_CLEAR_NODE(&waiter->pi_tree_entry);
++ RB_CLEAR_NODE(&waiter->tree_entry);
++}
++
+ #endif
+--- a/kernel/locking/spinlock.c
++++ b/kernel/locking/spinlock.c
+@@ -124,8 +124,11 @@ void __lockfunc __raw_##op##_lock_bh(loc
+ * __[spin|read|write]_lock_bh()
+ */
+ BUILD_LOCK_OPS(spin, raw_spinlock);
++
++#ifndef CONFIG_PREEMPT_RT_FULL
+ BUILD_LOCK_OPS(read, rwlock);
+ BUILD_LOCK_OPS(write, rwlock);
++#endif
+
+ #endif
+
+@@ -209,6 +212,8 @@ void __lockfunc _raw_spin_unlock_bh(raw_
+ EXPORT_SYMBOL(_raw_spin_unlock_bh);
+ #endif
+
++#ifndef CONFIG_PREEMPT_RT_FULL
++
+ #ifndef CONFIG_INLINE_READ_TRYLOCK
+ int __lockfunc _raw_read_trylock(rwlock_t *lock)
+ {
+@@ -353,6 +358,8 @@ void __lockfunc _raw_write_unlock_bh(rwl
+ EXPORT_SYMBOL(_raw_write_unlock_bh);
+ #endif
+
++#endif /* !PREEMPT_RT_FULL */
++
+ #ifdef CONFIG_DEBUG_LOCK_ALLOC
+
+ void __lockfunc _raw_spin_lock_nested(raw_spinlock_t *lock, int subclass)
+--- a/kernel/locking/spinlock_debug.c
++++ b/kernel/locking/spinlock_debug.c
+@@ -31,6 +31,7 @@ void __raw_spin_lock_init(raw_spinlock_t
+
+ EXPORT_SYMBOL(__raw_spin_lock_init);
+
++#ifndef CONFIG_PREEMPT_RT_FULL
+ void __rwlock_init(rwlock_t *lock, const char *name,
+ struct lock_class_key *key)
+ {
+@@ -48,6 +49,7 @@ void __rwlock_init(rwlock_t *lock, const
+ }
+
+ EXPORT_SYMBOL(__rwlock_init);
++#endif
+
+ static void spin_dump(raw_spinlock_t *lock, const char *msg)
+ {
+@@ -159,6 +161,7 @@ void do_raw_spin_unlock(raw_spinlock_t *
+ arch_spin_unlock(&lock->raw_lock);
+ }
+
++#ifndef CONFIG_PREEMPT_RT_FULL
+ static void rwlock_bug(rwlock_t *lock, const char *msg)
+ {
+ if (!debug_locks_off())
+@@ -300,3 +303,5 @@ void do_raw_write_unlock(rwlock_t *lock)
+ debug_write_unlock(lock);
+ arch_write_unlock(&lock->raw_lock);
+ }
++
++#endif
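An illustrative sketch, not from the patch above: the reader-side
semantics of the rwlock_t substitution in kernel/locking/rt.c. The lock
name is made up; on RT, readers from different tasks serialize on the
underlying rt_mutex, while the owning task may recurse via read_depth.

#include <linux/spinlock.h>

static DEFINE_RWLOCK(my_rwlock);	/* hypothetical */

static void my_reader(void)
{
	read_lock(&my_rwlock);		/* RT: rt_read_lock() takes the rt_mutex */
	read_lock(&my_rwlock);		/* same owner: only read_depth++ */
	/* ... read-side work ... */
	read_unlock(&my_rwlock);
	read_unlock(&my_rwlock);	/* released once read_depth reaches 0 */
}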
diff --git a/patches/rt-introduce-cpu-chill.patch b/patches/rt-introduce-cpu-chill.patch
new file mode 100644
index 00000000000000..894b7abca8ed50
--- /dev/null
+++ b/patches/rt-introduce-cpu-chill.patch
@@ -0,0 +1,128 @@
+Subject: rt: Introduce cpu_chill()
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 07 Mar 2012 20:51:03 +0100
+
+Retry loops on RT might loop forever when the modifying side was
+preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill()
+defaults to cpu_relax() for non-RT. On RT it puts the looping task to
+sleep for a tick so the preempted task can make progress.
+
+Steven Rostedt changed it to use a hrtimer instead of msleep():
+|
+|Ulrich Obergfell pointed out that cpu_chill() calls msleep() which is woken
+|up by the ksoftirqd running the TIMER softirq. But as the cpu_chill() is
+|called from softirq context, it may block the ksoftirqd() from running, in
+|which case, it may never wake up the msleep() causing the deadlock.
+|
+|I checked the vmcore, and irq/74-qla2xxx is stuck in the msleep() call,
+|running on CPU 8. The one ksoftirqd that is stuck, happens to be the one that
+|runs on CPU 8, and it is blocked on a lock held by irq/74-qla2xxx. As that
+|ksoftirqd is the one that will wake up irq/74-qla2xxx, and it happens to be
+|blocked on a lock that irq/74-qla2xxx holds, we have our deadlock.
+|
+|The solution is not to convert the cpu_chill() back to a cpu_relax() as that
+|will re-create a possible live lock that the cpu_chill() fixed earlier, and may
+|also leave this bug open on other softirqs. The fix is to remove the
+|dependency on ksoftirqd from cpu_chill(). That is, instead of calling
+|msleep() that requires ksoftirqd to wake it up, use the
+|hrtimer_nanosleep() code that does the wakeup from hard irq context.
+|
+||Looks to be the lock of the block softirq. I don't have the core dump
+||anymore, but from what I could tell the ksoftirqd was blocked on the
+||block softirq lock, where the block softirq handler did a msleep
+||(called by the qla2xxx interrupt handler).
+||
+||Looking at trigger_softirq() in block/blk-softirq.c, it can do a
+||smp_callfunction() to another cpu to run the block softirq. If that
+||happens to be the cpu where the qla2xx irq handler is doing the block
+||softirq and is in a middle of a msleep(), I believe the ksoftirqd will
+||try to run the softirq. If it does that, then BOOM, it's deadlocked
+||because the ksoftirqd will never run the timer softirq either.
+|
+||I should have also stated that it was only one lock that was involved.
+||But the lock owner was doing a msleep() that requires a wakeup by
+||ksoftirqd to continue. If ksoftirqd happens to be blocked on a lock
+||held by the msleep() caller, then you have your deadlock.
+||
+||It's best not to have any softirqs going to sleep requiring another
+||softirq to wake it up. Note, if we ever require a timer softirq to do a
+||cpu_chill() it will most definitely hit this deadlock.
+
++ bigeasy: add PF_NOFREEZE:
+| [....] Waiting for /dev to be fully populated...
+| =====================================
+| [ BUG: udevd/229 still has locks held! ]
+| 3.12.11-rt17 #23 Not tainted
+| -------------------------------------
+| 1 lock held by udevd/229:
+| #0: (&type->i_mutex_dir_key#2){+.+.+.}, at: lookup_slow+0x28/0x98
+|
+| stack backtrace:
+| CPU: 0 PID: 229 Comm: udevd Not tainted 3.12.11-rt17 #23
+| (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14)
+| (show_stack+0x10/0x14) from (dump_stack+0x74/0xbc)
+| (dump_stack+0x74/0xbc) from (do_nanosleep+0x120/0x160)
+| (do_nanosleep+0x120/0x160) from (hrtimer_nanosleep+0x90/0x110)
+| (hrtimer_nanosleep+0x90/0x110) from (cpu_chill+0x30/0x38)
+| (cpu_chill+0x30/0x38) from (dentry_kill+0x158/0x1ec)
+| (dentry_kill+0x158/0x1ec) from (dput+0x74/0x15c)
+| (dput+0x74/0x15c) from (lookup_real+0x4c/0x50)
+| (lookup_real+0x4c/0x50) from (__lookup_hash+0x34/0x44)
+| (__lookup_hash+0x34/0x44) from (lookup_slow+0x38/0x98)
+| (lookup_slow+0x38/0x98) from (path_lookupat+0x208/0x7fc)
+| (path_lookupat+0x208/0x7fc) from (filename_lookup+0x20/0x60)
+| (filename_lookup+0x20/0x60) from (user_path_at_empty+0x50/0x7c)
+| (user_path_at_empty+0x50/0x7c) from (user_path_at+0x14/0x1c)
+| (user_path_at+0x14/0x1c) from (vfs_fstatat+0x48/0x94)
+| (vfs_fstatat+0x48/0x94) from (SyS_stat64+0x14/0x30)
+| (SyS_stat64+0x14/0x30) from (ret_fast_syscall+0x0/0x48)
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ include/linux/delay.h | 6 ++++++
+ kernel/time/hrtimer.c | 19 +++++++++++++++++++
+ 2 files changed, 25 insertions(+)
+
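A hedged sketch, not part of the patch, of the kind of retry loop
cpu_chill() is meant for; the structure and flag (my_thing,
being_modified) are made up for illustration.

#include <linux/delay.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct my_thing {			/* hypothetical */
	spinlock_t	lock;
	bool		being_modified;
};

static void wait_until_stable(struct my_thing *t)
{
	for (;;) {
		spin_lock(&t->lock);
		if (!t->being_modified)
			break;
		spin_unlock(&t->lock);
		/*
		 * cpu_relax() can live-lock on RT when the modifier was
		 * preempted; cpu_chill() sleeps for a tick instead.
		 */
		cpu_chill();
	}
	/* ... work on *t with the lock held ... */
	spin_unlock(&t->lock);
}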
+--- a/include/linux/delay.h
++++ b/include/linux/delay.h
+@@ -52,4 +52,10 @@ static inline void ssleep(unsigned int s
+ msleep(seconds * 1000);
+ }
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++extern void cpu_chill(void);
++#else
++# define cpu_chill() cpu_relax()
++#endif
++
+ #endif /* defined(_LINUX_DELAY_H) */
+--- a/kernel/time/hrtimer.c
++++ b/kernel/time/hrtimer.c
+@@ -1867,6 +1867,25 @@ SYSCALL_DEFINE2(nanosleep, struct timesp
+ return hrtimer_nanosleep(&tu, rmtp, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
+ }
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++/*
++ * Sleep for 1 ms in the hope that whoever holds what we want will let it go.
++ */
++void cpu_chill(void)
++{
++ struct timespec tu = {
++ .tv_nsec = NSEC_PER_MSEC,
++ };
++ unsigned int freeze_flag = current->flags & PF_NOFREEZE;
++
++ current->flags |= PF_NOFREEZE;
++ hrtimer_nanosleep(&tu, NULL, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
++ if (!freeze_flag)
++ current->flags &= ~PF_NOFREEZE;
++}
++EXPORT_SYMBOL(cpu_chill);
++#endif
++
+ /*
+ * Functions related to boot-time initialization:
+ */
diff --git a/patches/rt-local-irq-lock.patch b/patches/rt-local-irq-lock.patch
new file mode 100644
index 00000000000000..fdd9b7d2140c1f
--- /dev/null
+++ b/patches/rt-local-irq-lock.patch
@@ -0,0 +1,323 @@
+Subject: rt: Add local irq locks
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Mon, 20 Jun 2011 09:03:47 +0200
+
+Introduce locallock. For !RT this maps to preempt_disable()/
+local_irq_disable() so there is not much that changes. For RT this will
+map to a spinlock. This makes preemption possible and the locked "resource"
+gets the lockdep annotation it wouldn't have otherwise. The locks are
+recursive for owner == current. Also, all locks use migrate_disable()
+which ensures that the task is not migrated to another CPU while the lock
+is held and the owner is preempted.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ include/linux/locallock.h | 264 ++++++++++++++++++++++++++++++++++++++++++++++
+ include/linux/percpu.h | 29 +++++
+ 2 files changed, 293 insertions(+)
+
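A hedged usage sketch, not part of the patch: protecting per-CPU data
with a local lock. The per-CPU list and lock names are made up for
illustration; on !RT the lock collapses to local_irq_save()/restore(),
on RT it becomes a per-CPU sleeping spinlock plus migrate_disable().

#include <linux/list.h>
#include <linux/locallock.h>

static DEFINE_PER_CPU(struct list_head, my_lists);
static DEFINE_LOCAL_IRQ_LOCK(my_lock);

static void my_add(struct list_head *entry)
{
	unsigned long flags;

	local_lock_irqsave(my_lock, flags);
	list_add(entry, this_cpu_ptr(&my_lists));
	local_unlock_irqrestore(my_lock, flags);
}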
+--- /dev/null
++++ b/include/linux/locallock.h
+@@ -0,0 +1,264 @@
++#ifndef _LINUX_LOCALLOCK_H
++#define _LINUX_LOCALLOCK_H
++
++#include <linux/percpu.h>
++#include <linux/spinlock.h>
++
++#ifdef CONFIG_PREEMPT_RT_BASE
++
++#ifdef CONFIG_DEBUG_SPINLOCK
++# define LL_WARN(cond) WARN_ON(cond)
++#else
++# define LL_WARN(cond) do { } while (0)
++#endif
++
++/*
++ * per cpu lock based substitute for local_irq_*()
++ */
++struct local_irq_lock {
++ spinlock_t lock;
++ struct task_struct *owner;
++ int nestcnt;
++ unsigned long flags;
++};
++
++#define DEFINE_LOCAL_IRQ_LOCK(lvar) \
++ DEFINE_PER_CPU(struct local_irq_lock, lvar) = { \
++ .lock = __SPIN_LOCK_UNLOCKED((lvar).lock) }
++
++#define DECLARE_LOCAL_IRQ_LOCK(lvar) \
++ DECLARE_PER_CPU(struct local_irq_lock, lvar)
++
++#define local_irq_lock_init(lvar) \
++ do { \
++ int __cpu; \
++ for_each_possible_cpu(__cpu) \
++ spin_lock_init(&per_cpu(lvar, __cpu).lock); \
++ } while (0)
++
++/*
++ * spin_lock|trylock|unlock_local flavour that does not migrate_disable(),
++ * used for __local_lock|trylock|unlock where get_local_var/put_local_var
++ * already takes care of the migrate_disable()/enable().
++ * For CONFIG_PREEMPT_RT_BASE these map to the normal spin_* calls.
++ */
++# define spin_lock_local(lock) spin_lock(lock)
++# define spin_trylock_local(lock) spin_trylock(lock)
++# define spin_unlock_local(lock) spin_unlock(lock)
++
++static inline void __local_lock(struct local_irq_lock *lv)
++{
++ if (lv->owner != current) {
++ spin_lock_local(&lv->lock);
++ LL_WARN(lv->owner);
++ LL_WARN(lv->nestcnt);
++ lv->owner = current;
++ }
++ lv->nestcnt++;
++}
++
++#define local_lock(lvar) \
++ do { __local_lock(&get_local_var(lvar)); } while (0)
++
++static inline int __local_trylock(struct local_irq_lock *lv)
++{
++ if (lv->owner != current && spin_trylock_local(&lv->lock)) {
++ LL_WARN(lv->owner);
++ LL_WARN(lv->nestcnt);
++ lv->owner = current;
++ lv->nestcnt = 1;
++ return 1;
++ }
++ return 0;
++}
++
++#define local_trylock(lvar) \
++ ({ \
++ int __locked; \
++ __locked = __local_trylock(&get_local_var(lvar)); \
++ if (!__locked) \
++ put_local_var(lvar); \
++ __locked; \
++ })
++
++static inline void __local_unlock(struct local_irq_lock *lv)
++{
++ LL_WARN(lv->nestcnt == 0);
++ LL_WARN(lv->owner != current);
++ if (--lv->nestcnt)
++ return;
++
++ lv->owner = NULL;
++ spin_unlock_local(&lv->lock);
++}
++
++#define local_unlock(lvar) \
++ do { \
++ __local_unlock(this_cpu_ptr(&lvar)); \
++ put_local_var(lvar); \
++ } while (0)
++
++static inline void __local_lock_irq(struct local_irq_lock *lv)
++{
++ spin_lock_irqsave(&lv->lock, lv->flags);
++ LL_WARN(lv->owner);
++ LL_WARN(lv->nestcnt);
++ lv->owner = current;
++ lv->nestcnt = 1;
++}
++
++#define local_lock_irq(lvar) \
++ do { __local_lock_irq(&get_local_var(lvar)); } while (0)
++
++#define local_lock_irq_on(lvar, cpu) \
++ do { __local_lock_irq(&per_cpu(lvar, cpu)); } while (0)
++
++static inline void __local_unlock_irq(struct local_irq_lock *lv)
++{
++ LL_WARN(!lv->nestcnt);
++ LL_WARN(lv->owner != current);
++ lv->owner = NULL;
++ lv->nestcnt = 0;
++ spin_unlock_irq(&lv->lock);
++}
++
++#define local_unlock_irq(lvar) \
++ do { \
++ __local_unlock_irq(this_cpu_ptr(&lvar)); \
++ put_local_var(lvar); \
++ } while (0)
++
++#define local_unlock_irq_on(lvar, cpu) \
++ do { \
++ __local_unlock_irq(&per_cpu(lvar, cpu)); \
++ } while (0)
++
++static inline int __local_lock_irqsave(struct local_irq_lock *lv)
++{
++ if (lv->owner != current) {
++ __local_lock_irq(lv);
++ return 0;
++ } else {
++ lv->nestcnt++;
++ return 1;
++ }
++}
++
++#define local_lock_irqsave(lvar, _flags) \
++ do { \
++ if (__local_lock_irqsave(&get_local_var(lvar))) \
++ put_local_var(lvar); \
++ _flags = __this_cpu_read(lvar.flags); \
++ } while (0)
++
++#define local_lock_irqsave_on(lvar, _flags, cpu) \
++ do { \
++ __local_lock_irqsave(&per_cpu(lvar, cpu)); \
++ _flags = per_cpu(lvar, cpu).flags; \
++ } while (0)
++
++static inline int __local_unlock_irqrestore(struct local_irq_lock *lv,
++ unsigned long flags)
++{
++ LL_WARN(!lv->nestcnt);
++ LL_WARN(lv->owner != current);
++ if (--lv->nestcnt)
++ return 0;
++
++ lv->owner = NULL;
++ spin_unlock_irqrestore(&lv->lock, lv->flags);
++ return 1;
++}
++
++#define local_unlock_irqrestore(lvar, flags) \
++ do { \
++ if (__local_unlock_irqrestore(this_cpu_ptr(&lvar), flags)) \
++ put_local_var(lvar); \
++ } while (0)
++
++#define local_unlock_irqrestore_on(lvar, flags, cpu) \
++ do { \
++ __local_unlock_irqrestore(&per_cpu(lvar, cpu), flags); \
++ } while (0)
++
++#define local_spin_trylock_irq(lvar, lock) \
++ ({ \
++ int __locked; \
++ local_lock_irq(lvar); \
++ __locked = spin_trylock(lock); \
++ if (!__locked) \
++ local_unlock_irq(lvar); \
++ __locked; \
++ })
++
++#define local_spin_lock_irq(lvar, lock) \
++ do { \
++ local_lock_irq(lvar); \
++ spin_lock(lock); \
++ } while (0)
++
++#define local_spin_unlock_irq(lvar, lock) \
++ do { \
++ spin_unlock(lock); \
++ local_unlock_irq(lvar); \
++ } while (0)
++
++#define local_spin_lock_irqsave(lvar, lock, flags) \
++ do { \
++ local_lock_irqsave(lvar, flags); \
++ spin_lock(lock); \
++ } while (0)
++
++#define local_spin_unlock_irqrestore(lvar, lock, flags) \
++ do { \
++ spin_unlock(lock); \
++ local_unlock_irqrestore(lvar, flags); \
++ } while (0)
++
++#define get_locked_var(lvar, var) \
++ (*({ \
++ local_lock(lvar); \
++ this_cpu_ptr(&var); \
++ }))
++
++#define put_locked_var(lvar, var) local_unlock(lvar);
++
++#define local_lock_cpu(lvar) \
++ ({ \
++ local_lock(lvar); \
++ smp_processor_id(); \
++ })
++
++#define local_unlock_cpu(lvar) local_unlock(lvar)
++
++#else /* PREEMPT_RT_BASE */
++
++#define DEFINE_LOCAL_IRQ_LOCK(lvar) __typeof__(const int) lvar
++#define DECLARE_LOCAL_IRQ_LOCK(lvar) extern __typeof__(const int) lvar
++
++static inline void local_irq_lock_init(int lvar) { }
++
++#define local_lock(lvar) preempt_disable()
++#define local_unlock(lvar) preempt_enable()
++#define local_lock_irq(lvar) local_irq_disable()
++#define local_unlock_irq(lvar) local_irq_enable()
++#define local_lock_irqsave(lvar, flags) local_irq_save(flags)
++#define local_unlock_irqrestore(lvar, flags) local_irq_restore(flags)
++
++#define local_spin_trylock_irq(lvar, lock) spin_trylock_irq(lock)
++#define local_spin_lock_irq(lvar, lock) spin_lock_irq(lock)
++#define local_spin_unlock_irq(lvar, lock) spin_unlock_irq(lock)
++#define local_spin_lock_irqsave(lvar, lock, flags) \
++ spin_lock_irqsave(lock, flags)
++#define local_spin_unlock_irqrestore(lvar, lock, flags) \
++ spin_unlock_irqrestore(lock, flags)
++
++#define get_locked_var(lvar, var) get_cpu_var(var)
++#define put_locked_var(lvar, var) put_cpu_var(var)
++
++#define local_lock_cpu(lvar) get_cpu()
++#define local_unlock_cpu(lvar) put_cpu()
++
++#endif
++
++#endif
+--- a/include/linux/percpu.h
++++ b/include/linux/percpu.h
+@@ -24,6 +24,35 @@
+ PERCPU_MODULE_RESERVE)
+ #endif
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++
++#define get_local_var(var) (*({ \
++ migrate_disable(); \
++ this_cpu_ptr(&var); }))
++
++#define put_local_var(var) do { \
++ (void)&(var); \
++ migrate_enable(); \
++} while (0)
++
++# define get_local_ptr(var) ({ \
++ migrate_disable(); \
++ this_cpu_ptr(var); })
++
++# define put_local_ptr(var) do { \
++ (void)(var); \
++ migrate_enable(); \
++} while (0)
++
++#else
++
++#define get_local_var(var) get_cpu_var(var)
++#define put_local_var(var) put_cpu_var(var)
++#define get_local_ptr(var) get_cpu_ptr(var)
++#define put_local_ptr(var) put_cpu_ptr(var)
++
++#endif
++
+ /* minimum unit size, also is the maximum supported allocation size */
+ #define PCPU_MIN_UNIT_SIZE PFN_ALIGN(32 << 10)
+
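A minimal sketch, not part of the patch, of get_local_var()/
put_local_var() as a preemptible replacement for get_cpu_var(); the
per-CPU counter is made up for illustration.

#include <linux/percpu.h>

static DEFINE_PER_CPU(int, my_counter);

static void my_count_event(void)
{
	get_local_var(my_counter)++;	/* RT: migrate_disable(); !RT: get_cpu_var() */
	put_local_var(my_counter);	/* RT: migrate_enable();  !RT: put_cpu_var() */
}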
diff --git a/patches/rt-preempt-base-config.patch b/patches/rt-preempt-base-config.patch
new file mode 100644
index 00000000000000..3f962fb7497ce8
--- /dev/null
+++ b/patches/rt-preempt-base-config.patch
@@ -0,0 +1,53 @@
+Subject: rt: Provide PREEMPT_RT_BASE config switch
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Fri, 17 Jun 2011 12:39:57 +0200
+
+Introduce PREEMPT_RT_BASE which enables parts of
+PREEMPT_RT_FULL. Forces interrupt threading and enables some of the RT
+substitutions for testing.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ kernel/Kconfig.preempt | 19 +++++++++++++++++--
+ 1 file changed, 17 insertions(+), 2 deletions(-)
+
+--- a/kernel/Kconfig.preempt
++++ b/kernel/Kconfig.preempt
+@@ -1,3 +1,10 @@
++config PREEMPT
++ bool
++ select PREEMPT_COUNT
++
++config PREEMPT_RT_BASE
++ bool
++ select PREEMPT
+
+ choice
+ prompt "Preemption Model"
+@@ -33,9 +40,9 @@ config PREEMPT_VOLUNTARY
+
+ Select this if you are building a kernel for a desktop system.
+
+-config PREEMPT
++config PREEMPT__LL
+ bool "Preemptible Kernel (Low-Latency Desktop)"
+- select PREEMPT_COUNT
++ select PREEMPT
+ select UNINLINE_SPIN_UNLOCK if !ARCH_INLINE_SPIN_UNLOCK
+ help
+ This option reduces the latency of the kernel by making
+@@ -52,6 +59,14 @@ config PREEMPT
+ embedded system with latency requirements in the milliseconds
+ range.
+
++config PREEMPT_RTB
++ bool "Preemptible Kernel (Basic RT)"
++ select PREEMPT_RT_BASE
++ help
++ This option is basically the same as (Low-Latency Desktop) but
++ enables changes which are preliminary for the full preemptible
++ RT kernel.
++
+ endchoice
+
+ config PREEMPT_COUNT
diff --git a/patches/rt-serial-warn-fix.patch b/patches/rt-serial-warn-fix.patch
new file mode 100644
index 00000000000000..cf13097b5e7cfa
--- /dev/null
+++ b/patches/rt-serial-warn-fix.patch
@@ -0,0 +1,37 @@
+Subject: rt: Improve the serial console PASS_LIMIT
+From: Ingo Molnar <mingo@elte.hu>
+Date: Wed Dec 14 13:05:54 CET 2011
+
+Beyond the warning:
+
+ drivers/tty/serial/8250/8250.c:1613:6: warning: unused variable ‘pass_counter’ [-Wunused-variable]
+
+the solution of just looping infinitely was ugly - up it to 1 million to
+give it a chance to continue in some really ugly situation.
+
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ drivers/tty/serial/8250/8250_core.c | 11 ++++++++++-
+ 1 file changed, 10 insertions(+), 1 deletion(-)
+
+--- a/drivers/tty/serial/8250/8250_core.c
++++ b/drivers/tty/serial/8250/8250_core.c
+@@ -80,7 +80,16 @@ static unsigned int skip_txen_test; /* f
+ #define DEBUG_INTR(fmt...) do { } while (0)
+ #endif
+
+-#define PASS_LIMIT 512
++/*
++ * On -rt we can have more delays, and legitimately
++ * so - so don't drop work spuriously and spam the
++ * syslog:
++ */
++#ifdef CONFIG_PREEMPT_RT_FULL
++# define PASS_LIMIT 1000000
++#else
++# define PASS_LIMIT 512
++#endif
+
+ #define BOTH_EMPTY (UART_LSR_TEMT | UART_LSR_THRE)
+
diff --git a/patches/rtmutex-add-a-first-shot-of-ww_mutex.patch b/patches/rtmutex-add-a-first-shot-of-ww_mutex.patch
new file mode 100644
index 00000000000000..97c8c8c1cb30bf
--- /dev/null
+++ b/patches/rtmutex-add-a-first-shot-of-ww_mutex.patch
@@ -0,0 +1,423 @@
+From: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
+Date: Mon, 28 Oct 2013 09:36:37 +0100
+Subject: rtmutex: Add RT aware ww locks
+
+lockdep says:
+| --------------------------------------------------------------------------
+| | Wound/wait tests |
+| ---------------------
+| ww api failures: ok | ok | ok |
+| ww contexts mixing: ok | ok |
+| finishing ww context: ok | ok | ok | ok |
+| locking mismatches: ok | ok | ok |
+| EDEADLK handling: ok | ok | ok | ok | ok | ok | ok | ok | ok | ok |
+| spinlock nest unlocked: ok |
+| -----------------------------------------------------
+| |block | try |context|
+| -----------------------------------------------------
+| context: ok | ok | ok |
+| try: ok | ok | ok |
+| block: ok | ok | ok |
+| spinlock: ok | ok | ok |
+
+Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
+---
+ kernel/locking/rtmutex.c | 251 ++++++++++++++++++++++++++++++++++++++++++-----
+ 1 file changed, 226 insertions(+), 25 deletions(-)
+
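A hedged sketch, not part of the patch, of the wait/wound acquisition
pattern that the RT-aware implementation below has to support, using
the existing mainline ww_mutex API. The class and helper names
(my_ww_class, my_lock_pair) are made up; on success the caller holds
both locks and later drops them with ww_mutex_unlock() followed by
ww_acquire_fini().

#include <linux/errno.h>
#include <linux/ww_mutex.h>

static DEFINE_WW_CLASS(my_ww_class);	/* hypothetical */

static int my_lock_pair(struct ww_mutex *a, struct ww_mutex *b,
			struct ww_acquire_ctx *ctx)
{
	struct ww_mutex *held = NULL, *want = a;
	int ret;

	ww_acquire_init(ctx, &my_ww_class);

	for (;;) {
		ret = ww_mutex_lock(want, ctx);
		if (ret == -EDEADLK) {
			/*
			 * Wounded by an older context: drop what we hold,
			 * then sleep on the contended lock and retry.
			 */
			if (held)
				ww_mutex_unlock(held);
			ww_mutex_lock_slow(want, ctx);
			held = want;
			want = (want == a) ? b : a;
			continue;
		}
		if (ret)
			goto err;	/* -EALREADY: API misuse */
		if (held)
			break;		/* both locks are now held */
		held = want;
		want = b;
	}

	ww_acquire_done(ctx);
	return 0;
err:
	if (held)
		ww_mutex_unlock(held);
	ww_acquire_fini(ctx);
	return ret;
}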
+--- a/kernel/locking/rtmutex.c
++++ b/kernel/locking/rtmutex.c
+@@ -21,6 +21,7 @@
+ #include <linux/sched/rt.h>
+ #include <linux/sched/deadline.h>
+ #include <linux/timer.h>
++#include <linux/ww_mutex.h>
+
+ #include "rtmutex_common.h"
+
+@@ -1201,6 +1202,40 @@ EXPORT_SYMBOL(__rt_spin_lock_init);
+
+ #endif /* PREEMPT_RT_FULL */
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++ static inline int __sched
++__mutex_lock_check_stamp(struct rt_mutex *lock, struct ww_acquire_ctx *ctx)
++{
++ struct ww_mutex *ww = container_of(lock, struct ww_mutex, base.lock);
++ struct ww_acquire_ctx *hold_ctx = ACCESS_ONCE(ww->ctx);
++
++ if (!hold_ctx)
++ return 0;
++
++ if (unlikely(ctx == hold_ctx))
++ return -EALREADY;
++
++ if (ctx->stamp - hold_ctx->stamp <= LONG_MAX &&
++ (ctx->stamp != hold_ctx->stamp || ctx > hold_ctx)) {
++#ifdef CONFIG_DEBUG_MUTEXES
++ DEBUG_LOCKS_WARN_ON(ctx->contending_lock);
++ ctx->contending_lock = ww;
++#endif
++ return -EDEADLK;
++ }
++
++ return 0;
++}
++#else
++ static inline int __sched
++__mutex_lock_check_stamp(struct rt_mutex *lock, struct ww_acquire_ctx *ctx)
++{
++ BUG();
++ return 0;
++}
++
++#endif
++
+ static inline int
+ try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task,
+ struct rt_mutex_waiter *waiter)
+@@ -1461,7 +1496,8 @@ void rt_mutex_adjust_pi(struct task_stru
+ static int __sched
+ __rt_mutex_slowlock(struct rt_mutex *lock, int state,
+ struct hrtimer_sleeper *timeout,
+- struct rt_mutex_waiter *waiter)
++ struct rt_mutex_waiter *waiter,
++ struct ww_acquire_ctx *ww_ctx)
+ {
+ int ret = 0;
+
+@@ -1484,6 +1520,12 @@ static int __sched
+ break;
+ }
+
++ if (ww_ctx && ww_ctx->acquired > 0) {
++ ret = __mutex_lock_check_stamp(lock, ww_ctx);
++ if (ret)
++ break;
++ }
++
+ raw_spin_unlock(&lock->wait_lock);
+
+ debug_rt_mutex_print_deadlock(waiter);
+@@ -1518,13 +1560,90 @@ static void rt_mutex_handle_deadlock(int
+ }
+ }
+
++static __always_inline void ww_mutex_lock_acquired(struct ww_mutex *ww,
++ struct ww_acquire_ctx *ww_ctx)
++{
++#ifdef CONFIG_DEBUG_MUTEXES
++ /*
++ * If this WARN_ON triggers, you used ww_mutex_lock to acquire,
++ * but released with a normal mutex_unlock in this call.
++ *
++ * This should never happen, always use ww_mutex_unlock.
++ */
++ DEBUG_LOCKS_WARN_ON(ww->ctx);
++
++ /*
++ * Not quite done after calling ww_acquire_done() ?
++ */
++ DEBUG_LOCKS_WARN_ON(ww_ctx->done_acquire);
++
++ if (ww_ctx->contending_lock) {
++ /*
++ * After -EDEADLK you tried to
++ * acquire a different ww_mutex? Bad!
++ */
++ DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock != ww);
++
++ /*
++ * You called ww_mutex_lock after receiving -EDEADLK,
++ * but 'forgot' to unlock everything else first?
++ */
++ DEBUG_LOCKS_WARN_ON(ww_ctx->acquired > 0);
++ ww_ctx->contending_lock = NULL;
++ }
++
++ /*
++ * Naughty, using a different class will lead to undefined behavior!
++ */
++ DEBUG_LOCKS_WARN_ON(ww_ctx->ww_class != ww->ww_class);
++#endif
++ ww_ctx->acquired++;
++}
++
++#ifdef CONFIG_PREEMPT_RT_FULL
++static void ww_mutex_account_lock(struct rt_mutex *lock,
++ struct ww_acquire_ctx *ww_ctx)
++{
++ struct ww_mutex *ww = container_of(lock, struct ww_mutex, base.lock);
++ struct rt_mutex_waiter *waiter, *n;
++
++ /*
++ * This branch gets optimized out for the common case,
++ * and is only important for ww_mutex_lock.
++ */
++ ww_mutex_lock_acquired(ww, ww_ctx);
++ ww->ctx = ww_ctx;
++
++ /*
++ * Give any possible sleeping processes the chance to wake up,
++ * so they can recheck if they have to back off.
++ */
++ rbtree_postorder_for_each_entry_safe(waiter, n, &lock->waiters,
++ tree_entry) {
++ /* XXX debug rt mutex waiter wakeup */
++
++ BUG_ON(waiter->lock != lock);
++ rt_mutex_wake_waiter(waiter);
++ }
++}
++
++#else
++
++static void ww_mutex_account_lock(struct rt_mutex *lock,
++ struct ww_acquire_ctx *ww_ctx)
++{
++ BUG();
++}
++#endif
++
+ /*
+ * Slow path lock function:
+ */
+ static int __sched
+ rt_mutex_slowlock(struct rt_mutex *lock, int state,
+ struct hrtimer_sleeper *timeout,
+- enum rtmutex_chainwalk chwalk)
++ enum rtmutex_chainwalk chwalk,
++ struct ww_acquire_ctx *ww_ctx)
+ {
+ struct rt_mutex_waiter waiter;
+ int ret = 0;
+@@ -1535,6 +1654,8 @@ rt_mutex_slowlock(struct rt_mutex *lock,
+
+ /* Try to acquire the lock again: */
+ if (try_to_take_rt_mutex(lock, current, NULL)) {
++ if (ww_ctx)
++ ww_mutex_account_lock(lock, ww_ctx);
+ raw_spin_unlock(&lock->wait_lock);
+ return 0;
+ }
+@@ -1552,13 +1673,23 @@ rt_mutex_slowlock(struct rt_mutex *lock,
+
+ if (likely(!ret))
+ /* sleep on the mutex */
+- ret = __rt_mutex_slowlock(lock, state, timeout, &waiter);
++ ret = __rt_mutex_slowlock(lock, state, timeout, &waiter,
++ ww_ctx);
++ else if (ww_ctx) {
++ /* ww_mutex received EDEADLK, let it become EALREADY */
++ ret = __mutex_lock_check_stamp(lock, ww_ctx);
++ BUG_ON(!ret);
++ }
+
+ if (unlikely(ret)) {
+ __set_current_state(TASK_RUNNING);
+ if (rt_mutex_has_waiters(lock))
+ remove_waiter(lock, &waiter);
+- rt_mutex_handle_deadlock(ret, chwalk, &waiter);
++ /* ww_mutex want to report EDEADLK/EALREADY, let them */
++ if (!ww_ctx)
++ rt_mutex_handle_deadlock(ret, chwalk, &waiter);
++ } else if (ww_ctx) {
++ ww_mutex_account_lock(lock, ww_ctx);
+ }
+
+ /*
+@@ -1682,31 +1813,36 @@ rt_mutex_slowunlock(struct rt_mutex *loc
+ */
+ static inline int
+ rt_mutex_fastlock(struct rt_mutex *lock, int state,
++ struct ww_acquire_ctx *ww_ctx,
+ int (*slowfn)(struct rt_mutex *lock, int state,
+ struct hrtimer_sleeper *timeout,
+- enum rtmutex_chainwalk chwalk))
++ enum rtmutex_chainwalk chwalk,
++ struct ww_acquire_ctx *ww_ctx))
+ {
+ if (likely(rt_mutex_cmpxchg(lock, NULL, current))) {
+ rt_mutex_deadlock_account_lock(lock, current);
+ return 0;
+ } else
+- return slowfn(lock, state, NULL, RT_MUTEX_MIN_CHAINWALK);
++ return slowfn(lock, state, NULL, RT_MUTEX_MIN_CHAINWALK,
++ ww_ctx);
+ }
+
+ static inline int
+ rt_mutex_timed_fastlock(struct rt_mutex *lock, int state,
+ struct hrtimer_sleeper *timeout,
+ enum rtmutex_chainwalk chwalk,
++ struct ww_acquire_ctx *ww_ctx,
+ int (*slowfn)(struct rt_mutex *lock, int state,
+ struct hrtimer_sleeper *timeout,
+- enum rtmutex_chainwalk chwalk))
++ enum rtmutex_chainwalk chwalk,
++ struct ww_acquire_ctx *ww_ctx))
+ {
+ if (chwalk == RT_MUTEX_MIN_CHAINWALK &&
+ likely(rt_mutex_cmpxchg(lock, NULL, current))) {
+ rt_mutex_deadlock_account_lock(lock, current);
+ return 0;
+ } else
+- return slowfn(lock, state, timeout, chwalk);
++ return slowfn(lock, state, timeout, chwalk, ww_ctx);
+ }
+
+ static inline int
+@@ -1741,7 +1877,7 @@ void __sched rt_mutex_lock(struct rt_mut
+ {
+ might_sleep();
+
+- rt_mutex_fastlock(lock, TASK_UNINTERRUPTIBLE, rt_mutex_slowlock);
++ rt_mutex_fastlock(lock, TASK_UNINTERRUPTIBLE, NULL, rt_mutex_slowlock);
+ }
+ EXPORT_SYMBOL_GPL(rt_mutex_lock);
+
+@@ -1758,7 +1894,7 @@ int __sched rt_mutex_lock_interruptible(
+ {
+ might_sleep();
+
+- return rt_mutex_fastlock(lock, TASK_INTERRUPTIBLE, rt_mutex_slowlock);
++ return rt_mutex_fastlock(lock, TASK_INTERRUPTIBLE, NULL, rt_mutex_slowlock);
+ }
+ EXPORT_SYMBOL_GPL(rt_mutex_lock_interruptible);
+
+@@ -1771,7 +1907,7 @@ int rt_mutex_timed_futex_lock(struct rt_
+ might_sleep();
+
+ return rt_mutex_timed_fastlock(lock, TASK_INTERRUPTIBLE, timeout,
+- RT_MUTEX_FULL_CHAINWALK,
++ RT_MUTEX_FULL_CHAINWALK, NULL,
+ rt_mutex_slowlock);
+ }
+
+@@ -1790,7 +1926,7 @@ int __sched rt_mutex_lock_killable(struc
+ {
+ might_sleep();
+
+- return rt_mutex_fastlock(lock, TASK_KILLABLE, rt_mutex_slowlock);
++ return rt_mutex_fastlock(lock, TASK_KILLABLE, NULL, rt_mutex_slowlock);
+ }
+ EXPORT_SYMBOL_GPL(rt_mutex_lock_killable);
+
+@@ -1814,6 +1950,7 @@ rt_mutex_timed_lock(struct rt_mutex *loc
+
+ return rt_mutex_timed_fastlock(lock, TASK_INTERRUPTIBLE, timeout,
+ RT_MUTEX_MIN_CHAINWALK,
++ NULL,
+ rt_mutex_slowlock);
+ }
+ EXPORT_SYMBOL_GPL(rt_mutex_timed_lock);
+@@ -2055,7 +2192,7 @@ int rt_mutex_finish_proxy_lock(struct rt
+ set_current_state(TASK_INTERRUPTIBLE);
+
+ /* sleep on the mutex */
+- ret = __rt_mutex_slowlock(lock, TASK_INTERRUPTIBLE, to, waiter);
++ ret = __rt_mutex_slowlock(lock, TASK_INTERRUPTIBLE, to, waiter, NULL);
+
+ if (unlikely(ret))
+ remove_waiter(lock, waiter);
+@@ -2071,24 +2208,88 @@ int rt_mutex_finish_proxy_lock(struct rt
+ return ret;
+ }
+
+-#ifdef CONFIG_PREEMPT_RT_FULL
+-struct ww_mutex {
+-};
+-struct ww_acquire_ctx {
+-};
+-int __ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ww_ctx)
++static inline int
++ww_mutex_deadlock_injection(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
+ {
+- BUG();
++#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
++ unsigned tmp;
++
++ if (ctx->deadlock_inject_countdown-- == 0) {
++ tmp = ctx->deadlock_inject_interval;
++ if (tmp > UINT_MAX/4)
++ tmp = UINT_MAX;
++ else
++ tmp = tmp*2 + tmp + tmp/2;
++
++ ctx->deadlock_inject_interval = tmp;
++ ctx->deadlock_inject_countdown = tmp;
++ ctx->contending_lock = lock;
++
++ ww_mutex_unlock(lock);
++
++ return -EDEADLK;
++ }
++#endif
++
++ return 0;
+ }
+-EXPORT_SYMBOL_GPL(__ww_mutex_lock);
+-int __ww_mutex_lock_interruptible(struct ww_mutex *lock, struct ww_acquire_ctx *ww_ctx)
++
++#ifdef CONFIG_PREEMPT_RT_FULL
++int __sched
++__ww_mutex_lock_interruptible(struct ww_mutex *lock, struct ww_acquire_ctx *ww_ctx)
+ {
+- BUG();
++ int ret;
++
++ might_sleep();
++
++ mutex_acquire_nest(&lock->base.dep_map, 0, 0, &ww_ctx->dep_map, _RET_IP_);
++ ret = rt_mutex_slowlock(&lock->base.lock, TASK_INTERRUPTIBLE, NULL, 0, ww_ctx);
++ if (ret)
++ mutex_release(&lock->base.dep_map, 1, _RET_IP_);
++ else if (!ret && ww_ctx->acquired > 1)
++ return ww_mutex_deadlock_injection(lock, ww_ctx);
++
++ return ret;
+ }
+ EXPORT_SYMBOL_GPL(__ww_mutex_lock_interruptible);
++
++int __sched
++__ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ww_ctx)
++{
++ int ret;
++
++ might_sleep();
++
++ mutex_acquire_nest(&lock->base.dep_map, 0, 0, &ww_ctx->dep_map, _RET_IP_);
++ ret = rt_mutex_slowlock(&lock->base.lock, TASK_UNINTERRUPTIBLE, NULL, 0, ww_ctx);
++ if (ret)
++ mutex_release(&lock->base.dep_map, 1, _RET_IP_);
++ else if (!ret && ww_ctx->acquired > 1)
++ return ww_mutex_deadlock_injection(lock, ww_ctx);
++
++ return ret;
++}
++EXPORT_SYMBOL_GPL(__ww_mutex_lock);
++
+ void __sched ww_mutex_unlock(struct ww_mutex *lock)
+ {
+- BUG();
++ int nest = !!lock->ctx;
++
++ /*
++ * The unlocking fastpath is the 0->1 transition from 'locked'
++ * into 'unlocked' state:
++ */
++ if (nest) {
++#ifdef CONFIG_DEBUG_MUTEXES
++ DEBUG_LOCKS_WARN_ON(!lock->ctx->acquired);
++#endif
++ if (lock->ctx->acquired > 0)
++ lock->ctx->acquired--;
++ lock->ctx = NULL;
++ }
++
++ mutex_release(&lock->base.dep_map, nest, _RET_IP_);
++ rt_mutex_unlock(&lock->base.lock);
+ }
+-EXPORT_SYMBOL_GPL(ww_mutex_unlock);
++EXPORT_SYMBOL(ww_mutex_unlock);
+ #endif
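
For reference, a minimal caller-side sketch of the acquire/backoff protocol that the RT
implementation above has to preserve. It is illustrative only: example_ww_class, lock_a,
lock_b and example_lock_both() are hypothetical names, and a production caller would
normally use ww_mutex_lock_slow() on the contended lock as described in the
ww-mutex-design documentation.

#include <linux/ww_mutex.h>

static DEFINE_WW_CLASS(example_ww_class);

static void example_lock_both(struct ww_mutex *lock_a, struct ww_mutex *lock_b)
{
        struct ww_acquire_ctx ctx;

        ww_acquire_init(&ctx, &example_ww_class);
retry:
        /* The first lock taken in a context only waits, it never deadlocks */
        (void)ww_mutex_lock(lock_a, &ctx);

        if (ww_mutex_lock(lock_b, &ctx) == -EDEADLK) {
                /*
                 * Lost against an older context (or hit the deadlock
                 * injection above): drop what we hold and start over.
                 */
                ww_mutex_unlock(lock_a);
                goto retry;
        }
        ww_acquire_done(&ctx);

        /* ... both objects locked ... */

        ww_mutex_unlock(lock_b);
        ww_mutex_unlock(lock_a);
        ww_acquire_fini(&ctx);
}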
diff --git a/patches/rtmutex-avoid-include-hell.patch b/patches/rtmutex-avoid-include-hell.patch
new file mode 100644
index 00000000000000..20d85673292ede
--- /dev/null
+++ b/patches/rtmutex-avoid-include-hell.patch
@@ -0,0 +1,23 @@
+Subject: rtmutex: Avoid include hell
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 29 Jun 2011 20:06:39 +0200
+
+Include only the required raw types. This avoids pulling in the
+complete spinlock header which in turn requires rtmutex.h at some point.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ include/linux/rtmutex.h | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/include/linux/rtmutex.h
++++ b/include/linux/rtmutex.h
+@@ -14,7 +14,7 @@
+
+ #include <linux/linkage.h>
+ #include <linux/rbtree.h>
+-#include <linux/spinlock_types.h>
++#include <linux/spinlock_types_raw.h>
+
+ extern int max_lock_depth; /* for sysctl */
+
diff --git a/patches/rtmutex-futex-prepare-rt.patch b/patches/rtmutex-futex-prepare-rt.patch
new file mode 100644
index 00000000000000..b70f3bb4cdd23a
--- /dev/null
+++ b/patches/rtmutex-futex-prepare-rt.patch
@@ -0,0 +1,238 @@
+Subject: rtmutex: Handle the various new futex race conditions
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Fri, 10 Jun 2011 11:04:15 +0200
+
+RT opens a few new interesting race conditions in the rtmutex/futex
+combo because the futex hash bucket lock becomes a 'sleeping' spinlock
+and therefore no longer disables preemption.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ kernel/futex.c | 77 ++++++++++++++++++++++++++++++++--------
+ kernel/locking/rtmutex.c | 36 +++++++++++++++---
+ kernel/locking/rtmutex_common.h | 2 +
+ 3 files changed, 94 insertions(+), 21 deletions(-)
+
+--- a/kernel/futex.c
++++ b/kernel/futex.c
+@@ -1717,6 +1717,16 @@ static int futex_requeue(u32 __user *uad
+ requeue_pi_wake_futex(this, &key2, hb2);
+ drop_count++;
+ continue;
++ } else if (ret == -EAGAIN) {
++ /*
++ * Waiter was woken by timeout or
++ * signal and has set pi_blocked_on to
++ * PI_WAKEUP_INPROGRESS before we
++ * tried to enqueue it on the rtmutex.
++ */
++ this->pi_state = NULL;
++ free_pi_state(pi_state);
++ continue;
+ } else if (ret) {
+ /* -EDEADLK */
+ this->pi_state = NULL;
+@@ -2576,7 +2586,7 @@ static int futex_wait_requeue_pi(u32 __u
+ struct hrtimer_sleeper timeout, *to = NULL;
+ struct rt_mutex_waiter rt_waiter;
+ struct rt_mutex *pi_mutex = NULL;
+- struct futex_hash_bucket *hb;
++ struct futex_hash_bucket *hb, *hb2;
+ union futex_key key2 = FUTEX_KEY_INIT;
+ struct futex_q q = futex_q_init;
+ int res, ret;
+@@ -2635,20 +2645,55 @@ static int futex_wait_requeue_pi(u32 __u
+ /* Queue the futex_q, drop the hb lock, wait for wakeup. */
+ futex_wait_queue_me(hb, &q, to);
+
+- spin_lock(&hb->lock);
+- ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to);
+- spin_unlock(&hb->lock);
+- if (ret)
+- goto out_put_keys;
++ /*
++ * On RT we must avoid races with requeue and trying to block
++ * on two mutexes (hb->lock and uaddr2's rtmutex) by
++ * serializing access to pi_blocked_on with pi_lock.
++ */
++ raw_spin_lock_irq(&current->pi_lock);
++ if (current->pi_blocked_on) {
++ /*
++ * We have been requeued or are in the process of
++ * being requeued.
++ */
++ raw_spin_unlock_irq(&current->pi_lock);
++ } else {
++ /*
++ * Setting pi_blocked_on to PI_WAKEUP_INPROGRESS
++ * prevents a concurrent requeue from moving us to the
++ * uaddr2 rtmutex. After that we can safely acquire
++ * (and possibly block on) hb->lock.
++ */
++ current->pi_blocked_on = PI_WAKEUP_INPROGRESS;
++ raw_spin_unlock_irq(&current->pi_lock);
++
++ spin_lock(&hb->lock);
++
++ /*
++ * Clean up pi_blocked_on. We might leak it otherwise
++ * when we succeeded with the hb->lock in the fast
++ * path.
++ */
++ raw_spin_lock_irq(&current->pi_lock);
++ current->pi_blocked_on = NULL;
++ raw_spin_unlock_irq(&current->pi_lock);
++
++ ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to);
++ spin_unlock(&hb->lock);
++ if (ret)
++ goto out_put_keys;
++ }
+
+ /*
+- * In order for us to be here, we know our q.key == key2, and since
+- * we took the hb->lock above, we also know that futex_requeue() has
+- * completed and we no longer have to concern ourselves with a wakeup
+- * race with the atomic proxy lock acquisition by the requeue code. The
+- * futex_requeue dropped our key1 reference and incremented our key2
+- * reference count.
++ * In order to be here, we have either been requeued, are in
++ * the process of being requeued, or requeue successfully
++ * acquired uaddr2 on our behalf. If pi_blocked_on was
++ * non-null above, we may be racing with a requeue. Do not
++ * rely on q->lock_ptr to be hb2->lock until after blocking on
++ * hb->lock or hb2->lock. The futex_requeue dropped our key1
++ * reference and incremented our key2 reference count.
+ */
++ hb2 = hash_futex(&key2);
+
+ /* Check if the requeue code acquired the second futex for us. */
+ if (!q.rt_waiter) {
+@@ -2657,9 +2702,10 @@ static int futex_wait_requeue_pi(u32 __u
+ * did a lock-steal - fix up the PI-state in that case.
+ */
+ if (q.pi_state && (q.pi_state->owner != current)) {
+- spin_lock(q.lock_ptr);
++ spin_lock(&hb2->lock);
++ BUG_ON(&hb2->lock != q.lock_ptr);
+ ret = fixup_pi_state_owner(uaddr2, &q, current);
+- spin_unlock(q.lock_ptr);
++ spin_unlock(&hb2->lock);
+ }
+ } else {
+ /*
+@@ -2672,7 +2718,8 @@ static int futex_wait_requeue_pi(u32 __u
+ ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter);
+ debug_rt_mutex_free_waiter(&rt_waiter);
+
+- spin_lock(q.lock_ptr);
++ spin_lock(&hb2->lock);
++ BUG_ON(&hb2->lock != q.lock_ptr);
+ /*
+ * Fixup the pi_state owner and possibly acquire the lock if we
+ * haven't already.
+--- a/kernel/locking/rtmutex.c
++++ b/kernel/locking/rtmutex.c
+@@ -69,6 +69,11 @@ static void fixup_rt_mutex_waiters(struc
+ clear_rt_mutex_waiters(lock);
+ }
+
++static int rt_mutex_real_waiter(struct rt_mutex_waiter *waiter)
++{
++ return waiter && waiter != PI_WAKEUP_INPROGRESS;
++}
++
+ /*
+ * We can speed up the acquire/release, if the architecture
+ * supports cmpxchg and if there's no debugging state to be set up
+@@ -342,7 +347,8 @@ int max_lock_depth = 1024;
+
+ static inline struct rt_mutex *task_blocked_on_lock(struct task_struct *p)
+ {
+- return p->pi_blocked_on ? p->pi_blocked_on->lock : NULL;
++ return rt_mutex_real_waiter(p->pi_blocked_on) ?
++ p->pi_blocked_on->lock : NULL;
+ }
+
+ /*
+@@ -479,7 +485,7 @@ static int rt_mutex_adjust_prio_chain(st
+ * reached or the state of the chain has changed while we
+ * dropped the locks.
+ */
+- if (!waiter)
++ if (!rt_mutex_real_waiter(waiter))
+ goto out_unlock_pi;
+
+ /*
+@@ -896,6 +902,23 @@ static int task_blocks_on_rt_mutex(struc
+ return -EDEADLK;
+
+ raw_spin_lock_irqsave(&task->pi_lock, flags);
++
++ /*
++ * In the case of futex requeue PI, this will be a proxy
++ * lock. The task will wake unaware that it is enqueued on
++ * this lock. Avoid blocking on two locks and corrupting
++ * pi_blocked_on via the PI_WAKEUP_INPROGRESS
++ * flag. futex_wait_requeue_pi() sets this when it wakes up
++ * before requeue (due to a signal or timeout). Do not enqueue
++ * the task if PI_WAKEUP_INPROGRESS is set.
++ */
++ if (task != current && task->pi_blocked_on == PI_WAKEUP_INPROGRESS) {
++ raw_spin_unlock_irqrestore(&task->pi_lock, flags);
++ return -EAGAIN;
++ }
++
++ BUG_ON(rt_mutex_real_waiter(task->pi_blocked_on));
++
+ __rt_mutex_adjust_prio(task);
+ waiter->task = task;
+ waiter->lock = lock;
+@@ -919,7 +942,7 @@ static int task_blocks_on_rt_mutex(struc
+ rt_mutex_enqueue_pi(owner, waiter);
+
+ __rt_mutex_adjust_prio(owner);
+- if (owner->pi_blocked_on)
++ if (rt_mutex_real_waiter(owner->pi_blocked_on))
+ chain_walk = 1;
+ } else if (rt_mutex_cond_detect_deadlock(waiter, chwalk)) {
+ chain_walk = 1;
+@@ -1011,7 +1034,7 @@ static void remove_waiter(struct rt_mute
+ {
+ bool is_top_waiter = (waiter == rt_mutex_top_waiter(lock));
+ struct task_struct *owner = rt_mutex_owner(lock);
+- struct rt_mutex *next_lock;
++ struct rt_mutex *next_lock = NULL;
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&current->pi_lock, flags);
+@@ -1036,7 +1059,8 @@ static void remove_waiter(struct rt_mute
+ __rt_mutex_adjust_prio(owner);
+
+ /* Store the lock on which owner is blocked or NULL */
+- next_lock = task_blocked_on_lock(owner);
++ if (rt_mutex_real_waiter(owner->pi_blocked_on))
++ next_lock = task_blocked_on_lock(owner);
+
+ raw_spin_unlock_irqrestore(&owner->pi_lock, flags);
+
+@@ -1072,7 +1096,7 @@ void rt_mutex_adjust_pi(struct task_stru
+ raw_spin_lock_irqsave(&task->pi_lock, flags);
+
+ waiter = task->pi_blocked_on;
+- if (!waiter || (waiter->prio == task->prio &&
++ if (!rt_mutex_real_waiter(waiter) || (waiter->prio == task->prio &&
+ !dl_prio(task->prio))) {
+ raw_spin_unlock_irqrestore(&task->pi_lock, flags);
+ return;
+--- a/kernel/locking/rtmutex_common.h
++++ b/kernel/locking/rtmutex_common.h
+@@ -119,6 +119,8 @@ enum rtmutex_chainwalk {
+ /*
+ * PI-futex support (proxy locking functions, etc.):
+ */
++#define PI_WAKEUP_INPROGRESS ((struct rt_mutex_waiter *) 1)
++
+ extern struct task_struct *rt_mutex_next_owner(struct rt_mutex *lock);
+ extern void rt_mutex_init_proxy_locked(struct rt_mutex *lock,
+ struct task_struct *proxy_owner);
diff --git a/patches/rtmutex-lock-killable.patch b/patches/rtmutex-lock-killable.patch
new file mode 100644
index 00000000000000..7c8865764abc3a
--- /dev/null
+++ b/patches/rtmutex-lock-killable.patch
@@ -0,0 +1,51 @@
+Subject: rtmutex: Add rtmutex_lock_killable()
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Thu, 09 Jun 2011 11:43:52 +0200
+
+Add "killable" type to rtmutex. We need this since rtmutex are used as
+"normal" mutexes which do use this type.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ include/linux/rtmutex.h | 1 +
+ kernel/locking/rtmutex.c | 19 +++++++++++++++++++
+ 2 files changed, 20 insertions(+)
+
+--- a/include/linux/rtmutex.h
++++ b/include/linux/rtmutex.h
+@@ -91,6 +91,7 @@ extern void rt_mutex_destroy(struct rt_m
+
+ extern void rt_mutex_lock(struct rt_mutex *lock);
+ extern int rt_mutex_lock_interruptible(struct rt_mutex *lock);
++extern int rt_mutex_lock_killable(struct rt_mutex *lock);
+ extern int rt_mutex_timed_lock(struct rt_mutex *lock,
+ struct hrtimer_sleeper *timeout);
+
+--- a/kernel/locking/rtmutex.c
++++ b/kernel/locking/rtmutex.c
+@@ -1442,6 +1442,25 @@ int rt_mutex_timed_futex_lock(struct rt_
+ }
+
+ /**
++ * rt_mutex_lock_killable - lock a rt_mutex killable
++ *
++ * @lock: the rt_mutex to be locked
++ * @detect_deadlock: deadlock detection on/off
++ *
++ * Returns:
++ * 0 on success
++ * -EINTR when interrupted by a signal
++ * -EDEADLK when the lock would deadlock (when deadlock detection is on)
++ */
++int __sched rt_mutex_lock_killable(struct rt_mutex *lock)
++{
++ might_sleep();
++
++ return rt_mutex_fastlock(lock, TASK_KILLABLE, rt_mutex_slowlock);
++}
++EXPORT_SYMBOL_GPL(rt_mutex_lock_killable);
++
++/**
+ * rt_mutex_timed_lock - lock a rt_mutex interruptible
+ * the timeout structure is provided
+ * by the caller
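
A minimal usage sketch for the new interface; example_update_resource() and
example_lock are hypothetical and not part of the patch. The only difference from
rt_mutex_lock() is that a pending fatal signal aborts the wait with -EINTR, which
the caller has to propagate.

#include <linux/rtmutex.h>

static int example_update_resource(struct rt_mutex *example_lock)
{
        int ret;

        ret = rt_mutex_lock_killable(example_lock);
        if (ret)
                return ret;     /* -EINTR: a fatal signal arrived while waiting */

        /* ... modify the protected resource ... */

        rt_mutex_unlock(example_lock);
        return 0;
}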
diff --git a/patches/sas-ata-isci-dont-t-disable-interrupts-in-qc_issue-h.patch b/patches/sas-ata-isci-dont-t-disable-interrupts-in-qc_issue-h.patch
new file mode 100644
index 00000000000000..8ec5a8e74817a5
--- /dev/null
+++ b/patches/sas-ata-isci-dont-t-disable-interrupts-in-qc_issue-h.patch
@@ -0,0 +1,78 @@
+From: Paul Gortmaker <paul.gortmaker@windriver.com>
+Date: Sat, 14 Feb 2015 11:01:16 -0500
+Subject: sas-ata/isci: don't disable interrupts in qc_issue handler
+
+On 3.14-rt we see the following trace on Canoe Pass for
+SCSI_ISCI "Intel(R) C600 Series Chipset SAS Controller"
+when the sas qc_issue handler is run:
+
+ BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:905
+ in_atomic(): 0, irqs_disabled(): 1, pid: 432, name: udevd
+ CPU: 11 PID: 432 Comm: udevd Not tainted 3.14.28-rt22 #2
+ Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.02.01.0002.082220131453 08/22/2013
+ ffff880fab500000 ffff880fa9f239c0 ffffffff81a2d273 0000000000000000
+ ffff880fa9f239d8 ffffffff8107f023 ffff880faac23dc0 ffff880fa9f239f0
+ ffffffff81a33cc0 ffff880faaeb1400 ffff880fa9f23a40 ffffffff815de891
+ Call Trace:
+ [<ffffffff81a2d273>] dump_stack+0x4e/0x7a
+ [<ffffffff8107f023>] __might_sleep+0xe3/0x160
+ [<ffffffff81a33cc0>] rt_spin_lock+0x20/0x50
+ [<ffffffff815de891>] isci_task_execute_task+0x171/0x2f0 <-----
+ [<ffffffff815cfecb>] sas_ata_qc_issue+0x25b/0x2a0
+ [<ffffffff81606363>] ata_qc_issue+0x1f3/0x370
+ [<ffffffff8160c600>] ? ata_scsi_invalid_field+0x40/0x40
+ [<ffffffff8160c8f5>] ata_scsi_translate+0xa5/0x1b0
+ [<ffffffff8160efc6>] ata_sas_queuecmd+0x86/0x280
+ [<ffffffff815ce446>] sas_queuecommand+0x196/0x230
+ [<ffffffff81081fad>] ? get_parent_ip+0xd/0x50
+ [<ffffffff815b05a4>] scsi_dispatch_cmd+0xb4/0x210
+ [<ffffffff815b7744>] scsi_request_fn+0x314/0x530
+
+and gdb shows:
+
+(gdb) list * isci_task_execute_task+0x171
+0xffffffff815ddfb1 is in isci_task_execute_task (drivers/scsi/isci/task.c:138).
+133 dev_dbg(&ihost->pdev->dev, "%s: num=%d\n", __func__, num);
+134
+135 for_each_sas_task(num, task) {
+136 enum sci_status status = SCI_FAILURE;
+137
+138 spin_lock_irqsave(&ihost->scic_lock, flags); <-----
+139 idev = isci_lookup_device(task->dev);
+140 io_ready = isci_device_io_ready(idev, task);
+141 tag = isci_alloc_tag(ihost);
+142 spin_unlock_irqrestore(&ihost->scic_lock, flags);
+(gdb)
+
+In addition to the scic_lock, the function also contains locking of
+the task_state_lock -- which is clearly not a candidate for raw lock
+conversion. As can be seen by the comment nearby, we really should
+be running the qc_issue code with interrupts enabled anyway.
+
+
+Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ drivers/scsi/libsas/sas_ata.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/drivers/scsi/libsas/sas_ata.c
++++ b/drivers/scsi/libsas/sas_ata.c
+@@ -190,7 +190,7 @@ static unsigned int sas_ata_qc_issue(str
+ /* TODO: audit callers to ensure they are ready for qc_issue to
+ * unconditionally re-enable interrupts
+ */
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+ spin_unlock(ap->lock);
+
+ /* If the device fell off, no sense in issuing commands */
+@@ -255,7 +255,7 @@ static unsigned int sas_ata_qc_issue(str
+
+ out:
+ spin_lock(ap->lock);
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+ return ret;
+ }
+
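
The _nort() variants used here are provided by local-irq-rt-depending-variants.patch
elsewhere in this queue. Roughly (a from-memory sketch, not the literal definitions):
on PREEMPT_RT_FULL they only record the flags and leave interrupts enabled, on !RT
they map to the usual primitives.

/* Approximate definitions -- see local-irq-rt-depending-variants.patch */
#ifdef CONFIG_PREEMPT_RT_FULL
# define local_irq_save_nort(flags)     local_save_flags(flags)
# define local_irq_restore_nort(flags)  (void)(flags)
#else
# define local_irq_save_nort(flags)     local_irq_save(flags)
# define local_irq_restore_nort(flags)  local_irq_restore(flags)
#endif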
diff --git a/patches/sched-deadline-dl_task_timer-has-to-be-irqsafe.patch b/patches/sched-deadline-dl_task_timer-has-to-be-irqsafe.patch
new file mode 100644
index 00000000000000..7680aaa3249c4f
--- /dev/null
+++ b/patches/sched-deadline-dl_task_timer-has-to-be-irqsafe.patch
@@ -0,0 +1,22 @@
+From: Juri Lelli <juri.lelli@gmail.com>
+Date: Tue, 13 May 2014 15:30:20 +0200
+Subject: sched/deadline: dl_task_timer has to be irqsafe
+
+As for rt_period_timer, dl_task_timer has to be irqsafe.
+
+Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ kernel/sched/deadline.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+--- a/kernel/sched/deadline.c
++++ b/kernel/sched/deadline.c
+@@ -637,6 +637,7 @@ void init_dl_task_timer(struct sched_dl_
+
+ hrtimer_init(timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ timer->function = dl_task_timer;
++ timer->irqsafe = 1;
+ }
+
+ static
diff --git a/patches/sched-delay-put-task.patch b/patches/sched-delay-put-task.patch
new file mode 100644
index 00000000000000..2312781575fadc
--- /dev/null
+++ b/patches/sched-delay-put-task.patch
@@ -0,0 +1,81 @@
+Subject: sched: Move task_struct cleanup to RCU
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Tue, 31 May 2011 16:59:16 +0200
+
+__put_task_struct() does quite a bit of expensive work. We don't want to
+burden random tasks with that.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ include/linux/sched.h | 13 +++++++++++++
+ kernel/fork.c | 15 ++++++++++++++-
+ 2 files changed, 27 insertions(+), 1 deletion(-)
+
+--- a/include/linux/sched.h
++++ b/include/linux/sched.h
+@@ -1788,6 +1788,9 @@ struct task_struct {
+ unsigned int sequential_io;
+ unsigned int sequential_io_avg;
+ #endif
++#ifdef CONFIG_PREEMPT_RT_BASE
++ struct rcu_head put_rcu;
++#endif
+ #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+ unsigned long task_state_change;
+ #endif
+@@ -1982,6 +1985,15 @@ extern struct pid *cad_pid;
+ extern void free_task(struct task_struct *tsk);
+ #define get_task_struct(tsk) do { atomic_inc(&(tsk)->usage); } while(0)
+
++#ifdef CONFIG_PREEMPT_RT_BASE
++extern void __put_task_struct_cb(struct rcu_head *rhp);
++
++static inline void put_task_struct(struct task_struct *t)
++{
++ if (atomic_dec_and_test(&t->usage))
++ call_rcu(&t->put_rcu, __put_task_struct_cb);
++}
++#else
+ extern void __put_task_struct(struct task_struct *t);
+
+ static inline void put_task_struct(struct task_struct *t)
+@@ -1989,6 +2001,7 @@ static inline void put_task_struct(struc
+ if (atomic_dec_and_test(&t->usage))
+ __put_task_struct(t);
+ }
++#endif
+
+ #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+ extern void task_cputime(struct task_struct *t,
+--- a/kernel/fork.c
++++ b/kernel/fork.c
+@@ -244,7 +244,9 @@ static inline void put_signal_struct(str
+ if (atomic_dec_and_test(&sig->sigcnt))
+ free_signal_struct(sig);
+ }
+-
++#ifdef CONFIG_PREEMPT_RT_BASE
++static
++#endif
+ void __put_task_struct(struct task_struct *tsk)
+ {
+ WARN_ON(!tsk->exit_state);
+@@ -260,7 +262,18 @@ void __put_task_struct(struct task_struc
+ if (!profile_handoff_task(tsk))
+ free_task(tsk);
+ }
++#ifndef CONFIG_PREEMPT_RT_BASE
+ EXPORT_SYMBOL_GPL(__put_task_struct);
++#else
++void __put_task_struct_cb(struct rcu_head *rhp)
++{
++ struct task_struct *tsk = container_of(rhp, struct task_struct, put_rcu);
++
++ __put_task_struct(tsk);
++
++}
++EXPORT_SYMBOL_GPL(__put_task_struct_cb);
++#endif
+
+ void __init __weak arch_task_cache_init(void) { }
+
diff --git a/patches/sched-disable-rt-group-sched-on-rt.patch b/patches/sched-disable-rt-group-sched-on-rt.patch
new file mode 100644
index 00000000000000..b4ed5bc8e5def8
--- /dev/null
+++ b/patches/sched-disable-rt-group-sched-on-rt.patch
@@ -0,0 +1,28 @@
+Subject: sched: Disable CONFIG_RT_GROUP_SCHED on RT
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Mon, 18 Jul 2011 17:03:52 +0200
+
+Carsten reported problems when running:
+
+ taskset 01 chrt -f 1 sleep 1
+
+from within rc.local on a F15 machine. The task stays running and
+never gets on the run queue because some of the run queues have
+rt_throttled=1 which does not go away. It works fine from an ssh login
+shell. Disabling CONFIG_RT_GROUP_SCHED solves that as well.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ init/Kconfig | 1 +
+ 1 file changed, 1 insertion(+)
+
+--- a/init/Kconfig
++++ b/init/Kconfig
+@@ -1101,6 +1101,7 @@ config CFS_BANDWIDTH
+ config RT_GROUP_SCHED
+ bool "Group scheduling for SCHED_RR/FIFO"
+ depends on CGROUP_SCHED
++ depends on !PREEMPT_RT_FULL
+ default n
+ help
+ This feature lets you explicitly allocate real CPU bandwidth
diff --git a/patches/sched-disable-ttwu-queue.patch b/patches/sched-disable-ttwu-queue.patch
new file mode 100644
index 00000000000000..9bb14049d16359
--- /dev/null
+++ b/patches/sched-disable-ttwu-queue.patch
@@ -0,0 +1,31 @@
+Subject: sched: Disable TTWU_QUEUE on RT
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Tue, 13 Sep 2011 16:42:35 +0200
+
+The queued remote wakeup mechanism can introduce rather large
+latencies if the number of migrated tasks is high. Disable it for RT.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ kernel/sched/features.h | 5 +++++
+ 1 file changed, 5 insertions(+)
+
+--- a/kernel/sched/features.h
++++ b/kernel/sched/features.h
+@@ -50,11 +50,16 @@ SCHED_FEAT(LB_BIAS, true)
+ */
+ SCHED_FEAT(NONTASK_CAPACITY, true)
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++SCHED_FEAT(TTWU_QUEUE, false)
++#else
++
+ /*
+ * Queue remote wakeups on the target CPU and process them
+ * using the scheduler IPI. Reduces rq->lock contention/bounces.
+ */
+ SCHED_FEAT(TTWU_QUEUE, true)
++#endif
+
+ #ifdef HAVE_RT_PUSH_IPI
+ /*
diff --git a/patches/sched-limit-nr-migrate.patch b/patches/sched-limit-nr-migrate.patch
new file mode 100644
index 00000000000000..3c52ca37143bf5
--- /dev/null
+++ b/patches/sched-limit-nr-migrate.patch
@@ -0,0 +1,26 @@
+Subject: sched: Limit the number of task migrations per batch
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Mon, 06 Jun 2011 12:12:51 +0200
+
+Put an upper limit on the number of tasks which are migrated per batch
+to avoid large latencies.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ kernel/sched/core.c | 4 ++++
+ 1 file changed, 4 insertions(+)
+
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -282,7 +282,11 @@ late_initcall(sched_init_debug);
+ * Number of tasks to iterate in a single balance run.
+ * Limited because this is done with IRQs disabled.
+ */
++#ifndef CONFIG_PREEMPT_RT_FULL
+ const_debug unsigned int sysctl_sched_nr_migrate = 32;
++#else
++const_debug unsigned int sysctl_sched_nr_migrate = 8;
++#endif
+
+ /*
+ * period over which we average the RT time consumption, measured
diff --git a/patches/sched-might-sleep-do-not-account-rcu-depth.patch b/patches/sched-might-sleep-do-not-account-rcu-depth.patch
new file mode 100644
index 00000000000000..e8c15cedd78cb8
--- /dev/null
+++ b/patches/sched-might-sleep-do-not-account-rcu-depth.patch
@@ -0,0 +1,48 @@
+Subject: sched: Do not account rcu_preempt_depth on RT in might_sleep()
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Tue, 07 Jun 2011 09:19:06 +0200
+
+RT changes the rcu_preempt_depth semantics, so we cannot check for it
+in might_sleep().
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ include/linux/rcupdate.h | 7 +++++++
+ kernel/sched/core.c | 3 ++-
+ 2 files changed, 9 insertions(+), 1 deletion(-)
+
+--- a/include/linux/rcupdate.h
++++ b/include/linux/rcupdate.h
+@@ -260,6 +260,11 @@ void synchronize_rcu(void);
+ * types of kernel builds, the rcu_read_lock() nesting depth is unknowable.
+ */
+ #define rcu_preempt_depth() (current->rcu_read_lock_nesting)
++#ifndef CONFIG_PREEMPT_RT_FULL
++#define sched_rcu_preempt_depth() rcu_preempt_depth()
++#else
++static inline int sched_rcu_preempt_depth(void) { return 0; }
++#endif
+
+ #else /* #ifdef CONFIG_PREEMPT_RCU */
+
+@@ -283,6 +288,8 @@ static inline int rcu_preempt_depth(void
+ return 0;
+ }
+
++#define sched_rcu_preempt_depth() rcu_preempt_depth()
++
+ #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
+
+ /* Internal to kernel */
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -7496,7 +7496,8 @@ void __init sched_init(void)
+ #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+ static inline int preempt_count_equals(int preempt_offset)
+ {
+- int nested = (preempt_count() & ~PREEMPT_ACTIVE) + rcu_preempt_depth();
++ int nested = (preempt_count() & ~PREEMPT_ACTIVE) +
++ sched_rcu_preempt_depth();
+
+ return (nested == preempt_offset);
+ }
diff --git a/patches/sched-mmdrop-delayed.patch b/patches/sched-mmdrop-delayed.patch
new file mode 100644
index 00000000000000..93ec1134dfa1ab
--- /dev/null
+++ b/patches/sched-mmdrop-delayed.patch
@@ -0,0 +1,133 @@
+Subject: sched: Move mmdrop to RCU on RT
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Mon, 06 Jun 2011 12:20:33 +0200
+
+__mmdrop() takes sleeping locks and calls into the memory allocator,
+which is nothing we want to do in the task switch path or other atomic
+contexts.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ include/linux/mm_types.h | 4 ++++
+ include/linux/sched.h | 12 ++++++++++++
+ kernel/fork.c | 13 +++++++++++++
+ kernel/sched/core.c | 18 ++++++++++++++++--
+ 4 files changed, 45 insertions(+), 2 deletions(-)
+
+--- a/include/linux/mm_types.h
++++ b/include/linux/mm_types.h
+@@ -11,6 +11,7 @@
+ #include <linux/completion.h>
+ #include <linux/cpumask.h>
+ #include <linux/uprobes.h>
++#include <linux/rcupdate.h>
+ #include <linux/page-flags-layout.h>
+ #include <asm/page.h>
+ #include <asm/mmu.h>
+@@ -462,6 +463,9 @@ struct mm_struct {
+ bool tlb_flush_pending;
+ #endif
+ struct uprobes_state uprobes_state;
++#ifdef CONFIG_PREEMPT_RT_BASE
++ struct rcu_head delayed_drop;
++#endif
+ #ifdef CONFIG_X86_INTEL_MPX
+ /* address of the bounds directory */
+ void __user *bd_addr;
+--- a/include/linux/sched.h
++++ b/include/linux/sched.h
+@@ -2548,12 +2548,24 @@ extern struct mm_struct * mm_alloc(void)
+
+ /* mmdrop drops the mm and the page tables */
+ extern void __mmdrop(struct mm_struct *);
++
+ static inline void mmdrop(struct mm_struct * mm)
+ {
+ if (unlikely(atomic_dec_and_test(&mm->mm_count)))
+ __mmdrop(mm);
+ }
+
++#ifdef CONFIG_PREEMPT_RT_BASE
++extern void __mmdrop_delayed(struct rcu_head *rhp);
++static inline void mmdrop_delayed(struct mm_struct *mm)
++{
++ if (atomic_dec_and_test(&mm->mm_count))
++ call_rcu(&mm->delayed_drop, __mmdrop_delayed);
++}
++#else
++# define mmdrop_delayed(mm) mmdrop(mm)
++#endif
++
+ /* mmput gets rid of the mappings and all user-space */
+ extern void mmput(struct mm_struct *);
+ /* Grab a reference to a task's mm, if it is not already going away */
+--- a/kernel/fork.c
++++ b/kernel/fork.c
+@@ -693,6 +693,19 @@ void __mmdrop(struct mm_struct *mm)
+ }
+ EXPORT_SYMBOL_GPL(__mmdrop);
+
++#ifdef CONFIG_PREEMPT_RT_BASE
++/*
++ * RCU callback for delayed mm drop. Not strictly rcu, but we don't
++ * want another facility to make this work.
++ */
++void __mmdrop_delayed(struct rcu_head *rhp)
++{
++ struct mm_struct *mm = container_of(rhp, struct mm_struct, delayed_drop);
++
++ __mmdrop(mm);
++}
++#endif
++
+ /*
+ * Decrement the use count and release all resources for an mm.
+ */
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -2286,8 +2286,12 @@ static struct rq *finish_task_switch(str
+ finish_arch_post_lock_switch();
+
+ fire_sched_in_preempt_notifiers(current);
++ /*
++ * We use mmdrop_delayed() here so we don't have to do the
++ * full __mmdrop() when we are the last user.
++ */
+ if (mm)
+- mmdrop(mm);
++ mmdrop_delayed(mm);
+ if (unlikely(prev_state == TASK_DEAD)) {
+ if (prev->sched_class->task_dead)
+ prev->sched_class->task_dead(prev);
+@@ -5125,6 +5129,8 @@ static int migration_cpu_stop(void *data
+
+ #ifdef CONFIG_HOTPLUG_CPU
+
++static DEFINE_PER_CPU(struct mm_struct *, idle_last_mm);
++
+ /*
+ * Ensures that the idle task is using init_mm right before its cpu goes
+ * offline.
+@@ -5139,7 +5145,11 @@ void idle_task_exit(void)
+ switch_mm(mm, &init_mm, current);
+ finish_arch_post_lock_switch();
+ }
+- mmdrop(mm);
++ /*
++ * Defer the cleanup to an alive cpu. On RT we can neither
++ * call mmdrop() nor mmdrop_delayed() from here.
++ */
++ per_cpu(idle_last_mm, smp_processor_id()) = mm;
+ }
+
+ /*
+@@ -5482,6 +5492,10 @@ migration_call(struct notifier_block *nf
+
+ case CPU_DEAD:
+ calc_load_migrate(rq);
++ if (per_cpu(idle_last_mm, cpu)) {
++ mmdrop(per_cpu(idle_last_mm, cpu));
++ per_cpu(idle_last_mm, cpu) = NULL;
++ }
+ break;
+ #endif
+ }
diff --git a/patches/sched-rt-mutex-wakeup.patch b/patches/sched-rt-mutex-wakeup.patch
new file mode 100644
index 00000000000000..2eb7810ddd76a8
--- /dev/null
+++ b/patches/sched-rt-mutex-wakeup.patch
@@ -0,0 +1,93 @@
+Subject: sched: Add saved_state for tasks blocked on sleeping locks
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Sat, 25 Jun 2011 09:21:04 +0200
+
+Spinlocks are state preserving in !RT. RT changes the state when a
+task gets blocked on a lock. So we need to remember the state before
+the lock contention. If a regular wakeup (not an rtmutex related
+wakeup) happens, the saved_state is updated to running. When the lock
+sleep is done, the saved state is restored.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ include/linux/sched.h | 2 ++
+ kernel/sched/core.c | 31 ++++++++++++++++++++++++++++++-
+ kernel/sched/sched.h | 1 +
+ 3 files changed, 33 insertions(+), 1 deletion(-)
+
+--- a/include/linux/sched.h
++++ b/include/linux/sched.h
+@@ -1335,6 +1335,7 @@ enum perf_event_task_context {
+
+ struct task_struct {
+ volatile long state; /* -1 unrunnable, 0 runnable, >0 stopped */
++ volatile long saved_state; /* saved state for "spinlock sleepers" */
+ void *stack;
+ atomic_t usage;
+ unsigned int flags; /* per process flags, defined below */
+@@ -2432,6 +2433,7 @@ extern void xtime_update(unsigned long t
+
+ extern int wake_up_state(struct task_struct *tsk, unsigned int state);
+ extern int wake_up_process(struct task_struct *tsk);
++extern int wake_up_lock_sleeper(struct task_struct * tsk);
+ extern void wake_up_new_task(struct task_struct *tsk);
+ #ifdef CONFIG_SMP
+ extern void kick_process(struct task_struct *tsk);
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -1721,8 +1721,25 @@ try_to_wake_up(struct task_struct *p, un
+ */
+ smp_mb__before_spinlock();
+ raw_spin_lock_irqsave(&p->pi_lock, flags);
+- if (!(p->state & state))
++ if (!(p->state & state)) {
++ /*
++ * The task might be running due to a spinlock sleeper
++ * wakeup. Check the saved state and set it to running
++ * if the wakeup condition is true.
++ */
++ if (!(wake_flags & WF_LOCK_SLEEPER)) {
++ if (p->saved_state & state)
++ p->saved_state = TASK_RUNNING;
++ }
+ goto out;
++ }
++
++ /*
++ * If this is a regular wakeup, then we can unconditionally
++ * clear the saved state of a "lock sleeper".
++ */
++ if (!(wake_flags & WF_LOCK_SLEEPER))
++ p->saved_state = TASK_RUNNING;
+
+ success = 1; /* we're going to change ->state */
+ cpu = task_cpu(p);
+@@ -1819,6 +1836,18 @@ int wake_up_process(struct task_struct *
+ }
+ EXPORT_SYMBOL(wake_up_process);
+
++/**
++ * wake_up_lock_sleeper - Wake up a specific process blocked on a "sleeping lock"
++ * @p: The process to be woken up.
++ *
++ * Same as wake_up_process() above, but wake_flags=WF_LOCK_SLEEPER to indicate
++ * the nature of the wakeup.
++ */
++int wake_up_lock_sleeper(struct task_struct *p)
++{
++ return try_to_wake_up(p, TASK_ALL, WF_LOCK_SLEEPER);
++}
++
+ int wake_up_state(struct task_struct *p, unsigned int state)
+ {
+ return try_to_wake_up(p, state, 0);
+--- a/kernel/sched/sched.h
++++ b/kernel/sched/sched.h
+@@ -1092,6 +1092,7 @@ static inline void finish_lock_switch(st
+ #define WF_SYNC 0x01 /* waker goes to sleep after wakeup */
+ #define WF_FORK 0x02 /* child wakeup after fork */
+ #define WF_MIGRATED 0x4 /* internal use, task got migrated */
++#define WF_LOCK_SLEEPER 0x08 /* wakeup spinlock "sleeper" */
+
+ /*
+ * To aid in avoiding the subversion of "niceness" due to uneven distribution
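
The side that fills in and consumes saved_state is the sleeping-spinlock slowpath added
later in this queue by rt-add-rt-locks.patch. Schematically (a condensed, from-memory
sketch of what rt_spin_lock_slowlock() does, not literal code):

#include <linux/sched.h>

static void example_sleeping_lock_slowpath(void)
{
        struct task_struct *self = current;
        unsigned long flags;

        raw_spin_lock_irqsave(&self->pi_lock, flags);
        self->saved_state = self->state;        /* e.g. TASK_INTERRUPTIBLE */
        __set_current_state(TASK_UNINTERRUPTIBLE);
        raw_spin_unlock_irqrestore(&self->pi_lock, flags);

        /*
         * ... enqueue as a waiter and schedule() until the lock is acquired;
         * rtmutex wakeups use wake_up_lock_sleeper() and therefore do not
         * clobber saved_state ...
         */

        raw_spin_lock_irqsave(&self->pi_lock, flags);
        __set_current_state(self->saved_state); /* restore pre-contention state */
        self->saved_state = TASK_RUNNING;
        raw_spin_unlock_irqrestore(&self->pi_lock, flags);
}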
diff --git a/patches/sched-ttwu-ensure-success-return-is-correct.patch b/patches/sched-ttwu-ensure-success-return-is-correct.patch
new file mode 100644
index 00000000000000..5ec55b369b2ce4
--- /dev/null
+++ b/patches/sched-ttwu-ensure-success-return-is-correct.patch
@@ -0,0 +1,34 @@
+Subject: sched: ttwu: Return success when only changing the saved_state value
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Tue, 13 Dec 2011 21:42:19 +0100
+
+When a task blocks on a rt lock, it saves the current state in
+p->saved_state, so a lock related wake up will not destroy the
+original state.
+
+When a real wakeup happens while the task is already running due to a
+lock wakeup, we update p->saved_state to TASK_RUNNING but do not
+return success. That can cause another wakeup in the waitqueue code,
+and the task remains on the waitqueue list. Return success in
+that case as well.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ kernel/sched/core.c | 4 +++-
+ 1 file changed, 3 insertions(+), 1 deletion(-)
+
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -1728,8 +1728,10 @@ try_to_wake_up(struct task_struct *p, un
+ * if the wakeup condition is true.
+ */
+ if (!(wake_flags & WF_LOCK_SLEEPER)) {
+- if (p->saved_state & state)
++ if (p->saved_state & state) {
+ p->saved_state = TASK_RUNNING;
++ success = 1;
++ }
+ }
+ goto out;
+ }
diff --git a/patches/sched-workqueue-Only-wake-up-idle-workers-if-not-blo.patch b/patches/sched-workqueue-Only-wake-up-idle-workers-if-not-blo.patch
new file mode 100644
index 00000000000000..ba81c247424c4b
--- /dev/null
+++ b/patches/sched-workqueue-Only-wake-up-idle-workers-if-not-blo.patch
@@ -0,0 +1,37 @@
+From: Steven Rostedt <rostedt@goodmis.org>
+Date: Mon, 18 Mar 2013 15:12:49 -0400
+Subject: sched/workqueue: Only wake up idle workers if not blocked on sleeping spin lock
+
+In -rt, most spin_locks() turn into mutexes. One of these spin_lock
+conversions is performed on the workqueue gcwq->lock. When the idle
+worker is woken, the first thing it will do is grab that same lock and
+it too will block, possibly jumping into the same code. Because
+nr_running has already been decremented, this does not become an
+infinite loop.
+
+But this is still a waste of CPU cycles, and it doesn't follow the method
+of mainline, as new workers should only be woken when a worker thread is
+truly going to sleep, and not just blocked on a spin_lock().
+
+Check the saved_state too before waking up new workers.
+
+
+Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ kernel/sched/core.c | 4 +++-
+ 1 file changed, 3 insertions(+), 1 deletion(-)
+
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -2981,8 +2981,10 @@ static void __sched __schedule(void)
+ * If a worker went to sleep, notify and ask workqueue
+ * whether it wants to wake up a task to maintain
+ * concurrency.
++ * Only call wake up if prev isn't blocked on a sleeping
++ * spin lock.
+ */
+- if (prev->flags & PF_WQ_WORKER) {
++ if (prev->flags & PF_WQ_WORKER && !prev->saved_state) {
+ struct task_struct *to_wakeup;
+
+ to_wakeup = wq_worker_sleeping(prev, cpu);
diff --git a/patches/scsi-fcoe-rt-aware.patch b/patches/scsi-fcoe-rt-aware.patch
new file mode 100644
index 00000000000000..f061d17348be0c
--- /dev/null
+++ b/patches/scsi-fcoe-rt-aware.patch
@@ -0,0 +1,114 @@
+Subject: scsi/fcoe: Make RT aware.
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Sat, 12 Nov 2011 14:00:48 +0100
+
+Do not disable preemption while taking sleeping locks. All users look
+safe for migrate_disable() only.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ drivers/scsi/fcoe/fcoe.c | 18 +++++++++---------
+ drivers/scsi/fcoe/fcoe_ctlr.c | 4 ++--
+ drivers/scsi/libfc/fc_exch.c | 4 ++--
+ 3 files changed, 13 insertions(+), 13 deletions(-)
+
+--- a/drivers/scsi/fcoe/fcoe.c
++++ b/drivers/scsi/fcoe/fcoe.c
+@@ -1287,7 +1287,7 @@ static void fcoe_percpu_thread_destroy(u
+ struct sk_buff *skb;
+ #ifdef CONFIG_SMP
+ struct fcoe_percpu_s *p0;
+- unsigned targ_cpu = get_cpu();
++ unsigned targ_cpu = get_cpu_light();
+ #endif /* CONFIG_SMP */
+
+ FCOE_DBG("Destroying receive thread for CPU %d\n", cpu);
+@@ -1343,7 +1343,7 @@ static void fcoe_percpu_thread_destroy(u
+ kfree_skb(skb);
+ spin_unlock_bh(&p->fcoe_rx_list.lock);
+ }
+- put_cpu();
++ put_cpu_light();
+ #else
+ /*
+ * This a non-SMP scenario where the singular Rx thread is
+@@ -1567,11 +1567,11 @@ static int fcoe_rcv(struct sk_buff *skb,
+ static int fcoe_alloc_paged_crc_eof(struct sk_buff *skb, int tlen)
+ {
+ struct fcoe_percpu_s *fps;
+- int rc;
++ int rc, cpu = get_cpu_light();
+
+- fps = &get_cpu_var(fcoe_percpu);
++ fps = &per_cpu(fcoe_percpu, cpu);
+ rc = fcoe_get_paged_crc_eof(skb, tlen, fps);
+- put_cpu_var(fcoe_percpu);
++ put_cpu_light();
+
+ return rc;
+ }
+@@ -1767,11 +1767,11 @@ static inline int fcoe_filter_frames(str
+ return 0;
+ }
+
+- stats = per_cpu_ptr(lport->stats, get_cpu());
++ stats = per_cpu_ptr(lport->stats, get_cpu_light());
+ stats->InvalidCRCCount++;
+ if (stats->InvalidCRCCount < 5)
+ printk(KERN_WARNING "fcoe: dropping frame with CRC error\n");
+- put_cpu();
++ put_cpu_light();
+ return -EINVAL;
+ }
+
+@@ -1847,13 +1847,13 @@ static void fcoe_recv_frame(struct sk_bu
+ goto drop;
+
+ if (!fcoe_filter_frames(lport, fp)) {
+- put_cpu();
++ put_cpu_light();
+ fc_exch_recv(lport, fp);
+ return;
+ }
+ drop:
+ stats->ErrorFrames++;
+- put_cpu();
++ put_cpu_light();
+ kfree_skb(skb);
+ }
+
+--- a/drivers/scsi/fcoe/fcoe_ctlr.c
++++ b/drivers/scsi/fcoe/fcoe_ctlr.c
+@@ -831,7 +831,7 @@ static unsigned long fcoe_ctlr_age_fcfs(
+
+ INIT_LIST_HEAD(&del_list);
+
+- stats = per_cpu_ptr(fip->lp->stats, get_cpu());
++ stats = per_cpu_ptr(fip->lp->stats, get_cpu_light());
+
+ list_for_each_entry_safe(fcf, next, &fip->fcfs, list) {
+ deadline = fcf->time + fcf->fka_period + fcf->fka_period / 2;
+@@ -867,7 +867,7 @@ static unsigned long fcoe_ctlr_age_fcfs(
+ sel_time = fcf->time;
+ }
+ }
+- put_cpu();
++ put_cpu_light();
+
+ list_for_each_entry_safe(fcf, next, &del_list, list) {
+ /* Removes fcf from current list */
+--- a/drivers/scsi/libfc/fc_exch.c
++++ b/drivers/scsi/libfc/fc_exch.c
+@@ -816,10 +816,10 @@ static struct fc_exch *fc_exch_em_alloc(
+ }
+ memset(ep, 0, sizeof(*ep));
+
+- cpu = get_cpu();
++ cpu = get_cpu_light();
+ pool = per_cpu_ptr(mp->pool, cpu);
+ spin_lock_bh(&pool->lock);
+- put_cpu();
++ put_cpu_light();
+
+ /* peek cache of free slot */
+ if (pool->left != FC_XID_UNKNOWN) {
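
get_cpu_light()/put_cpu_light() are introduced by introduce_migrate_disable_cpu_light.patch
in this queue. Approximately (a sketch, not the literal definitions): on RT they pin the
task to its CPU via migrate_disable() without disabling preemption, on !RT they fall back
to get_cpu()/put_cpu().

/* Approximate -- see introduce_migrate_disable_cpu_light.patch */
#ifdef CONFIG_PREEMPT_RT_FULL
# define get_cpu_light()        ({ migrate_disable(); smp_processor_id(); })
# define put_cpu_light()        migrate_enable()
#else
# define get_cpu_light()        get_cpu()
# define put_cpu_light()        put_cpu()
#endif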
diff --git a/patches/scsi-qla2xxx-fix-bug-sleeping-function-called-from-invalid-context.patch b/patches/scsi-qla2xxx-fix-bug-sleeping-function-called-from-invalid-context.patch
new file mode 100644
index 00000000000000..64a7596854dcf9
--- /dev/null
+++ b/patches/scsi-qla2xxx-fix-bug-sleeping-function-called-from-invalid-context.patch
@@ -0,0 +1,47 @@
+Subject: scsi: qla2xxx: Use local_irq_save_nort() in qla2x00_poll
+From: John Kacur <jkacur@redhat.com>
+Date: Fri, 27 Apr 2012 12:48:46 +0200
+
+RT triggers the following:
+
+[ 11.307652] [<ffffffff81077b27>] __might_sleep+0xe7/0x110
+[ 11.307663] [<ffffffff8150e524>] rt_spin_lock+0x24/0x60
+[ 11.307670] [<ffffffff8150da78>] ? rt_spin_lock_slowunlock+0x78/0x90
+[ 11.307703] [<ffffffffa0272d83>] qla24xx_intr_handler+0x63/0x2d0 [qla2xxx]
+[ 11.307736] [<ffffffffa0262307>] qla2x00_poll+0x67/0x90 [qla2xxx]
+
+Function qla2x00_poll does local_irq_save() before calling qla24xx_intr_handler
+which has a spinlock. Since spinlocks are sleepable on rt, it is not allowed
+to call them with interrupts disabled. Therefore we use local_irq_save_nort()
+instead which saves flags without disabling interrupts.
+
+This fix needs to be applied to v3.0-rt, v3.2-rt and v3.4-rt
+
+Suggested-by: Thomas Gleixner
+Signed-off-by: John Kacur <jkacur@redhat.com>
+Cc: Steven Rostedt <rostedt@goodmis.org>
+Cc: David Sommerseth <davids@redhat.com>
+Link: http://lkml.kernel.org/r/1335523726-10024-1-git-send-email-jkacur@redhat.com
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ drivers/scsi/qla2xxx/qla_inline.h | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/drivers/scsi/qla2xxx/qla_inline.h
++++ b/drivers/scsi/qla2xxx/qla_inline.h
+@@ -59,12 +59,12 @@ qla2x00_poll(struct rsp_que *rsp)
+ {
+ unsigned long flags;
+ struct qla_hw_data *ha = rsp->hw;
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+ if (IS_P3P_TYPE(ha))
+ qla82xx_poll(0, rsp);
+ else
+ ha->isp_ops->intr_handler(0, rsp);
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+ }
+
+ static inline uint8_t *
diff --git a/patches/seqlock-prevent-rt-starvation.patch b/patches/seqlock-prevent-rt-starvation.patch
new file mode 100644
index 00000000000000..a83fee4a2b8a8d
--- /dev/null
+++ b/patches/seqlock-prevent-rt-starvation.patch
@@ -0,0 +1,190 @@
+Subject: seqlock: Prevent rt starvation
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 22 Feb 2012 12:03:30 +0100
+
+If a low prio writer gets preempted while holding the seqlock write
+locked, a high prio reader spins forever on RT.
+
+To prevent this let the reader grab the spinlock, so it blocks and
+eventually boosts the writer. This way the writer can proceed and
+endless spinning is prevented.
+
+For seqcount writers we disable preemption over the update code
+path. Thanks to Al Viro for disentangling some VFS code to make that
+possible.
+
+Nicholas Mc Guire:
+- spin_lock+unlock => spin_unlock_wait
+- __write_seqcount_begin => __raw_write_seqcount_begin
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+
+---
+ include/linux/seqlock.h | 56 +++++++++++++++++++++++++++++++++++++-----------
+ include/net/dst.h | 2 -
+ include/net/neighbour.h | 4 +--
+ 3 files changed, 47 insertions(+), 15 deletions(-)
+
+--- a/include/linux/seqlock.h
++++ b/include/linux/seqlock.h
+@@ -219,20 +219,30 @@ static inline int read_seqcount_retry(co
+ return __read_seqcount_retry(s, start);
+ }
+
+-
+-
+-static inline void raw_write_seqcount_begin(seqcount_t *s)
++static inline void __raw_write_seqcount_begin(seqcount_t *s)
+ {
+ s->sequence++;
+ smp_wmb();
+ }
+
+-static inline void raw_write_seqcount_end(seqcount_t *s)
++static inline void raw_write_seqcount_begin(seqcount_t *s)
++{
++ preempt_disable_rt();
++ __raw_write_seqcount_begin(s);
++}
++
++static inline void __raw_write_seqcount_end(seqcount_t *s)
+ {
+ smp_wmb();
+ s->sequence++;
+ }
+
++static inline void raw_write_seqcount_end(seqcount_t *s)
++{
++ __raw_write_seqcount_end(s);
++ preempt_enable_rt();
++}
++
+ /*
+ * raw_write_seqcount_latch - redirect readers to even/odd copy
+ * @s: pointer to seqcount_t
+@@ -305,10 +315,32 @@ typedef struct {
+ /*
+ * Read side functions for starting and finalizing a read side section.
+ */
++#ifndef CONFIG_PREEMPT_RT_FULL
+ static inline unsigned read_seqbegin(const seqlock_t *sl)
+ {
+ return read_seqcount_begin(&sl->seqcount);
+ }
++#else
++/*
++ * Starvation safe read side for RT
++ */
++static inline unsigned read_seqbegin(seqlock_t *sl)
++{
++ unsigned ret;
++
++repeat:
++ ret = ACCESS_ONCE(sl->seqcount.sequence);
++ if (unlikely(ret & 1)) {
++ /*
++ * Take the lock and let the writer proceed (i.e. possibly
++ * boost it), otherwise we could loop here forever.
++ */
++ spin_unlock_wait(&sl->lock);
++ goto repeat;
++ }
++ return ret;
++}
++#endif
+
+ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
+ {
+@@ -323,36 +355,36 @@ static inline unsigned read_seqretry(con
+ static inline void write_seqlock(seqlock_t *sl)
+ {
+ spin_lock(&sl->lock);
+- write_seqcount_begin(&sl->seqcount);
++ __raw_write_seqcount_begin(&sl->seqcount);
+ }
+
+ static inline void write_sequnlock(seqlock_t *sl)
+ {
+- write_seqcount_end(&sl->seqcount);
++ __raw_write_seqcount_end(&sl->seqcount);
+ spin_unlock(&sl->lock);
+ }
+
+ static inline void write_seqlock_bh(seqlock_t *sl)
+ {
+ spin_lock_bh(&sl->lock);
+- write_seqcount_begin(&sl->seqcount);
++ __raw_write_seqcount_begin(&sl->seqcount);
+ }
+
+ static inline void write_sequnlock_bh(seqlock_t *sl)
+ {
+- write_seqcount_end(&sl->seqcount);
++ __raw_write_seqcount_end(&sl->seqcount);
+ spin_unlock_bh(&sl->lock);
+ }
+
+ static inline void write_seqlock_irq(seqlock_t *sl)
+ {
+ spin_lock_irq(&sl->lock);
+- write_seqcount_begin(&sl->seqcount);
++ __raw_write_seqcount_begin(&sl->seqcount);
+ }
+
+ static inline void write_sequnlock_irq(seqlock_t *sl)
+ {
+- write_seqcount_end(&sl->seqcount);
++ __raw_write_seqcount_end(&sl->seqcount);
+ spin_unlock_irq(&sl->lock);
+ }
+
+@@ -361,7 +393,7 @@ static inline unsigned long __write_seql
+ unsigned long flags;
+
+ spin_lock_irqsave(&sl->lock, flags);
+- write_seqcount_begin(&sl->seqcount);
++ __raw_write_seqcount_begin(&sl->seqcount);
+ return flags;
+ }
+
+@@ -371,7 +403,7 @@ static inline unsigned long __write_seql
+ static inline void
+ write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
+ {
+- write_seqcount_end(&sl->seqcount);
++ __raw_write_seqcount_end(&sl->seqcount);
+ spin_unlock_irqrestore(&sl->lock, flags);
+ }
+
+--- a/include/net/dst.h
++++ b/include/net/dst.h
+@@ -403,7 +403,7 @@ static inline void dst_confirm(struct ds
+ static inline int dst_neigh_output(struct dst_entry *dst, struct neighbour *n,
+ struct sk_buff *skb)
+ {
+- const struct hh_cache *hh;
++ struct hh_cache *hh;
+
+ if (dst->pending_confirm) {
+ unsigned long now = jiffies;
+--- a/include/net/neighbour.h
++++ b/include/net/neighbour.h
+@@ -445,7 +445,7 @@ static inline int neigh_hh_bridge(struct
+ }
+ #endif
+
+-static inline int neigh_hh_output(const struct hh_cache *hh, struct sk_buff *skb)
++static inline int neigh_hh_output(struct hh_cache *hh, struct sk_buff *skb)
+ {
+ unsigned int seq;
+ int hh_len;
+@@ -500,7 +500,7 @@ struct neighbour_cb {
+
+ #define NEIGH_CB(skb) ((struct neighbour_cb *)(skb)->cb)
+
+-static inline void neigh_ha_snapshot(char *dst, const struct neighbour *n,
++static inline void neigh_ha_snapshot(char *dst, struct neighbour *n,
+ const struct net_device *dev)
+ {
+ unsigned int seq;
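
For reference, the usage pattern the change above protects; example_lock, example_val
and the two functions are hypothetical. A reader loops until the sequence is stable, so
on RT a high-priority reader spinning in that loop would keep a preempted low-priority
writer from ever completing unless the reader waits on sl->lock as implemented above.

#include <linux/seqlock.h>

static DEFINE_SEQLOCK(example_lock);
static u64 example_val;

static void example_writer(u64 v)
{
        write_seqlock(&example_lock);   /* takes ->lock, makes the sequence odd */
        example_val = v;
        write_sequnlock(&example_lock); /* sequence becomes even again */
}

static u64 example_reader(void)
{
        unsigned int seq;
        u64 v;

        do {
                seq = read_seqbegin(&example_lock); /* on RT: may wait on ->lock */
                v = example_val;
        } while (read_seqretry(&example_lock, seq));

        return v;
}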
diff --git a/patches/series b/patches/series
new file mode 100644
index 00000000000000..96745dc20015b3
--- /dev/null
+++ b/patches/series
@@ -0,0 +1,571 @@
+###########################################################
+# DELTA against a known Linus release
+###########################################################
+
+############################################################
+# UPSTREAM changes queued
+############################################################
+
+############################################################
+# UPSTREAM FIXES, patches pending
+############################################################
+0001-arm64-Mark-PMU-interrupt-IRQF_NO_THREAD.patch
+0002-arm64-Allow-forced-irq-threading.patch
+0001-uaccess-count-pagefault_disable-levels-in-pagefault_.patch
+0002-mm-uaccess-trigger-might_sleep-in-might_fault-with-d.patch
+0003-uaccess-clarify-that-uaccess-may-only-sleep-if-pagef.patch
+0004-mm-explicitly-disable-enable-preemption-in-kmap_atom.patch
+0005-mips-kmap_coherent-relies-on-disabled-preemption.patch
+0006-mm-use-pagefault_disable-to-check-for-disabled-pagef.patch
+0007-drm-i915-use-pagefault_disabled-to-check-for-disable.patch
+0008-futex-UP-futex_atomic_op_inuser-relies-on-disabled-p.patch
+0009-futex-UP-futex_atomic_cmpxchg_inatomic-relies-on-dis.patch
+0010-arm-futex-UP-futex_atomic_cmpxchg_inatomic-relies-on.patch
+0011-arm-futex-UP-futex_atomic_op_inuser-relies-on-disabl.patch
+0012-futex-clarify-that-preemption-doesn-t-have-to-be-dis.patch
+0013-mips-properly-lock-access-to-the-fpu.patch
+0014-uaccess-decouple-preemption-from-the-pagefault-logic.patch
+0001-sched-Implement-lockless-wake-queues.patch
+0002-futex-Implement-lockless-wakeups.patch
+0004-ipc-mqueue-Implement-lockless-pipelined-wakeups.patch
+mm-slub-move-slab-initialization-into-irq-enabled-region.patch
+
+############################################################
+# Stuff broken upstream, patches submitted
+############################################################
+
+############################################################
+# Stuff which needs addressing upstream, but requires more
+# information
+############################################################
+
+############################################################
+# Stuff broken upstream, need to be sent
+############################################################
+
+############################################################
+# Submitted on LKML
+############################################################
+
+# SPARC part of early printk consolidation
+sparc64-use-generic-rwsem-spinlocks-rt.patch
+
+# SRCU
+kernel-SRCU-provide-a-static-initializer.patch
+
+############################################################
+# Submitted to mips ML
+############################################################
+
+############################################################
+# Submitted to ARM ML
+############################################################
+
+############################################################
+# Submitted to PPC ML
+############################################################
+
+############################################################
+# Submitted on LKML
+############################################################
+
+############################################################
+# Submitted to net-dev
+############################################################
+
+############################################################
+# Pending in tip
+############################################################
+
+############################################################
+# Stuff which should go upstream ASAP
+############################################################
+gpio-omap-use-raw-locks-for-locking.patch
+
+# SCHED BLOCK/WQ
+block-shorten-interrupt-disabled-regions.patch
+
+# Timekeeping split jiffies lock. Needs a good argument :)
+timekeeping-split-jiffies-lock.patch
+
+# CHECKME: Should local_irq_enable() generally do a preemption check ?
+vtime-split-lock-and-seqcount.patch
+
+# Tracing
+tracing-account-for-preempt-off-in-preempt_schedule.patch
+
+# PTRACE/SIGNAL crap
+signal-revert-ptrace-preempt-magic.patch
+
+# ARM lock annotation
+arm-convert-boot-lock-to-raw.patch
+
+# PREEMPT_ENABLE_NO_RESCHED
+
+# SIGNALS / POSIXTIMERS
+posix-timers-no-broadcast.patch
+signals-allow-rt-tasks-to-cache-one-sigqueue-struct.patch
+
+# SCHED
+
+# GENERIC CMPXCHG
+
+# SHORTEN PREEMPT DISABLED
+drivers-random-reduce-preempt-disabled-region.patch
+
+# CLOCKSOURCE
+arm-at91-pit-remove-irq-handler-when-clock-is-unused.patch
+clocksource-tclib-allow-higher-clockrates.patch
+
+# DRIVERS NET
+drivers-net-8139-disable-irq-nosync.patch
+
+# PREEMPT
+
+# PM
+suspend-prevernt-might-sleep-splats.patch
+
+# NETWORKING
+net-prevent-abba-deadlock.patch
+net-sched-dev_deactivate_many-use-msleep-1-instead-o.patch
+
+# X86
+x86-io-apic-migra-no-unmask.patch
+fix-rt-int3-x86_32-3.2-rt.patch
+
+# RCU
+
+# LOCKING INIT FIXES
+
+# PCI
+pci-access-use-__wake_up_all_locked.patch
+
+# WORKQUEUE
+
+
+#####################################################
+# Stuff which should go mainline, but wants some care
+#####################################################
+futex-avoid-double-wake-up-in-PI-futex-wait-wake-on-.patch
+
+# SEQLOCK
+
+# ANON RW SEMAPHORES
+
+# TRACING
+latency-hist.patch
+
+# HW LATENCY DETECTOR - this really wants a rewrite
+hwlatdetect.patch
+hwlat-detector-Update-hwlat_detector-to-add-outer-lo.patch
+hwlat-detector-Use-trace_clock_local-if-available.patch
+hwlat-detector-Use-thread-instead-of-stop-machine.patch
+hwlat-detector-Don-t-ignore-threshold-module-paramet.patch
+
+##################################################
+# REAL RT STUFF starts here
+##################################################
+
+# PRINTK
+printk-kill.patch
+printk-27force_early_printk-27-boot-param-to-help-with-debugging.patch
+
+# Enable RT CONFIG
+rt-preempt-base-config.patch
+kconfig-disable-a-few-options-rt.patch
+kconfig-preempt-rt-full.patch
+
+# WARN/BUG_ON_RT
+bug-rt-dependend-variants.patch
+
+# LOCAL_IRQ_RT/NON_RT
+local-irq-rt-depending-variants.patch
+
+# PREEMPT NORT
+preempt-nort-rt-variants.patch
+
+# local locks & migrate disable
+introduce_migrate_disable_cpu_light.patch
+rt-local-irq-lock.patch
+
+# ANNOTATE local_irq_disable sites
+ata-disable-interrupts-if-non-rt.patch
+ide-use-nort-local-irq-variants.patch
+infiniband-mellanox-ib-use-nort-irq.patch
+inpt-gameport-use-local-irq-nort.patch
+user-use-local-irq-nort.patch
+usb-use-_nort-in-giveback.patch
+mm-scatterlist-dont-disable-irqs-on-RT.patch
+mm-workingset-do-not-protect-workingset_shadow_nodes.patch
+
+# Sigh
+signal-fix-up-rcu-wreckage.patch
+oleg-signal-rt-fix.patch
+
+# ANNOTATE BUG/WARNON
+net-wireless-warn-nort.patch
+
+# BIT SPINLOCKS - SIGH
+fs-replace-bh_uptodate_lock-for-rt.patch
+fs-jbd-replace-bh_state-lock.patch
+
+# GENIRQ
+list_bl.h-make-list-head-locking-RT-safe.patch
+genirq-disable-irqpoll-on-rt.patch
+genirq-force-threading.patch
+genirq-do-not-invoke-the-affinity-callback-via-a-wor.patch
+
+# DRIVERS NET
+drivers-net-fix-livelock-issues.patch
+drivers-net-vortex-fix-locking-issues.patch
+net-gianfar-do-not-disable-interrupts.patch
+
+# MM PAGE_ALLOC
+mm-page_alloc-rt-friendly-per-cpu-pages.patch
+mm-page_alloc-reduce-lock-sections-further.patch
+
+# MM SWAP
+mm-convert-swap-to-percpu-locked.patch
+
+# MM vmstat
+mm-make-vmstat-rt-aware.patch
+
+# MM memory
+re-preempt_rt_full-arm-coredump-fails-for-cpu-3e-3d-4.patch
+
+# MM bounce
+mm-bounce-local-irq-save-nort.patch
+
+# MM SLxB
+mm-disable-sloub-rt.patch
+mm-enable-slub.patch
+slub-enable-irqs-for-no-wait.patch
+slub-disable-SLUB_CPU_PARTIAL.patch
+
+# MM
+mm-page-alloc-use-local-lock-on-target-cpu.patch
+mm-memcontrol-Don-t-call-schedule_work_on-in-preempt.patch
+mm-memcontrol-do_not_disable_irq.patch
+
+# RADIX TREE
+radix-tree-rt-aware.patch
+
+# PANIC
+panic-disable-random-on-rt.patch
+
+# IPC
+ipc-make-rt-aware.patch
+
+# RELAY
+relay-fix-timer-madness.patch
+
+# NETWORKING
+
+# WORKQUEUE SIGH
+
+# TIMERS
+timers-prepare-for-full-preemption.patch
+timers-preempt-rt-support.patch
+timer-delay-waking-softirqs-from-the-jiffy-tick.patch
+timers-avoid-the-base-null-otptimization-on-rt.patch
+
+# HRTIMERS
+hrtimers-prepare-full-preemption.patch
+hrtimer-fixup-hrtimer-callback-changes-for-preempt-r.patch
+sched-deadline-dl_task_timer-has-to-be-irqsafe.patch
+timer-fd-avoid-live-lock.patch
+hrtimer-raise-softirq-if-hrtimer-irq-stalled.patch
+hrtimer-Move-schedule_work-call-to-helper-thread.patch
+
+# POSIX-CPU-TIMERS
+posix-timers-thread-posix-cpu-timers-on-rt.patch
+
+# SCHEDULER
+sched-delay-put-task.patch
+sched-limit-nr-migrate.patch
+sched-mmdrop-delayed.patch
+sched-rt-mutex-wakeup.patch
+sched-might-sleep-do-not-account-rcu-depth.patch
+cond-resched-softirq-rt.patch
+cond-resched-lock-rt-tweak.patch
+sched-disable-ttwu-queue.patch
+sched-disable-rt-group-sched-on-rt.patch
+sched-ttwu-ensure-success-return-is-correct.patch
+sched-workqueue-Only-wake-up-idle-workers-if-not-blo.patch
+
+# STOP MACHINE
+stop_machine-convert-stop_machine_run-to-PREEMPT_RT.patch
+stop-machine-raw-lock.patch
+
+# MIGRATE DISABLE AND PER CPU
+hotplug-light-get-online-cpus.patch
+hotplug-sync_unplug-no-27-5cn-27-in-task-name.patch
+re-migrate_disable-race-with-cpu-hotplug-3f.patch
+ftrace-migrate-disable-tracing.patch
+hotplug-use-migrate-disable.patch
+
+# NETWORKING
+sunrpc-make-svc_xprt_do_enqueue-use-get_cpu_light.patch
+
+# NOHZ
+
+# LOCKDEP
+lockdep-no-softirq-accounting-on-rt.patch
+
+# SOFTIRQ
+mutex-no-spin-on-rt.patch
+tasklet-rt-prevent-tasklets-from-going-into-infinite-spin-in-rt.patch
+softirq-preempt-fix-3-re.patch
+softirq-disable-softirq-stacks-for-rt.patch
+softirq-split-locks.patch
+irq-allow-disabling-of-softirq-processing-in-irq-thread-context.patch
+
+# RAID5
+md-raid5-percpu-handling-rt-aware.patch
+
+# FUTEX/RTMUTEX
+rtmutex-futex-prepare-rt.patch
+futex-requeue-pi-fix.patch
+0005-futex-Ensure-lock-unlock-symetry-versus-pi_lock-and-.patch
+
+# RTMUTEX
+rtmutex-lock-killable.patch
+spinlock-types-separate-raw.patch
+rtmutex-avoid-include-hell.patch
+rt-add-rt-locks.patch
+rtmutex-add-a-first-shot-of-ww_mutex.patch
+
+ptrace-fix-ptrace-vs-tasklist_lock-race.patch
+
+# RTMUTEX Fallout
+tasklist-lock-fix-section-conflict.patch
+
+# RCU
+peter_zijlstra-frob-rcu.patch
+rcu-merge-rcu-bh-into-rcu-preempt-for-rt.patch
+patch-to-introduce-rcu-bh-qs-where-safe-from-softirq.patch
+rcutree-rcu_bh_qs-disable-irq-while-calling-rcu_pree.patch
+
+# LGLOCKS - lovely
+lglocks-rt.patch
+
+# STOP machine (depend on lglock & rtmutex)
+stomp-machine-create-lg_global_trylock_relax-primiti.patch
+stomp-machine-use-lg_global_trylock_relax-to-dead-wi.patch
+
+# DRIVERS SERIAL
+drivers-tty-fix-omap-lock-crap.patch
+drivers-tty-pl011-irq-disable-madness.patch
+rt-serial-warn-fix.patch
+
+# SIMPLE WAITQUEUE
+wait.h-include-atomic.h.patch
+wait-simple-implementation.patch
+work-simple-Simple-work-queue-implemenation.patch
+rcu-more-swait-conversions.patch
+completion-use-simple-wait-queues.patch
+fs-aio-simple-simple-work.patch
+
+# FS
+fs-namespace-preemption-fix.patch
+mm-protect-activate-switch-mm.patch
+fs-block-rt-support.patch
+fs-ntfs-disable-interrupt-non-rt.patch
+fs-jbd-pull-plug-when-waiting-for-space.patch
+fs-jbd2-pull-your-plug-when-waiting-for-space.patch
+
+# X86
+x86-mce-timer-hrtimer.patch
+x86-mce-use-swait-queue-for-mce-wakeups.patch
+x86-stackprot-no-random-on-rt.patch
+x86-use-gen-rwsem-spinlocks-rt.patch
+x86-UV-raw_spinlock-conversion.patch
+thermal-Defer-thermal-wakups-to-threads.patch
+
+# CPU get light
+epoll-use-get-cpu-light.patch
+mm-vmalloc-use-get-cpu-light.patch
+block-mq-use-cpu_light.patch
+block-mq-drop-preempt-disable.patch
+block-mq-don-t-complete-requests-via-IPI.patch
+
+# CPU CHILL
+rt-introduce-cpu-chill.patch
+cpu_chill-Add-a-UNINTERRUPTIBLE-hrtimer_nanosleep.patch
+
+# block
+blk-mq-revert-raw-locks-post-pone-notifier-to-POST_D.patch
+block-blk-mq-use-swait.patch
+block-mq-drop-per-ctx-cpu_lock.patch
+
+# BLOCK LIVELOCK PREVENTION
+block-use-cpu-chill.patch
+
+# FS LIVELOCK PREVENTION
+fs-dcache-use-cpu-chill-in-trylock-loops.patch
+net-use-cpu-chill.patch
+
+# WORKQUEUE more fixes
+workqueue-use-rcu.patch
+workqueue-use-locallock.patch
+work-queue-work-around-irqsafe-timer-optimization.patch
+workqueue-distangle-from-rq-lock.patch
+
+# IDR
+idr-use-local-lock-for-protection.patch
+percpu_ida-use-locklocks.patch
+
+# DEBUGOBJECTS
+debugobjects-rt.patch
+
+# JUMPLABEL
+jump-label-rt.patch
+
+# NET
+skbufhead-raw-lock.patch
+
+# irqwork
+irqwork-push_most_work_into_softirq_context.patch
+
+# Sound
+snd-pcm-fix-snd_pcm_stream_lock-irqs_disabled-splats.patch
+ASoC-Intel-sst-use-instead-of-at-the-of-a-C-statemen.patch
+
+# CONSOLE. NEEDS more thought !!!
+printk-rt-aware.patch
+HACK-printk-drop-the-logbuf_lock-more-often.patch
+
+# POWERPC
+power-use-generic-rwsem-on-rt.patch
+powerpc-kvm-Disable-in-kernel-MPIC-emulation-for-PRE.patch
+powerpc-ps3-device-init.c-adapt-to-completions-using.patch
+
+# ARM
+arm-at91-tclib-default-to-tclib-timer-for-rt.patch
+arm-unwind-use_raw_lock.patch
+ARM-enable-irq-in-translation-section-permission-fau.patch
+ARM-cmpxchg-define-__HAVE_ARCH_CMPXCHG-for-armv6-and.patch
+
+# NETWORK livelock fix
+net-tx-action-avoid-livelock-on-rt.patch
+
+# NETWORK DEBUGGING AID
+ping-sysrq.patch
+
+# KGDB
+kgb-serial-hackaround.patch
+
+# SYSFS - RT indicator
+sysfs-realtime-entry.patch
+
+# KMAP/HIGHMEM
+power-disable-highmem-on-rt.patch
+mips-disable-highmem-on-rt.patch
+mm-rt-kmap-atomic-scheduling.patch
+x86-highmem-add-a-already-used-pte-check.patch
+arm-highmem-flush-tlb-on-unmap.patch
+arm-enable-highmem-for-rt.patch
+
+# IPC
+ipc-sem-rework-semaphore-wakeups.patch
+
+# SYSRQ
+
+# KVM require constant freq TSC (smp function call -> cpufreq)
+x86-kvm-require-const-tsc-for-rt.patch
+KVM-lapic-mark-LAPIC-timer-handler-as-irqsafe.patch
+KVM-use-simple-waitqueue-for-vcpu-wq.patch
+
+# SCSI/FCOE
+scsi-fcoe-rt-aware.patch
+sas-ata-isci-dont-t-disable-interrupts-in-qc_issue-h.patch
+
+# X86 crypto
+x86-crypto-reduce-preempt-disabled-regions.patch
+crypto-Reduce-preempt-disabled-regions-more-algos.patch
+
+# Device mapper
+dm-make-rt-aware.patch
+
+# ACPI
+acpi-rt-Convert-acpi_gbl_hardware-lock-back-to-a-raw.patch
+
+# CPUMASK OFFSTACK
+cpumask-disable-offstack-on-rt.patch
+
+# RANDOM
+random-make-it-work-on-rt.patch
+
+# SEQLOCKS
+seqlock-prevent-rt-starvation.patch
+
+# HOTPLUG
+cpu-rt-make-hotplug-lock-a-sleeping-spinlock-on-rt.patch
+cpu-rt-rework-cpu-down.patch
+cpu-hotplug-Document-why-PREEMPT_RT-uses-a-spinlock.patch
+kernel-cpu-fix-cpu-down-problem-if-kthread-s-cpu-is-.patch
+kernel-hotplug-restore-original-cpu-mask-oncpu-down.patch
+cpu_down_move_migrate_enable_back.patch
+hotplug-Use-set_cpus_allowed_ptr-in-sync_unplug_thre.patch
+
+# SCSCI QLA2xxx
+scsi-qla2xxx-fix-bug-sleeping-function-called-from-invalid-context.patch
+
+# NET
+upstream-net-rt-remove-preemption-disabling-in-netif_rx.patch
+net-another-local-irq-disable-alloc-atomic-headache.patch
+net-fix-iptable-xt-write-recseq-begin-rt-fallout.patch
+net-make-devnet_rename_seq-a-mutex.patch
+
+# CRYPTO
+peterz-srcu-crypto-chain.patch
+
+# LOCKDEP
+lockdep-selftest-only-do-hardirq-context-test-for-raw-spinlock.patch
+lockdep-selftest-fix-warnings-due-to-missing-PREEMPT.patch
+
+# PERF
+perf-make-swevent-hrtimer-irqsafe.patch
+
+# RCU
+rcu-disable-rcu-fast-no-hz-on-rt.patch
+rcu-Eliminate-softirq-processing-from-rcutree.patch
+rcu-make-RCU_BOOST-default-on-RT.patch
+
+# PREEMPT LAZY
+preempt-lazy-support.patch
+x86-preempt-lazy.patch
+arm-preempt-lazy-support.patch
+powerpc-preempt-lazy-support.patch
+arch-arm64-Add-lazy-preempt-support.patch
+
+# LEDS
+leds-trigger-disable-CPU-trigger-on-RT.patch
+
+# DRIVERS
+i2c-omap-drop-the-lock-hard-irq-context.patch
+mmci-remove-bogus-irq-save.patch
+mmc-sdhci-don-t-provide-hard-irq-handler.patch
+cpufreq-drop-K8-s-driver-from-beeing-selected.patch
+
+# I915
+i915_compile_fix.patch
+drm-i915-drop-trace_i915_gem_ring_dispatch-onrt.patch
+i915-bogus-warning-from-i915-when-running-on-PREEMPT.patch
+
+
+cgroups-use-simple-wait-in-css_release.patch
+cgroups-scheduling-while-atomic-in-cgroup-code.patch
+
+# New stuff
+# Revisit: We need this in other places as well
+move_sched_delayed_work_to_helper.patch
+
+# bcache disabled
+md-disable-bcache.patch
+
+# Latest fixes
+workqueue-prevent-deadlock-stall.patch
+
+# Add RT to version
+localversion.patch
diff --git a/patches/signal-fix-up-rcu-wreckage.patch b/patches/signal-fix-up-rcu-wreckage.patch
new file mode 100644
index 00000000000000..97f2a9e8cd2382
--- /dev/null
+++ b/patches/signal-fix-up-rcu-wreckage.patch
@@ -0,0 +1,38 @@
+Subject: signal: Make __lock_task_sighand() RT aware
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Fri, 22 Jul 2011 08:07:08 +0200
+
+local_irq_save() + spin_lock(&sighand->siglock) does not work on
+-RT. Use the nort variants.
+
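+A minimal userspace sketch of the idea, assuming the usual RT convention that
+the *_nort() helpers fall back to the plain IRQ primitives on !RT and become
+(nearly) no-ops on RT; the stub macros below are illustrative, not the
+kernel's definitions:
+
+  #include <stdio.h>
+
+  /* userspace stand-ins so the sketch builds outside the kernel */
+  #define local_irq_save(f)     do { (f) = 1; } while (0)
+  #define local_irq_restore(f)  do { (void)(f); } while (0)
+
+  #ifdef CONFIG_PREEMPT_RT_FULL
+  /* RT: the (sleeping) siglock provides the protection, keep irqs on */
+  # define local_irq_save_nort(f)     do { (f) = 0; } while (0)
+  # define local_irq_restore_nort(f)  do { (void)(f); } while (0)
+  #else
+  /* !RT: behave exactly like the plain variants */
+  # define local_irq_save_nort(f)     local_irq_save(f)
+  # define local_irq_restore_nort(f)  local_irq_restore(f)
+  #endif
+
+  int main(void)
+  {
+      unsigned long flags;
+
+      local_irq_save_nort(flags);
+      /* spin_lock(&sighand->siglock) would sit here; on RT it may sleep,
+       * which is why interrupts must not be hard-disabled around it */
+      local_irq_restore_nort(flags);
+      printf("irqs were %s\n", flags ? "hard-disabled" : "left enabled");
+      return 0;
+  }
+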
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ kernel/signal.c | 6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+
+--- a/kernel/signal.c
++++ b/kernel/signal.c
+@@ -1342,12 +1342,12 @@ struct sighand_struct *__lock_task_sigha
+ * Disable interrupts early to avoid deadlocks.
+ * See rcu_read_unlock() comment header for details.
+ */
+- local_irq_save(*flags);
++ local_irq_save_nort(*flags);
+ rcu_read_lock();
+ sighand = rcu_dereference(tsk->sighand);
+ if (unlikely(sighand == NULL)) {
+ rcu_read_unlock();
+- local_irq_restore(*flags);
++ local_irq_restore_nort(*flags);
+ break;
+ }
+ /*
+@@ -1368,7 +1368,7 @@ struct sighand_struct *__lock_task_sigha
+ }
+ spin_unlock(&sighand->siglock);
+ rcu_read_unlock();
+- local_irq_restore(*flags);
++ local_irq_restore_nort(*flags);
+ }
+
+ return sighand;
diff --git a/patches/signal-revert-ptrace-preempt-magic.patch b/patches/signal-revert-ptrace-preempt-magic.patch
new file mode 100644
index 00000000000000..0dae85433fdeb3
--- /dev/null
+++ b/patches/signal-revert-ptrace-preempt-magic.patch
@@ -0,0 +1,31 @@
+Subject: signal: Revert ptrace preempt magic
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 21 Sep 2011 19:57:12 +0200
+
+Upstream commit '53da1d9456fe7f8 fix ptrace slowness' is nothing more
+than a bandaid around the ptrace design trainwreck. It's not a
+correctness issue; it's merely a cosmetic bandaid.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ kernel/signal.c | 8 --------
+ 1 file changed, 8 deletions(-)
+
+--- a/kernel/signal.c
++++ b/kernel/signal.c
+@@ -1897,15 +1897,7 @@ static void ptrace_stop(int exit_code, i
+ if (gstop_done && ptrace_reparented(current))
+ do_notify_parent_cldstop(current, false, why);
+
+- /*
+- * Don't want to allow preemption here, because
+- * sys_ptrace() needs this task to be inactive.
+- *
+- * XXX: implement read_unlock_no_resched().
+- */
+- preempt_disable();
+ read_unlock(&tasklist_lock);
+- preempt_enable_no_resched();
+ freezable_schedule();
+ } else {
+ /*
diff --git a/patches/signals-allow-rt-tasks-to-cache-one-sigqueue-struct.patch b/patches/signals-allow-rt-tasks-to-cache-one-sigqueue-struct.patch
new file mode 100644
index 00000000000000..f123d926a44f67
--- /dev/null
+++ b/patches/signals-allow-rt-tasks-to-cache-one-sigqueue-struct.patch
@@ -0,0 +1,213 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Fri, 3 Jul 2009 08:44:56 -0500
+Subject: signals: Allow rt tasks to cache one sigqueue struct
+
+To avoid an allocation in the signal delivery path, allow RT tasks to
+cache one sigqueue struct in the task struct.
+
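+A rough userspace sketch of the one-slot cache idea, using nothing beyond
+C11 atomics; struct sigqueue, get_cache() and put_cache() here are
+illustrative stand-ins, not the kernel API:
+
+  #include <stdatomic.h>
+  #include <stdio.h>
+  #include <stdlib.h>
+
+  struct sigqueue { int sig; };
+
+  /* one cached entry per task; a single global slot here for brevity */
+  static _Atomic(struct sigqueue *) cache;
+
+  static struct sigqueue *get_cache(void)
+  {
+      struct sigqueue *q = atomic_load(&cache);
+
+      /* only the winner of the cmpxchg may reuse the cached entry */
+      if (!q || !atomic_compare_exchange_strong(&cache, &q, NULL))
+          return NULL;
+      return q;
+  }
+
+  static int put_cache(struct sigqueue *q)
+  {
+      struct sigqueue *expected = NULL;
+
+      /* returns 0 when the slot was free and now holds q */
+      return atomic_compare_exchange_strong(&cache, &expected, q) ? 0 : 1;
+  }
+
+  int main(void)
+  {
+      struct sigqueue *q = malloc(sizeof(*q));
+
+      printf("cached: %d\n", put_cache(q) == 0); /* 1: slot was empty   */
+      printf("reused: %d\n", get_cache() == q);  /* 1: same object back */
+      free(q);
+      return 0;
+  }
+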
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ include/linux/sched.h | 1
+ include/linux/signal.h | 1
+ kernel/exit.c | 2 -
+ kernel/fork.c | 1
+ kernel/signal.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++---
+ 5 files changed, 84 insertions(+), 5 deletions(-)
+
+--- a/include/linux/sched.h
++++ b/include/linux/sched.h
+@@ -1527,6 +1527,7 @@ struct task_struct {
+ /* signal handlers */
+ struct signal_struct *signal;
+ struct sighand_struct *sighand;
++ struct sigqueue *sigqueue_cache;
+
+ sigset_t blocked, real_blocked;
+ sigset_t saved_sigmask; /* restored if set_restore_sigmask() was used */
+--- a/include/linux/signal.h
++++ b/include/linux/signal.h
+@@ -218,6 +218,7 @@ static inline void init_sigpending(struc
+ }
+
+ extern void flush_sigqueue(struct sigpending *queue);
++extern void flush_task_sigqueue(struct task_struct *tsk);
+
+ /* Test if 'sig' is valid signal. Use this instead of testing _NSIG directly */
+ static inline int valid_signal(unsigned long sig)
+--- a/kernel/exit.c
++++ b/kernel/exit.c
+@@ -144,7 +144,7 @@ static void __exit_signal(struct task_st
+ * Do this under ->siglock, we can race with another thread
+ * doing sigqueue_free() if we have SIGQUEUE_PREALLOC signals.
+ */
+- flush_sigqueue(&tsk->pending);
++ flush_task_sigqueue(tsk);
+ tsk->sighand = NULL;
+ spin_unlock(&sighand->siglock);
+
+--- a/kernel/fork.c
++++ b/kernel/fork.c
+@@ -1338,6 +1338,7 @@ static struct task_struct *copy_process(
+ spin_lock_init(&p->alloc_lock);
+
+ init_sigpending(&p->pending);
++ p->sigqueue_cache = NULL;
+
+ p->utime = p->stime = p->gtime = 0;
+ p->utimescaled = p->stimescaled = 0;
+--- a/kernel/signal.c
++++ b/kernel/signal.c
+@@ -14,6 +14,7 @@
+ #include <linux/export.h>
+ #include <linux/init.h>
+ #include <linux/sched.h>
++#include <linux/sched/rt.h>
+ #include <linux/fs.h>
+ #include <linux/tty.h>
+ #include <linux/binfmts.h>
+@@ -352,13 +353,45 @@ static bool task_participate_group_stop(
+ return false;
+ }
+
++#ifdef __HAVE_ARCH_CMPXCHG
++static inline struct sigqueue *get_task_cache(struct task_struct *t)
++{
++ struct sigqueue *q = t->sigqueue_cache;
++
++ if (cmpxchg(&t->sigqueue_cache, q, NULL) != q)
++ return NULL;
++ return q;
++}
++
++static inline int put_task_cache(struct task_struct *t, struct sigqueue *q)
++{
++ if (cmpxchg(&t->sigqueue_cache, NULL, q) == NULL)
++ return 0;
++ return 1;
++}
++
++#else
++
++static inline struct sigqueue *get_task_cache(struct task_struct *t)
++{
++ return NULL;
++}
++
++static inline int put_task_cache(struct task_struct *t, struct sigqueue *q)
++{
++ return 1;
++}
++
++#endif
++
+ /*
+ * allocate a new signal queue record
+ * - this may be called without locks if and only if t == current, otherwise an
+ * appropriate lock must be held to stop the target task from exiting
+ */
+ static struct sigqueue *
+-__sigqueue_alloc(int sig, struct task_struct *t, gfp_t flags, int override_rlimit)
++__sigqueue_do_alloc(int sig, struct task_struct *t, gfp_t flags,
++ int override_rlimit, int fromslab)
+ {
+ struct sigqueue *q = NULL;
+ struct user_struct *user;
+@@ -375,7 +408,10 @@ static struct sigqueue *
+ if (override_rlimit ||
+ atomic_read(&user->sigpending) <=
+ task_rlimit(t, RLIMIT_SIGPENDING)) {
+- q = kmem_cache_alloc(sigqueue_cachep, flags);
++ if (!fromslab)
++ q = get_task_cache(t);
++ if (!q)
++ q = kmem_cache_alloc(sigqueue_cachep, flags);
+ } else {
+ print_dropped_signal(sig);
+ }
+@@ -392,6 +428,13 @@ static struct sigqueue *
+ return q;
+ }
+
++static struct sigqueue *
++__sigqueue_alloc(int sig, struct task_struct *t, gfp_t flags,
++ int override_rlimit)
++{
++ return __sigqueue_do_alloc(sig, t, flags, override_rlimit, 0);
++}
++
+ static void __sigqueue_free(struct sigqueue *q)
+ {
+ if (q->flags & SIGQUEUE_PREALLOC)
+@@ -401,6 +444,21 @@ static void __sigqueue_free(struct sigqu
+ kmem_cache_free(sigqueue_cachep, q);
+ }
+
++static void sigqueue_free_current(struct sigqueue *q)
++{
++ struct user_struct *up;
++
++ if (q->flags & SIGQUEUE_PREALLOC)
++ return;
++
++ up = q->user;
++ if (rt_prio(current->normal_prio) && !put_task_cache(current, q)) {
++ atomic_dec(&up->sigpending);
++ free_uid(up);
++ } else
++ __sigqueue_free(q);
++}
++
+ void flush_sigqueue(struct sigpending *queue)
+ {
+ struct sigqueue *q;
+@@ -414,6 +472,21 @@ void flush_sigqueue(struct sigpending *q
+ }
+
+ /*
++ * Called from __exit_signal. Flush tsk->pending and
++ * tsk->sigqueue_cache
++ */
++void flush_task_sigqueue(struct task_struct *tsk)
++{
++ struct sigqueue *q;
++
++ flush_sigqueue(&tsk->pending);
++
++ q = get_task_cache(tsk);
++ if (q)
++ kmem_cache_free(sigqueue_cachep, q);
++}
++
++/*
+ * Flush all pending signals for a task.
+ */
+ void __flush_signals(struct task_struct *t)
+@@ -565,7 +638,7 @@ static void collect_signal(int sig, stru
+ still_pending:
+ list_del_init(&first->list);
+ copy_siginfo(info, &first->info);
+- __sigqueue_free(first);
++ sigqueue_free_current(first);
+ } else {
+ /*
+ * Ok, it wasn't in the queue. This must be
+@@ -611,6 +684,8 @@ int dequeue_signal(struct task_struct *t
+ {
+ int signr;
+
++ WARN_ON_ONCE(tsk != current);
++
+ /* We only dequeue private signals from ourselves, we don't let
+ * signalfd steal them
+ */
+@@ -1536,7 +1611,8 @@ EXPORT_SYMBOL(kill_pid);
+ */
+ struct sigqueue *sigqueue_alloc(void)
+ {
+- struct sigqueue *q = __sigqueue_alloc(-1, current, GFP_KERNEL, 0);
++ /* Preallocated sigqueue objects always from the slabcache ! */
++ struct sigqueue *q = __sigqueue_do_alloc(-1, current, GFP_KERNEL, 0, 1);
+
+ if (q)
+ q->flags |= SIGQUEUE_PREALLOC;
diff --git a/patches/skbufhead-raw-lock.patch b/patches/skbufhead-raw-lock.patch
new file mode 100644
index 00000000000000..de359704781c5e
--- /dev/null
+++ b/patches/skbufhead-raw-lock.patch
@@ -0,0 +1,113 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Tue, 12 Jul 2011 15:38:34 +0200
+Subject: net: Use skbufhead with raw lock
+
+Use the rps lock as a raw lock so we can keep the irq-off regions. This
+keeps the latency low. However we can't kfree() from this context, therefore
+we defer the freeing to the softirq and use the tofree_queue list for it
+(similar to process_queue).
+
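+A simplified userspace sketch of the defer-the-free pattern, assuming a
+plain singly linked list in place of sk_buff_head; flush_matching() and
+drain_tofree() are made-up names that only mirror the shape of the change:
+
+  #include <stdio.h>
+  #include <stdlib.h>
+
+  struct skb { struct skb *next; int dev; };
+
+  static struct skb *input_queue, *tofree_queue;
+
+  /* runs under the raw (irq-off) lock: unlink only, never free here */
+  static void flush_matching(int dev)
+  {
+      struct skb **pp = &input_queue;
+
+      while (*pp) {
+          struct skb *skb = *pp;
+
+          if (skb->dev == dev) {
+              *pp = skb->next;            /* __skb_unlink()           */
+              skb->next = tofree_queue;   /* queue on the tofree list */
+              tofree_queue = skb;
+          } else {
+              pp = &skb->next;
+          }
+      }
+  }
+
+  /* runs later from softirq context, where freeing is fine on RT */
+  static void drain_tofree(void)
+  {
+      while (tofree_queue) {
+          struct skb *skb = tofree_queue;
+
+          tofree_queue = skb->next;
+          free(skb);                      /* kfree_skb() */
+      }
+  }
+
+  int main(void)
+  {
+      for (int i = 0; i < 4; i++) {
+          struct skb *s = malloc(sizeof(*s));
+
+          s->dev = i & 1;
+          s->next = input_queue;
+          input_queue = s;
+      }
+      flush_matching(1);
+      drain_tofree();
+      printf("head dev after flush: %d\n", input_queue ? input_queue->dev : -1);
+      return 0;
+  }
+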
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ include/linux/netdevice.h | 1 +
+ include/linux/skbuff.h | 7 +++++++
+ net/core/dev.c | 19 +++++++++++++------
+ 3 files changed, 21 insertions(+), 6 deletions(-)
+
+--- a/include/linux/netdevice.h
++++ b/include/linux/netdevice.h
+@@ -2469,6 +2469,7 @@ struct softnet_data {
+ unsigned int dropped;
+ struct sk_buff_head input_pkt_queue;
+ struct napi_struct backlog;
++ struct sk_buff_head tofree_queue;
+
+ };
+
+--- a/include/linux/skbuff.h
++++ b/include/linux/skbuff.h
+@@ -187,6 +187,7 @@ struct sk_buff_head {
+
+ __u32 qlen;
+ spinlock_t lock;
++ raw_spinlock_t raw_lock;
+ };
+
+ struct sk_buff;
+@@ -1336,6 +1337,12 @@ static inline void skb_queue_head_init(s
+ __skb_queue_head_init(list);
+ }
+
++static inline void skb_queue_head_init_raw(struct sk_buff_head *list)
++{
++ raw_spin_lock_init(&list->raw_lock);
++ __skb_queue_head_init(list);
++}
++
+ static inline void skb_queue_head_init_class(struct sk_buff_head *list,
+ struct lock_class_key *class)
+ {
+--- a/net/core/dev.c
++++ b/net/core/dev.c
+@@ -205,14 +205,14 @@ static inline struct hlist_head *dev_ind
+ static inline void rps_lock(struct softnet_data *sd)
+ {
+ #ifdef CONFIG_RPS
+- spin_lock(&sd->input_pkt_queue.lock);
++ raw_spin_lock(&sd->input_pkt_queue.raw_lock);
+ #endif
+ }
+
+ static inline void rps_unlock(struct softnet_data *sd)
+ {
+ #ifdef CONFIG_RPS
+- spin_unlock(&sd->input_pkt_queue.lock);
++ raw_spin_unlock(&sd->input_pkt_queue.raw_lock);
+ #endif
+ }
+
+@@ -3885,7 +3885,7 @@ static void flush_backlog(void *arg)
+ skb_queue_walk_safe(&sd->input_pkt_queue, skb, tmp) {
+ if (skb->dev == dev) {
+ __skb_unlink(skb, &sd->input_pkt_queue);
+- kfree_skb(skb);
++ __skb_queue_tail(&sd->tofree_queue, skb);
+ input_queue_head_incr(sd);
+ }
+ }
+@@ -3894,10 +3894,13 @@ static void flush_backlog(void *arg)
+ skb_queue_walk_safe(&sd->process_queue, skb, tmp) {
+ if (skb->dev == dev) {
+ __skb_unlink(skb, &sd->process_queue);
+- kfree_skb(skb);
++ __skb_queue_tail(&sd->tofree_queue, skb);
+ input_queue_head_incr(sd);
+ }
+ }
++
++ if (!skb_queue_empty(&sd->tofree_queue))
++ raise_softirq_irqoff(NET_RX_SOFTIRQ);
+ }
+
+ static int napi_gro_complete(struct sk_buff *skb)
+@@ -7183,6 +7186,9 @@ static int dev_cpu_callback(struct notif
+ netif_rx_ni(skb);
+ input_queue_head_incr(oldsd);
+ }
++ while ((skb = __skb_dequeue(&oldsd->tofree_queue))) {
++ kfree_skb(skb);
++ }
+
+ return NOTIFY_OK;
+ }
+@@ -7484,8 +7490,9 @@ static int __init net_dev_init(void)
+ for_each_possible_cpu(i) {
+ struct softnet_data *sd = &per_cpu(softnet_data, i);
+
+- skb_queue_head_init(&sd->input_pkt_queue);
+- skb_queue_head_init(&sd->process_queue);
++ skb_queue_head_init_raw(&sd->input_pkt_queue);
++ skb_queue_head_init_raw(&sd->process_queue);
++ skb_queue_head_init_raw(&sd->tofree_queue);
+ INIT_LIST_HEAD(&sd->poll_list);
+ sd->output_queue_tailp = &sd->output_queue;
+ #ifdef CONFIG_RPS
diff --git a/patches/slub-disable-SLUB_CPU_PARTIAL.patch b/patches/slub-disable-SLUB_CPU_PARTIAL.patch
new file mode 100644
index 00000000000000..12c21be64bf5af
--- /dev/null
+++ b/patches/slub-disable-SLUB_CPU_PARTIAL.patch
@@ -0,0 +1,47 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Wed, 15 Apr 2015 19:00:47 +0200
+Subject: slub: Disable SLUB_CPU_PARTIAL
+
+|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
+|in_atomic(): 1, irqs_disabled(): 0, pid: 87, name: rcuop/7
+|1 lock held by rcuop/7/87:
+| #0: (rcu_callback){......}, at: [<ffffffff8112c76a>] rcu_nocb_kthread+0x1ca/0x5d0
+|Preemption disabled at:[<ffffffff811eebd9>] put_cpu_partial+0x29/0x220
+|
+|CPU: 0 PID: 87 Comm: rcuop/7 Tainted: G W 4.0.0-rt0+ #477
+|Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
+| 000000000007a9fc ffff88013987baf8 ffffffff817441c7 0000000000000007
+| 0000000000000000 ffff88013987bb18 ffffffff810eee51 0000000000000000
+| ffff88013fc10200 ffff88013987bb48 ffffffff8174a1c4 000000000007a9fc
+|Call Trace:
+| [<ffffffff817441c7>] dump_stack+0x4f/0x90
+| [<ffffffff810eee51>] ___might_sleep+0x121/0x1b0
+| [<ffffffff8174a1c4>] rt_spin_lock+0x24/0x60
+| [<ffffffff811a689a>] __free_pages_ok+0xaa/0x540
+| [<ffffffff811a729d>] __free_pages+0x1d/0x30
+| [<ffffffff811eddd5>] __free_slab+0xc5/0x1e0
+| [<ffffffff811edf46>] free_delayed+0x56/0x70
+| [<ffffffff811eecfd>] put_cpu_partial+0x14d/0x220
+| [<ffffffff811efc98>] __slab_free+0x158/0x2c0
+| [<ffffffff811f0021>] kmem_cache_free+0x221/0x2d0
+| [<ffffffff81204d0c>] file_free_rcu+0x2c/0x40
+| [<ffffffff8112c7e3>] rcu_nocb_kthread+0x243/0x5d0
+| [<ffffffff810e951c>] kthread+0xfc/0x120
+| [<ffffffff8174abc8>] ret_from_fork+0x58/0x90
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ init/Kconfig | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/init/Kconfig
++++ b/init/Kconfig
+@@ -1717,7 +1717,7 @@ endchoice
+
+ config SLUB_CPU_PARTIAL
+ default y
+- depends on SLUB && SMP
++ depends on SLUB && SMP && !PREEMPT_RT_FULL
+ bool "SLUB per cpu partial cache"
+ help
+ Per cpu partial caches accellerate objects allocation and freeing
diff --git a/patches/slub-enable-irqs-for-no-wait.patch b/patches/slub-enable-irqs-for-no-wait.patch
new file mode 100644
index 00000000000000..5da834967e9f7a
--- /dev/null
+++ b/patches/slub-enable-irqs-for-no-wait.patch
@@ -0,0 +1,46 @@
+Subject: slub: Enable irqs for __GFP_WAIT
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 09 Jan 2013 12:08:15 +0100
+
+SYSTEM_RUNNING might be too late for enabling interrupts. Allocations
+with __GFP_WAIT can happen before that, so use the __GFP_WAIT flag as the
+indicator instead.
+
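+A small sketch of the decision computed once as a boolean and reused for the
+matching disable, assuming simplified stand-ins for __GFP_WAIT and
+system_state; may_enable_irqs() is a made-up name:
+
+  #include <stdbool.h>
+  #include <stdio.h>
+
+  #define __GFP_WAIT 0x10u                 /* illustrative value only */
+
+  enum system_states { SYSTEM_BOOTING, SYSTEM_RUNNING };
+  static enum system_states system_state = SYSTEM_BOOTING;
+
+  /* compute once; the same answer guards both the enable and the disable */
+  static bool may_enable_irqs(unsigned int gfp_flags, bool preempt_rt_full)
+  {
+      bool enableirqs = (gfp_flags & __GFP_WAIT) != 0;
+
+      if (preempt_rt_full)
+          enableirqs |= system_state == SYSTEM_RUNNING;
+      return enableirqs;
+  }
+
+  int main(void)
+  {
+      printf("early boot, __GFP_WAIT, RT: %d\n", may_enable_irqs(__GFP_WAIT, true));
+      printf("early boot, atomic,     RT: %d\n", may_enable_irqs(0, true));
+      return 0;
+  }
+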
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ mm/slub.c | 13 +++++--------
+ 1 file changed, 5 insertions(+), 8 deletions(-)
+
+--- a/mm/slub.c
++++ b/mm/slub.c
+@@ -1355,14 +1355,15 @@ static struct page *allocate_slab(struct
+ gfp_t alloc_gfp;
+ void *start, *p;
+ int idx, order;
++ bool enableirqs;
+
+ flags &= gfp_allowed_mask;
+
++ enableirqs = (flags & __GFP_WAIT) != 0;
+ #ifdef CONFIG_PREEMPT_RT_FULL
+- if (system_state == SYSTEM_RUNNING)
+-#else
+- if (flags & __GFP_WAIT)
++ enableirqs |= system_state == SYSTEM_RUNNING;
+ #endif
++ if (enableirqs)
+ local_irq_enable();
+
+ flags |= s->allocflags;
+@@ -1431,11 +1432,7 @@ static struct page *allocate_slab(struct
+ page->frozen = 1;
+
+ out:
+-#ifdef CONFIG_PREEMPT_RT_FULL
+- if (system_state == SYSTEM_RUNNING)
+-#else
+- if (flags & __GFP_WAIT)
+-#endif
++ if (enableirqs)
+ local_irq_disable();
+ if (!page)
+ return NULL;
diff --git a/patches/snd-pcm-fix-snd_pcm_stream_lock-irqs_disabled-splats.patch b/patches/snd-pcm-fix-snd_pcm_stream_lock-irqs_disabled-splats.patch
new file mode 100644
index 00000000000000..23992cb73413da
--- /dev/null
+++ b/patches/snd-pcm-fix-snd_pcm_stream_lock-irqs_disabled-splats.patch
@@ -0,0 +1,69 @@
+From: Mike Galbraith <umgwanakikbuti@gmail.com>
+Date: Wed, 18 Feb 2015 15:09:23 +0100
+Subject: snd/pcm: fix snd_pcm_stream_lock*() irqs_disabled() splats
+
+Locking functions previously using read_lock_irq()/read_lock_irqsave() were
+changed to local_irq_disable()/local_irq_save(), leading to irqs_disabled()
+splats on RT. Use the nort variants.
+
+|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
+|in_atomic(): 0, irqs_disabled(): 1, pid: 5947, name: alsa-sink-ALC88
+|CPU: 5 PID: 5947 Comm: alsa-sink-ALC88 Not tainted 3.18.7-rt1 #9
+|Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.404 11/06/2014
+| ffff880409316240 ffff88040866fa38 ffffffff815bdeb5 0000000000000002
+| 0000000000000000 ffff88040866fa58 ffffffff81073c86 ffffffffa03b2640
+| ffff88040239ec00 ffff88040866fa78 ffffffff815c3d34 ffffffffa03b2640
+|Call Trace:
+| [<ffffffff815bdeb5>] dump_stack+0x4f/0x9e
+| [<ffffffff81073c86>] __might_sleep+0xe6/0x150
+| [<ffffffff815c3d34>] __rt_spin_lock+0x24/0x50
+| [<ffffffff815c4044>] rt_read_lock+0x34/0x40
+| [<ffffffffa03a2979>] snd_pcm_stream_lock+0x29/0x70 [snd_pcm]
+| [<ffffffffa03a355d>] snd_pcm_playback_poll+0x5d/0x120 [snd_pcm]
+| [<ffffffff811937a2>] do_sys_poll+0x322/0x5b0
+| [<ffffffff81193d48>] SyS_ppoll+0x1a8/0x1c0
+| [<ffffffff815c4556>] system_call_fastpath+0x16/0x1b
+
+Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ sound/core/pcm_native.c | 8 ++++----
+ 1 file changed, 4 insertions(+), 4 deletions(-)
+
+--- a/sound/core/pcm_native.c
++++ b/sound/core/pcm_native.c
+@@ -123,7 +123,7 @@ EXPORT_SYMBOL_GPL(snd_pcm_stream_unlock)
+ void snd_pcm_stream_lock_irq(struct snd_pcm_substream *substream)
+ {
+ if (!substream->pcm->nonatomic)
+- local_irq_disable();
++ local_irq_disable_nort();
+ snd_pcm_stream_lock(substream);
+ }
+ EXPORT_SYMBOL_GPL(snd_pcm_stream_lock_irq);
+@@ -138,7 +138,7 @@ void snd_pcm_stream_unlock_irq(struct sn
+ {
+ snd_pcm_stream_unlock(substream);
+ if (!substream->pcm->nonatomic)
+- local_irq_enable();
++ local_irq_enable_nort();
+ }
+ EXPORT_SYMBOL_GPL(snd_pcm_stream_unlock_irq);
+
+@@ -146,7 +146,7 @@ unsigned long _snd_pcm_stream_lock_irqsa
+ {
+ unsigned long flags = 0;
+ if (!substream->pcm->nonatomic)
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+ snd_pcm_stream_lock(substream);
+ return flags;
+ }
+@@ -164,7 +164,7 @@ void snd_pcm_stream_unlock_irqrestore(st
+ {
+ snd_pcm_stream_unlock(substream);
+ if (!substream->pcm->nonatomic)
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+ }
+ EXPORT_SYMBOL_GPL(snd_pcm_stream_unlock_irqrestore);
+
diff --git a/patches/softirq-disable-softirq-stacks-for-rt.patch b/patches/softirq-disable-softirq-stacks-for-rt.patch
new file mode 100644
index 00000000000000..9d9a752b700118
--- /dev/null
+++ b/patches/softirq-disable-softirq-stacks-for-rt.patch
@@ -0,0 +1,156 @@
+Subject: softirq: Disable softirq stacks for RT
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Mon, 18 Jul 2011 13:59:17 +0200
+
+Disable the extra stacks for softirqs. We want to preempt softirqs, and
+having them on a special IRQ stack does not make this easier.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ arch/powerpc/kernel/irq.c | 2 ++
+ arch/powerpc/kernel/misc_32.S | 2 ++
+ arch/powerpc/kernel/misc_64.S | 2 ++
+ arch/sh/kernel/irq.c | 2 ++
+ arch/sparc/kernel/irq_64.c | 2 ++
+ arch/x86/kernel/entry_64.S | 2 ++
+ arch/x86/kernel/irq_32.c | 2 ++
+ include/linux/interrupt.h | 2 +-
+ 8 files changed, 15 insertions(+), 1 deletion(-)
+
+--- a/arch/powerpc/kernel/irq.c
++++ b/arch/powerpc/kernel/irq.c
+@@ -614,6 +614,7 @@ void irq_ctx_init(void)
+ }
+ }
+
++#ifndef CONFIG_PREEMPT_RT_FULL
+ void do_softirq_own_stack(void)
+ {
+ struct thread_info *curtp, *irqtp;
+@@ -631,6 +632,7 @@ void do_softirq_own_stack(void)
+ if (irqtp->flags)
+ set_bits(irqtp->flags, &curtp->flags);
+ }
++#endif
+
+ irq_hw_number_t virq_to_hw(unsigned int virq)
+ {
+--- a/arch/powerpc/kernel/misc_32.S
++++ b/arch/powerpc/kernel/misc_32.S
+@@ -40,6 +40,7 @@
+ * We store the saved ksp_limit in the unused part
+ * of the STACK_FRAME_OVERHEAD
+ */
++#ifndef CONFIG_PREEMPT_RT_FULL
+ _GLOBAL(call_do_softirq)
+ mflr r0
+ stw r0,4(r1)
+@@ -56,6 +57,7 @@
+ stw r10,THREAD+KSP_LIMIT(r2)
+ mtlr r0
+ blr
++#endif
+
+ /*
+ * void call_do_irq(struct pt_regs *regs, struct thread_info *irqtp);
+--- a/arch/powerpc/kernel/misc_64.S
++++ b/arch/powerpc/kernel/misc_64.S
+@@ -29,6 +29,7 @@
+
+ .text
+
++#ifndef CONFIG_PREEMPT_RT_FULL
+ _GLOBAL(call_do_softirq)
+ mflr r0
+ std r0,16(r1)
+@@ -39,6 +40,7 @@
+ ld r0,16(r1)
+ mtlr r0
+ blr
++#endif
+
+ _GLOBAL(call_do_irq)
+ mflr r0
+--- a/arch/sh/kernel/irq.c
++++ b/arch/sh/kernel/irq.c
+@@ -147,6 +147,7 @@ void irq_ctx_exit(int cpu)
+ hardirq_ctx[cpu] = NULL;
+ }
+
++#ifndef CONFIG_PREEMPT_RT_FULL
+ void do_softirq_own_stack(void)
+ {
+ struct thread_info *curctx;
+@@ -174,6 +175,7 @@ void do_softirq_own_stack(void)
+ "r5", "r6", "r7", "r8", "r9", "r15", "t", "pr"
+ );
+ }
++#endif
+ #else
+ static inline void handle_one_irq(unsigned int irq)
+ {
+--- a/arch/sparc/kernel/irq_64.c
++++ b/arch/sparc/kernel/irq_64.c
+@@ -849,6 +849,7 @@ void __irq_entry handler_irq(int pil, st
+ set_irq_regs(old_regs);
+ }
+
++#ifndef CONFIG_PREEMPT_RT_FULL
+ void do_softirq_own_stack(void)
+ {
+ void *orig_sp, *sp = softirq_stack[smp_processor_id()];
+@@ -863,6 +864,7 @@ void do_softirq_own_stack(void)
+ __asm__ __volatile__("mov %0, %%sp"
+ : : "r" (orig_sp));
+ }
++#endif
+
+ #ifdef CONFIG_HOTPLUG_CPU
+ void fixup_irqs(void)
+--- a/arch/x86/kernel/entry_64.S
++++ b/arch/x86/kernel/entry_64.S
+@@ -1120,6 +1120,7 @@ END(native_load_gs_index)
+ jmp 2b
+ .previous
+
++#ifndef CONFIG_PREEMPT_RT_FULL
+ /* Call softirq on interrupt stack. Interrupts are off. */
+ ENTRY(do_softirq_own_stack)
+ CFI_STARTPROC
+@@ -1139,6 +1140,7 @@ ENTRY(do_softirq_own_stack)
+ ret
+ CFI_ENDPROC
+ END(do_softirq_own_stack)
++#endif
+
+ #ifdef CONFIG_XEN
+ idtentry xen_hypervisor_callback xen_do_hypervisor_callback has_error_code=0
+--- a/arch/x86/kernel/irq_32.c
++++ b/arch/x86/kernel/irq_32.c
+@@ -135,6 +135,7 @@ void irq_ctx_init(int cpu)
+ cpu, per_cpu(hardirq_stack, cpu), per_cpu(softirq_stack, cpu));
+ }
+
++#ifndef CONFIG_PREEMPT_RT_FULL
+ void do_softirq_own_stack(void)
+ {
+ struct thread_info *curstk;
+@@ -153,6 +154,7 @@ void do_softirq_own_stack(void)
+
+ call_on_stack(__do_softirq, isp);
+ }
++#endif
+
+ bool handle_irq(unsigned irq, struct pt_regs *regs)
+ {
+--- a/include/linux/interrupt.h
++++ b/include/linux/interrupt.h
+@@ -443,7 +443,7 @@ struct softirq_action
+ asmlinkage void do_softirq(void);
+ asmlinkage void __do_softirq(void);
+
+-#ifdef __ARCH_HAS_DO_SOFTIRQ
++#if defined(__ARCH_HAS_DO_SOFTIRQ) && !defined(CONFIG_PREEMPT_RT_FULL)
+ void do_softirq_own_stack(void);
+ #else
+ static inline void do_softirq_own_stack(void)
diff --git a/patches/softirq-preempt-fix-3-re.patch b/patches/softirq-preempt-fix-3-re.patch
new file mode 100644
index 00000000000000..47b6246f3c1b9d
--- /dev/null
+++ b/patches/softirq-preempt-fix-3-re.patch
@@ -0,0 +1,153 @@
+Subject: softirq: Check preemption after reenabling interrupts
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Sun, 13 Nov 2011 17:17:09 +0100 (CET)
+
+raise_softirq_irqoff() disables interrupts and wakes the softirq
+daemon, but after reenabling interrupts there is no preemption check,
+so the execution of the softirq thread might be delayed arbitrarily.
+
+In principle we could add that check to local_irq_enable/restore, but
+that's overkill as the raise_softirq_irqoff() sections are the only
+ones which show this behaviour.
+
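+A toy sketch of the resulting call pattern, with made-up stubs standing in
+for the kernel primitives (the preempt_check_resched_rt() below only prints;
+nothing here is the real implementation):
+
+  #include <stdio.h>
+
+  static int softirqd_woken;
+
+  /* made-up stand-ins for the kernel primitives */
+  static void local_irq_restore(unsigned long flags) { (void)flags; }
+  static void raise_softirq_irqoff(int nr) { (void)nr; softirqd_woken = 1; }
+
+  /* RT: a real preemption check; !RT: compiles away to a barrier */
+  static void preempt_check_resched_rt(void)
+  {
+      if (softirqd_woken)
+          printf("reschedule now so the softirq thread runs promptly\n");
+  }
+
+  static void complete_request(void)
+  {
+      unsigned long flags = 0;
+
+      raise_softirq_irqoff(4 /* BLOCK_SOFTIRQ, illustrative */);
+      local_irq_restore(flags);
+      /* the added call: without it the wakeup may sit until the next tick */
+      preempt_check_resched_rt();
+  }
+
+  int main(void) { complete_request(); return 0; }
+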
+Reported-by: Carsten Emde <cbe@osadl.org>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ block/blk-iopoll.c | 3 +++
+ block/blk-softirq.c | 3 +++
+ include/linux/preempt.h | 3 +++
+ net/core/dev.c | 7 +++++++
+ 4 files changed, 16 insertions(+)
+
+--- a/block/blk-iopoll.c
++++ b/block/blk-iopoll.c
+@@ -35,6 +35,7 @@ void blk_iopoll_sched(struct blk_iopoll
+ list_add_tail(&iop->list, this_cpu_ptr(&blk_cpu_iopoll));
+ __raise_softirq_irqoff(BLOCK_IOPOLL_SOFTIRQ);
+ local_irq_restore(flags);
++ preempt_check_resched_rt();
+ }
+ EXPORT_SYMBOL(blk_iopoll_sched);
+
+@@ -132,6 +133,7 @@ static void blk_iopoll_softirq(struct so
+ __raise_softirq_irqoff(BLOCK_IOPOLL_SOFTIRQ);
+
+ local_irq_enable();
++ preempt_check_resched_rt();
+ }
+
+ /**
+@@ -201,6 +203,7 @@ static int blk_iopoll_cpu_notify(struct
+ this_cpu_ptr(&blk_cpu_iopoll));
+ __raise_softirq_irqoff(BLOCK_IOPOLL_SOFTIRQ);
+ local_irq_enable();
++ preempt_check_resched_rt();
+ }
+
+ return NOTIFY_OK;
+--- a/block/blk-softirq.c
++++ b/block/blk-softirq.c
+@@ -51,6 +51,7 @@ static void trigger_softirq(void *data)
+ raise_softirq_irqoff(BLOCK_SOFTIRQ);
+
+ local_irq_restore(flags);
++ preempt_check_resched_rt();
+ }
+
+ /*
+@@ -93,6 +94,7 @@ static int blk_cpu_notify(struct notifie
+ this_cpu_ptr(&blk_cpu_done));
+ raise_softirq_irqoff(BLOCK_SOFTIRQ);
+ local_irq_enable();
++ preempt_check_resched_rt();
+ }
+
+ return NOTIFY_OK;
+@@ -150,6 +152,7 @@ void __blk_complete_request(struct reque
+ goto do_local;
+
+ local_irq_restore(flags);
++ preempt_check_resched_rt();
+ }
+
+ /**
+--- a/include/linux/preempt.h
++++ b/include/linux/preempt.h
+@@ -49,8 +49,10 @@ do { \
+
+ #ifdef CONFIG_PREEMPT_RT_BASE
+ # define preempt_enable_no_resched() sched_preempt_enable_no_resched()
++# define preempt_check_resched_rt() preempt_check_resched()
+ #else
+ # define preempt_enable_no_resched() preempt_enable()
++# define preempt_check_resched_rt() barrier()
+ #endif
+
+ #ifdef CONFIG_PREEMPT
+@@ -125,6 +127,7 @@ do { \
+ #define preempt_disable_notrace() barrier()
+ #define preempt_enable_no_resched_notrace() barrier()
+ #define preempt_enable_notrace() barrier()
++#define preempt_check_resched_rt() barrier()
+
+ #endif /* CONFIG_PREEMPT_COUNT */
+
+--- a/net/core/dev.c
++++ b/net/core/dev.c
+@@ -2218,6 +2218,7 @@ static inline void __netif_reschedule(st
+ sd->output_queue_tailp = &q->next_sched;
+ raise_softirq_irqoff(NET_TX_SOFTIRQ);
+ local_irq_restore(flags);
++ preempt_check_resched_rt();
+ }
+
+ void __netif_schedule(struct Qdisc *q)
+@@ -2299,6 +2300,7 @@ void __dev_kfree_skb_irq(struct sk_buff
+ __this_cpu_write(softnet_data.completion_queue, skb);
+ raise_softirq_irqoff(NET_TX_SOFTIRQ);
+ local_irq_restore(flags);
++ preempt_check_resched_rt();
+ }
+ EXPORT_SYMBOL(__dev_kfree_skb_irq);
+
+@@ -3366,6 +3368,7 @@ static int enqueue_to_backlog(struct sk_
+ rps_unlock(sd);
+
+ local_irq_restore(flags);
++ preempt_check_resched_rt();
+
+ atomic_long_inc(&skb->dev->rx_dropped);
+ kfree_skb(skb);
+@@ -4347,6 +4350,7 @@ static void net_rps_action_and_irq_enabl
+ sd->rps_ipi_list = NULL;
+
+ local_irq_enable();
++ preempt_check_resched_rt();
+
+ /* Send pending IPI's to kick RPS processing on remote cpus. */
+ while (remsd) {
+@@ -4360,6 +4364,7 @@ static void net_rps_action_and_irq_enabl
+ } else
+ #endif
+ local_irq_enable();
++ preempt_check_resched_rt();
+ }
+
+ static bool sd_has_rps_ipi_waiting(struct softnet_data *sd)
+@@ -4439,6 +4444,7 @@ void __napi_schedule(struct napi_struct
+ local_irq_save(flags);
+ ____napi_schedule(this_cpu_ptr(&softnet_data), n);
+ local_irq_restore(flags);
++ preempt_check_resched_rt();
+ }
+ EXPORT_SYMBOL(__napi_schedule);
+
+@@ -7168,6 +7174,7 @@ static int dev_cpu_callback(struct notif
+
+ raise_softirq_irqoff(NET_TX_SOFTIRQ);
+ local_irq_enable();
++ preempt_check_resched_rt();
+
+ /* Process offline CPU's input_pkt_queue */
+ while ((skb = __skb_dequeue(&oldsd->process_queue))) {
diff --git a/patches/softirq-split-locks.patch b/patches/softirq-split-locks.patch
new file mode 100644
index 00000000000000..e17cf8af640642
--- /dev/null
+++ b/patches/softirq-split-locks.patch
@@ -0,0 +1,826 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Thu, 04 Oct 2012 14:20:47 +0100
+Subject: softirq: Split softirq locks
+
+The 3.x RT series removed the split softirq implementation in favour
+of pushing softirq processing into the context of the thread which
+raised it. However, this prevents us from handling the various softirqs
+at different priorities. Now instead of reintroducing the split
+softirq threads we split the locks which serialize the softirq
+processing.
+
+If a softirq is raised in context of a thread, then the softirq is
+noted on a per thread field, if the thread is in a bh disabled
+region. If the softirq is raised from hard interrupt context, then the
+bit is set in the flag field of ksoftirqd and ksoftirqd is invoked.
+When a thread leaves a bh disabled region, then it tries to execute
+the softirqs which have been raised in its own context. It acquires
+the per softirq / per cpu lock for the softirq and then checks,
+whether the softirq is still pending in the per cpu
+local_softirq_pending() field. If yes, it runs the softirq. If no,
+then some other task executed it already. This allows for zero config
+softirq elevation in the context of user space tasks or interrupt
+threads.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
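+A compressed userspace sketch of the per-vector locking loop described
+above, assuming pthread mutexes in place of the per-cpu local locks;
+pending, raised_by_me and do_current_softirqs() are illustrative stand-ins:
+
+  #include <pthread.h>
+  #include <stdio.h>
+
+  #define NR_SOFTIRQS 4
+
+  /* one lock per softirq vector, standing in for the per-cpu local locks */
+  static pthread_mutex_t softirq_locks[NR_SOFTIRQS] = {
+      PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
+      PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
+  };
+  static unsigned int pending;       /* local_softirq_pending() stand-in  */
+  static unsigned int raised_by_me;  /* current->softirqs_raised stand-in */
+
+  static void handle_softirq(int nr) { printf("ran softirq %d\n", nr); }
+
+  static void do_current_softirqs(void)
+  {
+      while (raised_by_me) {
+          int i = __builtin_ctz(raised_by_me);
+          unsigned int mask = 1u << i;
+
+          raised_by_me &= ~mask;
+          pthread_mutex_lock(&softirq_locks[i]);
+          /* recheck: another task may have processed this vector already */
+          if (pending & mask) {
+              pending &= ~mask;
+              handle_softirq(i);
+          }
+          pthread_mutex_unlock(&softirq_locks[i]);
+      }
+  }
+
+  int main(void)
+  {
+      raised_by_me = pending = (1u << 1) | (1u << 3);
+      do_current_softirqs();
+      return 0;
+  }
+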
+---
+ include/linux/bottom_half.h | 12
+ include/linux/interrupt.h | 15 +
+ include/linux/preempt_mask.h | 15 -
+ include/linux/sched.h | 3
+ init/main.c | 1
+ kernel/softirq.c | 518 ++++++++++++++++++++++++++++++++++++-------
+ kernel/time/tick-sched.c | 9
+ net/core/dev.c | 6
+ 8 files changed, 485 insertions(+), 94 deletions(-)
+
+--- a/include/linux/bottom_half.h
++++ b/include/linux/bottom_half.h
+@@ -4,6 +4,17 @@
+ #include <linux/preempt.h>
+ #include <linux/preempt_mask.h>
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++
++extern void local_bh_disable(void);
++extern void _local_bh_enable(void);
++extern void local_bh_enable(void);
++extern void local_bh_enable_ip(unsigned long ip);
++extern void __local_bh_disable_ip(unsigned long ip, unsigned int cnt);
++extern void __local_bh_enable_ip(unsigned long ip, unsigned int cnt);
++
++#else
++
+ #ifdef CONFIG_TRACE_IRQFLAGS
+ extern void __local_bh_disable_ip(unsigned long ip, unsigned int cnt);
+ #else
+@@ -31,5 +42,6 @@ static inline void local_bh_enable(void)
+ {
+ __local_bh_enable_ip(_THIS_IP_, SOFTIRQ_DISABLE_OFFSET);
+ }
++#endif
+
+ #endif /* _LINUX_BH_H */
+--- a/include/linux/interrupt.h
++++ b/include/linux/interrupt.h
+@@ -440,10 +440,11 @@ struct softirq_action
+ void (*action)(struct softirq_action *);
+ };
+
++#ifndef CONFIG_PREEMPT_RT_FULL
+ asmlinkage void do_softirq(void);
+ asmlinkage void __do_softirq(void);
+-
+-#if defined(__ARCH_HAS_DO_SOFTIRQ) && !defined(CONFIG_PREEMPT_RT_FULL)
++static inline void thread_do_softirq(void) { do_softirq(); }
++#ifdef __ARCH_HAS_DO_SOFTIRQ
+ void do_softirq_own_stack(void);
+ #else
+ static inline void do_softirq_own_stack(void)
+@@ -451,6 +452,9 @@ static inline void do_softirq_own_stack(
+ __do_softirq();
+ }
+ #endif
++#else
++extern void thread_do_softirq(void);
++#endif
+
+ extern void open_softirq(int nr, void (*action)(struct softirq_action *));
+ extern void softirq_init(void);
+@@ -458,6 +462,7 @@ extern void __raise_softirq_irqoff(unsig
+
+ extern void raise_softirq_irqoff(unsigned int nr);
+ extern void raise_softirq(unsigned int nr);
++extern void softirq_check_pending_idle(void);
+
+ DECLARE_PER_CPU(struct task_struct *, ksoftirqd);
+
+@@ -615,6 +620,12 @@ void tasklet_hrtimer_cancel(struct taskl
+ tasklet_kill(&ttimer->tasklet);
+ }
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++extern void softirq_early_init(void);
++#else
++static inline void softirq_early_init(void) { }
++#endif
++
+ /*
+ * Autoprobing for irqs:
+ *
+--- a/include/linux/preempt_mask.h
++++ b/include/linux/preempt_mask.h
+@@ -44,16 +44,26 @@
+ #define HARDIRQ_OFFSET (1UL << HARDIRQ_SHIFT)
+ #define NMI_OFFSET (1UL << NMI_SHIFT)
+
+-#define SOFTIRQ_DISABLE_OFFSET (2 * SOFTIRQ_OFFSET)
++#ifndef CONFIG_PREEMPT_RT_FULL
++# define SOFTIRQ_DISABLE_OFFSET (2 * SOFTIRQ_OFFSET)
++#else
++# define SOFTIRQ_DISABLE_OFFSET (0)
++#endif
+
+ #define PREEMPT_ACTIVE_BITS 1
+ #define PREEMPT_ACTIVE_SHIFT (NMI_SHIFT + NMI_BITS)
+ #define PREEMPT_ACTIVE (__IRQ_MASK(PREEMPT_ACTIVE_BITS) << PREEMPT_ACTIVE_SHIFT)
+
+ #define hardirq_count() (preempt_count() & HARDIRQ_MASK)
+-#define softirq_count() (preempt_count() & SOFTIRQ_MASK)
+ #define irq_count() (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK \
+ | NMI_MASK))
++#ifndef CONFIG_PREEMPT_RT_FULL
++# define softirq_count() (preempt_count() & SOFTIRQ_MASK)
++# define in_serving_softirq() (softirq_count() & SOFTIRQ_OFFSET)
++#else
++# define softirq_count() (0UL)
++extern int in_serving_softirq(void);
++#endif
+
+ /*
+ * Are we doing bottom half or hardware interrupt processing?
+@@ -64,7 +74,6 @@
+ #define in_irq() (hardirq_count())
+ #define in_softirq() (softirq_count())
+ #define in_interrupt() (irq_count())
+-#define in_serving_softirq() (softirq_count() & SOFTIRQ_OFFSET)
+
+ /*
+ * Are we in NMI context?
+--- a/include/linux/sched.h
++++ b/include/linux/sched.h
+@@ -1791,6 +1791,8 @@ struct task_struct {
+ #endif
+ #ifdef CONFIG_PREEMPT_RT_BASE
+ struct rcu_head put_rcu;
++ int softirq_nestcnt;
++ unsigned int softirqs_raised;
+ #endif
+ #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+ unsigned long task_state_change;
+@@ -2041,6 +2043,7 @@ extern void thread_group_cputime_adjuste
+ /*
+ * Per process flags
+ */
++#define PF_IN_SOFTIRQ 0x00000001 /* Task is serving softirq */
+ #define PF_EXITING 0x00000004 /* getting shut down */
+ #define PF_EXITPIDONE 0x00000008 /* pi exit done on shut down */
+ #define PF_VCPU 0x00000010 /* I'm a virtual CPU */
+--- a/init/main.c
++++ b/init/main.c
+@@ -525,6 +525,7 @@ asmlinkage __visible void __init start_k
+ setup_command_line(command_line);
+ setup_nr_cpu_ids();
+ setup_per_cpu_areas();
++ softirq_early_init();
+ smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
+
+ build_all_zonelists(NULL, NULL);
+--- a/kernel/softirq.c
++++ b/kernel/softirq.c
+@@ -26,6 +26,7 @@
+ #include <linux/smp.h>
+ #include <linux/smpboot.h>
+ #include <linux/tick.h>
++#include <linux/locallock.h>
+ #include <linux/irq.h>
+
+ #define CREATE_TRACE_POINTS
+@@ -63,6 +64,98 @@ const char * const softirq_to_name[NR_SO
+ "TASKLET", "SCHED", "HRTIMER", "RCU"
+ };
+
++#ifdef CONFIG_NO_HZ_COMMON
++# ifdef CONFIG_PREEMPT_RT_FULL
++
++struct softirq_runner {
++ struct task_struct *runner[NR_SOFTIRQS];
++};
++
++static DEFINE_PER_CPU(struct softirq_runner, softirq_runners);
++
++static inline void softirq_set_runner(unsigned int sirq)
++{
++ struct softirq_runner *sr = this_cpu_ptr(&softirq_runners);
++
++ sr->runner[sirq] = current;
++}
++
++static inline void softirq_clr_runner(unsigned int sirq)
++{
++ struct softirq_runner *sr = this_cpu_ptr(&softirq_runners);
++
++ sr->runner[sirq] = NULL;
++}
++
++/*
++ * On preempt-rt a softirq running context might be blocked on a
++ * lock. There might be no other runnable task on this CPU because the
++ * lock owner runs on some other CPU. So we have to go into idle with
++ * the pending bit set. Therefore we need to check this, otherwise we
++ * warn about false positives which confuse users and defeat the
++ * whole purpose of this test.
++ *
++ * This code is called with interrupts disabled.
++ */
++void softirq_check_pending_idle(void)
++{
++ static int rate_limit;
++ struct softirq_runner *sr = this_cpu_ptr(&softirq_runners);
++ u32 warnpending;
++ int i;
++
++ if (rate_limit >= 10)
++ return;
++
++ warnpending = local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK;
++ for (i = 0; i < NR_SOFTIRQS; i++) {
++ struct task_struct *tsk = sr->runner[i];
++
++ /*
++ * The wakeup code in rtmutex.c wakes up the task
++ * _before_ it sets pi_blocked_on to NULL under
++ * tsk->pi_lock. So we need to check for both: state
++ * and pi_blocked_on.
++ */
++ if (tsk) {
++ raw_spin_lock(&tsk->pi_lock);
++ if (tsk->pi_blocked_on || tsk->state == TASK_RUNNING) {
++ /* Clear all bits pending in that task */
++ warnpending &= ~(tsk->softirqs_raised);
++ warnpending &= ~(1 << i);
++ }
++ raw_spin_unlock(&tsk->pi_lock);
++ }
++ }
++
++ if (warnpending) {
++ printk(KERN_ERR "NOHZ: local_softirq_pending %02x\n",
++ warnpending);
++ rate_limit++;
++ }
++}
++# else
++/*
++ * On !PREEMPT_RT we just printk rate limited:
++ */
++void softirq_check_pending_idle(void)
++{
++ static int rate_limit;
++
++ if (rate_limit < 10 &&
++ (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
++ printk(KERN_ERR "NOHZ: local_softirq_pending %02x\n",
++ local_softirq_pending());
++ rate_limit++;
++ }
++}
++# endif
++
++#else /* !CONFIG_NO_HZ_COMMON */
++static inline void softirq_set_runner(unsigned int sirq) { }
++static inline void softirq_clr_runner(unsigned int sirq) { }
++#endif
++
+ /*
+ * we cannot loop indefinitely here to avoid userspace starvation,
+ * but we also don't want to introduce a worst case 1/HZ latency
+@@ -78,6 +171,68 @@ static void wakeup_softirqd(void)
+ wake_up_process(tsk);
+ }
+
++static void handle_softirq(unsigned int vec_nr)
++{
++ struct softirq_action *h = softirq_vec + vec_nr;
++ int prev_count;
++
++ prev_count = preempt_count();
++
++ kstat_incr_softirqs_this_cpu(vec_nr);
++
++ trace_softirq_entry(vec_nr);
++ h->action(h);
++ trace_softirq_exit(vec_nr);
++ if (unlikely(prev_count != preempt_count())) {
++ pr_err("huh, entered softirq %u %s %p with preempt_count %08x, exited with %08x?\n",
++ vec_nr, softirq_to_name[vec_nr], h->action,
++ prev_count, preempt_count());
++ preempt_count_set(prev_count);
++ }
++}
++
++#ifndef CONFIG_PREEMPT_RT_FULL
++static inline int ksoftirqd_softirq_pending(void)
++{
++ return local_softirq_pending();
++}
++
++static void handle_pending_softirqs(u32 pending)
++{
++ struct softirq_action *h = softirq_vec;
++ int softirq_bit;
++
++ local_irq_enable();
++
++ h = softirq_vec;
++
++ while ((softirq_bit = ffs(pending))) {
++ unsigned int vec_nr;
++
++ h += softirq_bit - 1;
++ vec_nr = h - softirq_vec;
++ handle_softirq(vec_nr);
++
++ h++;
++ pending >>= softirq_bit;
++ }
++
++ rcu_bh_qs();
++ local_irq_disable();
++}
++
++static void run_ksoftirqd(unsigned int cpu)
++{
++ local_irq_disable();
++ if (ksoftirqd_softirq_pending()) {
++ __do_softirq();
++ local_irq_enable();
++ cond_resched_rcu_qs();
++ return;
++ }
++ local_irq_enable();
++}
++
+ /*
+ * preempt_count and SOFTIRQ_OFFSET usage:
+ * - preempt_count is changed by SOFTIRQ_OFFSET on entering or leaving
+@@ -233,10 +388,8 @@ asmlinkage __visible void __do_softirq(v
+ unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
+ unsigned long old_flags = current->flags;
+ int max_restart = MAX_SOFTIRQ_RESTART;
+- struct softirq_action *h;
+ bool in_hardirq;
+ __u32 pending;
+- int softirq_bit;
+
+ /*
+ * Mask out PF_MEMALLOC s current task context is borrowed for the
+@@ -255,36 +408,7 @@ asmlinkage __visible void __do_softirq(v
+ /* Reset the pending bitmask before enabling irqs */
+ set_softirq_pending(0);
+
+- local_irq_enable();
+-
+- h = softirq_vec;
+-
+- while ((softirq_bit = ffs(pending))) {
+- unsigned int vec_nr;
+- int prev_count;
+-
+- h += softirq_bit - 1;
+-
+- vec_nr = h - softirq_vec;
+- prev_count = preempt_count();
+-
+- kstat_incr_softirqs_this_cpu(vec_nr);
+-
+- trace_softirq_entry(vec_nr);
+- h->action(h);
+- trace_softirq_exit(vec_nr);
+- if (unlikely(prev_count != preempt_count())) {
+- pr_err("huh, entered softirq %u %s %p with preempt_count %08x, exited with %08x?\n",
+- vec_nr, softirq_to_name[vec_nr], h->action,
+- prev_count, preempt_count());
+- preempt_count_set(prev_count);
+- }
+- h++;
+- pending >>= softirq_bit;
+- }
+-
+- rcu_bh_qs();
+- local_irq_disable();
++ handle_pending_softirqs(pending);
+
+ pending = local_softirq_pending();
+ if (pending) {
+@@ -321,6 +445,276 @@ asmlinkage __visible void do_softirq(voi
+ }
+
+ /*
++ * This function must run with irqs disabled!
++ */
++void raise_softirq_irqoff(unsigned int nr)
++{
++ __raise_softirq_irqoff(nr);
++
++ /*
++ * If we're in an interrupt or softirq, we're done
++ * (this also catches softirq-disabled code). We will
++ * actually run the softirq once we return from
++ * the irq or softirq.
++ *
++ * Otherwise we wake up ksoftirqd to make sure we
++ * schedule the softirq soon.
++ */
++ if (!in_interrupt())
++ wakeup_softirqd();
++}
++
++void __raise_softirq_irqoff(unsigned int nr)
++{
++ trace_softirq_raise(nr);
++ or_softirq_pending(1UL << nr);
++}
++
++static inline void local_bh_disable_nort(void) { local_bh_disable(); }
++static inline void _local_bh_enable_nort(void) { _local_bh_enable(); }
++static void ksoftirqd_set_sched_params(unsigned int cpu) { }
++static void ksoftirqd_clr_sched_params(unsigned int cpu, bool online) { }
++
++#else /* !PREEMPT_RT_FULL */
++
++/*
++ * On RT we serialize softirq execution with a cpu local lock per softirq
++ */
++static DEFINE_PER_CPU(struct local_irq_lock [NR_SOFTIRQS], local_softirq_locks);
++
++void __init softirq_early_init(void)
++{
++ int i;
++
++ for (i = 0; i < NR_SOFTIRQS; i++)
++ local_irq_lock_init(local_softirq_locks[i]);
++}
++
++static void lock_softirq(int which)
++{
++ local_lock(local_softirq_locks[which]);
++}
++
++static void unlock_softirq(int which)
++{
++ local_unlock(local_softirq_locks[which]);
++}
++
++static void do_single_softirq(int which)
++{
++ unsigned long old_flags = current->flags;
++
++ current->flags &= ~PF_MEMALLOC;
++ vtime_account_irq_enter(current);
++ current->flags |= PF_IN_SOFTIRQ;
++ lockdep_softirq_enter();
++ local_irq_enable();
++ handle_softirq(which);
++ local_irq_disable();
++ lockdep_softirq_exit();
++ current->flags &= ~PF_IN_SOFTIRQ;
++ vtime_account_irq_enter(current);
++ tsk_restore_flags(current, old_flags, PF_MEMALLOC);
++}
++
++/*
++ * Called with interrupts disabled. Process softirqs which were raised
++ * in current context (or on behalf of ksoftirqd).
++ */
++static void do_current_softirqs(void)
++{
++ while (current->softirqs_raised) {
++ int i = __ffs(current->softirqs_raised);
++ unsigned int pending, mask = (1U << i);
++
++ current->softirqs_raised &= ~mask;
++ local_irq_enable();
++
++ /*
++ * If the lock is contended, we boost the owner to
++ * process the softirq or leave the critical section
++ * now.
++ */
++ lock_softirq(i);
++ local_irq_disable();
++ softirq_set_runner(i);
++ /*
++ * Check with the local_softirq_pending() bits,
++ * whether we need to process this still or if someone
++ * else took care of it.
++ */
++ pending = local_softirq_pending();
++ if (pending & mask) {
++ set_softirq_pending(pending & ~mask);
++ do_single_softirq(i);
++ }
++ softirq_clr_runner(i);
++ unlock_softirq(i);
++ WARN_ON(current->softirq_nestcnt != 1);
++ }
++}
++
++static void __local_bh_disable(void)
++{
++ if (++current->softirq_nestcnt == 1)
++ migrate_disable();
++}
++
++void local_bh_disable(void)
++{
++ __local_bh_disable();
++}
++EXPORT_SYMBOL(local_bh_disable);
++
++void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
++{
++ __local_bh_disable();
++ if (cnt & PREEMPT_CHECK_OFFSET)
++ preempt_disable();
++}
++
++static void __local_bh_enable(void)
++{
++ if (WARN_ON(current->softirq_nestcnt == 0))
++ return;
++
++ local_irq_disable();
++ if (current->softirq_nestcnt == 1 && current->softirqs_raised)
++ do_current_softirqs();
++ local_irq_enable();
++
++ if (--current->softirq_nestcnt == 0)
++ migrate_enable();
++}
++
++void local_bh_enable(void)
++{
++ __local_bh_enable();
++}
++EXPORT_SYMBOL(local_bh_enable);
++
++extern void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
++{
++ __local_bh_enable();
++ if (cnt & PREEMPT_CHECK_OFFSET)
++ preempt_enable();
++}
++
++void local_bh_enable_ip(unsigned long ip)
++{
++ local_bh_enable();
++}
++EXPORT_SYMBOL(local_bh_enable_ip);
++
++int in_serving_softirq(void)
++{
++ return current->flags & PF_IN_SOFTIRQ;
++}
++EXPORT_SYMBOL(in_serving_softirq);
++
++/* Called with preemption disabled */
++static void run_ksoftirqd(unsigned int cpu)
++{
++ local_irq_disable();
++ current->softirq_nestcnt++;
++
++ do_current_softirqs();
++ current->softirq_nestcnt--;
++ rcu_note_context_switch();
++ local_irq_enable();
++}
++
++/*
++ * Called from netif_rx_ni(). Preemption enabled, but migration
++ * disabled. So the cpu can't go away under us.
++ */
++void thread_do_softirq(void)
++{
++ if (!in_serving_softirq() && current->softirqs_raised) {
++ current->softirq_nestcnt++;
++ do_current_softirqs();
++ current->softirq_nestcnt--;
++ }
++}
++
++static void do_raise_softirq_irqoff(unsigned int nr)
++{
++ trace_softirq_raise(nr);
++ or_softirq_pending(1UL << nr);
++
++ /*
++ * If we are not in a hard interrupt and inside a bh disabled
++ * region, we simply raise the flag on current. local_bh_enable()
++ * will make sure that the softirq is executed. Otherwise we
++ * delegate it to ksoftirqd.
++ */
++ if (!in_irq() && current->softirq_nestcnt)
++ current->softirqs_raised |= (1U << nr);
++ else if (__this_cpu_read(ksoftirqd))
++ __this_cpu_read(ksoftirqd)->softirqs_raised |= (1U << nr);
++}
++
++void __raise_softirq_irqoff(unsigned int nr)
++{
++ do_raise_softirq_irqoff(nr);
++ if (!in_irq() && !current->softirq_nestcnt)
++ wakeup_softirqd();
++}
++
++/*
++ * This function must run with irqs disabled!
++ */
++void raise_softirq_irqoff(unsigned int nr)
++{
++ do_raise_softirq_irqoff(nr);
++
++ /*
++ * If we're in an hard interrupt we let irq return code deal
++ * with the wakeup of ksoftirqd.
++ */
++ if (in_irq())
++ return;
++ /*
++ * If we are in thread context but outside of a bh disabled
++ * region, we need to wake ksoftirqd as well.
++ *
++ * CHECKME: Some of the places which do that could be wrapped
++ * into local_bh_disable/enable pairs. Though it's unclear
++ * whether this is worth the effort. To find those places just
++ * raise a WARN() if the condition is met.
++ */
++ if (!current->softirq_nestcnt)
++ wakeup_softirqd();
++}
++
++static inline int ksoftirqd_softirq_pending(void)
++{
++ return current->softirqs_raised;
++}
++
++static inline void local_bh_disable_nort(void) { }
++static inline void _local_bh_enable_nort(void) { }
++
++static inline void ksoftirqd_set_sched_params(unsigned int cpu)
++{
++ struct sched_param param = { .sched_priority = 1 };
++
++ sched_setscheduler(current, SCHED_FIFO, &param);
++ /* Take over all pending softirqs when starting */
++ local_irq_disable();
++ current->softirqs_raised = local_softirq_pending();
++ local_irq_enable();
++}
++
++static inline void ksoftirqd_clr_sched_params(unsigned int cpu, bool online)
++{
++ struct sched_param param = { .sched_priority = 0 };
++
++ sched_setscheduler(current, SCHED_NORMAL, &param);
++}
++
++#endif /* PREEMPT_RT_FULL */
++/*
+ * Enter an interrupt context.
+ */
+ void irq_enter(void)
+@@ -331,9 +725,9 @@ void irq_enter(void)
+ * Prevent raise_softirq from needlessly waking up ksoftirqd
+ * here, as softirq will be serviced on return from interrupt.
+ */
+- local_bh_disable();
++ local_bh_disable_nort();
+ tick_irq_enter();
+- _local_bh_enable();
++ _local_bh_enable_nort();
+ }
+
+ __irq_enter();
+@@ -341,6 +735,7 @@ void irq_enter(void)
+
+ static inline void invoke_softirq(void)
+ {
++#ifndef CONFIG_PREEMPT_RT_FULL
+ if (!force_irqthreads) {
+ #ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
+ /*
+@@ -360,6 +755,15 @@ static inline void invoke_softirq(void)
+ } else {
+ wakeup_softirqd();
+ }
++#else /* PREEMPT_RT_FULL */
++ unsigned long flags;
++
++ local_irq_save(flags);
++ if (__this_cpu_read(ksoftirqd) &&
++ __this_cpu_read(ksoftirqd)->softirqs_raised)
++ wakeup_softirqd();
++ local_irq_restore(flags);
++#endif
+ }
+
+ static inline void tick_irq_exit(void)
+@@ -396,26 +800,6 @@ void irq_exit(void)
+ trace_hardirq_exit(); /* must be last! */
+ }
+
+-/*
+- * This function must run with irqs disabled!
+- */
+-inline void raise_softirq_irqoff(unsigned int nr)
+-{
+- __raise_softirq_irqoff(nr);
+-
+- /*
+- * If we're in an interrupt or softirq, we're done
+- * (this also catches softirq-disabled code). We will
+- * actually run the softirq once we return from
+- * the irq or softirq.
+- *
+- * Otherwise we wake up ksoftirqd to make sure we
+- * schedule the softirq soon.
+- */
+- if (!in_interrupt())
+- wakeup_softirqd();
+-}
+-
+ void raise_softirq(unsigned int nr)
+ {
+ unsigned long flags;
+@@ -425,12 +809,6 @@ void raise_softirq(unsigned int nr)
+ local_irq_restore(flags);
+ }
+
+-void __raise_softirq_irqoff(unsigned int nr)
+-{
+- trace_softirq_raise(nr);
+- or_softirq_pending(1UL << nr);
+-}
+-
+ void open_softirq(int nr, void (*action)(struct softirq_action *))
+ {
+ softirq_vec[nr].action = action;
+@@ -733,23 +1111,7 @@ EXPORT_SYMBOL(tasklet_unlock_wait);
+
+ static int ksoftirqd_should_run(unsigned int cpu)
+ {
+- return local_softirq_pending();
+-}
+-
+-static void run_ksoftirqd(unsigned int cpu)
+-{
+- local_irq_disable();
+- if (local_softirq_pending()) {
+- /*
+- * We can safely run softirq on inline stack, as we are not deep
+- * in the task stack here.
+- */
+- __do_softirq();
+- local_irq_enable();
+- cond_resched_rcu_qs();
+- return;
+- }
+- local_irq_enable();
++ return ksoftirqd_softirq_pending();
+ }
+
+ #ifdef CONFIG_HOTPLUG_CPU
+@@ -831,6 +1193,8 @@ static struct notifier_block cpu_nfb = {
+
+ static struct smp_hotplug_thread softirq_threads = {
+ .store = &ksoftirqd,
++ .setup = ksoftirqd_set_sched_params,
++ .cleanup = ksoftirqd_clr_sched_params,
+ .thread_should_run = ksoftirqd_should_run,
+ .thread_fn = run_ksoftirqd,
+ .thread_comm = "ksoftirqd/%u",
+--- a/kernel/time/tick-sched.c
++++ b/kernel/time/tick-sched.c
+@@ -764,14 +764,7 @@ static bool can_stop_idle_tick(int cpu,
+ return false;
+
+ if (unlikely(local_softirq_pending() && cpu_online(cpu))) {
+- static int ratelimit;
+-
+- if (ratelimit < 10 &&
+- (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
+- pr_warn("NOHZ: local_softirq_pending %02x\n",
+- (unsigned int) local_softirq_pending());
+- ratelimit++;
+- }
++ softirq_check_pending_idle();
+ return false;
+ }
+
+--- a/net/core/dev.c
++++ b/net/core/dev.c
+@@ -3437,11 +3437,9 @@ int netif_rx_ni(struct sk_buff *skb)
+
+ trace_netif_rx_ni_entry(skb);
+
+- preempt_disable();
++ local_bh_disable();
+ err = netif_rx_internal(skb);
+- if (local_softirq_pending())
+- do_softirq();
+- preempt_enable();
++ local_bh_enable();
+
+ return err;
+ }
diff --git a/patches/sparc64-use-generic-rwsem-spinlocks-rt.patch b/patches/sparc64-use-generic-rwsem-spinlocks-rt.patch
new file mode 100644
index 00000000000000..cb4a7da65c132b
--- /dev/null
+++ b/patches/sparc64-use-generic-rwsem-spinlocks-rt.patch
@@ -0,0 +1,28 @@
+From d6a6675d436897cd1b09e299436df3499abd753e Mon Sep 17 00:00:00 2001
+From: Allen Pais <allen.pais@oracle.com>
+Date: Fri, 13 Dec 2013 09:44:41 +0530
+Subject: [PATCH 1/3] sparc64: use generic rwsem spinlocks rt
+
+Signed-off-by: Allen Pais <allen.pais@oracle.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ arch/sparc/Kconfig | 6 ++----
+ 1 file changed, 2 insertions(+), 4 deletions(-)
+
+--- a/arch/sparc/Kconfig
++++ b/arch/sparc/Kconfig
+@@ -189,12 +189,10 @@ config NR_CPUS
+ source kernel/Kconfig.hz
+
+ config RWSEM_GENERIC_SPINLOCK
+- bool
+- default y if SPARC32
++ def_bool PREEMPT_RT_FULL
+
+ config RWSEM_XCHGADD_ALGORITHM
+- bool
+- default y if SPARC64
++ def_bool !RWSEM_GENERIC_SPINLOCK && !PREEMPT_RT_FULL
+
+ config GENERIC_HWEIGHT
+ bool
diff --git a/patches/spinlock-types-separate-raw.patch b/patches/spinlock-types-separate-raw.patch
new file mode 100644
index 00000000000000..86476ccef7a501
--- /dev/null
+++ b/patches/spinlock-types-separate-raw.patch
@@ -0,0 +1,208 @@
+Subject: spinlock: Split the lock types header
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 29 Jun 2011 19:34:01 +0200
+
+Split raw_spinlock into its own file and the remaining spinlock_t into
+its own non-RT header. The non-RT header will be replaced later by sleeping
+spinlocks.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ include/linux/rwlock_types.h | 4 +
+ include/linux/spinlock_types.h | 74 ------------------------------------
+ include/linux/spinlock_types_nort.h | 33 ++++++++++++++++
+ include/linux/spinlock_types_raw.h | 56 +++++++++++++++++++++++++++
+ 4 files changed, 95 insertions(+), 72 deletions(-)
+
+--- a/include/linux/rwlock_types.h
++++ b/include/linux/rwlock_types.h
+@@ -1,6 +1,10 @@
+ #ifndef __LINUX_RWLOCK_TYPES_H
+ #define __LINUX_RWLOCK_TYPES_H
+
++#if !defined(__LINUX_SPINLOCK_TYPES_H)
++# error "Do not include directly, include spinlock_types.h"
++#endif
++
+ /*
+ * include/linux/rwlock_types.h - generic rwlock type definitions
+ * and initializers
+--- a/include/linux/spinlock_types.h
++++ b/include/linux/spinlock_types.h
+@@ -9,79 +9,9 @@
+ * Released under the General Public License (GPL).
+ */
+
+-#if defined(CONFIG_SMP)
+-# include <asm/spinlock_types.h>
+-#else
+-# include <linux/spinlock_types_up.h>
+-#endif
++#include <linux/spinlock_types_raw.h>
+
+-#include <linux/lockdep.h>
+-
+-typedef struct raw_spinlock {
+- arch_spinlock_t raw_lock;
+-#ifdef CONFIG_GENERIC_LOCKBREAK
+- unsigned int break_lock;
+-#endif
+-#ifdef CONFIG_DEBUG_SPINLOCK
+- unsigned int magic, owner_cpu;
+- void *owner;
+-#endif
+-#ifdef CONFIG_DEBUG_LOCK_ALLOC
+- struct lockdep_map dep_map;
+-#endif
+-} raw_spinlock_t;
+-
+-#define SPINLOCK_MAGIC 0xdead4ead
+-
+-#define SPINLOCK_OWNER_INIT ((void *)-1L)
+-
+-#ifdef CONFIG_DEBUG_LOCK_ALLOC
+-# define SPIN_DEP_MAP_INIT(lockname) .dep_map = { .name = #lockname }
+-#else
+-# define SPIN_DEP_MAP_INIT(lockname)
+-#endif
+-
+-#ifdef CONFIG_DEBUG_SPINLOCK
+-# define SPIN_DEBUG_INIT(lockname) \
+- .magic = SPINLOCK_MAGIC, \
+- .owner_cpu = -1, \
+- .owner = SPINLOCK_OWNER_INIT,
+-#else
+-# define SPIN_DEBUG_INIT(lockname)
+-#endif
+-
+-#define __RAW_SPIN_LOCK_INITIALIZER(lockname) \
+- { \
+- .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED, \
+- SPIN_DEBUG_INIT(lockname) \
+- SPIN_DEP_MAP_INIT(lockname) }
+-
+-#define __RAW_SPIN_LOCK_UNLOCKED(lockname) \
+- (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
+-
+-#define DEFINE_RAW_SPINLOCK(x) raw_spinlock_t x = __RAW_SPIN_LOCK_UNLOCKED(x)
+-
+-typedef struct spinlock {
+- union {
+- struct raw_spinlock rlock;
+-
+-#ifdef CONFIG_DEBUG_LOCK_ALLOC
+-# define LOCK_PADSIZE (offsetof(struct raw_spinlock, dep_map))
+- struct {
+- u8 __padding[LOCK_PADSIZE];
+- struct lockdep_map dep_map;
+- };
+-#endif
+- };
+-} spinlock_t;
+-
+-#define __SPIN_LOCK_INITIALIZER(lockname) \
+- { { .rlock = __RAW_SPIN_LOCK_INITIALIZER(lockname) } }
+-
+-#define __SPIN_LOCK_UNLOCKED(lockname) \
+- (spinlock_t ) __SPIN_LOCK_INITIALIZER(lockname)
+-
+-#define DEFINE_SPINLOCK(x) spinlock_t x = __SPIN_LOCK_UNLOCKED(x)
++#include <linux/spinlock_types_nort.h>
+
+ #include <linux/rwlock_types.h>
+
+--- /dev/null
++++ b/include/linux/spinlock_types_nort.h
+@@ -0,0 +1,33 @@
++#ifndef __LINUX_SPINLOCK_TYPES_NORT_H
++#define __LINUX_SPINLOCK_TYPES_NORT_H
++
++#ifndef __LINUX_SPINLOCK_TYPES_H
++#error "Do not include directly. Include spinlock_types.h instead"
++#endif
++
++/*
++ * The non RT version maps spinlocks to raw_spinlocks
++ */
++typedef struct spinlock {
++ union {
++ struct raw_spinlock rlock;
++
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++# define LOCK_PADSIZE (offsetof(struct raw_spinlock, dep_map))
++ struct {
++ u8 __padding[LOCK_PADSIZE];
++ struct lockdep_map dep_map;
++ };
++#endif
++ };
++} spinlock_t;
++
++#define __SPIN_LOCK_INITIALIZER(lockname) \
++ { { .rlock = __RAW_SPIN_LOCK_INITIALIZER(lockname) } }
++
++#define __SPIN_LOCK_UNLOCKED(lockname) \
++ (spinlock_t ) __SPIN_LOCK_INITIALIZER(lockname)
++
++#define DEFINE_SPINLOCK(x) spinlock_t x = __SPIN_LOCK_UNLOCKED(x)
++
++#endif
+--- /dev/null
++++ b/include/linux/spinlock_types_raw.h
+@@ -0,0 +1,56 @@
++#ifndef __LINUX_SPINLOCK_TYPES_RAW_H
++#define __LINUX_SPINLOCK_TYPES_RAW_H
++
++#if defined(CONFIG_SMP)
++# include <asm/spinlock_types.h>
++#else
++# include <linux/spinlock_types_up.h>
++#endif
++
++#include <linux/lockdep.h>
++
++typedef struct raw_spinlock {
++ arch_spinlock_t raw_lock;
++#ifdef CONFIG_GENERIC_LOCKBREAK
++ unsigned int break_lock;
++#endif
++#ifdef CONFIG_DEBUG_SPINLOCK
++ unsigned int magic, owner_cpu;
++ void *owner;
++#endif
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++ struct lockdep_map dep_map;
++#endif
++} raw_spinlock_t;
++
++#define SPINLOCK_MAGIC 0xdead4ead
++
++#define SPINLOCK_OWNER_INIT ((void *)-1L)
++
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++# define SPIN_DEP_MAP_INIT(lockname) .dep_map = { .name = #lockname }
++#else
++# define SPIN_DEP_MAP_INIT(lockname)
++#endif
++
++#ifdef CONFIG_DEBUG_SPINLOCK
++# define SPIN_DEBUG_INIT(lockname) \
++ .magic = SPINLOCK_MAGIC, \
++ .owner_cpu = -1, \
++ .owner = SPINLOCK_OWNER_INIT,
++#else
++# define SPIN_DEBUG_INIT(lockname)
++#endif
++
++#define __RAW_SPIN_LOCK_INITIALIZER(lockname) \
++ { \
++ .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED, \
++ SPIN_DEBUG_INIT(lockname) \
++ SPIN_DEP_MAP_INIT(lockname) }
++
++#define __RAW_SPIN_LOCK_UNLOCKED(lockname) \
++ (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
++
++#define DEFINE_RAW_SPINLOCK(x) raw_spinlock_t x = __RAW_SPIN_LOCK_UNLOCKED(x)
++
++#endif
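
For illustration, the split keeps the usual static-initializer-macro pattern
(__RAW_SPIN_LOCK_UNLOCKED() feeding DEFINE_RAW_SPINLOCK()). A minimal userspace
sketch of that pattern, with invented names (my_lock_t, DEFINE_MY_LOCK) and a
pthread mutex standing in for the arch lock, not taken from the patch:

#include <pthread.h>
#include <stdio.h>

/* Wrap the underlying lock in a struct so that debug fields could be
 * added later without touching any user of the type. */
typedef struct my_lock {
        pthread_mutex_t m;
} my_lock_t;

/* Same shape as __RAW_SPIN_LOCK_UNLOCKED() / DEFINE_RAW_SPINLOCK(). */
#define MY_LOCK_UNLOCKED(name)  { .m = PTHREAD_MUTEX_INITIALIZER }
#define DEFINE_MY_LOCK(name)    my_lock_t name = MY_LOCK_UNLOCKED(name)

DEFINE_MY_LOCK(test_lock);

int main(void)
{
        pthread_mutex_lock(&test_lock.m);
        printf("statically initialized lock works\n");
        pthread_mutex_unlock(&test_lock.m);
        return 0;
}
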
diff --git a/patches/stomp-machine-create-lg_global_trylock_relax-primiti.patch b/patches/stomp-machine-create-lg_global_trylock_relax-primiti.patch
new file mode 100644
index 00000000000000..d8dcb4f01d05d7
--- /dev/null
+++ b/patches/stomp-machine-create-lg_global_trylock_relax-primiti.patch
@@ -0,0 +1,86 @@
+From: Mike Galbraith <umgwanakikbuti@gmail.com>
+Date: Fri, 2 May 2014 13:13:22 +0200
+Subject: stomp-machine: create lg_global_trylock_relax() primitive
+
+Create lg_global_trylock_relax() for use by stopper thread when it cannot
+schedule, to deal with stop_cpus_lock, which is now an lglock.
+
+Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ include/linux/lglock.h | 6 ++++++
+ include/linux/spinlock_rt.h | 1 +
+ kernel/locking/lglock.c | 25 +++++++++++++++++++++++++
+ kernel/locking/rtmutex.c | 5 +++++
+ 4 files changed, 37 insertions(+)
+
+--- a/include/linux/lglock.h
++++ b/include/linux/lglock.h
+@@ -76,6 +76,12 @@ void lg_local_unlock_cpu(struct lglock *
+ void lg_global_lock(struct lglock *lg);
+ void lg_global_unlock(struct lglock *lg);
+
++#ifndef CONFIG_PREEMPT_RT_FULL
++#define lg_global_trylock_relax(name) lg_global_lock(name)
++#else
++void lg_global_trylock_relax(struct lglock *lg);
++#endif
++
+ #else
+ /* When !CONFIG_SMP, map lglock to spinlock */
+ #define lglock spinlock
+--- a/include/linux/spinlock_rt.h
++++ b/include/linux/spinlock_rt.h
+@@ -34,6 +34,7 @@ extern int atomic_dec_and_spin_lock(atom
+ */
+ extern void __lockfunc __rt_spin_lock(struct rt_mutex *lock);
+ extern void __lockfunc __rt_spin_unlock(struct rt_mutex *lock);
++extern int __lockfunc __rt_spin_trylock(struct rt_mutex *lock);
+
+ #define spin_lock(lock) \
+ do { \
+--- a/kernel/locking/lglock.c
++++ b/kernel/locking/lglock.c
+@@ -105,3 +105,28 @@ void lg_global_unlock(struct lglock *lg)
+ preempt_enable_nort();
+ }
+ EXPORT_SYMBOL(lg_global_unlock);
++
++#ifdef CONFIG_PREEMPT_RT_FULL
++/*
++ * HACK: If you use this, you get to keep the pieces.
++ * Used in queue_stop_cpus_work() when stop machinery
++ * is called from inactive CPU, so we can't schedule.
++ */
++# define lg_do_trylock_relax(l) \
++ do { \
++ while (!__rt_spin_trylock(l)) \
++ cpu_relax(); \
++ } while (0)
++
++void lg_global_trylock_relax(struct lglock *lg)
++{
++ int i;
++
++ lock_acquire_exclusive(&lg->lock_dep_map, 0, 0, NULL, _RET_IP_);
++ for_each_possible_cpu(i) {
++ lg_lock_ptr *lock;
++ lock = per_cpu_ptr(lg->lock, i);
++ lg_do_trylock_relax(lock);
++ }
++}
++#endif
+--- a/kernel/locking/rtmutex.c
++++ b/kernel/locking/rtmutex.c
+@@ -1133,6 +1133,11 @@ void __lockfunc rt_spin_unlock_wait(spin
+ }
+ EXPORT_SYMBOL(rt_spin_unlock_wait);
+
++int __lockfunc __rt_spin_trylock(struct rt_mutex *lock)
++{
++ return rt_mutex_trylock(lock);
++}
++
+ int __lockfunc rt_spin_trylock(spinlock_t *lock)
+ {
+ int ret = rt_mutex_trylock(&lock->lock);
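
As an aside, the trylock/relax loop above is easy to model in plain userspace
C. A minimal sketch assuming POSIX spinlocks; trylock_relax() and the
cpu_relax() fallback are invented for the example and are not part of the
patch (build with -pthread):

#include <pthread.h>
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
# define cpu_relax()    __builtin_ia32_pause()
#else
# define cpu_relax()    do { } while (0)
#endif

/* Spin until the trylock succeeds, relaxing the CPU in between --
 * the same shape as lg_do_trylock_relax() above. */
static void trylock_relax(pthread_spinlock_t *lock)
{
        while (pthread_spin_trylock(lock))
                cpu_relax();
}

int main(void)
{
        pthread_spinlock_t lock;

        pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
        trylock_relax(&lock);           /* uncontended: acquires immediately */
        pthread_spin_unlock(&lock);
        pthread_spin_destroy(&lock);
        printf("done\n");
        return 0;
}
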
diff --git a/patches/stomp-machine-use-lg_global_trylock_relax-to-dead-wi.patch b/patches/stomp-machine-use-lg_global_trylock_relax-to-dead-wi.patch
new file mode 100644
index 00000000000000..7a194d728495be
--- /dev/null
+++ b/patches/stomp-machine-use-lg_global_trylock_relax-to-dead-wi.patch
@@ -0,0 +1,100 @@
+From: Mike Galbraith <umgwanakikbuti@gmail.com>
+Date: Fri, 2 May 2014 13:13:34 +0200
+Subject: stomp-machine: use lg_global_trylock_relax() to deal with stop_cpus_lock lglock
+
+If the stop machinery is called from an inactive CPU we cannot use
+lg_global_lock(), because some other stomp machine invocation might be
+in progress and the lock can be contended. We cannot schedule from this
+context, so use the lovely new lg_global_trylock_relax() primitive to
+do what we used to do via one mutex_trylock()/cpu_relax() loop. We
+now do that trylock()/relax() across an entire herd of locks. Joy.
+
+Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ kernel/stop_machine.c | 24 ++++++++++++++----------
+ 1 file changed, 14 insertions(+), 10 deletions(-)
+
+--- a/kernel/stop_machine.c
++++ b/kernel/stop_machine.c
+@@ -266,7 +266,7 @@ int stop_two_cpus(unsigned int cpu1, uns
+ struct irq_cpu_stop_queue_work_info call_args;
+ struct multi_stop_data msdata;
+
+- preempt_disable();
++ preempt_disable_nort();
+ msdata = (struct multi_stop_data){
+ .fn = fn,
+ .data = arg,
+@@ -299,7 +299,7 @@ int stop_two_cpus(unsigned int cpu1, uns
+ * This relies on the stopper workqueues to be FIFO.
+ */
+ if (!cpu_active(cpu1) || !cpu_active(cpu2)) {
+- preempt_enable();
++ preempt_enable_nort();
+ return -ENOENT;
+ }
+
+@@ -313,7 +313,7 @@ int stop_two_cpus(unsigned int cpu1, uns
+ &irq_cpu_stop_queue_work,
+ &call_args, 1);
+ lg_local_unlock(&stop_cpus_lock);
+- preempt_enable();
++ preempt_enable_nort();
+
+ wait_for_stop_done(&done);
+
+@@ -347,7 +347,7 @@ static DEFINE_PER_CPU(struct cpu_stop_wo
+
+ static void queue_stop_cpus_work(const struct cpumask *cpumask,
+ cpu_stop_fn_t fn, void *arg,
+- struct cpu_stop_done *done)
++ struct cpu_stop_done *done, bool inactive)
+ {
+ struct cpu_stop_work *work;
+ unsigned int cpu;
+@@ -361,11 +361,13 @@ static void queue_stop_cpus_work(const s
+ }
+
+ /*
+- * Disable preemption while queueing to avoid getting
+- * preempted by a stopper which might wait for other stoppers
+- * to enter @fn which can lead to deadlock.
++ * Make sure that all work is queued on all cpus before
++ * any of the cpus can execute it.
+ */
+- lg_global_lock(&stop_cpus_lock);
++ if (!inactive)
++ lg_global_lock(&stop_cpus_lock);
++ else
++ lg_global_trylock_relax(&stop_cpus_lock);
+ for_each_cpu(cpu, cpumask)
+ cpu_stop_queue_work(cpu, &per_cpu(stop_cpus_work, cpu));
+ lg_global_unlock(&stop_cpus_lock);
+@@ -377,7 +379,7 @@ static int __stop_cpus(const struct cpum
+ struct cpu_stop_done done;
+
+ cpu_stop_init_done(&done, cpumask_weight(cpumask));
+- queue_stop_cpus_work(cpumask, fn, arg, &done);
++ queue_stop_cpus_work(cpumask, fn, arg, &done, false);
+ wait_for_stop_done(&done);
+ return done.executed ? done.ret : -ENOENT;
+ }
+@@ -573,6 +575,8 @@ static int __init cpu_stop_init(void)
+ INIT_LIST_HEAD(&stopper->works);
+ }
+
++ lg_lock_init(&stop_cpus_lock, "stop_cpus_lock");
++
+ BUG_ON(smpboot_register_percpu_thread(&cpu_stop_threads));
+ stop_machine_initialized = true;
+ return 0;
+@@ -668,7 +672,7 @@ int stop_machine_from_inactive_cpu(int (
+ set_state(&msdata, MULTI_STOP_PREPARE);
+ cpu_stop_init_done(&done, num_active_cpus());
+ queue_stop_cpus_work(cpu_active_mask, multi_cpu_stop, &msdata,
+- &done);
++ &done, true);
+ ret = multi_cpu_stop(&msdata);
+
+ /* Busy wait for completion. */
diff --git a/patches/stop-machine-raw-lock.patch b/patches/stop-machine-raw-lock.patch
new file mode 100644
index 00000000000000..ca28bc805664a6
--- /dev/null
+++ b/patches/stop-machine-raw-lock.patch
@@ -0,0 +1,196 @@
+Subject: stop_machine: Use raw spinlocks
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 29 Jun 2011 11:01:51 +0200
+
+Use raw-locks in stomp_machine() to allow locking in irq-off regions.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ kernel/stop_machine.c | 64 ++++++++++++++++++++++++++++++++++----------------
+ 1 file changed, 44 insertions(+), 20 deletions(-)
+
+--- a/kernel/stop_machine.c
++++ b/kernel/stop_machine.c
+@@ -30,12 +30,12 @@ struct cpu_stop_done {
+ atomic_t nr_todo; /* nr left to execute */
+ bool executed; /* actually executed? */
+ int ret; /* collected return value */
+- struct completion completion; /* fired if nr_todo reaches 0 */
++ struct task_struct *waiter; /* woken when nr_todo reaches 0 */
+ };
+
+ /* the actual stopper, one per every possible cpu, enabled on online cpus */
+ struct cpu_stopper {
+- spinlock_t lock;
++ raw_spinlock_t lock;
+ bool enabled; /* is this stopper enabled? */
+ struct list_head works; /* list of pending works */
+ };
+@@ -56,7 +56,7 @@ static void cpu_stop_init_done(struct cp
+ {
+ memset(done, 0, sizeof(*done));
+ atomic_set(&done->nr_todo, nr_todo);
+- init_completion(&done->completion);
++ done->waiter = current;
+ }
+
+ /* signal completion unless @done is NULL */
+@@ -65,8 +65,10 @@ static void cpu_stop_signal_done(struct
+ if (done) {
+ if (executed)
+ done->executed = true;
+- if (atomic_dec_and_test(&done->nr_todo))
+- complete(&done->completion);
++ if (atomic_dec_and_test(&done->nr_todo)) {
++ wake_up_process(done->waiter);
++ done->waiter = NULL;
++ }
+ }
+ }
+
+@@ -78,7 +80,7 @@ static void cpu_stop_queue_work(unsigned
+
+ unsigned long flags;
+
+- spin_lock_irqsave(&stopper->lock, flags);
++ raw_spin_lock_irqsave(&stopper->lock, flags);
+
+ if (stopper->enabled) {
+ list_add_tail(&work->list, &stopper->works);
+@@ -86,7 +88,23 @@ static void cpu_stop_queue_work(unsigned
+ } else
+ cpu_stop_signal_done(work->done, false);
+
+- spin_unlock_irqrestore(&stopper->lock, flags);
++ raw_spin_unlock_irqrestore(&stopper->lock, flags);
++}
++
++static void wait_for_stop_done(struct cpu_stop_done *done)
++{
++ set_current_state(TASK_UNINTERRUPTIBLE);
++ while (atomic_read(&done->nr_todo)) {
++ schedule();
++ set_current_state(TASK_UNINTERRUPTIBLE);
++ }
++ /*
++ * We need to wait until cpu_stop_signal_done() has cleared
++ * done->waiter.
++ */
++ while (done->waiter)
++ cpu_relax();
++ set_current_state(TASK_RUNNING);
+ }
+
+ /**
+@@ -120,7 +138,7 @@ int stop_one_cpu(unsigned int cpu, cpu_s
+
+ cpu_stop_init_done(&done, 1);
+ cpu_stop_queue_work(cpu, &work);
+- wait_for_completion(&done.completion);
++ wait_for_stop_done(&done);
+ return done.executed ? done.ret : -ENOENT;
+ }
+
+@@ -297,7 +315,7 @@ int stop_two_cpus(unsigned int cpu1, uns
+ lg_local_unlock(&stop_cpus_lock);
+ preempt_enable();
+
+- wait_for_completion(&done.completion);
++ wait_for_stop_done(&done);
+
+ return done.executed ? done.ret : -ENOENT;
+ }
+@@ -360,7 +378,7 @@ static int __stop_cpus(const struct cpum
+
+ cpu_stop_init_done(&done, cpumask_weight(cpumask));
+ queue_stop_cpus_work(cpumask, fn, arg, &done);
+- wait_for_completion(&done.completion);
++ wait_for_stop_done(&done);
+ return done.executed ? done.ret : -ENOENT;
+ }
+
+@@ -439,9 +457,9 @@ static int cpu_stop_should_run(unsigned
+ unsigned long flags;
+ int run;
+
+- spin_lock_irqsave(&stopper->lock, flags);
++ raw_spin_lock_irqsave(&stopper->lock, flags);
+ run = !list_empty(&stopper->works);
+- spin_unlock_irqrestore(&stopper->lock, flags);
++ raw_spin_unlock_irqrestore(&stopper->lock, flags);
+ return run;
+ }
+
+@@ -453,13 +471,13 @@ static void cpu_stopper_thread(unsigned
+
+ repeat:
+ work = NULL;
+- spin_lock_irq(&stopper->lock);
++ raw_spin_lock_irq(&stopper->lock);
+ if (!list_empty(&stopper->works)) {
+ work = list_first_entry(&stopper->works,
+ struct cpu_stop_work, list);
+ list_del_init(&work->list);
+ }
+- spin_unlock_irq(&stopper->lock);
++ raw_spin_unlock_irq(&stopper->lock);
+
+ if (work) {
+ cpu_stop_fn_t fn = work->fn;
+@@ -491,7 +509,13 @@ static void cpu_stopper_thread(unsigned
+ kallsyms_lookup((unsigned long)fn, NULL, NULL, NULL,
+ ksym_buf), arg);
+
++ /*
++ * Make sure that the wakeup and setting done->waiter
++ * to NULL is atomic.
++ */
++ local_irq_disable();
+ cpu_stop_signal_done(done, true);
++ local_irq_enable();
+ goto repeat;
+ }
+ }
+@@ -510,20 +534,20 @@ static void cpu_stop_park(unsigned int c
+ unsigned long flags;
+
+ /* drain remaining works */
+- spin_lock_irqsave(&stopper->lock, flags);
++ raw_spin_lock_irqsave(&stopper->lock, flags);
+ list_for_each_entry(work, &stopper->works, list)
+ cpu_stop_signal_done(work->done, false);
+ stopper->enabled = false;
+- spin_unlock_irqrestore(&stopper->lock, flags);
++ raw_spin_unlock_irqrestore(&stopper->lock, flags);
+ }
+
+ static void cpu_stop_unpark(unsigned int cpu)
+ {
+ struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
+
+- spin_lock_irq(&stopper->lock);
++ raw_spin_lock_irq(&stopper->lock);
+ stopper->enabled = true;
+- spin_unlock_irq(&stopper->lock);
++ raw_spin_unlock_irq(&stopper->lock);
+ }
+
+ static struct smp_hotplug_thread cpu_stop_threads = {
+@@ -545,7 +569,7 @@ static int __init cpu_stop_init(void)
+ for_each_possible_cpu(cpu) {
+ struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
+
+- spin_lock_init(&stopper->lock);
++ raw_spin_lock_init(&stopper->lock);
+ INIT_LIST_HEAD(&stopper->works);
+ }
+
+@@ -648,7 +672,7 @@ int stop_machine_from_inactive_cpu(int (
+ ret = multi_cpu_stop(&msdata);
+
+ /* Busy wait for completion. */
+- while (!completion_done(&done.completion))
++ while (atomic_read(&done.nr_todo))
+ cpu_relax();
+
+ mutex_unlock(&stop_cpus_mutex);
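
To illustrate the core of the change -- the completion is replaced by an
atomic nr_todo count plus a recorded waiter -- here is a rough userspace
analogue of the counting part, using C11 atomics and pthreads. struct
stop_done and the thread setup are invented for the example; the kernel
version sleeps via schedule() instead of yielding:

#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>

struct stop_done {
        atomic_int nr_todo;             /* like done->nr_todo above */
};

static struct stop_done done;

/* Each "stopper" does its work and signals by decrementing nr_todo. */
static void *stopper(void *arg)
{
        (void)arg;
        /* ... per-CPU work would run here ... */
        atomic_fetch_sub(&done.nr_todo, 1);
        return NULL;
}

/* Analogue of wait_for_stop_done(): wait until all stoppers finished. */
static void wait_for_stop_done(struct stop_done *d)
{
        while (atomic_load(&d->nr_todo))
                sched_yield();
}

int main(void)
{
        enum { NR = 4 };
        pthread_t t[NR];
        int i;

        atomic_store(&done.nr_todo, NR);
        for (i = 0; i < NR; i++)
                pthread_create(&t[i], NULL, stopper, NULL);
        wait_for_stop_done(&done);
        for (i = 0; i < NR; i++)
                pthread_join(t[i], NULL);
        printf("all stoppers done\n");
        return 0;
}
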
diff --git a/patches/stop_machine-convert-stop_machine_run-to-PREEMPT_RT.patch b/patches/stop_machine-convert-stop_machine_run-to-PREEMPT_RT.patch
new file mode 100644
index 00000000000000..4ec6647893c029
--- /dev/null
+++ b/patches/stop_machine-convert-stop_machine_run-to-PREEMPT_RT.patch
@@ -0,0 +1,34 @@
+From: Ingo Molnar <mingo@elte.hu>
+Date: Fri, 3 Jul 2009 08:30:27 -0500
+Subject: stop_machine: convert stop_machine_run() to PREEMPT_RT
+
+Instead of playing with non-preemption, introduce explicit
+startup serialization. This is more robust and cleaner as
+well.
+
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+[bigeasy: XXX: stopper_lock -> stop_cpus_lock]
+---
+ kernel/stop_machine.c | 10 ++++++++++
+ 1 file changed, 10 insertions(+)
+
+--- a/kernel/stop_machine.c
++++ b/kernel/stop_machine.c
+@@ -467,6 +467,16 @@ static void cpu_stopper_thread(unsigned
+ struct cpu_stop_done *done = work->done;
+ char ksym_buf[KSYM_NAME_LEN] __maybe_unused;
+
++ /*
++ * Wait until the stopper finished scheduling on all
++ * cpus
++ */
++ lg_global_lock(&stop_cpus_lock);
++ /*
++ * Let other cpu threads continue as well
++ */
++ lg_global_unlock(&stop_cpus_lock);
++
+ /* cpu stop callbacks are not allowed to sleep */
+ preempt_disable();
+
diff --git a/patches/sunrpc-make-svc_xprt_do_enqueue-use-get_cpu_light.patch b/patches/sunrpc-make-svc_xprt_do_enqueue-use-get_cpu_light.patch
new file mode 100644
index 00000000000000..91085dd857b73f
--- /dev/null
+++ b/patches/sunrpc-make-svc_xprt_do_enqueue-use-get_cpu_light.patch
@@ -0,0 +1,62 @@
+From: Mike Galbraith <umgwanakikbuti@gmail.com>
+Date: Wed, 18 Feb 2015 16:05:28 +0100
+Subject: sunrpc: Make svc_xprt_do_enqueue() use get_cpu_light()
+
+|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
+|in_atomic(): 1, irqs_disabled(): 0, pid: 3194, name: rpc.nfsd
+|Preemption disabled at:[<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
+|CPU: 6 PID: 3194 Comm: rpc.nfsd Not tainted 3.18.7-rt1 #9
+|Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.404 11/06/2014
+| ffff880409630000 ffff8800d9a33c78 ffffffff815bdeb5 0000000000000002
+| 0000000000000000 ffff8800d9a33c98 ffffffff81073c86 ffff880408dd6008
+| ffff880408dd6000 ffff8800d9a33cb8 ffffffff815c3d84 ffff88040b3ac000
+|Call Trace:
+| [<ffffffff815bdeb5>] dump_stack+0x4f/0x9e
+| [<ffffffff81073c86>] __might_sleep+0xe6/0x150
+| [<ffffffff815c3d84>] rt_spin_lock+0x24/0x50
+| [<ffffffffa06beec0>] svc_xprt_do_enqueue+0x80/0x230 [sunrpc]
+| [<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
+| [<ffffffffa06c03ed>] svc_add_new_perm_xprt+0x6d/0x80 [sunrpc]
+| [<ffffffffa06b2693>] svc_addsock+0x143/0x200 [sunrpc]
+| [<ffffffffa072e69c>] write_ports+0x28c/0x340 [nfsd]
+| [<ffffffffa072d2ac>] nfsctl_transaction_write+0x4c/0x80 [nfsd]
+| [<ffffffff8117ee83>] vfs_write+0xb3/0x1d0
+| [<ffffffff8117f889>] SyS_write+0x49/0xb0
+| [<ffffffff815c4556>] system_call_fastpath+0x16/0x1b
+
+
+Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ net/sunrpc/svc_xprt.c | 6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+
+--- a/net/sunrpc/svc_xprt.c
++++ b/net/sunrpc/svc_xprt.c
+@@ -341,7 +341,7 @@ static void svc_xprt_do_enqueue(struct s
+ goto out;
+ }
+
+- cpu = get_cpu();
++ cpu = get_cpu_light();
+ pool = svc_pool_for_cpu(xprt->xpt_server, cpu);
+
+ atomic_long_inc(&pool->sp_stats.packets);
+@@ -377,7 +377,7 @@ static void svc_xprt_do_enqueue(struct s
+
+ atomic_long_inc(&pool->sp_stats.threads_woken);
+ wake_up_process(rqstp->rq_task);
+- put_cpu();
++ put_cpu_light();
+ goto out;
+ }
+ rcu_read_unlock();
+@@ -398,7 +398,7 @@ static void svc_xprt_do_enqueue(struct s
+ goto redo_search;
+ }
+ rqstp = NULL;
+- put_cpu();
++ put_cpu_light();
+ out:
+ trace_svc_xprt_do_enqueue(xprt, rqstp);
+ }
diff --git a/patches/suspend-prevernt-might-sleep-splats.patch b/patches/suspend-prevernt-might-sleep-splats.patch
new file mode 100644
index 00000000000000..3aeb8501f35ab2
--- /dev/null
+++ b/patches/suspend-prevernt-might-sleep-splats.patch
@@ -0,0 +1,106 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Thu, 15 Jul 2010 10:29:00 +0200
+Subject: suspend: Prevent might sleep splats
+
+timekeeping suspend/resume calls read_persistent_clock() which takes
+rtc_lock. That results in might sleep warnings because at that point
+we run with interrupts disabled.
+
+We cannot convert rtc_lock to a raw spinlock as that would trigger
+other might sleep warnings.
+
+As a temporary workaround we disable the might sleep warnings by
+setting system_state to SYSTEM_SUSPEND before calling sysdev_suspend()
+and restoring it to SYSTEM_RUNNING after sysdev_resume().
+
+Needs to be revisited.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ include/linux/kernel.h | 1 +
+ kernel/power/hibernate.c | 7 +++++++
+ kernel/power/suspend.c | 4 ++++
+ 3 files changed, 12 insertions(+)
+
+--- a/include/linux/kernel.h
++++ b/include/linux/kernel.h
+@@ -467,6 +467,7 @@ extern enum system_states {
+ SYSTEM_HALT,
+ SYSTEM_POWER_OFF,
+ SYSTEM_RESTART,
++ SYSTEM_SUSPEND,
+ } system_state;
+
+ #define TAINT_PROPRIETARY_MODULE 0
+--- a/kernel/power/hibernate.c
++++ b/kernel/power/hibernate.c
+@@ -285,6 +285,8 @@ static int create_image(int platform_mod
+
+ local_irq_disable();
+
++ system_state = SYSTEM_SUSPEND;
++
+ error = syscore_suspend();
+ if (error) {
+ printk(KERN_ERR "PM: Some system devices failed to power down, "
+@@ -314,6 +316,7 @@ static int create_image(int platform_mod
+ syscore_resume();
+
+ Enable_irqs:
++ system_state = SYSTEM_RUNNING;
+ local_irq_enable();
+
+ Enable_cpus:
+@@ -437,6 +440,7 @@ static int resume_target_kernel(bool pla
+ goto Enable_cpus;
+
+ local_irq_disable();
++ system_state = SYSTEM_SUSPEND;
+
+ error = syscore_suspend();
+ if (error)
+@@ -470,6 +474,7 @@ static int resume_target_kernel(bool pla
+ syscore_resume();
+
+ Enable_irqs:
++ system_state = SYSTEM_RUNNING;
+ local_irq_enable();
+
+ Enable_cpus:
+@@ -555,6 +560,7 @@ int hibernation_platform_enter(void)
+ goto Platform_finish;
+
+ local_irq_disable();
++ system_state = SYSTEM_SUSPEND;
+ syscore_suspend();
+ if (pm_wakeup_pending()) {
+ error = -EAGAIN;
+@@ -567,6 +573,7 @@ int hibernation_platform_enter(void)
+
+ Power_up:
+ syscore_resume();
++ system_state = SYSTEM_RUNNING;
+ local_irq_enable();
+ enable_nonboot_cpus();
+
+--- a/kernel/power/suspend.c
++++ b/kernel/power/suspend.c
+@@ -356,6 +356,8 @@ static int suspend_enter(suspend_state_t
+ arch_suspend_disable_irqs();
+ BUG_ON(!irqs_disabled());
+
++ system_state = SYSTEM_SUSPEND;
++
+ error = syscore_suspend();
+ if (!error) {
+ *wakeup = pm_wakeup_pending();
+@@ -370,6 +372,8 @@ static int suspend_enter(suspend_state_t
+ syscore_resume();
+ }
+
++ system_state = SYSTEM_RUNNING;
++
+ arch_suspend_enable_irqs();
+ BUG_ON(irqs_disabled());
+
diff --git a/patches/sysfs-realtime-entry.patch b/patches/sysfs-realtime-entry.patch
new file mode 100644
index 00000000000000..1231bed700866a
--- /dev/null
+++ b/patches/sysfs-realtime-entry.patch
@@ -0,0 +1,47 @@
+Subject: sysfs: Add /sys/kernel/realtime entry
+From: Clark Williams <williams@redhat.com>
+Date: Sat Jul 30 21:55:53 2011 -0500
+
+Add a /sys/kernel entry to indicate that the kernel is a
+realtime kernel.
+
+Clark says that he needs this for udev rules: udev needs to evaluate
+whether it's a PREEMPT_RT kernel a few thousand times, and parsing uname
+output is apparently too slow.
+
+Are there better solutions? Should it exist and return 0 on !-rt?
+
+Signed-off-by: Clark Williams <williams@redhat.com>
+Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
+---
+ kernel/ksysfs.c | 12 ++++++++++++
+ 1 file changed, 12 insertions(+)
+
+--- a/kernel/ksysfs.c
++++ b/kernel/ksysfs.c
+@@ -136,6 +136,15 @@ KERNEL_ATTR_RO(vmcoreinfo);
+
+ #endif /* CONFIG_KEXEC */
+
++#if defined(CONFIG_PREEMPT_RT_FULL)
++static ssize_t realtime_show(struct kobject *kobj,
++ struct kobj_attribute *attr, char *buf)
++{
++ return sprintf(buf, "%d\n", 1);
++}
++KERNEL_ATTR_RO(realtime);
++#endif
++
+ /* whether file capabilities are enabled */
+ static ssize_t fscaps_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+@@ -203,6 +212,9 @@ static struct attribute * kernel_attrs[]
+ &vmcoreinfo_attr.attr,
+ #endif
+ &rcu_expedited_attr.attr,
++#ifdef CONFIG_PREEMPT_RT_FULL
++ &realtime_attr.attr,
++#endif
+ NULL
+ };
+
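
For completeness, a userspace consumer such as a udev helper could probe the
new file roughly as sketched below. The file exists only on kernels built
with PREEMPT_RT_FULL and carrying this patch, so its absence is treated as
"not realtime"; the helper itself is illustrative, not part of the patch:

#include <stdio.h>

/* Returns 1 if /sys/kernel/realtime exists and reads as 1, else 0. */
static int kernel_is_realtime(void)
{
        FILE *f = fopen("/sys/kernel/realtime", "r");
        int val = 0;

        if (!f)
                return 0;               /* file absent: not an RT kernel */
        if (fscanf(f, "%d", &val) != 1)
                val = 0;
        fclose(f);
        return val == 1;
}

int main(void)
{
        printf("PREEMPT_RT kernel: %s\n",
               kernel_is_realtime() ? "yes" : "no");
        return 0;
}
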
diff --git a/patches/tasklet-rt-prevent-tasklets-from-going-into-infinite-spin-in-rt.patch b/patches/tasklet-rt-prevent-tasklets-from-going-into-infinite-spin-in-rt.patch
new file mode 100644
index 00000000000000..51307efea09e35
--- /dev/null
+++ b/patches/tasklet-rt-prevent-tasklets-from-going-into-infinite-spin-in-rt.patch
@@ -0,0 +1,391 @@
+Subject: tasklet: Prevent tasklets from going into infinite spin in RT
+From: Ingo Molnar <mingo@elte.hu>
+Date: Tue Nov 29 20:18:22 2011 -0500
+
+When CONFIG_PREEMPT_RT_FULL is enabled, tasklets run as threads,
+and spinlocks turn into mutexes. But this can cause issues with
+tasks disabling tasklets. A tasklet runs under ksoftirqd, and
+if a tasklet is disabled with tasklet_disable(), the tasklet
+count is increased. When a tasklet runs, it checks this counter
+and if it is set, it adds itself back on the softirq queue and
+returns.
+
+The problem arises in RT because ksoftirq will see that a softirq
+is ready to run (the tasklet softirq just re-armed itself), and will
+not sleep, but instead run the softirqs again. The tasklet softirq
+will still see that the count is non-zero and will not execute
+the tasklet and requeue itself on the softirq again, which will
+cause ksoftirqd to run it again and again and again.
+
+It gets worse because ksoftirqd runs as a real-time thread.
+If it preempted the task that disabled tasklets, and that task
+has migration disabled, or can't run for other reasons, the tasklet
+softirq will never run because the count will never be zero, and
+ksoftirqd will go into an infinite loop. As an RT task, this
+becomes a big problem.
+
+This is a hack solution to have tasklet_disable() stop tasklets, and
+when a tasklet runs, instead of requeueing the tasklet on the softirq
+it delays it. When tasklet_enable() is called and tasklets are
+waiting, tasklet_enable() will kick the tasklets to continue.
+This prevents the lockup caused by ksoftirqd going into an infinite loop.
+
+[ rostedt@goodmis.org: ported to 3.0-rt ]
+
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ include/linux/interrupt.h | 33 ++++---
+ kernel/softirq.c | 201 ++++++++++++++++++++++++++++++++--------------
+ 2 files changed, 162 insertions(+), 72 deletions(-)
+
+--- a/include/linux/interrupt.h
++++ b/include/linux/interrupt.h
+@@ -479,8 +479,9 @@ static inline struct task_struct *this_c
+ to be executed on some cpu at least once after this.
+ * If the tasklet is already scheduled, but its execution is still not
+ started, it will be executed only once.
+- * If this tasklet is already running on another CPU (or schedule is called
+- from tasklet itself), it is rescheduled for later.
++ * If this tasklet is already running on another CPU, it is rescheduled
++ for later.
++ * Schedule must not be called from the tasklet itself (a lockup occurs)
+ * Tasklet is strictly serialized wrt itself, but not
+ wrt another tasklets. If client needs some intertask synchronization,
+ he makes it with spinlocks.
+@@ -505,27 +506,36 @@ struct tasklet_struct name = { NULL, 0,
+ enum
+ {
+ TASKLET_STATE_SCHED, /* Tasklet is scheduled for execution */
+- TASKLET_STATE_RUN /* Tasklet is running (SMP only) */
++ TASKLET_STATE_RUN, /* Tasklet is running (SMP only) */
++ TASKLET_STATE_PENDING /* Tasklet is pending */
+ };
+
+-#ifdef CONFIG_SMP
++#define TASKLET_STATEF_SCHED (1 << TASKLET_STATE_SCHED)
++#define TASKLET_STATEF_RUN (1 << TASKLET_STATE_RUN)
++#define TASKLET_STATEF_PENDING (1 << TASKLET_STATE_PENDING)
++
++#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
+ static inline int tasklet_trylock(struct tasklet_struct *t)
+ {
+ return !test_and_set_bit(TASKLET_STATE_RUN, &(t)->state);
+ }
+
++static inline int tasklet_tryunlock(struct tasklet_struct *t)
++{
++ return cmpxchg(&t->state, TASKLET_STATEF_RUN, 0) == TASKLET_STATEF_RUN;
++}
++
+ static inline void tasklet_unlock(struct tasklet_struct *t)
+ {
+ smp_mb__before_atomic();
+ clear_bit(TASKLET_STATE_RUN, &(t)->state);
+ }
+
+-static inline void tasklet_unlock_wait(struct tasklet_struct *t)
+-{
+- while (test_bit(TASKLET_STATE_RUN, &(t)->state)) { barrier(); }
+-}
++extern void tasklet_unlock_wait(struct tasklet_struct *t);
++
+ #else
+ #define tasklet_trylock(t) 1
++#define tasklet_tryunlock(t) 1
+ #define tasklet_unlock_wait(t) do { } while (0)
+ #define tasklet_unlock(t) do { } while (0)
+ #endif
+@@ -574,12 +584,7 @@ static inline void tasklet_disable(struc
+ smp_mb();
+ }
+
+-static inline void tasklet_enable(struct tasklet_struct *t)
+-{
+- smp_mb__before_atomic();
+- atomic_dec(&t->count);
+-}
+-
++extern void tasklet_enable(struct tasklet_struct *t);
+ extern void tasklet_kill(struct tasklet_struct *t);
+ extern void tasklet_kill_immediate(struct tasklet_struct *t, unsigned int cpu);
+ extern void tasklet_init(struct tasklet_struct *t,
+--- a/kernel/softirq.c
++++ b/kernel/softirq.c
+@@ -21,6 +21,7 @@
+ #include <linux/freezer.h>
+ #include <linux/kthread.h>
+ #include <linux/rcupdate.h>
++#include <linux/delay.h>
+ #include <linux/ftrace.h>
+ #include <linux/smp.h>
+ #include <linux/smpboot.h>
+@@ -446,15 +447,45 @@ struct tasklet_head {
+ static DEFINE_PER_CPU(struct tasklet_head, tasklet_vec);
+ static DEFINE_PER_CPU(struct tasklet_head, tasklet_hi_vec);
+
++static void inline
++__tasklet_common_schedule(struct tasklet_struct *t, struct tasklet_head *head, unsigned int nr)
++{
++ if (tasklet_trylock(t)) {
++again:
++ /* We may have been preempted before tasklet_trylock
++ * and __tasklet_action may have already run.
++ * So double check the sched bit while the tasklet
++ * is locked before adding it to the list.
++ */
++ if (test_bit(TASKLET_STATE_SCHED, &t->state)) {
++ t->next = NULL;
++ *head->tail = t;
++ head->tail = &(t->next);
++ raise_softirq_irqoff(nr);
++ tasklet_unlock(t);
++ } else {
++ /* This is subtle. If we hit the corner case above
++ * It is possible that we get preempted right here,
++ * and another task has successfully called
++ * tasklet_schedule(), then this function, and
++ * failed on the trylock. Thus we must be sure
++ * before releasing the tasklet lock, that the
++ * SCHED_BIT is clear. Otherwise the tasklet
++ * may get its SCHED_BIT set, but not added to the
++ * list
++ */
++ if (!tasklet_tryunlock(t))
++ goto again;
++ }
++ }
++}
++
+ void __tasklet_schedule(struct tasklet_struct *t)
+ {
+ unsigned long flags;
+
+ local_irq_save(flags);
+- t->next = NULL;
+- *__this_cpu_read(tasklet_vec.tail) = t;
+- __this_cpu_write(tasklet_vec.tail, &(t->next));
+- raise_softirq_irqoff(TASKLET_SOFTIRQ);
++ __tasklet_common_schedule(t, this_cpu_ptr(&tasklet_vec), TASKLET_SOFTIRQ);
+ local_irq_restore(flags);
+ }
+ EXPORT_SYMBOL(__tasklet_schedule);
+@@ -464,10 +495,7 @@ void __tasklet_hi_schedule(struct taskle
+ unsigned long flags;
+
+ local_irq_save(flags);
+- t->next = NULL;
+- *__this_cpu_read(tasklet_hi_vec.tail) = t;
+- __this_cpu_write(tasklet_hi_vec.tail, &(t->next));
+- raise_softirq_irqoff(HI_SOFTIRQ);
++ __tasklet_common_schedule(t, this_cpu_ptr(&tasklet_hi_vec), HI_SOFTIRQ);
+ local_irq_restore(flags);
+ }
+ EXPORT_SYMBOL(__tasklet_hi_schedule);
+@@ -476,82 +504,122 @@ void __tasklet_hi_schedule_first(struct
+ {
+ BUG_ON(!irqs_disabled());
+
+- t->next = __this_cpu_read(tasklet_hi_vec.head);
+- __this_cpu_write(tasklet_hi_vec.head, t);
+- __raise_softirq_irqoff(HI_SOFTIRQ);
++ __tasklet_hi_schedule(t);
+ }
+ EXPORT_SYMBOL(__tasklet_hi_schedule_first);
+
+-static void tasklet_action(struct softirq_action *a)
++void tasklet_enable(struct tasklet_struct *t)
+ {
+- struct tasklet_struct *list;
++ if (!atomic_dec_and_test(&t->count))
++ return;
++ if (test_and_clear_bit(TASKLET_STATE_PENDING, &t->state))
++ tasklet_schedule(t);
++}
++EXPORT_SYMBOL(tasklet_enable);
+
+- local_irq_disable();
+- list = __this_cpu_read(tasklet_vec.head);
+- __this_cpu_write(tasklet_vec.head, NULL);
+- __this_cpu_write(tasklet_vec.tail, this_cpu_ptr(&tasklet_vec.head));
+- local_irq_enable();
++static void __tasklet_action(struct softirq_action *a,
++ struct tasklet_struct *list)
++{
++ int loops = 1000000;
+
+ while (list) {
+ struct tasklet_struct *t = list;
+
+ list = list->next;
+
+- if (tasklet_trylock(t)) {
+- if (!atomic_read(&t->count)) {
+- if (!test_and_clear_bit(TASKLET_STATE_SCHED,
+- &t->state))
+- BUG();
+- t->func(t->data);
+- tasklet_unlock(t);
+- continue;
+- }
+- tasklet_unlock(t);
++ /*
++ * Should always succeed - after a tasklet got on the
++ * list (after getting the SCHED bit set from 0 to 1),
++ * nothing but the tasklet softirq it got queued to can
++ * lock it:
++ */
++ if (!tasklet_trylock(t)) {
++ WARN_ON(1);
++ continue;
+ }
+
+- local_irq_disable();
+ t->next = NULL;
+- *__this_cpu_read(tasklet_vec.tail) = t;
+- __this_cpu_write(tasklet_vec.tail, &(t->next));
+- __raise_softirq_irqoff(TASKLET_SOFTIRQ);
+- local_irq_enable();
++
++ /*
++ * If we cannot handle the tasklet because it's disabled,
++ * mark it as pending. tasklet_enable() will later
++ * re-schedule the tasklet.
++ */
++ if (unlikely(atomic_read(&t->count))) {
++out_disabled:
++ /* implicit unlock: */
++ wmb();
++ t->state = TASKLET_STATEF_PENDING;
++ continue;
++ }
++
++ /*
++ * After this point on the tasklet might be rescheduled
++ * on another CPU, but it can only be added to another
++ * CPU's tasklet list if we unlock the tasklet (which we
++ * dont do yet).
++ */
++ if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
++ WARN_ON(1);
++
++again:
++ t->func(t->data);
++
++ /*
++ * Try to unlock the tasklet. We must use cmpxchg, because
++ * another CPU might have scheduled or disabled the tasklet.
++ * We only allow the STATE_RUN -> 0 transition here.
++ */
++ while (!tasklet_tryunlock(t)) {
++ /*
++ * If it got disabled meanwhile, bail out:
++ */
++ if (atomic_read(&t->count))
++ goto out_disabled;
++ /*
++ * If it got scheduled meanwhile, re-execute
++ * the tasklet function:
++ */
++ if (test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
++ goto again;
++ if (!--loops) {
++ printk("hm, tasklet state: %08lx\n", t->state);
++ WARN_ON(1);
++ tasklet_unlock(t);
++ break;
++ }
++ }
+ }
+ }
+
++static void tasklet_action(struct softirq_action *a)
++{
++ struct tasklet_struct *list;
++
++ local_irq_disable();
++
++ list = __this_cpu_read(tasklet_vec.head);
++ __this_cpu_write(tasklet_vec.head, NULL);
++ __this_cpu_write(tasklet_vec.tail, this_cpu_ptr(&tasklet_vec.head));
++
++ local_irq_enable();
++
++ __tasklet_action(a, list);
++}
++
+ static void tasklet_hi_action(struct softirq_action *a)
+ {
+ struct tasklet_struct *list;
+
+ local_irq_disable();
++
+ list = __this_cpu_read(tasklet_hi_vec.head);
+ __this_cpu_write(tasklet_hi_vec.head, NULL);
+ __this_cpu_write(tasklet_hi_vec.tail, this_cpu_ptr(&tasklet_hi_vec.head));
+- local_irq_enable();
+-
+- while (list) {
+- struct tasklet_struct *t = list;
+
+- list = list->next;
+-
+- if (tasklet_trylock(t)) {
+- if (!atomic_read(&t->count)) {
+- if (!test_and_clear_bit(TASKLET_STATE_SCHED,
+- &t->state))
+- BUG();
+- t->func(t->data);
+- tasklet_unlock(t);
+- continue;
+- }
+- tasklet_unlock(t);
+- }
++ local_irq_enable();
+
+- local_irq_disable();
+- t->next = NULL;
+- *__this_cpu_read(tasklet_hi_vec.tail) = t;
+- __this_cpu_write(tasklet_hi_vec.tail, &(t->next));
+- __raise_softirq_irqoff(HI_SOFTIRQ);
+- local_irq_enable();
+- }
++ __tasklet_action(a, list);
+ }
+
+ void tasklet_init(struct tasklet_struct *t,
+@@ -572,7 +640,7 @@ void tasklet_kill(struct tasklet_struct
+
+ while (test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
+ do {
+- yield();
++ msleep(1);
+ } while (test_bit(TASKLET_STATE_SCHED, &t->state));
+ }
+ tasklet_unlock_wait(t);
+@@ -646,6 +714,23 @@ void __init softirq_init(void)
+ open_softirq(HI_SOFTIRQ, tasklet_hi_action);
+ }
+
++#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
++void tasklet_unlock_wait(struct tasklet_struct *t)
++{
++ while (test_bit(TASKLET_STATE_RUN, &(t)->state)) {
++ /*
++ * Hack for now to avoid this busy-loop:
++ */
++#ifdef CONFIG_PREEMPT_RT_FULL
++ msleep(1);
++#else
++ barrier();
++#endif
++ }
++}
++EXPORT_SYMBOL(tasklet_unlock_wait);
++#endif
++
+ static int ksoftirqd_should_run(unsigned int cpu)
+ {
+ return local_softirq_pending();
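
As an aside, tasklet_tryunlock() relies on a cmpxchg that permits only the
RUN -> 0 transition, so a concurrent schedule or disable is never lost. A
compact userspace model of that check with C11 atomics; the STATEF_* values
mirror the patch, everything else is invented:

#include <stdatomic.h>
#include <stdio.h>

#define STATEF_SCHED    (1u << 0)
#define STATEF_RUN      (1u << 1)
#define STATEF_PENDING  (1u << 2)

static atomic_uint state;

/* Succeeds only if the state is exactly RUN; fails if SCHED or PENDING
 * was set concurrently, forcing the caller to re-examine the tasklet. */
static int tryunlock(atomic_uint *st)
{
        unsigned int expected = STATEF_RUN;

        return atomic_compare_exchange_strong(st, &expected, 0u);
}

int main(void)
{
        atomic_store(&state, STATEF_RUN);
        printf("plain unlock:          %d\n", tryunlock(&state));  /* 1 */

        atomic_store(&state, STATEF_RUN | STATEF_SCHED);
        printf("rescheduled meanwhile: %d\n", tryunlock(&state));  /* 0 */
        return 0;
}
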
diff --git a/patches/tasklist-lock-fix-section-conflict.patch b/patches/tasklist-lock-fix-section-conflict.patch
new file mode 100644
index 00000000000000..0c0768f3fd7080
--- /dev/null
+++ b/patches/tasklist-lock-fix-section-conflict.patch
@@ -0,0 +1,55 @@
+Subject: rwlocks: Fix section mismatch
+From: John Kacur <jkacur@redhat.com>
+Date: Mon, 19 Sep 2011 11:09:27 +0200 (CEST)
+
+This fixes the following build error for the preempt-rt kernel.
+
+make kernel/fork.o
+ CC kernel/fork.o
+kernel/fork.c:90: error: section of tasklist_lock conflicts with previous declaration
+make[2]: *** [kernel/fork.o] Error 1
+make[1]: *** [kernel/fork.o] Error 2
+
+The rt kernel cache aligns the RWLOCK in DEFINE_RWLOCK by default.
+The non-rt kernels explicitly cache align only the tasklist_lock in
+kernel/fork.c
+That can create a build conflict. This fixes the build problem by making the
+non-rt kernels cache align RWLOCKs by default. The side effect is that
+the other RWLOCKs are also cache aligned for non-rt.
+
+This is a short term solution for rt only.
+The longer term solution would be to push the cache aligned DEFINE_RWLOCK
+to mainline. If there are objections, then we could create a
+DEFINE_RWLOCK_CACHE_ALIGNED or something of that nature.
+
+Signed-off-by: John Kacur <jkacur@redhat.com>
+Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
+Link: http://lkml.kernel.org/r/alpine.LFD.2.00.1109191104010.23118@localhost6.localdomain6
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ include/linux/rwlock_types.h | 3 ++-
+ kernel/fork.c | 2 +-
+ 2 files changed, 3 insertions(+), 2 deletions(-)
+
+--- a/include/linux/rwlock_types.h
++++ b/include/linux/rwlock_types.h
+@@ -47,6 +47,7 @@ typedef struct {
+ RW_DEP_MAP_INIT(lockname) }
+ #endif
+
+-#define DEFINE_RWLOCK(x) rwlock_t x = __RW_LOCK_UNLOCKED(x)
++#define DEFINE_RWLOCK(name) \
++ rwlock_t name __cacheline_aligned_in_smp = __RW_LOCK_UNLOCKED(name)
+
+ #endif /* __LINUX_RWLOCK_TYPES_H */
+--- a/kernel/fork.c
++++ b/kernel/fork.c
+@@ -108,7 +108,7 @@ int max_threads; /* tunable limit on nr
+
+ DEFINE_PER_CPU(unsigned long, process_counts) = 0;
+
+-__cacheline_aligned DEFINE_RWLOCK(tasklist_lock); /* outer */
++DEFINE_RWLOCK(tasklist_lock); /* outer */
+
+ #ifdef CONFIG_PROVE_RCU
+ int lockdep_tasklist_lock_is_held(void)
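
As an aside, __cacheline_aligned_in_smp only pads and aligns the object so
the lock gets a cache line of its own instead of sharing one with unrelated
hot data. A userspace sketch of the same idea with C11 alignas; the 64-byte
line size is an assumption and the variable name merely mirrors the patch:

#include <pthread.h>
#include <stdalign.h>
#include <stdio.h>

#define CACHELINE 64                    /* assumed L1 line size */

/* Give the lock its own cache line, like the cache-aligned
 * DEFINE_RWLOCK() above, to avoid false sharing. */
struct aligned_rwlock {
        alignas(CACHELINE) pthread_rwlock_t lock;
};

static struct aligned_rwlock tasklist_lock = {
        .lock = PTHREAD_RWLOCK_INITIALIZER,
};

int main(void)
{
        printf("sizeof=%zu align=%zu\n",
               sizeof(tasklist_lock), alignof(struct aligned_rwlock));
        pthread_rwlock_rdlock(&tasklist_lock.lock);
        pthread_rwlock_unlock(&tasklist_lock.lock);
        return 0;
}
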
diff --git a/patches/thermal-Defer-thermal-wakups-to-threads.patch b/patches/thermal-Defer-thermal-wakups-to-threads.patch
new file mode 100644
index 00000000000000..95a4dc585bb3da
--- /dev/null
+++ b/patches/thermal-Defer-thermal-wakups-to-threads.patch
@@ -0,0 +1,132 @@
+From: Daniel Wagner <wagi@monom.org>
+Date: Tue, 17 Feb 2015 09:37:44 +0100
+Subject: thermal: Defer thermal wakeups to threads
+
+On RT the spin lock in pkg_temp_thermal_platform_thermal_notify will
+call schedule while we run in irq context.
+
+[<ffffffff816850ac>] dump_stack+0x4e/0x8f
+[<ffffffff81680f7d>] __schedule_bug+0xa6/0xb4
+[<ffffffff816896b4>] __schedule+0x5b4/0x700
+[<ffffffff8168982a>] schedule+0x2a/0x90
+[<ffffffff8168a8b5>] rt_spin_lock_slowlock+0xe5/0x2d0
+[<ffffffff8168afd5>] rt_spin_lock+0x25/0x30
+[<ffffffffa03a7b75>] pkg_temp_thermal_platform_thermal_notify+0x45/0x134 [x86_pkg_temp_thermal]
+[<ffffffff8103d4db>] ? therm_throt_process+0x1b/0x160
+[<ffffffff8103d831>] intel_thermal_interrupt+0x211/0x250
+[<ffffffff8103d8c1>] smp_thermal_interrupt+0x21/0x40
+[<ffffffff8169415d>] thermal_interrupt+0x6d/0x80
+
+Let's defer the work to a kthread.
+
+Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
+[bigeasy: reorder init/deinit position. TODO: flush swork on exit]
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ drivers/thermal/x86_pkg_temp_thermal.c | 50 +++++++++++++++++++++++++++++++--
+ 1 file changed, 47 insertions(+), 3 deletions(-)
+
+--- a/drivers/thermal/x86_pkg_temp_thermal.c
++++ b/drivers/thermal/x86_pkg_temp_thermal.c
+@@ -29,6 +29,7 @@
+ #include <linux/pm.h>
+ #include <linux/thermal.h>
+ #include <linux/debugfs.h>
++#include <linux/work-simple.h>
+ #include <asm/cpu_device_id.h>
+ #include <asm/mce.h>
+
+@@ -352,7 +353,7 @@ static void pkg_temp_thermal_threshold_w
+ }
+ }
+
+-static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
++static void platform_thermal_notify_work(struct swork_event *event)
+ {
+ unsigned long flags;
+ int cpu = smp_processor_id();
+@@ -369,7 +370,7 @@ static int pkg_temp_thermal_platform_the
+ pkg_work_scheduled[phy_id]) {
+ disable_pkg_thres_interrupt();
+ spin_unlock_irqrestore(&pkg_work_lock, flags);
+- return -EINVAL;
++ return;
+ }
+ pkg_work_scheduled[phy_id] = 1;
+ spin_unlock_irqrestore(&pkg_work_lock, flags);
+@@ -378,9 +379,48 @@ static int pkg_temp_thermal_platform_the
+ schedule_delayed_work_on(cpu,
+ &per_cpu(pkg_temp_thermal_threshold_work, cpu),
+ msecs_to_jiffies(notify_delay_ms));
++}
++
++#ifdef CONFIG_PREEMPT_RT_FULL
++static struct swork_event notify_work;
++
++static int thermal_notify_work_init(void)
++{
++ int err;
++
++ err = swork_get();
++ if (err)
++ return err;
++
++ INIT_SWORK(&notify_work, platform_thermal_notify_work);
+ return 0;
+ }
+
++static void thermal_notify_work_cleanup(void)
++{
++ swork_put();
++}
++
++static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
++{
++ swork_queue(&notify_work);
++ return 0;
++}
++
++#else /* !CONFIG_PREEMPT_RT_FULL */
++
++static int thermal_notify_work_init(void) { return 0; }
++
++static void thermal_notify_work_cleanup(void) { }
++
++static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
++{
++ platform_thermal_notify_work(NULL);
++
++ return 0;
++}
++#endif /* CONFIG_PREEMPT_RT_FULL */
++
+ static int find_siblings_cpu(int cpu)
+ {
+ int i;
+@@ -584,6 +624,9 @@ static int __init pkg_temp_thermal_init(
+ if (!x86_match_cpu(pkg_temp_thermal_ids))
+ return -ENODEV;
+
++ if (!thermal_notify_work_init())
++ return -ENODEV;
++
+ spin_lock_init(&pkg_work_lock);
+ platform_thermal_package_notify =
+ pkg_temp_thermal_platform_thermal_notify;
+@@ -608,7 +651,7 @@ static int __init pkg_temp_thermal_init(
+ kfree(pkg_work_scheduled);
+ platform_thermal_package_notify = NULL;
+ platform_thermal_package_rate_control = NULL;
+-
++ thermal_notify_work_cleanup();
+ return -ENODEV;
+ }
+
+@@ -633,6 +676,7 @@ static void __exit pkg_temp_thermal_exit
+ mutex_unlock(&phy_dev_list_mutex);
+ platform_thermal_package_notify = NULL;
+ platform_thermal_package_rate_control = NULL;
++ thermal_notify_work_cleanup();
+ for_each_online_cpu(i)
+ cancel_delayed_work_sync(
+ &per_cpu(pkg_temp_thermal_threshold_work, i));
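
To illustrate the shape of the fix: the swork machinery simply hands the
notification to a kernel thread so that nothing sleeps in hard-irq context.
A loose userspace analogue of "queue it and let a worker thread do the
sleeping part" (all names invented, no relation to the swork API):

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int pending, stop;

/* Cheap, non-sleeping side: mark the event and wake the worker, the
 * way the notify callback only queues notify_work in the patch. */
static void thermal_notify(void)
{
        pthread_mutex_lock(&lock);
        pending = 1;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
}

/* Worker thread: this side may sleep and take sleeping locks. */
static void *notify_worker(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&lock);
        for (;;) {
                while (!pending && !stop)
                        pthread_cond_wait(&cond, &lock);
                if (!pending)
                        break;                  /* stop requested */
                pending = 0;
                pthread_mutex_unlock(&lock);
                printf("handling thermal event in thread context\n");
                pthread_mutex_lock(&lock);
        }
        pthread_mutex_unlock(&lock);
        return NULL;
}

int main(void)
{
        pthread_t worker;

        pthread_create(&worker, NULL, notify_worker, NULL);
        thermal_notify();

        pthread_mutex_lock(&lock);
        stop = 1;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
        pthread_join(worker, NULL);
        return 0;
}
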
diff --git a/patches/timekeeping-split-jiffies-lock.patch b/patches/timekeeping-split-jiffies-lock.patch
new file mode 100644
index 00000000000000..f46e80560e6453
--- /dev/null
+++ b/patches/timekeeping-split-jiffies-lock.patch
@@ -0,0 +1,156 @@
+Subject: timekeeping: Split jiffies seqlock
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Thu, 14 Feb 2013 22:36:59 +0100
+
+Replace jiffies_lock seqlock with a simple seqcounter and a rawlock so
+it can be taken in atomic context on RT.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ kernel/time/jiffies.c | 7 ++++---
+ kernel/time/tick-common.c | 10 ++++++----
+ kernel/time/tick-sched.c | 19 ++++++++++++-------
+ kernel/time/timekeeping.c | 6 ++++--
+ kernel/time/timekeeping.h | 3 ++-
+ 5 files changed, 28 insertions(+), 17 deletions(-)
+
+--- a/kernel/time/jiffies.c
++++ b/kernel/time/jiffies.c
+@@ -74,7 +74,8 @@ static struct clocksource clocksource_ji
+ .max_cycles = 10,
+ };
+
+-__cacheline_aligned_in_smp DEFINE_SEQLOCK(jiffies_lock);
++__cacheline_aligned_in_smp DEFINE_RAW_SPINLOCK(jiffies_lock);
++__cacheline_aligned_in_smp seqcount_t jiffies_seq;
+
+ #if (BITS_PER_LONG < 64)
+ u64 get_jiffies_64(void)
+@@ -83,9 +84,9 @@ u64 get_jiffies_64(void)
+ u64 ret;
+
+ do {
+- seq = read_seqbegin(&jiffies_lock);
++ seq = read_seqcount_begin(&jiffies_seq);
+ ret = jiffies_64;
+- } while (read_seqretry(&jiffies_lock, seq));
++ } while (read_seqcount_retry(&jiffies_seq, seq));
+ return ret;
+ }
+ EXPORT_SYMBOL(get_jiffies_64);
+--- a/kernel/time/tick-common.c
++++ b/kernel/time/tick-common.c
+@@ -78,13 +78,15 @@ int tick_is_oneshot_available(void)
+ static void tick_periodic(int cpu)
+ {
+ if (tick_do_timer_cpu == cpu) {
+- write_seqlock(&jiffies_lock);
++ raw_spin_lock(&jiffies_lock);
++ write_seqcount_begin(&jiffies_seq);
+
+ /* Keep track of the next tick event */
+ tick_next_period = ktime_add(tick_next_period, tick_period);
+
+ do_timer(1);
+- write_sequnlock(&jiffies_lock);
++ write_seqcount_end(&jiffies_seq);
++ raw_spin_unlock(&jiffies_lock);
+ update_wall_time();
+ }
+
+@@ -146,9 +148,9 @@ void tick_setup_periodic(struct clock_ev
+ ktime_t next;
+
+ do {
+- seq = read_seqbegin(&jiffies_lock);
++ seq = read_seqcount_begin(&jiffies_seq);
+ next = tick_next_period;
+- } while (read_seqretry(&jiffies_lock, seq));
++ } while (read_seqcount_retry(&jiffies_seq, seq));
+
+ clockevents_set_state(dev, CLOCK_EVT_STATE_ONESHOT);
+
+--- a/kernel/time/tick-sched.c
++++ b/kernel/time/tick-sched.c
+@@ -62,7 +62,8 @@ static void tick_do_update_jiffies64(kti
+ return;
+
+ /* Reevalute with jiffies_lock held */
+- write_seqlock(&jiffies_lock);
++ raw_spin_lock(&jiffies_lock);
++ write_seqcount_begin(&jiffies_seq);
+
+ delta = ktime_sub(now, last_jiffies_update);
+ if (delta.tv64 >= tick_period.tv64) {
+@@ -85,10 +86,12 @@ static void tick_do_update_jiffies64(kti
+ /* Keep the tick_next_period variable up to date */
+ tick_next_period = ktime_add(last_jiffies_update, tick_period);
+ } else {
+- write_sequnlock(&jiffies_lock);
++ write_seqcount_end(&jiffies_seq);
++ raw_spin_unlock(&jiffies_lock);
+ return;
+ }
+- write_sequnlock(&jiffies_lock);
++ write_seqcount_end(&jiffies_seq);
++ raw_spin_unlock(&jiffies_lock);
+ update_wall_time();
+ }
+
+@@ -99,12 +102,14 @@ static ktime_t tick_init_jiffy_update(vo
+ {
+ ktime_t period;
+
+- write_seqlock(&jiffies_lock);
++ raw_spin_lock(&jiffies_lock);
++ write_seqcount_begin(&jiffies_seq);
+ /* Did we start the jiffies update yet ? */
+ if (last_jiffies_update.tv64 == 0)
+ last_jiffies_update = tick_next_period;
+ period = last_jiffies_update;
+- write_sequnlock(&jiffies_lock);
++ write_seqcount_end(&jiffies_seq);
++ raw_spin_unlock(&jiffies_lock);
+ return period;
+ }
+
+@@ -578,10 +583,10 @@ static ktime_t tick_nohz_stop_sched_tick
+
+ /* Read jiffies and the time when jiffies were updated last */
+ do {
+- seq = read_seqbegin(&jiffies_lock);
++ seq = read_seqcount_begin(&jiffies_seq);
+ last_update = last_jiffies_update;
+ last_jiffies = jiffies;
+- } while (read_seqretry(&jiffies_lock, seq));
++ } while (read_seqcount_retry(&jiffies_seq, seq));
+
+ if (rcu_needs_cpu(&rcu_delta_jiffies) ||
+ arch_needs_cpu() || irq_work_needs_cpu()) {
+--- a/kernel/time/timekeeping.c
++++ b/kernel/time/timekeeping.c
+@@ -2065,8 +2065,10 @@ EXPORT_SYMBOL(hardpps);
+ */
+ void xtime_update(unsigned long ticks)
+ {
+- write_seqlock(&jiffies_lock);
++ raw_spin_lock(&jiffies_lock);
++ write_seqcount_begin(&jiffies_seq);
+ do_timer(ticks);
+- write_sequnlock(&jiffies_lock);
++ write_seqcount_end(&jiffies_seq);
++ raw_spin_unlock(&jiffies_lock);
+ update_wall_time();
+ }
+--- a/kernel/time/timekeeping.h
++++ b/kernel/time/timekeeping.h
+@@ -22,7 +22,8 @@ extern void timekeeping_resume(void);
+ extern void do_timer(unsigned long ticks);
+ extern void update_wall_time(void);
+
+-extern seqlock_t jiffies_lock;
++extern raw_spinlock_t jiffies_lock;
++extern seqcount_t jiffies_seq;
+
+ #define CS_NAME_LEN 32
+
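
As an aside, the read side keeps the classic sequence-counter retry loop
while writers now take the raw lock around the counter. A minimal
single-writer userspace sketch of the reader/writer protocol with C11
atomics; the variable names only mirror the patch, and with more than one
writer the raw spinlock from the patch would be needed to serialize them:

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

static atomic_uint      seq;            /* plays the role of jiffies_seq */
static _Atomic uint64_t jiffies64;      /* plays the role of jiffies_64  */

/* Writer: make the count odd, update the value, make it even again. */
static void write_jiffies(uint64_t val)
{
        unsigned int s = atomic_load_explicit(&seq, memory_order_relaxed);

        atomic_store_explicit(&seq, s + 1, memory_order_relaxed);
        atomic_thread_fence(memory_order_release);
        atomic_store_explicit(&jiffies64, val, memory_order_relaxed);
        atomic_store_explicit(&seq, s + 2, memory_order_release);
}

/* Reader: retry while a write was in flight or completed meanwhile. */
static uint64_t read_jiffies(void)
{
        unsigned int s1, s2;
        uint64_t val;

        do {
                s1 = atomic_load_explicit(&seq, memory_order_acquire);
                val = atomic_load_explicit(&jiffies64, memory_order_relaxed);
                atomic_thread_fence(memory_order_acquire);
                s2 = atomic_load_explicit(&seq, memory_order_relaxed);
        } while ((s1 & 1) || s1 != s2);
        return val;
}

int main(void)
{
        write_jiffies(42);
        printf("jiffies = %llu\n", (unsigned long long)read_jiffies());
        return 0;
}
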
diff --git a/patches/timer-delay-waking-softirqs-from-the-jiffy-tick.patch b/patches/timer-delay-waking-softirqs-from-the-jiffy-tick.patch
new file mode 100644
index 00000000000000..fd3cea82f35910
--- /dev/null
+++ b/patches/timer-delay-waking-softirqs-from-the-jiffy-tick.patch
@@ -0,0 +1,75 @@
+From: Peter Zijlstra <peterz@infradead.org>
+Date: Fri, 21 Aug 2009 11:56:45 +0200
+Subject: timer: delay waking softirqs from the jiffy tick
+
+People were complaining about broken balancing with the recent -rt
+series.
+
+A look at /proc/sched_debug yielded:
+
+cpu#0, 2393.874 MHz
+ .nr_running : 0
+ .load : 0
+ .cpu_load[0] : 177522
+ .cpu_load[1] : 177522
+ .cpu_load[2] : 177522
+ .cpu_load[3] : 177522
+ .cpu_load[4] : 177522
+cpu#1, 2393.874 MHz
+ .nr_running : 4
+ .load : 4096
+ .cpu_load[0] : 181618
+ .cpu_load[1] : 180850
+ .cpu_load[2] : 180274
+ .cpu_load[3] : 179938
+ .cpu_load[4] : 179758
+
+Which indicated the cpu_load computation was hosed: the 177522 value
+indicates that there is one RT task runnable. Initially I thought the
+old problem of calculating the cpu_load from a softirq had resurfaced,
+however looking at the code shows it's being done from scheduler_tick().
+
+[ we really should fix this RT/cfs interaction some day... ]
+
+A few trace_printk()s later:
+
+ sirq-timer/1-19 [001] 174.289744: 19: 50:S ==> [001] 0:140:R <idle>
+ <idle>-0 [001] 174.290724: enqueue_task_rt: adding task: 19/sirq-timer/1 with load: 177522
+ <idle>-0 [001] 174.290725: 0:140:R + [001] 19: 50:S sirq-timer/1
+ <idle>-0 [001] 174.290730: scheduler_tick: current load: 177522
+ <idle>-0 [001] 174.290732: scheduler_tick: current: 0/swapper
+ <idle>-0 [001] 174.290736: 0:140:R ==> [001] 19: 50:R sirq-timer/1
+ sirq-timer/1-19 [001] 174.290741: dequeue_task_rt: removing task: 19/sirq-timer/1 with load: 177522
+ sirq-timer/1-19 [001] 174.290743: 19: 50:S ==> [001] 0:140:R <idle>
+
+We see that we always raise the timer softirq before doing the load
+calculation. Avoid this by re-ordering the scheduler_tick() call in
+update_process_times() to occur before we deal with timers.
+
+This lowers the load back to sanity and restores regular load-balancing
+behaviour.
+
+Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ kernel/time/timer.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/kernel/time/timer.c
++++ b/kernel/time/timer.c
+@@ -1428,13 +1428,13 @@ void update_process_times(int user_tick)
+
+ /* Note: this timer irq context must be accounted for as well. */
+ account_process_tick(p, user_tick);
++ scheduler_tick();
+ run_local_timers();
+ rcu_check_callbacks(user_tick);
+ #ifdef CONFIG_IRQ_WORK
+ if (in_irq())
+ irq_work_tick();
+ #endif
+- scheduler_tick();
+ run_posix_cpu_timers(p);
+ }
+
diff --git a/patches/timer-fd-avoid-live-lock.patch b/patches/timer-fd-avoid-live-lock.patch
new file mode 100644
index 00000000000000..8dc713c97cafd5
--- /dev/null
+++ b/patches/timer-fd-avoid-live-lock.patch
@@ -0,0 +1,30 @@
+Subject: timer-fd: Prevent live lock
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 25 Jan 2012 11:08:40 +0100
+
+If hrtimer_try_to_cancel() requires a retry, then depending on the
+priority setting the retry loop might prevent timer callback completion
+on RT. Prevent that by waiting for completion on RT; there is no change
+for a non-RT kernel.
+
+Reported-by: Sankara Muthukrishnan <sankara.m@gmail.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ fs/timerfd.c | 5 ++++-
+ 1 file changed, 4 insertions(+), 1 deletion(-)
+
+--- a/fs/timerfd.c
++++ b/fs/timerfd.c
+@@ -450,7 +450,10 @@ static int do_timerfd_settime(int ufd, i
+ break;
+ }
+ spin_unlock_irq(&ctx->wqh.lock);
+- cpu_relax();
++ if (isalarm(ctx))
++ hrtimer_wait_for_timer(&ctx->t.alarm.timer);
++ else
++ hrtimer_wait_for_timer(&ctx->t.tmr);
+ }
+
+ /*
diff --git a/patches/timers-avoid-the-base-null-otptimization-on-rt.patch b/patches/timers-avoid-the-base-null-otptimization-on-rt.patch
new file mode 100644
index 00000000000000..0177c34be8cb89
--- /dev/null
+++ b/patches/timers-avoid-the-base-null-otptimization-on-rt.patch
@@ -0,0 +1,68 @@
+Subject: timers: Avoid the switch timers base set to NULL trick on RT
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Thu, 21 Jul 2011 15:23:39 +0200
+
+On RT that code is preemptible, so we cannot assign NULL to timers
+base as a preempter would spin forever in lock_timer_base().
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ kernel/time/timer.c | 40 ++++++++++++++++++++++++++++++++--------
+ 1 file changed, 32 insertions(+), 8 deletions(-)
+
+--- a/kernel/time/timer.c
++++ b/kernel/time/timer.c
+@@ -771,6 +771,36 @@ static struct tvec_base *lock_timer_base
+ }
+ }
+
++#ifndef CONFIG_PREEMPT_RT_FULL
++static inline struct tvec_base *switch_timer_base(struct timer_list *timer,
++ struct tvec_base *old,
++ struct tvec_base *new)
++{
++ /* See the comment in lock_timer_base() */
++ timer_set_base(timer, NULL);
++ spin_unlock(&old->lock);
++ spin_lock(&new->lock);
++ timer_set_base(timer, new);
++ return new;
++}
++#else
++static inline struct tvec_base *switch_timer_base(struct timer_list *timer,
++ struct tvec_base *old,
++ struct tvec_base *new)
++{
++ /*
++ * We cannot do the above because we might be preempted and
++ * then the preempter would see NULL and loop forever.
++ */
++ if (spin_trylock(&new->lock)) {
++ timer_set_base(timer, new);
++ spin_unlock(&old->lock);
++ return new;
++ }
++ return old;
++}
++#endif
++
+ static inline int
+ __mod_timer(struct timer_list *timer, unsigned long expires,
+ bool pending_only, int pinned)
+@@ -801,14 +831,8 @@ static inline int
+ * handler yet has not finished. This also guarantees that
+ * the timer is serialized wrt itself.
+ */
+- if (likely(base->running_timer != timer)) {
+- /* See the comment in lock_timer_base() */
+- timer_set_base(timer, NULL);
+- spin_unlock(&base->lock);
+- base = new_base;
+- spin_lock(&base->lock);
+- timer_set_base(timer, base);
+- }
++ if (likely(base->running_timer != timer))
++ base = switch_timer_base(timer, base, new_base);
+ }
+
+ timer->expires = expires;
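
For illustration, the RT variant of switch_timer_base() either trylocks the
new base or simply stays on the old one, so a preempted observer never sees
a NULL base. The same "switch only if the new lock is free" shape in
userspace (struct base and switch_base() are invented for the example):

#include <pthread.h>
#include <stdio.h>

struct base {
        pthread_mutex_t lock;
        const char *name;
};

/* Caller holds old_base->lock. Switch only if the new lock can be taken
 * without blocking; otherwise keep using the old base. */
static struct base *switch_base(struct base *old_base, struct base *new_base)
{
        if (pthread_mutex_trylock(&new_base->lock) == 0) {
                pthread_mutex_unlock(&old_base->lock);
                return new_base;
        }
        return old_base;
}

int main(void)
{
        struct base a = { PTHREAD_MUTEX_INITIALIZER, "old" };
        struct base b = { PTHREAD_MUTEX_INITIALIZER, "new" };
        struct base *cur;

        pthread_mutex_lock(&a.lock);
        cur = switch_base(&a, &b);
        printf("now on the %s base\n", cur->name);
        pthread_mutex_unlock(&cur->lock);
        return 0;
}
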
diff --git a/patches/timers-preempt-rt-support.patch b/patches/timers-preempt-rt-support.patch
new file mode 100644
index 00000000000000..580251f50dec17
--- /dev/null
+++ b/patches/timers-preempt-rt-support.patch
@@ -0,0 +1,54 @@
+From: Ingo Molnar <mingo@elte.hu>
+Date: Fri, 3 Jul 2009 08:30:20 -0500
+Subject: timers: Preempt-rt support
+
+On RT the base->lock is a sleeping lock. Do not try to work around that
+with a spin_trylock(): the rt-mutex lock is not irq safe, not even the
+trylock, due to the way the inner lock is accessed. Even with that fixed,
+the owner would not be the current process on the CPU; only its pid is
+used while taking the lock. Therefore we go with the next jiffy for
+the wakeup. Also drop the preempt_disable() usage since we only need to
+ensure we do not switch CPUs (the data structures have their own locks).
+
+[bigeasy: dropped that spin try lock]
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ kernel/time/timer.c | 12 ++++++++++--
+ 1 file changed, 10 insertions(+), 2 deletions(-)
+
+--- a/kernel/time/timer.c
++++ b/kernel/time/timer.c
+@@ -1395,6 +1395,14 @@ unsigned long get_next_timer_interrupt(u
+ if (cpu_is_offline(smp_processor_id()))
+ return expires;
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++ /*
++ * On PREEMPT_RT we cannot sleep here. As a result we can't take
++ * the base lock to check when the next timer is pending and so
++ * we assume the next jiffy.
++ */
++ return now + 1;
++#endif
+ spin_lock(&base->lock);
+ if (base->active_timers) {
+ if (time_before_eq(base->next_timer, base->timer_jiffies))
+@@ -1594,7 +1602,7 @@ static void migrate_timers(int cpu)
+
+ BUG_ON(cpu_online(cpu));
+ old_base = per_cpu(tvec_bases, cpu);
+- new_base = get_cpu_var(tvec_bases);
++ new_base = get_local_var(tvec_bases);
+ /*
+ * The caller is globally serialized and nobody else
+ * takes two locks at once, deadlock is not possible.
+@@ -1618,7 +1626,7 @@ static void migrate_timers(int cpu)
+
+ spin_unlock(&old_base->lock);
+ spin_unlock_irq(&new_base->lock);
+- put_cpu_var(tvec_bases);
++ put_local_var(tvec_bases);
+ }
+
+ static int timer_cpu_notify(struct notifier_block *self,
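
The get_local_var()/put_local_var() helpers used above are provided elsewhere
in this queue. The following is only a hedged sketch of the intent, the exact
definitions may differ:

/*
 * Sketch of the local per-CPU helpers; illustration only.
 */
#ifdef CONFIG_PREEMPT_RT_FULL
/* RT: pin the task to its CPU, but stay preemptible */
# define get_local_var(var)	(*({ migrate_disable(); this_cpu_ptr(&var); }))
# define put_local_var(var)	do { (void)&(var); migrate_enable(); } while (0)
#else
/* !RT: identical to the get_cpu_var()/put_cpu_var() being replaced */
# define get_local_var(var)	get_cpu_var(var)
# define put_local_var(var)	put_cpu_var(var)
#endif

On !RT the migrate_timers() change is therefore a no-op; on RT it keeps the
section preemptible while still keeping the task on its CPU.
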
diff --git a/patches/timers-prepare-for-full-preemption.patch b/patches/timers-prepare-for-full-preemption.patch
new file mode 100644
index 00000000000000..6d57691d471488
--- /dev/null
+++ b/patches/timers-prepare-for-full-preemption.patch
@@ -0,0 +1,145 @@
+From: Ingo Molnar <mingo@elte.hu>
+Date: Fri, 3 Jul 2009 08:29:34 -0500
+Subject: timers: Prepare for full preemption
+
+When softirqs can be preempted we need to make sure that cancelling
+the timer from the active thread can not deadlock vs. a running timer
+callback. Add a waitqueue to resolve that.
+
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ include/linux/timer.h | 2 +-
+ kernel/sched/core.c | 8 ++++++--
+ kernel/time/timer.c | 37 ++++++++++++++++++++++++++++++++++---
+ 3 files changed, 41 insertions(+), 6 deletions(-)
+
+--- a/include/linux/timer.h
++++ b/include/linux/timer.h
+@@ -241,7 +241,7 @@ extern void add_timer(struct timer_list
+
+ extern int try_to_del_timer_sync(struct timer_list *timer);
+
+-#ifdef CONFIG_SMP
++#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
+ extern int del_timer_sync(struct timer_list *timer);
+ #else
+ # define del_timer_sync(t) del_timer(t)
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -641,12 +641,14 @@ void resched_cpu(int cpu)
+ */
+ int get_nohz_timer_target(int pinned)
+ {
+- int cpu = smp_processor_id();
++ int cpu;
+ int i;
+ struct sched_domain *sd;
+
++ preempt_disable_rt();
++ cpu = smp_processor_id();
+ if (pinned || !get_sysctl_timer_migration() || !idle_cpu(cpu))
+- return cpu;
++ goto preempt_en_rt;
+
+ rcu_read_lock();
+ for_each_domain(cpu, sd) {
+@@ -659,6 +661,8 @@ int get_nohz_timer_target(int pinned)
+ }
+ unlock:
+ rcu_read_unlock();
++preempt_en_rt:
++ preempt_enable_rt();
+ return cpu;
+ }
+ /*
+--- a/kernel/time/timer.c
++++ b/kernel/time/timer.c
+@@ -78,6 +78,9 @@ struct tvec_root {
+ struct tvec_base {
+ spinlock_t lock;
+ struct timer_list *running_timer;
++#ifdef CONFIG_PREEMPT_RT_FULL
++ wait_queue_head_t wait_for_running_timer;
++#endif
+ unsigned long timer_jiffies;
+ unsigned long next_timer;
+ unsigned long active_timers;
+@@ -979,6 +982,29 @@ void add_timer_on(struct timer_list *tim
+ }
+ EXPORT_SYMBOL_GPL(add_timer_on);
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++/*
++ * Wait for a running timer
++ */
++static void wait_for_running_timer(struct timer_list *timer)
++{
++ struct tvec_base *base = timer->base;
++
++ if (base->running_timer == timer)
++ wait_event(base->wait_for_running_timer,
++ base->running_timer != timer);
++}
++
++# define wakeup_timer_waiters(b) wake_up(&(b)->wait_for_running_timer)
++#else
++static inline void wait_for_running_timer(struct timer_list *timer)
++{
++ cpu_relax();
++}
++
++# define wakeup_timer_waiters(b) do { } while (0)
++#endif
++
+ /**
+ * del_timer - deactive a timer.
+ * @timer: the timer to be deactivated
+@@ -1036,7 +1062,7 @@ int try_to_del_timer_sync(struct timer_l
+ }
+ EXPORT_SYMBOL(try_to_del_timer_sync);
+
+-#ifdef CONFIG_SMP
++#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
+ static DEFINE_PER_CPU(struct tvec_base, __tvec_bases);
+
+ /**
+@@ -1098,7 +1124,7 @@ int del_timer_sync(struct timer_list *ti
+ int ret = try_to_del_timer_sync(timer);
+ if (ret >= 0)
+ return ret;
+- cpu_relax();
++ wait_for_running_timer(timer);
+ }
+ }
+ EXPORT_SYMBOL(del_timer_sync);
+@@ -1219,15 +1245,17 @@ static inline void __run_timers(struct t
+ if (irqsafe) {
+ spin_unlock(&base->lock);
+ call_timer_fn(timer, fn, data);
++ base->running_timer = NULL;
+ spin_lock(&base->lock);
+ } else {
+ spin_unlock_irq(&base->lock);
+ call_timer_fn(timer, fn, data);
++ base->running_timer = NULL;
+ spin_lock_irq(&base->lock);
+ }
+ }
+ }
+- base->running_timer = NULL;
++ wakeup_timer_waiters(base);
+ spin_unlock_irq(&base->lock);
+ }
+
+@@ -1625,6 +1653,9 @@ static void __init init_timer_cpu(struct
+ base->cpu = cpu;
+ per_cpu(tvec_bases, cpu) = base;
+ spin_lock_init(&base->lock);
++#ifdef CONFIG_PREEMPT_RT_FULL
++ init_waitqueue_head(&base->wait_for_running_timer);
++#endif
+
+ for (j = 0; j < TVN_SIZE; j++) {
+ INIT_LIST_HEAD(base->tv5.vec + j);
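
From a timer user's point of view nothing changes. A minimal, purely
illustrative teardown path (my_dev and poll_timer are made-up names) still
cancels a possibly running timer the same way; on RT it now sleeps on the new
waitqueue instead of busy-waiting:

/* Purely illustrative driver teardown; my_dev and poll_timer are made up. */
#include <linux/timer.h>

struct my_dev {
	struct timer_list poll_timer;
};

static void my_dev_shutdown(struct my_dev *dev)
{
	/*
	 * If poll_timer's callback is currently running, this now sleeps
	 * on base->wait_for_running_timer on RT instead of spinning with
	 * cpu_relax() until the callback has finished.
	 */
	del_timer_sync(&dev->poll_timer);
}
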
diff --git a/patches/tracing-account-for-preempt-off-in-preempt_schedule.patch b/patches/tracing-account-for-preempt-off-in-preempt_schedule.patch
new file mode 100644
index 00000000000000..11263dd3477cb7
--- /dev/null
+++ b/patches/tracing-account-for-preempt-off-in-preempt_schedule.patch
@@ -0,0 +1,46 @@
+From: Steven Rostedt <rostedt@goodmis.org>
+Date: Thu, 29 Sep 2011 12:24:30 -0500
+Subject: tracing: Account for preempt off in preempt_schedule()
+
+preempt_schedule() uses the preempt_disable_notrace() version because
+the traced variant can cause infinite recursion: the function tracer
+uses preempt_enable_notrace(), which may call back into the
+preempt_schedule() code while NEED_RESCHED is still set and
+PREEMPT_ACTIVE has not been set yet.
+
+See commit d1f74e20b5b064a130cd0743a256c2d3cfe84010, which made this
+change.
+
+The preemptoff and preemptirqsoff latency tracers require the first and
+last preempt count modifiers in order to enable tracing, but the
+_notrace variants skip those checks. Since we cannot convert them back
+to the non-notrace versions, we use the idle() hooks for the latency
+tracers here. That is, start/stop_critical_timings() work well to
+manually start and stop the latency tracer for preempt-off timings.
+
+Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
+Signed-off-by: Clark Williams <williams@redhat.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ kernel/sched/core.c | 9 +++++++++
+ 1 file changed, 9 insertions(+)
+
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -2970,7 +2970,16 @@ asmlinkage __visible void __sched notrac
+ * an infinite recursion.
+ */
+ prev_ctx = exception_enter();
++ /*
++ * The add/subtract must not be traced by the function
++ * tracer. But we still want to account for the
++ * preempt off latency tracer. Since the _notrace versions
++ * of add/subtract skip the accounting for latency tracer
++ * we must force it manually.
++ */
++ start_critical_timings();
+ __schedule();
++ stop_critical_timings();
+ exception_exit(prev_ctx);
+
+ __preempt_count_sub(PREEMPT_ACTIVE);
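
The general shape of the resulting pattern, shown here only as a conceptual
sketch and not as a further change on top of the patch: when the _notrace
preempt count helpers are used, the preempt-off region has to be bracketed by
hand so the latency tracer still sees it.

/* Conceptual sketch only; not part of the patch above. */
#include <linux/preempt.h>
#include <linux/irqflags.h>

static void traced_notrace_section(void)
{
	preempt_disable_notrace();	/* no latency-tracer accounting here */
	start_critical_timings();	/* ...so start the measurement by hand */

	/* preemption-disabled work goes here */

	stop_critical_timings();	/* end of the measured critical section */
	preempt_enable_notrace();	/* again bypasses the tracer hooks */
}
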
diff --git a/patches/upstream-net-rt-remove-preemption-disabling-in-netif_rx.patch b/patches/upstream-net-rt-remove-preemption-disabling-in-netif_rx.patch
new file mode 100644
index 00000000000000..13f7000548fd69
--- /dev/null
+++ b/patches/upstream-net-rt-remove-preemption-disabling-in-netif_rx.patch
@@ -0,0 +1,65 @@
+Subject: net: Remove preemption disabling in netif_rx()
+From: Priyanka Jain <Priyanka.Jain@freescale.com>
+Date: Thu, 17 May 2012 09:35:11 +0530
+
+1) enqueue_to_backlog() (called from netif_rx) should be bound to a
+   particular CPU. This can be achieved by disabling migration; there
+   is no need to disable preemption.
+
+2) Fixes the crash "BUG: scheduling while atomic: ksoftirqd" on RT.
+   If preemption is disabled, enqueue_to_backlog() is called in atomic
+   context, and if the backlog exceeds its limit, kfree_skb() is
+   called. But on RT, kfree_skb() might get scheduled out, so it
+   expects a non-atomic context.
+
+3) When CONFIG_PREEMPT_RT_FULL is not defined, migrate_enable() and
+   migrate_disable() map to preempt_enable() and preempt_disable(), so
+   there is no functional change for non-RT.
+
+- Replace preempt_enable()/preempt_disable() with
+  migrate_enable()/migrate_disable() respectively.
+- Replace get_cpu()/put_cpu() with get_cpu_light()/put_cpu_light()
+  respectively.
+
+Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
+Acked-by: Rajan Srivastava <Rajan.Srivastava@freescale.com>
+Cc: <rostedt@goodmis.org>
+Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ Testing: Tested successfully on p4080ds(8-core SMP system)
+
+ net/core/dev.c | 8 ++++----
+ 1 file changed, 4 insertions(+), 4 deletions(-)
+
+--- a/net/core/dev.c
++++ b/net/core/dev.c
+@@ -3387,7 +3387,7 @@ static int netif_rx_internal(struct sk_b
+ struct rps_dev_flow voidflow, *rflow = &voidflow;
+ int cpu;
+
+- preempt_disable();
++ migrate_disable();
+ rcu_read_lock();
+
+ cpu = get_rps_cpu(skb->dev, skb, &rflow);
+@@ -3397,13 +3397,13 @@ static int netif_rx_internal(struct sk_b
+ ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
+
+ rcu_read_unlock();
+- preempt_enable();
++ migrate_enable();
+ } else
+ #endif
+ {
+ unsigned int qtail;
+- ret = enqueue_to_backlog(skb, get_cpu(), &qtail);
+- put_cpu();
++ ret = enqueue_to_backlog(skb, get_cpu_light(), &qtail);
++ put_cpu_light();
+ }
+ return ret;
+ }
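
The helpers referenced in point 3 are provided elsewhere in this queue. A rough
sketch (exact definitions may differ slightly, the RT half in particular is an
assumption) shows why mainline behaviour is unchanged:

/* Rough sketch of the cpu/migration helpers; illustration only. */
#ifndef CONFIG_PREEMPT_RT_FULL
# define migrate_disable()	preempt_disable()
# define migrate_enable()	preempt_enable()
# define get_cpu_light()	get_cpu()
# define put_cpu_light()	put_cpu()
#else
/* RT (assumption): pin the task to its CPU without disabling preemption */
extern void migrate_disable(void);
extern void migrate_enable(void);
# define get_cpu_light()	({ migrate_disable(); smp_processor_id(); })
# define put_cpu_light()	migrate_enable()
#endif
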
diff --git a/patches/usb-use-_nort-in-giveback.patch b/patches/usb-use-_nort-in-giveback.patch
new file mode 100644
index 00000000000000..355d1fdaa19832
--- /dev/null
+++ b/patches/usb-use-_nort-in-giveback.patch
@@ -0,0 +1,57 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Fri, 8 Nov 2013 17:34:54 +0100
+Subject: usb: Use _nort in giveback function
+
+Since commit 94dfd7ed ("USB: HCD: support giveback of URB in tasklet
+context") I see
+
+|BUG: sleeping function called from invalid context at kernel/rtmutex.c:673
+|in_atomic(): 0, irqs_disabled(): 1, pid: 109, name: irq/11-uhci_hcd
+|no locks held by irq/11-uhci_hcd/109.
+|irq event stamp: 440
+|hardirqs last enabled at (439): [<ffffffff816a7555>] _raw_spin_unlock_irqrestore+0x75/0x90
+|hardirqs last disabled at (440): [<ffffffff81514906>] __usb_hcd_giveback_urb+0x46/0xc0
+|softirqs last enabled at (0): [<ffffffff81081821>] copy_process.part.52+0x511/0x1510
+|softirqs last disabled at (0): [< (null)>] (null)
+|CPU: 3 PID: 109 Comm: irq/11-uhci_hcd Not tainted 3.12.0-rt0-rc1+ #13
+|Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
+| 0000000000000000 ffff8800db9ffbe0 ffffffff8169f064 0000000000000000
+| ffff8800db9ffbf8 ffffffff810b2122 ffff88020f03e888 ffff8800db9ffc18
+| ffffffff816a6944 ffffffff810b5748 ffff88020f03c000 ffff8800db9ffc50
+|Call Trace:
+| [<ffffffff8169f064>] dump_stack+0x4e/0x8f
+| [<ffffffff810b2122>] __might_sleep+0x112/0x190
+| [<ffffffff816a6944>] rt_spin_lock+0x24/0x60
+| [<ffffffff8158435b>] hid_ctrl+0x3b/0x190
+| [<ffffffff8151490f>] __usb_hcd_giveback_urb+0x4f/0xc0
+| [<ffffffff81514aaf>] usb_hcd_giveback_urb+0x3f/0x140
+| [<ffffffff815346af>] uhci_giveback_urb+0xaf/0x280
+| [<ffffffff8153666a>] uhci_scan_schedule+0x47a/0xb10
+| [<ffffffff81537336>] uhci_irq+0xa6/0x1a0
+| [<ffffffff81513c48>] usb_hcd_irq+0x28/0x40
+| [<ffffffff810c8ba3>] irq_forced_thread_fn+0x23/0x70
+| [<ffffffff810c918f>] irq_thread+0x10f/0x150
+| [<ffffffff810a6fad>] kthread+0xcd/0xe0
+| [<ffffffff816a842c>] ret_from_fork+0x7c/0xb0
+
+On -RT we run threaded, so there is no need to disable interrupts.
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ drivers/usb/core/hcd.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/drivers/usb/core/hcd.c
++++ b/drivers/usb/core/hcd.c
+@@ -1681,9 +1681,9 @@ static void __usb_hcd_giveback_urb(struc
+ * and no one may trigger the above deadlock situation when
+ * running complete() in tasklet.
+ */
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+ urb->complete(urb);
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+
+ usb_anchor_resume_wakeups(anchor);
+ atomic_dec(&urb->use_count);
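
The _nort variants come from the local-irq helper patch elsewhere in this
queue. As a hedged sketch (exact definitions may differ), they keep the
mainline behaviour on !RT and become near no-ops on RT, where the giveback
runs in a thread anyway:

/* Hedged sketch of the _nort local-irq helpers; illustration only. */
#ifndef CONFIG_PREEMPT_RT_FULL
# define local_irq_save_nort(flags)	local_irq_save(flags)
# define local_irq_restore_nort(flags)	local_irq_restore(flags)
#else
/* RT: only save the flags, do not actually disable interrupts */
# define local_irq_save_nort(flags)	local_save_flags(flags)
# define local_irq_restore_nort(flags)	(void)(flags)
#endif
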
diff --git a/patches/user-use-local-irq-nort.patch b/patches/user-use-local-irq-nort.patch
new file mode 100644
index 00000000000000..3150b0cce30b8d
--- /dev/null
+++ b/patches/user-use-local-irq-nort.patch
@@ -0,0 +1,29 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Tue, 21 Jul 2009 23:06:05 +0200
+Subject: core: Do not disable interrupts on RT in kernel/users.c
+
+Use the local_irq_*_nort variants to reduce latencies in RT. The code
+is serialized by the locks. No need to disable interrupts.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ kernel/user.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/kernel/user.c
++++ b/kernel/user.c
+@@ -161,11 +161,11 @@ void free_uid(struct user_struct *up)
+ if (!up)
+ return;
+
+- local_irq_save(flags);
++ local_irq_save_nort(flags);
+ if (atomic_dec_and_lock(&up->__count, &uidhash_lock))
+ free_user(up, flags);
+ else
+- local_irq_restore(flags);
++ local_irq_restore_nort(flags);
+ }
+
+ struct user_struct *alloc_uid(kuid_t uid)
diff --git a/patches/vtime-split-lock-and-seqcount.patch b/patches/vtime-split-lock-and-seqcount.patch
new file mode 100644
index 00000000000000..5b02e38ea2fb9f
--- /dev/null
+++ b/patches/vtime-split-lock-and-seqcount.patch
@@ -0,0 +1,205 @@
+Subject: vtime: Split lock and seqcount
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Tue, 23 Jul 2013 15:45:51 +0200
+
+Replace the vtime_seqlock seqlock with a simple seqcount and a raw
+spinlock so it can be taken in atomic context on RT.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ include/linux/init_task.h | 3 +-
+ include/linux/sched.h | 3 +-
+ kernel/fork.c | 3 +-
+ kernel/sched/cputime.c | 62 +++++++++++++++++++++++++++++-----------------
+ 4 files changed, 46 insertions(+), 25 deletions(-)
+--- a/include/linux/init_task.h
++++ b/include/linux/init_task.h
+@@ -149,7 +149,8 @@ extern struct task_group root_task_group
+
+ #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+ # define INIT_VTIME(tsk) \
+- .vtime_seqlock = __SEQLOCK_UNLOCKED(tsk.vtime_seqlock), \
++ .vtime_lock = __RAW_SPIN_LOCK_UNLOCKED(tsk.vtime_lock), \
++ .vtime_seq = SEQCNT_ZERO(tsk.vtime_seq), \
+ .vtime_snap = 0, \
+ .vtime_snap_whence = VTIME_SYS,
+ #else
+--- a/include/linux/sched.h
++++ b/include/linux/sched.h
+@@ -1478,7 +1478,8 @@ struct task_struct {
+ struct cputime prev_cputime;
+ #endif
+ #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+- seqlock_t vtime_seqlock;
++ raw_spinlock_t vtime_lock;
++ seqcount_t vtime_seq;
+ unsigned long long vtime_snap;
+ enum {
+ VTIME_SLEEPING = 0,
+--- a/kernel/fork.c
++++ b/kernel/fork.c
+@@ -1345,7 +1345,8 @@ static struct task_struct *copy_process(
+ p->prev_cputime.utime = p->prev_cputime.stime = 0;
+ #endif
+ #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+- seqlock_init(&p->vtime_seqlock);
++ raw_spin_lock_init(&p->vtime_lock);
++ seqcount_init(&p->vtime_seq);
+ p->vtime_snap = 0;
+ p->vtime_snap_whence = VTIME_SLEEPING;
+ #endif
+--- a/kernel/sched/cputime.c
++++ b/kernel/sched/cputime.c
+@@ -675,37 +675,45 @@ static void __vtime_account_system(struc
+
+ void vtime_account_system(struct task_struct *tsk)
+ {
+- write_seqlock(&tsk->vtime_seqlock);
++ raw_spin_lock(&tsk->vtime_lock);
++ write_seqcount_begin(&tsk->vtime_seq);
+ __vtime_account_system(tsk);
+- write_sequnlock(&tsk->vtime_seqlock);
++ write_seqcount_end(&tsk->vtime_seq);
++ raw_spin_unlock(&tsk->vtime_lock);
+ }
+
+ void vtime_gen_account_irq_exit(struct task_struct *tsk)
+ {
+- write_seqlock(&tsk->vtime_seqlock);
++ raw_spin_lock(&tsk->vtime_lock);
++ write_seqcount_begin(&tsk->vtime_seq);
+ __vtime_account_system(tsk);
+ if (context_tracking_in_user())
+ tsk->vtime_snap_whence = VTIME_USER;
+- write_sequnlock(&tsk->vtime_seqlock);
++ write_seqcount_end(&tsk->vtime_seq);
++ raw_spin_unlock(&tsk->vtime_lock);
+ }
+
+ void vtime_account_user(struct task_struct *tsk)
+ {
+ cputime_t delta_cpu;
+
+- write_seqlock(&tsk->vtime_seqlock);
++ raw_spin_lock(&tsk->vtime_lock);
++ write_seqcount_begin(&tsk->vtime_seq);
+ delta_cpu = get_vtime_delta(tsk);
+ tsk->vtime_snap_whence = VTIME_SYS;
+ account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
+- write_sequnlock(&tsk->vtime_seqlock);
++ write_seqcount_end(&tsk->vtime_seq);
++ raw_spin_unlock(&tsk->vtime_lock);
+ }
+
+ void vtime_user_enter(struct task_struct *tsk)
+ {
+- write_seqlock(&tsk->vtime_seqlock);
++ raw_spin_lock(&tsk->vtime_lock);
++ write_seqcount_begin(&tsk->vtime_seq);
+ __vtime_account_system(tsk);
+ tsk->vtime_snap_whence = VTIME_USER;
+- write_sequnlock(&tsk->vtime_seqlock);
++ write_seqcount_end(&tsk->vtime_seq);
++ raw_spin_unlock(&tsk->vtime_lock);
+ }
+
+ void vtime_guest_enter(struct task_struct *tsk)
+@@ -717,19 +725,23 @@ void vtime_guest_enter(struct task_struc
+ * synchronization against the reader (task_gtime())
+ * that can thus safely catch up with a tickless delta.
+ */
+- write_seqlock(&tsk->vtime_seqlock);
++ raw_spin_lock(&tsk->vtime_lock);
++ write_seqcount_begin(&tsk->vtime_seq);
+ __vtime_account_system(tsk);
+ current->flags |= PF_VCPU;
+- write_sequnlock(&tsk->vtime_seqlock);
++ write_seqcount_end(&tsk->vtime_seq);
++ raw_spin_unlock(&tsk->vtime_lock);
+ }
+ EXPORT_SYMBOL_GPL(vtime_guest_enter);
+
+ void vtime_guest_exit(struct task_struct *tsk)
+ {
+- write_seqlock(&tsk->vtime_seqlock);
++ raw_spin_lock(&tsk->vtime_lock);
++ write_seqcount_begin(&tsk->vtime_seq);
+ __vtime_account_system(tsk);
+ current->flags &= ~PF_VCPU;
+- write_sequnlock(&tsk->vtime_seqlock);
++ write_seqcount_end(&tsk->vtime_seq);
++ raw_spin_unlock(&tsk->vtime_lock);
+ }
+ EXPORT_SYMBOL_GPL(vtime_guest_exit);
+
+@@ -742,24 +754,30 @@ void vtime_account_idle(struct task_stru
+
+ void arch_vtime_task_switch(struct task_struct *prev)
+ {
+- write_seqlock(&prev->vtime_seqlock);
++ raw_spin_lock(&prev->vtime_lock);
++ write_seqcount_begin(&prev->vtime_seq);
+ prev->vtime_snap_whence = VTIME_SLEEPING;
+- write_sequnlock(&prev->vtime_seqlock);
++ write_seqcount_end(&prev->vtime_seq);
++ raw_spin_unlock(&prev->vtime_lock);
+
+- write_seqlock(&current->vtime_seqlock);
++ raw_spin_lock(&current->vtime_lock);
++ write_seqcount_begin(&current->vtime_seq);
+ current->vtime_snap_whence = VTIME_SYS;
+ current->vtime_snap = sched_clock_cpu(smp_processor_id());
+- write_sequnlock(&current->vtime_seqlock);
++ write_seqcount_end(&current->vtime_seq);
++ raw_spin_unlock(&current->vtime_lock);
+ }
+
+ void vtime_init_idle(struct task_struct *t, int cpu)
+ {
+ unsigned long flags;
+
+- write_seqlock_irqsave(&t->vtime_seqlock, flags);
++ raw_spin_lock_irqsave(&t->vtime_lock, flags);
++ write_seqcount_begin(&t->vtime_seq);
+ t->vtime_snap_whence = VTIME_SYS;
+ t->vtime_snap = sched_clock_cpu(cpu);
+- write_sequnlock_irqrestore(&t->vtime_seqlock, flags);
++ write_seqcount_end(&t->vtime_seq);
++ raw_spin_unlock_irqrestore(&t->vtime_lock, flags);
+ }
+
+ cputime_t task_gtime(struct task_struct *t)
+@@ -768,13 +786,13 @@ cputime_t task_gtime(struct task_struct
+ cputime_t gtime;
+
+ do {
+- seq = read_seqbegin(&t->vtime_seqlock);
++ seq = read_seqcount_begin(&t->vtime_seq);
+
+ gtime = t->gtime;
+ if (t->flags & PF_VCPU)
+ gtime += vtime_delta(t);
+
+- } while (read_seqretry(&t->vtime_seqlock, seq));
++ } while (read_seqcount_retry(&t->vtime_seq, seq));
+
+ return gtime;
+ }
+@@ -797,7 +815,7 @@ fetch_task_cputime(struct task_struct *t
+ *udelta = 0;
+ *sdelta = 0;
+
+- seq = read_seqbegin(&t->vtime_seqlock);
++ seq = read_seqcount_begin(&t->vtime_seq);
+
+ if (u_dst)
+ *u_dst = *u_src;
+@@ -821,7 +839,7 @@ fetch_task_cputime(struct task_struct *t
+ if (t->vtime_snap_whence == VTIME_SYS)
+ *sdelta = delta;
+ }
+- } while (read_seqretry(&t->vtime_seqlock, seq));
++ } while (read_seqcount_retry(&t->vtime_seq, seq));
+ }
+
+
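
The resulting locking pattern, shown here only as a self-contained
illustration (the sample_* names are made up, this is not part of the patch):
writers serialize on the raw spinlock, which can be taken in atomic context on
RT, and publish via the seqcount; readers use only the seqcount.

/* Illustration of the split lock/seqcount pattern; sample_* names are made up. */
#include <linux/seqlock.h>
#include <linux/spinlock.h>
#include <linux/types.h>

static DEFINE_RAW_SPINLOCK(sample_lock);
static seqcount_t sample_seq = SEQCNT_ZERO(sample_seq);
static u64 sample_value;

static void sample_update(u64 v)
{
	/* Writers: raw spinlock, usable in atomic context on RT */
	raw_spin_lock(&sample_lock);
	write_seqcount_begin(&sample_seq);
	sample_value = v;
	write_seqcount_end(&sample_seq);
	raw_spin_unlock(&sample_lock);
}

static u64 sample_read(void)
{
	unsigned int seq;
	u64 v;

	/* Readers: lockless, retry if a writer was in progress */
	do {
		seq = read_seqcount_begin(&sample_seq);
		v = sample_value;
	} while (read_seqcount_retry(&sample_seq, seq));

	return v;
}
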
diff --git a/patches/wait-simple-implementation.patch b/patches/wait-simple-implementation.patch
new file mode 100644
index 00000000000000..4bbd029be7634d
--- /dev/null
+++ b/patches/wait-simple-implementation.patch
@@ -0,0 +1,362 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Mon, 12 Dec 2011 12:29:04 +0100
+Subject: wait-simple: Simple waitqueue implementation
+
+wait_queue is a Swiss army knife and in most cases its complexity is
+not needed. For RT, waitqueues are a constant source of trouble, as we
+cannot convert the head lock to a raw spinlock due to fancy and
+long-lasting callbacks.
+
+Provide a slim version, which allows RT to replace wait queues. This
+should go mainline as well, as it lowers memory consumption and
+runtime overhead.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+smp_mb() added by Steven Rostedt to fix a race condition with swait
+wakeups vs adding items to the list.
+---
+ include/linux/wait-simple.h | 207 ++++++++++++++++++++++++++++++++++++++++++++
+ kernel/sched/Makefile | 2
+ kernel/sched/wait-simple.c | 115 ++++++++++++++++++++++++
+ 3 files changed, 323 insertions(+), 1 deletion(-)
+
+--- /dev/null
++++ b/include/linux/wait-simple.h
+@@ -0,0 +1,207 @@
++#ifndef _LINUX_WAIT_SIMPLE_H
++#define _LINUX_WAIT_SIMPLE_H
++
++#include <linux/spinlock.h>
++#include <linux/list.h>
++
++#include <asm/current.h>
++
++struct swaiter {
++ struct task_struct *task;
++ struct list_head node;
++};
++
++#define DEFINE_SWAITER(name) \
++ struct swaiter name = { \
++ .task = current, \
++ .node = LIST_HEAD_INIT((name).node), \
++ }
++
++struct swait_head {
++ raw_spinlock_t lock;
++ struct list_head list;
++};
++
++#define SWAIT_HEAD_INITIALIZER(name) { \
++ .lock = __RAW_SPIN_LOCK_UNLOCKED(name.lock), \
++ .list = LIST_HEAD_INIT((name).list), \
++ }
++
++#define DEFINE_SWAIT_HEAD(name) \
++ struct swait_head name = SWAIT_HEAD_INITIALIZER(name)
++
++extern void __init_swait_head(struct swait_head *h, struct lock_class_key *key);
++
++#define init_swait_head(swh) \
++ do { \
++ static struct lock_class_key __key; \
++ \
++ __init_swait_head((swh), &__key); \
++ } while (0)
++
++/*
++ * Waiter functions
++ */
++extern void swait_prepare_locked(struct swait_head *head, struct swaiter *w);
++extern void swait_prepare(struct swait_head *head, struct swaiter *w, int state);
++extern void swait_finish_locked(struct swait_head *head, struct swaiter *w);
++extern void swait_finish(struct swait_head *head, struct swaiter *w);
++
++/* Check whether a head has waiters enqueued */
++static inline bool swaitqueue_active(struct swait_head *h)
++{
++ /* Make sure the condition is visible before checking list_empty() */
++ smp_mb();
++ return !list_empty(&h->list);
++}
++
++/*
++ * Wakeup functions
++ */
++extern unsigned int __swait_wake(struct swait_head *head, unsigned int state, unsigned int num);
++extern unsigned int __swait_wake_locked(struct swait_head *head, unsigned int state, unsigned int num);
++
++#define swait_wake(head) __swait_wake(head, TASK_NORMAL, 1)
++#define swait_wake_interruptible(head) __swait_wake(head, TASK_INTERRUPTIBLE, 1)
++#define swait_wake_all(head) __swait_wake(head, TASK_NORMAL, 0)
++#define swait_wake_all_interruptible(head) __swait_wake(head, TASK_INTERRUPTIBLE, 0)
++
++/*
++ * Event API
++ */
++#define __swait_event(wq, condition) \
++do { \
++ DEFINE_SWAITER(__wait); \
++ \
++ for (;;) { \
++ swait_prepare(&wq, &__wait, TASK_UNINTERRUPTIBLE); \
++ if (condition) \
++ break; \
++ schedule(); \
++ } \
++ swait_finish(&wq, &__wait); \
++} while (0)
++
++/**
++ * swait_event - sleep until a condition gets true
++ * @wq: the waitqueue to wait on
++ * @condition: a C expression for the event to wait for
++ *
++ * The process is put to sleep (TASK_UNINTERRUPTIBLE) until the
++ * @condition evaluates to true. The @condition is checked each time
++ * the waitqueue @wq is woken up.
++ *
++ * wake_up() has to be called after changing any variable that could
++ * change the result of the wait condition.
++ */
++#define swait_event(wq, condition) \
++do { \
++ if (condition) \
++ break; \
++ __swait_event(wq, condition); \
++} while (0)
++
++#define __swait_event_interruptible(wq, condition, ret) \
++do { \
++ DEFINE_SWAITER(__wait); \
++ \
++ for (;;) { \
++ swait_prepare(&wq, &__wait, TASK_INTERRUPTIBLE); \
++ if (condition) \
++ break; \
++ if (signal_pending(current)) { \
++ ret = -ERESTARTSYS; \
++ break; \
++ } \
++ schedule(); \
++ } \
++ swait_finish(&wq, &__wait); \
++} while (0)
++
++#define __swait_event_interruptible_timeout(wq, condition, ret) \
++do { \
++ DEFINE_SWAITER(__wait); \
++ \
++ for (;;) { \
++ swait_prepare(&wq, &__wait, TASK_INTERRUPTIBLE); \
++ if (condition) \
++ break; \
++ if (signal_pending(current)) { \
++ ret = -ERESTARTSYS; \
++ break; \
++ } \
++ ret = schedule_timeout(ret); \
++ if (!ret) \
++ break; \
++ } \
++ swait_finish(&wq, &__wait); \
++} while (0)
++
++/**
++ * swait_event_interruptible - sleep until a condition gets true
++ * @wq: the waitqueue to wait on
++ * @condition: a C expression for the event to wait for
++ *
++ * The process is put to sleep (TASK_INTERRUPTIBLE) until the
++ * @condition evaluates to true. The @condition is checked each time
++ * the waitqueue @wq is woken up.
++ *
++ * wake_up() has to be called after changing any variable that could
++ * change the result of the wait condition.
++ */
++#define swait_event_interruptible(wq, condition) \
++({ \
++ int __ret = 0; \
++ if (!(condition)) \
++ __swait_event_interruptible(wq, condition, __ret); \
++ __ret; \
++})
++
++#define swait_event_interruptible_timeout(wq, condition, timeout) \
++({ \
++ int __ret = timeout; \
++ if (!(condition)) \
++ __swait_event_interruptible_timeout(wq, condition, __ret); \
++ __ret; \
++})
++
++#define __swait_event_timeout(wq, condition, ret) \
++do { \
++ DEFINE_SWAITER(__wait); \
++ \
++ for (;;) { \
++ swait_prepare(&wq, &__wait, TASK_UNINTERRUPTIBLE); \
++ if (condition) \
++ break; \
++ ret = schedule_timeout(ret); \
++ if (!ret) \
++ break; \
++ } \
++ swait_finish(&wq, &__wait); \
++} while (0)
++
++/**
++ * swait_event_timeout - sleep until a condition gets true or a timeout elapses
++ * @wq: the waitqueue to wait on
++ * @condition: a C expression for the event to wait for
++ * @timeout: timeout, in jiffies
++ *
++ * The process is put to sleep (TASK_UNINTERRUPTIBLE) until the
++ * @condition evaluates to true. The @condition is checked each time
++ * the waitqueue @wq is woken up.
++ *
++ * wake_up() has to be called after changing any variable that could
++ * change the result of the wait condition.
++ *
++ * The function returns 0 if the @timeout elapsed, and the remaining
++ * jiffies if the condition evaluated to true before the timeout elapsed.
++ */
++#define swait_event_timeout(wq, condition, timeout) \
++({ \
++ long __ret = timeout; \
++ if (!(condition)) \
++ __swait_event_timeout(wq, condition, __ret); \
++ __ret; \
++})
++
++#endif
+--- a/kernel/sched/Makefile
++++ b/kernel/sched/Makefile
+@@ -13,7 +13,7 @@ endif
+
+ obj-y += core.o proc.o clock.o cputime.o
+ obj-y += idle_task.o fair.o rt.o deadline.o stop_task.o
+-obj-y += wait.o completion.o idle.o
++obj-y += wait.o wait-simple.o completion.o idle.o
+ obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o
+ obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
+ obj-$(CONFIG_SCHEDSTATS) += stats.o
+--- /dev/null
++++ b/kernel/sched/wait-simple.c
+@@ -0,0 +1,115 @@
++/*
++ * Simple waitqueues without fancy flags and callbacks
++ *
++ * (C) 2011 Thomas Gleixner <tglx@linutronix.de>
++ *
++ * Based on kernel/wait.c
++ *
++ * For licencing details see kernel-base/COPYING
++ */
++#include <linux/init.h>
++#include <linux/export.h>
++#include <linux/sched.h>
++#include <linux/wait-simple.h>
++
++/* Adds w to head->list. Must be called with head->lock locked. */
++static inline void __swait_enqueue(struct swait_head *head, struct swaiter *w)
++{
++ list_add(&w->node, &head->list);
++ /* We can't let the condition leak before the setting of head */
++ smp_mb();
++}
++
++/* Removes w from head->list. Must be called with head->lock locked. */
++static inline void __swait_dequeue(struct swaiter *w)
++{
++ list_del_init(&w->node);
++}
++
++void __init_swait_head(struct swait_head *head, struct lock_class_key *key)
++{
++ raw_spin_lock_init(&head->lock);
++ lockdep_set_class(&head->lock, key);
++ INIT_LIST_HEAD(&head->list);
++}
++EXPORT_SYMBOL(__init_swait_head);
++
++void swait_prepare_locked(struct swait_head *head, struct swaiter *w)
++{
++ w->task = current;
++ if (list_empty(&w->node))
++ __swait_enqueue(head, w);
++}
++
++void swait_prepare(struct swait_head *head, struct swaiter *w, int state)
++{
++ unsigned long flags;
++
++ raw_spin_lock_irqsave(&head->lock, flags);
++ swait_prepare_locked(head, w);
++ __set_current_state(state);
++ raw_spin_unlock_irqrestore(&head->lock, flags);
++}
++EXPORT_SYMBOL(swait_prepare);
++
++void swait_finish_locked(struct swait_head *head, struct swaiter *w)
++{
++ __set_current_state(TASK_RUNNING);
++ if (w->task)
++ __swait_dequeue(w);
++}
++
++void swait_finish(struct swait_head *head, struct swaiter *w)
++{
++ unsigned long flags;
++
++ __set_current_state(TASK_RUNNING);
++ if (w->task) {
++ raw_spin_lock_irqsave(&head->lock, flags);
++ __swait_dequeue(w);
++ raw_spin_unlock_irqrestore(&head->lock, flags);
++ }
++}
++EXPORT_SYMBOL(swait_finish);
++
++unsigned int
++__swait_wake_locked(struct swait_head *head, unsigned int state, unsigned int num)
++{
++ struct swaiter *curr, *next;
++ int woken = 0;
++
++ list_for_each_entry_safe(curr, next, &head->list, node) {
++ if (wake_up_state(curr->task, state)) {
++ __swait_dequeue(curr);
++ /*
++ * The waiting task can free the waiter as
++ * soon as curr->task = NULL is written,
++ * without taking any locks. A memory barrier
++ * is required here to prevent the following
++ * store to curr->task from getting ahead of
++ * the dequeue operation.
++ */
++ smp_wmb();
++ curr->task = NULL;
++ if (++woken == num)
++ break;
++ }
++ }
++ return woken;
++}
++
++unsigned int
++__swait_wake(struct swait_head *head, unsigned int state, unsigned int num)
++{
++ unsigned long flags;
++ int woken;
++
++ if (!swaitqueue_active(head))
++ return 0;
++
++ raw_spin_lock_irqsave(&head->lock, flags);
++ woken = __swait_wake_locked(head, state, num);
++ raw_spin_unlock_irqrestore(&head->lock, flags);
++ return woken;
++}
++EXPORT_SYMBOL(__swait_wake);
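
A minimal, purely hypothetical user of the new API (the my_* names are made
up) looks like this:

/* Hypothetical user of the simple waitqueue API; my_* names are made up. */
#include <linux/sched.h>
#include <linux/types.h>
#include <linux/wait-simple.h>

static DEFINE_SWAIT_HEAD(my_swait);
static bool my_condition;

static void my_waiter(void)
{
	/* Sleeps in TASK_UNINTERRUPTIBLE until my_condition is true */
	swait_event(my_swait, my_condition);
}

static void my_waker(void)
{
	my_condition = true;
	/* Wake one waiter; swait_wake_all() would wake all of them */
	swait_wake(&my_swait);
}

As with regular waitqueues, the condition itself still needs its own
serialization; the swait head only provides the sleep and wakeup half.
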
diff --git a/patches/wait.h-include-atomic.h.patch b/patches/wait.h-include-atomic.h.patch
new file mode 100644
index 00000000000000..9274a288c62376
--- /dev/null
+++ b/patches/wait.h-include-atomic.h.patch
@@ -0,0 +1,32 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Mon, 28 Oct 2013 12:19:57 +0100
+Subject: wait.h: include atomic.h
+
+| CC init/main.o
+|In file included from include/linux/mmzone.h:9:0,
+| from include/linux/gfp.h:4,
+| from include/linux/kmod.h:22,
+| from include/linux/module.h:13,
+| from init/main.c:15:
+|include/linux/wait.h: In function ‘wait_on_atomic_t’:
+|include/linux/wait.h:982:2: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration]
+| if (atomic_read(val) == 0)
+| ^
+
+This pops up on ARM. Non-RT gets its atomic.h include from spinlock.h
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ include/linux/wait.h | 1 +
+ 1 file changed, 1 insertion(+)
+
+--- a/include/linux/wait.h
++++ b/include/linux/wait.h
+@@ -8,6 +8,7 @@
+ #include <linux/spinlock.h>
+ #include <asm/current.h>
+ #include <uapi/linux/wait.h>
++#include <linux/atomic.h>
+
+ typedef struct __wait_queue wait_queue_t;
+ typedef int (*wait_queue_func_t)(wait_queue_t *wait, unsigned mode, int flags, void *key);
diff --git a/patches/work-queue-work-around-irqsafe-timer-optimization.patch b/patches/work-queue-work-around-irqsafe-timer-optimization.patch
new file mode 100644
index 00000000000000..20956776fff3bc
--- /dev/null
+++ b/patches/work-queue-work-around-irqsafe-timer-optimization.patch
@@ -0,0 +1,132 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Mon, 01 Jul 2013 11:02:42 +0200
+Subject: workqueue: Prevent workqueue versus ata-piix livelock
+
+An Intel i7 system regularly detected rcu_preempt stalls after the kernel
+was upgraded from 3.6-rt to 3.8-rt. When the stall happened, disk I/O was no
+longer possible, unless the system was restarted.
+
+The kernel message was:
+INFO: rcu_preempt self-detected stall on CPU { 6}
+[..]
+NMI backtrace for cpu 6
+CPU 6
+Pid: 119, comm: irq/19-ata_piix Not tainted 3.8.13-rt13 #11 Shuttle Inc. SX58/SX58
+RIP: 0010:[<ffffffff8124ca60>] [<ffffffff8124ca60>] ip_compute_csum+0x30/0x30
+RSP: 0018:ffff880333303cb0 EFLAGS: 00000002
+RAX: 0000000000000006 RBX: 00000000000003e9 RCX: 0000000000000034
+RDX: 0000000000000000 RSI: ffffffff81aa16d0 RDI: 0000000000000001
+RBP: ffff880333303ce8 R08: ffffffff81aa16d0 R09: ffffffff81c1b8cc
+R10: 0000000000000000 R11: 0000000000000000 R12: 000000000005161f
+R13: 0000000000000006 R14: ffffffff81aa16d0 R15: 0000000000000002
+FS: 0000000000000000(0000) GS:ffff880333300000(0000) knlGS:0000000000000000
+CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
+CR2: 0000003c1b2bb420 CR3: 0000000001a0f000 CR4: 00000000000007e0
+DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
+DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
+Process irq/19-ata_piix (pid: 119, threadinfo ffff88032d88a000, task ffff88032df80000)
+Stack:
+ffffffff8124cb32 000000000005161e 00000000000003e9 0000000000001000
+0000000000009022 ffffffff81aa16d0 0000000000000002 ffff880333303cf8
+ffffffff8124caa9 ffff880333303d08 ffffffff8124cad2 ffff880333303d28
+Call Trace:
+<IRQ>
+[<ffffffff8124cb32>] ? delay_tsc+0x33/0xe3
+[<ffffffff8124caa9>] __delay+0xf/0x11
+[<ffffffff8124cad2>] __const_udelay+0x27/0x29
+[<ffffffff8102d1fa>] native_safe_apic_wait_icr_idle+0x39/0x45
+[<ffffffff8102dc9b>] __default_send_IPI_dest_field.constprop.0+0x1e/0x58
+[<ffffffff8102dd1e>] default_send_IPI_mask_sequence_phys+0x49/0x7d
+[<ffffffff81030326>] physflat_send_IPI_all+0x17/0x19
+[<ffffffff8102de53>] arch_trigger_all_cpu_backtrace+0x50/0x79
+[<ffffffff810b21d0>] rcu_check_callbacks+0x1cb/0x568
+[<ffffffff81048c9c>] ? raise_softirq+0x2e/0x35
+[<ffffffff81086be0>] ? tick_sched_do_timer+0x38/0x38
+[<ffffffff8104f653>] update_process_times+0x44/0x55
+[<ffffffff81086866>] tick_sched_handle+0x4a/0x59
+[<ffffffff81086c1c>] tick_sched_timer+0x3c/0x5b
+[<ffffffff81062845>] __run_hrtimer+0x9b/0x158
+[<ffffffff810631d8>] hrtimer_interrupt+0x172/0x2aa
+[<ffffffff8102d498>] smp_apic_timer_interrupt+0x76/0x89
+[<ffffffff814d881d>] apic_timer_interrupt+0x6d/0x80
+<EOI>
+[<ffffffff81057cd2>] ? __local_lock_irqsave+0x17/0x4a
+[<ffffffff81059336>] try_to_grab_pending+0x42/0x17e
+[<ffffffff8105a699>] mod_delayed_work_on+0x32/0x88
+[<ffffffff8105a70b>] mod_delayed_work+0x1c/0x1e
+[<ffffffff8122ae84>] blk_run_queue_async+0x37/0x39
+[<ffffffff81230985>] flush_end_io+0xf1/0x107
+[<ffffffff8122e0da>] blk_finish_request+0x21e/0x264
+[<ffffffff8122e162>] blk_end_bidi_request+0x42/0x60
+[<ffffffff8122e1ba>] blk_end_request+0x10/0x12
+[<ffffffff8132de46>] scsi_io_completion+0x1bf/0x492
+[<ffffffff81335cec>] ? sd_done+0x298/0x2ef
+[<ffffffff81325a02>] scsi_finish_command+0xe9/0xf2
+[<ffffffff8132dbcb>] scsi_softirq_done+0x106/0x10f
+[<ffffffff812333d3>] blk_done_softirq+0x77/0x87
+[<ffffffff8104826f>] do_current_softirqs+0x172/0x2e1
+[<ffffffff810aa820>] ? irq_thread_fn+0x3a/0x3a
+[<ffffffff81048466>] local_bh_enable+0x43/0x72
+[<ffffffff810aa866>] irq_forced_thread_fn+0x46/0x52
+[<ffffffff810ab089>] irq_thread+0x8c/0x17c
+[<ffffffff810ab179>] ? irq_thread+0x17c/0x17c
+[<ffffffff810aaffd>] ? wake_threads_waitq+0x44/0x44
+[<ffffffff8105eb18>] kthread+0x8d/0x95
+[<ffffffff8105ea8b>] ? __kthread_parkme+0x65/0x65
+[<ffffffff814d7b7c>] ret_from_fork+0x7c/0xb0
+[<ffffffff8105ea8b>] ? __kthread_parkme+0x65/0x65
+
+The state of softirqd of this CPU at the time of the crash was:
+ksoftirqd/6 R running task 0 53 2 0x00000000
+ffff88032fc39d18 0000000000000046 ffff88033330c4c0 ffff8803303f4710
+ffff88032fc39fd8 ffff88032fc39fd8 0000000000000000 0000000000062500
+ffff88032df88000 ffff8803303f4710 0000000000000000 ffff88032fc38000
+Call Trace:
+[<ffffffff8105a3ae>] ? __queue_work+0x27c/0x27c
+[<ffffffff814d178c>] preempt_schedule+0x61/0x76
+[<ffffffff8106cccf>] migrate_enable+0xe5/0x1df
+[<ffffffff8105a3ae>] ? __queue_work+0x27c/0x27c
+[<ffffffff8104ef52>] run_timer_softirq+0x161/0x1d6
+[<ffffffff8104826f>] do_current_softirqs+0x172/0x2e1
+[<ffffffff8104840b>] run_ksoftirqd+0x2d/0x45
+[<ffffffff8106658a>] smpboot_thread_fn+0x2ea/0x308
+[<ffffffff810662a0>] ? test_ti_thread_flag+0xc/0xc
+[<ffffffff810662a0>] ? test_ti_thread_flag+0xc/0xc
+[<ffffffff8105eb18>] kthread+0x8d/0x95
+[<ffffffff8105ea8b>] ? __kthread_parkme+0x65/0x65
+[<ffffffff814d7afc>] ret_from_fork+0x7c/0xb0
+[<ffffffff8105ea8b>] ? __kthread_parkme+0x65/0x65
+
+Apparently, the softirq daemon and the ata_piix IRQ handler were waiting
+for each other to finish, ending up in a livelock. After the patch below
+was applied, the system no longer crashed.
+
+Reported-by: Carsten Emde <C.Emde@osadl.org>
+Proposed-by: Thomas Gleixner <tglx@linutronix.de>
+Tested-by: Carsten Emde <C.Emde@osadl.org>
+Signed-off-by: Carsten Emde <C.Emde@osadl.org>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ kernel/workqueue.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+--- a/kernel/workqueue.c
++++ b/kernel/workqueue.c
+@@ -49,6 +49,7 @@
+ #include <linux/moduleparam.h>
+ #include <linux/uaccess.h>
+ #include <linux/locallock.h>
++#include <linux/delay.h>
+
+ #include "workqueue_internal.h"
+
+@@ -1239,7 +1240,7 @@ static int try_to_grab_pending(struct wo
+ local_unlock_irqrestore(pendingb_lock, *flags);
+ if (work_is_canceling(work))
+ return -ENOENT;
+- cpu_relax();
++ cpu_chill();
+ return -EAGAIN;
+ }
+
diff --git a/patches/work-simple-Simple-work-queue-implemenation.patch b/patches/work-simple-Simple-work-queue-implemenation.patch
new file mode 100644
index 00000000000000..c1f3c0bb98ef30
--- /dev/null
+++ b/patches/work-simple-Simple-work-queue-implemenation.patch
@@ -0,0 +1,232 @@
+From: Daniel Wagner <daniel.wagner@bmw-carit.de>
+Date: Fri, 11 Jul 2014 15:26:11 +0200
+Subject: work-simple: Simple work queue implementation
+
+Provides a framework for enqueuing callbacks from irq context in a
+PREEMPT_RT_FULL-safe way. The callbacks are executed in kthread context.
+
+Based on wait-simple.
+
+Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
+Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ include/linux/work-simple.h | 24 ++++++
+ kernel/sched/Makefile | 2
+ kernel/sched/work-simple.c | 172 ++++++++++++++++++++++++++++++++++++++++++++
+ 3 files changed, 197 insertions(+), 1 deletion(-)
+ create mode 100644 include/linux/work-simple.h
+ create mode 100644 kernel/sched/work-simple.c
+
+--- /dev/null
++++ b/include/linux/work-simple.h
+@@ -0,0 +1,24 @@
++#ifndef _LINUX_SWORK_H
++#define _LINUX_SWORK_H
++
++#include <linux/list.h>
++
++struct swork_event {
++ struct list_head item;
++ unsigned long flags;
++ void (*func)(struct swork_event *);
++};
++
++static inline void INIT_SWORK(struct swork_event *event,
++ void (*func)(struct swork_event *))
++{
++ event->flags = 0;
++ event->func = func;
++}
++
++bool swork_queue(struct swork_event *sev);
++
++int swork_get(void);
++void swork_put(void);
++
++#endif /* _LINUX_SWORK_H */
+--- a/kernel/sched/Makefile
++++ b/kernel/sched/Makefile
+@@ -13,7 +13,7 @@ endif
+
+ obj-y += core.o proc.o clock.o cputime.o
+ obj-y += idle_task.o fair.o rt.o deadline.o stop_task.o
+-obj-y += wait.o wait-simple.o completion.o idle.o
++obj-y += wait.o wait-simple.o work-simple.o completion.o idle.o
+ obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o
+ obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
+ obj-$(CONFIG_SCHEDSTATS) += stats.o
+--- /dev/null
++++ b/kernel/sched/work-simple.c
+@@ -0,0 +1,172 @@
++/*
++ * Copyright (C) 2014 BMW Car IT GmbH, Daniel Wagner daniel.wagner@bmw-carit.de
++ *
++ * Provides a framework for enqueuing callbacks from irq context
++ * PREEMPT_RT_FULL safe. The callbacks are executed in kthread context.
++ */
++
++#include <linux/wait-simple.h>
++#include <linux/work-simple.h>
++#include <linux/kthread.h>
++#include <linux/slab.h>
++#include <linux/spinlock.h>
++
++#define SWORK_EVENT_PENDING (1 << 0)
++
++static DEFINE_MUTEX(worker_mutex);
++static struct sworker *glob_worker;
++
++struct sworker {
++ struct list_head events;
++ struct swait_head wq;
++
++ raw_spinlock_t lock;
++
++ struct task_struct *task;
++ int refs;
++};
++
++static bool swork_readable(struct sworker *worker)
++{
++ bool r;
++
++ if (kthread_should_stop())
++ return true;
++
++ raw_spin_lock_irq(&worker->lock);
++ r = !list_empty(&worker->events);
++ raw_spin_unlock_irq(&worker->lock);
++
++ return r;
++}
++
++static int swork_kthread(void *arg)
++{
++ struct sworker *worker = arg;
++
++ for (;;) {
++ swait_event_interruptible(worker->wq,
++ swork_readable(worker));
++ if (kthread_should_stop())
++ break;
++
++ raw_spin_lock_irq(&worker->lock);
++ while (!list_empty(&worker->events)) {
++ struct swork_event *sev;
++
++ sev = list_first_entry(&worker->events,
++ struct swork_event, item);
++ list_del(&sev->item);
++ raw_spin_unlock_irq(&worker->lock);
++
++ WARN_ON_ONCE(!test_and_clear_bit(SWORK_EVENT_PENDING,
++ &sev->flags));
++ sev->func(sev);
++ raw_spin_lock_irq(&worker->lock);
++ }
++ raw_spin_unlock_irq(&worker->lock);
++ }
++ return 0;
++}
++
++static struct sworker *swork_create(void)
++{
++ struct sworker *worker;
++
++ worker = kzalloc(sizeof(*worker), GFP_KERNEL);
++ if (!worker)
++ return ERR_PTR(-ENOMEM);
++
++ INIT_LIST_HEAD(&worker->events);
++ raw_spin_lock_init(&worker->lock);
++ init_swait_head(&worker->wq);
++
++ worker->task = kthread_run(swork_kthread, worker, "kswork");
++ if (IS_ERR(worker->task)) {
++ kfree(worker);
++ return ERR_PTR(-ENOMEM);
++ }
++
++ return worker;
++}
++
++static void swork_destroy(struct sworker *worker)
++{
++ kthread_stop(worker->task);
++
++ WARN_ON(!list_empty(&worker->events));
++ kfree(worker);
++}
++
++/**
++ * swork_queue - queue swork
++ *
++ * Returns %false if @work was already on a queue, %true otherwise.
++ *
++ * The work is queued and processed on a random CPU
++ */
++bool swork_queue(struct swork_event *sev)
++{
++ unsigned long flags;
++
++ if (test_and_set_bit(SWORK_EVENT_PENDING, &sev->flags))
++ return false;
++
++ raw_spin_lock_irqsave(&glob_worker->lock, flags);
++ list_add_tail(&sev->item, &glob_worker->events);
++ raw_spin_unlock_irqrestore(&glob_worker->lock, flags);
++
++ swait_wake(&glob_worker->wq);
++ return true;
++}
++EXPORT_SYMBOL_GPL(swork_queue);
++
++/**
++ * swork_get - get an instance of the sworker
++ *
++ * Returns a negative error code if the initialization of the worker did not
++ * work, %0 otherwise.
++ *
++ */
++int swork_get(void)
++{
++ struct sworker *worker;
++
++ mutex_lock(&worker_mutex);
++ if (!glob_worker) {
++ worker = swork_create();
++ if (IS_ERR(worker)) {
++ mutex_unlock(&worker_mutex);
++ return -ENOMEM;
++ }
++
++ glob_worker = worker;
++ }
++
++ glob_worker->refs++;
++ mutex_unlock(&worker_mutex);
++
++ return 0;
++}
++EXPORT_SYMBOL_GPL(swork_get);
++
++/**
++ * swork_put - puts an instance of the sworker
++ *
++ * Will destroy the sworker thread. This function must not be called until all
++ * queued events have been completed.
++ */
++void swork_put(void)
++{
++ mutex_lock(&worker_mutex);
++
++ glob_worker->refs--;
++ if (glob_worker->refs > 0)
++ goto out;
++
++ swork_destroy(glob_worker);
++ glob_worker = NULL;
++out:
++ mutex_unlock(&worker_mutex);
++}
++EXPORT_SYMBOL_GPL(swork_put);
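
A minimal, purely hypothetical user of the swork API (the my_* names are made
up):

/* Hypothetical user of the swork API; my_* names are made up. */
#include <linux/interrupt.h>
#include <linux/work-simple.h>

static struct swork_event my_event;

static void my_event_fn(struct swork_event *sev)
{
	/* Runs in the kswork kthread, fully preemptible even on RT */
}

static int my_driver_init(void)
{
	int ret;

	ret = swork_get();	/* create or reference the kswork thread */
	if (ret)
		return ret;

	INIT_SWORK(&my_event, my_event_fn);
	return 0;
}

static irqreturn_t my_irq_handler(int irq, void *data)
{
	/* Safe from (threaded) irq context with PREEMPT_RT_FULL */
	swork_queue(&my_event);
	return IRQ_HANDLED;
}

static void my_driver_exit(void)
{
	swork_put();	/* the last reference destroys the kswork thread */
}
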
diff --git a/patches/workqueue-distangle-from-rq-lock.patch b/patches/workqueue-distangle-from-rq-lock.patch
new file mode 100644
index 00000000000000..64e1a531f59cf7
--- /dev/null
+++ b/patches/workqueue-distangle-from-rq-lock.patch
@@ -0,0 +1,260 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 22 Jun 2011 19:47:03 +0200
+Subject: sched: Distangle worker accounting from rqlock
+
+The worker accounting for cpu bound workers is plugged into the core
+scheduler code and the wakeup code. This is not a hard requirement and
+can be avoided by keeping track of the state in the workqueue code
+itself.
+
+Keep track of the sleeping state in the worker itself and call the
+notifier before entering the core scheduler. There might be false
+positives when the task is woken between that call and actually
+scheduling, but that's not really different from scheduling and being
+woken immediately after switching away. There is also no harm from
+updating nr_running when the task returns from scheduling instead of
+accounting it in the wakeup code.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: Peter Zijlstra <peterz@infradead.org>
+Cc: Tejun Heo <tj@kernel.org>
+Cc: Jens Axboe <axboe@kernel.dk>
+Cc: Linus Torvalds <torvalds@linux-foundation.org>
+Link: http://lkml.kernel.org/r/20110622174919.135236139@linutronix.de
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ kernel/sched/core.c | 70 +++++++++-----------------------------------
+ kernel/workqueue.c | 55 ++++++++++++++--------------------
+ kernel/workqueue_internal.h | 5 +--
+ 3 files changed, 41 insertions(+), 89 deletions(-)
+
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -1517,10 +1517,6 @@ static void ttwu_activate(struct rq *rq,
+ {
+ activate_task(rq, p, en_flags);
+ p->on_rq = TASK_ON_RQ_QUEUED;
+-
+- /* if a worker is waking up, notify workqueue */
+- if (p->flags & PF_WQ_WORKER)
+- wq_worker_waking_up(p, cpu_of(rq));
+ }
+
+ /*
+@@ -1797,42 +1793,6 @@ try_to_wake_up(struct task_struct *p, un
+ }
+
+ /**
+- * try_to_wake_up_local - try to wake up a local task with rq lock held
+- * @p: the thread to be awakened
+- *
+- * Put @p on the run-queue if it's not already there. The caller must
+- * ensure that this_rq() is locked, @p is bound to this_rq() and not
+- * the current task.
+- */
+-static void try_to_wake_up_local(struct task_struct *p)
+-{
+- struct rq *rq = task_rq(p);
+-
+- if (WARN_ON_ONCE(rq != this_rq()) ||
+- WARN_ON_ONCE(p == current))
+- return;
+-
+- lockdep_assert_held(&rq->lock);
+-
+- if (!raw_spin_trylock(&p->pi_lock)) {
+- raw_spin_unlock(&rq->lock);
+- raw_spin_lock(&p->pi_lock);
+- raw_spin_lock(&rq->lock);
+- }
+-
+- if (!(p->state & TASK_NORMAL))
+- goto out;
+-
+- if (!task_on_rq_queued(p))
+- ttwu_activate(rq, p, ENQUEUE_WAKEUP);
+-
+- ttwu_do_wakeup(rq, p, 0);
+- ttwu_stat(p, smp_processor_id(), 0);
+-out:
+- raw_spin_unlock(&p->pi_lock);
+-}
+-
+-/**
+ * wake_up_process - Wake up a specific process
+ * @p: The process to be woken up.
+ *
+@@ -2995,21 +2955,6 @@ static void __sched __schedule(void)
+ } else {
+ deactivate_task(rq, prev, DEQUEUE_SLEEP);
+ prev->on_rq = 0;
+-
+- /*
+- * If a worker went to sleep, notify and ask workqueue
+- * whether it wants to wake up a task to maintain
+- * concurrency.
+- * Only call wake up if prev isn't blocked on a sleeping
+- * spin lock.
+- */
+- if (prev->flags & PF_WQ_WORKER && !prev->saved_state) {
+- struct task_struct *to_wakeup;
+-
+- to_wakeup = wq_worker_sleeping(prev, cpu);
+- if (to_wakeup)
+- try_to_wake_up_local(to_wakeup);
+- }
+ }
+ switch_count = &prev->nvcsw;
+ }
+@@ -3041,6 +2986,14 @@ static inline void sched_submit_work(str
+ {
+ if (!tsk->state || tsk_is_pi_blocked(tsk))
+ return;
++
++ /*
++ * If a worker went to sleep, notify and ask workqueue whether
++ * it wants to wake up a task to maintain concurrency.
++ */
++ if (tsk->flags & PF_WQ_WORKER)
++ wq_worker_sleeping(tsk);
++
+ /*
+ * If we are going to sleep and we have plugged IO queued,
+ * make sure to submit it to avoid deadlocks.
+@@ -3049,6 +3002,12 @@ static inline void sched_submit_work(str
+ blk_schedule_flush_plug(tsk);
+ }
+
++static void sched_update_worker(struct task_struct *tsk)
++{
++ if (tsk->flags & PF_WQ_WORKER)
++ wq_worker_running(tsk);
++}
++
+ asmlinkage __visible void __sched schedule(void)
+ {
+ struct task_struct *tsk = current;
+@@ -3057,6 +3016,7 @@ asmlinkage __visible void __sched schedu
+ do {
+ __schedule();
+ } while (need_resched());
++ sched_update_worker(tsk);
+ }
+ EXPORT_SYMBOL(schedule);
+
+--- a/kernel/workqueue.c
++++ b/kernel/workqueue.c
+@@ -804,44 +804,31 @@ static void wake_up_worker(struct worker
+ }
+
+ /**
+- * wq_worker_waking_up - a worker is waking up
+- * @task: task waking up
+- * @cpu: CPU @task is waking up to
++ * wq_worker_running - a worker is running again
++ * @task: task returning from sleep
+ *
+- * This function is called during try_to_wake_up() when a worker is
+- * being awoken.
+- *
+- * CONTEXT:
+- * spin_lock_irq(rq->lock)
++ * This function is called when a worker returns from schedule()
+ */
+-void wq_worker_waking_up(struct task_struct *task, int cpu)
++void wq_worker_running(struct task_struct *task)
+ {
+ struct worker *worker = kthread_data(task);
+
+- if (!(worker->flags & WORKER_NOT_RUNNING)) {
+- WARN_ON_ONCE(worker->pool->cpu != cpu);
++ if (!worker->sleeping)
++ return;
++ if (!(worker->flags & WORKER_NOT_RUNNING))
+ atomic_inc(&worker->pool->nr_running);
+- }
++ worker->sleeping = 0;
+ }
+
+ /**
+ * wq_worker_sleeping - a worker is going to sleep
+ * @task: task going to sleep
+- * @cpu: CPU in question, must be the current CPU number
+- *
+- * This function is called during schedule() when a busy worker is
+- * going to sleep. Worker on the same cpu can be woken up by
+- * returning pointer to its task.
+- *
+- * CONTEXT:
+- * spin_lock_irq(rq->lock)
+- *
+- * Return:
+- * Worker task on @cpu to wake up, %NULL if none.
++ * This function is called from schedule() when a busy worker is
++ * going to sleep.
+ */
+-struct task_struct *wq_worker_sleeping(struct task_struct *task, int cpu)
++void wq_worker_sleeping(struct task_struct *task)
+ {
+- struct worker *worker = kthread_data(task), *to_wakeup = NULL;
++ struct worker *next, *worker = kthread_data(task);
+ struct worker_pool *pool;
+
+ /*
+@@ -850,14 +837,15 @@ struct task_struct *wq_worker_sleeping(s
+ * checking NOT_RUNNING.
+ */
+ if (worker->flags & WORKER_NOT_RUNNING)
+- return NULL;
++ return;
+
+ pool = worker->pool;
+
+- /* this can only happen on the local cpu */
+- if (WARN_ON_ONCE(cpu != raw_smp_processor_id() || pool->cpu != cpu))
+- return NULL;
++ if (WARN_ON_ONCE(worker->sleeping))
++ return;
+
++ worker->sleeping = 1;
++ spin_lock_irq(&pool->lock);
+ /*
+ * The counterpart of the following dec_and_test, implied mb,
+ * worklist not empty test sequence is in insert_work().
+@@ -870,9 +858,12 @@ struct task_struct *wq_worker_sleeping(s
+ * lock is safe.
+ */
+ if (atomic_dec_and_test(&pool->nr_running) &&
+- !list_empty(&pool->worklist))
+- to_wakeup = first_idle_worker(pool);
+- return to_wakeup ? to_wakeup->task : NULL;
++ !list_empty(&pool->worklist)) {
++ next = first_idle_worker(pool);
++ if (next)
++ wake_up_process(next->task);
++ }
++ spin_unlock_irq(&pool->lock);
+ }
+
+ /**
+--- a/kernel/workqueue_internal.h
++++ b/kernel/workqueue_internal.h
+@@ -43,6 +43,7 @@ struct worker {
+ unsigned long last_active; /* L: last active timestamp */
+ unsigned int flags; /* X: flags */
+ int id; /* I: worker id */
++ int sleeping; /* None */
+
+ /*
+ * Opaque string set with work_set_desc(). Printed out with task
+@@ -68,7 +69,7 @@ static inline struct worker *current_wq_
+ * Scheduler hooks for concurrency managed workqueue. Only to be used from
+ * sched/core.c and workqueue.c.
+ */
+-void wq_worker_waking_up(struct task_struct *task, int cpu);
+-struct task_struct *wq_worker_sleeping(struct task_struct *task, int cpu);
++void wq_worker_running(struct task_struct *task);
++void wq_worker_sleeping(struct task_struct *task);
+
+ #endif /* _KERNEL_WORKQUEUE_INTERNAL_H */
diff --git a/patches/workqueue-prevent-deadlock-stall.patch b/patches/workqueue-prevent-deadlock-stall.patch
new file mode 100644
index 00000000000000..6f06c1d7996b39
--- /dev/null
+++ b/patches/workqueue-prevent-deadlock-stall.patch
@@ -0,0 +1,200 @@
+Subject: workqueue: Prevent deadlock/stall on RT
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Fri, 27 Jun 2014 16:24:52 +0200 (CEST)
+
+Austin reported an XFS deadlock/stall on RT where scheduled work never
+gets executed and tasks wait for each other forever.
+
+The underlying problem is the RT modification of the handling of
+workers which are about to go to sleep. In mainline, a worker thread
+which goes to sleep wakes an idle worker if there is more work to do.
+This happens from the guts of the schedule() function. On RT this must
+happen outside of schedule(), and the accessed data structures are not
+protected against scheduling due to the spinlock-to-rtmutex conversion.
+So the naive solution was to move the code outside of the scheduler and
+protect the data structures with the pool lock. That approach turned
+out to be a little too naive, as we cannot call into that code when the
+thread blocks on a lock: it is not allowed to block on two locks in
+parallel. So we don't call into the worker wakeup magic when the worker
+is blocked on a lock, which causes the deadlock/stall observed by
+Austin and Mike.
+
+Looking deeper into that worker code it turns out that the only
+relevant data structure which needs to be protected is the list of
+idle workers which can be woken up.
+
+So the solution is to protect the list manipulation operations with
+preempt_enable/disable pairs on RT and call unconditionally into the
+worker code even when the worker is blocked on a lock. The preemption
+protection is safe as there is nothing which can fiddle with the list
+outside of thread context.
+
+Reported-and-tested-by: Austin Schuh <austin@peloton-tech.com>
+Reported-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Link: http://vger.kernel.org/r/alpine.DEB.2.10.1406271249510.5170@nanos
+Cc: Richard Weinberger <richard.weinberger@gmail.com>
+Cc: Steven Rostedt <rostedt@goodmis.org>
+
+---
+ kernel/sched/core.c | 7 ++++-
+ kernel/workqueue.c | 61 ++++++++++++++++++++++++++++++++++++++++------------
+ 2 files changed, 53 insertions(+), 15 deletions(-)
+
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -3022,9 +3022,8 @@ static void __sched __schedule(void)
+
+ static inline void sched_submit_work(struct task_struct *tsk)
+ {
+- if (!tsk->state || tsk_is_pi_blocked(tsk))
++ if (!tsk->state)
+ return;
+-
+ /*
+ * If a worker went to sleep, notify and ask workqueue whether
+ * it wants to wake up a task to maintain concurrency.
+@@ -3032,6 +3031,10 @@ static inline void sched_submit_work(str
+ if (tsk->flags & PF_WQ_WORKER)
+ wq_worker_sleeping(tsk);
+
++
++ if (tsk_is_pi_blocked(tsk))
++ return;
++
+ /*
+ * If we are going to sleep and we have plugged IO queued,
+ * make sure to submit it to avoid deadlocks.
+--- a/kernel/workqueue.c
++++ b/kernel/workqueue.c
+@@ -123,6 +123,11 @@ enum {
+ * cpu or grabbing pool->lock is enough for read access. If
+ * POOL_DISASSOCIATED is set, it's identical to L.
+ *
++ * On RT we need the extra protection via rt_lock_idle_list() for
++ * the list manipulations against read access from
++ * wq_worker_sleeping(). All other places are nicely serialized via
++ * pool->lock.
++ *
+ * A: pool->attach_mutex protected.
+ *
+ * PL: wq_pool_mutex protected.
+@@ -405,6 +410,31 @@ static void workqueue_sysfs_unregister(s
+ if (({ assert_rcu_or_wq_mutex(wq); false; })) { } \
+ else
+
++#ifdef CONFIG_PREEMPT_RT_BASE
++static inline void rt_lock_idle_list(struct worker_pool *pool)
++{
++ preempt_disable();
++}
++static inline void rt_unlock_idle_list(struct worker_pool *pool)
++{
++ preempt_enable();
++}
++static inline void sched_lock_idle_list(struct worker_pool *pool) { }
++static inline void sched_unlock_idle_list(struct worker_pool *pool) { }
++#else
++static inline void rt_lock_idle_list(struct worker_pool *pool) { }
++static inline void rt_unlock_idle_list(struct worker_pool *pool) { }
++static inline void sched_lock_idle_list(struct worker_pool *pool)
++{
++ spin_lock_irq(&pool->lock);
++}
++static inline void sched_unlock_idle_list(struct worker_pool *pool)
++{
++ spin_unlock_irq(&pool->lock);
++}
++#endif
++
++
+ #ifdef CONFIG_DEBUG_OBJECTS_WORK
+
+ static struct debug_obj_descr work_debug_descr;
+@@ -797,10 +827,16 @@ static struct worker *first_idle_worker(
+ */
+ static void wake_up_worker(struct worker_pool *pool)
+ {
+- struct worker *worker = first_idle_worker(pool);
++ struct worker *worker;
++
++ rt_lock_idle_list(pool);
++
++ worker = first_idle_worker(pool);
+
+ if (likely(worker))
+ wake_up_process(worker->task);
++
++ rt_unlock_idle_list(pool);
+ }
+
+ /**
+@@ -828,7 +864,7 @@ void wq_worker_running(struct task_struc
+ */
+ void wq_worker_sleeping(struct task_struct *task)
+ {
+- struct worker *next, *worker = kthread_data(task);
++ struct worker *worker = kthread_data(task);
+ struct worker_pool *pool;
+
+ /*
+@@ -845,25 +881,18 @@ void wq_worker_sleeping(struct task_stru
+ return;
+
+ worker->sleeping = 1;
+- spin_lock_irq(&pool->lock);
++
+ /*
+ * The counterpart of the following dec_and_test, implied mb,
+ * worklist not empty test sequence is in insert_work().
+ * Please read comment there.
+- *
+- * NOT_RUNNING is clear. This means that we're bound to and
+- * running on the local cpu w/ rq lock held and preemption
+- * disabled, which in turn means that none else could be
+- * manipulating idle_list, so dereferencing idle_list without pool
+- * lock is safe.
+ */
+ if (atomic_dec_and_test(&pool->nr_running) &&
+ !list_empty(&pool->worklist)) {
+- next = first_idle_worker(pool);
+- if (next)
+- wake_up_process(next->task);
++ sched_lock_idle_list(pool);
++ wake_up_worker(pool);
++ sched_unlock_idle_list(pool);
+ }
+- spin_unlock_irq(&pool->lock);
+ }
+
+ /**
+@@ -1554,7 +1583,9 @@ static void worker_enter_idle(struct wor
+ worker->last_active = jiffies;
+
+ /* idle_list is LIFO */
++ rt_lock_idle_list(pool);
+ list_add(&worker->entry, &pool->idle_list);
++ rt_unlock_idle_list(pool);
+
+ if (too_many_workers(pool) && !timer_pending(&pool->idle_timer))
+ mod_timer(&pool->idle_timer, jiffies + IDLE_WORKER_TIMEOUT);
+@@ -1587,7 +1618,9 @@ static void worker_leave_idle(struct wor
+ return;
+ worker_clr_flags(worker, WORKER_IDLE);
+ pool->nr_idle--;
++ rt_lock_idle_list(pool);
+ list_del_init(&worker->entry);
++ rt_unlock_idle_list(pool);
+ }
+
+ static struct worker *alloc_worker(int node)
+@@ -1755,7 +1788,9 @@ static void destroy_worker(struct worker
+ pool->nr_workers--;
+ pool->nr_idle--;
+
++ rt_lock_idle_list(pool);
+ list_del_init(&worker->entry);
++ rt_unlock_idle_list(pool);
+ worker->flags |= WORKER_DIE;
+ wake_up_process(worker->task);
+ }
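
Since the two configurations end up with quite different locking, a condensed
sketch of the scheme described in the changelog may help; the demo_* names
below are invented for illustration and this is not the workqueue code itself.

#include <linux/list.h>
#include <linux/preempt.h>
#include <linux/spinlock.h>

/* Illustrative sketch only: on RT the idle list is additionally guarded by a
 * preempt_disable()/preempt_enable() pair, on !RT the pool lock held by the
 * callers is sufficient (mirrors rt_lock_idle_list() above). */
struct demo_pool {
	spinlock_t		lock;
	struct list_head	idle;
};

#ifdef CONFIG_PREEMPT_RT_BASE
static inline void demo_lock_idle(struct demo_pool *p)   { preempt_disable(); }
static inline void demo_unlock_idle(struct demo_pool *p) { preempt_enable(); }
#else
static inline void demo_lock_idle(struct demo_pool *p)   { }
static inline void demo_unlock_idle(struct demo_pool *p) { }
#endif

/* Caller holds p->lock, exactly like worker_enter_idle() holds pool->lock. */
static void demo_enter_idle(struct demo_pool *p, struct list_head *entry)
{
	demo_lock_idle(p);
	list_add(entry, &p->idle);
	demo_unlock_idle(p);
}
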
diff --git a/patches/workqueue-use-locallock.patch b/patches/workqueue-use-locallock.patch
new file mode 100644
index 00000000000000..ce3b20786b72a0
--- /dev/null
+++ b/patches/workqueue-use-locallock.patch
@@ -0,0 +1,144 @@
+Subject: workqueue: Use local irq lock instead of irq disable regions
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Sun, 17 Jul 2011 21:42:26 +0200
+
+Use a local_irq_lock as a replacement for the irq-off regions. We keep the
+semantics of irq-off with regard to the pool->lock and remain preemptible.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ kernel/workqueue.c | 31 +++++++++++++++++--------------
+ 1 file changed, 17 insertions(+), 14 deletions(-)
+
+--- a/kernel/workqueue.c
++++ b/kernel/workqueue.c
+@@ -48,6 +48,7 @@
+ #include <linux/nodemask.h>
+ #include <linux/moduleparam.h>
+ #include <linux/uaccess.h>
++#include <linux/locallock.h>
+
+ #include "workqueue_internal.h"
+
+@@ -329,6 +330,8 @@ EXPORT_SYMBOL_GPL(system_power_efficient
+ struct workqueue_struct *system_freezable_power_efficient_wq __read_mostly;
+ EXPORT_SYMBOL_GPL(system_freezable_power_efficient_wq);
+
++static DEFINE_LOCAL_IRQ_LOCK(pendingb_lock);
++
+ static int worker_thread(void *__worker);
+ static void copy_workqueue_attrs(struct workqueue_attrs *to,
+ const struct workqueue_attrs *from);
+@@ -1065,9 +1068,9 @@ static void put_pwq_unlocked(struct pool
+ * As both pwqs and pools are RCU protected, the
+ * following lock operations are safe.
+ */
+- spin_lock_irq(&pwq->pool->lock);
++ local_spin_lock_irq(pendingb_lock, &pwq->pool->lock);
+ put_pwq(pwq);
+- spin_unlock_irq(&pwq->pool->lock);
++ local_spin_unlock_irq(pendingb_lock, &pwq->pool->lock);
+ }
+ }
+
+@@ -1169,7 +1172,7 @@ static int try_to_grab_pending(struct wo
+ struct worker_pool *pool;
+ struct pool_workqueue *pwq;
+
+- local_irq_save(*flags);
++ local_lock_irqsave(pendingb_lock, *flags);
+
+ /* try to steal the timer if it exists */
+ if (is_dwork) {
+@@ -1233,7 +1236,7 @@ static int try_to_grab_pending(struct wo
+ spin_unlock(&pool->lock);
+ fail:
+ rcu_read_unlock();
+- local_irq_restore(*flags);
++ local_unlock_irqrestore(pendingb_lock, *flags);
+ if (work_is_canceling(work))
+ return -ENOENT;
+ cpu_relax();
+@@ -1305,7 +1308,7 @@ static void __queue_work(int cpu, struct
+ * queued or lose PENDING. Grabbing PENDING and queueing should
+ * happen with IRQ disabled.
+ */
+- WARN_ON_ONCE(!irqs_disabled());
++ WARN_ON_ONCE_NONRT(!irqs_disabled());
+
+ debug_work_activate(work);
+
+@@ -1410,14 +1413,14 @@ bool queue_work_on(int cpu, struct workq
+ bool ret = false;
+ unsigned long flags;
+
+- local_irq_save(flags);
++ local_lock_irqsave(pendingb_lock,flags);
+
+ if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
+ __queue_work(cpu, wq, work);
+ ret = true;
+ }
+
+- local_irq_restore(flags);
++ local_unlock_irqrestore(pendingb_lock, flags);
+ return ret;
+ }
+ EXPORT_SYMBOL(queue_work_on);
+@@ -1484,14 +1487,14 @@ bool queue_delayed_work_on(int cpu, stru
+ unsigned long flags;
+
+ /* read the comment in __queue_work() */
+- local_irq_save(flags);
++ local_lock_irqsave(pendingb_lock, flags);
+
+ if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
+ __queue_delayed_work(cpu, wq, dwork, delay);
+ ret = true;
+ }
+
+- local_irq_restore(flags);
++ local_unlock_irqrestore(pendingb_lock, flags);
+ return ret;
+ }
+ EXPORT_SYMBOL(queue_delayed_work_on);
+@@ -1526,7 +1529,7 @@ bool mod_delayed_work_on(int cpu, struct
+
+ if (likely(ret >= 0)) {
+ __queue_delayed_work(cpu, wq, dwork, delay);
+- local_irq_restore(flags);
++ local_unlock_irqrestore(pendingb_lock, flags);
+ }
+
+ /* -ENOENT from try_to_grab_pending() becomes %true */
+@@ -2802,7 +2805,7 @@ static bool __cancel_work_timer(struct w
+
+ /* tell other tasks trying to grab @work to back off */
+ mark_work_canceling(work);
+- local_irq_restore(flags);
++ local_unlock_irqrestore(pendingb_lock, flags);
+
+ flush_work(work);
+ clear_work_data(work);
+@@ -2857,10 +2860,10 @@ EXPORT_SYMBOL_GPL(cancel_work_sync);
+ */
+ bool flush_delayed_work(struct delayed_work *dwork)
+ {
+- local_irq_disable();
++ local_lock_irq(pendingb_lock);
+ if (del_timer_sync(&dwork->timer))
+ __queue_work(dwork->cpu, dwork->wq, &dwork->work);
+- local_irq_enable();
++ local_unlock_irq(pendingb_lock);
+ return flush_work(&dwork->work);
+ }
+ EXPORT_SYMBOL(flush_delayed_work);
+@@ -2895,7 +2898,7 @@ bool cancel_delayed_work(struct delayed_
+
+ set_work_pool_and_clear_pending(&dwork->work,
+ get_work_pool_id(&dwork->work));
+- local_irq_restore(flags);
++ local_unlock_irqrestore(pendingb_lock, flags);
+ return ret;
+ }
+ EXPORT_SYMBOL(cancel_delayed_work);
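
A hedged, self-contained sketch of the locallock pattern this patch switches
to; only DEFINE_LOCAL_IRQ_LOCK(), local_lock_irqsave() and
local_unlock_irqrestore() come from the patch above, the demo_* names are
invented. On !RT these primitives collapse to local_irq_save()/restore(); on
RT they take a per-CPU sleeping lock, so the section stays preemptible.

#include <linux/list.h>
#include <linux/percpu.h>
#include <linux/locallock.h>	/* provided by the RT patch queue */

struct demo_item {
	struct list_head entry;
};

static DEFINE_LOCAL_IRQ_LOCK(demo_lock);
static DEFINE_PER_CPU(struct list_head, demo_list);	/* init of the heads omitted */

/* Queue an item on this CPU's list; the lock pins us to the CPU on RT. */
static void demo_queue(struct demo_item *item)
{
	unsigned long flags;

	local_lock_irqsave(demo_lock, flags);
	list_add_tail(&item->entry, this_cpu_ptr(&demo_list));
	local_unlock_irqrestore(demo_lock, flags);
}
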
diff --git a/patches/workqueue-use-rcu.patch b/patches/workqueue-use-rcu.patch
new file mode 100644
index 00000000000000..e03025ee8146cb
--- /dev/null
+++ b/patches/workqueue-use-rcu.patch
@@ -0,0 +1,342 @@
+Subject: workqueue: Use normal rcu
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Wed, 24 Jul 2013 15:26:54 +0200
+
+There is no need for sched_rcu. The undocumented reason why sched_rcu
+is used is to avoid a few explicit rcu_read_lock()/unlock() pairs by
+abusing the fact that sched_rcu reader side critical sections are also
+protected by preempt or irq disabled regions.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ kernel/workqueue.c | 92 +++++++++++++++++++++++++++++------------------------
+ 1 file changed, 51 insertions(+), 41 deletions(-)
+
+--- a/kernel/workqueue.c
++++ b/kernel/workqueue.c
+@@ -125,11 +125,11 @@ enum {
+ *
+ * PL: wq_pool_mutex protected.
+ *
+- * PR: wq_pool_mutex protected for writes. Sched-RCU protected for reads.
++ * PR: wq_pool_mutex protected for writes. RCU protected for reads.
+ *
+ * WQ: wq->mutex protected.
+ *
+- * WR: wq->mutex protected for writes. Sched-RCU protected for reads.
++ * WR: wq->mutex protected for writes. RCU protected for reads.
+ *
+ * MD: wq_mayday_lock protected.
+ */
+@@ -178,7 +178,7 @@ struct worker_pool {
+ atomic_t nr_running ____cacheline_aligned_in_smp;
+
+ /*
+- * Destruction of pool is sched-RCU protected to allow dereferences
++ * Destruction of pool is RCU protected to allow dereferences
+ * from get_work_pool().
+ */
+ struct rcu_head rcu;
+@@ -207,7 +207,7 @@ struct pool_workqueue {
+ /*
+ * Release of unbound pwq is punted to system_wq. See put_pwq()
+ * and pwq_unbound_release_workfn() for details. pool_workqueue
+- * itself is also sched-RCU protected so that the first pwq can be
++ * itself is also RCU protected so that the first pwq can be
+ * determined without grabbing wq->mutex.
+ */
+ struct work_struct unbound_release_work;
+@@ -338,14 +338,14 @@ static void workqueue_sysfs_unregister(s
+ #include <trace/events/workqueue.h>
+
+ #define assert_rcu_or_pool_mutex() \
+- rcu_lockdep_assert(rcu_read_lock_sched_held() || \
++ rcu_lockdep_assert(rcu_read_lock_held() || \
+ lockdep_is_held(&wq_pool_mutex), \
+- "sched RCU or wq_pool_mutex should be held")
++ "RCU or wq_pool_mutex should be held")
+
+ #define assert_rcu_or_wq_mutex(wq) \
+- rcu_lockdep_assert(rcu_read_lock_sched_held() || \
++ rcu_lockdep_assert(rcu_read_lock_held() || \
+ lockdep_is_held(&wq->mutex), \
+- "sched RCU or wq->mutex should be held")
++ "RCU or wq->mutex should be held")
+
+ #define for_each_cpu_worker_pool(pool, cpu) \
+ for ((pool) = &per_cpu(cpu_worker_pools, cpu)[0]; \
+@@ -357,7 +357,7 @@ static void workqueue_sysfs_unregister(s
+ * @pool: iteration cursor
+ * @pi: integer used for iteration
+ *
+- * This must be called either with wq_pool_mutex held or sched RCU read
++ * This must be called either with wq_pool_mutex held or RCU read
+ * locked. If the pool needs to be used beyond the locking in effect, the
+ * caller is responsible for guaranteeing that the pool stays online.
+ *
+@@ -389,7 +389,7 @@ static void workqueue_sysfs_unregister(s
+ * @pwq: iteration cursor
+ * @wq: the target workqueue
+ *
+- * This must be called either with wq->mutex held or sched RCU read locked.
++ * This must be called either with wq->mutex held or RCU read locked.
+ * If the pwq needs to be used beyond the locking in effect, the caller is
+ * responsible for guaranteeing that the pwq stays online.
+ *
+@@ -551,7 +551,7 @@ static int worker_pool_assign_id(struct
+ * @wq: the target workqueue
+ * @node: the node ID
+ *
+- * This must be called either with pwq_lock held or sched RCU read locked.
++ * This must be called either with pwq_lock held or RCU read locked.
+ * If the pwq needs to be used beyond the locking in effect, the caller is
+ * responsible for guaranteeing that the pwq stays online.
+ *
+@@ -655,8 +655,8 @@ static struct pool_workqueue *get_work_p
+ * @work: the work item of interest
+ *
+ * Pools are created and destroyed under wq_pool_mutex, and allows read
+- * access under sched-RCU read lock. As such, this function should be
+- * called under wq_pool_mutex or with preemption disabled.
++ * access under RCU read lock. As such, this function should be
++ * called under wq_pool_mutex or inside of a rcu_read_lock() region.
+ *
+ * All fields of the returned pool are accessible as long as the above
+ * mentioned locking is in effect. If the returned pool needs to be used
+@@ -1062,7 +1062,7 @@ static void put_pwq_unlocked(struct pool
+ {
+ if (pwq) {
+ /*
+- * As both pwqs and pools are sched-RCU protected, the
++ * As both pwqs and pools are RCU protected, the
+ * following lock operations are safe.
+ */
+ spin_lock_irq(&pwq->pool->lock);
+@@ -1188,6 +1188,7 @@ static int try_to_grab_pending(struct wo
+ if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)))
+ return 0;
+
++ rcu_read_lock();
+ /*
+ * The queueing is in progress, or it is already queued. Try to
+ * steal it from ->worklist without clearing WORK_STRUCT_PENDING.
+@@ -1226,10 +1227,12 @@ static int try_to_grab_pending(struct wo
+ set_work_pool_and_keep_pending(work, pool->id);
+
+ spin_unlock(&pool->lock);
++ rcu_read_unlock();
+ return 1;
+ }
+ spin_unlock(&pool->lock);
+ fail:
++ rcu_read_unlock();
+ local_irq_restore(*flags);
+ if (work_is_canceling(work))
+ return -ENOENT;
+@@ -1310,6 +1313,8 @@ static void __queue_work(int cpu, struct
+ if (unlikely(wq->flags & __WQ_DRAINING) &&
+ WARN_ON_ONCE(!is_chained_work(wq)))
+ return;
++
++ rcu_read_lock();
+ retry:
+ if (req_cpu == WORK_CPU_UNBOUND)
+ cpu = raw_smp_processor_id();
+@@ -1366,10 +1371,8 @@ static void __queue_work(int cpu, struct
+ /* pwq determined, queue */
+ trace_workqueue_queue_work(req_cpu, pwq, work);
+
+- if (WARN_ON(!list_empty(&work->entry))) {
+- spin_unlock(&pwq->pool->lock);
+- return;
+- }
++ if (WARN_ON(!list_empty(&work->entry)))
++ goto out;
+
+ pwq->nr_in_flight[pwq->work_color]++;
+ work_flags = work_color_to_flags(pwq->work_color);
+@@ -1385,7 +1388,9 @@ static void __queue_work(int cpu, struct
+
+ insert_work(pwq, work, worklist, work_flags);
+
++out:
+ spin_unlock(&pwq->pool->lock);
++ rcu_read_unlock();
+ }
+
+ /**
+@@ -2672,14 +2677,14 @@ static bool start_flush_work(struct work
+
+ might_sleep();
+
+- local_irq_disable();
++ rcu_read_lock();
+ pool = get_work_pool(work);
+ if (!pool) {
+- local_irq_enable();
++ rcu_read_unlock();
+ return false;
+ }
+
+- spin_lock(&pool->lock);
++ spin_lock_irq(&pool->lock);
+ /* see the comment in try_to_grab_pending() with the same code */
+ pwq = get_work_pwq(work);
+ if (pwq) {
+@@ -2706,10 +2711,11 @@ static bool start_flush_work(struct work
+ else
+ lock_map_acquire_read(&pwq->wq->lockdep_map);
+ lock_map_release(&pwq->wq->lockdep_map);
+-
++ rcu_read_unlock();
+ return true;
+ already_gone:
+ spin_unlock_irq(&pool->lock);
++ rcu_read_unlock();
+ return false;
+ }
+
+@@ -3147,7 +3153,7 @@ static void rcu_free_pool(struct rcu_hea
+ * put_unbound_pool - put a worker_pool
+ * @pool: worker_pool to put
+ *
+- * Put @pool. If its refcnt reaches zero, it gets destroyed in sched-RCU
++ * Put @pool. If its refcnt reaches zero, it gets destroyed in RCU
+ * safe manner. get_unbound_pool() calls this function on its failure path
+ * and this function should be able to release pools which went through,
+ * successfully or not, init_worker_pool().
+@@ -3201,8 +3207,8 @@ static void put_unbound_pool(struct work
+ del_timer_sync(&pool->idle_timer);
+ del_timer_sync(&pool->mayday_timer);
+
+- /* sched-RCU protected to allow dereferences from get_work_pool() */
+- call_rcu_sched(&pool->rcu, rcu_free_pool);
++ /* RCU protected to allow dereferences from get_work_pool() */
++ call_rcu(&pool->rcu, rcu_free_pool);
+ }
+
+ /**
+@@ -3307,14 +3313,14 @@ static void pwq_unbound_release_workfn(s
+ put_unbound_pool(pool);
+ mutex_unlock(&wq_pool_mutex);
+
+- call_rcu_sched(&pwq->rcu, rcu_free_pwq);
++ call_rcu(&pwq->rcu, rcu_free_pwq);
+
+ /*
+ * If we're the last pwq going away, @wq is already dead and no one
+ * is gonna access it anymore. Schedule RCU free.
+ */
+ if (is_last)
+- call_rcu_sched(&wq->rcu, rcu_free_wq);
++ call_rcu(&wq->rcu, rcu_free_wq);
+ }
+
+ /**
+@@ -3920,7 +3926,7 @@ void destroy_workqueue(struct workqueue_
+ * The base ref is never dropped on per-cpu pwqs. Directly
+ * schedule RCU free.
+ */
+- call_rcu_sched(&wq->rcu, rcu_free_wq);
++ call_rcu(&wq->rcu, rcu_free_wq);
+ } else {
+ /*
+ * We're the sole accessor of @wq at this point. Directly
+@@ -4013,7 +4019,8 @@ bool workqueue_congested(int cpu, struct
+ struct pool_workqueue *pwq;
+ bool ret;
+
+- rcu_read_lock_sched();
++ rcu_read_lock();
++ preempt_disable();
+
+ if (cpu == WORK_CPU_UNBOUND)
+ cpu = smp_processor_id();
+@@ -4024,7 +4031,8 @@ bool workqueue_congested(int cpu, struct
+ pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu));
+
+ ret = !list_empty(&pwq->delayed_works);
+- rcu_read_unlock_sched();
++ preempt_enable();
++ rcu_read_unlock();
+
+ return ret;
+ }
+@@ -4050,15 +4058,15 @@ unsigned int work_busy(struct work_struc
+ if (work_pending(work))
+ ret |= WORK_BUSY_PENDING;
+
+- local_irq_save(flags);
++ rcu_read_lock();
+ pool = get_work_pool(work);
+ if (pool) {
+- spin_lock(&pool->lock);
++ spin_lock_irqsave(&pool->lock, flags);
+ if (find_worker_executing_work(pool, work))
+ ret |= WORK_BUSY_RUNNING;
+- spin_unlock(&pool->lock);
++ spin_unlock_irqrestore(&pool->lock, flags);
+ }
+- local_irq_restore(flags);
++ rcu_read_unlock();
+
+ return ret;
+ }
+@@ -4247,7 +4255,7 @@ void show_workqueue_state(void)
+ unsigned long flags;
+ int pi;
+
+- rcu_read_lock_sched();
++ rcu_read_lock();
+
+ pr_info("Showing busy workqueues and worker pools:\n");
+
+@@ -4298,7 +4306,7 @@ void show_workqueue_state(void)
+ spin_unlock_irqrestore(&pool->lock, flags);
+ }
+
+- rcu_read_unlock_sched();
++ rcu_read_unlock();
+ }
+
+ /*
+@@ -4648,16 +4656,16 @@ bool freeze_workqueues_busy(void)
+ * nr_active is monotonically decreasing. It's safe
+ * to peek without lock.
+ */
+- rcu_read_lock_sched();
++ rcu_read_lock();
+ for_each_pwq(pwq, wq) {
+ WARN_ON_ONCE(pwq->nr_active < 0);
+ if (pwq->nr_active) {
+ busy = true;
+- rcu_read_unlock_sched();
++ rcu_read_unlock();
+ goto out_unlock;
+ }
+ }
+- rcu_read_unlock_sched();
++ rcu_read_unlock();
+ }
+ out_unlock:
+ mutex_unlock(&wq_pool_mutex);
+@@ -4771,7 +4779,8 @@ static ssize_t wq_pool_ids_show(struct d
+ const char *delim = "";
+ int node, written = 0;
+
+- rcu_read_lock_sched();
++ get_online_cpus();
++ rcu_read_lock();
+ for_each_node(node) {
+ written += scnprintf(buf + written, PAGE_SIZE - written,
+ "%s%d:%d", delim, node,
+@@ -4779,7 +4788,8 @@ static ssize_t wq_pool_ids_show(struct d
+ delim = " ";
+ }
+ written += scnprintf(buf + written, PAGE_SIZE - written, "\n");
+- rcu_read_unlock_sched();
++ rcu_read_unlock();
++ put_online_cpus();
+
+ return written;
+ }
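
For readers less familiar with the difference: sched-RCU piggybacks on
preempt/irq-off sections, which are preemptible on RT, so the patch moves to
plain RCU. A minimal sketch of that reader/updater pattern (demo_* names
invented, not the workqueue code):

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct demo_obj {
	int		value;
	struct rcu_head	rcu;
};

static struct demo_obj __rcu *demo_ptr;

static int demo_read(void)
{
	struct demo_obj *obj;
	int val = -1;

	rcu_read_lock();			/* explicit read-side section, RT safe */
	obj = rcu_dereference(demo_ptr);
	if (obj)
		val = obj->value;
	rcu_read_unlock();
	return val;
}

static void demo_free_rcu(struct rcu_head *head)
{
	kfree(container_of(head, struct demo_obj, rcu));
}

static void demo_retire(struct demo_obj *old)	/* old already unpublished */
{
	call_rcu(&old->rcu, demo_free_rcu);	/* cf. call_rcu_sched() -> call_rcu() above */
}
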
diff --git a/patches/x86-UV-raw_spinlock-conversion.patch b/patches/x86-UV-raw_spinlock-conversion.patch
new file mode 100644
index 00000000000000..f7a749e1894a09
--- /dev/null
+++ b/patches/x86-UV-raw_spinlock-conversion.patch
@@ -0,0 +1,244 @@
+From: Mike Galbraith <umgwanakikbuti@gmail.com>
+Date: Sun, 2 Nov 2014 08:31:37 +0100
+Subject: x86: UV: raw_spinlock conversion
+
+Shrug. Lots of hobbyists have a beast in their basement, right?
+
+
+Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ arch/x86/include/asm/uv/uv_bau.h | 14 +++++++-------
+ arch/x86/include/asm/uv/uv_hub.h | 2 +-
+ arch/x86/kernel/apic/x2apic_uv_x.c | 2 +-
+ arch/x86/platform/uv/tlb_uv.c | 26 +++++++++++++-------------
+ arch/x86/platform/uv/uv_time.c | 21 +++++++++++++--------
+ 5 files changed, 35 insertions(+), 30 deletions(-)
+
+--- a/arch/x86/include/asm/uv/uv_bau.h
++++ b/arch/x86/include/asm/uv/uv_bau.h
+@@ -615,9 +615,9 @@ struct bau_control {
+ cycles_t send_message;
+ cycles_t period_end;
+ cycles_t period_time;
+- spinlock_t uvhub_lock;
+- spinlock_t queue_lock;
+- spinlock_t disable_lock;
++ raw_spinlock_t uvhub_lock;
++ raw_spinlock_t queue_lock;
++ raw_spinlock_t disable_lock;
+ /* tunables */
+ int max_concurr;
+ int max_concurr_const;
+@@ -776,15 +776,15 @@ static inline int atom_asr(short i, stru
+ * to be lowered below the current 'v'. atomic_add_unless can only stop
+ * on equal.
+ */
+-static inline int atomic_inc_unless_ge(spinlock_t *lock, atomic_t *v, int u)
++static inline int atomic_inc_unless_ge(raw_spinlock_t *lock, atomic_t *v, int u)
+ {
+- spin_lock(lock);
++ raw_spin_lock(lock);
+ if (atomic_read(v) >= u) {
+- spin_unlock(lock);
++ raw_spin_unlock(lock);
+ return 0;
+ }
+ atomic_inc(v);
+- spin_unlock(lock);
++ raw_spin_unlock(lock);
+ return 1;
+ }
+
+--- a/arch/x86/include/asm/uv/uv_hub.h
++++ b/arch/x86/include/asm/uv/uv_hub.h
+@@ -492,7 +492,7 @@ struct uv_blade_info {
+ unsigned short nr_online_cpus;
+ unsigned short pnode;
+ short memory_nid;
+- spinlock_t nmi_lock; /* obsolete, see uv_hub_nmi */
++ raw_spinlock_t nmi_lock; /* obsolete, see uv_hub_nmi */
+ unsigned long nmi_count; /* obsolete, see uv_hub_nmi */
+ };
+ extern struct uv_blade_info *uv_blade_info;
+--- a/arch/x86/kernel/apic/x2apic_uv_x.c
++++ b/arch/x86/kernel/apic/x2apic_uv_x.c
+@@ -949,7 +949,7 @@ void __init uv_system_init(void)
+ uv_blade_info[blade].pnode = pnode;
+ uv_blade_info[blade].nr_possible_cpus = 0;
+ uv_blade_info[blade].nr_online_cpus = 0;
+- spin_lock_init(&uv_blade_info[blade].nmi_lock);
++ raw_spin_lock_init(&uv_blade_info[blade].nmi_lock);
+ min_pnode = min(pnode, min_pnode);
+ max_pnode = max(pnode, max_pnode);
+ blade++;
+--- a/arch/x86/platform/uv/tlb_uv.c
++++ b/arch/x86/platform/uv/tlb_uv.c
+@@ -714,9 +714,9 @@ static void destination_plugged(struct b
+
+ quiesce_local_uvhub(hmaster);
+
+- spin_lock(&hmaster->queue_lock);
++ raw_spin_lock(&hmaster->queue_lock);
+ reset_with_ipi(&bau_desc->distribution, bcp);
+- spin_unlock(&hmaster->queue_lock);
++ raw_spin_unlock(&hmaster->queue_lock);
+
+ end_uvhub_quiesce(hmaster);
+
+@@ -736,9 +736,9 @@ static void destination_timeout(struct b
+
+ quiesce_local_uvhub(hmaster);
+
+- spin_lock(&hmaster->queue_lock);
++ raw_spin_lock(&hmaster->queue_lock);
+ reset_with_ipi(&bau_desc->distribution, bcp);
+- spin_unlock(&hmaster->queue_lock);
++ raw_spin_unlock(&hmaster->queue_lock);
+
+ end_uvhub_quiesce(hmaster);
+
+@@ -759,7 +759,7 @@ static void disable_for_period(struct ba
+ cycles_t tm1;
+
+ hmaster = bcp->uvhub_master;
+- spin_lock(&hmaster->disable_lock);
++ raw_spin_lock(&hmaster->disable_lock);
+ if (!bcp->baudisabled) {
+ stat->s_bau_disabled++;
+ tm1 = get_cycles();
+@@ -772,7 +772,7 @@ static void disable_for_period(struct ba
+ }
+ }
+ }
+- spin_unlock(&hmaster->disable_lock);
++ raw_spin_unlock(&hmaster->disable_lock);
+ }
+
+ static void count_max_concurr(int stat, struct bau_control *bcp,
+@@ -835,7 +835,7 @@ static void record_send_stats(cycles_t t
+ */
+ static void uv1_throttle(struct bau_control *hmaster, struct ptc_stats *stat)
+ {
+- spinlock_t *lock = &hmaster->uvhub_lock;
++ raw_spinlock_t *lock = &hmaster->uvhub_lock;
+ atomic_t *v;
+
+ v = &hmaster->active_descriptor_count;
+@@ -968,7 +968,7 @@ static int check_enable(struct bau_contr
+ struct bau_control *hmaster;
+
+ hmaster = bcp->uvhub_master;
+- spin_lock(&hmaster->disable_lock);
++ raw_spin_lock(&hmaster->disable_lock);
+ if (bcp->baudisabled && (get_cycles() >= bcp->set_bau_on_time)) {
+ stat->s_bau_reenabled++;
+ for_each_present_cpu(tcpu) {
+@@ -980,10 +980,10 @@ static int check_enable(struct bau_contr
+ tbcp->period_giveups = 0;
+ }
+ }
+- spin_unlock(&hmaster->disable_lock);
++ raw_spin_unlock(&hmaster->disable_lock);
+ return 0;
+ }
+- spin_unlock(&hmaster->disable_lock);
++ raw_spin_unlock(&hmaster->disable_lock);
+ return -1;
+ }
+
+@@ -1901,9 +1901,9 @@ static void __init init_per_cpu_tunables
+ bcp->cong_reps = congested_reps;
+ bcp->disabled_period = sec_2_cycles(disabled_period);
+ bcp->giveup_limit = giveup_limit;
+- spin_lock_init(&bcp->queue_lock);
+- spin_lock_init(&bcp->uvhub_lock);
+- spin_lock_init(&bcp->disable_lock);
++ raw_spin_lock_init(&bcp->queue_lock);
++ raw_spin_lock_init(&bcp->uvhub_lock);
++ raw_spin_lock_init(&bcp->disable_lock);
+ }
+ }
+
+--- a/arch/x86/platform/uv/uv_time.c
++++ b/arch/x86/platform/uv/uv_time.c
+@@ -58,7 +58,7 @@ static DEFINE_PER_CPU(struct clock_event
+
+ /* There is one of these allocated per node */
+ struct uv_rtc_timer_head {
+- spinlock_t lock;
++ raw_spinlock_t lock;
+ /* next cpu waiting for timer, local node relative: */
+ int next_cpu;
+ /* number of cpus on this node: */
+@@ -178,7 +178,7 @@ static __init int uv_rtc_allocate_timers
+ uv_rtc_deallocate_timers();
+ return -ENOMEM;
+ }
+- spin_lock_init(&head->lock);
++ raw_spin_lock_init(&head->lock);
+ head->ncpus = uv_blade_nr_possible_cpus(bid);
+ head->next_cpu = -1;
+ blade_info[bid] = head;
+@@ -232,7 +232,7 @@ static int uv_rtc_set_timer(int cpu, u64
+ unsigned long flags;
+ int next_cpu;
+
+- spin_lock_irqsave(&head->lock, flags);
++ raw_spin_lock_irqsave(&head->lock, flags);
+
+ next_cpu = head->next_cpu;
+ *t = expires;
+@@ -244,12 +244,12 @@ static int uv_rtc_set_timer(int cpu, u64
+ if (uv_setup_intr(cpu, expires)) {
+ *t = ULLONG_MAX;
+ uv_rtc_find_next_timer(head, pnode);
+- spin_unlock_irqrestore(&head->lock, flags);
++ raw_spin_unlock_irqrestore(&head->lock, flags);
+ return -ETIME;
+ }
+ }
+
+- spin_unlock_irqrestore(&head->lock, flags);
++ raw_spin_unlock_irqrestore(&head->lock, flags);
+ return 0;
+ }
+
+@@ -268,7 +268,7 @@ static int uv_rtc_unset_timer(int cpu, i
+ unsigned long flags;
+ int rc = 0;
+
+- spin_lock_irqsave(&head->lock, flags);
++ raw_spin_lock_irqsave(&head->lock, flags);
+
+ if ((head->next_cpu == bcpu && uv_read_rtc(NULL) >= *t) || force)
+ rc = 1;
+@@ -280,7 +280,7 @@ static int uv_rtc_unset_timer(int cpu, i
+ uv_rtc_find_next_timer(head, pnode);
+ }
+
+- spin_unlock_irqrestore(&head->lock, flags);
++ raw_spin_unlock_irqrestore(&head->lock, flags);
+
+ return rc;
+ }
+@@ -300,13 +300,18 @@ static int uv_rtc_unset_timer(int cpu, i
+ static cycle_t uv_read_rtc(struct clocksource *cs)
+ {
+ unsigned long offset;
++ cycle_t cycles;
+
++ preempt_disable();
+ if (uv_get_min_hub_revision_id() == 1)
+ offset = 0;
+ else
+ offset = (uv_blade_processor_id() * L1_CACHE_BYTES) % PAGE_SIZE;
+
+- return (cycle_t)uv_read_local_mmr(UVH_RTC | offset);
++ cycles = (cycle_t)uv_read_local_mmr(UVH_RTC | offset);
++ preempt_enable();
++
++ return cycles;
+ }
+
+ /*
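
The changelog above is terse; as general RT background (not taken from the
patch): spinlock_t turns into a sleeping rtmutex on PREEMPT_RT and therefore
cannot be taken from the hard atomic paths the UV BAU and RTC code runs in,
whereas raw_spinlock_t keeps the classic non-sleeping behaviour. A minimal
sketch of the resulting rule, with an invented lock name:

#include <linux/spinlock.h>

static DEFINE_RAW_SPINLOCK(demo_hw_lock);	/* stays a spinning lock on RT */

static void demo_atomic_path(void)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&demo_hw_lock, flags);
	/* short, bounded, never-sleeping critical section */
	raw_spin_unlock_irqrestore(&demo_hw_lock, flags);
}
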
diff --git a/patches/x86-crypto-reduce-preempt-disabled-regions.patch b/patches/x86-crypto-reduce-preempt-disabled-regions.patch
new file mode 100644
index 00000000000000..0650758353b663
--- /dev/null
+++ b/patches/x86-crypto-reduce-preempt-disabled-regions.patch
@@ -0,0 +1,112 @@
+Subject: x86: crypto: Reduce preempt disabled regions
+From: Peter Zijlstra <peterz@infradead.org>
+Date: Mon, 14 Nov 2011 18:19:27 +0100
+
+Restrict the preempt disabled regions to the actual floating point
+operations and enable preemption for the administrative actions.
+
+This is necessary on RT to avoid kfree() and other operations being
+called with preemption disabled.
+
+Reported-and-tested-by: Carsten Emde <cbe@osadl.org>
+Signed-off-by: Peter Zijlstra <peterz@infradead.org>
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ arch/x86/crypto/aesni-intel_glue.c | 24 +++++++++++++-----------
+ 1 file changed, 13 insertions(+), 11 deletions(-)
+
+--- a/arch/x86/crypto/aesni-intel_glue.c
++++ b/arch/x86/crypto/aesni-intel_glue.c
+@@ -382,14 +382,14 @@ static int ecb_encrypt(struct blkcipher_
+ err = blkcipher_walk_virt(desc, &walk);
+ desc->flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
+
+- kernel_fpu_begin();
+ while ((nbytes = walk.nbytes)) {
++ kernel_fpu_begin();
+ aesni_ecb_enc(ctx, walk.dst.virt.addr, walk.src.virt.addr,
+- nbytes & AES_BLOCK_MASK);
++ nbytes & AES_BLOCK_MASK);
++ kernel_fpu_end();
+ nbytes &= AES_BLOCK_SIZE - 1;
+ err = blkcipher_walk_done(desc, &walk, nbytes);
+ }
+- kernel_fpu_end();
+
+ return err;
+ }
+@@ -406,14 +406,14 @@ static int ecb_decrypt(struct blkcipher_
+ err = blkcipher_walk_virt(desc, &walk);
+ desc->flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
+
+- kernel_fpu_begin();
+ while ((nbytes = walk.nbytes)) {
++ kernel_fpu_begin();
+ aesni_ecb_dec(ctx, walk.dst.virt.addr, walk.src.virt.addr,
+ nbytes & AES_BLOCK_MASK);
++ kernel_fpu_end();
+ nbytes &= AES_BLOCK_SIZE - 1;
+ err = blkcipher_walk_done(desc, &walk, nbytes);
+ }
+- kernel_fpu_end();
+
+ return err;
+ }
+@@ -430,14 +430,14 @@ static int cbc_encrypt(struct blkcipher_
+ err = blkcipher_walk_virt(desc, &walk);
+ desc->flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
+
+- kernel_fpu_begin();
+ while ((nbytes = walk.nbytes)) {
++ kernel_fpu_begin();
+ aesni_cbc_enc(ctx, walk.dst.virt.addr, walk.src.virt.addr,
+ nbytes & AES_BLOCK_MASK, walk.iv);
++ kernel_fpu_end();
+ nbytes &= AES_BLOCK_SIZE - 1;
+ err = blkcipher_walk_done(desc, &walk, nbytes);
+ }
+- kernel_fpu_end();
+
+ return err;
+ }
+@@ -454,14 +454,14 @@ static int cbc_decrypt(struct blkcipher_
+ err = blkcipher_walk_virt(desc, &walk);
+ desc->flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
+
+- kernel_fpu_begin();
+ while ((nbytes = walk.nbytes)) {
++ kernel_fpu_begin();
+ aesni_cbc_dec(ctx, walk.dst.virt.addr, walk.src.virt.addr,
+ nbytes & AES_BLOCK_MASK, walk.iv);
++ kernel_fpu_end();
+ nbytes &= AES_BLOCK_SIZE - 1;
+ err = blkcipher_walk_done(desc, &walk, nbytes);
+ }
+- kernel_fpu_end();
+
+ return err;
+ }
+@@ -513,18 +513,20 @@ static int ctr_crypt(struct blkcipher_de
+ err = blkcipher_walk_virt_block(desc, &walk, AES_BLOCK_SIZE);
+ desc->flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
+
+- kernel_fpu_begin();
+ while ((nbytes = walk.nbytes) >= AES_BLOCK_SIZE) {
++ kernel_fpu_begin();
+ aesni_ctr_enc_tfm(ctx, walk.dst.virt.addr, walk.src.virt.addr,
+ nbytes & AES_BLOCK_MASK, walk.iv);
++ kernel_fpu_end();
+ nbytes &= AES_BLOCK_SIZE - 1;
+ err = blkcipher_walk_done(desc, &walk, nbytes);
+ }
+ if (walk.nbytes) {
++ kernel_fpu_begin();
+ ctr_crypt_final(ctx, &walk);
++ kernel_fpu_end();
+ err = blkcipher_walk_done(desc, &walk, 0);
+ }
+- kernel_fpu_end();
+
+ return err;
+ }
diff --git a/patches/x86-highmem-add-a-already-used-pte-check.patch b/patches/x86-highmem-add-a-already-used-pte-check.patch
new file mode 100644
index 00000000000000..a3493efae2809b
--- /dev/null
+++ b/patches/x86-highmem-add-a-already-used-pte-check.patch
@@ -0,0 +1,22 @@
+From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Date: Mon, 11 Mar 2013 17:09:55 +0100
+Subject: x86/highmem: Add an "already used pte" check
+
+This is a copy from kmap_atomic_prot().
+
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ arch/x86/mm/iomap_32.c | 2 ++
+ 1 file changed, 2 insertions(+)
+
+--- a/arch/x86/mm/iomap_32.c
++++ b/arch/x86/mm/iomap_32.c
+@@ -66,6 +66,8 @@ void *kmap_atomic_prot_pfn(unsigned long
+ type = kmap_atomic_idx_push();
+ idx = type + KM_TYPE_NR * smp_processor_id();
+ vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
++ WARN_ON(!pte_none(*(kmap_pte - idx)));
++
+ #ifdef CONFIG_PREEMPT_RT_FULL
+ current->kmap_pte[type] = pte;
+ #endif
diff --git a/patches/x86-io-apic-migra-no-unmask.patch b/patches/x86-io-apic-migra-no-unmask.patch
new file mode 100644
index 00000000000000..76fadb9a913cbf
--- /dev/null
+++ b/patches/x86-io-apic-migra-no-unmask.patch
@@ -0,0 +1,26 @@
+From: Ingo Molnar <mingo@elte.hu>
+Date: Fri, 3 Jul 2009 08:29:27 -0500
+Subject: x86/ioapic: Do not unmask io_apic when interrupt is in progress
+
+With threaded interrupts we might see an interrupt in progress on
+migration. Do not unmask it when this is the case.
+
+Signed-off-by: Ingo Molnar <mingo@elte.hu>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ arch/x86/kernel/apic/io_apic.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+--- a/arch/x86/kernel/apic/io_apic.c
++++ b/arch/x86/kernel/apic/io_apic.c
+@@ -1891,7 +1891,8 @@ static bool io_apic_level_ack_pending(st
+ static inline bool ioapic_irqd_mask(struct irq_data *data, struct irq_cfg *cfg)
+ {
+ /* If we are moving the irq we need to mask it */
+- if (unlikely(irqd_is_setaffinity_pending(data))) {
++ if (unlikely(irqd_is_setaffinity_pending(data) &&
++ !irqd_irq_inprogress(data))) {
+ mask_ioapic(cfg);
+ return true;
+ }
diff --git a/patches/x86-kvm-require-const-tsc-for-rt.patch b/patches/x86-kvm-require-const-tsc-for-rt.patch
new file mode 100644
index 00000000000000..7ef8da21b0c1de
--- /dev/null
+++ b/patches/x86-kvm-require-const-tsc-for-rt.patch
@@ -0,0 +1,30 @@
+Subject: x86: kvm Require const tsc for RT
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Sun, 06 Nov 2011 12:26:18 +0100
+
+A non-constant TSC is a nightmare on bare metal already, but with
+virtualization it becomes a complete disaster because the workarounds
+are horrible latency-wise. This is also a prerequisite for running RT in
+a guest on top of an RT host.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ arch/x86/kvm/x86.c | 7 +++++++
+ 1 file changed, 7 insertions(+)
+
+--- a/arch/x86/kvm/x86.c
++++ b/arch/x86/kvm/x86.c
+@@ -5813,6 +5813,13 @@ int kvm_arch_init(void *opaque)
+ goto out;
+ }
+
++#ifdef CONFIG_PREEMPT_RT_FULL
++ if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
++ printk(KERN_ERR "RT requires X86_FEATURE_CONSTANT_TSC\n");
++ return -EOPNOTSUPP;
++ }
++#endif
++
+ r = kvm_mmu_module_init();
+ if (r)
+ goto out_free_percpu;
diff --git a/patches/x86-mce-timer-hrtimer.patch b/patches/x86-mce-timer-hrtimer.patch
new file mode 100644
index 00000000000000..af21becd3a7d3a
--- /dev/null
+++ b/patches/x86-mce-timer-hrtimer.patch
@@ -0,0 +1,179 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Mon, 13 Dec 2010 16:33:39 +0100
+Subject: x86: Convert mce timer to hrtimer
+
+mce_timer is started in atomic contexts of cpu bringup. This results
+in might_sleep() warnings on RT. Convert mce_timer to a hrtimer to
+avoid this.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+fold in:
+|From: Mike Galbraith <bitbucket@online.de>
+|Date: Wed, 29 May 2013 13:52:13 +0200
+|Subject: [PATCH] x86/mce: fix mce timer interval
+|
+|Seems mce timer fire at the wrong frequency in -rt kernels since roughly
+|forever due to 32 bit overflow. 3.8-rt is also missing a multiplier.
+|
+|Add missing us -> ns conversion and 32 bit overflow prevention.
+|
+|Signed-off-by: Mike Galbraith <bitbucket@online.de>
+|[bigeasy: use ULL instead of u64 cast]
+|Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+---
+ arch/x86/kernel/cpu/mcheck/mce.c | 52 +++++++++++++++------------------------
+ 1 file changed, 20 insertions(+), 32 deletions(-)
+
+--- a/arch/x86/kernel/cpu/mcheck/mce.c
++++ b/arch/x86/kernel/cpu/mcheck/mce.c
+@@ -41,6 +41,7 @@
+ #include <linux/debugfs.h>
+ #include <linux/irq_work.h>
+ #include <linux/export.h>
++#include <linux/jiffies.h>
+
+ #include <asm/processor.h>
+ #include <asm/traps.h>
+@@ -1267,7 +1268,7 @@ void mce_log_therm_throt_event(__u64 sta
+ static unsigned long check_interval = INITIAL_CHECK_INTERVAL;
+
+ static DEFINE_PER_CPU(unsigned long, mce_next_interval); /* in jiffies */
+-static DEFINE_PER_CPU(struct timer_list, mce_timer);
++static DEFINE_PER_CPU(struct hrtimer, mce_timer);
+
+ static unsigned long mce_adjust_timer_default(unsigned long interval)
+ {
+@@ -1276,32 +1277,18 @@ static unsigned long mce_adjust_timer_de
+
+ static unsigned long (*mce_adjust_timer)(unsigned long interval) = mce_adjust_timer_default;
+
+-static void __restart_timer(struct timer_list *t, unsigned long interval)
++static enum hrtimer_restart __restart_timer(struct hrtimer *timer, unsigned long interval)
+ {
+- unsigned long when = jiffies + interval;
+- unsigned long flags;
+-
+- local_irq_save(flags);
+-
+- if (timer_pending(t)) {
+- if (time_before(when, t->expires))
+- mod_timer_pinned(t, when);
+- } else {
+- t->expires = round_jiffies(when);
+- add_timer_on(t, smp_processor_id());
+- }
+-
+- local_irq_restore(flags);
++ if (!interval)
++ return HRTIMER_NORESTART;
++ hrtimer_forward_now(timer, ns_to_ktime(jiffies_to_nsecs(interval)));
++ return HRTIMER_RESTART;
+ }
+
+-static void mce_timer_fn(unsigned long data)
++static enum hrtimer_restart mce_timer_fn(struct hrtimer *timer)
+ {
+- struct timer_list *t = this_cpu_ptr(&mce_timer);
+- int cpu = smp_processor_id();
+ unsigned long iv;
+
+- WARN_ON(cpu != data);
+-
+ iv = __this_cpu_read(mce_next_interval);
+
+ if (mce_available(this_cpu_ptr(&cpu_info))) {
+@@ -1324,7 +1311,7 @@ static void mce_timer_fn(unsigned long d
+
+ done:
+ __this_cpu_write(mce_next_interval, iv);
+- __restart_timer(t, iv);
++ return __restart_timer(timer, iv);
+ }
+
+ /*
+@@ -1332,7 +1319,7 @@ static void mce_timer_fn(unsigned long d
+ */
+ void mce_timer_kick(unsigned long interval)
+ {
+- struct timer_list *t = this_cpu_ptr(&mce_timer);
++ struct hrtimer *t = this_cpu_ptr(&mce_timer);
+ unsigned long iv = __this_cpu_read(mce_next_interval);
+
+ __restart_timer(t, interval);
+@@ -1347,7 +1334,7 @@ static void mce_timer_delete_all(void)
+ int cpu;
+
+ for_each_online_cpu(cpu)
+- del_timer_sync(&per_cpu(mce_timer, cpu));
++ hrtimer_cancel(&per_cpu(mce_timer, cpu));
+ }
+
+ static void mce_do_trigger(struct work_struct *work)
+@@ -1649,7 +1636,7 @@ static void __mcheck_cpu_init_vendor(str
+ }
+ }
+
+-static void mce_start_timer(unsigned int cpu, struct timer_list *t)
++static void mce_start_timer(unsigned int cpu, struct hrtimer *t)
+ {
+ unsigned long iv = check_interval * HZ;
+
+@@ -1658,16 +1645,17 @@ static void mce_start_timer(unsigned int
+
+ per_cpu(mce_next_interval, cpu) = iv;
+
+- t->expires = round_jiffies(jiffies + iv);
+- add_timer_on(t, cpu);
++ hrtimer_start_range_ns(t, ns_to_ktime(jiffies_to_usecs(iv) * 1000ULL),
++ 0, HRTIMER_MODE_REL_PINNED);
+ }
+
+ static void __mcheck_cpu_init_timer(void)
+ {
+- struct timer_list *t = this_cpu_ptr(&mce_timer);
++ struct hrtimer *t = this_cpu_ptr(&mce_timer);
+ unsigned int cpu = smp_processor_id();
+
+- setup_timer(t, mce_timer_fn, cpu);
++ hrtimer_init(t, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
++ t->function = mce_timer_fn;
+ mce_start_timer(cpu, t);
+ }
+
+@@ -2345,6 +2333,8 @@ static void mce_disable_cpu(void *h)
+ if (!mce_available(raw_cpu_ptr(&cpu_info)))
+ return;
+
++ hrtimer_cancel(this_cpu_ptr(&mce_timer));
++
+ if (!(action & CPU_TASKS_FROZEN))
+ cmci_clear();
+ for (i = 0; i < mca_cfg.banks; i++) {
+@@ -2371,6 +2361,7 @@ static void mce_reenable_cpu(void *h)
+ if (b->init)
+ wrmsrl(MSR_IA32_MCx_CTL(i), b->ctl);
+ }
++ __mcheck_cpu_init_timer();
+ }
+
+ /* Get notified when a cpu comes on/off. Be hotplug friendly. */
+@@ -2378,7 +2369,6 @@ static int
+ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
+ {
+ unsigned int cpu = (unsigned long)hcpu;
+- struct timer_list *t = &per_cpu(mce_timer, cpu);
+
+ switch (action & ~CPU_TASKS_FROZEN) {
+ case CPU_ONLINE:
+@@ -2398,11 +2388,9 @@ mce_cpu_callback(struct notifier_block *
+ break;
+ case CPU_DOWN_PREPARE:
+ smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
+- del_timer_sync(t);
+ break;
+ case CPU_DOWN_FAILED:
+ smp_call_function_single(cpu, mce_reenable_cpu, &action, 1);
+- mce_start_timer(cpu, t);
+ break;
+ }
+
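
A self-contained sketch of the self-rearming hrtimer pattern the patch
converts to; the demo_* names and the period are invented, only the
hrtimer_init()/hrtimer_forward_now()/hrtimer_cancel() usage mirrors the
change above.

#include <linux/hrtimer.h>
#include <linux/ktime.h>

static struct hrtimer demo_timer;
static u64 demo_period_ns = 250ULL * NSEC_PER_MSEC;	/* example period */

static enum hrtimer_restart demo_timer_fn(struct hrtimer *timer)
{
	/* periodic, non-sleeping work goes here */

	hrtimer_forward_now(timer, ns_to_ktime(demo_period_ns));
	return HRTIMER_RESTART;
}

static void demo_timer_start(void)
{
	hrtimer_init(&demo_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	demo_timer.function = demo_timer_fn;
	hrtimer_start(&demo_timer, ns_to_ktime(demo_period_ns), HRTIMER_MODE_REL);
}

static void demo_timer_stop(void)
{
	hrtimer_cancel(&demo_timer);	/* replaces del_timer_sync() */
}
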
diff --git a/patches/x86-mce-use-swait-queue-for-mce-wakeups.patch b/patches/x86-mce-use-swait-queue-for-mce-wakeups.patch
new file mode 100644
index 00000000000000..ae8a65288fa476
--- /dev/null
+++ b/patches/x86-mce-use-swait-queue-for-mce-wakeups.patch
@@ -0,0 +1,159 @@
+Subject: x86/mce: use swait queue for mce wakeups
+From: Steven Rostedt <rostedt@goodmis.org>
+Date: Fri, 27 Feb 2015 15:20:37 +0100
+
+We had a customer report a lockup on a 3.0-rt kernel that had the
+following backtrace:
+
+[ffff88107fca3e80] rt_spin_lock_slowlock at ffffffff81499113
+[ffff88107fca3f40] rt_spin_lock at ffffffff81499a56
+[ffff88107fca3f50] __wake_up at ffffffff81043379
+[ffff88107fca3f80] mce_notify_irq at ffffffff81017328
+[ffff88107fca3f90] intel_threshold_interrupt at ffffffff81019508
+[ffff88107fca3fa0] smp_threshold_interrupt at ffffffff81019fc1
+[ffff88107fca3fb0] threshold_interrupt at ffffffff814a1853
+
+It actually bugged because the lock was taken by the same owner that
+already had that lock. What happened was the thread that was setting
+itself on a wait queue had the lock when an MCE triggered. The MCE
+interrupt does a wake up on its wait list and grabs the same lock.
+
+NOTE: THIS IS NOT A BUG ON MAINLINE
+
+Sorry for yelling, but as I Cc'd mainline maintainers I want them to
+know that this is a PREEMPT_RT bug only. I only Cc'd them for advice.
+
+On PREEMPT_RT the wait queue locks are converted from normal
+"spin_locks" into an rt_mutex (see the rt_spin_lock_slowlock above).
+These are not to be taken by hard interrupt context. This usually isn't
+a problem as most all interrupts in PREEMPT_RT are converted into
+schedulable threads. Unfortunately that's not the case with the MCE irq.
+
+As wait queue locks are notorious for long hold times, we can not
+convert them to raw_spin_locks without causing issues with -rt. But
+Thomas has created a "simple-wait" structure that uses raw spin locks
+which may have been a good fit.
+
+Unfortunately, wait queues are not the only issue, as the mce_notify_irq
+also does a schedule_work(), which grabs the workqueue spin locks that
+have the exact same issue.
+
+Thus, the patch I'm proposing moves the actual work of the MCE
+interrupt into a helper thread that gets woken up on the MCE interrupt
+and does the work in a schedulable context.
+
+NOTE: THIS PATCH ONLY CHANGES THE BEHAVIOR WHEN PREEMPT_RT IS SET
+
+Oops, sorry for yelling again, but I want to stress that I keep the same
+behavior of mainline when PREEMPT_RT is not set. Thus, this only changes
+the MCE behavior when PREEMPT_RT is configured.
+
+Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
+[bigeasy@linutronix: make mce_notify_work() a proper prototype, use
+ kthread_run()]
+Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+[wagi: use work-simple framework to defer work to a kthread]
+Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
+---
+ arch/x86/kernel/cpu/mcheck/mce.c | 68 ++++++++++++++++++++++++++++++++-------
+ 1 file changed, 56 insertions(+), 12 deletions(-)
+
+--- a/arch/x86/kernel/cpu/mcheck/mce.c
++++ b/arch/x86/kernel/cpu/mcheck/mce.c
+@@ -42,6 +42,7 @@
+ #include <linux/irq_work.h>
+ #include <linux/export.h>
+ #include <linux/jiffies.h>
++#include <linux/work-simple.h>
+
+ #include <asm/processor.h>
+ #include <asm/traps.h>
+@@ -1344,6 +1345,56 @@ static void mce_do_trigger(struct work_s
+
+ static DECLARE_WORK(mce_trigger_work, mce_do_trigger);
+
++static void __mce_notify_work(struct swork_event *event)
++{
++ /* Not more than two messages every minute */
++ static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
++
++ /* wake processes polling /dev/mcelog */
++ wake_up_interruptible(&mce_chrdev_wait);
++
++ /*
++ * There is no risk of missing notifications because
++ * work_pending is always cleared before the function is
++ * executed.
++ */
++ if (mce_helper[0] && !work_pending(&mce_trigger_work))
++ schedule_work(&mce_trigger_work);
++
++ if (__ratelimit(&ratelimit))
++ pr_info(HW_ERR "Machine check events logged\n");
++}
++
++#ifdef CONFIG_PREEMPT_RT_FULL
++static bool notify_work_ready __read_mostly;
++static struct swork_event notify_work;
++
++static int mce_notify_work_init(void)
++{
++ int err;
++
++ err = swork_get();
++ if (err)
++ return err;
++
++ INIT_SWORK(&notify_work, __mce_notify_work);
++ notify_work_ready = true;
++ return 0;
++}
++
++static void mce_notify_work(void)
++{
++ if (notify_work_ready)
++ swork_queue(&notify_work);
++}
++#else
++static void mce_notify_work(void)
++{
++ __mce_notify_work(NULL);
++}
++static inline int mce_notify_work_init(void) { return 0; }
++#endif
++
+ /*
+ * Notify the user(s) about new machine check events.
+ * Can be called from interrupt context, but not from machine check/NMI
+@@ -1351,19 +1402,8 @@ static DECLARE_WORK(mce_trigger_work, mc
+ */
+ int mce_notify_irq(void)
+ {
+- /* Not more than two messages every minute */
+- static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
+-
+ if (test_and_clear_bit(0, &mce_need_notify)) {
+- /* wake processes polling /dev/mcelog */
+- wake_up_interruptible(&mce_chrdev_wait);
+-
+- if (mce_helper[0])
+- schedule_work(&mce_trigger_work);
+-
+- if (__ratelimit(&ratelimit))
+- pr_info(HW_ERR "Machine check events logged\n");
+-
++ mce_notify_work();
+ return 1;
+ }
+ return 0;
+@@ -2429,6 +2469,10 @@ static __init int mcheck_init_device(voi
+ goto err_out;
+ }
+
++ err = mce_notify_work_init();
++ if (err)
++ goto err_out;
++
+ if (!zalloc_cpumask_var(&mce_device_initialized, GFP_KERNEL)) {
+ err = -ENOMEM;
+ goto err_out;
diff --git a/patches/x86-preempt-lazy.patch b/patches/x86-preempt-lazy.patch
new file mode 100644
index 00000000000000..4385262f77bed7
--- /dev/null
+++ b/patches/x86-preempt-lazy.patch
@@ -0,0 +1,170 @@
+Subject: x86: Support for lazy preemption
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Thu, 01 Nov 2012 11:03:47 +0100
+
+Implement the x86 pieces for lazy preempt.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+---
+ arch/x86/Kconfig | 1 +
+ arch/x86/include/asm/thread_info.h | 6 ++++++
+ arch/x86/kernel/asm-offsets.c | 2 ++
+ arch/x86/kernel/entry_32.S | 20 ++++++++++++++++++--
+ arch/x86/kernel/entry_64.S | 24 ++++++++++++++++++++----
+ 5 files changed, 47 insertions(+), 6 deletions(-)
+
+--- a/arch/x86/Kconfig
++++ b/arch/x86/Kconfig
+@@ -22,6 +22,7 @@ config X86_64
+ ### Arch settings
+ config X86
+ def_bool y
++ select HAVE_PREEMPT_LAZY
+ select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI
+ select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
+ select ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS
+--- a/arch/x86/include/asm/thread_info.h
++++ b/arch/x86/include/asm/thread_info.h
+@@ -55,6 +55,8 @@ struct thread_info {
+ __u32 status; /* thread synchronous flags */
+ __u32 cpu; /* current CPU */
+ int saved_preempt_count;
++ int preempt_lazy_count; /* 0 => lazy preemptable
++ <0 => BUG */
+ mm_segment_t addr_limit;
+ void __user *sysenter_return;
+ unsigned int sig_on_uaccess_error:1;
+@@ -95,6 +97,7 @@ struct thread_info {
+ #define TIF_SYSCALL_EMU 6 /* syscall emulation active */
+ #define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */
+ #define TIF_SECCOMP 8 /* secure computing */
++#define TIF_NEED_RESCHED_LAZY 9 /* lazy rescheduling necessary */
+ #define TIF_USER_RETURN_NOTIFY 11 /* notify kernel of userspace return */
+ #define TIF_UPROBE 12 /* breakpointed or singlestepping */
+ #define TIF_NOTSC 16 /* TSC is not accessible in userland */
+@@ -119,6 +122,7 @@ struct thread_info {
+ #define _TIF_SYSCALL_EMU (1 << TIF_SYSCALL_EMU)
+ #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
+ #define _TIF_SECCOMP (1 << TIF_SECCOMP)
++#define _TIF_NEED_RESCHED_LAZY (1 << TIF_NEED_RESCHED_LAZY)
+ #define _TIF_USER_RETURN_NOTIFY (1 << TIF_USER_RETURN_NOTIFY)
+ #define _TIF_UPROBE (1 << TIF_UPROBE)
+ #define _TIF_NOTSC (1 << TIF_NOTSC)
+@@ -168,6 +172,8 @@ struct thread_info {
+ #define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW|_TIF_USER_RETURN_NOTIFY)
+ #define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW)
+
++#define _TIF_NEED_RESCHED_MASK (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY)
++
+ #define STACK_WARN (THREAD_SIZE/8)
+
+ /*
+--- a/arch/x86/kernel/asm-offsets.c
++++ b/arch/x86/kernel/asm-offsets.c
+@@ -32,6 +32,7 @@ void common(void) {
+ OFFSET(TI_flags, thread_info, flags);
+ OFFSET(TI_status, thread_info, status);
+ OFFSET(TI_addr_limit, thread_info, addr_limit);
++ OFFSET(TI_preempt_lazy_count, thread_info, preempt_lazy_count);
+
+ BLANK();
+ OFFSET(crypto_tfm_ctx_offset, crypto_tfm, __crt_ctx);
+@@ -71,4 +72,5 @@ void common(void) {
+
+ BLANK();
+ DEFINE(PTREGS_SIZE, sizeof(struct pt_regs));
++ DEFINE(_PREEMPT_ENABLED, PREEMPT_ENABLED);
+ }
+--- a/arch/x86/kernel/entry_32.S
++++ b/arch/x86/kernel/entry_32.S
+@@ -359,8 +359,24 @@ END(ret_from_exception)
+ ENTRY(resume_kernel)
+ DISABLE_INTERRUPTS(CLBR_ANY)
+ need_resched:
++ # preempt count == 0 + NEED_RS set?
+ cmpl $0,PER_CPU_VAR(__preempt_count)
++#ifndef CONFIG_PREEMPT_LAZY
+ jnz restore_all
++#else
++ jz test_int_off
++
++ # atleast preempt count == 0 ?
++ cmpl $_PREEMPT_ENABLED,PER_CPU_VAR(__preempt_count)
++ jne restore_all
++
++ cmpl $0,TI_preempt_lazy_count(%ebp) # non-zero preempt_lazy_count ?
++ jnz restore_all
++
++ testl $_TIF_NEED_RESCHED_LAZY, TI_flags(%ebp)
++ jz restore_all
++test_int_off:
++#endif
+ testl $X86_EFLAGS_IF,PT_EFLAGS(%esp) # interrupts off (exception path) ?
+ jz restore_all
+ call preempt_schedule_irq
+@@ -594,7 +610,7 @@ ENDPROC(system_call)
+ ALIGN
+ RING0_PTREGS_FRAME # can't unwind into user space anyway
+ work_pending:
+- testb $_TIF_NEED_RESCHED, %cl
++ testl $_TIF_NEED_RESCHED_MASK, %ecx
+ jz work_notifysig
+ work_resched:
+ call schedule
+@@ -607,7 +623,7 @@ ENDPROC(system_call)
+ andl $_TIF_WORK_MASK, %ecx # is there any work to be done other
+ # than syscall tracing?
+ jz restore_all
+- testb $_TIF_NEED_RESCHED, %cl
++ testl $_TIF_NEED_RESCHED_MASK, %ecx
+ jnz work_resched
+
+ work_notifysig: # deal with pending signals and
+--- a/arch/x86/kernel/entry_64.S
++++ b/arch/x86/kernel/entry_64.S
+@@ -370,8 +370,8 @@ GLOBAL(int_with_check)
+ /* First do a reschedule test. */
+ /* edx: work, edi: workmask */
+ int_careful:
+- bt $TIF_NEED_RESCHED,%edx
+- jnc int_very_careful
++ testl $_TIF_NEED_RESCHED_MASK,%edx
++ jz int_very_careful
+ TRACE_IRQS_ON
+ ENABLE_INTERRUPTS(CLBR_NONE)
+ pushq_cfi %rdi
+@@ -776,7 +776,23 @@ retint_swapgs: /* return to user-space
+ bt $9,EFLAGS(%rsp) /* interrupts were off? */
+ jnc 1f
+ 0: cmpl $0,PER_CPU_VAR(__preempt_count)
++#ifndef CONFIG_PREEMPT_LAZY
+ jnz 1f
++#else
++ jz do_preempt_schedule_irq
++
++ # atleast preempt count == 0 ?
++ cmpl $_PREEMPT_ENABLED,PER_CPU_VAR(__preempt_count)
++ jnz 1f
++
++ GET_THREAD_INFO(%rcx)
++ cmpl $0, TI_preempt_lazy_count(%rcx)
++ jnz 1f
++
++ bt $TIF_NEED_RESCHED_LAZY,TI_flags(%rcx)
++ jnc 1f
++do_preempt_schedule_irq:
++#endif
+ call preempt_schedule_irq
+ jmp 0b
+ 1:
+@@ -846,8 +862,8 @@ ENTRY(native_iret)
+ /* edi: workmask, edx: work */
+ retint_careful:
+ CFI_RESTORE_STATE
+- bt $TIF_NEED_RESCHED,%edx
+- jnc retint_signal
++ testl $_TIF_NEED_RESCHED_MASK,%edx
++ jz retint_signal
+ TRACE_IRQS_ON
+ ENABLE_INTERRUPTS(CLBR_NONE)
+ pushq_cfi %rdi
diff --git a/patches/x86-stackprot-no-random-on-rt.patch b/patches/x86-stackprot-no-random-on-rt.patch
new file mode 100644
index 00000000000000..5c3ca09908af5e
--- /dev/null
+++ b/patches/x86-stackprot-no-random-on-rt.patch
@@ -0,0 +1,47 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Thu, 16 Dec 2010 14:25:18 +0100
+Subject: x86: stackprotector: Avoid random pool on rt
+
+CPU bringup calls into the random pool to initialize the stack
+canary. During boot that works nicely even on RT as the might sleep
+checks are disabled. During CPU hotplug the might sleep checks
+trigger. Making the locks in random raw is a major PITA, so avoid the
+call on RT is the only sensible solution. This is basically the same
+randomness which we get during boot where the random pool has no
+entropy and we rely on the TSC randomnness.
+
+Reported-by: Carsten Emde <carsten.emde@osadl.org>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ arch/x86/include/asm/stackprotector.h | 10 +++++++++-
+ 1 file changed, 9 insertions(+), 1 deletion(-)
+
+--- a/arch/x86/include/asm/stackprotector.h
++++ b/arch/x86/include/asm/stackprotector.h
+@@ -57,7 +57,7 @@
+ */
+ static __always_inline void boot_init_stack_canary(void)
+ {
+- u64 canary;
++ u64 uninitialized_var(canary);
+ u64 tsc;
+
+ #ifdef CONFIG_X86_64
+@@ -68,8 +68,16 @@ static __always_inline void boot_init_st
+ * of randomness. The TSC only matters for very early init,
+ * there it already has some randomness on most systems. Later
+ * on during the bootup the random pool has true entropy too.
++ *
++ * For preempt-rt we need to weaken the randomness a bit, as
++ * we can't call into the random generator from atomic context
++ * due to locking constraints. We just leave canary
++ * uninitialized and use the TSC based randomness on top of
++ * it.
+ */
++#ifndef CONFIG_PREEMPT_RT_FULL
+ get_random_bytes(&canary, sizeof(canary));
++#endif
+ tsc = __native_read_tsc();
+ canary += tsc + (tsc << 32UL);
+
diff --git a/patches/x86-use-gen-rwsem-spinlocks-rt.patch b/patches/x86-use-gen-rwsem-spinlocks-rt.patch
new file mode 100644
index 00000000000000..a32937dd889c76
--- /dev/null
+++ b/patches/x86-use-gen-rwsem-spinlocks-rt.patch
@@ -0,0 +1,28 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Sun, 26 Jul 2009 02:21:32 +0200
+Subject: x86: Use generic rwsem_spinlocks on -rt
+
+Simplifies the separation of anon_rw_semaphores and rw_semaphores for
+-rt.
+
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+
+---
+ arch/x86/Kconfig | 5 ++++-
+ 1 file changed, 4 insertions(+), 1 deletion(-)
+
+--- a/arch/x86/Kconfig
++++ b/arch/x86/Kconfig
+@@ -203,8 +203,11 @@ config ARCH_MAY_HAVE_PC_FDC
+ def_bool y
+ depends on ISA_DMA_API
+
++config RWSEM_GENERIC_SPINLOCK
++ def_bool PREEMPT_RT_FULL
++
+ config RWSEM_XCHGADD_ALGORITHM
+- def_bool y
++ def_bool !RWSEM_GENERIC_SPINLOCK && !PREEMPT_RT_FULL
+
+ config GENERIC_CALIBRATE_DELAY
+ def_bool y