commit 2134d97aa3a7ce38bb51f933f2e20cafde371085 Author: Greg Kroah-Hartman Date: Thu Feb 25 12:01:36 2016 -0800 Linux 4.4.3 commit e2f712dc927e3b9a981ecd86a64d944d0b140322 Author: Luis R. Rodriguez Date: Wed Feb 3 16:55:26 2016 +1030 modules: fix modparam async_probe request commit 4355efbd80482a961cae849281a8ef866e53d55c upstream. Commit f2411da746985 ("driver-core: add driver module asynchronous probe support") added async probe support, in two forms: * in-kernel driver specification annotation * generic async_probe module parameter (modprobe foo async_probe) To support the generic kernel parameter parse_args() was extended via commit ecc8617053e0 ("module: add extra argument for parse_params() callback") however commit failed to f2411da746985 failed to add the required argument. This causes a crash then whenever async_probe generic module parameter is used. This was overlooked when the form in which in-kernel async probe support was reworked a bit... Fix this as originally intended. Cc: Hannes Reinecke Cc: Dmitry Torokhov Signed-off-by: Luis R. Rodriguez Signed-off-by: Rusty Russell [minimized] Signed-off-by: Greg Kroah-Hartman commit a24d9a2fee9857a162ab18c9ad72aa6571ff4715 Author: Rusty Russell Date: Wed Feb 3 16:55:26 2016 +1030 module: wrapper for symbol name. commit 2e7bac536106236104e9e339531ff0fcdb7b8147 upstream. This trivial wrapper adds clarity and makes the following patch smaller. Signed-off-by: Rusty Russell Signed-off-by: Greg Kroah-Hartman commit 82e730baa9f7ba55cebe9365efde3e9f008d4376 Author: Thomas Gleixner Date: Thu Jan 14 16:54:48 2016 +0000 itimers: Handle relative timers with CONFIG_TIME_LOW_RES proper commit 51cbb5242a41700a3f250ecfb48dcfb7e4375ea4 upstream. As Helge reported for timerfd we have the same issue in itimers. We return remaining time larger than the programmed relative time to user space in case of CONFIG_TIME_LOW_RES=y. Use the proper function to adjust the extra time added in hrtimer_start_range_ns(). Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra Cc: Helge Deller Cc: John Stultz Cc: linux-m68k@lists.linux-m68k.org Cc: dhowells@redhat.com Link: http://lkml.kernel.org/r/20160114164159.528222587@linutronix.de Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit 1c94da3e7480e3370945aad2b784343da9eb664f Author: Thomas Gleixner Date: Thu Jan 14 16:54:47 2016 +0000 posix-timers: Handle relative timers with CONFIG_TIME_LOW_RES proper commit 572c39172684c3711e4a03c9a7380067e2b0661c upstream. As Helge reported for timerfd we have the same issue in posix timers. We return remaining time larger than the programmed relative time to user space in case of CONFIG_TIME_LOW_RES=y. Use the proper function to adjust the extra time added in hrtimer_start_range_ns(). Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra Cc: Helge Deller Cc: John Stultz Cc: linux-m68k@lists.linux-m68k.org Cc: dhowells@redhat.com Link: http://lkml.kernel.org/r/20160114164159.450510905@linutronix.de Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit 565f222968d300657ad40d2fad91e4ccf3148d89 Author: Thomas Gleixner Date: Thu Jan 14 16:54:46 2016 +0000 timerfd: Handle relative timers with CONFIG_TIME_LOW_RES proper commit b62526ed11a1fe3861ab98d40b7fdab8981d788a upstream. Helge reported that a relative timer can return a remaining time larger than the programmed relative time on parisc and other architectures which have CONFIG_TIME_LOW_RES set. This happens because we add a jiffie to the resulting expiry time to prevent short timeouts. Use the new function hrtimer_expires_remaining_adjusted() to calculate the remaining time. It takes that extra added time into account for relative timers. Reported-and-tested-by: Helge Deller Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra Cc: John Stultz Cc: linux-m68k@lists.linux-m68k.org Cc: dhowells@redhat.com Link: http://lkml.kernel.org/r/20160114164159.354500742@linutronix.de Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit e5e99792b6474b38ef2dea33cbcd1dde5fd70330 Author: Mateusz Guzik Date: Wed Jan 20 15:01:02 2016 -0800 prctl: take mmap sem for writing to protect against others commit ddf1d398e517e660207e2c807f76a90df543a217 upstream. An unprivileged user can trigger an oops on a kernel with CONFIG_CHECKPOINT_RESTORE. proc_pid_cmdline_read takes mmap_sem for reading and obtains args + env start/end values. These get sanity checked as follows: BUG_ON(arg_start > arg_end); BUG_ON(env_start > env_end); These can be changed by prctl_set_mm. Turns out also takes the semaphore for reading, effectively rendering it useless. This results in: kernel BUG at fs/proc/base.c:240! invalid opcode: 0000 [#1] SMP Modules linked in: virtio_net CPU: 0 PID: 925 Comm: a.out Not tainted 4.4.0-rc8-next-20160105dupa+ #71 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff880077a68000 ti: ffff8800784d0000 task.ti: ffff8800784d0000 RIP: proc_pid_cmdline_read+0x520/0x530 RSP: 0018:ffff8800784d3db8 EFLAGS: 00010206 RAX: ffff880077c5b6b0 RBX: ffff8800784d3f18 RCX: 0000000000000000 RDX: 0000000000000002 RSI: 00007f78e8857000 RDI: 0000000000000246 RBP: ffff8800784d3e40 R08: 0000000000000008 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000050 R13: 00007f78e8857800 R14: ffff88006fcef000 R15: ffff880077c5b600 FS: 00007f78e884a740(0000) GS:ffff88007b200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f78e8361770 CR3: 00000000790a5000 CR4: 00000000000006f0 Call Trace: __vfs_read+0x37/0x100 vfs_read+0x82/0x130 SyS_read+0x58/0xd0 entry_SYSCALL_64_fastpath+0x12/0x76 Code: 4c 8b 7d a8 eb e9 48 8b 9d 78 ff ff ff 4c 8b 7d 90 48 8b 03 48 39 45 a8 0f 87 f0 fe ff ff e9 d1 fe ff ff 4c 8b 7d 90 eb c6 0f 0b <0f> 0b 0f 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 RIP proc_pid_cmdline_read+0x520/0x530 ---[ end trace 97882617ae9c6818 ]--- Turns out there are instances where the code just reads aformentioned values without locking whatsoever - namely environ_read and get_cmdline. Interestingly these functions look quite resilient against bogus values, but I don't believe this should be relied upon. The first patch gets rid of the oops bug by grabbing mmap_sem for writing. The second patch is optional and puts locking around aformentioned consumers for safety. Consumers of other fields don't seem to benefit from similar treatment and are left untouched. This patch (of 2): The code was taking the semaphore for reading, which does not protect against readers nor concurrent modifications. The problem could cause a sanity checks to fail in procfs's cmdline reader, resulting in an OOPS. Note that some functions perform an unlocked read of various mm fields, but they seem to be fine despite possible modificaton. Signed-off-by: Mateusz Guzik Acked-by: Cyrill Gorcunov Cc: Alexey Dobriyan Cc: Jarod Wilson Cc: Jan Stancek Cc: Al Viro Cc: Anshuman Khandual Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit f86701c4f3cd59bac9363846e62fd26ecb2437c2 Author: Dave Chinner Date: Tue Jan 19 08:28:10 2016 +1100 xfs: log mount failures don't wait for buffers to be released commit 85bec5460ad8e05e0a8d70fb0f6750eb719ad092 upstream. Recently I've been seeing xfs/051 fail on 1k block size filesystems. Trying to trace the events during the test lead to the problem going away, indicating that it was a race condition that lead to this ASSERT failure: XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 156 ..... [] xfs_free_perag+0x87/0xb0 [] xfs_mountfs+0x4d9/0x900 [] xfs_fs_fill_super+0x3bf/0x4d0 [] mount_bdev+0x180/0x1b0 [] xfs_fs_mount+0x15/0x20 [] mount_fs+0x38/0x170 [] vfs_kern_mount+0x67/0x120 [] do_mount+0x218/0xd60 [] SyS_mount+0x8b/0xd0 When I finally caught it with tracing enabled, I saw that AG 2 had an elevated reference count and a buffer was responsible for it. I tracked down the specific buffer, and found that it was missing the final reference count release that would put it back on the LRU and hence be found by xfs_wait_buftarg() calls in the log mount failure handling. The last four traces for the buffer before the assert were (trimmed for relevance) kworker/0:1-5259 xfs_buf_iodone: hold 2 lock 0 flags ASYNC kworker/0:1-5259 xfs_buf_ioerror: hold 2 lock 0 error -5 mount-7163 xfs_buf_lock_done: hold 2 lock 0 flags ASYNC mount-7163 xfs_buf_unlock: hold 2 lock 1 flags ASYNC This is an async write that is completing, so there's nobody waiting for it directly. Hence we call xfs_buf_relse() once all the processing is complete. That does: static inline void xfs_buf_relse(xfs_buf_t *bp) { xfs_buf_unlock(bp); xfs_buf_rele(bp); } Now, it's clear that mount is waiting on the buffer lock, and that it has been released by xfs_buf_relse() and gained by mount. This is expected, because at this point the mount process is in xfs_buf_delwri_submit() waiting for all the IO it submitted to complete. The mount process, however, is waiting on the lock for the buffer because it is in xfs_buf_delwri_submit(). This waits for IO completion, but it doesn't wait for the buffer reference owned by the IO to go away. The mount process collects all the completions, fails the log recovery, and the higher level code then calls xfs_wait_buftarg() to free all the remaining buffers in the filesystem. The issue is that on unlocking the buffer, the scheduler has decided that the mount process has higher priority than the the kworker thread that is running the IO completion, and so immediately switched contexts to the mount process from the semaphore unlock code, hence preventing the kworker thread from finishing the IO completion and releasing the IO reference to the buffer. Hence by the time that xfs_wait_buftarg() is run, the buffer still has an active reference and so isn't on the LRU list that the function walks to free the remaining buffers. Hence we miss that buffer and continue onwards to tear down the mount structures, at which time we get find a stray reference count on the perag structure. On a non-debug kernel, this will be ignored and the structure torn down and freed. Hence when the kworker thread is then rescheduled and the buffer released and freed, it will access a freed perag structure. The problem here is that when the log mount fails, we still need to quiesce the log to ensure that the IO workqueues have returned to idle before we run xfs_wait_buftarg(). By synchronising the workqueues, we ensure that all IO completions are fully processed, not just to the point where buffers have been unlocked. This ensures we don't end up in the situation above. Signed-off-by: Dave Chinner Reviewed-by: Brian Foster Signed-off-by: Dave Chinner Signed-off-by: Greg Kroah-Hartman commit 16f14a28f660d0a3bdb169ebd17e0181c03d4778 Author: Dave Chinner Date: Tue Jan 19 08:21:46 2016 +1100 Revert "xfs: clear PF_NOFREEZE for xfsaild kthread" commit 3e85286e75224fa3f08bdad20e78c8327742634e upstream. This reverts commit 24ba16bb3d499c49974669cd8429c3e4138ab102 as it prevents machines from suspending. This regression occurs when the xfsaild is idle on entry to suspend, and so there s no activity to wake it from it's idle sleep and hence see that it is supposed to freeze. Hence the freezer times out waiting for it and suspend is cancelled. There is no obvious fix for this short of freezing the filesystem properly, so revert this change for now. Signed-off-by: Dave Chinner Acked-by: Jiri Kosina Reviewed-by: Brian Foster Signed-off-by: Dave Chinner Signed-off-by: Greg Kroah-Hartman commit 7530e6fdd9f207a6ebcf669490656def4f7cf73e Author: Dave Chinner Date: Tue Jan 12 07:03:44 2016 +1100 xfs: inode recovery readahead can race with inode buffer creation commit b79f4a1c68bb99152d0785ee4ea3ab4396cdacc6 upstream. When we do inode readahead in log recovery, we do can do the readahead before we've replayed the icreate transaction that stamps the buffer with inode cores. The inode readahead verifier catches this and marks the buffer as !done to indicate that it doesn't yet contain valid inodes. In adding buffer error notification (i.e. setting b_error = -EIO at the same time as as we clear the done flag) to such a readahead verifier failure, we can then get subsequent inode recovery failing with this error: XFS (dm-0): metadata I/O error: block 0xa00060 ("xlog_recover_do..(read#2)") error 5 numblks 32 This occurs when readahead completion races with icreate item replay such as: inode readahead find buffer lock buffer submit RA io .... icreate recovery xfs_trans_get_buffer find buffer lock buffer ..... fails verifier clear XBF_DONE set bp->b_error = -EIO release and unlock buffer icreate initialises buffer marks buffer as done adds buffer to delayed write queue releases buffer At this point, we have an initialised inode buffer that is up to date but has an -EIO state registered against it. When we finally get to recovering an inode in that buffer: inode item recovery xfs_trans_read_buffer find buffer lock buffer sees XBF_DONE is set, returns buffer sees bp->b_error is set fail log recovery! Essentially, we need xfs_trans_get_buf_map() to clear the error status of the buffer when doing a lookup. This function returns uninitialised buffers, so the buffer returned can not be in an error state and none of the code that uses this function expects b_error to be set on return. Indeed, there is an ASSERT(!bp->b_error); in the transaction case in xfs_trans_get_buf_map() that would have caught this if log recovery used transactions.... This patch firstly changes the inode readahead failure to set -EIO on the buffer, and secondly changes xfs_buf_get_map() to never return a buffer with an error state set so this first change doesn't cause unexpected log recovery failures. Signed-off-by: Dave Chinner Reviewed-by: Brian Foster Signed-off-by: Dave Chinner Signed-off-by: Greg Kroah-Hartman commit 888959f2fd5042609f1beefe663ced4dea4dc109 Author: Darrick J. Wong Date: Mon Jan 4 16:13:21 2016 +1100 libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct commit 96f859d52bcb1c6ea6f3388d39862bf7143e2f30 upstream. Because struct xfs_agfl is 36 bytes long and has a 64-bit integer inside it, gcc will quietly round the structure size up to the nearest 64 bits -- in this case, 40 bytes. This results in the XFS_AGFL_SIZE macro returning incorrect results for v5 filesystems on 64-bit machines (118 items instead of 119). As a result, a 32-bit xfs_repair will see garbage in AGFL item 119 and complain. Therefore, tell gcc not to pad the structure so that the AGFL size calculation is correct. Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Dave Chinner Signed-off-by: Greg Kroah-Hartman commit 8373f6590f6b371bff2c5f2c0581548eb0192014 Author: Miklos Szeredi Date: Fri Dec 11 16:30:49 2015 +0100 ovl: setattr: check permissions before copy-up commit cf9a6784f7c1b5ee2b9159a1246e327c331c5697 upstream. Without this copy-up of a file can be forced, even without actually being allowed to do anything on the file. [Arnd Bergmann] include for PAGE_CACHE_SIZE (used by MAX_LFS_FILESIZE definition). Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman commit 7193e802960f884e0ae6c40711c4e3c70fa3b070 Author: Miklos Szeredi Date: Wed Dec 9 16:11:59 2015 +0100 ovl: root: copy attr commit ed06e069775ad9236087594a1c1667367e983fb5 upstream. We copy i_uid and i_gid of underlying inode into overlayfs inode. Except for the root inode. Fix this omission. Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman commit 367e439dbc238907963ddf2758200b55087cab04 Author: Konstantin Khlebnikov Date: Mon Nov 16 18:44:11 2015 +0300 ovl: check dentry positiveness in ovl_cleanup_whiteouts() commit 84889d49335627bc770b32787c1ef9ebad1da232 upstream. This patch fixes kernel crash at removing directory which contains whiteouts from lower layers. Cache of directory content passed as "list" contains entries from all layers, including whiteouts from lower layers. So, lookup in upper dir (moved into work at this stage) will return negative entry. Plus this cache is filled long before and we can race with external removal. Example: mkdir -p lower0/dir lower1/dir upper work overlay touch lower0/dir/a lower0/dir/b mknod lower1/dir/a c 0 0 mount -t overlay none overlay -o lowerdir=lower1:lower0,upperdir=upper,workdir=work rm -fr overlay/dir Signed-off-by: Konstantin Khlebnikov Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman commit fa932190a5f332e1936ab0223c7c5d79b5596a9c Author: Vito Caputo Date: Sat Oct 24 07:19:46 2015 -0500 ovl: use a minimal buffer in ovl_copy_xattr commit e4ad29fa0d224d05e08b2858e65f112fd8edd4fe upstream. Rather than always allocating the high-order XATTR_SIZE_MAX buffer which is costly and prone to failure, only allocate what is needed and realloc if necessary. Fixes https://github.com/coreos/bugs/issues/489 Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman commit 85a7ed329aca5efd0eb70789db145a1e989c7a53 Author: Miklos Szeredi Date: Tue Nov 10 17:08:41 2015 +0100 ovl: allow zero size xattr commit 97daf8b97ad6f913a34c82515be64dc9ac08d63e upstream. When ovl_copy_xattr() encountered a zero size xattr no more xattrs were copied and the function returned success. This is clearly not the desired behavior. Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman commit acaf84251f8d49d0e7ee41d654f987dbd441f991 Author: Thomas Gleixner Date: Sat Dec 19 20:07:38 2015 +0000 futex: Drop refcount if requeue_pi() acquired the rtmutex commit fb75a4282d0d9a3c7c44d940582c2d226cf3acfb upstream. If the proxy lock in the requeue loop acquires the rtmutex for a waiter then it acquired also refcount on the pi_state related to the futex, but the waiter side does not drop the reference count. Add the missing free_pi_state() call. Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra Cc: Darren Hart Cc: Davidlohr Bueso Cc: Bhuvanesh_Surachari@mentor.com Cc: Andy Lowe Link: http://lkml.kernel.org/r/20151219200607.178132067@linutronix.de Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit 30066dcdf98a1474d71b3ce8a05f81e14a61a3bd Author: Toshi Kani Date: Wed Feb 17 13:11:29 2016 -0800 devm_memremap_release(): fix memremap'd addr handling commit 9273a8bbf58a15051e53a777389a502420ddc60e upstream. The pmem driver calls devm_memremap() to map a persistent memory range. When the pmem driver is unloaded, this memremap'd range is not released so the kernel will leak a vma. Fix devm_memremap_release() to handle a given memremap'd address properly. Signed-off-by: Toshi Kani Acked-by: Dan Williams Cc: Christoph Hellwig Cc: Ross Zwisler Cc: Matthew Wilcox Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 15db15e2f10ae12d021c9a2e9edd8a03b9238551 Author: Kirill A. Shutemov Date: Wed Feb 17 13:11:35 2016 -0800 ipc/shm: handle removed segments gracefully in shm_mmap() commit 1ac0b6dec656f3f78d1c3dd216fad84cb4d0a01e upstream. remap_file_pages(2) emulation can reach file which represents removed IPC ID as long as a memory segment is mapped. It breaks expectations of IPC subsystem. Test case (rewritten to be more human readable, originally autogenerated by syzkaller[1]): #define _GNU_SOURCE #include #include #include #include #define PAGE_SIZE 4096 int main() { int id; void *p; id = shmget(IPC_PRIVATE, 3 * PAGE_SIZE, 0); p = shmat(id, NULL, 0); shmctl(id, IPC_RMID, NULL); remap_file_pages(p, 3 * PAGE_SIZE, 0, 7, 0); return 0; } The patch changes shm_mmap() and code around shm_lock() to propagate locking error back to caller of shm_mmap(). [1] http://github.com/google/syzkaller Signed-off-by: Kirill A. Shutemov Reported-by: Dmitry Vyukov Cc: Davidlohr Bueso Cc: Manfred Spraul Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit fe90acff279808bf69425a1118639273a656c981 Author: Dan Carpenter Date: Tue Jan 26 12:24:25 2016 +0300 intel_scu_ipcutil: underflow in scu_reg_access() commit b1d353ad3d5835b16724653b33c05124e1b5acf1 upstream. "count" is controlled by the user and it can be negative. Let's prevent that by making it unsigned. You have to have CAP_SYS_RAWIO to call this function so the bug is not as serious as it could be. Fixes: 5369c02d951a ('intel_scu_ipc: Utility driver for intel scu ipc') Signed-off-by: Dan Carpenter Signed-off-by: Darren Hart Signed-off-by: Greg Kroah-Hartman commit edfde263bd8a70a0b93b11996fd2a77526ff3b80 Author: Vineet Gupta Date: Thu Feb 11 16:13:09 2016 -0800 mm,thp: khugepaged: call pte flush at the time of collapse commit 6a6ac72fd6ea32594b316513e1826c3f6db4cc93 upstream. This showed up on ARC when running LMBench bw_mem tests as Overlapping TLB Machine Check Exception triggered due to STLB entry (2M pages) overlapping some NTLB entry (regular 8K page). bw_mem 2m touches a large chunk of vaddr creating NTLB entries. In the interim khugepaged kicks in, collapsing the contiguous ptes into a single pmd. pmdp_collapse_flush()->flush_pmd_tlb_range() is called to flush out NTLB entries for the ptes. This for ARC (by design) can only shootdown STLB entries (for pmd). The stray NTLB entries cause the overlap with the subsequent STLB entry for collapsed page. So make pmdp_collapse_flush() call pte flush interface not pmd flush. Note that originally all thp flush call sites in generic code called flush_tlb_range() leaving it to architecture to implement the flush for pte and/or pmd. Commit 12ebc1581ad11454 changed this by calling a new opt-in API flush_pmd_tlb_range() which made the semantics more explicit but failed to distinguish the pte vs pmd flush in generic code, which is what this patch fixes. Note that ARC can fixed w/o touching the generic pmdp_collapse_flush() by defining a ARC version, but that defeats the purpose of generic version, plus sementically this is the right thing to do. Fixes STAR 9000961194: LMBench on AXS103 triggering duplicate TLB exceptions with super pages Fixes: 12ebc1581ad11454 ("mm,thp: introduce flush_pmd_tlb_range") Signed-off-by: Vineet Gupta Reviewed-by: Aneesh Kumar K.V Acked-by: Kirill A. Shutemov Cc: Andrea Arcangeli Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit e31e4672559674b0950885465a42968a86865d42 Author: Eric Dumazet Date: Fri Feb 5 15:36:16 2016 -0800 dump_stack: avoid potential deadlocks commit d7ce36924344ace0dbdc855b1206cacc46b36d45 upstream. Some servers experienced fatal deadlocks because of a combination of bugs, leading to multiple cpus calling dump_stack(). The checksumming bug was fixed in commit 34ae6a1aa054 ("ipv6: update skb->csum when CE mark is propagated"). The second problem is a faulty locking in dump_stack() CPU1 runs in process context and calls dump_stack(), grabs dump_lock. CPU2 receives a TCP packet under softirq, grabs socket spinlock, and call dump_stack() from netdev_rx_csum_fault(). dump_stack() spins on atomic_cmpxchg(&dump_lock, -1, 2), since dump_lock is owned by CPU1 While dumping its stack, CPU1 is interrupted by a softirq, and happens to process a packet for the TCP socket locked by CPU2. CPU1 spins forever in spin_lock() : deadlock Stack trace on CPU1 looked like : NMI backtrace for cpu 1 RIP: _raw_spin_lock+0x25/0x30 ... Call Trace: tcp_v6_rcv+0x243/0x620 ip6_input_finish+0x11f/0x330 ip6_input+0x38/0x40 ip6_rcv_finish+0x3c/0x90 ipv6_rcv+0x2a9/0x500 process_backlog+0x461/0xaa0 net_rx_action+0x147/0x430 __do_softirq+0x167/0x2d0 call_softirq+0x1c/0x30 do_softirq+0x3f/0x80 irq_exit+0x6e/0xc0 smp_call_function_single_interrupt+0x35/0x40 call_function_single_interrupt+0x6a/0x70 printk+0x4d/0x4f printk_address+0x31/0x33 print_trace_address+0x33/0x3c print_context_stack+0x7f/0x119 dump_trace+0x26b/0x28e show_trace_log_lvl+0x4f/0x5c show_stack_log_lvl+0x104/0x113 show_stack+0x42/0x44 dump_stack+0x46/0x58 netdev_rx_csum_fault+0x38/0x3c __skb_checksum_complete_head+0x6e/0x80 __skb_checksum_complete+0x11/0x20 tcp_rcv_established+0x2bd5/0x2fd0 tcp_v6_do_rcv+0x13c/0x620 sk_backlog_rcv+0x15/0x30 release_sock+0xd2/0x150 tcp_recvmsg+0x1c1/0xfc0 inet_recvmsg+0x7d/0x90 sock_recvmsg+0xaf/0xe0 ___sys_recvmsg+0x111/0x3b0 SyS_recvmsg+0x5c/0xb0 system_call_fastpath+0x16/0x1b Fixes: b58d977432c8 ("dump_stack: serialize the output from dump_stack()") Signed-off-by: Eric Dumazet Cc: Alex Thorlton Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 55e0d9869f1d3a6bbd5d1e864c0e866fe1247f97 Author: Konstantin Khlebnikov Date: Fri Feb 5 15:37:01 2016 -0800 radix-tree: fix oops after radix_tree_iter_retry commit 732042821cfa106b3c20b9780e4c60fee9d68900 upstream. Helper radix_tree_iter_retry() resets next_index to the current index. In following radix_tree_next_slot current chunk size becomes zero. This isn't checked and it tries to dereference null pointer in slot. Tagged iterator is fine because retry happens only at slot 0 where tag bitmask in iter->tags is filled with single bit. Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup") Signed-off-by: Konstantin Khlebnikov Cc: Matthew Wilcox Cc: Hugh Dickins Cc: Ohad Ben-Cohen Cc: Jeremiah Mahler Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 077b6173a8c89d666a06203f42267ff8b6d02d73 Author: Matthew Wilcox Date: Tue Feb 2 16:57:55 2016 -0800 drivers/hwspinlock: fix race between radix tree insertion and lookup commit c6400ba7e13a41539342f1b6e1f9e78419cb0148 upstream. of_hwspin_lock_get_id() is protected by the RCU lock, which means that insertions can occur simultaneously with the lookup. If the radix tree transitions from a height of 0, we can see a slot with the indirect_ptr bit set, which will cause us to at least read random memory, and could cause other havoc. Fix this by using the newly introduced radix_tree_iter_retry(). Signed-off-by: Matthew Wilcox Cc: Hugh Dickins Cc: Ohad Ben-Cohen Cc: Konstantin Khlebnikov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit f4595e0081495b677a98c780e9ec1ab68ce89488 Author: Matthew Wilcox Date: Tue Feb 2 16:57:52 2016 -0800 radix-tree: fix race in gang lookup commit 46437f9a554fbe3e110580ca08ab703b59f2f95a upstream. If the indirect_ptr bit is set on a slot, that indicates we need to redo the lookup. Introduce a new function radix_tree_iter_retry() which forces the loop to retry the lookup by setting 'slot' to NULL and turning the iterator back to point at the problematic entry. This is a pretty rare problem to hit at the moment; the lookup has to race with a grow of the radix tree from a height of 0. The consequences of hitting this race are that gang lookup could return a pointer to a radix_tree_node instead of a pointer to whatever the user had inserted in the tree. Fixes: cebbd29e1c2f ("radix-tree: rewrite gang lookup using iterator") Signed-off-by: Matthew Wilcox Cc: Hugh Dickins Cc: Ohad Ben-Cohen Cc: Konstantin Khlebnikov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 262139f0244b3ade149b785dc820118c4265cdb1 Author: Rich Felker Date: Fri Jan 22 15:11:05 2016 -0800 MAINTAINERS: return arch/sh to maintained state, with new maintainers commit 114bf37e04d839b555b3dc460b5e6ce156f49cf0 upstream. Add Yoshinori Sato and Rich Felker as maintainers for arch/sh (SUPERH). Signed-off-by: Rich Felker Signed-off-by: Yoshinori Sato Acked-by: D. Jeff Dionne Acked-by: Rob Landley Acked-by: Peter Zijlstra (Intel) Acked-by: Simon Horman Acked-by: Geert Uytterhoeven Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit ececa3ebe27f61db28161bc8c6ef2e1612dbafbe Author: Martijn Coenen Date: Fri Jan 15 16:57:49 2016 -0800 memcg: only free spare array when readers are done commit 6611d8d76132f86faa501de9451a89bf23fb2371 upstream. A spare array holding mem cgroup threshold events is kept around to make sure we can always safely deregister an event and have an array to store the new set of events in. In the scenario where we're going from 1 to 0 registered events, the pointer to the primary array containing 1 event is copied to the spare slot, and then the spare slot is freed because no events are left. However, it is freed before calling synchronize_rcu(), which means readers may still be accessing threshold->primary after it is freed. Fixed by only freeing after synchronize_rcu(). Signed-off-by: Martijn Coenen Cc: Johannes Weiner Acked-by: Michal Hocko Cc: Vladimir Davydov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 4b20545910cbc96cede009adb89e19f7d2aa8f52 Author: Michael Holzheu Date: Tue Feb 2 16:57:26 2016 -0800 numa: fix /proc//numa_maps for hugetlbfs on s390 commit 5c2ff95e41c9290d16556cd02e35b25d81be8fe0 upstream. When working with hugetlbfs ptes (which are actually pmds) is not valid to directly use pte functions like pte_present() because the hardware bit layout of pmds and ptes can be different. This is the case on s390. Therefore we have to convert the hugetlbfs ptes first into a valid pte encoding with huge_ptep_get(). Currently the /proc//numa_maps code uses hugetlbfs ptes without huge_ptep_get(). On s390 this leads to the following two problems: 1) The pte_present() function returns false (instead of true) for PROT_NONE hugetlb ptes. Therefore PROT_NONE vmas are missing completely in the "numa_maps" output. 2) The pte_dirty() function always returns false for all hugetlb ptes. Therefore these pages are reported as "mapped=xxx" instead of "dirty=xxx". Therefore use huge_ptep_get() to correctly convert the hugetlb ptes. Signed-off-by: Michael Holzheu Reviewed-by: Gerald Schaefer Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit db33368ca32dd307cdcc191361de34f3937f513a Author: Mike Kravetz Date: Fri Jan 15 16:57:37 2016 -0800 fs/hugetlbfs/inode.c: fix bugs in hugetlb_vmtruncate_list() commit 9aacdd354d197ad64685941b36d28ea20ab88757 upstream. Hillf Danton noticed bugs in the hugetlb_vmtruncate_list routine. The argument end is of type pgoff_t. It was being converted to a vaddr offset and passed to unmap_hugepage_range. However, end was also being used as an argument to the vma_interval_tree_foreach controlling loop. In addition, the conversion of end to vaddr offset was incorrect. hugetlb_vmtruncate_list is called as part of a file truncate or fallocate hole punch operation. When truncating a hugetlbfs file, this bug could prevent some pages from being unmapped. This is possible if there are multiple vmas mapping the file, and there is a sufficiently sized hole between the mappings. The size of the hole between two vmas (A,B) must be such that the starting virtual address of B is greater than (ending virtual address of A << PAGE_SHIFT). In this case, the pages in B would not be unmapped. If pages are not properly unmapped during truncate, the following BUG is hit: kernel BUG at fs/hugetlbfs/inode.c:428! In the fallocate hole punch case, this bug could prevent pages from being unmapped as in the truncate case. However, for hole punch the result is that unmapped pages will not be removed during the operation. For hole punch, it is also possible that more pages than desired will be unmapped. This unnecessary unmapping will cause page faults to reestablish the mappings on subsequent page access. Fixes: 1bfad99ab (" hugetlbfs: hugetlb_vmtruncate_list() needs to take a range")Reported-by: Hillf Danton Signed-off-by: Mike Kravetz Cc: Hugh Dickins Cc: Naoya Horiguchi Cc: Davidlohr Bueso Cc: Dave Hansen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit b105aa33af0d487fc349212edecac04ef35a2622 Author: Sergey Senozhatsky Date: Thu Jan 14 15:16:53 2016 -0800 scripts/bloat-o-meter: fix python3 syntax error commit 72214a24a7677d4c7501eecc9517ed681b5f2db2 upstream. In Python3+ print is a function so the old syntax is not correct anymore: $ ./scripts/bloat-o-meter vmlinux.o vmlinux.o.old File "./scripts/bloat-o-meter", line 61 print "add/remove: %s/%s grow/shrink: %s/%s up/down: %s/%s (%s)" % \ ^ SyntaxError: invalid syntax Fix by calling print as a function. Tested on python 2.7.11, 3.5.1 Signed-off-by: Sergey Senozhatsky Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit dad5038f3fe2ac41078eed8abba9b7f5ce2fa05c Author: Laura Abbott Date: Thu Jan 14 15:16:50 2016 -0800 dma-debug: switch check from _text to _stext commit ea535e418c01837d07b6c94e817540f50bfdadb0 upstream. In include/asm-generic/sections.h: /* * Usage guidelines: * _text, _data: architecture specific, don't use them in * arch-independent code * [_stext, _etext]: contains .text.* sections, may also contain * .rodata.* * and/or .init.* sections _text is not guaranteed across architectures. Architectures such as ARM may reuse parts which are not actually text and erroneously trigger a bug. Switch to using _stext which is guaranteed to contain text sections. Came out of https://lkml.kernel.org/g/<567B1176.4000106@redhat.com> Signed-off-by: Laura Abbott Reviewed-by: Kees Cook Cc: Russell King Cc: Arnd Bergmann Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 275adaf191c617409452bc569ad8f423e74678c7 Author: Sudip Mukherjee Date: Thu Jan 14 15:16:47 2016 -0800 m32r: fix m32104ut_defconfig build fail commit 601f1db653217f205ffa5fb33514b4e1711e56d1 upstream. The build of m32104ut_defconfig for m32r arch was failing for long long time with the error: ERROR: "memory_start" [fs/udf/udf.ko] undefined! ERROR: "memory_end" [fs/udf/udf.ko] undefined! ERROR: "memory_end" [drivers/scsi/sg.ko] undefined! ERROR: "memory_start" [drivers/scsi/sg.ko] undefined! ERROR: "memory_end" [drivers/i2c/i2c-dev.ko] undefined! ERROR: "memory_start" [drivers/i2c/i2c-dev.ko] undefined! As done in other architectures export the symbols to fix the error. Reported-by: Fengguang Wu Signed-off-by: Sudip Mukherjee Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 71e5a4a747b0eadbff4835cf41493187bcbbd886 Author: Mathias Nyman Date: Tue Jan 26 17:50:12 2016 +0200 xhci: Fix list corruption in urb dequeue at host removal commit 5c82171167adb8e4ac77b91a42cd49fb211a81a0 upstream. xhci driver frees data for all devices, both usb2 and and usb3 the first time usb_remove_hcd() is called, including td_list and and xhci_ring structures. When usb_remove_hcd() is called a second time for the second xhci bus it will try to dequeue all pending urbs, and touches td_list which is already freed for that endpoint. Reported-by: Joe Lawrence Tested-by: Joe Lawrence Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman commit d15298509b86f06d63135770ac8433295a18375f Author: Mathias Nyman Date: Tue Jan 26 17:50:04 2016 +0200 Revert "xhci: don't finish a TD if we get a short-transfer event mid TD" commit a6835090716a85f2297668ba593bd00e1051e662 upstream. This reverts commit e210c422b6fd ("xhci: don't finish a TD if we get a short transfer event mid TD") Turns out that most host controllers do not follow the xHCI specs and never send the second event for the last TRB in the TD if there was a short event mid-TD. Returning the URB directly after the first short-transfer event is far better than never returning the URB. (class drivers usually timeout after 30sec). For the hosts that do send the second event we will go back to treating it as misplaced event and print an error message for it. The origial patch was sent to stable kernels and needs to be reverted from there as well Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman commit 2231e5748746cd57df389521397e1c7f91882077 Author: David Woodhouse Date: Mon Feb 15 12:42:38 2016 +0000 iommu/vt-d: Clear PPR bit to ensure we get more page request interrupts commit 46924008273ed03bd11dbb32136e3da4cfe056e1 upstream. According to the VT-d specification we need to clear the PPR bit in the Page Request Status register when handling page requests, or the hardware won't generate any more interrupts. This wasn't actually necessary on SKL/KBL (which may well be the subject of a hardware erratum, although it's harmless enough). But other implementations do appear to get it right, and we only ever get one interrupt unless we clear the PPR bit. Reported-by: CQ Tang Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman commit db3ac35cbd310b3ce2e668f82eb6d0c77cc10b15 Author: CQ Tang Date: Wed Jan 13 21:15:03 2016 +0000 iommu/vt-d: Fix 64-bit accesses to 32-bit DMAR_GSTS_REG commit fda3bec12d0979aae3f02ee645913d66fbc8a26e upstream. This is a 32-bit register. Apparently harmless on real hardware, but causing justified warnings in simulation. Signed-off-by: CQ Tang Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman commit 7c6471cb94adcb2d7027d6981e87e87ccd05b06e Author: David Woodhouse Date: Tue Jan 12 19:18:06 2016 +0000 iommu/vt-d: Fix mm refcounting to hold mm_count not mm_users commit e57e58bd390a6843db58560bf7b8341665d2e058 upstream. Holding mm_users works OK for graphics, which was the first user of SVM with VT-d. However, it works less well for other devices, where we actually do a mmap() from the file descriptor to which the SVM PASID state is tied. In this case on process exit we end up with a recursive reference count: - The MM remains alive until the file is closed and the driver's release() call ends up unbinding the PASID. - The VMA corresponding to the mmap() remains intact until the MM is destroyed. - Thus the file isn't closed, even when exit_files() runs, because the VMA is still holding a reference to it. And the MM remains alive… To address this issue, we *stop* holding mm_users while the PASID is bound. We already hold mm_count by virtue of the MMU notifier, and that can be made to be sufficient. It means that for a period during process exit, the fun part of mmput() has happened and exit_mmap() has been called so the MM is basically defunct. But the PGD still exists and the PASID is still bound to it. During this period, we have to be very careful — exit_mmap() doesn't use mm->mmap_sem because it doesn't expect anyone else to be touching the MM (quite reasonably, since mm_users is zero). So we also need to fix the fault handler to just report failure if mm_users is already zero, and to temporarily bump mm_users while handling any faults. Additionally, exit_mmap() calls mmu_notifier_release() *before* it tears down the page tables, which is too early for us to flush the IOTLB for this PASID. And __mmu_notifier_release() removes every notifier from the list, so when exit_mmap() finally *does* tear down the mappings and clear the page tables, we don't get notified. So we work around this by clearing the PASID table entry in our MMU notifier release() callback. That way, the hardware *can't* get any pages back from the page tables before they get cleared. Hardware designers have confirmed that the resulting 'PASID not present' faults should be handled just as gracefully as 'page not present' faults, the important criterion being that they don't perturb the operation for any *other* PASID in the system. Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman commit d63a009a9bd9f0503a0d817b6ae97f5b06de22e3 Author: Baoquan He Date: Wed Jan 20 22:01:19 2016 +0800 iommu/amd: Correct the wrong setting of alias DTE in do_attach commit 9b1a12d29109234d2b9718d04d4d404b7da4e794 upstream. In below commit alias DTE is set when its peripheral is setting DTE. However there's a code bug here to wrongly set the alias DTE, correct it in this patch. commit e25bfb56ea7f046b71414e02f80f620deb5c6362 Author: Joerg Roedel Date: Tue Oct 20 17:33:38 2015 +0200 iommu/amd: Set alias DTE in do_attach/do_detach Signed-off-by: Baoquan He Tested-by: Mark Hounschell Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman commit c65a7b684133887e9211cd901bb689c48a6c18d8 Author: Jeremy McNicoll Date: Thu Jan 14 21:33:06 2016 -0800 iommu/vt-d: Don't skip PCI devices when disabling IOTLB commit da972fb13bc5a1baad450c11f9182e4cd0a091f6 upstream. Fix a simple typo when disabling IOTLB on PCI(e) devices. Fixes: b16d0cb9e2fc ("iommu/vt-d: Always enable PASID/PRI PCI capabilities before ATS") Signed-off-by: Jeremy McNicoll Reviewed-by: Alex Williamson Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman commit b864f4e50c56a5bd0ab1603b86164cab66337906 Author: Dmitry Torokhov Date: Sat Jan 16 10:04:49 2016 -0800 Input: vmmouse - fix absolute device registration commit d4f1b06d685d11ebdaccf11c0db1cb3c78736862 upstream. We should set device's capabilities first, and then register it, otherwise various handlers already present in the kernel will not be able to connect to the device. Reported-by: Lauri Kasanen Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman commit 726ecfc321994ec6ab044c1e3e5886408de991ac Author: James Bottomley Date: Wed Jan 20 14:58:29 2016 -0800 string_helpers: fix precision loss for some inputs commit 564b026fbd0d28e9f70fb3831293d2922bb7855b upstream. It was noticed that we lose precision in the final calculation for some inputs. The most egregious example is size=3000 blk_size=1900 in units of 10 should yield 5.70 MB but in fact yields 3.00 MB (oops). This is because the current algorithm doesn't correctly account for all the remainders in the logarithms. Fix this by doing a correct calculation in the remainders based on napier's algorithm. Additionally, now we have the correct result, we have to account for arithmetic rounding because we're printing 3 digits of precision. This means that if the fourth digit is five or greater, we have to round up, so add a section to ensure correct rounding. Finally account for all possible inputs correctly, including zero for block size. Fixes: b9f28d863594c429e1df35a0474d2663ca28b307 Signed-off-by: James Bottomley Reported-by: Vitaly Kuznetsov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 5c73252f746d4973294b7ae7eead944f4a5210f8 Author: Aurélien Francillon Date: Sat Jan 2 20:39:54 2016 -0800 Input: i8042 - add Fujitsu Lifebook U745 to the nomux list commit dd0d0d4de582a6a61c032332c91f4f4cb2bab569 upstream. Without i8042.nomux=1 the Elantech touch pad is not working at all on a Fujitsu Lifebook U745. This patch does not seem necessary for all U745 (maybe because of different BIOS versions?). However, it was verified that the patch does not break those (see opensuse bug 883192: https://bugzilla.opensuse.org/show_bug.cgi?id=883192). Signed-off-by: Aurélien Francillon Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman commit 1d70d30a5fa2139325e413aaf4ce44675d3133ce Author: Benjamin Tissoires Date: Mon Jan 11 17:35:38 2016 -0800 Input: elantech - mark protocols v2 and v3 as semi-mt commit 6544a1df11c48c8413071aac3316792e4678fbfb upstream. When using a protocol v2 or v3 hardware, elantech uses the function elantech_report_semi_mt_data() to report data. This devices are rather creepy because if num_finger is 3, (x2,y2) is (0,0). Yes, only one valid touch is reported. Anyway, userspace (libinput) is now confused by these (0,0) touches, and detect them as palm, and rejects them. Commit 3c0213d17a09 ("Input: elantech - fix semi-mt protocol for v3 HW") was sufficient enough for xf86-input-synaptics and libinput before it has palm rejection. Now we need to actually tell libinput that this device is a semi-mt one and it should not rely on the actual values of the 2 touches. Signed-off-by: Benjamin Tissoires Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman commit d1f8217a9a6e9b84df4cf6a0d3c99c7bbe1430a7 Author: Kirill A. Shutemov Date: Wed Feb 17 13:11:15 2016 -0800 mm: fix regression in remap_file_pages() emulation commit 48f7df329474b49d83d0dffec1b6186647f11976 upstream. Grazvydas Ignotas has reported a regression in remap_file_pages() emulation. Testcase: #define _GNU_SOURCE #include #include #include #include #define SIZE (4096 * 3) int main(int argc, char **argv) { unsigned long *p; long i; p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0); if (p == MAP_FAILED) { perror("mmap"); return -1; } for (i = 0; i < SIZE / 4096; i++) p[i * 4096 / sizeof(*p)] = i; if (remap_file_pages(p, 4096, 0, 1, 0)) { perror("remap_file_pages"); return -1; } if (remap_file_pages(p, 4096 * 2, 0, 1, 0)) { perror("remap_file_pages"); return -1; } assert(p[0] == 1); munmap(p, SIZE); return 0; } The second remap_file_pages() fails with -EINVAL. The reason is that remap_file_pages() emulation assumes that the target vma covers whole area we want to over map. That assumption is broken by first remap_file_pages() call: it split the area into two vma. The solution is to check next adjacent vmas, if they map the same file with the same flags. Fixes: c8d78c1823f4 ("mm: replace remap_file_pages() syscall with emulation") Signed-off-by: Kirill A. Shutemov Reported-by: Grazvydas Ignotas Tested-by: Grazvydas Ignotas Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 413aab16bc7b9b58fb1a52010eb3af3d227d7007 Author: Konstantin Khlebnikov Date: Fri Feb 5 15:36:50 2016 -0800 mm: replace vma_lock_anon_vma with anon_vma_lock_read/write commit 12352d3cae2cebe18805a91fab34b534d7444231 upstream. Sequence vma_lock_anon_vma() - vma_unlock_anon_vma() isn't safe if anon_vma appeared between lock and unlock. We have to check anon_vma first or call anon_vma_prepare() to be sure that it's here. There are only few users of these legacy helpers. Let's get rid of them. This patch fixes anon_vma lock imbalance in validate_mm(). Write lock isn't required here, read lock is enough. And reorders expand_downwards/expand_upwards: security_mmap_addr() and wrapping-around check don't have to be under anon vma lock. Link: https://lkml.kernel.org/r/CACT4Y+Y908EjM2z=706dv4rV6dWtxTLK9nFg9_7DhRMLppBo2g@mail.gmail.com Signed-off-by: Konstantin Khlebnikov Reported-by: Dmitry Vyukov Acked-by: Kirill A. Shutemov Cc: Andrea Arcangeli Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 918a2c388ed7828fa4b6ccb5cacd9e422f30938c Author: Kirill A. Shutemov Date: Thu Jan 21 16:40:27 2016 -0800 mm: fix mlock accouting commit 7162a1e87b3e380133dadc7909081bb70d0a7041 upstream. Tetsuo Handa reported underflow of NR_MLOCK on munlock. Testcase: #include #include #include #define BASE ((void *)0x400000000000) #define SIZE (1UL << 21) int main(int argc, char *argv[]) { void *addr; system("grep Mlocked /proc/meminfo"); addr = mmap(BASE, SIZE, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE | MAP_LOCKED | MAP_FIXED, -1, 0); if (addr == MAP_FAILED) printf("mmap() failed\n"), exit(1); munmap(addr, SIZE); system("grep Mlocked /proc/meminfo"); return 0; } It happens on munlock_vma_page() due to unfortunate choice of nr_pages data type: __mod_zone_page_state(zone, NR_MLOCK, -nr_pages); For unsigned int nr_pages, implicitly casted to long in __mod_zone_page_state(), it becomes something around UINT_MAX. munlock_vma_page() usually called for THP as small pages go though pagevec. Let's make nr_pages signed int. Similar fixes in 6cdb18ad98a4 ("mm/vmstat: fix overflow in mod_zone_page_state()") used `long' type, but `int' here is OK for a count of the number of sub-pages in a huge page. Fixes: ff6a6da60b89 ("mm: accelerate munlock() treatment of THP pages") Signed-off-by: Kirill A. Shutemov Reported-by: Tetsuo Handa Tested-by: Tetsuo Handa Cc: Michel Lespinasse Acked-by: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 6e8ea2f2258caf0d2a876b36c15a93788a7cde93 Author: Dan Williams Date: Tue Jan 5 18:37:23 2016 -0800 libnvdimm: fix namespace object confusion in is_uuid_busy() commit e07ecd76d4db7bda1e9495395b2110a3fe28845a upstream. When btt devices were re-worked to be child devices of regions this routine was overlooked. It mistakenly attempts to_nd_namespace_pmem() or to_nd_namespace_blk() conversions on btt and pfn devices. By luck to date we have happened to be hitting valid memory leading to a uuid miscompare, but a recent change to struct nd_namespace_common causes: BUG: unable to handle kernel NULL pointer dereference at 0000000000000001 IP: [] memcmp+0xc/0x40 [..] Call Trace: [] is_uuid_busy+0xc1/0x2a0 [libnvdimm] [] ? to_nd_blk_region+0x50/0x50 [libnvdimm] [] device_for_each_child+0x50/0x90 Signed-off-by: Dan Williams Signed-off-by: Greg Kroah-Hartman commit bd55913cf20804d2c3d4d83e79d9867963af2ff1 Author: Naoya Horiguchi Date: Fri Jan 15 16:54:03 2016 -0800 mm: soft-offline: check return value in second __get_any_page() call commit d96b339f453997f2f08c52da3f41423be48c978f upstream. I saw the following BUG_ON triggered in a testcase where a process calls madvise(MADV_SOFT_OFFLINE) on thps, along with a background process that calls migratepages command repeatedly (doing ping-pong among different NUMA nodes) for the first process: Soft offlining page 0x60000 at 0x700000600000 __get_any_page: 0x60000 free buddy page page:ffffea0001800000 count:0 mapcount:-127 mapping: (null) index:0x1 flags: 0x1fffc0000000000() page dumped because: VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0) ------------[ cut here ]------------ kernel BUG at /src/linux-dev/include/linux/mm.h:342! invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: cfg80211 rfkill crc32c_intel serio_raw virtio_balloon i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi CPU: 3 PID: 3035 Comm: test_alloc_gene Tainted: G O 4.4.0-rc8-v4.4-rc8-160107-1501-00000-rc8+ #74 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff88007c63d5c0 ti: ffff88007c210000 task.ti: ffff88007c210000 RIP: 0010:[] [] put_page+0x5c/0x60 RSP: 0018:ffff88007c213e00 EFLAGS: 00010246 Call Trace: put_hwpoison_page+0x4e/0x80 soft_offline_page+0x501/0x520 SyS_madvise+0x6bc/0x6f0 entry_SYSCALL_64_fastpath+0x12/0x6a Code: 8b fc ff ff 5b 5d c3 48 89 df e8 b0 fa ff ff 48 89 df 31 f6 e8 c6 7d ff ff 5b 5d c3 48 c7 c6 08 54 a2 81 48 89 df e8 a4 c5 01 00 <0f> 0b 66 90 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 8b 47 RIP [] put_page+0x5c/0x60 RSP The root cause resides in get_any_page() which retries to get a refcount of the page to be soft-offlined. This function calls put_hwpoison_page(), expecting that the target page is putback to LRU list. But it can be also freed to buddy. So the second check need to care about such case. Fixes: af8fae7c0886 ("mm/memory-failure.c: clean up soft_offline_page()") Signed-off-by: Naoya Horiguchi Cc: Sasha Levin Cc: Aneesh Kumar K.V Cc: Vlastimil Babka Cc: Jerome Marchand Cc: Andrea Arcangeli Cc: Hugh Dickins Cc: Dave Hansen Cc: Mel Gorman Cc: Rik van Riel Cc: Steve Capper Cc: Johannes Weiner Cc: Michal Hocko Cc: Christoph Lameter Cc: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit a6a3f3ddf6a663ab3d141f0e13f457b084a8c409 Author: Ravi Bangoria Date: Mon Dec 7 12:25:02 2015 +0530 perf kvm record/report: 'unprocessable sample' error while recording/reporting guest data commit 3caeaa562733c4836e61086ec07666635006a787 upstream. While recording guest samples in host using perf kvm record, it will populate unprocessable sample error, though samples will be recorded properly. While generating report using perf kvm report, no samples will be processed and same error will populate. We have seen this behaviour with upstream perf(4.4-rc3) on x86 and ppc64 hardware. Reason behind this failure is, when it tries to fetch machine from rb_tree of machines, it fails. As a part of tracing a bug, we figured out that this code was incorrectly refactored in commit 54245fdc3576 ("perf session: Remove wrappers to machines__find"). This patch will change the functionality such that if it can't fetch machine in first trial, it will create one node of machine and add that to rb_tree. So next time when it tries to fetch same machine from rb_tree, it won't fail. Actually it was the case before refactoring of code in aforementioned commit. This patch is generated from acme perf/core branch. Below I've mention an example that demonstrate the behaviour before and after applying patch. Before applying patch: [Note: One needs to run guest before recording data in host] ravi@ravi-bangoria:~$ ./perf kvm record -a Warning: 5903 unprocessable samples recorded. Do you have a KVM guest running and not using 'perf kvm'? [ perf record: Captured and wrote 1.409 MB perf.data.guest (285 samples) ] ravi@ravi-bangoria:~$ ./perf kvm report --stdio Warning: 5903 unprocessable samples recorded. Do you have a KVM guest running and not using 'perf kvm'? # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 285 of event 'cycles' # Event count (approx.): 88715406 # # Overhead Command Shared Object Symbol # ........ ....... ............. ...... # # (For a higher level overview, try: perf report --sort comm,dso) # After applying patch: ravi@ravi-bangoria:~$ ./perf kvm record -a [ perf record: Captured and wrote 1.188 MB perf.data.guest (17 samples) ] ravi@ravi-bangoria:~$ ./perf kvm report --stdio # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 17 of event 'cycles' # Event count (approx.): 700746 # # Overhead Command Shared Object Symbol # ........ ....... ................ ...................... # 34.19% :5758 [unknown] [g] 0xffffffff818682ab 22.79% :5758 [unknown] [g] 0xffffffff812dc7f8 22.79% :5758 [unknown] [g] 0xffffffff818650d0 14.83% :5758 [unknown] [g] 0xffffffff8161a1b6 2.49% :5758 [unknown] [g] 0xffffffff818692bf 0.48% :5758 [unknown] [g] 0xffffffff81869253 0.05% :5758 [unknown] [g] 0xffffffff81869250 Signed-off-by: Ravi Bangoria Cc: Naveen N. Rao Fixes: 54245fdc3576 ("perf session: Remove wrappers to machines__find") Link: http://lkml.kernel.org/r/1449471302-11283-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman commit b58731d6263a323e602191257628d93c2274cefa Author: Greg Kurz Date: Wed Jan 13 18:28:17 2016 +0100 KVM: PPC: Fix ONE_REG AltiVec support commit b4d7f161feb3015d6306e1d35b565c888ff70c9d upstream. The get and set operations got exchanged by mistake when moving the code from book3s.c to powerpc.c. Fixes: 3840edc8033ad5b86deee309c1c321ca54257452 Signed-off-by: Greg Kurz Signed-off-by: Paul Mackerras Signed-off-by: Greg Kroah-Hartman commit 921fa9b77380b0c3df67c424c2a68db5214ab461 Author: Thomas Huth Date: Fri Nov 20 09:11:45 2015 +0100 KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8 commit 760a7364f27d974d100118d88190e574626e18a6 upstream. In the old DABR register, the BT (Breakpoint Translation) bit is bit number 61. In the new DAWRX register, the WT (Watchpoint Translation) bit is bit number 59. So to move the DABR-BT bit into the position of the DAWRX-WT bit, it has to be shifted by two, not only by one. This fixes hardware watchpoints in gdb of older guests that only use the H_SET_DABR/X interface instead of the new H_SET_MODE interface. Signed-off-by: Thomas Huth Reviewed-by: Laurent Vivier Reviewed-by: David Gibson Signed-off-by: Paul Mackerras Signed-off-by: Greg Kroah-Hartman commit b3e336de65ebdd5ed85ba1f03216091a28583e7e Author: Andre Przywara Date: Wed Feb 3 16:56:51 2016 +0000 KVM: arm/arm64: Fix reference to uninitialised VGIC commit b3aff6ccbb1d25e506b60ccd9c559013903f3464 upstream. Commit 4b4b4512da2a ("arm/arm64: KVM: Rework the arch timer to use level-triggered semantics") brought the virtual architected timer closer to the VGIC. There is one occasion were we don't properly check for the VGIC actually having been initialized before, but instead go on to check the active state of some IRQ number. If userland hasn't instantiated a virtual GIC, we end up with a kernel NULL pointer dereference: ========= Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = ffffffc9745c5000 [00000000] *pgd=00000009f631e003, *pud=00000009f631e003, *pmd=0000000000000000 Internal error: Oops: 96000006 [#2] PREEMPT SMP Modules linked in: CPU: 0 PID: 2144 Comm: kvm_simplest-ar Tainted: G D 4.5.0-rc2+ #1300 Hardware name: ARM Juno development board (r1) (DT) task: ffffffc976da8000 ti: ffffffc976e28000 task.ti: ffffffc976e28000 PC is at vgic_bitmap_get_irq_val+0x78/0x90 LR is at kvm_vgic_map_is_active+0xac/0xc8 pc : [] lr : [] pstate: 20000145 .... ========= Fix this by bailing out early of kvm_timer_flush_hwstate() if we don't have a VGIC at all. Reported-by: Cosmin Gorgovan Acked-by: Marc Zyngier Signed-off-by: Andre Przywara Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit 593337c55ac30807f38ca924e716edfdc4e6a916 Author: Marek Szyprowski Date: Tue Feb 16 15:14:44 2016 +0100 arm64: dma-mapping: fix handling of devices registered before arch_initcall commit 722ec35f7faefcc34d12616eca7976a848870f9d upstream. This patch ensures that devices, which got registered before arch_initcall will be handled correctly by IOMMU-based DMA-mapping code. Fixes: 13b8629f6511 ("arm64: Add IOMMU dma_ops") Acked-by: Robin Murphy Signed-off-by: Marek Szyprowski Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman commit a6e01f0c81d54678cdd50892462d12b672320e5e Author: Tony Lindgren Date: Thu Jan 14 12:20:48 2016 -0800 ARM: OMAP2+: Fix ppa_zero_params and ppa_por_params for rodata commit 4da597d16602d14405b71a18d45e1c59f28f0fd2 upstream. We don't want to write to .text so let's move ppa_zero_params and ppa_por_params to .data and access them via pointers. Note that I have not been able to test as we I don't have a HS omap4 to test with. The code has been changed in similar way as for omap3 though. Cc: Kees Cook Cc: Laura Abbott Cc: Nishanth Menon Cc: Richard Woodruff Cc: Russell King Cc: Tero Kristo Acked-by: Nicolas Pitre Fixes: 1e6b48116a95 ("ARM: mm: allow non-text sections to be non-executable") Signed-off-by: Tony Lindgren Signed-off-by: Greg Kroah-Hartman commit 82de5956e9f4102e7b29d6718abaf0d156cdbc2a Author: Tony Lindgren Date: Thu Jan 14 12:20:47 2016 -0800 ARM: OMAP2+: Fix save_secure_ram_context for rodata commit a5311d4d13df80bd71a9e47f9ecaf327f478fab1 upstream. We don't want to write to .text and we can move save_secure_ram_context into .data as it all gets copied into SRAM anyways. Cc: Kees Cook Cc: Laura Abbott Cc: Nishanth Menon Cc: Richard Woodruff Cc: Russell King Cc: Sergei Shtylyov Cc: Tero Kristo Acked-by: Nicolas Pitre Fixes: 1e6b48116a95 ("ARM: mm: allow non-text sections to be non-executable") Signed-off-by: Tony Lindgren Signed-off-by: Greg Kroah-Hartman commit 31a50ee1ad3e0cc51a75fb052115fcf5a804ab35 Author: Tony Lindgren Date: Thu Jan 14 12:20:47 2016 -0800 ARM: OMAP2+: Fix l2dis_3630 for rodata commit eeaf9646aca89d097861caa24d9818434e48810e upstream. We don't want to write to .text section. Let's move l2dis_3630 to .data and access it via a pointer. For calculating the offset, let's optimize out the add and do it in ldr/str as suggested by Nicolas Pitre . Cc: Kees Cook Cc: Laura Abbott Cc: Nishanth Menon Cc: Richard Woodruff Cc: Russell King Cc: Tero Kristo Acked-by: Nicolas Pitre Fixes: 1e6b48116a95 ("ARM: mm: allow non-text sections to be non-executable") Signed-off-by: Tony Lindgren Signed-off-by: Greg Kroah-Hartman commit 98b3f17a7235cb49755c6bd17fc428381261105c Author: Tony Lindgren Date: Thu Jan 14 12:20:47 2016 -0800 ARM: OMAP2+: Fix l2_inv_api_params for rodata commit 0a0b13275558c32bbf6241464a7244b1ffd5afb3 upstream. We don't want to write to .text, so let's move l2_inv_api_params to .data and access it via a pointer. Cc: Kees Cook Cc: Laura Abbott Cc: Nishanth Menon Cc: Richard Woodruff Cc: Russell King Cc: Tero Kristo Acked-by: Nicolas Pitre Fixes: 1e6b48116a95 ("ARM: mm: allow non-text sections to be non-executable") Signed-off-by: Tony Lindgren Signed-off-by: Greg Kroah-Hartman commit ec776d670e2d1f6034fac2f1a28ff972776bba52 Author: Tony Lindgren Date: Thu Jan 14 12:20:47 2016 -0800 ARM: OMAP2+: Fix wait_dll_lock_timed for rodata commit d9db59103305eb5ec2a86369f32063e9921b6ac5 upstream. We don't want to be writing to .text so it can be set rodata. Fix error "Unable to handle kernel paging request at virtual address c012396c" in wait_dll_lock_timed if CONFIG_DEBUG_RODATA is selected. As these counters are for debugging only and unused, we can just remove them. Cc: Kees Cook Cc: Laura Abbott Cc: Nishanth Menon Cc: Richard Woodruff Cc: Russell King Cc: Tero Kristo Acked-by: Nicolas Pitre Fixes: 1e6b48116a95 ("ARM: mm: allow non-text sections to be non-executable") Signed-off-by: Tony Lindgren Signed-off-by: Greg Kroah-Hartman commit 6ec8b7c5bbdd539878fb23f6040a2c041c43247c Author: Wenyou Yang Date: Wed Jan 27 13:16:24 2016 +0800 ARM: dts: at91: sama5d4ek: add phy address and IRQ for macb0 commit aae6b18f5c95b9dc78de66d1e27e8afeee2763b7 upstream. On SAMA5D4EK board, the Ethernet doesn't work after resuming from the suspend state. Signed-off-by: Wenyou Yang [nicolas.ferre@atmel.com: adapt to newer kernel] Fixes: 38153a017896 ("ARM: at91/dt: sama5d4: add dts for sama5d4 xplained board") Signed-off-by: Nicolas Ferre Signed-off-by: Greg Kroah-Hartman commit 3b18631fbcea811634fadacb5a7358170d9e3414 Author: Nicolas Ferre Date: Wed Jan 27 11:03:02 2016 +0100 ARM: dts: at91: sama5d4 xplained: fix phy0 IRQ type commit e873cc022ce5e2c04bbc53b5874494b657e29d3f upstream. For phy0 KSZ8081, the type of GPIO IRQ should be "level low" instead of "edge falling". Signed-off-by: Nicolas Ferre Fixes: 38153a017896 ("ARM: at91/dt: sama5d4: add dts for sama5d4 xplained board") Signed-off-by: Greg Kroah-Hartman commit 080fc28fe4759d0d8a64b8980e4e6220a590221e Author: Mohamed Jamsheeth Hajanajubudeen Date: Fri Dec 11 17:06:26 2015 +0530 ARM: dts: at91: sama5d4: fix instance id of DBGU commit 929e883f2bfdf68d4bd3aec43912e956417005c7 upstream. Change instance id of DBGU to 45. Signed-off-by: Mohamed Jamsheeth Hajanajubudeen Fixes: 7c661394c56c ("ARM: at91: dt: add device tree file for SAMA5D4 SoC") Signed-off-by: Nicolas Ferre Signed-off-by: Greg Kroah-Hartman commit 5542d00c46536d09c594c2f604f3d684c62c3f29 Author: Alexandre Belloni Date: Fri Jan 15 09:30:18 2016 +0100 ARM: dts: at91: sama5d4 xplained: properly mux phy interrupt commit f505dba762ae826bb68978a85ee5c8ced7dea8d7 upstream. No interrupt were received from the phy because PIOE 1 may not be properly muxed. It prevented proper link detection, especially since commit 321beec5047a ("net: phy: Use interrupts when available in NOLINK state") disables polling. Signed-off-by: Alexandre Belloni Signed-off-by: Nicolas Ferre Signed-off-by: Greg Kroah-Hartman commit a482d944816941a8ad987975585ee064dc023325 Author: H. Nikolaus Schaller Date: Tue Jan 5 13:01:37 2016 +0100 ARM: dts: omap5-board-common: enable rtc and charging of backup battery commit c08659d431b40ad5beb97d7dde49ad9796cb812c upstream. tested on OMP5432 EVM Signed-off-by: H. Nikolaus Schaller Signed-off-by: Tony Lindgren Signed-off-by: Greg Kroah-Hartman commit 41a94b382396f3b33d6426ca8d0ddab237ad8b8a Author: Tony Lindgren Date: Mon Jan 11 14:35:24 2016 -0800 ARM: dts: Fix omap5 PMIC control lines for RTC writes commit af756bbccff85504ce05c63a50f80b9d7823c500 upstream. The palmas PMIC has two control lines that need to be muxed properly for things to work. The sys_nirq pin is used for interrupts, and msecure pin is used for enabling writes to some PMIC registers. Without these pins configured properly things can fail in mysterious ways. For example, we can't update the RTC registers on palmas PMIC unless the msecure pin is configured. And this is probably the reason why we had RTC missing from the omap5 dts file. According to "OMAP5430 ES2.0 Data Manual [Public] VErsion A (Rev. F)" swps052f.pdf, mux mode 1 is for sys_drm_msecure so in theory there's should be no need to configure it as a GPIO pin. However, it seems there are some reliability issues using the msecure mux mode. And the TI trees configure the msecure pin as GPIO out high instead. As the PMIC only cares that the msecure line is high to allow access to the RTC registers, let's use a GPIO hog as suggested by Nishanth Menon . Also the use of the internal pull was considered but supposedly that may not be capable of keeping the line high in a noisy environment. If we ever see high security omap5 products in the mainline tree, those need to skip the msecure pin muxing and ignore setting the GPIO hog. Chances are the related pin mux registers are locked in that case and the msecure pin is managed by whatever software may be running in the ARM TrustZone. Who knows what the original intention of the msecure pin was. Maybe it was supposed to prevent the system time to be set back for some game demo modes to time out? Anyways, it seems that later PMICs like tps659037 have recycled this pin for "powerhold" and devices like beagle-x15 do not need changes to the msecure pin configuration. To avoid further confusion with TWL variant PMICs, beagle-x15 does not have a back-up battery for RTC palmas. Instead the mcp79410 RTC is used with rtc-ds1307 driver. There is a "powerhold" jumper j5 holes near the palmas PMIC, and shorting it seems to power up beagle-x15 automatically. It is unknown if it also has other side effects to the beagle-x15 power up sequence. Signed-off-by: Tony Lindgren Signed-off-by: Greg Kroah-Hartman commit 671a5bc6f54d080b9453dc70a638c41ea23838d8 Author: Adam Ford Date: Thu Jan 21 11:03:20 2016 -0600 ARM: dts: Fix wl12xx missing clocks that cause hangs commit 0ea24daae053a9ba65d2f3eb20523002c1a8af38 upstream. The tcxo-clock-frequency binding is listed as optional, but without it the wl12xx used on the torpedo + wireless may hang. Scanning also appears broken without this patch. Signed-off-by: Adam Ford Fixes: 687c27676151 ("ARM: dts: Add minimal support for LogicPD Torpedo DM3730 devkit") Signed-off-by: Tony Lindgren Signed-off-by: Greg Kroah-Hartman commit 323f7cd28b7fd6b1c0cd9f003aa8482c10ff9d7d Author: Linus Walleij Date: Mon Feb 1 14:18:57 2016 +0100 ARM: nomadik: fix up SD/MMC DT settings commit 418d5516568b3fdbc4e7b53677dd78aed8514565 upstream. The DTSI file for the Nomadik does not properly specify how the PL180 levelshifter is connected: the Nomadik actually needs all the five st,sig-dir-* flags set to properly control all lines out. Further this board supports full power cycling of the card, and since this variant has no hardware clock gating, it needs a ridiculously low frequency setting to keep up with the ever overflowing FIFO. The pin configuration set-up is a bit of a mystery, because of course these pins are a mix of inputs and outputs. However the reference implementation sets all pins to "output" with unspecified initial value, so let's do that here as well. Signed-off-by: Linus Walleij Acked-by: Ulf Hansson Signed-off-by: Olof Johansson Signed-off-by: Greg Kroah-Hartman commit 53d991bbbc513ac4ac67ad7c422b13841920353f Author: Linus Walleij Date: Mon Feb 8 09:14:37 2016 +0100 ARM: 8517/1: ICST: avoid arithmetic overflow in icst_hz() commit 5070fb14a0154f075c8b418e5bc58a620ae85a45 upstream. When trying to set the ICST 307 clock to 25174000 Hz I ran into this arithmetic error: the icst_hz_to_vco() correctly figure out DIVIDE=2, RDW=100 and VDW=99 yielding a frequency of 25174000 Hz out of the VCO. (I replicated the icst_hz() function in a spreadsheet to verify this.) However, when I called icst_hz() on these VCO settings it would instead return 4122709 Hz. This causes an error in the common clock driver for ICST as the common clock framework will call .round_rate() on the clock which will utilize icst_hz_to_vco() followed by icst_hz() suggesting the erroneous frequency, and then the clock gets set to this. The error did not manifest in the old clock framework since this high frequency was only used by the CLCD, which calls clk_set_rate() without first calling clk_round_rate() and since the old clock framework would not call clk_round_rate() before setting the frequency, the correct values propagated into the VCO. After some experimenting I figured out that it was due to a simple arithmetic overflow: the divisor for 24Mhz reference frequency as reference becomes 24000000*2*(99+8)=0x132212400 and the "1" in bit 32 overflows and is lost. But introducing an explicit 64-by-32 bit do_div() and casting the divisor into (u64) we get the right frequency back, and the right frequency gets set. Tested on the ARM Versatile. Cc: linux-clk@vger.kernel.org Cc: Pawel Moll Signed-off-by: Linus Walleij Signed-off-by: Russell King Signed-off-by: Greg Kroah-Hartman commit 9fe0b68c4949517720546aa3717d05b491e4ee09 Author: Linus Walleij Date: Wed Feb 10 09:25:17 2016 +0100 ARM: 8519/1: ICST: try other dividends than 1 commit e972c37459c813190461dabfeaac228e00aae259 upstream. Since the dawn of time the ICST code has only supported divide by one or hang in an eternal loop. Luckily we were always dividing by one because the reference frequency for the systems using the ICSTs is 24MHz and the [min,max] values for the PLL input if [10,320] MHz for ICST307 and [6,200] for ICST525, so the loop will always terminate immediately without assigning any divisor for the reference frequency. But for the code to make sense, let's insert the missing i++ Reported-by: David Binderman Signed-off-by: Linus Walleij Signed-off-by: Russell King Signed-off-by: Greg Kroah-Hartman commit a68f555363f5e43d0da4a6d025e23db74e2b4f1f Author: Mika Penttilä Date: Tue Jan 26 15:47:25 2016 +0000 arm64: mm: avoid calling apply_to_page_range on empty range commit 57adec866c0440976c96a4b8f5b59fb411b1cacb upstream. Calling apply_to_page_range with an empty range results in a BUG_ON from the core code. This can be triggered by trying to load the st_drv module with CONFIG_DEBUG_SET_MODULE_RONX enabled: kernel BUG at mm/memory.c:1874! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in: CPU: 3 PID: 1764 Comm: insmod Not tainted 4.5.0-rc1+ #2 Hardware name: ARM Juno development board (r0) (DT) task: ffffffc9763b8000 ti: ffffffc975af8000 task.ti: ffffffc975af8000 PC is at apply_to_page_range+0x2cc/0x2d0 LR is at change_memory_common+0x80/0x108 This patch fixes the issue by making change_memory_common (called by the set_memory_* functions) a NOP when numpages == 0, therefore avoiding the erroneous call to apply_to_page_range and bringing us into line with x86 and s390. Reviewed-by: Laura Abbott Acked-by: David Rientjes Signed-off-by: Mika Penttilä Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman commit 242813b9a1b6d44845fbc73fe0f6db13f2401828 Author: Thomas Petazzoni Date: Fri Dec 4 14:29:02 2015 +0100 ARM: mvebu: remove duplicated regulator definition in Armada 388 GP commit 079ae0c121fd23287f4ad2be9e9f8a13f63cae73 upstream. The Armada 388 GP Device Tree file describes two times a regulator named 'reg_usb2_1_vbus', with the exact same description. This has been wrong since Armada 388 GP support was introduced. Fixes: 928413bd859c0 ("ARM: mvebu: Add Armada 388 General Purpose Development Board support") Signed-off-by: Thomas Petazzoni Signed-off-by: Gregory CLEMENT Signed-off-by: Greg Kroah-Hartman commit 602acfedc981eb887289f11f1a7565e6f48c710d Author: Alexey Kardashevskiy Date: Wed Feb 17 18:26:31 2016 +1100 powerpc/ioda: Set "read" permission when "write" is set commit 6ecad912a0073c768db1491c27ca55ad2d0ee68f upstream. Quite often drivers set only "write" permission assuming that this includes "read" permission as well and this works on plenty of platforms. However IODA2 is strict about this and produces an EEH when "read" permission is not set and reading happens. This adds a workaround in the IODA code to always add the "read" bit when the "write" bit is set. Fixes: 10b35b2b7485 ("powerpc/powernv: Do not set "read" flag if direction==DMA_NONE") Cc: Benjamin Herrenschmidt Signed-off-by: Alexey Kardashevskiy Tested-by: Douglas Miller Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman commit b5311270caba5392a83ed918e11a16e71b3b1e44 Author: Gavin Shan Date: Tue Feb 9 15:50:22 2016 +1100 powerpc/powernv: Fix stale PE primary bus commit 1bc74f1ccd457832dc515fc1febe6655985fdcd2 upstream. When PCI bus is unplugged during full hotplug for EEH recovery, the platform PE instance (struct pnv_ioda_pe) isn't released and it dereferences the stale PCI bus that has been released. It leads to kernel crash when referring to the stale PCI bus. This fixes the issue by correcting the PE's primary bus when it's oneline at plugging time, in pnv_pci_dma_bus_setup() which is to be called by pcibios_fixup_bus(). Reported-by: Andrew Donnellan Reported-by: Pradipta Ghosh Signed-off-by: Gavin Shan Tested-by: Andrew Donnellan Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman commit 5ecdf58c1945c40b6dcaf9764c119ecc607e8a06 Author: Gavin Shan Date: Tue Feb 9 15:50:21 2016 +1100 powerpc/eeh: Fix stale cached primary bus commit 05ba75f848647135f063199dc0e9f40fee769724 upstream. When PE is created, its primary bus is cached to pe->bus. At later point, the cached primary bus is returned from eeh_pe_bus_get(). However, we could get stale cached primary bus and run into kernel crash in one case: full hotplug as part of fenced PHB error recovery releases all PCI busses under the PHB at unplugging time and recreate them at plugging time. pe->bus is still dereferencing the PCI bus that was released. This adds another PE flag (EEH_PE_PRI_BUS) to represent the validity of pe->bus. pe->bus is updated when its first child EEH device is online and the flag is set. Before unplugging in full hotplug for error recovery, the flag is cleared. Fixes: 8cdb2833 ("powerpc/eeh: Trace PCI bus from PE") Reported-by: Andrew Donnellan Reported-by: Pradipta Ghosh Signed-off-by: Gavin Shan Tested-by: Andrew Donnellan Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman commit 64f10cf83a6cb819d8192e679a9916519d3edf4a Author: Gavin Shan Date: Wed Dec 2 16:25:32 2015 +1100 powerpc/eeh: Fix PE location code commit 7e56f627768da4e6480986b5145dc3422bc448a5 upstream. In eeh_pe_loc_get(), the PE location code is retrieved from the "ibm,loc-code" property of the device node for the bridge of the PE's primary bus. It's not correct because the property indicates the parent PE's location code. This reads the correct PE location code from "ibm,io-base-loc-code" or "ibm,slot-location-code" property of PE parent bus's device node. Fixes: 357b2f3dd9b7 ("powerpc/eeh: Dump PE location code") Signed-off-by: Gavin Shan Tested-by: Russell Currey Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman commit 782126b225221e5a2b30b1824b2326abba53eab9 Author: Trond Myklebust Date: Wed Jan 6 08:57:06 2016 -0500 SUNRPC: Fixup socket wait for memory commit 13331a551ab4df87f7a027d2cab392da96aba1de upstream. We're seeing hangs in the NFS client code, with loops of the form: RPC: 30317 xmit incomplete (267368 left of 524448) RPC: 30317 call_status (status -11) RPC: 30317 call_transmit (status 0) RPC: 30317 xprt_prepare_transmit RPC: 30317 xprt_transmit(524448) RPC: xs_tcp_send_request(267368) = -11 RPC: 30317 xmit incomplete (267368 left of 524448) RPC: 30317 call_status (status -11) RPC: 30317 call_transmit (status 0) RPC: 30317 xprt_prepare_transmit RPC: 30317 xprt_transmit(524448) Turns out commit ceb5d58b2170 ("net: fix sock_wake_async() rcu protection") moved SOCKWQ_ASYNC_NOSPACE out of sock->flags and into sk->sk_wq->flags, however it never tried to fix up the code in net/sunrpc. The new idiom is to use the flags in the RCU protected struct socket_wq. While we're at it, clear out the now redundant places where we set/clear SOCKWQ_ASYNC_NOSPACE and SOCK_NOSPACE. In principle, sk_stream_wait_memory() is supposed to set these for us, so we only need to clear them in the particular case of our ->write_space() callback. Fixes: ceb5d58b2170 ("net: fix sock_wake_async() rcu protection") Cc: Eric Dumazet Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit d0452554b9a114d40f7f86248c2935d0ba7ddd22 Author: Andrew Gabbasov Date: Thu Dec 24 10:25:33 2015 -0600 udf: Check output buffer length when converting name to CS0 commit bb00c898ad1ce40c4bb422a8207ae562e9aea7ae upstream. If a name contains at least some characters with Unicode values exceeding single byte, the CS0 output should have 2 bytes per character. And if other input characters have single byte Unicode values, then the single input byte is converted to 2 output bytes, and the length of output becomes larger than the length of input. And if the input name is long enough, the output length may exceed the allocated buffer length. All this means that conversion from UTF8 or NLS to CS0 requires checking of output length in order to stop when it exceeds the given output buffer size. [JK: Make code return -ENAMETOOLONG instead of silently truncating the name] Signed-off-by: Andrew Gabbasov Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman commit eec1445767ccbb320255ad8bfc2e64c929ee21cc Author: Andrew Gabbasov Date: Thu Dec 24 10:25:32 2015 -0600 udf: Prevent buffer overrun with multi-byte characters commit ad402b265ecf6fa22d04043b41444cdfcdf4f52d upstream. udf_CS0toUTF8 function stops the conversion when the output buffer length reaches UDF_NAME_LEN-2, which is correct maximum name length, but, when checking, it leaves the space for a single byte only, while multi-bytes output characters can take more space, causing buffer overflow. Similar error exists in udf_CS0toNLS function, that restricts the output length to UDF_NAME_LEN, while actual maximum allowed length is UDF_NAME_LEN-2. In these cases the output can override not only the current buffer length field, causing corruption of the name buffer itself, but also following allocation structures, causing kernel crash. Adjust the output length checks in both functions to prevent buffer overruns in case of multi-bytes UTF8 or NLS characters. Signed-off-by: Andrew Gabbasov Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman commit aef22a3d69452aa516f6738774f7b19372034fdc Author: Vegard Nossum Date: Fri Dec 11 15:54:16 2015 +0100 udf: limit the maximum number of indirect extents in a row commit b0918d9f476a8434b055e362b83fa4fd1d462c3f upstream. udf_next_aext() just follows extent pointers while extents are marked as indirect. This can loop forever for corrupted filesystem. Limit number the of indirect extents we are willing to follow in a row. [JK: Updated changelog, limit, style] Signed-off-by: Vegard Nossum Cc: Jan Kara Cc: Quentin Casasnovas Cc: Andrew Morton Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman commit 66b8812e87f3fcf9d52b107a8d3102cd3f77ae1e Author: Trond Myklebust Date: Thu Jan 21 15:39:40 2016 -0500 pNFS/flexfiles: Fix an XDR encoding bug in layoutreturn commit 082fa37d1351a41afc491d44a1d095cb8d919aa2 upstream. We must not skip encoding the statistics, or the server will see an XDR encoding error. Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit d65eb5b3dfb1159ea69c6ce170f83a0a90548ac6 Author: Andrew Elble Date: Wed Dec 2 09:20:57 2015 -0500 nfs: Fix race in __update_open_stateid() commit 361cad3c89070aeb37560860ea8bfc092d545adc upstream. We've seen this in a packet capture - I've intermixed what I think was going on. The fix here is to grab the so_lock sooner. 1964379 -> #1 open (for write) reply seqid=1 1964393 -> #2 open (for read) reply seqid=2 __nfs4_close(), state->n_wronly-- nfs4_state_set_mode_locked(), changes state->state = [R] state->flags is [RW] state->state is [R], state->n_wronly == 0, state->n_rdonly == 1 1964398 -> #3 open (for write) call -> because close is already running 1964399 -> downgrade (to read) call seqid=2 (close of #1) 1964402 -> #3 open (for write) reply seqid=3 __update_open_stateid() nfs_set_open_stateid_locked(), changes state->flags state->flags is [RW] state->state is [R], state->n_wronly == 0, state->n_rdonly == 1 new sequence number is exposed now via nfs4_stateid_copy() next step would be update_open_stateflags(), pending so_lock 1964403 -> downgrade reply seqid=2, fails with OLD_STATEID (close of #1) nfs4_close_prepare() gets so_lock and recalcs flags -> send close 1964405 -> downgrade (to read) call seqid=3 (close of #1 retry) __update_open_stateid() gets so_lock * update_open_stateflags() updates state->n_wronly. nfs4_state_set_mode_locked() updates state->state state->flags is [RW] state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1 * should have suppressed the preceding nfs4_close_prepare() from sending open_downgrade 1964406 -> write call 1964408 -> downgrade (to read) reply seqid=4 (close of #1 retry) nfs_clear_open_stateid_locked() state->flags is [R] state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1 1964409 -> write reply (fails, openmode) Signed-off-by: Andrew Elble Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit c8841e15d6ded83aac4da7caf70a20d3c58bf6fe Author: Trond Myklebust Date: Wed Dec 30 10:57:01 2015 -0500 pNFS/flexfiles: Fix an Oopsable typo in ff_mirror_match_fh() commit 86fb449b07b8215443a30782dca5755d5b8b0577 upstream. Jeff reports seeing an Oops in ff_layout_alloc_lseg. Turns out copy+paste has played cruel tricks on a nested loop. Reported-by: Jeff Layton Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit 1873e6f486061eb93dc82a85fa8030857833d485 Author: Trond Myklebust Date: Tue Dec 29 18:55:19 2015 -0500 NFS: Fix attribute cache revalidation commit ade14a7df796d4e86bd9d181193c883a57b13db0 upstream. If a NFSv4 client uses the cache_consistency_bitmask in order to request only information about the change attribute, timestamps and size, then it has not revalidated all attributes, and hence the attribute timeout timestamp should not be updated. Reported-by: Donald Buczek Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit dadfe92207503548d60a2d4c6c467702ead17148 Author: Anton Protopopov Date: Wed Feb 10 12:50:21 2016 -0500 cifs: fix erroneous return value commit 4b550af519854421dfec9f7732cdddeb057134b2 upstream. The setup_ntlmv2_rsp() function may return positive value ENOMEM instead of -ENOMEM in case of kmalloc failure. Signed-off-by: Anton Protopopov Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman commit 7e30995b26ccc952e30bbc2563fc5728c50c9e12 Author: Vasily Averin Date: Thu Jan 14 13:41:14 2016 +0300 cifs_dbg() outputs an uninitialized buffer in cifs_readdir() commit 01b9b0b28626db4a47d7f48744d70abca9914ef1 upstream. In some cases tmp_bug can be not filled in cifs_filldir and stay uninitialized, therefore its printk with "%s" modifier can leak content of kernelspace memory. If old content of this buffer does not contain '\0' access bejond end of allocated object can crash the host. Signed-off-by: Vasily Averin Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman commit 5d80673404e691093d66fc5b02c8bc2ac1692d77 Author: Rabin Vincent Date: Wed Dec 23 07:32:41 2015 +0100 cifs: fix race between call_async() and reconnect() commit 820962dc700598ffe8cd21b967e30e7520c34748 upstream. cifs_call_async() queues the MID to the pending list and calls smb_send_rqst(). If smb_send_rqst() performs a partial send, it sets the tcpStatus to CifsNeedReconnect and returns an error code to cifs_call_async(). In this case, cifs_call_async() removes the MID from the list and returns to the caller. However, cifs_call_async() releases the server mutex _before_ removing the MID. This means that a cifs_reconnect() can race with this function and manage to remove the MID from the list and delete the entry before cifs_call_async() calls cifs_delete_mid(). This leads to various crashes due to the use after free in cifs_delete_mid(). Task1 Task2 cifs_call_async(): - rc = -EAGAIN - mutex_unlock(srv_mutex) cifs_reconnect(): - mutex_lock(srv_mutex) - mutex_unlock(srv_mutex) - list_delete(mid) - mid->callback() cifs_writev_callback(): - mutex_lock(srv_mutex) - delete(mid) - mutex_unlock(srv_mutex) - cifs_delete_mid(mid) <---- use after free Fix this by removing the MID in cifs_call_async() before releasing the srv_mutex. Also hold the srv_mutex in cifs_reconnect() until the MIDs are moved out of the pending list. Signed-off-by: Rabin Vincent Acked-by: Shirish Pargaonkar Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman commit 88413fceab844abec9e8007aaeec35c4bfc3f3fc Author: Jamie Bainbridge Date: Sat Nov 7 22:13:49 2015 +1000 cifs: Ratelimit kernel log messages commit ec7147a99e33a9e4abad6fc6e1b40d15df045d53 upstream. Under some conditions, CIFS can repeatedly call the cifs_dbg() logging wrapper. If done rapidly enough, the console framebuffer can softlockup or "rcu_sched self-detected stall". Apply the built-in log ratelimiters to prevent such hangs. Signed-off-by: Jamie Bainbridge Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman commit 224f259d9393ca342dd565321db48ec4a79a695f Author: Dan Carpenter Date: Tue Jan 26 12:25:21 2016 +0300 iio: inkern: fix a NULL dereference on error commit d81dac3c1c5295c61b15293074ac2bd3254e1875 upstream. In twl4030_bci_probe() there are some failure paths where we call iio_channel_release() with a NULL pointer. (Apparently, that driver can opperate without a valid channel pointer). Let's fix it by adding a NULL check in iio_channel_release(). Fixes: 2202e1fc5a29 ('drivers: power: twl4030_charger: fix link problems when building as module') Signed-off-by: Dan Carpenter Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit e16eb4bb193c3b0faf1ca78cf05a6623de5cb633 Author: Akinobu Mita Date: Thu Jan 21 01:07:31 2016 +0900 iio: pressure: mpl115: fix temperature offset sign commit 431386e783a3a6c8b7707bee32d18c353b8688b2 upstream. According to the datasheet, the resolusion of temperature sensor is -5.35 counts/C. Temperature ADC is 472 counts at 25C. (https://www.sparkfun.com/datasheets/Sensors/Pressure/MPL115A1.pdf NOTE: This is older revision, but this information is removed from the latest datasheet from nxp somehow) Temp [C] = (Tadc - 472) / -5.35 + 25 = (Tadc - 605.750000) * -0.186915888 So the correct offset is -605.750000. Signed-off-by: Akinobu Mita Acked-by: Peter Meerwald-Stadler Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit 909e9c55196df318c7f3096826c8dc307cc2fd4b Author: Gabriele Mazzotta Date: Tue Jan 12 16:21:39 2016 +0100 iio: light: acpi-als: Report data as processed commit fa34e6dd44d7c02c8a8468ce4a52a7506f907bef upstream. As per the ACPI specification (Revision 5.0) [1], the data coming from the sensor represent the ambient light illuminance reading expressed in lux. So use IIO_CHAN_INFO_PROCESSED to signify that the data are pre-processed. However, to keep backward ABI compatibility, the IIO_CHAN_INFO_RAW bit is not removed. [1] http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf This issue has also been responsible for at least one userspace bug report hence marking what is a small semantic fix really for stable. [2] https://github.com/hadess/iio-sensor-proxy/issues/46 Signed-off-by: Gabriele Mazzotta Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit 377d1f59388fb7711b977458be7cb432f8dbbf9b Author: Yong Li Date: Wed Jan 6 09:09:43 2016 +0800 iio: dac: mcp4725: set iio name property in sysfs commit 97a249e98a72d6b79fb7350a8dd56b147e9d5bdb upstream. Without this change, the name entity for mcp4725 is missing in /sys/bus/iio/devices/iio\:device*/name With this change, name is reported correctly Signed-off-by: Yong Li Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit 1c1d4f2d76293cdcef7f77f7b92863c4bf3479d9 Author: Vegard Nossum Date: Sat Jan 2 14:04:39 2016 +0100 iio: add IIO_TRIGGER dependency to STK8BA50 commit 01cc5235604d61018712c11a14d74230f6a38bf4 upstream. Ran into this on UML: drivers/iio/accel/stk8ba50.c: In function ‘stk8ba50_data_rdy_trigger_set_state’: drivers/iio/accel/stk8ba50.c:163:9: error: implicit declaration of function ‘iio_trigger_get_drvdata’ [-Werror=implicit-function-declaration] iio_trigger_get_drvdata() is defined only when IIO_TRIGGER is selected. Signed-off-by: Vegard Nossum Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit dfa6e741d472a464ee85dc2e0a44176a1496f832 Author: Vegard Nossum Date: Sat Jan 2 14:04:40 2016 +0100 iio: add HAS_IOMEM dependency to VF610_ADC commit 005ce0713006a76d2b0c924ce0e2629e5d8510c3 upstream. Ran into this on UML: drivers/built-in.o: In function `vf610_adc_probe': drivers/iio/adc/vf610_adc.c:744: undefined reference to `devm_ioremap_resource' devm_ioremap_resource() is defined only when HAS_IOMEM is selected. Signed-off-by: Vegard Nossum Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit f865d8c326dd50649d3ee3129b93e6abb6929fae Author: Markus Elfring Date: Sat Dec 19 14:14:54 2015 +0100 iio-light: Use a signed return type for ltr501_match_samp_freq() commit c08ae18560aaed50fed306a2e11f36ce70130f65 upstream. The return type "unsigned int" was used by the ltr501_match_samp_freq() function despite of the aspect that it will eventually return a negative error code. Improve this implementation detail by deletion of the type modifier then. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring Acked-by: Peter Meerwald-Stadler Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit e9b0f0e411d00b900a72891435c8997fb8eeb32b Author: Jonathan Cameron Date: Fri Jan 1 18:05:34 2016 +0000 iio:adc:ti_am335x_adc Fix buffered mode by identifying as software buffer. commit 9d0be85d4e2cfa2519ae16efe7ff4a7150c43c0b upstream. Whilst this part has a hardware buffer, the identifcation that IIO cares about is the userspace facing end. It this case we push individual elements from the hardware fifo into the software interface (specifically a kfifo) rather than providing direct reads through to a hardware buffer (as we still do in the sca3000 for example). Technically the original specification as a hardware buffer could be considered wrong, but it didn't matter until the patch listed below. Result is that any attempt to enable the buffer will return -EINVAL Fixes: 225d59adf1c8 ("iio: Specify supported modes for buffers") Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit dc275a6eb9d0ebe7711d8ced07509dc5168163b3 Author: Lars-Peter Clausen Date: Fri Nov 27 14:55:56 2015 +0100 iio: adis_buffer: Fix out-of-bounds memory access commit d590faf9e8f8509a0a0aa79c38e87fcc6b913248 upstream. The SPI tx and rx buffers are both supposed to be scan_bytes amount of bytes large and a common allocation is used to allocate both buffers. This puts the beginning of the tx buffer scan_bytes bytes after the rx buffer. The initialization of the tx buffer pointer is done adding scan_bytes to the beginning of the rx buffer, but since the rx buffer is of type __be16 this will actually add two times as much and the tx buffer ends up pointing after the allocated buffer. Fix this by using scan_count, which is scan_bytes / 2, instead of scan_bytes when initializing the tx buffer pointer. Fixes: aacff892cbd5 ("staging:iio:adis: Preallocate transfer message") Signed-off-by: Lars-Peter Clausen Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit a258a959fcf31230ae660f5b0a537b8d7fc0b675 Author: James Bottomley Date: Wed Feb 10 08:03:26 2016 -0800 scsi: fix soft lockup in scsi_remove_target() on module removal commit 90a88d6ef88edcfc4f644dddc7eef4ea41bccf8b upstream. This softlockup is currently happening: [ 444.088002] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:1:29] [ 444.088002] Modules linked in: lpfc(-) qla2x00tgt(O) qla2xxx_scst(O) scst_vdisk(O) scsi_transport_fc libcrc32c scst(O) dlm configfs nfsd lockd grace nfs_acl auth_rpcgss sunrpc ed d snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device dm_mod iTCO_wdt snd_hda_codec_realtek snd_hda_codec_generic gpio_ich iTCO_vendor_support ppdev snd_hda_intel snd_hda_codec snd_hda _core snd_hwdep tg3 snd_pcm snd_timer libphy lpc_ich parport_pc ptp acpi_cpufreq snd pps_core fjes parport i2c_i801 ehci_pci tpm_tis tpm sr_mod cdrom soundcore floppy hwmon sg 8250_ fintek pcspkr i915 drm_kms_helper uhci_hcd ehci_hcd drm fb_sys_fops sysimgblt sysfillrect syscopyarea i2c_algo_bit usbcore button video usb_common fan ata_generic ata_piix libata th ermal [ 444.088002] CPU: 1 PID: 29 Comm: kworker/1:1 Tainted: G O 4.4.0-rc5-2.g1e923a3-default #1 [ 444.088002] Hardware name: FUJITSU SIEMENS ESPRIMO E /D2164-A1, BIOS 5.00 R1.10.2164.A1 05/08/2006 [ 444.088002] Workqueue: fc_wq_4 fc_rport_final_delete [scsi_transport_fc] [ 444.088002] task: f6266ec0 ti: f6268000 task.ti: f6268000 [ 444.088002] EIP: 0060:[] EFLAGS: 00000286 CPU: 1 [ 444.088002] EIP is at _raw_spin_unlock_irqrestore+0x14/0x20 [ 444.088002] EAX: 00000286 EBX: f20d3800 ECX: 00000002 EDX: 00000286 [ 444.088002] ESI: f50ba800 EDI: f2146848 EBP: f6269ec8 ESP: f6269ec8 [ 444.088002] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [ 444.088002] CR0: 8005003b CR2: 08f96600 CR3: 363ae000 CR4: 000006d0 [ 444.088002] Stack: [ 444.088002] f6269eec c066b0f7 00000286 f2146848 f50ba808 f50ba800 f50ba800 f2146a90 [ 444.088002] f2146848 f6269f08 f8f0a4ed f3141000 f2146800 f2146a90 f619fa00 00000040 [ 444.088002] f6269f40 c026cb25 00000001 166c6392 00000061 f6757140 f6136340 00000004 [ 444.088002] Call Trace: [ 444.088002] [] scsi_remove_target+0x167/0x1c0 [ 444.088002] [] fc_rport_final_delete+0x9d/0x1e0 [scsi_transport_fc] [ 444.088002] [] process_one_work+0x155/0x3e0 [ 444.088002] [] worker_thread+0x37/0x490 [ 444.088002] [] kthread+0x9b/0xb0 [ 444.088002] [] ret_from_kernel_thread+0x21/0x40 What appears to be happening is that something has pinned the target so it can't go into STARGET_DEL via final release and the loop in scsi_remove_target spins endlessly until that happens. The fix for this soft lockup is to not keep looping over a device that we've called remove on but which hasn't gone into DEL state. This patch will retain a simplistic memory of the last target and not keep looping over it. Reported-by: Sebastian Herbszt Tested-by: Sebastian Herbszt Fixes: 40998193560dab6c3ce8d25f4fa58a23e252ef38 Signed-off-by: James Bottomley Signed-off-by: Greg Kroah-Hartman commit 900ae746c1e954c5dd2b926d9e30863c9852396c Author: Mika Westerberg Date: Wed Jan 27 16:19:13 2016 +0200 SCSI: Add Marvell Console to VPD blacklist commit 82c43310508eb19eb41fe7862e89afeb74030b84 upstream. I have a Marvell 88SE9230 SATA Controller that has some sort of integrated console SCSI device attached to one of the ports. ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata14.00: ATAPI: MARVELL VIRTUALL, 1.09, max UDMA/66 ata14.00: configured for UDMA/66 scsi 13:0:0:0: Processor Marvell Console 1.01 PQ: 0 ANSI: 5 Sending it VPD INQUIRY command seem to always fail with following error: ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 ata14.00: irq_stat 0x40000001 ata14.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 2 dma 16640 in Inquiry 12 01 00 00 ff 00res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x3 (HSM violation) ata14: hard resetting link This has been minor annoyance (only error printed on dmesg) until commit 09e2b0b14690 ("scsi: rescan VPD attributes") added call to scsi_attach_vpd() in scsi_rescan_device(). The commit causes the system to splat out following errors continuously without ever reaching the UI: ata14.00: configured for UDMA/66 ata14: EH complete ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 ata14.00: irq_stat 0x40000001 ata14.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 6 dma 16640 in Inquiry 12 01 00 00 ff 00res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x3 (HSM violation) ata14: hard resetting link ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata14.00: configured for UDMA/66 ata14: EH complete ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 ata14.00: irq_stat 0x40000001 ata14.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 7 dma 16640 in Inquiry 12 01 00 00 ff 00res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x3 (HSM violation) Without in-depth understanding of SCSI layer and the Marvell controller, I suspect this happens because when the link goes down (because of an error) we schedule scsi_rescan_device() which again fails to read VPD data... ad infinitum. Since VPD data cannot be read from the device anyway we prevent the SCSI layer from even trying by blacklisting the device. This gets away the error and the system starts up normally. [mkp: Widened the match to all revisions of this device] Signed-off-by: Mika Westerberg Reported-by: Kirill A. Shutemov Reported-by: Alexander Duyck Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman commit 32c55052aa3364bd4a0d0c2b5f53ed9b95ae8d05 Author: Hannes Reinecke Date: Fri Jan 22 15:42:41 2016 +0100 scsi_dh_rdac: always retry MODE SELECT on command lock violation commit d2d06d4fe0f2cc2df9b17fefec96e6e1a1271d91 upstream. If MODE SELECT returns with sense '05/91/36' (command lock violation) it should always be retried without counting the number of retries. During an HBA upgrade or similar circumstances one might see a flood of MODE SELECT command from various HBAs, which will easily trigger the sense code and exceed the retry count. Signed-off-by: Hannes Reinecke Reviewed-by: Johannes Thumshirn Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman commit 4c654fc9357b70803c1878ec729ef61a8be0af52 Author: Kirill A. Shutemov Date: Tue Feb 2 16:57:35 2016 -0800 drivers/scsi/sg.c: mark VMA as VM_IO to prevent migration commit 461c7fa126794157484dca48e88effa4963e3af3 upstream. Reduced testcase: #include #include #include #include #define SIZE 0x2000 int main() { int fd; void *p; fd = open("/dev/sg0", O_RDWR); p = mmap(NULL, SIZE, PROT_EXEC, MAP_PRIVATE | MAP_LOCKED, fd, 0); mbind(p, SIZE, 0, NULL, 0, MPOL_MF_MOVE); return 0; } We shouldn't try to migrate pages in sg VMA as we don't have a way to update Sg_scatter_hold::pages accordingly from mm core. Let's mark the VMA as VM_IO to indicate to mm core that the VMA is not migratable. Signed-off-by: Kirill A. Shutemov Reported-by: Dmitry Vyukov Acked-by: Vlastimil Babka Cc: Doug Gilbert Cc: David Rientjes Cc: Naoya Horiguchi Cc: "Kirill A. Shutemov" Cc: Shiraz Hashim Cc: Hugh Dickins Cc: Sasha Levin Cc: syzkaller Cc: Kostya Serebryany Cc: Alexander Potapenko Cc: James Bottomley Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit d763177d00d72af0efc0f13745b8c26c832845bf Author: Alan Stern Date: Wed Jan 20 11:26:01 2016 -0500 SCSI: fix crashes in sd and sr runtime PM commit 13b4389143413a1f18127c07f72c74cad5b563e8 upstream. Runtime suspend during driver probe and removal can cause problems. The driver's runtime_suspend or runtime_resume callbacks may invoked before the driver has finished binding to the device or after the driver has unbound from the device. This problem shows up with the sd and sr drivers, and can cause disk or CD/DVD drives to become unusable as a result. The fix is simple. The drivers store a pointer to the scsi_disk or scsi_cd structure as their private device data when probing is finished, so we simply have to be sure to clear the private data during removal and test it during runtime suspend/resume. This fixes . Signed-off-by: Alan Stern Reported-by: Paul Menzel Reported-by: Erich Schubert Reported-by: Alexandre Rossi Tested-by: Paul Menzel Tested-by: Erich Schubert Signed-off-by: James Bottomley Signed-off-by: Greg Kroah-Hartman commit dcec7af70910c48cba1e7b42383b9696356afc33 Author: Nicholas Bellinger Date: Tue Jan 19 16:15:27 2016 -0800 iscsi-target: Fix potential dead-lock during node acl delete commit 26a99c19f810b2593410899a5b304b21b47428a6 upstream. This patch is a iscsi-target specific bug-fix for a dead-lock that can occur during explicit struct se_node_acl->acl_group se_session deletion via configfs rmdir(2), when iscsi-target time2retain timer is still active. It changes iscsi-target to obtain se_portal_group->session_lock internally using spin_in_locked() to check for the specific se_node_acl configfs shutdown rmdir(2) case. Note this patch is intended for stable, and the subsequent v4.5-rc patch converts target_core_tpg.c to use proper se_sess->sess_kref reference counting for both se_node_acl deletion + se_node_acl->queue_depth se_session restart. Reported-by:: Sagi Grimberg Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Andy Grover Cc: Mike Christie Signed-off-by: Nicholas Bellinger Signed-off-by: Greg Kroah-Hartman commit 954bb20f70ed1ab93406c6e01e1cd54d866c977e Author: Mike Christie Date: Thu Jan 7 16:34:05 2016 -0600 scsi: add Synology to 1024 sector blacklist commit 9055082fb100cc66e20c048251d05159f5f2cfba upstream. Another iscsi target that cannot handle large IOs, but does not tell us a limit. The Synology iSCSI targets report: Block limits VPD page (SBC): Write same no zero (WSNZ): 0 Maximum compare and write length: 0 blocks Optimal transfer length granularity: 0 blocks Maximum transfer length: 0 blocks Optimal transfer length: 0 blocks Maximum prefetch length: 0 blocks Maximum unmap LBA count: 0 Maximum unmap block descriptor count: 0 Optimal unmap granularity: 0 Unmap granularity alignment valid: 0 Unmap granularity alignment: 0 Maximum write same length: 0x0 blocks and the size of the command it can handle seems to depend on how much memory it can allocate at the time. This results in IO errors when handling large IOs. This patch just has us use the old 1024 default sectors for this target by adding it to the scsi blacklist. We do not have good contacs with this vendors, so I have not been able to try and fix on their side. I have posted this a long while back, but it was not merged. This version just fixes it up for merge/patch failures in the original version. Reported-by: Ancoron Luciferis Reported-by: Michael Meyers Signed-off-by: Mike Christie Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman commit 5b27adfac012bd383de3e950e2c8c431da9ebfd5 Author: James Bottomley Date: Wed Jan 13 08:10:31 2016 -0800 klist: fix starting point removed bug in klist iterators commit 00cd29b799e3449f0c68b1cc77cd4a5f95b42d17 upstream. The starting node for a klist iteration is often passed in from somewhere way above the klist infrastructure, meaning there's no guarantee the node is still on the list. We've seen this in SCSI where we use bus_find_device() to iterate through a list of devices. In the face of heavy hotplug activity, the last device returned by bus_find_device() can be removed before the next call. This leads to Dec 3 13:22:02 localhost kernel: WARNING: CPU: 2 PID: 28073 at include/linux/kref.h:47 klist_iter_init_node+0x3d/0x50() Dec 3 13:22:02 localhost kernel: Modules linked in: scsi_debug x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel joydev iTCO_wdt dcdbas ipmi_devintf acpi_power_meter iTCO_vendor_support ipmi_si imsghandler pcspkr wmi acpi_cpufreq tpm_tis tpm shpchp lpc_ich mfd_core nfsd nfs_acl lockd grace sunrpc tg3 ptp pps_core Dec 3 13:22:02 localhost kernel: CPU: 2 PID: 28073 Comm: cat Not tainted 4.4.0-rc1+ #2 Dec 3 13:22:02 localhost kernel: Hardware name: Dell Inc. PowerEdge R320/08VT7V, BIOS 2.0.22 11/19/2013 Dec 3 13:22:02 localhost kernel: ffffffff81a20e77 ffff880613acfd18 ffffffff81321eef 0000000000000000 Dec 3 13:22:02 localhost kernel: ffff880613acfd50 ffffffff8107ca52 ffff88061176b198 0000000000000000 Dec 3 13:22:02 localhost kernel: ffffffff814542b0 ffff880610cfb100 ffff88061176b198 ffff880613acfd60 Dec 3 13:22:02 localhost kernel: Call Trace: Dec 3 13:22:02 localhost kernel: [] dump_stack+0x44/0x55 Dec 3 13:22:02 localhost kernel: [] warn_slowpath_common+0x82/0xc0 Dec 3 13:22:02 localhost kernel: [] ? proc_scsi_show+0x20/0x20 Dec 3 13:22:02 localhost kernel: [] warn_slowpath_null+0x1a/0x20 Dec 3 13:22:02 localhost kernel: [] klist_iter_init_node+0x3d/0x50 Dec 3 13:22:02 localhost kernel: [] bus_find_device+0x51/0xb0 Dec 3 13:22:02 localhost kernel: [] scsi_seq_next+0x2d/0x40 [...] And an eventual crash. It can actually occur in any hotplug system which has a device finder and a starting device. We can fix this globally by making sure the starting node for klist_iter_init_node() is actually a member of the list before using it (and by starting from the beginning if it isn't). Reported-by: Ewan D. Milne Tested-by: Ewan D. Milne Signed-off-by: James Bottomley Signed-off-by: Greg Kroah-Hartman commit 152fb02241b60ffb8d406b87c68d1908478a205f Author: Steven Rostedt (Red Hat) Date: Mon Feb 15 12:36:14 2016 -0500 tracepoints: Do not trace when cpu is offline commit f37755490fe9bf76f6ba1d8c6591745d3574a6a6 upstream. The tracepoint infrastructure uses RCU sched protection to enable and disable tracepoints safely. There are some instances where tracepoints are used in infrastructure code (like kfree()) that get called after a CPU is going offline, and perhaps when it is coming back online but hasn't been registered yet. This can probuce the following warning: [ INFO: suspicious RCU usage. ] 4.4.0-00006-g0fe53e8-dirty #34 Tainted: G S ------------------------------- include/trace/events/kmem.h:141 suspicious rcu_dereference_check() usage! other info that might help us debug this: RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1 no locks held by swapper/8/0. stack backtrace: CPU: 8 PID: 0 Comm: swapper/8 Tainted: G S 4.4.0-00006-g0fe53e8-dirty #34 Call Trace: [c0000005b76c78d0] [c0000000008b9540] .dump_stack+0x98/0xd4 (unreliable) [c0000005b76c7950] [c00000000010c898] .lockdep_rcu_suspicious+0x108/0x170 [c0000005b76c79e0] [c00000000029adc0] .kfree+0x390/0x440 [c0000005b76c7a80] [c000000000055f74] .destroy_context+0x44/0x100 [c0000005b76c7b00] [c0000000000934a0] .__mmdrop+0x60/0x150 [c0000005b76c7b90] [c0000000000e3ff0] .idle_task_exit+0x130/0x140 [c0000005b76c7c20] [c000000000075804] .pseries_mach_cpu_die+0x64/0x310 [c0000005b76c7cd0] [c000000000043e7c] .cpu_die+0x3c/0x60 [c0000005b76c7d40] [c0000000000188d8] .arch_cpu_idle_dead+0x28/0x40 [c0000005b76c7db0] [c000000000101e6c] .cpu_startup_entry+0x50c/0x560 [c0000005b76c7ed0] [c000000000043bd8] .start_secondary+0x328/0x360 [c0000005b76c7f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14 This warning is not a false positive either. RCU is not protecting code that is being executed while the CPU is offline. Instead of playing "whack-a-mole(TM)" and adding conditional statements to the tracepoints we find that are used in this instance, simply add a cpu_online() test to the tracepoint code where the tracepoint will be ignored if the CPU is offline. Use of raw_smp_processor_id() is fine, as there should never be a case where the tracepoint code goes from running on a CPU that is online and suddenly gets migrated to a CPU that is offline. Link: http://lkml.kernel.org/r/1455387773-4245-1-git-send-email-kda@linux-powerpc.org Reported-by: Denis Kirjanov Fixes: 97e1c18e8d17b ("tracing: Kernel Tracepoints") Signed-off-by: Steven Rostedt Signed-off-by: Greg Kroah-Hartman commit 2fa82bbbc73a7d8716e3f7aba6b2b5c84147a2fc Author: Arnd Bergmann Date: Fri Feb 12 22:26:42 2016 +0100 tracing: Fix freak link error caused by branch tracer commit b33c8ff4431a343561e2319f17c14286f2aa52e2 upstream. In my randconfig tests, I came across a bug that involves several components: * gcc-4.9 through at least 5.3 * CONFIG_GCOV_PROFILE_ALL enabling -fprofile-arcs for all files * CONFIG_PROFILE_ALL_BRANCHES overriding every if() * The optimized implementation of do_div() that tries to replace a library call with an division by multiplication * code in drivers/media/dvb-frontends/zl10353.c doing u32 adc_clock = 450560; /* 45.056 MHz */ if (state->config.adc_clock) adc_clock = state->config.adc_clock; do_div(value, adc_clock); In this case, gcc fails to determine whether the divisor in do_div() is __builtin_constant_p(). In particular, it concludes that __builtin_constant_p(adc_clock) is false, while __builtin_constant_p(!!adc_clock) is true. That in turn throws off the logic in do_div() that also uses __builtin_constant_p(), and instead of picking either the constant- optimized division, and the code in ilog2() that uses __builtin_constant_p() to figure out whether it knows the answer at compile time. The result is a link error from failing to find multiple symbols that should never have been called based on the __builtin_constant_p(): dvb-frontends/zl10353.c:138: undefined reference to `____ilog2_NaN' dvb-frontends/zl10353.c:138: undefined reference to `__aeabi_uldivmod' ERROR: "____ilog2_NaN" [drivers/media/dvb-frontends/zl10353.ko] undefined! ERROR: "__aeabi_uldivmod" [drivers/media/dvb-frontends/zl10353.ko] undefined! This patch avoids the problem by changing __trace_if() to check whether the condition is known at compile-time to be nonzero, rather than checking whether it is actually a constant. I see this one link error in roughly one out of 1600 randconfig builds on ARM, and the patch fixes all known instances. Link: http://lkml.kernel.org/r/1455312410-1058841-1-git-send-email-arnd@arndb.de Acked-by: Nicolas Pitre Fixes: ab3c9c686e22 ("branch tracer, intel-iommu: fix build with CONFIG_BRANCH_TRACER=y") Signed-off-by: Arnd Bergmann Signed-off-by: Steven Rostedt Signed-off-by: Greg Kroah-Hartman commit 6fa74f50e357b0d87fad34a4b7ccbc2beb42b859 Author: Adrian Hunter Date: Tue Jan 26 14:05:20 2016 +0200 perf tools: tracepoint_error() can receive e=NULL, robustify it commit ec183d22cc284a7a1e17f0341219d8ec8ca070cc upstream. Fixes segmentation fault using, for instance: (gdb) run record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls Starting program: /home/acme/bin/perf record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls Missing separate debuginfos, use: dnf debuginfo-install glibc-2.22-7.fc23.x86_64 [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Program received signal SIGSEGV, Segmentation fault. 0 x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410 (gdb) bt #0 0x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410 #1 0x00000000004b9fc5 in add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:433 #2 0x00000000004ba334 in add_tracepoint_event (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:498 #3 0x00000000004bb699 in parse_events_add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys=0x19b1370 "sched", event=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:936 #4 0x00000000004f6eda in parse_events_parse (_data=0x7fffffffb8b0, scanner=0x19a49d0) at util/parse-events.y:391 #5 0x00000000004bc8e5 in parse_events__scanner (str=0x663ff2 "sched:sched_switch", data=0x7fffffffb8b0, start_token=258) at util/parse-events.c:1361 #6 0x00000000004bca57 in parse_events (evlist=0x19a5220, str=0x663ff2 "sched:sched_switch", err=0x0) at util/parse-events.c:1401 #7 0x0000000000518d5f in perf_evlist__can_select_event (evlist=0x19a3b90, str=0x663ff2 "sched:sched_switch") at util/record.c:253 #8 0x0000000000553c42 in intel_pt_track_switches (evlist=0x19a3b90) at arch/x86/util/intel-pt.c:364 #9 0x00000000005549d1 in intel_pt_recording_options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 ) at arch/x86/util/intel-pt.c:664 #10 0x000000000051e076 in auxtrace_record__options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 ) at util/auxtrace.c:539 #11 0x0000000000433368 in cmd_record (argc=1, argv=0x7fffffffde60, prefix=0x0) at builtin-record.c:1264 #12 0x000000000049bec2 in run_builtin (p=0x8fa2a8 , argc=5, argv=0x7fffffffde60) at perf.c:390 #13 0x000000000049c12a in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:451 #14 0x000000000049c278 in run_argv (argcp=0x7fffffffdcbc, argv=0x7fffffffdcb0) at perf.c:495 #15 0x000000000049c60a in main (argc=5, argv=0x7fffffffde60) at perf.c:618 (gdb) Intel PT attempts to find the sched:sched_switch tracepoint but that seg faults if tracefs is not readable, because the error reporting structure is null, as errors are not reported when automatically adding tracepoints. Fix by checking before using. Committer note: This doesn't take place in a kernel that supports perf_event_attr.context_switch, that is the default way that will be used for tracking context switches, only in older kernels, like 4.2, in a machine with Intel PT (e.g. Broadwell) for non-priviledged users. Further info from a similar patch by Wang: The error is in tracepoint_error: it assumes the 'e' parameter is valid. However, there are many situation a parse_event() can be called without parse_events_error. See result of $ grep 'parse_events(.*NULL)' ./tools/perf/ -r' Signed-off-by: Adrian Hunter Tested-by: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Josh Poimboeuf Cc: Tong Zhang Cc: Wang Nan Fixes: 196581717d85 ("perf tools: Enhance parsing events tracepoint error output") Link: http://lkml.kernel.org/r/1453809921-24596-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman commit 6e50ddaf0991285f6b6abb4d575f0cc0e8ed7834 Author: Steven Rostedt Date: Mon Nov 16 17:25:16 2015 -0500 tools lib traceevent: Fix output of %llu for 64 bit values read on 32 bit machines commit 32abc2ede536aae52978d6c0a8944eb1df14f460 upstream. When a long value is read on 32 bit machines for 64 bit output, the parsing needs to change "%lu" into "%llu", as the value is read natively. Unfortunately, if "%llu" is already there, the code will add another "l" to it and fail to parse it properly. Signed-off-by: Steven Rostedt Acked-by: Namhyung Kim Link: http://lkml.kernel.org/r/20151116172516.4b79b109@gandalf.local.home Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman commit 969624b7c1c8c9784651eb97431e6f2bbb7a024c Author: Jann Horn Date: Wed Jan 20 15:00:04 2016 -0800 ptrace: use fsuid, fsgid, effective creds for fs access checks commit caaee6234d05a58c5b4d05e7bf766131b810a657 upstream. By checking the effective credentials instead of the real UID / permitted capabilities, ensure that the calling process actually intended to use its credentials. To ensure that all ptrace checks use the correct caller credentials (e.g. in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS flag), use two new flags and require one of them to be set. The problem was that when a privileged task had temporarily dropped its privileges, e.g. by calling setreuid(0, user_uid), with the intent to perform following syscalls with the credentials of a user, it still passed ptrace access checks that the user would not be able to pass. While an attacker should not be able to convince the privileged task to perform a ptrace() syscall, this is a problem because the ptrace access check is reused for things in procfs. In particular, the following somewhat interesting procfs entries only rely on ptrace access checks: /proc/$pid/stat - uses the check for determining whether pointers should be visible, useful for bypassing ASLR /proc/$pid/maps - also useful for bypassing ASLR /proc/$pid/cwd - useful for gaining access to restricted directories that contain files with lax permissions, e.g. in this scenario: lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar drwx------ root root /root drwxr-xr-x root root /root/foobar -rw-r--r-- root root /root/foobar/secret Therefore, on a system where a root-owned mode 6755 binary changes its effective credentials as described and then dumps a user-specified file, this could be used by an attacker to reveal the memory layout of root's processes or reveal the contents of files he is not allowed to access (through /proc/$pid/cwd). [akpm@linux-foundation.org: fix warning] Signed-off-by: Jann Horn Acked-by: Kees Cook Cc: Casey Schaufler Cc: Oleg Nesterov Cc: Ingo Molnar Cc: James Morris Cc: "Serge E. Hallyn" Cc: Andy Shevchenko Cc: Andy Lutomirski Cc: Al Viro Cc: "Eric W. Biederman" Cc: Willy Tarreau Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit ba6d92801ba4e4c0262b70ea00922a71092999bb Author: Filipe Manana Date: Mon Feb 15 16:20:26 2016 +0000 Btrfs: fix direct IO requests not reporting IO error to user space commit 1636d1d77ef4e01e57f706a4cae3371463896136 upstream. If a bio for a direct IO request fails, we were not setting the error in the parent bio (the main DIO bio), making us not return the error to user space in btrfs_direct_IO(), that is, it made __blockdev_direct_IO() return the number of bytes issued for IO and not the error a bio created and submitted by btrfs_submit_direct() got from the block layer. This essentially happens because when we call: dio_end_io(dio_bio, bio->bi_error); It does not set dio_bio->bi_error to the value of the second argument. So just add this missing assignment in endio callbacks, just as we do in the error path at btrfs_submit_direct() when we fail to clone the dio bio or allocate its private object. This follows the convention of what is done with other similar APIs such as bio_endio() where the caller is responsible for setting the bi_error field in the bio it passes as an argument to bio_endio(). This was detected by the new generic test cases in xfstests: 271, 272, 276 and 278. Which essentially setup a dm error target, then load the error table, do a direct IO write and unload the error table. They expect the write to fail with -EIO, which was not getting reported when testing against btrfs. Fixes: 4246a0b63bd8 ("block: add a bi_error field to struct bio") Signed-off-by: Filipe Manana Signed-off-by: Greg Kroah-Hartman commit e8eced78e0252040c4e6bb633b40afb11a176416 Author: Filipe Manana Date: Wed Feb 3 19:17:27 2016 +0000 Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl commit 0c0fe3b0fa45082cd752553fdb3a4b42503a118e upstream. While doing some tests I ran into an hang on an extent buffer's rwlock that produced the following trace: [39389.800012] NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s! [fdm-stress:32166] [39389.800016] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [fdm-stress:32165] [39389.800016] Modules linked in: btrfs dm_mod ppdev xor sha256_generic hmac raid6_pq drbg ansi_cprng aesni_intel i2c_piix4 acpi_cpufreq aes_x86_64 ablk_helper tpm_tis parport_pc i2c_core sg cryptd evdev psmouse lrw tpm parport gf128mul serio_raw pcspkr glue_helper processor button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs] [39389.800016] irq event stamp: 0 [39389.800016] hardirqs last enabled at (0): [< (null)>] (null) [39389.800016] hardirqs last disabled at (0): [] copy_process+0x638/0x1a35 [39389.800016] softirqs last enabled at (0): [] copy_process+0x638/0x1a35 [39389.800016] softirqs last disabled at (0): [< (null)>] (null) [39389.800016] CPU: 14 PID: 32165 Comm: fdm-stress Not tainted 4.4.0-rc6-btrfs-next-18+ #1 [39389.800016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014 [39389.800016] task: ffff880175b1ca40 ti: ffff8800a185c000 task.ti: ffff8800a185c000 [39389.800016] RIP: 0010:[] [] queued_spin_lock_slowpath+0x57/0x158 [39389.800016] RSP: 0018:ffff8800a185fb80 EFLAGS: 00000202 [39389.800016] RAX: 0000000000000101 RBX: ffff8801710c4e9c RCX: 0000000000000101 [39389.800016] RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000001 [39389.800016] RBP: ffff8800a185fb98 R08: 0000000000000001 R09: 0000000000000000 [39389.800016] R10: ffff8800a185fb68 R11: 6db6db6db6db6db7 R12: ffff8801710c4e98 [39389.800016] R13: ffff880175b1ca40 R14: ffff8800a185fc10 R15: ffff880175b1ca40 [39389.800016] FS: 00007f6d37fff700(0000) GS:ffff8802be9c0000(0000) knlGS:0000000000000000 [39389.800016] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [39389.800016] CR2: 00007f6d300019b8 CR3: 0000000037c93000 CR4: 00000000001406e0 [39389.800016] Stack: [39389.800016] ffff8801710c4e98 ffff8801710c4e98 ffff880175b1ca40 ffff8800a185fbb0 [39389.800016] ffffffff81091e11 ffff8801710c4e98 ffff8800a185fbc8 ffffffff81091895 [39389.800016] ffff8801710c4e98 ffff8800a185fbe8 ffffffff81486c5c ffffffffa067288c [39389.800016] Call Trace: [39389.800016] [] queued_read_lock_slowpath+0x46/0x60 [39389.800016] [] do_raw_read_lock+0x3e/0x41 [39389.800016] [] _raw_read_lock+0x3d/0x44 [39389.800016] [] ? btrfs_tree_read_lock+0x54/0x125 [btrfs] [39389.800016] [] btrfs_tree_read_lock+0x54/0x125 [btrfs] [39389.800016] [] ? btrfs_find_item+0xa7/0xd2 [btrfs] [39389.800016] [] btrfs_ref_to_path+0xd6/0x174 [btrfs] [39389.800016] [] inode_to_path+0x53/0xa2 [btrfs] [39389.800016] [] paths_from_inode+0x117/0x2ec [btrfs] [39389.800016] [] btrfs_ioctl+0xd5b/0x2793 [btrfs] [39389.800016] [] ? arch_local_irq_save+0x9/0xc [39389.800016] [] ? __this_cpu_preempt_check+0x13/0x15 [39389.800016] [] ? arch_local_irq_save+0x9/0xc [39389.800016] [] ? rcu_read_unlock+0x3e/0x5d [39389.800016] [] do_vfs_ioctl+0x42b/0x4ea [39389.800016] [] ? __fget_light+0x62/0x71 [39389.800016] [] SyS_ioctl+0x57/0x79 [39389.800016] [] entry_SYSCALL_64_fastpath+0x12/0x6f [39389.800016] Code: b9 01 01 00 00 f7 c6 00 ff ff ff 75 32 83 fe 01 89 ca 89 f0 0f 45 d7 f0 0f b1 13 39 f0 74 04 89 c6 eb e2 ff ca 0f 84 fa 00 00 00 <8b> 03 84 c0 74 04 f3 90 eb f6 66 c7 03 01 00 e9 e6 00 00 00 e8 [39389.800012] Modules linked in: btrfs dm_mod ppdev xor sha256_generic hmac raid6_pq drbg ansi_cprng aesni_intel i2c_piix4 acpi_cpufreq aes_x86_64 ablk_helper tpm_tis parport_pc i2c_core sg cryptd evdev psmouse lrw tpm parport gf128mul serio_raw pcspkr glue_helper processor button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs] [39389.800012] irq event stamp: 0 [39389.800012] hardirqs last enabled at (0): [< (null)>] (null) [39389.800012] hardirqs last disabled at (0): [] copy_process+0x638/0x1a35 [39389.800012] softirqs last enabled at (0): [] copy_process+0x638/0x1a35 [39389.800012] softirqs last disabled at (0): [< (null)>] (null) [39389.800012] CPU: 15 PID: 32166 Comm: fdm-stress Tainted: G L 4.4.0-rc6-btrfs-next-18+ #1 [39389.800012] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014 [39389.800012] task: ffff880179294380 ti: ffff880034a60000 task.ti: ffff880034a60000 [39389.800012] RIP: 0010:[] [] queued_write_lock_slowpath+0x62/0x72 [39389.800012] RSP: 0018:ffff880034a639f0 EFLAGS: 00000206 [39389.800012] RAX: 0000000000000101 RBX: ffff8801710c4e98 RCX: 0000000000000000 [39389.800012] RDX: 00000000000000ff RSI: 0000000000000000 RDI: ffff8801710c4e9c [39389.800012] RBP: ffff880034a639f8 R08: 0000000000000001 R09: 0000000000000000 [39389.800012] R10: ffff880034a639b0 R11: 0000000000001000 R12: ffff8801710c4e98 [39389.800012] R13: 0000000000000001 R14: ffff880172cbc000 R15: ffff8801710c4e00 [39389.800012] FS: 00007f6d377fe700(0000) GS:ffff8802be9e0000(0000) knlGS:0000000000000000 [39389.800012] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [39389.800012] CR2: 00007f6d3d3c1000 CR3: 0000000037c93000 CR4: 00000000001406e0 [39389.800012] Stack: [39389.800012] ffff8801710c4e98 ffff880034a63a10 ffffffff81091963 ffff8801710c4e98 [39389.800012] ffff880034a63a30 ffffffff81486f1b ffffffffa0672cb3 ffff8801710c4e00 [39389.800012] ffff880034a63a78 ffffffffa0672cb3 ffff8801710c4e00 ffff880034a63a58 [39389.800012] Call Trace: [39389.800012] [] do_raw_write_lock+0x72/0x8c [39389.800012] [] _raw_write_lock+0x3a/0x41 [39389.800012] [] ? btrfs_tree_lock+0x119/0x251 [btrfs] [39389.800012] [] btrfs_tree_lock+0x119/0x251 [btrfs] [39389.800012] [] ? rcu_read_unlock+0x5b/0x5d [btrfs] [39389.800012] [] ? btrfs_root_node+0xda/0xe6 [btrfs] [39389.800012] [] btrfs_lock_root_node+0x22/0x42 [btrfs] [39389.800012] [] btrfs_search_slot+0x1b8/0x758 [btrfs] [39389.800012] [] ? time_hardirqs_on+0x15/0x28 [39389.800012] [] btrfs_lookup_inode+0x31/0x95 [btrfs] [39389.800012] [] ? trace_hardirqs_on+0xd/0xf [39389.800012] [] ? mutex_lock_nested+0x397/0x3bc [39389.800012] [] __btrfs_update_delayed_inode+0x59/0x1c0 [btrfs] [39389.800012] [] __btrfs_commit_inode_delayed_items+0x194/0x5aa [btrfs] [39389.800012] [] ? _raw_spin_unlock+0x31/0x44 [39389.800012] [] __btrfs_run_delayed_items+0xa4/0x15c [btrfs] [39389.800012] [] btrfs_run_delayed_items+0x11/0x13 [btrfs] [39389.800012] [] btrfs_commit_transaction+0x234/0x96e [btrfs] [39389.800012] [] btrfs_sync_fs+0x145/0x1ad [btrfs] [39389.800012] [] btrfs_ioctl+0x11d2/0x2793 [btrfs] [39389.800012] [] ? arch_local_irq_save+0x9/0xc [39389.800012] [] ? __might_fault+0x4c/0xa7 [39389.800012] [] ? __might_fault+0x4c/0xa7 [39389.800012] [] ? arch_local_irq_save+0x9/0xc [39389.800012] [] ? rcu_read_unlock+0x3e/0x5d [39389.800012] [] do_vfs_ioctl+0x42b/0x4ea [39389.800012] [] ? __fget_light+0x62/0x71 [39389.800012] [] SyS_ioctl+0x57/0x79 [39389.800012] [] entry_SYSCALL_64_fastpath+0x12/0x6f [39389.800012] Code: f0 0f b1 13 85 c0 75 ef eb 2a f3 90 8a 03 84 c0 75 f8 f0 0f b0 13 84 c0 75 f0 ba ff 00 00 00 eb 0a f0 0f b1 13 ff c8 74 0b f3 90 <8b> 03 83 f8 01 75 f7 eb ed c6 43 04 00 5b 5d c3 0f 1f 44 00 00 This happens because in the code path executed by the inode_paths ioctl we end up nesting two calls to read lock a leaf's rwlock when after the first call to read_lock() and before the second call to read_lock(), another task (running the delayed items as part of a transaction commit) has already called write_lock() against the leaf's rwlock. This situation is illustrated by the following diagram: Task A Task B btrfs_ref_to_path() btrfs_commit_transaction() read_lock(&eb->lock); btrfs_run_delayed_items() __btrfs_commit_inode_delayed_items() __btrfs_update_delayed_inode() btrfs_lookup_inode() write_lock(&eb->lock); --> task waits for lock read_lock(&eb->lock); --> makes this task hang forever (and task B too of course) So fix this by avoiding doing the nested read lock, which is easily avoidable. This issue does not happen if task B calls write_lock() after task A does the second call to read_lock(), however there does not seem to exist anything in the documentation that mentions what is the expected behaviour for recursive locking of rwlocks (leaving the idea that doing so is not a good usage of rwlocks). Also, as a side effect necessary for this fix, make sure we do not needlessly read lock extent buffers when the input path has skip_locking set (used when called from send). Signed-off-by: Filipe Manana Signed-off-by: Greg Kroah-Hartman commit be1232bcea113204523b7d530cc5d6d534916276 Author: Filipe Manana Date: Wed Jan 27 18:37:47 2016 +0000 Btrfs: fix page reading in extent_same ioctl leading to csum errors commit 313140023026ae542ad76e7e268c56a1eaa2c28e upstream. In the extent_same ioctl, we were grabbing the pages (locked) and attempting to read them without bothering about any concurrent IO against them. That is, we were not checking for any ongoing ordered extents nor waiting for them to complete, which leads to a race where the extent_same() code gets a checksum verification error when it reads the pages, producing a message like the following in dmesg and making the operation fail to user space with -ENOMEM: [18990.161265] BTRFS warning (device sdc): csum failed ino 259 off 495616 csum 685204116 expected csum 1515870868 Fix this by using btrfs_readpage() for reading the pages instead of extent_read_full_page_nolock(), which waits for any concurrent ordered extents to complete and locks the io range. Also do better error handling and don't treat all failures as -ENOMEM, as that's clearly misleasing, becoming identical to the checks and operation of prepare_uptodate_page(). The use of extent_read_full_page_nolock() was required before commit f441460202cb ("btrfs: fix deadlock with extent-same and readpage"), as we had the range locked in an inode's io tree before attempting to read the pages. Fixes: f441460202cb ("btrfs: fix deadlock with extent-same and readpage") Signed-off-by: Filipe Manana Signed-off-by: Greg Kroah-Hartman commit df567e6dcd22097d2f31b6a8bf481959f1352b6a Author: Filipe Manana Date: Wed Jan 27 10:20:58 2016 +0000 Btrfs: fix invalid page accesses in extent_same (dedup) ioctl commit e0bd70c67bf996b360f706b6c643000f2e384681 upstream. In the extent_same ioctl we are getting the pages for the source and target ranges and unlocking them immediately after, which is incorrect because later we attempt to map them (with kmap_atomic) and access their contents at btrfs_cmp_data(). When we do such access the pages might have been relocated or removed from memory, which leads to an invalid memory access. This issue is detected on a kernel with CONFIG_DEBUG_PAGEALLOC=y which produces a trace like the following: 186736.677437] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [186736.680382] Modules linked in: btrfs dm_flakey dm_mod ppdev xor raid6_pq sha256_generic hmac drbg ansi_cprng acpi_cpufreq evdev sg aesni_intel aes_x86_64 parport_pc ablk_helper tpm_tis psmouse parport i2c_piix4 tpm cryptd i2c_core lrw processor button serio_raw pcspkr gf128mul glue_helper loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs] [186736.681319] CPU: 13 PID: 10222 Comm: duperemove Tainted: G W 4.4.0-rc6-btrfs-next-18+ #1 [186736.681319] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014 [186736.681319] task: ffff880132600400 ti: ffff880362284000 task.ti: ffff880362284000 [186736.681319] RIP: 0010:[] [] memcmp+0xb/0x22 [186736.681319] RSP: 0018:ffff880362287d70 EFLAGS: 00010287 [186736.681319] RAX: 000002c002468acf RBX: 0000000012345678 RCX: 0000000000000000 [186736.681319] RDX: 0000000000001000 RSI: 0005d129c5cf9000 RDI: 0005d129c5cf9000 [186736.681319] RBP: ffff880362287d70 R08: 0000000000000000 R09: 0000000000001000 [186736.681319] R10: ffff880000000000 R11: 0000000000000476 R12: 0000000000001000 [186736.681319] R13: ffff8802f91d4c88 R14: ffff8801f2a77830 R15: ffff880352e83e40 [186736.681319] FS: 00007f27b37fe700(0000) GS:ffff88043dda0000(0000) knlGS:0000000000000000 [186736.681319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [186736.681319] CR2: 00007f27a406a000 CR3: 0000000217421000 CR4: 00000000001406e0 [186736.681319] Stack: [186736.681319] ffff880362287ea0 ffffffffa048d0bd 000000000009f000 0000000000001000 [186736.681319] 0100000000000000 ffff8801f2a77850 ffff8802f91d49b0 ffff880132600400 [186736.681319] 00000000000004f8 ffff8801c1efbe41 0000000000000000 0000000000000038 [186736.681319] Call Trace: [186736.681319] [] btrfs_ioctl+0x24cb/0x2731 [btrfs] [186736.681319] [] ? arch_local_irq_save+0x9/0xc [186736.681319] [] ? rcu_read_unlock+0x3e/0x5d [186736.681319] [] do_vfs_ioctl+0x42b/0x4ea [186736.681319] [] ? __fget_light+0x62/0x71 [186736.681319] [] SyS_ioctl+0x57/0x79 [186736.681319] [] entry_SYSCALL_64_fastpath+0x12/0x6f [186736.681319] Code: 0a 3c 6e 74 0d 3c 79 74 04 3c 59 75 0c c6 06 01 eb 03 c6 06 00 31 c0 eb 05 b8 ea ff ff ff 5d c3 55 31 c9 48 89 e5 48 39 d1 74 13 <0f> b6 04 0f 44 0f b6 04 0e 48 ff c1 44 29 c0 74 ea eb 02 31 c0 (gdb) list *(btrfs_ioctl+0x24cb) 0x5e0e1 is in btrfs_ioctl (fs/btrfs/ioctl.c:2972). 2967 dst_addr = kmap_atomic(dst_page); 2968 2969 flush_dcache_page(src_page); 2970 flush_dcache_page(dst_page); 2971 2972 if (memcmp(addr, dst_addr, cmp_len)) 2973 ret = BTRFS_SAME_DATA_DIFFERS; 2974 2975 kunmap_atomic(addr); 2976 kunmap_atomic(dst_addr); So fix this by making sure we keep the pages locked and respect the same locking order as everywhere else: get and lock the pages first and then lock the range in the inode's io tree (like for example at __btrfs_buffered_write() and extent_readpages()). If an ordered extent is found after locking the range in the io tree, unlock the range, unlock the pages, wait for the ordered extent to complete and repeat the entire locking process until no overlapping ordered extents are found. Signed-off-by: Filipe Manana Signed-off-by: Greg Kroah-Hartman commit b58081d430b4eaf6bf7d407cc5d4bfa39488b6c1 Author: David Sterba Date: Fri Nov 13 13:44:28 2015 +0100 btrfs: properly set the termination value of ctx->pos in readdir commit bc4ef7592f657ae81b017207a1098817126ad4cb upstream. The value of ctx->pos in the last readdir call is supposed to be set to INT_MAX due to 32bit compatibility, unless 'pos' is intentially set to a larger value, then it's LLONG_MAX. There's a report from PaX SIZE_OVERFLOW plugin that "ctx->pos++" overflows (https://forums.grsecurity.net/viewtopic.php?f=1&t=4284), on a 64bit arch, where the value is 0x7fffffffffffffff ie. LLONG_MAX before the increment. We can get to that situation like that: * emit all regular readdir entries * still in the same call to readdir, bump the last pos to INT_MAX * next call to readdir will not emit any entries, but will reach the bump code again, finds pos to be INT_MAX and sets it to LLONG_MAX Normally this is not a problem, but if we call readdir again, we'll find 'pos' set to LLONG_MAX and the unconditional increment will overflow. The report from Victor at (http://thread.gmane.org/gmane.comp.file-systems.btrfs/49500) with debugging print shows that pattern: Overflow: e Overflow: 7fffffff Overflow: 7fffffffffffffff PAX: size overflow detected in function btrfs_real_readdir fs/btrfs/inode.c:5760 cicus.935_282 max, count: 9, decl: pos; num: 0; context: dir_context; CPU: 0 PID: 2630 Comm: polkitd Not tainted 4.2.3-grsec #1 Hardware name: Gigabyte Technology Co., Ltd. H81ND2H/H81ND2H, BIOS F3 08/11/2015 ffffffff81901608 0000000000000000 ffffffff819015e6 ffffc90004973d48 ffffffff81742f0f 0000000000000007 ffffffff81901608 ffffc90004973d78 ffffffff811cb706 0000000000000000 ffff8800d47359e0 ffffc90004973ed8 Call Trace: [] dump_stack+0x4c/0x7f [] report_size_overflow+0x36/0x40 [] btrfs_real_readdir+0x69c/0x6d0 [] iterate_dir+0xa8/0x150 [] ? __fget_light+0x2d/0x70 [] SyS_getdents+0xba/0x1c0 Overflow: 1a [] ? iterate_dir+0x150/0x150 [] entry_SYSCALL_64_fastpath+0x12/0x83 The jump from 7fffffff to 7fffffffffffffff happens when new dir entries are not yet synced and are processed from the delayed list. Then the code could go to the bump section again even though it might not emit any new dir entries from the delayed list. The fix avoids entering the "bump" section again once we've finished emitting the entries, both for synced and delayed entries. References: https://forums.grsecurity.net/viewtopic.php?f=1&t=4284 Reported-by: Victor Signed-off-by: David Sterba Tested-by: Holger Hoffstätte Signed-off-by: Chris Mason Signed-off-by: Greg Kroah-Hartman commit dfd2961ab6eda6c39aa9d7ddc5e6fb754b16a750 Author: David Sterba Date: Mon Jan 25 11:02:06 2016 +0100 Revert "btrfs: clear PF_NOFREEZE in cleaner_kthread()" commit 80ad623edd2d0ccb47d85357ee31c97e6c684e82 upstream. This reverts commit 696249132158014d594896df3a81390616069c5c. The cleaner thread can block freezing when there's a snapshot cleaning in progress and the other threads get suspended first. From the logs provided by Martin we're waiting for reading extent pages: kernel: PM: Syncing filesystems ... done. kernel: Freezing user space processes ... (elapsed 0.015 seconds) done. kernel: Freezing remaining freezable tasks ... kernel: Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0): kernel: btrfs-cleaner D ffff88033dd13bc0 0 152 2 0x00000000 kernel: ffff88032ebc2e00 ffff88032e750000 ffff88032e74fa50 7fffffffffffffff kernel: ffffffff814a58df 0000000000000002 ffffea000934d580 ffffffff814a5451 kernel: 7fffffffffffffff ffffffff814a6e8f 0000000000000000 0000000000000020 kernel: Call Trace: kernel: [] ? bit_wait+0x2c/0x2c kernel: [] ? schedule+0x6f/0x7c kernel: [] ? schedule_timeout+0x2f/0xd8 kernel: [] ? timekeeping_get_ns+0xa/0x2e kernel: [] ? ktime_get+0x36/0x44 kernel: [] ? io_schedule_timeout+0x94/0xf2 kernel: [] ? io_schedule_timeout+0x94/0xf2 kernel: [] ? bit_wait_io+0x2c/0x30 kernel: [] ? __wait_on_bit+0x41/0x73 kernel: [] ? wait_on_page_bit+0x6d/0x72 kernel: [] ? autoremove_wake_function+0x2a/0x2a kernel: [] ? read_extent_buffer_pages+0x1bd/0x203 kernel: [] ? free_root_pointers+0x4c/0x4c kernel: [] ? btree_read_extent_buffer_pages.constprop.57+0x5a/0xe9 kernel: [] ? read_tree_block+0x2d/0x45 kernel: [] ? read_block_for_search.isra.34+0x22a/0x26b kernel: [] ? btrfs_set_path_blocking+0x1e/0x4a kernel: [] ? btrfs_search_slot+0x648/0x736 kernel: [] ? btrfs_lookup_extent_info+0xb7/0x2c7 kernel: [] ? walk_down_proc+0x9c/0x1ae kernel: [] ? walk_down_tree+0x40/0xa4 kernel: [] ? btrfs_drop_snapshot+0x2da/0x664 kernel: [] ? finish_task_switch+0x126/0x167 kernel: [] ? btrfs_clean_one_deleted_snapshot+0xa6/0xb0 kernel: [] ? cleaner_kthread+0x13e/0x17b kernel: [] ? btrfs_item_end+0x33/0x33 kernel: [] ? kthread+0x95/0x9d kernel: [] ? kthread_parkme+0x16/0x16 kernel: [] ? ret_from_fork+0x3f/0x70 kernel: [] ? kthread_parkme+0x16/0x16 As this affects a released kernel (4.4) we need a minimal fix for stable kernels. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=108361 Reported-by: Martin Ziegler CC: Jiri Kosina Signed-off-by: David Sterba Signed-off-by: Chris Mason Signed-off-by: Greg Kroah-Hartman commit 4e6943903a8eb700bf13a7be572249b34d438574 Author: Filipe Manana Date: Wed Jan 6 22:42:35 2016 +0000 Btrfs: fix fitrim discarding device area reserved for boot loader's use commit 8cdc7c5b00d945a3c823fc4277af304abb9cb43d upstream. As of the 4.3 kernel release, the fitrim ioctl can now discard any region of a disk that is not allocated to any chunk/block group, including the first megabyte which is used for our primary superblock and by the boot loader (grub for example). Fix this by not allowing to trim/discard any region in the device starting with an offset not greater than min(alloc_start_mount_option, 1Mb), just as it was not possible before 4.3. A reproducer test case for xfstests follows. seq=`basename $0` seqres=$RESULT_DIR/$seq echo "QA output created by $seq" tmp=/tmp/$$ status=1 # failure is the default! trap "_cleanup; exit \$status" 0 1 2 3 15 _cleanup() { cd / rm -f $tmp.* } # get standard environment, filters and checks . ./common/rc . ./common/filter # real QA test starts here _need_to_be_root _supported_fs btrfs _supported_os Linux _require_scratch rm -f $seqres.full _scratch_mkfs >>$seqres.full 2>&1 # Write to the [0, 64Kb[ and [68Kb, 1Mb[ ranges of the device. These ranges are # reserved for a boot loader to use (GRUB for example) and btrfs should never # use them - neither for allocating metadata/data nor should trim/discard them. # The range [64Kb, 68Kb[ is used for the primary superblock of the filesystem. $XFS_IO_PROG -c "pwrite -S 0xfd 0 64K" $SCRATCH_DEV | _filter_xfs_io $XFS_IO_PROG -c "pwrite -S 0xfd 68K 956K" $SCRATCH_DEV | _filter_xfs_io # Now mount the filesystem and perform a fitrim against it. _scratch_mount _require_batched_discard $SCRATCH_MNT $FSTRIM_PROG $SCRATCH_MNT # Now unmount the filesystem and verify the content of the ranges was not # modified (no trim/discard happened on them). _scratch_unmount echo "Content of the ranges [0, 64Kb] and [68Kb, 1Mb[ after fitrim:" od -t x1 -N $((64 * 1024)) $SCRATCH_DEV od -t x1 -j $((68 * 1024)) -N $((956 * 1024)) $SCRATCH_DEV status=0 exit Reported-by: Vincent Petry Reported-by: Andrei Borzenkov Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=109341 Fixes: 499f377f49f0 (btrfs: iterate over unused chunk space in FITRIM) Signed-off-by: Filipe Manana Signed-off-by: Greg Kroah-Hartman commit c57e49b50bc5ab5089bc1295bab9106925033c40 Author: David Sterba Date: Mon Nov 30 17:27:06 2015 +0100 btrfs: handle invalid num_stripes in sys_array commit f5cdedd73fa71b74dcc42f2a11a5735d89ce7c4f upstream. We can handle the special case of num_stripes == 0 directly inside btrfs_read_sys_array. The BUG_ON in btrfs_chunk_item_size is there to catch other unhandled cases where we fail to validate external data. A crafted or corrupted image crashes at mount time: BTRFS: device fsid 9006933e-2a9a-44f0-917f-514252aeec2c devid 1 transid 7 /dev/loop0 BTRFS info (device loop0): disk space caching is enabled BUG: failure at fs/btrfs/ctree.h:337/btrfs_chunk_item_size()! Kernel panic - not syncing: BUG! CPU: 0 PID: 313 Comm: mount Not tainted 4.2.5-00657-ge047887-dirty #25 Stack: 637af890 60062489 602aeb2e 604192ba 60387961 00000011 637af8a0 6038a835 637af9c0 6038776b 634ef32b 00000000 Call Trace: [<6001c86d>] show_stack+0xfe/0x15b [<6038a835>] dump_stack+0x2a/0x2c [<6038776b>] panic+0x13e/0x2b3 [<6020f099>] btrfs_read_sys_array+0x25d/0x2ff [<601cfbbe>] open_ctree+0x192d/0x27af [<6019c2c1>] btrfs_mount+0x8f5/0xb9a [<600bc9a7>] mount_fs+0x11/0xf3 [<600d5167>] vfs_kern_mount+0x75/0x11a [<6019bcb0>] btrfs_mount+0x2e4/0xb9a [<600bc9a7>] mount_fs+0x11/0xf3 [<600d5167>] vfs_kern_mount+0x75/0x11a [<600d710b>] do_mount+0xa35/0xbc9 [<600d7557>] SyS_mount+0x95/0xc8 [<6001e884>] handle_syscall+0x6b/0x8e Reported-by: Jiri Slaby Reported-by: Vegard Nossum Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit bbfe21c87bd0f529d19f077051a52d779c785c6c Author: Eryu Guan Date: Fri Feb 12 01:20:43 2016 -0500 ext4: don't read blocks from disk after extents being swapped commit bcff24887d00bce102e0857d7b0a8c44a40f53d1 upstream. I notice ext4/307 fails occasionally on ppc64 host, reporting md5 checksum mismatch after moving data from original file to donor file. The reason is that move_extent_per_page() calls __block_write_begin() and block_commit_write() to write saved data from original inode blocks to donor inode blocks, but __block_write_begin() not only maps buffer heads but also reads block content from disk if the size is not block size aligned. At this time the physical block number in mapped buffer head is pointing to the donor file not the original file, and that results in reading wrong data to page, which get written to disk in following block_commit_write call. This also can be reproduced by the following script on 1k block size ext4 on x86_64 host: mnt=/mnt/ext4 donorfile=$mnt/donor testfile=$mnt/testfile e4compact=~/xfstests/src/e4compact rm -f $donorfile $testfile # reserve space for donor file, written by 0xaa and sync to disk to # avoid EBUSY on EXT4_IOC_MOVE_EXT xfs_io -fc "pwrite -S 0xaa 0 1m" -c "fsync" $donorfile # create test file written by 0xbb xfs_io -fc "pwrite -S 0xbb 0 1023" -c "fsync" $testfile # compute initial md5sum md5sum $testfile | tee md5sum.txt # drop cache, force e4compact to read data from disk echo 3 > /proc/sys/vm/drop_caches # test defrag echo "$testfile" | $e4compact -i -v -f $donorfile # check md5sum md5sum -c md5sum.txt Fix it by creating & mapping buffer heads only but not reading blocks from disk, because all the data in page is guaranteed to be up-to-date in mext_page_mkuptodate(). Signed-off-by: Eryu Guan Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman commit 600d41f4ecb53edb540fa00a34a78ea6e5c9f9f7 Author: Insu Yun Date: Fri Feb 12 01:15:59 2016 -0500 ext4: fix potential integer overflow commit 46901760b46064964b41015d00c140c83aa05bcf upstream. Since sizeof(ext_new_group_data) > sizeof(ext_new_flex_group_data), integer overflow could be happened. Therefore, need to fix integer overflow sanitization. Signed-off-by: Insu Yun Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman commit 33f48f8ab0b94fdd1249f561b15c872c36e560c8 Author: Jan Kara Date: Thu Feb 11 23:15:12 2016 -0500 ext4: fix scheduling in atomic on group checksum failure commit 05145bd799e498ce4e3b5145894174ee881f02b0 upstream. When block group checksum is wrong, we call ext4_error() while holding group spinlock from ext4_init_block_bitmap() or ext4_init_inode_bitmap() which results in scheduling while in atomic. Fix the issue by calling ext4_error() later after dropping the spinlock. Reported-by: Dmitry Vyukov Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o Reviewed-by: Darrick J. Wong Signed-off-by: Greg Kroah-Hartman commit 5859b9077763ea1c3f4db47522bdcf300aedb080 Author: Peter Hurley Date: Tue Jan 12 15:14:46 2016 -0800 serial: omap: Prevent DoS using unprivileged ioctl(TIOCSRS485) commit 308bbc9ab838d0ace0298268c7970ba9513e2c65 upstream. The omap-serial driver emulates RS485 delays using software timers, but neglects to clamp the input values from the unprivileged ioctl(TIOCSRS485). Because the software implementation busy-waits, malicious userspace could stall the cpu for ~49 days. Clamp the input values to < 100ms. Fixes: 4a0ac0f55b18 ("OMAP: add RS485 support") Signed-off-by: Peter Hurley Signed-off-by: Greg Kroah-Hartman commit 76e88140aa9111605768e7725ec1c5b709d51865 Author: Mika Westerberg Date: Fri Jan 29 16:49:47 2016 +0200 serial: 8250_pci: Add Intel Broadwell ports commit 6c55d9b98335f7f6bd5f061866ff1633401f3a44 upstream. Some recent (early 2015) macbooks have Intel Broadwell where LPSS UARTs are PCI enumerated instead of ACPI. The LPSS UART block is pretty much same as used on Intel Baytrail so we can reuse the existing Baytrail setup code. Add both Broadwell LPSS UART ports to the list of supported devices. Signed-off-by: Leif Liddy Signed-off-by: Mika Westerberg Reviewed-by: Andy Shevchenko Reviewed-by: Heikki Krogerus Signed-off-by: Greg Kroah-Hartman commit 124efa9fd5679ed68fd69e5b899e55b0452c68a4 Author: Jeremy McNicoll Date: Tue Feb 2 13:00:45 2016 -0800 tty: Add support for PCIe WCH382 2S multi-IO card commit 7dde55787b43a8f2b4021916db38d90c03a2ec64 upstream. WCH382 2S board is a PCIe card with 2 DB9 COM ports detected as Serial controller: Device 1c00:3253 (rev 10) (prog-if 05 [16850]) Signed-off-by: Jeremy McNicoll Signed-off-by: Greg Kroah-Hartman commit 1bdf16025dfc5ed335f3d7d8bbe78461583105fc Author: Herton R. Krzesinski Date: Thu Jan 14 17:56:58 2016 -0200 pty: make sure super_block is still valid in final /dev/tty close commit 1f55c718c290616889c04946864a13ef30f64929 upstream. Considering current pty code and multiple devpts instances, it's possible to umount a devpts file system while a program still has /dev/tty opened pointing to a previosuly closed pty pair in that instance. In the case all ptmx and pts/N files are closed, umount can be done. If the program closes /dev/tty after umount is done, devpts_kill_index will use now an invalid super_block, which was already destroyed in the umount operation after running ->kill_sb. This is another "use after free" type of issue, but now related to the allocated super_block instance. To avoid the problem (warning at ida_remove and potential crashes) for this specific case, I added two functions in devpts which grabs additional references to the super_block, which pty code now uses so it makes sure the super block structure is still valid until pty shutdown is done. I also moved the additional inode references to the same functions, which also covered similar case with inode being freed before /dev/tty final close/shutdown. Signed-off-by: Herton R. Krzesinski Reviewed-by: Peter Hurley Signed-off-by: Greg Kroah-Hartman commit 3ceeb564198cf81863c16d76572306e57e833963 Author: Herton R. Krzesinski Date: Mon Jan 11 12:07:43 2016 -0200 pty: fix possible use after free of tty->driver_data commit 2831c89f42dcde440cfdccb9fee9f42d54bbc1ef upstream. This change fixes a bug for a corner case where we have the the last release from a pty master/slave coming from a previously opened /dev/tty file. When this happens, the tty->driver_data can be stale, due to all ptmx or pts/N files having already been closed before (and thus the inode related to these files, which tty->driver_data points to, being already freed/destroyed). The fix here is to keep a reference on the opened master ptmx inode. We maintain the inode referenced until the final pty_unix98_shutdown, and only pass this inode to devpts_kill_index. Signed-off-by: Herton R. Krzesinski Reviewed-by: Peter Hurley Signed-off-by: Greg Kroah-Hartman commit a45f23edb00e017289c5263fef25ad920009edb0 Author: Peter Hurley Date: Sun Jan 10 22:40:58 2016 -0800 staging/speakup: Use tty_ldisc_ref() for paste kworker commit f4f9edcf9b5289ed96113e79fa65a7bf27ecb096 upstream. As the function documentation for tty_ldisc_ref_wait() notes, it is only callable from a tty file_operations routine; otherwise there is no guarantee the ref won't be NULL. The key difference with the VT's paste_selection() is that is an ioctl, where __speakup_paste_selection() is completely async kworker, kicked off from interrupt context. Fixes: 28a821c30688 ("Staging: speakup: Update __speakup_paste_selection() tty (ab)usage to match vt") Signed-off-by: Peter Hurley Signed-off-by: Greg Kroah-Hartman Signed-off-by: Greg Kroah-Hartman commit 3375ee8b9964979f6f1f745053e21ded1afdb1b6 Author: Tony Lindgren Date: Mon Nov 30 21:39:54 2015 -0800 phy: twl4030-usb: Fix unbalanced pm_runtime_enable on module reload commit 58a66dba1beac2121d931cda4682ae4d40816af5 upstream. If we reload phy-twl4030-usb, we get a warning about unbalanced pm_runtime_enable. Let's fix the issue and also fix idling of the device on unload before we attempt to shut it down. If we don't properly idle the PHY before shutting it down on removal, the twl4030 ends up consuming about 62mW of extra power compared to running idle with the module loaded. Cc: Bin Liu Cc: Felipe Balbi Cc: Kishon Vijay Abraham I Cc: NeilBrown Signed-off-by: Tony Lindgren Signed-off-by: Kishon Vijay Abraham I Signed-off-by: Greg Kroah-Hartman commit a90e66cb949a59298d74638e5db4e888711b4fa0 Author: Tony Lindgren Date: Mon Nov 30 21:39:53 2015 -0800 phy: twl4030-usb: Relase usb phy on unload commit b241d31ef2f6a289d33dcaa004714b26e06f476f upstream. Otherwise rmmod omap2430; rmmod phy-twl4030-usb; modprobe omap2430 will try to use a non-existing phy and oops: Unable to handle kernel paging request at virtual address b6f7c1f0 ... [] (devm_usb_get_phy_by_node) from [] (omap2430_musb_init+0x44/0x2b4 [omap2430]) [] (omap2430_musb_init [omap2430]) from [] (musb_init_controller+0x194/0x878 [musb_hdrc]) Cc: Bin Liu Cc: Felipe Balbi Cc: Kishon Vijay Abraham I Cc: NeilBrown Signed-off-by: Tony Lindgren Signed-off-by: Kishon Vijay Abraham I Signed-off-by: Greg Kroah-Hartman commit a40efb855068a20cf769425a799642aa95c57635 Author: Takashi Iwai Date: Tue Feb 16 14:15:59 2016 +0100 ALSA: seq: Fix double port list deletion commit 13d5e5d4725c64ec06040d636832e78453f477b7 upstream. The commit [7f0973e973cd: ALSA: seq: Fix lockdep warnings due to double mutex locks] split the management of two linked lists (source and destination) into two individual calls for avoiding the AB/BA deadlock. However, this may leave the possible double deletion of one of two lists when the counterpart is being deleted concurrently. It ends up with a list corruption, as revealed by syzkaller fuzzer. This patch fixes it by checking the list emptiness and skipping the deletion and the following process. BugLink: http://lkml.kernel.org/r/CACT4Y+bay9qsrz6dQu31EcGaH9XwfW7o3oBzSQUG9fMszoh=Sg@mail.gmail.com Fixes: 7f0973e973cd ('ALSA: seq: Fix lockdep warnings due to 'double mutex locks) Reported-by: Dmitry Vyukov Tested-by: Dmitry Vyukov Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 6bb345ac7b30680018be6d7e6b4738fab7283b4f Author: Takashi Iwai Date: Mon Feb 15 16:20:24 2016 +0100 ALSA: seq: Fix leak of pool buffer at concurrent writes commit d99a36f4728fcbcc501b78447f625bdcce15b842 upstream. When multiple concurrent writes happen on the ALSA sequencer device right after the open, it may try to allocate vmalloc buffer for each write and leak some of them. It's because the presence check and the assignment of the buffer is done outside the spinlock for the pool. The fix is to move the check and the assignment into the spinlock. (The current implementation is suboptimal, as there can be multiple unnecessary vmallocs because the allocation is done before the check in the spinlock. But the pool size is already checked beforehand, so this isn't a big problem; that is, the only possible path is the multiple writes before any pool assignment, and practically seen, the current coverage should be "good enough".) The issue was triggered by syzkaller fuzzer. BugLink: http://lkml.kernel.org/r/CACT4Y+bSzazpXNvtAr=WXaL8hptqjHwqEyFA+VN2AWEx=aurkg@mail.gmail.com Reported-by: Dmitry Vyukov Tested-by: Dmitry Vyukov Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit ef0ca96169a23551c8f8961c9a081a1c071703a4 Author: Takashi Iwai Date: Wed Feb 17 14:30:26 2016 +0100 ALSA: pcm: Fix rwsem deadlock for non-atomic PCM stream commit 67ec1072b053c15564e6090ab30127895dc77a89 upstream. A non-atomic PCM stream may take snd_pcm_link_rwsem rw semaphore twice in the same code path, e.g. one in snd_pcm_action_nonatomic() and another in snd_pcm_stream_lock(). Usually this is OK, but when a write lock is issued between these two read locks, the problem happens: the write lock is blocked due to the first reade lock, and the second read lock is also blocked by the write lock. This eventually deadlocks. The reason is the way rwsem manages waiters; it's queued like FIFO, so even if the writer itself doesn't take the lock yet, it blocks all the waiters (including reads) queued after it. As a workaround, in this patch, we replace the standard down_write() with an spinning loop. This is far from optimal, but it's good enough, as the spinning time is supposed to be relatively short for normal PCM operations, and the code paths requiring the write lock aren't called so often. Reported-by: Vinod Koul Tested-by: Ramesh Babu Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 434e26d6f6a000b8585c0eb64764a55daff65d20 Author: Takashi Iwai Date: Mon Feb 15 16:37:24 2016 +0100 ALSA: hda - Cancel probe work instead of flush at remove commit 0b8c82190c12e530eb6003720dac103bf63e146e upstream. The commit [991f86d7ae4e: ALSA: hda - Flush the pending probe work at remove] introduced the sync of async probe work at remove for fixing the race. However, this may lead to another hangup when the module removal is performed quickly before starting the probe work, because it issues flush_work() and it's blocked forever. The workaround is to use cancel_work_sync() instead of flush_work() there. Fixes: 991f86d7ae4e ('ALSA: hda - Flush the pending probe work at remove') Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 6deb0ec93da69d60f28c54b48d24c35e92b2155f Author: Toshi Kani Date: Wed Feb 17 18:16:54 2016 -0700 x86/mm: Fix vmalloc_fault() to handle large pages properly commit f4eafd8bcd5229e998aa252627703b8462c3b90f upstream. A kernel page fault oops with the callstack below was observed when a read syscall was made to a pmem device after a huge amount (>512GB) of vmalloc ranges was allocated by ioremap() on a x86_64 system: BUG: unable to handle kernel paging request at ffff880840000ff8 IP: vmalloc_fault+0x1be/0x300 PGD c7f03a067 PUD 0 Oops: 0000 [#1] SM Call Trace: __do_page_fault+0x285/0x3e0 do_page_fault+0x2f/0x80 ? put_prev_entity+0x35/0x7a0 page_fault+0x28/0x30 ? memcpy_erms+0x6/0x10 ? schedule+0x35/0x80 ? pmem_rw_bytes+0x6a/0x190 [nd_pmem] ? schedule_timeout+0x183/0x240 btt_log_read+0x63/0x140 [nd_btt] : ? __symbol_put+0x60/0x60 ? kernel_read+0x50/0x80 SyS_finit_module+0xb9/0xf0 entry_SYSCALL_64_fastpath+0x1a/0xa4 Since v4.1, ioremap() supports large page (pud/pmd) mappings in x86_64 and PAE. vmalloc_fault() however assumes that the vmalloc range is limited to pte mappings. vmalloc faults do not normally happen in ioremap'd ranges since ioremap() sets up the kernel page tables, which are shared by user processes. pgd_ctor() sets the kernel's PGD entries to user's during fork(). When allocation of the vmalloc ranges crosses a 512GB boundary, ioremap() allocates a new pud table and updates the kernel PGD entry to point it. If user process's PGD entry does not have this update yet, a read/write syscall to the range will cause a vmalloc fault, which hits the Oops above as it does not handle a large page properly. Following changes are made to vmalloc_fault(). 64-bit: - No change for the PGD sync operation as it handles large pages already. - Add pud_huge() and pmd_huge() to the validation code to handle large pages. - Change pud_page_vaddr() to pud_pfn() since an ioremap range is not directly mapped (while the if-statement still works with a bogus addr). - Change pmd_page() to pmd_pfn() since an ioremap range is not backed by struct page (while the if-statement still works with a bogus addr). 32-bit: - No change for the sync operation since the index3 PGD entry covers the entire vmalloc range, which is always valid. (A separate change to sync PGD entry is necessary if this memory layout is changed regardless of the page size.) - Add pmd_huge() to the validation code to handle large pages. This is for completeness since vmalloc_fault() won't happen in ioremap'd ranges as its PGD entry is always valid. Reported-by: Henning Schild Signed-off-by: Toshi Kani Acked-by: Borislav Petkov Cc: Andrew Morton Cc: Andy Lutomirski Cc: Brian Gerst Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Luis R. Rodriguez Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Toshi Kani Cc: linux-mm@kvack.org Cc: linux-nvdimm@lists.01.org Link: http://lkml.kernel.org/r/1455758214-24623-1-git-send-email-toshi.kani@hpe.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit e0c89043e71ae8af6a8cd4a9dac99f46b7ff1d83 Author: Toshi Kani Date: Thu Feb 11 14:24:17 2016 -0700 x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache() commit a82eee7424525e34e98d821dd059ce14560a1e35 upstream. Data corruption issues were observed in tests which initiated a system crash/reset while accessing BTT devices. This problem is reproducible. The BTT driver calls pmem_rw_bytes() to update data in pmem devices. This interface calls __copy_user_nocache(), which uses non-temporal stores so that the stores to pmem are persistent. __copy_user_nocache() uses non-temporal stores when a request size is 8 bytes or larger (and is aligned by 8 bytes). The BTT driver updates the BTT map table, which entry size is 4 bytes. Therefore, updates to the map table entries remain cached, and are not written to pmem after a crash. Change __copy_user_nocache() to use non-temporal store when a request size is 4 bytes. The change extends the current byte-copy path for a less-than-8-bytes request, and does not add any overhead to the regular path. Reported-and-tested-by: Micah Parrish Reported-and-tested-by: Brian Boylston Signed-off-by: Toshi Kani Cc: Andrew Morton Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dan Williams Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Luis R. Rodriguez Cc: Peter Zijlstra Cc: Ross Zwisler Cc: Thomas Gleixner Cc: Toshi Kani Cc: Vishal Verma Cc: linux-nvdimm@lists.01.org Link: http://lkml.kernel.org/r/1455225857-12039-3-git-send-email-toshi.kani@hpe.com [ Small readability edits. ] Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 1e2e0ad1cc1643fdfea0610805c253f53ec1aaa1 Author: Toshi Kani Date: Thu Feb 11 14:24:16 2016 -0700 x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable commit ee9737c924706aaa72c2ead93e3ad5644681dc1c upstream. Add comments to __copy_user_nocache() to clarify its procedures and alignment requirements. Also change numeric branch target labels to named local labels. No code changed: arch/x86/lib/copy_user_64.o: text data bss dec hex filename 1239 0 0 1239 4d7 copy_user_64.o.before 1239 0 0 1239 4d7 copy_user_64.o.after md5: 58bed94c2db98c1ca9a2d46d0680aaae copy_user_64.o.before.asm 58bed94c2db98c1ca9a2d46d0680aaae copy_user_64.o.after.asm Signed-off-by: Toshi Kani Cc: Andrew Morton Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Luis R. Rodriguez Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Toshi Kani Cc: brian.boylston@hpe.com Cc: dan.j.williams@intel.com Cc: linux-nvdimm@lists.01.org Cc: micah.parrish@hpe.com Cc: ross.zwisler@linux.intel.com Cc: vishal.l.verma@intel.com Link: http://lkml.kernel.org/r/1455225857-12039-2-git-send-email-toshi.kani@hpe.com [ Small readability edits and added object file comparison. ] Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 4f298c10c35dbaa3c328bc5360b9c46cc52d9f93 Author: Matt Fleming Date: Fri Jan 29 11:36:10 2016 +0000 x86/mm/pat: Avoid truncation when converting cpa->numpages to address commit 742563777e8da62197d6cb4b99f4027f59454735 upstream. There are a couple of nasty truncation bugs lurking in the pageattr code that can be triggered when mapping EFI regions, e.g. when we pass a cpa->pgd pointer. Because cpa->numpages is a 32-bit value, shifting left by PAGE_SHIFT will truncate the resultant address to 32-bits. Viorel-Cătălin managed to trigger this bug on his Dell machine that provides a ~5GB EFI region which requires 1236992 pages to be mapped. When calling populate_pud() the end of the region gets calculated incorrectly in the following buggy expression, end = start + (cpa->numpages << PAGE_SHIFT); And only 188416 pages are mapped. Next, populate_pud() gets invoked for a second time because of the loop in __change_page_attr_set_clr(), only this time no pages get mapped because shifting the remaining number of pages (1048576) by PAGE_SHIFT is zero. At which point the loop in __change_page_attr_set_clr() spins forever because we fail to map progress. Hitting this bug depends very much on the virtual address we pick to map the large region at and how many pages we map on the initial run through the loop. This explains why this issue was only recently hit with the introduction of commit a5caa209ba9c ("x86/efi: Fix boot crash by mapping EFI memmap entries bottom-up at runtime, instead of top-down") It's interesting to note that safe uses of cpa->numpages do exist in the pageattr code. If instead of shifting ->numpages we multiply by PAGE_SIZE, no truncation occurs because PAGE_SIZE is a UL value, and so the result is unsigned long. To avoid surprises when users try to convert very large cpa->numpages values to addresses, change the data type from 'int' to 'unsigned long', thereby making it suitable for shifting by PAGE_SHIFT without any type casting. The alternative would be to make liberal use of casting, but that is far more likely to cause problems in the future when someone adds more code and fails to cast properly; this bug was difficult enough to track down in the first place. Reported-and-tested-by: Viorel-Cătălin Răpițeanu Acked-by: Borislav Petkov Cc: Sai Praneeth Prakhya Signed-off-by: Matt Fleming Link: https://bugzilla.kernel.org/show_bug.cgi?id=110131 Link: http://lkml.kernel.org/r/1454067370-10374-1-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit 75a101ba31faf1047da55e26338d800fba9a86bc Author: Jan Beulich Date: Tue Jan 26 04:15:18 2016 -0700 x86/mm: Fix types used in pgprot cacheability flags translations commit 3625c2c234ef66acf21a72d47a5ffa94f6c5ebf2 upstream. For PAE kernels "unsigned long" is not suitable to hold page protection flags, since _PAGE_NX doesn't fit there. This is the reason for quite a few W+X pages getting reported as insecure during boot (observed namely for the entire initrd range). Fixes: 281d4078be ("x86: Make page cache mode a real type") Signed-off-by: Jan Beulich Reviewed-by: Juergen Gross Link: http://lkml.kernel.org/r/56A7635602000078000CAFF1@prv-mh.provo.novell.com Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman