commit 0a10a456df5b5a8d3d62c89eeeb7b59ed7fea5fa Author: Willy Tarreau Date: Sun Nov 23 10:55:30 2014 +0100 Linux 2.6.32.64 Signed-off-by: Willy Tarreau commit 5abff9a5a12f45b27139b0fbd87570be3169926b Author: Zhu Yanjun Date: Thu Nov 20 14:04:40 2014 +0800 sctp: not send SCTP_PEER_ADDR_CHANGE notifications with failed probe 2.6.x kernels require a similar logic change as commit 2c0d6ac894a [sctp: not send SCTP_PEER_ADDR_CHANGE notifications with failed probe] introduces for newer kernels. Since the transport has always been in state SCTP_UNCONFIRMED, it therefore wasn't active before and hasn't been used before, so it is unnecessary to bug the user with a notification. Reported-by: Deepak Khandelwal Suggested-by: Vlad Yasevich Suggested-by: Michael Tuexen Suggested-by: Daniel Borkmann Signed-off-by: Zhu Yanjun Acked-by: Vlad Yasevich Acked-by: Daniel Borkmann Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau commit 34af0b70d23880d47fd75f209d2ccc138dbe6e55 Author: Jan Kara Date: Sun Aug 17 11:49:57 2014 +0200 isofs: Fix unbounded recursion when processing relocated directories We did not check the relocated directory in any way when processing the Rock Ridge 'CL' tag. Thus a corrupted isofs image can possibly have a CL entry pointing to another CL entry, leading to possibly unbounded recursion in kernel code and thus stack overflow or deadlocks (if there is a loop created from CL entries). Fix the problem by not allowing a CL entry to point to a directory entry with a CL entry (such use makes no good sense anyway) and by checking whether a CL entry doesn't point to itself. CC: stable@vger.kernel.org Reported-by: Chris Evans Signed-off-by: Jan Kara (cherry picked from commit 410dd3cf4c9b36f27ed4542ee18b1af5e68645a4) Signed-off-by: Willy Tarreau commit 5c114ceb1b1f75c960ddbdeadbb97ed57788db3f Author: Thomas Gleixner Date: Thu Sep 11 23:44:35 2014 +0200 futex: Unlock hb->lock in futex_wait_requeue_pi() error path futex_wait_requeue_pi() calls futex_wait_setup(). If futex_wait_setup() succeeds it returns with hb->lock held and preemption disabled. Now the sanity check after this does: if (match_futex(&q.key, &key2)) { ret = -EINVAL; goto out_put_keys; } which releases the keys but does not release hb->lock. So we happily return to user space with hb->lock held and therefore preemption disabled. Unlock hb->lock before taking the exit route. Reported-by: Dave "Trinity" Jones Signed-off-by: Thomas Gleixner Reviewed-by: Darren Hart Reviewed-by: Davidlohr Bueso Cc: Peter Zijlstra Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409112318500.4178@nanos Signed-off-by: Thomas Gleixner (cherry picked from commit 13c42c2f43b19aab3195f2d357db00d1e885eaa8) [wt: 2.6.32 needs &q as first argument of queue_unlock()] Signed-off-by: Willy Tarreau commit 673c64e0912b531f87db84d9c3d3f25737e91b5a Author: Rui li Date: Tue Jan 31 15:27:33 2012 +0800 USB: add new zte 3g-dongle's pid to option.c As ZTE have used and will use more pids for new products this year, we need to add some new zte 3g-dongle pids to option.c, and delete pid 0x0154 because it is used for a mass-storage port. Signed-off-by: Rui li Cc: stable Signed-off-by: Greg Kroah-Hartman (cherry picked from commit 1608ea5f4b5d6262cd6e808839491cfb2a67405a) Signed-off-by: Willy Tarreau commit 39e37e3dd09a56f9638998c4ebdf24d9e90d690d Author: Willy Tarreau Date: Sat Sep 27 12:31:37 2014 +0200 lzo: check for length overrun in variable length encoding.
This fix ensures that we never meet an integer overflow while adding 255 when parsing a variable length encoding. It works differently from commit 206a81c ("lzo: properly check for overruns") because instead of ensuring that we don't overrun the input, which is tricky to guarantee due to many assumptions in the code, it simply checks that the cumulated count of 255s read cannot overflow, by bounding this number. MAX_255_COUNT is the maximum number of times we can add 255 to a base count without overflowing an integer. The multiply will overflow when multiplying 255 by more than MAXINT/255. The sum will overflow earlier depending on the base count. Since the base count is taken from a u8 and a few bits, it is safe to assume that it will always be lower than or equal to 2*255, thus we can always prevent any overflow by accepting two fewer 255 steps (a small illustrative sketch of this bounding appears below). This patch also reduces the CPU overhead and actually increases performance by 1.1% compared to the initial code, while the previous fix costs 3.1% (measured on x86_64). The fix needs to be backported to all currently supported stable kernels. Reported-by: Willem Pinckaers Cc: "Don A. Bailey" Cc: stable Signed-off-by: Willy Tarreau Signed-off-by: Greg Kroah-Hartman (cherry picked from commit 72cf90124e87d975d0b2114d930808c58b4c05e4) Signed-off-by: Willy Tarreau commit bc81c8560161eb956f087c45f7bec1521bdcb26c Author: Willy Tarreau Date: Sat Sep 27 12:31:35 2014 +0200 Documentation: lzo: document part of the encoding Add a complete description of the LZO format as processed by the decompressor. I have not found a public specification of this format, hence this analysis, which will be used to better understand the code. Cc: Willem Pinckaers Cc: "Don A. Bailey" Cc: stable Signed-off-by: Willy Tarreau Signed-off-by: Greg Kroah-Hartman (cherry picked from commit d98a0526434d27e261f622cf9d2e0028b5ff1a00) Signed-off-by: Willy Tarreau commit bf1d894fed5852708a989a0a3f5df1a169a06afc Author: Markus F.X.J. Oberhumer Date: Mon Aug 13 17:25:44 2012 +0200 lib/lzo: Update LZO compression to current upstream version This commit updates the kernel LZO code to the current upstream version, which features a significant speed improvement - benchmarking the Calgary and Silesia test corpora typically shows a doubled performance in both compression and decompression on modern i386/x86_64/powerpc machines. Signed-off-by: Markus F.X.J. Oberhumer (cherry picked from commit 8b975bd3f9089f8ee5d7bbfd798537b992bbc7e7) [wt: this update was only needed to apply the following security fixes] Signed-off-by: Willy Tarreau commit cf13d78aa26b3d4ba6bbb0006ac4633c3ae7f0ac Author: Nicolas Pitre Date: Tue Mar 12 13:00:42 2013 +0100 ARM: 7670/1: fix the memset fix Commit 455bd4c430b0 ("ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimizations") attempted to fix a compliance issue with the memset return value. However the memset itself became broken by that patch for misaligned pointers. This fixes the above by branching over the entry code from the misaligned fixup code to avoid reloading the original pointer. Also, because the function entry alignment is wrong in the Thumb mode compilation, that fixup code is moved to the end. While at it, the entry instructions are slightly reworked to help dual issue pipelines.
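As an illustration of the bounding described in the "lzo: check for length overrun in variable length encoding" entry above, here is a minimal userspace sketch. The function parse_vle_len, its parameters and the buffer handling are invented for the example and much simplified compared to the real decompressor; only the MAX_255_COUNT idea is taken from the commit message.

    /* Illustrative sketch only -- not the actual kernel code. A length is
     * encoded as a run of 0x00 bytes (each worth 255) followed by a final
     * non-zero byte; the run is bounded so the sum cannot overflow. */

    /* Two steps of margin because the base count added afterwards is at
     * most 2 * 255 (it comes from a byte plus a few extra bits). */
    #define MAX_255_COUNT  (((unsigned int)~0u) / 255 - 2)

    /* Returns 0 on error (overrun or would-be overflow), else the decoded
     * length. *pp is advanced past the encoded length on success. */
    static unsigned int parse_vle_len(const unsigned char **pp,
                                      const unsigned char *end,
                                      unsigned int base)
    {
            const unsigned char *p = *pp;
            unsigned int zeros = 0;

            while (p < end && *p == 0) {
                    p++;
                    if (++zeros > MAX_255_COUNT)
                            return 0;       /* length would overflow */
            }
            if (p >= end)
                    return 0;               /* ran out of input */

            *pp = p + 1;
            /* zeros * 255 cannot overflow, and adding base (<= 2*255)
             * plus the final byte stays within range as well. */
            return zeros * 255 + base + *p;
    }

The point is simply that bounding the count of 255-steps is enough to rule out the overflow, regardless of how the base count is added afterwards.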
Signed-off-by: Nicolas Pitre Tested-by: Alexander Holler Signed-off-by: Russell King (cherry picked from commit 418df63adac56841ef6b0f1fcf435bc64d4ed177) Signed-off-by: Willy Tarreau commit 87efed2a8c4ebb88262cd0bf5ae88865134575de Author: Ivan Djelic Date: Wed Mar 6 20:09:27 2013 +0100 ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimizations Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on assumptions about the implementation of memset and similar functions. The current ARM optimized memset code does not return the value of its first argument, as is usually expected from standard implementations. For instance in the following function: void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter) { memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter)); waiter->magic = waiter; INIT_LIST_HEAD(&waiter->list); } compiled as: 800554d0 : 800554d0: e92d4008 push {r3, lr} 800554d4: e1a00001 mov r0, r1 800554d8: e3a02010 mov r2, #16 ; 0x10 800554dc: e3a01011 mov r1, #17 ; 0x11 800554e0: eb04426e bl 80165ea0 800554e4: e1a03000 mov r3, r0 800554e8: e583000c str r0, [r3, #12] 800554ec: e5830000 str r0, [r3] 800554f0: e5830004 str r0, [r3, #4] 800554f4: e8bd8008 pop {r3, pc} GCC assumes memset returns the value of pointer 'waiter' in register r0; causing register/memory corruptions. This patch fixes the return value of the assembly version of memset. It adds a 'mov' instruction and merges an additional load+store into existing load/store instructions. For ease of review, here is a breakdown of the patch into 4 simple steps: Step 1 ====== Perform the following substitutions: ip -> r8, then r0 -> ip, and insert 'mov ip, r0' as the first statement of the function. At this point, we have a memset() implementation returning the proper result, but corrupting r8 on some paths (the ones that were using ip). Step 2 ====== Make sure r8 is saved and restored when (! CALGN(1)+0) == 1: save r8: - str lr, [sp, #-4]! + stmfd sp!, {r8, lr} and restore r8 on both exit paths: - ldmeqfd sp!, {pc} @ Now <64 bytes to go. + ldmeqfd sp!, {r8, pc} @ Now <64 bytes to go. (...) tst r2, #16 stmneia ip!, {r1, r3, r8, lr} - ldr lr, [sp], #4 + ldmfd sp!, {r8, lr} Step 3 ====== Make sure r8 is saved and restored when (! CALGN(1)+0) == 0: save r8: - stmfd sp!, {r4-r7, lr} + stmfd sp!, {r4-r8, lr} and restore r8 on both exit paths: bgt 3b - ldmeqfd sp!, {r4-r7, pc} + ldmeqfd sp!, {r4-r8, pc} (...) tst r2, #16 stmneia ip!, {r4-r7} - ldmfd sp!, {r4-r7, lr} + ldmfd sp!, {r4-r8, lr} Step 4 ====== Rewrite register list "r4-r7, r8" as "r4-r8". Signed-off-by: Ivan Djelic Reviewed-by: Nicolas Pitre Signed-off-by: Dirk Behme Signed-off-by: Russell King (cherry picked from commit 455bd4c430b0c0a361f38e8658a0d6cb469942b5) Signed-off-by: Willy Tarreau commit e196cc43b633a24b7acdf59827a0548f7ff7de60 Author: Christoph Schulz Date: Sun Jul 13 00:53:15 2014 +0200 net: pppoe: use correct channel MTU when using Multilink PPP The PPP channel MTU is used with Multilink PPP when ppp_mp_explode() (see ppp_generic module) tries to determine how big a fragment might be. According to RFC 1661, the MTU excludes the 2-byte PPP protocol field, see the corresponding comment and code in ppp_mp_explode(): /* * hdrlen includes the 2-byte PPP protocol field, but the * MTU counts only the payload excluding the protocol field. 
* (RFC1661 Section 2) */ mtu = pch->chan->mtu - (hdrlen - 2); However, the pppoe module *does* include the PPP protocol field in the channel MTU, which is wrong as it causes the PPP payload to be 1-2 bytes too big under certain circumstances (one byte if PPP protocol compression is used, two otherwise), causing the generated Ethernet packets to be dropped. So the pppoe module has to subtract two bytes from the channel MTU. This error only manifests itself when using Multilink PPP, as otherwise the channel MTU is not used anywhere. In the following, I will describe how to reproduce this bug. We configure two pppd instances for multilink PPP over two PPPoE links, say eth2 and eth3, with an MTU of 1492 bytes for each link and an MRRU of 2976 bytes. (This MRRU is computed by adding the two link MTUs and subtracting the 4-byte MP header twice.) The necessary pppd statements on both sides are "multilink mtu 1492 mru 1492 mrru 2976". On the client side, we additionally need "plugin rp-pppoe.so eth2" and "plugin rp-pppoe.so eth3", respectively; on the server side, we additionally need to start two pppoe-server instances to be able to establish two PPPoE sessions, one over eth2 and one over eth3. We set the MTU of the PPP network interface to the MRRU (2976) on both sides of the connection in order to make use of the higher bandwidth. (If we didn't do that, IP fragmentation would kick in, which we want to avoid.) Now we send an ICMPv4 echo request with a payload of 2948 bytes from client to server over the PPP link. This results in the following network packet:

    2948 (echo payload)
    +  8 (ICMPv4 header)
    + 20 (IPv4 header)
    ---------------------
    2976 (PPP payload)

These 2976 bytes do not exceed the MTU of the PPP network interface, so the IP packet is not fragmented. Now the multilink PPP code in ppp_mp_explode() prepends one protocol byte (0x21 for IPv4), making the packet one byte bigger than the negotiated MRRU. So this packet would have to be divided into three fragments. But this does not happen as each link MTU is assumed to be two bytes larger. So this packet is divided into two fragments only, one of size 1489 and one of size 1488. Now we have for that bigger fragment:

    1489 (PPP payload)
    +  4 (MP header)
    +  2 (PPP protocol field for the MP payload (0x3d))
    +  6 (PPPoE header)
    --------------------------
    1501 (Ethernet payload)

This packet exceeds the link MTU and is discarded. If one configures the link MTU on the client side to 1501, one can see the discarded Ethernet frames with tcpdump running on the client.
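For readers following along, the change described above essentially amounts to reserving the 2-byte PPP protocol field when the pppoe module registers its channel MTU. Schematically (a sketch of the idea as described in this entry, not necessarily the literal 2.6.32 diff; the po->chan.mtu field name is assumed from the mainline pppoe driver):

    -	po->chan.mtu = dev->mtu - sizeof(struct pppoe_hdr);
    +	po->chan.mtu = dev->mtu - sizeof(struct pppoe_hdr) - 2;

With the protocol field excluded, ppp_mp_explode() computes per-link fragment sizes that, together with the MP header, protocol field and PPPoE header, stay within the 1500-byte Ethernet payload.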
A ping -s 2948 -c 1 192.168.15.254 leads to the smaller fragment that is correctly received on the server side: (tcpdump -vvvne -i eth3 pppoes and ppp proto 0x3d) 52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864), length 1514: PPPoE [ses 0x3] MLPPP (0x003d), length 1494: seq 0x000, Flags [end], length 1492 and to the bigger fragment that is not received on the server side: (tcpdump -vvvne -i eth2 pppoes and ppp proto 0x3d) 52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864), length 1515: PPPoE [ses 0x5] MLPPP (0x003d), length 1495: seq 0x000, Flags [begin], length 1493 With the patch below, we correctly obtain three fragments: 52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864), length 1514: PPPoE [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000, Flags [begin], length 1492 52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864), length 1514: PPPoE [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000, Flags [none], length 1492 52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864), length 27: PPPoE [ses 0x1] MLPPP (0x003d), length 7: seq 0x000, Flags [end], length 5 And the ICMPv4 echo request is successfully received at the server side: IP (tos 0x0, ttl 64, id 21925, offset 0, flags [DF], proto ICMP (1), length 2976) 192.168.222.2 > 192.168.15.254: ICMP echo request, id 30530, seq 0, length 2956 The bug was introduced in commit c9aa6895371b2a257401f59d3393c9f7ac5a8698 ("[PPPOE]: Advertise PPPoE MTU") from the very beginning. This patch applies to 3.10 upwards but the fix can be applied (with minor modifications) to kernels as old as 2.6.32. Signed-off-by: Christoph Schulz Signed-off-by: David S. Miller (cherry picked from commit a8a3e41c67d24eb12f9ab9680cbb85e24fcd9711) Signed-off-by: Willy Tarreau commit 0f606d9357c08220c98ff9ffaf7db53c2887db12 Author: NeilBrown Date: Wed Aug 13 09:57:07 2014 +1000 md/raid6: avoid data corruption during recovery of double-degraded RAID6 During recovery of a double-degraded RAID6 it is possible for some blocks not to be recovered properly, leading to corruption. If a write happens to one block in a stripe that would be written to a missing device, and at the same time that stripe is recovering data to the other missing device, then that recovered data may not be written. This patch skips, in the double-degraded case, an optimisation that is only safe for single-degraded arrays. Bug was introduced in 2.6.32 and fix is suitable for any kernel since then. In an older kernel with separate handle_stripe5() and handle_stripe6() functions the patch must change handle_stripe6(). Cc: stable@vger.kernel.org (2.6.32+) Fixes: 6c0069c0ae9659e3a91b68eaed06a5c6c37f45c8 Cc: Yuri Tikhonov Cc: Dan Williams Reported-by: "Manibalan P" Tested-by: "Manibalan P" Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1090423 Signed-off-by: NeilBrown Acked-by: Dan Williams (cherry picked from commit 9c4bdf697c39805078392d5ddbbba5ae5680e0dd) Signed-off-by: Willy Tarreau commit 331613cb4e0990851316c9a2c4af959c0b6b789a Author: Steven Rostedt (Red Hat) Date: Wed Aug 6 14:11:33 2014 -0400 ring-buffer: Always reset iterator to reader page When performing a consuming read, the ring buffer swaps out a page from the ring buffer with a empty page and this page that was swapped out becomes the new reader page. The reader page is owned by the reader and since it was swapped out of the ring buffer, writers do not have access to it (there's an exception to that rule, but it's out of scope for this commit). 
When reading the "trace" file, it is a non consuming read, which means that the data in the ring buffer will not be modified. When the trace file is opened, a ring buffer iterator is allocated and writes to the ring buffer are disabled, such that the iterator will not have issues iterating over the data. Although the ring buffer disabled writes, it does not disable other reads, or even consuming reads. If a consuming read happens, then the iterator is reset and starts reading from the beginning again. My tests would sometimes trigger this bug on my i386 box: WARNING: CPU: 0 PID: 5175 at kernel/trace/trace.c:1527 __trace_find_cmdline+0x66/0xaa() Modules linked in: CPU: 0 PID: 5175 Comm: grep Not tainted 3.16.0-rc3-test+ #8 Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006 00000000 00000000 f09c9e1c c18796b3 c1b5d74c f09c9e4c c103a0e3 c1b5154b f09c9e78 00001437 c1b5d74c 000005f7 c10bd85a c10bd85a c1cac57c f09c9eb0 ed0e0000 f09c9e64 c103a185 00000009 f09c9e5c c1b5154b f09c9e78 f09c9e80^M Call Trace: [] dump_stack+0x4b/0x75 [] warn_slowpath_common+0x7e/0x95 [] ? __trace_find_cmdline+0x66/0xaa [] ? __trace_find_cmdline+0x66/0xaa [] warn_slowpath_fmt+0x33/0x35 [] __trace_find_cmdline+0x66/0xaa^M [] trace_find_cmdline+0x40/0x64 [] trace_print_context+0x27/0xec [] ? trace_seq_printf+0x37/0x5b [] print_trace_line+0x319/0x39b [] ? ring_buffer_read+0x47/0x50 [] s_show+0x192/0x1ab [] ? s_next+0x5a/0x7c [] seq_read+0x267/0x34c [] vfs_read+0x8c/0xef [] ? seq_lseek+0x154/0x154 [] SyS_read+0x54/0x7f [] syscall_call+0x7/0xb ---[ end trace 3f507febd6b4cc83 ]--- >>>> ##### CPU 1 buffer started #### Which was the __trace_find_cmdline() function complaining about the pid in the event record being negative. After adding more test cases, this would trigger more often. Strangely enough, it would never trigger on a single test, but instead would trigger only when running all the tests. I believe that was the case because it required one of the tests to be shutting down via delayed instances while a new test started up. After spending several days debugging this, I found that it was caused by the iterator becoming corrupted. Debugging further, I found out why the iterator became corrupted. It happened with the rb_iter_reset(). As consuming reads may not read the full reader page, and only part of it, there's a "read" field to know where the last read took place. The iterator, must also start at the read position. In the rb_iter_reset() code, if the reader page was disconnected from the ring buffer, the iterator would start at the head page within the ring buffer (where writes still happen). But the mistake there was that it still used the "read" field to start the iterator on the head page, where it should always start at zero because readers never read from within the ring buffer where writes occur. I originally wrote a patch to have it set the iter->head to 0 instead of iter->head_page->read, but then I questioned why it wasn't always setting the iter to point to the reader page, as the reader page is still valid. The list_empty(reader_page->list) just means that it was successful in swapping out. But the reader_page may still have data. There was a bug report a long time ago that was not reproducible that had something about trace_pipe (consuming read) not matching trace (iterator read). This may explain why that happened. Anyway, the correct answer to this bug is to always use the reader page an not reset the iterator to inside the writable ring buffer. 
Cc: stable@vger.kernel.org Fixes: d769041f8653 "ring_buffer: implement new locking" Signed-off-by: Steven Rostedt (cherry picked from commit 651e22f2701b4113989237c3048d17337dd2185c) [wt: 2.6.32 has no cache{,_read} member in struct ring_buffer_iter ] Signed-off-by: Willy Tarreau commit a7be0fe9a512a966ec6e641f9821d647bc803270 Author: Florian Westphal Date: Thu Oct 23 10:36:07 2014 +0200 netfilter: nfnetlink_log: fix maximum packet length logged to userspace don't try to queue payloads > 0xffff - NLA_HDRLEN, it does not work. The nla length includes the size of the nla struct, so anything larger results in u16 integer overflow. This patch is similar to 9cefbbc9c8f9abe (netfilter: nfnetlink_queue: cleanup copy_range usage). Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso (cherry picked from commit c1e7dc91eed0ed1a51c9b814d648db18bf8fc6e9) Signed-off-by: Willy Tarreau commit 6c50e2f9c29f9dc72a3b1af1c6c401b10da96992 Author: Florian Westphal Date: Thu Oct 23 10:36:06 2014 +0200 netfilter: nf_log: account for size of NLMSG_DONE attribute We currently neither account for the nlattr size, nor do we consider the size of the trailing NLMSG_DONE when allocating nlmsg skb. This can result in nflog to stop working, as __nfulnl_send() re-tries sending forever if it failed to append NLMSG_DONE (which will never work if buffer is not large enough). Reported-by: Houcheng Lin Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso (cherry picked from commit 9dfa1dfe4d5e5e66a991321ab08afe69759d797a) Signed-off-by: Willy Tarreau commit 26967a02542d7e0673f82bc1b71e467f37e0f858 Author: Peter Hurley Date: Thu Oct 16 13:51:30 2014 -0400 tty: Fix high cpu load if tty is unreleaseable Kernel oops can cause the tty to be unreleaseable (for example, if n_tty_read() crashes while on the read_wait queue). This will cause tty_release() to endlessly loop without sleeping. Use a killable sleep timeout which grows by 2n+1 jiffies over the interval [0, 120 secs.) and then jumps to forever (but still killable). NB: killable just allows for the task to be rewoken manually, not to be terminated. Cc: Signed-off-by: Peter Hurley Signed-off-by: Greg Kroah-Hartman (cherry picked from commit 37b164578826406a173ca7c20d9ba7430134d23e) Signed-off-by: Willy Tarreau commit 237d7b2521815c41c245ca22219f4cca2ec75058 Author: Daniel Borkmann Date: Thu Oct 9 22:55:31 2014 +0200 net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks Commit 6f4c618ddb0 ("SCTP : Add paramters validity check for ASCONF chunk") added basic verification of ASCONF chunks, however, it is still possible to remotely crash a server by sending a special crafted ASCONF chunk, even up to pre 2.6.12 kernels: skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768 head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950 end:0x440 dev: ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:129! [...] Call Trace: [] skb_put+0x5c/0x70 [] sctp_addto_chunk+0x63/0xd0 [sctp] [] sctp_process_asconf+0x1af/0x540 [sctp] [] ? _read_unlock_bh+0x15/0x20 [] sctp_sf_do_asconf+0x168/0x240 [sctp] [] sctp_do_sm+0x71/0x1210 [sctp] [] ? fib_rules_lookup+0xad/0xf0 [] ? sctp_cmp_addr_exact+0x32/0x40 [sctp] [] sctp_assoc_bh_rcv+0xd3/0x180 [sctp] [] sctp_inq_push+0x56/0x80 [sctp] [] sctp_rcv+0x982/0xa10 [sctp] [] ? ipt_local_in_hook+0x23/0x28 [iptable_filter] [] ? nf_iterate+0x69/0xb0 [] ? ip_local_deliver_finish+0x0/0x2d0 [] ? nf_hook_slow+0x76/0x120 [] ? 
ip_local_deliver_finish+0x0/0x2d0 [] ip_local_deliver_finish+0xdd/0x2d0 [] ip_local_deliver+0x98/0xa0 [] ip_rcv_finish+0x12d/0x440 [] ip_rcv+0x275/0x350 [] __netif_receive_skb+0x4ab/0x750 [] netif_receive_skb+0x58/0x60 This can be triggered e.g., through a simple scripted nmap connection scan injecting the chunk after the handshake, for example, ... -------------- INIT[ASCONF; ASCONF_ACK] -------------> <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------ -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- ------------------ ASCONF; UNKNOWN ------------------> ... where ASCONF chunk of length 280 contains 2 parameters ... 1) Add IP address parameter (param length: 16) 2) Add/del IP address parameter (param length: 255) ... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the Address Parameter in the ASCONF chunk is even missing, too. This is just an example and similarly-crafted ASCONF chunks could be used just as well. The ASCONF chunk passes through sctp_verify_asconf() as all parameters passed sanity checks, and after walking, we ended up successfully at the chunk end boundary, and thus may invoke sctp_process_asconf(). Parameter walking is done with WORD_ROUND() to take padding into account. In sctp_process_asconf()'s TLV processing, we may fail in sctp_process_asconf_param() e.g., due to removal of the IP address that is also the source address of the packet containing the ASCONF chunk, and thus we need to add all TLVs after the failure to our ASCONF response to remote via helper function sctp_add_asconf_response(), which basically invokes a sctp_addto_chunk() adding the error parameters to the given skb. When walking to the next parameter this time, we proceed with ... length = ntohs(asconf_param->param_hdr.length); asconf_param = (void *)asconf_param + length; ... instead of the WORD_ROUND()'ed length, thus resulting here in an off-by-one that leads to reading the follow-up garbage parameter length of 12336, and thus throwing an skb_over_panic for the reply when trying to sctp_addto_chunk() next time, which implicitly calls the skb_put() with that length. Fix it by using sctp_walk_params() [ which is also used in INIT parameter processing ] macro in the verification *and* in ASCONF processing: it will make sure we don't spill over, that we walk parameters WORD_ROUND()'ed. Moreover, we're being more defensive and guard against unknown parameter types and missized addresses. Joint work with Vlad Yasevich. Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.") Signed-off-by: Daniel Borkmann Signed-off-by: Vlad Yasevich Acked-by: Neil Horman Signed-off-by: David S. Miller (cherry picked from commit 9de7922bc709eee2f609cd01d98aaedc4cf5ea74) Signed-off-by: Willy Tarreau commit 3a0ed0e3c95280c6e4f247f1337560d520830ee4 Author: Al Viro Date: Wed Oct 8 23:44:00 2014 -0400 fix misuses of f_count() in ppp and netlink we used to check for "nobody else could start doing anything with that opened file" by checking that refcount was 2 or less - one for descriptor table and one we'd acquired in fget() on the way to wherever we are. That was race-prone (somebody else might have had a reference to descriptor table and do fget() just as we'd been checking) and it had become flat-out incorrect back when we switched to fget_light() on those codepaths - unlike fget(), it doesn't grab an extra reference unless the descriptor table is shared. 
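Editor's aside on the ASCONF entry above (not on the f_count fix being discussed here): the crash stems from walking TLV parameters by their raw length instead of the 4-byte padded length. A standalone sketch with invented names, illustrating the padded walk that the commit says sctp_walk_params() provides:

    #include <stdint.h>
    #include <arpa/inet.h>

    struct param_hdr {
            uint16_t type;
            uint16_t length;        /* covers type+length+value, unpadded */
    } __attribute__((packed));

    #define PAD4(len)  (((len) + 3) & ~3U)   /* analogous to WORD_ROUND() */

    /* Walk all parameters in [p, end); returns 0 if the walk stays in
     * bounds, -1 on a truncated or oversized parameter. */
    static int walk_params(const uint8_t *p, const uint8_t *end)
    {
            while (p + sizeof(struct param_hdr) <= end) {
                    const struct param_hdr *ph = (const void *)p;
                    uint16_t len = ntohs(ph->length);

                    if (len < sizeof(*ph) || p + PAD4(len) > end)
                            return -1;      /* malformed: stop, don't guess */
                    p += PAD4(len);         /* NOT "p += len" */
            }
            return p == end ? 0 : -1;
    }

Advancing with "p += len" instead of the padded length is exactly the off-by-one described above: the walker lands in padding bytes and reads a garbage length for the next parameter.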
The same change allowed a race-free check, though - we are safe exactly when refcount is less than 2. It was a long time ago; pre-2.6.12 for ioctl() (the codepath leading to ppp one) and 2.6.17 for sendmsg() (netlink one). OTOH, netlink hadn't grown that check until 3.9 and ppp used to live in drivers/net, not drivers/net/ppp until 3.1. The bug existed well before that, though, and the same fix used to apply in old location of file. Cc: stable@vger.kernel.org Signed-off-by: Al Viro (cherry picked from commit 24dff96a37a2ca319e75a74d3929b2de22447ca6) [wt: apply to drivers/net/ppp_generic.c only] Signed-off-by: Willy Tarreau commit faa4cc13e7acb2d1f313290b556d5b7401cc65c8 Author: Johan Hovold Date: Wed Oct 29 09:07:30 2014 +0100 USB: kobil_sct: fix non-atomic allocation in write path Write may be called from interrupt context so make sure to use GFP_ATOMIC for all allocations in write. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable Signed-off-by: Johan Hovold (cherry picked from commit 191252837626fca0de694c18bb2aa64c118eda89) Signed-off-by: Willy Tarreau commit 2143af36b32c5e1abe77dfafb6e0747460ab75e2 Author: Zhu Yanjun Date: Wed Oct 15 16:52:51 2014 +0800 gianfar: disable vlan tag insertion by default 2.6.x kernels require a similar logic change as commit 51b8cbfc [gianfar: fix bug caused by e1653c3e] introduces for newer kernels. Gianfar driver originally enables vlan tag insertion by default. This will lead to unusable connections on some configurations. Since gianfar nic vlan tag insertion is disabled by default and it is not enabled any longer, it is not necessary to disable it again. Reported-by: Xu Jianrong Suggested-by: Wang Feng Signed-off-by: Zhu Yanjun Signed-off-by: Willy Tarreau commit 0cfd94f0456b86f937cb3b54aa842cda8ecd6bac Author: Mikulas Patocka Date: Thu Sep 11 13:43:12 2014 -0400 dm crypt: fix access beyond the end of allocated space commit d49ec52ff6ddcda178fc2476a109cf1bd1fa19ed upstream The DM crypt target accesses memory beyond allocated space resulting in a crash on 32 bit x86 systems. This bug is very old (it dates back to 2.6.25 commit 3a7f6c990ad04 "dm crypt: use async crypto"). However, this bug was masked by the fact that kmalloc rounds the size up to the next power of two. This bug wasn't exposed until 3.17-rc1 commit 298a9fa08a ("dm crypt: use per-bio data"). By switching to using per-bio data there was no longer any padding beyond the end of a dm-crypt allocated memory block. To minimize allocation overhead dm-crypt puts several structures into one block allocated with kmalloc. The block holds struct ablkcipher_request, cipher-specific scratch pad (crypto_ablkcipher_reqsize(any_tfm(cc))), struct dm_crypt_request and an initialization vector. The variable dmreq_start is set to offset of struct dm_crypt_request within this memory block. dm-crypt allocates the block with this size: cc->dmreq_start + sizeof(struct dm_crypt_request) + cc->iv_size. When accessing the initialization vector, dm-crypt uses the function iv_of_dmreq, which performs this calculation: ALIGN((unsigned long)(dmreq + 1), crypto_ablkcipher_alignmask(any_tfm(cc)) + 1). dm-crypt allocated "cc->iv_size" bytes beyond the end of dm_crypt_request structure. However, when dm-crypt accesses the initialization vector, it takes a pointer to the end of dm_crypt_request, aligns it, and then uses it as the initialization vector. 
If the end of dm_crypt_request is not aligned on a crypto_ablkcipher_alignmask(any_tfm(cc)) boundary the alignment causes the initialization vector to point beyond the allocated space. Fix this bug by calculating the variable iv_size_padding and adding it to the allocated size. Also correct the alignment of dm_crypt_request. struct dm_crypt_request is specific to dm-crypt (it isn't used by the crypto subsystem at all), so it is aligned on __alignof__(struct dm_crypt_request). Signed-off-by: Mikulas Patocka [wt: minor context adaptations, hopefully ok] Signed-off-by: Willy Tarreau commit 5e4b587d654fc122803dbe1d4d718d9ca59c198b Author: Willy Tarreau Date: Sun Nov 16 18:17:45 2014 +0100 Revert "nfsd: correctly handle return value from nfsd_map_name_to_*" This reverts commit 63d059e73ff4574b79bd8aa252b5fc00b6326ddf. On Wed, Sep 03, 2014 at 11:28:43AM +1000, NeilBrown wrote: > > 2.6.32.30 contains: > > commit 63d059e73ff4574b79bd8aa252b5fc00b6326ddf > Author: NeilBrown > Date: Wed Feb 16 13:08:35 2011 +1100 > > nfsd: correctly handle return value from nfsd_map_name_to_* > > commit 47c85291d3dd1a51501555000b90f8e281a0458e upstream. > > These functions return an nfs status, not a host_err. So don't > try to convert before returning. > > This is a regression introduced by > 3c726023402a2f3b28f49b9d90ebf9e71151157d; I fixed up two of the callers, > but missed these two. > > Reported-by: Herbert Poetzl > Signed-off-by: NeilBrown > Signed-off-by: J. Bruce Fields > Signed-off-by: Greg Kroah-Hartman > > > But it does *not* contain a backport of > 3c726023402a2f3b28f49b9d90ebf9e71151157d. > > So rather an fixing a regression, it introduces one. > > This patch should be reverted. > > See also https://bugzilla.novell.com/show_bug.cgi?id=893787 > > NeilBrown Signed-off-by: Willy Tarreau commit 10605dbce2906ed59ab5a0b97982aec4d87472c7 Author: Eric Dumazet Date: Tue Aug 5 16:49:52 2014 +0200 sctp: fix possible seqlock seadlock in sctp_packet_transmit() [ Upstream commit 757efd32d5ce31f67193cc0e6a56e4dffcc42fb1 ] Dave reported following splat, caused by improper use of IP_INC_STATS_BH() in process context. BUG: using __this_cpu_add() in preemptible [00000000] code: trinity-c117/14551 caller is __this_cpu_preempt_check+0x13/0x20 CPU: 3 PID: 14551 Comm: trinity-c117 Not tainted 3.16.0+ #33 ffffffff9ec898f0 0000000047ea7e23 ffff88022d32f7f0 ffffffff9e7ee207 0000000000000003 ffff88022d32f818 ffffffff9e397eaa ffff88023ee70b40 ffff88022d32f970 ffff8801c026d580 ffff88022d32f828 ffffffff9e397ee3 Call Trace: [] dump_stack+0x4e/0x7a [] check_preemption_disabled+0xfa/0x100 [] __this_cpu_preempt_check+0x13/0x20 [] sctp_packet_transmit+0x692/0x710 [sctp] [] sctp_outq_flush+0x2a2/0xc30 [sctp] [] ? mark_held_locks+0x7c/0xb0 [] ? _raw_spin_unlock_irqrestore+0x5d/0x80 [] sctp_outq_uncork+0x1a/0x20 [sctp] [] sctp_cmd_interpreter.isra.23+0x1142/0x13f0 [sctp] [] sctp_do_sm+0xdb/0x330 [sctp] [] ? preempt_count_sub+0xab/0x100 [] ? sctp_cname+0x70/0x70 [sctp] [] sctp_primitive_ASSOCIATE+0x3a/0x50 [sctp] [] sctp_sendmsg+0x88f/0xe30 [sctp] [] ? lock_release_holdtime.part.28+0x9a/0x160 [] ? put_lock_stats.isra.27+0xe/0x30 [] inet_sendmsg+0x104/0x220 [] ? inet_sendmsg+0x5/0x220 [] sock_sendmsg+0x9e/0xe0 [] ? might_fault+0xb9/0xc0 [] ? might_fault+0x5e/0xc0 [] SYSC_sendto+0x124/0x1c0 [] ? 
syscall_trace_enter+0x250/0x330 [] SyS_sendto+0xe/0x10 [] tracesys+0xdd/0xe2 This is a followup of commits f1d8cba61c3c4b ("inet: fix possible seqlock deadlocks") and 7f88c6b23afbd315 ("ipv6: fix possible seqlock deadlock in ip6_finish_output2") Signed-off-by: Eric Dumazet Cc: Hannes Frederic Sowa Reported-by: Dave Jones Acked-by: Neil Horman Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau commit e82fb2401b31b6a2e41d183ffabd42265f02ff56 Author: Sasha Levin Date: Thu Jul 31 23:00:35 2014 -0400 iovec: make sure the caller actually wants anything in memcpy_fromiovecend [ Upstream commit 06ebb06d49486676272a3c030bfeef4bd969a8e6 ] Check for cases when the caller requests 0 bytes instead of running off and dereferencing potentially invalid iovecs. Signed-off-by: Sasha Levin Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau commit 4d66c1e889252728b8d6417326a86aec9012d652 Author: Vlad Yasevich Date: Thu Jul 31 10:33:06 2014 -0400 net: Correctly set segment mac_len in skb_segment(). [ Upstream commit fcdfe3a7fa4cb74391d42b6a26dc07c20dab1d82 ] When performing segmentation, the mac_len value is copied right out of the original skb. However, this value is not always set correctly (like when the packet is VLAN-tagged) and we'll end up copying a bad value. One way to demonstrate this is to configure a VM which tags packets internally and turn off VLAN acceleration on the forwarding bridge port. The packets show up corrupt like this: 16:18:24.985548 52:54:00:ab:be:25 > 52:54:00:26:ce:a3, ethertype 802.1Q (0x8100), length 1518: vlan 100, p 0, ethertype 0x05e0, 0x0000: 8cdb 1c7c 8cdb 0064 4006 b59d 0a00 6402 ...|...d@.....d. 0x0010: 0a00 6401 9e0d b441 0a5e 64ec 0330 14fa ..d....A.^d..0.. 0x0020: 29e3 01c9 f871 0000 0101 080a 000a e833)....q.........3 0x0030: 000f 8c75 6e65 7470 6572 6600 6e65 7470 ...unetperf.netp 0x0040: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp 0x0050: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp 0x0060: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp ... This also leads to awful throughput as GSO packets are dropped and cause retransmissions. The solution is to set the mac_len using the values already available in then new skb. We've already adjusted all of the header offset, so we might as well correctly figure out the mac_len using skb_reset_mac_len(). After this change, packets are segmented correctly and performance is restored. CC: Eric Dumazet Signed-off-by: Vlad Yasevich Signed-off-by: David S. Miller [wt: open-code skb_mac_len() as 2.6.32 doesn't have it] Signed-off-by: Willy Tarreau commit b5b640efe723b0087f71b56a055e8d559b82c8cb Author: Vlad Yasevich Date: Thu Jul 31 10:30:25 2014 -0400 macvlan: Initialize vlan_features to turn on offload support. [ Upstream commit 081e83a78db9b0ae1f5eabc2dedecc865f509b98 ] Macvlan devices do not initialize vlan_features. As a result, any vlan devices configured on top of macvlans perform very poorly. Initialize vlan_features based on the vlan features of the lower-level device. Signed-off-by: Vlad Yasevich Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau commit 9063b87d25f1dbf3f3e0b91ab84b08b99b53ce4f Author: Christoph Paasch Date: Tue Jul 29 13:40:57 2014 +0200 tcp: Fix integer-overflow in TCP vegas [ Upstream commit 1f74e613ded11517db90b2bd57e9464d9e0fb161 ] In vegas we do a multiplication of the cwnd and the rtt. This may overflow and thus their result is stored in a u64. 
However, we first need to cast the cwnd so that actually 64-bit arithmetic is done. Then, we need to do do_div to allow this to be used on 32-bit arches. Cc: Stephen Hemminger Cc: Neal Cardwell Cc: Eric Dumazet Cc: David Laight Cc: Doug Leith Fixes: 8d3a564da34e (tcp: tcp_vegas cong avoid fix) Signed-off-by: Christoph Paasch Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau commit 8b2de953aafa1fc799c0ca1e48ca615f884d9bc0 Author: Christoph Paasch Date: Tue Jul 29 12:07:27 2014 +0200 tcp: Fix integer-overflows in TCP veno [ Upstream commit 45a07695bc64b3ab5d6d2215f9677e5b8c05a7d0 ] In veno we do a multiplication of the cwnd and the rtt. This may overflow and thus their result is stored in a u64. However, we first need to cast the cwnd so that actually 64-bit arithmetic is done. A first attempt at fixing 76f1017757aa0 ([TCP]: TCP Veno congestion control) was made by 159131149c2 (tcp: Overflow bug in Vegas), but it failed to add the required cast in tcp_veno_cong_avoid(). Fixes: 76f1017757aa0 ([TCP]: TCP Veno congestion control) Signed-off-by: Christoph Paasch Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau commit cf903573620912e1562dc4efe793a3c2e28b188a Author: Andrey Ryabinin Date: Sat Jul 26 21:26:58 2014 +0400 net: sendmsg: fix NULL pointer dereference [ Upstream commit 40eea803c6b2cfaab092f053248cbeab3f368412 ] Sasha's report: > While fuzzing with trinity inside a KVM tools guest running the latest -next > kernel with the KASAN patchset, I've stumbled on the following spew: > > [ 4448.949424] ================================================================== > [ 4448.951737] AddressSanitizer: user-memory-access on address 0 > [ 4448.952988] Read of size 2 by thread T19638: > [ 4448.954510] CPU: 28 PID: 19638 Comm: trinity-c76 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813 > [ 4448.956823] ffff88046d86ca40 0000000000000000 ffff880082f37e78 ffff880082f37a40 > [ 4448.958233] ffffffffb6e47068 ffff880082f37a68 ffff880082f37a58 ffffffffb242708d > [ 4448.959552] 0000000000000000 ffff880082f37a88 ffffffffb24255b1 0000000000000000 > [ 4448.961266] Call Trace: > [ 4448.963158] dump_stack (lib/dump_stack.c:52) > [ 4448.964244] kasan_report_user_access (mm/kasan/report.c:184) > [ 4448.965507] __asan_load2 (mm/kasan/kasan.c:352) > [ 4448.966482] ? netlink_sendmsg (net/netlink/af_netlink.c:2339) > [ 4448.967541] netlink_sendmsg (net/netlink/af_netlink.c:2339) > [ 4448.968537] ? get_parent_ip (kernel/sched/core.c:2555) > [ 4448.970103] sock_sendmsg (net/socket.c:654) > [ 4448.971584] ? might_fault (mm/memory.c:3741) > [ 4448.972526] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3740) > [ 4448.973596] ? verify_iovec (net/core/iovec.c:64) > [ 4448.974522] ___sys_sendmsg (net/socket.c:2096) > [ 4448.975797] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254) > [ 4448.977030] ? lock_release_holdtime (kernel/locking/lockdep.c:273) > [ 4448.978197] ? lock_release_non_nested (kernel/locking/lockdep.c:3434 (discriminator 1)) > [ 4448.979346] ? check_chain_key (kernel/locking/lockdep.c:2188) > [ 4448.980535] __sys_sendmmsg (net/socket.c:2181) > [ 4448.981592] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600) > [ 4448.982773] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607) > [ 4448.984458] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1500 (discriminator 2)) > [ 4448.985621] ? 
trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600) > [ 4448.986754] SyS_sendmmsg (net/socket.c:2201) > [ 4448.987708] tracesys (arch/x86/kernel/entry_64.S:542) > [ 4448.988929] ================================================================== This reports means that we've come to netlink_sendmsg() with msg->msg_name == NULL and msg->msg_namelen > 0. After this report there was no usual "Unable to handle kernel NULL pointer dereference" and this gave me a clue that address 0 is mapped and contains valid socket address structure in it. This bug was introduced in f3d3342602f8bcbf37d7c46641cb9bca7618eb1c (net: rework recvmsg handler msg_name and msg_namelen logic). Commit message states that: "Set msg->msg_name = NULL if user specified a NULL in msg_name but had a non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't affect sendto as it would bail out earlier while trying to copy-in the address." But in fact this affects sendto when address 0 is mapped and contains socket address structure in it. In such case copy-in address will succeed, verify_iovec() function will successfully exit with msg->msg_namelen > 0 and msg->msg_name == NULL. This patch fixes it by setting msg_namelen to 0 if msg_name == NULL. Cc: Hannes Frederic Sowa Cc: Eric Dumazet Cc: Reported-by: Sasha Levin Signed-off-by: Andrey Ryabinin Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau commit ebec0c65d3945312fd56d894f0fa910c5f4c0199 Author: Daniel Borkmann Date: Tue Jul 22 15:22:45 2014 +0200 net: sctp: inherit auth_capable on INIT collisions [ Upstream commit 1be9a950c646c9092fb3618197f7b6bfb50e82aa ] Jason reported an oops caused by SCTP on his ARM machine with SCTP authentication enabled: Internal error: Oops: 17 [#1] ARM CPU: 0 PID: 104 Comm: sctp-test Not tainted 3.13.0-68744-g3632f30c9b20-dirty #1 task: c6eefa40 ti: c6f52000 task.ti: c6f52000 PC is at sctp_auth_calculate_hmac+0xc4/0x10c LR is at sg_init_table+0x20/0x38 pc : [] lr : [] psr: 40000013 sp : c6f538e8 ip : 00000000 fp : c6f53924 r10: c6f50d80 r9 : 00000000 r8 : 00010000 r7 : 00000000 r6 : c7be4000 r5 : 00000000 r4 : c6f56254 r3 : c00c8170 r2 : 00000001 r1 : 00000008 r0 : c6f1e660 Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 0005397f Table: 06f28000 DAC: 00000015 Process sctp-test (pid: 104, stack limit = 0xc6f521c0) Stack: (0xc6f538e8 to 0xc6f54000) [...] Backtrace: [] (sctp_auth_calculate_hmac+0x0/0x10c) from [] (sctp_packet_transmit+0x33c/0x5c8) [] (sctp_packet_transmit+0x0/0x5c8) from [] (sctp_outq_flush+0x7fc/0x844) [] (sctp_outq_flush+0x0/0x844) from [] (sctp_outq_uncork+0x24/0x28) [] (sctp_outq_uncork+0x0/0x28) from [] (sctp_side_effects+0x1134/0x1220) [] (sctp_side_effects+0x0/0x1220) from [] (sctp_do_sm+0xac/0xd4) [] (sctp_do_sm+0x0/0xd4) from [] (sctp_assoc_bh_rcv+0x118/0x160) [] (sctp_assoc_bh_rcv+0x0/0x160) from [] (sctp_inq_push+0x6c/0x74) [] (sctp_inq_push+0x0/0x74) from [] (sctp_rcv+0x7d8/0x888) While we already had various kind of bugs in that area ec0223ec48a9 ("net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH capable") and b14878ccb7fa ("net: sctp: cache auth_enable per endpoint"), this one is a bit of a different kind. Giving a bit more background on why SCTP authentication is needed can be found in RFC4895: SCTP uses 32-bit verification tags to protect itself against blind attackers. These values are not changed during the lifetime of an SCTP association. 
Looking at new SCTP extensions, there is the need to have a method of proving that an SCTP chunk(s) was really sent by the original peer that started the association and not by a malicious attacker. To cause this bug, we're triggering an INIT collision between peers; normal SCTP handshake where both sides intent to authenticate packets contains RANDOM; CHUNKS; HMAC-ALGO parameters that are being negotiated among peers: ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------> <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------- -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- RFC4895 says that each endpoint therefore knows its own random number and the peer's random number *after* the association has been established. The local and peer's random number along with the shared key are then part of the secret used for calculating the HMAC in the AUTH chunk. Now, in our scenario, we have 2 threads with 1 non-blocking SEQ_PACKET socket each, setting up common shared SCTP_AUTH_KEY and SCTP_AUTH_ACTIVE_KEY properly, and each of them calling sctp_bindx(3), listen(2) and connect(2) against each other, thus the handshake looks similar to this, e.g.: ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------> <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------- <--------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------- -------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------> ... Since such collisions can also happen with verification tags, the RFC4895 for AUTH rather vaguely says under section 6.1: In case of INIT collision, the rules governing the handling of this Random Number follow the same pattern as those for the Verification Tag, as explained in Section 5.2.4 of RFC 2960 [5]. Therefore, each endpoint knows its own Random Number and the peer's Random Number after the association has been established. In RFC2960, section 5.2.4, we're eventually hitting Action B: B) In this case, both sides may be attempting to start an association at about the same time but the peer endpoint started its INIT after responding to the local endpoint's INIT. Thus it may have picked a new Verification Tag not being aware of the previous Tag it had sent this endpoint. The endpoint should stay in or enter the ESTABLISHED state but it MUST update its peer's Verification Tag from the State Cookie, stop any init or cookie timers that may running and send a COOKIE ACK. In other words, the handling of the Random parameter is the same as behavior for the Verification Tag as described in Action B of section 5.2.4. Looking at the code, we exactly hit the sctp_sf_do_dupcook_b() case which triggers an SCTP_CMD_UPDATE_ASSOC command to the side effect interpreter, and in fact it properly copies over peer_{random, hmacs, chunks} parameters from the newly created association to update the existing one. Also, the old asoc_shared_key is being released and based on the new params, sctp_auth_asoc_init_active_key() updated. However, the issue observed in this case is that the previous asoc->peer.auth_capable was 0, and has *not* been updated, so that instead of creating a new secret, we're doing an early return from the function sctp_auth_asoc_init_active_key() leaving asoc->asoc_shared_key as NULL. However, we now have to authenticate chunks from the updated chunk list (e.g. COOKIE-ACK). That in fact causes the server side when responding with ... <------------------ AUTH; COOKIE-ACK ----------------- ... 
to trigger a NULL pointer dereference, since in sctp_packet_transmit(), it discovers that an AUTH chunk is being queued for xmit, and thus it calls sctp_auth_calculate_hmac(). Since the asoc->active_key_id is still inherited from the endpoint, and the same as encoded into the chunk, it uses asoc->asoc_shared_key, which is still NULL, as an asoc_key and dereferences it in ... crypto_hash_setkey(desc.tfm, &asoc_key->data[0], asoc_key->len) ... causing an oops. All this happens because sctp_make_cookie_ack() called with the *new* association has the peer.auth_capable=1 and therefore marks the chunk with auth=1 after checking sctp_auth_send_cid(), but it is *actually* sent later on over the then *updated* association's transport that didn't initialize its shared key due to peer.auth_capable=0. Since control chunks in that case are not sent by the temporary association which are scheduled for deletion, they are issued for xmit via SCTP_CMD_REPLY in the interpreter with the context of the *updated* association. peer.auth_capable was 0 in the updated association (which went from COOKIE_WAIT into ESTABLISHED state), since all previous processing that performed sctp_process_init() was being done on temporary associations, that we eventually throw away each time. The correct fix is to update to the new peer.auth_capable value as well in the collision case via sctp_assoc_update(), so that in case the collision migrated from 0 -> 1, sctp_auth_asoc_init_active_key() can properly recalculate the secret. This therefore fixes the observed server panic. Fixes: 730fc3d05cd4 ("[SCTP]: Implete SCTP-AUTH parameter processing") Reported-by: Jason Gunthorpe Signed-off-by: Daniel Borkmann Tested-by: Jason Gunthorpe Cc: Vlad Yasevich Acked-by: Vlad Yasevich Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau commit 696a63662e71d3d9bc60d63eb16bee157c4b706e Author: Eric Dumazet Date: Mon Jul 21 07:17:42 2014 +0200 ipv4: fix buffer overflow in ip_options_compile() [ Upstream commit 10ec9472f05b45c94db3c854d22581a20b97db41 ] There is a benign buffer overflow in ip_options_compile spotted by AddressSanitizer[1] : Its benign because we always can access one extra byte in skb->head (because header is followed by struct skb_shared_info), and in this case this byte is not even used. 
[28504.910798] ================================================================== [28504.912046] AddressSanitizer: heap-buffer-overflow in ip_options_compile [28504.913170] Read of size 1 by thread T15843: [28504.914026] [] ip_options_compile+0x121/0x9c0 [28504.915394] [] ip_options_get_from_user+0xad/0x120 [28504.916843] [] do_ip_setsockopt.isra.15+0x8df/0x1630 [28504.918175] [] ip_setsockopt+0x30/0xa0 [28504.919490] [] tcp_setsockopt+0x5b/0x90 [28504.920835] [] sock_common_setsockopt+0x5f/0x70 [28504.922208] [] SyS_setsockopt+0xa2/0x140 [28504.923459] [] system_call_fastpath+0x16/0x1b [28504.924722] [28504.925106] Allocated by thread T15843: [28504.925815] [] ip_options_get_from_user+0x35/0x120 [28504.926884] [] do_ip_setsockopt.isra.15+0x8df/0x1630 [28504.927975] [] ip_setsockopt+0x30/0xa0 [28504.929175] [] tcp_setsockopt+0x5b/0x90 [28504.930400] [] sock_common_setsockopt+0x5f/0x70 [28504.931677] [] SyS_setsockopt+0xa2/0x140 [28504.932851] [] system_call_fastpath+0x16/0x1b [28504.934018] [28504.934377] The buggy address ffff880026382828 is located 0 bytes to the right [28504.934377] of 40-byte region [ffff880026382800, ffff880026382828) [28504.937144] [28504.937474] Memory state around the buggy address: [28504.938430] ffff880026382300: ........ rrrrrrrr rrrrrrrr rrrrrrrr [28504.939884] ffff880026382400: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.941294] ffff880026382500: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr [28504.942504] ffff880026382600: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.943483] ffff880026382700: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.944511] >ffff880026382800: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr [28504.945573] ^ [28504.946277] ffff880026382900: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.094949] ffff880026382a00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.096114] ffff880026382b00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.097116] ffff880026382c00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.098472] ffff880026382d00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.099804] Legend: [28505.100269] f - 8 freed bytes [28505.100884] r - 8 redzone bytes [28505.101649] . - 8 allocated bytes [28505.102406] x=1..7 - x allocated bytes + (8-x) redzone bytes [28505.103637] ================================================================== [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau commit 19a0d61e688489f39778cf20eb71f2f19b79cd1a Author: Sowmini Varadhan Date: Wed Jul 16 10:02:26 2014 -0400 sunvnet: clean up objects created in vnet_new() on vnet_exit() [ Upstream commit a4b70a07ed12a71131cab7adce2ce91c71b37060 ] Nothing cleans up the objects created by vnet_new(), they are completely leaked. vnet_exit(), after doing the vio_unregister_driver() to clean up ports, should call a helper function that iterates over vnet_list and cleans up those objects. This includes unregister_netdevice() as well as free_netdev(). Signed-off-by: Sowmini Varadhan Acked-by: Dave Kleikamp Reviewed-by: Karl Volz Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau commit ef94866eb091d2d92bf99633daf2d819825abe89 Author: Daniel Borkmann Date: Sat Jul 12 20:30:35 2014 +0200 net: sctp: fix information leaks in ulpevent layer [ Upstream commit 8f2e5ae40ec193bc0a0ed99e95315c3eebca84ea ] While working on some other SCTP code, I noticed that some structures shared with user space are leaking uninitialized stack or heap buffer. 
In particular, struct sctp_sndrcvinfo has a 2 bytes hole between .sinfo_flags and .sinfo_ppid that remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when putting this into cmsg. But also struct sctp_remote_error contains a 2 bytes hole that we don't fill but place into a skb through skb_copy_expand() via sctp_ulpevent_make_remote_error(). Both structures are defined by the IETF in RFC6458: * Section 5.3.2. SCTP Header Information Structure: The sctp_sndrcvinfo structure is defined below: struct sctp_sndrcvinfo { uint16_t sinfo_stream; uint16_t sinfo_ssn; uint16_t sinfo_flags; <-- 2 bytes hole --> uint32_t sinfo_ppid; uint32_t sinfo_context; uint32_t sinfo_timetolive; uint32_t sinfo_tsn; uint32_t sinfo_cumtsn; sctp_assoc_t sinfo_assoc_id; }; * 6.1.3. SCTP_REMOTE_ERROR: A remote peer may send an Operation Error message to its peer. This message indicates a variety of error conditions on an association. The entire ERROR chunk as it appears on the wire is included in an SCTP_REMOTE_ERROR event. Please refer to the SCTP specification [RFC4960] and any extensions for a list of possible error formats. An SCTP error notification has the following format: struct sctp_remote_error { uint16_t sre_type; uint16_t sre_flags; uint32_t sre_length; uint16_t sre_error; <-- 2 bytes hole --> sctp_assoc_t sre_assoc_id; uint8_t sre_data[]; }; Fix this by setting both to 0 before filling them out. We also have other structures shared between user and kernel space in SCTP that contains holes (e.g. struct sctp_paddrthlds), but we copy that buffer over from user space first and thus don't need to care about it in that cases. While at it, we can also remove lengthy comments copied from the draft, instead, we update the comment with the correct RFC number where one can look it up. Signed-off-by: Daniel Borkmann Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau commit a8a0b4eb46ebe3944cd59fa7e72aa93c768ed6ed Author: Andrey Utkin Date: Mon Jul 7 23:22:50 2014 +0300 appletalk: Fix socket referencing in skb [ Upstream commit 36beddc272c111689f3042bf3d10a64d8a805f93 ] Setting just skb->sk without taking its reference and setting a destructor is invalid. However, in the places where this was done, skb is used in a way not requiring skb->sk setting. So dropping the setting of skb->sk. Thanks to Eric Dumazet for correct solution. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79441 Reported-by: Ed Martin Signed-off-by: Andrey Utkin Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau commit 540e9ab0a30cd3954352d3c80ff7d404b17ec0e1 Author: dingtianhong Date: Wed Jul 2 13:50:48 2014 +0800 igmp: fix the problem when mc leave group [ Upstream commit 52ad353a5344f1f700c5b777175bdfa41d3cd65a ] The problem was triggered by these steps: 1) create socket, bind and then setsockopt for add mc group. mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37"); mreq.imr_interface.s_addr = inet_addr("192.168.1.2"); setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)); 2) drop the mc group for this socket. 
mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37"); mreq.imr_interface.s_addr = inet_addr("0.0.0.0"); setsockopt(sockfd, IPPROTO_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq)); 3) and then drop the socket, I found the mc group was still used by the dev: netstat -g Interface RefCnt Group --------------- ------ --------------------- eth2 1 255.0.0.37 Normally even though the IP_DROP_MEMBERSHIP return error, the mc group still need to be released for the netdev when drop the socket, but this process was broken when route default is NULL, the reason is that: The ip_mc_leave_group() will choose the in_dev by the imr_interface.s_addr, if input addr is NULL, the default route dev will be chosen, then the ifindex is got from the dev, then polling the inet->mc_list and return -ENODEV, but if the default route dev is NULL, the in_dev and ifIndex is both NULL, when polling the inet->mc_list, the mc group will be released from the mc_list, but the dev didn't dec the refcnt for this mc group, so when dropping the socket, the mc_list is NULL and the dev still keep this group. v1->v2: According Hideaki's suggestion, we should align with IPv6 (RFC3493) and BSDs, so I add the checking for the in_dev before polling the mc_list, make sure when we remove the mc group, dec the refcnt to the real dev which was using the mc address. The problem would never happened again. Signed-off-by: Ding Tianhong Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau commit 87e0f9d64a7e128d116edff850a419385e0eac50 Author: Neal Cardwell Date: Wed Jun 18 21:15:03 2014 -0400 tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb [ Upstream commit 2cd0d743b05e87445c54ca124a9916f22f16742e ] If there is an MSS change (or misbehaving receiver) that causes a SACK to arrive that covers the end of an skb but is less than one MSS, then tcp_match_skb_to_sack() was rounding up pkt_len to the full length of the skb ("Round if necessary..."), then chopping all bytes off the skb and creating a zero-byte skb in the write queue. This was visible now because the recently simplified TLP logic in bef1909ee3ed1c ("tcp: fixing TLP's FIN recovery") could find that 0-byte skb at the end of the write queue, and now that we do not check that skb's length we could send it as a TLP probe. Consider the following example scenario: mss: 1000 skb: seq: 0 end_seq: 4000 len: 4000 SACK: start_seq: 3999 end_seq: 4000 The tcp_match_skb_to_sack() code will compute: in_sack = false pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999 new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000 new_len += mss = 4000 Previously we would find the new_len > skb->len check failing, so we would fall through and set pkt_len = new_len = 4000 and chop off pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment afterward in the write queue. With this new commit, we notice that the new new_len >= skb->len check succeeds, so that we return without trying to fragment. Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries") Reported-by: Eric Dumazet Signed-off-by: Neal Cardwell Cc: Eric Dumazet Cc: Yuchung Cheng Cc: Ilpo Jarvinen Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau commit c7008cf7d89f2f2f19f09f004588811d99fdcd2e Author: Mikulas Patocka Date: Sat Jul 5 11:27:30 2014 -0400 sym53c8xx_2: Set DID_REQUEUE return code when aborting squeue Hi This is backport of commit fd1232b214af43a973443aec6a2808f16ee5bf70. 
commit c7008cf7d89f2f2f19f09f004588811d99fdcd2e Author: Mikulas Patocka Date: Sat Jul 5 11:27:30 2014 -0400 sym53c8xx_2: Set DID_REQUEUE return code when aborting squeue Hi, this is a backport of commit fd1232b214af43a973443aec6a2808f16ee5bf70. It is suitable for all stable branches up to and including 3.14.* Mikulas

commit fd1232b214af43a973443aec6a2808f16ee5bf70 Author: Mikulas Patocka Date: Tue Apr 8 21:52:05 2014 -0400 sym53c8xx_2: Set DID_REQUEUE return code when aborting squeue This patch fixes I/O errors with the sym53c8xx_2 driver when the disk returns QUEUE FULL status. When the controller encounters an error (including QUEUE FULL or BUSY status), it aborts all not yet submitted requests in the function sym_dequeue_from_squeue. This function aborts them with DID_SOFT_ERROR. If the disk has a full tag queue, the request that caused the overflow is aborted with QUEUE FULL status (and the SCSI midlayer properly retries it until it is accepted by the disk), but the sym53c8xx_2 driver aborts the following requests with DID_SOFT_ERROR --- for them, the midlayer does just a few retries and then signals the error up to sd. The result is that a disk returning QUEUE FULL causes request failures. The error was reproduced on 53c895 with a COMPAQ BD03685A24 disk (rebranded ST336607LC) with a command queue of 48 or 64 tags. The disk has 64 tags, but under some access patterns it returns QUEUE FULL when there are fewer than 64 pending tags. The SCSI specification allows returning QUEUE FULL anytime and it is up to the host to retry. Signed-off-by: Mikulas Patocka Cc: Matthew Wilcox Cc: James Bottomley Signed-off-by: Linus Torvalds Signed-off-by: Willy Tarreau

commit a3b45048ba450a3f94e5f1a8c616a0925a0da07a Author: Tejun Heo Date: Thu Jul 3 15:43:15 2014 -0400 ptrace,x86: force IRET path after a ptrace_stop() [ Upstream commit b9cd18de4db3c9ffa7e17b0dc0ca99ed5aa4d43a ] The 'sysret' fastpath does not correctly restore even all regular registers, much less any segment registers or reflags values. That is very much part of why it's faster than 'iret'. Normally that isn't a problem, because the normal ptrace() interface catches the process using the signal handler infrastructure, which always returns with an iret. However, some paths can get caught using ptrace_event() instead of the signal path, and for those we need to make sure that we aren't going to return to user space using 'sysret'. Otherwise the modifications that may have been done to the register set by the tracer wouldn't necessarily take effect. Fix it by forcing the IRET path, setting TIF_NOTIFY_RESUME from arch_ptrace_stop_needed(), which is invoked from ptrace_stop(). Signed-off-by: Tejun Heo Reported-by: Andy Lutomirski Acked-by: Oleg Nesterov Suggested-by: Linus Torvalds Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds [wt: fixes CVE-2014-4699] Signed-off-by: Willy Tarreau

commit 883f30e7d98d97d704df38cd2972a33be852f23a Author: Lars-Peter Clausen Date: Wed Jun 18 13:32:31 2014 +0200 ALSA: control: Protect user controls against concurrent access [ Upstream commit 07f4d9d74a04aa7c72c5dae0ef97565f28f17b92 ] The user-control put and get handlers as well as the tlv handler do not protect against concurrent access from multiple threads. Since the state of the control is not updated atomically, it is possible that either two write operations or a write and a read operation race against each other. Both can lead to arbitrary memory disclosure. This patch introduces a new lock that protects user controls from concurrent access. Since applications typically access controls sequentially rather than in parallel, a single lock per card should be fine.
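The design point is simply one lock per card covering all user-defined controls, rather than one lock per control. The following is a small userspace model of that idea in pthreads; struct card, control_put() and control_get() are invented names for illustration, not the ALSA API, and the real patch takes its own per-card lock in the user-control get, put and tlv handlers:

    #include <pthread.h>
    #include <string.h>

    /* One lock per "card" serializes access to the values of all user
     * controls on that card, so concurrent put/get calls can no longer
     * observe or produce a half-updated value. */
    struct user_control {
            long values[4];
    };

    struct card {
            pthread_mutex_t user_ctl_lock;   /* single lock per card */
            struct user_control ctl;
    };

    static void control_put(struct card *card, const long *src)
    {
            pthread_mutex_lock(&card->user_ctl_lock);
            memcpy(card->ctl.values, src, sizeof(card->ctl.values));
            pthread_mutex_unlock(&card->user_ctl_lock);
    }

    static void control_get(struct card *card, long *dst)
    {
            pthread_mutex_lock(&card->user_ctl_lock);
            memcpy(dst, card->ctl.values, sizeof(card->ctl.values));
            pthread_mutex_unlock(&card->user_ctl_lock);
    }

    int main(void)
    {
            struct card card = { .user_ctl_lock = PTHREAD_MUTEX_INITIALIZER };
            const long in[4] = { 1, 2, 3, 4 };
            long out[4];

            control_put(&card, in);
            control_get(&card, out);
            return out[0] == 1 ? 0 : 1;
    }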
Signed-off-by: Lars-Peter Clausen Acked-by: Jaroslav Kysela Cc: Signed-off-by: Takashi Iwai [wt: fixes CVE-2014-4652] Signed-off-by: Willy Tarreau

commit cd684d88d3f89954322b3d15fc26b2055d542784 Author: Mathias Krause Date: Sun Apr 13 18:23:33 2014 +0200 filter: prevent nla extensions to peek beyond the end of the message [ Upstream commit 05ab8f2647e4221cbdb3856dd7d32bd5407316b3 ] The BPF_S_ANC_NLATTR and BPF_S_ANC_NLATTR_NEST extensions fail to check for a minimal message length before testing the supplied offset to be within the bounds of the message. This allows the subtraction of the nla header to underflow and therefore -- as the data type is unsigned -- allows far too big offset and length values for the search of the netlink attribute. The remainder calculation for the BPF_S_ANC_NLATTR_NEST extension is also wrong. It has the minuend and subtrahend mixed up and therefore calculates a huge length value, allowing the search for the netlink attribute to overrun the end of the message. The following three BPF snippets will trigger the bugs when attached to a UNIX datagram socket and parsing a message with length 1, 2 or 3.

    ,-[ PoC for missing size check in BPF_S_ANC_NLATTR ]--
    | ld #0x87654321
    | ldx #42
    | ld #nla
    | ret a
    `---

    ,-[ PoC for the same bug in BPF_S_ANC_NLATTR_NEST ]--
    | ld #0x87654321
    | ldx #42
    | ld #nlan
    | ret a
    `---

    ,-[ PoC for wrong remainder calculation in BPF_S_ANC_NLATTR_NEST ]--
    | ; (needs a fake netlink header at offset 0)
    | ld #0
    | ldx #42
    | ld #nlan
    | ret a
    `---

Fix the first issue by ensuring the message length fulfills the minimal size constraints of an nla header. Fix the second bug by getting the math for the remainder calculation right. Fixes: 4738c1db15 ("[SKFILTER]: Add SKF_ADF_NLATTR instruction") Fixes: d214c7537b ("filter: add SKF_AD_NLATTR_NEST to look for nested..") Cc: Patrick McHardy Cc: Pablo Neira Ayuso Signed-off-by: Mathias Krause Acked-by: Daniel Borkmann Signed-off-by: David S. Miller [geissert: backport to 2.6.32] [wt: fixes CVE-2014-3144] Signed-off-by: Willy Tarreau

commit 767270fca3aa1545db2197e351ffa9ade870335e Author: Vlastimil Babka Date: Thu May 8 10:28:02 2014 +0100 mm: try_to_unmap_cluster() should lock_page() before mlocking commit 68f662b89de39aae5f43ecd3fbc6ea120d0a47cf upstream A BUG_ON(!PageLocked) was triggered in mlock_vma_page() by Sasha Levin fuzzing with trinity. The call site try_to_unmap_cluster() does not lock the pages other than its check_page parameter (which is already locked). The BUG_ON in mlock_vma_page() is not documented and its purpose is somewhat unclear, but apparently it serializes against page migration, which could otherwise fail to transfer the PG_mlocked flag. This would not be fatal, as the page would eventually be encountered again, but NR_MLOCK accounting would become distorted nevertheless. This patch adds a comment to the BUG_ON in mlock_vma_page() and munlock_vma_page() to that effect. The call site try_to_unmap_cluster() is fixed so that for page != check_page, trylock_page() is attempted (to avoid possible deadlocks, as we already have check_page locked) and mlock_vma_page() is performed only upon success. If the page lock cannot be obtained, the page is left without PG_mlocked, which is again not a problem in the whole unevictable memory design.
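The fix is an instance of the common trylock pattern: when one lock of a given kind is already held and a second one is needed, take the second only opportunistically so that a reversed acquisition elsewhere cannot deadlock. Below is a small userspace model of the idea in pthreads; check_lock, other_lock and process_other_object() are invented names and this is not the mm code itself:

    #include <pthread.h>
    #include <stdio.h>

    /* We already hold check_lock (the analogue of check_page's page
     * lock) and only touch another object if its lock can be taken
     * without blocking. */
    static pthread_mutex_t check_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t other_lock = PTHREAD_MUTEX_INITIALIZER;

    static void process_other_object(void)
    {
            printf("processed the other object\n");  /* mlock_vma_page() analogue */
    }

    int main(void)
    {
            pthread_mutex_lock(&check_lock);         /* check_page is already locked */

            if (pthread_mutex_trylock(&other_lock) == 0) {
                    /* Lock obtained: safe to act on the object. */
                    process_other_object();
                    pthread_mutex_unlock(&other_lock);
            } else {
                    /* Lock busy: skip rather than risk a deadlock, just
                     * as the page is left without PG_mlocked in the
                     * real fix. */
                    printf("skipped the busy object\n");
            }

            pthread_mutex_unlock(&check_lock);
            return 0;
    }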
Signed-off-by: Vlastimil Babka Signed-off-by: Bob Liu Reported-by: Sasha Levin Cc: Wanpeng Li Cc: Michel Lespinasse Cc: KOSAKI Motohiro Acked-by: Rik van Riel Cc: David Rientjes Cc: Mel Gorman Cc: Hugh Dickins Cc: Joonsoo Kim Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds (back ported from commit 57e68e9cd65b4b8eb4045a1e0d0746458502554c) CVE-2014-3122 BugLink: http://bugs.launchpad.net/bugs/1316268 Signed-off-by: Luis Henriques Acked-by: Andy Whitcroft Signed-off-by: Tim Gardner Signed-off-by: Willy Tarreau

commit 24866e75aa5f9ae2b8917e9bec420f7b966d0836 Author: Xufeng Zhang Date: Thu Jun 12 10:53:36 2014 +0800 sctp: Fix sk_ack_backlog wrap-around problem Consider the scenario: for a TCP-style socket, while processing the COOKIE_ECHO chunk in sctp_sf_do_5_1D_ce(), after it has passed a series of sanity checks, a new association is created in sctp_unpack_cookie(), but afterwards some processing may fail and sctp_association_free() is called to free the previously allocated association. In sctp_association_free(), the sk_ack_backlog value is decremented for this socket; since the initial value of sk_ack_backlog is 0, after the decrement it becomes 65535, a wrap-around. If we then want to establish new associations on the same socket, an ABORT is triggered since SCTP deems the accept queue full. Fix this issue by only decrementing sk_ack_backlog for associations in the endpoint's list. Fix-suggested-by: Neil Horman Signed-off-by: Xufeng Zhang Acked-by: Daniel Borkmann Acked-by: Vlad Yasevich Signed-off-by: David S. Miller (cherry picked from commit d3217b15a19a4779c39b212358a5c71d725822ee) [wt: fixes CVE-2014-4667] Signed-off-by: Willy Tarreau

commit 199615725dd969a800154f48f739ea83b3c86eec Author: Lars-Peter Clausen Date: Wed Jun 18 13:32:34 2014 +0200 ALSA: control: Handle numid overflow Each control gets automatically assigned its numids when the control is created. The allocation is done by incrementing the numid by the amount of allocated numids per allocation. This means that excessive creation and destruction of controls (e.g. via SNDRV_CTL_IOCTL_ELEM_ADD/REMOVE) can cause the id to eventually overflow. Currently, when this happens for the control that caused the overflow, kctl->id.numid + kctl->count will also overflow, causing it to be smaller than kctl->id.numid. Most of the code assumes that this is something that can not happen, so we need to make sure that it won't happen. Signed-off-by: Lars-Peter Clausen Acked-by: Jaroslav Kysela Cc: Signed-off-by: Takashi Iwai (cherry picked from commit ac902c112d90a89e59916f751c2745f4dbdbb4bd) [wt: part 2 of CVE-2014-4656] Signed-off-by: Willy Tarreau

commit 365c843b9e11bc85cacd8fe3d93965734297f35b Author: Lars-Peter Clausen Date: Wed Jun 18 13:32:35 2014 +0200 ALSA: control: Make sure that id->index does not overflow The ALSA control code expects that the range of indices assigned to a control is continuous and does not overflow. Currently there are no checks to enforce this. If a control with an overflowing index range is created, that control becomes effectively inaccessible and unremovable since snd_ctl_find_id() will not be able to find it. This patch adds a check that makes sure that controls with an overflowing index range can not be created.
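Both of these checks boil down to rejecting a [base, base + count) range whose last element would wrap around an unsigned integer. A standalone sketch of an overflow-safe form of that test follows; index_range_valid() and the exact comparison are illustrative and not the literal ALSA patch:

    #include <limits.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Reject a range whose last index would wrap past UINT_MAX.
     * The check compares against the remaining headroom instead of
     * computing index + count directly, so it cannot itself overflow. */
    static bool index_range_valid(unsigned int index, unsigned int count)
    {
            if (count == 0)
                    return false;
            return index <= UINT_MAX - (count - 1);
    }

    int main(void)
    {
            printf("%d\n", index_range_valid(0, 4));             /* 1: fine       */
            printf("%d\n", index_range_valid(UINT_MAX - 1, 4));  /* 0: would wrap */
            return 0;
    }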
Signed-off-by: Lars-Peter Clausen Acked-by: Jaroslav Kysela Cc: Signed-off-by: Takashi Iwai (cherry picked from commit 883a1d49f0d77d30012f114b2e19fc141beb3e8e) [wt: part 1 of CVE-2014-4656 fix] Signed-off-by: Willy Tarreau

commit 9ff558c6afe440afa94ef870322386cc06541843 Author: Al Viro Date: Tue Dec 22 23:45:11 2009 -0500 fix autofs/afs/etc. magic mountpoint breakage We end up trying to kfree() nd.last.name on open("/mnt/tmp", O_CREAT) if /mnt/tmp is an autofs direct mount. The reason is that nd.last_type is bogus here; we want LAST_BIND for everything of that kind and we get LAST_NORM left over from finding the parent directory. So make sure that it *is* set properly; set it to LAST_BIND before doing ->follow_link() - for normal symlinks it will be changed by __vfs_follow_link() and everything else needs it set that way. Signed-off-by: Al Viro (cherry picked from commit 86acdca1b63e6890540fa19495cfc708beff3d8b) [wt: fixes CVE-2014-0203] Signed-off-by: Willy Tarreau

commit e24be1e3939658159a9aa0c0658f05763d811293 Author: Markos Chandras Date: Wed Jan 22 14:40:00 2014 +0000 MIPS: asm: thread_info: Add _TIF_SECCOMP flag Add the _TIF_SECCOMP flag to _TIF_WORK_SYSCALL_ENTRY to indicate that the system call needs to be checked against a seccomp filter. Signed-off-by: Markos Chandras Reviewed-by: Paul Burton Reviewed-by: James Hogan Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/6405/ Signed-off-by: Ralf Baechle (cherry picked from commit 137f7df8cead00688524c82360930845396b8a21) [wt: fixes CVE-2014-4157 - no _TIF_NOHZ nor _TIF_SYSCALL_TRACEPOINT in 2.6.32] Signed-off-by: Willy Tarreau

commit 10283345bce209b89526581fd2ad5c9cbf9be5f1 Author: Ralf Baechle Date: Wed May 29 01:02:18 2013 +0200 MIPS: Cleanup flags in syscall flags handlers. This will simplify further modifications. Signed-off-by: Ralf Baechle (cherry picked from commit e7f3b48af7be9f8007a224663a5b91340626fed5) [wt: this patch is only there to ease integration of the next one] Signed-off-by: Willy Tarreau

commit 74f03fddc990db7624b7576332b57b9724862657 Author: Stefan Bader Date: Fri Aug 15 10:57:46 2014 +0200 x86_32, entry: Clean up sysenter_badsys declaration commit 554086d85e "x86_32, entry: Do syscall exit work on badsys (CVE-2014-4508)" introduced a new jump label (sysenter_badsys) but somehow the END statements seem to have gone wrong (at least it feels that way to me). This does not seem to be a fatal problem, but just for the sake of symmetry, change the second syscall_badsys to sysenter_badsys. Signed-off-by: Stefan Bader Link: http://lkml.kernel.org/r/1408093066-31021-1-git-send-email-stefan.bader@canonical.com Acked-by: Andy Lutomirski Signed-off-by: H. Peter Anvin (cherry picked from commit fb21b84e7f809ef04b1e5aed5d463cf0d4866638) Signed-off-by: Willy Tarreau

commit d079023667c5ee0a17814a71ce9abdf68b22a952 Author: Sven Wegener Date: Tue Jul 22 10:26:06 2014 +0200 x86_32, entry: Store badsys error code in %eax Commit 554086d ("x86_32, entry: Do syscall exit work on badsys (CVE-2014-4508)") introduced a regression in the x86_32 syscall entry code, resulting in syscall() not returning proper errors for undefined syscalls on CPUs supporting the sysenter feature. The following code:

    > int result = syscall(666);
    > printf("result=%d errno=%d error=%s\n", result, errno, strerror(errno));

results in:

    > result=666 errno=0 error=Success

Obviously, the syscall return value is the called syscall number, but it should have been an ENOSYS error.
When run under ptrace it behaves correctly, which makes it hard to debug in the wild:

    > result=-1 errno=38 error=Function not implemented

The %eax register is the return value register. For debugging via ptrace the syscall entry code stores the complete register context on the stack. The badsys handlers only store the ENOSYS error code in the ptrace register set and do not set %eax like a regular syscall handler would. The old resume_userspace call chain contains code that clobbers %eax and it restores %eax from the ptrace registers afterwards. The same goes for the ptrace-enabled call chain. When ptrace is not used, the syscall return value is the passed-in syscall number from the untouched %eax register. Use %eax as the return value register in syscall_badsys and sysenter_badsys, like a real syscall handler does, and have the caller push the value onto the stack for ptrace access. Signed-off-by: Sven Wegener Link: http://lkml.kernel.org/r/alpine.LNX.2.11.1407221022380.31021@titan.int.lan.stealer.net Reviewed-and-tested-by: Andy Lutomirski Cc: Signed-off-by: H. Peter Anvin (cherry picked from commit 8142b215501f8b291a108a202b3a053a265b03dd) Signed-off-by: Willy Tarreau

commit 561ca83b38ce42113f64240ddb01ac6545ad0705 Author: Andy Lutomirski Date: Mon Jun 23 14:22:15 2014 -0700 x86_32, entry: Do syscall exit work on badsys (CVE-2014-4508) The bad syscall nr paths are their own incomprehensible route through the entry control flow. Rearrange them to work just like syscalls that return -ENOSYS. This fixes an OOPS in the audit code when fast-path auditing is enabled and sysenter gets a bad syscall nr (CVE-2014-4508). This has probably been broken since Linux 2.6.27: af0575bba0 i386 syscall audit fast-path Cc: stable@vger.kernel.org Cc: Roland McGrath Reported-by: Toralf Förster Signed-off-by: Andy Lutomirski Link: http://lkml.kernel.org/r/e09c499eade6fc321266dd6b54da7beb28d6991c.1403558229.git.luto@amacapital.net Signed-off-by: H. Peter Anvin (cherry picked from commit 554086d85e71f30abe46fc014fea31929a7c6a8a) [WT: this fix is incorrect and requires the two following patches] Signed-off-by: Willy Tarreau
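The regression described in the "Store badsys error code in %eax" commit above is easy to check from user space. The following small program is essentially the snippet quoted in that commit message, completed with the headers it needs; the assumption, as in the original snippet, is that 666 is not a valid syscall number. A fixed kernel prints result=-1 errno=38 (ENOSYS), while an affected sysenter path printed result=666 errno=0:

    #define _GNU_SOURCE
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
            /* Invoke a syscall number that does not exist. */
            long result = syscall(666);

            printf("result=%ld errno=%d error=%s\n",
                   result, errno, strerror(errno));

            /* Expected on a correct kernel:
             *   result=-1 errno=38 error=Function not implemented */
            return 0;
    }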