commit b473606512cb679a58f7fc0ba2775343ffdd9cbf Author: Willy Tarreau Date: Sun May 24 10:10:32 2015 +0200 Linux 2.6.32.66 Signed-off-by: Willy Tarreau commit db91e176c20bc15d1352b817f2339ef6ce32a8ec Author: Catalin Marinas Date: Fri Mar 20 16:48:13 2015 +0000 net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour Commit db31c55a6fb2 (net: clamp ->msg_namelen instead of returning an error) introduced the clamping of msg_namelen when the unsigned value was larger than sizeof(struct sockaddr_storage). This caused a msg_namelen of -1 to be valid. The native code was subsequently fixed by commit dbb490b96584 (net: socket: error on a negative msg_namelen). In addition, the native code sets msg_namelen to 0 when msg_name is NULL. This was done in commit (6a2a2b3ae075 net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr struct from userland) and subsequently updated by 08adb7dabd48 (fold verify_iovec() into copy_msghdr_from_user()). This patch brings the get_compat_msghdr() in line with copy_msghdr_from_user(). Fixes: db31c55a6fb2 (net: clamp ->msg_namelen instead of returning an error) Cc: David S. Miller Cc: Dan Carpenter Signed-off-by: Catalin Marinas Signed-off-by: David S. Miller (cherry picked from commit 91edd096e224941131f896b86838b1e59553696a) Cc: Ben Hutchings Signed-off-by: Willy Tarreau commit c8412c6c97217654c7a6d9b9030f7910ca9a0b82 Author: Alexey Khoroshilov Date: Sat Apr 18 02:53:25 2015 +0300 sound/oss: fix deadlock in sequencer_ioctl(SNDCTL_SEQ_OUTOFBAND) A deadlock can be initiated by userspace via ioctl(SNDCTL_SEQ_OUTOFBAND) on /dev/sequencer with TMR_ECHO midi event. In this case the control flow is: sound_ioctl() -> case SND_DEV_SEQ: case SND_DEV_SEQ2: sequencer_ioctl() -> case SNDCTL_SEQ_OUTOFBAND: spin_lock_irqsave(&lock,flags); play_event(); -> case EV_TIMING: seq_timing_event() -> case TMR_ECHO: seq_copy_to_input() -> spin_lock_irqsave(&lock,flags); It seems that spin_lock_irqsave() around play_event() is not necessary, because the only other call location in seq_startplay() makes the call without acquiring spinlock. So, the patch just removes spinlocks around play_event(). By the way, it removes unreachable code in seq_timing_event(), since (seq_mode == SEQ_2) case is handled in the beginning. Compile tested only. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov Signed-off-by: Takashi Iwai (cherry picked from commit bc26d4d06e337ade069f33d3f4377593b24e6e36) Signed-off-by: Willy Tarreau commit 533c1137c5ea83fcd59a0c5ec4b5e18e099677c0 Author: Sergei Antonov Date: Tue Mar 17 03:10:48 2015 +0100 hfsplus: fix B-tree corruption after insertion at position 0 commit 98cf21c61a7f5419d82f847c4d77bf6e96a76f5f upstream. Fix B-tree corruption when a new record is inserted at position 0 in the node in hfs_brec_insert(). In this case a hfs_brec_update_parent() is called to update the parent index node (if exists) and it is passed hfs_find_data with a search_key containing a newly inserted key instead of the key to be updated. This results in an inconsistent index node. The bug reproduces on my machine after an extents overflow record for the catalog file (CNID=4) is inserted into the extents overflow B-tree. Because of a low (reserved) value of CNID=4, it has to become the first record in the first leaf node. The resulting first leaf node is correct: ---------------------------------------------------- | key0.CNID=4 | key1.CNID=123 | key2.CNID=456, ... | ---------------------------------------------------- But the parent index key0 still contains the previous key CNID=123: ----------------------- | key0.CNID=123 | ... | ----------------------- A change in hfs_brec_insert() makes hfs_brec_update_parent() work correctly by preventing it from getting fd->record=-1 value from __hfs_brec_find(). Along the way, I removed duplicate code with unification of the if condition. The resulting code is equivalent to the original code because node is never 0. Also hfs_brec_update_parent() will now return an error after getting a negative fd->record value. However, the return value of hfs_brec_update_parent() is not checked anywhere in the file and I'm leaving it unchanged by this patch. brec.c lacks error checking after some other calls too, but this issue is of less importance than the one being fixed by this patch. Cc: stable@vger.kernel.org Cc: Joe Perches Cc: Andrew Morton Cc: Vyacheslav Dubeyko Cc: Hin-Tak Leung Cc: Anton Altaparmakov Cc: Al Viro Cc: Christoph Hellwig Signed-off-by: Sergei Antonov Signed-off-by: Willy Tarreau commit d7733a753f7b9d1170a341a4b66ed31dd4badce2 Author: Mathias Krause Date: Sat Oct 4 23:06:39 2014 +0200 posix-timers: Fix stack info leak in timer_create() commit 6891c4509c792209c44ced55a60f13954cb50ef4 upstream. If userland creates a timer without specifying a sigevent info, we'll create one ourself, using a stack local variable. Particularly will we use the timer ID as sival_int. But as sigev_value is a union containing a pointer and an int, that assignment will only partially initialize sigev_value on systems where the size of a pointer is bigger than the size of an int. On such systems we'll copy the uninitialized stack bytes from the timer_create() call to userland when the timer actually fires and we're going to deliver the signal. Initialize sigev_value with 0 to plug the stack info leak. Found in the PaX patch, written by the PaX Team. Fixes: 5a9fa7307285 ("posix-timers: kill ->it_sigev_signo and...") Signed-off-by: Mathias Krause Cc: Oleg Nesterov Cc: Brad Spengler Cc: PaX Team Link: http://lkml.kernel.org/r/1412456799-32339-1-git-send-email-minipli@googlemail.com Signed-off-by: Thomas Gleixner [bwh: Backported to 3.2: adjust filename] Signed-off-by: Ben Hutchings (cherry picked from commit 3cd3a349aa3519b88d29845c0bc36bcbae158e93) Signed-off-by: Willy Tarreau commit b5f10e98858103faed1c39cb5f5404d2d3ce58a2 Author: Jan Kara Date: Wed Oct 22 20:13:39 2014 -0600 scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND commit 84ce0f0e94ac97217398b3b69c21c7a62ebeed05 upstream. When sg_scsi_ioctl() fails to prepare request to submit in blk_rq_map_kern() we jump to a label where we just end up copying (luckily zeroed-out) kernel buffer to userspace instead of reporting error. Fix the problem by jumping to the right label. CC: Jens Axboe CC: linux-scsi@vger.kernel.org Coverity-id: 1226871 Signed-off-by: Jan Kara Fixed up the, now unused, out label. Signed-off-by: Jens Axboe Signed-off-by: Ben Hutchings (cherry picked from commit d73b032b63e8967462e1cf5763858ed89e97880f) Signed-off-by: Willy Tarreau commit a3ea94483f8561c06464bed39d2d5b854ae3b5f5 Author: Benjamin Coddington Date: Tue Sep 23 12:26:20 2014 -0400 lockd: Try to reconnect if statd has moved commit 173b3afceebe76fa2205b2c8808682d5b541fe3c upstream. If rpc.statd is restarted, upcalls to monitor hosts can fail with ECONNREFUSED. In that case force a lookup of statd's new port and retry the upcall. Signed-off-by: Benjamin Coddington Signed-off-by: Trond Myklebust [bwh: Backported to 3.2: not using RPC_TASK_SOFTCONN] Signed-off-by: Ben Hutchings (cherry picked from commit 3aabe891f32c209a2be7cd5581d2634020e801c1) Signed-off-by: Willy Tarreau commit 036783b2427dd1dd4f3cd0a1a9e97e511bfaac0d Author: Kirill A. Shutemov Date: Mon Mar 9 23:11:12 2015 +0200 pagemap: do not leak physical addresses to non-privileged userspace commit ab676b7d6fbf4b294bf198fb27ade5b0e865c7ce upstream. As pointed by recent post[1] on exploiting DRAM physical imperfection, /proc/PID/pagemap exposes sensitive information which can be used to do attacks. This disallows anybody without CAP_SYS_ADMIN to read the pagemap. [1] http://googleprojectzero.blogspot.com/2015/03/exploiting-dram-rowhammer-bug-to-gain.html [ Eventually we might want to do anything more finegrained, but for now this is the simple model. - Linus ] Signed-off-by: Kirill A. Shutemov Acked-by: Konstantin Khlebnikov Acked-by: Andy Lutomirski Cc: Pavel Emelyanov Cc: Andrew Morton Cc: Mark Seaborn Signed-off-by: Linus Torvalds [mancha security: Backported to 3.10] Signed-off-by: mancha security Signed-off-by: Ben Hutchings (cherry picked from commit 1ffc3cd9a36b504c20ce98fe5eeb5463f389e1ac) Signed-off-by: Willy Tarreau commit 66dff16a0d01823e04246478b86c8591b7aad447 Author: Jiri Pirko Date: Mon Oct 13 16:34:10 2014 +0200 ipv4: fix nexthop attlen check in fib_nh_match commit f76936d07c4eeb36d8dbb64ebd30ab46ff85d9f7 upstream. fib_nh_match does not match nexthops correctly. Example: ip route add 172.16.10/24 nexthop via 192.168.122.12 dev eth0 \ nexthop via 192.168.122.13 dev eth0 ip route del 172.16.10/24 nexthop via 192.168.122.14 dev eth0 \ nexthop via 192.168.122.15 dev eth0 Del command is successful and route is removed. After this patch applied, the route is correctly matched and result is: RTNETLINK answers: No such process Please consider this for stable trees as well. Fixes: 4e902c57417c4 ("[IPv4]: FIB configuration using struct fib_config") Signed-off-by: Jiri Pirko Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings (cherry picked from commit 0aba46add2915b344580569e87d9c41274b9c475) Signed-off-by: Willy Tarreau commit 1d37a6b8fd918c2b308ac1a55da94aab34acbb83 Author: Dan Carpenter Date: Sat Dec 6 16:49:24 2014 +0300 ipvs: uninitialized data with IP_VS_IPV6 commit 3b05ac3824ed9648c0d9c02d51d9b54e4e7e874f upstream. The app_tcp_pkt_out() function expects "*diff" to be set and ends up using uninitialized data if CONFIG_IP_VS_IPV6 is turned on. The same issue is there in app_tcp_pkt_in(). Thanks to Julian Anastasov for noticing that. Signed-off-by: Dan Carpenter Acked-by: Julian Anastasov Signed-off-by: Simon Horman Signed-off-by: Ben Hutchings Cc: Pablo Neira Ayuso (cherry picked from commit 0ce625baeec39e813bef9b073f0214b513b2ef2d) Signed-off-by: Willy Tarreau commit 43318c2ee17c9c5cb41cecc810d1c1b9a53c3588 Author: Eli Cohen Date: Sun Sep 14 16:47:52 2014 +0300 IB/core: Avoid leakage from kernel to user space commit 377b513485fd885dea1083a9a5430df65b35e048 upstream. Clear the reserved field of struct ib_uverbs_async_event_desc which is copied to user space. Signed-off-by: Eli Cohen Reviewed-by: Yann Droneaud Signed-off-by: Roland Dreier Signed-off-by: Ben Hutchings Cc: Yann Droneaud (cherry picked from commit 852acc0151014ba9731e2f5f2f3df3b6a8960d40) Signed-off-by: Willy Tarreau commit abe59a6ab4dc7f438e44e2307c378828b0b06c5a Author: Ian Abbott Date: Mon Mar 23 17:50:27 2015 +0000 spi: spidev: fix possible arithmetic overflow for multi-transfer message commit f20fbaad7620af2df36a1f9d1c9ecf48ead5b747 upstream. `spidev_message()` sums the lengths of the individual SPI transfers to determine the overall SPI message length. It restricts the total length, returning an error if too long, but it does not check for arithmetic overflow. For example, if the SPI message consisted of two transfers and the first has a length of 10 and the second has a length of (__u32)(-1), the total length would be seen as 9, even though the second transfer is actually very long. If the second transfer specifies a null `rx_buf` and a non-null `tx_buf`, the `copy_from_user()` could overrun the spidev's pre-allocated tx buffer before it reaches an invalid user memory address. Fix it by checking that neither the total nor the individual transfer lengths exceed the maximum allowed value. Thanks to Dan Carpenter for reporting the potential integer overflow. Signed-off-by: Ian Abbott Signed-off-by: Mark Brown [Ian Abbott: Note: original commit compares the lengths to INT_MAX instead of bufsiz due to changes in earlier commits.] Signed-off-by: Ben Hutchings (cherry picked from commit 7499401e4a0b01ee43cff768de4ca630dcd0bc64) Signed-off-by: Willy Tarreau commit f944afb246e7b8edd6196984e21764eeda5446d3 Author: Eric Dumazet Date: Thu Apr 23 10:42:39 2015 -0700 tcp: avoid looping in tcp_send_fin() [ Upstream commit 845704a535e9b3c76448f52af1b70e4422ea03fd ] Presence of an unbound loop in tcp_send_fin() had always been hard to explain when analyzing crash dumps involving gigantic dying processes with millions of sockets. Lets try a different strategy : In case of memory pressure, try to add the FIN flag to last packet in write queue, even if packet was already sent. TCP stack will be able to deliver this FIN after a timeout event. Note that this FIN being delivered by a retransmit, it also carries a Push flag given our current implementation. By checking sk_under_memory_pressure(), we anticipate that cooking many FIN packets might deplete tcp memory. In the case we could not allocate a packet, even with __GFP_WAIT allocation, then not sending a FIN seems quite reasonable if it allows to get rid of this socket, free memory, and not block the process from eventually doing other useful work. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller [bwh: Backported to 3.2: - Drop inapplicable change to sk_forced_wmem_schedule() - s/sk_under_memory_pressure(sk)/tcp_memory_pressure/] Signed-off-by: Ben Hutchings (cherry picked from commit 82241580d7734af2207ad0bb1720904f569dac3a) [wt: backported to 2.6.32: s/TCPHDR_FIN/TCPCB_FLAG_FIN/] Signed-off-by: Willy Tarreau commit b19feb6e0066835d430c98a2befb00546089895e Author: Sebastian Pöhn Date: Mon Apr 20 09:19:20 2015 +0200 ip_forward: Drop frames with attached skb->sk [ Upstream commit 2ab957492d13bb819400ac29ae55911d50a82a13 ] Initial discussion was: [FYI] xfrm: Don't lookup sk_policy for timewait sockets Forwarded frames should not have a socket attached. Especially tw sockets will lead to panics later-on in the stack. This was observed with TPROXY assigning a tw socket and broken policy routing (misconfigured). As a result frame enters forwarding path instead of input. We cannot solve this in TPROXY as it cannot know that policy routing is broken. v2: Remove useless comment Signed-off-by: Sebastian Poehn Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings (cherry picked from commit fccb908d23fbae3e941e9294590dd94de6b1d822) Signed-off-by: Willy Tarreau commit b49fbe0a677522c61616715b43d5d59e7c5c05d6 Author: Eric Dumazet Date: Mon Nov 17 23:06:20 2014 -0800 tcp: make connect() mem charging friendly [ Upstream commit 355a901e6cf1b2b763ec85caa2a9f04fbcc4ab4a ] While working on sk_forward_alloc problems reported by Denys Fedoryshchenko, we found that tcp connect() (and fastopen) do not call sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so sk_forward_alloc is negative while connect is in progress. We can fix this by calling regular sk_stream_alloc_skb() both for the SYN packet (in tcp_connect()) and the syn_data packet in tcp_send_syn_data() Then, tcp_send_syn_data() can avoid copying syn_data as we simply can manipulate syn_data->cb[] to remove SYN flag (and increment seq) Instead of open coding memcpy_fromiovecend(), simply use this helper. This leaves in socket write queue clean fast clone skbs. This was tested against our fastopen packetdrill tests. Reported-by: Denys Fedoryshchenko Signed-off-by: Eric Dumazet Acked-by: Yuchung Cheng Signed-off-by: David S. Miller [bwh: Backported to 3.2: - Drop the Fast Open changes - Adjust context] Signed-off-by: Ben Hutchings (cherry picked from commit 3e2eb8946907b2d53eb906e13e01d273c6534f5c) Signed-off-by: Willy Tarreau commit 876846f70a6a1fbee8fadea9eb1a82e3ef04729f Author: Al Viro Date: Sat Mar 14 05:34:56 2015 +0000 rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg() [ Upstream commit 7d985ed1dca5c90535d67ce92ef6ca520302340a ] [I would really like an ACK on that one from dhowells; it appears to be quite straightforward, but...] MSG_PEEK isn't passed to ->recvmsg() via msg->msg_flags; as the matter of fact, neither the kernel users of rxrpc, nor the syscalls ever set that bit in there. It gets passed via flags; in fact, another such check in the same function is done correctly - as flags & MSG_PEEK. It had been that way (effectively disabled) for 8 years, though, so the patch needs beating up - that case had never been tested. If it is correct, it's -stable fodder. Signed-off-by: Al Viro Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings (cherry picked from commit 10c82cd7d46e4c525b046c399fcd285ce138198e) Signed-off-by: Willy Tarreau commit 71372d0e02d5ce142d36f963b7eda542876da75f Author: Arnd Bergmann Date: Wed Mar 11 22:46:59 2015 +0100 rds: avoid potential stack overflow [ Upstream commit f862e07cf95d5b62a5fc5e981dd7d0dbaf33a501 ] The rds_iw_update_cm_id function stores a large 'struct rds_sock' object on the stack in order to pass a pair of addresses. This happens to just fit withint the 1024 byte stack size warning limit on x86, but just exceed that limit on ARM, which gives us this warning: net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=] As the use of this large variable is basically bogus, we can rearrange the code to not do that. Instead of passing an rds socket into rds_iw_get_device, we now just pass the two addresses that we have available in rds_iw_update_cm_id, and we change rds_iw_get_mr accordingly, to create two address structures on the stack there. Signed-off-by: Arnd Bergmann Acked-by: Sowmini Varadhan Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings (cherry picked from commit 3fe2d645fe4ea7ff6cba9020685e46c1a1dff9c0) Signed-off-by: Willy Tarreau commit c5424cf0b04845c76a87baa4a0fc51dc67340369 Author: Alexey Kodanev Date: Wed Mar 11 14:29:17 2015 +0300 net: sysctl_net_core: check SNDBUF and RCVBUF for min length [ Upstream commit b1cb59cf2efe7971d3d72a7b963d09a512d994c9 ] sysctl has sysctl.net.core.rmem_*/wmem_* parameters which can be set to incorrect values. Given that 'struct sk_buff' allocates from rcvbuf, incorrectly set buffer length could result to memory allocation failures. For example, set them as follows: # sysctl net.core.rmem_default=64 net.core.wmem_default = 64 # sysctl net.core.wmem_default=64 net.core.wmem_default = 64 # ping localhost -s 1024 -i 0 > /dev/null This could result to the following failure: skbuff: skb_over_panic: text:ffffffff81628db4 len:-32 put:-32 head:ffff88003a1cc200 data:ffff88003a1cc200 tail:0xffffffe0 end:0xc0 dev: kernel BUG at net/core/skbuff.c:102! invalid opcode: 0000 [#1] SMP ... task: ffff88003b7f5550 ti: ffff88003ae88000 task.ti: ffff88003ae88000 RIP: 0010:[] [] skb_put+0xa1/0xb0 RSP: 0018:ffff88003ae8bc68 EFLAGS: 00010296 RAX: 000000000000008d RBX: 00000000ffffffe0 RCX: 0000000000000000 RDX: ffff88003fdcf598 RSI: ffff88003fdcd9c8 RDI: ffff88003fdcd9c8 RBP: ffff88003ae8bc88 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 00000000000002b2 R12: 0000000000000000 R13: 0000000000000000 R14: ffff88003d3f7300 R15: ffff88000012a900 FS: 00007fa0e2b4a840(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000d0f7e0 CR3: 000000003b8fb000 CR4: 00000000000006f0 Stack: ffff88003a1cc200 00000000ffffffe0 00000000000000c0 ffffffff818cab1d ffff88003ae8bd68 ffffffff81628db4 ffff88003ae8bd48 ffff88003b7f5550 ffff880031a09408 ffff88003b7f5550 ffff88000012aa48 ffff88000012ab00 Call Trace: [] unix_stream_sendmsg+0x2c4/0x470 [] sock_write_iter+0x146/0x160 [] new_sync_write+0x92/0xd0 [] vfs_write+0xd6/0x180 [] SyS_write+0x59/0xd0 [] system_call_fastpath+0x12/0x17 Code: 00 00 48 89 44 24 10 8b 87 c8 00 00 00 48 89 44 24 08 48 8b 87 d8 00 00 00 48 c7 c7 30 db 91 81 48 89 04 24 31 c0 e8 4f a8 0e 00 <0f> 0b eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 RIP [] skb_put+0xa1/0xb0 RSP Kernel panic - not syncing: Fatal exception Moreover, the possible minimum is 1, so we can get another kernel panic: ... BUG: unable to handle kernel paging request at ffff88013caee5c0 IP: [] __alloc_skb+0x12f/0x1f0 ... Signed-off-by: Alexey Kodanev Signed-off-by: David S. Miller [bwh: Backported to 3.2: delete now-unused 'one' variable] Signed-off-by: Ben Hutchings (cherry picked from commit 2d6dfb109bfbf3abd5f762173b1d73fd321dbe37) Signed-off-by: Willy Tarreau commit 0f8a4ca10ec65ffad4b6a8800d59e2ab5a1ddb31 Author: bingtian.ly@taobao.com Date: Wed Jan 23 20:35:28 2013 +0000 net: avoid to hang up on sending due to sysctl configuration overflow. commit cdda88912d62f9603d27433338a18be83ef23ac1 upstream. I found if we write a larger than 4GB value to some sysctl variables, the sending syscall will hang up forever, because these variables are 32 bits, such large values make them overflow to 0 or negative. This patch try to fix overflow or prevent from zero value setup of below sysctl variables: net.core.wmem_default net.core.rmem_default net.core.rmem_max net.core.wmem_max net.ipv4.udp_rmem_min net.ipv4.udp_wmem_min net.ipv4.tcp_wmem net.ipv4.tcp_rmem Signed-off-by: Eric Dumazet Signed-off-by: Li Yu Signed-off-by: David S. Miller [bwh: Backported to 3.2: - Adjust context - Delete now-unused 'zero' variable] Signed-off-by: Ben Hutchings (cherry picked from commit 98eee187cdee2807bd80e6c02180c5c2abae6453) [wt: backported to 2.6.32: set strategy to sysctl_intvec where relevant] Signed-off-by: Willy Tarreau commit 65f26669f828edb1d964858236bef7ea18a7c816 Author: Michal Kubeček Date: Mon Mar 2 18:27:11 2015 +0100 udp: only allow UFO for packets from SOCK_DGRAM sockets [ Upstream commit acf8dd0a9d0b9e4cdb597c2f74802f79c699e802 ] If an over-MTU UDP datagram is sent through a SOCK_RAW socket to a UFO-capable device, ip_ufo_append_data() sets skb->ip_summed to CHECKSUM_PARTIAL unconditionally as all GSO code assumes transport layer checksum is to be computed on segmentation. However, in this case, skb->csum_start and skb->csum_offset are never set as raw socket transmit path bypasses udp_send_skb() where they are usually set. As a result, driver may access invalid memory when trying to calculate the checksum and store the result (as observed in virtio_net driver). Moreover, the very idea of modifying the userspace provided UDP header is IMHO against raw socket semantics (I wasn't able to find a document clearly stating this or the opposite, though). And while allowing CHECKSUM_NONE in the UFO case would be more efficient, it would be a bit too intrusive change just to handle a corner case like this. Therefore disallowing UFO for packets from SOCK_DGRAM seems to be the best option. Signed-off-by: Michal Kubecek Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings (cherry picked from commit 332640b2821f75381b1049a904d93d4fb846334f) Signed-off-by: Willy Tarreau commit 417b2efd69006e3868628e035c7d8151ea983c82 Author: Steffen Klassert Date: Wed Jun 29 23:19:32 2011 +0000 ipv4: Don't use ufo handling on later transformed packets We might call ip_ufo_append_data() for packets that will be IPsec transformed later. This function should be used just for real udp packets. So we check for rt->dst.header_len which is only nonzero on IPsec handling and call ip_ufo_append_data() just if rt->dst.header_len is zero. Signed-off-by: Steffen Klassert Signed-off-by: David S. Miller (cherry picked from commit c146066ab80267c3305de5dda6a4083f06df9265) Signed-off-by: Willy Tarreau commit a7357ca5c20572ac9609a87be884f1e87bd28185 Author: Matthew Thode Date: Tue Feb 17 18:31:57 2015 -0600 net: reject creation of netdev names with colons [ Upstream commit a4176a9391868bfa87705bcd2e3b49e9b9dd2996 ] colons are used as a separator in netdev device lookup in dev_ioctl.c Specific functions are SIOCGIFTXQLEN SIOCETHTOOL SIOCSIFNAME Signed-off-by: Matthew Thode Signed-off-by: David S. Miller [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings (cherry picked from commit d501ebeb7da7531e92e3c8d194730341c314ff2d) Signed-off-by: Willy Tarreau commit 20914ec48a0bdce899c7a97a8ba720110a0ee9c5 Author: Ignacy Gawędzki Date: Tue Feb 17 20:15:20 2015 +0100 ematch: Fix auto-loading of ematch modules. [ Upstream commit 34eea79e2664b314cab6a30fc582fdfa7a1bb1df ] In tcf_em_validate(), after calling request_module() to load the kind-specific module, set em->ops to NULL before returning -EAGAIN, so that module_put() is not called again by tcf_em_tree_destroy(). Signed-off-by: Ignacy Gawędzki Acked-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings (cherry picked from commit 9405be73262273a6904688fb2f1e60a13117cf1b) Signed-off-by: Willy Tarreau commit 221956a27cf80585f0313d9be7f55ffd8746dcc0 Author: Florian Westphal Date: Wed Jan 28 10:56:04 2015 +0100 ppp: deflate: never return len larger than output buffer [ Upstream commit e2a4800e75780ccf4e6c2487f82b688ba736eb18 ] When we've run out of space in the output buffer to store more data, we will call zlib_deflate with a NULL output buffer until we've consumed remaining input. When this happens, olen contains the size the output buffer would have consumed iff we'd have had enough room. This can later cause skb_over_panic when ppp_generic skb_put()s the returned length. Reported-by: Iain Douglas Signed-off-by: Florian Westphal Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings (cherry picked from commit 8bcd64423836bad3638684677f6d740bc7c9297f) Signed-off-by: Willy Tarreau commit 964a590975bdef771dfa1da6b3dfd500d34990e3 Author: Ani Sinha Date: Mon Sep 8 14:49:59 2014 -0700 net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr struct from userland. commit 6a2a2b3ae0759843b22c929881cc184b00cc63ff upstream. Linux manpage for recvmsg and sendmsg calls does not explicitly mention setting msg_namelen to 0 when msg_name passed set as NULL. When developers don't set msg_namelen member in msghdr, it might contain garbage value which will fail the validation check and sendmsg and recvmsg calls from kernel will return EINVAL. This will break old binaries and any code for which there is no access to source code. To fix this, we set msg_namelen to 0 when msg_name is passed as NULL from userland. Signed-off-by: Ani Sinha Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings (cherry picked from commit d29f1f53e5299e0bbb3e33ef8d35ed657fa633b6) Signed-off-by: Willy Tarreau commit 0c5d422155951f9b2ac3285a1264eb63683e0794 Author: Jann Horn Date: Sun Apr 19 02:48:39 2015 +0200 fs: take i_mutex during prepare_binprm for set[ug]id executables commit 8b01fc86b9f425899f8a3a8fc1c47d73c2c20543 upstream. This prevents a race between chown() and execve(), where chowning a setuid-user binary to root would momentarily make the binary setuid root. This patch was mostly written by Linus Torvalds. Signed-off-by: Jann Horn Signed-off-by: Linus Torvalds [bwh: Backported to 3.2: - Drop the task_no_new_privs() and user namespace checks - Open-code file_inode() - s/READ_ONCE/ACCESS_ONCE/ - Adjust context] Signed-off-by: Ben Hutchings (cherry picked from commit 470e517be17dd6ef8670bec7bd7831ea0d3ad8a6) Signed-off-by: Willy Tarreau commit daacd26b5ebfa871be411cf190bd0d42d7712acb Author: D.S. Ljungmark Date: Wed Mar 25 09:28:15 2015 +0100 ipv6: Don't reduce hop limit for an interface commit 6fd99094de2b83d1d4c8457f2c83483b2828e75a upstream. A local route may have a lower hop_limit set than global routes do. RFC 3756, Section 4.2.7, "Parameter Spoofing" > 1. The attacker includes a Current Hop Limit of one or another small > number which the attacker knows will cause legitimate packets to > be dropped before they reach their destination. > As an example, one possible approach to mitigate this threat is to > ignore very small hop limits. The nodes could implement a > configurable minimum hop limit, and ignore attempts to set it below > said limit. Signed-off-by: D.S. Ljungmark Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller [bwh: Backported to 3.2: adjust ND_PRINTK() usage] Signed-off-by: Ben Hutchings (cherry picked from commit f10f7d2a8200fe33c5030c7e32df3a2b3561f3cd) Signed-off-by: Willy Tarreau commit 6246ff96da330928a3fef0da741827474f473a11 Author: Sasha Levin Date: Tue Feb 3 08:55:58 2015 -0500 net: rds: use correct size for max unacked packets and bytes commit db27ebb111e9f69efece08e4cb6a34ff980f8896 upstream. Max unacked packets/bytes is an int while sizeof(long) was used in the sysctl table. This means that when they were getting read we'd also leak kernel memory to userspace along with the timeout values. Signed-off-by: Sasha Levin Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings (cherry picked from commit 3760b67b3e419b9ac42a45417491360a14a35357) Signed-off-by: Willy Tarreau commit 716fff2a64f174e89172c2fc86f70d1008d1f31e Author: Sasha Levin Date: Fri Jan 23 20:47:00 2015 -0500 net: llc: use correct size for sysctl timeout entries commit 6b8d9117ccb4f81b1244aafa7bc70ef8fa45fc49 upstream. The timeout entries are sizeof(int) rather than sizeof(long), which means that when they were getting read we'd also leak kernel memory to userspace along with the timeout values. Signed-off-by: Sasha Levin Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings (cherry picked from commit 88fe14be08a475ad0eea4ca7c51f32437baf41af) Signed-off-by: Willy Tarreau commit e402af7b59e1834efbcb4ce6ae5bf47702a9a329 Author: Shachar Raindel Date: Wed Mar 18 17:39:08 2015 +0000 IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic commit 8494057ab5e40df590ef6ef7d66324d3ae33356b upstream. Properly verify that the resulting page aligned end address is larger than both the start address and the length of the memory area requested. Both the start and length arguments for ib_umem_get are controlled by the user. A misbehaving user can provide values which will cause an integer overflow when calculating the page aligned end address. This overflow can cause also miscalculation of the number of pages mapped, and additional logic issues. Addresses: CVE-2014-8159 Signed-off-by: Shachar Raindel Signed-off-by: Jack Morgenstein Signed-off-by: Or Gerlitz Signed-off-by: Roland Dreier Signed-off-by: Ben Hutchings (cherry picked from commit 485f16b743d98527620396639b73d7214006f3c7) Signed-off-by: Willy Tarreau commit 1b9143bddfef9047efcfc4627014a20a7c8c743f Author: Daniel Borkmann Date: Thu Jan 22 18:26:54 2015 +0100 net: sctp: fix slab corruption from use after free on INIT collisions commit 600ddd6825543962fb807884169e57b580dba208 upstream When hitting an INIT collision case during the 4WHS with AUTH enabled, as already described in detail in commit 1be9a950c646 ("net: sctp: inherit auth_capable on INIT collisions"), it can happen that we occasionally still remotely trigger the following panic on server side which seems to have been uncovered after the fix from commit 1be9a950c646 ... [ 533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff [ 533.913657] IP: [] __kmalloc+0x95/0x230 [ 533.940559] PGD 5030f2067 PUD 0 [ 533.957104] Oops: 0000 [#1] SMP [ 533.974283] Modules linked in: sctp mlx4_en [...] [ 534.939704] Call Trace: [ 534.951833] [] ? crypto_init_shash_ops+0x60/0xf0 [ 534.984213] [] crypto_init_shash_ops+0x60/0xf0 [ 535.015025] [] __crypto_alloc_tfm+0x6d/0x170 [ 535.045661] [] crypto_alloc_base+0x4c/0xb0 [ 535.074593] [] ? _raw_spin_lock_bh+0x12/0x50 [ 535.105239] [] sctp_inet_listen+0x161/0x1e0 [sctp] [ 535.138606] [] SyS_listen+0x9d/0xb0 [ 535.166848] [] system_call_fastpath+0x16/0x1b ... or depending on the the application, for example this one: [ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff [ 1370.026506] IP: [] kmem_cache_alloc+0x75/0x1d0 [ 1370.054568] PGD 633c94067 PUD 0 [ 1370.070446] Oops: 0000 [#1] SMP [ 1370.085010] Modules linked in: sctp kvm_amd kvm [...] [ 1370.963431] Call Trace: [ 1370.974632] [] ? SyS_epoll_ctl+0x53f/0x960 [ 1371.000863] [] SyS_epoll_ctl+0x53f/0x960 [ 1371.027154] [] ? anon_inode_getfile+0xd3/0x170 [ 1371.054679] [] ? __alloc_fd+0xa7/0x130 [ 1371.080183] [] system_call_fastpath+0x16/0x1b With slab debugging enabled, we can see that the poison has been overwritten: [ 669.826368] BUG kmalloc-128 (Tainted: G W ): Poison overwritten [ 669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b [ 669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494 [ 669.826424] __slab_alloc+0x4bf/0x566 [ 669.826433] __kmalloc+0x280/0x310 [ 669.826453] sctp_auth_create_key+0x23/0x50 [sctp] [ 669.826471] sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp] [ 669.826488] sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp] [ 669.826505] sctp_do_sm+0x29d/0x17c0 [sctp] [...] [ 669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494 [ 669.826635] __slab_free+0x39/0x2a8 [ 669.826643] kfree+0x1d6/0x230 [ 669.826650] kzfree+0x31/0x40 [ 669.826666] sctp_auth_key_put+0x19/0x20 [sctp] [ 669.826681] sctp_assoc_update+0x1ee/0x2d0 [sctp] [ 669.826695] sctp_do_sm+0x674/0x17c0 [sctp] Since this only triggers in some collision-cases with AUTH, the problem at heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice when having refcnt 1, once directly in sctp_assoc_update() and yet again from within sctp_auth_asoc_init_active_key() via sctp_assoc_update() on the already kzfree'd memory, which is also consistent with the observation of the poison decrease from 0x6b to 0x6a (note: the overwrite is detected at a later point in time when poison is checked on new allocation). Reference counting of auth keys revisited: Shared keys for AUTH chunks are being stored in endpoints and associations in endpoint_shared_keys list. On endpoint creation, a null key is being added; on association creation, all endpoint shared keys are being cached and thus cloned over to the association. struct sctp_shared_key only holds a pointer to the actual key bytes, that is, struct sctp_auth_bytes which keeps track of users internally through refcounting. Naturally, on assoc or enpoint destruction, sctp_shared_key are being destroyed directly and the reference on sctp_auth_bytes dropped. User space can add keys to either list via setsockopt(2) through struct sctp_authkey and by passing that to sctp_auth_set_key() which replaces or adds a new auth key. There, sctp_auth_create_key() creates a new sctp_auth_bytes with refcount 1 and in case of replacement drops the reference on the old sctp_auth_bytes. A key can be set active from user space through setsockopt() on the id via sctp_auth_set_active_key(), which iterates through either endpoint_shared_keys and in case of an assoc, invokes (one of various places) sctp_auth_asoc_init_active_key(). sctp_auth_asoc_init_active_key() computes the actual secret from local's and peer's random, hmac and shared key parameters and returns a new key directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops the reference if there was a previous one. The secret, which where we eventually double drop the ref comes from sctp_auth_asoc_set_secret() with intitial refcount of 1, which also stays unchanged eventually in sctp_assoc_update(). This key is later being used for crypto layer to set the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac(). To close the loop: asoc->asoc_shared_key is freshly allocated secret material and independant of the sctp_shared_key management keeping track of only shared keys in endpoints and assocs. Hence, also commit 4184b2a79a76 ("net: sctp: fix memory leak in auth key management") is independant of this bug here since it concerns a different layer (though same structures being used eventually). asoc->asoc_shared_key is reference dropped correctly on assoc destruction in sctp_association_free() and when active keys are being replaced in sctp_auth_asoc_init_active_key(), it always has a refcount of 1. Hence, it's freed prematurely in sctp_assoc_update(). Simple fix is to remove that sctp_auth_key_put() from there which fixes these panics. Fixes: 730fc3d05cd4 ("[SCTP]: Implete SCTP-AUTH parameter processing") Signed-off-by: Daniel Borkmann Acked-by: Vlad Yasevich Acked-by: Neil Horman Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau commit f014e54c21bf0ee03bac265d84feffbb7e233e89 Author: Daniel Borkmann Date: Mon Nov 10 18:00:09 2014 +0100 net: sctp: fix memory leak in auth key management commit 4184b2a79a7612a9272ce20d639934584a1f3786 upstream. A very minimal and simple user space application allocating an SCTP socket, setting SCTP_AUTH_KEY setsockopt(2) on it and then closing the socket again will leak the memory containing the authentication key from user space: unreferenced object 0xffff8800837047c0 (size 16): comm "a.out", pid 2789, jiffies 4296954322 (age 192.258s) hex dump (first 16 bytes): 01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [] kmemleak_alloc+0x4e/0xb0 [] __kmalloc+0xe8/0x270 [] sctp_auth_create_key+0x23/0x50 [sctp] [] sctp_auth_set_key+0xa1/0x140 [sctp] [] sctp_setsockopt+0xd03/0x1180 [sctp] [] sock_common_setsockopt+0x14/0x20 [] SyS_setsockopt+0x71/0xd0 [] system_call_fastpath+0x12/0x17 [] 0xffffffffffffffff This is bad because of two things, we can bring down a machine from user space when auth_enable=1, but also we would leave security sensitive keying material in memory without clearing it after use. The issue is that sctp_auth_create_key() already sets the refcount to 1, but after allocation sctp_auth_set_key() does an additional refcount on it, and thus leaving it around when we free the socket. Fixes: 65b07e5d0d0 ("[SCTP]: API updates to suport SCTP-AUTH extensions.") Signed-off-by: Daniel Borkmann Cc: Vlad Yasevich Acked-by: Neil Horman Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings (cherry picked from commit 3af10169145c8eed7b3591c0644da4298405efbc) Signed-off-by: Willy Tarreau commit 09b5d759d0f67a0eaf96ef51d7b33b8d54b581d1 Author: Jan Kara Date: Thu Dec 18 17:26:10 2014 +0100 isofs: Fix unchecked printing of ER records commit 4e2024624e678f0ebb916e6192bd23c1f9fdf696 upstream We didn't check length of rock ridge ER records before printing them. Thus corrupted isofs image can cause us to access and print some memory behind the buffer with obvious consequences. Reported-and-tested-by: Carl Henrik Lunde CC: stable@vger.kernel.org Signed-off-by: Jan Kara Signed-off-by: Willy Tarreau commit 08313e26e06d4aa9ce1cbba1a8e359e9cab9ad56 Author: Jan Kara Date: Mon Dec 15 14:22:46 2014 +0100 isofs: Fix infinite looping over CE entries commit f54e18f1b831c92f6512d2eedb224cd63d607d3d upstream Rock Ridge extensions define so called Continuation Entries (CE) which define where is further space with Rock Ridge data. Corrupted isofs image can contain arbitrarily long chain of these, including a one containing loop and thus causing kernel to end in an infinite loop when traversing these entries. Limit the traversal to 32 entries which should be more than enough space to store all the Rock Ridge data. Reported-by: P J P CC: stable@vger.kernel.org Signed-off-by: Jan Kara Signed-off-by: Willy Tarreau commit 8e4213323511e95e81c743e50c00974a407c0fe5 Author: Florian Westphal Date: Fri Sep 26 11:35:42 2014 +0200 netfilter: conntrack: disable generic tracking for known protocols commit db29a9508a9246e77087c5531e45b2c88ec6988b upstream Given following iptables ruleset: -P FORWARD DROP -A FORWARD -m sctp --dport 9 -j ACCEPT -A FORWARD -p tcp --dport 80 -j ACCEPT -A FORWARD -p tcp -m conntrack -m state ESTABLISHED,RELATED -j ACCEPT One would assume that this allows SCTP on port 9 and TCP on port 80. Unfortunately, if the SCTP conntrack module is not loaded, this allows *all* SCTP communication, to pass though, i.e. -p sctp -j ACCEPT, which we think is a security issue. This is because on the first SCTP packet on port 9, we create a dummy "generic l4" conntrack entry without any port information (since conntrack doesn't know how to extract this information). All subsequent packets that are unknown will then be in established state since they will fallback to proto_generic and will match the 'generic' entry. Our originally proposed version [1] completely disabled generic protocol tracking, but Jozsef suggests to not track protocols for which a more suitable helper is available, hence we now mitigate the issue for in tree known ct protocol helpers only, so that at least NAT and direction information will still be preserved for others. [1] http://www.spinics.net/lists/netfilter-devel/msg33430.html Joint work with Daniel Borkmann. Signed-off-by: Florian Westphal Signed-off-by: Daniel Borkmann Acked-by: Jozsef Kadlecsik Signed-off-by: Pablo Neira Ayuso [bwh: Backported to 2.6.32: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Willy Tarreau commit 7e6536a247bfd50110bba6a62fa611697a2eae22 Author: Ben Hutchings Date: Thu Jan 29 02:50:33 2015 +0000 splice: Apply generic position and size checks to each write We need to check the position and size of file writes against various limits, using generic_write_check(). This was not being done for the splice write path. It was fixed upstream by commit 8d0207652cbe ("->splice_write() via ->write_iter()") but we can't apply that. CVE-2014-7822 Signed-off-by: Ben Hutchings Signed-off-by: Willy Tarreau commit 17f35338d3b877b7434281e615e1926e560b97dc Author: Robert Baldyga Date: Mon Nov 24 07:56:21 2014 +0100 serial: samsung: wait for transfer completion before clock disable This patch adds waiting until transmit buffer and shifter will be empty before clock disabling. Without this fix it's possible to have clock disabled while data was not transmited yet, which causes unproper state of TX line and problems in following data transfers. Cc: stable@vger.kernel.org Signed-off-by: Robert Baldyga Signed-off-by: Greg Kroah-Hartman (cherry picked from commit 1ff383a4c3eda8893ec61b02831826e1b1f46b41) Signed-off-by: Willy Tarreau commit d75221306dc828395739fee7efae402a5cc45df3 Author: Shai Fultheim Date: Fri Apr 20 01:12:32 2012 +0300 x86: Conditionally update time when ack-ing pending irqs commit 42fa4250436304d4650fa271f37671f6cee24e08 upstream. On virtual environments, apic_read could take a long time. As a result, under certain conditions the ack pending loop may exit without any queued irqs left, but after more than one second. A warning will be printed needlessly in this case. If the loop is about to exit regardless of max_loops, don't update it. Signed-off-by: Shai Fultheim [ rebased and reworded the commit message] Signed-off-by: Ido Yariv Acked-by: Thomas Gleixner Link: http://lkml.kernel.org/r/1334873552-31346-1-git-send-email-ido@wizery.com Signed-off-by: Ingo Molnar Signed-off-by: Ben Hutchings (cherry picked from commit c9f1417be9acae3a9867f8bdab2b7924d76cf6ac) Signed-off-by: Willy Tarreau commit c069a4f21b6542f071a803f1ab9a7bd897ec271c Author: Andy Lutomirski Date: Thu Mar 5 01:09:44 2015 +0100 x86/asm/entry/64: Remove a bogus 'ret_from_fork' optimization commit 956421fbb74c3a6261903f3836c0740187cf038b upstream. 'ret_from_fork' checks TIF_IA32 to determine whether 'pt_regs' and the related state make sense for 'ret_from_sys_call'. This is entirely the wrong check. TS_COMPAT would make a little more sense, but there's really no point in keeping this optimization at all. This fixes a return to the wrong user CS if we came from int 0x80 in a 64-bit task. Signed-off-by: Andy Lutomirski Cc: Borislav Petkov Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Oleg Nesterov Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/4710be56d76ef994ddf59087aad98c000fbab9a4.1424989793.git.luto@amacapital.net [ Backported from tip:x86/asm. ] Signed-off-by: Ingo Molnar Signed-off-by: Ben Hutchings (cherry picked from commit 159891c0953a89a28f793fc52373b031262c44d2) Signed-off-by: Willy Tarreau commit ebcbe1397436bf3a77f64afdca4733fc9569f274 Author: Borislav Petkov Date: Wed Jan 15 00:07:11 2014 +0100 x86, cpu, amd: Add workaround for family 16h, erratum 793 commit 3b56496865f9f7d9bcb2f93b44c63f274f08e3b6 upstream This adds the workaround for erratum 793 as a precaution in case not every BIOS implements it. This addresses CVE-2013-6885. Erratum text: [Revision Guide for AMD Family 16h Models 00h-0Fh Processors, document 51810 Rev. 3.04 November 2013] 793 Specific Combination of Writes to Write Combined Memory Types and Locked Instructions May Cause Core Hang Description Under a highly specific and detailed set of internal timing conditions, a locked instruction may trigger a timing sequence whereby the write to a write combined memory type is not flushed, causing the locked instruction to stall indefinitely. Potential Effect on System Processor core hang. Suggested Workaround BIOS should set MSR C001_1020[15] = 1b. Fix Planned No fix planned [ hpa: updated description, fixed typo in MSR name ] Signed-off-by: Borislav Petkov Link: http://lkml.kernel.org/r/20140114230711.GS29865@pd.tnic Tested-by: Aravind Gopalakrishnan Signed-off-by: H. Peter Anvin [bwh: Backported to 3.2: - Adjust filename - Venkatesh Srinivas pointed out we should use {rd,wr}msrl_safe() to avoid crashing on KVM. This was fixed upstream by commit 8f86a7373a1c ("x86, AMD: Convert to the new bit access MSR accessors") but that's too much trouble to backport. Here we must use {rd,wr}msrl_amd_safe().] Signed-off-by: Willy Tarreau commit ba455d81e9d648495bac3b43b18449ee6cca95a1 Author: Hector Marco-Gisbert Date: Sat Feb 14 09:33:50 2015 -0800 ASLR: fix stack randomization on 64-bit systems commit 4e7c22d447bb6d7e37bfe39ff658486ae78e8d77 upstream The issue is that the stack for processes is not properly randomized on 64 bit architectures due to an integer overflow. The affected function is randomize_stack_top() in file "fs/binfmt_elf.c": static unsigned long randomize_stack_top(unsigned long stack_top) { unsigned int random_variable = 0; if ((current->flags & PF_RANDOMIZE) && !(current->personality & ADDR_NO_RANDOMIZE)) { random_variable = get_random_int() & STACK_RND_MASK; random_variable <<= PAGE_SHIFT; } return PAGE_ALIGN(stack_top) + random_variable; return PAGE_ALIGN(stack_top) - random_variable; } Note that, it declares the "random_variable" variable as "unsigned int". Since the result of the shifting operation between STACK_RND_MASK (which is 0x3fffff on x86_64, 22 bits) and PAGE_SHIFT (which is 12 on x86_64): random_variable <<= PAGE_SHIFT; then the two leftmost bits are dropped when storing the result in the "random_variable". This variable shall be at least 34 bits long to hold the (22+12) result. These two dropped bits have an impact on the entropy of process stack. Concretely, the total stack entropy is reduced by four: from 2^28 to 2^30 (One fourth of expected entropy). This patch restores back the entropy by correcting the types involved in the operations in the functions randomize_stack_top() and stack_maxrandom_size(). The successful fix can be tested with: $ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done 7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0 [stack] 7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0 [stack] 7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0 [stack] 7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0 [stack] ... Once corrected, the leading bytes should be between 7ffc and 7fff, rather than always being 7fff. CVE-2015-1593 Signed-off-by: Hector Marco-Gisbert Signed-off-by: Ismael Ripoll [kees: rebase, fix 80 char, clean up commit message, add test example, cve] Signed-off-by: Kees Cook Cc: stable@vger.kernel.org Signed-off-by: Willy Tarreau commit 7f36f1ac05082cd750750e9ac8e7b1fa735886e2 Author: Andy Lutomirski Date: Fri Dec 19 16:04:11 2014 -0800 x86_64, vdso: Fix the vdso address randomization algorithm commit 394f56fe480140877304d342dec46d50dc823d46 upstream The theory behind vdso randomization is that it's mapped at a random offset above the top of the stack. To avoid wasting a page of memory for an extra page table, the vdso isn't supposed to extend past the lowest PMD into which it can fit. Other than that, the address should be a uniformly distributed address that meets all of the alignment requirements. The current algorithm is buggy: the vdso has about a 50% probability of being at the very end of a PMD. The current algorithm also has a decent chance of failing outright due to incorrect handling of the case where the top of the stack is near the top of its PMD. This fixes the implementation. The paxtest estimate of vdso "randomisation" improves from 11 bits to 18 bits. (Disclaimer: I don't know what the paxtest code is actually calculating.) It's worth noting that this algorithm is inherently biased: the vdso is more likely to end up near the end of its PMD than near the beginning. Ideally we would either nix the PMD sharing requirement or jointly randomize the vdso and the stack to reduce the bias. In the mean time, this is a considerable improvement with basically no risk of compatibility issues, since the allowed outputs of the algorithm are unchanged. As an easy test, doing this: for i in `seq 10000` do grep -P vdso /proc/self/maps |cut -d- -f1 done |sort |uniq -d used to produce lots of output (1445 lines on my most recent run). A tiny subset looks like this: 7fffdfffe000 7fffe01fe000 7fffe05fe000 7fffe07fe000 7fffe09fe000 7fffe0bfe000 7fffe0dfe000 Note the suspicious fe000 endings. With the fix, I get a much more palatable 76 repeated addresses. Reviewed-by: Kees Cook Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski [bwh: Backported to 2.6.32: - The whole file is only built for x86_64; adjust context and comment for this - We don't have align_vdso_addr()] Signed-off-by: Ben Hutchings Signed-off-by: Willy Tarreau commit b23c0b06e64250ed18a97bb1f6bb6690c40ce8c4 Author: Andy Lutomirski Date: Fri Dec 5 19:03:28 2014 -0800 x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit commit 29fa6825463c97e5157284db80107d1bfac5d77b upstream paravirt_enabled has the following effects: - Disables the F00F bug workaround warning. There is no F00F bug workaround any more because Linux's standard IDT handling already works around the F00F bug, but the warning still exists. This is only cosmetic, and, in any event, there is no such thing as KVM on a CPU with the F00F bug. - Disables 32-bit APM BIOS detection. On a KVM paravirt system, there should be no APM BIOS anyway. - Disables tboot. I think that the tboot code should check the CPUID hypervisor bit directly if it matters. - paravirt_enabled disables espfix32. espfix32 should *not* be disabled under KVM paravirt. The last point is the purpose of this patch. It fixes a leak of the high 16 bits of the kernel stack address on 32-bit KVM paravirt guests. Fixes CVE-2014-8134. Suggested-by: Konrad Rzeszutek Wilk Signed-off-by: Andy Lutomirski Signed-off-by: Paolo Bonzini Signed-off-by: Ben Hutchings [bwh: Backported to 2.6.32: adjust indentation, context] Signed-off-by: Willy Tarreau commit c52fdba6b35002f3c9df47947e35754b8f0b522f Author: Andy Lutomirski Date: Wed Dec 17 14:48:30 2014 -0800 x86/tls: Don't validate lm in set_thread_area() after all commit 3fb2f4237bb452eb4e98f6a5dbd5a445b4fed9d0 upstream. It turns out that there's a lurking ABI issue. GCC, when compiling this in a 32-bit program: struct user_desc desc = { .entry_number = idx, .base_addr = base, .limit = 0xfffff, .seg_32bit = 1, .contents = 0, /* Data, grow-up */ .read_exec_only = 0, .limit_in_pages = 1, .seg_not_present = 0, .useable = 0, }; will leave .lm uninitialized. This means that anything in the kernel that reads user_desc.lm for 32-bit tasks is unreliable. Revert the .lm check in set_thread_area(). The value never did anything in the first place. Fixes: 0e58af4e1d21 ("x86/tls: Disallow unusual TLS segments") Signed-off-by: Andy Lutomirski Acked-by: Thomas Gleixner Cc: Linus Torvalds Link: http://lkml.kernel.org/r/d7875b60e28c512f6a6fc0baf5714d58e7eaadbb.1418856405.git.luto@amacapital.net Signed-off-by: Ingo Molnar [bwh: Backported to 3.2: adjust filename] Signed-off-by: Ben Hutchings (cherry picked from commit c759a579c902167d656ee303d518cb5eed2af278) Signed-off-by: Willy Tarreau commit 18cb16aa9cc854434587c9c0d23a2d5f0a083715 Author: Andy Lutomirski Date: Thu Dec 4 16:48:17 2014 -0800 x86/tls: Disallow unusual TLS segments commit 0e58af4e1d2166e9e33375a0f121e4867010d4f8 upstream. Users have no business installing custom code segments into the GDT, and segments that are not present but are otherwise valid are a historical source of interesting attacks. For completeness, block attempts to set the L bit. (Prior to this patch, the L bit would have been silently dropped.) This is an ABI break. I've checked glibc, musl, and Wine, and none of them look like they'll have any trouble. Note to stable maintainers: this is a hardening patch that fixes no known bugs. Given the possibility of ABI issues, this probably shouldn't be backported quickly. Signed-off-by: Andy Lutomirski Acked-by: H. Peter Anvin Cc: Konrad Rzeszutek Wilk Cc: Linus Torvalds Cc: security@kernel.org Cc: Willy Tarreau Signed-off-by: Ingo Molnar Signed-off-by: Ben Hutchings (cherry picked from commit fbc3c534ddffeebba6f943945ac71ec83cfa04b8) Signed-off-by: Willy Tarreau commit 1f50d3c7d68ecc12cd6cf2706065d25a6cf4b928 Author: Andy Lutomirski Date: Thu Jan 22 11:27:59 2015 -0800 x86, tls: Interpret an all-zero struct user_desc as "no segment" commit 3669ef9fa7d35f573ec9c0e0341b29251c2734a7 upstream. The Witcher 2 did something like this to allocate a TLS segment index: struct user_desc u_info; bzero(&u_info, sizeof(u_info)); u_info.entry_number = (uint32_t)-1; syscall(SYS_set_thread_area, &u_info); Strictly speaking, this code was never correct. It should have set read_exec_only and seg_not_present to 1 to indicate that it wanted to find a free slot without putting anything there, or it should have put something sensible in the TLS slot if it wanted to allocate a TLS entry for real. The actual effect of this code was to allocate a bogus segment that could be used to exploit espfix. The set_thread_area hardening patches changed the behavior, causing set_thread_area to return -EINVAL and crashing the game. This changes set_thread_area to interpret this as a request to find a free slot and to leave it empty, which isn't *quite* what the game expects but should be close enough to keep it working. In particular, using the code above to allocate two segments will allocate the same segment both times. According to FrostbittenKing on Github, this fixes The Witcher 2. If this somehow still causes problems, we could instead allocate a limit==0 32-bit data segment, but that seems rather ugly to me. Fixes: 41bdc78544b8 x86/tls: Validate TLS entries to protect espfix Signed-off-by: Andy Lutomirski Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/0cb251abe1ff0958b8e468a9a9a905b80ae3a746.1421954363.git.luto@amacapital.net Signed-off-by: Thomas Gleixner Signed-off-by: Ben Hutchings (cherry picked from commit 3175b4cb1aa4b1430fada4679be4598f6eb8872b) Signed-off-by: Willy Tarreau commit 598b6280c55ff1a105cf8cd0f6b95aee86ce9016 Author: Andy Lutomirski Date: Thu Jan 22 11:27:58 2015 -0800 x86, tls, ldt: Stop checking lm in LDT_empty commit e30ab185c490e9a9381385529e0fd32f0a399495 upstream. 32-bit programs don't have an lm bit in their ABI, so they can't reliably cause LDT_empty to return true without resorting to memset. They shouldn't need to do this. This should fix a longstanding, if minor, issue in all 64-bit kernels as well as a potential regression in the TLS hardening code. Fixes: 41bdc78544b8 x86/tls: Validate TLS entries to protect espfix Signed-off-by: Andy Lutomirski Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/72a059de55e86ad5e2935c80aa91880ddf19d07c.1421954363.git.luto@amacapital.net Signed-off-by: Thomas Gleixner Signed-off-by: Ben Hutchings (cherry picked from commit f62570cbdcb6dd53d0e2361488f9ea2c4cf17ec9) Signed-off-by: Willy Tarreau commit 85e01300f68eba959288a79901419ff06457a895 Author: Andy Lutomirski Date: Thu Dec 4 16:48:16 2014 -0800 x86/tls: Validate TLS entries to protect espfix commit 41bdc78544b8a93a9c6814b8bbbfef966272abbe upstream Installing a 16-bit RW data segment into the GDT defeats espfix. AFAICT this will not affect glibc, Wine, or dosemu at all. Signed-off-by: Andy Lutomirski Acked-by: H. Peter Anvin Cc: Konrad Rzeszutek Wilk Cc: Linus Torvalds Cc: security@kernel.org Cc: Willy Tarreau Signed-off-by: Ingo Molnar Signed-off-by: Ben Hutchings Signed-off-by: Willy Tarreau commit f0d8cc6fafe9e747025f067ae77bdb7ddf06144b Author: Andy Lutomirski Date: Mon Nov 24 17:39:06 2014 -0800 x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regs commit 7ddc6a2199f1da405a2fb68c40db8899b1a8cd87 upstream. These functions can be executed on the int3 stack, so kprobes are dangerous. Tracing is probably a bad idea, too. Fixes: b645af2d5905 ("x86_64, traps: Rework bad_iret") Signed-off-by: Andy Lutomirski Cc: Linus Torvalds Cc: Steven Rostedt Link: http://lkml.kernel.org/r/50e33d26adca60816f3ba968875801652507d0c4.1416870125.git.luto@amacapital.net Signed-off-by: Ingo Molnar [bwh: Backported to 3.2: - Use __kprobes instead of NOKPROBE_SYMBOL() - Don't use __visible] Signed-off-by: Ben Hutchings (cherry picked from commit 8ea4c465ecb59846abed3d000d64b21b8e31aeb0) Signed-off-by: Willy Tarreau