aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2015-05-29Linux 2.6.32.66HEADv2.6.32.66.26masterWilly Tarreau1-1/+1
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() ↵Catalin Marinas1-0/+7
behaviour Commit db31c55a6fb2 (net: clamp ->msg_namelen instead of returning an error) introduced the clamping of msg_namelen when the unsigned value was larger than sizeof(struct sockaddr_storage). This caused a msg_namelen of -1 to be valid. The native code was subsequently fixed by commit dbb490b96584 (net: socket: error on a negative msg_namelen). In addition, the native code sets msg_namelen to 0 when msg_name is NULL. This was done in commit (6a2a2b3ae075 net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr struct from userland) and subsequently updated by 08adb7dabd48 (fold verify_iovec() into copy_msghdr_from_user()). This patch brings the get_compat_msghdr() in line with copy_msghdr_from_user(). Fixes: db31c55a6fb2 (net: clamp ->msg_namelen instead of returning an error) Cc: David S. Miller <davem@davemloft.net> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 91edd096e224941131f896b86838b1e59553696a) Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29sound/oss: fix deadlock in sequencer_ioctl(SNDCTL_SEQ_OUTOFBAND)Alexey Khoroshilov1-10/+2
A deadlock can be initiated by userspace via ioctl(SNDCTL_SEQ_OUTOFBAND) on /dev/sequencer with TMR_ECHO midi event. In this case the control flow is: sound_ioctl() -> case SND_DEV_SEQ: case SND_DEV_SEQ2: sequencer_ioctl() -> case SNDCTL_SEQ_OUTOFBAND: spin_lock_irqsave(&lock,flags); play_event(); -> case EV_TIMING: seq_timing_event() -> case TMR_ECHO: seq_copy_to_input() -> spin_lock_irqsave(&lock,flags); It seems that spin_lock_irqsave() around play_event() is not necessary, because the only other call location in seq_startplay() makes the call without acquiring spinlock. So, the patch just removes spinlocks around play_event(). By the way, it removes unreachable code in seq_timing_event(), since (seq_mode == SEQ_2) case is handled in the beginning. Compile tested only. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Takashi Iwai <tiwai@suse.de> (cherry picked from commit bc26d4d06e337ade069f33d3f4377593b24e6e36) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29hfsplus: fix B-tree corruption after insertion at position 0Sergei Antonov1-9/+11
commit 98cf21c61a7f5419d82f847c4d77bf6e96a76f5f upstream. Fix B-tree corruption when a new record is inserted at position 0 in the node in hfs_brec_insert(). In this case a hfs_brec_update_parent() is called to update the parent index node (if exists) and it is passed hfs_find_data with a search_key containing a newly inserted key instead of the key to be updated. This results in an inconsistent index node. The bug reproduces on my machine after an extents overflow record for the catalog file (CNID=4) is inserted into the extents overflow B-tree. Because of a low (reserved) value of CNID=4, it has to become the first record in the first leaf node. The resulting first leaf node is correct: ---------------------------------------------------- | key0.CNID=4 | key1.CNID=123 | key2.CNID=456, ... | ---------------------------------------------------- But the parent index key0 still contains the previous key CNID=123: ----------------------- | key0.CNID=123 | ... | ----------------------- A change in hfs_brec_insert() makes hfs_brec_update_parent() work correctly by preventing it from getting fd->record=-1 value from __hfs_brec_find(). Along the way, I removed duplicate code with unification of the if condition. The resulting code is equivalent to the original code because node is never 0. Also hfs_brec_update_parent() will now return an error after getting a negative fd->record value. However, the return value of hfs_brec_update_parent() is not checked anywhere in the file and I'm leaving it unchanged by this patch. brec.c lacks error checking after some other calls too, but this issue is of less importance than the one being fixed by this patch. Cc: stable@vger.kernel.org Cc: Joe Perches <joe@perches.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Anton Altaparmakov <aia21@cam.ac.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Sergei Antonov <saproj@gmail.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29posix-timers: Fix stack info leak in timer_create()Mathias Krause1-0/+1
commit 6891c4509c792209c44ced55a60f13954cb50ef4 upstream. If userland creates a timer without specifying a sigevent info, we'll create one ourself, using a stack local variable. Particularly will we use the timer ID as sival_int. But as sigev_value is a union containing a pointer and an int, that assignment will only partially initialize sigev_value on systems where the size of a pointer is bigger than the size of an int. On such systems we'll copy the uninitialized stack bytes from the timer_create() call to userland when the timer actually fires and we're going to deliver the signal. Initialize sigev_value with 0 to plug the stack info leak. Found in the PaX patch, written by the PaX Team. Fixes: 5a9fa7307285 ("posix-timers: kill ->it_sigev_signo and...") Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Brad Spengler <spender@grsecurity.net> Cc: PaX Team <pageexec@freemail.hu> Link: http://lkml.kernel.org/r/1412456799-32339-1-git-send-email-minipli@googlemail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [bwh: Backported to 3.2: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 3cd3a349aa3519b88d29845c0bc36bcbae158e93) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29scsi: Fix error handling in SCSI_IOCTL_SEND_COMMANDJan Kara1-2/+1
commit 84ce0f0e94ac97217398b3b69c21c7a62ebeed05 upstream. When sg_scsi_ioctl() fails to prepare request to submit in blk_rq_map_kern() we jump to a label where we just end up copying (luckily zeroed-out) kernel buffer to userspace instead of reporting error. Fix the problem by jumping to the right label. CC: Jens Axboe <axboe@kernel.dk> CC: linux-scsi@vger.kernel.org Coverity-id: 1226871 Signed-off-by: Jan Kara <jack@suse.cz> Fixed up the, now unused, out label. Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit d73b032b63e8967462e1cf5763858ed89e97880f) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29lockd: Try to reconnect if statd has movedBenjamin Coddington1-0/+6
commit 173b3afceebe76fa2205b2c8808682d5b541fe3c upstream. If rpc.statd is restarted, upcalls to monitor hosts can fail with ECONNREFUSED. In that case force a lookup of statd's new port and retry the upcall. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> [bwh: Backported to 3.2: not using RPC_TASK_SOFTCONN] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 3aabe891f32c209a2be7cd5581d2634020e801c1) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29pagemap: do not leak physical addresses to non-privileged userspaceKirill A. Shutemov1-0/+10
commit ab676b7d6fbf4b294bf198fb27ade5b0e865c7ce upstream. As pointed by recent post[1] on exploiting DRAM physical imperfection, /proc/PID/pagemap exposes sensitive information which can be used to do attacks. This disallows anybody without CAP_SYS_ADMIN to read the pagemap. [1] http://googleprojectzero.blogspot.com/2015/03/exploiting-dram-rowhammer-bug-to-gain.html [ Eventually we might want to do anything more finegrained, but for now this is the simple model. - Linus ] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Seaborn <mseaborn@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [mancha security: Backported to 3.10] Signed-off-by: mancha security <mancha1@zoho.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 1ffc3cd9a36b504c20ce98fe5eeb5463f389e1ac) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29ipv4: fix nexthop attlen check in fib_nh_matchJiri Pirko1-1/+1
commit f76936d07c4eeb36d8dbb64ebd30ab46ff85d9f7 upstream. fib_nh_match does not match nexthops correctly. Example: ip route add 172.16.10/24 nexthop via 192.168.122.12 dev eth0 \ nexthop via 192.168.122.13 dev eth0 ip route del 172.16.10/24 nexthop via 192.168.122.14 dev eth0 \ nexthop via 192.168.122.15 dev eth0 Del command is successful and route is removed. After this patch applied, the route is correctly matched and result is: RTNETLINK answers: No such process Please consider this for stable trees as well. Fixes: 4e902c57417c4 ("[IPv4]: FIB configuration using struct fib_config") Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 0aba46add2915b344580569e87d9c41274b9c475) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29ipvs: uninitialized data with IP_VS_IPV6Dan Carpenter1-5/+5
commit 3b05ac3824ed9648c0d9c02d51d9b54e4e7e874f upstream. The app_tcp_pkt_out() function expects "*diff" to be set and ends up using uninitialized data if CONFIG_IP_VS_IPV6 is turned on. The same issue is there in app_tcp_pkt_in(). Thanks to Julian Anastasov for noticing that. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Pablo Neira Ayuso <pablo@netfilter.org> (cherry picked from commit 0ce625baeec39e813bef9b073f0214b513b2ef2d) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29IB/core: Avoid leakage from kernel to user spaceEli Cohen1-0/+1
commit 377b513485fd885dea1083a9a5430df65b35e048 upstream. Clear the reserved field of struct ib_uverbs_async_event_desc which is copied to user space. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Yann Droneaud <ydroneaud@opteya.com> (cherry picked from commit 852acc0151014ba9731e2f5f2f3df3b6a8960d40) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29spi: spidev: fix possible arithmetic overflow for multi-transfer messageIan Abbott1-1/+4
commit f20fbaad7620af2df36a1f9d1c9ecf48ead5b747 upstream. `spidev_message()` sums the lengths of the individual SPI transfers to determine the overall SPI message length. It restricts the total length, returning an error if too long, but it does not check for arithmetic overflow. For example, if the SPI message consisted of two transfers and the first has a length of 10 and the second has a length of (__u32)(-1), the total length would be seen as 9, even though the second transfer is actually very long. If the second transfer specifies a null `rx_buf` and a non-null `tx_buf`, the `copy_from_user()` could overrun the spidev's pre-allocated tx buffer before it reaches an invalid user memory address. Fix it by checking that neither the total nor the individual transfer lengths exceed the maximum allowed value. Thanks to Dan Carpenter for reporting the potential integer overflow. Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Mark Brown <broonie@kernel.org> [Ian Abbott: Note: original commit compares the lengths to INT_MAX instead of bufsiz due to changes in earlier commits.] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 7499401e4a0b01ee43cff768de4ca630dcd0bc64) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29tcp: avoid looping in tcp_send_fin()Eric Dumazet1-19/+26
[ Upstream commit 845704a535e9b3c76448f52af1b70e4422ea03fd ] Presence of an unbound loop in tcp_send_fin() had always been hard to explain when analyzing crash dumps involving gigantic dying processes with millions of sockets. Lets try a different strategy : In case of memory pressure, try to add the FIN flag to last packet in write queue, even if packet was already sent. TCP stack will be able to deliver this FIN after a timeout event. Note that this FIN being delivered by a retransmit, it also carries a Push flag given our current implementation. By checking sk_under_memory_pressure(), we anticipate that cooking many FIN packets might deplete tcp memory. In the case we could not allocate a packet, even with __GFP_WAIT allocation, then not sending a FIN seems quite reasonable if it allows to get rid of this socket, free memory, and not block the process from eventually doing other useful work. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: - Drop inapplicable change to sk_forced_wmem_schedule() - s/sk_under_memory_pressure(sk)/tcp_memory_pressure/] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 82241580d7734af2207ad0bb1720904f569dac3a) [wt: backported to 2.6.32: s/TCPHDR_FIN/TCPCB_FLAG_FIN/] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29ip_forward: Drop frames with attached skb->skSebastian Pöhn1-0/+3
[ Upstream commit 2ab957492d13bb819400ac29ae55911d50a82a13 ] Initial discussion was: [FYI] xfrm: Don't lookup sk_policy for timewait sockets Forwarded frames should not have a socket attached. Especially tw sockets will lead to panics later-on in the stack. This was observed with TPROXY assigning a tw socket and broken policy routing (misconfigured). As a result frame enters forwarding path instead of input. We cannot solve this in TPROXY as it cannot know that policy routing is broken. v2: Remove useless comment Signed-off-by: Sebastian Poehn <sebastian.poehn@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit fccb908d23fbae3e941e9294590dd94de6b1d822) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29tcp: make connect() mem charging friendlyEric Dumazet1-5/+2
[ Upstream commit 355a901e6cf1b2b763ec85caa2a9f04fbcc4ab4a ] While working on sk_forward_alloc problems reported by Denys Fedoryshchenko, we found that tcp connect() (and fastopen) do not call sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so sk_forward_alloc is negative while connect is in progress. We can fix this by calling regular sk_stream_alloc_skb() both for the SYN packet (in tcp_connect()) and the syn_data packet in tcp_send_syn_data() Then, tcp_send_syn_data() can avoid copying syn_data as we simply can manipulate syn_data->cb[] to remove SYN flag (and increment seq) Instead of open coding memcpy_fromiovecend(), simply use this helper. This leaves in socket write queue clean fast clone skbs. This was tested against our fastopen packetdrill tests. Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: - Drop the Fast Open changes - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 3e2eb8946907b2d53eb906e13e01d273c6534f5c) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg()Al Viro1-1/+1
[ Upstream commit 7d985ed1dca5c90535d67ce92ef6ca520302340a ] [I would really like an ACK on that one from dhowells; it appears to be quite straightforward, but...] MSG_PEEK isn't passed to ->recvmsg() via msg->msg_flags; as the matter of fact, neither the kernel users of rxrpc, nor the syscalls ever set that bit in there. It gets passed via flags; in fact, another such check in the same function is done correctly - as flags & MSG_PEEK. It had been that way (effectively disabled) for 8 years, though, so the patch needs beating up - that case had never been tested. If it is correct, it's -stable fodder. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 10c82cd7d46e4c525b046c399fcd285ce138198e) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29rds: avoid potential stack overflowArnd Bergmann1-18/+22
[ Upstream commit f862e07cf95d5b62a5fc5e981dd7d0dbaf33a501 ] The rds_iw_update_cm_id function stores a large 'struct rds_sock' object on the stack in order to pass a pair of addresses. This happens to just fit withint the 1024 byte stack size warning limit on x86, but just exceed that limit on ARM, which gives us this warning: net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=] As the use of this large variable is basically bogus, we can rearrange the code to not do that. Instead of passing an rds socket into rds_iw_get_device, we now just pass the two addresses that we have available in rds_iw_update_cm_id, and we change rds_iw_get_mr accordingly, to create two address structures on the stack there. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 3fe2d645fe4ea7ff6cba9020685e46c1a1dff9c0) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29net: sysctl_net_core: check SNDBUF and RCVBUF for min lengthAlexey Kodanev1-5/+6
[ Upstream commit b1cb59cf2efe7971d3d72a7b963d09a512d994c9 ] sysctl has sysctl.net.core.rmem_*/wmem_* parameters which can be set to incorrect values. Given that 'struct sk_buff' allocates from rcvbuf, incorrectly set buffer length could result to memory allocation failures. For example, set them as follows: # sysctl net.core.rmem_default=64 net.core.wmem_default = 64 # sysctl net.core.wmem_default=64 net.core.wmem_default = 64 # ping localhost -s 1024 -i 0 > /dev/null This could result to the following failure: skbuff: skb_over_panic: text:ffffffff81628db4 len:-32 put:-32 head:ffff88003a1cc200 data:ffff88003a1cc200 tail:0xffffffe0 end:0xc0 dev:<NULL> kernel BUG at net/core/skbuff.c:102! invalid opcode: 0000 [#1] SMP ... task: ffff88003b7f5550 ti: ffff88003ae88000 task.ti: ffff88003ae88000 RIP: 0010:[<ffffffff8155fbd1>] [<ffffffff8155fbd1>] skb_put+0xa1/0xb0 RSP: 0018:ffff88003ae8bc68 EFLAGS: 00010296 RAX: 000000000000008d RBX: 00000000ffffffe0 RCX: 0000000000000000 RDX: ffff88003fdcf598 RSI: ffff88003fdcd9c8 RDI: ffff88003fdcd9c8 RBP: ffff88003ae8bc88 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 00000000000002b2 R12: 0000000000000000 R13: 0000000000000000 R14: ffff88003d3f7300 R15: ffff88000012a900 FS: 00007fa0e2b4a840(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000d0f7e0 CR3: 000000003b8fb000 CR4: 00000000000006f0 Stack: ffff88003a1cc200 00000000ffffffe0 00000000000000c0 ffffffff818cab1d ffff88003ae8bd68 ffffffff81628db4 ffff88003ae8bd48 ffff88003b7f5550 ffff880031a09408 ffff88003b7f5550 ffff88000012aa48 ffff88000012ab00 Call Trace: [<ffffffff81628db4>] unix_stream_sendmsg+0x2c4/0x470 [<ffffffff81556f56>] sock_write_iter+0x146/0x160 [<ffffffff811d9612>] new_sync_write+0x92/0xd0 [<ffffffff811d9cd6>] vfs_write+0xd6/0x180 [<ffffffff811da499>] SyS_write+0x59/0xd0 [<ffffffff81651532>] system_call_fastpath+0x12/0x17 Code: 00 00 48 89 44 24 10 8b 87 c8 00 00 00 48 89 44 24 08 48 8b 87 d8 00 00 00 48 c7 c7 30 db 91 81 48 89 04 24 31 c0 e8 4f a8 0e 00 <0f> 0b eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 RIP [<ffffffff8155fbd1>] skb_put+0xa1/0xb0 RSP <ffff88003ae8bc68> Kernel panic - not syncing: Fatal exception Moreover, the possible minimum is 1, so we can get another kernel panic: ... BUG: unable to handle kernel paging request at ffff88013caee5c0 IP: [<ffffffff815604cf>] __alloc_skb+0x12f/0x1f0 ... Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: delete now-unused 'one' variable] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 2d6dfb109bfbf3abd5f762173b1d73fd321dbe37) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29net: avoid to hang up on sending due to sysctl configuration overflow.bingtian.ly@taobao.com2-8/+23
commit cdda88912d62f9603d27433338a18be83ef23ac1 upstream. I found if we write a larger than 4GB value to some sysctl variables, the sending syscall will hang up forever, because these variables are 32 bits, such large values make them overflow to 0 or negative. This patch try to fix overflow or prevent from zero value setup of below sysctl variables: net.core.wmem_default net.core.rmem_default net.core.rmem_max net.core.wmem_max net.ipv4.udp_rmem_min net.ipv4.udp_wmem_min net.ipv4.tcp_wmem net.ipv4.tcp_rmem Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Li Yu <raise.sail@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: - Adjust context - Delete now-unused 'zero' variable] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 98eee187cdee2807bd80e6c02180c5c2abae6453) [wt: backported to 2.6.32: set strategy to sysctl_intvec where relevant] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29udp: only allow UFO for packets from SOCK_DGRAM socketsMichal Kubeček2-3/+4
[ Upstream commit acf8dd0a9d0b9e4cdb597c2f74802f79c699e802 ] If an over-MTU UDP datagram is sent through a SOCK_RAW socket to a UFO-capable device, ip_ufo_append_data() sets skb->ip_summed to CHECKSUM_PARTIAL unconditionally as all GSO code assumes transport layer checksum is to be computed on segmentation. However, in this case, skb->csum_start and skb->csum_offset are never set as raw socket transmit path bypasses udp_send_skb() where they are usually set. As a result, driver may access invalid memory when trying to calculate the checksum and store the result (as observed in virtio_net driver). Moreover, the very idea of modifying the userspace provided UDP header is IMHO against raw socket semantics (I wasn't able to find a document clearly stating this or the opposite, though). And while allowing CHECKSUM_NONE in the UFO case would be more efficient, it would be a bit too intrusive change just to handle a corner case like this. Therefore disallowing UFO for packets from SOCK_DGRAM seems to be the best option. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 332640b2821f75381b1049a904d93d4fb846334f) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29ipv4: Don't use ufo handling on later transformed packetsSteffen Klassert1-0/+1
We might call ip_ufo_append_data() for packets that will be IPsec transformed later. This function should be used just for real udp packets. So we check for rt->dst.header_len which is only nonzero on IPsec handling and call ip_ufo_append_data() just if rt->dst.header_len is zero. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit c146066ab80267c3305de5dda6a4083f06df9265) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29net: reject creation of netdev names with colonsMatthew Thode1-1/+1
[ Upstream commit a4176a9391868bfa87705bcd2e3b49e9b9dd2996 ] colons are used as a separator in netdev device lookup in dev_ioctl.c Specific functions are SIOCGIFTXQLEN SIOCETHTOOL SIOCSIFNAME Signed-off-by: Matthew Thode <mthode@mthode.org> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit d501ebeb7da7531e92e3c8d194730341c314ff2d) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29ematch: Fix auto-loading of ematch modules.Ignacy Gawędzki1-0/+1
[ Upstream commit 34eea79e2664b314cab6a30fc582fdfa7a1bb1df ] In tcf_em_validate(), after calling request_module() to load the kind-specific module, set em->ops to NULL before returning -EAGAIN, so that module_put() is not called again by tcf_em_tree_destroy(). Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 9405be73262273a6904688fb2f1e60a13117cf1b) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29ppp: deflate: never return len larger than output bufferFlorian Westphal1-1/+1
[ Upstream commit e2a4800e75780ccf4e6c2487f82b688ba736eb18 ] When we've run out of space in the output buffer to store more data, we will call zlib_deflate with a NULL output buffer until we've consumed remaining input. When this happens, olen contains the size the output buffer would have consumed iff we'd have had enough room. This can later cause skb_over_panic when ppp_generic skb_put()s the returned length. Reported-by: Iain Douglas <centos@1n6.org.uk> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 8bcd64423836bad3638684677f6d740bc7c9297f) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr ↵Ani Sinha1-0/+3
struct from userland. commit 6a2a2b3ae0759843b22c929881cc184b00cc63ff upstream. Linux manpage for recvmsg and sendmsg calls does not explicitly mention setting msg_namelen to 0 when msg_name passed set as NULL. When developers don't set msg_namelen member in msghdr, it might contain garbage value which will fail the validation check and sendmsg and recvmsg calls from kernel will return EINVAL. This will break old binaries and any code for which there is no access to source code. To fix this, we set msg_namelen to 0 when msg_name is passed as NULL from userland. Signed-off-by: Ani Sinha <ani@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit d29f1f53e5299e0bbb3e33ef8d35ed657fa633b6) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29fs: take i_mutex during prepare_binprm for set[ug]id executablesJann Horn1-25/+40
commit 8b01fc86b9f425899f8a3a8fc1c47d73c2c20543 upstream. This prevents a race between chown() and execve(), where chowning a setuid-user binary to root would momentarily make the binary setuid root. This patch was mostly written by Linus Torvalds. Signed-off-by: Jann Horn <jann@thejh.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: - Drop the task_no_new_privs() and user namespace checks - Open-code file_inode() - s/READ_ONCE/ACCESS_ONCE/ - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 470e517be17dd6ef8670bec7bd7831ea0d3ad8a6) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29ipv6: Don't reduce hop limit for an interfaceD.S. Ljungmark1-1/+8
commit 6fd99094de2b83d1d4c8457f2c83483b2828e75a upstream. A local route may have a lower hop_limit set than global routes do. RFC 3756, Section 4.2.7, "Parameter Spoofing" > 1. The attacker includes a Current Hop Limit of one or another small > number which the attacker knows will cause legitimate packets to > be dropped before they reach their destination. > As an example, one possible approach to mitigate this threat is to > ignore very small hop limits. The nodes could implement a > configurable minimum hop limit, and ignore attempts to set it below > said limit. Signed-off-by: D.S. Ljungmark <ljungmark@modio.se> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: adjust ND_PRINTK() usage] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit f10f7d2a8200fe33c5030c7e32df3a2b3561f3cd) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29net: rds: use correct size for max unacked packets and bytesSasha Levin1-2/+2
commit db27ebb111e9f69efece08e4cb6a34ff980f8896 upstream. Max unacked packets/bytes is an int while sizeof(long) was used in the sysctl table. This means that when they were getting read we'd also leak kernel memory to userspace along with the timeout values. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 3760b67b3e419b9ac42a45417491360a14a35357) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29net: llc: use correct size for sysctl timeout entriesSasha Levin1-4/+4
commit 6b8d9117ccb4f81b1244aafa7bc70ef8fa45fc49 upstream. The timeout entries are sizeof(int) rather than sizeof(long), which means that when they were getting read we'd also leak kernel memory to userspace along with the timeout values. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 88fe14be08a475ad0eea4ca7c51f32437baf41af) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29IB/uverbs: Prevent integer overflow in ib_umem_get address arithmeticShachar Raindel1-0/+8
commit 8494057ab5e40df590ef6ef7d66324d3ae33356b upstream. Properly verify that the resulting page aligned end address is larger than both the start address and the length of the memory area requested. Both the start and length arguments for ib_umem_get are controlled by the user. A misbehaving user can provide values which will cause an integer overflow when calculating the page aligned end address. This overflow can cause also miscalculation of the number of pages mapped, and additional logic issues. Addresses: CVE-2014-8159 Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 485f16b743d98527620396639b73d7214006f3c7) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29net: sctp: fix slab corruption from use after free on INIT collisionsDaniel Borkmann1-1/+0
commit 600ddd6825543962fb807884169e57b580dba208 upstream When hitting an INIT collision case during the 4WHS with AUTH enabled, as already described in detail in commit 1be9a950c646 ("net: sctp: inherit auth_capable on INIT collisions"), it can happen that we occasionally still remotely trigger the following panic on server side which seems to have been uncovered after the fix from commit 1be9a950c646 ... [ 533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff [ 533.913657] IP: [<ffffffff811ac385>] __kmalloc+0x95/0x230 [ 533.940559] PGD 5030f2067 PUD 0 [ 533.957104] Oops: 0000 [#1] SMP [ 533.974283] Modules linked in: sctp mlx4_en [...] [ 534.939704] Call Trace: [ 534.951833] [<ffffffff81294e30>] ? crypto_init_shash_ops+0x60/0xf0 [ 534.984213] [<ffffffff81294e30>] crypto_init_shash_ops+0x60/0xf0 [ 535.015025] [<ffffffff8128c8ed>] __crypto_alloc_tfm+0x6d/0x170 [ 535.045661] [<ffffffff8128d12c>] crypto_alloc_base+0x4c/0xb0 [ 535.074593] [<ffffffff8160bd42>] ? _raw_spin_lock_bh+0x12/0x50 [ 535.105239] [<ffffffffa0418c11>] sctp_inet_listen+0x161/0x1e0 [sctp] [ 535.138606] [<ffffffff814e43bd>] SyS_listen+0x9d/0xb0 [ 535.166848] [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b ... or depending on the the application, for example this one: [ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff [ 1370.026506] IP: [<ffffffff811ab455>] kmem_cache_alloc+0x75/0x1d0 [ 1370.054568] PGD 633c94067 PUD 0 [ 1370.070446] Oops: 0000 [#1] SMP [ 1370.085010] Modules linked in: sctp kvm_amd kvm [...] [ 1370.963431] Call Trace: [ 1370.974632] [<ffffffff8120f7cf>] ? SyS_epoll_ctl+0x53f/0x960 [ 1371.000863] [<ffffffff8120f7cf>] SyS_epoll_ctl+0x53f/0x960 [ 1371.027154] [<ffffffff812100d3>] ? anon_inode_getfile+0xd3/0x170 [ 1371.054679] [<ffffffff811e3d67>] ? __alloc_fd+0xa7/0x130 [ 1371.080183] [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b With slab debugging enabled, we can see that the poison has been overwritten: [ 669.826368] BUG kmalloc-128 (Tainted: G W ): Poison overwritten [ 669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b [ 669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494 [ 669.826424] __slab_alloc+0x4bf/0x566 [ 669.826433] __kmalloc+0x280/0x310 [ 669.826453] sctp_auth_create_key+0x23/0x50 [sctp] [ 669.826471] sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp] [ 669.826488] sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp] [ 669.826505] sctp_do_sm+0x29d/0x17c0 [sctp] [...] [ 669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494 [ 669.826635] __slab_free+0x39/0x2a8 [ 669.826643] kfree+0x1d6/0x230 [ 669.826650] kzfree+0x31/0x40 [ 669.826666] sctp_auth_key_put+0x19/0x20 [sctp] [ 669.826681] sctp_assoc_update+0x1ee/0x2d0 [sctp] [ 669.826695] sctp_do_sm+0x674/0x17c0 [sctp] Since this only triggers in some collision-cases with AUTH, the problem at heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice when having refcnt 1, once directly in sctp_assoc_update() and yet again from within sctp_auth_asoc_init_active_key() via sctp_assoc_update() on the already kzfree'd memory, which is also consistent with the observation of the poison decrease from 0x6b to 0x6a (note: the overwrite is detected at a later point in time when poison is checked on new allocation). Reference counting of auth keys revisited: Shared keys for AUTH chunks are being stored in endpoints and associations in endpoint_shared_keys list. On endpoint creation, a null key is being added; on association creation, all endpoint shared keys are being cached and thus cloned over to the association. struct sctp_shared_key only holds a pointer to the actual key bytes, that is, struct sctp_auth_bytes which keeps track of users internally through refcounting. Naturally, on assoc or enpoint destruction, sctp_shared_key are being destroyed directly and the reference on sctp_auth_bytes dropped. User space can add keys to either list via setsockopt(2) through struct sctp_authkey and by passing that to sctp_auth_set_key() which replaces or adds a new auth key. There, sctp_auth_create_key() creates a new sctp_auth_bytes with refcount 1 and in case of replacement drops the reference on the old sctp_auth_bytes. A key can be set active from user space through setsockopt() on the id via sctp_auth_set_active_key(), which iterates through either endpoint_shared_keys and in case of an assoc, invokes (one of various places) sctp_auth_asoc_init_active_key(). sctp_auth_asoc_init_active_key() computes the actual secret from local's and peer's random, hmac and shared key parameters and returns a new key directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops the reference if there was a previous one. The secret, which where we eventually double drop the ref comes from sctp_auth_asoc_set_secret() with intitial refcount of 1, which also stays unchanged eventually in sctp_assoc_update(). This key is later being used for crypto layer to set the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac(). To close the loop: asoc->asoc_shared_key is freshly allocated secret material and independant of the sctp_shared_key management keeping track of only shared keys in endpoints and assocs. Hence, also commit 4184b2a79a76 ("net: sctp: fix memory leak in auth key management") is independant of this bug here since it concerns a different layer (though same structures being used eventually). asoc->asoc_shared_key is reference dropped correctly on assoc destruction in sctp_association_free() and when active keys are being replaced in sctp_auth_asoc_init_active_key(), it always has a refcount of 1. Hence, it's freed prematurely in sctp_assoc_update(). Simple fix is to remove that sctp_auth_key_put() from there which fixes these panics. Fixes: 730fc3d05cd4 ("[SCTP]: Implete SCTP-AUTH parameter processing") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29net: sctp: fix memory leak in auth key managementDaniel Borkmann1-2/+0
commit 4184b2a79a7612a9272ce20d639934584a1f3786 upstream. A very minimal and simple user space application allocating an SCTP socket, setting SCTP_AUTH_KEY setsockopt(2) on it and then closing the socket again will leak the memory containing the authentication key from user space: unreferenced object 0xffff8800837047c0 (size 16): comm "a.out", pid 2789, jiffies 4296954322 (age 192.258s) hex dump (first 16 bytes): 01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff816d7e8e>] kmemleak_alloc+0x4e/0xb0 [<ffffffff811c88d8>] __kmalloc+0xe8/0x270 [<ffffffffa0870c23>] sctp_auth_create_key+0x23/0x50 [sctp] [<ffffffffa08718b1>] sctp_auth_set_key+0xa1/0x140 [sctp] [<ffffffffa086b383>] sctp_setsockopt+0xd03/0x1180 [sctp] [<ffffffff815bfd94>] sock_common_setsockopt+0x14/0x20 [<ffffffff815beb61>] SyS_setsockopt+0x71/0xd0 [<ffffffff816e58a9>] system_call_fastpath+0x12/0x17 [<ffffffffffffffff>] 0xffffffffffffffff This is bad because of two things, we can bring down a machine from user space when auth_enable=1, but also we would leave security sensitive keying material in memory without clearing it after use. The issue is that sctp_auth_create_key() already sets the refcount to 1, but after allocation sctp_auth_set_key() does an additional refcount on it, and thus leaving it around when we free the socket. Fixes: 65b07e5d0d0 ("[SCTP]: API updates to suport SCTP-AUTH extensions.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 3af10169145c8eed7b3591c0644da4298405efbc) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29isofs: Fix unchecked printing of ER recordsJan Kara1-0/+3
commit 4e2024624e678f0ebb916e6192bd23c1f9fdf696 upstream We didn't check length of rock ridge ER records before printing them. Thus corrupted isofs image can cause us to access and print some memory behind the buffer with obvious consequences. Reported-and-tested-by: Carl Henrik Lunde <chlunde@ping.uio.no> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29isofs: Fix infinite looping over CE entriesJan Kara1-0/+6
commit f54e18f1b831c92f6512d2eedb224cd63d607d3d upstream Rock Ridge extensions define so called Continuation Entries (CE) which define where is further space with Rock Ridge data. Corrupted isofs image can contain arbitrarily long chain of these, including a one containing loop and thus causing kernel to end in an infinite loop when traversing these entries. Limit the traversal to 32 entries which should be more than enough space to store all the Rock Ridge data. Reported-by: P J P <ppandit@redhat.com> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29netfilter: conntrack: disable generic tracking for known protocolsFlorian Westphal1-1/+25
commit db29a9508a9246e77087c5531e45b2c88ec6988b upstream Given following iptables ruleset: -P FORWARD DROP -A FORWARD -m sctp --dport 9 -j ACCEPT -A FORWARD -p tcp --dport 80 -j ACCEPT -A FORWARD -p tcp -m conntrack -m state ESTABLISHED,RELATED -j ACCEPT One would assume that this allows SCTP on port 9 and TCP on port 80. Unfortunately, if the SCTP conntrack module is not loaded, this allows *all* SCTP communication, to pass though, i.e. -p sctp -j ACCEPT, which we think is a security issue. This is because on the first SCTP packet on port 9, we create a dummy "generic l4" conntrack entry without any port information (since conntrack doesn't know how to extract this information). All subsequent packets that are unknown will then be in established state since they will fallback to proto_generic and will match the 'generic' entry. Our originally proposed version [1] completely disabled generic protocol tracking, but Jozsef suggests to not track protocols for which a more suitable helper is available, hence we now mitigate the issue for in tree known ct protocol helpers only, so that at least NAT and direction information will still be preserved for others. [1] http://www.spinics.net/lists/netfilter-devel/msg33430.html Joint work with Daniel Borkmann. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> [bwh: Backported to 2.6.32: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29splice: Apply generic position and size checks to each writeBen Hutchings2-4/+12
We need to check the position and size of file writes against various limits, using generic_write_check(). This was not being done for the splice write path. It was fixed upstream by commit 8d0207652cbe ("->splice_write() via ->write_iter()") but we can't apply that. CVE-2014-7822 Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29serial: samsung: wait for transfer completion before clock disableRobert Baldyga1-0/+4
This patch adds waiting until transmit buffer and shifter will be empty before clock disabling. Without this fix it's possible to have clock disabled while data was not transmited yet, which causes unproper state of TX line and problems in following data transfers. Cc: stable@vger.kernel.org Signed-off-by: Robert Baldyga <r.baldyga@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 1ff383a4c3eda8893ec61b02831826e1b1f46b41) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29x86: Conditionally update time when ack-ing pending irqsShai Fultheim1-5/+7
commit 42fa4250436304d4650fa271f37671f6cee24e08 upstream. On virtual environments, apic_read could take a long time. As a result, under certain conditions the ack pending loop may exit without any queued irqs left, but after more than one second. A warning will be printed needlessly in this case. If the loop is about to exit regardless of max_loops, don't update it. Signed-off-by: Shai Fultheim <shai@scalemp.com> [ rebased and reworded the commit message] Signed-off-by: Ido Yariv <ido@wizery.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1334873552-31346-1-git-send-email-ido@wizery.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit c9f1417be9acae3a9867f8bdab2b7924d76cf6ac) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29x86/asm/entry/64: Remove a bogus 'ret_from_fork' optimizationAndy Lutomirski1-5/+8
commit 956421fbb74c3a6261903f3836c0740187cf038b upstream. 'ret_from_fork' checks TIF_IA32 to determine whether 'pt_regs' and the related state make sense for 'ret_from_sys_call'. This is entirely the wrong check. TS_COMPAT would make a little more sense, but there's really no point in keeping this optimization at all. This fixes a return to the wrong user CS if we came from int 0x80 in a 64-bit task. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/4710be56d76ef994ddf59087aad98c000fbab9a4.1424989793.git.luto@amacapital.net [ Backported from tip:x86/asm. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 159891c0953a89a28f793fc52373b031262c44d2) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29x86, cpu, amd: Add workaround for family 16h, erratum 793Borislav Petkov2-0/+11
commit 3b56496865f9f7d9bcb2f93b44c63f274f08e3b6 upstream This adds the workaround for erratum 793 as a precaution in case not every BIOS implements it. This addresses CVE-2013-6885. Erratum text: [Revision Guide for AMD Family 16h Models 00h-0Fh Processors, document 51810 Rev. 3.04 November 2013] 793 Specific Combination of Writes to Write Combined Memory Types and Locked Instructions May Cause Core Hang Description Under a highly specific and detailed set of internal timing conditions, a locked instruction may trigger a timing sequence whereby the write to a write combined memory type is not flushed, causing the locked instruction to stall indefinitely. Potential Effect on System Processor core hang. Suggested Workaround BIOS should set MSR C001_1020[15] = 1b. Fix Planned No fix planned [ hpa: updated description, fixed typo in MSR name ] Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20140114230711.GS29865@pd.tnic Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> [bwh: Backported to 3.2: - Adjust filename - Venkatesh Srinivas pointed out we should use {rd,wr}msrl_safe() to avoid crashing on KVM. This was fixed upstream by commit 8f86a7373a1c ("x86, AMD: Convert to the new bit access MSR accessors") but that's too much trouble to backport. Here we must use {rd,wr}msrl_amd_safe().] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29ASLR: fix stack randomization on 64-bit systemsHector Marco-Gisbert2-5/+6
commit 4e7c22d447bb6d7e37bfe39ff658486ae78e8d77 upstream The issue is that the stack for processes is not properly randomized on 64 bit architectures due to an integer overflow. The affected function is randomize_stack_top() in file "fs/binfmt_elf.c": static unsigned long randomize_stack_top(unsigned long stack_top) { unsigned int random_variable = 0; if ((current->flags & PF_RANDOMIZE) && !(current->personality & ADDR_NO_RANDOMIZE)) { random_variable = get_random_int() & STACK_RND_MASK; random_variable <<= PAGE_SHIFT; } return PAGE_ALIGN(stack_top) + random_variable; return PAGE_ALIGN(stack_top) - random_variable; } Note that, it declares the "random_variable" variable as "unsigned int". Since the result of the shifting operation between STACK_RND_MASK (which is 0x3fffff on x86_64, 22 bits) and PAGE_SHIFT (which is 12 on x86_64): random_variable <<= PAGE_SHIFT; then the two leftmost bits are dropped when storing the result in the "random_variable". This variable shall be at least 34 bits long to hold the (22+12) result. These two dropped bits have an impact on the entropy of process stack. Concretely, the total stack entropy is reduced by four: from 2^28 to 2^30 (One fourth of expected entropy). This patch restores back the entropy by correcting the types involved in the operations in the functions randomize_stack_top() and stack_maxrandom_size(). The successful fix can be tested with: $ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done 7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0 [stack] 7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0 [stack] 7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0 [stack] 7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0 [stack] ... Once corrected, the leading bytes should be between 7ffc and 7fff, rather than always being 7fff. CVE-2015-1593 Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es> Signed-off-by: Ismael Ripoll <iripoll@upv.es> [kees: rebase, fix 80 char, clean up commit message, add test example, cve] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29x86_64, vdso: Fix the vdso address randomization algorithmAndy Lutomirski1-10/+26
commit 394f56fe480140877304d342dec46d50dc823d46 upstream The theory behind vdso randomization is that it's mapped at a random offset above the top of the stack. To avoid wasting a page of memory for an extra page table, the vdso isn't supposed to extend past the lowest PMD into which it can fit. Other than that, the address should be a uniformly distributed address that meets all of the alignment requirements. The current algorithm is buggy: the vdso has about a 50% probability of being at the very end of a PMD. The current algorithm also has a decent chance of failing outright due to incorrect handling of the case where the top of the stack is near the top of its PMD. This fixes the implementation. The paxtest estimate of vdso "randomisation" improves from 11 bits to 18 bits. (Disclaimer: I don't know what the paxtest code is actually calculating.) It's worth noting that this algorithm is inherently biased: the vdso is more likely to end up near the end of its PMD than near the beginning. Ideally we would either nix the PMD sharing requirement or jointly randomize the vdso and the stack to reduce the bias. In the mean time, this is a considerable improvement with basically no risk of compatibility issues, since the allowed outputs of the algorithm are unchanged. As an easy test, doing this: for i in `seq 10000` do grep -P vdso /proc/self/maps |cut -d- -f1 done |sort |uniq -d used to produce lots of output (1445 lines on my most recent run). A tiny subset looks like this: 7fffdfffe000 7fffe01fe000 7fffe05fe000 7fffe07fe000 7fffe09fe000 7fffe0bfe000 7fffe0dfe000 Note the suspicious fe000 endings. With the fix, I get a much more palatable 76 repeated addresses. Reviewed-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@amacapital.net> [bwh: Backported to 2.6.32: - The whole file is only built for x86_64; adjust context and comment for this - We don't have align_vdso_addr()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefitAndy Lutomirski2-2/+8
commit 29fa6825463c97e5157284db80107d1bfac5d77b upstream paravirt_enabled has the following effects: - Disables the F00F bug workaround warning. There is no F00F bug workaround any more because Linux's standard IDT handling already works around the F00F bug, but the warning still exists. This is only cosmetic, and, in any event, there is no such thing as KVM on a CPU with the F00F bug. - Disables 32-bit APM BIOS detection. On a KVM paravirt system, there should be no APM BIOS anyway. - Disables tboot. I think that the tboot code should check the CPUID hypervisor bit directly if it matters. - paravirt_enabled disables espfix32. espfix32 should *not* be disabled under KVM paravirt. The last point is the purpose of this patch. It fixes a leak of the high 16 bits of the kernel stack address on 32-bit KVM paravirt guests. Fixes CVE-2014-8134. Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> [bwh: Backported to 2.6.32: adjust indentation, context] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29x86/tls: Don't validate lm in set_thread_area() after allAndy Lutomirski2-6/+7
commit 3fb2f4237bb452eb4e98f6a5dbd5a445b4fed9d0 upstream. It turns out that there's a lurking ABI issue. GCC, when compiling this in a 32-bit program: struct user_desc desc = { .entry_number = idx, .base_addr = base, .limit = 0xfffff, .seg_32bit = 1, .contents = 0, /* Data, grow-up */ .read_exec_only = 0, .limit_in_pages = 1, .seg_not_present = 0, .useable = 0, }; will leave .lm uninitialized. This means that anything in the kernel that reads user_desc.lm for 32-bit tasks is unreliable. Revert the .lm check in set_thread_area(). The value never did anything in the first place. Fixes: 0e58af4e1d21 ("x86/tls: Disallow unusual TLS segments") Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/d7875b60e28c512f6a6fc0baf5714d58e7eaadbb.1418856405.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org> [bwh: Backported to 3.2: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit c759a579c902167d656ee303d518cb5eed2af278) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29x86/tls: Disallow unusual TLS segmentsAndy Lutomirski1-0/+22
commit 0e58af4e1d2166e9e33375a0f121e4867010d4f8 upstream. Users have no business installing custom code segments into the GDT, and segments that are not present but are otherwise valid are a historical source of interesting attacks. For completeness, block attempts to set the L bit. (Prior to this patch, the L bit would have been silently dropped.) This is an ABI break. I've checked glibc, musl, and Wine, and none of them look like they'll have any trouble. Note to stable maintainers: this is a hardening patch that fixes no known bugs. Given the possibility of ABI issues, this probably shouldn't be backported quickly. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: security@kernel.org <security@kernel.org> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit fbc3c534ddffeebba6f943945ac71ec83cfa04b8) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29x86, tls: Interpret an all-zero struct user_desc as "no segment"Andy Lutomirski2-2/+36
commit 3669ef9fa7d35f573ec9c0e0341b29251c2734a7 upstream. The Witcher 2 did something like this to allocate a TLS segment index: struct user_desc u_info; bzero(&u_info, sizeof(u_info)); u_info.entry_number = (uint32_t)-1; syscall(SYS_set_thread_area, &u_info); Strictly speaking, this code was never correct. It should have set read_exec_only and seg_not_present to 1 to indicate that it wanted to find a free slot without putting anything there, or it should have put something sensible in the TLS slot if it wanted to allocate a TLS entry for real. The actual effect of this code was to allocate a bogus segment that could be used to exploit espfix. The set_thread_area hardening patches changed the behavior, causing set_thread_area to return -EINVAL and crashing the game. This changes set_thread_area to interpret this as a request to find a free slot and to leave it empty, which isn't *quite* what the game expects but should be close enough to keep it working. In particular, using the code above to allocate two segments will allocate the same segment both times. According to FrostbittenKing on Github, this fixes The Witcher 2. If this somehow still causes problems, we could instead allocate a limit==0 32-bit data segment, but that seems rather ugly to me. Fixes: 41bdc78544b8 x86/tls: Validate TLS entries to protect espfix Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/0cb251abe1ff0958b8e468a9a9a905b80ae3a746.1421954363.git.luto@amacapital.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 3175b4cb1aa4b1430fada4679be4598f6eb8872b) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29x86, tls, ldt: Stop checking lm in LDT_emptyAndy Lutomirski1-7/+2
commit e30ab185c490e9a9381385529e0fd32f0a399495 upstream. 32-bit programs don't have an lm bit in their ABI, so they can't reliably cause LDT_empty to return true without resorting to memset. They shouldn't need to do this. This should fix a longstanding, if minor, issue in all 64-bit kernels as well as a potential regression in the TLS hardening code. Fixes: 41bdc78544b8 x86/tls: Validate TLS entries to protect espfix Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/72a059de55e86ad5e2935c80aa91880ddf19d07c.1421954363.git.luto@amacapital.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit f62570cbdcb6dd53d0e2361488f9ea2c4cf17ec9) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29x86/tls: Validate TLS entries to protect espfixAndy Lutomirski1-0/+23
commit 41bdc78544b8a93a9c6814b8bbbfef966272abbe upstream Installing a 16-bit RW data segment into the GDT defeats espfix. AFAICT this will not affect glibc, Wine, or dosemu at all. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: security@kernel.org <security@kernel.org> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2015-05-29x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regsAndy Lutomirski1-2/+2
commit 7ddc6a2199f1da405a2fb68c40db8899b1a8cd87 upstream. These functions can be executed on the int3 stack, so kprobes are dangerous. Tracing is probably a bad idea, too. Fixes: b645af2d5905 ("x86_64, traps: Rework bad_iret") Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/50e33d26adca60816f3ba968875801652507d0c4.1416870125.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org> [bwh: Backported to 3.2: - Use __kprobes instead of NOKPROBE_SYMBOL() - Don't use __visible] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 8ea4c465ecb59846abed3d000d64b21b8e31aeb0) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15Linux 2.6.32.65v2.6.32.65.26Willy Tarreau1-1/+1
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15proc connector: Delete spurious memset in proc_exit_connector()Ben Hutchings1-1/+0
Upstream commit e727ca82e0e9 ("proc connector: fix info leaks") changed many functions that don't exist in 2.6.32.y. When it was cherry-picked into 2.6.32.61, one extra memset() calls was inserted into proc_exit_connector(). This results in clearing the cpu field of exit events. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Acked-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15cciss: Fix misapplied "cciss: fix info leak in cciss_ioctl32_passthru()"Ben Hutchings1-1/+1
Upstream commit 58f09e00ae09 was applied to the wrong function when cherry-picked for 2.6.32.61. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15block: Fix blk_execute_rq_nowait() dead queue handlingMuthukumar Ratty1-3/+8
commit e81ca6fe85b77109a32489a5db82f575d51dfc98 upstream. If the queue is dead blk_execute_rq_nowait() doesn't invoke the done() callback function. That will result in blk_execute_rq() being stuck in wait_for_completion(). Avoid this by initializing rq->end_io to the done() callback before we check the queue state. Also, make sure the queue lock is held around the invocation of the done() callback. Found this through source code review. Signed-off-by: Muthukumar Ratty <muthur@gmail.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Tejun Heo <tj@kernel.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: James Bottomley <JBottomley@Parallels.com> [bwh: Backported to 2.6.32: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15block: add missing blk_queue_dead() checksTejun Heo2-2/+8
commit 8ba61435d73f2274e12d4d823fde06735e8f6a54 upstream. blk_insert_cloned_request(), blk_execute_rq_nowait() and blk_flush_plug_list() either didn't check whether the queue was dead or did it without holding queue_lock. Update them so that dead state is checked while holding queue_lock. AFAICS, this plugs all holes (requeue doesn't matter as the request is transitioning atomically from in_flight to queued). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> [bwh: Backported to 2.6.32: - Drop inapplicable changes to queue_unplugged() and blk_flush_plug_list() - We don't have blk_queue_dead() so open-code it - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15md/raid6: Fix misapplied backport in 2.6.32.64Ben Hutchings1-2/+2
Upstream commit 9c4bdf697c39 ("md/raid6: avoid data corruption during recovery of double-degraded RAID6") changes handle_stripe(), but we have separate functions for RAID5 and RAID6 and need to apply the change to handle_stripe6(). When cherry-picked, the change was wrongly applied to handle_stripe5(). Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15sctp: Fix double-free introduced by bad backport in 2.6.32.62Ben Hutchings1-1/+0
One deletion was omitted from the backport of upstream commit c485658bae87 ("net: sctp: fix skb leakage in COOKIE ECHO path of chunk->auth_chunk"). Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15vlan: Don't propagate flag changes on down interfaces.Matthijs Kooijman1-4/+6
commit deede2fabe24e00bd7e246eb81cd5767dc6fcfc7 upstream. When (de)configuring a vlan interface, the IFF_ALLMULTI ans IFF_PROMISC flags are cleared or set on the underlying interface. So, if these flags are changed on a vlan interface that is not up, the flags underlying interface might be set or cleared twice. Only propagating flag changes when a device is up makes sure this does not happen. It also makes sure that an underlying device is not set to promiscuous or allmulti mode for a vlan device that is down. Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: This is a dependency of commit d2615bf45069 ("net: core: Always propagate flag changes to interfaces"), already backported in 2.6.32.62] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15ttusb-dec: buffer overflow in ioctlDan Carpenter1-0/+3
commit dc0ab1ddeb0c5f5eb3f37a72eadb394792b3c40d upstream We need to add a limit check here so we don't overflow the buffer. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> (backported from commit f2e323ec96077642d397bb1c355def536d489d16) CVE-2014-8884 BugLink: http://bugs.launchpad.net/bugs/1395187 Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Acked-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15mac80211: fix fragmentation code, particularly for encryptionJohannes Berg1-1/+1
commit a722a419815ed203b519151f9556859ff256638b upstream The "new" fragmentation code (since my rewrite almost 5 years ago) erroneously sets skb->len rather than using skb_trim() to adjust the length of the first fragment after copying out all the others. This leaves the skb tail pointer pointing to after where the data originally ended, and thus causes the encryption MIC to be written at that point, rather than where it belongs: immediately after the data. The impact of this is that if software encryption is done, then a) encryption doesn't work for the first fragment, the connection becomes unusable as the first fragment will never be properly verified at the receiver, the MIC is practically guaranteed to be wrong b) we leak up to 8 bytes of plaintext (!) of the packet out into the air This is only mitigated by the fact that many devices are capable of doing encryption in hardware, in which case this can't happen as the tail pointer is irrelevant in that case. Additionally, fragmentation is not used very frequently and would normally have to be configured manually. Fix this by using skb_trim() properly. Cc: stable@vger.kernel.org Fixes: 2de8e0d999b8 ("mac80211: rewrite fragmentation") Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com> (backported from commit 338f977f4eb441e69bb9a46eaa0ac715c931a67f) CVE-2014-8709 BugLink: http://bugs.launchpad.net/bugs/1392013 Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Acked-by: Andy Whitcroft <apw@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed ↵Daniel Borkmann1-0/+3
packet commit a6e1af6f5e9e75095ae1b004f6f00082e5fc4d2d upstream An SCTP server doing ASCONF will panic on malformed INIT ping-of-death in the form of: ------------ INIT[PARAM: SET_PRIMARY_IP] ------------> While the INIT chunk parameter verification dissects through many things in order to detect malformed input, it misses to actually check parameters inside of parameters. E.g. RFC5061, section 4.2.4 proposes a 'set primary IP address' parameter in ASCONF, which has as a subparameter an address parameter. So an attacker may send a parameter type other than SCTP_PARAM_IPV4_ADDRESS or SCTP_PARAM_IPV6_ADDRESS, param_type2af() will subsequently return 0 and thus sctp_get_af_specific() returns NULL, too, which we then happily dereference unconditionally through af->from_addr_param(). The trace for the log: BUG: unable to handle kernel NULL pointer dereference at 0000000000000078 IP: [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp] PGD 0 Oops: 0000 [#1] SMP [...] Pid: 0, comm: swapper Not tainted 2.6.32-504.el6.x86_64 #1 Bochs Bochs RIP: 0010:[<ffffffffa01e9c62>] [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp] [...] Call Trace: <IRQ> [<ffffffffa01f2add>] ? sctp_bind_addr_copy+0x5d/0xe0 [sctp] [<ffffffffa01e1fcb>] sctp_sf_do_5_1B_init+0x21b/0x340 [sctp] [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp] [<ffffffffa01e5c09>] ? sctp_endpoint_lookup_assoc+0xc9/0xf0 [sctp] [<ffffffffa01e61f6>] sctp_endpoint_bh_rcv+0x116/0x230 [sctp] [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp] [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp] [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter] [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0 [...] A minimal way to address this is to check for NULL as we do on all other such occasions where we know sctp_get_af_specific() could possibly return with NULL. Fixes: d6de3097592b ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit e40607cbe270a9e8360907cb1e62ddf0736e4864) CVE-2014-7841 BugLink: http://bugs.launchpad.net/bugs/1392820 Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Acked-by: Andy Whitcroft <apw@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15udf: Avoid infinite loop when processing indirect ICBsJan Kara1-14/+21
commit 541d302ee5c46336cbad333222bc278b76cc1c42 upstream We did not implement any bound on number of indirect ICBs we follow when loading inode. Thus corrupted medium could cause kernel to go into an infinite loop, possibly causing a stack overflow. Fix the possible stack overflow by removing recursion from __udf_read_inode() and limit number of indirect ICBs we follow to avoid infinite loops. Signed-off-by: Jan Kara <jack@suse.cz> (back ported from commit c03aa9f6e1f938618e6db2e23afef0574efeeb65) [ luis: adjusted context and replaced udf_err() by printk() ] CVE-2014-6410 BugLink: http://bugs.launchpad.net/bugs/1370042 Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15net: sctp: fix remote memory pressure from excessive queueingDaniel Borkmann2-26/+10
commit b2d33ef40de7161c23032106284959ae75bdf3cd upstream This scenario is not limited to ASCONF, just taken as one example triggering the issue. When receiving ASCONF probes in the form of ... -------------- INIT[ASCONF; ASCONF_ACK] -------------> <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------ -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- ---- ASCONF_a; [ASCONF_b; ...; ASCONF_n;] JUNK ------> [...] ---- ASCONF_m; [ASCONF_o; ...; ASCONF_z;] JUNK ------> ... where ASCONF_a, ASCONF_b, ..., ASCONF_z are good-formed ASCONFs and have increasing serial numbers, we process such ASCONF chunk(s) marked with !end_of_packet and !singleton, since we have not yet reached the SCTP packet end. SCTP does only do verification on a chunk by chunk basis, as an SCTP packet is nothing more than just a container of a stream of chunks which it eats up one by one. We could run into the case that we receive a packet with a malformed tail, above marked as trailing JUNK. All previous chunks are here goodformed, so the stack will eat up all previous chunks up to this point. In case JUNK does not fit into a chunk header and there are no more other chunks in the input queue, or in case JUNK contains a garbage chunk header, but the encoded chunk length would exceed the skb tail, or we came here from an entirely different scenario and the chunk has pdiscard=1 mark (without having had a flush point), it will happen, that we will excessively queue up the association's output queue (a correct final chunk may then turn it into a response flood when flushing the queue ;)): I ran a simple script with incremental ASCONF serial numbers and could see the server side consuming excessive amount of RAM [before/after: up to 2GB and more]. The issue at heart is that the chunk train basically ends with !end_of_packet and !singleton markers and since commit 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet") therefore preventing an output queue flush point in sctp_do_sm() -> sctp_cmd_interpreter() on the input chunk (chunk = event_arg) even though local_cork is set, but its precedence has changed since then. In the normal case, the last chunk with end_of_packet=1 would trigger the queue flush to accommodate possible outgoing bundling. In the input queue, sctp_inq_pop() seems to do the right thing in terms of discarding invalid chunks. So, above JUNK will not enter the state machine and instead be released and exit the sctp_assoc_bh_rcv() chunk processing loop. It's simply the flush point being missing at loop exit. Adding a try-flush approach on the output queue might not work as the underlying infrastructure might be long gone at this point due to the side-effect interpreter run. One possibility, albeit a bit of a kludge, would be to defer invalid chunk freeing into the state machine in order to possibly trigger packet discards and thus indirectly a queue flush on error. It would surely be better to discard chunks as in the current, perhaps better controlled environment, but going back and forth, it's simply architecturally not possible. I tried various trailing JUNK attack cases and it seems to look good now. Joint work with Vlad Yasevich. Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 26b87c7881006311828bb0ab271a551a62dcceb4) CVE-2014-3688 BugLink: http://bugs.launchpad.net/bugs/1386393 Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Acked-by: Andy Whitcroft <andy.whitcroft@canonical.com> Signed-off-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15net: sctp: fix panic on duplicate ASCONF chunksDaniel Borkmann2-0/+7
commit 12b18fdbbda747ad22a7684413a6559b681e503c upstream When receiving a e.g. semi-good formed connection scan in the form of ... -------------- INIT[ASCONF; ASCONF_ACK] -------------> <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------ -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- ---------------- ASCONF_a; ASCONF_b -----------------> ... where ASCONF_a equals ASCONF_b chunk (at least both serials need to be equal), we panic an SCTP server! The problem is that good-formed ASCONF chunks that we reply with ASCONF_ACK chunks are cached per serial. Thus, when we receive a same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do not need to process them again on the server side (that was the idea, also proposed in the RFC). Instead, we know it was cached and we just resend the cached chunk instead. So far, so good. Where things get nasty is in SCTP's side effect interpreter, that is, sctp_cmd_interpreter(): While incoming ASCONF_a (chunk = event_arg) is being marked !end_of_packet and !singleton, and we have an association context, we do not flush the outqueue the first time after processing the ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it queued up, although we set local_cork to 1. Commit 2e3216cd54b1 changed the precedence, so that as long as we get bundled, incoming chunks we try possible bundling on outgoing queue as well. Before this commit, we would just flush the output queue. Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we continue to process the same ASCONF_b chunk from the packet. As we have cached the previous ASCONF_ACK, we find it, grab it and do another SCTP_CMD_REPLY command on it. So, effectively, we rip the chunk->list pointers and requeue the same ASCONF_ACK chunk another time. Since we process ASCONF_b, it's correctly marked with end_of_packet and we enforce an uncork, and thus flush, thus crashing the kernel. Fix it by testing if the ASCONF_ACK is currently pending and if that is the case, do not requeue it. When flushing the output queue we may relink the chunk for preparing an outgoing packet, but eventually unlink it when it's copied into the skb right before transmission. Joint work with Vlad Yasevich. Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit b69040d8e39f20d5215a03502a8e8b4c6ab78395) CVE-2014-3687 BugLink: http://bugs.launchpad.net/bugs/1386392 Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Acked-by: Andy Whitcroft <andy.whitcroft@canonical.com> Signed-off-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15USB: whiteheat: Added bounds checking for bulk command responseJames Forshaw1-1/+6
commit c5fd4126151855330280ea9382684980afcfdd03 upstream This patch fixes a potential security issue in the whiteheat USB driver which might allow a local attacker to cause kernel memory corrpution. This is due to an unchecked memcpy into a fixed size buffer (of 64 bytes). On EHCI and XHCI busses it's possible to craft responses greater than 64 bytes leading a buffer overflow. Signed-off-by: James Forshaw <forshaw@google.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (backported from commit 6817ae225cd650fb1c3295d769298c38b1eba818) CVE-2014-3185 BugLink: http://bugs.launchpad.net/bugs/1370036 Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15ALSA: control: Fix replacing user controlsLars-Peter Clausen1-14/+9
(commit 82262a46627bebb0febcc26664746c25cef08563 upstream) There are two issues with the current implementation for replacing user controls. The first is that the code does not check if the control is actually a user control and neither does it check if the control is owned by the process that tries to remove it. That allows userspace applications to remove arbitrary controls, which can cause a user after free if a for example a driver does not expect a control to be removed from under its feed. The second issue is that on one hand when a control is replaced the user_ctl_count limit is not checked and on the other hand the user_ctl_count is increased (even though the number of user controls does not change). This allows userspace, once the user_ctl_count limit as been reached, to repeatedly replace a control until user_ctl_count overflows. Once that happens new controls can be added effectively bypassing the user_ctl_count limit. Both issues can be fixed by instead of open-coding the removal of the control that is to be replaced to use snd_ctl_remove_user_ctl(). This function does proper permission checks as well as decrements user_ctl_count after the control has been removed. Note that by using snd_ctl_remove_user_ctl() the check which returns -EBUSY at beginning of the function if the control already exists is removed. This is not a problem though since the check is quite useless, because the lock that is protecting the control list is released between the check and before adding the new control to the list, which means that it is possible that a different control with the same settings is added to the list after the check. Luckily there is another check that is done while holding the lock in snd_ctl_add(), so we'll rely on that to make sure that the same control is not added twice. Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Acked-by: Jaroslav Kysela <perex@perex.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> [wt: fixes CVE-2014-4654 & CVE-2014-4655] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15ALSA: control: Don't access controls outside of protected regionsLars-Peter Clausen1-3/+5
(commit fd9f26e4eca5d08a27d12c0933fceef76ed9663d upstream) A control that is visible on the card->controls list can be freed at any time. This means we must not access any of its memory while not holding the controls_rw_lock. Otherwise we risk a use after free access. Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Acked-by: Jaroslav Kysela <perex@perex.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> [wt: fixes CVE-2014-4653] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15net/l2tp: don't fall back on UDP [get|set]sockoptSasha Levin1-2/+2
(commit 3cf521f7dc87c031617fd47e4b7aa2593c2f3daf upstream) The l2tp [get|set]sockopt() code has fallen back to the UDP functions for socket option levels != SOL_PPPOL2TP since day one, but that has never actually worked, since the l2tp socket isn't an inet socket. As David Miller points out: "If we wanted this to work, it'd have to look up the tunnel and then use tunnel->sk, but I wonder how useful that would be" Since this can never have worked so nobody could possibly have depended on that functionality, just remove the broken code and return -EINVAL. Reported-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: James Chapman <jchapman@katalix.com> Acked-by: David Miller <davem@davemloft.net> Cc: Phil Turnbull <phil.turnbull@oracle.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Willy Tarreau <w@1wt.eu> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [geissert: adjust file paths and context for 2.6.32] [wt: fixes CVE-2014-4943] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15x86_64, traps: Rework bad_iretAndy Lutomirski2-29/+48
It's possible for iretq to userspace to fail. This can happen because of a bad CS, SS, or RIP. Historically, we've handled it by fixing up an exception from iretq to land at bad_iret, which pretends that the failed iret frame was really the hardware part of #GP(0) from userspace. To make this work, there's an extra fixup to fudge the gs base into a usable state. This is suboptimal because it loses the original exception. It's also buggy because there's no guarantee that we were on the kernel stack to begin with. For example, if the failing iret happened on return from an NMI, then we'll end up executing general_protection on the NMI stack. This is bad for several reasons, the most immediate of which is that general_protection, as a non-paranoid idtentry, will try to deliver signals and/or schedule from the wrong stack. This patch throws out bad_iret entirely. As a replacement, it augments the existing swapgs fudge into a full-blown iret fixup, mostly written in C. It's should be clearer and more correct. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit b645af2d5905c4e32399005b867987919cbfc3ae) [wt: notes for backport to 2.6.32: - _ASM_EXTABLE was open-coded. - removed unneeded CFI_ENDPROC - removed __visible (introduced in 2.6.37-rc1, not needed here) /wt] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in CAndy Lutomirski2-32/+26
There's nothing special enough about the espfix64 double fault fixup to justify writing it in assembly. Move it to C. This also fixes a bug: if the double fault came from an IST stack, the old asm code would return to a partially uninitialized stack frame. Fixes: 3891a04aafd668686239349ea58f3314ea2af86b Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit af726f21ed8af2cdaa4e93098dc211521218ae65) [wt: backport notes for 2.6.32 : - Adaptations to entry_64.S in declaration of do_double_fault. - no exception_enter() in 2.6.32. Seems to be only for context tracking. /wt] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15x86_64, traps: Stop using IST for #SSAndy Lutomirski5-22/+7
On a 32-bit kernel, this has no effect, since there are no IST stacks. On a 64-bit kernel, #SS can only happen in user code, on a failed iret to user space, a canonical violation on access via RSP or RBP, or a genuine stack segment violation in 32-bit kernel code. The first two cases don't need IST, and the latter two cases are unlikely fatal bugs, and promoting them to double faults would be fine. This fixes a bug in which the espfix64 code mishandles a stack segment violation. This saves 4k of memory per CPU and a tiny bit of code. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 6f442be2fb22be02cafa606f1769fa1e6f894441) [wt: no CONFIG_TRACING on 2.6.32, Fixes CVE-2014-9090] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15x86/espfix/xen: Fix allocation of pages for paravirt page tablesBoris Ostrovsky1-3/+2
commit 8762e5092828c4dc0f49da5a47a644c670df77f3 upstream. init_espfix_ap() is currently off by one level when informing hypervisor that allocated pages will be used for ministacks' page tables. The most immediate effect of this on a PV guest is that if 'stack_page = __get_free_page()' returns a non-zeroed-out page the hypervisor will refuse to use it for a page table (which it shouldn't be anyway). This will result in warnings by both Xen and Linux. More importantly, a subsequent write to that page (again, by a PV guest) is likely to result in fatal page fault. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1404926298-5565-1-git-send-email-boris.ostrovsky@oracle.com Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from 3.2 commit 060e7f67c88ebbcf8745505c7ccf44c53601f7de) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15x86_64/entry/xen: Do not invoke espfix64 on XenAndy Lutomirski3-24/+11
commit 7209a75d2009dbf7745e2fd354abf25c3deb3ca3 upstream. This moves the espfix64 logic into native_iret. To make this work, it gets rid of the native patch for INTERRUPT_RETURN: INTERRUPT_RETURN on native kernels is now 'jmp native_iret'. This changes the 16-bit SS behavior on Xen from OOPSing to leaking some bits of the Xen hypervisor's RSP (I think). [ hpa: this is a nonzero cost on native, but probably not enough to measure. Xen needs to fix this in their own code, probably doing something equivalent to espfix64. ] Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/7b8f1d8ef6597cb16ae004a43c56980a7de3cf94.1406129132.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from 3.2 commit 8ba19cd8c351e16b6be4caca9338d19b0cb8eaa4) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15x86, espfix: Make it possible to disable 16-bit supportH. Peter Anvin4-5/+44
commit 34273f41d57ee8d854dcd2a1d754cbb546cb548f upstream. Embedded systems, which may be very memory-size-sensitive, are extremely unlikely to ever encounter any 16-bit software, so make it a CONFIG_EXPERT option to turn off support for any 16-bit software whatsoever. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from 3.2 commit 70d87cbbd92a3611655b39003176ee1033796bf7) [wt: backport notes for 2.6.32 : - Fixed arch/x86/kernel/ldt.c (no IS_ENABLED on 2.6.32). - No CONFIG_EXPERT condition in 2.6.32. /wt] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15x86, espfix: Make espfix64 a Kconfig option, fix UMLH. Peter Anvin4-3/+7
commit 197725de65477bc8509b41388157c1a2283542bb upstream. Make espfix64 a hidden Kconfig option. This fixes the x86-64 UML build which had broken due to the non-existence of init_espfix_bsp() in UML: since UML uses its own Kconfig, this option does not appear in the UML build. This also makes it possible to make support for 16-bit segments a configuration option, for the people who want to minimize the size of the kernel. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Richard Weinberger <richard@nod.at> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from 3.2 commit da22646d97b7322c757f3a7a21805a3475fed231) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15x86, espfix: Fix broken header guardH. Peter Anvin1-1/+1
commit 20b68535cd27183ebd3651ff313afb2b97dac941 upstream. Header guard is #ifndef, not #ifdef... Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from 3.2 commit 7d4a9eabfe6c7fed70941aceb3b20bf393652bcb) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15x86, espfix: Move espfix definitions into a separate header fileH. Peter Anvin3-0/+19
commit e1fe9ed8d2a4937510d0d60e20705035c2609aea upstream. Sparse warns that the percpu variables aren't declared before they are defined. Rather than hacking around it, move espfix definitions into a proper header file. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from 3.2 commit 62358ee6bb9d3d9dd4761d39ac0e1ede9ba70b0e) [wt: using DECLARE_PER_CPU instead of DECLARE_PER_CPU_READ_MOSTLY] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stackH. Peter Anvin9-26/+319
commit 3891a04aafd668686239349ea58f3314ea2af86b upstream. The IRET instruction, when returning to a 16-bit segment, only restores the bottom 16 bits of the user space stack pointer. This causes some 16-bit software to break, but it also leaks kernel state to user space. We have a software workaround for that ("espfix") for the 32-bit kernel, but it relies on a nonzero stack segment base which is not available in 64-bit mode. In checkin: b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels we "solved" this by forbidding 16-bit segments on 64-bit kernels, with the logic that 16-bit support is crippled on 64-bit kernels anyway (no V86 support), but it turns out that people are doing stuff like running old Win16 binaries under Wine and expect it to work. This works around this by creating percpu "ministacks", each of which is mapped 2^16 times 64K apart. When we detect that the return SS is on the LDT, we copy the IRET frame to the ministack and use the relevant alias to return to userspace. The ministacks are mapped readonly, so if IRET faults we promote #GP to #DF which is an IST vector and thus has its own stack; we then do the fixup in the #DF handler. (Making #GP an IST exception would make the msr_safe functions unsafe in NMI/MC context, and quite possibly have other effects.) Special thanks to: - Andy Lutomirski, for the suggestion of using very small stack slots and copy (as opposed to map) the IRET frame there, and for the suggestion to mark them readonly and let the fault promote to #DF. - Konrad Wilk for paravirt fixup and testing. - Borislav Petkov for testing help and useful comments. Reported-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrew Lutomriski <amluto@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dirk Hohndel <dirk@hohndel.org> Cc: Arjan van de Ven <arjan.van.de.ven@intel.com> Cc: comex <comexk@gmail.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from 3.2 commit e7836514086d53e0ffaee18d67d85d9477ecdb12) [wt: backport notes for 2.6.32 differences : - use DECLARE_PER_CPU instead of DECLARE_PER_CPU_READ_MOSTLY - replace this_cpu_read(foo) with per_cpu(foo, smp_processor_id()) - replace this_cpu_write(foo,bar) with per_cpu(foo,smp_processor_id())=bar /wt] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15x86-32, espfix: Remove filter for espfix32 due to raceH. Peter Anvin1-5/+0
commit 246f2d2ee1d715e1077fc47d61c394569c8ee692 upstream. It is not safe to use LAR to filter when to go down the espfix path, because the LDT is per-process (rather than per-thread) and another thread might change the descriptors behind our back. Fortunately it is always *safe* (if a bit slow) to go down the espfix path, and a 32-bit LDT stack segment is extremely rare. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from 3.2 commit 6806fa8b6795aba9be8742a8f598f60eed26f875) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernelsH. Peter Anvin1-0/+11
commit b3b42ac2cbae1f3cecbb6229964a4d48af31d382 upstream. The IRET instruction, when returning to a 16-bit segment, only restores the bottom 16 bits of the user space stack pointer. We have a software workaround for that ("espfix") for the 32-bit kernel, but it relies on a nonzero stack segment base which is not available in 32-bit mode. Since 16-bit support is somewhat crippled anyway on a 64-bit kernel (no V86 mode), and most (if not quite all) 64-bit processors support virtualization for the users who really need it, simply reject attempts at creating a 16-bit segment when running on top of a 64-bit kernel. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/n/tip-kicdm89kzw9lldryb1br9od0@git.kernel.org Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from 3.2 commit a862b5c4076b1ba4dd6c87aebac478853dc6db47) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15x86-64: Adjust frame type at paranoid_exit:Jan Beulich1-1/+1
As this isn't an exception or interrupt entry point, it doesn't have any of the hardware provide frame layouts active. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4C7FBAA80200007800013F67@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> (cherry picked from commit 1f130a783a796f147b080c594488b566c86007d0) [WT: only merged to minimize changes from mainline in entry_64.S] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15x86, 64-bit: Move K8 B step iret fixup to fault entry asmBrian Gerst3-35/+8
Move the handling of truncated %rip from an iret fault to the fault entry path. This allows x86-64 to use the standard search_extable() function. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jan Beulich <jbeulich@novell.com> LKML-Reference: <1255357103-5418-1-git-send-email-brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> (cherry picked from commit ae24ffe5ecec17c956ac25371d7c2e12b4b36e53) [wt: only merged to fix patch context and ease merging of next patches] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-12-15net: sendmsg: fix failed backport of "fix NULL pointer dereference"Willy Tarreau1-1/+1
Luis Henriques reported that while backporting commit 40eea80 ("net: sendmsg: fix NULL pointer dereference") and applying the diff by hand, I made a typo resulting in the same test being done twice, and msg_name not being tested. This fixes cf90357 ("net: sendmsg: fix NULL pointer dereference") which was merged into 2.6.32.64. Cc: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25Linux 2.6.32.64v2.6.32.64.26Willy Tarreau1-1/+1
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25sctp: not send SCTP_PEER_ADDR_CHANGE notifications with failed probeZhu Yanjun1-5/+9
2.6.x kernels require a similar logic change as commit 2c0d6ac894a [sctp: not send SCTP_PEER_ADDR_CHANGE notifications with failed probe] introduces for newer kernels. Since the transport has always been in state SCTP_UNCONFIRMED, it therefore wasn't active before and hasn't been used before, and it always has been, so it is unnecessary to bug the user with a notification. Reported-by: Deepak Khandelwal <khandelwal.deepak.1987@gmail.com> Suggested-by: Vlad Yasevich <vyasevich@gmail.com> Suggested-by: Michael Tuexen <tuexen@fh-muenster.de> Suggested-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Zhu Yanjun <Yanjun.Zhu@windriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25isofs: Fix unbounded recursion when processing relocated directoriesJan Kara3-22/+55
We did not check relocated directory in any way when processing Rock Ridge 'CL' tag. Thus a corrupted isofs image can possibly have a CL entry pointing to another CL entry leading to possibly unbounded recursion in kernel code and thus stack overflow or deadlocks (if there is a loop created from CL entries). Fix the problem by not allowing CL entry to point to a directory entry with CL entry (such use makes no good sense anyway) and by checking whether CL entry doesn't point to itself. CC: stable@vger.kernel.org Reported-by: Chris Evans <cevans@google.com> Signed-off-by: Jan Kara <jack@suse.cz> (cherry picked from commit 410dd3cf4c9b36f27ed4542ee18b1af5e68645a4) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25futex: Unlock hb->lock in futex_wait_requeue_pi() error pathThomas Gleixner1-0/+1
futex_wait_requeue_pi() calls futex_wait_setup(). If futex_wait_setup() succeeds it returns with hb->lock held and preemption disabled. Now the sanity check after this does: if (match_futex(&q.key, &key2)) { ret = -EINVAL; goto out_put_keys; } which releases the keys but does not release hb->lock. So we happily return to user space with hb->lock held and therefor preemption disabled. Unlock hb->lock before taking the exit route. Reported-by: Dave "Trinity" Jones <davej@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Darren Hart <dvhart@linux.intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409112318500.4178@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de> (cherry picked from commit 13c42c2f43b19aab3195f2d357db00d1e885eaa8) [wt: 2.6.32 needs &q as first argument of queue_unlock()] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25USB: add new zte 3g-dongle's pid to option.cRui li1-1/+128
As ZTE have and will use more pid for new products this year, so we need to add some new zte 3g-dongle's pid on option.c , and delete one pid 0x0154 because it use for mass-storage port. Signed-off-by: Rui li <li.rui27@zte.com.cn> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 1608ea5f4b5d6262cd6e808839491cfb2a67405a) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25lzo: check for length overrun in variable length encoding.Willy Tarreau1-6/+37
This fix ensures that we never meet an integer overflow while adding 255 while parsing a variable length encoding. It works differently from commit 206a81c ("lzo: properly check for overruns") because instead of ensuring that we don't overrun the input, which is tricky to guarantee due to many assumptions in the code, it simply checks that the cumulated number of 255 read cannot overflow by bounding this number. The MAX_255_COUNT is the maximum number of times we can add 255 to a base count without overflowing an integer. The multiply will overflow when multiplying 255 by more than MAXINT/255. The sum will overflow earlier depending on the base count. Since the base count is taken from a u8 and a few bits, it is safe to assume that it will always be lower than or equal to 2*255, thus we can always prevent any overflow by accepting two less 255 steps. This patch also reduces the CPU overhead and actually increases performance by 1.1% compared to the initial code, while the previous fix costs 3.1% (measured on x86_64). The fix needs to be backported to all currently supported stable kernels. Reported-by: Willem Pinckaers <willem@lekkertech.net> Cc: "Don A. Bailey" <donb@securitymouse.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 72cf90124e87d975d0b2114d930808c58b4c05e4) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25Documentation: lzo: document part of the encodingWilly Tarreau1-0/+164
Add a complete description of the LZO format as processed by the decompressor. I have not found a public specification of this format hence this analysis, which will be used to better understand the code. Cc: Willem Pinckaers <willem@lekkertech.net> Cc: "Don A. Bailey" <donb@securitymouse.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit d98a0526434d27e261f622cf9d2e0028b5ff1a00) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25lib/lzo: Update LZO compression to current upstream versionMarkus F.X.J. Oberhumer4-342/+395
This commit updates the kernel LZO code to the current upsteam version which features a significant speed improvement - benchmarking the Calgary and Silesia test corpora typically shows a doubled performance in both compression and decompression on modern i386/x86_64/powerpc machines. Signed-off-by: Markus F.X.J. Oberhumer <markus@oberhumer.com> (cherry picked from commit 8b975bd3f9089f8ee5d7bbfd798537b992bbc7e7) [wt: this update was only needed to apply the following security fixes] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25ARM: 7670/1: fix the memset fixNicolas Pitre1-20/+13
Commit 455bd4c430b0 ("ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimizations") attempted to fix a compliance issue with the memset return value. However the memset itself became broken by that patch for misaligned pointers. This fixes the above by branching over the entry code from the misaligned fixup code to avoid reloading the original pointer. Also, because the function entry alignment is wrong in the Thumb mode compilation, that fixup code is moved to the end. While at it, the entry instructions are slightly reworked to help dual issue pipelines. Signed-off-by: Nicolas Pitre <nico@linaro.org> Tested-by: Alexander Holler <holler@ahsoftware.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> (cherry picked from commit 418df63adac56841ef6b0f1fcf435bc64d4ed177) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) ↵Ivan Djelic1-41/+44
optimizations Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on assumptions about the implementation of memset and similar functions. The current ARM optimized memset code does not return the value of its first argument, as is usually expected from standard implementations. For instance in the following function: void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter) { memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter)); waiter->magic = waiter; INIT_LIST_HEAD(&waiter->list); } compiled as: 800554d0 <debug_mutex_lock_common>: 800554d0: e92d4008 push {r3, lr} 800554d4: e1a00001 mov r0, r1 800554d8: e3a02010 mov r2, #16 ; 0x10 800554dc: e3a01011 mov r1, #17 ; 0x11 800554e0: eb04426e bl 80165ea0 <memset> 800554e4: e1a03000 mov r3, r0 800554e8: e583000c str r0, [r3, #12] 800554ec: e5830000 str r0, [r3] 800554f0: e5830004 str r0, [r3, #4] 800554f4: e8bd8008 pop {r3, pc} GCC assumes memset returns the value of pointer 'waiter' in register r0; causing register/memory corruptions. This patch fixes the return value of the assembly version of memset. It adds a 'mov' instruction and merges an additional load+store into existing load/store instructions. For ease of review, here is a breakdown of the patch into 4 simple steps: Step 1 ====== Perform the following substitutions: ip -> r8, then r0 -> ip, and insert 'mov ip, r0' as the first statement of the function. At this point, we have a memset() implementation returning the proper result, but corrupting r8 on some paths (the ones that were using ip). Step 2 ====== Make sure r8 is saved and restored when (! CALGN(1)+0) == 1: save r8: - str lr, [sp, #-4]! + stmfd sp!, {r8, lr} and restore r8 on both exit paths: - ldmeqfd sp!, {pc} @ Now <64 bytes to go. + ldmeqfd sp!, {r8, pc} @ Now <64 bytes to go. (...) tst r2, #16 stmneia ip!, {r1, r3, r8, lr} - ldr lr, [sp], #4 + ldmfd sp!, {r8, lr} Step 3 ====== Make sure r8 is saved and restored when (! CALGN(1)+0) == 0: save r8: - stmfd sp!, {r4-r7, lr} + stmfd sp!, {r4-r8, lr} and restore r8 on both exit paths: bgt 3b - ldmeqfd sp!, {r4-r7, pc} + ldmeqfd sp!, {r4-r8, pc} (...) tst r2, #16 stmneia ip!, {r4-r7} - ldmfd sp!, {r4-r7, lr} + ldmfd sp!, {r4-r8, lr} Step 4 ====== Rewrite register list "r4-r7, r8" as "r4-r8". Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com> Reviewed-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Dirk Behme <dirk.behme@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> (cherry picked from commit 455bd4c430b0c0a361f38e8658a0d6cb469942b5) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25net: pppoe: use correct channel MTU when using Multilink PPPChristoph Schulz1-1/+1
The PPP channel MTU is used with Multilink PPP when ppp_mp_explode() (see ppp_generic module) tries to determine how big a fragment might be. According to RFC 1661, the MTU excludes the 2-byte PPP protocol field, see the corresponding comment and code in ppp_mp_explode(): /* * hdrlen includes the 2-byte PPP protocol field, but the * MTU counts only the payload excluding the protocol field. * (RFC1661 Section 2) */ mtu = pch->chan->mtu - (hdrlen - 2); However, the pppoe module *does* include the PPP protocol field in the channel MTU, which is wrong as it causes the PPP payload to be 1-2 bytes too big under certain circumstances (one byte if PPP protocol compression is used, two otherwise), causing the generated Ethernet packets to be dropped. So the pppoe module has to subtract two bytes from the channel MTU. This error only manifests itself when using Multilink PPP, as otherwise the channel MTU is not used anywhere. In the following, I will describe how to reproduce this bug. We configure two pppd instances for multilink PPP over two PPPoE links, say eth2 and eth3, with a MTU of 1492 bytes for each link and a MRRU of 2976 bytes. (This MRRU is computed by adding the two link MTUs and subtracting the MP header twice, which is 4 bytes long.) The necessary pppd statements on both sides are "multilink mtu 1492 mru 1492 mrru 2976". On the client side, we additionally need "plugin rp-pppoe.so eth2" and "plugin rp-pppoe.so eth3", respectively; on the server side, we additionally need to start two pppoe-server instances to be able to establish two PPPoE sessions, one over eth2 and one over eth3. We set the MTU of the PPP network interface to the MRRU (2976) on both sides of the connection in order to make use of the higher bandwidth. (If we didn't do that, IP fragmentation would kick in, which we want to avoid.) Now we send a ICMPv4 echo request with a payload of 2948 bytes from client to server over the PPP link. This results in the following network packet: 2948 (echo payload) + 8 (ICMPv4 header) + 20 (IPv4 header) --------------------- 2976 (PPP payload) These 2976 bytes do not exceed the MTU of the PPP network interface, so the IP packet is not fragmented. Now the multilink PPP code in ppp_mp_explode() prepends one protocol byte (0x21 for IPv4), making the packet one byte bigger than the negotiated MRRU. So this packet would have to be divided in three fragments. But this does not happen as each link MTU is assumed to be two bytes larger. So this packet is diveded into two fragments only, one of size 1489 and one of size 1488. Now we have for that bigger fragment: 1489 (PPP payload) + 4 (MP header) + 2 (PPP protocol field for the MP payload (0x3d)) + 6 (PPPoE header) -------------------------- 1501 (Ethernet payload) This packet exceeds the link MTU and is discarded. If one configures the link MTU on the client side to 1501, one can see the discarded Ethernet frames with tcpdump running on the client. A ping -s 2948 -c 1 192.168.15.254 leads to the smaller fragment that is correctly received on the server side: (tcpdump -vvvne -i eth3 pppoes and ppp proto 0x3d) 52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864), length 1514: PPPoE [ses 0x3] MLPPP (0x003d), length 1494: seq 0x000, Flags [end], length 1492 and to the bigger fragment that is not received on the server side: (tcpdump -vvvne -i eth2 pppoes and ppp proto 0x3d) 52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864), length 1515: PPPoE [ses 0x5] MLPPP (0x003d), length 1495: seq 0x000, Flags [begin], length 1493 With the patch below, we correctly obtain three fragments: 52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864), length 1514: PPPoE [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000, Flags [begin], length 1492 52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864), length 1514: PPPoE [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000, Flags [none], length 1492 52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864), length 27: PPPoE [ses 0x1] MLPPP (0x003d), length 7: seq 0x000, Flags [end], length 5 And the ICMPv4 echo request is successfully received at the server side: IP (tos 0x0, ttl 64, id 21925, offset 0, flags [DF], proto ICMP (1), length 2976) 192.168.222.2 > 192.168.15.254: ICMP echo request, id 30530, seq 0, length 2956 The bug was introduced in commit c9aa6895371b2a257401f59d3393c9f7ac5a8698 ("[PPPOE]: Advertise PPPoE MTU") from the very beginning. This patch applies to 3.10 upwards but the fix can be applied (with minor modifications) to kernels as old as 2.6.32. Signed-off-by: Christoph Schulz <develop@kristov.de> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit a8a3e41c67d24eb12f9ab9680cbb85e24fcd9711) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25md/raid6: avoid data corruption during recovery of double-degraded RAID6NeilBrown1-0/+2
During recovery of a double-degraded RAID6 it is possible for some blocks not to be recovered properly, leading to corruption. If a write happens to one block in a stripe that would be written to a missing device, and at the same time that stripe is recovering data to the other missing device, then that recovered data may not be written. This patch skips, in the double-degraded case, an optimisation that is only safe for single-degraded arrays. Bug was introduced in 2.6.32 and fix is suitable for any kernel since then. In an older kernel with separate handle_stripe5() and handle_stripe6() functions the patch must change handle_stripe6(). Cc: stable@vger.kernel.org (2.6.32+) Fixes: 6c0069c0ae9659e3a91b68eaed06a5c6c37f45c8 Cc: Yuri Tikhonov <yur@emcraft.com> Cc: Dan Williams <dan.j.williams@intel.com> Reported-by: "Manibalan P" <pmanibalan@amiindia.co.in> Tested-by: "Manibalan P" <pmanibalan@amiindia.co.in> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1090423 Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Dan Williams <dan.j.williams@intel.com> (cherry picked from commit 9c4bdf697c39805078392d5ddbbba5ae5680e0dd) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25ring-buffer: Always reset iterator to reader pageSteven Rostedt (Red Hat)1-9/+3
When performing a consuming read, the ring buffer swaps out a page from the ring buffer with a empty page and this page that was swapped out becomes the new reader page. The reader page is owned by the reader and since it was swapped out of the ring buffer, writers do not have access to it (there's an exception to that rule, but it's out of scope for this commit). When reading the "trace" file, it is a non consuming read, which means that the data in the ring buffer will not be modified. When the trace file is opened, a ring buffer iterator is allocated and writes to the ring buffer are disabled, such that the iterator will not have issues iterating over the data. Although the ring buffer disabled writes, it does not disable other reads, or even consuming reads. If a consuming read happens, then the iterator is reset and starts reading from the beginning again. My tests would sometimes trigger this bug on my i386 box: WARNING: CPU: 0 PID: 5175 at kernel/trace/trace.c:1527 __trace_find_cmdline+0x66/0xaa() Modules linked in: CPU: 0 PID: 5175 Comm: grep Not tainted 3.16.0-rc3-test+ #8 Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006 00000000 00000000 f09c9e1c c18796b3 c1b5d74c f09c9e4c c103a0e3 c1b5154b f09c9e78 00001437 c1b5d74c 000005f7 c10bd85a c10bd85a c1cac57c f09c9eb0 ed0e0000 f09c9e64 c103a185 00000009 f09c9e5c c1b5154b f09c9e78 f09c9e80^M Call Trace: [<c18796b3>] dump_stack+0x4b/0x75 [<c103a0e3>] warn_slowpath_common+0x7e/0x95 [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa [<c103a185>] warn_slowpath_fmt+0x33/0x35 [<c10bd85a>] __trace_find_cmdline+0x66/0xaa^M [<c10bed04>] trace_find_cmdline+0x40/0x64 [<c10c3c16>] trace_print_context+0x27/0xec [<c10c4360>] ? trace_seq_printf+0x37/0x5b [<c10c0b15>] print_trace_line+0x319/0x39b [<c10ba3fb>] ? ring_buffer_read+0x47/0x50 [<c10c13b1>] s_show+0x192/0x1ab [<c10bfd9a>] ? s_next+0x5a/0x7c [<c112e76e>] seq_read+0x267/0x34c [<c1115a25>] vfs_read+0x8c/0xef [<c112e507>] ? seq_lseek+0x154/0x154 [<c1115ba2>] SyS_read+0x54/0x7f [<c188488e>] syscall_call+0x7/0xb ---[ end trace 3f507febd6b4cc83 ]--- >>>> ##### CPU 1 buffer started #### Which was the __trace_find_cmdline() function complaining about the pid in the event record being negative. After adding more test cases, this would trigger more often. Strangely enough, it would never trigger on a single test, but instead would trigger only when running all the tests. I believe that was the case because it required one of the tests to be shutting down via delayed instances while a new test started up. After spending several days debugging this, I found that it was caused by the iterator becoming corrupted. Debugging further, I found out why the iterator became corrupted. It happened with the rb_iter_reset(). As consuming reads may not read the full reader page, and only part of it, there's a "read" field to know where the last read took place. The iterator, must also start at the read position. In the rb_iter_reset() code, if the reader page was disconnected from the ring buffer, the iterator would start at the head page within the ring buffer (where writes still happen). But the mistake there was that it still used the "read" field to start the iterator on the head page, where it should always start at zero because readers never read from within the ring buffer where writes occur. I originally wrote a patch to have it set the iter->head to 0 instead of iter->head_page->read, but then I questioned why it wasn't always setting the iter to point to the reader page, as the reader page is still valid. The list_empty(reader_page->list) just means that it was successful in swapping out. But the reader_page may still have data. There was a bug report a long time ago that was not reproducible that had something about trace_pipe (consuming read) not matching trace (iterator read). This may explain why that happened. Anyway, the correct answer to this bug is to always use the reader page an not reset the iterator to inside the writable ring buffer. Cc: stable@vger.kernel.org Fixes: d769041f8653 "ring_buffer: implement new locking" Signed-off-by: Steven Rostedt <rostedt@goodmis.org> (cherry picked from commit 651e22f2701b4113989237c3048d17337dd2185c) [wt: 2.6.32 has no cache{,_read} member in struct ring_buffer_iter ] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25netfilter: nfnetlink_log: fix maximum packet length logged to userspaceFlorian Westphal1-3/+5
don't try to queue payloads > 0xffff - NLA_HDRLEN, it does not work. The nla length includes the size of the nla struct, so anything larger results in u16 integer overflow. This patch is similar to 9cefbbc9c8f9abe (netfilter: nfnetlink_queue: cleanup copy_range usage). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> (cherry picked from commit c1e7dc91eed0ed1a51c9b814d648db18bf8fc6e9) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25netfilter: nf_log: account for size of NLMSG_DONE attributeFlorian Westphal1-3/+3
We currently neither account for the nlattr size, nor do we consider the size of the trailing NLMSG_DONE when allocating nlmsg skb. This can result in nflog to stop working, as __nfulnl_send() re-tries sending forever if it failed to append NLMSG_DONE (which will never work if buffer is not large enough). Reported-by: Houcheng Lin <houcheng@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> (cherry picked from commit 9dfa1dfe4d5e5e66a991321ab08afe69759d797a) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25tty: Fix high cpu load if tty is unreleaseablePeter Hurley1-1/+6
Kernel oops can cause the tty to be unreleaseable (for example, if n_tty_read() crashes while on the read_wait queue). This will cause tty_release() to endlessly loop without sleeping. Use a killable sleep timeout which grows by 2n+1 jiffies over the interval [0, 120 secs.) and then jumps to forever (but still killable). NB: killable just allows for the task to be rewoken manually, not to be terminated. Cc: <stable@vger.kernel.org> Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 37b164578826406a173ca7c20d9ba7430134d23e) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25net: sctp: fix skb_over_panic when receiving malformed ASCONF chunksDaniel Borkmann3-64/+60
Commit 6f4c618ddb0 ("SCTP : Add paramters validity check for ASCONF chunk") added basic verification of ASCONF chunks, however, it is still possible to remotely crash a server by sending a special crafted ASCONF chunk, even up to pre 2.6.12 kernels: skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768 head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950 end:0x440 dev:<NULL> ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:129! [...] Call Trace: <IRQ> [<ffffffff8144fb1c>] skb_put+0x5c/0x70 [<ffffffffa01ea1c3>] sctp_addto_chunk+0x63/0xd0 [sctp] [<ffffffffa01eadaf>] sctp_process_asconf+0x1af/0x540 [sctp] [<ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20 [<ffffffffa01e0038>] sctp_sf_do_asconf+0x168/0x240 [sctp] [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp] [<ffffffff8147645d>] ? fib_rules_lookup+0xad/0xf0 [<ffffffffa01e6b22>] ? sctp_cmp_addr_exact+0x32/0x40 [sctp] [<ffffffffa01e8393>] sctp_assoc_bh_rcv+0xd3/0x180 [sctp] [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp] [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp] [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter] [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0 [<ffffffff81496ded>] ip_local_deliver_finish+0xdd/0x2d0 [<ffffffff81497078>] ip_local_deliver+0x98/0xa0 [<ffffffff8149653d>] ip_rcv_finish+0x12d/0x440 [<ffffffff81496ac5>] ip_rcv+0x275/0x350 [<ffffffff8145c88b>] __netif_receive_skb+0x4ab/0x750 [<ffffffff81460588>] netif_receive_skb+0x58/0x60 This can be triggered e.g., through a simple scripted nmap connection scan injecting the chunk after the handshake, for example, ... -------------- INIT[ASCONF; ASCONF_ACK] -------------> <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------ -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- ------------------ ASCONF; UNKNOWN ------------------> ... where ASCONF chunk of length 280 contains 2 parameters ... 1) Add IP address parameter (param length: 16) 2) Add/del IP address parameter (param length: 255) ... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the Address Parameter in the ASCONF chunk is even missing, too. This is just an example and similarly-crafted ASCONF chunks could be used just as well. The ASCONF chunk passes through sctp_verify_asconf() as all parameters passed sanity checks, and after walking, we ended up successfully at the chunk end boundary, and thus may invoke sctp_process_asconf(). Parameter walking is done with WORD_ROUND() to take padding into account. In sctp_process_asconf()'s TLV processing, we may fail in sctp_process_asconf_param() e.g., due to removal of the IP address that is also the source address of the packet containing the ASCONF chunk, and thus we need to add all TLVs after the failure to our ASCONF response to remote via helper function sctp_add_asconf_response(), which basically invokes a sctp_addto_chunk() adding the error parameters to the given skb. When walking to the next parameter this time, we proceed with ... length = ntohs(asconf_param->param_hdr.length); asconf_param = (void *)asconf_param + length; ... instead of the WORD_ROUND()'ed length, thus resulting here in an off-by-one that leads to reading the follow-up garbage parameter length of 12336, and thus throwing an skb_over_panic for the reply when trying to sctp_addto_chunk() next time, which implicitly calls the skb_put() with that length. Fix it by using sctp_walk_params() [ which is also used in INIT parameter processing ] macro in the verification *and* in ASCONF processing: it will make sure we don't spill over, that we walk parameters WORD_ROUND()'ed. Moreover, we're being more defensive and guard against unknown parameter types and missized addresses. Joint work with Vlad Yasevich. Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 9de7922bc709eee2f609cd01d98aaedc4cf5ea74) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25fix misuses of f_count() in ppp and netlinkAl Viro1-1/+1
we used to check for "nobody else could start doing anything with that opened file" by checking that refcount was 2 or less - one for descriptor table and one we'd acquired in fget() on the way to wherever we are. That was race-prone (somebody else might have had a reference to descriptor table and do fget() just as we'd been checking) and it had become flat-out incorrect back when we switched to fget_light() on those codepaths - unlike fget(), it doesn't grab an extra reference unless the descriptor table is shared. The same change allowed a race-free check, though - we are safe exactly when refcount is less than 2. It was a long time ago; pre-2.6.12 for ioctl() (the codepath leading to ppp one) and 2.6.17 for sendmsg() (netlink one). OTOH, netlink hadn't grown that check until 3.9 and ppp used to live in drivers/net, not drivers/net/ppp until 3.1. The bug existed well before that, though, and the same fix used to apply in old location of file. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> (cherry picked from commit 24dff96a37a2ca319e75a74d3929b2de22447ca6) [wt: apply to drivers/net/ppp_generic.c only] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25USB: kobil_sct: fix non-atomic allocation in write pathJohan Hovold1-2/+2
Write may be called from interrupt context so make sure to use GFP_ATOMIC for all allocations in write. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable <stable@vger.kernel.org> Signed-off-by: Johan Hovold <johan@kernel.org> (cherry picked from commit 191252837626fca0de694c18bb2aa64c118eda89) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25gianfar: disable vlan tag insertion by defaultZhu Yanjun1-6/+0
2.6.x kernels require a similar logic change as commit 51b8cbfc [gianfar: fix bug caused by e1653c3e] introduces for newer kernels. Gianfar driver originally enables vlan tag insertion by default. This will lead to unusable connections on some configurations. Since gianfar nic vlan tag insertion is disabled by default and it is not enabled any longer, it is not necessary to disable it again. Reported-by: Xu Jianrong <roy.xu@huawei.com> Suggested-by: Wang Feng <sky.wangfeng@huawei.com> Signed-off-by: Zhu Yanjun <Yanjun.Zhu@windriver.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25dm crypt: fix access beyond the end of allocated spaceMikulas Patocka1-4/+16
commit d49ec52ff6ddcda178fc2476a109cf1bd1fa19ed upstream The DM crypt target accesses memory beyond allocated space resulting in a crash on 32 bit x86 systems. This bug is very old (it dates back to 2.6.25 commit 3a7f6c990ad04 "dm crypt: use async crypto"). However, this bug was masked by the fact that kmalloc rounds the size up to the next power of two. This bug wasn't exposed until 3.17-rc1 commit 298a9fa08a ("dm crypt: use per-bio data"). By switching to using per-bio data there was no longer any padding beyond the end of a dm-crypt allocated memory block. To minimize allocation overhead dm-crypt puts several structures into one block allocated with kmalloc. The block holds struct ablkcipher_request, cipher-specific scratch pad (crypto_ablkcipher_reqsize(any_tfm(cc))), struct dm_crypt_request and an initialization vector. The variable dmreq_start is set to offset of struct dm_crypt_request within this memory block. dm-crypt allocates the block with this size: cc->dmreq_start + sizeof(struct dm_crypt_request) + cc->iv_size. When accessing the initialization vector, dm-crypt uses the function iv_of_dmreq, which performs this calculation: ALIGN((unsigned long)(dmreq + 1), crypto_ablkcipher_alignmask(any_tfm(cc)) + 1). dm-crypt allocated "cc->iv_size" bytes beyond the end of dm_crypt_request structure. However, when dm-crypt accesses the initialization vector, it takes a pointer to the end of dm_crypt_request, aligns it, and then uses it as the initialization vector. If the end of dm_crypt_request is not aligned on a crypto_ablkcipher_alignmask(any_tfm(cc)) boundary the alignment causes the initialization vector to point beyond the allocated space. Fix this bug by calculating the variable iv_size_padding and adding it to the allocated size. Also correct the alignment of dm_crypt_request. struct dm_crypt_request is specific to dm-crypt (it isn't used by the crypto subsystem at all), so it is aligned on __alignof__(struct dm_crypt_request). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> [wt: minor context adaptations, hopefully ok] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25Revert "nfsd: correctly handle return value from nfsd_map_name_to_*"Willy Tarreau1-4/+4
This reverts commit 63d059e73ff4574b79bd8aa252b5fc00b6326ddf. On Wed, Sep 03, 2014 at 11:28:43AM +1000, NeilBrown wrote: > > 2.6.32.30 contains: > > commit 63d059e73ff4574b79bd8aa252b5fc00b6326ddf > Author: NeilBrown <neilb@suse.de> > Date: Wed Feb 16 13:08:35 2011 +1100 > > nfsd: correctly handle return value from nfsd_map_name_to_* > > commit 47c85291d3dd1a51501555000b90f8e281a0458e upstream. > > These functions return an nfs status, not a host_err. So don't > try to convert before returning. > > This is a regression introduced by > 3c726023402a2f3b28f49b9d90ebf9e71151157d; I fixed up two of the callers, > but missed these two. > > Reported-by: Herbert Poetzl <herbert@13thfloor.at> > Signed-off-by: NeilBrown <neilb@suse.de> > Signed-off-by: J. Bruce Fields <bfields@redhat.com> > Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> > > > But it does *not* contain a backport of > 3c726023402a2f3b28f49b9d90ebf9e71151157d. > > So rather an fixing a regression, it introduces one. > > This patch should be reverted. > > See also https://bugzilla.novell.com/show_bug.cgi?id=893787 > > NeilBrown Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25sctp: fix possible seqlock seadlock in sctp_packet_transmit()Eric Dumazet1-1/+1
[ Upstream commit 757efd32d5ce31f67193cc0e6a56e4dffcc42fb1 ] Dave reported following splat, caused by improper use of IP_INC_STATS_BH() in process context. BUG: using __this_cpu_add() in preemptible [00000000] code: trinity-c117/14551 caller is __this_cpu_preempt_check+0x13/0x20 CPU: 3 PID: 14551 Comm: trinity-c117 Not tainted 3.16.0+ #33 ffffffff9ec898f0 0000000047ea7e23 ffff88022d32f7f0 ffffffff9e7ee207 0000000000000003 ffff88022d32f818 ffffffff9e397eaa ffff88023ee70b40 ffff88022d32f970 ffff8801c026d580 ffff88022d32f828 ffffffff9e397ee3 Call Trace: [<ffffffff9e7ee207>] dump_stack+0x4e/0x7a [<ffffffff9e397eaa>] check_preemption_disabled+0xfa/0x100 [<ffffffff9e397ee3>] __this_cpu_preempt_check+0x13/0x20 [<ffffffffc0839872>] sctp_packet_transmit+0x692/0x710 [sctp] [<ffffffffc082a7f2>] sctp_outq_flush+0x2a2/0xc30 [sctp] [<ffffffff9e0d985c>] ? mark_held_locks+0x7c/0xb0 [<ffffffff9e7f8c6d>] ? _raw_spin_unlock_irqrestore+0x5d/0x80 [<ffffffffc082b99a>] sctp_outq_uncork+0x1a/0x20 [sctp] [<ffffffffc081e112>] sctp_cmd_interpreter.isra.23+0x1142/0x13f0 [sctp] [<ffffffffc081c86b>] sctp_do_sm+0xdb/0x330 [sctp] [<ffffffff9e0b8f1b>] ? preempt_count_sub+0xab/0x100 [<ffffffffc083b350>] ? sctp_cname+0x70/0x70 [sctp] [<ffffffffc08389ca>] sctp_primitive_ASSOCIATE+0x3a/0x50 [sctp] [<ffffffffc083358f>] sctp_sendmsg+0x88f/0xe30 [sctp] [<ffffffff9e0d673a>] ? lock_release_holdtime.part.28+0x9a/0x160 [<ffffffff9e0d62ce>] ? put_lock_stats.isra.27+0xe/0x30 [<ffffffff9e73b624>] inet_sendmsg+0x104/0x220 [<ffffffff9e73b525>] ? inet_sendmsg+0x5/0x220 [<ffffffff9e68ac4e>] sock_sendmsg+0x9e/0xe0 [<ffffffff9e1c0c09>] ? might_fault+0xb9/0xc0 [<ffffffff9e1c0bae>] ? might_fault+0x5e/0xc0 [<ffffffff9e68b234>] SYSC_sendto+0x124/0x1c0 [<ffffffff9e0136b0>] ? syscall_trace_enter+0x250/0x330 [<ffffffff9e68c3ce>] SyS_sendto+0xe/0x10 [<ffffffff9e7f9be4>] tracesys+0xdd/0xe2 This is a followup of commits f1d8cba61c3c4b ("inet: fix possible seqlock deadlocks") and 7f88c6b23afbd315 ("ipv6: fix possible seqlock deadlock in ip6_finish_output2") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Dave Jones <davej@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25iovec: make sure the caller actually wants anything in memcpy_fromiovecendSasha Levin1-0/+4
[ Upstream commit 06ebb06d49486676272a3c030bfeef4bd969a8e6 ] Check for cases when the caller requests 0 bytes instead of running off and dereferencing potentially invalid iovecs. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25net: Correctly set segment mac_len in skb_segment().Vlad Yasevich1-1/+1
[ Upstream commit fcdfe3a7fa4cb74391d42b6a26dc07c20dab1d82 ] When performing segmentation, the mac_len value is copied right out of the original skb. However, this value is not always set correctly (like when the packet is VLAN-tagged) and we'll end up copying a bad value. One way to demonstrate this is to configure a VM which tags packets internally and turn off VLAN acceleration on the forwarding bridge port. The packets show up corrupt like this: 16:18:24.985548 52:54:00:ab:be:25 > 52:54:00:26:ce:a3, ethertype 802.1Q (0x8100), length 1518: vlan 100, p 0, ethertype 0x05e0, 0x0000: 8cdb 1c7c 8cdb 0064 4006 b59d 0a00 6402 ...|...d@.....d. 0x0010: 0a00 6401 9e0d b441 0a5e 64ec 0330 14fa ..d....A.^d..0.. 0x0020: 29e3 01c9 f871 0000 0101 080a 000a e833)....q.........3 0x0030: 000f 8c75 6e65 7470 6572 6600 6e65 7470 ...unetperf.netp 0x0040: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp 0x0050: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp 0x0060: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp ... This also leads to awful throughput as GSO packets are dropped and cause retransmissions. The solution is to set the mac_len using the values already available in then new skb. We've already adjusted all of the header offset, so we might as well correctly figure out the mac_len using skb_reset_mac_len(). After this change, packets are segmented correctly and performance is restored. CC: Eric Dumazet <edumazet@google.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> [wt: open-code skb_mac_len() as 2.6.32 doesn't have it] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25macvlan: Initialize vlan_features to turn on offload support.Vlad Yasevich1-0/+1
[ Upstream commit 081e83a78db9b0ae1f5eabc2dedecc865f509b98 ] Macvlan devices do not initialize vlan_features. As a result, any vlan devices configured on top of macvlans perform very poorly. Initialize vlan_features based on the vlan features of the lower-level device. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25tcp: Fix integer-overflow in TCP vegasChristoph Paasch1-1/+2
[ Upstream commit 1f74e613ded11517db90b2bd57e9464d9e0fb161 ] In vegas we do a multiplication of the cwnd and the rtt. This may overflow and thus their result is stored in a u64. However, we first need to cast the cwnd so that actually 64-bit arithmetic is done. Then, we need to do do_div to allow this to be used on 32-bit arches. Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Doug Leith <doug.leith@nuim.ie> Fixes: 8d3a564da34e (tcp: tcp_vegas cong avoid fix) Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25tcp: Fix integer-overflows in TCP venoChristoph Paasch1-1/+1
[ Upstream commit 45a07695bc64b3ab5d6d2215f9677e5b8c05a7d0 ] In veno we do a multiplication of the cwnd and the rtt. This may overflow and thus their result is stored in a u64. However, we first need to cast the cwnd so that actually 64-bit arithmetic is done. A first attempt at fixing 76f1017757aa0 ([TCP]: TCP Veno congestion control) was made by 159131149c2 (tcp: Overflow bug in Vegas), but it failed to add the required cast in tcp_veno_cong_avoid(). Fixes: 76f1017757aa0 ([TCP]: TCP Veno congestion control) Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25net: sendmsg: fix NULL pointer dereferenceAndrey Ryabinin2-7/+8
[ Upstream commit 40eea803c6b2cfaab092f053248cbeab3f368412 ] Sasha's report: > While fuzzing with trinity inside a KVM tools guest running the latest -next > kernel with the KASAN patchset, I've stumbled on the following spew: > > [ 4448.949424] ================================================================== > [ 4448.951737] AddressSanitizer: user-memory-access on address 0 > [ 4448.952988] Read of size 2 by thread T19638: > [ 4448.954510] CPU: 28 PID: 19638 Comm: trinity-c76 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813 > [ 4448.956823] ffff88046d86ca40 0000000000000000 ffff880082f37e78 ffff880082f37a40 > [ 4448.958233] ffffffffb6e47068 ffff880082f37a68 ffff880082f37a58 ffffffffb242708d > [ 4448.959552] 0000000000000000 ffff880082f37a88 ffffffffb24255b1 0000000000000000 > [ 4448.961266] Call Trace: > [ 4448.963158] dump_stack (lib/dump_stack.c:52) > [ 4448.964244] kasan_report_user_access (mm/kasan/report.c:184) > [ 4448.965507] __asan_load2 (mm/kasan/kasan.c:352) > [ 4448.966482] ? netlink_sendmsg (net/netlink/af_netlink.c:2339) > [ 4448.967541] netlink_sendmsg (net/netlink/af_netlink.c:2339) > [ 4448.968537] ? get_parent_ip (kernel/sched/core.c:2555) > [ 4448.970103] sock_sendmsg (net/socket.c:654) > [ 4448.971584] ? might_fault (mm/memory.c:3741) > [ 4448.972526] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3740) > [ 4448.973596] ? verify_iovec (net/core/iovec.c:64) > [ 4448.974522] ___sys_sendmsg (net/socket.c:2096) > [ 4448.975797] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254) > [ 4448.977030] ? lock_release_holdtime (kernel/locking/lockdep.c:273) > [ 4448.978197] ? lock_release_non_nested (kernel/locking/lockdep.c:3434 (discriminator 1)) > [ 4448.979346] ? check_chain_key (kernel/locking/lockdep.c:2188) > [ 4448.980535] __sys_sendmmsg (net/socket.c:2181) > [ 4448.981592] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600) > [ 4448.982773] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607) > [ 4448.984458] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1500 (discriminator 2)) > [ 4448.985621] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600) > [ 4448.986754] SyS_sendmmsg (net/socket.c:2201) > [ 4448.987708] tracesys (arch/x86/kernel/entry_64.S:542) > [ 4448.988929] ================================================================== This reports means that we've come to netlink_sendmsg() with msg->msg_name == NULL and msg->msg_namelen > 0. After this report there was no usual "Unable to handle kernel NULL pointer dereference" and this gave me a clue that address 0 is mapped and contains valid socket address structure in it. This bug was introduced in f3d3342602f8bcbf37d7c46641cb9bca7618eb1c (net: rework recvmsg handler msg_name and msg_namelen logic). Commit message states that: "Set msg->msg_name = NULL if user specified a NULL in msg_name but had a non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't affect sendto as it would bail out earlier while trying to copy-in the address." But in fact this affects sendto when address 0 is mapped and contains socket address structure in it. In such case copy-in address will succeed, verify_iovec() function will successfully exit with msg->msg_namelen > 0 and msg->msg_name == NULL. This patch fixes it by setting msg_namelen to 0 if msg_name == NULL. Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Eric Dumazet <edumazet@google.com> Cc: <stable@vger.kernel.org> Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25net: sctp: inherit auth_capable on INIT collisionsDaniel Borkmann1-0/+1
[ Upstream commit 1be9a950c646c9092fb3618197f7b6bfb50e82aa ] Jason reported an oops caused by SCTP on his ARM machine with SCTP authentication enabled: Internal error: Oops: 17 [#1] ARM CPU: 0 PID: 104 Comm: sctp-test Not tainted 3.13.0-68744-g3632f30c9b20-dirty #1 task: c6eefa40 ti: c6f52000 task.ti: c6f52000 PC is at sctp_auth_calculate_hmac+0xc4/0x10c LR is at sg_init_table+0x20/0x38 pc : [<c024bb80>] lr : [<c00f32dc>] psr: 40000013 sp : c6f538e8 ip : 00000000 fp : c6f53924 r10: c6f50d80 r9 : 00000000 r8 : 00010000 r7 : 00000000 r6 : c7be4000 r5 : 00000000 r4 : c6f56254 r3 : c00c8170 r2 : 00000001 r1 : 00000008 r0 : c6f1e660 Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 0005397f Table: 06f28000 DAC: 00000015 Process sctp-test (pid: 104, stack limit = 0xc6f521c0) Stack: (0xc6f538e8 to 0xc6f54000) [...] Backtrace: [<c024babc>] (sctp_auth_calculate_hmac+0x0/0x10c) from [<c0249af8>] (sctp_packet_transmit+0x33c/0x5c8) [<c02497bc>] (sctp_packet_transmit+0x0/0x5c8) from [<c023e96c>] (sctp_outq_flush+0x7fc/0x844) [<c023e170>] (sctp_outq_flush+0x0/0x844) from [<c023ef78>] (sctp_outq_uncork+0x24/0x28) [<c023ef54>] (sctp_outq_uncork+0x0/0x28) from [<c0234364>] (sctp_side_effects+0x1134/0x1220) [<c0233230>] (sctp_side_effects+0x0/0x1220) from [<c02330b0>] (sctp_do_sm+0xac/0xd4) [<c0233004>] (sctp_do_sm+0x0/0xd4) from [<c023675c>] (sctp_assoc_bh_rcv+0x118/0x160) [<c0236644>] (sctp_assoc_bh_rcv+0x0/0x160) from [<c023d5bc>] (sctp_inq_push+0x6c/0x74) [<c023d550>] (sctp_inq_push+0x0/0x74) from [<c024a6b0>] (sctp_rcv+0x7d8/0x888) While we already had various kind of bugs in that area ec0223ec48a9 ("net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH capable") and b14878ccb7fa ("net: sctp: cache auth_enable per endpoint"), this one is a bit of a different kind. Giving a bit more background on why SCTP authentication is needed can be found in RFC4895: SCTP uses 32-bit verification tags to protect itself against blind attackers. These values are not changed during the lifetime of an SCTP association. Looking at new SCTP extensions, there is the need to have a method of proving that an SCTP chunk(s) was really sent by the original peer that started the association and not by a malicious attacker. To cause this bug, we're triggering an INIT collision between peers; normal SCTP handshake where both sides intent to authenticate packets contains RANDOM; CHUNKS; HMAC-ALGO parameters that are being negotiated among peers: ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------> <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------- -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- RFC4895 says that each endpoint therefore knows its own random number and the peer's random number *after* the association has been established. The local and peer's random number along with the shared key are then part of the secret used for calculating the HMAC in the AUTH chunk. Now, in our scenario, we have 2 threads with 1 non-blocking SEQ_PACKET socket each, setting up common shared SCTP_AUTH_KEY and SCTP_AUTH_ACTIVE_KEY properly, and each of them calling sctp_bindx(3), listen(2) and connect(2) against each other, thus the handshake looks similar to this, e.g.: ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------> <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------- <--------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------- -------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------> ... Since such collisions can also happen with verification tags, the RFC4895 for AUTH rather vaguely says under section 6.1: In case of INIT collision, the rules governing the handling of this Random Number follow the same pattern as those for the Verification Tag, as explained in Section 5.2.4 of RFC 2960 [5]. Therefore, each endpoint knows its own Random Number and the peer's Random Number after the association has been established. In RFC2960, section 5.2.4, we're eventually hitting Action B: B) In this case, both sides may be attempting to start an association at about the same time but the peer endpoint started its INIT after responding to the local endpoint's INIT. Thus it may have picked a new Verification Tag not being aware of the previous Tag it had sent this endpoint. The endpoint should stay in or enter the ESTABLISHED state but it MUST update its peer's Verification Tag from the State Cookie, stop any init or cookie timers that may running and send a COOKIE ACK. In other words, the handling of the Random parameter is the same as behavior for the Verification Tag as described in Action B of section 5.2.4. Looking at the code, we exactly hit the sctp_sf_do_dupcook_b() case which triggers an SCTP_CMD_UPDATE_ASSOC command to the side effect interpreter, and in fact it properly copies over peer_{random, hmacs, chunks} parameters from the newly created association to update the existing one. Also, the old asoc_shared_key is being released and based on the new params, sctp_auth_asoc_init_active_key() updated. However, the issue observed in this case is that the previous asoc->peer.auth_capable was 0, and has *not* been updated, so that instead of creating a new secret, we're doing an early return from the function sctp_auth_asoc_init_active_key() leaving asoc->asoc_shared_key as NULL. However, we now have to authenticate chunks from the updated chunk list (e.g. COOKIE-ACK). That in fact causes the server side when responding with ... <------------------ AUTH; COOKIE-ACK ----------------- ... to trigger a NULL pointer dereference, since in sctp_packet_transmit(), it discovers that an AUTH chunk is being queued for xmit, and thus it calls sctp_auth_calculate_hmac(). Since the asoc->active_key_id is still inherited from the endpoint, and the same as encoded into the chunk, it uses asoc->asoc_shared_key, which is still NULL, as an asoc_key and dereferences it in ... crypto_hash_setkey(desc.tfm, &asoc_key->data[0], asoc_key->len) ... causing an oops. All this happens because sctp_make_cookie_ack() called with the *new* association has the peer.auth_capable=1 and therefore marks the chunk with auth=1 after checking sctp_auth_send_cid(), but it is *actually* sent later on over the then *updated* association's transport that didn't initialize its shared key due to peer.auth_capable=0. Since control chunks in that case are not sent by the temporary association which are scheduled for deletion, they are issued for xmit via SCTP_CMD_REPLY in the interpreter with the context of the *updated* association. peer.auth_capable was 0 in the updated association (which went from COOKIE_WAIT into ESTABLISHED state), since all previous processing that performed sctp_process_init() was being done on temporary associations, that we eventually throw away each time. The correct fix is to update to the new peer.auth_capable value as well in the collision case via sctp_assoc_update(), so that in case the collision migrated from 0 -> 1, sctp_auth_asoc_init_active_key() can properly recalculate the secret. This therefore fixes the observed server panic. Fixes: 730fc3d05cd4 ("[SCTP]: Implete SCTP-AUTH parameter processing") Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25ipv4: fix buffer overflow in ip_options_compile()Eric Dumazet1-0/+4
[ Upstream commit 10ec9472f05b45c94db3c854d22581a20b97db41 ] There is a benign buffer overflow in ip_options_compile spotted by AddressSanitizer[1] : Its benign because we always can access one extra byte in skb->head (because header is followed by struct skb_shared_info), and in this case this byte is not even used. [28504.910798] ================================================================== [28504.912046] AddressSanitizer: heap-buffer-overflow in ip_options_compile [28504.913170] Read of size 1 by thread T15843: [28504.914026] [<ffffffff81802f91>] ip_options_compile+0x121/0x9c0 [28504.915394] [<ffffffff81804a0d>] ip_options_get_from_user+0xad/0x120 [28504.916843] [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630 [28504.918175] [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0 [28504.919490] [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90 [28504.920835] [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70 [28504.922208] [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140 [28504.923459] [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b [28504.924722] [28504.925106] Allocated by thread T15843: [28504.925815] [<ffffffff81804995>] ip_options_get_from_user+0x35/0x120 [28504.926884] [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630 [28504.927975] [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0 [28504.929175] [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90 [28504.930400] [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70 [28504.931677] [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140 [28504.932851] [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b [28504.934018] [28504.934377] The buggy address ffff880026382828 is located 0 bytes to the right [28504.934377] of 40-byte region [ffff880026382800, ffff880026382828) [28504.937144] [28504.937474] Memory state around the buggy address: [28504.938430] ffff880026382300: ........ rrrrrrrr rrrrrrrr rrrrrrrr [28504.939884] ffff880026382400: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.941294] ffff880026382500: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr [28504.942504] ffff880026382600: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.943483] ffff880026382700: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.944511] >ffff880026382800: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr [28504.945573] ^ [28504.946277] ffff880026382900: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.094949] ffff880026382a00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.096114] ffff880026382b00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.097116] ffff880026382c00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.098472] ffff880026382d00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.099804] Legend: [28505.100269] f - 8 freed bytes [28505.100884] r - 8 redzone bytes [28505.101649] . - 8 allocated bytes [28505.102406] x=1..7 - x allocated bytes + (8-x) redzone bytes [28505.103637] ================================================================== [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25sunvnet: clean up objects created in vnet_new() on vnet_exit()Sowmini Varadhan1-1/+19
[ Upstream commit a4b70a07ed12a71131cab7adce2ce91c71b37060 ] Nothing cleans up the objects created by vnet_new(), they are completely leaked. vnet_exit(), after doing the vio_unregister_driver() to clean up ports, should call a helper function that iterates over vnet_list and cleans up those objects. This includes unregister_netdevice() as well as free_netdev(). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: Karl Volz <karl.volz@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25net: sctp: fix information leaks in ulpevent layerDaniel Borkmann1-107/+15
[ Upstream commit 8f2e5ae40ec193bc0a0ed99e95315c3eebca84ea ] While working on some other SCTP code, I noticed that some structures shared with user space are leaking uninitialized stack or heap buffer. In particular, struct sctp_sndrcvinfo has a 2 bytes hole between .sinfo_flags and .sinfo_ppid that remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when putting this into cmsg. But also struct sctp_remote_error contains a 2 bytes hole that we don't fill but place into a skb through skb_copy_expand() via sctp_ulpevent_make_remote_error(). Both structures are defined by the IETF in RFC6458: * Section 5.3.2. SCTP Header Information Structure: The sctp_sndrcvinfo structure is defined below: struct sctp_sndrcvinfo { uint16_t sinfo_stream; uint16_t sinfo_ssn; uint16_t sinfo_flags; <-- 2 bytes hole --> uint32_t sinfo_ppid; uint32_t sinfo_context; uint32_t sinfo_timetolive; uint32_t sinfo_tsn; uint32_t sinfo_cumtsn; sctp_assoc_t sinfo_assoc_id; }; * 6.1.3. SCTP_REMOTE_ERROR: A remote peer may send an Operation Error message to its peer. This message indicates a variety of error conditions on an association. The entire ERROR chunk as it appears on the wire is included in an SCTP_REMOTE_ERROR event. Please refer to the SCTP specification [RFC4960] and any extensions for a list of possible error formats. An SCTP error notification has the following format: struct sctp_remote_error { uint16_t sre_type; uint16_t sre_flags; uint32_t sre_length; uint16_t sre_error; <-- 2 bytes hole --> sctp_assoc_t sre_assoc_id; uint8_t sre_data[]; }; Fix this by setting both to 0 before filling them out. We also have other structures shared between user and kernel space in SCTP that contains holes (e.g. struct sctp_paddrthlds), but we copy that buffer over from user space first and thus don't need to care about it in that cases. While at it, we can also remove lengthy comments copied from the draft, instead, we update the comment with the correct RFC number where one can look it up. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25appletalk: Fix socket referencing in skbAndrey Utkin1-3/+0
[ Upstream commit 36beddc272c111689f3042bf3d10a64d8a805f93 ] Setting just skb->sk without taking its reference and setting a destructor is invalid. However, in the places where this was done, skb is used in a way not requiring skb->sk setting. So dropping the setting of skb->sk. Thanks to Eric Dumazet <eric.dumazet@gmail.com> for correct solution. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79441 Reported-by: Ed Martin <edman007@edman007.com> Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25igmp: fix the problem when mc leave groupdingtianhong1-4/+6
[ Upstream commit 52ad353a5344f1f700c5b777175bdfa41d3cd65a ] The problem was triggered by these steps: 1) create socket, bind and then setsockopt for add mc group. mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37"); mreq.imr_interface.s_addr = inet_addr("192.168.1.2"); setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)); 2) drop the mc group for this socket. mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37"); mreq.imr_interface.s_addr = inet_addr("0.0.0.0"); setsockopt(sockfd, IPPROTO_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq)); 3) and then drop the socket, I found the mc group was still used by the dev: netstat -g Interface RefCnt Group --------------- ------ --------------------- eth2 1 255.0.0.37 Normally even though the IP_DROP_MEMBERSHIP return error, the mc group still need to be released for the netdev when drop the socket, but this process was broken when route default is NULL, the reason is that: The ip_mc_leave_group() will choose the in_dev by the imr_interface.s_addr, if input addr is NULL, the default route dev will be chosen, then the ifindex is got from the dev, then polling the inet->mc_list and return -ENODEV, but if the default route dev is NULL, the in_dev and ifIndex is both NULL, when polling the inet->mc_list, the mc group will be released from the mc_list, but the dev didn't dec the refcnt for this mc group, so when dropping the socket, the mc_list is NULL and the dev still keep this group. v1->v2: According Hideaki's suggestion, we should align with IPv6 (RFC3493) and BSDs, so I add the checking for the in_dev before polling the mc_list, make sure when we remove the mc group, dec the refcnt to the real dev which was using the mc address. The problem would never happened again. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skbNeal Cardwell1-1/+1
[ Upstream commit 2cd0d743b05e87445c54ca124a9916f22f16742e ] If there is an MSS change (or misbehaving receiver) that causes a SACK to arrive that covers the end of an skb but is less than one MSS, then tcp_match_skb_to_sack() was rounding up pkt_len to the full length of the skb ("Round if necessary..."), then chopping all bytes off the skb and creating a zero-byte skb in the write queue. This was visible now because the recently simplified TLP logic in bef1909ee3ed1c ("tcp: fixing TLP's FIN recovery") could find that 0-byte skb at the end of the write queue, and now that we do not check that skb's length we could send it as a TLP probe. Consider the following example scenario: mss: 1000 skb: seq: 0 end_seq: 4000 len: 4000 SACK: start_seq: 3999 end_seq: 4000 The tcp_match_skb_to_sack() code will compute: in_sack = false pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999 new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000 new_len += mss = 4000 Previously we would find the new_len > skb->len check failing, so we would fall through and set pkt_len = new_len = 4000 and chop off pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment afterward in the write queue. With this new commit, we notice that the new new_len >= skb->len check succeeds, so that we return without trying to fragment. Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25sym53c8xx_2: Set DID_REQUEUE return code when aborting squeueMikulas Patocka1-0/+4
Hi This is backport of commit fd1232b214af43a973443aec6a2808f16ee5bf70. It is suitable for all stable branches up to and including 3.14.* Mikulas commit fd1232b214af43a973443aec6a2808f16ee5bf70 Author: Mikulas Patocka <mpatocka@redhat.com> Date: Tue Apr 8 21:52:05 2014 -0400 sym53c8xx_2: Set DID_REQUEUE return code when aborting squeue This patch fixes I/O errors with the sym53c8xx_2 driver when the disk returns QUEUE FULL status. When the controller encounters an error (including QUEUE FULL or BUSY status), it aborts all not yet submitted requests in the function sym_dequeue_from_squeue. This function aborts them with DID_SOFT_ERROR. If the disk has full tag queue, the request that caused the overflow is aborted with QUEUE FULL status (and the scsi midlayer properly retries it until it is accepted by the disk), but the sym53c8xx_2 driver aborts the following requests with DID_SOFT_ERROR --- for them, the midlayer does just a few retries and then signals the error up to sd. The result is that disk returning QUEUE FULL causes request failures. The error was reproduced on 53c895 with COMPAQ BD03685A24 disk (rebranded ST336607LC) with command queue 48 or 64 tags. The disk has 64 tags, but under some access patterns it return QUEUE FULL when there are less than 64 pending tags. The SCSI specification allows returning QUEUE FULL anytime and it is up to the host to retry. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25ptrace,x86: force IRET path after a ptrace_stop()Tejun Heo2-0/+19
[ Upstream commit b9cd18de4db3c9ffa7e17b0dc0ca99ed5aa4d43a ] The 'sysret' fastpath does not correctly restore even all regular registers, much less any segment registers or reflags values. That is very much part of why it's faster than 'iret'. Normally that isn't a problem, because the normal ptrace() interface catches the process using the signal handler infrastructure, which always returns with an iret. However, some paths can get caught using ptrace_event() instead of the signal path, and for those we need to make sure that we aren't going to return to user space using 'sysret'. Otherwise the modifications that may have been done to the register set by the tracer wouldn't necessarily take effect. Fix it by forcing IRET path by setting TIF_NOTIFY_RESUME from arch_ptrace_stop_needed() which is invoked from ptrace_stop(). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Oleg Nesterov <oleg@redhat.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [wt: fixes CVE-2014-4699] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25ALSA: control: Protect user controls against concurrent accessLars-Peter Clausen3-6/+28
[ Upstream commit 07f4d9d74a04aa7c72c5dae0ef97565f28f17b92 ] The user-control put and get handlers as well as the tlv do not protect against concurrent access from multiple threads. Since the state of the control is not updated atomically it is possible that either two write operations or a write and a read operation race against each other. Both can lead to arbitrary memory disclosure. This patch introduces a new lock that protects user-controls from concurrent access. Since applications typically access controls sequentially than in parallel a single lock per card should be fine. Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Acked-by: Jaroslav Kysela <perex@perex.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> [wt: fixes CVE-2014-4652] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25filter: prevent nla extensions to peek beyond the end of the messageMathias Krause1-1/+7
[ Upstream commit 05ab8f2647e4221cbdb3856dd7d32bd5407316b3 ] The BPF_S_ANC_NLATTR and BPF_S_ANC_NLATTR_NEST extensions fail to check for a minimal message length before testing the supplied offset to be within the bounds of the message. This allows the subtraction of the nla header to underflow and therefore -- as the data type is unsigned -- allowing far to big offset and length values for the search of the netlink attribute. The remainder calculation for the BPF_S_ANC_NLATTR_NEST extension is also wrong. It has the minuend and subtrahend mixed up, therefore calculates a huge length value, allowing to overrun the end of the message while looking for the netlink attribute. The following three BPF snippets will trigger the bugs when attached to a UNIX datagram socket and parsing a message with length 1, 2 or 3. ,-[ PoC for missing size check in BPF_S_ANC_NLATTR ]-- | ld #0x87654321 | ldx #42 | ld #nla | ret a `--- ,-[ PoC for the same bug in BPF_S_ANC_NLATTR_NEST ]-- | ld #0x87654321 | ldx #42 | ld #nlan | ret a `--- ,-[ PoC for wrong remainder calculation in BPF_S_ANC_NLATTR_NEST ]-- | ; (needs a fake netlink header at offset 0) | ld #0 | ldx #42 | ld #nlan | ret a `--- Fix the first issue by ensuring the message length fulfills the minimal size constrains of a nla header. Fix the second bug by getting the math for the remainder calculation right. Fixes: 4738c1db15 ("[SKFILTER]: Add SKF_ADF_NLATTR instruction") Fixes: d214c7537b ("filter: add SKF_AD_NLATTR_NEST to look for nested..") Cc: Patrick McHardy <kaber@trash.net> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Mathias Krause <minipli@googlemail.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> [geissert: backport to 2.6.32] [wt: fixes CVE-2014-3144] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25mm: try_to_unmap_cluster() should lock_page() before mlockingVlastimil Babka2-2/+14
commit 68f662b89de39aae5f43ecd3fbc6ea120d0a47cf upstream A BUG_ON(!PageLocked) was triggered in mlock_vma_page() by Sasha Levin fuzzing with trinity. The call site try_to_unmap_cluster() does not lock the pages other than its check_page parameter (which is already locked). The BUG_ON in mlock_vma_page() is not documented and its purpose is somewhat unclear, but apparently it serializes against page migration, which could otherwise fail to transfer the PG_mlocked flag. This would not be fatal, as the page would be eventually encountered again, but NR_MLOCK accounting would become distorted nevertheless. This patch adds a comment to the BUG_ON in mlock_vma_page() and munlock_vma_page() to that effect. The call site try_to_unmap_cluster() is fixed so that for page != check_page, trylock_page() is attempted (to avoid possible deadlocks as we already have check_page locked) and mlock_vma_page() is performed only upon success. If the page lock cannot be obtained, the page is left without PG_mlocked, which is again not a problem in the whole unevictable memory design. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Bob Liu <bob.liu@oracle.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Michel Lespinasse <walken@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (back ported from commit 57e68e9cd65b4b8eb4045a1e0d0746458502554c) CVE-2014-3122 BugLink: http://bugs.launchpad.net/bugs/1316268 Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Acked-by: Andy Whitcroft <andy.whitcroft@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25sctp: Fix sk_ack_backlog wrap-around problemXufeng Zhang1-1/+1
Consider the scenario: For a TCP-style socket, while processing the COOKIE_ECHO chunk in sctp_sf_do_5_1D_ce(), after it has passed a series of sanity check, a new association would be created in sctp_unpack_cookie(), but afterwards, some processing maybe failed, and sctp_association_free() will be called to free the previously allocated association, in sctp_association_free(), sk_ack_backlog value is decremented for this socket, since the initial value for sk_ack_backlog is 0, after the decrement, it will be 65535, a wrap-around problem happens, and if we want to establish new associations afterward in the same socket, ABORT would be triggered since sctp deem the accept queue as full. Fix this issue by only decrementing sk_ack_backlog for associations in the endpoint's list. Fix-suggested-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit d3217b15a19a4779c39b212358a5c71d725822ee) [wt: fixes CVE-2014-4667] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25ALSA: control: Handle numid overflowLars-Peter Clausen1-0/+4
Each control gets automatically assigned its numids when the control is created. The allocation is done by incrementing the numid by the amount of allocated numids per allocation. This means that excessive creation and destruction of controls (e.g. via SNDRV_CTL_IOCTL_ELEM_ADD/REMOVE) can cause the id to eventually overflow. Currently when this happens for the control that caused the overflow kctl->id.numid + kctl->count will also over flow causing it to be smaller than kctl->id.numid. Most of the code assumes that this is something that can not happen, so we need to make sure that it won't happen Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Acked-by: Jaroslav Kysela <perex@perex.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> (cherry picked from commit ac902c112d90a89e59916f751c2745f4dbdbb4bd) [wt: part 2 of CVE-2014-4656] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25ALSA: control: Make sure that id->index does not overflowLars-Peter Clausen1-0/+3
The ALSA control code expects that the range of assigned indices to a control is continuous and does not overflow. Currently there are no checks to enforce this. If a control with a overflowing index range is created that control becomes effectively inaccessible and unremovable since snd_ctl_find_id() will not be able to find it. This patch adds a check that makes sure that controls with a overflowing index range can not be created. Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Acked-by: Jaroslav Kysela <perex@perex.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> (cherry picked from commit 883a1d49f0d77d30012f114b2e19fc141beb3e8e) [wt: part 1 of CVE-2014-4656 fix] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25fix autofs/afs/etc. magic mountpoint breakageAl Viro2-1/+1
We end up trying to kfree() nd.last.name on open("/mnt/tmp", O_CREAT) if /mnt/tmp is an autofs direct mount. The reason is that nd.last_type is bogus here; we want LAST_BIND for everything of that kind and we get LAST_NORM left over from finding parent directory. So make sure that it *is* set properly; set to LAST_BIND before doing ->follow_link() - for normal symlinks it will be changed by __vfs_follow_link() and everything else needs it set that way. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> (cherry picked from commit 86acdca1b63e6890540fa19495cfc708beff3d8b) [wt: fixes CVE-2014-0203] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25MIPS: asm: thread_info: Add _TIF_SECCOMP flagMarkos Chandras1-1/+2
Add _TIF_SECCOMP flag to _TIF_WORK_SYSCALL_ENTRY to indicate that the system call needs to be checked against a seccomp filter. Signed-off-by: Markos Chandras <markos.chandras@imgtec.com> Reviewed-by: Paul Burton <paul.burton@imgtec.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/6405/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> (cherry picked from commit 137f7df8cead00688524c82360930845396b8a21) [wt: fixes CVE-2014-4157 - no _TIF_NOHZ nor _TIF_SYSCALL_TRACEPOINT in 2.6.32] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25MIPS: Cleanup flags in syscall flags handlers.Ralf Baechle5-4/+6
This will simplify further modifications. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> (cherry picked from commit e7f3b48af7be9f8007a224663a5b91340626fed5) [wt: this patch is only there to ease integration of next one] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25x86_32, entry: Clean up sysenter_badsys declarationStefan Bader1-1/+1
commit 554086d85e "x86_32, entry: Do syscall exit work on badsys (CVE-2014-4508)" introduced a new jump label (sysenter_badsys) but somehow the END statements seem to have gone wrong (at least it feels that way to me). This does not seem to be a fatal problem, but just for the sake of symmetry, change the second syscall_badsys to sysenter_badsys. Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Link: http://lkml.kernel.org/r/1408093066-31021-1-git-send-email-stefan.bader@canonical.com Acked-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> (cherry picked from commit fb21b84e7f809ef04b1e5aed5d463cf0d4866638) Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-11-25x86_32, entry: Store badsys error code in %eaxSven Wegener1-4/+5
Commit 554086d ("x86_32, entry: Do syscall exit work on badsys (CVE-2014-4508)") introduced a regression in the x86_32 syscall entry code, resulting in syscall() not returning proper errors for undefined syscalls on CPUs supporting the sysenter feature. The following code: > int result = syscall(666); > printf("result=%d errno=%d error=%s\n", result, errno, strerror(errno)); results in: > result=666 errno=0 error=Success Obviously, the syscall return value is the called syscall number, but it should have been an ENOSYS error. When run under ptrace it behaves correctly, which makes it hard to debug in the wild: > result=-1 errno=38 error=Function not implemented The %eax register is the return value register. For debugging via ptrace the syscall entry code stores the complete register context on the stack. The badsys handlers only store the ENOSYS error code in the ptrace register set and do not set %eax like a regular syscall handler would. The old resume_userspace call chain contains code that clobbers %eax and it restores %eax from the ptrace registers afterwards. The same goes for the ptrace-enabled call chain. When ptrace is not used, the syscall return value is the passed-in syscall number from the untouched %eax register. Use %eax as the return value register in syscall_badsys and sysenter_badsys, like a real syscall handler does, and have the caller push the value onto the stack for ptrace access. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Link: http://lkml.kernel.org/r/alpine.LNX.2.11.1407221022380.31021@titan.int.lan.stealer.net Reviewed-and-tested-by: Andy Lutomirski <luto@amacapital.net> Cc: <stable@vger.kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> (cherry picked from commit 8142b215501f8b291a108a202b3a053a265b03dd) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-11-25x86_32, entry: Do syscall exit work on badsys (CVE-2014-4508)Andy Lutomirski1-2/+8
The bad syscall nr paths are their own incomprehensible route through the entry control flow. Rearrange them to work just like syscalls that return -ENOSYS. This fixes an OOPS in the audit code when fast-path auditing is enabled and sysenter gets a bad syscall nr (CVE-2014-4508). This has probably been broken since Linux 2.6.27: af0575bba0 i386 syscall audit fast-path Cc: stable@vger.kernel.org Cc: Roland McGrath <roland@redhat.com> Reported-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/e09c499eade6fc321266dd6b54da7beb28d6991c.1403558229.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> (cherry picked from commit 554086d85e71f30abe46fc014fea31929a7c6a8a) [WT: this fix is incorrect and requires the two following patches] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-06-26Linux 2.6.32.63v2.6.32.63.26Willy Tarreau1-1/+1
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-06-26net: fix regression introduced in 2.6.32.62 by sysctl fixesWilly Tarreau2-2/+4
Commits b7c9e4ee1 ("sysctl net: Keep tcp_syn_retries inside the boundary") and eedcafdc ("net: check net.core.somaxconn sysctl values") were missing a .strategy entry which is still required in 2.6.32. Because of this, the Ubuntu kernel team has faced kernel dumps during their testing. Tyler Hicks and Luis Henriques proposed this patch to fix the issue, which properly sets .strategy as needed in 2.6.32. Reported-by: Luis Henriques <luis.henriques@canonical.com> Cc: tyler.hicks@canonical.com Cc: Michal Tesar <mtesar@redhat.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-06-26auditsc: audit_krule mask accesses need bounds checkingAndy Lutomirski1-7/+18
Fixes an easy DoS and possible information disclosure. This does nothing about the broken state of x32 auditing. eparis: If the admin has enabled auditd and has specifically loaded audit rules. This bug has been around since before git. Wow... Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit a3c54931199565930d6d84f4c3456f6440aefd41) [wt: no audit_filter_inode_name(), applied to audit_filter_inodes() instead] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-06-26futex: Make lookup_pi_state more robustThomas Gleixner1-28/+106
The current implementation of lookup_pi_state has ambigous handling of the TID value 0 in the user space futex. We can get into the kernel even if the TID value is 0, because either there is a stale waiters bit or the owner died bit is set or we are called from the requeue_pi path or from user space just for fun. The current code avoids an explicit sanity check for pid = 0 in case that kernel internal state (waiters) are found for the user space address. This can lead to state leakage and worse under some circumstances. Handle the cases explicit: Waiter | pi_state | pi->owner | uTID | uODIED | ? [1] NULL | --- | --- | 0 | 0/1 | Valid [2] NULL | --- | --- | >0 | 0/1 | Valid [3] Found | NULL | -- | Any | 0/1 | Invalid [4] Found | Found | NULL | 0 | 1 | Valid [5] Found | Found | NULL | >0 | 1 | Invalid [6] Found | Found | task | 0 | 1 | Valid [7] Found | Found | NULL | Any | 0 | Invalid [8] Found | Found | task | ==taskTID | 0/1 | Valid [9] Found | Found | task | 0 | 0 | Invalid [10] Found | Found | task | !=taskTID | 0/1 | Invalid [1] Indicates that the kernel can acquire the futex atomically. We came came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit. [2] Valid, if TID does not belong to a kernel thread. If no matching thread is found then it indicates that the owner TID has died. [3] Invalid. The waiter is queued on a non PI futex [4] Valid state after exit_robust_list(), which sets the user space value to FUTEX_WAITERS | FUTEX_OWNER_DIED. [5] The user space value got manipulated between exit_robust_list() and exit_pi_state_list() [6] Valid state after exit_pi_state_list() which sets the new owner in the pi_state but cannot access the user space value. [7] pi_state->owner can only be NULL when the OWNER_DIED bit is set. [8] Owner and user space value match [9] There is no transient state which sets the user space TID to 0 except exit_robust_list(), but this is indicated by the FUTEX_OWNER_DIED bit. See [4] [10] There is no transient state which leaves owner and user space TID out of sync. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Will Drewry <wad@chromium.org> Cc: Darren Hart <dvhart@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 54a217887a7b658e2650c3feff22756ab80c7339) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-06-26futex: Always cleanup owner tid in unlock_piThomas Gleixner1-23/+19
If the owner died bit is set at futex_unlock_pi, we currently do not cleanup the user space futex. So the owner TID of the current owner (the unlocker) persists. That's observable inconsistant state, especially when the ownership of the pi state got transferred. Clean it up unconditionally. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Will Drewry <wad@chromium.org> Cc: Darren Hart <dvhart@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 13fbca4c6ecd96ec1a1cfa2e4f2ce191fe928a5e) [wt: adjusted context - cmpxchg_futex_value_locked() takes 3 args] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-06-26futex: Validate atomic acquisition in futex_lock_pi_atomic()Thomas Gleixner1-3/+11
We need to protect the atomic acquisition in the kernel against rogue user space which sets the user space futex to 0, so the kernel side acquisition succeeds while there is existing state in the kernel associated to the real owner. Verify whether the futex has waiters associated with kernel state. If it has, return -EINVAL. The state is corrupted already, so no point in cleaning it up. Subsequent calls will fail as well. Not our problem. [ tglx: Use futex_top_waiter() and explain why we do not need to try restoring the already corrupted user space state. ] Signed-off-by: Darren Hart <dvhart@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Will Drewry <wad@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit b3eaa9fc5cd0a4d74b18f6b8dc617aeaf1873270) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-06-26futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 ↵Thomas Gleixner1-0/+25
in futex_requeue(..., requeue_pi=1) If uaddr == uaddr2, then we have broken the rule of only requeueing from a non-pi futex to a pi futex with this call. If we attempt this, then dangling pointers may be left for rt_waiter resulting in an exploitable condition. This change brings futex_requeue() in line with futex_wait_requeue_pi() which performs the same check as per commit 6f7b0a2a5c0f ("futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()") [ tglx: Compare the resulting keys as well, as uaddrs might be different depending on the mapping ] Fixes CVE-2014-3153. Reported-by: Pinkie Pie Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit e9c243a5a6de0be8e584c604d353412584b592f8) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-06-26futex: Prevent attaching to kernel threadsThomas Gleixner1-0/+5
We happily allow userspace to declare a random kernel thread to be the owner of a user space PI futex. Found while analysing the fallout of Dave Jones syscall fuzzer. We also should validate the thread group for private futexes and find some fast way to validate whether the "alleged" owner has RW access on the file which backs the SHM, but that's a separate issue. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Jones <davej@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <darren@dvhart.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Clark Williams <williams@redhat.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: Carlos ODonell <carlos@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20140512201701.194824402@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org (cherry picked from commit f0d71b3dcb8332f7971b5f2363632573e6d9486a) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-06-26futex: Add another early deadlock detection checkThomas Gleixner1-14/+35
Dave Jones trinity syscall fuzzer exposed an issue in the deadlock detection code of rtmutex: http://lkml.kernel.org/r/20140429151655.GA14277@redhat.com That underlying issue has been fixed with a patch to the rtmutex code, but the futex code must not call into rtmutex in that case because - it can detect that issue early - it avoids a different and more complex fixup for backing out If the user space variable got manipulated to 0x80000000 which means no lock holder, but the waiters bit set and an active pi_state in the kernel is found we can figure out the recursive locking issue by looking at the pi_state owner. If that is the current task, then we can safely return -EDEADLK. The check should have been added in commit 59fa62451 (futex: Handle futex_pi OWNER_DIED take over correctly) already, but I did not see the above issue caused by user space manipulation back then. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Jones <davej@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <darren@dvhart.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Clark Williams <williams@redhat.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: Carlos ODonell <carlos@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20140512201701.097349971@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org (cherry picked from commit 866293ee54227584ffcb4a42f69c1f365974ba7f) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-06-26ethtool: Report link-down while interface is downBen Hutchings2-3/+18
While an interface is down, many implementations of ethtool_ops::get_link, including the default, ethtool_op_get_link(), will report the last link state seen while the interface was up. In general the current physical link state is not available if the interface is down. Define ETHTOOL_GLINK to reflect whether the interface *and* any physical port have a working link, and consistently return 0 when the interface is down. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit e596e6e4d578f2639416e620d367a3af34814a40) Cc: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20Linux 2.6.32.62v2.6.32.62.26drm-nextWilly Tarreau1-1/+1
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20floppy: don't write kernel-only members to FDRAWCMD ioctl outputMatthew Daley1-1/+6
Do not leak kernel-only floppy_raw_cmd structure members to userspace. This includes the linked-list pointer and the pointer to the allocated DMA space. Signed-off-by: Matthew Daley <mattd@bugfuzz.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 2145e15e0557a01b9195d1c7199a1b92cb9be81f) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20floppy: ignore kernel-only members in FDRAWCMD ioctl inputMatthew Daley1-1/+4
Always clear out these floppy_raw_cmd struct members after copying the entire structure from userspace so that the in-kernel version is always valid and never left in an interdeterminate state. Signed-off-by: Matthew Daley <mattd@bugfuzz.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit ef87dbe7614341c2e7bfe8d32fcb7028cc97442c) [wt: be careful in 2.6.32 we still have the ugly macros everywhere] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20netfilter: nf_conntrack_dccp: fix skb_header_pointer API usagesDaniel Borkmann1-3/+3
commit b22f5126a24b3b2f15448c3f2a254fc10cbc2b92 upstream Some occurences in the netfilter tree use skb_header_pointer() in the following way ... struct dccp_hdr _dh, *dh; ... skb_header_pointer(skb, dataoff, sizeof(_dh), &dh); ... where dh itself is a pointer that is being passed as the copy buffer. Instead, we need to use &_dh as the forth argument so that we're copying the data into an actual buffer that sits on the stack. Currently, we probably could overwrite memory on the stack (e.g. with a possibly mal-formed DCCP packet), but unintentionally, as we only want the buffer to be placed into _dh variable. Fixes: 2bc780499aa3 ("[NETFILTER]: nf_conntrack: add DCCP protocol support") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20s390: fix kernel crash due to linkage stack instructionsMartin Schwidefsky1-2/+5
commit 8d7f6690cedb83456edd41c9bd583783f0703bf0 upstream The kernel currently crashes with a low-address-protection exception if a user space process executes an instruction that tries to use the linkage stack. Set the base-ASTE origin and the subspace-ASTE origin of the dispatchable-unit-control-table to point to a dummy ASTE. Set up control register 15 to point to an empty linkage stack with no room left. A user space process with a linkage stack instruction will still crash but with a different exception which is correctly translated to a segmentation fault instead of a kernel oops. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20SELinux: Fix kernel BUG on empty security contexts.Stephen Smalley1-0/+4
commit 2172fa709ab32ca60e86179dc67d0857be8e2c98 upstream Setting an empty security context (length=0) on a file will lead to incorrectly dereferencing the type and other fields of the security context structure, yielding a kernel BUG. As a zero-length security context is never valid, just reject all such security contexts whether coming from userspace via setxattr or coming from the filesystem upon a getxattr request by SELinux. Setting a security context value (empty or otherwise) unknown to SELinux in the first place is only possible for a root process (CAP_MAC_ADMIN), and, if running SELinux in enforcing mode, only if the corresponding SELinux mac_admin permission is also granted to the domain by policy. In Fedora policies, this is only allowed for specific domains such as livecd for setting down security contexts that are not defined in the build host policy. Reproducer: su setenforce 0 touch foo setfattr -n security.selinux foo Caveat: Relabeling or removing foo after doing the above may not be possible without booting with SELinux disabled. Any subsequent access to foo after doing the above will also trigger the BUG. BUG output from Matthew Thode: [ 473.893141] ------------[ cut here ]------------ [ 473.962110] kernel BUG at security/selinux/ss/services.c:654! [ 473.995314] invalid opcode: 0000 [#6] SMP [ 474.027196] Modules linked in: [ 474.058118] CPU: 0 PID: 8138 Comm: ls Tainted: G D I 3.13.0-grsec #1 [ 474.116637] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0 07/29/10 [ 474.149768] task: ffff8805f50cd010 ti: ffff8805f50cd488 task.ti: ffff8805f50cd488 [ 474.183707] RIP: 0010:[<ffffffff814681c7>] [<ffffffff814681c7>] context_struct_compute_av+0xce/0x308 [ 474.219954] RSP: 0018:ffff8805c0ac3c38 EFLAGS: 00010246 [ 474.252253] RAX: 0000000000000000 RBX: ffff8805c0ac3d94 RCX: 0000000000000100 [ 474.287018] RDX: ffff8805e8aac000 RSI: 00000000ffffffff RDI: ffff8805e8aaa000 [ 474.321199] RBP: ffff8805c0ac3cb8 R08: 0000000000000010 R09: 0000000000000006 [ 474.357446] R10: 0000000000000000 R11: ffff8805c567a000 R12: 0000000000000006 [ 474.419191] R13: ffff8805c2b74e88 R14: 00000000000001da R15: 0000000000000000 [ 474.453816] FS: 00007f2e75220800(0000) GS:ffff88061fc00000(0000) knlGS:0000000000000000 [ 474.489254] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 474.522215] CR2: 00007f2e74716090 CR3: 00000005c085e000 CR4: 00000000000207f0 [ 474.556058] Stack: [ 474.584325] ffff8805c0ac3c98 ffffffff811b549b ffff8805c0ac3c98 ffff8805f1190a40 [ 474.618913] ffff8805a6202f08 ffff8805c2b74e88 00068800d0464990 ffff8805e8aac860 [ 474.653955] ffff8805c0ac3cb8 000700068113833a ffff880606c75060 ffff8805c0ac3d94 [ 474.690461] Call Trace: [ 474.723779] [<ffffffff811b549b>] ? lookup_fast+0x1cd/0x22a [ 474.778049] [<ffffffff81468824>] security_compute_av+0xf4/0x20b [ 474.811398] [<ffffffff8196f419>] avc_compute_av+0x2a/0x179 [ 474.843813] [<ffffffff8145727b>] avc_has_perm+0x45/0xf4 [ 474.875694] [<ffffffff81457d0e>] inode_has_perm+0x2a/0x31 [ 474.907370] [<ffffffff81457e76>] selinux_inode_getattr+0x3c/0x3e [ 474.938726] [<ffffffff81455cf6>] security_inode_getattr+0x1b/0x22 [ 474.970036] [<ffffffff811b057d>] vfs_getattr+0x19/0x2d [ 475.000618] [<ffffffff811b05e5>] vfs_fstatat+0x54/0x91 [ 475.030402] [<ffffffff811b063b>] vfs_lstat+0x19/0x1b [ 475.061097] [<ffffffff811b077e>] SyS_newlstat+0x15/0x30 [ 475.094595] [<ffffffff8113c5c1>] ? __audit_syscall_entry+0xa1/0xc3 [ 475.148405] [<ffffffff8197791e>] system_call_fastpath+0x16/0x1b [ 475.179201] Code: 00 48 85 c0 48 89 45 b8 75 02 0f 0b 48 8b 45 a0 48 8b 3d 45 d0 b6 00 8b 40 08 89 c6 ff ce e8 d1 b0 06 00 48 85 c0 49 89 c7 75 02 <0f> 0b 48 8b 45 b8 4c 8b 28 eb 1e 49 8d 7d 08 be 80 01 00 00 e8 [ 475.255884] RIP [<ffffffff814681c7>] context_struct_compute_av+0xce/0x308 [ 475.296120] RSP <ffff8805c0ac3c38> [ 475.328734] ---[ end trace f076482e9d754adc ]--- Reported-by: Matthew Thode <mthode@mthode.org> Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Cc: stable@vger.kernel.org Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20aacraid: missing capable() check in compat ioctlDan Carpenter1-0/+2
commit f856567b930dfcdbc3323261bf77240ccdde01f5 upstream In commit d496f94d22d1 ('[SCSI] aacraid: fix security weakness') we added a check on CAP_SYS_RAWIO to the ioctl. The compat ioctls need the check as well. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20xfs: underflow bug in xfs_attrlist_by_handle()Dan Carpenter2-2/+4
If we allocate less than sizeof(struct attrlist) then we end up corrupting memory or doing a ZERO_PTR_SIZE dereference. This can only be triggered with CAP_SYS_ADMIN. Reported-by: Nico Golde <nico@ngolde.de> Reported-by: Fabian Yamaguchi <fabs@goesec.de> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit 071c529eb672648ee8ca3f90944bcbcc730b4c06) [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20qeth: avoid buffer overflow in snmp ioctlUrsula Braun1-1/+5
commit 6fb392b1a63ae36c31f62bc3fc8630b49d602b62 upstream Check user-defined length in snmp ioctl request and allow request only if it fits into a qeth command buffer. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Reviewed-by: Heiko Carstens <heicars2@linux.vnet.ibm.com> Reported-by: Nico Golde <nico@ngolde.de> Reported-by: Fabian Yamaguchi <fabs@goesec.de> Cc: <stable@vger.kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> [jmm: backport 2.6.32] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20KVM: x86: Fix potential divide by 0 in lapic (CVE-2013-6367)Andy Honig1-1/+2
commit b963a22e6d1a266a67e9eecc88134713fd54775c upstream Under guest controllable circumstances apic_get_tmcct will execute a divide by zero and cause a crash. If the guest cpuid support tsc deadline timers and performs the following sequence of requests the host will crash. - Set the mode to periodic - Set the TMICT to 0 - Set the mode bits to 11 (neither periodic, nor one shot, nor tsc deadline) - Set the TMICT to non-zero. Then the lapic_timer.period will be 0, but the TMICT will not be. If the guest then reads from the TMCCT then the host will perform a divide by 0. This patch ensures that if the lapic_timer.period is 0, then the division does not occur. Reported-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20KVM: Improve create VCPU parameter (CVE-2013-4587)Andy Honig1-0/+3
commit 338c7dbadd2671189cec7faf64c84d01071b3f96 upstream In multiple functions the vcpu_id is used as an offset into a bitfield. Ag malicious user could specify a vcpu_id greater than 255 in order to set or clear bits in kernel memory. This could be used to elevate priveges in the kernel. This patch verifies that the vcpu_id provided is less than 255. The api documentation already specifies that the vcpu_id must be less than max_vcpus, but this is currently not checked. Reported-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20uml: check length in exitcode_proc_write()Dan Carpenter1-1/+3
commit 201f99f170df14ba52ea4c52847779042b7a623b upstream We don't cap the size of buffer from the user so we could write past the end of the array here. Only root can write to this file. Reported-by: Nico Golde <nico@ngolde.de> Reported-by: Fabian Yamaguchi <fabs@goesec.de> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20crypto: ansi_cprng - Fix off by one error in non-block size requestNeil Horman1-2/+2
commit 714b33d15130cbb5ab426456d4e3de842d6c5b8a upstream Stephan Mueller reported to me recently a error in random number generation in the ansi cprng. If several small requests are made that are less than the instances block size, the remainder for loop code doesn't increment rand_data_valid in the last iteration, meaning that the last bytes in the rand_data buffer gets reused on the subsequent smaller-than-a-block request for random data. The fix is pretty easy, just re-code the for loop to make sure that rand_data_valid gets incremented appropriately Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Stephan Mueller <stephan.mueller@atsec.com> CC: Stephan Mueller <stephan.mueller@atsec.com> CC: Petr Matousek <pmatouse@redhat.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20dm snapshot: fix data corruptionMikulas Patocka1-6/+12
CVE-2013-4299 BugLink: http://bugs.launchpad.net/bugs/1241769 This patch fixes a particular type of data corruption that has been encountered when loading a snapshot's metadata from disk. When we allocate a new chunk in persistent_prepare, we increment ps->next_free and we make sure that it doesn't point to a metadata area by further incrementing it if necessary. When we load metadata from disk on device activation, ps->next_free is positioned after the last used data chunk. However, if this last used data chunk is followed by a metadata area, ps->next_free is positioned erroneously to the metadata area. A newly-allocated chunk is placed at the same location as the metadata area, resulting in data or metadata corruption. This patch changes the code so that ps->next_free skips the metadata area when metadata are loaded in function read_exceptions. The patch also moves a piece of code from persistent_prepare_exception to a separate function skip_metadata to avoid code duplication. CVE-2013-4299 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Cc: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> (back ported from commit e9c6a182649f4259db704ae15a91ac820e63b0ca) Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20ipv6: call udp_push_pending_frames when uncorking a socket with AF_INET ↵Hannes Frederic Sowa3-2/+9
pending data CVE-2013-4162 BugLink: http://bugs.launchpad.net/bugs/1205070 We accidentally call down to ip6_push_pending_frames when uncorking pending AF_INET data on a ipv6 socket. This results in the following splat (from Dave Jones): skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL> ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:126! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth +netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37 task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000 RIP: 0010:[<ffffffff816e759c>] [<ffffffff816e759c>] skb_panic+0x63/0x65 RSP: 0018:ffff8801e6431de8 EFLAGS: 00010282 RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006 RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520 RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800 R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800 FS: 00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 Stack: ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4 ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6 ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0 Call Trace: [<ffffffff8159a9aa>] skb_push+0x3a/0x40 [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0 [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140 [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0 [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20 [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0 [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0 [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40 [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20 [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0 [<ffffffff816f5d54>] tracesys+0xdd/0xe2 Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 RIP [<ffffffff816e759c>] skb_panic+0x63/0x65 RSP <ffff8801e6431de8> This patch adds a check if the pending data is of address family AF_INET and directly calls udp_push_ending_frames from udp_v6_push_pending_frames if that is the case. This bug was found by Dave Jones with trinity. (Also move the initialization of fl6 below the AF_INET check, even if not strictly necessary.) Cc: Dave Jones <davej@redhat.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> (back ported from commit 8822b64a0fa64a5dd1dfcf837c5b0be83f8c05d1) Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20exec/ptrace: fix get_dumpable() incorrect testsKees Cook5-5/+12
commit d049f74f2dbe71354d43d393ac3a188947811348 upstream The get_dumpable() return value is not boolean. Most users of the function actually want to be testing for non-SUID_DUMP_USER(1) rather than SUID_DUMP_DISABLE(0). The SUID_DUMP_ROOT(2) is also considered a protected state. Almost all places did this correctly, excepting the two places fixed in this patch. Wrong logic: if (dumpable == SUID_DUMP_DISABLE) { /* be protective */ } or if (dumpable == 0) { /* be protective */ } or if (!dumpable) { /* be protective */ } Correct logic: if (dumpable != SUID_DUMP_USER) { /* be protective */ } or if (dumpable != 1) { /* be protective */ } Without this patch, if the system had set the sysctl fs/suid_dumpable=2, a user was able to ptrace attach to processes that had dropped privileges to that user. (This may have been partially mitigated if Yama was enabled.) The macros have been moved into the file that declares get/set_dumpable(), which means things like the ia64 code can see them too. CVE-2013-2929 Reported-by: Vasily Kulikov <segoon@openwall.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20n_tty: Fix n_tty_write crash when echoing in raw modePeter Hurley1-0/+2
The tty atomic_write_lock does not provide an exclusion guarantee for the tty driver if the termios settings are LECHO & !OPOST. And since it is unexpected and not allowed to call TTY buffer helpers like tty_insert_flip_string concurrently, this may lead to crashes when concurrect writers call pty_write. In that case the following two writers: * the ECHOing from a workqueue and * pty_write from the process race and can overflow the corresponding TTY buffer like follows. If we look into tty_insert_flip_string_fixed_flag, there is: int space = __tty_buffer_request_room(port, goal, flags); struct tty_buffer *tb = port->buf.tail; ... memcpy(char_buf_ptr(tb, tb->used), chars, space); ... tb->used += space; so the race of the two can result in something like this: A B __tty_buffer_request_room __tty_buffer_request_room memcpy(buf(tb->used), ...) tb->used += space; memcpy(buf(tb->used), ...) ->BOOM B's memcpy is past the tty_buffer due to the previous A's tb->used increment. Since the N_TTY line discipline input processing can output concurrently with a tty write, obtain the N_TTY ldisc output_lock to serialize echo output with normal tty writes. This ensures the tty buffer helper tty_insert_flip_string is not called concurrently and everything is fine. Note that this is nicely reproducible by an ordinary user using forkpty and some setup around that (raw termios + ECHO). And it is present in kernels at least after commit d945cb9cce20ac7143c2de8d88b187f62db99bdc (pty: Rework the pty layer to use the normal buffering logic) in 2.6.31-rc3. js: add more info to the commit log js: switch to bool js: lock unconditionally js: lock only the tty->ops->write call References: CVE-2014-0196 Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 4291086b1f081b869c6d79e5b7441633dc3ace00) [wt: 2.6.32 has no n_tty_data, so output_lock is in tty, not tty->disc_data] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20tcp_cubic: fix the range of delayed_ackLiu Yu1-1/+1
commit b9f47a3aaeab (tcp_cubic: limit delayed_ack ratio to prevent divide error) try to prevent divide error, but there is still a little chance that delayed_ack can reach zero. In case the param cnt get negative value, then ratio+cnt would overflow and may happen to be zero. As a result, min(ratio, ACK_RATIO_LIMIT) will calculate to be zero. In some old kernels, such as 2.6.32, there is a bug that would pass negative param, which then ultimately leads to this divide error. commit 5b35e1e6e9c (tcp: fix tcp_trim_head() to adjust segment count with skb MSS) fixed the negative param issue. However, it's safe that we fix the range of delayed_ack as well, to make sure we do not hit a divide by zero. CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Liu Yu <allanyuliu@tencent.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 0cda345d1b2201dd15591b163e3c92bad5191745) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20tcp_cubic: limit delayed_ack ratio to prevent divide errorstephen hemminger1-2/+7
TCP Cubic keeps a metric that estimates the amount of delayed acknowledgements to use in adjusting the window. If an abnormally large number of packets are acknowledged at once, then the update could wrap and reach zero. This kind of ACK could only happen when there was a large window and huge number of ACK's were lost. This patch limits the value of delayed ack ratio. The choice of 32 is just a conservative value since normally it should be range of 1 to 4 packets. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit b9f47a3aaeabdce3b42829bbb27765fa340f76ba) [wt: in 2.6.32, this fix is needed for the next one] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20tcp: fix tcp_trim_head() to adjust segment count with skb MSSNeal Cardwell1-4/+2
This commit fixes tcp_trim_head() to recalculate the number of segments in the skb with the skb's existing MSS, so trimming the head causes the skb segment count to be monotonically non-increasing - it should stay the same or go down, but not increase. Previously tcp_trim_head() used the current MSS of the connection. But if there was a decrease in MSS between original transmission and ACK (e.g. due to PMTUD), this could cause tcp_trim_head() to counter-intuitively increase the segment count when trimming bytes off the head of an skb. This violated assumptions in tcp_tso_acked() that tcp_trim_head() only decreases the packet count, so that packets_acked in tcp_tso_acked() could underflow, leading tcp_clean_rtx_queue() to pass u32 pkts_acked values as large as 0xffffffff to ca_ops->pkts_acked(). As an aside, if tcp_trim_head() had really wanted the skb to reflect the current MSS, it should have called tcp_set_skb_tso_segs() unconditionally, since a decrease in MSS would mean that a single-packet skb should now be sliced into multiple segments. Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Nandita Dukkipati <nanditad@google.com> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 5b35e1e6e9ca651e6b291c96d1106043c9af314a) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20powernow-k6: reorder frequenciesMikulas Patocka1-7/+10
commit 22c73795b101597051924556dce019385a1e2fa0 upstream This patch reorders reported frequencies from the highest to the lowest, just like in other frequency drivers. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20powernow-k6: correctly initialize default parametersMikulas Patocka1-4/+72
commit d82b922a4acc1781d368aceac2f9da43b038cab2 upstream The powernow-k6 driver used to read the initial multiplier from the powernow register. However, there is a problem with this: * If there was a frequency transition before, the multiplier read from the register corresponds to the current multiplier. * If there was no frequency transition since reset, the field in the register always reads as zero, regardless of the current multiplier that is set using switches on the mainboard and that the CPU is running at. The zero value corresponds to multiplier 4.5, so as a consequence, the powernow-k6 driver always assumes multiplier 4.5. For example, if we have 550MHz CPU with bus frequency 100MHz and multiplier 5.5, the powernow-k6 driver thinks that the multiplier is 4.5 and bus frequency is 122MHz. The powernow-k6 driver then sets the multiplier to 4.5, underclocking the CPU to 450MHz, but reports the current frequency as 550MHz. There is no reliable way how to read the initial multiplier. I modified the driver so that it contains a table of known frequencies (based on parameters of existing CPUs and some common overclocking schemes) and sets the multiplier according to the frequency. If the frequency is unknown (because of unusual overclocking or underclocking), the user must supply the bus speed and maximum multiplier as module parameters. This patch should be backported to all stable kernels. If it doesn't apply cleanly, change it, or ask me to change it. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20powernow-k6: disable cache when changing frequencyMikulas Patocka1-17/+39
commit e20e1d0ac02308e2211306fc67abcd0b2668fb8b upstream I found out that a system with k6-3+ processor is unstable during network server load. The system locks up or the network card stops receiving. The reason for the instability is the CPU frequency scaling. During frequency transition the processor is in "EPM Stop Grant" state. The documentation says that the processor doesn't respond to inquiry requests in this state. Consequently, coherency of processor caches and bus master devices is not maintained, causing the system instability. This patch flushes the cache during frequency transition. It fixes the instability. Other minor changes: * u64 invalue changed to unsigned long because the variable is 32-bit * move the logic to set the multiplier to a separate function powernow_k6_set_cpu_multiplier * preserve lower 5 bits of the powernow port instead of 4 (the voltage field has 5 bits) * mask interrupts when reading the multiplier, so that the port is not open during other activity (running other kernel code with the port open shouldn't cause any misbehavior, but we should better be safe and keep the port closed) This patch should be backported to all stable kernels. If it doesn't apply cleanly, change it, or ask me to change it. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20powernow-k6: set transition latency value so ondemand governor can be usedKrzysztof Helt1-1/+1
Set the transition latency to value smaller than CPUFREQ_ETERNAL so governors other than "performance" work (like the "ondemand" one). The value is found in "AMD PowerNow! Technology Platform Design Guide for Embedded Processors" dated December 2000 (AMD doc #24267A). There is the answer to one of FAQs on page 40 which states that suggested complete transition period is 200 us. Tested on K6-2+ CPU with K6-3 core (model 13, stepping 4). Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Signed-off-by: Dave Jones <davej@redhat.com> (cherry picked from commit db2820dd5445a44b4726f15a2bc89b9ded2503eb) [wt: in 2.6.32, we only need this one so that next series applies cleanly] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20gianfar: disable TX vlan based on kernel 2.6.xZhu Yanjun1-7/+1
2.6.x kernels require a similar logic change as commit e1653c3e [gianfar: do vlan cleanup] and commit 51b8cbfc [gianfar: fix bug caused by e1653c3e] introduces for newer kernels. Since there is something wrong with tx vlan of gianfar nic driver, in kernel(3.1+), tx vlan is disabled. But in kernel 2.6.x, tx vlan is still enabled. Thus,gianfar nic driver can not support vlan packets and non-vlan packets at the same time. Signed-off-by: Zhu Yanjun <Yanjun.Zhu@windriver.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20x86, fpu, amd: Clear exceptions in AMD FXSAVE workaroundLinus Torvalds1-6/+7
Before we do an EMMS in the AMD FXSAVE information leak workaround we need to clear any pending exceptions, otherwise we trap with a floating-point exception inside this code. Reported-by: halfdog <me@halfdog.net> Tested-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/CA%2B55aFxQnY_PCG_n4=0w-VG=YLXL-yr7oMxyy0WU2gCBAf3ydg@mail.gmail.com Signed-off-by: H. Peter Anvin <hpa@zytor.com> (cherry picked from commit 26bef1318adc1b3a530ecc807ef99346db2aa8b0) [wt: in 2.6.32, patch applies to arch/x86/include/asm/i387.h. There's no static_cpu_has() so we use boot_cpu_has() like other kernels do with gcc3. ] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20libertas: potential oops in debugfsDan Carpenter1-1/+5
If we do a zero size allocation then it will oops. Also we can't be sure the user passes us a NUL terminated string so I've added a terminator. This code can only be triggered by root. Reported-by: Nico Golde <nico@ngolde.de> Reported-by: Fabian Yamaguchi <fabs@goesec.de> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Dan Williams <dcbw@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> (cherry picked from commit a497e47d4aec37aaf8f13509f3ef3d1f6a717d88) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20Fix a few incorrectly checked [io_]remap_pfn_range() callsLinus Torvalds3-51/+17
Nico Golde reports a few straggling uses of [io_]remap_pfn_range() that really should use the vm_iomap_memory() helper. This trivially converts two of them to the helper, and comments about why the third one really needs to continue to use remap_pfn_range(), and adds the missing size check. Reported-by: Nico Golde <nico@ngolde.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 7314e613d5ff9f0934f7a0f74ed7973b903315d1) [wt: vm_flags were absent in mainline, Ben removed them in 3.2, but I kept them to minimize changes and avoid any side effect] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20vm: add vm_iomap_memory() helper functionLinus Torvalds2-0/+49
Various drivers end up replicating the code to mmap() their memory buffers into user space, and our core memory remapping function may be very flexible but it is unnecessarily complicated for the common cases to use. Our internal VM uses pfn's ("page frame numbers") which simplifies things for the VM, and allows us to pass physical addresses around in a denser and more efficient format than passing a "phys_addr_t" around, and having to shift it up and down by the page size. But it just means that drivers end up doing that shifting instead at the interface level. It also means that drivers end up mucking around with internal VM things like the vma details (vm_pgoff, vm_start/end) way more than they really need to. So this just exports a function to map a certain physical memory range into user space (using a phys_addr_t based interface that is much more natural for a driver) and hides all the complexity from the driver. Some drivers will still end up tweaking the vm_page_prot details for things like prefetching or cacheability etc, but that's actually relevant to the driver, rather than caring about what the page offset of the mapping is into the particular IO memory region. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit b4cbb197c7e7a68dbad0d491242e3ca67420c13e) [WT: only needed in 2.6.32 for next commit] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20inet: fix possible memory corruption with UDP_CORK and UFOHannes Frederic Sowa2-2/+2
[ This is a simplified -stable version of a set of upstream commits. ] This is a replacement patch only for stable which does fix the problems handled by the following two commits in -net: "ip_output: do skb ufo init for peeked non ufo skb as well" (e93b7d748be887cd7639b113ba7d7ef792a7efb9) "ip6_output: do skb ufo init for peeked non ufo skb as well" (c547dbf55d5f8cf615ccc0e7265e98db27d3fb8b) Three frames are written on a corked udp socket for which the output netdevice has UFO enabled. If the first and third frame are smaller than the mtu and the second one is bigger, we enqueue the second frame with skb_append_datato_frags without initializing the gso fields. This leads to the third frame appended regulary and thus constructing an invalid skb. This fixes the problem by always using skb_append_datato_frags as soon as the first frag got enqueued to the skb without marking the packet as SKB_GSO_UDP. The problem with only two frames for ipv6 was fixed by "ipv6: udp packets following an UFO enqueued packet need also be handled by UFO" (2811ebac2521ceac84f2bdae402455baa6a7fb47). Cc: Jiri Pirko <jiri@resnulli.us> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 5124ae99ac8a8f63d0fca9b75adaef40b20678ff) Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20ipv6: udp packets following an UFO enqueued packet need also be handled by UFOHannes Frederic Sowa1-19/+12
In the following scenario the socket is corked: If the first UDP packet is larger then the mtu we try to append it to the write queue via ip6_ufo_append_data. A following packet, which is smaller than the mtu would be appended to the already queued up gso-skb via plain ip6_append_data. This causes random memory corruptions. In ip6_ufo_append_data we also have to be careful to not queue up the same skb multiple times. So setup the gso frame only when no first skb is available. This also fixes a shortcoming where we add the current packet's length to cork->length but return early because of a packet > mtu with dontfrag set (instead of sutracting it again). Found with trinity. Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 2811ebac2521ceac84f2bdae402455baa6a7fb47) [wt: 2.6.32 doesn't have dontfrag so remove the optimization] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20aacraid: prevent invalid pointer dereferenceMahesh Rajashekhara1-1/+2
It appears that driver runs into a problem here if fibsize is too small because we allocate user_srbcmd with fibsize size only but later we access it until user_srbcmd->sg.count to copy it over to srbcmd. It is not correct to test (fibsize < sizeof(*user_srbcmd)) because this structure already includes one sg element and this is not needed for commands without data. So, we would recommend to add the following (instead of test for fibsize == 0). Signed-off-by: Mahesh Rajashekhara <Mahesh.Rajashekhara@pmcs.com> Reported-by: Nico Golde <nico@ngolde.de> Reported-by: Fabian Yamaguchi <fabs@goesec.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit b4789b8e6be3151a955ade74872822f30e8cd914) CVE-2013-6380 Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20sctp: unbalanced rcu lock in ip_queue_xmit()Nicolas Dichtel1-1/+1
The bug was introduced in 2.6.32.61 by commit b8710128e201 ("inet: add RCU protection to inet->opt") (it's a backport of upstream commit f6d8bd051c39). In SCTP case, packet is already routed, hence we jump to the label 'packet_routed', but without rcu_read_lock(). After this label, rcu_read_unlock() is called unconditionally. Spotted-by: Guo Fengtian <fengtian.guo@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20isdnloop: Validate NUL-terminated strings from user.YOSHIFUJI Hideaki1-0/+6
[ Upstream commit 77bc6bed7121936bb2e019a8c336075f4c8eef62 ] Return -EINVAL unless all of user-given strings are correctly NUL-terminated. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20rds: prevent dereference of a NULL device in rds_iw_laddr_checkSasha Levin1-1/+2
[ Upstream commit bf39b4247b8799935ea91d90db250ab608a58e50 ] Binding might result in a NULL device which is later dereferenced without checking. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20isdnloop: several buffer overflowsDan Carpenter1-8/+9
[ Upstream commit 7563487cbf865284dcd35e9ef5a95380da046737 ] There are three buffer overflows addressed in this patch. 1) In isdnloop_fake_err() we add an 'E' to a 60 character string and then copy it into a 60 character buffer. I have made the destination buffer 64 characters and I'm changed the sprintf() to a snprintf(). 2) In isdnloop_parse_cmd(), p points to a 6 characters into a 60 character buffer so we have 54 characters. The ->eazlist[] is 11 characters long. I have modified the code to return if the source buffer is too long. 3) In isdnloop_command() the cbuf[] array was 60 characters long but the max length of the string then can be up to 79 characters. I made the cbuf array 80 characters long and changed the sprintf() to snprintf(). I also removed the temporary "dial" buffer and changed it to use "p" directly. Unfortunately, we pass the "cbuf" string from isdnloop_command() to isdnloop_writecmd() which truncates anything over 60 characters to make it fit in card->omsg[]. (It can accept values up to 255 characters so long as there is a '\n' character every 60 characters). For now I have just fixed the memory corruption bug and left the other problems in this driver alone. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20netlink: don't compare the nul-termination in nla_strcmpPablo Neira1-2/+8
[ Upstream commit 8b7b932434f5eee495b91a2804f5b64ebb2bc835 ] nla_strcmp compares the string length plus one, so it's implicitly including the nul-termination in the comparison. int nla_strcmp(const struct nlattr *nla, const char *str) { int len = strlen(str) + 1; ... d = memcmp(nla_data(nla), str, len); However, if NLA_STRING is used, userspace can send us a string without the nul-termination. This is a problem since the string comparison will not match as the last byte may be not the nul-termination. Fix this by skipping the comparison of the nul-termination if the attribute data is nul-terminated. Suggested by Thomas Graf. Cc: Florian Westphal <fw@strlen.de> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20net: socket: error on a negative msg_namelenMatthew Leach1-0/+4
[ Upstream commit dbb490b96584d4e958533fb637f08b557f505657 ] When copying in a struct msghdr from the user, if the user has set the msg_namelen parameter to a negative value it gets clamped to a valid size due to a comparison between signed and unsigned values. Ensure the syscall errors when the user passes in a negative value. Signed-off-by: Matthew Leach <matthew.leach@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20net: sctp: fix skb leakage in COOKIE ECHO path of chunk->auth_chunkDaniel Borkmann2-6/+2
[ Upstream commit c485658bae87faccd7aed540fd2ca3ab37992310 ] While working on ec0223ec48a9 ("net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH capable"), we noticed that there's a skb memory leakage in the error path. Running the same reproducer as in ec0223ec48a9 and by unconditionally jumping to the error label (to simulate an error condition) in sctp_sf_do_5_1D_ce() receive path lets kmemleak detector bark about the unfreed chunk->auth_chunk skb clone: Unreferenced object 0xffff8800b8f3a000 (size 256): comm "softirq", pid 0, jiffies 4294769856 (age 110.757s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 89 ab 75 5e d4 01 58 13 00 00 00 00 00 00 00 00 ..u^..X......... backtrace: [<ffffffff816660be>] kmemleak_alloc+0x4e/0xb0 [<ffffffff8119f328>] kmem_cache_alloc+0xc8/0x210 [<ffffffff81566929>] skb_clone+0x49/0xb0 [<ffffffffa0467459>] sctp_endpoint_bh_rcv+0x1d9/0x230 [sctp] [<ffffffffa046fdbc>] sctp_inq_push+0x4c/0x70 [sctp] [<ffffffffa047e8de>] sctp_rcv+0x82e/0x9a0 [sctp] [<ffffffff815abd38>] ip_local_deliver_finish+0xa8/0x210 [<ffffffff815a64af>] nf_reinject+0xbf/0x180 [<ffffffffa04b4762>] nfqnl_recv_verdict+0x1d2/0x2b0 [nfnetlink_queue] [<ffffffffa04aa40b>] nfnetlink_rcv_msg+0x14b/0x250 [nfnetlink] [<ffffffff815a3269>] netlink_rcv_skb+0xa9/0xc0 [<ffffffffa04aa7cf>] nfnetlink_rcv+0x23f/0x408 [nfnetlink] [<ffffffff815a2bd8>] netlink_unicast+0x168/0x250 [<ffffffff815a2fa1>] netlink_sendmsg+0x2e1/0x3f0 [<ffffffff8155cc6b>] sock_sendmsg+0x8b/0xc0 [<ffffffff8155d449>] ___sys_sendmsg+0x369/0x380 What happens is that commit bbd0d59809f9 clones the skb containing the AUTH chunk in sctp_endpoint_bh_rcv() when having the edge case that an endpoint requires COOKIE-ECHO chunks to be authenticated: ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------> <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------- ------------------ AUTH; COOKIE-ECHO ----------------> <-------------------- COOKIE-ACK --------------------- When we enter sctp_sf_do_5_1D_ce() and before we actually get to the point where we process (and subsequently free) a non-NULL chunk->auth_chunk, we could hit the "goto nomem_init" path from an error condition and thus leave the cloned skb around w/o freeing it. The fix is to centrally free such clones in sctp_chunk_destroy() handler that is invoked from sctp_chunk_free() after all refs have dropped; and also move both kfree_skb(chunk->auth_chunk) there, so that chunk->auth_chunk is either NULL (since sctp_chunkify() allocs new chunks through kmem_cache_zalloc()) or non-NULL with a valid skb pointer. chunk->skb and chunk->auth_chunk are the only skbs in the sctp_chunk structure that need to be handeled. While at it, we should use consume_skb() for both. It is the same as dev_kfree_skb() but more appropriately named as we are not a device but a protocol. Also, this effectively replaces the kfree_skb() from both invocations into consume_skb(). Functions are the same only that kfree_skb() assumes that the frame was being dropped after a failure (e.g. for tools like drop monitor), usage of consume_skb() seems more appropriate in function sctp_chunk_destroy() though. Fixes: bbd0d59809f9 ("[SCTP]: Implement the receive and verification of AUTH chunk") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <yasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH capableDaniel Borkmann1-0/+7
[ Upstream commit ec0223ec48a90cb605244b45f7c62de856403729 ] RFC4895 introduced AUTH chunks for SCTP; during the SCTP handshake RANDOM; CHUNKS; HMAC-ALGO are negotiated (CHUNKS being optional though): ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------> <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------- -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- A special case is when an endpoint requires COOKIE-ECHO chunks to be authenticated: ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------> <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------- ------------------ AUTH; COOKIE-ECHO ----------------> <-------------------- COOKIE-ACK --------------------- RFC4895, section 6.3. Receiving Authenticated Chunks says: The receiver MUST use the HMAC algorithm indicated in the HMAC Identifier field. If this algorithm was not specified by the receiver in the HMAC-ALGO parameter in the INIT or INIT-ACK chunk during association setup, the AUTH chunk and all the chunks after it MUST be discarded and an ERROR chunk SHOULD be sent with the error cause defined in Section 4.1. [...] If no endpoint pair shared key has been configured for that Shared Key Identifier, all authenticated chunks MUST be silently discarded. [...] When an endpoint requires COOKIE-ECHO chunks to be authenticated, some special procedures have to be followed because the reception of a COOKIE-ECHO chunk might result in the creation of an SCTP association. If a packet arrives containing an AUTH chunk as a first chunk, a COOKIE-ECHO chunk as the second chunk, and possibly more chunks after them, and the receiver does not have an STCB for that packet, then authentication is based on the contents of the COOKIE-ECHO chunk. In this situation, the receiver MUST authenticate the chunks in the packet by using the RANDOM parameters, CHUNKS parameters and HMAC_ALGO parameters obtained from the COOKIE-ECHO chunk, and possibly a local shared secret as inputs to the authentication procedure specified in Section 6.3. If authentication fails, then the packet is discarded. If the authentication is successful, the COOKIE-ECHO and all the chunks after the COOKIE-ECHO MUST be processed. If the receiver has an STCB, it MUST process the AUTH chunk as described above using the STCB from the existing association to authenticate the COOKIE-ECHO chunk and all the chunks after it. [...] Commit bbd0d59809f9 introduced the possibility to receive and verification of AUTH chunk, including the edge case for authenticated COOKIE-ECHO. On reception of COOKIE-ECHO, the function sctp_sf_do_5_1D_ce() handles processing, unpacks and creates a new association if it passed sanity checks and also tests for authentication chunks being present. After a new association has been processed, it invokes sctp_process_init() on the new association and walks through the parameter list it received from the INIT chunk. It checks SCTP_PARAM_RANDOM, SCTP_PARAM_HMAC_ALGO and SCTP_PARAM_CHUNKS, and copies them into asoc->peer meta data (peer_random, peer_hmacs, peer_chunks) in case sysctl -w net.sctp.auth_enable=1 is set. If in INIT's SCTP_PARAM_SUPPORTED_EXT parameter SCTP_CID_AUTH is set, peer_random != NULL and peer_hmacs != NULL the peer is to be assumed asoc->peer.auth_capable=1, in any other case asoc->peer.auth_capable=0. Now, if in sctp_sf_do_5_1D_ce() chunk->auth_chunk is available, we set up a fake auth chunk and pass that on to sctp_sf_authenticate(), which at latest in sctp_auth_calculate_hmac() reliably dereferences a NULL pointer at position 0..0008 when setting up the crypto key in crypto_hash_setkey() by using asoc->asoc_shared_key that is NULL as condition key_id == asoc->active_key_id is true if the AUTH chunk was injected correctly from remote. This happens no matter what net.sctp.auth_enable sysctl says. The fix is to check for net->sctp.auth_enable and for asoc->peer.auth_capable before doing any operations like sctp_sf_authenticate() as no key is activated in sctp_auth_asoc_init_active_key() for each case. Now as RFC4895 section 6.3 states that if the used HMAC-ALGO passed from the INIT chunk was not used in the AUTH chunk, we SHOULD send an error; however in this case it would be better to just silently discard such a maliciously prepared handshake as we didn't even receive a parameter at all. Also, as our endpoint has no shared key configured, section 6.3 says that MUST silently discard, which we are doing from now onwards. Before calling sctp_sf_pdiscard(), we need not only to free the association, but also the chunk->auth_chunk skb, as commit bbd0d59809f9 created a skb clone in that case. I have tested this locally by using netfilter's nfqueue and re-injecting packets into the local stack after maliciously modifying the INIT chunk (removing RANDOM; HMAC-ALGO param) and the SCTP packet containing the COOKIE_ECHO (injecting AUTH chunk before COOKIE_ECHO). Fixed with this patch applied. Fixes: bbd0d59809f9 ("[SCTP]: Implement the receive and verification of AUTH chunk") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <yasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20tg3: Don't check undefined error bits in RXBDMichael Chan2-3/+6
[ Upstream commit d7b95315cc7f441418845a165ee56df723941487 ] Redefine the RXD_ERR_MASK to include only relevant error bits. This fixes a customer reported issue of randomly dropping packets on the 5719. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20virtio-net: alloc big buffers also when guest can receive UFOJason Wang1-1/+2
[ Upstream commit 0e7ede80d929ff0f830c44a543daa1acd590c749 ] We should alloc big buffers also when guest can receive UFO packets to let the big packets fit into guest rx buffer. Fixes 5c5167515d80f78f6bb538492c423adcae31ad65 (virtio-net: Allow UFO feature to be set and advertised.) Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20net: sctp: fix sctp_connectx abi for ia32 emulation/compat modeDaniel Borkmann1-9/+32
[ Upstream commit ffd5939381c609056b33b7585fb05a77b4c695f3 ] SCTP's sctp_connectx() abi breaks for 64bit kernels compiled with 32bit emulation (e.g. ia32 emulation or x86_x32). Due to internal usage of 'struct sctp_getaddrs_old' which includes a struct sockaddr pointer, sizeof(param) check will always fail in kernel as the structure in 64bit kernel space is 4bytes larger than for user binaries compiled in 32bit mode. Thus, applications making use of sctp_connectx() won't be able to run under such circumstances. Introduce a compat interface in the kernel to deal with such situations by using a 'struct compat_sctp_getaddrs_old' structure where user data is copied into it, and then sucessively transformed into a 'struct sctp_getaddrs_old' structure with the help of compat_ptr(). That fixes sctp_connectx() abi without any changes needed in user space, and lets the SCTP test suite pass when compiled in 32bit and run on 64bit kernels. Fixes: f9c67811ebc0 ("sctp: Fix regression introduced by new sctp_connectx api") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20bonding: 802.3ad: make aggregator_identifier bond-privateJiri Bohac2-4/+3
[ Upstream commit 163c8ff30dbe473abfbb24a7eac5536c87f3baa9 ] aggregator_identifier is used to assign unique aggregator identifiers to aggregators of a bond during device enslaving. aggregator_identifier is currently a global variable that is zeroed in bond_3ad_initialize(). This sequence will lead to duplicate aggregator identifiers for eth1 and eth3: create bond0 change bond0 mode to 802.3ad enslave eth0 to bond0 //eth0 gets agg id 1 enslave eth1 to bond0 //eth1 gets agg id 2 create bond1 change bond1 mode to 802.3ad enslave eth2 to bond1 //aggregator_identifier is reset to 0 //eth2 gets agg id 1 enslave eth3 to bond0 //eth3 gets agg id 2 Fix this by making aggregator_identifier private to the bond. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20tg3: Fix deadlock in tg3_change_mtu()Nithin Sujir1-2/+2
[ Upstream commit c6993dfd7db9b0c6b7ca7503a56fda9236a4710f ] Quoting David Vrabel - "5780 cards cannot have jumbo frames and TSO enabled together. When jumbo frames are enabled by setting the MTU, the TSO feature must be cleared. This is done indirectly by calling netdev_update_features() which will call tg3_fix_features() to actually clear the flags. netdev_update_features() will also trigger a new netlink message for the feature change event which will result in a call to tg3_get_stats64() which deadlocks on the tg3 lock." tg3_set_mtu() does not need to be under the tg3 lock since converting the flags to use set_bit(). Move it out to after tg3_netif_stop(). Reported-by: David Vrabel <david.vrabel@citrix.com> Tested-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20net: fix 'ip rule' iif/oif device renameMaciej Zenczykowski1-0/+7
[ Upstream commit 946c032e5a53992ea45e062ecb08670ba39b99e3 ] ip rules with iif/oif references do not update: (detach/attach) across interface renames. Signed-off-by: Maciej Zenczykowski <maze@google.com> CC: Willem de Bruijn <willemb@google.com> CC: Eric Dumazet <edumazet@google.com> CC: Chris Davis <chrismd@google.com> CC: Carlo Contavalli <ccontavalli@google.com> Google-Bug-Id: 12936021 Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20inet_diag: fix inet_diag_dump_icsk() timewait socket state logicNeal Cardwell1-1/+3
[ Based upon upstream commit 70315d22d3c7383f9a508d0aab21e2eb35b2303a ] Fix inet_diag_dump_icsk() to reflect the fact that both TIME_WAIT and FIN_WAIT2 connections are represented by inet_timewait_sock (not just TIME_WAIT). Thus: (a) We need to iterate through the time_wait buckets if the user wants either TIME_WAIT or FIN_WAIT2. (Before fixing this, "ss -nemoi state fin-wait-2" would not return any sockets, even if there were some in FIN_WAIT2.) (b) We need to check tw_substate to see if the user wants to dump sockets in the particular substate (TIME_WAIT or FIN_WAIT2) that a given connection is in. (Before fixing this, "ss -nemoi state time-wait" would actually return sockets in state FIN_WAIT2.) An analogous fix is in v3.13: 70315d22d3c7383f9a508d0aab21e2eb35b2303a ("inet_diag: fix inet_diag_dump_icsk() to use correct state for timewait sockets") but that patch is quite different because 3.13 code is very different in this area due to the unification of TCP hash tables in 05dbc7b ("tcp/dccp: remove twchain") in v3.13-rc1. I tested that this applies cleanly between v3.3 and v3.12, and tested that it works in both 3.3 and 3.12. It does not apply cleanly to 3.2 and earlier (though it makes semantic sense), and semantically is not the right fix for 3.13 and beyond (as mentioned above). Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20net: llc: fix use after free in llc_ui_recvmsgDaniel Borkmann1-2/+3
[ Upstream commit 4d231b76eef6c4a6bd9c96769e191517765942cb ] While commit 30a584d944fb fixes datagram interface in LLC, a use after free bug has been introduced for SOCK_STREAM sockets that do not make use of MSG_PEEK. The flow is as follow ... if (!(flags & MSG_PEEK)) { ... sk_eat_skb(sk, skb, false); ... } ... if (used + offset < skb->len) continue; ... where sk_eat_skb() calls __kfree_skb(). Therefore, cache original length and work on skb_len to check partial reads. Fixes: 30a584d944fb ("[LLX]: SOCK_DGRAM interface fixes") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20net: rose: restore old recvmsg behaviorFlorian Westphal1-12/+4
[ Upstream commit f81152e35001e91997ec74a7b4e040e6ab0acccf ] recvmsg handler in net/rose/af_rose.c performs size-check ->msg_namelen. After commit f3d3342602f8bcbf37d7c46641cb9bca7618eb1c (net: rework recvmsg handler msg_name and msg_namelen logic), we now always take the else branch due to namelen being initialized to 0. Digging in netdev-vger-cvs git repo shows that msg_namelen was initialized with a fixed-size since at least 1995, so the else branch was never taken. Compile tested only. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20rds: prevent dereference of a NULL deviceSasha Levin1-1/+2
[ Upstream commit c2349758acf1874e4c2b93fe41d072336f1a31d0 ] Binding might result in a NULL device, which is dereferenced causing this BUG: [ 1317.260548] BUG: unable to handle kernel NULL pointer dereference at 000000000000097 4 [ 1317.261847] IP: [<ffffffff84225f52>] rds_ib_laddr_check+0x82/0x110 [ 1317.263315] PGD 418bcb067 PUD 3ceb21067 PMD 0 [ 1317.263502] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 1317.264179] Dumping ftrace buffer: [ 1317.264774] (ftrace buffer empty) [ 1317.265220] Modules linked in: [ 1317.265824] CPU: 4 PID: 836 Comm: trinity-child46 Tainted: G W 3.13.0-rc4- next-20131218-sasha-00013-g2cebb9b-dirty #4159 [ 1317.267415] task: ffff8803ddf33000 ti: ffff8803cd31a000 task.ti: ffff8803cd31a000 [ 1317.268399] RIP: 0010:[<ffffffff84225f52>] [<ffffffff84225f52>] rds_ib_laddr_check+ 0x82/0x110 [ 1317.269670] RSP: 0000:ffff8803cd31bdf8 EFLAGS: 00010246 [ 1317.270230] RAX: 0000000000000000 RBX: ffff88020b0dd388 RCX: 0000000000000000 [ 1317.270230] RDX: ffffffff8439822e RSI: 00000000000c000a RDI: 0000000000000286 [ 1317.270230] RBP: ffff8803cd31be38 R08: 0000000000000000 R09: 0000000000000000 [ 1317.270230] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 [ 1317.270230] R13: 0000000054086700 R14: 0000000000a25de0 R15: 0000000000000031 [ 1317.270230] FS: 00007ff40251d700(0000) GS:ffff88022e200000(0000) knlGS:000000000000 0000 [ 1317.270230] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1317.270230] CR2: 0000000000000974 CR3: 00000003cd478000 CR4: 00000000000006e0 [ 1317.270230] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1317.270230] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000090602 [ 1317.270230] Stack: [ 1317.270230] 0000000054086700 5408670000a25de0 5408670000000002 0000000000000000 [ 1317.270230] ffffffff84223542 00000000ea54c767 0000000000000000 ffffffff86d26160 [ 1317.270230] ffff8803cd31be68 ffffffff84223556 ffff8803cd31beb8 ffff8800c6765280 [ 1317.270230] Call Trace: [ 1317.270230] [<ffffffff84223542>] ? rds_trans_get_preferred+0x42/0xa0 [ 1317.270230] [<ffffffff84223556>] rds_trans_get_preferred+0x56/0xa0 [ 1317.270230] [<ffffffff8421c9c3>] rds_bind+0x73/0xf0 [ 1317.270230] [<ffffffff83e4ce62>] SYSC_bind+0x92/0xf0 [ 1317.270230] [<ffffffff812493f8>] ? context_tracking_user_exit+0xb8/0x1d0 [ 1317.270230] [<ffffffff8119313d>] ? trace_hardirqs_on+0xd/0x10 [ 1317.270230] [<ffffffff8107a852>] ? syscall_trace_enter+0x32/0x290 [ 1317.270230] [<ffffffff83e4cece>] SyS_bind+0xe/0x10 [ 1317.270230] [<ffffffff843a6ad0>] tracesys+0xdd/0xe2 [ 1317.270230] Code: 00 8b 45 cc 48 8d 75 d0 48 c7 45 d8 00 00 00 00 66 c7 45 d0 02 00 89 45 d4 48 89 df e8 78 49 76 ff 41 89 c4 85 c0 75 0c 48 8b 03 <80> b8 74 09 00 00 01 7 4 06 41 bc 9d ff ff ff f6 05 2a b6 c2 02 [ 1317.270230] RIP [<ffffffff84225f52>] rds_ib_laddr_check+0x82/0x110 [ 1317.270230] RSP <ffff8803cd31bdf8> [ 1317.270230] CR2: 0000000000000974 Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20hamradio/yam: fix info leak in ioctlSalva Peiró1-0/+1
[ Upstream commit 8e3fbf870481eb53b2d3a322d1fc395ad8b367ed ] The yam_ioctl() code fails to initialise the cmd field of the struct yamdrv_ioctl_cfg. Add an explicit memset(0) before filling the structure to avoid the 4-byte info leak. Signed-off-by: Salva Peiró <speiro@ai2.upv.es> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20drivers/net/hamradio: Integer overflow in hdlcdrv_ioctl()Wenliang Fan1-0/+2
[ Upstream commit e9db5c21d3646a6454fcd04938dd215ac3ab620a ] The local variable 'bi' comes from userspace. If userspace passed a large number to 'bi.data.calibrate', there would be an integer overflow in the following line: s->hdlctx.calibrate = bi.data.calibrate * s->par.bitrate / 16; Signed-off-by: Wenliang Fan <fanwlexca@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20net: unix: allow bind to fail on mutex lockSasha Levin1-2/+6
[ Upstream commit 37ab4fa7844a044dc21fde45e2a0fc2f3c3b6490 ] This is similar to the set_peek_off patch where calling bind while the socket is stuck in unix_dgram_recvmsg() will block and cause a hung task spew after a while. This is also the last place that did a straightforward mutex_lock(), so there shouldn't be any more of these patches. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20net: drop_monitor: fix the value of maxattrChangli Gao1-1/+0
[ Upstream commit d323e92cc3f4edd943610557c9ea1bb4bb5056e8 ] maxattr in genl_family should be used to save the max attribute type, but not the max command type. Drop monitor doesn't support any attributes, so we should leave it as zero. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20{pktgen, xfrm} Update IPv4 header total len and checksum after tranformationfan.du1-0/+7
[ Upstream commit 3868204d6b89ea373a273e760609cb08020beb1a ] commit a553e4a6317b2cfc7659542c10fe43184ffe53da ("[PKTGEN]: IPSEC support") tried to support IPsec ESP transport transformation for pktgen, but acctually this doesn't work at all for two reasons(The orignal transformed packet has bad IPv4 checksum value, as well as wrong auth value, reported by wireshark) - After transpormation, IPv4 header total length needs update, because encrypted payload's length is NOT same as that of plain text. - After transformation, IPv4 checksum needs re-caculate because of payload has been changed. With this patch, armmed pktgen with below cofiguration, Wireshark is able to decrypted ESP packet generated by pktgen without any IPv4 checksum error or auth value error. pgset "flag IPSEC" pgset "flows 1" Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20ipv6: fix possible seqlock deadlock in ip6_finish_output2Hannes Frederic Sowa1-2/+2
[ Upstream commit 7f88c6b23afbd31545c676dea77ba9593a1a14bf ] IPv6 stats are 64 bits and thus are protected with a seqlock. By not disabling bottom-half we could deadlock here if we don't disable bh and a softirq reentrantly updates the same mib. Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20inet: fix possible seqlock deadlocksEric Dumazet2-2/+2
[ Upstream commit f1d8cba61c3c4b1eb88e507249c4cb8d635d9a76 ] In commit c9e9042994d3 ("ipv4: fix possible seqlock deadlock") I left another places where IP_INC_STATS_BH() were improperly used. udp_sendmsg(), ping_v4_sendmsg() and tcp_v4_connect() are called from process context, not from softirq context. This was detected by lockdep seqlock support. Reported-by: jongman heo <jongman.heo@samsung.com> Fixes: 584bdf8cbdf6 ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP") Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2014-05-20bridge: flush br's address entry in fdb when remove the bridge devDing Tianhong1-0/+2
[ Upstream commit f873042093c0b418d2351fe142222b625c740149 ] When the following commands are executed: brctl addbr br0 ifconfig br0 hw ether <addr> rmmod bridge The calltrace will occur: [ 563.312114] device eth1 left promiscuous mode [ 563.312188] br0: port 1(eth1) entered disabled state [ 563.468190] kmem_cache_destroy bridge_fdb_cache: Slab cache still has objects [ 563.468197] CPU: 6 PID: 6982 Comm: rmmod Tainted: G O 3.12.0-0.7-default+ #9 [ 563.468199] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 563.468200] 0000000000000880 ffff88010f111e98 ffffffff814d1c92 ffff88010f111eb8 [ 563.468204] ffffffff81148efd ffff88010f111eb8 0000000000000000 ffff88010f111ec8 [ 563.468206] ffffffffa062a270 ffff88010f111ed8 ffffffffa063ac76 ffff88010f111f78 [ 563.468209] Call Trace: [ 563.468218] [<ffffffff814d1c92>] dump_stack+0x6a/0x78 [ 563.468234] [<ffffffff81148efd>] kmem_cache_destroy+0xfd/0x100 [ 563.468242] [<ffffffffa062a270>] br_fdb_fini+0x10/0x20 [bridge] [ 563.468247] [<ffffffffa063ac76>] br_deinit+0x4e/0x50 [bridge] [ 563.468254] [<ffffffff810c7dc9>] SyS_delete_module+0x199/0x2b0 [ 563.468259] [<ffffffff814e0922>] system_call_fastpath+0x16/0x1b [ 570.377958] Bridge firewalling registered --------------------------- cut here ------------------------------- The reason is that when the bridge dev's address is changed, the br_fdb_change_mac_address() will add new address in fdb, but when the bridge was removed, the address entry in the fdb did not free, the bridge_fdb_cache still has objects when destroy the cache, Fix this by flushing the bridge address entry when removing the bridge. v2: according to the Toshiaki Makita and Vlad's suggestion, I only delete the vlan0 entry, it still have a leak here if the vlan id is other number, so I need to call fdb_delete_by_port(br, NULL, 1) to flush all entries whose dst is NULL for the bridge. Suggested-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Suggested-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>