commit 289b6c71f0476983dfccf772fb03878eaa5bde27 Author: Greg Kroah-Hartman Date: Sun Dec 8 08:18:58 2013 -0800 Linux 3.12.4 commit 30b722d9d7242767e009234abee86d5cc7eb03d7 Author: Pierre Ossman Date: Wed Nov 6 20:00:32 2013 +0100 drm/radeon/audio: correct ACR table commit 3e71985f2439d8c4090dc2820e497e6f3d72dcff upstream. The values were taken from the HDMI spec, but they assumed exact x/1.001 clocks. Since we round the clocks, we also need to calculate different N and CTS values. Note that the N for 25.2/1.001 MHz at 44.1 kHz audio is out of spec. Hopefully this mode is rarely used and/or HDMI sinks tolerate overly large values of N. bug: https://bugs.freedesktop.org/show_bug.cgi?id=69675 Signed-off-by: Pierre Ossman Signed-off-by: Alex Deucher Cc: Josh Boyer Signed-off-by: Greg Kroah-Hartman commit e610724c69a3445f399d20a12f6cd218ad153ed0 Author: Pierre Ossman Date: Wed Nov 6 20:09:08 2013 +0100 drm/radeon/audio: improve ACR calculation commit a2098250fbda149cfad9e626afe80abe3b21e574 upstream. In order to have any realistic chance of calculating proper ACR values, we need to be able to calculate both N and CTS, not just CTS. We still aim for the ideal N as specified in the HDMI spec though. bug: https://bugs.freedesktop.org/show_bug.cgi?id=69675 Signed-off-by: Pierre Ossman Signed-off-by: Alex Deucher Cc: Josh Boyer Signed-off-by: Greg Kroah-Hartman commit f47be34aa63fbea1a3857f9e8be5619bded8e5cf Author: Gu Zheng Date: Wed Dec 4 18:19:06 2013 +0800 aio: clean up aio ring in the fail path commit d1b9432712a25eeb06114fb4b587133525a47de5 upstream. Clean up the aio ring file in the fail path of aio_setup_ring and ioctx_alloc. And maybe it can fix the GPF issue reported by Dave Jones: https://lkml.org/lkml/2013/11/25/898 Signed-off-by: Gu Zheng Signed-off-by: Benjamin LaHaise Signed-off-by: Greg Kroah-Hartman commit efbac2ca2f13d43a9aab37b84a3cca4d73b09c7c Author: Sasha Levin Date: Tue Nov 19 17:33:03 2013 -0500 aio: nullify aio->ring_pages after freeing it commit ddb8c45ba15149ebd41d7586261c05f7ca37f9a1 upstream. After freeing ring_pages we leave it as is causing a dangling pointer. This has already caused an issue so to help catching any issues in the future NULL it out. Signed-off-by: Sasha Levin Signed-off-by: Benjamin LaHaise Signed-off-by: Greg Kroah-Hartman commit f50db974cb75e31d98b176c3c9ea92e57aa97a1b Author: Sasha Levin Date: Tue Nov 19 17:33:02 2013 -0500 aio: prevent double free in ioctx_alloc commit d558023207e008a4476a3b7bb8706b2a2bf5d84f upstream. ioctx_alloc() calls aio_setup_ring() to allocate a ring. If aio_setup_ring() fails to do so it would call aio_free_ring() before returning, but ioctx_alloc() would call aio_free_ring() again causing a double free of the ring. This is easily reproducible from userspace. Signed-off-by: Sasha Levin Signed-off-by: Benjamin LaHaise Signed-off-by: Greg Kroah-Hartman commit 603f89ac73b7f286d3fdbc26db0ada25b06bd333 Author: Dan Carpenter Date: Wed Nov 13 10:49:40 2013 +0300 aio: checking for NULL instead of IS_ERR commit 7f62656be8a8ef14c168db2d98021fb9c8cc1076 upstream. alloc_anon_inode() returns an ERR_PTR(), it doesn't return NULL. Fixes: 71ad7490c1f3 ('rework aio migrate pages to use aio fs') Signed-off-by: Dan Carpenter Signed-off-by: Al Viro Signed-off-by: Greg Kroah-Hartman commit 79b5389c84a342525b7cd1624251444a10518b95 Author: Benjamin LaHaise Date: Tue Sep 17 10:18:25 2013 -0400 rework aio migrate pages to use aio fs commit 71ad7490c1f32bd7829df76360f9fa17829868f3 upstream. Don't abuse anon_inodes.c to host private files needed by aio; we can bloody well declare a mini-fs of our own instead of patching up what anon_inodes can create for us. Tested-by: Benjamin LaHaise Acked-by: Benjamin LaHaise Signed-off-by: Al Viro Signed-off-by: Greg Kroah-Hartman commit 4875cd486d22cb6be9c1f77fd951310f0aaeb433 Author: Al Viro Date: Wed Oct 2 22:35:11 2013 -0400 take anon inode allocation to libfs.c commit 6987843ff7e836ea65b554905aec34d2fad05c94 upstream. Signed-off-by: Al Viro Cc: Benjamin LaHaise Signed-off-by: Greg Kroah-Hartman commit 9a3809da8d4e37a8039fe308de63396eb18b3e89 Author: Kent Overstreet Date: Thu Oct 10 19:31:47 2013 -0700 aio: Fix a trinity splat commit e34ecee2ae791df674dfb466ce40692ca6218e43 upstream. aio kiocb refcounting was broken - it was relying on keeping track of the number of available ring buffer entries, which it needs to do anyways; then at shutdown time it'd wait for completions to be delivered until the # of available ring buffer entries equalled what it was initialized to. Problem with that is that the ring buffer is mapped writable into userspace, so userspace could futz with the head and tail pointers to cause the kernel to see extra completions, and cause free_ioctx() to return while there were still outstanding kiocbs. Which would be bad. Fix is just to directly refcount the kiocbs - which is more straightforward, and with the new percpu refcounting code doesn't cost us any cacheline bouncing which was the whole point of the original scheme. Also clean up ioctx_alloc()'s error path and fix a bug where it wasn't subtracting from aio_nr if ioctx_add_table() failed. Signed-off-by: Kent Overstreet Cc: Benjamin LaHaise Signed-off-by: Greg Kroah-Hartman commit 27135f5f3ba151eefff1305488dc98cbbc216710 Author: Miroslav Lichvar Date: Thu Aug 1 19:31:35 2013 +0200 ntp: Make periodic RTC update more reliable commit a97ad0c4b447a132a322cedc3a5f7fa4cab4b304 upstream. The current code requires that the scheduled update of the RTC happens in the closest tick to the half of the second. This seems to be difficult to achieve reliably. The scheduled work may be missing the target time by a tick or two and be constantly rescheduled every second. Relax the limit to 10 ticks. As a typical RTC drifts in the 11-minute update interval by several milliseconds, this shouldn't affect the overall accuracy of the RTC much. Signed-off-by: Miroslav Lichvar Signed-off-by: John Stultz Cc: Josh Boyer Signed-off-by: Greg Kroah-Hartman commit fc655139edb1557a66fefe945ceb61715b18f929 Author: Tomoki Sekiyama Date: Tue Oct 15 16:42:19 2013 -0600 elevator: acquire q->sysfs_lock in elevator_change() commit 7c8a3679e3d8e9d92d58f282161760a0e247df97 upstream. Add locking of q->sysfs_lock into elevator_change() (an exported function) to ensure it is held to protect q->elevator from elevator_init(), even if elevator_change() is called from non-sysfs paths. sysfs path (elv_iosched_store) uses __elevator_change(), non-locking version, as the lock is already taken by elv_iosched_store(). Signed-off-by: Tomoki Sekiyama Signed-off-by: Jens Axboe Cc: Josh Boyer Signed-off-by: Greg Kroah-Hartman commit d6a5267e403642c8802e5bc59cf754ba44a8993b Author: Tomoki Sekiyama Date: Tue Oct 15 16:42:16 2013 -0600 elevator: Fix a race in elevator switching and md device initialization commit eb1c160b22655fd4ec44be732d6594fd1b1e44f4 upstream. The soft lockup below happens at the boot time of the system using dm multipath and the udev rules to switch scheduler. [ 356.127001] BUG: soft lockup - CPU#3 stuck for 22s! [sh:483] [ 356.127001] RIP: 0010:[] [] lock_timer_base.isra.35+0x1d/0x50 ... [ 356.127001] Call Trace: [ 356.127001] [] try_to_del_timer_sync+0x20/0x70 [ 356.127001] [] ? kmem_cache_alloc_node_trace+0x20a/0x230 [ 356.127001] [] del_timer_sync+0x52/0x60 [ 356.127001] [] cfq_exit_queue+0x32/0xf0 [ 356.127001] [] elevator_exit+0x2f/0x50 [ 356.127001] [] elevator_change+0xf1/0x1c0 [ 356.127001] [] elv_iosched_store+0x20/0x50 [ 356.127001] [] queue_attr_store+0x59/0xb0 [ 356.127001] [] sysfs_write_file+0xc6/0x140 [ 356.127001] [] vfs_write+0xbd/0x1e0 [ 356.127001] [] SyS_write+0x49/0xa0 [ 356.127001] [] system_call_fastpath+0x16/0x1b This is caused by a race between md device initialization by multipathd and shell script to switch the scheduler using sysfs. - multipathd: SyS_ioctl -> do_vfs_ioctl -> dm_ctl_ioctl -> ctl_ioctl -> table_load -> dm_setup_md_queue -> blk_init_allocated_queue -> elevator_init q->elevator = elevator_alloc(q, e); // not yet initialized - sh -c 'echo deadline > /sys/$DEVPATH/queue/scheduler': elevator_switch (in the call trace above) struct elevator_queue *old = q->elevator; q->elevator = elevator_alloc(q, new_e); elevator_exit(old); // lockup! (*) - multipathd: (cont.) err = e->ops.elevator_init_fn(q); // init fails; q->elevator is modified (*) When del_timer_sync() is called, lock_timer_base() will loop infinitely while timer->base == NULL. In this case, as timer will never initialized, it results in lockup. This patch introduces acquisition of q->sysfs_lock around elevator_init() into blk_init_allocated_queue(), to provide mutual exclusion between initialization of the q->scheduler and switching of the scheduler. This should fix this bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=902012 Signed-off-by: Tomoki Sekiyama Signed-off-by: Jens Axboe Cc: Josh Boyer Signed-off-by: Greg Kroah-Hartman commit f8bf6d2085715c4326f255ad93ce32e9c4c9c550 Author: Stanislaw Gruszka Date: Wed Sep 25 15:34:55 2013 +0200 rt2800: add support for radio chip RF3070 commit 3b9b74baa1af2952d719735b4a4a34706a593948 upstream. Add support for new RF chip ID: 3070. It seems to be the same as 5370, maybe vendor just put wrong value on the eeprom, but add this id anyway since devices with it showed on the marked. Signed-off-by: Stanislaw Gruszka Signed-off-by: John W. Linville Signed-off-by: Greg Kroah-Hartman commit 377fc8e63fb846b56d9565624a041015d5a120d7 Author: Neil Horman Date: Fri Sep 27 12:53:35 2013 -0400 iommu: Remove stack trace from broken irq remapping warning commit 05104a4e8713b27291c7bb49c1e7e68b4e243571 upstream. The warning for the irq remapping broken check in intel_irq_remapping.c is pretty pointless. We need the warning, but we know where its comming from, the stack trace will always be the same, and it needlessly triggers things like Abrt. This changes the warning to just print a text warning about BIOS being broken, without the stack trace, then sets the appropriate taint bit. Since we automatically disable irq remapping, theres no need to contiue making Abrt jump at this problem Signed-off-by: Neil Horman CC: Joerg Roedel CC: Bjorn Helgaas CC: Andy Lutomirski CC: Konrad Rzeszutek Wilk CC: Sebastian Andrzej Siewior Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman commit 18963511320a6eae19e6a584d099a7e549803cfa Author: Julian Stecklina Date: Wed Oct 9 10:03:52 2013 +0200 iommu/vt-d: Fixed interaction of VFIO_IOMMU_MAP_DMA with IOMMU address limits commit f9423606ade08653dd8a43334f0a7fb45504c5cc upstream. The BUG_ON in drivers/iommu/intel-iommu.c:785 can be triggered from userspace via VFIO by calling the VFIO_IOMMU_MAP_DMA ioctl on a vfio device with any address beyond the addressing capabilities of the IOMMU. The problem is that the ioctl code calls iommu_iova_to_phys before it calls iommu_map. iommu_map handles the case that it gets addresses beyond the addressing capabilities of its IOMMU. intel_iommu_iova_to_phys does not. This patch fixes iommu_iova_to_phys to return NULL for addresses beyond what the IOMMU can handle. This in turn causes the ioctl call to fail in iommu_map and (correctly) return EFAULT to the user with a helpful warning message in the kernel log. Signed-off-by: Julian Stecklina Acked-by: Alex Williamson Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman commit 30699c9661d6ecf0fb31804723f5d870865a617a Author: Oliver Neukum Date: Wed Sep 25 09:42:55 2013 +0200 HID: hid-elo: some systems cannot stomach work around commit 403cfb53fb450d53751fdc7ee0cd6652419612cf upstream. Some systems although they have firmware class 'M', which usually needs a work around to not crash, must not be subjected to the work around because the work around crashes them. They cannot be told apart by their own device descriptor, but as they are part of compound devices, can be identified by looking at their siblings. Signed-off-by: Oliver Neukum Signed-off-by: Jiri Kosina Signed-off-by: Greg Kroah-Hartman commit fef8c18a3f38d5446328997aaf7c8a56534df8c6 Author: Simon Wood Date: Thu Oct 10 08:20:13 2013 -0600 HID: lg: fix Report Descriptor for Logitech MOMO Force (Black) commit 348cbaa800f8161168b20f85f72abb541c145132 upstream. By default the Logitech MOMO Force (Black) presents a combined accel/brake axis ('Y'). This patch modifies the HID descriptor to present seperate accel/brake axes ('Y' and 'Z'). Signed-off-by: Simon Wood Signed-off-by: Jiri Kosina Signed-off-by: Greg Kroah-Hartman commit acc142b485c5d09358d9269834fb3ba8ceeeed0a Author: Sasha Levin Date: Tue Nov 19 14:25:36 2013 -0500 video: kyro: fix incorrect sizes when copying to userspace commit 2ab68ec927310dc488f3403bb48f9e4ad00a9491 upstream. kyro would copy u32s and specify sizeof(unsigned long) as the size to copy. This would copy more data than intended and cause memory corruption and might leak kernel memory. Signed-off-by: Sasha Levin Signed-off-by: Tomi Valkeinen Signed-off-by: Greg Kroah-Hartman commit aafc56772e2156d2c28fe6ce072d115c0e34834e Author: Thomas Pugliese Date: Wed Oct 23 14:44:29 2013 -0500 usb: wusbcore: change WA_SEGS_MAX to a legal value commit f74b75e7f920c700636cccca669c7d16d12e9202 upstream. change WA_SEGS_MAX to a number that is legal according to the WUSB spec. Signed-off-by: Thomas Pugliese Signed-off-by: Greg Kroah-Hartman commit ed32875deb3b4e319a94042db8e8e1f7fef2591a Author: Sergei Shtylyov Date: Sun Sep 22 01:43:58 2013 +0400 usb: musb: davinci: fix resources passed to MUSB driver for DM6467 commit ea78201e2e08f8a91e30100c4c4a908b5cf295fc upstream. After commit 09fc7d22b024692b2fe8a943b246de1af307132b (usb: musb: fix incorrect usage of resource pointer), CPPI DMA driver on DaVinci DM6467 can't detect its dedicated IRQ and so the MUSB IRQ is erroneously used instead. This is because only 2 resources are passed to the MUSB driver from the DaVinci glue layer, so fix this by always copying 3 resources (it's safe since a placeholder for the 3rd resource is always there) and passing 'pdev->num_resources' instead of the size of musb_resources[] to platform_device_add_resources(). Signed-off-by: Sergei Shtylyov Signed-off-by: Felipe Balbi Signed-off-by: Greg Kroah-Hartman commit 8d3022e39bd68f76669ee3200797d44afdebd841 Author: majianpeng Date: Thu Nov 14 15:16:20 2013 +1100 md/raid5: Use conf->device_lock protect changing of multi-thread resources. commit 60aaf933854511630e16be4efe0f96485e132de4 upstream. and commit 0c775d5208284700de423e6746259da54a42e1f5 When we change group_thread_cnt from sysfs entry, it can OOPS. The kernel messages are: [ 135.299021] BUG: unable to handle kernel NULL pointer dereference at (null) [ 135.299073] IP: [] handle_active_stripes+0x32b/0x440 [ 135.299107] PGD 0 [ 135.299122] Oops: 0000 [#1] SMP [ 135.299144] Modules linked in: netconsole e1000e ptp pps_core [ 135.299188] CPU: 3 PID: 2225 Comm: md0_raid5 Not tainted 3.12.0+ #24 [ 135.299214] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015 11/09/2011 [ 135.299255] task: ffff8800b9638f80 ti: ffff8800b77a4000 task.ti: ffff8800b77a4000 [ 135.299283] RIP: 0010:[] [] handle_active_stripes+0x32b/0x440 [ 135.299323] RSP: 0018:ffff8800b77a5c48 EFLAGS: 00010002 [ 135.299344] RAX: ffff880037bb5c70 RBX: 0000000000000000 RCX: 0000000000000008 [ 135.299371] RDX: ffff880037bb5cb8 RSI: 0000000000000001 RDI: ffff880037bb5c00 [ 135.299398] RBP: ffff8800b77a5d08 R08: 0000000000000001 R09: 0000000000000000 [ 135.299425] R10: ffff8800b77a5c98 R11: 00000000ffffffff R12: ffff880037bb5c00 [ 135.299452] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880037bb5c70 [ 135.299479] FS: 0000000000000000(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000 [ 135.299510] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 135.299532] CR2: 0000000000000000 CR3: 0000000001c0b000 CR4: 00000000000407e0 [ 135.299559] Stack: [ 135.299570] ffff8800b77a5c88 ffffffff8107383e ffff8800b77a5c88 ffff880037a64300 [ 135.299611] 000000000000ec08 ffff880037bb5cb8 ffff8800b77a5c98 ffffffffffffffd8 [ 135.299654] 000000000000ec08 ffff880037bb5c60 ffff8800b77a5c98 ffff8800b77a5c98 [ 135.299696] Call Trace: [ 135.299711] [] ? __wake_up+0x4e/0x70 [ 135.299733] [] raid5d+0x4c8/0x680 [ 135.299756] [] ? schedule_timeout+0x15d/0x1f0 [ 135.299781] [] md_thread+0x11f/0x170 [ 135.299804] [] ? wake_up_bit+0x40/0x40 [ 135.299826] [] ? md_rdev_init+0x110/0x110 [ 135.299850] [] kthread+0xc6/0xd0 [ 135.299871] [] ? kthread_freezable_should_stop+0x70/0x70 [ 135.299899] [] ret_from_fork+0x7c/0xb0 [ 135.299923] [] ? kthread_freezable_should_stop+0x70/0x70 [ 135.299951] Code: ff ff ff 0f 84 d7 fe ff ff e9 5c fe ff ff 66 90 41 8b b4 24 d8 01 00 00 45 31 ed 85 f6 0f 8e 7b fd ff ff 49 8b 9c 24 d0 01 00 00 <48> 3b 1b 49 89 dd 0f 85 67 fd ff ff 48 8d 43 28 31 d2 eb 17 90 [ 135.300005] RIP [] handle_active_stripes+0x32b/0x440 [ 135.300005] RSP [ 135.300005] CR2: 0000000000000000 [ 135.300005] ---[ end trace 504854e5bb7562ed ]--- [ 135.300005] Kernel panic - not syncing: Fatal exception This is because raid5d() can be running when the multi-thread resources are changed via system. We see need to provide locking. mddev->device_lock is suitable, but we cannot simple call alloc_thread_groups under this lock as we cannot allocate memory while holding a spinlock. So change alloc_thread_groups() to allocate and return the data structures, then raid5_store_group_thread_cnt() can take the lock while updating the pointers to the data structures. This fixes a bug introduced in 3.12 and so is suitable for the 3.12.x stable series. Fixes: b721420e8719131896b009b11edbbd27 Signed-off-by: Jianpeng Ma Signed-off-by: NeilBrown Reviewed-by: Shaohua Li Signed-off-by: Greg Kroah-Hartman commit 4350fdaea3ba400b48e739acde0785318a6954ad Author: Mel Gorman Date: Tue Nov 12 15:08:32 2013 -0800 mm: numa: return the number of base pages altered by protection changes commit 72403b4a0fbdf433c1fe0127e49864658f6f6468 upstream. Commit 0255d4918480 ("mm: Account for a THP NUMA hinting update as one PTE update") was added to account for the number of PTE updates when marking pages prot_numa. task_numa_work was using the old return value to track how much address space had been updated. Altering the return value causes the scanner to do more work than it is configured or documented to in a single unit of work. This patch reverts that commit and accounts for the number of THP updates separately in vmstat. It is up to the administrator to interpret the pair of values correctly. This is a straight-forward operation and likely to only be of interest when actively debugging NUMA balancing problems. The impact of this patch is that the NUMA PTE scanner will scan slower when THP is enabled and workloads may converge slower as a result. On the flip size system CPU usage should be lower than recent tests reported. This is an illustrative example of a short single JVM specjbb test specjbb 3.12.0 3.12.0 vanilla acctupdates TPut 1 26143.00 ( 0.00%) 25747.00 ( -1.51%) TPut 7 185257.00 ( 0.00%) 183202.00 ( -1.11%) TPut 13 329760.00 ( 0.00%) 346577.00 ( 5.10%) TPut 19 442502.00 ( 0.00%) 460146.00 ( 3.99%) TPut 25 540634.00 ( 0.00%) 549053.00 ( 1.56%) TPut 31 512098.00 ( 0.00%) 519611.00 ( 1.47%) TPut 37 461276.00 ( 0.00%) 474973.00 ( 2.97%) TPut 43 403089.00 ( 0.00%) 414172.00 ( 2.75%) 3.12.0 3.12.0 vanillaacctupdates User 5169.64 5184.14 System 100.45 80.02 Elapsed 252.75 251.85 Performance is similar but note the reduction in system CPU time. While this showed a performance gain, it will not be universal but at least it'll be behaving as documented. The vmstats are obviously different but here is an obvious interpretation of them from mmtests. 3.12.0 3.12.0 vanillaacctupdates NUMA page range updates 1408326 11043064 NUMA huge PMD updates 0 21040 NUMA PTE updates 1408326 291624 "NUMA page range updates" == nr_pte_updates and is the value returned to the NUMA pte scanner. NUMA huge PMD updates were the number of THP updates which in combination can be used to calculate how many ptes were updated from userspace. Signed-off-by: Mel Gorman Reported-by: Alex Thorlton Reviewed-by: Rik van Riel Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Mel Gorman Signed-off-by: Greg Kroah-Hartman commit eaeeaec383f3228446715e660851f73423501eba Author: Dwight Engen Date: Thu Aug 15 14:08:03 2013 -0400 xfs: add capability check to free eofblocks ioctl commit 8c567a7fab6e086a0284eee2db82348521e7120c upstream. Check for CAP_SYS_ADMIN since the caller can truncate preallocated blocks from files they do not own nor have write access to. A more fine grained access check was considered: require the caller to specify their own uid/gid and to use inode_permission to check for write, but this would not catch the case of an inode not reachable via path traversal from the callers mount namespace. Add check for read-only filesystem to free eofblocks ioctl. Reviewed-by: Brian Foster Reviewed-by: Dave Chinner Reviewed-by: Gao feng Signed-off-by: Dwight Engen Signed-off-by: Ben Myers Cc: Kees Cook Signed-off-by: Greg Kroah-Hartman commit d13252b415a1de2b91d240195afc7fd917c1f148 Author: Steffen Klassert Date: Wed Oct 30 11:16:28 2013 +0100 xfrm: Fix null pointer dereference when decoding sessions [ Upstream commit 84502b5ef9849a9694673b15c31bd3ac693010ae ] On some codepaths the skb does not have a dst entry when xfrm_decode_session() is called. So check for a valid skb_dst() before dereferencing the device interface index. We use 0 as the device index if there is no valid skb_dst(), or at reverse decoding we use skb_iif as device interface index. Bug was introduced with git commit bafd4bd4dc ("xfrm: Decode sessions with output interface."). Reported-by: Meelis Roos Tested-by: Meelis Roos Signed-off-by: Steffen Klassert Signed-off-by: Greg Kroah-Hartman commit 83fc70f416f604cb71315de467accd0c07450c12 Author: fan.du Date: Sun Dec 1 16:28:48 2013 +0800 {pktgen, xfrm} Update IPv4 header total len and checksum after tranformation [ Upstream commit 3868204d6b89ea373a273e760609cb08020beb1a ] commit a553e4a6317b2cfc7659542c10fe43184ffe53da ("[PKTGEN]: IPSEC support") tried to support IPsec ESP transport transformation for pktgen, but acctually this doesn't work at all for two reasons(The orignal transformed packet has bad IPv4 checksum value, as well as wrong auth value, reported by wireshark) - After transpormation, IPv4 header total length needs update, because encrypted payload's length is NOT same as that of plain text. - After transformation, IPv4 checksum needs re-caculate because of payload has been changed. With this patch, armmed pktgen with below cofiguration, Wireshark is able to decrypted ESP packet generated by pktgen without any IPv4 checksum error or auth value error. pgset "flag IPSEC" pgset "flows 1" Signed-off-by: Fan Du Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 76a7894419438faeb03cf9a83b61c27e56f7a43a Author: Hannes Frederic Sowa Date: Fri Nov 29 06:39:44 2013 +0100 ipv6: fix possible seqlock deadlock in ip6_finish_output2 [ Upstream commit 7f88c6b23afbd31545c676dea77ba9593a1a14bf ] IPv6 stats are 64 bits and thus are protected with a seqlock. By not disabling bottom-half we could deadlock here if we don't disable bh and a softirq reentrantly updates the same mib. Cc: Eric Dumazet Signed-off-by: Hannes Frederic Sowa Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 556a91d4d7c6f37f26701c7c067e486f79272671 Author: Eric Dumazet Date: Thu Nov 28 09:51:22 2013 -0800 inet: fix possible seqlock deadlocks [ Upstream commit f1d8cba61c3c4b1eb88e507249c4cb8d635d9a76 ] In commit c9e9042994d3 ("ipv4: fix possible seqlock deadlock") I left another places where IP_INC_STATS_BH() were improperly used. udp_sendmsg(), ping_v4_sendmsg() and tcp_v4_connect() are called from process context, not from softirq context. This was detected by lockdep seqlock support. Reported-by: jongman heo Fixes: 584bdf8cbdf6 ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP") Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind") Signed-off-by: Eric Dumazet Cc: Hannes Frederic Sowa Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 34923f1eccdbd8c143b664edd933ce2840a50378 Author: Jiri Pirko Date: Thu Nov 28 18:01:38 2013 +0100 team: fix master carrier set when user linkup is enabled [ Upstream commit f5e0d34382e18f396d7673a84df8e3342bea7eb6 ] When user linkup is enabled and user sets linkup of individual port, we need to recompute linkup (carrier) of master interface so the change is reflected. Fix this by calling __team_carrier_check() which does the needed work. Please apply to all stable kernels as well. Thanks. Reported-by: Jan Tluka Signed-off-by: Jiri Pirko Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 2afe5cfbf8a8e9d5868dd5768fa80acacbf4b0e2 Author: Shawn Landden Date: Sun Nov 24 22:36:28 2013 -0800 net: update consumers of MSG_MORE to recognize MSG_SENDPAGE_NOTLAST [ Upstream commit d3f7d56a7a4671d395e8af87071068a195257bf6 ] Commit 35f9c09fe (tcp: tcp_sendpages() should call tcp_push() once) added an internal flag MSG_SENDPAGE_NOTLAST, similar to MSG_MORE. algif_hash, algif_skcipher, and udp used MSG_MORE from tcp_sendpages() and need to see the new flag as identical to MSG_MORE. This fixes sendfile() on AF_ALG. v3: also fix udp Cc: Tom Herbert Cc: Eric Dumazet Cc: David S. Miller Cc: # 3.4.x + 3.2.x Reported-and-tested-by: Shawn Landden Original-patch: Richard Weinberger Signed-off-by: Shawn Landden Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 2c9681ebe69e09a73aa4e7e06107deb59dd43c65 Author: Linus Walleij Date: Thu Nov 28 14:33:52 2013 +0100 net: smc91: fix crash regression on the versatile [ Upstream commit a0c20fb02592d372e744d1d739cda3e1b3defaae ] After commit e9e4ea74f06635f2ffc1dffe5ef40c854faa0a90 "net: smc91x: dont't use SMC_outw for fixing up halfword-aligned data" The Versatile SMSC LAN91C111 is crashing like this: ------------[ cut here ]------------ kernel BUG at /home/linus/linux/drivers/net/ethernet/smsc/smc91x.c:599! Internal error: Oops - BUG: 0 [#1] ARM Modules linked in: CPU: 0 PID: 43 Comm: udhcpc Not tainted 3.13.0-rc1+ #24 task: c6ccfaa0 ti: c6cd0000 task.ti: c6cd0000 PC is at smc_hardware_send_pkt+0x198/0x22c LR is at smc_hardware_send_pkt+0x24/0x22c pc : [] lr : [] psr: 20000013 sp : c6cd1d08 ip : 00000001 fp : 00000000 r10: c02adb08 r9 : 00000000 r8 : c6ced802 r7 : c786fba0 r6 : 00000146 r5 : c8800000 r4 : c78d6000 r3 : 0000000f r2 : 00000146 r1 : 00000000 r0 : 00000031 Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 0005317f Table: 06cf4000 DAC: 00000015 Process udhcpc (pid: 43, stack limit = 0xc6cd01c0) Stack: (0xc6cd1d08 to 0xc6cd2000) 1d00: 00000010 c8800000 c78d6000 c786fba0 c78d6000 c01be868 1d20: c01be7a4 00004000 00000000 c786fba0 c6c12b80 c0208554 000004d0 c780fc60 1d40: 00000220 c01fb734 00000000 00000000 00000000 c6c9a440 c6c12b80 c78d6000 1d60: c786fba0 c6c9a440 00000000 c021d1d8 00000000 00000000 c6c12b80 c78d6000 1d80: c786fba0 00000001 c6c9a440 c02087f8 c6c9a4a0 00080008 00000000 00000000 1da0: c78d6000 c786fba0 c78d6000 00000138 00000000 00000000 00000000 00000000 1dc0: 00000000 c027ba74 00000138 00000138 00000001 00000010 c6cedc00 00000000 1de0: 00000008 c7404400 c6cd1eec c6cd1f14 c067a73c c065c0b8 00000000 c067a740 1e00: 01ffffff 002040d0 00000000 00000000 00000000 00000000 00000000 ffffffff 1e20: 43004400 00110022 c6cdef20 c027ae8c c6ccfaa0 be82d65c 00000014 be82d3cc 1e40: 00000000 00000000 00000000 c01f2870 00000000 00000000 00000000 c6cd1e88 1e60: c6ccfaa0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 1e80: 00000000 00000000 00000031 c7802310 c7802300 00000138 c7404400 c0771da0 1ea0: 00000000 c6cd1eec c7800340 00000138 be82d65c 00000014 be82d3cc c6cd1f08 1ec0: 00000014 00000000 c7404400 c7404400 00000138 c01f4628 c78d6000 00000000 1ee0: 00000000 be82d3cc 00000138 c6cd1f08 00000014 c6cd1ee4 00000001 00000000 1f00: 00000000 00000000 00080011 00000002 06000000 ffffffff 0000ffff 00000002 1f20: 06000000 ffffffff 0000ffff c00928c8 c065c520 c6cd1f58 00000003 c009299c 1f40: 00000003 c065c520 c7404400 00000000 c7404400 c01f2218 c78106b0 c7441cb0 1f60: 00000000 00000006 c06799fc 00000000 00000000 00000006 00000000 c01f3ee0 1f80: 00000000 00000000 be82d678 be82d65c 00000014 00000001 00000122 c00139c8 1fa0: c6cd0000 c0013840 be82d65c 00000014 00000006 be82d3cc 00000138 00000000 1fc0: be82d65c 00000014 00000001 00000122 00000000 00000000 00018cb1 00000000 1fe0: 00003801 be82d3a8 0003a0c7 b6e9af08 60000010 00000006 00000000 00000000 [] (smc_hardware_send_pkt+0x198/0x22c) from [] (smc_hard_start_xmit+0xc4/0x1e8) [] (smc_hard_start_xmit+0xc4/0x1e8) from [] (dev_hard_start_xmit+0x460/0x4cc) [] (dev_hard_start_xmit+0x460/0x4cc) from [] (sch_direct_xmit+0x94/0x18c) [] (sch_direct_xmit+0x94/0x18c) from [] (dev_queue_xmit+0x238/0x42c) [] (dev_queue_xmit+0x238/0x42c) from [] (packet_sendmsg+0xbe8/0xd28) [] (packet_sendmsg+0xbe8/0xd28) from [] (sock_sendmsg+0x84/0xa8) [] (sock_sendmsg+0x84/0xa8) from [] (SyS_sendto+0xb8/0xdc) [] (SyS_sendto+0xb8/0xdc) from [] (ret_fast_syscall+0x0/0x2c) Code: e3130002 1a000001 e3130001 0affffcd (e7f001f2) ---[ end trace 81104fe70e8da7fe ]--- Kernel panic - not syncing: Fatal exception in interrupt This is because the macro operations in smc91x.h defined for Versatile are missing SMC_outsw() as used in this commit. The Versatile needs and uses the same accessors as the other platforms in the first if(...) clause, just switch it to using that and we have one problem less to worry about. This includes a hunk of a patch from Will Deacon fixin the other 32bit platforms as well: Innokom, Ramses, PXA, PCM027. Checkpatch complains about spacing, but I have opted to follow the style of this .h-file. Cc: Russell King Cc: Nicolas Pitre Cc: Eric Miao Cc: Jonathan Cameron Cc: stable@vger.kernel.org Signed-off-by: Will Deacon Signed-off-by: Linus Walleij Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 2650c2f88adb79a2ec693ccf65fd59f7389e79c9 Author: Yang Yingliang Date: Wed Nov 27 14:32:52 2013 +0800 net: 8139cp: fix a BUG_ON triggered by wrong bytes_compl [ Upstream commit 7fe0ee099ad5e3dea88d4ee1b6f20246b1ca57c3 ] Using iperf to send packets(GSO mode is on), a bug is triggered: [ 212.672781] kernel BUG at lib/dynamic_queue_limits.c:26! [ 212.673396] invalid opcode: 0000 [#1] SMP [ 212.673882] Modules linked in: 8139cp(O) nls_utf8 edd fuse loop dm_mod ipv6 i2c_piix4 8139too i2c_core intel_agp joydev pcspkr hid_generic intel_gtt floppy sr_mod mii button sg cdrom ext3 jbd mbcache usbhid hid uhci_hcd ehci_hcd usbcore sd_mod usb_common crc_t10dif crct10dif_common processor thermal_sys hwmon scsi_dh_emc scsi_dh_rdac scsi_dh_hp_sw scsi_dh ata_generic ata_piix libata scsi_mod [last unloaded: 8139cp] [ 212.676084] CPU: 0 PID: 4124 Comm: iperf Tainted: G O 3.12.0-0.7-default+ #16 [ 212.676084] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 212.676084] task: ffff8800d83966c0 ti: ffff8800db4c8000 task.ti: ffff8800db4c8000 [ 212.676084] RIP: 0010:[] [] dql_completed+0x17f/0x190 [ 212.676084] RSP: 0018:ffff880116e03e30 EFLAGS: 00010083 [ 212.676084] RAX: 00000000000005ea RBX: 0000000000000f7c RCX: 0000000000000002 [ 212.676084] RDX: ffff880111dd0dc0 RSI: 0000000000000bd4 RDI: ffff8800db6ffcc0 [ 212.676084] RBP: ffff880116e03e48 R08: 0000000000000992 R09: 0000000000000000 [ 212.676084] R10: ffffffff8181e400 R11: 0000000000000004 R12: 000000000000000f [ 212.676084] R13: ffff8800d94ec840 R14: ffff8800db440c80 R15: 000000000000000e [ 212.676084] FS: 00007f6685a3c700(0000) GS:ffff880116e00000(0000) knlGS:0000000000000000 [ 212.676084] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 212.676084] CR2: 00007f6685ad6460 CR3: 00000000db714000 CR4: 00000000000006f0 [ 212.676084] Stack: [ 212.676084] ffff8800db6ffc00 000000000000000f ffff8800d94ec840 ffff880116e03eb8 [ 212.676084] ffffffffa041509f ffff880116e03e88 0000000f16e03e88 ffff8800d94ec000 [ 212.676084] 00000bd400059858 000000050000000f ffffffff81094c36 ffff880116e03eb8 [ 212.676084] Call Trace: [ 212.676084] [ 212.676084] [] cp_interrupt+0x4ef/0x590 [8139cp] [ 212.676084] [] ? ktime_get+0x56/0xd0 [ 212.676084] [] handle_irq_event_percpu+0x53/0x170 [ 212.676084] [] handle_irq_event+0x3c/0x60 [ 212.676084] [] handle_fasteoi_irq+0x55/0xf0 [ 212.676084] [] handle_irq+0x1f/0x30 [ 212.676084] [] do_IRQ+0x5b/0xe0 [ 212.676084] [] common_interrupt+0x6a/0x6a [ 212.676084] [ 212.676084] [] ? cp_start_xmit+0x621/0x97c [8139cp] [ 212.676084] [] ? cp_start_xmit+0x609/0x97c [8139cp] [ 212.676084] [] dev_hard_start_xmit+0x2c9/0x550 [ 212.676084] [] sch_direct_xmit+0x179/0x1d0 [ 212.676084] [] dev_queue_xmit+0x293/0x440 [ 212.676084] [] ip_finish_output+0x236/0x450 [ 212.676084] [] ? __alloc_pages_nodemask+0x187/0xb10 [ 212.676084] [] ip_output+0x88/0x90 [ 212.676084] [] ip_local_out+0x24/0x30 [ 212.676084] [] ip_queue_xmit+0x14d/0x3e0 [ 212.676084] [] tcp_transmit_skb+0x501/0x840 [ 212.676084] [] tcp_write_xmit+0x1e3/0xb20 [ 212.676084] [] ? skb_page_frag_refill+0x87/0xd0 [ 212.676084] [] tcp_push_one+0x2b/0x40 [ 212.676084] [] tcp_sendmsg+0x926/0xc90 [ 212.676084] [] inet_sendmsg+0x61/0xc0 [ 212.676084] [] sock_aio_write+0x101/0x120 [ 212.676084] [] ? vma_adjust+0x2e1/0x5d0 [ 212.676084] [] ? timerqueue_add+0x60/0xb0 [ 212.676084] [] do_sync_write+0x60/0x90 [ 212.676084] [] ? rw_verify_area+0x54/0xf0 [ 212.676084] [] vfs_write+0x186/0x190 [ 212.676084] [] SyS_write+0x5d/0xa0 [ 212.676084] [] system_call_fastpath+0x16/0x1b [ 212.676084] Code: ca 41 89 dc 41 29 cc 45 31 db 29 c2 41 89 c5 89 d0 45 29 c5 f7 d0 c1 e8 1f e9 43 ff ff ff 66 0f 1f 44 00 00 31 c0 e9 7b ff ff ff <0f> 0b eb fe 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 c7 47 40 00 [ 212.676084] RIP [] dql_completed+0x17f/0x190 ------------[ cut here ]------------ When a skb has frags, bytes_compl plus skb->len nr_frags times in cp_tx(). It's not the correct value(actually, it should plus skb->len once) and it will trigger the BUG_ON(bytes_compl > num_queued - dql->num_completed). So only increase bytes_compl when finish sending all frags. pkts_compl also has a wrong value, fix it too. It's introduced by commit 871f0d4c ("8139cp: enable bql"). Suggested-by: Eric Dumazet Signed-off-by: Yang Yingliang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 8b0a34fa49e2cca02beeafce2d96e9d8a53c85e3 Author: David Chang Date: Wed Nov 27 15:48:36 2013 +0800 r8169: check ALDPS bit and disable it if enabled for the 8168g [ Upstream commit 1bac1072425c86f1ac85bd5967910706677ef8b3 ] Windows driver will enable ALDPS function, but linux driver and firmware do not have any configuration related to ALDPS function for 8168g. So restart system to linux and remove the NIC cable, LAN enter ALDPS, then LAN RX will be disabled. This issue can be easily reproduced on dual boot windows and linux system with RTL_GIGA_MAC_VER_40 chip. Realtek said, ALDPS function can be disabled by configuring to PHY, switch to page 0x0A43, reg0x10 bit2=0. Signed-off-by: David Chang Acked-by: Hayes Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit df0c053e8ae401eab14447a729060bfa4f8c02ce Author: Francois Romieu Date: Tue Nov 26 00:40:58 2013 +0100 via-velocity: fix netif_receive_skb use in irq disabled section. [ Upstream commit bc9627e7e918a85e906c1a3f6d01d9b8ef911a96 ] 2fdac010bdcf10a30711b6924612dfc40daf19b8 ("via-velocity.c: update napi implementation") overlooked an irq disabling spinlock when the Rx part of the NAPI poll handler was converted from netif_rx to netif_receive_skb. NAPI Rx processing can be taken out of the locked section with a pair of napi_{disable / enable} since it only races with the MTU change function. An heavier rework of the NAPI locking would be able to perform NAPI Tx before Rx where I simply removed one of velocity_tx_srv calls. References: https://bugzilla.redhat.com/show_bug.cgi?id=1022733 Fixes: 2fdac010bdcf (via-velocity.c: update napi implementation) Signed-off-by: Francois Romieu Tested-by: Alex A. Schmidt Cc: Jamie Heilman Cc: Michele Baldessari Cc: Julia Lawall Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d172c155ae6c542947f3ebe690ac61ffa9b654f2 Author: Andy Whitcroft Date: Mon Nov 25 16:52:34 2013 +0000 xen-netback: include definition of csum_ipv6_magic [ Upstream commit ae5e8127b712313ec1b99356019ce9226fea8b88 ] We are now using csum_ipv6_magic, include the appropriate header. Avoids the following error: drivers/net/xen-netback/netback.c:1313:4: error: implicit declaration of function 'csum_ipv6_magic' [-Werror=implicit-function-declaration] tcph->check = ~csum_ipv6_magic(&ipv6h->saddr, Signed-off-by: Andy Whitcroft Acked-by: Ian Campbell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f2b44be96bc1ed055baa56b6474bb9369ac186bf Author: Eric Dumazet Date: Sat Nov 23 12:59:20 2013 -0800 sch_tbf: handle too small burst [ Upstream commit 4d0820cf6a55d72350cb2d24a4504f62fbde95d9 ] If a too small burst is inadvertently set on TBF, we might trigger a bug in tbf_segment(), as 'skb' instead of 'segs' was used in a qdisc_reshape_fail() call. tc qdisc add dev eth0 root handle 1: tbf latency 50ms burst 1KB rate 50mbit Fix the bug, and add a warning, as such configuration is not going to work anyway for non GSO packets. (For some reason, one has to use a burst >= 1520 to get a working configuration, even with old kernels. This is a probable iproute2/tc bug) Based on a report and initial patch from Yang Yingliang Fixes: e43ac79a4bc6 ("sch_tbf: segment too big GSO packets") Signed-off-by: Eric Dumazet Reported-by: Yang Yingliang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 99834e4d7c5e441e268eb718e41d7e83343477fb Author: Herbert Xu Date: Fri Nov 22 10:32:11 2013 +0800 gro: Clean up tcpX_gro_receive checksum verification [ Upstream commit b8ee93ba80b5a0b6c3c06b65c34dd1276f16c047 ] This patch simplifies the checksum verification in tcpX_gro_receive by reusing the CHECKSUM_COMPLETE code for CHECKSUM_NONE. All it does for CHECKSUM_NONE is compute the partial checksum and then treat it as if it came from the hardware (CHECKSUM_COMPLETE). Signed-off-by: Herbert Xu Cheers, Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d8f703112e39ab0d58bc140d2f8db80add02acf7 Author: Herbert Xu Date: Fri Nov 22 10:31:29 2013 +0800 gro: Only verify TCP checksums for candidates [ Upstream commit cc5c00bbb44c5d68b883aa5cb9d01514a2525d94 ] In some cases we may receive IP packets that are longer than their stated lengths. Such packets are never merged in GRO. However, we may end up computing their checksums incorrectly and end up allowing packets with a bogus checksum enter our stack with the checksum status set as verified. Since such packets are rare and not performance-critical, this patch simply skips the checksum verification for them. Reported-by: Alexander Duyck Signed-off-by: Herbert Xu Acked-by: Alexander Duyck Thanks, Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit c5352f3600c8066715f5fa2f4cece0890d55c0a1 Author: Herbert Xu Date: Thu Nov 21 11:10:04 2013 -0800 gso: handle new frag_list of frags GRO packets [ Upstream commit 9d8506cc2d7ea1f911c72c100193a3677f6668c3 ] Recently GRO started generating packets with frag_lists of frags. This was not handled by GSO, thus leading to a crash. Thankfully these packets are of a regular form and are easy to handle. This patch handles them in two ways. For completely non-linear frag_list entries, we simply continue to iterate over the frag_list frags once we exhaust the normal frags. For frag_list entries with linear parts, we call pskb_trim on the first part of the frag_list skb, and then process the rest of the frags in the usual way. This patch also kills a chunk of dead frag_list code that has obviously never ever been run since it ends up generating a bogus GSO-segmented packet with a frag_list entry. Future work is planned to split super big packets into TSO ones. Fixes: 8a29111c7ca6 ("net: gro: allow to build full sized skb") Reported-by: Christoph Paasch Reported-by: Jerry Chu Reported-by: Sander Eikelenboom Signed-off-by: Herbert Xu Signed-off-by: Eric Dumazet Tested-by: Sander Eikelenboom Tested-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f517950d9c64e4c75d054b3268b43c60b5ab0849 Author: Veaceslav Falico Date: Fri Nov 29 09:53:23 2013 +0100 af_packet: block BH in prb_shutdown_retire_blk_timer() [ Upstream commit ec6f809ff6f19fafba3212f6aff0dda71dfac8e8 ] Currently we're using plain spin_lock() in prb_shutdown_retire_blk_timer(), however the timer might fire right in the middle and thus try to re-aquire the same spinlock, leaving us in a endless loop. To fix that, use the spin_lock_bh() to block it. Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.") CC: "David S. Miller" CC: Daniel Borkmann CC: Willem de Bruijn CC: Phil Sutter CC: Eric Dumazet Reported-by: Jan Stancek Tested-by: Jan Stancek Signed-off-by: Veaceslav Falico Acked-by: Daniel Borkmann Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit efc27dbe987ac25688b3c4ef8c4792209644306b Author: Daniel Borkmann Date: Thu Nov 21 16:50:58 2013 +0100 packet: fix use after free race in send path when dev is released [ Upstream commit e40526cb20b5ee53419452e1f03d97092f144418 ] Salam reported a use after free bug in PF_PACKET that occurs when we're sending out frames on a socket bound device and suddenly the net device is being unregistered. It appears that commit 827d9780 introduced a possible race condition between {t,}packet_snd() and packet_notifier(). In the case of a bound socket, packet_notifier() can drop the last reference to the net_device and {t,}packet_snd() might end up suddenly sending a packet over a freed net_device. To avoid reverting 827d9780 and thus introducing a performance regression compared to the current state of things, we decided to hold a cached RCU protected pointer to the net device and maintain it on write side via bind spin_lock protected register_prot_hook() and __unregister_prot_hook() calls. In {t,}packet_snd() path, we access this pointer under rcu_read_lock through packet_cached_dev_get() that holds reference to the device to prevent it from being freed through packet_notifier() while we're in send path. This is okay to do as dev_put()/dev_hold() are per-cpu counters, so this should not be a performance issue. Also, the code simplifies a bit as we don't need need_rls_dev anymore. Fixes: 827d978037d7 ("af-packet: Use existing netdev reference for bound sockets.") Reported-by: Salam Noureddine Signed-off-by: Daniel Borkmann Signed-off-by: Salam Noureddine Cc: Ben Greear Cc: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 4af9d8887f6540abd355e2068af171f5400fccbf Author: Ding Tianhong Date: Sat Dec 7 22:12:05 2013 +0800 bridge: flush br's address entry in fdb when remove the bridge dev [ Upstream commit f873042093c0b418d2351fe142222b625c740149 ] When the following commands are executed: brctl addbr br0 ifconfig br0 hw ether rmmod bridge The calltrace will occur: [ 563.312114] device eth1 left promiscuous mode [ 563.312188] br0: port 1(eth1) entered disabled state [ 563.468190] kmem_cache_destroy bridge_fdb_cache: Slab cache still has objects [ 563.468197] CPU: 6 PID: 6982 Comm: rmmod Tainted: G O 3.12.0-0.7-default+ #9 [ 563.468199] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 563.468200] 0000000000000880 ffff88010f111e98 ffffffff814d1c92 ffff88010f111eb8 [ 563.468204] ffffffff81148efd ffff88010f111eb8 0000000000000000 ffff88010f111ec8 [ 563.468206] ffffffffa062a270 ffff88010f111ed8 ffffffffa063ac76 ffff88010f111f78 [ 563.468209] Call Trace: [ 563.468218] [] dump_stack+0x6a/0x78 [ 563.468234] [] kmem_cache_destroy+0xfd/0x100 [ 563.468242] [] br_fdb_fini+0x10/0x20 [bridge] [ 563.468247] [] br_deinit+0x4e/0x50 [bridge] [ 563.468254] [] SyS_delete_module+0x199/0x2b0 [ 563.468259] [] system_call_fastpath+0x16/0x1b [ 570.377958] Bridge firewalling registered --------------------------- cut here ------------------------------- The reason is that when the bridge dev's address is changed, the br_fdb_change_mac_address() will add new address in fdb, but when the bridge was removed, the address entry in the fdb did not free, the bridge_fdb_cache still has objects when destroy the cache, Fix this by flushing the bridge address entry when removing the bridge. v2: according to the Toshiaki Makita and Vlad's suggestion, I only delete the vlan0 entry, it still have a leak here if the vlan id is other number, so I need to call fdb_delete_by_port(br, NULL, 1) to flush all entries whose dst is NULL for the bridge. Suggested-by: Toshiaki Makita Suggested-by: Vlad Yasevich Signed-off-by: Ding Tianhong Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a6ec437be3d70e9ba85fb34941a96577dcbb8414 Author: Vlad Yasevich Date: Tue Nov 19 20:47:15 2013 -0500 net: core: Always propagate flag changes to interfaces [ Upstream commit d2615bf450694c1302d86b9cc8a8958edfe4c3a4 ] The following commit: b6c40d68ff6498b7f63ddf97cf0aa818d748dee7 net: only invoke dev->change_rx_flags when device is UP tried to fix a problem with VLAN devices and promiscuouse flag setting. The issue was that VLAN device was setting a flag on an interface that was down, thus resulting in bad promiscuity count. This commit blocked flag propagation to any device that is currently down. A later commit: deede2fabe24e00bd7e246eb81cd5767dc6fcfc7 vlan: Don't propagate flag changes on down interfaces fixed VLAN code to only propagate flags when the VLAN interface is up, thus fixing the same issue as above, only localized to VLAN. The problem we have now is that if we have create a complex stack involving multiple software devices like bridges, bonds, and vlans, then it is possible that the flags would not propagate properly to the physical devices. A simple examle of the scenario is the following: eth0----> bond0 ----> bridge0 ---> vlan50 If bond0 or eth0 happen to be down at the time bond0 is added to the bridge, then eth0 will never have promisc mode set which is currently required for operation as part of the bridge. As a result, packets with vlan50 will be dropped by the interface. The only 2 devices that implement the special flag handling are VLAN and DSA and they both have required code to prevent incorrect flag propagation. As a result we can remove the generic solution introduced in b6c40d68ff6498b7f63ddf97cf0aa818d748dee7 and leave it to the individual devices to decide whether they will block flag propagation or not. Reported-by: Stefan Priebe Suggested-by: Veaceslav Falico Signed-off-by: Vlad Yasevich Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit fa2f52deaf335713f839fa17da58f60eb48528cd Author: Alexei Starovoitov Date: Tue Nov 19 19:12:34 2013 -0800 ipv4: fix race in concurrent ip_route_input_slow() [ Upstream commit dcdfdf56b4a6c9437fc37dbc9cee94a788f9b0c4 ] CPUs can ask for local route via ip_route_input_noref() concurrently. if nh_rth_input is not cached yet, CPUs will proceed to allocate equivalent DSTs on 'lo' and then will try to cache them in nh_rth_input via rt_cache_route() Most of the time they succeed, but on occasion the following two lines: orig = *p; prev = cmpxchg(p, orig, rt); in rt_cache_route() do race and one of the cpus fails to complete cmpxchg. But ip_route_input_slow() doesn't check the return code of rt_cache_route(), so dst is leaking. dst_destroy() is never called and 'lo' device refcnt doesn't go to zero, which can be seen in the logs as: unregister_netdevice: waiting for lo to become free. Usage count = 1 Adding mdelay() between above two lines makes it easily reproducible. Fix it similar to nh_pcpu_rth_output case. Fixes: d2d68ba9fe8b ("ipv4: Cache input routes in fib_info nexthops.") Signed-off-by: Alexei Starovoitov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 30eb9c19716dbca111330f0e1d31054da9b69c24 Author: Andrey Vagin Date: Tue Nov 19 22:10:06 2013 +0400 tcp: don't update snd_nxt, when a socket is switched from repair mode [ Upstream commit dbde497966804e63a38fdedc1e3815e77097efc2 ] snd_nxt must be updated synchronously with sk_send_head. Otherwise tp->packets_out may be updated incorrectly, what may bring a kernel panic. Here is a kernel panic from my host. [ 103.043194] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 [ 103.044025] IP: [] tcp_rearm_rto+0xcf/0x150 ... [ 146.301158] Call Trace: [ 146.301158] [] tcp_ack+0xcc0/0x12c0 Before this panic a tcp socket was restored. This socket had sent and unsent data in the write queue. Sent data was restored in repair mode, then the socket was switched from reapair mode and unsent data was restored. After that the socket was switched back into repair mode. In that moment we had a socket where write queue looks like this: snd_una snd_nxt write_seq |_________|________| | sk_send_head After a second switching from repair mode the state of socket was changed: snd_una snd_nxt, write_seq |_________ ________| | sk_send_head This state is inconsistent, because snd_nxt and sk_send_head are not synchronized. Bellow you can find a call trace, how packets_out can be incremented twice for one skb, if snd_nxt and sk_send_head are not synchronized. In this case packets_out will be always positive, even when sk_write_queue is empty. tcp_write_wakeup skb = tcp_send_head(sk); tcp_fragment if (!before(tp->snd_nxt, TCP_SKB_CB(buff)->end_seq)) tcp_adjust_pcount(sk, skb, diff); tcp_event_new_data_sent tp->packets_out += tcp_skb_pcount(skb); I think update of snd_nxt isn't required, when a socket is switched from repair mode. Because it's initialized in tcp_connect_init. Then when a write queue is restored, snd_nxt is incremented in tcp_event_new_data_sent, so it's always is in consistent state. I have checked, that the bug is not reproduced with this patch and all tests about restoring tcp connections work fine. Cc: Pavel Emelyanov Cc: Eric Dumazet Cc: "David S. Miller" Cc: Alexey Kuznetsov Cc: James Morris Cc: Hideaki YOSHIFUJI Cc: Patrick McHardy Signed-off-by: Andrey Vagin Acked-by: Pavel Emelyanov Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5f406a1e82eab34701698cfe19d2bcf054896453 Author: Ying Xue Date: Tue Nov 19 18:09:27 2013 +0800 atm: idt77252: fix dev refcnt leak [ Upstream commit b5de4a22f157ca345cdb3575207bf46402414bc1 ] init_card() calls dev_get_by_name() to get a network deceive. But it doesn't decrease network device reference count after the device is used. Signed-off-by: Ying Xue Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit fa3735e9981998674f23fb99212a591a5a69a9b0 Author: fan.du Date: Tue Nov 19 16:53:28 2013 +0800 xfrm: Release dst if this dst is improper for vti tunnel [ Upstream commit 236c9f84868534c718b6889aa624de64763281f9 ] After searching rt by the vti tunnel dst/src parameter, if this rt has neither attached to any transformation nor the transformation is not tunnel oriented, this rt should be released back to ip layer. otherwise causing dst memory leakage. Signed-off-by: Fan Du Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 0a6905b2186e5ae1715545d067dde8ad830fc3f5 Author: Jiri Pirko Date: Wed Nov 6 17:52:20 2013 +0100 netfilter: push reasm skb through instead of original frag skbs [ Upstream commit 6aafeef03b9d9ecf255f3a80ed85ee070260e1ae ] Pushing original fragments through causes several problems. For example for matching, frags may not be matched correctly. Take following example: On HOSTA do: ip6tables -I INPUT -p icmpv6 -j DROP ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT and on HOSTB you do: ping6 HOSTA -s2000 (MTU is 1500) Incoming echo requests will be filtered out on HOSTA. This issue does not occur with smaller packets than MTU (where fragmentation does not happen) As was discussed previously, the only correct solution seems to be to use reassembled skb instead of separete frags. Doing this has positive side effects in reducing sk_buff by one pointer (nfct_reasm) and also the reams dances in ipvs and conntrack can be removed. Future plan is to remove net/ipv6/netfilter/nf_conntrack_reasm.c entirely and use code in net/ipv6/reassembly.c instead. Signed-off-by: Jiri Pirko Acked-by: Julian Anastasov Signed-off-by: Marcelo Ricardo Leitner Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 88a6ec987d157ade1ba8b4e0d105a06a4c6fd27c Author: Jiri Pirko Date: Wed Nov 6 17:52:19 2013 +0100 ip6_output: fragment outgoing reassembled skb properly [ Upstream commit 9037c3579a277f3a23ba476664629fda8c35f7c4 ] If reassembled packet would fit into outdev MTU, it is not fragmented according the original frag size and it is send as single big packet. The second case is if skb is gso. In that case fragmentation does not happen according to the original frag size. This patch fixes these. Signed-off-by: Jiri Pirko Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 72b89baeaa1f76c517ae8cfe3dc9e4c59b8d0c0f Author: Vlad Yasevich Date: Sat Nov 16 15:17:24 2013 -0500 ipv6: Fix inet6_init() cleanup order Commit 6d0bfe22611602f36617bc7aa2ffa1bbb2f54c67 net: ipv6: Add IPv6 support to the ping socket introduced a change in the cleanup logic of inet6_init and has a bug in that ipv6_packet_cleanup() may not be called. Fix the cleanup ordering. CC: Hannes Frederic Sowa CC: Lorenzo Colitti CC: Fabio Estevam Signed-off-by: Vlad Yasevich Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 00fecf737b27ccd17efcf7090267cf2b9597b456 Author: Hannes Frederic Sowa Date: Sat Nov 23 07:22:33 2013 +0100 ipv6: fix leaking uninitialized port number of offender sockaddr [ Upstream commit 1fa4c710b6fe7b0aac9907240291b6fe6aafc3b8 ] Offenders don't have port numbers, so set it to 0. Signed-off-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5bca4e493175cd62bb92af685262b11bb7d2ffba Author: Dan Carpenter Date: Wed Nov 27 15:40:21 2013 +0300 net: clamp ->msg_namelen instead of returning an error [ Upstream commit db31c55a6fb245fdbb752a2ca4aefec89afabb06 ] If kmsg->msg_namelen > sizeof(struct sockaddr_storage) then in the original code that would lead to memory corruption in the kernel if you had audit configured. If you didn't have audit configured it was harmless. There are some programs such as beta versions of Ruby which use too large of a buffer and returning an error code breaks them. We should clamp the ->msg_namelen value instead. Fixes: 1661bf364ae9 ("net: heap overflow in __audit_sockaddr()") Reported-by: Eric Wong Signed-off-by: Dan Carpenter Tested-by: Eric Wong Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d4cbc4a91331e0fa5819cdf4267be4f07185eef6 Author: Hannes Frederic Sowa Date: Sat Nov 23 00:46:12 2013 +0100 inet: fix addr_len/msg->msg_namelen assignment in recv_error and rxpmtu functions [ Upstream commit 85fbaa75037d0b6b786ff18658ddf0b4014ce2a4 ] Commit bceaa90240b6019ed73b49965eac7d167610be69 ("inet: prevent leakage of uninitialized memory to user in recv syscalls") conditionally updated addr_len if the msg_name is written to. The recv_error and rxpmtu functions relied on the recvmsg functions to set up addr_len before. As this does not happen any more we have to pass addr_len to those functions as well and set it to the size of the corresponding sockaddr length. This broke traceroute and such. Fixes: bceaa90240b6 ("inet: prevent leakage of uninitialized memory to user in recv syscalls") Reported-by: Brad Spengler Reported-by: Tom Labanowski Cc: mpb Cc: David S. Miller Cc: Eric Dumazet Signed-off-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 4d7aef1fc794bc093939f165ca5e78de7af01361 Author: Hannes Frederic Sowa Date: Thu Nov 21 03:14:34 2013 +0100 net: add BUG_ON if kernel advertises msg_namelen > sizeof(struct sockaddr_storage) [ Upstream commit 68c6beb373955da0886d8f4f5995b3922ceda4be ] In that case it is probable that kernel code overwrote part of the stack. So we should bail out loudly here. The BUG_ON may be removed in future if we are sure all protocols are conformant. Suggested-by: Eric Dumazet Signed-off-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 0cefe287488ca07c0d7962a7b4d3fbb829d09917 Author: Hannes Frederic Sowa Date: Thu Nov 21 03:14:22 2013 +0100 net: rework recvmsg handler msg_name and msg_namelen logic [ Upstream commit f3d3342602f8bcbf37d7c46641cb9bca7618eb1c ] This patch now always passes msg->msg_namelen as 0. recvmsg handlers must set msg_namelen to the proper size <= sizeof(struct sockaddr_storage) to return msg_name to the user. This prevents numerous uninitialized memory leaks we had in the recvmsg handlers and makes it harder for new code to accidentally leak uninitialized memory. Optimize for the case recvfrom is called with NULL as address. We don't need to copy the address at all, so set it to NULL before invoking the recvmsg handler. We can do so, because all the recvmsg handlers must cope with the case a plain read() is called on them. read() also sets msg_name to NULL. Also document these changes in include/linux/net.h as suggested by David Miller. Changes since RFC: Set msg->msg_name = NULL if user specified a NULL in msg_name but had a non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't affect sendto as it would bail out earlier while trying to copy-in the address. It also more naturally reflects the logic by the callers of verify_iovec. With this change in place I could remove " if (!uaddr || msg_sys->msg_namelen == 0) msg->msg_name = NULL ". This change does not alter the user visible error logic as we ignore msg_namelen as long as msg_name is NULL. Also remove two unnecessary curly brackets in ___sys_recvmsg and change comments to netdev style. Cc: David Miller Suggested-by: Eric Dumazet Signed-off-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 086663e065ba2383382d6bb7d8a3d9bc9cba7cdf Author: Hannes Frederic Sowa Date: Mon Nov 18 07:07:45 2013 +0100 ping: prevent NULL pointer dereference on write to msg_name [ Upstream commit cf970c002d270c36202bd5b9c2804d3097a52da0 ] A plain read() on a socket does set msg->msg_name to NULL. So check for NULL pointer first. Signed-off-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 7a9b8e64a5706d62d5a5ab54fe3d0320845b1d4a Author: Hannes Frederic Sowa Date: Mon Nov 18 04:20:45 2013 +0100 inet: prevent leakage of uninitialized memory to user in recv syscalls [ Upstream commit bceaa90240b6019ed73b49965eac7d167610be69 ] Only update *addr_len when we actually fill in sockaddr, otherwise we can return uninitialized memory from the stack to the caller in the recvfrom, recvmmsg and recvmsg syscalls. Drop the the (addr_len == NULL) checks because we only get called with a valid addr_len pointer either from sock_common_recvmsg or inet_recvmsg. If a blocking read waits on a socket which is concurrently shut down we now return zero and set msg_msgnamelen to 0. Reported-by: mpb Suggested-by: Eric Dumazet Signed-off-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d902e2c30aeba8d9d76d9c7845f4b9833f7c336d Author: Eric Dumazet Date: Fri Nov 15 08:58:14 2013 -0800 pkt_sched: fq: fix pacing for small frames [ Upstream commit f52ed89971adbe79b6438c459814034707b8ab91 ] For performance reasons, sch_fq tried hard to not setup timers for every sent packet, using a quantum based heuristic : A delay is setup only if the flow exhausted its credit. Problem is that application limited flows can refill their credit for every queued packet, and they can evade pacing. This problem can also be triggered when TCP flows use small MSS values, as TSO auto sizing builds packets that are smaller than the default fq quantum (3028 bytes) This patch adds a 40 ms delay to guard flow credit refill. Fixes: afe4fd062416 ("pkt_sched: fq: Fair Queue packet scheduler") Signed-off-by: Eric Dumazet Cc: Maciej Żenczykowski Cc: Willem de Bruijn Cc: Yuchung Cheng Cc: Neal Cardwell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 8ec21820ff9714a66c17e89aad8d0fc74a357b27 Author: Eric Dumazet Date: Fri Nov 15 08:57:26 2013 -0800 pkt_sched: fq: warn users using defrate [ Upstream commit 65c5189a2b57b9aa1d89e4b79da39928257c9505 ] Commit 7eec4174ff29 ("pkt_sched: fq: fix non TCP flows pacing") obsoleted TCA_FQ_FLOW_DEFAULT_RATE without notice for the users. Suggested by David Miller Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 41d5db117867708cde76f1009ca3fd52e6244559 Author: Eric Dumazet Date: Thu Nov 14 13:37:54 2013 -0800 ipv4: fix possible seqlock deadlock [ Upstream commit c9e9042994d37cbc1ee538c500e9da1bb9d1bcdf ] ip4_datagram_connect() being called from process context, it should use IP_INC_STATS() instead of IP_INC_STATS_BH() otherwise we can deadlock on 32bit arches, or get corruptions of SNMP counters. Fixes: 584bdf8cbdf6 ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP") Signed-off-by: Eric Dumazet Reported-by: Dave Jones Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ca86fc7e2f584fcf9fb33c033e9230044f15cd68 Author: Chris Metcalf Date: Thu Nov 14 12:09:21 2013 -0500 connector: improved unaligned access error fix [ Upstream commit 1ca1a4cf59ea343a1a70084fe7cc96f37f3cf5b1 ] In af3e095a1fb4, Erik Jacobsen fixed one type of unaligned access bug for ia64 by converting a 64-bit write to use put_unaligned(). Unfortunately, since gcc will convert a short memset() to a series of appropriately-aligned stores, the problem is now visible again on tilegx, where the memset that zeros out proc_event is converted to three 64-bit stores, causing an unaligned access panic. A better fix for the original problem is to ensure that proc_event is aligned to 8 bytes here. We can do that relatively easily by arranging to start the struct cn_msg aligned to 8 bytes and then offset by 4 bytes. Doing so means that the immediately following proc_event structure is then correctly aligned to 8 bytes. The result is that the memset() stores are now aligned, and as an added benefit, we can remove the put_unaligned() calls in the code. Signed-off-by: Chris Metcalf Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 67db393568a1ab1816a1a39cbbd756611e83b31f Author: Maciej Żenczykowski Date: Thu Nov 14 08:50:43 2013 -0800 pkt_sched: fq: change classification of control packets [ Upstream commit 2abc2f070eb30ac8421554a5c32229f8332c6206 ] Initial sch_fq implementation copied code from pfifo_fast to classify a packet as a high prio packet. This clashes with setups using PRIO with say 7 bands, as one of the band could be incorrectly (mis)classified by FQ. Packets would be queued in the 'internal' queue, and no pacing ever happen for this special queue. Fixes: afe4fd062416 ("pkt_sched: fq: Fair Queue packet scheduler") Signed-off-by: Maciej Żenczykowski Signed-off-by: Eric Dumazet Cc: Stephen Hemminger Cc: Willem de Bruijn Cc: Yuchung Cheng Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 06c375b4936023b32172afe8e6507579a48e2388 Author: Nicolas Dichtel Date: Thu Nov 14 15:47:03 2013 +0100 ip6tnl: fix use after free of fb_tnl_dev [ Upstream commit 1e9f3d6f1c403dd2b6270f654b4747147aa2306f ] Bug has been introduced by commit bb8140947a24 ("ip6tnl: allow to use rtnl ops on fb tunnel"). When ip6_tunnel.ko is unloaded, FB device is delete by rtnl_link_unregister() and then we try to use the pointer in ip6_tnl_destroy_tunnels(). Let's add an handler for dellink, which will never remove the FB tunnel. With this patch it will no more be possible to remove it via 'ip link del ip6tnl0', but it's safer. The same fix was already proposed by Willem de Bruijn for sit interfaces. CC: Willem de Bruijn Reported-by: Steven Rostedt Signed-off-by: Nicolas Dichtel Acked-by: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit c2ef4d4c0e4da81c73db598a8944fcb77bd713d9 Author: Dan Carpenter Date: Thu Nov 14 11:21:10 2013 +0300 isdnloop: use strlcpy() instead of strcpy() [ Upstream commit f9a23c84486ed350cce7bb1b2828abd1f6658796 ] These strings come from a copy_from_user() and there is no way to be sure they are NUL terminated. Signed-off-by: Dan Carpenter Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 7c9c354a180ab36acfa8ca04c206042ce3d22fb4 Author: Willem de Bruijn Date: Wed Nov 13 21:27:38 2013 -0500 sit: fix use after free of fb_tunnel_dev [ Upstream commit 9434266f2c645d4fcf62a03a8e36ad8075e37943 ] Bug: The fallback device is created in sit_init_net and assumed to be freed in sit_exit_net. First, it is dereferenced in that function, in sit_destroy_tunnels: struct net *net = dev_net(sitn->fb_tunnel_dev); Prior to this, rtnl_unlink_register has removed all devices that match rtnl_link_ops == sit_link_ops. Commit 205983c43700 added the line + sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops; which cases the fallback device to match here and be freed before it is last dereferenced. Fix: This commit adds an explicit .delllink callback to sit_link_ops that skips deallocation at rtnl_unlink_register for the fallback device. This mechanism is comparable to the one in ip_tunnel. It also modifies sit_destroy_tunnels and its only caller sit_exit_net to avoid the offending dereference in the first place. That double lookup is more complicated than required. Test: The bug is only triggered when CONFIG_NET_NS is enabled. It causes a GPF only when CONFIG_DEBUG_SLAB is enabled. Verified that this bug exists at the mentioned commit, at davem-net HEAD and at 3.11.y HEAD. Verified that it went away after applying this patch. Fixes: 205983c43700 ("sit: allow to use rtnl ops on fb tunnel") Signed-off-by: Willem de Bruijn Acked-by: Nicolas Dichtel Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a9ac5b0bfce3f9777aa5e1512e5e605f61c81b3f Author: Eric Dumazet Date: Wed Nov 13 15:00:46 2013 -0800 net-tcp: fix panic in tcp_fastopen_cache_set() [ Upstream commit dccf76ca6b626c0c4a4e09bb221adee3270ab0ef ] We had some reports of crashes using TCP fastopen, and Dave Jones gave a nice stack trace pointing to the error. Issue is that tcp_get_metrics() should not be called with a NULL dst Fixes: 1fe4c481ba637 ("net-tcp: Fast Open client - cookie cache") Signed-off-by: Eric Dumazet Reported-by: Dave Jones Cc: Yuchung Cheng Acked-by: Yuchung Cheng Tested-by: Dave Jones Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit accbc62ee04bb72ee23a96ec10b0a6a2e2260a55 Author: Nikolay Aleksandrov Date: Wed Nov 13 17:07:46 2013 +0100 bonding: fix two race conditions in bond_store_updelay/downdelay [ Upstream commit b869ccfab1e324507fa3596e3e1308444fb68227 ] This patch fixes two race conditions between bond_store_updelay/downdelay and bond_store_miimon which could lead to division by zero as miimon can be set to 0 while either updelay/downdelay are being set and thus miss the zero check in the beginning, the zero div happens because updelay/downdelay are stored as new_value / bond->params.miimon. Use rtnl to synchronize with miimon setting. CC: Jay Vosburgh CC: Andy Gospodarek CC: Veaceslav Falico Signed-off-by: Nikolay Aleksandrov Acked-by: Veaceslav Falico Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f4a5ec10024217be0a05f568eca5a93ff756df10 Author: Eric Dumazet Date: Wed Nov 13 06:32:54 2013 -0800 tcp: tsq: restore minimal amount of queueing [ Upstream commit 98e09386c0ef4dfd48af7ba60ff908f0d525cdee ] After commit c9eeec26e32e ("tcp: TSQ can use a dynamic limit"), several users reported throughput regressions, notably on mvneta and wifi adapters. 802.11 AMPDU requires a fair amount of queueing to be effective. This patch partially reverts the change done in tcp_write_xmit() so that the minimal amount is sysctl_tcp_limit_output_bytes. It also remove the use of this sysctl while building skb stored in write queue, as TSO autosizing does the right thing anyway. Users with well behaving NICS and correct qdisc (like sch_fq), can then lower the default sysctl_tcp_limit_output_bytes value from 128KB to 8KB. This new usage of sysctl_tcp_limit_output_bytes permits each driver authors to check how their driver performs when/if the value is set to a minimum of 4KB. Normally, line rate for a single TCP flow should be possible, but some drivers rely on timers to perform TX completion and too long TX completion delays prevent reaching full throughput. Fixes: c9eeec26e32e ("tcp: TSQ can use a dynamic limit") Signed-off-by: Eric Dumazet Reported-by: Sujith Manoharan Reported-by: Arnaud Ebalard Tested-by: Sujith Manoharan Cc: Felix Fietkau Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ebee20ff2444c1b65885ab1869001922e0a1f969 Author: Jason Wang Date: Wed Nov 13 14:00:40 2013 +0800 macvtap: limit head length of skb allocated [ Upstream commit 16a3fa28630331e28208872fa5341ce210b901c7 ] We currently use hdr_len as a hint of head length which is advertised by guest. But when guest advertise a very big value, it can lead to an 64K+ allocating of kmalloc() which has a very high possibility of failure when host memory is fragmented or under heavy stress. The huge hdr_len also reduce the effect of zerocopy or even disable if a gso skb is linearized in guest. To solves those issues, this patch introduces an upper limit (PAGE_SIZE) of the head, which guarantees an order 0 allocation each time. Cc: Stefan Hajnoczi Cc: Michael S. Tsirkin Signed-off-by: Jason Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d485b5173b10de877781e6cabf39bee3f0bca5dc Author: Jason Wang Date: Wed Nov 13 14:00:39 2013 +0800 tuntap: limit head length of skb allocated [ Upstream commit 96f8d9ecf227638c89f98ccdcdd50b569891976c ] We currently use hdr_len as a hint of head length which is advertised by guest. But when guest advertise a very big value, it can lead to an 64K+ allocating of kmalloc() which has a very high possibility of failure when host memory is fragmented or under heavy stress. The huge hdr_len also reduce the effect of zerocopy or even disable if a gso skb is linearized in guest. To solves those issues, this patch introduces an upper limit (PAGE_SIZE) of the head, which guarantees an order 0 allocation each time. Cc: Stefan Hajnoczi Cc: Michael S. Tsirkin Signed-off-by: Jason Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit cec2b41aaf8da29ca3bb2a56a1cd67c6675ed60f Author: Jukka Rissanen Date: Wed Nov 13 11:03:39 2013 +0200 6lowpan: Uncompression of traffic class field was incorrect [ Upstream commit 1188f05497e7bd2f2614b99c54adfbe7413d5749 ] If priority/traffic class field in IPv6 header is set (seen when using ssh), the uncompression sets the TC and Flow fields incorrectly. Example: This is IPv6 header of a sent packet. Note the priority/TC (=1) in the first byte. 00000000: 61 00 00 00 00 2c 06 40 fe 80 00 00 00 00 00 00 00000010: 02 02 72 ff fe c6 42 10 fe 80 00 00 00 00 00 00 00000020: 02 1e ab ff fe 4c 52 57 This gets compressed like this in the sending side 00000000: 72 31 04 06 02 1e ab ff fe 4c 52 57 ec c2 00 16 00000010: aa 2d fe 92 86 4e be c6 .... In the receiving end, the packet gets uncompressed to this IPv6 header 00000000: 60 06 06 02 00 2a 1e 40 fe 80 00 00 00 00 00 00 00000010: 02 02 72 ff fe c6 42 10 fe 80 00 00 00 00 00 00 00000020: ab ff fe 4c 52 57 ec c2 First four bytes are set incorrectly and we have also lost two bytes from destination address. The fix is to switch the case values in switch statement when checking the TC field. Signed-off-by: Jukka Rissanen Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit c7cf12c15986f580ce4f75ebbcbb8659eb96a506 Author: Alexei Starovoitov Date: Tue Nov 12 14:39:13 2013 -0800 core/dev: do not ignore dmac in dev_forward_skb() [ Upstream commit 81b9eab5ebbf0d5d54da4fc168cfb02c2adc76b8 ] commit 06a23fe31ca3 ("core/dev: set pkt_type after eth_type_trans() in dev_forward_skb()") and refactoring 64261f230a91 ("dev: move skb_scrub_packet() after eth_type_trans()") are forcing pkt_type to be PACKET_HOST when skb traverses veth. which means that ip forwarding will kick in inside netns even if skb->eth->h_dest != dev->dev_addr Fix order of eth_type_trans() and skb_scrub_packet() in dev_forward_skb() and in ip_tunnel_rcv() Fixes: 06a23fe31ca3 ("core/dev: set pkt_type after eth_type_trans() in dev_forward_skb()") CC: Isaku Yamahata CC: Maciej Zenczykowski CC: Nicolas Dichtel Signed-off-by: Alexei Starovoitov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit db6b5bf758051d320b6b3363ddd515760010e527 Author: Felix Fietkau Date: Tue Nov 12 16:34:41 2013 +0100 usbnet: fix status interrupt urb handling [ Upstream commit 52f48d0d9aaa621ffa5e08d79da99a3f8c93b848 ] Since commit 7b0c5f21f348a66de495868b8df0284e8dfd6bbf "sierra_net: keep status interrupt URB active", sierra_net triggers status interrupt polling before the net_device is opened (in order to properly receive the sync message response). To be able to receive further interrupts, the interrupt urb needs to be re-submitted, so this patch removes the bogus check for netif_running(). Signed-off-by: Felix Fietkau Tested-by: Dan Williams Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 24e9fcb4cf4653bc4142fd05f3a87dc42f31375a Author: Veaceslav Falico Date: Tue Nov 12 15:37:40 2013 +0100 bonding: don't permit to use ARP monitoring in 802.3ad mode [ Upstream commit ec9f1d15db8185f63a2c3143dc1e90ba18541b08 ] Currently the ARP monitoring is not supported with 802.3ad, and it's prohibited to use it via the module params. However we still can set it afterwards via sysfs, cause we only check for *LB modes there. To fix this - add a check for 802.3ad mode in bonding_store_arp_interval. CC: Jay Vosburgh CC: Andy Gospodarek Signed-off-by: Veaceslav Falico Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d8b3693c2e8727131f4bc0c2b6000e7f563064fb Author: Daniel Borkmann Date: Mon Nov 11 12:20:32 2013 +0100 random32: fix off-by-one in seeding requirement [ Upstream commit 51c37a70aaa3f95773af560e6db3073520513912 ] For properly initialising the Tausworthe generator [1], we have a strict seeding requirement, that is, s1 > 1, s2 > 7, s3 > 15. Commit 697f8d0348 ("random32: seeding improvement") introduced a __seed() function that imposes boundary checks proposed by the errata paper [2] to properly ensure above conditions. However, we're off by one, as the function is implemented as: "return (x < m) ? x + m : x;", and called with __seed(X, 1), __seed(X, 7), __seed(X, 15). Thus, an unwanted seed of 1, 7, 15 would be possible, whereas the lower boundary should actually be of at least 2, 8, 16, just as GSL does. Fix this, as otherwise an initialization with an unwanted seed could have the effect that Tausworthe's PRNG properties cannot not be ensured. Note that this PRNG is *not* used for cryptography in the kernel. [1] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme.ps [2] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme2.ps Joint work with Hannes Frederic Sowa. Fixes: 697f8d0348a6 ("random32: seeding improvement") Cc: Stephen Hemminger Cc: Florian Weimer Cc: Theodore Ts'o Signed-off-by: Daniel Borkmann Signed-off-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 520e356e35b44957d5578a43e517ac1162141494 Author: Hannes Frederic Sowa Date: Fri Nov 8 19:26:21 2013 +0100 ipv6: protect for_each_sk_fl_rcu in mem_check with rcu_read_lock_bh [ Upstream commit f8c31c8f80dd882f7eb49276989a4078d33d67a7 ] Fixes a suspicious rcu derference warning. Cc: Florent Fourcot Signed-off-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit b379b19eb856b6a1d1e88007a930834417a96b11 Author: Duan Jiong Date: Fri Nov 8 09:56:53 2013 +0800 ipv6: use rt6_get_dflt_router to get default router in rt6_route_rcv [ Upstream commit f104a567e673f382b09542a8dc3500aa689957b4 ] As the rfc 4191 said, the Router Preference and Lifetime values in a ::/0 Route Information Option should override the preference and lifetime values in the Router Advertisement header. But when the kernel deals with a ::/0 Route Information Option, the rt6_get_route_info() always return NULL, that means that overriding will not happen, because those default routers were added without flag RTF_ROUTEINFO in rt6_add_dflt_router(). In order to deal with that condition, we should call rt6_get_dflt_router when the prefix length is 0. Signed-off-by: Duan Jiong Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 62e4ad5a4d690c688aa776ab8c264362d4c2f4fc Author: Andreas Henriksson Date: Thu Nov 7 18:26:38 2013 +0100 net: Fix "ip rule delete table 256" [ Upstream commit 13eb2ab2d33c57ebddc57437a7d341995fc9138c ] When trying to delete a table >= 256 using iproute2 the local table will be deleted. The table id is specified as a netlink attribute when it needs more then 8 bits and iproute2 then sets the table field to RT_TABLE_UNSPEC (0). Preconditions to matching the table id in the rule delete code doesn't seem to take the "table id in netlink attribute" into condition so the frh_get_table helper function never gets to do its job when matching against current rule. Use the helper function twice instead of peaking at the table value directly. Originally reported at: http://bugs.debian.org/724783 Reported-by: Nicolas HICHER Signed-off-by: Andreas Henriksson Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ef06eb9f5076739ed146d74dcdd307daffcc163a Author: Amir Vadai Date: Thu Nov 7 11:08:30 2013 +0200 net/mlx4_en: Fixed crash when port type is changed [ Upstream commit 1ec4864b10171b0691ee196d7006ae56d2c153f2 ] timecounter_init() was was called only after first potential timecounter_read(). Moved mlx4_en_init_timestamp() before mlx4_en_init_netdev() Signed-off-by: Amir Vadai Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 632394c5b26feb80f114a0fc7e9a3e01d464d9ac Author: Andrey Vagin Date: Thu Nov 7 08:35:12 2013 +0400 net: x86: bpf: don't forget to free sk_filter (v2) [ Upstream commit 98bbc06aabac5a2dcc46580d20c59baf8ebe479f ] sk_filter isn't freed if bpf_func is equal to sk_run_filter. This memory leak was introduced by v3.12-rc3-224-gd45ed4a4 "net: fix unsafe set_memory_rw from softirq". Before this patch sk_filter was freed in sk_filter_release_rcu, now it should be freed in bpf_jit_free. Here is output of kmemleak: unreferenced object 0xffff8800b774eab0 (size 128): comm "systemd", pid 1, jiffies 4294669014 (age 124.062s) hex dump (first 32 bytes): 00 00 00 00 0b 00 00 00 20 63 7f b7 00 88 ff ff ........ c...... 60 d4 55 81 ff ff ff ff 30 d9 55 81 ff ff ff ff `.U.....0.U..... backtrace: [] kmemleak_alloc+0x4e/0xb0 [] __kmalloc+0xef/0x260 [] sock_kmalloc+0x38/0x60 [] sk_attach_filter+0x5d/0x190 [] sock_setsockopt+0x991/0x9e0 [] SyS_setsockopt+0xb6/0xd0 [] system_call_fastpath+0x16/0x1b [] 0xffffffffffffffff v2: add extra { } after else Fixes: d45ed4a4e33a ("net: fix unsafe set_memory_rw from softirq") Acked-by: Daniel Borkmann Cc: Alexei Starovoitov Cc: Eric Dumazet Cc: "David S. Miller" Signed-off-by: Andrey Vagin Acked-by: Alexei Starovoitov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 6281cb601b21207cde24e0c5b43a3da56227b21d Author: Veaceslav Falico Date: Sat Sep 28 21:18:56 2013 +0200 bonding: RCUify bond_set_rx_mode() [ Upstream commit b32418705107265dfca5edfe2b547643e53a732e ] Currently we rely on rtnl locking in bond_set_rx_mode(), however it's not always the case: RTNL: assertion failed at drivers/net/bonding/bond_main.c (3391) ... [] dump_stack+0x54/0x74 [] bond_set_rx_mode+0xc7/0xd0 [bonding] [] __dev_set_rx_mode+0x57/0xa0 [] __dev_mc_add+0x58/0x70 [] dev_mc_add+0x10/0x20 [] igmp6_group_added+0x18e/0x1d0 [] ? kmem_cache_alloc_trace+0x236/0x260 [] ipv6_dev_mc_inc+0x29f/0x320 [] ipv6_sock_mc_join+0x157/0x260 ... Fix this by using RCU primitives. Reported-by: Joe Lawrence Tested-by: Joe Lawrence CC: Jay Vosburgh CC: Andy Gospodarek Signed-off-by: Veaceslav Falico Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d68268e60d0ec51e6c269f8dc34bb9fcfee970a7 Author: Hannes Frederic Sowa Date: Tue Nov 5 02:41:27 2013 +0100 ipv6: fix headroom calculation in udp6_ufo_fragment [ Upstream commit 0e033e04c2678dbbe74a46b23fffb7bb918c288e ] Commit 1e2bd517c108816220f262d7954b697af03b5f9c ("udp6: Fix udp fragmentation for tunnel traffic.") changed the calculation if there is enough space to include a fragment header in the skb from a skb->mac_header dervived one to skb_headroom. Because we already peeled off the skb to transport_header this is wrong. Change this back to check if we have enough room before the mac_header. This fixes a panic Saran Neti reported. He used the tbf scheduler which skb_gso_segments the skb. The offsets get negative and we panic in memcpy because the skb was erroneously not expanded at the head. Reported-by: Saran Neti Cc: Pravin B Shelar Signed-off-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 4b2867325d676b875154dbe0da8f3c4954dcaacc Author: Dan Carpenter Date: Wed Nov 13 10:52:47 2013 +0300 net: mv643xx_eth: potential NULL dereference in probe() upstream commit 6115c11fe1a5a636ac99fc823b00df4ff3c0674e We assume that "mp->phy" can be NULL a couple lines before the dereference. Fixes: 1cce16d37d0f ('net: mv643xx_eth: Add missing phy_addr_set in DT mode') Signed-off-by: Dan Carpenter Acked-by: Sebastian Hesselbarth Acked-by: Jason Gunthorpe Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit db4968268ffaf57aa36950073b8f24ccb9e186df Author: Jason Gunthorpe Date: Mon Nov 4 17:27:19 2013 -0700 net: mv643xx_eth: Add missing phy_addr_set in DT mode Commit cc9d4598 'net: mv643xx_eth: use of_phy_connect if phy_node present' made the call to phy_scan optional, if the DT has a link to the phy node. However phy_scan has the side effect of calling phy_addr_set, which writes the phy MDIO address to the ethernet controller. If phy_addr_set is not called, and the bootloader has not set the correct address then the driver will fail to function. Tested on Kirkwood. Signed-off-by: Jason Gunthorpe Acked-by: Sebastian Hesselbarth Tested-by: Arnaud Ebalard Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman