commit 35483418917d63df90bae5b2d0b7b047d7ed8ec7
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Mon Dec 14 21:41:43 2015 -0800

    Linux 4.3.3

commit e0aa614dc9ce697f51a0992507b36b3edbc7a641
Author: Filipe Manana <fdmanana@suse.com>
Date:   Fri Oct 23 07:52:54 2015 +0100

    Btrfs: fix regression running delayed references when using qgroups
    
    commit b06c4bf5c874a57254b197f53ddf588e7a24a2bf upstream.
    
    In the kernel 4.2 merge window we had big changes to the implementation
    of delayed references and qgroups which made the no_quota field of delayed
    references not used anymore. More specifically the no_quota field is not
    used anymore as of:
    
      commit 0ed4792af0e8 ("btrfs: qgroup: Switch to new extent-oriented qgroup mechanism.")
    
    Leaving the no_quota field actually prevents delayed references from
    getting merged, which in turn causes the following BUG_ON(), at
    fs/btrfs/extent-tree.c, to be hit when qgroups are enabled:
    
      static int run_delayed_tree_ref(...)
      {
         (...)
         BUG_ON(node->ref_mod != 1);
         (...)
      }
    
    This happens on a scenario like the following:
    
      1) Ref1 bytenr X, action = BTRFS_ADD_DELAYED_REF, no_quota = 1, added.
    
      2) Ref2 bytenr X, action = BTRFS_DROP_DELAYED_REF, no_quota = 0, added.
         It's not merged with Ref1 because Ref1->no_quota != Ref2->no_quota.
    
      3) Ref3 bytenr X, action = BTRFS_ADD_DELAYED_REF, no_quota = 1, added.
         It's not merged with the reference at the tail of the list of refs
         for bytenr X because the reference at the tail, Ref2, is incompatible
         due to Ref2->no_quota != Ref3->no_quota.
    
      4) Ref4 bytenr X, action = BTRFS_DROP_DELAYED_REF, no_quota = 0, added.
         It's not merged with the reference at the tail of the list of refs
         for bytenr X because the reference at the tail, Ref3, is incompatible
         due to Ref3->no_quota != Ref4->no_quota.
    
      5) We run delayed references, trigger merging of delayed references,
         through __btrfs_run_delayed_refs() -> btrfs_merge_delayed_refs().
    
      6) Ref1 and Ref3 are merged as Ref1->no_quota == Ref3->no_quota and
         all other conditions are satisfied too. So Ref1 gets a ref_mod
         value of 2.
    
      7) Ref2 and Ref4 are merged as Ref2->no_quota == Ref4->no_quota and
         all other conditions are satisfied too. So Ref2 gets a ref_mod
         value of 2.
    
      8) Ref1 and Ref2 aren't merged, because they have different values
         for their no_quota field.
    
      9) Delayed reference Ref1 is picked for running (select_delayed_ref()
         always prefers references with an action == BTRFS_ADD_DELAYED_REF).
         So run_delayed_tree_ref() is called for Ref1 which triggers the
         BUG_ON because Ref1->ref_mod != 1 (equals 2).
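    
    As a rough illustration only, the scenario above can be replayed with
    a small user-space model.  This is not the kernel's merge code: the
    toy_* names and the simplified merge rule (same action adds ref_mod,
    opposite actions cancel) are assumptions made for this sketch.
    
    ```c
    #include <assert.h>
    #include <stddef.h>

    enum toy_action { TOY_ADD_REF, TOY_DROP_REF };

    struct toy_ref {
        enum toy_action action;
        int no_quota;
        int ref_mod;            /* 0 means "merged away" */
    };

    /*
     * Fold every later compatible ref into an earlier one.  The same
     * action adds the ref_mod values, opposite actions cancel out.
     * If honor_no_quota is set, refs with different no_quota values
     * are never merged (the pre-fix behaviour described above).
     */
    static void toy_merge(struct toy_ref *refs, size_t n, int honor_no_quota)
    {
        assert(refs != NULL);
        for (size_t i = 0; i < n; i++) {
            for (size_t j = i + 1; j < n && refs[i].ref_mod != 0; j++) {
                if (refs[j].ref_mod == 0)
                    continue;
                if (honor_no_quota && refs[i].no_quota != refs[j].no_quota)
                    continue;
                if (refs[i].action == refs[j].action) {
                    refs[i].ref_mod += refs[j].ref_mod;
                } else if (refs[i].ref_mod >= refs[j].ref_mod) {
                    refs[i].ref_mod -= refs[j].ref_mod;
                } else {
                    refs[i].action = refs[j].action;
                    refs[i].ref_mod = refs[j].ref_mod - refs[i].ref_mod;
                }
                refs[j].ref_mod = 0;
            }
        }
    }

    /* Replay Ref1..Ref4 from the list above; return Ref1's final ref_mod. */
    int toy_ref1_ref_mod(int honor_no_quota)
    {
        struct toy_ref refs[] = {
            { TOY_ADD_REF,  1, 1 },   /* Ref1 */
            { TOY_DROP_REF, 0, 1 },   /* Ref2 */
            { TOY_ADD_REF,  1, 1 },   /* Ref3 */
            { TOY_DROP_REF, 0, 1 },   /* Ref4 */
        };

        toy_merge(refs, sizeof(refs) / sizeof(refs[0]), honor_no_quota);
        return refs[0].ref_mod;
    }
    ```
    
    In this model toy_ref1_ref_mod(1) returns 2, the value that trips the
    BUG_ON, while toy_ref1_ref_mod(0) returns 0 because all four refs
    cancel out once no_quota no longer blocks merging.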
    
    So fix this by removing the no_quota field, as it's not used anymore as
    of commit 0ed4792af0e8 ("btrfs: qgroup: Switch to new extent-oriented
    qgroup mechanism.").
    
    The use of no_quota was also buggy in at least two places:
    
    1) At delayed-refs.c:btrfs_add_delayed_tree_ref() - we were setting
       no_quota to 0 instead of 1 when the following condition was true:
       is_fstree(ref_root) || !fs_info->quota_enabled
    
    2) At extent-tree.c:__btrfs_inc_extent_ref() - we were attempting to
       reset a node's no_quota when the condition "!is_fstree(root_objectid)
       || !root->fs_info->quota_enabled" was true but we did it only in
       an unused local stack variable, that is, we never reset the no_quota
       value in the node itself.
    
    This fixes the remainder of problems several people have been having when
    running delayed references, mostly while a balance is running in parallel,
    on a 4.2+ kernel.
    
    Very special thanks to Stéphane Lesimple for helping to debug this issue
    and testing this fix on his multi-terabyte filesystem (which took more
    than one day to balance alone, plus fsck, etc).
    
    Also, this fixes a deadlock issue when using the clone ioctl with qgroups
    enabled, as reported by Elias Probst on the mailing list. The deadlock
    happens because after calling btrfs_insert_empty_item we have our path
    holding a write lock on a leaf of the fs/subvol tree and then before
    releasing the path we called check_ref() which did backref walking, when
    qgroups are enabled, and tried to read lock the same leaf. The trace for
    this case is the following:
    
      INFO: task systemd-nspawn:6095 blocked for more than 120 seconds.
      (...)
      Call Trace:
        [<ffffffff86999201>] schedule+0x74/0x83
        [<ffffffff863ef64c>] btrfs_tree_read_lock+0xc0/0xea
        [<ffffffff86137ed7>] ? wait_woken+0x74/0x74
        [<ffffffff8639f0a7>] btrfs_search_old_slot+0x51a/0x810
        [<ffffffff863a129b>] btrfs_next_old_leaf+0xdf/0x3ce
        [<ffffffff86413a00>] ? ulist_add_merge+0x1b/0x127
        [<ffffffff86411688>] __resolve_indirect_refs+0x62a/0x667
        [<ffffffff863ef546>] ? btrfs_clear_lock_blocking_rw+0x78/0xbe
        [<ffffffff864122d3>] find_parent_nodes+0xaf3/0xfc6
        [<ffffffff86412838>] __btrfs_find_all_roots+0x92/0xf0
        [<ffffffff864128f2>] btrfs_find_all_roots+0x45/0x65
        [<ffffffff8639a75b>] ? btrfs_get_tree_mod_seq+0x2b/0x88
        [<ffffffff863e852e>] check_ref+0x64/0xc4
        [<ffffffff863e9e01>] btrfs_clone+0x66e/0xb5d
        [<ffffffff863ea77f>] btrfs_ioctl_clone+0x48f/0x5bb
        [<ffffffff86048a68>] ? native_sched_clock+0x28/0x77
        [<ffffffff863ed9b0>] btrfs_ioctl+0xabc/0x25cb
      (...)
    
    The problem goes away by eliminating check_ref(), which is no longer
    needed as its purpose was to get a value for the no_quota field of
    a delayed reference (this patch removes the no_quota field as mentioned
    earlier).
    
    Reported-by: Stéphane Lesimple <stephane_btrfs@lesimple.fr>
    Tested-by: Stéphane Lesimple <stephane_btrfs@lesimple.fr>
    Reported-by: Elias Probst <mail@eliasprobst.eu>
    Reported-by: Peter Becker <floyd.net@gmail.com>
    Reported-by: Malte Schröder <malte@tnxip.de>
    Reported-by: Derek Dongray <derek@valedon.co.uk>
    Reported-by: Erkki Seppala <flux-btrfs@inside.org>
    Cc: stable@vger.kernel.org  # 4.2+
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>

commit 4f2347e64803932bc354a12830e3cd10d86e5b49
Author: Hans Verkuil <hverkuil@xs4all.nl>
Date:   Mon Sep 21 08:42:04 2015 -0300

    cobalt: fix Kconfig dependency
    
    commit fc88dd16a0e430f57458e6bd9b62a631c6ea53a1 upstream.
    
    The cobalt driver should depend on VIDEO_V4L2_SUBDEV_API.
    
    This fixes this kbuild error:
    
    tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
    head:   99bc7215bc60f6cd414cf1b85cd9d52cc596cccb
    commit: 85756a069c55e0315ac5990806899cfb607b987f [media] cobalt: add new driver
    config: x86_64-randconfig-s0-09201514 (attached as .config)
    reproduce:
      git checkout 85756a069c55e0315ac5990806899cfb607b987f
      # save the attached .config to linux build tree
      make ARCH=x86_64
    
    All error/warnings (new ones prefixed by >>):
    
       drivers/media/i2c/adv7604.c: In function 'adv76xx_get_format':
    >> drivers/media/i2c/adv7604.c:1853:9: error: implicit declaration of function 'v4l2_subdev_get_try_format' [-Werror=implicit-function-declaration]
          fmt = v4l2_subdev_get_try_format(sd, cfg, format->pad);
                ^
       drivers/media/i2c/adv7604.c:1853:7: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
          fmt = v4l2_subdev_get_try_format(sd, cfg, format->pad);
              ^
       drivers/media/i2c/adv7604.c: In function 'adv76xx_set_format':
       drivers/media/i2c/adv7604.c:1882:7: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
          fmt = v4l2_subdev_get_try_format(sd, cfg, format->pad);
              ^
       cc1: some warnings being treated as errors
    
    Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
    Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit de48652f0e6acd07b4fa0e58a5e76f43477f8e2a
Author: Lu, Han <han.lu@intel.com>
Date:   Wed Nov 11 16:54:27 2015 +0800

    ALSA: hda/hdmi - apply Skylake fix-ups to Broxton display codec
    
    commit e2656412f2a7343ecfd13eb74bac0a6e6e9c5aad upstream.
    
    Broxton and Skylake have the same behavior on display audio. So this patch
    applies Skylake fix-ups to Broxton.
    
    Signed-off-by: Lu, Han <han.lu@intel.com>
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 4ba2bd2c4729caaf6702c5426ffb7c65b9b1405d
Author: Dan Williams <dan.j.williams@intel.com>
Date:   Thu Nov 12 12:13:57 2015 -0800

    ALSA: pci: depend on ZONE_DMA
    
    commit 2db1a57986d37653583e67ccbf13082aadc8f25d upstream.
    
    There are several sound drivers that 'select ZONE_DMA'.  This is
    backwards as ZONE_DMA is an architecture capability exported to drivers.
    Switch the polarity of the dependency to disable these drivers when the
    architecture does not support ZONE_DMA.  This was discovered in the
    context of testing/enabling devm_memremap_pages() which depends on
    ZONE_DEVICE.  ZONE_DEVICE in turn depends on !ZONE_DMA.
    
    Reported-by: Jeff Moyer <jmoyer@redhat.com>
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 4bf8855ea964d7faa5dece0a3dd032d47f4b9a42
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Wed Sep 30 15:04:42 2015 +0200

    ceph: fix message length computation
    
    commit 777d738a5e58ba3b6f3932ab1543ce93703f4873 upstream.
    
    create_request_message() computes the maximum length of a message,
    but uses the wrong type for the time stamp: sizeof(struct timespec)
    may be 8 or 16 depending on the architecture, while sizeof(struct
    ceph_timespec) is always 8, and that is what gets put into the
    message.
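    
    A minimal user-space sketch of the size mismatch, not the ceph code
    itself: toy_wire_timespec is a hypothetical stand-in for the on-wire
    time stamp (its two 32-bit fields are an assumption of this sketch),
    and the toy_msg_len_* helpers are made up for illustration.
    
    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <time.h>

    /* Hypothetical stand-in for the fixed-size on-wire time stamp. */
    struct toy_wire_timespec {
        uint32_t tv_sec;
        uint32_t tv_nsec;
    };

    /* Pre-fix style: size depends on the architecture's struct timespec. */
    size_t toy_msg_len_buggy(size_t base)
    {
        return base + sizeof(struct timespec);
    }

    /* Post-fix style: size matches what is actually put into the message. */
    size_t toy_msg_len_fixed(size_t base)
    {
        assert(sizeof(struct toy_wire_timespec) == 8);
        return base + sizeof(struct toy_wire_timespec);
    }
    ```
    
    On an architecture where struct timespec is 16 bytes, the first helper
    overestimates the maximum message length by 8 bytes.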
    
    Found while auditing the uses of timespec for y2038 problems.
    
    Fixes: b8e69066d8af ("ceph: include time stamp in every MDS request")
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Yan, Zheng <zyan@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 278b54df14299e1cfbe6175e2a098bf0b6616611
Author: Ming Lei <ming.lei@canonical.com>
Date:   Tue Nov 24 10:35:29 2015 +0800

    block: fix segment split
    
    commit 578270bfbd2803dc7b0b03fbc2ac119efbc73195 upstream.
    
    Inside blk_bio_segment_split(), the previous bvec pointer (bvprvp)
    always points to the iterator local variable, which is obviously
    wrong, so fix it by pointing it to the local variable 'bvprv'.
    
    Fixes: 5014c311baa2b(block: fix bogus compiler warnings in blk-merge.c)
    Reported-by: Michael Ellerman <mpe@ellerman.id.au>
    Reported-by: Mark Salter <msalter@redhat.com>
    Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
    Tested-by: Mark Salter <msalter@redhat.com>
    Signed-off-by: Ming Lei <ming.lei@canonical.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit eeec50cb5826af4c70564ed49d88626439598327
Author: Junxiao Bi <junxiao.bi@oracle.com>
Date:   Fri Nov 20 15:57:30 2015 -0800

    ocfs2: fix umask ignored issue
    
    commit 8f1eb48758aacf6c1ffce18179295adbf3bd7640 upstream.
    
    A newly created file's mode is not masked with umask, which makes umask
    ineffective on ocfs2 volumes.
    
    Fixes: 702e5bc ("ocfs2: use generic posix ACL infrastructure")
    Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
    Cc: Gang He <ghe@suse.com>
    Cc: Mark Fasheh <mfasheh@suse.de>
    Cc: Joel Becker <jlbec@evilplan.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 3994c2dee338ddc51adc2426d1208b41f04d15eb
Author: Jeff Layton <jlayton@poochiereds.net>
Date:   Wed Nov 25 13:50:11 2015 -0500

    nfs: if we have no valid attrs, then don't declare the attribute cache valid
    
    commit c812012f9ca7cf89c9e1a1cd512e6c3b5be04b85 upstream.
    
    If we pass in an empty nfs_fattr struct to nfs_update_inode, it will
    (correctly) not update any of the attributes, but it then clears the
    NFS_INO_INVALID_ATTR flag, which indicates that the attributes are
    up to date. Don't clear the flag if the fattr struct has no valid
    attrs to apply.
    
    Reviewed-by: Steve French <steve.french@primarydata.com>
    Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 4ffcc6f992ca1bef2e710ebe2fde80ced92ec263
Author: Jeff Layton <jlayton@poochiereds.net>
Date:   Wed Nov 25 13:43:14 2015 -0500

    nfs4: resend LAYOUTGET when there is a race that changes the seqid
    
    commit 4f2e9dce0c6348a95eaa56ade9bab18572221088 upstream.
    
    pnfs_layout_process will check the returned layout stateid against what
    the kernel has in-core. If it turns out that the stateid we received is
    older, then we should resend the LAYOUTGET instead of falling back to
    MDS I/O.
    
    Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 9a08c6f60bbc067199214b4730df27979716f242
Author: Benjamin Coddington <bcodding@redhat.com>
Date:   Fri Nov 20 09:56:20 2015 -0500

    nfs4: start callback_ident at idr 1
    
    commit c68a027c05709330fe5b2f50c50d5fa02124b5d8 upstream.
    
    If clp->cl_cb_ident is zero, then nfs_cb_idr_remove_locked() skips removing
    it when the nfs_client is freed.  A decoding or server bug can then find
    and try to put that first nfs_client which would lead to a crash.
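    
    The effect can be sketched with a toy id table in user space.  This
    is an illustration under assumptions, not the kernel idr API: the
    toy_* helpers and the fixed-size table are hypothetical, and only the
    "skip removal when the ident is zero" rule is taken from the text
    above.
    
    ```c
    #include <assert.h>
    #include <stddef.h>

    #define TOY_MAX_IDS 8

    static void *toy_slots[TOY_MAX_IDS];

    /* Allocate the lowest free id >= start; return -1 if none is free. */
    static int toy_id_alloc(void *ptr, int start)
    {
        for (int id = start; id < TOY_MAX_IDS; id++) {
            if (!toy_slots[id]) {
                toy_slots[id] = ptr;
                return id;
            }
        }
        return -1;
    }

    /* Mirrors the "skip when ident is zero" behaviour described above. */
    static void toy_id_remove(int ident)
    {
        if (ident)
            toy_slots[ident] = NULL;
    }

    /*
     * Register one client, "free" it, and report whether a stale pointer
     * to it is still reachable through the table (1) or not (0).
     */
    int toy_stale_after_free(int start)
    {
        int client = 0;              /* stands in for an nfs_client */
        int ident, stale;

        assert(start == 0 || start == 1);
        for (int i = 0; i < TOY_MAX_IDS; i++)
            toy_slots[i] = NULL;

        ident = toy_id_alloc(&client, start);
        toy_id_remove(ident);
        stale = toy_slots[ident] != NULL;
        toy_slots[ident] = NULL;     /* do not keep &client past this call */
        return stale;
    }
    ```
    
    With start == 0 the first client gets ident 0 and stays in the table
    after removal; with start == 1 it is removed as expected.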
    
    Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
    Fixes: d6870312659d ("nfs4client: convert to idr_alloc()")
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 8489733094edbec8f3d62f96cd8b15402e66350a
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Thu Nov 5 00:01:51 2015 +0100

    debugfs: fix refcount imbalance in start_creating
    
    commit 0ee9608c89e81a1ccee52ecb58a7ff040e2522d9 upstream.
    
    In debugfs' start_creating(), we pin the file system to safely access
    its root. When we fail to create a file, we unpin the file system via
    failed_creating() to release the mount count and eventually the reference
    of the vfsmount.
    
    However, when we run into an error during lookup_one_len() while still
    in start_creating(), we only release the parent's mutex but not the
    reference on the mount. It looks like this was done in the past, but
    after splitting portions of __create_file() into start_creating() and
    end_creating() via 190afd81e4a5 ("debugfs: split the beginning and the
    end of __create_file() off"), it seems to have been missed. Noticed
    during code review.
    
    Fixes: 190afd81e4a5 ("debugfs: split the beginning and the end of __create_file() off")
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 4e9f452926507ecabd3030efda61413a7cd9bef2
Author: Andrew Elble <aweits@rit.edu>
Date:   Thu Oct 15 12:07:28 2015 -0400

    nfsd: eliminate sending duplicate and repeated delegations
    
    commit 34ed9872e745fa56f10e9bef2cf3d2336c6c8816 upstream.
    
    We've observed the nfsd server in a state where there are
    multiple delegations on the same nfs4_file for the same client.
    The nfs client does attempt to DELEGRETURN these when they are presented to
    it - but apparently under some (unknown) circumstances the client does not
    manage to return all of them. This leads to the eventual
    attempt to CB_RECALL more than one delegation with the same nfs
    filehandle to the same client. The first recall will succeed, but the
    next recall will fail with NFS4ERR_BADHANDLE. This leads to the server
    having delegations on cl_revoked that the client has no way to FREE
    or DELEGRETURN, with resulting inability to recover. The state manager
    on the server will continually assert SEQ4_STATUS_RECALLABLE_STATE_REVOKED,
    and the state manager on the client will be looping unable to satisfy
    the server.
    
    List discussion also reports a race between OPEN and DELEGRETURN that
    will be avoided by only sending the delegation once to the
    client. This is also logically in accordance with RFC 5661 9.1.1 and 10.2.
    
    So, let's:
    
    1.) Not hand out duplicate delegations.
    2.) Only send them to the client once.
    
    RFC 5661:
    
    9.1.1:
    "Delegations and layouts, on the other hand, are not associated with a
    specific owner but are associated with the client as a whole
    (identified by a client ID)."
    
    10.2:
    "...the stateid for a delegation is associated with a client ID and may be
    used on behalf of all the open-owners for the given client.  A
    delegation is made to the client as a whole and not to any specific
    process or thread of control within it."
    
    Reported-by: Eric Meddaugh <etmsys@rit.edu>
    Cc: Trond Myklebust <trond.myklebust@primarydata.com>
    Cc: Olga Kornievskaia <aglo@umich.edu>
    Signed-off-by: Andrew Elble <aweits@rit.edu>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 9a4a72786164f9aa3ca99189ef2948e020d31068
Author: Jeff Layton <jlayton@poochiereds.net>
Date:   Thu Sep 17 07:47:08 2015 -0400

    nfsd: serialize state seqid morphing operations
    
    commit 35a92fe8770ce54c5eb275cd76128645bea2d200 upstream.
    
    Andrew was seeing a race occur when an OPEN and OPEN_DOWNGRADE were
    running in parallel. The server would receive the OPEN_DOWNGRADE first
    and check its seqid, but then an OPEN would race in and bump it. The
    OPEN_DOWNGRADE would then complete and bump the seqid again.  The result
    was that the OPEN_DOWNGRADE would be applied after the OPEN, even though
    it should have been rejected since the seqid changed.
    
    The only recourse we have here I think is to serialize operations that
    bump the seqid in a stateid, particularly when we're given a seqid in
    the call. To address this, we add a new rw_semaphore to the
    nfs4_ol_stateid struct. We do a down_write prior to checking the seqid
    after looking up the stateid to ensure that nothing else is going to
    bump it while we're operating on it.
    
    In the case of OPEN, we do a down_read, as the call doesn't contain a
    seqid. Those can run in parallel -- we just need to serialize them when
    there is a concurrent OPEN_DOWNGRADE or CLOSE.
    
    LOCK and LOCKU however always take the write lock as there is no
    opportunity for parallelizing those.
    
    Reported-and-Tested-by: Andrew W Elble <aweits@rit.edu>
    Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit eee26fbfd7a6970ad29daa83f5d7d83df17b017c
Author: Stefan Richter <stefanr@s5r6.in-berlin.de>
Date:   Tue Nov 3 01:46:21 2015 +0100

    firewire: ohci: fix JMicron JMB38x IT context discovery
    
    commit 100ceb66d5c40cc0c7018e06a9474302470be73c upstream.
    
    Reported by Clifford and Craig for JMicron OHCI-1394 + SDHCI combo
    controllers:  Often or even most of the time, the controller is
    initialized with the message "added OHCI v1.10 device as card 0, 4 IR +
    0 IT contexts, quirks 0x10".  With 0 isochronous transmit DMA contexts
    (IT contexts), applications like audio output are impossible.
    
    However, OHCI-1394 demands that at least 4 IT contexts are implemented
    by the link layer controller, and indeed JMicron JMB38x do implement
    four of them.  Only their IsoXmitIntMask register is unreliable at early
    access.
    
    With my own JMB381 single function controller I found:
      - I can reproduce the problem with a lower probability than Craig's.
      - If I put a loop around the section which clears and reads
        IsoXmitIntMask, then either the first or the second attempt will
        return the correct initial mask of 0x0000000f.  I never encountered
        a case of needing more than a second attempt.
      - Consequently, if I put a dummy reg_read(...IsoXmitIntMaskSet)
        before the first write, the subsequent read will return the correct
        result.
      - If I merely ignore a wrong read result and force the known real
        result, later isochronous transmit DMA usage works just fine.
    
    So let's just fix this chip bug up by the latter method.  Tested with
    JMB381 on kernel 3.13 and 4.3.
    
    Since OHCI-1394 generally requires 4 IT contexts at a minimum, this
    workaround is simply applied whenever the initial read of IsoXmitIntMask
    returns 0, regardless of whether it's a JMicron chip or not.  I never heard
    of this issue together with any other chip though.
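    
    The fallback can be sketched as follows.  This is a user-space
    illustration, not the driver code: toy_it_context_mask(),
    toy_count_contexts() and TOY_MIN_IT_MASK are names made up here, and
    only the mask value 0x0000000f is taken from the observations above.
    
    ```c
    #include <assert.h>
    #include <stdint.h>

    #define TOY_MIN_IT_MASK 0x0000000fu   /* four IT contexts */

    /* Fall back to the known-good mask when the register read back as 0. */
    uint32_t toy_it_context_mask(uint32_t read_back)
    {
        uint32_t mask = read_back ? read_back : TOY_MIN_IT_MASK;

        assert(mask != 0);
        return mask;
    }

    /* Number of contexts advertised by a mask (one bit per context). */
    unsigned int toy_count_contexts(uint32_t mask)
    {
        unsigned int n = 0;

        for (; mask; mask &= mask - 1)
            n++;
        return n;
    }
    ```
    
    A non-zero read result is passed through unchanged, so controllers
    that report their mask correctly are not affected in this model.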
    
    I am not 100% sure that this fix works on the OHCI-1394 part of JMB380
    and JMB388 combo controllers exactly the same as on the JMB381 single-
    function controller, but so far I haven't had a chance to let an owner
    of a combo chip run a patched kernel.
    
    Strangely enough, IsoRecvIntMask is always reported correctly, even
    though it is probed right before IsoXmitIntMask.
    
    Reported-by: Clifford Dunn
    Reported-by: Craig Moore <craig.moore@qenos.com>
    Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 5cbe4871e7fd45910472918126b9f1e77a5bda84
Author: Daeho Jeong <daeho.jeong@samsung.com>
Date:   Sun Oct 18 17:02:56 2015 -0400

    ext4, jbd2: ensure entering into panic after recording an error in superblock
    
    commit 4327ba52afd03fc4b5afa0ee1d774c9c5b0e85c5 upstream.
    
    If an EXT4 filesystem uses JBD2 journaling and an error occurs, the
    journal is aborted first, the error number is recorded in the JBD2
    superblock and, finally, the system panics if the "errors=panic"
    option is set.  In rare cases, however, this sequence is reordered as
    shown in the figure below, and the system panics (which means a system
    reset in a mobile environment) before the error has been recorded in
    the journal superblock. In that case, e2fsck cannot tell that a
    filesystem failure occurred in the previous run, and the corruption
    is not fixed.
    
    Task A                        Task B
    ext4_handle_error()
    -> jbd2_journal_abort()
      -> __journal_abort_soft()
        -> __jbd2_journal_abort_hard()
        | -> journal->j_flags |= JBD2_ABORT;
        |
        |                         __ext4_abort()
        |                         -> jbd2_journal_abort()
        |                         | -> __journal_abort_soft()
        |                         |   -> if (journal->j_flags & JBD2_ABORT)
        |                         |           return;
        |                         -> panic()
        |
        -> jbd2_journal_update_sb_errno()
    
    Tested-by: Hobin Woo <hobin.woo@samsung.com>
    Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 5a4ead78e6a00d20924ea1485d51529d9d6c335f
Author: Lukas Czerner <lczerner@redhat.com>
Date:   Sat Oct 17 22:57:06 2015 -0400

    ext4: fix potential use after free in __ext4_journal_stop
    
    commit 6934da9238da947628be83635e365df41064b09b upstream.
    
    There is a use-after-free possibility in __ext4_journal_stop() in the
    case that we free the handle in the first jbd2_journal_stop() because
    we're referencing handle->h_err afterwards. This was introduced in
    9705acd63b125dee8b15c705216d7186daea4625 and it is wrong. Fix it by
    storing the handle->h_err value beforehand and avoid referencing
    potentially freed handle.
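    
    The pattern of the fix can be sketched in user space.  The toy_*
    types and helpers below are hypothetical stand-ins, not the jbd2 or
    ext4 API; the only point is that the error value is copied out before
    the call that may free the handle.
    
    ```c
    #include <assert.h>
    #include <stdlib.h>

    struct toy_handle {
        int h_err;
    };

    /* Allocate a handle carrying the given error code; NULL on failure. */
    struct toy_handle *toy_start(int h_err)
    {
        struct toy_handle *h = malloc(sizeof(*h));

        if (h)
            h->h_err = h_err;
        return h;
    }

    /* Stand-in for the stop call: frees the handle, returns its status. */
    static int toy_stop(struct toy_handle *h)
    {
        free(h);
        return 0;
    }

    /* Read h_err before the call that may free the handle. */
    int toy_journal_stop(struct toy_handle *h)
    {
        int err, rc;

        assert(h != NULL);
        err = h->h_err;          /* saved while the handle is still valid */
        rc = toy_stop(h);        /* h must not be dereferenced after this */
        if (!err)
            err = rc;
        return err;
    }
    ```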
    
    Fixes: 9705acd63b125dee8b15c705216d7186daea4625
    Signed-off-by: Lukas Czerner <lczerner@redhat.com>
    Reviewed-by: Andreas Dilger <adilger@dilger.ca>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit bcdde051c4086a43f97119fefd70735b26f924b6
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Sat Oct 3 10:49:29 2015 -0400

    ext4 crypto: fix bugs in ext4_encrypted_zeroout()
    
    commit 36086d43f6575c081067de9855786a2fc91df77b upstream.
    
    Fix multiple bugs in ext4_encrypted_zeroout(), including one that
    could cause us to write an encrypted zero page to the wrong location
    on disk, potentially causing data and file system corruption.
    Fortunately, this tends to only show up in stress tests, but even with
    these fixes, we are seeing some test failures with generic/127 --- but
    these are now caused by data failures instead of metadata corruption.
    
    Since ext4_encrypted_zeroout() is only used for some optimizations to
    keep the extent tree from being too fragmented, and
    ext4_encrypted_zeroout() itself isn't all that optimized from a time
    or IOPS perspective, disable the extent tree optimization for
    encrypted inodes for now.  This prevents the data corruption issues
    reported by generic/127 until we can figure out what's going wrong.
    
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 99d17a1f12d6e56fa197f9e6c131236df50ef391
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Fri Oct 2 23:54:58 2015 -0400

    ext4 crypto: fix memory leak in ext4_bio_write_page()
    
    commit 937d7b84dca58f2565715f2c8e52f14c3d65fb22 upstream.
    
    There are times when ext4_bio_write_page() is called even though we
    don't actually need to do any I/O.  This happens when ext4_writepage()
    gets called by the jbd2 commit path when an inode needs to force its
    pages written out in order to provide data=ordered guarantees --- and
    a page is backed by an unwritten (e.g., uninitialized) block on disk,
    or if delayed allocation means the page's backing store hasn't been
    allocated yet.  In that case, we need to skip the call to
    ext4_encrypt_page(), since in addition to wasting CPU, it leads to a
    bounce page and an ext4 crypto context getting leaked.
    
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 4d8dce2c10dfaba12c767e48d254082333424c7e
Author: Ilya Dryomov <idryomov@gmail.com>
Date:   Fri Nov 27 19:23:24 2015 +0100

    rbd: don't put snap_context twice in rbd_queue_workfn()
    
    commit 70b16db86f564977df074072143284aec2cb1162 upstream.
    
    Commit 4e752f0ab0e8 ("rbd: access snapshot context and mapping size
    safely") moved ceph_get_snap_context() out of rbd_img_request_create()
    and into rbd_queue_workfn(), adding a ceph_put_snap_context() to the
    error path in rbd_queue_workfn().  However, rbd_img_request_create()
    consumes a ref on snapc, so calling ceph_put_snap_context() after
    a successful rbd_img_request_create() leads to an extra put.  Fix it.
    
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    Reviewed-by: Josh Durgin <jdurgin@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit f0009492b51e67fc498ab6bcc127c9b418806e71
Author: David Sterba <dsterba@suse.com>
Date:   Mon Nov 9 11:44:45 2015 +0100

    btrfs: fix signed overflows in btrfs_sync_file
    
    commit 9dcbeed4d7e11e1dcf5e55475de3754f0855d1c2 upstream.
    
    The calculation of range length in btrfs_sync_file leads to signed
    overflow. This was caught by PaX gcc SIZE_OVERFLOW plugin.
    
    https://forums.grsecurity.net/viewtopic.php?f=1&t=4284
    
    The fsync call passes 0 and LLONG_MAX, the range length does not fit in
    loff_t and overflows, but the value is converted to u64 so it silently
    works as expected.
    
    The minimal fix is a typecast to u64, switching functions to take
    (start, end) instead of (start, len) would be more intrusive.
    
    A Coccinelle script found that there's one more open-coded calculation
    of the length.
    
    <smpl>
    @@
    loff_t start, end;
    @@
    * end - start
    </smpl>
    
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Chris Mason <clm@fb.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 82c82fadbd75f62bfb9d83eb3a308b55f4c4507d
Author: Filipe Manana <fdmanana@suse.com>
Date:   Mon Nov 9 18:06:38 2015 +0000

    Btrfs: fix race when listing an inode's xattrs
    
    commit f1cd1f0b7d1b5d4aaa5711e8f4e4898b0045cb6d upstream.
    
    When listing an inode's xattrs we have a time window where we race against
    a concurrent operation for adding a new hard link for our inode that makes
    us not return any xattr to user space. In order for this to happen, the
    first xattr of our inode needs to be at slot 0 of a leaf and the previous
    leaf must still have room for an inode ref (or extref) item, and this can
    happen because an inode's listxattrs callback does not lock the inode's
    i_mutex (nor does the VFS do it for us), but adding a hard link to an
    inode makes the VFS lock the inode's i_mutex before calling the inode's
    link callback.
    
    If we have the following leaves:
    
                   Leaf X (has N items)                    Leaf Y
    
     [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 XATTR_ITEM 12345), ... ]
               slot N - 2         slot N - 1              slot 0
    
    The race illustrated by the following sequence diagram is possible:
    
           CPU 1                                               CPU 2
    
      btrfs_listxattr()
    
        searches for key (257 XATTR_ITEM 0)
    
        gets path with path->nodes[0] == leaf X
        and path->slots[0] == N
    
        because path->slots[0] is >=
        btrfs_header_nritems(leaf X), it calls
        btrfs_next_leaf()
    
        btrfs_next_leaf()
          releases the path
    
                                                       adds key (257 INODE_REF 666)
                                                       to the end of leaf X (slot N),
                                                       and leaf X now has N + 1 items
    
          searches for the key (257 INODE_REF 256),
          with path->keep_locks == 1, because that
          is the last key it saw in leaf X before
          releasing the path
    
          ends up at leaf X again and it verifies
          that the key (257 INODE_REF 256) is no
          longer the last key in leaf X, so it
          returns with path->nodes[0] == leaf X
          and path->slots[0] == N, pointing to
          the new item with key (257 INODE_REF 666)
    
        btrfs_listxattr's loop iteration sees that
        the type of the key pointed by the path is
        different from the type BTRFS_XATTR_ITEM_KEY
        and so it breaks the loop and stops looking
        for more xattr items
          --> the application doesn't get any xattr
              listed for our inode
    
    So fix this by breaking the loop only if the key's type is greater than
    BTRFS_XATTR_ITEM_KEY and skip the current key if its type is smaller.
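
    The change in the loop condition can be illustrated with a small
    standalone sketch. It walks a plain sorted array instead of a btree
    path; struct sketch_key and sketch_count_xattrs() are hypothetical names
    used only for illustration, while the key type values are the ones from
    the btrfs on-disk format. It is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Key type values from the btrfs on-disk format. */
#define BTRFS_INODE_REF_KEY   12
#define BTRFS_XATTR_ITEM_KEY  24
#define BTRFS_EXTENT_DATA_KEY 108

/* Simplified stand-in for struct btrfs_key (hypothetical, not the kernel type). */
struct sketch_key {
	unsigned long long objectid;
	unsigned char type;
	unsigned long long offset;
};

/*
 * Count the xattr items of inode 'ino' in a sorted key array, starting at
 * slot 'start'. With fixed == 0 the walk stops at the first key whose type
 * is not XATTR_ITEM (the old behaviour described above). With fixed != 0 it
 * skips keys of a smaller type and stops only at a larger type.
 */
size_t sketch_count_xattrs(const struct sketch_key *keys, size_t nr,
			   size_t start, unsigned long long ino, int fixed)
{
	size_t found = 0;

	assert(keys != NULL || nr == 0);
	for (size_t i = start; i < nr; i++) {
		if (keys[i].objectid != ino)
			break;
		if (keys[i].type > BTRFS_XATTR_ITEM_KEY)
			break;
		if (keys[i].type < BTRFS_XATTR_ITEM_KEY) {
			if (fixed)
				continue;	/* skip it and keep looking */
			break;			/* old code gave up here */
		}
		found++;
	}
	return found;
}
```

    With the keys (257 INODE_REF 666), (257 XATTR_ITEM 12345) and
    (257 EXTENT_DATA 0), starting at slot 0, the old variant counts no
    xattrs while the fixed variant counts one.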
    
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 42c6cb15b97363f55f34f4845fe2c1503a9558a4
Author: Filipe Manana <fdmanana@suse.com>
Date:   Mon Nov 9 00:33:58 2015 +0000

    Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow
    
    commit 1d512cb77bdbda80f0dd0620a3b260d697fd581d upstream.
    
    If we are using the NO_HOLES feature, we have a tiny time window when
    running delalloc for a nodatacow inode where we can race with a concurrent
    link or xattr add operation leading to a BUG_ON.
    
    This happens because at run_delalloc_nocow() we end up casting a leaf item
    of type BTRFS_INODE_[REF|EXTREF]_KEY or of type BTRFS_XATTR_ITEM_KEY to a
    file extent item (struct btrfs_file_extent_item) and then analyse its
    extent type field, which won't match any of the expected extent types
    (values BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]) and therefore trigger an
    explicit BUG_ON(1).
    
    The following sequence diagram shows how the race happens when running a
    no-cow delalloc range [4K, 8K[ for inode 257 and we have the following
    neighbour leafs:
    
                 Leaf X (has N items)                    Leaf Y
    
     [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 EXTENT_DATA 8192), ... ]
                  slot N - 2         slot N - 1              slot 0
    
     (Note the implicit hole for inode 257 regarding the [0, 8K[ range)
    
           CPU 1                                         CPU 2
    
     run_delalloc_nocow()
       btrfs_lookup_file_extent()
         --> searches for a key with value
             (257 EXTENT_DATA 4096) in the
             fs/subvol tree
         --> returns us a path with
             path->nodes[0] == leaf X and
             path->slots[0] == N
    
       because path->slots[0] is >=
       btrfs_header_nritems(leaf X), it
       calls btrfs_next_leaf()
    
       btrfs_next_leaf()
         --> releases the path
    
                                                  hard link added to our inode,
                                                  with key (257 INODE_REF 500)
                                                  added to the end of leaf X,
                                                  so leaf X now has N + 1 keys
    
         --> searches for the key
             (257 INODE_REF 256), because
             it was the last key in leaf X
             before it released the path,
             with path->keep_locks set to 1
    
         --> ends up at leaf X again and
             it verifies that the key
             (257 INODE_REF 256) is no longer
             the last key in the leaf, so it
             returns with path->nodes[0] ==
             leaf X and path->slots[0] == N,
             pointing to the new item with
             key (257 INODE_REF 500)
    
       the loop iteration of run_delalloc_nocow()
       does not break out of the loop and continues
       because the key referenced in the path
       at path->nodes[0] and path->slots[0] is
       for inode 257, its type is < BTRFS_EXTENT_DATA_KEY
       and its offset (500) is less than our delalloc
       range's end (8192)
    
       the item pointed to by the path, an inode reference item,
       is (incorrectly) interpreted as a file extent item and
       we get an invalid extent type, leading to the BUG_ON(1):
    
       if (extent_type == BTRFS_FILE_EXTENT_REG ||
          extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
           (...)
       } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
           (...)
       } else {
           BUG_ON(1);
       }
    
    The same can happen if an xattr is added concurrently and ends up having
    a key with an offset smaller than the delalloc range's end.
    
    So fix this by skipping keys with a type smaller than
    BTRFS_EXTENT_DATA_KEY.
    
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 280bd68e66a0fe027ee186a01a9edd869700a295
Author: Filipe Manana <fdmanana@suse.com>
Date:   Fri Nov 6 13:33:33 2015 +0000

    Btrfs: fix race leading to incorrect item deletion when dropping extents
    
    commit aeafbf8486c9e2bd53f5cc3c10c0b7fd7149d69c upstream.
    
    While running a stress test I got the following warning triggered:
    
      [191627.672810] ------------[ cut here ]------------
      [191627.673949] WARNING: CPU: 8 PID: 8447 at fs/btrfs/file.c:779 __btrfs_drop_extents+0x391/0xa50 [btrfs]()
      (...)
      [191627.701485] Call Trace:
      [191627.702037]  [<ffffffff8145f077>] dump_stack+0x4f/0x7b
      [191627.702992]  [<ffffffff81095de5>] ? console_unlock+0x356/0x3a2
      [191627.704091]  [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
      [191627.705380]  [<ffffffffa0664499>] ? __btrfs_drop_extents+0x391/0xa50 [btrfs]
      [191627.706637]  [<ffffffff8104b46d>] warn_slowpath_null+0x1a/0x1c
      [191627.707789]  [<ffffffffa0664499>] __btrfs_drop_extents+0x391/0xa50 [btrfs]
      [191627.709155]  [<ffffffff8115663c>] ? cache_alloc_debugcheck_after.isra.32+0x171/0x1d0
      [191627.712444]  [<ffffffff81155007>] ? kmemleak_alloc_recursive.constprop.40+0x16/0x18
      [191627.714162]  [<ffffffffa06570c9>] insert_reserved_file_extent.constprop.40+0x83/0x24e [btrfs]
      [191627.715887]  [<ffffffffa065422b>] ? start_transaction+0x3bb/0x610 [btrfs]
      [191627.717287]  [<ffffffffa065b604>] btrfs_finish_ordered_io+0x273/0x4e2 [btrfs]
      [191627.728865]  [<ffffffffa065b888>] finish_ordered_fn+0x15/0x17 [btrfs]
      [191627.730045]  [<ffffffffa067d688>] normal_work_helper+0x14c/0x32c [btrfs]
      [191627.731256]  [<ffffffffa067d96a>] btrfs_endio_write_helper+0x12/0x14 [btrfs]
      [191627.732661]  [<ffffffff81061119>] process_one_work+0x24c/0x4ae
      [191627.733822]  [<ffffffff810615b0>] worker_thread+0x206/0x2c2
      [191627.734857]  [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
      [191627.736052]  [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
      [191627.737349]  [<ffffffff810669a6>] kthread+0xef/0xf7
      [191627.738267]  [<ffffffff810f3b3a>] ? time_hardirqs_on+0x15/0x28
      [191627.739330]  [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
      [191627.741976]  [<ffffffff81465592>] ret_from_fork+0x42/0x70
      [191627.743080]  [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
      [191627.744206] ---[ end trace bbfddacb7aaada8d ]---
    
      $ cat -n fs/btrfs/file.c
      691  int __btrfs_drop_extents(struct btrfs_trans_handle *trans,
      (...)
      758                  btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
      759                  if (key.objectid > ino ||
      760                      key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
      761                          break;
      762
      763                  fi = btrfs_item_ptr(leaf, path->slots[0],
      764                                      struct btrfs_file_extent_item);
      765                  extent_type = btrfs_file_extent_type(leaf, fi);
      766
      767                  if (extent_type == BTRFS_FILE_EXTENT_REG ||
      768                      extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
      (...)
      774                  } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
      (...)
      778                  } else {
      779                          WARN_ON(1);
      780                          extent_end = search_start;
      781                  }
      (...)
    
    This happened because the item we were processing was not a file extent
    item (its key type != BTRFS_EXTENT_DATA_KEY), and even in this case we
    cast the item to a struct btrfs_file_extent_item pointer and then find
    a type field value that does not match any of the expected values
    (BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]). This scenario happens due to
    a tiny time window where a race can happen.
    For example, consider the following scenario where we're using the
    NO_HOLES feature and we have the following two neighbour leafs:
    
                   Leaf X (has N items)                    Leaf Y
    
    [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 EXTENT_DATA 8192), ... ]
              slot N - 2         slot N - 1              slot 0
    
    Our inode 257 has an implicit hole in the range [0, 8K[ (implicit rather
    than explicit because NO_HOLES is enabled). Now if our inode has an
    ordered extent for the range [4K, 8K[ that is finishing, the following
    can happen:
    
              CPU 1                                       CPU 2
    
      btrfs_finish_ordered_io()
        insert_reserved_file_extent()
          __btrfs_drop_extents()
             Searches for the key
              (257 EXTENT_DATA 4096) through
              btrfs_lookup_file_extent()
    
             Key not found and we get a path where
             path->nodes[0] == leaf X and
             path->slots[0] == N
    
             Because path->slots[0] is >=
             btrfs_header_nritems(leaf X), we call
             btrfs_next_leaf()
    
             btrfs_next_leaf() releases the path
    
                                                      inserts key
                                                      (257 INODE_REF 4096)
                                                      at the end of leaf X,
                                                      leaf X now has N + 1 keys,
                                                      and the new key is at
                                                      slot N
    
             btrfs_next_leaf() searches for
             key (257 INODE_REF 256), with
             path->keep_locks set to 1,
             because it was the last key it
             saw in leaf X
    
               finds it in leaf X again and
               notices it's no longer the last
               key of the leaf, so it returns 0
               with path->nodes[0] == leaf X and
               path->slots[0] == N (which is now
               < btrfs_header_nritems(leaf X)),
               pointing to the new key
               (257 INODE_REF 4096)
    
             __btrfs_drop_extents() casts the
             item at path->nodes[0], slot
             path->slots[0], to a struct
             btrfs_file_extent_item - it does
             not skip keys for the target
             inode with a type less than
             BTRFS_EXTENT_DATA_KEY
             (BTRFS_INODE_REF_KEY < BTRFS_EXTENT_DATA_KEY)
    
             sees a bogus value for the type
             field triggering the WARN_ON in
             the trace shown above, and sets
             extent_end = search_start (4096)
    
             does the if-then-else logic to
             fixup 0 length extent items created
             by a past bug from hole punching:
    
               if (extent_end == key.offset &&
                   extent_end >= search_start)
                   goto delete_extent_item;
    
             that evaluates to true and it ends
             up deleting the key pointed to by
             path->slots[0], (257 INODE_REF 4096),
             from leaf X
    
    The same could happen for example for an xattr that ends up having a key
    with an offset value that matches search_start (very unlikely but not
    impossible).
    
    So fix this by ensuring that keys with a type smaller than
    BTRFS_EXTENT_DATA_KEY are skipped, never cast to struct
    btrfs_file_extent_item and never deleted
    by accident. Also protect against the unexpected case of getting a key
    for a lower inode number by skipping that key and issuing a warning.
    
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit fcb184a168452e315676f306b054c908e8ff94ab
Author: Filipe Manana <fdmanana@suse.com>
Date:   Thu Oct 22 09:47:34 2015 +0100

    Btrfs: fix regression when running delayed references
    
    commit 2c3cf7d5f6105bb957df125dfce61d4483b8742d upstream.
    
    In the kernel 4.2 merge window we had a refactoring/rework of the delayed
    references implementation in order to fix certain problems with qgroups.
    However that rework introduced one more regression that leads to the
    following trace when running delayed references for metadata:
    
    [35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832!
    [35908.065201] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    [35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc psmouse i2
    [35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: G        W       4.3.0-rc5-btrfs-next-17+ #1
    [35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
    [35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
    [35908.065201] task: ffff880114b7d780 ti: ffff88010c4c8000 task.ti: ffff88010c4c8000
    [35908.065201] RIP: 0010:[<ffffffffa04928b5>]  [<ffffffffa04928b5>] insert_inline_extent_backref+0x52/0xb1 [btrfs]
    [35908.065201] RSP: 0018:ffff88010c4cbb08  EFLAGS: 00010293
    [35908.065201] RAX: 0000000000000000 RBX: ffff88008a661000 RCX: 0000000000000000
    [35908.065201] RDX: ffffffffa04dd58f RSI: 0000000000000001 RDI: 0000000000000000
    [35908.065201] RBP: ffff88010c4cbb40 R08: 0000000000001000 R09: ffff88010c4cb9f8
    [35908.065201] R10: 0000000000000000 R11: 000000000000002c R12: 0000000000000000
    [35908.065201] R13: ffff88020a74c578 R14: 0000000000000000 R15: 0000000000000000
    [35908.065201] FS:  0000000000000000(0000) GS:ffff88023edc0000(0000) knlGS:0000000000000000
    [35908.065201] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [35908.065201] CR2: 00000000015e8708 CR3: 0000000102185000 CR4: 00000000000006e0
    [35908.065201] Stack:
    [35908.065201]  ffff88010c4cbb18 0000000000000f37 ffff88020a74c578 ffff88015a408000
    [35908.065201]  ffff880154a44000 0000000000000000 0000000000000005 ffff88010c4cbbd8
    [35908.065201]  ffffffffa0492b9a 0000000000000005 0000000000000000 0000000000000000
    [35908.065201] Call Trace:
    [35908.065201]  [<ffffffffa0492b9a>] __btrfs_inc_extent_ref+0x8b/0x208 [btrfs]
    [35908.065201]  [<ffffffffa0497117>] ? __btrfs_run_delayed_refs+0x4d4/0xd33 [btrfs]
    [35908.065201]  [<ffffffffa049773d>] __btrfs_run_delayed_refs+0xafa/0xd33 [btrfs]
    [35908.065201]  [<ffffffffa04a976a>] ? join_transaction.isra.10+0x25/0x41f [btrfs]
    [35908.065201]  [<ffffffffa04a97ed>] ? join_transaction.isra.10+0xa8/0x41f [btrfs]
    [35908.065201]  [<ffffffffa049914d>] btrfs_run_delayed_refs+0x75/0x1dd [btrfs]
    [35908.065201]  [<ffffffffa04992f1>] delayed_ref_async_start+0x3c/0x7b [btrfs]
    [35908.065201]  [<ffffffffa04d4b4f>] normal_work_helper+0x14c/0x32a [btrfs]
    [35908.065201]  [<ffffffffa04d4e93>] btrfs_extent_refs_helper+0x12/0x14 [btrfs]
    [35908.065201]  [<ffffffff81063b23>] process_one_work+0x24a/0x4ac
    [35908.065201]  [<ffffffff81064285>] worker_thread+0x206/0x2c2
    [35908.065201]  [<ffffffff8106407f>] ? rescuer_thread+0x2cb/0x2cb
    [35908.065201]  [<ffffffff8106407f>] ? rescuer_thread+0x2cb/0x2cb
    [35908.065201]  [<ffffffff8106904d>] kthread+0xef/0xf7
    [35908.065201]  [<ffffffff81068f5e>] ? kthread_parkme+0x24/0x24
    [35908.065201]  [<ffffffff8147d10f>] ret_from_fork+0x3f/0x70
    [35908.065201]  [<ffffffff81068f5e>] ? kthread_parkme+0x24/0x24
    [35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48 8d 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02 <0f> 0b 4c 8b 45 30 8b 4d 28 45 31
    [35908.065201] RIP  [<ffffffffa04928b5>] insert_inline_extent_backref+0x52/0xb1 [btrfs]
    [35908.065201]  RSP <ffff88010c4cbb08>
    [35908.310885] ---[ end trace fe4299baf0666457 ]---
    
    This happens because the new delayed references code no longer merges
    delayed references that have different sequence values. The following
    steps are an example sequence leading to this issue:
    
    1) Transaction N starts, fs_info->tree_mod_seq has value 0;
    
    2) Extent buffer (btree node) A is allocated, delayed reference Ref1 for
       bytenr A is created, with a value of 1 and a seq value of 0;
    
    3) fs_info->tree_mod_seq is incremented to 1;
    
    4) Extent buffer A is deleted through btrfs_del_items(), which calls
       btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). The
       latter returns the metadata extent associated to extent buffer A to
       the free space cache (the range is not pinned), because the extent
       buffer was created in the current transaction (N) and writeback never
       happened for the extent buffer (flag BTRFS_HEADER_FLAG_WRITTEN not set
       in the extent buffer).
       This creates the delayed reference Ref2 for bytenr A, with a value
       of -1 and a seq value of 1;
    
    5) Delayed reference Ref2 is not merged with Ref1 when we create it,
       because they have different sequence numbers (decided at
       add_delayed_ref_tail_merge());
    
    6) fs_info->tree_mod_seq is incremented to 2;
    
    7) Some task attempts to allocate a new extent buffer (done at
       extent-tree.c:find_free_extent()), but due to heavy fragmentation
       and running low on metadata space the clustered allocation fails
       and we fall back to unclustered allocation, which finds the
       extent at offset A, so a new extent buffer at offset A is allocated.
       This creates delayed reference Ref3 for bytenr A, with a value of 1
       and a seq value of 2;
    
    8) Ref3 is merged with neither Ref2 nor Ref1, again because they
       all have different seq values;
    
    9) We start running the delayed references (__btrfs_run_delayed_refs());
    
    10) The delayed Ref1 is the first one being applied, which ends up
        creating an inline extent backref in the extent tree;
    
    11) Next the delayed reference Ref3 is selected for execution, and not
        Ref2, because select_delayed_ref() always gives a preference for
        positive references (that have an action of BTRFS_ADD_DELAYED_REF);
    
    12) When running Ref3 we encounter the already existing inline extent
        backref in the extent tree at insert_inline_extent_backref(), which
        makes us hit the following BUG_ON:
    
            BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
    
        This is always true because owner corresponds to the level of the
        extent buffer/btree node in the btree.
    
    For the scenario described above we hit the BUG_ON because we never merge
    references that have different seq values.
    
    We used to do the merging before the 4.2 kernel, more specifically, before
    the commits:
    
      c6fc24549960 ("btrfs: delayed-ref: Use list to replace the ref_root in ref_head.")
      c43d160fcd5e ("btrfs: delayed-ref: Cleanup the unneeded functions.")
    
    This issue became more exposed after the following change that was added
    to 4.2 as well:
    
      cffc3374e567 ("Btrfs: fix order by which delayed references are run")
    
    Which in turn fixed another regression introduced by the two commits
    previously mentioned.
    
    So fix this by bringing back the delayed reference merge code, with the
    proper adaptations so that it operates against the new data structure
    (linked list vs old red black tree implementation).
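
    The effect of merging on the Ref1/Ref2/Ref3 example can be sketched in
    isolation. This is a simplified model under stated assumptions: all refs
    are for the same bytenr, selection always prefers pending add
    references, and merging simply sums the reference count changes. The
    names (sketch_ref, sketch_run_unmerged(), sketch_merged_count()) are
    hypothetical and the seq constraints honoured by the real merge code are
    not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Sign of the reference count change carried by a delayed ref. */
enum sketch_action { SKETCH_DROP = -1, SKETCH_ADD = 1 };

struct sketch_ref {
	enum sketch_action action;
	unsigned long long seq;
	int done;			/* set once the ref has been run */
};

/* Pick the next ref to run, preferring any pending add over a drop. */
static struct sketch_ref *sketch_select(struct sketch_ref *refs, size_t n)
{
	struct sketch_ref *drop = NULL;

	for (size_t i = 0; i < n; i++) {
		if (refs[i].done)
			continue;
		if (refs[i].action == SKETCH_ADD)
			return &refs[i];
		if (drop == NULL)
			drop = &refs[i];
	}
	return drop;
}

/*
 * Run the refs one by one without merging. Returns -1 when an add finds a
 * backref already present (the situation that hits the BUG_ON above),
 * 0 otherwise.
 */
int sketch_run_unmerged(struct sketch_ref *refs, size_t n)
{
	int backrefs = 0;
	struct sketch_ref *ref;

	assert(refs != NULL || n == 0);
	while ((ref = sketch_select(refs, n)) != NULL) {
		ref->done = 1;
		if (ref->action == SKETCH_ADD) {
			if (backrefs > 0)
				return -1;
			backrefs++;
		} else {
			backrefs--;
		}
	}
	return 0;
}

/* Merge all refs into a single net reference count change. */
int sketch_merged_count(const struct sketch_ref *refs, size_t n)
{
	int net = 0;

	for (size_t i = 0; i < n; i++)
		net += (int)refs[i].action;
	return net;
}
```

    For the sequence add (seq 0), drop (seq 1), add (seq 2), running the
    refs unmerged returns -1 because both adds are selected before the drop,
    whereas merging yields a net change of 1, that is, a single add.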
    
    This issue was hit running fstest btrfs/063 in a loop. Several people have
    reported this issue in the mailing list when running on kernels 4.2+.
    
    Very special thanks to Stéphane Lesimple for helping debug this issue
    and testing this fix on his multi terabyte filesystem (which took more
    than one day to balance alone, plus fsck, etc).
    
    Fixes: c6fc24549960 ("btrfs: delayed-ref: Use list to replace the ref_root in ref_head.")
    Reported-by: Peter Becker <floyd.net@gmail.com>
    Reported-by: Stéphane Lesimple <stephane_btrfs@lesimple.fr>
    Tested-by: Stéphane Lesimple <stephane_btrfs@lesimple.fr>
    Reported-by: Malte Schröder <malte@tnxip.de>
    Reported-by: Derek Dongray <derek@valedon.co.uk>
    Reported-by: Erkki Seppala <flux-btrfs@inside.org>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 90291b48b1d907425d8741861fff1dfe4cf7156f
Author: Filipe Manana <fdmanana@suse.com>
Date:   Fri Oct 16 12:34:25 2015 +0100

    Btrfs: fix truncation of compressed and inlined extents
    
    commit 0305cd5f7fca85dae392b9ba85b116896eb7c1c7 upstream.
    
    When truncating a file to a smaller size which consists of an inline
    extent that is compressed, we did not discard (or make unusable) the
    data between the new file size and the old file size, wasting metadata
    space and allowing for the truncated data to be leaked and the data
    corruption/loss mentioned below.
    We were also not correctly decrementing the number of bytes used by the
    inode (we were setting it to zero), giving a wrong report for callers of
    the stat(2) syscall. The fsck tool also reported an error about a mismatch
    between the nbytes of the file versus the real space used by the file.
    
    Now because we weren't discarding the truncated region of the file, it
    was possible for a caller of the clone ioctl to actually read the data
    that was truncated, allowing for a security breach without requiring root
    access to the system, using only standard filesystem operations. The
    scenario is the following:
    
       1) User A creates a file which consists of an inline and compressed
          extent with a size of 2000 bytes - the file is not accessible to
          any other users (no read, write or execution permission for anyone
          else);
    
       2) The user truncates the file to a size of 1000 bytes;
    
       3) User A makes the file world readable;
    
       4) User B creates a file consisting of an inline extent of 2000 bytes;
    
       5) User B issues a clone operation from user A's file into its own
          file (using a length argument of 0, clone the whole range);
    
       6) User B now gets to see the 1000 bytes that user A truncated from
          its file before it made its file world readable. User B also lost
          the bytes in the range [1000, 2000[ bytes from its own file, but
          that might be ok if his/her intention was reading stale data from
          user A that was never supposed to be public.
    
    Note that this contrasts with the case where we truncate a file from 2000
    bytes to 1000 bytes and then truncate it back from 1000 to 2000 bytes. In
    this case reading any byte from the range [1000, 2000[ will return a value
    of 0x00, instead of the original data.
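
    The difference between shrinking only i_size and shrinking the inline
    extent itself can be sketched with a toy in-memory model. Compression is
    not modelled at all, and every name below (struct sketch_file,
    sketch_truncate_size_only(), sketch_truncate_shrink(),
    sketch_clone_inline(), sketch_read()) is hypothetical; the clone helper
    follows the description of replacing the destination's inline extent
    with the source's while keeping the destination's i_size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_INLINE 64

struct sketch_file {
	unsigned char extent[SKETCH_MAX_INLINE];	/* inline extent payload */
	size_t extent_len;	/* bytes stored in the inline extent */
	size_t i_size;		/* file size visible to readers */
};

/* Old behaviour for compressed inline extents: only i_size shrinks. */
void sketch_truncate_size_only(struct sketch_file *f, size_t new_size)
{
	assert(f != NULL && new_size <= f->i_size);
	f->i_size = new_size;
}

/* Fixed behaviour: the extent payload shrinks together with i_size. */
void sketch_truncate_shrink(struct sketch_file *f, size_t new_size)
{
	assert(f != NULL && new_size <= f->i_size);
	if (new_size < f->extent_len) {
		memset(f->extent + new_size, 0, f->extent_len - new_size);
		f->extent_len = new_size;
	}
	f->i_size = new_size;
}

/* Replace dst's inline extent with src's whole extent; dst's i_size is kept. */
void sketch_clone_inline(struct sketch_file *dst, const struct sketch_file *src)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst->extent, src->extent, src->extent_len);
	dst->extent_len = src->extent_len;
}

/* Read one byte; offsets past the extent but within i_size read as 0. */
int sketch_read(const struct sketch_file *f, size_t off)
{
	assert(f != NULL);
	if (off >= f->i_size)
		return -1;
	return off < f->extent_len ? f->extent[off] : 0;
}
```

    With two 32 byte files, truncating the source to 16 bytes through the
    size-only variant and then cloning lets the destination read the
    source's old byte at offset 20, while the shrinking variant makes the
    same read return 0.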
    
    This problem exists since the clone ioctl was added and happens both with
    and without my recent data loss and file corruption fixes for the clone
    ioctl (patch "Btrfs: fix file corruption and data loss after cloning
    inline extents").
    
    So fix this by truncating the compressed inline extents as we do for the
    non-compressed case, which involves decompressing, if the data isn't already
    in the page cache, compressing the truncated version of the extent, writing
    the compressed content into the inline extent and then truncating it.
    
    The following test case for fstests reproduces the problem. In order for
    the test to pass both this fix and my previous fix for the clone ioctl
    that forbids cloning a smaller inline extent into a larger one,
    which is titled "Btrfs: fix file corruption and data loss after cloning
    inline extents", are needed. Without that other fix the test fails in a
    different way that does not leak the truncated data, instead part of
    destination file gets replaced with zeroes (because the destination file
    has a larger inline extent than the source).
    
      seq=`basename $0`
      seqres=$RESULT_DIR/$seq
      echo "QA output created by $seq"
      tmp=/tmp/$$
      status=1	# failure is the default!
      trap "_cleanup; exit \$status" 0 1 2 3 15
    
      _cleanup()
      {
          rm -f $tmp.*
      }
    
      # get standard environment, filters and checks
      . ./common/rc
      . ./common/filter
    
      # real QA test starts here
      _need_to_be_root
      _supported_fs btrfs
      _supported_os Linux
      _require_scratch
      _require_cloner
    
      rm -f $seqres.full
    
      _scratch_mkfs >>$seqres.full 2>&1
      _scratch_mount "-o compress"
    
      # Create our test files. File foo is going to be the source of a clone operation
      # and consists of a single inline extent with an uncompressed size of 512 bytes,
      # while file bar consists of a single inline extent with an uncompressed size of
      # 256 bytes. For our test's purpose, it's important that file bar has an inline
      # extent with a size smaller than foo's inline extent.
      $XFS_IO_PROG -f -c "pwrite -S 0xa1 0 128"   \
              -c "pwrite -S 0x2a 128 384" \
              $SCRATCH_MNT/foo | _filter_xfs_io
      $XFS_IO_PROG -f -c "pwrite -S 0xbb 0 256" $SCRATCH_MNT/bar | _filter_xfs_io
    
      # Now durably persist all metadata and data. We do this to make sure that we get
      # on disk an inline extent with a size of 512 bytes for file foo.
      sync
    
      # Now truncate our file foo to a smaller size. Because it consists of a
      # compressed and inline extent, btrfs did not shrink the inline extent to the
      # new size (if the extent was not compressed, btrfs would shrink it to 128
      # bytes), it only updates the inode's i_size to 128 bytes.
      $XFS_IO_PROG -c "truncate 128" $SCRATCH_MNT/foo
    
      # Now clone foo's inline extent into bar.
      # This clone operation should fail with errno EOPNOTSUPP because the source
      # file consists only of an inline extent and the file's size is smaller than
      # the inline extent of the destination (128 bytes < 256 bytes). However the
      # clone ioctl was not prepared to deal with a file that has a size smaller
      # than the size of its inline extent (something that happens only for compressed
      # inline extents), resulting in copying the full inline extent from the source
      # file into the destination file.
      #
      # Note that btrfs' clone operation for inline extents consists of removing the
      # inline extent from the destination inode and copy the inline extent from the
      # source inode into the destination inode, meaning that if the destination
      # inode's inline extent is larger (N bytes) than the source inode's inline
      # extent (M bytes), some bytes (N - M bytes) will be lost from the destination
      # file. Btrfs could copy the source inline extent's data into the destination's
      # inline extent so that we would not lose any data, but that's currently not
      # done due to the complexity that would be needed to deal with such cases
      # (especially when one or both extents are compressed), returning EOPNOTSUPP, as
      # it's normally not a very common case to clone very small files (only case
      # where we get inline extents) and copying inline extents does not save any
      # space (unlike for normal, non-inlined extents).
      $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/foo $SCRATCH_MNT/bar
    
      # Now because the above clone operation used to succeed, and due to foo's inline
      # extent not being shrunk by the truncate operation, our file bar got the whole
      # inline extent copied from foo, making us lose the last 128 bytes from bar
      # which got replaced by the bytes in range [128, 256[ from foo before foo was
      # truncated - in other words, data loss from bar and being able to read old and
      # stale data from foo that should not be possible to read anymore through normal
      # filesystem operations. Contrast with the case where we truncate a file from a
      # size N to a smaller size M, truncate it back to size N and then read the range
      # [M, N[, we should always get the value 0x00 for all the bytes in that range.
    
      # We expected the clone operation to fail with errno EOPNOTSUPP and therefore
      # not modify file bar's data/metadata. So its content should be 256 bytes
      # long with all bytes having the value 0xbb.
      #
      # Without the btrfs bug fix, the clone operation succeeded and resulted in
      # leaking truncated data from foo, the bytes that belonged to its range
      # [128, 256[, and losing data from bar in that same range. So reading the
      # file gave us the following content:
      #
      # 0000000 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1
      # *
      # 0000200 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a
      # *
      # 0000400
      echo "File bar's content after the clone operation:"
      od -t x1 $SCRATCH_MNT/bar
    
      # Also because foo's inline extent was not shrunk by the truncate
      # operation, btrfs' fsck, which is run by the fstests framework every time a
      # test completes, failed reporting the following error:
      #
      #  root 5 inode 257 errors 400, nbytes wrong
    
      status=0
      exit
    
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 8b15aa67a2a49cf332363186f1c8249bdf2059ca
Author: Filipe Manana <fdmanana@suse.com>
Date:   Tue Oct 13 15:15:00 2015 +0100

    Btrfs: fix file corruption and data loss after cloning inline extents
    
    commit 8039d87d9e473aeb740d4fdbd59b9d2f89b2ced9 upstream.
    
    Currently the clone ioctl allows cloning an inline extent from one file
    to another that already has other (non-inlined) extents. This is a problem
    because btrfs is not designed to deal with files having both inline and
    regular extents: if a file has an inline extent then it must be the only
    extent in the file and must start at file offset 0. Having a file with an
    inline extent followed by regular extents results in EIO errors when doing
    reads or writes against the first 4K of the file.
    
    Also, the clone ioctl allows one to lose data if the source file consists
    of a single inline extent, with a size of N bytes, and the destination
    file consists of a single inline extent with a size of M bytes, where we
    have M > N. In this case the clone operation removes the inline extent
    from the destination file and then copies the inline extent from the
    source file into the destination file - we lose the M - N bytes from the
    destination file, a read operation will get the value 0x00 for any bytes
    in the range [N, M[ (the destination inode's i_size remained as M,
    that's why we can read past N bytes).
    
    So fix this by not allowing such destructive operations to happen and
    return errno EOPNOTSUPP to user space.
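
    The two rejected cases can be summarised as a small predicate. This is
    only a sketch of the rules stated in this message: struct
    sketch_clone_dst and sketch_check_inline_clone() are hypothetical names,
    and the actual checks in the clone ioctl may depend on more state (file
    offsets, for instance) than is modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* State of the clone destination relevant to the two cases (hypothetical). */
struct sketch_clone_dst {
	int has_regular_extents;	/* file has any non-inlined extent */
	size_t inline_len;		/* size of its inline extent, 0 if none */
};

/*
 * Decide whether cloning a source inline extent of src_inline_len bytes into
 * 'dst' is allowed. Returns 0 if allowed, -EOPNOTSUPP if it would mix inline
 * and regular extents or lose bytes from a larger destination inline extent.
 */
int sketch_check_inline_clone(size_t src_inline_len,
			      const struct sketch_clone_dst *dst)
{
	assert(dst != NULL);
	if (dst->has_regular_extents)
		return -EOPNOTSUPP;
	if (dst->inline_len > src_inline_len)
		return -EOPNOTSUPP;
	return 0;
}
```

    For a 50 byte source extent, a destination with regular extents or with
    a 100 byte inline extent is rejected, while a destination with a 50 byte
    inline extent is accepted.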
    
    Currently the fstest btrfs/035 tests the data loss case but it totally
    ignores this - i.e. expects the operation to succeed and does not check
    that we got data loss.
    
    The following test case for fstests exercises all these cases that result
    in file corruption and data loss:
    
      seq=`basename $0`
      seqres=$RESULT_DIR/$seq
      echo "QA output created by $seq"
      tmp=/tmp/$$
      status=1	# failure is the default!
      trap "_cleanup; exit \$status" 0 1 2 3 15
    
      _cleanup()
      {
          rm -f $tmp.*
      }
    
      # get standard environment, filters and checks
      . ./common/rc
      . ./common/filter
    
      # real QA test starts here
      _need_to_be_root
      _supported_fs btrfs
      _supported_os Linux
      _require_scratch
      _require_cloner
      _require_btrfs_fs_feature "no_holes"
      _require_btrfs_mkfs_feature "no-holes"
    
      rm -f $seqres.full
    
      test_cloning_inline_extents()
      {
          local mkfs_opts=$1
          local mount_opts=$2
    
          _scratch_mkfs $mkfs_opts >>$seqres.full 2>&1
          _scratch_mount $mount_opts
    
          # File bar, the source for all the following clone operations, consists
          # of a single inline extent (50 bytes).
          $XFS_IO_PROG -f -c "pwrite -S 0xbb 0 50" $SCRATCH_MNT/bar \
              | _filter_xfs_io
    
          # Test cloning into a file with an extent (non-inlined) where the
          # destination offset overlaps that extent. It should not be possible to
          # clone the inline extent from file bar into this file.
          $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 16K" $SCRATCH_MNT/foo \
              | _filter_xfs_io
          $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo
    
          # Doing IO against any range in the first 4K of the file should work.
          # Due to a past clone ioctl bug which allowed cloning the inline extent,
          # these operations resulted in EIO errors.
          echo "File foo data after clone operation:"
          # All bytes should have the value 0xaa (clone operation failed and did
          # not modify our file).
          od -t x1 $SCRATCH_MNT/foo
          $XFS_IO_PROG -c "pwrite -S 0xcc 0 100" $SCRATCH_MNT/foo | _filter_xfs_io
    
          # Test cloning the inline extent against a file which has a hole in its
          # first 4K followed by a non-inlined extent. It should not be possible
          # as well to clone the inline extent from file bar into this file.
          $XFS_IO_PROG -f -c "pwrite -S 0xdd 4K 12K" $SCRATCH_MNT/foo2 \
              | _filter_xfs_io
          $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo2
    
          # Doing IO against any range in the first 4K of the file should work.
          # Due to a past clone ioctl bug which allowed cloning the inline extent,
          # these operations resulted in EIO errors.
          echo "File foo2 data after clone operation:"
          # All bytes should have the value 0x00 (clone operation failed and did
          # not modify our file).
          od -t x1 $SCRATCH_MNT/foo2
          $XFS_IO_PROG -c "pwrite -S 0xee 0 90" $SCRATCH_MNT/foo2 | _filter_xfs_io
    
          # Test cloning the inline extent against a file which has a size of zero
          # but has a prealloc extent. It should not be possible as well to clone
          # the inline extent from file bar into this file.
          $XFS_IO_PROG -f -c "falloc -k 0 1M" $SCRATCH_MNT/foo3 | _filter_xfs_io
          $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo3
    
          # Doing IO against any range in the first 4K of the file should work.
          # Due to a past clone ioctl bug which allowed cloning the inline extent,
          # these operations resulted in EIO errors.
          echo "First 50 bytes of foo3 after clone operation:"
          # Should not be able to read any bytes, file has 0 bytes i_size (the
          # clone operation failed and did not modify our file).
          od -t x1 $SCRATCH_MNT/foo3
          $XFS_IO_PROG -c "pwrite -S 0xff 0 90" $SCRATCH_MNT/foo3 | _filter_xfs_io
    
          # Test cloning the inline extent against a file which consists of a
          # single inline extent that has a size not greater than the size of
          # bar's inline extent (40 < 50).
          # It should be possible to do the extent cloning from bar to this file.
          $XFS_IO_PROG -f -c "pwrite -S 0x01 0 40" $SCRATCH_MNT/foo4 \
              | _filter_xfs_io
          $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo4
    
          # Doing IO against any range in the first 4K of the file should work.
          echo "File foo4 data after clone operation:"
          # Must match file bar's content.
          od -t x1 $SCRATCH_MNT/foo4
          $XFS_IO_PROG -c "pwrite -S 0x02 0 90" $SCRATCH_MNT/foo4 | _filter_xfs_io
    
          # Test cloning the inline extent against a file which consists of a
          # single inline extent that has a size greater than the size of bar's
          # inline extent (60 > 50).
          # It should not be possible to clone the inline extent from file bar
          # into this file.
          $XFS_IO_PROG -f -c "pwrite -S 0x03 0 60" $SCRATCH_MNT/foo5 \
              | _filter_xfs_io
          $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo5
    
          # Reading the file should not fail.
          echo "File foo5 data after clone operation:"
          # Must have a size of 60 bytes, with all bytes having a value of 0x03
          # (the clone operation failed and did not modify our file).
          od -t x1 $SCRATCH_MNT/foo5
    
          # Test cloning the inline extent against a file which has no extents but
          # has a size greater than bar's inline extent (16K > 50).
          # It should not be possible to clone the inline extent from file bar
          # into this file.
          $XFS_IO_PROG -f -c "truncate 16K" $SCRATCH_MNT/foo6 | _filter_xfs_io
          $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo6
    
          # Reading the file should not fail.
          echo "File foo6 data after clone operation:"
          # Must have a size of 16K, with all bytes having a value of 0x00 (the
          # clone operation failed and did not modify our file).
          od -t x1 $SCRATCH_MNT/foo6
    
          # Test cloning the inline extent against a file which has no extents but
          # has a size not greater than bar's inline extent (30 < 50).
          # It should be possible to clone the inline extent from file bar into
          # this file.
          $XFS_IO_PROG -f -c "truncate 30" $SCRATCH_MNT/foo7 | _filter_xfs_io
          $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo7
    
          # Reading the file should not fail.
          echo "File foo7 data after clone operation:"
          # Must have a size of 50 bytes, with all bytes having a value of 0xbb.
          od -t x1 $SCRATCH_MNT/foo7
    
          # Test cloning the inline extent against a file which has a size not
          # greater than the size of bar's inline extent (20 < 50) but has
          # a prealloc extent that goes beyond the file's size. It should not be
          # possible to clone the inline extent from bar into this file.
          $XFS_IO_PROG -f -c "falloc -k 0 1M" \
                          -c "pwrite -S 0x88 0 20" \
                          $SCRATCH_MNT/foo8 | _filter_xfs_io
          $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo8
    
          echo "File foo8 data after clone operation:"
          # Must have a size of 20 bytes, with all bytes having a value of 0x88
          # (the clone operation did not modify our file).
          od -t x1 $SCRATCH_MNT/foo8
    
          _scratch_unmount
      }
    
      echo -e "\nTesting without compression and without the no-holes feature...\n"
      test_cloning_inline_extents
    
      echo -e "\nTesting with compression and without the no-holes feature...\n"
      test_cloning_inline_extents "" "-o compress"
    
      echo -e "\nTesting without compression and with the no-holes feature...\n"
      test_cloning_inline_extents "-O no-holes" ""
    
      echo -e "\nTesting with compression and with the no-holes feature...\n"
      test_cloning_inline_extents "-O no-holes" "-o compress"
    
      status=0
      exit
    
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit b855aa43eb1c47d6e10fd90df2f0fb8ab6423c12
Author: Robin Ruede <r.ruede@gmail.com>
Date:   Wed Sep 30 21:23:33 2015 +0200

    btrfs: fix resending received snapshot with parent
    
    commit b96b1db039ebc584d03a9933b279e0d3e704c528 upstream.
    
    This fixes a regression introduced by 37b8d27d between v4.1 and v4.2.
    
    When a snapshot is received, its received_uuid is set to the original
    uuid of the subvolume. When that snapshot is then resent to a third
    filesystem, its received_uuid is set to the second uuid
    instead of the original one. The same was true for the parent_uuid.
    This behaviour was partially changed in 37b8d27d, but in that patch
    only the parent_uuid was taken from the real original,
    not the uuid itself, causing the search for the parent to fail in
    the case below.
    
    This happens for example when trying to send a series of linked
    snapshots (e.g. created by snapper) from the backup file system back
    to the original one.
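    
    As we read the description, the sender should identify a snapshot by
    the uuid of the real original whenever one is recorded. A minimal
    sketch of that selection, assuming that an all-zero received_uuid means
    "never received" (struct subvol_ids and both helpers are hypothetical
    and are not the btrfs send code):
    
    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <string.h>
    
    #define UUID_SIZE 16
    
    /* Hypothetical subset of a subvolume's identity. */
    struct subvol_ids {
        unsigned char uuid[UUID_SIZE];
        unsigned char received_uuid[UUID_SIZE];  /* all zero if never received */
    };
    
    static bool uuid_is_empty(const unsigned char *u)
    {
        static const unsigned char zero[UUID_SIZE];
    
        return memcmp(u, zero, UUID_SIZE) == 0;
    }
    
    /* Pick the uuid that identifies this snapshot in a send stream. */
    static const unsigned char *uuid_to_send(const struct subvol_ids *s)
    {
        return uuid_is_empty(s->received_uuid) ? s->uuid : s->received_uuid;
    }
    ```
    
    Applied to both the snapshot and its parent, this keeps the identifiers
    stable across any number of send/receive hops.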
    
    The following commands reproduce the issue in v4.2.1
    (no error in 4.1.6)
    
        # setup three test file systems
        for i in 1 2 3; do
    	    truncate -s 50M fs$i
    	    mkfs.btrfs fs$i
    	    mkdir $i
    	    mount fs$i $i
        done
        echo "content" > 1/testfile
        btrfs su snapshot -r 1/ 1/snap1
        echo "changed content" > 1/testfile
        btrfs su snapshot -r 1/ 1/snap2
    
        # works fine:
        btrfs send 1/snap1 | btrfs receive 2/
        btrfs send -p 1/snap1 1/snap2 | btrfs receive 2/
    
        # ERROR: could not find parent subvolume
        btrfs send 2/snap1 | btrfs receive 3/
        btrfs send -p 2/snap1 2/snap2 | btrfs receive 3/
    
    Signed-off-by: Robin Ruede <rruede+git@gmail.com>
    Fixes: 37b8d27de5d0 ("Btrfs: use received_uuid of parent during send")
    Reviewed-by: Filipe Manana <fdmanana@suse.com>
    Tested-by: Ed Tomlinson <edt@aei.ca>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 36204ef3c3a43e1dc015d2401068e4bfb45e1e09
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Dec 1 20:08:51 2015 -0800

    net_sched: fix qdisc_tree_decrease_qlen() races
    
    [ Upstream commit 4eaf3b84f2881c9c028f1d5e76c52ab575fe3a66 ]
    
    qdisc_tree_decrease_qlen() suffers from two problems on multiqueue
    devices.
    
    One problem is that it updates sch->q.qlen and sch->qstats.drops
    on the mq/mqprio root qdisc, which it should not. Daniele
    reported underflow errors:
    [  681.774821] PAX: sch->q.qlen: 0 n: 1
    [  681.774825] PAX: size overflow detected in function qdisc_tree_decrease_qlen net/sched/sch_api.c:769 cicus.693_49 min, count: 72, decl: qlen; num: 0; context: sk_buff_head;
    [  681.774954] CPU: 2 PID: 19 Comm: ksoftirqd/2 Tainted: G           O    4.2.6.201511282239-1-grsec #1
    [  681.774955] Hardware name: ASUSTeK COMPUTER INC. X302LJ/X302LJ, BIOS X302LJ.202 03/05/2015
    [  681.774956]  ffffffffa9a04863 0000000000000000 0000000000000000 ffffffffa990ff7c
    [  681.774959]  ffffc90000d3bc38 ffffffffa95d2810 0000000000000007 ffffffffa991002b
    [  681.774960]  ffffc90000d3bc68 ffffffffa91a44f4 0000000000000001 0000000000000001
    [  681.774962] Call Trace:
    [  681.774967]  [<ffffffffa95d2810>] dump_stack+0x4c/0x7f
    [  681.774970]  [<ffffffffa91a44f4>] report_size_overflow+0x34/0x50
    [  681.774972]  [<ffffffffa94d17e2>] qdisc_tree_decrease_qlen+0x152/0x160
    [  681.774976]  [<ffffffffc02694b1>] fq_codel_dequeue+0x7b1/0x820 [sch_fq_codel]
    [  681.774978]  [<ffffffffc02680a0>] ? qdisc_peek_dequeued+0xa0/0xa0 [sch_fq_codel]
    [  681.774980]  [<ffffffffa94cd92d>] __qdisc_run+0x4d/0x1d0
    [  681.774983]  [<ffffffffa949b2b2>] net_tx_action+0xc2/0x160
    [  681.774985]  [<ffffffffa90664c1>] __do_softirq+0xf1/0x200
    [  681.774987]  [<ffffffffa90665ee>] run_ksoftirqd+0x1e/0x30
    [  681.774989]  [<ffffffffa90896b0>] smpboot_thread_fn+0x150/0x260
    [  681.774991]  [<ffffffffa9089560>] ? sort_range+0x40/0x40
    [  681.774992]  [<ffffffffa9085fe4>] kthread+0xe4/0x100
    [  681.774994]  [<ffffffffa9085f00>] ? kthread_worker_fn+0x170/0x170
    [  681.774995]  [<ffffffffa95d8d1e>] ret_from_fork+0x3e/0x70
    
    mq/mqprio have their own ways to report qlen/drops by folding stats on
    all their queues, with appropriate locking.
    
    A second problem is that qdisc_tree_decrease_qlen() calls qdisc_lookup()
    without proper locking : concurrent qdisc updates could corrupt the list
    that qdisc_match_from_root() parses to find a qdisc given its handle.
    
    Fix the first problem by adding a TCQ_F_NOPARENT qdisc flag that
    qdisc_tree_decrease_qlen() can use to abort its tree traversal
    as soon as it meets a child of an mq/mqprio qdisc.
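    
    The traversal with the early abort can be modelled as follows. This is
    a simplified sketch: struct node, F_NOPARENT and the direct parent
    pointer are stand-ins chosen for illustration, whereas the kernel
    resolves parents by handle.
    
    ```c
    #include <assert.h>
    #include <stddef.h>
    
    #define F_NOPARENT 0x1u  /* stands in for TCQ_F_NOPARENT */
    
    /* Hypothetical, simplified qdisc node; the kernel structure differs. */
    struct node {
        struct node *parent;
        unsigned int flags;
        unsigned int qlen;
    };
    
    /*
     * Propagate a queue length decrease of n packets to the ancestors of
     * sch. The walk stops at a node flagged F_NOPARENT, so whatever sits
     * above that node (the mq/mqprio root in the kernel case) is untouched.
     */
    static void tree_decrease_qlen(struct node *sch, unsigned int n)
    {
        while (sch && !(sch->flags & F_NOPARENT) && sch->parent) {
            sch = sch->parent;
            sch->qlen -= n;
        }
    }
    ```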
    
    The second problem can be fixed by RCU protection.
    Qdiscs are already freed after an RCU grace period, so qdisc_list_add()
    and qdisc_list_del() simply have to use the appropriate RCU list variants.
    
    A future patch will add a per struct netdev_queue list anchor, so that
    qdisc_tree_decrease_qlen() can have more efficient lookups.
    
    Reported-by: Daniele Fucini <dfucini@gmail.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Cong Wang <cwang@twopensource.com>
    Cc: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit bae56d7e3f815b339421306fc3b2e00107b56987
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Tue Dec 1 18:33:36 2015 +0100

    openvswitch: fix hangup on vxlan/gre/geneve device deletion
    
    [ Upstream commit 13175303024c8f4cd09e51079a8fcbbe572111ec ]
    
    Each openvswitch tunnel vport (vxlan, gre, geneve) holds a reference
    to the underlying tunnel device, but never releases it when that
    device is deleted.
    Deleting the underlying device via the ip tool causes the kernel to
    hang in the netdev_wait_allrefs() loop.
    This commit ensures that on device unregistration dp_detach_port_notify()
    is called for all vports that hold the device reference, properly
    releasing it.
    
    Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device")
    Fixes: b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of vport")
    Fixes: 6b001e682e90 ("openvswitch: Use Geneve device.")
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Acked-by: Flavio Leitner <fbl@sysclose.org>
    Acked-by: Pravin B Shelar <pshelar@nicira.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 7a439c4a677039b91804b6779de18ddab6d6ebf9
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Dec 1 07:20:07 2015 -0800

    ipv6: sctp: implement sctp_v6_destroy_sock()
    
    [ Upstream commit 602dd62dfbda3e63a2d6a3cbde953ebe82bf5087 ]
    
    Dmitry Vyukov reported a memory leak using IPV6 SCTP sockets.
    
    We need to call inet6_destroy_sock() to properly release
    inet6 specific fields.
    
    Reported-by: Dmitry Vyukov <dvyukov@google.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit a9c680b65484ecb6d1d257f761b2f04e1d9e2d6e
Author: Konstantin Khlebnikov <koct9i@gmail.com>
Date:   Tue Dec 1 01:14:48 2015 +0300

    net/neighbour: fix crash at dumping device-agnostic proxy entries
    
    [ Upstream commit 6adc5fd6a142c6e2c80574c1db0c7c17dedaa42e ]
    
    Proxy entries could have null pointer to net-device.
    
    Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
    Fixes: 84920c1420e2 ("net: Allow ipv6 proxies and arp proxies be shown with iproute2")
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 645e3f33c73ad1153db0680b6833cf70d0d4dce3
Author: Eric Dumazet <edumazet@google.com>
Date:   Sun Nov 29 19:37:57 2015 -0800

    ipv6: add complete rcu protection around np->opt
    
    [ Upstream commit 45f6fad84cc305103b28d73482b344d7f5b76f39 ]
    
    This patch addresses multiple problems :
    
    UDP/RAW sendmsg() needs to get a stable struct ipv6_txoptions
    while the socket is not locked: other threads can change np->opt
    concurrently. Dmitry posted a syzkaller
    (http://github.com/google/syzkaller) program demonstrating
    a use-after-free.
    
    Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
    and dccp_v6_request_recv_sock() also need to use RCU protection
    to dereference np->opt once (before calling ipv6_dup_options())
    
    This patch adds full RCU protection to np->opt
    
    Reported-by: Dmitry Vyukov <dvyukov@google.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 90d19ad685d03197418f5c8e970772192a032a87
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Mon Nov 30 13:02:56 2015 +0100

    bpf, array: fix heap out-of-bounds access when updating elements
    
    [ Upstream commit fbca9d2d35c6ef1b323fae75cc9545005ba25097 ]
    
    During our own review, and as also reported by Dmitry's syzkaller [1],
    it was noticed that we trigger a heap out-of-bounds access on eBPF array
    maps when updating elements. This happens with each map whose
    map->value_size (specified at map creation time) is not a multiple of
    8 bytes.
    
    In array_map_alloc(), elem_size is round_up(attr->value_size, 8) and
    used to align array map slots for faster access. However, in function
    array_map_update_elem(), we update the element as ...
    
    memcpy(array->value + array->elem_size * index, value, array->elem_size);
    
    ... where we access 'value' out-of-bounds, since it was allocated from
    map_update_elem() from syscall side as kmalloc(map->value_size, GFP_USER)
    and later on copied through copy_from_user(value, uvalue, map->value_size).
    Thus, up to 7 bytes, we can access out-of-bounds.
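    
    The arithmetic can be checked in isolation. The sketch below uses
    hypothetical userspace helpers (round_up8(), overread_bytes() and
    update_elem() are not kernel functions) to show how many bytes an
    elem_size copy reads past a value_size buffer, and a bounded update
    that copies only value_size bytes into the aligned slot.
    
    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>
    
    /* Round x up to the next multiple of 8, like round_up(x, 8). */
    static size_t round_up8(size_t x)
    {
        return (x + 7) & ~(size_t)7;
    }
    
    /* Bytes read past the end of a value_size buffer by an elem_size copy. */
    static size_t overread_bytes(size_t value_size)
    {
        return round_up8(value_size) - value_size;
    }
    
    /*
     * Bounded update: slots are elem_size apart, but only value_size bytes
     * are copied from the caller's buffer.
     */
    static void update_elem(unsigned char *slots, size_t index,
                            const void *value, size_t value_size)
    {
        memcpy(slots + round_up8(value_size) * index, value, value_size);
    }
    ```
    
    For value_size = 9 the slot size is 16, so an unbounded copy would read
    7 bytes too many; for any multiple of 8 the two sizes coincide.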
    
    Same could happen from within an eBPF program, where in worst case we
    access beyond an eBPF program's designated stack.
    
    Since 1be7f75d1668 ("bpf: enable non-root eBPF programs") has not hit an
    official release yet, this only affects privileged users.
    
    In case of array_map_lookup_elem(), the verifier prevents eBPF programs
    from accessing beyond map->value_size through check_map_access(). Also
    from syscall side map_lookup_elem() only copies map->value_size back to
    user, so nothing could leak.
    
      [1] http://github.com/google/syzkaller
    
    Fixes: 28fbcfa08d8e ("bpf: add array type of eBPF maps")
    Reported-by: Dmitry Vyukov <dvyukov@google.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 0eaa7b64f7c307249fc28f2a57ff20aa905910bb
Author: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Date:   Tue Nov 24 17:13:21 2015 -0500

    RDS: fix race condition when sending a message on unbound socket
    
    [ Upstream commit 8c7188b23474cca017b3ef354c4a58456f68303a ]
    
    Sasha found a NULL pointer dereference in the RDS connection code when
    sending a message to an apparently unbound socket.  The problem is caused
    by the code checking if the socket is bound in rds_sendmsg(), which checks
    the rs_bound_addr field without taking a lock on the socket.  This opens a
    race where rs_bound_addr is temporarily set but the transport is not
    yet set in rds_bind(), leading to a NULL pointer dereference when trying to
    dereference 'trans' in __rds_conn_create().
    
    Vegard wrote a reproducer for this issue, so kindly ask him to share if
    you're interested.
    
    I cannot reproduce the NULL pointer dereference using Vegard's reproducer
    with this patch, whereas I could without.
    
    Complete earlier incomplete fix to CVE-2015-6937:
    
      74e98eb08588 ("RDS: verify the underlying transport exists before creating a connection")
    
    Cc: David S. Miller <davem@davemloft.net>
    Cc: stable@vger.kernel.org
    
    Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com>
    Reviewed-by: Sasha Levin <sasha.levin@oracle.com>
    Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
    Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit edd6db9dee45cdfd7b76f23ea513f5bab816c0bb
Author: Michal Kubeček <mkubecek@suse.cz>
Date:   Tue Nov 24 15:07:11 2015 +0100

    ipv6: distinguish frag queues by device for multicast and link-local packets
    
    [ Upstream commit 264640fc2c5f4f913db5c73fa3eb1ead2c45e9d7 ]
    
    If a fragmented multicast packet is received on an ethernet device which
    has an active macvlan on top of it, each fragment is duplicated and
    received both on the underlying device and the macvlan. If some
    fragments for macvlan are processed before the whole packet for the
    underlying device is reassembled, the "overlapping fragments" test in
    ip6_frag_queue() discards the whole fragment queue.
    
    To resolve this, add device ifindex to the search key and require it to
    match reassembling multicast packets and packets to link-local
    addresses.
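    
    A rough userspace model of the extended lookup key is shown below. The
    struct layout and function names are hypothetical; the address tests
    only encode the standard prefixes ff00::/8 (multicast) and fe80::/10
    (link-local), and the device is compared only for those destinations.
    
    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>
    
    /* Hypothetical reassembly key; the kernel's queue structure differs. */
    struct frag_key {
        uint32_t id;
        uint8_t src[16];
        uint8_t dst[16];
        int ifindex;
    };
    
    static bool is_multicast(const uint8_t a[16])
    {
        return a[0] == 0xff;
    }
    
    static bool is_linklocal(const uint8_t a[16])
    {
        return a[0] == 0xfe && (a[1] & 0xc0) == 0x80;
    }
    
    /* Does an incoming fragment belong to the existing queue q? */
    static bool frag_key_match(const struct frag_key *q, const struct frag_key *pkt)
    {
        if (q->id != pkt->id ||
            memcmp(q->src, pkt->src, 16) != 0 ||
            memcmp(q->dst, pkt->dst, 16) != 0)
            return false;
        /* Only these destinations can arrive as per-device duplicates. */
        if (is_multicast(pkt->dst) || is_linklocal(pkt->dst))
            return q->ifindex == pkt->ifindex;
        return true;
    }
    ```
    
    Fragments duplicated onto a macvlan then land in a separate queue
    instead of tripping the overlap check of the underlying device's queue.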
    
    Note: a similar patch was already submitted by Yoshifuji Hideaki in
    
      http://patchwork.ozlabs.org/patch/220979/
    
    but got lost and forgotten for some reason.
    
    Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 126bb499631ca8c61db5628bdaaa7294ae73d04f
Author: Ying Xue <ying.xue@windriver.com>
Date:   Tue Nov 24 13:57:57 2015 +0800

    tipc: fix error handling of expanding buffer headroom
    
    [ Upstream commit 7098356baca723513e97ca0020df4e18bc353be3 ]
    
    Coverity says:
    *** CID 1338065:  Error handling issues  (CHECKED_RETURN)
    /net/tipc/udp_media.c: 162 in tipc_udp_send_msg()
    156             struct udp_media_addr *dst = (struct udp_media_addr *)&dest->value;
    157             struct udp_media_addr *src = (struct udp_media_addr *)&b->addr.value;
    158             struct sk_buff *clone;
    159             struct rtable *rt;
    160
    161             if (skb_headroom(skb) < UDP_MIN_HEADROOM)
    >>>     CID 1338065:  Error handling issues  (CHECKED_RETURN)
    >>>     Calling "pskb_expand_head" without checking return value (as is done elsewhere 51 out of 56 times).
    162                     pskb_expand_head(skb, UDP_MIN_HEADROOM, 0, GFP_ATOMIC);
    163
    164             clone = skb_clone(skb, GFP_ATOMIC);
    165             skb_set_inner_protocol(clone, htons(ETH_P_TIPC));
    166             ub = rcu_dereference_rtnl(b->media_ptr);
    167             if (!ub) {
    
    When expanding buffer headroom over a udp tunnel with pskb_expand_head(),
    we do not check its return value. As a result, if the function returns
    an error code due to lack of memory, the consequences are unpredictable
    because we unconditionally assume that it always succeeds.
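    
    The pattern being asked for is simply "check and propagate". A small
    userspace sketch, where struct buf, expand_head() and its simulate_oom
    switch are hypothetical stand-ins for the skb and pskb_expand_head():
    
    ```c
    #include <assert.h>
    #include <errno.h>
    #include <stddef.h>
    
    /* Hypothetical buffer with a headroom counter; not the kernel sk_buff. */
    struct buf {
        size_t headroom;
    };
    
    /* Stand-in for pskb_expand_head(): may fail with -ENOMEM. */
    static int expand_head(struct buf *b, size_t extra, int simulate_oom)
    {
        if (simulate_oom)
            return -ENOMEM;
        b->headroom += extra;
        return 0;
    }
    
    /* Ensure at least min bytes of headroom, propagating allocation failure. */
    static int ensure_headroom(struct buf *b, size_t min, int simulate_oom)
    {
        if (b->headroom < min) {
            int err = expand_head(b, min - b->headroom, simulate_oom);
    
            if (err)
                return err;  /* caller must drop the buffer, not send it */
        }
        return 0;
    }
    ```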
    
    Fixes: e53567948f82 ("tipc: conditionally expand buffer headroom over udp tunnel")
    Reported-by: <scan-admin@coverity.com>
    Cc: Stephen Hemminger <stephen@networkplumber.org>
    Signed-off-by: Ying Xue <ying.xue@windriver.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 309ef6d135b8d7baeb06afdea48e611bb16c9ee0
Author: Aaro Koskinen <aaro.koskinen@iki.fi>
Date:   Sun Nov 22 01:08:54 2015 +0200

    broadcom: fix PHY_ID_BCM5481 entry in the id table
    
    [ Upstream commit 3c25a860d17b7378822f35d8c9141db9507e3beb ]
    
    Commit fcb26ec5b18d ("broadcom: move all PHY_ID's to header")
    updated broadcom_tbl to use PHY_IDs, but incorrectly replaced 0x0143bca0
    with PHY_ID_BCM5482 (making a duplicate entry, and completely omitting
    the original). Fix that.
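    
    The effect of the duplicate entry can be illustrated with a toy id
    table. The value of PHY_ID_BCM5481 is the one quoted above; the value
    of PHY_ID_BCM5482 and the 0xfffffff0 mask are assumptions made for this
    sketch, and struct id_entry and table_matches() are hypothetical.
    
    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    
    #define PHY_ID_BCM5481 0x0143bca0u  /* value quoted in this message */
    #define PHY_ID_BCM5482 0x0143bcb0u  /* assumed value, for illustration */
    #define PHY_ID_MASK    0xfffffff0u  /* assumed: ignore the revision nibble */
    
    /* Hypothetical id table entry. */
    struct id_entry {
        uint32_t id;
        uint32_t mask;
    };
    
    /* Return true if any table entry matches the given PHY id. */
    static bool table_matches(const struct id_entry *tbl, size_t n, uint32_t phy_id)
    {
        for (size_t i = 0; i < n; i++)
            if ((phy_id & tbl[i].mask) == (tbl[i].id & tbl[i].mask))
                return true;
        return false;
    }
    ```
    
    A table listing PHY_ID_BCM5482 twice never matches a BCM5481 id, so
    anything keyed on the table would not see that PHY.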
    
    Fixes: fcb26ec5b18d ("broadcom: move all PHY_ID's to header")
    Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit b3abad339f8e268bb261e5844ab68b18a7797c29
Author: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Date:   Sat Nov 21 19:46:19 2015 +0100

    vrf: fix double free and memory corruption on register_netdevice failure
    
    [ Upstream commit 7f109f7cc37108cba7243bc832988525b0d85909 ]
    
    When vrf's ->newlink is called, if register_netdevice() fails then it
    does free_netdev(), but that's also done by rtnl_newlink(), so a second
    free happens and memory gets corrupted. To reproduce, execute the
    following line a couple of times (1 - 5 is usually enough):
    $ for i in `seq 1 5`; do ip link add vrf: type vrf table 1; done;
    This works because register_netdevice() fails due to the invalid
    name "vrf:".
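    
    The ownership rule at stake is that the caller releases the device when
    ->newlink fails, so the handler must not. A toy model that counts
    releases instead of freeing memory (all names are hypothetical
    stand-ins, not kernel functions):
    
    ```c
    #include <assert.h>
    #include <errno.h>
    #include <stdbool.h>
    
    /* Hypothetical device that counts how often it was released. */
    struct fake_dev {
        int free_calls;
    };
    
    /* Stand-in for free_netdev(); a count above 1 models a double free. */
    static void fake_free(struct fake_dev *dev)
    {
        dev->free_calls++;
    }
    
    /* Stand-in for a ->newlink handler whose registration step may fail. */
    static int fake_newlink(struct fake_dev *dev, bool reg_fails, bool frees_on_error)
    {
        if (reg_fails) {
            if (frees_on_error)
                fake_free(dev);  /* the problematic extra release */
            return -EINVAL;
        }
        return 0;
    }
    
    /* Stand-in for the caller, which releases the device when newlink fails. */
    static int fake_create(struct fake_dev *dev, bool reg_fails, bool frees_on_error)
    {
        int err = fake_newlink(dev, reg_fails, frees_on_error);
    
        if (err)
            fake_free(dev);
        return err;
    }
    ```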
    
    And here's a trace of one crash:
    [   28.792157] ------------[ cut here ]------------
    [   28.792407] kernel BUG at fs/namei.c:246!
    [   28.792608] invalid opcode: 0000 [#1] SMP
    [   28.793240] Modules linked in: vrf nfsd auth_rpcgss oid_registry
    nfs_acl nfs lockd grace sunrpc crct10dif_pclmul crc32_pclmul
    crc32c_intel qxl drm_kms_helper ttm drm aesni_intel aes_x86_64 psmouse
    glue_helper lrw evdev gf128mul i2c_piix4 ablk_helper cryptd ppdev
    parport_pc parport serio_raw pcspkr virtio_balloon virtio_console
    i2c_core acpi_cpufreq button 9pnet_virtio 9p 9pnet fscache ipv6 autofs4
    ext4 crc16 mbcache jbd2 virtio_blk virtio_net sg sr_mod cdrom
    ata_generic ehci_pci uhci_hcd ehci_hcd e1000 usbcore usb_common ata_piix
    libata virtio_pci virtio_ring virtio scsi_mod floppy
    [   28.796016] CPU: 0 PID: 1148 Comm: ld-linux-x86-64 Not tainted
    4.4.0-rc1+ #24
    [   28.796016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
    BIOS 1.8.1-20150318_183358- 04/01/2014
    [   28.796016] task: ffff8800352561c0 ti: ffff88003592c000 task.ti:
    ffff88003592c000
    [   28.796016] RIP: 0010:[<ffffffff812187b3>]  [<ffffffff812187b3>]
    putname+0x43/0x60
    [   28.796016] RSP: 0018:ffff88003592fe88  EFLAGS: 00010246
    [   28.796016] RAX: 0000000000000000 RBX: ffff8800352561c0 RCX:
    0000000000000001
    [   28.796016] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
    ffff88003784f000
    [   28.796016] RBP: ffff88003592ff08 R08: 0000000000000001 R09:
    0000000000000000
    [   28.796016] R10: 0000000000000000 R11: 0000000000000001 R12:
    0000000000000000
    [   28.796016] R13: 000000000000047c R14: ffff88003784f000 R15:
    ffff8800358c4a00
    [   28.796016] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000)
    knlGS:0000000000000000
    [   28.796016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [   28.796016] CR2: 00007ffd583bc2d9 CR3: 0000000035a99000 CR4:
    00000000000406f0
    [   28.796016] Stack:
    [   28.796016]  ffffffff8121045d ffffffff812102d3 ffff8800352561c0
    ffff880035a91660
    [   28.796016]  ffff8800008a9880 0000000000000000 ffffffff81a49940
    00ffffff81218684
    [   28.796016]  ffff8800352561c0 000000000000047c 0000000000000000
    ffff880035b36d80
    [   28.796016] Call Trace:
    [   28.796016]  [<ffffffff8121045d>] ?
    do_execveat_common.isra.34+0x74d/0x930
    [   28.796016]  [<ffffffff812102d3>] ?
    do_execveat_common.isra.34+0x5c3/0x930
    [   28.796016]  [<ffffffff8121066c>] do_execve+0x2c/0x30
    [   28.796016]  [<ffffffff810939a0>]
    call_usermodehelper_exec_async+0xf0/0x140
    [   28.796016]  [<ffffffff810938b0>] ? umh_complete+0x40/0x40
    [   28.796016]  [<ffffffff815cb1af>] ret_from_fork+0x3f/0x70
    [   28.796016] Code: 48 8d 47 1c 48 89 e5 53 48 8b 37 48 89 fb 48 39 c6
    74 1a 48 8b 3d 7e e9 8f 00 e8 49 fa fc ff 48 89 df e8 f1 01 fd ff 5b 5d
    f3 c3 <0f> 0b 48 89 fe 48 8b 3d 61 e9 8f 00 e8 2c fa fc ff 5b 5d eb e9
    [   28.796016] RIP  [<ffffffff812187b3>] putname+0x43/0x60
    [   28.796016]  RSP <ffff88003592fe88>
    
    Fixes: 193125dbd8eb ("net: Introduce VRF device driver")
    Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
    Acked-by: David Ahern <dsa@cumulusnetworks.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 4eb5e5c531d96b2397ad2b02fc5a496307b2e143
Author: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Date:   Fri Nov 20 13:54:20 2015 +0100

    net: ip6mr: fix static mfc/dev leaks on table destruction
    
    [ Upstream commit 4c6980462f32b4f282c5d8e5f7ea8070e2937725 ]
    
    Similar to ipv4, when destroying an mrt table the static mfc entries and
    the static devices are kept, which leads to devices that can never be
    destroyed (because of refcnt taken) and leaked memory. Make sure that
    everything is cleaned up on netns destruction.
    
    Fixes: 8229efdaef1e ("netns: ip6mr: enable namespace support in ipv6 multicast forwarding code")
    CC: Benjamin Thery <benjamin.thery@bull.net>
    Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
    Reviewed-by: Cong Wang <cwang@twopensource.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 47f706657d85e1fbfb2e5ed8c7b90ff880bd4d6e
Author: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Date:   Fri Nov 20 13:54:19 2015 +0100

    net: ipmr: fix static mfc/dev leaks on table destruction
    
    [ Upstream commit 0e615e9601a15efeeb8942cf7cd4dadba0c8c5a7 ]
    
    When destroying an mrt table the static mfc entries and the static
    devices are kept, which leads to devices that can never be destroyed
    (because of refcnt taken) and leaked memory, for example:
    unreferenced object 0xffff880034c144c0 (size 192):
      comm "mfc-broken", pid 4777, jiffies 4320349055 (age 46001.964s)
      hex dump (first 32 bytes):
        98 53 f0 34 00 88 ff ff 98 53 f0 34 00 88 ff ff  .S.4.....S.4....
        ef 0a 0a 14 01 02 03 04 00 00 00 00 01 00 00 00  ................
      backtrace:
        [<ffffffff815c1b9e>] kmemleak_alloc+0x4e/0xb0
        [<ffffffff811ea6e0>] kmem_cache_alloc+0x190/0x300
        [<ffffffff815931cb>] ip_mroute_setsockopt+0x5cb/0x910
        [<ffffffff8153d575>] do_ip_setsockopt.isra.11+0x105/0xff0
        [<ffffffff8153e490>] ip_setsockopt+0x30/0xa0
        [<ffffffff81564e13>] raw_setsockopt+0x33/0x90
        [<ffffffff814d1e14>] sock_common_setsockopt+0x14/0x20
        [<ffffffff814d0b51>] SyS_setsockopt+0x71/0xc0
        [<ffffffff815cdbf6>] entry_SYSCALL_64_fastpath+0x16/0x7a
        [<ffffffffffffffff>] 0xffffffffffffffff
    
    Make sure that everything is cleaned on netns destruction.
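    
    One way to express "clean everything on destruction" is a flag on the
    flush routine that also removes static entries. The sketch below is a
    userspace model under that assumption: struct mfc, the 'all' parameter
    and clean_table() are hypothetical, and entries are only unlinked here
    where real code would also free them.
    
    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    
    /* Hypothetical multicast forwarding cache entry. */
    struct mfc {
        struct mfc *next;
        bool is_static;
    };
    
    /*
     * Unlink entries from the list and return how many were removed.
     * Static entries survive unless 'all' is set, as on table destruction.
     */
    static size_t clean_table(struct mfc **head, bool all)
    {
        size_t removed = 0;
        struct mfc **pp = head;
    
        while (*pp) {
            struct mfc *e = *pp;
    
            if (!all && e->is_static) {
                pp = &e->next;  /* keep this entry, move on */
                continue;
            }
            *pp = e->next;      /* unlink; real code would also free e */
            e->next = NULL;
            removed++;
        }
        return removed;
    }
    ```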
    
    Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
    Reviewed-by: Cong Wang <cwang@twopensource.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 609d60122e815cdad20f887de8d2d60d989ef2cb
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Fri Nov 20 00:11:56 2015 +0100

    net, scm: fix PaX detected msg_controllen overflow in scm_detach_fds
    
    [ Upstream commit 6900317f5eff0a7070c5936e5383f589e0de7a09 ]
    
    David and HacKurx reported the following (or a similar) size overflow,
    triggered in a grsecurity kernel thanks to PaX's gcc size overflow plugin:
    
    (Already fixed in later grsecurity versions by Brad and PaX Team.)
    
    [ 1002.296137] PAX: size overflow detected in function scm_detach_fds net/core/scm.c:314
                   cicus.202_127 min, count: 4, decl: msg_controllen; num: 0; context: msghdr;
    [ 1002.296145] CPU: 0 PID: 3685 Comm: scm_rights_recv Not tainted 4.2.3-grsec+ #7
    [ 1002.296149] Hardware name: Apple Inc. MacBookAir5,1/Mac-66F35F19FE2A0D05, [...]
    [ 1002.296153]  ffffffff81c27366 0000000000000000 ffffffff81c27375 ffffc90007843aa8
    [ 1002.296162]  ffffffff818129ba 0000000000000000 ffffffff81c27366 ffffc90007843ad8
    [ 1002.296169]  ffffffff8121f838 fffffffffffffffc fffffffffffffffc ffffc90007843e60
    [ 1002.296176] Call Trace:
    [ 1002.296190]  [<ffffffff818129ba>] dump_stack+0x45/0x57
    [ 1002.296200]  [<ffffffff8121f838>] report_size_overflow+0x38/0x60
    [ 1002.296209]  [<ffffffff816a979e>] scm_detach_fds+0x2ce/0x300
    [ 1002.296220]  [<ffffffff81791899>] unix_stream_read_generic+0x609/0x930
    [ 1002.296228]  [<ffffffff81791c9f>] unix_stream_recvmsg+0x4f/0x60
    [ 1002.296236]  [<ffffffff8178dc00>] ? unix_set_peek_off+0x50/0x50
    [ 1002.296243]  [<ffffffff8168fac7>] sock_recvmsg+0x47/0x60
    [ 1002.296248]  [<ffffffff81691522>] ___sys_recvmsg+0xe2/0x1e0
    [ 1002.296257]  [<ffffffff81693496>] __sys_recvmsg+0x46/0x80
    [ 1002.296263]  [<ffffffff816934fc>] SyS_recvmsg+0x2c/0x40
    [ 1002.296271]  [<ffffffff8181a3ab>] entry_SYSCALL_64_fastpath+0x12/0x85
    
    Further investigation showed that this can happen when an *odd* number of
    fds are being passed over AF_UNIX sockets.
    
    In these cases CMSG_LEN(i * sizeof(int)) and CMSG_SPACE(i * sizeof(int)),
    where i is the number of successfully passed fds, differ by 4 bytes due
    to the extra CMSG_ALIGN() padding in CMSG_SPACE() to an 8 byte boundary
    on 64 bit. The padding is used to align subsequent cmsg headers in the
    control buffer.
    
    When the control buffer passed in from the receiver side *lacks* these 4
    bytes (e.g. due to buggy/wrong API usage), then msg->msg_controllen will
    overflow in scm_detach_fds():
    
      int cmlen = CMSG_LEN(i * sizeof(int));  <--- cmlen w/o tail-padding
      err = put_user(SOL_SOCKET, &cm->cmsg_level);
      if (!err)
        err = put_user(SCM_RIGHTS, &cm->cmsg_type);
      if (!err)
        err = put_user(cmlen, &cm->cmsg_len);
      if (!err) {
        cmlen = CMSG_SPACE(i * sizeof(int));  <--- cmlen w/ 4 byte extra tail-padding
        msg->msg_control += cmlen;
        msg->msg_controllen -= cmlen;         <--- iff no tail-padding space here ...
      }                                            ... wrap-around
    
    F.e. it will wrap to a length of 18446744073709551612 bytes in case the
    receiver passed in msg->msg_controllen of 20 bytes, and the sender
    properly transferred 1 fd to the receiver, so that its CMSG_LEN results
    in 20 bytes and CMSG_SPACE in 24 bytes.
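    
    The arithmetic above can be reproduced in userspace. The following is a
    minimal sketch, not the kernel code: the helper names are hypothetical,
    and the clamped variant only illustrates one way to avoid the wrap-around
    (never subtracting more than the remaining msg_controllen).

```c
#include <stddef.h>
#include <sys/socket.h>

/* Tail padding, in bytes, that CMSG_SPACE() adds on top of CMSG_LEN()
 * for a control message carrying nfds file descriptors. */
static size_t scm_tail_padding(size_t nfds)
{
	return CMSG_SPACE(nfds * sizeof(int)) - CMSG_LEN(nfds * sizeof(int));
}

/* Unchecked update: wraps around (unsigned arithmetic) when controllen
 * is smaller than CMSG_SPACE(). */
static size_t controllen_unchecked(size_t controllen, size_t nfds)
{
	return controllen - CMSG_SPACE(nfds * sizeof(int));
}

/* Clamped update (illustrative): never subtracts more than what is left. */
static size_t controllen_clamped(size_t controllen, size_t nfds)
{
	size_t cmlen = CMSG_SPACE(nfds * sizeof(int));

	if (controllen < cmlen)
		cmlen = controllen;
	return controllen - cmlen;
}
```

    With one fd and a buffer of exactly CMSG_LEN(sizeof(int)) bytes,
    controllen_unchecked() returns zero minus the tail padding, which on
    64 bit is the huge value quoted above, while controllen_clamped()
    returns 0.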
    
    In case of MSG_CMSG_COMPAT (scm_detach_fds_compat()), I haven't seen an
    issue in my tests as alignment seems always on 4 byte boundary. Same
    should be in case of native 32 bit, where we end up with 4 byte boundaries
    as well.
    
    In practice, passing msg->msg_controllen of 20 to recvmsg() while receiving
    a single fd would mean that on successful return, msg->msg_controllen is
    being set by the kernel to 24 bytes instead, thus more than the input
    buffer advertised. It could f.e. become an issue if such application later
    on zeroes or copies the control buffer based on the returned msg->msg_controllen
    elsewhere.
    
    Maximum number of fds we can send is a hard upper limit SCM_MAX_FD (253).
    
    Going over the code, it seems like msg->msg_controllen is not being read
    after scm_detach_fds() in scm_recv() anymore by the kernel, good!
    
    Relevant recvmsg() handlers are unix_dgram_recvmsg() (unix_seqpacket_recvmsg())
    and unix_stream_recvmsg(). Both return back to their recvmsg() caller,
    and ___sys_recvmsg() places the updated length, that is, new msg_control -
    old msg_control pointer into msg->msg_controllen (hence the 24 bytes seen
    in the example).
    
    Long time ago, Wei Yongjun fixed something related in commit 1ac70e7ad24a
    ("[NET]: Fix function put_cmsg() which may cause usr application memory
    overflow").
    
    RFC3542, section 20.2. says:
    
      The fields shown as "XX" are possible padding, between the cmsghdr
      structure and the data, and between the data and the next cmsghdr
      structure, if required by the implementation. While sending an
      application may or may not include padding at the end of last
      ancillary data in msg_controllen and implementations must accept both
      as valid. On receiving a portable application must provide space for
      padding at the end of the last ancillary data as implementations may
      copy out the padding at the end of the control message buffer and
      include it in the received msg_controllen. When recvmsg() is called
      if msg_controllen is too small for all the ancillary data items
      including any trailing padding after the last item an implementation
      may set MSG_CTRUNC.
    
    Since we haven't been setting MSG_CTRUNC for quite a long time already,
    just do the same as in 1ac70e7ad24a to avoid an overflow.
    
    Btw, even man-page author got this wrong :/ See db939c9b26e9 ("cmsg.3: Fix
    error in SCM_RIGHTS code sample"). Some people must have copied this (?),
    thus it got triggered in the wild (reported several times during boot by
    David and HacKurx).
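    
    For reference, a receiver that follows the RFC text quoted above sizes
    its control buffer with CMSG_SPACE() rather than CMSG_LEN(). The sketch
    below is an illustration only: send_one_fd(), recv_one_fd() and
    union fd_ctrl are hypothetical names, and error handling is reduced to
    returning -1.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control buffer with room for one fd *including* trailing padding. */
union fd_ctrl {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;	/* forces suitable alignment */
};

/* Send fd over the AF_UNIX socket sock; returns 0 on success, -1 on error. */
static int send_one_fd(int sock, int fd)
{
	char byte = 'x';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cm;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	cm->cmsg_len = CMSG_LEN(sizeof(int));	/* header + data, no padding */
	memcpy(CMSG_DATA(cm), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd from sock; returns the new descriptor, or -1 on error. */
static int recv_one_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cm;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);	/* CMSG_SPACE, not CMSG_LEN */

	if (recvmsg(sock, &msg, 0) <= 0 || (msg.msg_flags & MSG_CTRUNC))
		return -1;

	cm = CMSG_FIRSTHDR(&msg);
	if (!cm || cm->cmsg_level != SOL_SOCKET ||
	    cm->cmsg_type != SCM_RIGHTS ||
	    cm->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cm), sizeof(int));
	return fd;
}
```

    The union keeps the buffer aligned for struct cmsghdr, and the receiver
    advertises the padded size so that the kernel's CMSG_SPACE() based
    accounting always fits.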
    
    No Fixes tag this time as pre 2002 (that is, pre history tree).
    
    Reported-by: David Sterba <dave@jikos.cz>
    Reported-by: HacKurx <hackurx@gmail.com>
    Cc: PaX Team <pageexec@freemail.hu>
    Cc: Emese Revfy <re.emese@gmail.com>
    Cc: Brad Spengler <spender@grsecurity.net>
    Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
    Cc: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit b2d8d14ba2c37e3ff2d1267d4a32a32365859b8b
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Nov 26 08:18:14 2015 -0800

    tcp: initialize tp->copied_seq in case of cross SYN connection
    
    [ Upstream commit 142a2e7ece8d8ac0e818eb2c91f99ca894730e2a ]
    
    Dmitry provided a syzkaller (http://github.com/google/syzkaller)
    generated program that triggers the WARNING at
    net/ipv4/tcp.c:1729 in tcp_recvmsg() :
    
    WARN_ON(tp->copied_seq != tp->rcv_nxt &&
            !(flags & (MSG_PEEK | MSG_TRUNC)));
    
    His program is specifically attempting a Cross SYN TCP exchange,
    that we support (for the pleasure of hackers ?), but it looks like we
    lack proper tp->copied_seq initialization.
    
    Thanks again Dmitry for your report and testing.
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: Dmitry Vyukov <dvyukov@google.com>
    Tested-by: Dmitry Vyukov <dvyukov@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 11617ee108ae5bccaf3064d2304b8608874c85f9
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Nov 18 21:03:33 2015 -0800

    tcp: fix potential huge kmalloc() calls in TCP_REPAIR
    
    [ Upstream commit 5d4c9bfbabdb1d497f21afd81501e5c54b0c85d9 ]
    
    tcp_send_rcvq() is used for re-injecting data into tcp receive queue.
    
    Problems :
    
    - No check against size is performed, allowing a user to fool the kernel
      into attempting very large memory allocations, eventually triggering
      OOM when memory is fragmented.
    
    - In case of fault during the copy we do not return correct errno.
    
    Let's use alloc_skb_with_frags() to cook optimal skbs.
    
    Fixes: 292e8d8c8538 ("tcp: Move rcvq sending to tcp_input.c")
    Fixes: c0e88ff0f256 ("tcp: Repair socket queues")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Pavel Emelyanov <xemul@parallels.com>
    Acked-by: Pavel Emelyanov <xemul@parallels.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 6f5aa0f4aa406332521bde78e2b6290886f3961a
Author: Yuchung Cheng <ycheng@google.com>
Date:   Wed Nov 18 18:17:30 2015 -0800

    tcp: disable Fast Open on timeouts after handshake
    
    [ Upstream commit 0e45f4da5981895e885dd72fe912a3f8e32bae73 ]
    
    Some middle-boxes black-hole the data after the Fast Open handshake
    (https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-13.pdf).
    The exact reason is unknown. The work-around is to disable Fast Open
    temporarily after multiple recurring timeouts with few or no data
    delivered in the established state.
    
    Signed-off-by: Yuchung Cheng <ycheng@google.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: Christoph Paasch <cpaasch@apple.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 4ef5bc914bbd40811ebc0514d84175d303fb8b4f
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Nov 18 12:40:13 2015 -0800

    tcp: md5: fix lockdep annotation
    
    [ Upstream commit 1b8e6a01e19f001e9f93b39c32387961c91ed3cc ]
    
    When a passive TCP is created, we eventually call tcp_md5_do_add()
    with sk pointing to the child. It is not owned by the user yet (we
    will add this socket into listener accept queue a bit later anyway)
    
    But we do own the spinlock, so amend the lockdep annotation to avoid
    following splat :
    
    [ 8451.090932] net/ipv4/tcp_ipv4.c:923 suspicious rcu_dereference_protected() usage!
    [ 8451.090932]
    [ 8451.090932] other info that might help us debug this:
    [ 8451.090932]
    [ 8451.090934]
    [ 8451.090934] rcu_scheduler_active = 1, debug_locks = 1
    [ 8451.090936] 3 locks held by socket_sockopt_/214795:
    [ 8451.090936]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff855c6ac1>] __netif_receive_skb_core+0x151/0xe90
    [ 8451.090947]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff85618143>] ip_local_deliver_finish+0x43/0x2b0
    [ 8451.090952]  #2:  (slock-AF_INET){+.-...}, at: [<ffffffff855acda5>] sk_clone_lock+0x1c5/0x500
    [ 8451.090958]
    [ 8451.090958] stack backtrace:
    [ 8451.090960] CPU: 7 PID: 214795 Comm: socket_sockopt_
    
    [ 8451.091215] Call Trace:
    [ 8451.091216]  <IRQ>  [<ffffffff856fb29c>] dump_stack+0x55/0x76
    [ 8451.091229]  [<ffffffff85123b5b>] lockdep_rcu_suspicious+0xeb/0x110
    [ 8451.091235]  [<ffffffff8564544f>] tcp_md5_do_add+0x1bf/0x1e0
    [ 8451.091239]  [<ffffffff85645751>] tcp_v4_syn_recv_sock+0x1f1/0x4c0
    [ 8451.091242]  [<ffffffff85642b27>] ? tcp_v4_md5_hash_skb+0x167/0x190
    [ 8451.091246]  [<ffffffff85647c78>] tcp_check_req+0x3c8/0x500
    [ 8451.091249]  [<ffffffff856451ae>] ? tcp_v4_inbound_md5_hash+0x11e/0x190
    [ 8451.091253]  [<ffffffff85647170>] tcp_v4_rcv+0x3c0/0x9f0
    [ 8451.091256]  [<ffffffff85618143>] ? ip_local_deliver_finish+0x43/0x2b0
    [ 8451.091260]  [<ffffffff856181b6>] ip_local_deliver_finish+0xb6/0x2b0
    [ 8451.091263]  [<ffffffff85618143>] ? ip_local_deliver_finish+0x43/0x2b0
    [ 8451.091267]  [<ffffffff85618d38>] ip_local_deliver+0x48/0x80
    [ 8451.091270]  [<ffffffff85618510>] ip_rcv_finish+0x160/0x700
    [ 8451.091273]  [<ffffffff8561900e>] ip_rcv+0x29e/0x3d0
    [ 8451.091277]  [<ffffffff855c74b7>] __netif_receive_skb_core+0xb47/0xe90
    
    Fixes: a8afca0329988 ("tcp: md5: protects md5sig_info with RCU")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: Willem de Bruijn <willemb@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit e8afa3cd3e6dfe713e0d7f8ca8062aea50a14725
Author: Bjørn Mork <bjorn@mork.no>
Date:   Wed Nov 18 21:13:07 2015 +0100

    net: qmi_wwan: add XS Stick W100-2 from 4G Systems
    
    [ Upstream commit 68242a5a1e2edce39b069385cbafb82304eac0f1 ]
    
    Thomas reports
    "
    4gsystems sells two total different LTE-surfsticks under the same name.
    ..
    The newer version of XS Stick W100 is from "omega"
    ..
    Under windows the driver switches to the same ID, and uses MI03\6 for
    network and MI01\6 for modem.
    ..
    echo "1c9e 9b01" > /sys/bus/usb/drivers/qmi_wwan/new_id
    echo "1c9e 9b01" > /sys/bus/usb-serial/drivers/option1/new_id
    
    T:  Bus=01 Lev=01 Prnt=01 Port=03 Cnt=01 Dev#=  4 Spd=480 MxCh= 0
    D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
    P:  Vendor=1c9e ProdID=9b01 Rev=02.32
    S:  Manufacturer=USB Modem
    S:  Product=USB Modem
    S:  SerialNumber=
    C:  #Ifs= 5 Cfg#= 1 Atr=80 MxPwr=500mA
    I:  If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
    I:  If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
    I:  If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
    I:  If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
    I:  If#= 4 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
    
    Now all important things are there:
    
    wwp0s29f7u2i3 (net), ttyUSB2 (at), cdc-wdm0 (qmi), ttyUSB1 (at)
    
    There is also ttyUSB0, but it is not usable, at least not for at.
    
    The device works well with qmi and ModemManager-NetworkManager.
    "
    
    Reported-by: Thomas Schäfer <tschaefer@t-online.de>
    Signed-off-by: Bjørn Mork <bjorn@mork.no>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit d603fa768b63905317e9cf2cfb647ddff03d9715
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Wed Nov 18 16:40:19 2015 +0100

    net/ip6_tunnel: fix dst leak
    
    [ Upstream commit 206b49500df558dbc15d8836b09f6397ec5ed8bb ]
    
    Commit cdf3464e6c6b ("ipv6: Fix dst_entry refcnt bugs in ip6_tunnel")
    introduced percpu storage for the ip6_tunnel dst cache, but while clearing
    that cache it used raw_cpu_ptr to walk the per-cpu entries, so cached
    dsts on non-current CPUs are not actually reset.
    
    This patch replaces raw_cpu_ptr with per_cpu_ptr, properly cleaning
    such storage.
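    
    The difference can be modelled in plain C. This is a userspace sketch
    under stated assumptions, not the tunnel code: a fixed-size array stands
    in for the per-cpu storage, and all names (dst_cache_model,
    reset_current_only(), reset_all()) are hypothetical.

```c
#include <stddef.h>

#define NR_MODEL_CPUS 4

/* One cached pointer per (modelled) CPU. */
struct dst_cache_model {
	void *dst[NR_MODEL_CPUS];
};

/* raw_cpu_ptr analogue: the loop walks all CPUs but always touches
 * the slot of the CPU it is currently running on. */
static void reset_current_only(struct dst_cache_model *c, int this_cpu)
{
	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		c->dst[this_cpu] = NULL;
}

/* per_cpu_ptr analogue: the slot is selected by the loop variable. */
static void reset_all(struct dst_cache_model *c)
{
	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		c->dst[cpu] = NULL;
}

/* Number of slots that still hold a cached entry. */
static int count_cached(const struct dst_cache_model *c)
{
	int n = 0;

	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		if (c->dst[cpu])
			n++;
	return n;
}
```

    After reset_current_only() every slot except the current CPU's still
    holds its entry, which corresponds to the leak described above.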
    
    Fixes: cdf3464e6c6b ("ipv6: Fix dst_entry refcnt bugs in ip6_tunnel")
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 3f7a8d274e209d0e5dba464618a78c37ec00e899
Author: Neil Horman <nhorman@tuxdriver.com>
Date:   Mon Nov 16 13:09:10 2015 -0500

    snmp: Remove duplicate OUTMCAST stat increment
    
    [ Upstream commit 41033f029e393a64e81966cbe34d66c6cf8a2e7e ]
    
    The OUTMCAST stat is double incremented, getting bumped once in the mcast
    code itself, and again in the common ip output path.  Remove the mcast
    bump, as it's not needed.
    
    Validated by the reporter, with good results
    
    Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
    Reported-by: Claus Jensen <claus.jensen@microsemi.com>
    CC: Claus Jensen <claus.jensen@microsemi.com>
    CC: David Miller <davem@davemloft.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit ce2ceca37c846767abceee497f8adbb3328a8dd9
Author: Pavel Fedin <p.fedin@samsung.com>
Date:   Mon Nov 16 17:51:34 2015 +0300

    net: thunder: Check for driver data in nicvf_remove()
    
    [ Upstream commit 7750130d93decff06120df0d8ea024ff8a038a21 ]
    
    In some cases the crash is caused by nicvf_remove() being called from
    outside. For example, if we try to feed the device to vfio after the
    probe has failed for some reason. So, move the check to a better place.
    
    Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 5eceaad8488b24f98ada6d5dfd5a6ec2df875d07
Author: Dragos Tatulea <dragos@endocode.com>
Date:   Mon Nov 16 10:52:48 2015 +0100

    net: switchdev: fix return code of fdb_dump stub
    
    [ Upstream commit 24cb7055a3066634a0f3fa0cd6a4780652905d35 ]
    
    rtnl_fdb_dump always expects an index to be returned by the ndo_fdb_dump op,
    but when CONFIG_NET_SWITCHDEV is off, it returns an error.
    
    Fix that by returning the given unmodified idx.
    
    A similar fix was 0890cf6cb6ab ("switchdev: fix return value of
    switchdev_port_fdb_dump in case of error") but for the CONFIG_NET_SWITCHDEV=y
    case.
    
    Fixes: 45d4122ca7cd ("switchdev: add support for fdb add/del/dump via switchdev_port_obj ops.")
    Signed-off-by: Dragos Tatulea <dragos@endocode.com>
    Acked-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 5c81f50e7fe60ed1d0820820df63f9193c6d5d0b
Author: Jason A. Donenfeld <Jason@zx2c4.com>
Date:   Thu Nov 12 17:35:58 2015 +0100

    ip_tunnel: disable preemption when updating per-cpu tstats
    
    [ Upstream commit b4fe85f9c9146f60457e9512fb6055e69e6a7a65 ]
    
    Drivers like vxlan use the recently introduced
    udp_tunnel_xmit_skb/udp_tunnel6_xmit_skb APIs. udp_tunnel6_xmit_skb
    makes use of ip6tunnel_xmit, and ip6tunnel_xmit, after sending the
    packet, updates the struct stats using the usual
    u64_stats_update_begin/end calls on this_cpu_ptr(dev->tstats).
    udp_tunnel_xmit_skb makes use of iptunnel_xmit, which doesn't touch
    tstats, so drivers like vxlan, immediately after, call
    iptunnel_xmit_stats, which does the same thing - calls
    u64_stats_update_begin/end on this_cpu_ptr(dev->tstats).
    
    While vxlan is probably fine (I don't know?), calling a similar function
    from, say, an unbound workqueue, on a fully preemptable kernel causes
    real issues:
    
    [  188.434537] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u8:0/6
    [  188.435579] caller is debug_smp_processor_id+0x17/0x20
    [  188.435583] CPU: 0 PID: 6 Comm: kworker/u8:0 Not tainted 4.2.6 #2
    [  188.435607] Call Trace:
    [  188.435611]  [<ffffffff8234e936>] dump_stack+0x4f/0x7b
    [  188.435615]  [<ffffffff81915f3d>] check_preemption_disabled+0x19d/0x1c0
    [  188.435619]  [<ffffffff81915f77>] debug_smp_processor_id+0x17/0x20
    
    The solution would be to protect the whole
    this_cpu_ptr(dev->tstats)/u64_stats_update_begin/end blocks with
    disabling preemption and then reenabling it.
    
    Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 289defb1e093535624376d7098c5364f8804fdc7
Author: Eran Ben Elisha <eranbe@mellanox.com>
Date:   Thu Nov 12 19:35:29 2015 +0200

    net/mlx4_core: Fix sleeping while holding spinlock at rem_slave_counters
    
    [ Upstream commit f5adbfee72282bb1f456d52b04adacd4fe6ac502 ]
    
    When cleaning slave's counter resources, we hold a spinlock that
    protects the slave's counters list. As part of the clean, we call
    __mlx4_clear_if_stat which calls mlx4_alloc_cmd_mailbox which is a
    sleepable function.
    
    In order to fix this issue, hold the spinlock, and copy all counter
    indices into a temporary array, and release the spinlock. Afterwards,
    iterate over this array and free every counter. Repeat this scenario
    until the original list is empty (a new counter might have been added
    while releasing the counters from the temporary array).
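    
    The pattern described above (snapshot under the lock, do the sleeping
    work outside it, repeat until empty) can be sketched in userspace as
    follows. This is a simplified model and not the driver code: a pthread
    mutex stands in for the spinlock, entries are removed while being
    copied, the batch size is arbitrary, and every name is hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_COUNTERS 8
#define BATCH 3

struct counter_list {
	pthread_mutex_t lock;		/* protects idx[] and len */
	int idx[MAX_COUNTERS];
	size_t len;
	int freed;			/* bookkeeping for this example only */
};

/* Stand-in for a free routine that may sleep; must not run under the lock. */
static void free_counter_may_sleep(struct counter_list *l, int index)
{
	(void)index;
	l->freed++;
}

static void rem_all_counters(struct counter_list *l)
{
	int tmp[BATCH];
	size_t n;

	do {
		/* Step 1: move up to BATCH indices into a temporary array. */
		pthread_mutex_lock(&l->lock);
		n = l->len < BATCH ? l->len : BATCH;
		for (size_t i = 0; i < n; i++)
			tmp[i] = l->idx[--l->len];
		pthread_mutex_unlock(&l->lock);

		/* Step 2: free them with the lock released. */
		for (size_t i = 0; i < n; i++)
			free_counter_may_sleep(l, tmp[i]);

		/* Step 3: loop again, new entries may have been added meanwhile. */
	} while (n > 0);
}
```

    The loop terminates only once a pass finds the list empty, which covers
    counters added while the previous batch was being freed.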
    
    Fixes: b72ca7e96acf ("net/mlx4_core: Reset counters data when freed")
    Reported-by: Moni Shoua <monis@mellanox.com>
    Tested-by: Moni Shoua <monis@mellanox.com>
    Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
    Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
    Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 713a813ed2fab144e436a9a3e64445a441d9846d
Author: Tariq Toukan <tariqt@mellanox.com>
Date:   Thu Nov 12 19:35:26 2015 +0200

    net/mlx5e: Added self loopback prevention
    
    [ Upstream commit 66189961e986e53ae39822898fc2ce88f44c61bb ]
    
    Prevent outgoing multicast frames from looping back to the RX queue.
    
    This is done by introducing a new HW capability, self_lb_en_modifiable,
    which indicates support for modifying the self_lb_en bit in the
    modify_tir command.
    
    When this capability is set we can prevent TIRs from sending back
    loopback multicast traffic to their own RQs, by "refreshing TIRs" with
    the modify_tir command every time new channels (SQs/RQs) are created at
    device open.
    This is needed since TIRs are static and only allocated once on driver
    load, and the loopback decision is under their responsibility.
    
    Fixes issues of the kind:
    "IPv6: eth2: IPv6 duplicate address fe80::e61d:2dff:fe5c:f2e9 detected!"
    The issue is seen because the IPv6 solicitation multicast messages are
    looped back and the network stack thinks they are coming from another host.
    
    Fixes: 5c50368f3831 ("net/mlx5e: Light-weight netdev open/stop")
    Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 176cec782335b95d7830bce4a7e6a1fe0b644594
Author: lucien <lucien.xin@gmail.com>
Date:   Thu Nov 12 13:07:07 2015 +0800

    sctp: translate host order to network order when setting a hmacid
    
    [ Upstream commit ed5a377d87dc4c87fb3e1f7f698cba38cd893103 ]
    
    Currently sctp auth does not work when an hmacid is set manually,
    because we do not convert the hmacid to network order. Fix it by
    adding the conversion in sctp_auth_ep_set_hmacs().
    
    Even if we set the hmacid in network order in userspace, it still
    cannot work, because of this condition in sctp_auth_ep_set_hmacs():
    
    		if (id > SCTP_AUTH_HMAC_ID_MAX)
    			return -EOPNOTSUPP;
    
    so this wasn't working before and thus it won't break compatibility.
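    
    The interaction between the range check and the byte order can be shown
    with a small userspace model. It is a sketch only: model_set_hmac_id()
    is a hypothetical helper, and MODEL_HMAC_ID_MAX is an assumed stand-in
    for SCTP_AUTH_HMAC_ID_MAX.

```c
#include <arpa/inet.h>
#include <stdint.h>

#define MODEL_HMAC_ID_MAX 3	/* assumed stand-in for SCTP_AUTH_HMAC_ID_MAX */

/* Validate a host-order HMAC id, then store it in network order.
 * Returns 0 on success, -1 if the id is out of range. */
static int model_set_hmac_id(uint16_t host_id, uint16_t *wire_id)
{
	if (host_id > MODEL_HMAC_ID_MAX)
		return -1;
	*wire_id = htons(host_id);
	return 0;
}
```

    On a little-endian host, htons(1) is 0x0100 (256), so an id that
    userspace already converted to network order fails the range check.
    That is why converting after the check, in the kernel, cannot break
    anything that worked before.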
    
    Fixes: 65b07e5d0d09 ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
    Signed-off-by: Xin Long <lucien.xin@gmail.com>
    Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Acked-by: Neil Horman <nhorman@tuxdriver.com>
    Acked-by: Vlad Yasevich <vyasevich@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 377204f209fd51a24a359873ad44a97e4b5b72c8
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Wed Nov 11 23:25:44 2015 +0100

    packet: fix tpacket_snd max frame len
    
    [ Upstream commit 5cfb4c8d05b4409c4044cb9c05b19705c1d9818b ]
    
    Since its introduction in commit 69e3c75f4d54 ("net: TX_RING and
    packet mmap"), TX_RING could be used from SOCK_DGRAM and SOCK_RAW
    side. When used with SOCK_DGRAM only, the size_max > dev->mtu +
    reserve check should have reserve as 0, but currently, this is
    unconditionally set (in its original form as dev->hard_header_len).
    
    I think this is not correct since tpacket_fill_skb() would then
    take dev->mtu and dev->hard_header_len into account for SOCK_DGRAM,
    the extra VLAN_HLEN could be possible in both cases. Presumably, the
    reserve code was copied from packet_snd(), but later on missed the
    check. Make it similar as we have it in packet_snd().
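    
    A model of the intended limit, written as a standalone function. This is
    a sketch under assumptions rather than the af_packet code: the function
    and parameter names are hypothetical, and ring_limit stands for whatever
    the TX_RING frame size allows.

```c
#include <stddef.h>

#define MODEL_VLAN_HLEN 4	/* size of one 802.1Q tag */

/* Maximum frame length accepted from a TX_RING slot.
 * reserve is the link-layer header length for SOCK_RAW and 0 for
 * SOCK_DGRAM; the extra VLAN tag is allowed in both cases. */
static size_t model_tx_size_max(int is_sock_raw, size_t ring_limit,
				size_t mtu, size_t hard_header_len)
{
	size_t reserve = is_sock_raw ? hard_header_len : 0;
	size_t size_max = ring_limit;

	if (size_max > mtu + reserve + MODEL_VLAN_HLEN)
		size_max = mtu + reserve + MODEL_VLAN_HLEN;
	return size_max;
}
```

    For an MTU of 1500 and a 14 byte header, this gives 1504 bytes for
    SOCK_DGRAM and 1518 bytes for SOCK_RAW, provided the ring slot is large
    enough.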
    
    Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap")
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Willem de Bruijn <willemb@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 263c8f423ade0e7bd72ea82e36da3a458fac1e30
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Wed Nov 11 23:25:43 2015 +0100

    packet: infer protocol from ethernet header if unset
    
    [ Upstream commit c72219b75fde768efccf7666342282fab7f9e4e7 ]
    
    In case no struct sockaddr_ll has been passed to packet
    socket's sendmsg() when doing a TX_RING flush run, then
    skb->protocol is set to po->num instead, which is the protocol
    passed via socket(2)/bind(2).
    
    Applications only xmitting can go the path of allocating the
    socket as socket(PF_PACKET, <mode>, 0) and do a bind(2) on the
    TX_RING with sll_protocol of 0. That way, register_prot_hook()
    is neither called on creation nor on bind time, which saves
    cycles when there's no interest in capturing anyway.
    
    That leaves us however with po->num 0 instead and therefore
    the TX_RING flush run sets skb->protocol to 0 as well. Eric
    reported that this leads to problems when using tools like
    trafgen over bonding device. I.e. the bonding's hash function
    could invoke the kernel's flow dissector, which depends on
    skb->protocol being properly set. In the current situation, all
    the traffic is then directed to a single slave.
    
    Fix it up by inferring skb->protocol from the Ethernet header
    when not set and we have ARPHRD_ETHER device type. This is only
    done in case of SOCK_RAW and where we have a dev->hard_header_len
    length. In case of ARPHRD_ETHER devices, this is guaranteed to
    cover ETH_HLEN, and therefore being accessed on the skb after
    the skb_store_bits().
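    
    The inference step amounts to reading the EtherType field of the frame.
    A minimal userspace sketch, assuming a plain byte buffer instead of an
    skb; model_infer_protocol() is a hypothetical name, and the value is
    returned in host order for readability, whereas skb->protocol is kept in
    network order.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_ETH_HLEN 14	/* dst MAC (6) + src MAC (6) + EtherType (2) */

/* Return proto if it is already set; otherwise return the EtherType
 * found in the frame, or 0 if the frame is too short to hold one. */
static uint16_t model_infer_protocol(uint16_t proto, const uint8_t *frame,
				     size_t len)
{
	if (proto != 0)
		return proto;
	if (len < MODEL_ETH_HLEN)
		return 0;
	return (uint16_t)((frame[12] << 8) | frame[13]);
}
```

    The length check mirrors the guarantee mentioned above: the header is
    only inspected when at least a full Ethernet header is present.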
    
    Reported-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Willem de Bruijn <willemb@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit bbf14f8bb1041b5c11b9ccf998a075a03f04de3d
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Wed Nov 11 23:25:42 2015 +0100

    packet: only allow extra vlan len on ethernet devices
    
    [ Upstream commit 3c70c132488794e2489ab045559b0ce0afcf17de ]
    
    Packet sockets can be used by various net devices and are not
    really restricted to ARPHRD_ETHER device types. However, when
    currently checking for the extra 4 bytes that can be transmitted
    in VLAN case, our assumption is that we generally probe on
    ARPHRD_ETHER devices. Therefore, before looking into Ethernet
    header, check the device type first.
    
    This also fixes the issue where non-ARPHRD_ETHER devices could
    have no dev->hard_header_len in TX_RING SOCK_RAW case, and thus
    the check would test unfilled linear part of the skb (instead
    of non-linear).
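    
    The ordering of the checks can be sketched as below. This is an
    illustrative model, not the af_packet code: the constants are local
    copies of well-known values, and model_len_ok() and its parameters are
    hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_ARPHRD_ETHER 1		/* Ethernet device type */
#define MODEL_ETH_P_8021Q  0x8100	/* 802.1Q VLAN EtherType */
#define MODEL_ETH_HLEN     14
#define MODEL_VLAN_HLEN    4

/* Return 1 if a frame of len bytes may be sent, 0 if it is too long.
 * The Ethernet header is only inspected on Ethernet devices. */
static int model_len_ok(int dev_type, const uint8_t *frame, size_t len,
			size_t mtu, size_t reserve)
{
	size_t limit = mtu + reserve;

	if (len <= limit)
		return 1;
	if (dev_type == MODEL_ARPHRD_ETHER &&
	    len >= MODEL_ETH_HLEN &&
	    len <= limit + MODEL_VLAN_HLEN &&
	    ((frame[12] << 8) | frame[13]) == MODEL_ETH_P_8021Q)
		return 1;
	return 0;
}
```

    Because the device type is tested first, the EtherType bytes are never
    read for device types where an Ethernet header may not be present.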
    
    Fixes: 57f89bfa2140 ("network: Allow af_packet to transmit +4 bytes for VLAN packets.")
    Fixes: 52f1454f629f ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case")
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Willem de Bruijn <willemb@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 6b4a14f4265267d8d1ff6b2465e0be19da0c256a
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Wed Nov 11 23:25:41 2015 +0100

    packet: always probe for transport header
    
    [ Upstream commit 8fd6c80d9dd938ca338c70698533a7e304752846 ]
    
    We concluded that the skb_probe_transport_header() should better be
    called unconditionally. Avoiding the call into the flow dissector has
    also not really much to do with the direct xmit mode.
    
    While it seems that only virtio_net code makes use of GSO from non
    RX/TX ring packet socket paths, we should probe for a transport header
    nevertheless before they hit devices.
    
    Reference: http://thread.gmane.org/gmane.linux.network/386173/
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Jason Wang <jasowang@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 1a43c235781426c2b4368bffb6478c379c4529aa
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Wed Nov 11 23:25:40 2015 +0100

    packet: do skb_probe_transport_header when we actually have data
    
    [ Upstream commit efdfa2f7848f64517008136fb41f53c4a1faf93a ]
    
    In tpacket_fill_skb() commit c1aad275b029 ("packet: set transport
    header before doing xmit") and later on 40893fd0fd4e ("net: switch
    to use skb_probe_transport_header()") was probing for a transport
    header on the skb from a ring buffer slot, but at a time, where
    the skb has _not even_ been filled with data yet. So that call into
    the flow dissector is pretty useless. Let's do it after we've set
    up the skb frags.
    
    Fixes: c1aad275b029 ("packet: set transport header before doing xmit")
    Reported-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Jason Wang <jasowang@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 55eabd378b2e07bc2b677f07fd142365be68cb2b
Author: Kamal Mostafa <kamal@canonical.com>
Date:   Wed Nov 11 14:24:27 2015 -0800

    tools/net: Use include/uapi with __EXPORTED_HEADERS__
    
    [ Upstream commit d7475de58575c904818efa369c82e88c6648ce2e ]
    
    Use the local uapi headers to keep in sync with "recently" added #define's
    (e.g. SKF_AD_VLAN_TPID).  Refactored CFLAGS, and bpf_asm doesn't need -I.
    
    Fixes: 3f356385e8a4 ("filter: bpf_asm: add minimal bpf asm tool")
    Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit ea9b07516f19f4c6fc14697cb6c70bb9b74e3bfc
Author: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date:   Fri Nov 27 18:17:05 2015 +0100

    Revert "ipv6: ndisc: inherit metadata dst when creating ndisc requests"
    
    [ Upstream commit 304d888b29cf96f1dd53511ee686499cd8cdf249 ]
    
    This reverts commit ab450605b35caa768ca33e86db9403229bf42be4.
    
    In IPv6, we cannot inherit the dst of the original dst. ndisc packets
    are IPv6 packets and may take another route than the original packet.
    
    This patch breaks the following scenario: a packet comes from eth0 and
    is forwarded through vxlan1. The encapsulated packet triggers an NS
    which cannot be sent because of the wrong route.
    
    CC: Jiri Benc <jbenc@redhat.com>
    CC: Thomas Graf <tgraf@suug.ch>
    Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 4ab43ae83f90543998b1640fb2cb556ce1a604c0
Author: Martin KaFai Lau <kafai@fb.com>
Date:   Wed Nov 11 11:51:08 2015 -0800

    ipv6: Check rt->dst.from for the DST_NOCACHE route
    
    [ Upstream commit 02bcf4e082e4dc634409a6a6cb7def8806d6e5e6 ]
    
    All DST_NOCACHE rt6_info used to have rt->dst.from set to
    its parent.
    
    After commit 8e3d5be73681 ("ipv6: Avoid double dst_free"),
    DST_NOCACHE is also set to rt6_info which does not have
    a parent (i.e. rt->dst.from is NULL).
    
    This patch catches the rt->dst.from == NULL case.
    
    Fixes: 8e3d5be73681 ("ipv6: Avoid double dst_free")
    Signed-off-by: Martin KaFai Lau <kafai@fb.com>
    Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit dd91a7e65f8008ac989d61131670949feec08d23
Author: Martin KaFai Lau <kafai@fb.com>
Date:   Wed Nov 11 11:51:07 2015 -0800

    ipv6: Check expire on DST_NOCACHE route
    
    [ Upstream commit 5973fb1e245086071bf71994c8b54d99526ded03 ]
    
    Since the expires of the DST_NOCACHE rt can be set during
    the ip6_rt_update_pmtu(), we also need to consider the expires
    value when doing ip6_dst_check().
    
    This patch creates __rt6_check_expired() to only
    check the expire value (if one exists) of the current rt.
    
    In rt6_dst_from_check(), it adds __rt6_check_expired() as
    one of the condition checks.
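    
    As a rough userspace illustration only (this is not the kernel code),
    the expiry test described above could be modelled as below.  struct
    toy_rt, TOY_RTF_EXPIRES and toy_rt_check_expired() are hypothetical
    stand-ins; the wraparound-safe comparison follows the idea behind the
    kernel's time_after() macro.
    
    ```c
    #include <stdbool.h>
    
    #define TOY_RTF_EXPIRES 0x1u	/* hypothetical flag: route carries an expiry time */
    
    struct toy_rt {
    	unsigned int flags;
    	unsigned long expires;	/* tick at which the route expires */
    };
    
    /* True only if the route has an expiry set and 'now' is past it. */
    bool toy_rt_check_expired(const struct toy_rt *rt, unsigned long now)
    {
    	if (rt->flags & TOY_RTF_EXPIRES)
    		return (long)(rt->expires - now) < 0;	/* "now is after expires" */
    	return false;
    }
    ```
    
    A route without the expiry flag is never reported as expired, which
    matches the "if one exists" wording above.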
    
    Signed-off-by: Martin KaFai Lau <kafai@fb.com>
    Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit fdaac6a92778b95ad41e65b6f9a88b454899d8d4
Author: Martin KaFai Lau <kafai@fb.com>
Date:   Wed Nov 11 11:51:06 2015 -0800

    ipv6: Avoid creating RTF_CACHE from a rt that is not managed by fib6 tree
    
    [ Upstream commit 0d3f6d297bfb7af24d0508460fdb3d1ec4903fa3 ]
    
    The original bug report:
    https://bugzilla.redhat.com/show_bug.cgi?id=1272571
    
    The setup has an IPv4 GRE tunnel running over IPsec.  The bug
    happens when ndisc starts sending router solicitations on the gre
    interface.  The simplified oops stack looks like:
    
    __lock_acquire+0x1b2/0x1c30
    lock_acquire+0xb9/0x140
    _raw_write_lock_bh+0x3f/0x50
    __ip6_ins_rt+0x2e/0x60
    ip6_ins_rt+0x49/0x50
    ~~~~~~~~
    __ip6_rt_update_pmtu.part.54+0x145/0x250
    ip6_rt_update_pmtu+0x2e/0x40
    ~~~~~~~~
    ip_tunnel_xmit+0x1f1/0xf40
    __gre_xmit+0x7a/0x90
    ipgre_xmit+0x15a/0x220
    dev_hard_start_xmit+0x2bd/0x480
    __dev_queue_xmit+0x696/0x730
    dev_queue_xmit+0x10/0x20
    neigh_direct_output+0x11/0x20
    ip6_finish_output2+0x21f/0x770
    ip6_finish_output+0xa7/0x1d0
    ip6_output+0x56/0x190
    ~~~~~~~~
    ndisc_send_skb+0x1d9/0x400
    ndisc_send_rs+0x88/0xc0
    ~~~~~~~~
    
    The rt passed to ip6_rt_update_pmtu() is created by
    icmp6_dst_alloc() and it is not managed by the fib6 tree,
    so its rt6i_table == NULL.  When __ip6_rt_update_pmtu() creates
    a RTF_CACHE clone, the newly created clone also has rt6i_table == NULL
    and it causes the ip6_ins_rt() oops.
    
    During pmtu update, we only want to create a RTF_CACHE clone
    from a rt which is currently managed (or owned) by the
    fib6 tree.  It means either rt->rt6i_node != NULL or
    rt is a RTF_PCPU clone.
    
    It is worth noting that rt6i_table may not be NULL even if the rt is
    not (yet) managed by the fib6 tree (e.g. addrconf_dst_alloc()).
    Hence, rt6i_node is a better check than rt6i_table.
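    
    A minimal sketch of that ownership test, written as standalone C
    rather than the actual kernel change.  struct toy_rt, struct toy_node,
    TOY_RTF_PCPU and toy_rt_allows_cache_clone() are hypothetical names
    that only mirror the roles of rt6i_node, rt6i_table and RTF_PCPU.
    
    ```c
    #include <stdbool.h>
    #include <stddef.h>
    
    #define TOY_RTF_PCPU 0x1u	/* hypothetical flag: per-cpu clone of a tree-owned route */
    
    struct toy_node { int unused; };
    
    struct toy_rt {
    	unsigned int flags;
    	struct toy_node *node;	/* non-NULL once the route is linked into the tree */
    	void *table;		/* may already be set before the route is linked */
    };
    
    /* A cache clone may be created only from a route the tree owns. */
    bool toy_rt_allows_cache_clone(const struct toy_rt *rt)
    {
    	return rt->node != NULL || (rt->flags & TOY_RTF_PCPU) != 0;
    }
    ```
    
    The table pointer is deliberately ignored here: a route can have a
    table assigned without being linked into the tree yet.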
    
    Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu")
    Signed-off-by: Martin KaFai Lau <kafai@fb.com>
    Reported-by: Chris Siebenmann <cks-rhbugzilla@cs.toronto.edu>
    Cc: Chris Siebenmann <cks-rhbugzilla@cs.toronto.edu>
    Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 343f4f8c013a6e41ba51da7f00c3e5c95197fa59
Author: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date:   Thu Nov 26 12:08:18 2015 +0100

    af-unix: passcred support for sendpage
    
    [ Upstream commit 9490f886b192964796285907d777ff00fba1fa0f ]
    
    sendpage did not care about credentials at all. Because of fd passing
    between processes, this could lead to situations in which we append
    data to skbs with different scm data. It is illegal to splice those
    skbs together. Instead we have to allocate a new skb and, if requested,
    fill out the scm details.
    
    Fixes: 869e7c62486ec ("net: af_unix: implement stream sendpage support")
    Reported-by: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 58a6a46a036ce81a2a8ecaa6fc1537c894349e3f
Author: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Date:   Fri Nov 20 22:07:23 2015 +0000

    unix: avoid use-after-free in ep_remove_wait_queue
    
    [ Upstream commit 7d267278a9ece963d77eefec61630223fce08c6c ]
    
    Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
    An AF_UNIX datagram socket being the client in an n:1 association with
    some server socket is only allowed to send messages to the server if the
    receive queue of this socket contains at most sk_max_ack_backlog
    datagrams. This implies that prospective writers might be forced to go
    to sleep even though none of the messages presently enqueued on the server
    receive queue were sent by them. In order to ensure that these will be
    woken up once space becomes available again, the present unix_dgram_poll
    routine does a second sock_poll_wait call with the peer_wait wait queue
    of the server socket as queue argument (unix_dgram_recvmsg does a wake
    up on this queue after a datagram was received). This is inherently
    problematic because the server socket is only guaranteed to remain alive
    for as long as the client still holds a reference to it. In case the
    connection is dissolved via connect or by the dead peer detection logic
    in unix_dgram_sendmsg, the server socket may be freed even though "the
    polling mechanism" (in particular, epoll) still has a pointer to the
    corresponding peer_wait queue. There's no way to forcibly deregister a
    wait queue with epoll.
    
    Based on an idea by Jason Baron, the patch below changes the code such
    that a wait_queue_t belonging to the client socket is enqueued on the
    peer_wait queue of the server whenever the peer receive queue full
    condition is detected by either a sendmsg or a poll. A wake up on the
    peer queue is then relayed to the ordinary wait queue of the client
    socket via wake function. The connection to the peer wait queue is again
    dissolved if either a wake up is about to be relayed or the client
    socket reconnects or a dead peer is detected or the client socket is
    itself closed. This enables removing the second sock_poll_wait from
    unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
    that no blocked writer sleeps forever.
    
    Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
    Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets")
    Reviewed-by: Jason Baron <jbaron@akamai.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 26c04dfb4cfa10e6ed8cafd63313fef2be865fa0
Author: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date:   Tue Nov 17 15:10:59 2015 +0100

    af_unix: take receive queue lock while appending new skb
    
    [ Upstream commit a3a116e04cc6a94d595ead4e956ab1bc1d2f4746 ]
    
    While we may not necessarily need sk_buff_head.lock in the future,
    removing it is a rather large change, as it affects the af_unix fd
    garbage collector, diag and socket cleanups. This is too much for a
    stable patch.
    
    For the time being grab sk_buff_head.lock without disabling bh and irqs,
    so don't use locked skb_queue_tail.
    
    Fixes: 869e7c62486e ("net: af_unix: implement stream sendpage support")
    Cc: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Reported-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit a78f42950a48ed4f4721b29eb228d709ac91af9b
Author: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date:   Mon Nov 16 16:25:56 2015 +0100

    af_unix: don't append consumed skbs to sk_receive_queue
    
    [ Upstream commit 8844f97238ca6c1ca92a5d6c69f53efd361a266f ]
    
    In case multiple writes to a unix stream socket race we could end up in a
    situation where we pre-allocate a new skb for use in unix_stream_sendpage
    but have to free it again in the locked section because another skb
    has been appended meanwhile, which we must use. Accidentally we didn't
    clear the pointer after consuming it and so we touched freed memory
    while appending it to the sk_receive_queue. So, clear the pointer after
    consuming the skb.
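    
    The "clear the pointer after consuming" pattern can be sketched in
    plain C as below.  This is an illustration only, not the af_unix
    code; struct toy_buf, toy_consume_buf() and toy_append_if_present()
    are hypothetical helpers.
    
    ```c
    #include <stdlib.h>
    
    struct toy_buf { int len; };
    
    /* Free *bufp and clear the caller's pointer so later code cannot reuse it. */
    void toy_consume_buf(struct toy_buf **bufp)
    {
    	free(*bufp);
    	*bufp = NULL;
    }
    
    /* Stand-in for the queueing step: only a still-valid buffer is appended. */
    int toy_append_if_present(const struct toy_buf *buf)
    {
    	return buf != NULL;
    }
    ```
    
    With the pointer cleared, a later "append if set" step sees NULL and
    skips the freed buffer instead of touching it.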
    
    This bug has been found with syzkaller
    (http://github.com/google/syzkaller) by Dmitry Vyukov.
    
    Fixes: 869e7c62486e ("net: af_unix: implement stream sendpage support")
    Reported-by: Dmitry Vyukov <dvyukov@google.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Acked-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 382c8e80e40bcadb90ba1a156f522f7b5a22109f
Author: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date:   Tue Nov 10 16:23:15 2015 +0100

    af-unix: fix use-after-free with concurrent readers while splicing
    
    [ Upstream commit 73ed5d25dce0354ea381d6dc93005c3085fae03d ]
    
    During splicing an af-unix socket to a pipe we have to drop all
    af-unix socket locks. While doing so we allow another reader to enter
    unix_stream_read_generic which can read, copy and finally free another
    skb. If exactly this skb is in the process of being spliced, we get a
    use-after-free report by kasan.
    
    First, we must make sure to not have a free while the skb is used during
    the splice operation. We simply increment its use counter before unlocking
    the reader lock.
    
    Stream sockets have the nice characteristic that we don't care about
    zero length writes and they never reach the peer socket's queue. That
    said, we can take the UNIXCB.consumed field as the indicator if the
    skb was already freed from the socket's receive queue. If the skb was
    fully consumed after we locked the reader side again we know it has been
    dropped by a second reader. We indicate a short read to user space and
    abort the current splice operation.
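    
    A toy model of the two steps above (hold a reference across the
    unlocked section, then test whether the buffer was drained in the
    meantime).  It is a sketch under simplifying assumptions: no real
    locking, and struct toy_skb plus the toy_skb_*() helpers are
    hypothetical, not kernel APIs.
    
    ```c
    #include <stdbool.h>
    #include <stddef.h>
    
    struct toy_skb {
    	int users;		/* reference count */
    	size_t len;		/* total payload length */
    	size_t consumed;	/* bytes already read from this buffer */
    };
    
    /* Take an extra reference before dropping the reader lock. */
    void toy_skb_get(struct toy_skb *skb)
    {
    	skb->users++;
    }
    
    /* Drop a reference; returns true when the last one is gone. */
    bool toy_skb_put(struct toy_skb *skb)
    {
    	return --skb->users == 0;
    }
    
    /* After relocking: was the buffer fully consumed by another reader? */
    bool toy_skb_drained(const struct toy_skb *skb)
    {
    	return skb->consumed >= skb->len;
    }
    ```
    
    If toy_skb_drained() reports true after relocking, the splicing side
    would report a short read and stop, then drop its own reference.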
    
    This bug has been found with syzkaller
    (http://github.com/google/syzkaller) by Dmitry Vyukov.
    
    Fixes: 2b514574f7e8 ("net: af_unix: implement splice for stream af_unix sockets")
    Reported-by: Dmitry Vyukov <dvyukov@google.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Eric Dumazet <eric.dumazet@gmail.com>
    Acked-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 808e146b0ae32eb615e048c3b87016fcad3dd500
Author: françois romieu <romieu@fr.zoreil.com>
Date:   Wed Nov 11 23:35:18 2015 +0100

    r8169: fix kasan reported skb use-after-free.
    
    [ Upstream commit 39174291d8e8acfd1113214a943263aaa03c57c8 ]
    
    Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
    Reported-by: Dave Jones <davej@codemonkey.org.uk>
    Fixes: d7d2d89d4b0af ("r8169: Add software counter for multicast packages")
    Acked-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Corinna Vinschen <vinschen@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit bfa97da9146ac6b41f9b749e7dfd0921eeff9170
Author: Paul Gortmaker <paul.gortmaker@windriver.com>
Date:   Wed Oct 21 14:04:47 2015 +0100

    certs: add .gitignore to stop git nagging about x509_certificate_list
    
    commit 48dbc164b40dd9195dea8cd966e394819e420b64 upstream.
    
    Currently we see this in "git status" if we build in the source dir:
    
    Untracked files:
      (use "git add <file>..." to include in what will be committed)
    
            certs/x509_certificate_list
    
    It looks like it used to live in kernel/ so we squash that .gitignore
    entry at the same time.  I didn't bother to dig through git history to
    see when it moved, since it is just a minor annoyance at most.
    
    Cc: David Woodhouse <dwmw2@infradead.org>
    Cc: keyrings@linux-nfs.org
    Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    Signed-off-by: David Howells <dhowells@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>