commit 3ed9fdb7ac17e98f8501bcbcf78d5374a929ef0e Author: Willy Tarreau Date: Sun Oct 7 23:36:59 2012 +0200 Linux 2.6.32.60 Signed-off-by: Willy Tarreau commit 9eee5ee58eb31de674689184b49e2ba810124227 Author: Tony Luck Date: Fri Jul 20 13:15:20 2012 -0700 dmi: Feed DMI table to /dev/random driver commit d114a33387472555188f142ed8e98acdb8181c6d upstream. Send the entire DMI (SMBIOS) table to the /dev/random driver to help seed its pools. Signed-off-by: Tony Luck Signed-off-by: Theodore Ts'o Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit 708daa15c3eb2e056c06a48bc2115eda46a777f0 Author: Mark Brown Date: Thu Jul 5 20:23:21 2012 +0000 mfd: wm831x: Feed the device UUID into device_add_randomness() commit 27130f0cc3ab97560384da437e4621fc4e94f21c upstream. wm831x devices contain a unique ID value. Feed this into the newly added device_add_randomness() to add some per device seed data to the pool. Signed-off-by: Mark Brown Signed-off-by: Theodore Ts'o Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit 66e540e96c3ad9e73e49e0fc7292bfe5d5e2bb11 Author: Mark Brown Date: Thu Jul 5 20:19:17 2012 +0000 rtc: wm831x: Feed the write counter into device_add_randomness() commit 9dccf55f4cb011a7552a8a2749a580662f5ed8ed upstream. The tamper evident features of the RTC include the "write counter" which is a pseudo-random number regenerated whenever we set the RTC. Since this value is unpredictable it should provide some useful seeding to the random number generator. Only do this on boot since the goal is to seed the pool rather than add useful entropy. Signed-off-by: Mark Brown Signed-off-by: Theodore Ts'o Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit a76be4858901fd9fd0fb8f594f660d1037cb03d8 Author: Tony Luck Date: Mon Jul 23 09:47:57 2012 -0700 random: Add comment to random_initialize() commit cbc96b7594b5691d61eba2db8b2ea723645be9ca upstream. Many platforms have per-machine instance data (serial numbers, asset tags, etc.) squirreled away in areas that are accessed during early system bringup. Mixing this data into the random pools has a very high value in providing better random data, so we should allow (and even encourage) architecture code to call add_device_randomness() from the setup_arch() paths. However, this limits our options for internal structure of the random driver since random_initialize() is not called until long after setup_arch(). Add a big fat comment to rand_initialize() spelling out this requirement. Suggested-by: Theodore Ts'o Signed-off-by: Tony Luck Signed-off-by: Theodore Ts'o Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit fe1f4c7d2602bda2dddb13fd34e9678514478e12 Author: Theodore Ts'o Date: Sat Jul 14 20:27:52 2012 -0400 random: remove rand_initialize_irq() commit c5857ccf293968348e5eb4ebedc68074de3dcda6 upstream. With the new interrupt sampling system, we are no longer using the timer_rand_state structure in the irq descriptor, so we can stop initializing it now. [ Merged in fixes from Sedat to find some last missing references to rand_initialize_irq() ] Signed-off-by: "Theodore Ts'o" Signed-off-by: Sedat Dilek [PG: in .34 the irqdesc.h content is in irq.h instead.] Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit 4bc4a0f1e034193d7c78783271053d9b00e120e1 Author: Theodore Ts'o Date: Wed Jul 4 21:23:25 2012 -0400 net: feed /dev/random with the MAC address when registering a device commit 7bf2357524408b97fec58344caf7397f8140c3fd upstream. Cc: David Miller Cc: Linus Torvalds Signed-off-by: "Theodore Ts'o" Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit d86264c4f3d78c65be21fde93ad0d8f5cf61eee6 Author: Theodore Ts'o Date: Wed Jul 4 11:22:20 2012 -0400 usb: feed USB device information to the /dev/random driver commit b04b3156a20d395a7faa8eed98698d1e17a36000 upstream. Send the USB device's serial, product, and manufacturer strings to the /dev/random driver to help seed its pools. Cc: Linus Torvalds Acked-by: Greg KH Signed-off-by: "Theodore Ts'o" Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit f2652a50ab5bd7a65c9c4156648cd75b2ed7e53d Author: Theodore Ts'o Date: Wed Jul 4 11:32:48 2012 -0400 MAINTAINERS: Theodore Ts'o is taking over the random driver commit 330e0a01d54c2b8606c56816f99af6ebc58ec92c upstream. Matt Mackall stepped down as the /dev/random driver maintainer last year, so Theodore Ts'o is taking back the /dev/random driver. Cc: Matt Mackall Signed-off-by: "Theodore Ts'o" Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit 6d582d5fbb1aab4202a0387383b020d8db73b681 Author: H. Peter Anvin Date: Fri Jul 27 22:26:08 2012 -0400 random: mix in architectural randomness in extract_buf() commit d2e7c96af1e54b507ae2a6a7dd2baf588417a7e5 upstream. Mix in any architectural randomness in extract_buf() instead of xfer_secondary_buf(). This allows us to mix in more architectural randomness, and it also makes xfer_secondary_buf() faster, moving a tiny bit of additional CPU overhead to process which is extracting the randomness. [ Commit description modified by tytso to remove an extended advertisement for the RDRAND instruction. ] Signed-off-by: H. Peter Anvin Acked-by: Ingo Molnar Cc: DJ Johnston Signed-off-by: Theodore Ts'o Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit 6ed1ed8d82fa11887f2b2c72d522c888fb4d3262 Author: Theodore Ts'o Date: Thu Jul 5 10:35:23 2012 -0400 random: add new get_random_bytes_arch() function commit c2557a303ab6712bb6e09447df828c557c710ac9 upstream. Create a new function, get_random_bytes_arch() which will use the architecture-specific hardware random number generator if it is present. Change get_random_bytes() to not use the HW RNG, even if it is avaiable. The reason for this is that the hw random number generator is fast (if it is present), but it requires that we trust the hardware manufacturer to have not put in a back door. (For example, an increasing counter encrypted by an AES key known to the NSA.) It's unlikely that Intel (for example) was paid off by the US Government to do this, but it's impossible for them to prove otherwise --- especially since Bull Mountain is documented to use AES as a whitener. Hence, the output of an evil, trojan-horse version of RDRAND is statistically indistinguishable from an RDRAND implemented to the specifications claimed by Intel. Short of using a tunnelling electronic microscope to reverse engineer an Ivy Bridge chip and disassembling and analyzing the CPU microcode, there's no way for us to tell for sure. Since users of get_random_bytes() in the Linux kernel need to be able to support hardware systems where the HW RNG is not present, most time-sensitive users of this interface have already created their own cryptographic RNG interface which uses get_random_bytes() as a seed. So it's much better to use the HW RNG to improve the existing random number generator, by mixing in any entropy returned by the HW RNG into /dev/random's entropy pool, but to always _use_ /dev/random's entropy pool. This way we get almost of the benefits of the HW RNG without any potential liabilities. The only benefits we forgo is the speed/performance enhancements --- and generic kernel code can't depend on depend on get_random_bytes() having the speed of a HW RNG anyway. For those places that really want access to the arch-specific HW RNG, if it is available, we provide get_random_bytes_arch(). Signed-off-by: "Theodore Ts'o" Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit 02c58544b80935f5c2f5731feb12036bee4760e9 Author: Theodore Ts'o Date: Thu Jul 5 10:21:01 2012 -0400 random: use the arch-specific rng in xfer_secondary_pool commit e6d4947b12e8ad947add1032dd754803c6004824 upstream. If the CPU supports a hardware random number generator, use it in xfer_secondary_pool(), where it will significantly improve things and where we can afford it. Also, remove the use of the arch-specific rng in add_timer_randomness(), since the call is significantly slower than get_cycles(), and we're much better off using it in xfer_secondary_pool() anyway. Signed-off-by: "Theodore Ts'o" Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit 7ce57122ffda206201c8125d979d87031676c383 Author: Linus Torvalds Date: Wed Jul 4 11:16:01 2012 -0400 random: create add_device_randomness() interface commit a2080a67abe9e314f9e9c2cc3a4a176e8a8f8793 upstream. Add a new interface, add_device_randomness() for adding data to the random pool that is likely to differ between two devices (or possibly even per boot). This would be things like MAC addresses or serial numbers, or the read-out of the RTC. This does *not* add any actual entropy to the pool, but it initializes the pool to different values for devices that might otherwise be identical and have very little entropy available to them (particularly common in the embedded world). [ Modified by tytso to mix in a timestamp, since there may be some variability caused by the time needed to detect/configure the hardware in question. ] Signed-off-by: Linus Torvalds Signed-off-by: "Theodore Ts'o" Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit 34e7d1043dc6bbf18e764b9ce5f42edf02587a0d Author: Theodore Ts'o Date: Wed Jul 4 10:38:30 2012 -0400 random: use lockless techniques in the interrupt path commit 902c098a3663de3fa18639efbb71b6080f0bcd3c upstream. The real-time Linux folks don't like add_interrupt_randomness() taking a spinlock since it is called in the low-level interrupt routine. This also allows us to reduce the overhead in the fast path, for the random driver, which is the interrupt collection path. Signed-off-by: "Theodore Ts'o" Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit 8c3711e7d2a86b6ca4fd8344c18209606d4a8a21 Author: Theodore Ts'o Date: Mon Jul 2 07:52:16 2012 -0400 random: make 'add_interrupt_randomness()' do something sane commit 775f4b297b780601e61787b766f306ed3e1d23eb upstream. We've been moving away from add_interrupt_randomness() for various reasons: it's too expensive to do on every interrupt, and flooding the CPU with interrupts could theoretically cause bogus floods of entropy from a somewhat externally controllable source. This solves both problems by limiting the actual randomness addition to just once a second or after 64 interrupts, whicever comes first. During that time, the interrupt cycle data is buffered up in a per-cpu pool. Also, we make sure the the nonblocking pool used by urandom is initialized before we start feeding the normal input pool. This assures that /dev/urandom is returning unpredictable data as soon as possible. (Based on an original patch by Linus, but significantly modified by tytso.) Tested-by: Eric Wustrow Reported-by: Eric Wustrow Reported-by: Nadia Heninger Reported-by: Zakir Durumeric Reported-by: J. Alex Halderman . Signed-off-by: Linus Torvalds Signed-off-by: "Theodore Ts'o" [PG: minor adjustment required since .34 doesn't have f9e4989eb8 which renames "status" to "random" in kernel/irq/handle.c ] Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit cf3062b3d5bb6625f570abcc030ab24792aeee35 Author: Mathieu Desnoyers Date: Thu Apr 12 12:49:12 2012 -0700 drivers/char/random.c: fix boot id uniqueness race commit 44e4360fa3384850d65dd36fb4e6e5f2f112709b upstream. /proc/sys/kernel/random/boot_id can be read concurrently by userspace processes. If two (or more) user-space processes concurrently read boot_id when sysctl_bootid is not yet assigned, a race can occur making boot_id differ between the reads. Because the whole point of the boot id is to be unique across a kernel execution, fix this by protecting this operation with a spinlock. Given that this operation is not frequently used, hitting the spinlock on each call should not be an issue. Signed-off-by: Mathieu Desnoyers Cc: "Theodore Ts'o" Cc: Matt Mackall Signed-off-by: Eric Dumazet Cc: Greg Kroah-Hartman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit 16e9e30d537f8385f8d55268530f36775f2eb233 Author: H. Peter Anvin Date: Mon Jan 16 11:23:29 2012 -0800 random: Adjust the number of loops when initializing commit 2dac8e54f988ab58525505d7ef982493374433c3 upstream. When we are initializing using arch_get_random_long() we only need to loop enough times to touch all the bytes in the buffer; using poolwords for that does twice the number of operations necessary on a 64-bit machine, since in the random number generator code "word" means 32 bits. Signed-off-by: H. Peter Anvin Cc: "Theodore Ts'o" Link: http://lkml.kernel.org/r/1324589281-31931-1-git-send-email-tytso@mit.edu Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit 0ce367938392bcdc37481aa39f0da26b297903b4 Author: Theodore Ts'o Date: Thu Dec 22 16:28:01 2011 -0500 random: Use arch-specific RNG to initialize the entropy store commit 3e88bdff1c65145f7ba297ccec69c774afe4c785 upstream. If there is an architecture-specific random number generator (such as RDRAND for Intel architectures), use it to initialize /dev/random's entropy stores. Even in the worst case, if RDRAND is something like AES(NSA_KEY, counter++), it won't hurt, and it will definitely help against any other adversaries. Signed-off-by: "Theodore Ts'o" Link: http://lkml.kernel.org/r/1324589281-31931-1-git-send-email-tytso@mit.edu Signed-off-by: H. Peter Anvin Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit dd102079e9ad33c7f5377a22f6e381e36d566c61 Author: Linus Torvalds Date: Thu Dec 22 11:36:22 2011 -0800 random: Use arch_get_random_int instead of cycle counter if avail commit cf833d0b9937874b50ef2867c4e8badfd64948ce upstream. We still don't use rdrand in /dev/random, which just seems stupid. We accept the *cycle*counter* as a random input, but we don't accept rdrand? That's just broken. Sure, people can do things in user space (write to /dev/random, use rdrand in addition to /dev/random themselves etc etc), but that *still* seems to be a particularly stupid reason for saying "we shouldn't bother to try to do better in /dev/random". And even if somebody really doesn't trust rdrand as a source of random bytes, it seems singularly stupid to trust the cycle counter *more*. So I'd suggest the attached patch. I'm not going to even bother arguing that we should add more bits to the entropy estimate, because that's not the point - I don't care if /dev/random fills up slowly or not, I think it's just stupid to not use the bits we can get from rdrand and mix them into the strong randomness pool. Link: http://lkml.kernel.org/r/CA%2B55aFwn59N1=m651QAyTy-1gO1noGbK18zwKDwvwqnravA84A@mail.gmail.com Acked-by: "David S. Miller" Acked-by: "Theodore Ts'o" Acked-by: Herbert Xu Cc: Matt Mackall Cc: Tony Luck Cc: Eric Dumazet Signed-off-by: H. Peter Anvin Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit f81e060a8616e0d5ee985f3a2fd5f7534d6fc94f Author: Luck, Tony Date: Wed Nov 16 10:50:56 2011 -0800 fix typo/thinko in get_random_bytes() commit bd29e568a4cb6465f6e5ec7c1c1f3ae7d99cbec1 upstream. If there is an architecture-specific random number generator we use it to acquire randomness one "long" at a time. We should put these random words into consecutive words in the result buffer - not just overwrite the first word again and again. Signed-off-by: Tony Luck Acked-by: H. Peter Anvin Acked-by: Thomas Gleixner Signed-off-by: Linus Torvalds Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit 67c19301dcdd94661b8d07e7f6bc67622f1d3338 Author: H. Peter Anvin Date: Sun Jul 31 14:02:19 2011 -0700 x86, random: Verify RDRAND functionality and allow it to be disabled commit 49d859d78c5aeb998b6936fcb5f288f78d713489 upstream. If the CPU declares that RDRAND is available, go through a guranteed reseed sequence, and make sure that it is actually working (producing data.) If it does not, disable the CPU feature flag. Allow RDRAND to be disabled on the command line (as opposed to at compile time) for a user who has special requirements with regards to random numbers. Signed-off-by: H. Peter Anvin Cc: Matt Mackall Cc: Herbert Xu Cc: "Theodore Ts'o" Signed-off-by: Willy Tarreau commit 5e6321d123dc795ce37b070ba631f9937ca6924d Author: H. Peter Anvin Date: Sun Jul 31 13:59:29 2011 -0700 x86, random: Architectural inlines to get random integers with RDRAND commit 628c6246d47b85f5357298601df2444d7f4dd3fd upstream. Architectural inlines to get random ints and longs using the RDRAND instruction. Intel has introduced a new RDRAND instruction, a Digital Random Number Generator (DRNG), which is functionally an high bandwidth entropy source, cryptographic whitener, and integrity monitor all built into hardware. This enables RDRAND to be used directly, bypassing the kernel random number pool. For technical documentation, see: http://software.intel.com/en-us/articles/download-the-latest-bull-mountain-software-implementation-guide/ In this patch, this is *only* used for the nonblocking random number pool. RDRAND is a nonblocking source, similar to our /dev/urandom, and is therefore not a direct replacement for /dev/random. The architectural hooks presented in the previous patch only feed the kernel internal users, which only use the nonblocking pool, and so this is not a problem. Since this instruction is available in userspace, there is no reason to have a /dev/hw_rng device driver for the purpose of feeding rngd. This is especially so since RDRAND is a nonblocking source, and needs additional whitening and reduction (see the above technical documentation for details) in order to be of "pure entropy source" quality. The CONFIG_EXPERT compile-time option can be used to disable this use of RDRAND. Signed-off-by: H. Peter Anvin Originally-by: Fenghua Yu Cc: Matt Mackall Cc: Herbert Xu Cc: "Theodore Ts'o" Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit 3ddb94a16e0a9ae80a138f0cc4977f8093357e6e Author: H. Peter Anvin Date: Sun Jul 31 13:54:50 2011 -0700 random: Add support for architectural random hooks commit 63d77173266c1791f1553e9e8ccea65dc87c4485 upstream. Add support for architecture-specific hooks into the kernel-directed random number generator interfaces. This patchset does not use the architecture random number generator interfaces for the userspace-directed interfaces (/dev/random and /dev/urandom), thus eliminating the need to distinguish between them based on a pool pointer. Changes in version 3: - Moved the hooks from extract_entropy() to get_random_bytes(). - Changes the hooks to inlines. Signed-off-by: H. Peter Anvin Cc: Fenghua Yu Cc: Matt Mackall Cc: Herbert Xu Cc: "Theodore Ts'o" [PG: .34 already had "unsigned int ret" in get_random_int, so the diffstat here is slightly smaller than that of 63d7717. ] Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit 165359736411775606024738396bb2797872360b Author: Kees Cook Date: Tue May 24 16:29:26 2011 -0700 x86, cpufeature: Update CPU feature RDRND to RDRAND commit 7ccafc5f75c87853f3c49845d5a884f2376e03ce upstream. The Intel manual changed the name of the CPUID bit to match the instruction name. We should follow suit for sanity's sake. (See Intel SDM Volume 2, Table 3-20 "Feature Information Returned in the ECX Register".) [ hpa: we can only do this at this time because there are currently no CPUs with this feature on the market, hence this is pre-hardware enabling. However, Cc:'ing stable so that stable can present a consistent ABI. ] Signed-off-by: Kees Cook Link: http://lkml.kernel.org/r/20110524232926.GA27728@outflux.net Signed-off-by: H. Peter Anvin Cc: Fenghua Yu Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit e5f4484418cd3aca72cd050ced9bda9520404694 Author: H. Peter Anvin Date: Wed Jul 7 10:15:12 2010 -0700 x86, cpu: Add CPU flags for F16C and RDRND commit 24da9c26f3050aee9314ec09930a24c80fe76352 upstream. Add support for the newly documented F16C (16-bit floating point conversions) and RDRND (RDRAND instruction) CPU feature flags. Signed-off-by: H. Peter Anvin Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit c933f060037d4ea361103da2816235652662f34d Author: Matt Mackall Date: Thu May 20 19:55:01 2010 +1000 random: simplify fips mode commit e954bc91bdd4bb08b8325478c5004b24a23a3522 upstream. Rather than dynamically allocate 10 bytes, move it to static allocation. This saves space and avoids the need for error checking. Signed-off-by: Matt Mackall Signed-off-by: Herbert Xu [PG: adding this simplifies required updates to random for .34 stable] Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit 9f12e204003c5a66b2cee9d397c8f14892852a92 Author: Jarod Wilson Date: Mon Feb 21 21:43:10 2011 +1100 random: update interface comments to reflect reality commit 442a4fffffa26fc3080350b4d50172f7589c3ac2 upstream. At present, the comment header in random.c makes no mention of add_disk_randomness, and instead, suggests that disk activity adds to the random pool by way of add_interrupt_randomness, which appears to not have been the case since sometime prior to the existence of git, and even prior to bitkeeper. Didn't look any further back. At least, as far as I can tell, there are no storage drivers setting IRQF_SAMPLE_RANDOM, which is a requirement for add_interrupt_randomness to trigger, so the only way for a disk to contribute entropy is by way of add_disk_randomness. Update comments accordingly, complete with special mention about solid state drives being a crappy source of entropy (see e2e1a148bc for reference). Signed-off-by: Jarod Wilson Acked-by: Matt Mackall Signed-off-by: Herbert Xu Signed-off-by: Willy Tarreau commit 929e8ba3736bab86f4f3fec14a41f581fe8f6e50 Author: Richard Kennedy Date: Sat Jul 31 19:58:00 2010 +0800 random: Reorder struct entropy_store to remove padding on 64bits commit 4015d9a865e3bcc42d88bedc8ce1551000bab664 upstream. Re-order structure entropy_store to remove 8 bytes of padding on 64 bit builds, so shrinking this structure from 72 to 64 bytes and allowing it to fit into one cache line. Signed-off-by: Richard Kennedy Signed-off-by: Matt Mackall Signed-off-by: Herbert Xu Signed-off-by: Willy Tarreau commit 265c3d3918028d673fdc484b197436a7eb34a7a0 Author: Jason Baron Date: Wed Apr 25 16:01:47 2012 -0700 epoll: clear the tfile_check_list on -ELOOP commit 13d518074a952d33d47c428419693f63389547e9 upstream. An epoll_ctl(,EPOLL_CTL_ADD,,) operation can return '-ELOOP' to prevent circular epoll dependencies from being created. However, in that case we do not properly clear the 'tfile_check_list'. Thus, add a call to clear_tfile_check_list() for the -ELOOP case. Signed-off-by: Jason Baron Reported-by: Yurij M. Plotnikov Cc: Nelson Elhage Cc: Davide Libenzi Tested-by: Alexandra N. Kossovsky Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings Signed-off-by: Willy Tarreau commit bf2acb54cd4a7569f8d222943051b489030fcd12 Author: Jason Baron Date: Fri Mar 16 16:34:03 2012 -0400 Don't limit non-nested epoll paths commit 93dc6107a76daed81c07f50215fa6ae77691634f upstream. Commit 28d82dc1c4ed ("epoll: limit paths") that I did to limit the number of possible wakeup paths in epoll is causing a few applications to longer work (dovecot for one). The original patch is really about limiting the amount of epoll nesting (since epoll fds can be attached to other fds). Thus, we probably can allow an unlimited number of paths of depth 1. My current patch limits it at 1000. And enforce the limits on paths that have a greater depth. This is captured in: https://bugzilla.redhat.com/show_bug.cgi?id=681578 Signed-off-by: Jason Baron Cc: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 0dc301f09ada92857af9356d8d2bcb20e016ab13 Author: Jason Baron Date: Thu Jan 12 17:17:43 2012 -0800 epoll: limit paths commit 28d82dc1c4edbc352129f97f4ca22624d1fe61de upstream. The current epoll code can be tickled to run basically indefinitely in both loop detection path check (on ep_insert()), and in the wakeup paths. The programs that tickle this behavior set up deeply linked networks of epoll file descriptors that cause the epoll algorithms to traverse them indefinitely. A couple of these sample programs have been previously posted in this thread: https://lkml.org/lkml/2011/2/25/297. To fix the loop detection path check algorithms, I simply keep track of the epoll nodes that have been already visited. Thus, the loop detection becomes proportional to the number of epoll file descriptor and links. This dramatically decreases the run-time of the loop check algorithm. In one diabolical case I tried it reduced the run-time from 15 mintues (all in kernel time) to .3 seconds. Fixing the wakeup paths could be done at wakeup time in a similar manner by keeping track of nodes that have already been visited, but the complexity is harder, since there can be multiple wakeups on different cpus...Thus, I've opted to limit the number of possible wakeup paths when the paths are created. This is accomplished, by noting that the end file descriptor points that are found during the loop detection pass (from the newly added link), are actually the sources for wakeup events. I keep a list of these file descriptors and limit the number and length of these paths that emanate from these 'source file descriptors'. In the current implemetation I allow 1000 paths of length 1, 500 of length 2, 100 of length 3, 50 of length 4 and 10 of length 5. Note that it is sufficient to check the 'source file descriptors' reachable from the newly added link, since no other 'source file descriptors' will have newly added links. This allows us to check only the wakeup paths that may have gotten too long, and not re-check all possible wakeup paths on the system. In terms of the path limit selection, I think its first worth noting that the most common case for epoll, is probably the model where you have 1 epoll file descriptor that is monitoring n number of 'source file descriptors'. In this case, each 'source file descriptor' has a 1 path of length 1. Thus, I believe that the limits I'm proposing are quite reasonable and in fact may be too generous. Thus, I'm hoping that the proposed limits will not prevent any workloads that currently work to fail. In terms of locking, I have extended the use of the 'epmutex' to all epoll_ctl add and remove operations. Currently its only used in a subset of the add paths. I need to hold the epmutex, so that we can correctly traverse a coherent graph, to check the number of paths. I believe that this additional locking is probably ok, since its in the setup/teardown paths, and doesn't affect the running paths, but it certainly is going to add some extra overhead. Also, worth noting is that the epmuex was recently added to the ep_ctl add operations in the initial path loop detection code using the argument that it was not on a critical path. Another thing to note here, is the length of epoll chains that is allowed. Currently, eventpoll.c defines: /* Maximum number of nesting allowed inside epoll sets */ This basically means that I am limited to a graph depth of 5 (EP_MAX_NESTS + 1). However, this limit is currently only enforced during the loop check detection code, and only when the epoll file descriptors are added in a certain order. Thus, this limit is currently easily bypassed. The newly added check for wakeup paths, stricly limits the wakeup paths to a length of 5, regardless of the order in which ep's are linked together. Thus, a side-effect of the new code is a more consistent enforcement of the graph depth. Thus far, I've tested this, using the sample programs previously mentioned, which now either return quickly or return -EINVAL. I've also testing using the piptest.c epoll tester, which showed no difference in performance. I've also created a number of different epoll networks and tested that they behave as expectded. I believe this solves the original diabolical test cases, while still preserving the sane epoll nesting. Signed-off-by: Jason Baron Cc: Nelson Elhage Cc: Davide Libenzi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit d42e3563c6413824d45c1d5023114da585d6e70d Author: Oleg Nesterov Date: Fri Feb 24 20:07:29 2012 +0100 epoll: ep_unregister_pollwait() can use the freed pwq->whead commit 971316f0503a5c50633d07b83b6db2f15a3a5b00 upstream. signalfd_cleanup() ensures that ->signalfd_wqh is not used, but this is not enough. eppoll_entry->whead still points to the memory we are going to free, ep_unregister_pollwait()->remove_wait_queue() is obviously unsafe. Change ep_poll_callback(POLLFREE) to set eppoll_entry->whead = NULL, change ep_unregister_pollwait() to check pwq->whead != NULL under rcu_read_lock() before remove_wait_queue(). We add the new helper, ep_remove_wait_queue(), for this. This works because sighand_cachep is SLAB_DESTROY_BY_RCU and because ->signalfd_wqh is initialized in sighand_ctor(), not in copy_sighand. ep_unregister_pollwait()->remove_wait_queue() can play with already freed and potentially reused ->sighand, but this is fine. This memory must have the valid ->signalfd_wqh until rcu_read_unlock(). Reported-by: Maxime Bizon Signed-off-by: Oleg Nesterov Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit ddfd81ee31234ffc2a71614794e945364bff3fda Author: Oleg Nesterov Date: Fri Feb 24 20:07:11 2012 +0100 epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree() commit d80e731ecab420ddcb79ee9d0ac427acbc187b4b upstream. This patch is intentionally incomplete to simplify the review. It ignores ep_unregister_pollwait() which plays with the same wqh. See the next change. epoll assumes that the EPOLL_CTL_ADD'ed file controls everything f_op->poll() needs. In particular it assumes that the wait queue can't go away until eventpoll_release(). This is not true in case of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand which is not connected to the file. This patch adds the special event, POLLFREE, currently only for epoll. It expects that init_poll_funcptr()'ed hook should do the necessary cleanup. Perhaps it should be defined as EPOLLFREE in eventpoll. __cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if ->signalfd_wqh is not empty, we add the new signalfd_cleanup() helper. ep_poll_callback(POLLFREE) simply does list_del_init(task_list). This make this poll entry inconsistent, but we don't care. If you share epoll fd which contains our sigfd with another process you should blame yourself. signalfd is "really special". I simply do not know how we can define the "right" semantics if it used with epoll. The main problem is, epoll calls signalfd_poll() once to establish the connection with the wait queue, after that signalfd_poll(NULL) returns the different/inconsistent results depending on who does EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd has nothing to do with the file, it works with the current thread. In short: this patch is the hack which tries to fix the symptoms. It also assumes that nobody can take tasklist_lock under epoll locks, this seems to be true. Note: - we do not have wake_up_all_poll() but wake_up_poll() is fine, poll/epoll doesn't use WQ_FLAG_EXCLUSIVE. - signalfd_cleanup() uses POLLHUP along with POLLFREE, we need a couple of simple changes in eventpoll.c to make sure it can't be "lost". Reported-by: Maxime Bizon Signed-off-by: Oleg Nesterov Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit d34e9e0a46951e9b33a48ceb009aecf8f7724699 Author: Dan Carpenter Date: Sat Jun 9 19:08:25 2012 +0300 mtd: cafe_nand: fix an & vs | mistake commit 48f8b641297df49021093763a3271119a84990a2 upstream. The intent here was clearly to set result to true if the 0x40000000 flag was set. But instead there was a | vs & typo and we always set result to true. Artem: check the spec at wiki.laptop.org/images/5/5c/88ALP01_Datasheet_July_2007.pdf and this fix looks correct. Signed-off-by: Dan Carpenter Signed-off-by: Artem Bityutskiy Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit a9aed3ff8217d3b95825277799ca1f44ff08e301 Author: Nikola Pajkovsky Date: Wed Aug 15 00:38:08 2012 +0200 udf: fix retun value on error path in udf_load_logicalvol commit 68766a2edcd5cd744262a70a2f67a320ac944760 upstream. In case we detect a problem and bail out, we fail to set "ret" to a nonzero value, and udf_load_logicalvol will mistakenly report success. Signed-off-by: Nikola Pajkovsky Signed-off-by: Jan Kara Signed-off-by: Willy Tarreau commit e240873cb4a9fd18de60a817100a96fe670d4359 Author: Jan Kara Date: Wed Jun 27 21:23:07 2012 +0200 udf: Fortify loading of sparing table commit 1df2ae31c724e57be9d7ac00d78db8a5dabdd050 upstream. Add sanity checks when loading sparing table from disk to avoid accessing unallocated memory or writing to it. Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 3eaebbc174f2804441bc0b4cfdf2f864cbf39f57 Author: Paul E. McKenney Date: Fri Apr 13 03:35:13 2012 +0000 sparc64: Eliminate obsolete __handle_softirq() function commit 3d3eeb2ef26112a200785e5fca58ec58dd33bf1e upstream. The invocation of softirq is now handled by irq_exit(), so there is no need for sparc64 to invoke it on the trap-return path. In fact, doing so is a bug because if the trap occurred in the idle loop, this invocation can result in lockdep-RCU failures. The problem is that RCU ignores idle CPUs, and the sparc64 trap-return path to the softirq handlers fails to tell RCU that the CPU must be considered non-idle while those handlers are executing. This means that RCU is ignoring any RCU read-side critical sections in those handlers, which in turn means that RCU-protected data can be yanked out from under those read-side critical sections. The shiny new lockdep-RCU ability to detect RCU read-side critical sections that RCU is ignoring located this problem. The fix is straightforward: Make sparc64 stop manually invoking the softirq handlers. Reported-by: Meelis Roos Suggested-by: David Miller Signed-off-by: Paul E. McKenney Tested-by: Meelis Roos Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit b655a07ff73516c3e7d588ef65f17b96953dca00 Author: Dan Carpenter Date: Sat Mar 24 10:52:50 2012 +0300 x86, tls: Off by one limit check commit 8f0750f19789cf352d7e24a6cc50f2ab1b4f1372 upstream. These are used as offsets into an array of GDT_ENTRY_TLS_ENTRIES members so GDT_ENTRY_TLS_ENTRIES is one past the end of the array. Signed-off-by: Dan Carpenter Link: http://lkml.kernel.org/r/20120324075250.GA28258@elgon.mountain Signed-off-by: H. Peter Anvin Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 9b1aa953f40336f8b31390c8bcafc9352ce8806e Author: Konrad Rzeszutek Wilk Date: Wed May 30 18:23:56 2012 -0400 x86, amd, xen: Avoid NULL pointer paravirt references commit 1ab46fd319bcf1fcd9fb6311727d532b580e4eba upstream. Stub out MSR methods that aren't actually needed. This fixes a crash as Xen Dom0 on AMD Trinity systems. A bigger patch should be added to remove the paravirt machinery completely for the methods which apparently have no users! Reported-by: Andre Przywara Link: http://lkml.kernel.org/r/20120530222356.GA28417@andromeda.dapyr.net Signed-off-by: H. Peter Anvin Signed-off-by: Ben Hutchings Signed-off-by: Willy Tarreau commit f2066a1386a3df5a1fae67797c9408e3d57e41cd Author: David Vrabel Date: Thu Apr 26 19:44:06 2012 +0100 xen: correctly check for pending events when restoring irq flags commit 7eb7ce4d2e8991aff4ecb71a81949a907ca755ac upstream. In xen_restore_fl_direct(), xen_force_evtchn_callback() was being called even if no events were pending. This resulted in (depending on workload) about a 100 times as many xen_version hypercalls as necessary. Fix this by correcting the sense of the conditional jump. This seems to give a significant performance benefit for some workloads. There is some subtle tricksy "..since the check here is trying to check both pending and masked in a single cmpw, but I think this is correct. It will call check_events now only when the combined mask+pending word is 0x0001 (aka unmasked, pending)." (Ian) Acked-by: Ian Campbell Signed-off-by: David Vrabel Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 642f7c50ec1863ca67367afbe0d95ab0646fa739 Author: Eric Dumazet Date: Fri Dec 2 23:41:42 2011 +0000 tcp: drop SYN+FIN messages commit fdf5af0daf8019cec2396cdef8fb042d80fe71fa upstream. Denys Fedoryshchenko reported that SYN+FIN attacks were bringing his linux machines to their limits. Dont call conn_request() if the TCP flags includes SYN flag Reported-by: Denys Fedoryshchenko Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 20b24cf258bdbf1d6f1fc634d24d13c26be14f6f Author: Willy Tarreau Date: Thu May 17 11:14:14 2012 +0000 tcp: do_tcp_sendpages() must try to push data out on oom conditions commit bad115cfe5b509043b684d3a007ab54b80090aa1 upstream. Since recent changes on TCP splicing (starting with commits 2f533844 "tcp: allow splice() to build full TSO packets" and 35f9c09f "tcp: tcp_sendpages() should call tcp_push() once"), I started seeing massive stalls when forwarding traffic between two sockets using splice() when pipe buffers were larger than socket buffers. Latest changes (net: netdev_alloc_skb() use build_skb()) made the problem even more apparent. The reason seems to be that if do_tcp_sendpages() fails on out of memory condition without being able to send at least one byte, tcp_push() is not called and the buffers cannot be flushed. After applying the attached patch, I cannot reproduce the stalls at all and the data rate it perfectly stable and steady under any condition which previously caused the problem to be permanent. The issue seems to have been there since before the kernel migrated to git, which makes me think that the stalls I occasionally experienced with tux during stress-tests years ago were probably related to the same issue. This issue was first encountered on 3.0.31 and 3.2.17, so please backport to -stable. Signed-off-by: Willy Tarreau Acked-by: Eric Dumazet Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit ccfa22d136750df3f822dcc8488acf653aed69ca Author: Émeric Maschino Date: Mon Jan 9 20:55:10 2012 -0800 ia64: Add accept4() syscall commit 65cc21b4523e94d5640542a818748cd3be8cd6b4 upstream. While debugging udev > 170 failure on Debian Wheezy (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=648325), it appears that the issue was in fact due to missing accept4() in ia64. This patch simply adds accept4() to ia64. Signed-off-by: Émeric Maschino Signed-off-by: Tony Luck Backported-by: Dennis Schridde Signed-off-by: Willy Tarreau commit b49940dcf5f37e5a39c4134f498ca02d13fcac9c Author: Mathias Krause Date: Wed Aug 15 11:31:54 2012 +0000 dccp: check ccid before dereferencing commit 276bdb82dedb290511467a5a4fdbe9f0b52dce6f upstream. ccid_hc_rx_getsockopt() and ccid_hc_tx_getsockopt() might be called with a NULL ccid pointer leading to a NULL pointer dereference. This could lead to a privilege escalation if the attacker is able to map page 0 and prepare it with a fake ccid_ops pointer. Signed-off-by: Mathias Krause Cc: Gerrit Renker Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit c2b32573f11e15750f7d065c78387c55c785aee2 Author: Mel Gorman Date: Mon Jul 23 12:16:19 2012 +0100 PARISC: Redefine ATOMIC_INIT and ATOMIC64_INIT to drop the casts commit bba3d8c3b3c0f2123be5bc687d1cddc13437c923 upstream. The following build error occured during a parisc build with swap-over-NFS patches applied. net/core/sock.c:274:36: error: initializer element is not constant net/core/sock.c:274:36: error: (near initialization for 'memalloc_socks') net/core/sock.c:274:36: error: initializer element is not constant Dave Anglin says: > Here is the line in sock.i: > > struct static_key memalloc_socks = ((struct static_key) { .enabled = > ((atomic_t) { (0) }) }); The above line contains two compound literals. It also uses a designated initializer to initialize the field enabled. A compound literal is not a constant expression. The location of the above statement isn't fully clear, but if a compound literal occurs outside the body of a function, the initializer list must consist of constant expressions. Reported-by: Fengguang Wu Signed-off-by: Mel Gorman Signed-off-by: James Bottomley Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 9925bba99f8b1533643e92409de9c6d5760ac40d Author: Jan Kara Date: Mon Sep 3 16:50:42 2012 +0200 ext3: Fix fdatasync() for files with only i_size changes commit 156bddd8e505b295540f3ca0e27dda68cb0d49aa upstream. Code tracking when transaction needs to be committed on fdatasync(2) forgets to handle a situation when only inode's i_size is changed. Thus in such situations fdatasync(2) doesn't force transaction with new i_size to disk and that can result in wrong i_size after a crash. Fix the issue by updating inode's i_datasync_tid whenever its size is updated. Reported-by: Kristian Nielsen Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 592c7e5eaab53db5447663eb9c2ded5fa61f6ca3 Author: Jan Kara Date: Wed Sep 5 15:48:23 2012 +0200 udf: Fix data corruption for files in ICB commit 9c2fc0de1a6e638fe58c354a463f544f42a90a09 upstream. When a file is stored in ICB (inode), we overwrite part of the file, and the page containing file's data is not in page cache, we end up corrupting file's data by overwriting them with zeros. The problem is we use simple_write_begin() which simply zeroes parts of the page which are not written to. The problem has been introduced by be021ee4 (udf: convert to new aops). Fix the problem by providing a ->write_begin function which makes the page properly uptodate. Reported-by: Ian Abbott Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 9eca376d4c4e96f144a182059dd8f7b492cd662d Author: Dave Jones Date: Thu Sep 6 12:01:00 2012 -0400 Remove user-triggerable BUG from mpol_to_str commit 80de7c3138ee9fd86a98696fd2cf7ad89b995d0a upstream. Trivially triggerable, found by trinity: kernel BUG at mm/mempolicy.c:2546! Process trinity-child2 (pid: 23988, threadinfo ffff88010197e000, task ffff88007821a670) Call Trace: show_numa_map+0xd5/0x450 show_pid_numa_map+0x13/0x20 traverse+0xf2/0x230 seq_read+0x34b/0x3e0 vfs_read+0xac/0x180 sys_pread64+0xa2/0xc0 system_call_fastpath+0x1a/0x1f RIP: mpol_to_str+0x156/0x360 Signed-off-by: Dave Jones Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 65525984d5eebb61afd53fa5dabe561cae5ee88d Author: Sven Schnelle Date: Fri Aug 17 21:43:43 2012 +0200 USB: CDC ACM: Fix NULL pointer dereference commit 99f347caa4568cb803862730b3b1f1942639523f upstream. If a device specifies zero endpoints in its interface descriptor, the kernel oopses in acm_probe(). Even though that's clearly an invalid descriptor, we should test wether we have all endpoints. This is especially bad as this oops can be triggered by just plugging a USB device in. Signed-off-by: Sven Schnelle Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 80bc5ec142033266d5511bb299a738f816d520b9 Author: Stephen M. Cameron Date: Tue Aug 21 16:15:49 2012 -0700 cciss: fix incorrect scsi status reporting commit b0cf0b118c90477d1a6811f2cd2307f6a5578362 upstream. Delete code which sets SCSI status incorrectly as it's already been set correctly above this incorrect code. The bug was introduced in 2009 by commit b0e15f6db111 ("cciss: fix typo that causes scsi status to be lost.") Signed-off-by: Stephen M. Cameron Reported-by: Roel van Meer Tested-by: Roel van Meer Cc: Jens Axboe Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit c01b0313216b801d69913d9f704725bd6d784b3b Author: J. Bruce Fields Date: Mon Aug 20 16:04:40 2012 -0400 svcrpc: sends on closed socket should stop immediately commit f06f00a24d76e168ecb38d352126fd203937b601 upstream. svc_tcp_sendto sets XPT_CLOSE if we fail to transmit the entire reply. However, the XPT_CLOSE won't be acted on immediately. Meanwhile other threads could send further replies before the socket is really shut down. This can manifest as data corruption: for example, if a truncated read reply is followed by another rpc reply, that second reply will look to the client like further read data. Symptoms were data corruption preceded by svc_tcp_sendto logging something like kernel: rpc-srv/tcp: nfsd: sent only 963696 when sending 1048708 bytes - shutting down socket Reported-by: Malahal Naineni Tested-by: Malahal Naineni Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 70d79ae6e6c6ed674d9c2f3f57671227660356b6 Author: J. Bruce Fields Date: Fri Aug 17 17:31:53 2012 -0400 svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping commit d10f27a750312ed5638c876e4bd6aa83664cccd8 upstream. The rpc server tries to ensure that there will be room to send a reply before it receives a request. It does this by tracking, in xpt_reserved, an upper bound on the total size of the replies that is has already committed to for the socket. Currently it is adding in the estimate for a new reply *before* it checks whether there is space available. If it finds that there is not space, it then subtracts the estimate back out. This may lead the subsequent svc_xprt_enqueue to decide that there is space after all. The results is a svc_recv() that will repeatedly return -EAGAIN, causing server threads to loop without doing any actual work. Reported-by: Michael Tokarev Tested-by: Michael Tokarev Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 14f4f45ab205170935d12bcc435f2e2a849eb3f2 Author: bjschuma@gmail.com Date: Wed Aug 8 13:57:10 2012 -0400 NFS: Alias the nfs module to nfs4 commit 425e776d93a7a5070b77d4f458a5bab0f924652c upstream. This allows distros to remove the line from their modprobe configuration. Signed-off-by: Bryan Schumaker Signed-off-by: Trond Myklebust [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 0b32b457c3c72d474e55a73244f2cb06ae1d1aac Author: Trond Myklebust Date: Mon Aug 20 12:42:15 2012 -0400 NFSv3: Ensure that do_proc_get_root() reports errors correctly commit 086600430493e04b802bee6e5b3ce0458e4eb77f upstream. If the rpc call to NFS3PROC_FSINFO fails, then we need to report that error so that the mount fails. Otherwise we can end up with a superblock with completely unusable values for block sizes, maxfilesize, etc. Reported-by: Yuanming Chen Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 62c9021a6ca35d1ce2687c0e3da0e010fa53df41 Author: Al Viro Date: Mon Aug 20 15:28:00 2012 +0100 vfs: missed source of ->f_pos races commit 0e665d5d1125f9f4ccff56a75e814f10f88861a2 upstream. compat_sys_{read,write}v() need the same "pass a copy of file->f_pos" thing as sys_{read,write}{,v}(). Signed-off-by: Al Viro Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit e3a5cb6ebbd644b09ef0a5a71a2de87b5e37737e Author: Wang Xingchao Date: Mon Aug 13 14:11:10 2012 +0800 ALSA: hda - fix Copyright debug message commit 088c820b732dbfd515fc66d459d5f5777f79b406 upstream. As spec said, 1 indicates no copyright is asserted. Signed-off-by: Wang Xingchao Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit ea8af2b70128f9384f7a19ab037f91f5083aaac2 Author: Mark Ferrell Date: Tue Jul 24 14:15:13 2012 -0500 usb: serial: mos7840: Fixup mos7840_chars_in_buffer() commit 5c263b92f828af6a8cf54041db45ceae5af8f2ab upstream. * Use the buffer content length as opposed to the total buffer size. This can be a real problem when using the mos7840 as a usb serial-console as all kernel output is truncated during boot. Signed-off-by: Mark Ferrell Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 5b61f615f559fe37b4e0a5d4cb2942b231c7dcc8 Author: Sarah Sharp Date: Mon Jul 23 16:06:08 2012 -0700 xhci: Increase reset timeout for Renesas 720201 host. commit 22ceac191211cf6688b1bf6ecd93c8b6bf80ed9b upstream. The NEC/Renesas 720201 xHCI host controller does not complete its reset within 250 milliseconds. In fact, it takes about 9 seconds to reset the host controller, and 1 second for the host to be ready for doorbell rings. Extend the reset and CNR polling timeout to 10 seconds each. This patch should be backported to kernels as old as 2.6.31, that contain the commit 66d4eadd8d067269ea8fead1a50fe87c2979a80d "USB: xhci: BIOS handoff and HW initialization." Signed-off-by: Sarah Sharp Reported-by: Edwin Klein Mentink Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 97744669e0e6c31e76182010f25451ccda5abe86 Author: Zach Brown Date: Tue Jul 24 12:10:11 2012 -0700 fuse: verify all ioctl retry iov elements commit fb6ccff667712c46b4501b920ea73a326e49626a upstream. Commit 7572777eef78ebdee1ecb7c258c0ef94d35bad16 attempted to verify that the total iovec from the client doesn't overflow iov_length() but it only checked the first element. The iovec could still overflow by starting with a small element. The obvious fix is to check all the elements. The overflow case doesn't look dangerous to the kernel as the copy is limited by the length after the overflow. This fix restores the intention of returning an error instead of successfully copying less than the iovec represented. I found this by code inspection. I built it but don't have a test case. I'm cc:ing stable because the initial commit did as well. Signed-off-by: Zach Brown Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 663464d6903019adbcec64129acb093f95fb0141 Author: Xiao Guangrong Date: Tue Jul 31 16:45:52 2012 -0700 mm: mmu_notifier: fix freed page still mapped in secondary MMU commit 3ad3d901bbcfb15a5e4690e55350db0899095a68 upstream. mmu_notifier_release() is called when the process is exiting. It will delete all the mmu notifiers. But at this time the page belonging to the process is still present in page tables and is present on the LRU list, so this race will happen: CPU 0 CPU 1 mmu_notifier_release: try_to_unmap: hlist_del_init_rcu(&mn->hlist); ptep_clear_flush_notify: mmu nofifler not found free page !!!!!! /* * At the point, the page has been * freed, but it is still mapped in * the secondary MMU. */ mn->ops->release(mn, mm); Then the box is not stable and sometimes we can get this bug: [ 738.075923] BUG: Bad page state in process migrate-perf pfn:03bec [ 738.075931] page:ffffea00000efb00 count:0 mapcount:0 mapping: (null) index:0x8076 [ 738.075936] page flags: 0x20000000000014(referenced|dirty) The same issue is present in mmu_notifier_unregister(). We can call ->release before deleting the notifier to ensure the page has been unmapped from the secondary MMU before it is freed. Signed-off-by: Xiao Guangrong Cc: Avi Kivity Cc: Marcelo Tosatti Cc: Paul Gortmaker Cc: Andrea Arcangeli Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 2af3af56e7d4756b21a2e0d86e4fc4e5b7f0df24 Author: Greg Pearson Date: Mon Jul 30 14:39:05 2012 -0700 pcdp: use early_ioremap/early_iounmap to access pcdp table commit 6c4088ac3a4d82779903433bcd5f048c58fb1aca upstream. efi_setup_pcdp_console() is called during boot to parse the HCDP/PCDP EFI system table and setup an early console for printk output. The routine uses ioremap/iounmap to setup access to the HCDP/PCDP table information. The call to ioremap is happening early in the boot process which leads to a panic on x86_64 systems: panic+0x01ca do_exit+0x043c oops_end+0x00a7 no_context+0x0119 __bad_area_nosemaphore+0x0138 bad_area_nosemaphore+0x000e do_page_fault+0x0321 page_fault+0x0020 reserve_memtype+0x02a1 __ioremap_caller+0x0123 ioremap_nocache+0x0012 efi_setup_pcdp_console+0x002b setup_arch+0x03a9 start_kernel+0x00d4 x86_64_start_reservations+0x012c x86_64_start_kernel+0x00fe This replaces the calls to ioremap/iounmap in efi_setup_pcdp_console() with calls to early_ioremap/early_iounmap which can be called during early boot. This patch was tested on an x86_64 prototype system which uses the HCDP/PCDP table for early console setup. Signed-off-by: Greg Pearson Acked-by: Khalid Aziz Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit b7cb122de2f6b8160860f9eae533cde0f6ff8c2a Author: Darren Hart Date: Fri Jul 20 11:53:31 2012 -0700 futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi() commit 6f7b0a2a5c0fb03be7c25bd1745baa50582348ef upstream. If uaddr == uaddr2, then we have broken the rule of only requeueing from a non-pi futex to a pi futex with this call. If we attempt this, as the trinity test suite manages to do, we miss early wakeups as q.key is equal to key2 (because they are the same uaddr). We will then attempt to dereference the pi_mutex (which would exist had the futex_q been properly requeued to a pi futex) and trigger a NULL pointer dereference. Signed-off-by: Darren Hart Cc: Dave Jones Link: http://lkml.kernel.org/r/ad82bfe7f7d130247fbe2b5b4275654807774227.1342809673.git.dvhart@linux.intel.com Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 9246af14306ceadab0feb28ff2954416970b7465 Author: Darren Hart Date: Fri Jul 20 11:53:30 2012 -0700 futex: Fix bug in WARN_ON for NULL q.pi_state commit f27071cb7fe3e1d37a9dbe6c0dfc5395cd40fa43 upstream. The WARN_ON in futex_wait_requeue_pi() for a NULL q.pi_state was testing the address (&q.pi_state) of the pointer instead of the value (q.pi_state) of the pointer. Correct it accordingly. Signed-off-by: Darren Hart Cc: Dave Jones Link: http://lkml.kernel.org/r/1c85d97f6e5f79ec389a4ead3e367363c74bd09a.1342809673.git.dvhart@linux.intel.com Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 51ed259979b8640172ccc88c848dcf9f887a9538 Author: Darren Hart Date: Fri Jul 20 11:53:29 2012 -0700 futex: Test for pi_mutex on fault in futex_wait_requeue_pi() commit b6070a8d9853eda010a549fa9a09eb8d7269b929 upstream. If fixup_pi_state_owner() faults, pi_mutex may be NULL. Test for pi_mutex != NULL before testing the owner against current and possibly unlocking it. Signed-off-by: Darren Hart Cc: Dave Jones Cc: Dan Carpenter Link: http://lkml.kernel.org/r/dc59890338fc413606f04e5c5b131530734dae3d.1342809673.git.dvhart@linux.intel.com Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 9d2404c39857211adc0679945639c9cf441c8df7 Author: Takashi Iwai Date: Mon Jul 23 11:35:55 2012 +0200 ALSA: mpu401: Fix missing initialization of irq field commit bc733d495267a23ef8660220d696c6e549ce30b3 upstream. The irq field of struct snd_mpu401 is supposed to be initialized to -1. Since it's set to zero as of now, a probing error before the irq installation results in a kernel warning "Trying to free already-free IRQ 0". Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=44821 Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 07ab14eb16c22d05eac5177effad9b6243ed46f9 Author: Colin Ian King Date: Mon Jul 30 16:06:42 2012 +0100 USB: echi-dbgp: increase the controller wait time to come out of halt. commit f96a4216e85050c0a9d41a41ecb0ae9d8e39b509 upstream. The default 10 microsecond delay for the controller to come out of halt in dbgp_ehci_startup is too short, so increase it to 1 millisecond. This is based on emperical testing on various USB debug ports on modern machines such as a Lenovo X220i and an Ivybridge development platform that needed to wait ~450-950 microseconds. Signed-off-by: Colin Ian King Signed-off-by: Jason Wessel Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit bf9f4ca1b6b73c232c6a3f9570ae26756d50364a Author: Mathias Krause Date: Sun Jul 29 19:45:14 2012 +0000 net/tun: fix ioctl() based info leaks [ Upstream commits a117dacde0288f3ec60b6e5bcedae8fa37ee0dfc and 8bbb181308bc348e02bfdbebdedd4e4ec9d452ce ] The tun module leaks up to 36 bytes of memory by not fully initializing a structure located on the stack that gets copied to user memory by the TUNGETIFF and SIOCGIFHWADDR ioctl()s. Signed-off-by: Mathias Krause Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit fdd6b6522990c5061ec4de0003523c57ee851eb3 Author: Jiri Kosina Date: Fri Jul 27 10:38:50 2012 +0000 tcp: perform DMA to userspace only if there is a task waiting for it [ Upstream commit 59ea33a68a9083ac98515e4861c00e71efdc49a1 ] Back in 2006, commit 1a2449a87b ("[I/OAT]: TCP recv offload to I/OAT") added support for receive offloading to IOAT dma engine if available. The code in tcp_rcv_established() tries to perform early DMA copy if applicable. It however does so without checking whether the userspace task is actually expecting the data in the buffer. This is not a problem under normal circumstances, but there is a corner case where this doesn't work -- and that's when MSG_TRUNC flag to recvmsg() is used. If the IOAT dma engine is not used, the code properly checks whether there is a valid ucopy.task and the socket is owned by userspace, but misses the check in the dmaengine case. This problem can be observed in real trivially -- for example 'tbench' is a good reproducer, as it makes a heavy use of MSG_TRUNC. On systems utilizing IOAT, you will soon find tbench waiting indefinitely in sk_wait_data(), as they have been already early-copied in tcp_rcv_established() using dma engine. This patch introduces the same check we are performing in the simple iovec copy case to the IOAT case as well. It fixes the indefinite recvmsg(MSG_TRUNC) hangs. Signed-off-by: Jiri Kosina Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit c8be4f3dc9ee7feecacdcd3ad7d39f70c73ab96c Author: Dan Carpenter Date: Fri Jul 27 01:46:51 2012 +0000 USB: kaweth.c: use GFP_ATOMIC under spin_lock [ Upstream commit e4c7f259c5be99dcfc3d98f913590663b0305bf8 ] The problem is that we call this with a spin lock held. The call tree is: kaweth_start_xmit() holds kaweth->device_lock. -> kaweth_async_set_rx_mode() -> kaweth_control() -> kaweth_internal_control_msg() The kaweth_internal_control_msg() function is only called from kaweth_control() which used GFP_ATOMIC for its allocations. Signed-off-by: Dan Carpenter Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 1e7991ce5fe019a594a41d92664a11f2a76cb7a3 Author: Alan Cox Date: Tue Jul 24 08:16:25 2012 +0000 wanmain: comparing array with NULL [ Upstream commit 8b72ff6484fe303e01498b58621810a114f3cf09 ] gcc really should warn about these ! Signed-off-by: Alan Cox Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit a9d0acf8d157c30374af76d43e7f05b5b108be0c Author: Paul Moore Date: Tue Jul 17 11:07:47 2012 +0000 cipso: don't follow a NULL pointer when setsockopt() is called [ Upstream commit 89d7ae34cdda4195809a5a987f697a517a2a3177 ] As reported by Alan Cox, and verified by Lin Ming, when a user attempts to add a CIPSO option to a socket using the CIPSO_V4_TAG_LOCAL tag the kernel dies a terrible death when it attempts to follow a NULL pointer (the skb argument to cipso_v4_validate() is NULL when called via the setsockopt() syscall). This patch fixes this by first checking to ensure that the skb is non-NULL before using it to find the incoming network interface. In the unlikely case where the skb is NULL and the user attempts to add a CIPSO option with the _TAG_LOCAL tag we return an error as this is not something we want to allow. A simple reproducer, kindly supplied by Lin Ming, although you must have the CIPSO DOI #3 configure on the system first or you will be caught early in cipso_v4_validate(): #include #include #include #include #include struct local_tag { char type; char length; char info[4]; }; struct cipso { char type; char length; char doi[4]; struct local_tag local; }; int main(int argc, char **argv) { int sockfd; struct cipso cipso = { .type = IPOPT_CIPSO, .length = sizeof(struct cipso), .local = { .type = 128, .length = sizeof(struct local_tag), }, }; memset(cipso.doi, 0, 4); cipso.doi[3] = 3; sockfd = socket(AF_INET, SOCK_DGRAM, 0); #define SOL_IP 0 setsockopt(sockfd, SOL_IP, IP_OPTIONS, &cipso, sizeof(struct cipso)); return 0; } CC: Lin Ming Reported-by: Alan Cox Signed-off-by: Paul Moore Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 4c356cbc966496507ecb979e38dc5dfccb7656f7 Author: Neil Horman Date: Mon Jul 16 09:13:51 2012 +0000 sctp: Fix list corruption resulting from freeing an association on a list [ Upstream commit 2eebc1e188e9e45886ee00662519849339884d6d ] A few days ago Dave Jones reported this oops: [22766.294255] general protection fault: 0000 [#1] PREEMPT SMP [22766.295376] CPU 0 [22766.295384] Modules linked in: [22766.387137] ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90 ffff880147c03a74 [22766.387135] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000 [22766.387136] Process trinity-watchdo (pid: 10896, threadinfo ffff88013e7d2000, [22766.387137] Stack: [22766.387140] ffff880147c03a10 [22766.387140] ffffffffa169f2b6 [22766.387140] ffff88013ed95728 [22766.387143] 0000000000000002 [22766.387143] 0000000000000000 [22766.387143] ffff880003fad062 [22766.387144] ffff88013c120000 [22766.387144] [22766.387145] Call Trace: [22766.387145] [22766.387150] [] ? __sctp_lookup_association+0x62/0xd0 [sctp] [22766.387154] [] __sctp_lookup_association+0x86/0xd0 [sctp] [22766.387157] [] sctp_rcv+0x207/0xbb0 [sctp] [22766.387161] [] ? trace_hardirqs_off_caller+0x28/0xd0 [22766.387163] [] ? nf_hook_slow+0x133/0x210 [22766.387166] [] ? ip_local_deliver_finish+0x4c/0x4c0 [22766.387168] [] ip_local_deliver_finish+0x18d/0x4c0 [22766.387169] [] ? ip_local_deliver_finish+0x4c/0x4c0 [22766.387171] [] ip_local_deliver+0x47/0x80 [22766.387172] [] ip_rcv_finish+0x150/0x680 [22766.387174] [] ip_rcv+0x214/0x320 [22766.387176] [] __netif_receive_skb+0x7b7/0x910 [22766.387178] [] ? __netif_receive_skb+0x11c/0x910 [22766.387180] [] ? put_lock_stats.isra.25+0xe/0x40 [22766.387182] [] netif_receive_skb+0x23/0x1f0 [22766.387183] [] ? dev_gro_receive+0x139/0x440 [22766.387185] [] napi_skb_finish+0x70/0xa0 [22766.387187] [] napi_gro_receive+0xf5/0x130 [22766.387218] [] e1000_receive_skb+0x59/0x70 [e1000e] [22766.387242] [] e1000_clean_rx_irq+0x28b/0x460 [e1000e] [22766.387266] [] e1000e_poll+0x78/0x430 [e1000e] [22766.387268] [] net_rx_action+0x1aa/0x3d0 [22766.387270] [] ? account_system_vtime+0x10f/0x130 [22766.387273] [] __do_softirq+0xe0/0x420 [22766.387275] [] call_softirq+0x1c/0x30 [22766.387278] [] do_softirq+0xd5/0x110 [22766.387279] [] irq_exit+0xd5/0xe0 [22766.387281] [] do_IRQ+0x63/0xd0 [22766.387283] [] common_interrupt+0x6f/0x6f [22766.387283] [22766.387284] [22766.387285] [] ? retint_swapgs+0x13/0x1b [22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00 48 89 fb 49 89 f5 66 c1 c0 08 66 39 46 02 [22766.387307] [22766.387307] RIP [22766.387311] [] sctp_assoc_is_match+0x19/0x90 [sctp] [22766.387311] RSP [22766.387142] ffffffffa16ab120 [22766.599537] ---[ end trace 3f6dae82e37b17f5 ]--- [22766.601221] Kernel panic - not syncing: Fatal exception in interrupt It appears from his analysis and some staring at the code that this is likely occuring because an association is getting freed while still on the sctp_assoc_hashtable. As a result, we get a gpf when traversing the hashtable while a freed node corrupts part of the list. Nominally I would think that an mibalanced refcount was responsible for this, but I can't seem to find any obvious imbalance. What I did note however was that the two places where we create an association using sctp_primitive_ASSOCIATE (__sctp_connect and sctp_sendmsg), have failure paths which free a newly created association after calling sctp_primitive_ASSOCIATE. sctp_primitive_ASSOCIATE brings us into the sctp_sf_do_prm_asoc path, which issues a SCTP_CMD_NEW_ASOC side effect, which in turn adds a new association to the aforementioned hash table. the sctp command interpreter that process side effects has not way to unwind previously processed commands, so freeing the association from the __sctp_connect or sctp_sendmsg error path would lead to a freed association remaining on this hash table. I've fixed this but modifying sctp_[un]hash_established to use hlist_del_init, which allows us to proerly use hlist_unhashed to check if the node is on a hashlist safely during a delete. That in turn alows us to safely call sctp_unhash_established in the __sctp_connect and sctp_sendmsg error paths before freeing them, regardles of what the associations state is on the hash list. I noted, while I was doing this, that the __sctp_unhash_endpoint was using hlist_unhsashed in a simmilar fashion, but never nullified any removed nodes pointers to make that function work properly, so I fixed that up in a simmilar fashion. I attempted to test this using a virtual guest running the SCTP_RR test from netperf in a loop while running the trinity fuzzer, both in a loop. I wasn't able to recreate the problem prior to this fix, nor was I able to trigger the failure after (neither of which I suppose is suprising). Given the trace above however, I think its likely that this is what we hit. Signed-off-by: Neil Horman Reported-by: davej@redhat.com CC: davej@redhat.com CC: "David S. Miller" CC: Vlad Yasevich CC: Sridhar Samudrala CC: linux-sctp@vger.kernel.org Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 8927e0c6f3451196818ebb358ba975a8d5c403c9 Author: Brian Foster Date: Sun Jul 22 23:59:40 2012 -0400 ext4: don't let i_reserved_meta_blocks go negative commit 97795d2a5b8d3c8dc4365d4bd3404191840453ba upstream. If we hit a condition where we have allocated metadata blocks that were not appropriately reserved, we risk underflow of ei->i_reserved_meta_blocks. In turn, this can throw sbi->s_dirtyclusters_counter significantly out of whack and undermine the nondelalloc fallback logic in ext4_nonda_switch(). Warn if this occurs and set i_allocated_meta_blocks to avoid this problem. This condition is reproduced by xfstests 270 against ext2 with delalloc enabled: Mar 28 08:58:02 localhost kernel: [ 171.526344] EXT4-fs (loop1): delayed block allocation failed for inode 14 at logical offset 64486 with max blocks 64 with error -28 Mar 28 08:58:02 localhost kernel: [ 171.526346] EXT4-fs (loop1): This should not happen!! Data will be lost 270 ultimately fails with an inconsistent filesystem and requires an fsck to repair. The cause of the error is an underflow in ext4_da_update_reserve_space() due to an unreserved meta block allocation. Signed-off-by: Brian Foster Signed-off-by: "Theodore Ts'o" Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 2302a0600a33089640e578ea1a3f734d085cae72 Author: J. Bruce Fields Date: Tue Jun 5 16:52:06 2012 -0400 nfsd4: our filesystems are normally case sensitive commit 2930d381d22b9c56f40dd4c63a8fa59719ca2c3c upstream. Actually, xfs and jfs can optionally be case insensitive; we'll handle that case in later patches. Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 7665fb0c4f0b08543a7c4c064a76d314d43a9863 Author: Chris Mason Date: Wed Jul 25 15:57:13 2012 -0400 Btrfs: call the ordered free operation without any locks held commit e9fbcb42201c862fd6ab45c48ead4f47bb2dea9d upstream. Each ordered operation has a free callback, and this was called with the worker spinlock held. Josef made the free callback also call iput, which we can't do with the spinlock. This drops the spinlock for the free operation and grabs it again before moving through the rest of the list. We'll circle back around to this and find a cleaner way that doesn't bounce the lock around so much. Signed-off-by: Chris Mason Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit d4e89205393edf4296d2345143b73ebfe17a7531 Author: Lan Tianyu Date: Fri Jul 20 13:29:16 2012 +0800 ACPI/AC: prevent OOPS on some boxes due to missing check power_supply_register() return value check commit f197ac13f6eeb351b31250b9ab7d0da17434ea36 upstream. In the ac.c, power_supply_register()'s return value is not checked. As a result, the driver's add() ops may return success even though the device failed to initialize. For example, some BIOS may describe two ACADs in the same DSDT. The second ACAD device will fail to register, but ACPI driver's add() ops returns sucessfully. The ACPI device will receive ACPI notification and cause OOPS. https://bugzilla.redhat.com/show_bug.cgi?id=772730 Signed-off-by: Lan Tianyu Signed-off-by: Len Brown Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit e59164b84573bc0ac55f000ffe678c4e7d22c68b Author: J. Bruce Fields Date: Mon Jul 23 15:17:17 2012 -0400 locks: fix checking of fcntl_setlease argument commit 0ec4f431eb56d633da3a55da67d5c4b88886ccc7 upstream. The only checks of the long argument passed to fcntl(fd,F_SETLEASE,.) are done after converting the long to an int. Thus some illegal values may be let through and cause problems in later code. [ They actually *don't* cause problems in mainline, as of Dave Jones's commit 8d657eb3b438 "Remove easily user-triggerable BUG from generic_setlease", but we should fix this anyway. And this patch will be necessary to fix real bugs on earlier kernels. ] Signed-off-by: J. Bruce Fields Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 105ef39bcf499240e6727855395c57806afe9457 Author: Hans de Goede Date: Wed Jul 4 09:18:01 2012 +0200 usbdevfs: Correct amount of data copied to user in processcompl_compat commit 2102e06a5f2e414694921f23591f072a5ba7db9f upstream. iso data buffers may have holes in them if some packets were short, so for iso urbs we should always copy the entire buffer, just like the regular processcompl does. Signed-off-by: Hans de Goede Acked-by: Alan Stern Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit b41f1574ee07920feefe2afaf5e4566b38e9c30d Author: Bart Van Assche Date: Fri Jun 29 15:34:26 2012 +0000 SCSI: Avoid dangling pointer in scsi_requeue_command() commit 940f5d47e2f2e1fa00443921a0abf4822335b54d upstream. When we call scsi_unprep_request() the command associated with the request gets destroyed and therefore drops its reference on the device. If this was the only reference, the device may get released and we end up with a NULL pointer deref when we call blk_requeue_request. Reported-by: Mike Christie Signed-off-by: Bart Van Assche Reviewed-by: Mike Christie Reviewed-by: Tejun Heo [jejb: enhance commend and add commit log for stable] Signed-off-by: James Bottomley Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 3abcd813a4b952a78acf6017bde8f03675e77364 Author: Dan Williams Date: Thu Jun 21 23:25:32 2012 -0700 SCSI: fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations) commit 57fc2e335fd3c2f898ee73570dc81426c28dc7b4 upstream. Rapid ata hotplug on a libsas controller results in cases where libsas is waiting indefinitely on eh to perform an ata probe. A race exists between scsi_schedule_eh() and scsi_restart_operations() in the case when scsi_restart_operations() issues i/o to other devices in the sas domain. When this happens the host state transitions from SHOST_RECOVERY (set by scsi_schedule_eh) back to SHOST_RUNNING and ->host_busy is non-zero so we put the eh thread to sleep even though ->host_eh_scheduled is active. Before putting the error handler to sleep we need to check if the host_state needs to return to SHOST_RECOVERY for another trip through eh. Since i/o that is released by scsi_restart_operations has been blocked for at least one eh cycle, this implementation allows those i/o's to run before another eh cycle starts to discourage hung task timeouts. Reported-by: Tom Jackson Tested-by: Tom Jackson Signed-off-by: Dan Williams Signed-off-by: James Bottomley Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit e69e5d3d25d6b58543f782a515baeda064e2b601 Author: Dan Williams Date: Thu Jun 21 23:36:20 2012 -0700 SCSI: libsas: fix sas_discover_devices return code handling commit b17caa174a7e1fd2e17b26e210d4ee91c4c28b37 upstream. commit 198439e4 [SCSI] libsas: do not set res = 0 in sas_ex_discover_dev() commit 19252de6 [SCSI] libsas: fix wide port hotplug issues The above commits seem to have confused the return value of sas_ex_discover_dev which is non-zero on failure and sas_ex_join_wide_port which just indicates short circuiting discovery on already established ports. The result is random discovery failures depending on configuration. Calls to sas_ex_join_wide_port are the source of the trouble as its return value is errantly assigned to 'res'. Convert it to bool and stop returning its result up the stack. Tested-by: Dan Melnic Reported-by: Dan Melnic Signed-off-by: Dan Williams Reviewed-by: Jack Wang Signed-off-by: James Bottomley Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 4b6a4ff5b8ef014109479e217fe1bd56b0a2b425 Author: Dan Williams Date: Thu Jun 21 23:36:15 2012 -0700 SCSI: libsas: continue revalidation commit 26f2f199ff150d8876b2641c41e60d1c92d2fb81 upstream. Continue running revalidation until no more broadcast devices are discovered. Fixes cases where re-discovery completes too early in a domain with multiple expanders with pending re-discovery events. Servicing BCNs can get backed up behind error recovery. Signed-off-by: Dan Williams Signed-off-by: James Bottomley Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 74b9a48f5709e2c0330d14c06977c36cc7f4b4a9 Author: Tiejun Chen Date: Wed Jul 11 14:22:46 2012 +1000 powerpc: Add "memory" attribute for mfmsr() commit b416c9a10baae6a177b4f9ee858b8d309542fbef upstream. Add "memory" attribute in inline assembly language as a compiler barrier to make sure 4.6.x GCC don't reorder mfmsr(). Signed-off-by: Tiejun Chen Signed-off-by: Benjamin Herrenschmidt Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit ca9005b38723f9d143e599618bd31d6b8e6083a6 Author: roger blofeld Date: Thu Jun 21 05:27:14 2012 +0000 powerpc/ftrace: Fix assembly trampoline register usage commit fd5a42980e1cf327b7240adf5e7b51ea41c23437 upstream. Just like the module loader, ftrace needs to be updated to use r12 instead of r11 with newer gcc's. Signed-off-by: Roger Blofeld Signed-off-by: Benjamin Herrenschmidt Signed-off-by: Paul Gortmaker Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 21e6f143d52caf115ecfd091c80d74aea60d2bb3 Author: David Daney Date: Thu Jul 19 09:11:14 2012 +0200 MIPS: Properly align the .data..init_task section. commit 7b1c0d26a8e272787f0f9fcc5f3e8531df3b3409 upstream. Improper alignment can lead to unbootable systems and/or random crashes. [ralf@linux-mips.org: This is a lond standing bug since 6eb10bc9e2deab06630261cd05c4cb1e9a60e980 (kernel.org) rsp. c422a10917f75fd19fa7fe070aaaa23e384dae6f (lmo) [MIPS: Clean up linker script using new linker script macros.] so dates back to 2.6.32.] Signed-off-by: David Daney Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/3881/ Signed-off-by: Ralf Baechle Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit c96f4bffe5679af5b9f512f19248cd1d3a62feab Author: John Stultz Date: Fri Jul 13 01:21:50 2012 -0400 ntp: Fix STA_INS/DEL clearing bug commit 6b1859dba01c7d512b72d77e3fd7da8354235189 upstream. In commit 6b43ae8a619d17c4935c3320d2ef9e92bdeed05d, I introduced a bug that kept the STA_INS or STA_DEL bit from being cleared from time_status via adjtimex() without forcing STA_PLL first. Usually once the STA_INS is set, it isn't cleared until the leap second is applied, so its unlikely this affected anyone. However during testing I noticed it took some effort to cancel a leap second once STA_INS was set. Signed-off-by: John Stultz Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Richard Cochran Cc: Prarit Bhargava Link: http://lkml.kernel.org/r/1342156917-25092-2-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit c9c9fee82590f6dc1fffc5f39679ca0f59eef692 Author: Andy Lutomirski Date: Thu Jul 5 16:00:11 2012 -0700 mm: Hold a file reference in madvise_remove commit 9ab4233dd08036fe34a89c7dc6f47a8bf2eb29eb upstream. Otherwise the code races with munmap (causing a use-after-free of the vma) or with close (causing a use-after-free of the struct file). The bug was introduced by commit 90ed52ebe481 ("[PATCH] holepunch: fix mmap_sem i_mutex deadlock") [bwh: Backported to 3.2: - Adjust context - madvise_remove() calls vmtruncate_range(), not do_fallocate()] [luto: Backported to 3.0: Adjust context] Cc: Hugh Dickins Cc: Miklos Szeredi Cc: Badari Pulavarty Cc: Nick Piggin Signed-off-by: Ben Hutchings Signed-off-by: Andy Lutomirski Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 1d56873669c1118c7703b7c632b8a714d6a6fa0a Author: Bjørn Mork Date: Mon Jul 2 10:33:14 2012 +0200 USB: cdc-wdm: fix lockup on error in wdm_read commit b086b6b10d9f182cd8d2f0dcfd7fd11edba93fc9 upstream. Clear the WDM_READ flag on empty reads to avoid running forever in an infinite tight loop, causing lockups: Jul 1 21:58:11 nemi kernel: [ 3658.898647] qmi_wwan 2-1:1.2: Unexpected error -71 Jul 1 21:58:36 nemi kernel: [ 3684.072021] BUG: soft lockup - CPU#0 stuck for 23s! [qmi.pl:12235] Jul 1 21:58:36 nemi kernel: [ 3684.072212] CPU 0 Jul 1 21:58:36 nemi kernel: [ 3684.072355] Jul 1 21:58:36 nemi kernel: [ 3684.072367] Pid: 12235, comm: qmi.pl Tainted: P O 3.5.0-rc2+ #13 LENOVO 2776LEG/2776LEG Jul 1 21:58:36 nemi kernel: [ 3684.072383] RIP: 0010:[] [] spin_unlock_irq+0x8/0xc [cdc_wdm] Jul 1 21:58:36 nemi kernel: [ 3684.072388] RSP: 0018:ffff88022dca1e70 EFLAGS: 00000282 Jul 1 21:58:36 nemi kernel: [ 3684.072393] RAX: ffff88022fc3f650 RBX: ffffffff811c56f7 RCX: 00000001000ce8c1 Jul 1 21:58:36 nemi kernel: [ 3684.072398] RDX: 0000000000000010 RSI: 000000000267d810 RDI: ffff88022fc3f650 Jul 1 21:58:36 nemi kernel: [ 3684.072403] RBP: ffff88022dca1eb0 R08: ffffffffa063578e R09: 0000000000000000 Jul 1 21:58:36 nemi kernel: [ 3684.072407] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000002 Jul 1 21:58:36 nemi kernel: [ 3684.072412] R13: 0000000000000246 R14: ffffffff00000002 R15: ffff8802281d8c88 Jul 1 21:58:36 nemi kernel: [ 3684.072418] FS: 00007f666a260700(0000) GS:ffff88023bc00000(0000) knlGS:0000000000000000 Jul 1 21:58:36 nemi kernel: [ 3684.072423] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 1 21:58:36 nemi kernel: [ 3684.072428] CR2: 000000000270d9d8 CR3: 000000022e865000 CR4: 00000000000007f0 Jul 1 21:58:36 nemi kernel: [ 3684.072433] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Jul 1 21:58:36 nemi kernel: [ 3684.072438] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Jul 1 21:58:36 nemi kernel: [ 3684.072444] Process qmi.pl (pid: 12235, threadinfo ffff88022dca0000, task ffff88022ff76380) Jul 1 21:58:36 nemi kernel: [ 3684.072448] Stack: Jul 1 21:58:36 nemi kernel: [ 3684.072458] ffffffffa063592e 0000000100020000 ffff88022fc3f650 ffff88022fc3f6a8 Jul 1 21:58:36 nemi kernel: [ 3684.072466] 0000000000000200 0000000100000000 000000000267d810 0000000000000000 Jul 1 21:58:36 nemi kernel: [ 3684.072475] 0000000000000000 ffff880212cfb6d0 0000000000000200 ffff880212cfb6c0 Jul 1 21:58:36 nemi kernel: [ 3684.072479] Call Trace: Jul 1 21:58:36 nemi kernel: [ 3684.072489] [] ? wdm_read+0x1a0/0x263 [cdc_wdm] Jul 1 21:58:36 nemi kernel: [ 3684.072500] [] ? vfs_read+0xa1/0xfb Jul 1 21:58:36 nemi kernel: [ 3684.072509] [] ? alarm_setitimer+0x35/0x64 Jul 1 21:58:36 nemi kernel: [ 3684.072517] [] ? sys_read+0x45/0x6e Jul 1 21:58:36 nemi kernel: [ 3684.072525] [] ? system_call_fastpath+0x16/0x1b Jul 1 21:58:36 nemi kernel: [ 3684.072557] Code: <66> 66 90 c3 83 ff ed 89 f8 74 16 7f 06 83 ff a1 75 0a c3 83 ff f4 The WDM_READ flag is normally cleared by wdm_int_callback before resubmitting the read urb, and set by wdm_in_callback when this urb returns with data or an error. But a crashing device may cause both a read error and cancelling all urbs. Make sure that the flag is cleared by wdm_read if the buffer is empty. We don't clear the flag on errors, as there may be pending data in the buffer which should be processed. The flag will instead be cleared on the next wdm_read call. Signed-off-by: Bjørn Mork Acked-by: Oliver Neukum Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 9fff5f61e4c8a2e7f1eaeb9bdeb0eb5c64ef01fc Author: Tyler Hicks Date: Tue Jun 12 11:17:01 2012 -0700 eCryptfs: Properly check for O_RDONLY flag before doing privileged open commit 9fe79d7600497ed8a95c3981cbe5b73ab98222f0 upstream. If the first attempt at opening the lower file read/write fails, eCryptfs will retry using a privileged kthread. However, the privileged retry should not happen if the lower file's inode is read-only because a read/write open will still be unsuccessful. The check for determining if the open should be retried was intended to be based on the access mode of the lower file's open flags being O_RDONLY, but the check was incorrectly performed. This would cause the open to be retried by the privileged kthread, resulting in a second failed open of the lower file. This patch corrects the check to determine if the open request should be handled by the privileged kthread. Signed-off-by: Tyler Hicks Reported-by: Dan Carpenter Acked-by: Dan Carpenter Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 0f03f497571588cf8370009b0f7b737ea2061fda Author: Mel Gorman Date: Thu Jun 21 11:36:50 2012 +0100 stable: Allow merging of backports for serious user-visible performance issues commit eb3979f64d25120d60b9e761a4c58f70b1a02f86 upstream. Distribution kernel maintainers routinely backport fixes for users that were deemed important but not "something critical" as defined by the rules. To users of these kernels they are very serious and failing to fix them reduces the value of -stable. The problem is that the patches fixing these issues are often subtle and prone to regressions in other ways and need greater care and attention. To combat this, these "serious" backports should have a higher barrier to entry. This patch relaxes the rules to allow a distribution maintainer to merge to -stable a backported patch or small series that fixes a "serious" user-visible performance issue. They should include additional information on the user-visible bug affected and a link to the bugzilla entry if available. The same rules about the patch being already in mainline still apply. Signed-off-by: Mel Gorman Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 6b8040006ca5c3bb62be3c8ceb377bbfd358d076 Author: Jan Kara Date: Tue Jul 10 17:58:04 2012 +0200 udf: Improve table length check to avoid possible overflow commit 57b9655d01ef057a523e810d29c37ac09b80eead upstream. When a partition table length is corrupted to be close to 1 << 32, the check for its length may overflow on 32-bit systems and we will think the length is valid. Later on the kernel can crash trying to read beyond end of buffer. Fix the check to avoid possible overflow. CC: stable@vger.kernel.org Reported-by: Ben Hutchings Signed-off-by: Jan Kara Signed-off-by: Willy Tarreau commit 65ff664e2f099decc176a151670d6ecb9b6efb60 Author: Jan Kara Date: Wed Jun 27 20:20:22 2012 +0200 udf: Avoid run away loop when partition table length is corrupted commit adee11b2085bee90bd8f4f52123ffb07882d6256 upstream. Check provided length of partition table so that (possibly maliciously) corrupted partition table cannot cause accessing data beyond current buffer. Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit db8a784af98d392a51fa25bb651bf90a982044e3 Author: Pavel Shilovsky Date: Thu May 10 19:49:38 2012 +0400 fuse: fix stat call on 32 bit platforms commit 45c72cd73c788dd18c8113d4a404d6b4a01decf1 upstream. Now we store attr->ino at inode->i_ino, return attr->ino at the first time and then return inode->i_ino if the attribute timeout isn't expired. That's wrong on 32 bit platforms because attr->ino is 64 bit and inode->i_ino is 32 bit in this case. Fix this by saving 64 bit ino in fuse_inode structure and returning it every time we call getattr. Also squash attr->ino into inode->i_ino explicitly. Signed-off-by: Pavel Shilovsky Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit dcd92275b7aeb7659f185553cc38232722d29c8a Author: Steffen Rumler Date: Wed Jun 6 16:37:17 2012 +0200 powerpc: Fix kernel panic during kernel module load commit 3c75296562f43e6fbc6cddd3de948a7b3e4e9bcf upstream. This fixes a problem which can causes kernel oopses while loading a kernel module. According to the PowerPC EABI specification, GPR r11 is assigned the dedicated function to point to the previous stack frame. In the powerpc-specific kernel module loader, do_plt_call() (in arch/powerpc/kernel/module_32.c), GPR r11 is also used to generate trampoline code. This combination crashes the kernel, in the case where the compiler chooses to use a helper function for saving GPRs on entry, and the module loader has placed the .init.text section far away from the .text section, meaning that it has to generate a trampoline for functions in the .init.text section to call the GPR save helper. Because the trampoline trashes r11, references to the stack frame using r11 can cause an oops. The fix just uses GPR r12 instead of GPR r11 for generating the trampoline code. According to the statements from Freescale, this is safe from an EABI perspective. I've tested the fix for kernel 2.6.33 on MPC8541. Signed-off-by: Steffen Rumler [paulus@samba.org: reworded the description] Signed-off-by: Paul Mackerras Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 24f65f8bd5877ae4bbfde0e61acf0f0b9ea3f204 Author: James Bottomley Date: Wed May 30 09:45:39 2012 +0000 SCSI: fix scsi_wait_scan commit 1ff2f40305772b159a91c19590ee159d3a504afc upstream. Commit c751085943362143f84346d274e0011419c84202 Author: Rafael J. Wysocki Date: Sun Apr 12 20:06:56 2009 +0200 PM/Hibernate: Wait for SCSI devices scan to complete during resume Broke the scsi_wait_scan module in 2.6.30. Apparently debian still uses it so fix it and backport to stable before removing it in 3.6. The breakage is caused because the function template in include/scsi/scsi_scan.h is defined to be a nop unless SCSI is built in. That means that in the modular case (which is every distro), the scsi_wait_scan module does a simple async_synchronize_full() instead of waiting for scans. Signed-off-by: James Bottomley Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 6dd99cb6c03ddf85b4d30b7c8432c76644312a89 Author: Sarah Sharp Date: Tue May 8 07:09:26 2012 -0700 xhci: Reset reserved command ring TRBs on cleanup. commit 33b2831ac870d50cc8e01c317b07fb1e69c13fe1 upstream. When the xHCI driver needs to clean up memory (perhaps due to a failed register restore on resume from S3 or resume from S4), it needs to reset the number of reserved TRBs on the command ring to zero. Otherwise, several resume cycles (about 30) with a UAS device attached will continually increment the number of reserved TRBs, until all command submissions fail because there isn't enough room on the command ring. This patch should be backported to kernels as old as 2.6.32, that contain the commit 913a8a344ffcaf0b4a586d6662a2c66a7106557d "USB: xhci: Change how xHCI commands are handled." Signed-off-by: Sarah Sharp Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 1f2e32346f7d082246d96e4cedbd54b8739b8ea0 Author: Jan Kara Date: Sun Dec 18 17:37:02 2011 -0500 ext4: fix error handling on inode bitmap corruption commit acd6ad83517639e8f09a8c5525b1dccd81cd2a10 upstream. When insert_inode_locked() fails in ext4_new_inode() it most likely means inode bitmap got corrupted and we allocated again inode which is already in use. Also doing unlock_new_inode() during error recovery is wrong since the inode does not have I_NEW set. Fix the problem by jumping to fail: (instead of fail_drop:) which declares filesystem error and does not call unlock_new_inode(). Signed-off-by: Jan Kara Signed-off-by: "Theodore Ts'o" Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit ba57e38b830f274138a0349c4aef75d4c75e95a0 Author: Jan Kara Date: Thu Dec 8 21:13:46 2011 +0100 ext3: Fix error handling on inode bitmap corruption commit 1415dd8705394399d59a3df1ab48d149e1e41e77 upstream. When insert_inode_locked() fails in ext3_new_inode() it most likely means inode bitmap got corrupted and we allocated again inode which is already in use. Also doing unlock_new_inode() during error recovery is wrong since inode does not have I_NEW set. Fix the problem by jumping to fail: (instead of fail_drop:) which declares filesystem error and does not call unlock_new_inode(). Reviewed-by: Eric Sandeen Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 5816579f8bf88fd799401543877c13fab425dc5e Author: Jonathan Nieder Date: Fri May 11 04:20:20 2012 -0500 NFSv4: Revalidate uid/gid after open This is a shorter (and more appropriate for stable kernels) analog to the following upstream commit: commit 6926afd1925a54a13684ebe05987868890665e2b Author: Trond Myklebust Date: Sat Jan 7 13:22:46 2012 -0500 NFSv4: Save the owner/group name string when doing open ...so that we can do the uid/gid mapping outside the asynchronous RPC context. This fixes a bug in the current NFSv4 atomic open code where the client isn't able to determine what the true uid/gid fields of the file are, (because the asynchronous nature of the OPEN call denies it the ability to do an upcall) and so fills them with default values, marking the inode as needing revalidation. Unfortunately, in some cases, the VFS will do some additional sanity checks on the file, and may override the server's decision to allow the open because it sees the wrong owner/group fields. Signed-off-by: Trond Myklebust Without this patch, logging into two different machines with home directories mounted over NFS4 and then running "vim" and typing ":q" in each reliably produces the following error on the second machine: E137: Viminfo file is not writable: /users/system/rtheys/.viminfo This regression was introduced by 80e52aced138 ("NFSv4: Don't do idmapper upcalls for asynchronous RPC calls", merged during the 2.6.32 cycle) --- after the OPEN call, .viminfo has the default values for st_uid and st_gid (0xfffffffe) cached because we do not want to let rpciod wait for an idmapper upcall to fill them in. The fix used in mainline is to save the owner and group as strings and perform the upcall in _nfs4_proc_open outside the rpciod context, which takes about 600 lines. For stable, we can do something similar with a one-liner: make open check for the stale fields and make a (synchronous) GETATTR call to fill them when needed. Trond dictated the patch, I typed it in, and Rik tested it. Addresses http://bugs.debian.org/659111 and https://bugzilla.redhat.com/789298 Reported-by: Rik Theys Explained-by: David Flyn Signed-off-by: Jonathan Nieder Tested-by: Rik Theys Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 826b464bd782a5e4279e245dfa4ddc86d5773a87 Author: Mark Hills Date: Mon Apr 30 19:39:22 2012 +0100 ALSA: echoaudio: Remove incorrect part of assertion commit c914f55f7cdfafe9d7d5b248751902c7ab57691e upstream. This assertion seems to imply that chip->dsp_code_to_load is a pointer. It's actually an integer handle on the actual firmware, and 0 has no special meaning. The assertion prevents initialisation of a Darla20 card, but would also affect other models. It seems it was introduced in commit dd7b254d. ALSA sound/pci/echoaudio/echoaudio.c:2061 Echoaudio driver starting... ALSA sound/pci/echoaudio/echoaudio.c:1969 chip=ebe4e000 ALSA sound/pci/echoaudio/echoaudio.c:2007 pci=ed568000 irq=19 subdev=0010 Init hardware... ALSA sound/pci/echoaudio/darla20_dsp.c:36 init_hw() - Darla20 ------------[ cut here ]------------ WARNING: at sound/pci/echoaudio/echoaudio_dsp.c:478 init_hw+0x1d1/0x86c [snd_darla20]() Hardware name: Dell DM051 BUG? (!chip->dsp_code_to_load || !chip->comm_page) Signed-off-by: Mark Hills Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 88a3ee1a9002d52325622008a252bfff3dcfb940 Author: Eric Dumazet Date: Sun Apr 29 09:08:22 2012 +0000 netem: fix possible skb leak [ Upstream commit 116a0fc31c6c9b8fc821be5a96e5bf0b43260131 ] skb_checksum_help(skb) can return an error, we must free skb in this case. qdisc_drop(skb, sch) can also be feeded with a NULL skb (if skb_unshare() failed), so lets use this generic helper. Signed-off-by: Eric Dumazet Cc: Stephen Hemminger Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 25849872019de099f8c5e18868f71d031f63dabb Author: Tim Bird Date: Wed May 2 22:55:39 2012 +0100 ARM: 7410/1: Add extra clobber registers for assembly in kernel_execve commit e787ec1376e862fcea1bfd523feb7c5fb43ecdb9 upstream. The inline assembly in kernel_execve() uses r8 and r9. Since this code sequence does not return, it usually doesn't matter if the register clobber list is accurate. However, I saw a case where a particular version of gcc used r8 as an intermediate for the value eventually passed to r9. Because r8 is used in the inline assembly, and not mentioned in the clobber list, r9 was set to an incorrect value. This resulted in a kernel panic on execution of the first user-space program in the system. r9 is used in ret_to_user as the thread_info pointer, and if it's wrong, bad things happen. Signed-off-by: Tim Bird Signed-off-by: Russell King Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 31ff79fb179ea4ddae76f232f6ccc979926d7721 Author: David Ward Date: Sun Apr 15 12:31:45 2012 +0000 net_sched: gred: Fix oops in gred_dump() in WRED mode [ Upstream commit 244b65dbfede788f2fa3fe2463c44d0809e97c6b ] A parameter set exists for WRED mode, called wred_set, to hold the same values for qavg and qidlestart across all VQs. The WRED mode values had been previously held in the VQ for the default DP. After these values were moved to wred_set, the VQ for the default DP was no longer created automatically (so that it could be omitted on purpose, to have packets in the default DP enqueued directly to the device without using RED). However, gred_dump() was overlooked during that change; in WRED mode it still reads qavg/qidlestart from the VQ for the default DP, which might not even exist. As a result, this command sequence will cause an oops: tc qdisc add dev $DEV handle $HANDLE parent $PARENT gred setup \ DPs 3 default 2 grio tc qdisc change dev $DEV handle $HANDLE gred DP 0 prio 8 $RED_OPTIONS tc qdisc change dev $DEV handle $HANDLE gred DP 1 prio 8 $RED_OPTIONS This fixes gred_dump() in WRED mode to use the values held in wred_set. Signed-off-by: David Ward Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 28188e3ae832e6e3cc65a5a5afd0ad4556b9cbcd Author: Davide Ciminaghi Date: Fri Apr 13 04:48:25 2012 +0000 net/ethernet: ks8851_mll fix rx frame buffer overflow [ Upstream commit 8a9a0ea6032186e3030419262678d652b88bf6a8 ] At the beginning of ks_rcv(), a for loop retrieves the header information relevant to all the frames stored in the mac's internal buffers. The number of pending frames is stored as an 8 bits field in KS_RXFCTR. If interrupts are disabled long enough to allow for more than 32 frames to accumulate in the MAC's internal buffers, a buffer overflow occurs. This patch fixes the problem by making the driver's frame_head_info buffer big enough. Well actually, since the chip appears to have 12K of internal rx buffers and the shortest ethernet frame should be 64 bytes long, maybe the limit could be set to 12*1024/64 = 192 frames, but 255 should be safer. Signed-off-by: Davide Ciminaghi Signed-off-by: Raffaele Recalcati Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit cf4f72188ee5cb68f4401d23e11148140be08743 Author: Tony Zelenoff Date: Wed Apr 11 06:15:03 2012 +0000 atl1: fix kernel panic in case of DMA errors [ Upstream commit 03662e41c7cff64a776bfb1b3816de4be43de881 ] Problem: There was two separate work_struct structures which share one handler. Unfortunately getting atl1_adapter structure from work_struct in case of DMA error was done from incorrect offset which cause kernel panics. Solution: The useless work_struct for DMA error removed and handler name changed to more generic one. Signed-off-by: Tony Zelenoff Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 237b614839ec49b7069a2524d2dfcd19734e3968 Author: Eric Dumazet Date: Fri Apr 6 10:49:10 2012 +0200 net: fix a race in sock_queue_err_skb() [ Upstream commit 110c43304db6f06490961529536c362d9ac5732f ] As soon as an skb is queued into socket error queue, another thread can consume it, so we are not allowed to reference skb anymore, or risk use after free. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit e0547758621cfc9388b61af865b4dc7bb541b290 Author: Eric Dumazet Date: Thu Apr 5 22:17:46 2012 +0000 netlink: fix races after skb queueing [ Upstream commit 4a7e7c2ad540e54c75489a70137bf0ec15d3a127 ] As soon as an skb is queued into socket receive_queue, another thread can consume it, so we are not allowed to reference skb anymore, or risk use after free. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit e61eb0d0e11c2d45f980e109d40fb5000e8bb57c Author: Sasha Levin Date: Thu Apr 5 12:07:45 2012 +0000 phonet: Check input from user before allocating [ Upstream commit bcf1b70ac6eb0ed8286c66e6bf37cb747cbaa04c ] A phonet packet is limited to USHRT_MAX bytes, this is never checked during tx which means that the user can specify any size he wishes, and the kernel will attempt to allocate that size. In the good case, it'll lead to the following warning, but it may also cause the kernel to kick in the OOM and kill a random task on the server. [ 8921.744094] WARNING: at mm/page_alloc.c:2255 __alloc_pages_slowpath+0x65/0x730() [ 8921.749770] Pid: 5081, comm: trinity Tainted: G W 3.4.0-rc1-next-20120402-sasha #46 [ 8921.756672] Call Trace: [ 8921.758185] [] warn_slowpath_common+0x87/0xb0 [ 8921.762868] [] warn_slowpath_null+0x15/0x20 [ 8921.765399] [] __alloc_pages_slowpath+0x65/0x730 [ 8921.769226] [] ? zone_watermark_ok+0x1a/0x20 [ 8921.771686] [] ? get_page_from_freelist+0x625/0x660 [ 8921.773919] [] __alloc_pages_nodemask+0x1f8/0x240 [ 8921.776248] [] kmalloc_large_node+0x70/0xc0 [ 8921.778294] [] __kmalloc_node_track_caller+0x34/0x1c0 [ 8921.780847] [] ? sock_alloc_send_pskb+0xbc/0x260 [ 8921.783179] [] __alloc_skb+0x75/0x170 [ 8921.784971] [] sock_alloc_send_pskb+0xbc/0x260 [ 8921.787111] [] ? release_sock+0x7e/0x90 [ 8921.788973] [] sock_alloc_send_skb+0x10/0x20 [ 8921.791052] [] pep_sendmsg+0x60/0x380 [ 8921.792931] [] ? pn_socket_bind+0x156/0x180 [ 8921.794917] [] ? pn_socket_autobind+0x3f/0x90 [ 8921.797053] [] pn_socket_sendmsg+0x4f/0x70 [ 8921.798992] [] sock_aio_write+0x187/0x1b0 [ 8921.801395] [] ? sub_preempt_count+0xae/0xf0 [ 8921.803501] [] ? __lock_acquire+0x42c/0x4b0 [ 8921.805505] [] ? __sock_recv_ts_and_drops+0x140/0x140 [ 8921.807860] [] do_sync_readv_writev+0xbc/0x110 [ 8921.809986] [] ? might_fault+0x97/0xa0 [ 8921.811998] [] ? security_file_permission+0x1e/0x90 [ 8921.814595] [] do_readv_writev+0xe2/0x1e0 [ 8921.816702] [] ? do_setitimer+0x1ac/0x200 [ 8921.818819] [] ? get_parent_ip+0x11/0x50 [ 8921.820863] [] ? sub_preempt_count+0xae/0xf0 [ 8921.823318] [] vfs_writev+0x46/0x60 [ 8921.825219] [] sys_writev+0x4f/0xb0 [ 8921.827127] [] system_call_fastpath+0x16/0x1b [ 8921.829384] ---[ end trace dffe390f30db9eb7 ]--- Signed-off-by: Sasha Levin Acked-by: Rémi Denis-Courmont Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit cdb1f35dc7de42802527140a3613871c394548e1 Author: Thomas Jarosch Date: Wed Dec 7 22:08:11 2011 +0100 PCI: Add quirk for still enabled interrupts on Intel Sandy Bridge GPUs commit f67fd55fa96f7d7295b43ffbc4a97d8f55e473aa upstream. Some BIOS implementations leave the Intel GPU interrupts enabled, even though no one is handling them (f.e. i915 driver is never loaded). Additionally the interrupt destination is not set up properly and the interrupt ends up -somewhere-. These spurious interrupts are "sticky" and the kernel disables the (shared) interrupt line after 100.000+ generated interrupts. Fix it by disabling the still enabled interrupts. This resolves crashes often seen on monitor unplug. Tested on the following boards: - Intel DH61CR: Affected - Intel DH67BL: Affected - Intel S1200KP server board: Affected - Asus P8H61-M LE: Affected, but system does not crash. Probably the IRQ ends up somewhere unnoticed. According to reports on the net, the Intel DH61WW board is also affected. Many thanks to Jesse Barnes from Intel for helping with the register configuration and to Intel in general for providing public hardware documentation. Signed-off-by: Thomas Jarosch Tested-by: Charlie Suffin Signed-off-by: Jesse Barnes Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 21f5d063fca7488c7bd3d6adec5aa0c2e5bc4ae0 Author: Kent Yoder Date: Thu Apr 5 20:34:20 2012 +0800 crypto: sha512 - Fix byte counter overflow in SHA-512 commit 25c3d30c918207556ae1d6e663150ebdf902186b upstream. The current code only increments the upper 64 bits of the SHA-512 byte counter when the number of bytes hashed happens to hit 2^64 exactly. This patch increments the upper 64 bits whenever the lower 64 bits overflows. Signed-off-by: Kent Yoder Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 79c23081de12a2d61529174f153400b7586c2cf9 Author: Alex He Date: Fri Mar 30 10:21:38 2012 +0800 xHCI: Correct the #define XHCI_LEGACY_DISABLE_SMI commit 95018a53f7653e791bba1f54c8d75d9cb700d1bd upstream. Re-define XHCI_LEGACY_DISABLE_SMI and used it in right way. All SMI enable bits will be cleared to zero and flag bits 29:31 are also cleared to zero. Other bits should be presvered as Table 146. This patch should be backported to kernels as old as 2.6.31. Signed-off-by: Alex He Signed-off-by: Sarah Sharp Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 737b15bb831a1ada007772b4165a3c5c0562787f Author: Sarah Sharp Date: Fri Mar 16 13:09:39 2012 -0700 xhci: Don't write zeroed pointers to xHC registers. commit 159e1fcc9a60fc7daba23ee8fcdb99799de3fe84 upstream. When xhci_mem_cleanup() is called, we can't be sure if the xHC is actually halted. We can ask the xHC to halt by writing to the RUN bit in the command register, but that might timeout due to a HW hang. If the host controller is still running, we should not write zeroed values to the event ring dequeue pointers or base tables, the DCBAA pointers, or the command ring pointers. Eric Fu reports his VIA VL800 host accesses the event ring pointers after a failed register restore on resume from suspend. The hypothesis is that the host never actually halted before the register write to change the event ring pointer to zero. Remove all writes of zeroed values to pointer registers in xhci_mem_cleanup(). Instead, make all callers of the function reset the host controller first, which will reset those registers to zero. xhci_mem_init() is the only caller that doesn't first halt and reset the host controller before calling xhci_mem_cleanup(). This should be backported to kernels as old as 2.6.32. Signed-off-by: Sarah Sharp Tested-by: Elric Fu Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 0fdec0669db8d78a517df8f00accae6ea0deab2e Author: Johan Hovold Date: Tue Mar 20 16:59:33 2012 +0100 USB: serial: fix race between probe and open commit a65a6f14dc24a90bde3f5d0073ba2364476200bf upstream. Fix race between probe and open by making sure that the disconnected flag is not cleared until all ports have been registered. A call to tty_open while probe is running may get a reference to the serial structure in serial_install before its ports have been registered. This may lead to usb_serial_core calling driver open before port is fully initialised. With ftdi_sio this result in the following NULL-pointer dereference as the private data has not been initialised at open: [ 199.698286] IP: [] ftdi_open+0x59/0xe0 [ftdi_sio] [ 199.698297] *pde = 00000000 [ 199.698303] Oops: 0000 [#1] PREEMPT SMP [ 199.698313] Modules linked in: ftdi_sio usbserial [ 199.698323] [ 199.698327] Pid: 1146, comm: ftdi_open Not tainted 3.2.11 #70 Dell Inc. Vostro 1520/0T816J [ 199.698339] EIP: 0060:[] EFLAGS: 00010286 CPU: 0 [ 199.698344] EIP is at ftdi_open+0x59/0xe0 [ftdi_sio] [ 199.698348] EAX: 0000003e EBX: f5067000 ECX: 00000000 EDX: 80000600 [ 199.698352] ESI: f48d8800 EDI: 00000001 EBP: f515dd54 ESP: f515dcfc [ 199.698356] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 [ 199.698361] Process ftdi_open (pid: 1146, ti=f515c000 task=f481e040 task.ti=f515c000) [ 199.698364] Stack: [ 199.698368] f811a9fe f811a9e0 f811b3ef 00000000 00000000 00001388 00000000 f4a86800 [ 199.698387] 00000002 00000000 f806e68e 00000000 f532765c f481e040 00000246 22222222 [ 199.698479] 22222222 22222222 22222222 f5067004 f5327600 f5327638 f515dd74 f806e6ab [ 199.698496] Call Trace: [ 199.698504] [] ? serial_activate+0x2e/0x70 [usbserial] [ 199.698511] [] serial_activate+0x4b/0x70 [usbserial] [ 199.698521] [] tty_port_open+0x7c/0xd0 [ 199.698527] [] ? serial_set_termios+0xa0/0xa0 [usbserial] [ 199.698534] [] serial_open+0x2f/0x70 [usbserial] [ 199.698540] [] tty_open+0x20c/0x510 [ 199.698546] [] chrdev_open+0xe7/0x230 [ 199.698553] [] __dentry_open+0x1f2/0x390 [ 199.698559] [] ? _raw_spin_unlock+0x2c/0x50 [ 199.698565] [] nameidata_to_filp+0x66/0x80 [ 199.698570] [] ? cdev_put+0x20/0x20 [ 199.698576] [] do_last+0x198/0x730 [ 199.698581] [] path_openat+0xa0/0x350 [ 199.698587] [] do_filp_open+0x35/0x80 [ 199.698593] [] ? _raw_spin_unlock+0x2c/0x50 [ 199.698599] [] ? alloc_fd+0xc0/0x100 [ 199.698605] [] ? getname_flags+0x72/0x120 [ 199.698611] [] do_sys_open+0xf0/0x1c0 [ 199.698617] [] ? trace_hardirqs_on_thunk+0xc/0x10 [ 199.698623] [] sys_open+0x2e/0x40 [ 199.698628] [] sysenter_do_call+0x12/0x36 [ 199.698632] Code: 85 89 00 00 00 8b 16 8b 4d c0 c1 e2 08 c7 44 24 14 88 13 00 00 81 ca 00 00 00 80 c7 44 24 10 00 00 00 00 c7 44 24 0c 00 00 00 00 <0f> b7 41 78 31 c9 89 44 24 08 c7 44 24 04 00 00 00 00 c7 04 24 [ 199.698884] EIP: [] ftdi_open+0x59/0xe0 [ftdi_sio] SS:ESP 0068:f515dcfc [ 199.698893] CR2: 0000000000000078 [ 199.698925] ---[ end trace 77c43ec023940cff ]--- Reported-and-tested-by: Ken Huang Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit a6f5fd979c52ccf8b7c416251c4326283f441006 Author: Wang YanQing Date: Sun Apr 1 08:54:02 2012 +0800 video:uvesafb: Fix oops that uvesafb try to execute NX-protected page commit b78f29ca0516266431688c5eb42d39ce42ec039a upstream. This patch fix the oops below that catched in my machine [ 81.560602] uvesafb: NVIDIA Corporation, GT216 Board - 0696a290, Chip Rev , OEM: NVIDIA, VBE v3.0 [ 81.609384] uvesafb: protected mode interface info at c000:d350 [ 81.609388] uvesafb: pmi: set display start = c00cd3b3, set palette = c00cd40e [ 81.609390] uvesafb: pmi: ports = 3b4 3b5 3ba 3c0 3c1 3c4 3c5 3c6 3c7 3c8 3c9 3cc 3ce 3cf 3d0 3d1 3d2 3d3 3d4 3d5 3da [ 81.614558] uvesafb: VBIOS/hardware doesn't support DDC transfers [ 81.614562] uvesafb: no monitor limits have been set, default refresh rate will be used [ 81.614994] uvesafb: scrolling: ypan using protected mode interface, yres_virtual=4915 [ 81.744147] kernel tried to execute NX-protected page - exploit attempt? (uid: 0) [ 81.744153] BUG: unable to handle kernel paging request at c00cd3b3 [ 81.744159] IP: [] 0xc00cd3b2 [ 81.744167] *pdpt = 00000000016d6001 *pde = 0000000001c7b067 *pte = 80000000000cd163 [ 81.744171] Oops: 0011 [#1] SMP [ 81.744174] Modules linked in: uvesafb(+) cfbcopyarea cfbimgblt cfbfillrect [ 81.744178] [ 81.744181] Pid: 3497, comm: modprobe Not tainted 3.3.0-rc4NX+ #71 Acer Aspire 4741 /Aspire 4741 [ 81.744185] EIP: 0060:[] EFLAGS: 00010246 CPU: 0 [ 81.744187] EIP is at 0xc00cd3b3 [ 81.744189] EAX: 00004f07 EBX: 00000000 ECX: 00000000 EDX: 00000000 [ 81.744191] ESI: f763f000 EDI: f763f6e8 EBP: f57f3a0c ESP: f57f3a00 [ 81.744192] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 [ 81.744195] Process modprobe (pid: 3497, ti=f57f2000 task=f748c600 task.ti=f57f2000) [ 81.744196] Stack: [ 81.744197] f82512c5 f759341c 00000000 f57f3a30 c124a9bc 00000001 00000001 000001e0 [ 81.744202] f8251280 f763f000 f7593400 00000000 f57f3a40 c12598dd f5c0c000 00000000 [ 81.744206] f57f3b10 c1255efe c125a21a 00000006 f763f09c 00000000 c1c6cb60 f7593400 [ 81.744210] Call Trace: [ 81.744215] [] ? uvesafb_pan_display+0x45/0x60 [uvesafb] [ 81.744222] [] fb_pan_display+0x10c/0x160 [ 81.744226] [] ? uvesafb_vbe_find_mode+0x180/0x180 [uvesafb] [ 81.744230] [] bit_update_start+0x1d/0x50 [ 81.744232] [] fbcon_switch+0x39e/0x550 [ 81.744235] [] ? bit_cursor+0x4ea/0x560 [ 81.744240] [] redraw_screen+0x12b/0x220 [ 81.744245] [] ? tty_do_resize+0x3b/0xc0 [ 81.744247] [] vc_do_resize+0x3d2/0x3e0 [ 81.744250] [] vc_resize+0x14/0x20 [ 81.744253] [] fbcon_init+0x29d/0x500 [ 81.744255] [] ? set_inverse_trans_unicode+0xe4/0x110 [ 81.744258] [] visual_init+0xb8/0x150 [ 81.744261] [] bind_con_driver+0x16c/0x360 [ 81.744264] [] ? register_con_driver+0x6e/0x190 [ 81.744267] [] take_over_console+0x41/0x50 [ 81.744269] [] fbcon_takeover+0x6a/0xd0 [ 81.744272] [] fbcon_event_notify+0x758/0x790 [ 81.744277] [] notifier_call_chain+0x42/0xb0 [ 81.744280] [] __blocking_notifier_call_chain+0x60/0x90 [ 81.744283] [] blocking_notifier_call_chain+0x1a/0x20 [ 81.744285] [] fb_notifier_call_chain+0x11/0x20 [ 81.744288] [] register_framebuffer+0x1d9/0x2b0 [ 81.744293] [] ? ioremap_wc+0x33/0x40 [ 81.744298] [] uvesafb_probe+0xaba/0xc40 [uvesafb] [ 81.744302] [] platform_drv_probe+0xf/0x20 [ 81.744306] [] driver_probe_device+0x68/0x170 [ 81.744309] [] __device_attach+0x41/0x50 [ 81.744313] [] bus_for_each_drv+0x48/0x70 [ 81.744316] [] device_attach+0x83/0xa0 [ 81.744319] [] ? __driver_attach+0x90/0x90 [ 81.744321] [] bus_probe_device+0x6f/0x90 [ 81.744324] [] device_add+0x5e5/0x680 [ 81.744329] [] ? kvasprintf+0x43/0x60 [ 81.744332] [] ? kobject_set_name_vargs+0x64/0x70 [ 81.744335] [] ? kobject_set_name_vargs+0x64/0x70 [ 81.744339] [] platform_device_add+0xff/0x1b0 [ 81.744343] [] uvesafb_init+0x50/0x9b [uvesafb] [ 81.744346] [] do_one_initcall+0x2f/0x170 [ 81.744350] [] ? uvesafb_is_valid_mode+0x66/0x66 [uvesafb] [ 81.744355] [] sys_init_module+0xf4/0x1410 [ 81.744359] [] ? vfsmount_lock_local_unlock_cpu+0x30/0x30 [ 81.744363] [] sysenter_do_call+0x12/0x36 [ 81.744365] Code: f5 00 00 00 32 f6 66 8b da 66 d1 e3 66 ba d4 03 8a e3 b0 1c 66 ef b0 1e 66 ef 8a e7 b0 1d 66 ef b0 1f 66 ef e8 fa 00 00 00 61 c3 <60> e8 c8 00 00 00 66 8b f3 66 8b da 66 ba d4 03 b0 0c 8a e5 66 [ 81.744388] EIP: [] 0xc00cd3b3 SS:ESP 0068:f57f3a00 [ 81.744391] CR2: 00000000c00cd3b3 [ 81.744393] ---[ end trace 18b2c87c925b54d6 ]--- Signed-off-by: Wang YanQing Cc: Michal Januszewski Cc: Alan Cox Signed-off-by: Florian Tobias Schandinat Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 4c5a5473b10eb9c92ea9fb612fddbc9448a05940 Author: David S. Miller Date: Fri Apr 13 11:56:22 2012 -0700 sparc64: Fix bootup crash on sun4v. commit 9e0daff30fd7ecf698e5d20b0fa7f851e427cca5 upstream. The DS driver registers as a subsys_initcall() but this can be too early, in particular this risks registering before we've had a chance to allocate and setup module_kset in kernel/params.c which is performed also as a subsyts_initcall(). Register DS using device_initcall() insteal. Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 9a6039905c7a35c89517a9483c9963628be167b0 Author: Johan Hovold Date: Thu Mar 15 14:48:40 2012 +0100 Bluetooth: hci_ldisc: fix NULL-pointer dereference on tty_close commit 33b69bf80a3704d45341928e4ff68b6ebd470686 upstream. Do not close protocol driver until device has been unregistered. This fixes a race between tty_close and hci_dev_open which can result in a NULL-pointer dereference. The line discipline closes the protocol driver while we may still have hci_dev_open sleeping on the req_lock mutex resulting in a NULL-pointer dereference when lock is acquired and hci_init_req called. Bug is 100% reproducible using hciattach and a disconnected serial port: 0. # hciattach -n ttyO1 any noflow 1. hci_dev_open called from hci_power_on grabs req lock 2. hci_init_req executes but device fails to initialise (times out eventually) 3. hci_dev_open is called from hci_sock_ioctl and sleeps on req lock 4. hci_uart_tty_close detaches protocol driver and cancels init req 5. hci_dev_open (1) releases req lock 6. hci_dev_open (3) grabs req lock, calls hci_init_req, which triggers oops when request is prepared in hci_uart_send_frame [ 137.201263] Unable to handle kernel NULL pointer dereference at virtual address 00000028 [ 137.209838] pgd = c0004000 [ 137.212677] [00000028] *pgd=00000000 [ 137.216430] Internal error: Oops: 17 [#1] [ 137.220642] Modules linked in: [ 137.223846] CPU: 0 Tainted: G W (3.3.0-rc6-dirty #406) [ 137.230529] PC is at __lock_acquire+0x5c/0x1ab0 [ 137.235290] LR is at lock_acquire+0x9c/0x128 [ 137.239776] pc : [] lr : [] psr: 20000093 [ 137.239776] sp : cf869dd8 ip : c0529554 fp : c051c730 [ 137.251800] r10: 00000000 r9 : cf8673c0 r8 : 00000080 [ 137.257293] r7 : 00000028 r6 : 00000002 r5 : 00000000 r4 : c053fd70 [ 137.264129] r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : 00000001 [ 137.270965] Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel [ 137.278717] Control: 10c5387d Table: 8f0f4019 DAC: 00000015 [ 137.284729] Process kworker/u:1 (pid: 7, stack limit = 0xcf8682e8) [ 137.291229] Stack: (0xcf869dd8 to 0xcf86a000) [ 137.295776] 9dc0: c0529554 00000000 [ 137.304351] 9de0: cf8673c0 cf868000 d03ea1ef cf868000 000001ef 00000470 00000000 00000002 [ 137.312927] 9e00: cf8673c0 00000001 c051c730 c00716ec 0000000c 00000440 c0529554 00000001 [ 137.321533] 9e20: c051c730 cf868000 d03ea1f3 00000000 c053b978 00000000 00000028 cf868000 [ 137.330078] 9e40: 00000000 00000000 00000002 00000000 00000000 c00733f8 00000002 00000080 [ 137.338684] 9e60: 00000000 c02a1d50 00000000 00000001 60000013 c0969a1c 60000093 c053b96c [ 137.347259] 9e80: 00000002 00000018 20000013 c02a1d50 cf0ac000 00000000 00000002 cf868000 [ 137.355834] 9ea0: 00000089 c0374130 00000002 00000000 c02a1d50 cf0ac000 0000000c cf0fc540 [ 137.364410] 9ec0: 00000018 c02a1d50 cf0fc540 00000000 cf0fc540 c0282238 c028220c cf178d80 [ 137.372985] 9ee0: 127525d8 c02821cc 9a1fa451 c032727c 9a1fa451 127525d8 cf0fc540 cf0ac4ec [ 137.381561] 9f00: cf0ac000 cf0fc540 cf0ac584 c03285f4 c0328580 cf0ac4ec cf85c740 c05510cc [ 137.390136] 9f20: ce825400 c004c914 00000002 00000000 c004c884 ce8254f5 cf869f48 00000000 [ 137.398712] 9f40: c0328580 ce825415 c0a7f914 c061af64 00000000 c048cf3c cf8673c0 cf85c740 [ 137.407287] 9f60: c05510cc c051a66c c05510ec c05510c4 cf85c750 cf868000 00000089 c004d6ac [ 137.415863] 9f80: 00000000 c0073d14 00000001 cf853ed8 cf85c740 c004d558 00000013 00000000 [ 137.424438] 9fa0: 00000000 00000000 00000000 c00516b0 00000000 00000000 cf85c740 00000000 [ 137.433013] 9fc0: 00000001 dead4ead ffffffff ffffffff c0551674 00000000 00000000 c0450aa4 [ 137.441589] 9fe0: cf869fe0 cf869fe0 cf853ed8 c005162c c0013b30 c0013b30 00ffff00 00ffff00 [ 137.450164] [] (__lock_acquire+0x5c/0x1ab0) from [] (lock_acquire+0x9c/0x128) [ 137.459503] [] (lock_acquire+0x9c/0x128) from [] (_raw_spin_lock_irqsave+0x44/0x58) [ 137.469360] [] (_raw_spin_lock_irqsave+0x44/0x58) from [] (skb_queue_tail+0x18/0x48) [ 137.479339] [] (skb_queue_tail+0x18/0x48) from [] (h4_enqueue+0x2c/0x34) [ 137.488189] [] (h4_enqueue+0x2c/0x34) from [] (hci_uart_send_frame+0x34/0x68) [ 137.497497] [] (hci_uart_send_frame+0x34/0x68) from [] (hci_send_frame+0x50/0x88) [ 137.507171] [] (hci_send_frame+0x50/0x88) from [] (hci_cmd_work+0x74/0xd4) [ 137.516204] [] (hci_cmd_work+0x74/0xd4) from [] (process_one_work+0x1a0/0x4ec) [ 137.525604] [] (process_one_work+0x1a0/0x4ec) from [] (worker_thread+0x154/0x344) [ 137.535278] [] (worker_thread+0x154/0x344) from [] (kthread+0x84/0x90) [ 137.543975] [] (kthread+0x84/0x90) from [] (kernel_thread_exit+0x0/0x8) [ 137.552734] Code: e59f4e5c e5941000 e3510000 0a000031 (e5971000) [ 137.559234] ---[ end trace 1b75b31a2719ed1e ]--- Signed-off-by: Johan Hovold Acked-by: Marcel Holtmann Signed-off-by: Johan Hedberg Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit c74bc8b772076fc6b0b54a82ab3d100142ea9a99 Author: Jun Nie Date: Tue Dec 7 14:03:38 2010 +0800 Bluetooth: add NULL pointer check in HCI commit d9319560b86839506c2011346b1f2e61438a3c73 upstream. If we fail to find a hci device pointer in hci_uart, don't try to deref the NULL one we do have. Signed-off-by: Jun Nie Signed-off-by: Gustavo F. Padovan Signed-off-by: Willy Tarreau commit ab81b6abd9a1cdac0d6e84515cc808f735fc78b7 Author: Salman Qazi Date: Fri Mar 9 16:41:01 2012 -0800 sched/x86: Fix overflow in cyc2ns_offset commit 9993bc635d01a6ee7f6b833b4ee65ce7c06350b1 upstream. When a machine boots up, the TSC generally gets reset. However, when kexec is used to boot into a kernel, the TSC value would be carried over from the previous kernel. The computation of cycns_offset in set_cyc2ns_scale is prone to an overflow, if the machine has been up more than 208 days prior to the kexec. The overflow happens when we multiply *scale, even though there is enough room to store the final answer. We fix this issue by decomposing tsc_now into the quotient and remainder of division by CYC2NS_SCALE_FACTOR and then performing the multiplication separately on the two components. Refactor code to share the calculation with the previous fix in __cycles_2_ns(). Signed-off-by: Salman Qazi Acked-by: John Stultz Acked-by: Peter Zijlstra Cc: Paul Turner Cc: john stultz Link: http://lkml.kernel.org/r/20120310004027.19291.88460.stgit@dungbeetle.mtv.corp.google.com Signed-off-by: Ingo Molnar Cc: Mike Galbraith Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 0d1e8b8281ae7b43be54ec3409d299c99d7c975d Author: Dan Carpenter Date: Wed Jan 18 12:56:02 2012 +0300 nfsd: don't allow zero length strings in cache_parse() commit 6d8d17499810479eabd10731179c04b2ca22152f upstream. There is no point in passing a zero length string here and quite a few of that cache_parse() implementations will Oops if count is zero. Signed-off-by: Dan Carpenter Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit df2c68454e160d1635e1c1279036d573ce804d51 Author: Jan Kara Date: Thu Mar 15 09:34:02 2012 +0000 xfs: Fix oops on IO error during xlog_recover_process_iunlinks() commit d97d32edcd732110758799ae60af725e5110b3dc upstream. When an IO error happens during inode deletion run from xlog_recover_process_iunlinks() filesystem gets shutdown. Thus any subsequent attempt to read buffers fails. Code in xlog_recover_process_iunlinks() does not count with the fact that read of a buffer which was read a while ago can really fail which results in the oops on agi = XFS_BUF_TO_AGI(agibp); Fix the problem by cleaning up the buffer handling in xlog_recover_process_iunlinks() as suggested by Dave Chinner. We release buffer lock but keep buffer reference to AG buffer. That is enough for buffer to stay pinned in memory and we don't have to call xfs_read_agi() all the time. Signed-off-by: Jan Kara Reviewed-by: Dave Chinner Signed-off-by: Ben Myers Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 2e3380e6c1a74e41b94f6cc2a8c7e200ccbc81dc Author: Theodore Ts'o Date: Sun Mar 11 23:30:16 2012 -0400 ext4: check for zero length extent commit 31d4f3a2f3c73f279ff96a7135d7202ef6833f12 upstream. Explicitly test for an extent whose length is zero, and flag that as a corrupted extent. This avoids a kernel BUG_ON assertion failure. Tested: Without this patch, the file system image found in tests/f_ext_zero_len/image.gz in the latest e2fsprogs sources causes a kernel panic. With this patch, an ext4 file system error is noted instead, and the file system is marked as being corrupted. https://bugzilla.kernel.org/show_bug.cgi?id=42859 Signed-off-by: "Theodore Ts'o" Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit df79b5ef0bb8426ca21ece3ff9128e03d556c00d Author: Trond Myklebust Date: Mon Mar 19 13:39:35 2012 -0400 SUNRPC: We must not use list_for_each_entry_safe() in rpc_wake_up() commit 540a0f7584169651f485e8ab67461fcb06934e38 upstream. The problem is that for the case of priority queues, we have to assume that __rpc_remove_wait_queue_priority will move new elements from the tk_wait.links lists into the queue->tasks[] list. We therefore cannot use list_for_each_entry_safe() on queue->tasks[], since that will skip these new tasks that __rpc_remove_wait_queue_priority is adding. Without this fix, rpc_wake_up and rpc_wake_up_status will both fail to wake up all functions on priority wait queues, which can result in some nasty hangs. Reported-by: Andy Adamson Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 58c37cca51fc5976dbaa7e62c18da072e7a8e622 Author: Sasha Levin Date: Thu Mar 15 12:36:14 2012 -0400 ntp: Fix integer overflow when setting time commit a078c6d0e6288fad6d83fb6d5edd91ddb7b6ab33 upstream. 'long secs' is passed as divisor to div_s64, which accepts a 32bit divisor. On 64bit machines that value is trimmed back from 8 bytes back to 4, causing a divide by zero when the number is bigger than (1 << 32) - 1 and all 32 lower bits are 0. Use div64_long() instead. Signed-off-by: Sasha Levin Cc: johnstul@us.ibm.com Link: http://lkml.kernel.org/r/1331829374-31543-2-git-send-email-levinsasha928@gmail.com Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman [WT: div64_long() does not exist on 2.6.32 and needs a deeper backport than desired. Instead, address the issue by controlling that the divisor is correct for use as an s32 divisor] Signed-off-by: Willy Tarreau commit f12847e2fa97bd4c33379d02492e99b47687db7f Author: Greg Kroah-Hartman Date: Tue Feb 28 09:20:09 2012 -0800 USB: ftdi_sio: fix problem when the manufacture is a NULL string commit 656d2b3964a9d0f9864d472f8dfa2dd7dd42e6c0 upstream. On some misconfigured ftdi_sio devices, if the manufacturer string is NULL, the kernel will oops when the device is plugged in. This patch fixes the problem. Reported-by: Wojciech M Zabolotny Tested-by: Wojciech M Zabolotny Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 9d899d1facbd52795a6a94c667439000ab9f2bca Author: Ryusuke Konishi Date: Fri Mar 16 17:08:39 2012 -0700 nilfs2: fix NULL pointer dereference in nilfs_load_super_block() commit d7178c79d9b7c5518f9943188091a75fc6ce0675 upstream. According to the report from Slicky Devil, nilfs caused kernel oops at nilfs_load_super_block function during mount after he shrank the partition without resizing the filesystem: BUG: unable to handle kernel NULL pointer dereference at 00000048 IP: [] nilfs_load_super_block+0x17e/0x280 [nilfs2] *pde = 00000000 Oops: 0000 [#1] PREEMPT SMP ... Call Trace: [] init_nilfs+0x4b/0x2e0 [nilfs2] [] nilfs_mount+0x447/0x5b0 [nilfs2] [] mount_fs+0x36/0x180 [] vfs_kern_mount+0x51/0xa0 [] do_kern_mount+0x3e/0xe0 [] do_mount+0x169/0x700 [] sys_mount+0x6b/0xa0 [] sysenter_do_call+0x12/0x28 Code: 53 18 8b 43 20 89 4b 18 8b 4b 24 89 53 1c 89 43 24 89 4b 20 8b 43 20 c7 43 2c 00 00 00 00 23 75 e8 8b 50 68 89 53 28 8b 54 b3 20 <8b> 72 48 8b 7a 4c 8b 55 08 89 b3 84 00 00 00 89 bb 88 00 00 00 EIP: [] nilfs_load_super_block+0x17e/0x280 [nilfs2] SS:ESP 0068:ca9bbdcc CR2: 0000000000000048 This turned out due to a defect in an error path which runs if the calculated location of the secondary super block was invalid. This patch fixes it and eliminates the reported oops. Reported-by: Slicky Devil Signed-off-by: Ryusuke Konishi Tested-by: Slicky Devil Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 3aeff2407453bf5f2c85b760df870f87df77c2e1 Author: Dan Carpenter Date: Sat Mar 3 12:09:17 2012 +0100 block, sx8: fix pointer math issue getting fw version commit ea5f4db8ece896c2ab9eafa0924148a2596c52e4 upstream. "mem" is type u8. We need parenthesis here or it screws up the pointer math probably leading to an oops. Signed-off-by: Dan Carpenter Acked-by: Jeff Garzik Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit d67f05eb15321bcfaa62d544a0f11d0015736caf Author: Eric Dumazet Date: Thu Feb 23 10:55:02 2012 +0000 ipsec: be careful of non existing mac headers [ Upstream commit 03606895cd98c0a628b17324fd7b5ff15db7e3cd ] Niccolo Belli reported ipsec crashes in case we handle a frame without mac header (atm in his case) Before copying mac header, better make sure it is present. Bugzilla reference: https://bugzilla.kernel.org/show_bug.cgi?id=42809 Reported-by: Niccolò Belli Tested-by: Niccolò Belli Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 3a40b4fe7b6f8b25ac3dc7af13e374ca56d9dfc8 Author: Thomas Gleixner Date: Fri Mar 9 20:55:10 2012 +0100 x86: Derandom delay_tsc for 64 bit commit a7f4255f906f60f72e00aad2fb000939449ff32e upstream. Commit f0fbf0abc093 ("x86: integrate delay functions") converted delay_tsc() into a random delay generator for 64 bit. The reason is that it merged the mostly identical versions of delay_32.c and delay_64.c. Though the subtle difference of the result was: static void delay_tsc(unsigned long loops) { - unsigned bclock, now; + unsigned long bclock, now; Now the function uses rdtscl() which returns the lower 32bit of the TSC. On 32bit that's not problematic as unsigned long is 32bit. On 64 bit this fails when the lower 32bit are close to wrap around when bclock is read, because the following check if ((now - bclock) >= loops) break; evaluated to true on 64bit for e.g. bclock = 0xffffffff and now = 0 because the unsigned long (now - bclock) of these values results in 0xffffffff00000001 which is definitely larger than the loops value. That explains Tvortkos observation: "Because I am seeing udelay(500) (_occasionally_) being short, and that by delaying for some duration between 0us (yep) and 491us." Make those variables explicitely u32 again, so this works for both 32 and 64 bit. Reported-by: Tvrtko Ursulin Signed-off-by: Thomas Gleixner Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 386ca3f0f5c40110fa0d37cbab3dc831227c1c6e Author: David S. Miller Date: Fri Nov 12 13:35:00 2010 -0800 tcp: Don't change unlocked socket state in tcp_v4_err(). commit 8f49c2703b33519aaaccc63f571b465b9d2b3a2d upstream. Alexey Kuznetsov noticed a regression introduced by commit f1ecd5d9e7366609d640ff4040304ea197fbc618 ("Revert Backoff [v3]: Revert RTO on ICMP destination unreachable") The RTO and timer modification code added to tcp_v4_err() doesn't check sock_owned_by_user(), which if true means we don't have exclusive access to the socket and therefore cannot modify it's critical state. Just skip this new code block if sock_owned_by_user() is true and eliminate the now superfluous sock_owned_by_user() code block contained within. Reported-by: Alexey Kuznetsov Signed-off-by: David S. Miller CC: Damian Lukowski Acked-by: Eric Dumazet Cc: Ben Hutchings Signed-off-by: Willy Tarreau commit 66174b0e459d78b9bcb77ee37ecff25404f91140 Author: Oleg Nesterov Date: Mon Apr 9 21:03:50 2012 +0200 cred: copy_process() should clear child->replacement_session_keyring commit 79549c6dfda0603dba9a70a53467ce62d9335c33 upstream keyctl_session_to_parent(task) sets ->replacement_session_keyring, it should be processed and cleared by key_replace_session_keyring(). However, this task can fork before it notices TIF_NOTIFY_RESUME and the new child gets the bogus ->replacement_session_keyring copied by dup_task_struct(). This is obviously wrong and, if nothing else, this leads to put_cred(already_freed_cred). change copy_creds() to clear this member. If copy_process() fails before this point the wrong ->replacement_session_keyring doesn't matter, exit_creds() won't be called. Cc: Signed-off-by: Oleg Nesterov Acked-by: David Howells Signed-off-by: Linus Torvalds Signed-off-by: Willy Tarreau commit 092c87ce9f6c7bba0affc148603c5ef50f03e8fb Author: Greg Kroah-Hartman Date: Fri May 4 12:09:39 2012 -0700 hfsplus: Fix potential buffer overflows commit 6f24f892871acc47b40dd594c63606a17c714f77 upstream Commit ec81aecb2966 ("hfs: fix a potential buffer overflow") fixed a few potential buffer overflows in the hfs filesystem. But as Timo Warns pointed out, these changes also need to be made on the hfsplus filesystem as well. Reported-by: Timo Warns Acked-by: WANG Cong Cc: Alexey Khoroshilov Cc: Miklos Szeredi Cc: Sage Weil Cc: Eugene Teo Cc: Roman Zippel Cc: Al Viro Cc: Christoph Hellwig Cc: Alexey Dobriyan Cc: Dave Anderson Cc: stable Cc: Andrew Morton Signed-off-by: Greg Kroah-Hartman Signed-off-by: Linus Torvalds [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit 4b3eb7bc659a5f91238ae9f16f6d0f7929578fbf Author: Jeff Mahoney Date: Wed Apr 25 14:32:09 2012 +0000 dl2k: Clean up rio_ioctl commit 1bb57e940e1958e40d51f2078f50c3a96a9b2d75 upstream The dl2k driver's rio_ioctl call has a few issues: - No permissions checking - Implements SIOCGMIIREG and SIOCGMIIREG using the SIOCDEVPRIVATE numbers - Has a few ioctls that may have been used for debugging at one point but have no place in the kernel proper. This patch removes all but the MII ioctls, renumbers them to use the standard ones, and adds the proper permission check for SIOCSMIIREG. We can also get rid of the dl2k-specific struct mii_data in favor of the generic struct mii_ioctl_data. Since we have the phyid on hand, we can add the SIOCGMIIPHY ioctl too. Most of the MII code for the driver could probably be converted to use the generic MII library but I don't have a device to test the results. Reported-by: Stephan Mueller Signed-off-by: Jeff Mahoney Signed-off-by: David S. Miller [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit aecd9f48987f2e3edb068e1dcc308d28c70c5023 Author: Francois Romieu Date: Sun Aug 21 18:32:05 2011 +0200 dl2k: use standard #defines from mii.h. commit 78f6a6bd89e9a33e4be1bc61e6990a1172aa396e upstream Signed-off-by: Francois Romieu [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit facdf5857a87c29cd2703ea582998d1f0088a9b4 Author: Jason Wang Date: Wed May 30 21:18:10 2012 +0000 net: sock: validate data_len before allocating skb in sock_alloc_send_pskb() commit cc9b17ad29ecaa20bfe426a8d4dbfb94b13ff1cc upstream We need to validate the number of pages consumed by data_len, otherwise frags array could be overflowed by userspace. So this patch validate data_len and return -EMSGSIZE when data_len may occupies more frags than MAX_SKB_FRAGS. Signed-off-by: Jason Wang Signed-off-by: David S. Miller [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit 989e1ffdfe56a8607fdf89918ca4e494841c02cd Author: David Gibson Date: Wed Mar 21 16:34:12 2012 -0700 hugepages: fix use after free bug in "quota" handling commit 90481622d75715bfcb68501280a917dbfe516029 upstream hugetlbfs_{get,put}_quota() are badly named. They don't interact with the general quota handling code, and they don't much resemble its behaviour. Rather than being about maintaining limits on on-disk block usage by particular users, they are instead about maintaining limits on in-memory page usage (including anonymous MAP_PRIVATE copied-on-write pages) associated with a particular hugetlbfs filesystem instance. Worse, they work by having callbacks to the hugetlbfs filesystem code from the low-level page handling code, in particular from free_huge_page(). This is a layering violation of itself, but more importantly, if the kernel does a get_user_pages() on hugepages (which can happen from KVM amongst others), then the free_huge_page() can be delayed until after the associated inode has already been freed. If an unmount occurs at the wrong time, even the hugetlbfs superblock where the "quota" limits are stored may have been freed. Andrew Barry proposed a patch to fix this by having hugepages, instead of storing a pointer to their address_space and reaching the superblock from there, had the hugepages store pointers directly to the superblock, bumping the reference count as appropriate to avoid it being freed. Andrew Morton rejected that version, however, on the grounds that it made the existing layering violation worse. This is a reworked version of Andrew's patch, which removes the extra, and some of the existing, layering violation. It works by introducing the concept of a hugepage "subpool" at the lower hugepage mm layer - that is a finite logical pool of hugepages to allocate from. hugetlbfs now creates a subpool for each filesystem instance with a page limit set, and a pointer to the subpool gets added to each allocated hugepage, instead of the address_space pointer used now. The subpool has its own lifetime and is only freed once all pages in it _and_ all other references to it (i.e. superblocks) are gone. subpools are optional - a NULL subpool pointer is taken by the code to mean that no subpool limits are in effect. Previous discussion of this bug found in: "Fix refcounting in hugetlbfs quota handling.". See: https://lkml.org/lkml/2011/8/11/28 or http://marc.info/?l=linux-mm&m=126928970510627&w=1 v2: Fixed a bug spotted by Hillf Danton, and removed the extra parameter to alloc_huge_page() - since it already takes the vma, it is not necessary. Signed-off-by: Andrew Barry Signed-off-by: David Gibson Cc: Hugh Dickins Cc: Mel Gorman Cc: Minchan Kim Cc: Hillf Danton Cc: Paul Mackerras Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit c15f065adeb5ecf241308c720d4f8b1f7953dce2 Author: Jonghwan Choi Date: Wed Apr 18 17:23:04 2012 -0400 security: fix compile error in commoncap.c commit 51b79bee627d526199b2f6a6bef8ee0c0739b6d1 upstream Add missing "personality.h" security/commoncap.c: In function 'cap_bprm_set_creds': security/commoncap.c:510: error: 'PER_CLEAR_ON_SETID' undeclared (first use in this function) security/commoncap.c:510: error: (Each undeclared identifier is reported only once security/commoncap.c:510: error: for each function it appears in.) Signed-off-by: Jonghwan Choi Acked-by: Serge Hallyn Signed-off-by: James Morris [dannf: adjusted to apply to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit 39c46db9e053fa49c3e0b878cdb25dbc2f7be48d Author: Eric Paris Date: Tue Apr 17 16:26:54 2012 -0400 fcaps: clear the same personality flags as suid when fcaps are used commit d52fc5dde171f030170a6cb78034d166b13c9445 upstream If a process increases permissions using fcaps all of the dangerous personality flags which are cleared for suid apps should also be cleared. Thus programs given priviledge with fcaps will continue to have address space randomization enabled even if the parent tried to disable it to make it easier to attack. Signed-off-by: Eric Paris Reviewed-by: Serge Hallyn Signed-off-by: James Morris Signed-off-by: Willy Tarreau commit 595cf5bda4ee3944d8c7c45e9e91a9ec15b3181d Author: Jan Kara Date: Wed Jan 11 18:52:10 2012 +0000 xfs: Fix missing xfs_iunlock() on error recovery path in xfs_readlink() commit 9b025eb3a89e041bab6698e3858706be2385d692 upstream. Commit b52a360b forgot to call xfs_iunlock() when it detected corrupted symplink and bailed out. Fix it by jumping to 'out' instead of doing return. CC: stable@kernel.org CC: Carlos Maiolino Signed-off-by: Jan Kara Reviewed-by: Alex Elder Reviewed-by: Dave Chinner Signed-off-by: Ben Myers Signed-off-by: Willy Tarreau commit 19b280528cc21229990a7452d94239c7c8c5d609 Author: Carlos Maiolino Date: Mon Nov 7 16:10:24 2011 +0000 xfs: Fix possible memory corruption in xfs_readlink commit b52a360b2aa1c59ba9970fb0f52bbb093fcc7a24 upstream Fixes a possible memory corruption when the link is larger than MAXPATHLEN and XFS_DEBUG is not enabled. This also remove the S_ISLNK assert, since the inode mode is checked previously in xfs_readlink_by_handle() and via VFS. Updated to address concerns raised by Ben Hutchings about the loose attention paid to 32- vs 64-bit values, and the lack of handling a potentially negative pathlen value: - Changed type of "pathlen" to be xfs_fsize_t, to match that of ip->i_d.di_size - Added checking for a negative pathlen to the too-long pathlen test, and generalized the message that gets reported in that case to reflect the change As a result, if a negative pathlen were encountered, this function would return EFSCORRUPTED (and would fail an assertion for a debug build)--just as would a too-long pathlen. Signed-off-by: Alex Elder Signed-off-by: Carlos Maiolino Reviewed-by: Christoph Hellwig [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit 575a0a98961a21c9b7c73a191dca1f9bd5d4e3c7 Author: Avi Kivity Date: Wed Apr 18 19:23:50 2012 +0300 KVM: ia64: fix build due to typo commit 8281715b4109b5ee26032ff7b77c0d575c4150f7 upstream. s/kcm/kvm/. Signed-off-by: Avi Kivity Signed-off-by: Marcelo Tosatti Signed-off-by: Willy Tarreau commit ac802e6ad84584366c71aa925d580383b1578409 Author: Avi Kivity Date: Mon Mar 5 14:23:29 2012 +0200 KVM: Ensure all vcpus are consistent with in-kernel irqchip settings commit 3e515705a1f46beb1c942bb8043c16f8ac7b1e9e upstream If some vcpus are created before KVM_CREATE_IRQCHIP, then irqchip_in_kernel() and vcpu->arch.apic will be inconsistent, leading to potential NULL pointer dereferences. Fix by: - ensuring that no vcpus are installed when KVM_CREATE_IRQCHIP is called - ensuring that a vcpu has an apic if it is installed after KVM_CREATE_IRQCHIP This is somewhat long winded because vcpu->arch.apic is created without kvm->lock held. Based on earlier patch by Michael Ellerman. Signed-off-by: Michael Ellerman Signed-off-by: Avi Kivity [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit 49ef5906f1fff5dfc01561e8355561a512ac9b85 Author: Marcelo Tosatti Date: Thu Oct 29 13:44:15 2009 -0200 KVM: x86: disallow multiple KVM_CREATE_IRQCHIP commit 3ddea128ad75bd33e88780fe44f44c3717369b98 upstream Otherwise kvm will leak memory on multiple KVM_CREATE_IRQCHIP. Also serialize multiple accesses with kvm->lock. Signed-off-by: Marcelo Tosatti Signed-off-by: Avi Kivity Signed-off-by: Willy Tarreau commit 6ace7773b07bb1949a54aad85e5c2f21b6c299d6 Author: Louis Rilling Date: Fri Dec 4 14:52:42 2009 +0100 block: Fix io_context leak after failure of clone with CLONE_IO commit b69f2292063d2caf37ca9aec7d63ded203701bf3 upstream With CLONE_IO, parent's io_context->nr_tasks is incremented, but never decremented whenever copy_process() fails afterwards, which prevents exit_io_context() from calling IO schedulers exit functions. Give a task_struct to exit_io_context(), and call exit_io_context() instead of put_io_context() in copy_process() cleanup path. Signed-off-by: Louis Rilling Signed-off-by: Jens Axboe Signed-off-by: Willy Tarreau commit 36cea38d1ed7fb6d602913824bb75ceb42c0b4c0 Author: Louis Rilling Date: Fri Dec 4 14:52:41 2009 +0100 block: Fix io_context leak after clone with CLONE_IO commit 61cc74fbb87af6aa551a06a370590c9bc07e29d9 upstream With CLONE_IO, copy_io() increments both ioc->refcount and ioc->nr_tasks. However exit_io_context() only decrements ioc->refcount if ioc->nr_tasks reaches 0. Always call put_io_context() in exit_io_context(). Signed-off-by: Louis Rilling Signed-off-by: Jens Axboe Signed-off-by: Willy Tarreau commit b9a6939d5c580a3dfbcf37c0d617c7e3f181cbf1 Author: Stephan Bärwolf Date: Thu Jan 12 16:43:04 2012 +0100 KVM: x86: fix missing checks in syscall emulation commit c2226fc9e87ba3da060e47333657cd6616652b84 upstream On hosts without this patch, 32bit guests will crash (and 64bit guests may behave in a wrong way) for example by simply executing following nasm-demo-application: [bits 32] global _start SECTION .text _start: syscall (I tested it with winxp and linux - both always crashed) Disassembly of section .text: 00000000 <_start>: 0: 0f 05 syscall The reason seems a missing "invalid opcode"-trap (int6) for the syscall opcode "0f05", which is not available on Intel CPUs within non-longmodes, as also on some AMD CPUs within legacy-mode. (depending on CPU vendor, MSR_EFER and cpuid) Because previous mentioned OSs may not engage corresponding syscall target-registers (STAR, LSTAR, CSTAR), they remain NULL and (non trapping) syscalls are leading to multiple faults and finally crashs. Depending on the architecture (AMD or Intel) pretended by guests, various checks according to vendor's documentation are implemented to overcome the current issue and behave like the CPUs physical counterparts. [mtosatti: cleanup/beautify code] [bwh: Backport to 2.6.32: - Add the prerequisite read of EFER - Return -1 in the error cases rather than invoking emulate_ud() directly - Adjust context] [dannf: fix build by passing x86_emulate_ops through each call] Signed-off-by: Stephan Baerwolf Signed-off-by: Marcelo Tosatti Signed-off-by: Ben Hutchings Signed-off-by: Willy Tarreau commit a0c74fba63e4a6ba2c94192f5b79ac3a11cd70f1 Author: Stephan Bärwolf Date: Thu Jan 12 16:43:03 2012 +0100 KVM: x86: extend "struct x86_emulate_ops" with "get_cpuid" commit bdb42f5afebe208eae90406959383856ae2caf2b upstream In order to be able to proceed checks on CPU-specific properties within the emulator, function "get_cpuid" is introduced. With "get_cpuid" it is possible to virtually call the guests "cpuid"-opcode without changing the VM's context. [mtosatti: cleanup/beautify code] [bwh: Backport to 2.6.32: - Don't use emul_to_vcpu - Adjust context] Signed-off-by: Stephan Baerwolf Signed-off-by: Marcelo Tosatti Signed-off-by: Ben Hutchings Signed-off-by: Willy Tarreau commit db9faaa340c78618685038d36e368de738ed3d16 Author: Ben Hutchings Date: Sun Mar 20 06:48:05 2011 +0000 rose: Add length checks to CALL_REQUEST parsing commit e0bccd315db0c2f919e7fcf9cb60db21d9986f52 upstream Define some constant offsets for CALL_REQUEST based on the description at and the definition of ROSE as using 10-digit (5-byte) addresses. Use them consistently. Validate all implicit and explicit facilities lengths. Validate the address length byte rather than either trusting or assuming its value. Signed-off-by: Ben Hutchings Signed-off-by: David S. Miller [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit 8949ae789eae5b9fba37c1addcde4d5bbd0a509d Author: Jan Kiszka Date: Wed Dec 14 19:25:13 2011 +0100 KVM: x86: Prevent starting PIT timers in the absence of irqchip support commit 0924ab2cfa98b1ece26c033d696651fd62896c69 upstream User space may create the PIT and forgets about setting up the irqchips. In that case, firing PIT IRQs will crash the host: BUG: unable to handle kernel NULL pointer dereference at 0000000000000128 IP: [] kvm_set_irq+0x30/0x170 [kvm] ... Call Trace: [] pit_do_work+0x51/0xd0 [kvm] [] process_one_work+0x111/0x4d0 [] worker_thread+0x152/0x340 [] kthread+0x7e/0x90 [] kernel_thread_helper+0x4/0x10 Prevent this by checking the irqchip mode before starting a timer. We can't deny creating the PIT if the irqchips aren't set up yet as current user land expects this order to work. Signed-off-by: Jan Kiszka Signed-off-by: Marcelo Tosatti [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit e642da01ccb67c419a26b262a067db8e8bbdb04c Author: Alex Williamson Date: Tue Dec 20 21:59:09 2011 -0700 KVM: Device assignment permission checks commit 3d27e23b17010c668db311140b17bbbb70c78fb9 upstream Only allow KVM device assignment to attach to devices which: - Are not bridges - Have BAR resources (assume others are special devices) - The user has permissions to use Assigning a bridge is a configuration error, it's not supported, and typically doesn't result in the behavior the user is expecting anyway. Devices without BAR resources are typically chipset components that also don't have host drivers. We don't want users to hold such devices captive or cause system problems by fencing them off into an iommu domain. We determine "permission to use" by testing whether the user has access to the PCI sysfs resource files. By default a normal user will not have access to these files, so it provides a good indication that an administration agent has granted the user access to the device. [Yang Bai: add missing #include] [avi: fix comment style] Signed-off-by: Alex Williamson Signed-off-by: Yang Bai Signed-off-by: Marcelo Tosatti [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit 4d092c1fe004f512a70dbe66efd50cd8cf15e66c Author: Alex Williamson Date: Tue Dec 20 21:59:03 2011 -0700 KVM: Remove ability to assign a device without iommu support commit 423873736b78f549fbfa2f715f2e4de7e6c5e1e9 upstream This option has no users and it exposes a security hole that we can allow devices to be assigned without iommu protection. Make KVM_DEV_ASSIGN_ENABLE_IOMMU a mandatory option. Signed-off-by: Alex Williamson Signed-off-by: Marcelo Tosatti [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit 46e8a56a531183b0f4ff5cb08d28c2660e60fab8 Author: Willy Tarreau Date: Sun Sep 30 18:22:21 2012 +0200 PNP: fix "work around Dell 1536/1546 BIOS MMCONFIG bug that breaks USB" Initial stable commit : 2215d91091c465fd58da7814d1c10e09ac2d8307 This patch backported into 2.6.32.55 is enabled when CONFIG_AMD_NB is set, but this config option does not exist in 2.6.32, it was called CONFIG_K8_NB, so the fix was never applied. Some other changes were needed to make it work. first, the correct include file name was asm/k8.h and not asm/amd_nb.h, and second, amd_get_mmconfig_range() is needed and was merged by previous patch. Thanks to Jiri Slabi who reported the issue and diagnosed all the dependencies. Signed-off-by: Willy Tarreau Cc: Jiri Slaby Cc: Bjorn Helgaas Cc: Jesse Barnes Signed-off-by: Willy Tarreau commit 08dd6464bb0da8386733d7121b67bc896c9f84ba Author: Bjorn Helgaas Date: Thu Jan 5 14:27:19 2012 -0700 x86/PCI: amd: factor out MMCONFIG discovery commit 24d25dbfa63c376323096660bfa9ad45a08870ce upstream. This factors out the AMD native MMCONFIG discovery so we can use it outside amd_bus.c. amd_bus.c reads AMD MSRs so it can remove the MMCONFIG area from the PCI resources. We may also need the MMCONFIG information to work around BIOS defects in the ACPI MCFG table. Cc: Borislav Petkov Cc: Yinghai Lu Cc: stable@kernel.org Signed-off-by: Bjorn Helgaas Signed-off-by: Jesse Barnes [WT: this patch was initially not planned for 2.6.32 but is required by commit 2215d910 merged into 2.6.32.55 and which relies on amd_get_mmconfig_range() ] Cc: Jiri Slaby Signed-off-by: Willy Tarreau commit 2c224ae6d1be39d99ea53fa9acd47bec111dbd68 Author: Mike Galbraith Date: Wed Jan 5 05:41:17 2011 +0100 sched: Fix signed unsigned comparison in check_preempt_tick() commit d7d8294415f0ce4254827d4a2a5ee88b00be52a8 upstream. Signed unsigned comparison may lead to superfluous resched if leftmost is right of the current task, wasting a few cycles, and inadvertently _lengthening_ the current task's slice. Reported-by: Venkatesh Pallipadi Signed-off-by: Mike Galbraith Signed-off-by: Peter Zijlstra LKML-Reference: <1294202477.9384.5.camel@marge.simson.net> Signed-off-by: Ingo Molnar Signed-off-by: Willy Tarreau commit 0b65cf58957e0b6ca52143430871077e24868c6c Author: tom.leiming@gmail.com Date: Thu Mar 22 03:22:38 2012 +0000 usbnet: don't clear urb->dev in tx_complete commit 5d5440a835710d09f0ef18da5000541ec98b537a upstream. URB unlinking is always racing with its completion and tx_complete may be called before or during running usb_unlink_urb, so tx_complete must not clear urb->dev since it will be used in unlink path, otherwise invalid memory accesses or usb device leak may be caused inside usb_unlink_urb. Cc: stable@kernel.org Cc: Alan Stern Cc: Oliver Neukum Signed-off-by: Ming Lei Acked-by: Greg Kroah-Hartman Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau commit ca8a7e4fa73a4a97e46db2da62502152d6d32e5d Author: tom.leiming@gmail.com Date: Thu Mar 22 03:22:18 2012 +0000 usbnet: increase URB reference count before usb_unlink_urb commit 0956a8c20b23d429e79ff86d4325583fc06f9eb4 upstream. Commit 4231d47e6fe69f061f96c98c30eaf9fb4c14b96d(net/usbnet: avoid recursive locking in usbnet_stop()) fixes the recursive locking problem by releasing the skb queue lock, but it makes usb_unlink_urb racing with defer_bh, and the URB to being unlinked may be freed before or during calling usb_unlink_urb, so use-after-free problem may be triggerd inside usb_unlink_urb. The patch fixes the use-after-free problem by increasing URB reference count with skb queue lock held before calling usb_unlink_urb, so the URB won't be freed until return from usb_unlink_urb. Cc: stable@kernel.org Cc: Sebastian Andrzej Siewior Cc: Alan Stern Cc: Oliver Neukum Reported-by: Dave Jones Signed-off-by: Ming Lei Acked-by: Greg Kroah-Hartman Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau commit 51e8bde12aed35e57076f38579f47c7cc947f23f Author: Jiri Bohac Date: Tue Apr 19 02:09:55 2011 +0000 bonding: 802.3ad - fix agg_device_up commit 2430af8b7fa37ac0be102c77f9dc6ee669d24ba9 upstream. The slave member of struct aggregator does not necessarily point to a slave which is part of the aggregator. It points to the slave structure containing the aggregator structure, while completely different slaves (or no slaves at all) may be part of the aggregator. The agg_device_up() function wrongly uses agg->slave to find the state of the aggregator. Use agg->lag_ports->slave instead. The bug has been introduced by commit 4cd6fe1c6483cde93e2ec91f58b7af9c9eea51ad ("bonding: fix link down handling in 802.3ad mode"). Signed-off-by: Jiri Bohac Signed-off-by: Jay Vosburgh Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau commit a82ca695591da11eb8cb74d4cb57e8f476297932 Author: Xiaotian Feng Date: Thu Mar 3 18:08:24 2011 +0800 tty_audit: fix tty_audit_add_data live lock on audit disabled commit 00bff392c81e4fb1901e5160fdd5afdb2546a6ab upstream. The current tty_audit_add_data code: do { size_t run; run = N_TTY_BUF_SIZE - buf->valid; if (run > size) run = size; memcpy(buf->data + buf->valid, data, run); buf->valid += run; data += run; size -= run; if (buf->valid == N_TTY_BUF_SIZE) tty_audit_buf_push_current(buf); } while (size != 0); If the current buffer is full, kernel will then call tty_audit_buf_push_current to empty the buffer. But if we disabled audit at the same time, tty_audit_buf_push() returns immediately if audit_enabled is zero. Without emptying the buffer. With obvious effect on tty_audit_add_data() that ends up spinning in that loop, copying 0 bytes at each iteration and attempting to push each time without any effect. Holding the lock all along. Suggested-by: Alexander Viro Signed-off-by: Xiaotian Feng Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 2f42878e56a734563c5e2aadee6e334477500cdf Author: Junxiao Bi Date: Wed Aug 22 10:21:07 2012 +0800 oprofile: use KM_NMI slot for kmap_atomic If one kernel path is using KM_USER0 slot and is interrupted by the oprofile nmi, then in copy_from_user_nmi(), the KM_USER0 slot will be overwrite and cleared to zero at last, when the control return to the original kernel path, it will access an invalid virtual address and trigger a crash. Cc: Robert Richter Cc: Greg KH Cc: stable@vger.kernel.org Signed-off-by: Junxiao Bi [WT: According to Junxiao and Robert, this patch is needed for stable kernels which include a backport of a0e3e70243f5b270bc3eca718f0a9fa5e6b8262e without 3e4d3af501cccdc8a8cca41bdbe57d54ad7e7e73, but there is no exact equivalent in mainline] Signed-off-by: Willy Tarreau commit 012660d6b30072633c89033d2b3e18a9a861ae3d Author: Colin Ian King Date: Thu Apr 5 13:31:58 2012 -0600 eCryptfs: Clear ECRYPTFS_NEW_FILE flag during truncate BugLink: http://bugs.launchpad.net/bugs/745836 The ECRYPTFS_NEW_FILE crypt_stat flag is set upon creation of a new eCryptfs file. When the flag is set, eCryptfs reads directly from the lower filesystem when bringing a page up to date. This means that no offset translation (for the eCryptfs file metadata in the lower file) and no decryption is performed. The flag is cleared just before the first write is completed (at the beginning of ecryptfs_write_begin()). It was discovered that if a new file was created and then extended with truncate, the ECRYPTFS_NEW_FILE flag was not cleared. If pages corresponding to this file are ever reclaimed, any subsequent reads would result in userspace seeing eCryptfs file metadata and encrypted file contents instead of the expected decrypted file contents. Data corruption is possible if the file is written to before the eCryptfs directory is unmounted. The data written will be copied into pages which have been read directly from the lower file rather than zeroed pages, as would be expected after extending the file with truncate. This flag, and the functionality that used it, was removed in upstream kernels in 2.6.39 with the following commits: bd4f0fe8bb7c73c738e1e11bc90d6e2cf9c6e20e fed8859b3ab94274c986cbdf7d27130e0545f02c Signed-off-by: Tyler Hicks Signed-off-by: Colin Ian King Acked-by: Stefan Bader Acked-by: Andy Whitcroft Signed-off-by: Andy Whitcroft Signed-off-by: Willy Tarreau commit 4f4bd209e4bf5a0ceea939ce706b6867ebf5a2c8 Author: Colin Ian King Date: Thu Apr 5 13:31:56 2012 -0600 eCryptfs: Copy up lower inode attrs after setting lower xattr commit 545d680938be1e86a6c5250701ce9abaf360c495 upstream. After passing through a ->setxattr() call, eCryptfs needs to copy the inode attributes from the lower inode to the eCryptfs inode, as they may have changed in the lower filesystem's ->setxattr() path. One example is if an extended attribute containing a POSIX Access Control List is being set. The new ACL may cause the lower filesystem to modify the mode of the lower inode and the eCryptfs inode would need to be updated to reflect the new mode. https://launchpad.net/bugs/926292 Signed-off-by: Tyler Hicks Reported-by: Sebastien Bacher Cc: John Johansen Cc: Acked-by: Herton Ronaldo Krzesinski Acked-by: Andy Whitcroft Signed-off-by: Colin Ian King Signed-off-by: Andy Whitcroft Signed-off-by: Willy Tarreau commit dad11ff11f3c2e70e54e9d92a914694427cc5bb1 Author: Stuart Hayes Date: Sun Jul 22 20:27:23 2012 +0100 usb: Fix deadlock in hid_reset when Dell iDRAC is reset This was fixed upstream by commit e22bee782b3b00bd4534ae9b1c5fb2e8e6573c5c ('workqueue: implement concurrency managed dynamic worker pool'), but that is far too large a change for stable. When Dell iDRAC is reset, the iDRAC's USB keyboard/mouse device stops responding but is not actually disconnected. This causes usbhid to hid hid_io_error(), and you get a chain of calls like... hid_reset() usb_reset_device() usb_reset_and_verify_device() usb_ep0_reinit() usb_disble_endpoint() usb_hcd_disable_endpoint() ehci_endpoint_disable() Along the way, as a result of an error/timeout with a USB transaction, ehci_clear_tt_buffer() calls usb_hub_clear_tt_buffer() (to clear a failed transaction out of the transaction translator in the hub), which schedules hub_tt_work() to be run (using keventd), and then sets qh->clearing_tt=1 so that nobody will mess with that qh until the TT is cleared. But run_workqueue() never happens for the keventd workqueue on that CPU, so hub_tt_work() never gets run. And qh->clearing_tt never gets changed back to 0. This causes ehci_endpoint_disable() to get stuck in a loop waiting for qh->clearing_tt to go to 0. Part of the problem is hid_reset() is itself running on keventd. So when that thread gets a timeout trying to talk to the HID device, it schedules clear_work (to run hub_tt_work) to run, and then gets stuck in ehci_endpoint_disable waiting for it to run. However, clear_work never gets run because the workqueue for that CPU is still waiting for hid_reset to finish. A much less invasive patch for earlier kernels is to just schedule clear_work on khubd if the usb code needs to clear the TT and it sees that it is already running on keventd. Khubd isn't used by default because it can get blocked by device enumeration sometimes, but I think it should be ok for a backup for unusual cases like this just to prevent deadlock. Signed-off-by: Stuart Hayes Signed-off-by: Shyam Iyer [bwh: Use current_is_keventd() rather than checking current->{flags,comm}] Signed-off-by: Ben Hutchings Signed-off-by: Willy Tarreau commit 3ca9407c5c2c246ed54197568e9559a33219387a Author: Adam Jackson Date: Fri Apr 16 18:20:57 2010 -0400 drm/i915: Attempt to fix watermark setup on 85x (v2) commit 8f4695ed1c9e068772bcce4cd4ff03f88d57a008 upstream. IS_MOBILE() catches 85x, so we'd always try to use the 9xx FIFO sizing; since there's an explicit 85x version, this seems wrong. v2: Handle 830m correctly too. [jn: backport to 2.6.32.y to address https://bugzilla.kernel.org/show_bug.cgi?id=42839] Signed-off-by: Adam Jackson Signed-off-by: Eric Anholt Signed-off-by: Jonathan Nieder Signed-off-by: Willy Tarreau commit 772e2e7d6b86d4db9ab4be85bc939846f74b32f3 Author: Dan Williams Date: Wed Mar 3 11:47:43 2010 -0700 ioat2: kill pending flag commit 281befa5592b0c5f9a3856b5666c62ac66d3d9ee upstream. The pending == 2 case no longer exists in the driver so, we can use ioat2_ring_pending() outside the lock to determine if there might be any descriptors in the ring that the hardware has not seen. Signed-off-by: Dan Williams Backported-by: Mike Galbraith Signed-off-by: Willy Tarreau commit 0f467fcfbdfbad2a9b1f06dd15729fd22b461bf0 Author: John Stultz Date: Mon Sep 17 22:32:10 2012 -0400 time: Move ktime_t overflow checking into timespec_valid_strict This is a -stable backport of cee58483cf56e0ba355fdd97ff5e8925329aa936 Andreas Bombe reported that the added ktime_t overflow checking added to timespec_valid in commit 4e8b14526ca7 ("time: Improve sanity checking of timekeeping inputs") was causing problems with X.org because it caused timeouts larger then KTIME_T to be invalid. Previously, these large timeouts would be clamped to KTIME_MAX and would never expire, which is valid. This patch splits the ktime_t overflow checking into a new timespec_valid_strict function, and converts the timekeeping codes internal checking to use this more strict function. Reported-and-tested-by: Andreas Bombe Cc: Zhouping Liu Cc: Ingo Molnar Cc: Prarit Bhargava Cc: Thomas Gleixner Cc: stable@vger.kernel.org Signed-off-by: John Stultz Signed-off-by: Linus Torvalds Cc: Linux Kernel Signed-off-by: John Stultz Signed-off-by: Willy Tarreau commit 61b76840ddee647c0c223365378c3f394355b7d7 Author: John Stultz Date: Mon Sep 17 22:32:09 2012 -0400 time: Avoid making adjustments if we haven't accumulated anything This is a -stable backport of bf2ac312195155511a0f79325515cbb61929898a If update_wall_time() is called and the current offset isn't large enough to accumulate, avoid re-calling timekeeping_adjust which may change the clock freq and can cause 1ns inconsistencies with CLOCK_REALTIME_COARSE/CLOCK_MONOTONIC_COARSE. Signed-off-by: John Stultz Cc: Prarit Bhargava Cc: Ingo Molnar Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1345595449-34965-5-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner Cc: Linux Kernel Signed-off-by: John Stultz Signed-off-by: Willy Tarreau commit 436b6be6cb5bbffb227b5a414e7a1c03fe941b43 Author: John Stultz Date: Mon Sep 17 22:32:08 2012 -0400 time: Improve sanity checking of timekeeping inputs This is a -stable backport of 4e8b14526ca7fb046a81c94002c1c43b6fdf0e9b Unexpected behavior could occur if the time is set to a value large enough to overflow a 64bit ktime_t (which is something larger then the year 2262). Also unexpected behavior could occur if large negative offsets are injected via adjtimex. So this patch improves the sanity check timekeeping inputs by improving the timespec_valid() check, and then makes better use of timespec_valid() to make sure we don't set the time to an invalid negative value or one that overflows ktime_t. Note: This does not protect from setting the time close to overflowing ktime_t and then letting natural accumulation cause the overflow. Reported-by: CAI Qian Reported-by: Sasha Levin Signed-off-by: John Stultz Cc: Peter Zijlstra Cc: Prarit Bhargava Cc: Zhouping Liu Cc: Ingo Molnar Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1344454580-17031-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner Cc: Linux Kernel Signed-off-by: John Stultz Signed-off-by: Willy Tarreau commit dfc4098e83abd567a4f4f17c2d20869cadcf9509 Author: Thomas Gleixner Date: Tue Jul 17 18:05:35 2012 -0400 timekeeping: Add missing update call in timekeeping_resume() This is a backport of 3e997130bd2e8c6f5aaa49d6e3161d4d29b43ab0 The leap second rework unearthed another issue of inconsistent data. On timekeeping_resume() the timekeeper data is updated, but nothing calls timekeeping_update(), so now the update code in the timer interrupt sees stale values. This has been the case before those changes, but then the timer interrupt was using stale data as well so this went unnoticed for quite some time. Add the missing update call, so all the data is consistent everywhere. Reported-by: Andreas Schwab Reported-and-tested-by: "Rafael J. Wysocki" Reported-and-tested-by: Martin Steigerwald Cc: LKML Cc: Linux PM list Cc: John Stultz Cc: Ingo Molnar Cc: Peter Zijlstra , Cc: Prarit Bhargava Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner Signed-off-by: John Stultz Signed-off-by: Linus Torvalds Cc: Prarit Bhargava Cc: Thomas Gleixner Cc: Linux Kernel Signed-off-by: John Stultz Signed-off-by: Willy Tarreau commit 17b7f7c636d41cefe84c7a3ac440d221a3171e1c Author: John Stultz Date: Tue Jul 17 18:05:34 2012 -0400 hrtimer: Update hrtimer base offsets each hrtimer_interrupt This is a backport of 5baefd6d84163443215f4a99f6a20f054ef11236 The update of the hrtimer base offsets on all cpus cannot be made atomically from the timekeeper.lock held and interrupt disabled region as smp function calls are not allowed there. clock_was_set(), which enforces the update on all cpus, is called either from preemptible process context in case of do_settimeofday() or from the softirq context when the offset modification happened in the timer interrupt itself due to a leap second. In both cases there is a race window for an hrtimer interrupt between dropping timekeeper lock, enabling interrupts and clock_was_set() issuing the updates. Any interrupt which arrives in that window will see the new time but operate on stale offsets. So we need to make sure that an hrtimer interrupt always sees a consistent state of time and offsets. ktime_get_update_offsets() allows us to get the current monotonic time and update the per cpu hrtimer base offsets from hrtimer_interrupt() to capture a consistent state of monotonic time and the offsets. The function replaces the existing ktime_get() calls in hrtimer_interrupt(). The overhead of the new function vs. ktime_get() is minimal as it just adds two store operations. This ensures that any changes to realtime or boottime offsets are noticed and stored into the per-cpu hrtimer base structures, prior to any hrtimer expiration and guarantees that timers are not expired early. Signed-off-by: John Stultz Reviewed-by: Ingo Molnar Acked-by: Peter Zijlstra Acked-by: Prarit Bhargava Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1341960205-56738-8-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner Cc: Prarit Bhargava Cc: Thomas Gleixner Cc: Linux Kernel Signed-off-by: John Stultz Signed-off-by: Willy Tarreau commit 22539a28b0a97e84f74f716fe42d61dd07612d2d Author: Thomas Gleixner Date: Tue Jul 17 18:05:33 2012 -0400 timekeeping: Provide hrtimer update function This is a backport of f6c06abfb3972ad4914cef57d8348fcb2932bc3b To finally fix the infamous leap second issue and other race windows caused by functions which change the offsets between the various time bases (CLOCK_MONOTONIC, CLOCK_REALTIME and CLOCK_BOOTTIME) we need a function which atomically gets the current monotonic time and updates the offsets of CLOCK_REALTIME and CLOCK_BOOTTIME with minimalistic overhead. The previous patch which provides ktime_t offsets allows us to make this function almost as cheap as ktime_get() which is going to be replaced in hrtimer_interrupt(). Signed-off-by: Thomas Gleixner Reviewed-by: Ingo Molnar Acked-by: Peter Zijlstra Acked-by: Prarit Bhargava Cc: stable@vger.kernel.org Signed-off-by: John Stultz Link: http://lkml.kernel.org/r/1341960205-56738-7-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner Cc: Prarit Bhargava Cc: Thomas Gleixner Cc: Linux Kernel Signed-off-by: John Stultz Signed-off-by: Willy Tarreau commit 5a82cdee505bb35403878c0946b521a37350a365 Author: Thomas Gleixner Date: Tue Jul 17 18:05:32 2012 -0400 hrtimers: Move lock held region in hrtimer_interrupt() This is a backport of 196951e91262fccda81147d2bcf7fdab08668b40 We need to update the base offsets from this code and we need to do that under base->lock. Move the lock held region around the ktime_get() calls. The ktime_get() calls are going to be replaced with a function which gets the time and the offsets atomically. Signed-off-by: Thomas Gleixner Reviewed-by: Ingo Molnar Acked-by: Peter Zijlstra Acked-by: Prarit Bhargava Cc: stable@vger.kernel.org Signed-off-by: John Stultz Link: http://lkml.kernel.org/r/1341960205-56738-6-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner Cc: Prarit Bhargava Cc: Thomas Gleixner Cc: Linux Kernel Signed-off-by: John Stultz Signed-off-by: Willy Tarreau commit c2358cf4165eeac6eacd6017c4419b33cacb0418 Author: Thomas Gleixner Date: Tue Jul 17 18:05:31 2012 -0400 timekeeping: Maintain ktime_t based offsets for hrtimers This is a backport of 5b9fe759a678e05be4937ddf03d50e950207c1c0 We need to update the hrtimer clock offsets from the hrtimer interrupt context. To avoid conversions from timespec to ktime_t maintain a ktime_t based representation of those offsets in the timekeeper. This puts the conversion overhead into the code which updates the underlying offsets and provides fast accessible values in the hrtimer interrupt. Signed-off-by: Thomas Gleixner Signed-off-by: John Stultz Reviewed-by: Ingo Molnar Acked-by: Peter Zijlstra Acked-by: Prarit Bhargava Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1341960205-56738-4-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner Cc: Prarit Bhargava Cc: Thomas Gleixner Cc: Linux Kernel Signed-off-by: John Stultz Signed-off-by: Willy Tarreau commit d155586fe34db9fc8d247a681c6524c7d75dd863 Author: John Stultz Date: Tue Jul 17 18:05:30 2012 -0400 timekeeping: Fix leapsecond triggered load spike issue This is a backport of 4873fa070ae84a4115f0b3c9dfabc224f1bc7c51 The timekeeping code misses an update of the hrtimer subsystem after a leap second happened. Due to that timers based on CLOCK_REALTIME are either expiring a second early or late depending on whether a leap second has been inserted or deleted until an operation is initiated which causes that update. Unless the update happens by some other means this discrepancy between the timekeeping and the hrtimer data stays forever and timers are expired either early or late. The reported immediate workaround - $ data -s "`date`" - is causing a call to clock_was_set() which updates the hrtimer data structures. See: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix Add the missing clock_was_set() call to update_wall_time() in case of a leap second event. The actual update is deferred to softirq context as the necessary smp function call cannot be invoked from hard interrupt context. Signed-off-by: John Stultz Reported-by: Jan Engelhardt Reviewed-by: Ingo Molnar Acked-by: Peter Zijlstra Acked-by: Prarit Bhargava Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1341960205-56738-3-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner Cc: Prarit Bhargava Cc: Thomas Gleixner Cc: Linux Kernel Signed-off-by: John Stultz Signed-off-by: Willy Tarreau commit 40821c765eb339c04407c7382c9edbb8a7d340e3 Author: John Stultz Date: Tue Jul 17 18:05:29 2012 -0400 hrtimer: Provide clock_was_set_delayed() This is a backport of f55a6faa384304c89cfef162768e88374d3312cb clock_was_set() cannot be called from hard interrupt context because it calls on_each_cpu(). For fixing the widely reported leap seconds issue it is necessary to call it from hard interrupt context, i.e. the timer tick code, which does the timekeeping updates. Provide a new function which denotes it in the hrtimer cpu base structure of the cpu on which it is called and raise the hrtimer softirq. We then execute the clock_was_set() notificiation from softirq context in run_hrtimer_softirq(). The hrtimer softirq is rarely used, so polling the flag there is not a performance issue. [ tglx: Made it depend on CONFIG_HIGH_RES_TIMERS. We really should get rid of all this ifdeffery ASAP ] Signed-off-by: John Stultz Reported-by: Jan Engelhardt Reviewed-by: Ingo Molnar Acked-by: Peter Zijlstra Acked-by: Prarit Bhargava Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1341960205-56738-2-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner Cc: Prarit Bhargava Cc: Thomas Gleixner Cc: Linux Kernel Signed-off-by: John Stultz Signed-off-by: Willy Tarreau commit 9ec72638e59fca1951a65ce68fcf693f7c98ecbf Author: Thomas Gleixner Date: Tue Jul 17 18:05:28 2012 -0400 time: Move common updates to a function This is a backport of cc06268c6a87db156af2daed6e96a936b955cc82 While not a bugfix itself, it allows following fixes to backport in a more straightforward manner. CC: Thomas Gleixner CC: Eric Dumazet CC: Richard Cochran Signed-off-by: Thomas Gleixner Cc: Prarit Bhargava Cc: Thomas Gleixner Cc: Linux Kernel Signed-off-by: John Stultz Signed-off-by: Willy Tarreau commit af54f209dc49a5251f7b22646190d7193356bf02 Author: John Stultz Date: Tue Jul 17 18:05:27 2012 -0400 timekeeping: Fix CLOCK_MONOTONIC inconsistency during leapsecond This is a backport of fad0c66c4bb836d57a5f125ecd38bed653ca863a which resolves a bug the previous commit. Commit 6b43ae8a61 (ntp: Fix leap-second hrtimer livelock) broke the leapsecond update of CLOCK_MONOTONIC. The missing leapsecond update to wall_to_monotonic causes discontinuities in CLOCK_MONOTONIC. Adjust wall_to_monotonic when NTP inserted a leapsecond. Reported-by: Richard Cochran Signed-off-by: John Stultz Tested-by: Richard Cochran Link: http://lkml.kernel.org/r/1338400497-12420-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner Cc: Prarit Bhargava Cc: Thomas Gleixner Cc: Linux Kernel Signed-off-by: John Stultz Signed-off-by: Willy Tarreau commit 285a7953c6c404f68af341914e90038442ebe289 Author: Richard Cochran Date: Tue Jul 17 18:05:26 2012 -0400 ntp: Correct TAI offset during leap second This is a backport of dd48d708ff3e917f6d6b6c2b696c3f18c019feed When repeating a UTC time value during a leap second (when the UTC time should be 23:59:60), the TAI timescale should not stop. The kernel NTP code increments the TAI offset one second too late. This patch fixes the issue by incrementing the offset during the leap second itself. Signed-off-by: Richard Cochran Cc: Prarit Bhargava Cc: Thomas Gleixner Cc: Linux Kernel Signed-off-by: John Stultz Signed-off-by: Willy Tarreau commit 6a29b109b9fb4c76bf55851223ae59e3322d9616 Author: John Stultz Date: Tue Jul 17 18:05:25 2012 -0400 ntp: Fix leap-second hrtimer livelock This is a backport of 6b43ae8a619d17c4935c3320d2ef9e92bdeed05d This should have been backported when it was commited, but I mistook the problem as requiring the ntp_lock changes that landed in 3.4 in order for it to occur. Unfortunately the same issue can happen (with only one cpu) as follows: do_adjtimex() write_seqlock_irq(&xtime_lock); process_adjtimex_modes() process_adj_status() ntp_start_leap_timer() hrtimer_start() hrtimer_reprogram() tick_program_event() clockevents_program_event() ktime_get() seq = req_seqbegin(xtime_lock); [DEADLOCK] This deadlock will no always occur, as it requires the leap_timer to force a hrtimer_reprogram which only happens if its set and there's no sooner timer to expire. NOTE: This patch, being faithful to the original commit, introduces a bug (we don't update wall_to_monotonic), which will be resovled by backporting a following fix. Original commit message below: Since commit 7dffa3c673fbcf835cd7be80bb4aec8ad3f51168 the ntp subsystem has used an hrtimer for triggering the leapsecond adjustment. However, this can cause a potential livelock. Thomas diagnosed this as the following pattern: CPU 0 CPU 1 do_adjtimex() spin_lock_irq(&ntp_lock); process_adjtimex_modes(); timer_interrupt() process_adj_status(); do_timer() ntp_start_leap_timer(); write_lock(&xtime_lock); hrtimer_start(); update_wall_time(); hrtimer_reprogram(); ntp_tick_length() tick_program_event() spin_lock(&ntp_lock); clockevents_program_event() ktime_get() seq = req_seqbegin(xtime_lock); This patch tries to avoid the problem by reverting back to not using an hrtimer to inject leapseconds, and instead we handle the leapsecond processing in the second_overflow() function. The downside to this change is that on systems that support highres timers, the leap second processing will occur on a HZ tick boundary, (ie: ~1-10ms, depending on HZ) after the leap second instead of possibly sooner (~34us in my tests w/ x86_64 lapic). This patch applies on top of tip/timers/core. CC: Sasha Levin CC: Thomas Gleixner Reported-by: Sasha Levin Diagnoised-by: Thomas Gleixner Tested-by: Sasha Levin Cc: Prarit Bhargava Cc: Thomas Gleixner Cc: Linux Kernel Signed-off-by: John Stultz Signed-off-by: Willy Tarreau commit a5bd57818e76cc60638d2d49cfe32ca0bc2b4db2 Author: Hugh Dickins Date: Sat Dec 31 11:44:01 2011 -0800 futex: Fix uninterruptible loop due to gate_area commit e6780f7243eddb133cc20ec37fa69317c218b709 upstream. It was found (by Sasha) that if you use a futex located in the gate area we get stuck in an uninterruptible infinite loop, much like the ZERO_PAGE issue. While looking at this problem, PeterZ realized you'll get into similar trouble when hitting any install_special_pages() mapping. And are there still drivers setting up their own special mmaps without page->mapping, and without special VM or pte flags to make get_user_pages fail? In most cases, if page->mapping is NULL, we do not need to retry at all: Linus points out that even /proc/sys/vm/drop_caches poses no problem, because it ends up using remove_mapping(), which takes care not to interfere when the page reference count is raised. But there is still one case which does need a retry: if memory pressure called shmem_writepage in between get_user_pages_fast dropping page table lock and our acquiring page lock, then the page gets switched from filecache to swapcache (and ->mapping set to NULL) whatever the refcount. Fault it back in to get the page->mapping needed for key->shared.inode. Reported-by: Sasha Levin Signed-off-by: Hugh Dickins Signed-off-by: Linus Torvalds [PG: 2.6.34 variable is page, not page_head, since it doesn't have a5b338f2] Signed-off-by: Paul Gortmaker Signed-off-by: Willy Tarreau commit f4bced840035548d5be51d15d62b26bdd9f0982c Author: Philipp Hahn Date: Mon Apr 23 11:07:53 2012 +0200 fix pgd_lock deadlock commit a79e53d85683c6dd9f99c90511028adc2043031f upstream. On Wednesday 16 February 2011 15:49:47 Andrea Arcangeli wrote: > Subject: fix pgd_lock deadlock > > From: Andrea Arcangeli > > It's forbidden to take the page_table_lock with the irq disabled or if > there's contention the IPIs (for tlb flushes) sent with the page_table_lock > held will never run leading to a deadlock. > > Apparently nobody takes the pgd_lock from irq so the _irqsave can be > removed. > > Signed-off-by: Andrea Arcangeli This patch (original commit Id for 2.6.38 a79e53d85683c6dd9f99c90511028adc2043031f) needs to be back-ported to 2.6.32.x as well. I observed a dead-lock problem when running a PAE enabled Debian 2.6.32.46+ kernel with 6 VCPUs as a KVM on (2.6.32, 3.2, 3.3) kernel, which showed the following behaviour: 1 VCPU is stuck in pgd_alloc() =E2=86=92 pgd_prepopulate_pmb() =E2=86=92... =E2=86=92 flush_tlb_others_ipi() while (!cpumask_empty(to_cpumask(f->flush_cpumask))) cpu_relax(); (gdb) print f->flush_cpumask $5 = {1} while all other VCPUs are stuck in pgd_alloc() =E2=86=92 spin_lock_irqsave(pgd_lock) I tracked it down to the commit 2.6.39-rc1: 4981d01eada5354d81c8929d5b2836829ba3df7b 2.6.32.34: ba456fd7ec1bdc31a4ad4a6bd02802dcaa730a33 x86: Flush TLB if PGD entry is changed in i386 PAE mode which when reverted made the bug disappear. Comparing 3.2 to 2.6.32.34 showed that the 'pgd-deadlock'-patch went into 2.6.38, that is before the 'PAE correctness'-patch, so the problem was probably never observed in the main development branch. But for 2.6.32 the 'pgd-deadlock' patch is still missing, so the 'PAE corretness'-patch made the problem worse with 2.6.32. The Patch was also back-ported to the OpenSUSE Kernel , Since the patch didn't apply cleanly on the current Debian kernel, I had to backport it for us and Debian. The patch is also available from our (German) Bugzilla or from the Debian BTS at . I have no easy test case, but running multiple parallel builds inside the VM normally triggers the bug within seconds to minutes. With the patch applied the VM survived a night building packages without any problem. Signed-off-by: Philipp Hahn Sincerely Philipp - Philipp Hahn Open Source Software Engineer hahn@univention.de Univention GmbH be open. fon: +49 421 22 232- 0 Mary-Somerville-Str.1 D-28359 Bremen fax: +49 421 22 232-99 http://www.univention.de/ It's forbidden to take the page_table_lock with the irq disabled or if there's contention the IPIs (for tlb flushes) sent with the page_table_lock held will never run leading to a deadlock. Nobody takes the pgd_lock from irq context so the _irqsave can be removed. Signed-off-by: Andrea Arcangeli Acked-by: Rik van Riel Tested-by: Konrad Rzeszutek Wilk Signed-off-by: Andrew Morton Cc: Peter Zijlstra Cc: Linus Torvalds Cc: LKML-Reference: <201102162345.p1GNjMjm021738@imap1.linux-foundation.org> Signed-off-by: Ingo Molnar Git-commit: a79e53d85683c6dd9f99c90511028adc2043031f Signed-off-by: Willy Tarreau commit 7cfe055fa0b70ee00220673e1b070a28c213fc3b Author: Eric Sandeen Date: Mon Feb 20 17:53:01 2012 -0500 jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer commit 15291164b22a357cb211b618adfef4fa82fc0de3 upstream. journal_unmap_buffer()'s zap_buffer: code clears a lot of buffer head state ala discard_buffer(), but does not touch _Delay or _Unwritten as discard_buffer() does. This can be problematic in some areas of the ext4 code which assume that if they have found a buffer marked unwritten or delay, then it's a live one. Perhaps those spots should check whether it is mapped as well, but if jbd2 is going to tear down a buffer, let's really tear it down completely. Without this I get some fsx failures on sub-page-block filesystems up until v3.2, at which point 4e96b2dbbf1d7e81f22047a50f862555a6cb87cb and 189e868fa8fdca702eb9db9d8afc46b5cb9144c9 make the failures go away, because buried within that large change is some more flag clearing. I still think it's worth doing in jbd2, since ->invalidatepage leads here directly, and it's the right place to clear away these flags. Signed-off-by: Eric Sandeen Signed-off-by: "Theodore Ts'o" Cc: stable@vger.kernel.org BugLink: http://bugs.launchpad.net/bugs/929781 CVE-2011-4086 Signed-off-by: Stefan Bader Signed-off-by: Willy Tarreau commit b87881d8569d365959593ec70204a111f5f93bfa Author: Bing Zhao Date: Wed Mar 28 20:19:57 2012 -0700 Bluetooth: btusb: fix bInterval for high/super speed isochronous endpoints commit fa0fb93f2ac308a76fa64eb57c18511dadf97089 upstream For high-speed/super-speed isochronous endpoints, the bInterval value is used as exponent, 2^(bInterval-1). Luckily we have usb_fill_int_urb() function that handles it correctly. So we just call this function to fill in the RX URB. Cc: Marcel Holtmann Signed-off-by: Bing Zhao Acked-by: Marcel Holtmann Signed-off-by: Gustavo F. Padovan Signed-off-by: Willy Tarreau commit bff4680d211a55715bd18920a457d65488ec1d13 Author: Benjamin Herrenschmidt Date: Wed Mar 21 09:27:40 2012 +0800 powerpc/pmac: Fix SMP kernels on pre-core99 UP machines commit 78c5c68a4cf4329d17abfa469345ddf323d4fd62 upstream. The code for "powersurge" SMP would kick in and cause a crash at boot due to the lack of a NULL test. Adam Conrad reports that the 3.2 kernel, with CONFIG_SMP=y, will not boot on an OldWorld G3; we're unconditionally writing to psurge_start, but this is only set on powersurge machines. Signed-off-by: Benjamin Herrenschmidt Signed-off-by: Jeremy Kerr Reported-by: Adam Conrad Tested-by: Adam Conrad Signed-off-by: Willy Tarreau commit 99612d20389996699322341dc3df6b605cdfddc8 Author: David Miller Date: Thu Mar 15 20:14:29 2012 -0700 Fix sparc build with newer tools. commit e0adb9902fb338a9fe634c3c2a3e474075c733ba upstream. Newer version of binutils are more strict about specifying the correct options to enable certain classes of instructions. The sparc32 build is done for v7 in order to support sun4c systems which lack hardware integer multiply and divide instructions. So we have to pass -Av8 when building the assembler routines that use these instructions and get patched into the kernel when we find out that we have a v8 capable cpu. Reported-by: Paul Gortmaker Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau commit ef1ff0d009281b17953262e0c53e5ae9ca03789e Author: Sony Chacko Date: Tue Mar 15 14:54:55 2011 -0700 netxen: support for GbE port settings commit bfd823bd74333615783d8108889814c6d82f2ab0 upstream. o Enable setting speed and auto negotiation parameters for GbE ports. o Hardware do not support half duplex setting currently. David Miller: Amit please update your patch to silently reject link setting attempts that are unsupported by the device. [jn: backported for 2.6.32.y by Ana Guerrero] Signed-off-by: Sony Chacko Signed-off-by: Amit Kumar Salecha Signed-off-by: David S. Miller Tested-by: Ana Guerrero # HP NC375i Signed-off-by: Jonathan Nieder Signed-off-by: Willy Tarreau Acked-by: Sony Chacko Signed-off-by: Willy Tarreau