aboutsummaryrefslogtreecommitdiffstats
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2006-12-07[PATCH] taskstats: cleanup reply assemblingOleg Nesterov1-22/+16
Thomas Graf wrote: > > nla_nest_start() may return NULL, either rely on prepare_reply() to be > correct and BUG() on failure or do proper error handling for all > functions. nla_put() in taskstat.c can fail only if the 'size' argument of alloc_skb() was not right. This is a kernel bug, we should not hide it. So add 'BUG()' on error path and check for 'na == NULL'. > genlmsg_cancel() is only required in error paths for dumping > procedures. So we can remove 'genlmsg_cancel()' calls and 'void *reply' (saves 227 bytes). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Thomas Graf <tgraf@suug.ch> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] taskstats: use nla_reserve() for reply assemblingOleg Nesterov1-42/+44
Currently taskstats_user_cmd()/taskstats_exit() do: 1) allocate stats 2) fill stats 3) make a temporary copy on stack (236 bytes) 4) copy that copy to skb 5) free stats With the help of nla_reserve() we can operate on skb->data directly, thus avoiding all these steps except 2). So, before this patch: // copy *stats to skb->data int mk_reply(skb, ..., struct taskstats *stats); fill_pid(stats); mk_reply(skb, ..., stats); After: // return a pointer to skb->data struct taskstats *mk_reply(skb, ...); stat = mk_reply(skb, ...); fill_pid(stats); Shrinks taskatsks.o by 162 bytes. A stupid benchmark (send one million TASKSTATS_CMD_ATTR_PID) shows the real user sys before: 4.02 0.06 3.96 4.02 0.04 3.98 4.02 0.04 3.97 after: 3.86 0.08 3.78 3.88 0.10 3.77 3.89 0.09 3.80 but this looks suspiciously good. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] taskstats: factor out reply assemblingOleg Nesterov1-27/+28
Introduce mk_reply() helper which does all nla_put()s on reply. Saves 453 bytes and a preparation for the next patch. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] taskstats: cleanup ->signal->stats allocationOleg Nesterov2-2/+25
Allocate ->signal->stats on demand in taskstats_exit(), this allows us to remove taskstats_tgid_alloc() (the last non-trivial inline) from taskstat's public interface. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] taskstats: cleanup do_exit() pathOleg Nesterov2-32/+17
do_exit: taskstats_exit_alloc() ... taskstats_exit_send() taskstats_exit_free() I think this is not good, let it be a single function exported to the core kernel, taskstats_exit(), which does alloc + send + free itself. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] taskstats_exit_alloc: optimize/simplifyOleg Nesterov1-13/+4
If there are no listeners, every task does unneeded kmem_cache alloc/free on exit. We don't need listeners->sem for 'if (!list_empty())' check. Yes, we may have a false positive, but this doesn't differ from the case when the listener is unregistered after we drop the semaphore. So we don't need to do allocation beforehand. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Balbir Singh <balbir@in.ibm.com> Acked-by: Shailabh Nagar <nagar@watson.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] profile: fix uaccess handlingHeiko Carstens1-1/+2
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exitRoland McGrath1-1/+10
The CLONE_CHILD_CLEARTID flag is used by NPTL to have its threads communicate via memory/futex when they exit, so pthread_join can synchronize using a simple futex wait. The word of user memory where NPTL stores a thread's own TID is what it passes; this gets reset to zero at thread exit. It is not desireable to touch this user memory when threads are dying due to a fatal signal. A core dump is more usefully representative of the dying program state if the threads live at the time of the crash have their NPTL data structures unperturbed. The userland expectation of CLONE_CHILD_CLEARTID has only ever been that it works for a thread making an _exit system call. This problem was identified by Ernie Petrides <petrides@redhat.com>. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Ernie Petrides <petrides@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] lockdep: misc fixes in lockdep.cJarek Poplawski1-3/+6
- numeric string size replaced with constant in print_lock_name and print_lockdep_cache, - return on null pointer in print_lock_dependencies, - one more lockdep return with 0 with unlocking fix in mark_lock. Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] lockdep: internal locking fixesJarek Poplawski1-10/+15
Here are mainly some lockdep returns with 0 with unlocking fixes. Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] cpuset: minor code refinementsPaul Jackson1-5/+4
A couple of minor code simplifications to the kernel/cpuset.c code. No functional change. Just a little less code and a little more readable. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] Export pm_suspend for the shared APM emulationRalf Baechle1-1/+2
The new shared APM emulation just like its ARM and MIPS predecessors uses pm_suspend() which was only exported on SH. Move export to close to it's definition where it really should be anyway. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] SysRq-X: show blocked tasksIngo Molnar1-3/+8
Add SysRq-X support: show blocked (TASK_UNINTERRUPTIBLE) tasks only. Useful for debugging IO stalls. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] /proc/kallsyms reports lower-case types for some non-exported symbolsAdam B. Jerome1-8/+5
This patch addresses incorrect symbol type information reported through /proc/kallsyms. A lowercase character should designate the symbol as local (or non-exported). An uppercase character should designate the symbol as global (or external). Without this patch, some non-exported symbols are incorrectly assigned an upper-case designation in /proc/kallsyms. This patch corrects this condition by converting non-exported symbols types to lower case when appropriate and eliminates the superfluous upcase_if_global function Signed-off-by: Adam B. Jerome <abj@novell.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] lockdep: annotate nfs/nfsd in-kernel socketsPeter Zijlstra1-0/+1
Stick NFS sockets in their own class to avoid some lockdep warnings. NFS sockets are never exposed to user-space, and will hence not trigger certain code paths that would otherwise pose deadlock scenarios. [akpm@osdl.org: cleanups] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Steven Dickson <SteveD@redhat.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Acked-by: Neil Brown <neilb@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> [ Fixed patch corruption by quilt, pointed out by Peter Zijlstra ] Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] Support for freezeable workqueuesRafael J. Wysocki1-6/+14
Make it possible to create a workqueue the worker thread of which will be frozen during suspend, along with other kernel threads. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@suspend2.net> Cc: David Chinner <dgc@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] swsusp: kill write-only variablePavel Machek1-3/+1
Cleanup write-only variable, suggested by D Binderman. Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] PM: Fix swsusp debug mode testprocRafael J. Wysocki1-1/+1
The 'testproc' swsusp debug mode thaws tasks twice in a row, which is _very_ confusing. Fix that. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] swsusp: Fix labelsRafael J. Wysocki5-11/+11
Move all labels in the swsusp code to the second column, so that they won't fool diff -p. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@suspend2.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] swsusp: Fix coding style in suspend.cRafael J. Wysocki1-1/+2
Fix coding style in suspend.c. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@suspend2.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] swsusp: Untangle freeze_processesRafael J. Wysocki1-30/+54
Move the loop from freeze_processes() to a separate function and call it independently for user space processes and kernel threads so that the order of freezing tasks is clearly visible. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@suspend2.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] swsusp: Untangle thaw_processesRafael J. Wysocki1-22/+27
Move the loop from thaw_processes() to a separate function and call it independently for kernel threads and user space processes so that the order of thawing tasks is clearly visible. Drop thaw_kernel_threads() which is never used. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@suspend2.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] convert pm_sem to a mutexStephen Hemminger4-26/+28
The power management semaphore is only used as mutex, so convert it. [akpm@osdl.org: fix rotten bug] Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] suspend to disk fails if gdb is suspended with a traced childRafael J. Wysocki1-1/+3
Fix http://bugzilla.kernel.org/show_bug.cgi?id=7534 Fix the freezing of processes so that it won't fail if there is a traced process the parent of which has been stopped. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: maurice barnum <pixi+kbug@burble.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] swsusp: Measure memory shrinking timeRafael J. Wysocki3-22/+39
Make swsusp measure and print the time needed to shrink memory during the suspend. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@suspend2.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] suspend: don't change cpus_allowed for task initiating the suspendSiddha, Suresh B1-5/+1
Don't modify the cpus_allowed of the task initiating the suspend. _cpu_down() already makes sure that the task doing the suspend doesn't run on dying cpu. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Nigel Cunningham <nigel@suspend2.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] swsusp: Support i386 systems with PAE or without PSERafael J. Wysocki1-1/+1
Make swsusp support i386 systems with PAE or without PSE. This is done by creating temporary page tables located in resume-safe page frames before the suspend image is restored in the same way as x86_64 does it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Andi Kleen <ak@suse.de> Cc: Dave Jones <davej@redhat.com> Cc: Nigel Cunningham <ncunningham@linuxmail.org> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] swsusp: thaw userspace and kernel space separatelyNigel Cunningham1-7/+18
Modify process thawing so that we can thaw kernel space without thawing userspace, and thaw kernelspace first. This will be useful in later patches, where I intend to get swsusp thawing kernel threads only before seeking to free memory. Signed-off-by: Nigel Cunningham <nigel@suspend2.net> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] swsusp: clean up whitespace in freezer outputNigel Cunningham1-8/+8
Minor whitespace and formatting modifications for the freezer. Signed-off-by: Nigel Cunningham <nigel@suspend2.net> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] swsusp: quieten Freezer if !CONFIG_PM_DEBUGNigel Cunningham1-1/+0
The freezer currently prints an '=' for every process that is frozen. This is pretty pointless, as the equals sign says nothing about which process is frozen, and makes logs look messier (especially if there were a large number of processes running). All we really need to know is that we started trying to freeze processes and what processes (if any) failed to freeze, or that we succeeded. Signed-off-by: Nigel Cunningham <nigel@suspend2.net> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] Add include/linux/freezer.h and move definitions from sched.hNigel Cunningham8-1/+8
Move process freezing functions from include/linux/sched.h to freezer.h, so that modifications to the freezer or the kernel configuration don't require recompiling just about everything. [akpm@osdl.org: fix ueagle driver] Signed-off-by: Nigel Cunningham <nigel@suspend2.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] swsusp: fix platform modeStefan Seyfried1-0/+22
At some point after 2.6.13, in-kernel software suspend got "incomplete" for the so-called "platform" mode. pm_ops->prepare() is never called. A visible sign of this is the "moon" light on thinkpads not flashing during suspend. Fix by readding the pm_ops->prepare call during suspend. Signed-off-by: Stefan Seyfried <seife@suse.de> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] swsusp: use __GFP_WAITRafael J. Wysocki1-3/+3
swsusp uses GFP_ATOMIC, but it can afford to use __GFP_WAIT, which will permit it to reclaim clean pagecache instead of emitting scary page-allocation-failure messages. Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] swsusp: Improve handling of highmemRafael J. Wysocki5-239/+671
Currently swsusp saves the contents of highmem pages by copying them to the normal zone which is quite inefficient (eg. it requires two normal pages to be used for saving one highmem page). This may be improved by using highmem for saving the contents of saveable highmem pages. Namely, during the suspend phase of the suspend-resume cycle we try to allocate as many free highmem pages as there are saveable highmem pages. If there are not enough highmem image pages to store the contents of all of the saveable highmem pages, some of them will be stored in the "normal" memory. Next, we allocate as many free "normal" pages as needed to store the (remaining) image data. We use a memory bitmap to mark the allocated free pages (ie. highmem as well as "normal" image pages). Now, we use another memory bitmap to mark all of the saveable pages (highmem as well as "normal") and the contents of the saveable pages are copied into the image pages. Then, the second bitmap is used to save the pfns corresponding to the saveable pages and the first one is used to save their data. During the resume phase the pfns of the pages that were saveable during the suspend are loaded from the image and used to mark the "unsafe" page frames. Next, we try to allocate as many free highmem page frames as to load all of the image data that had been in the highmem before the suspend and we allocate so many free "normal" page frames that the total number of allocated free pages (highmem and "normal") is equal to the size of the image. While doing this we have to make sure that there will be some extra free "normal" and "safe" page frames for two lists of PBEs constructed later. Now, the image data are loaded, if possible, into their "original" page frames. The image data that cannot be written into their "original" page frames are loaded into "safe" page frames and their "original" kernel virtual addresses, as well as the addresses of the "safe" pages containing their copies, are stored in one of two lists of PBEs. One list of PBEs is for the copies of "normal" suspend pages (ie. "normal" pages that were saveable during the suspend) and it is used in the same way as previously (ie. by the architecture-dependent parts of swsusp). The other list of PBEs is for the copies of highmem suspend pages. The pages in this list are restored (in a reversible way) right before the arch-dependent code is called. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] swsusp: add ioctl for swap files supportRafael J. Wysocki2-1/+44
To be able to use swap files as suspend storage from the userland suspend tools we need an additional ioctl() that will allow us to provide the kernel with both the swap header's offset and the identification of the resume partition. The new ioctl() should be regarded as a replacement for the SNAPSHOT_SET_SWAP_FILE ioctl() that from now on will be considered as obsolete, but has to stay for backwards compatibility of the interface. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] swsusp: add resume_offset command line parameterRafael J. Wysocki3-5/+26
Add the kernel command line parameter "resume_offset=" allowing us to specify the offset, in <PAGE_SIZE> units, from the beginning of the partition pointed to by the "resume=" parameter at which the swap header is located. This offset can be determined, for example, by an application using the FIBMAP ioctl to obtain the swap header's block number for given file. [akpm@osdl.org: we don't know what type sector_t is] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] swsusp: use block device offsets to identify swap locationsRafael J. Wysocki4-72/+80
Make swsusp use block device offsets instead of swap offsets to identify swap locations and make it use the same code paths for writing as well as for reading data. This allows us to use the same code for handling swap files and swap partitions and to simplify the code, eg. by dropping rw_swap_page_sync(). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] swsusp: rearrange swap-handling codeRafael J. Wysocki1-109/+112
Rearrange the code in kernel/power/swap.c so that the next patch is more readable. [This patch only moves the existing code.] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] swsusp: use partition device and offset to identify swap areasRafael J. Wysocki2-3/+4
The Linux kernel handles swap files almost in the same way as it handles swap partitions and there are only two differences between these two types of swap areas: (1) swap files need not be contiguous, (2) the header of a swap file is not in the first block of the partition that holds it. From the swsusp's point of view (1) is not a problem, because it is already taken care of by the swap-handling code, but (2) has to be taken into consideration. In principle the location of a swap file's header may be determined with the help of appropriate filesystem driver. Unfortunately, however, it requires the filesystem holding the swap file to be mounted, and if this filesystem is journaled, it cannot be mounted during a resume from disk. For this reason we need some other means by which swap areas can be identified. For example, to identify a swap area we can use the partition that holds the area and the offset from the beginning of this partition at which the swap header is located. The following patch allows swsusp to identify swap areas this way. It changes swap_type_of() so that it takes an additional argument representing an offset of the swap header within the partition represented by its first argument. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] uswsusp: add pmops->{prepare,enter,finish} support (aka "platform mode")Stefan Seyfried2-1/+34
Add an ioctl to the userspace swsusp code that enables the usage of the pmops->prepare, pmops->enter and pmops->finish methods (the in-kernel suspend knows these as "platform method"). These are needed on many machines to (among others) speed up resuming by letting the BIOS skip some steps or let my hp nx5000 recognise the correct ac_adapter state after resume again. It also ensures on many machines, that changed hardware (unplugged AC adapters) gets correctly detected and that kacpid does not run wild after resume. Signed-off-by: Stefan Seyfried <seife@suse.de> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] slab: remove kmem_cache_tChristoph Lameter7-14/+14
Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] slab: remove SLAB_KERNELChristoph Lameter4-6/+6
SLAB_KERNEL is an alias of GFP_KERNEL. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] mm: pagefault_{disable,enable}()Peter Zijlstra1-14/+14
Introduce pagefault_{disable,enable}() and use these where previously we did manual preempt increments/decrements to make the pagefault handler do the atomic thing. Currently they still rely on the increased preempt count, but do not rely on the disabled preemption, this might go away in the future. (NOTE: the extra barrier() in pagefault_disable might fix some holes on machines which have too many registers for their own good) [heiko.carstens@de.ibm.com: s390 fix] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Nick Piggin <npiggin@suse.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] new scheme to preempt swap tokenAshwin Chaugule2-11/+8
The new swap token patches replace the current token traversal algo. The old algo had a crude timeout parameter that was used to handover the token from one task to another. This algo, transfers the token to the tasks that are in need of the token. The urgency for the token is based on the number of times a task is required to swap-in pages. Accordingly, the priority of a task is incremented if it has been badly affected due to swap-outs. To ensure that the token doesnt bounce around rapidly, the token holders are given a priority boost. The priority of tasks is also decremented, if their rate of swap-in's keeps reducing. This way, the condition to check whether to pre-empt the swap token, is a matter of comparing two task's priority fields. [akpm@osdl.org: cleanups] Signed-off-by: Ashwin Chaugule <ashwin.chaugule@celunite.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-05Merge branch 'master' of ↵David Howells3-1/+16
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/ata/libata-scsi.c include/linux/libata.h Futher merge of Linus's head and compilation fixups. Signed-Off-By: David Howells <dhowells@redhat.com>
2006-12-05Merge branch 'master' of ↵David Howells8-32/+63
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/infiniband/core/iwcm.c drivers/net/chelsio/cxgb2.c drivers/net/wireless/bcm43xx/bcm43xx_main.c drivers/net/wireless/prism54/islpci_eth.c drivers/usb/core/hub.h drivers/usb/input/hid-core.c net/core/netpoll.c Fix up merge failures with Linus's head and fix new compilation failures. Signed-Off-By: David Howells <dhowells@redhat.com>
2006-12-04[PATCH] severing skbuff.h -> highmem.hAl Viro1-0/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-12-04[PATCH] severing module.h->sched.hAl Viro2-1/+15
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-12-02[GENL]: Add genlmsg_put_reply() to simplify building reply headersThomas Graf1-6/+2
By modyfing genlmsg_put() to take a genl_family and by adding genlmsg_put_reply() the process of constructing the netlink and generic netlink headers is simplified. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[GENL]: Add genlmsg_new() to allocate generic netlink messagesThomas Graf1-1/+1
Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[NETLINK]: Do precise netlink message allocations where possibleThomas Graf1-2/+1
Account for the netlink message header size directly in nlmsg_new() instead of relying on the caller calculate it correctly. Replaces error handling of message construction functions when constructing notifications with bug traps since a failure implies a bug in calculating the size of the skb. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-01Driver core: show drivers in /sys/module/Kay Sievers1-6/+25
Show the drivers, which belong to the module: $ ls -l /sys/module/usbcore/drivers/ hub -> ../../../bus/usb/drivers/hub usb -> ../../../bus/usb/drivers/usb usbfs -> ../../../bus/usb/drivers/usbfs Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-11-28Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6Linus Torvalds1-4/+5
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: [PATCH] x86-64: Use stricter in process stack check for unwinder [PATCH] i386: Fix compilation with UP genericarch [PATCH] x86-64: Fix warning in io_apic.c [PATCH] x86-64: work around gcc4 issue with -Os in Dwarf2 stack unwind [PATCH] x86_64: Align data segment to PAGE_SIZE boundary
2006-11-28[PATCH] fix create_write_pipe() error checkAkinobu Mita1-4/+4
The return value of create_write_pipe()/create_read_pipe() should be checked by IS_ERR(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-28[PATCH] x86-64: work around gcc4 issue with -Os in Dwarf2 stack unwindJan Beulich1-4/+5
This fixes a problem with gcc4 mis-compiling the stack unwind code under -Os, which resulted in 'stuck' messages whenever an assembly routine was encountered. (The second hunk is trivial cleanup.) Signed-off-by: Jan Beulich <jbeulich@novell.com>
2006-11-25[PATCH] lockdep: spin_lock_irqsave_nested()Arjan van de Ven1-0/+21
Introduce spin_lock_irqsave_nested(); implementation from: http://lkml.org/lkml/2006/6/1/122 Patch from: http://lkml.org/lkml/2006/9/13/258 [akpm@osdl.org: two compile fixes] Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Jiri Kosina <jikos@jikos.cz> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-25[PATCH] fix copy_process() error checkAkinobu Mita1-3/+2
The return value of copy_process() should be checked by IS_ERR(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-22Don't call "note_interrupt()" with irq descriptor lock heldLinus Torvalds2-7/+3
This reverts commit f72fa707604c015a6625e80f269506032d5430dc, and solves the problem that it tried to fix by simply making "__do_IRQ()" call the note_interrupt() function without the lock held, the way everybody else does. It should be noted that all interrupt handling code must never allow the descriptor actors to be entered "recursively" (that's why we do all the magic IRQ_PENDING stuff in the first place), so there actually is exclusion at that much higher level, even in the absense of locking. Acked-by: Vivek Goyal <vgoyal@in.ibm.com> Acked-by:Pavel Emelianov <xemul@openvz.org> Cc: Andrew Morton <akpm@osdl.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Adrian Bunk <bunk@stusta.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-22WorkStruct: make allyesconfigDavid Howells1-4/+6
Fix up for make allyesconfig. Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22WorkStruct: Pass the work_struct pointer instead of context dataDavid Howells5-26/+30
Pass the work_struct pointer to the work function rather than context data. The work function can use container_of() to work out the data. For the cases where the container of the work_struct may go away the moment the pending bit is cleared, it is made possible to defer the release of the structure by deferring the clearing of the pending bit. To make this work, an extra flag is introduced into the management side of the work_struct. This governs auto-release of the structure upon execution. Ordinarily, the work queue executor would release the work_struct for further scheduling or deallocation by clearing the pending bit prior to jumping to the work function. This means that, unless the driver makes some guarantee itself that the work_struct won't go away, the work function may not access anything else in the work_struct or its container lest they be deallocated.. This is a problem if the auxiliary data is taken away (as done by the last patch). However, if the pending bit is *not* cleared before jumping to the work function, then the work function *may* access the work_struct and its container with no problems. But then the work function must itself release the work_struct by calling work_release(). In most cases, automatic release is fine, so this is the default. Special initiators exist for the non-auto-release case (ending in _NAR). Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22WorkStruct: Merge the pending bit into the wq_data pointerDavid Howells1-9/+32
Reclaim a word from the size of the work_struct by folding the pending bit and the wq_data pointer together. This shouldn't cause misalignment problems as all pointers should be at least 4-byte aligned. Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22WorkStruct: Typedef the work function prototypeDavid Howells1-3/+3
Define a type for the work function prototype. It's not only kept in the work_struct struct, it's also passed as an argument to several functions. This makes it easier to change it. Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22WorkStruct: Separate delayable and non-delayable events.David Howells1-23/+28
Separate delayable work items from non-delayable work items be splitting them into a separate structure (delayed_work), which incorporates a work_struct and the timer_list removed from work_struct. The work_struct struct is huge, and this limits it's usefulness. On a 64-bit architecture it's nearly 100 bytes in size. This reduces that by half for the non-delayable type of event. Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-17[PATCH] lockdep: fix static keys in module-allocated percpu areasIngo Molnar1-1/+2
lockdep got confused by certain locks in modules: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. Call Trace: [<ffffffff8026f40d>] dump_trace+0xaa/0x3f2 [<ffffffff8026f78f>] show_trace+0x3a/0x60 [<ffffffff8026f9d1>] dump_stack+0x15/0x17 [<ffffffff802abfe8>] __lock_acquire+0x724/0x9bb [<ffffffff802ac52b>] lock_acquire+0x4d/0x67 [<ffffffff80267139>] rt_spin_lock+0x3d/0x41 [<ffffffff8839ed3f>] :ip_conntrack:__ip_ct_refresh_acct+0x131/0x174 [<ffffffff883a1334>] :ip_conntrack:udp_packet+0xbf/0xcf [<ffffffff8839f9af>] :ip_conntrack:ip_conntrack_in+0x394/0x4a7 [<ffffffff8023551f>] nf_iterate+0x41/0x7f [<ffffffff8025946a>] nf_hook_slow+0x64/0xd5 [<ffffffff802369a2>] ip_rcv+0x24e/0x506 [...] Steven Rostedt found the bug: static_obj() check did not take PERCPU_ENOUGH_ROOM into account, so in-module DEFINE_PER_CPU-area locks were triggering this message. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-16[PATCH] some irq_chip variables point to NULLZhang, Yanmin1-0/+2
I got an oops when booting 2.6.19-rc5-mm1 on my ia64 machine. Below is the log. Oops 11012296146944 [1] Modules linked in: binfmt_misc dm_mirror dm_multipath dm_mod thermal processor f an container button sg eepro100 e100 mii Pid: 0, CPU 0, comm: swapper psr : 0000121008022038 ifs : 800000000000040b ip : [<a0000001000e1411>] Not tainted ip is at __do_IRQ+0x371/0x3e0 unat: 0000000000000000 pfs : 000000000000040b rsc : 0000000000000003 rnat: 656960155aa56aa5 bsps: a00000010058b890 pr : 656960155aa55a65 ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f csd : 0000000000000000 ssd : 0000000000000000 b0 : a0000001000e1390 b6 : a0000001005beac0 b7 : e00000007f01aa00 f6 : 000000000000000000000 f7 : 0ffe69090000000000000 f8 : 1000a9090000000000000 f9 : 0ffff8000000000000000 f10 : 1000a908ffffff6f70000 f11 : 1003e0000000000000909 r1 : a000000100fbbff0 r2 : 0000000000010002 r3 : 0000000000010001 r8 : fffffffffffbffff r9 : a000000100bd8060 r10 : a000000100dd83b8 r11 : fffffffffffeffff r12 : a000000100bcbbb0 r13 : a000000100bc4000 r14 : 0000000000010000 r15 : 0000000000010000 r16 : a000000100c01aa8 r17 : a000000100d2c350 r18 : 0000000000000000 r19 : a000000100d2c300 r20 : a000000100c01a88 r21 : 0000000080010100 r22 : a000000100c01ac0 r23 : a0000001000108e0 r24 : e000000477980004 r25 : 0000000000000000 r26 : 0000000000000000 r27 : e00000000913400c r28 : e0000004799ee51c r29 : e0000004778b87f0 r30 : a000000100d2c300 r31 : a00000010005c7e0 Call Trace: [<a000000100014600>] show_stack+0x40/0xa0 sp=a000000100bcb760 bsp=a000000100bc4f40 [<a000000100014f00>] show_regs+0x840/0x880 sp=a000000100bcb930 bsp=a000000100bc4ee8 [<a000000100037fb0>] die+0x250/0x320 sp=a000000100bcb930 bsp=a000000100bc4ea0 [<a00000010005e5f0>] ia64_do_page_fault+0x8d0/0xa20 sp=a000000100bcb950 bsp=a000000100bc4e50 [<a00000010000caa0>] ia64_leave_kernel+0x0/0x290 sp=a000000100bcb9e0 bsp=a000000100bc4e50 [<a0000001000e1410>] __do_IRQ+0x370/0x3e0 sp=a000000100bcbbb0 bsp=a000000100bc4df0 [<a000000100011f50>] ia64_handle_irq+0x170/0x220 sp=a000000100bcbbb0 bsp=a000000100bc4dc0 [<a00000010000caa0>] ia64_leave_kernel+0x0/0x290 sp=a000000100bcbbb0 bsp=a000000100bc4dc0 [<a000000100012390>] ia64_pal_call_static+0x90/0xc0 sp=a000000100bcbd80 bsp=a000000100bc4d78 [<a000000100015630>] default_idle+0x90/0x160 sp=a000000100bcbd80 bsp=a000000100bc4d58 [<a000000100014290>] cpu_idle+0x1f0/0x440 sp=a000000100bcbe20 bsp=a000000100bc4d18 [<a000000100009980>] rest_init+0xc0/0xe0 sp=a000000100bcbe20 bsp=a000000100bc4d00 [<a0000001009f8ea0>] start_kernel+0x6a0/0x6c0 sp=a000000100bcbe20 bsp=a000000100bc4ca0 [<a0000001000089f0>] __end_ivt_text+0x6d0/0x6f0 sp=a000000100bcbe30 bsp=a000000100bc4c00 <0>Kernel panic - not syncing: Aiee, killing interrupt handler! The root cause is that some irq_chip variables, especially ia64_msi_chip, initiate their memeber end to point to NULL. __do_IRQ doesn't check if irq_chip->end is null and just calls it after processing the interrupt. As irq_chip->end is called at many places, so I fix it by reinitiating irq_chip->end to dummy_irq_chip.end, e.g., a noop function. Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-14Revert "[PATCH] fix Data Acess error in dup_fd"Linus Torvalds1-1/+0
This reverts commit 0130b0b32ee53dc7add773fcea984f6a26ef1da3. Sergey Vlasov points out (and Vadim Lobanov concurs) that the bug it was supposed to fix must be some unrelated memory corruption, and the "fix" actually causes more problems: "However, the new code does not look safe in all cases. If some other task has opened more files while dup_fd() released oldf->file_lock, the new code will update open_files to the new larger value. But newf was allocated with the old smaller value of open_files, therefore subsequent accesses to newf may try to write into unallocated memory." so revert it. Cc: Sharyathi Nagesh <sharyath@in.ibm.com> Cc: Sergey Vlasov <vsu@altlinux.ru> Cc: Vadim Lobanov <vlobanov@speakeasy.net> Cc: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-14[PATCH] setup_irq(): better mismatch debuggingAndrew Morton1-2/+7
When we get a mismatch between handlers on the same IRQ, all we get is "IRQ handler type mismatch for IRQ n". Let's print the name of the presently-registered handler with which we got the mismatch. Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-13[PATCH] Fix misrouted interrupts deadlocksPavel Emelianov1-1/+5
While testing kernel on machine with "irqpoll" option I've caught such a lockup: __do_IRQ() spin_lock(&desc->lock); desc->chip->ack(); /* IRQ is ACKed */ note_interrupt() misrouted_irq() handle_IRQ_event() if (...) local_irq_enable_in_hardirq(); /* interrupts are enabled from now */ ... __do_IRQ() /* same IRQ we've started from */ spin_lock(&desc->lock); /* LOCKUP */ Looking at misrouted_irq() code I've found that a potential deadlock like this can also take place: 1CPU: __do_IRQ() spin_lock(&desc->lock); /* irq = A */ misrouted_irq() for (i = 1; i < NR_IRQS; i++) { spin_lock(&desc->lock); /* irq = B */ if (desc->status & IRQ_INPROGRESS) { 2CPU: __do_IRQ() spin_lock(&desc->lock); /* irq = B */ misrouted_irq() for (i = 1; i < NR_IRQS; i++) { spin_lock(&desc->lock); /* irq = A */ if (desc->status & IRQ_INPROGRESS) { As the second lock on both CPUs is taken before checking that this irq is being handled in another processor this may cause a deadlock. This issue is only theoretical. I propose the attached patch to fix booth problems: when trying to handle misrouted IRQ active desc->lock may be unlocked. Acked-by: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-13[PATCH] fix Data Acess error in dup_fdSharyathi Nagesh1-0/+1
On running the Stress Test on machine for more than 72 hours following error message was observed. 0:mon> e cpu 0x0: Vector: 300 (Data Access) at [c00000007ce2f7f0] pc: c000000000060d90: .dup_fd+0x240/0x39c lr: c000000000060d6c: .dup_fd+0x21c/0x39c sp: c00000007ce2fa70 msr: 800000000000b032 dar: ffffffff00000028 dsisr: 40000000 current = 0xc000000074950980 paca = 0xc000000000454500 pid = 27330, comm = bash 0:mon> t [c00000007ce2fa70] c000000000060d28 .dup_fd+0x1d8/0x39c (unreliable) [c00000007ce2fb30] c000000000060f48 .copy_files+0x5c/0x88 [c00000007ce2fbd0] c000000000061f5c .copy_process+0x574/0x1520 [c00000007ce2fcd0] c000000000062f88 .do_fork+0x80/0x1c4 [c00000007ce2fdc0] c000000000011790 .sys_clone+0x5c/0x74 [c00000007ce2fe30] c000000000008950 .ppc_clone+0x8/0xc The problem is because of race window. When if(expand) block is executed in dup_fd unlocking of oldf->file_lock give a window for fdtable in oldf to be modified. So actual open_files in oldf may not match with open_files variable. Cc: Vadim Lobanov <vlobanov@speakeasy.net> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-06[PATCH] sysctl: allow a zero ctl_name in the middle of a sysctl tableEric W. Biederman1-3/+5
Since it is becoming clear that there are just enough users of the binary sysctl interface that completely removing the binary interface from the kernel will not be an option for foreseeable future, we need to find a way to address the sysctl maintenance issues. The basic problem is that sysctl requires one central authority to allocate sysctl numbers, or else conflicts and ABI breakage occur. The proc interface to sysctl does not have that problem, as names are not densely allocated. By not terminating a sysctl table until I have neither a ctl_name nor a procname, it becomes simple to add sysctl entries that don't show up in the binary sysctl interface. Which allows people to avoid allocating a binary sysctl value when not needed. I have audited the kernel code and in my reading I have not found a single sysctl table that wasn't terminated by a completely zero filled entry. So this change in behavior should not affect anything. I think this mechanism eases the pain enough that combined with a little disciple we can solve the reoccurring sysctl ABI breakage. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-06[PATCH] Improve the removed sysctl warningsEric W. Biederman1-1/+21
Don't warn about libpthread's access to kernel.version. When it receives -ENOSYS it will read /proc/sys/kernel/version. If anything else shows up print the sysctl number string. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Cal Peake <cp@absolutedigital.net> Cc: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-06[PATCH] lockdep: fix delayacct locking bugPeter Zijlstra1-6/+9
Make the delayacct lock irqsave; this avoids the possible deadlock where an interrupt is taken while holding the delayacct lock which needs to take the delayacct lock. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-06[PATCH] Fix the spurious unlock_cpu_hotplug false warningsGautham R Shenoy1-1/+1
Cpu-hotplug locking has a minor race case caused because of setting the variable "recursive" to NULL *after* releasing the cpu_bitmask_lock in the function unlock_cpu_hotplug,instead of doing so before releasing the cpu_bitmask_lock. This was the cause of most of the recent false spurious lock_cpu_unlock warnings. This should fix the problem reported by Martin Lorenz reported in http://lkml.org/lkml/2006/10/29/127. Thanks to Srinivasa DS for pointing it out. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-04Make sure "user->sigpending" count is in syncLinus Torvalds1-4/+11
The previous commit (45c18b0bb579b5c1b89f8c99f1b6ffa4c586ba08, aka "Fix unlikely (but possible) race condition on task->user access") fixed a potential oops due to __sigqueue_alloc() getting its "user" pointer out of sync with switch_user(), and accessing a user pointer that had been de-allocated on another CPU. It still left another (much less serious) problem, where a concurrent __sigqueue_alloc and swich_user could cause sigqueue_alloc to do signal pending reference counting for a _different_ user than the one it then actually ended up using. No oops, but we'd end up with the wrong signal accounting. Another case of Oleg's eagle-eyes picking up the problem. This is trivially fixed by just making sure we load whichever "user" structure we decide to use (it doesn't matter _which_ one we pick, we just need to pick one) just once. Acked-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Andrew Morton <akpm@osdl.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-04Fix unlikely (but possible) race condition on task->user accessLinus Torvalds1-0/+11
There's a possible race condition when doing a "switch_uid()" from one user to another, which could race with another thread doing a signal allocation and looking at the old thread ->user pointer as it is freed. This explains an oops reported by Lukasz Trabinski: http://permalink.gmane.org/gmane.linux.kernel/462241 We fix this by delaying the (reference-counted) freeing of the user structure until the thread signal handler lock has been released, so that we know that the signal allocation has either seen the new value or has properly incremented the reference count of the old one. Race identified by Oleg Nesterov. Cc: Lukasz Trabinski <lukasz@wsisiz.edu.pl> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Andrew Morton <akpm@osdl.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-03[PATCH] Create compat_sys_migrate_pagesStephen Rothwell2-0/+34
This is needed on bigendian 64bit architectures. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-03[PATCH] swsusp: debuggingRafael J. Wysocki1-9/+28
Add a swsusp debugging mode. This does everything that's needed for a suspend except for actually suspending. So we can look in the log messages and work out a) what code is being slow and b) which drivers are misbehaving. (1) # echo testproc > /sys/power/disk # echo disk > /sys/power/state This should turn off the non-boot CPU, freeze all processes, wait for 5 seconds and then thaw the processes and the CPU. (2) # echo test > /sys/power/disk # echo disk > /sys/power/state This should turn off the non-boot CPU, freeze all processes, shrink memory, suspend all devices, wait for 5 seconds, resume the devices etc. Cc: Pavel Machek <pavel@ucw.cz> Cc: Stefan Seyfried <seife@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-03[PATCH] schedule removal of FUTEX_FDAndrew Morton1-0/+7
Apparently FUTEX_FD is unfixably racy and nothing uses it (or if it does, it shouldn't). Add a warning printk, give any remaining users six months to migrate off it. Cc: Ulrich Drepper <drepper@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-03[PATCH] Add printk_timed_ratelimit()Andrew Morton1-0/+21
printk_ratelimit() has global state which makes it not useful for callers which wish to perform ratelimiting at a particular frequency. Add a printk_timed_ratelimit() which utilises caller-provided state storage to permit more flexibility. This function can in fact be used for things other than printk ratelimiting and is perhaps poorly named. Cc: Ulrich Drepper <drepper@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-31[PATCH] taskstats: fix sub-threads accountingOleg Nesterov1-7/+10
If there are no listeners, taskstats_exit_send() just returns because taskstats_exit_alloc() didn't allocate *tidstats. This is wrong, each sub-thread should do fill_tgid_exit() on exit, otherwise its ->delays is not recorded in ->signal->stats and lost. Q: We don't send TASKSTATS_TYPE_AGGR_TGID when single-threaded process exits. Is it good? How can the listener figure out that it was actually a process exit, not sub-thread? Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Balbir Singh <balbir@in.ibm.com> Acked-by: Shailabh Nagar <nagar@watson.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-30[PATCH] xacct_add_tsk: fix pure theoretical ->mm use-after-freeOleg Nesterov1-3/+7
Paranoid fix. The task can free its ->mm after the 'if (p->mm)' check. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-30[PATCH] ndiswrapper: don't set the module->taints flagsRandy Dunlap1-1/+1
For ndiswrapper, don't set the module->taints flags, just set the kernel global tainted flag. This should allow ndiswrapper to continue to use GPL symbols. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Florin Malita <fmalita@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-29[PATCH] taskstats: fix sk_buff size calculationOleg Nesterov1-1/+2
prepare_reply() adds GENL_HDRLEN to the payload (genlmsg_total_size()), but then it does genlmsg_put()->nlmsg_put(). This means we forget to reserve a room for 'struct nlmsghdr'. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Thomas Graf <tgraf@suug.ch> Cc: Andrew Morton <akpm@osdl.org> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-29[PATCH] taskstats: fix sk_buff leakOleg Nesterov1-2/+1
'return genlmsg_cancel()' in taskstats_user_cmd/taskstats_exit_send potentially leaks a skb. Unless we pass 'rep_skb' to the netlink layer we own sk_buff. This means we should always do kfree_skb() on failure. [ Thomas acked and pointed out missing return value in original version ] Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Thomas Graf <tgraf@suug.ch> Cc: Andrew Morton <akpm@osdl.org> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28[PATCH] workqueue: update kerneldocAlan Stern1-3/+3
This patch (as812) changes the kerneldoc comments explaining the return values from queue_work(), queue_delayed_work(), and queue_delayed_work_on(). The updated comments explain more accurately the meaning of the return code and avoid suggesting that a 0 value means the routine was unsuccessful. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28[PATCH] cpu-hotplug: release `workqueue_mutex' properly on CPU hot-removeSatoru Takeuchi1-6/+6
_cpu_down() acquires `workqueue_mutex' on its process, but doen't release it if __cpu_disable() fails. Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28[PATCH] time_adjust cleared before useJim Houston1-1/+1
I notice that the code which implements adjtime clears the time_adjust value before using it. The attached patch makes the obvious fix. Acked-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Jim Houston <jim.houston@ccur.com> Cc: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28[PATCH] fill_tgid: cleanup delays accountingOleg Nesterov1-1/+1
fill_tgid() should skip not only an already exited group leader. If the task has ->exit_state != 0 it already did exit_notify(), so it also did fill_tgid_exit()->delayacct_add_tsk(->signal->stats) and we should skip it to avoid a double accounting. This patch doesn't close the race completely, but it cleanups the code. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28[PATCH] taskstats: don't use tasklist_lockOleg Nesterov1-35/+24
Remove tasklist_lock from taskstats.c. find_task_by_pid() is rcu-safe. ->siglock allows us to traverse subthread without tasklist. Q: delay accounting looks wrong to me. If sub-thread has already called taskstats_exit_send() but didn't call release_task(self) yet it will be accounted twice. The window is big. No? Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28[PATCH] taskstats: kill ->taskstats_lock in favor of ->siglockOleg Nesterov2-11/+7
signal_struct is (mostly) protected by ->sighand->siglock, I think we don't need ->taskstats_lock to protect ->stats. This also allows us to simplify the locking in fill_tgid(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28[PATCH] taskstats_tgid_free: fix usageOleg Nesterov2-1/+1
taskstats_tgid_free() is called on copy_process's error path. This is wrong. IF (clone_flags & CLONE_THREAD) We should not clear ->signal->taskstats, current uses it, it probably has a valid accumulated info. ELSE taskstats_tgid_init() set ->signal->taskstats = NULL, there is nothing to free. Move the callsite to __exit_signal(). We don't need any locking, entire thread group is exiting, nobody should have a reference to soon to be released ->signal. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28[PATCH] bacct_add_tsk: fix unsafe and wrong parent/group_leader dereferenceOleg Nesterov1-2/+5
1. ts = timespec_sub(uptime, current->group_leader->start_time); It is possible that current != tsk. Probably it was supposed to be 'tsk->group_leader->start_time. But why we are reading group_leader's start_time ? This accounting is per thread, not per procees, I changed this to 'tsk->start_time. Please corect me. 2. stats->ac_ppid = (tsk->parent) ? tsk->parent->pid : 0; tsk->parent never == NULL, and it is unsafe to dereference it. Both the task and it's parent may exit after the caller unlocks tasklist_lock, the memory could be unmapped (DEBUG_SLAB). (And we should use ->real_parent->tgid in fact). Q: I don't understand the 'if (thread_group_leader(tsk))' check. Why it is needed ? Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Acked-by: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28[PATCH] fill_tgid: fix task_struct leak and possible oopsOleg Nesterov1-6/+9
1. fill_tgid() forgets to do put_task_struct(first). 2. release_task(first) can happen after fill_tgid() drops tasklist_lock, it is unsafe to dereference first->signal. This is a temporary fix, imho the locking should be reworked. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28[PATCH] Constify compat_get_bitmap argumentStephen Rothwell1-1/+1
This means we can call it when the bitmap we want to fetch is declared const. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28[PATCH] Add missing space in module.c for taintskernelJan Dittmer1-1/+1
Obvious fix. Signed-off-by: Jan Dittmer <jdi@l4x.org> Acked-by: Florin Malita <fmalita@gmail.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-21[PATCH] x86-64: Speed up dwarf2 unwinderJan Beulich1-39/+279
This changes the dwarf2 unwinder to do a binary search for CIEs instead of a linear work. The linker is unfortunately not able to build a proper lookup table at link time, instead it creates one at runtime as soon as the bootmem allocator is usable (so you'll continue using the linear lookup for the first [hopefully] few calls). The code should be ready to utilize a build-time created table once a fixed linker becomes available. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-20[PATCH] kernel/nsproxy.c: use kmemdup()Alexey Dobriyan1-4/+2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-20[PATCH] cad_pid sysctl with PROC_FS=nRandy Dunlap1-0/+4
If CONFIG_PROC_FS=n: kernel/sysctl.c:148: warning: 'proc_do_cad_pid' used but never defined kernel/built-in.o:(.data+0x1228): undefined reference to `proc_do_cad_pid' make: *** [.tmp_vmlinux1] Error 1 Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-20[PATCH] readjust comments of task_timeslice for kernel docBorislav Petkov1-9/+9
Signed-off-by: Borislav Petkov <petkov@math.uni-muenster.de> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/dtor/inputLinus Torvalds2-5/+7
* git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: fm801-gp - handle errors from pci_enable_device() Input: gameport core - handle errors returned by device_bind_driver() Input: serio core - handle errors returned by device_bind_driver() Lockdep: fix compile error in drivers/input/serio/serio.c Input: serio - add lockdep annotations Lockdep: add lockdep_set_class_and_subclass() and lockdep_set_subclass() Input: atkbd - supress "too many keys" error message Input: i8042 - supress ACK/NAKs when blinking during panic Input: add missing exports to fix modular build
2006-10-17[PATCH] Convert cpu hotplug notifiers to use raw_notifier instead of ↵Neil Brown1-9/+15
blocking_notifier The use of blocking notifier by _cpu_up and _cpu_down in cpu.c has two problem. 1/ An interaction with the workqueue notifier causes lockdep to spit a warning. 2/ A notifier could conceivable be added or removed while _cpu_up or _cpu_down are in process. As each notifier is called twice (prepare then commit/abort) this could be unhealthy. To fix to we simply take cpu_add_remove_lock while adding or removing notifiers to/from the list. This makes the 'blocking' usage unnecessary as all accesses to cpu_chain are now protected by cpu_add_remove_lock. So change "blocking" to "raw" in all relevant places. This fixes 1. Credit: Andrew Morton Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> (reporter) Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-17[PATCH] rt-mutex: fixup rt-mutex debug codePeter Zijlstra1-2/+2
BUG: warning at kernel/rtmutex-debug.c:125/rt_mutex_debug_task_free() (Not tainted) [<c04051e3>] show_trace_log_lvl+0x58/0x16a [<c04057f0>] show_trace+0xd/0x10 [<c0405900>] dump_stack+0x19/0x1b [<c043f03d>] rt_mutex_debug_task_free+0x35/0x6a [<c04224c0>] free_task+0x15/0x24 [<c042378c>] copy_process+0x12bd/0x1324 [<c0423835>] do_fork+0x42/0x113 [<c04021dd>] sys_fork+0x19/0x1b [<c0403fb7>] syscall_call+0x7/0xb In copy_process(), dup_task_struct() also duplicates the ->pi_lock, ->pi_waiters and ->pi_blocked_on members. rt_mutex_debug_task_free() called from free_task() validates these members. However free_task() can be invoked before these members are reset for the new task. Move the initialization code before the first bail that can hit free_task(). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-17[PATCH] genirq: clean up irq-flow-type namingIngo Molnar1-24/+9
Introduce desc->name and eliminate the handle_irq_name() hack. Add set_irq_chip_and_handler_name() to set the flow type and name at once. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Matthew Wilcox <willy@debian.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-17[PATCH] swsusp: fix memory leaksAndrew Morton1-1/+2
My fancy new swsusp IO code had a big memory leak. It's somewhat invisible because the whole mem_map[] gets overwritten after resume, but it can cause us to get low on memory during the actual suspend process. Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-17[PATCH] posix-cpu-timers: prevent signal delivery starvationThomas Gleixner1-6/+21
The integer divisions in the timer accounting code can round the result down to 0. Adding 0 is without effect and the signal delivery stops. Clamp the division result to minimum 1 to avoid this. Problem was reported by Seongbae Park <spark@google.com>, who provided also an inital patch. Roland sayeth: I have had some more time to think about the problem, and to reproduce it using Toyo's test case. For the record, if my understanding of the problem is correct, this happens only in one very particular case. First, the expiry time has to be so soon that in cputime_t units (usually 1s/HZ ticks) it's < nthreads so the division yields zero. Second, it only affects each thread that is so new that its CPU time accumulation is zero so now+0 is still zero and ->it_*_expires winds up staying zero. For the VIRT and PROF clocks when cputime_t is tick granularity (or the SCHED clock on configurations where sched_clock's value only advances on clock ticks), this is not hard to arrange with new threads starting up and blocking before they accumulate a whole tick of CPU time. That's what happens in Toyo's test case. Note that in general it is fine for that division to round down to zero, and set each thread's expiry time to its "now" time. The problem only arises with thread's whose "now" value is still zero, so that now+0 winds up 0 and is interpreted as "not set" instead of ">= now". So it would be a sufficient and more precise fix to just use max(ticks, 1) inside the loop when setting each it_*_expires value. But, it does no harm to round the division up to one and always advance every thread's expiry time. If the thread didn't already fire timers for the expiry time of "now", there is no expectation that it will do so before the next tick anyway. So I followed Thomas's patch in lifting the max out of the loops. This patch also covers the reload cases, which are harder to write a test for (and I didn't try). I've tested it with Toyo's case and it fixes that. [toyoa@mvista.com: fix: min_t -> max_t] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Daniel Walker <dwalker@mvista.com> Cc: Toyo Abe <toyoa@mvista.com> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Seongbae Park <spark@google.com> Cc: Peter Mattis <pmattis@google.com> Cc: Rohit Seth <rohitseth@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-17[PATCH] i386 Time: Avoid PIT SMP lockupsjohn stultz1-1/+1
Avoid possible PIT livelock issues seen on SMP systems (and reported by Andi), by not allowing it as a clocksource on SMP boxes. However, since the PIT may no longer be present, we have to properly handle the cases where SMP systems have TSC skew and fall back from the TSC. Since the PIT isn't there, it would "fall back" to the TSC again. So this changes the jiffies rating to 1, and the TSC-bad rating value to 0. Thus you will get the following behavior priority on i386 systems: tsc [if present & stable] hpet [if present] cyclone [if present] acpi_pm [if present] pit [if UP] jiffies Rather then the current more complicated: tsc [if present & stable] hpet [if present] cyclone [if present] acpi_pm [if present] pit [if cpus < 4] tsc [if present & unstable] jiffies Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-17[PATCH] lockdep: increase max allowed recursion depthIngo Molnar1-3/+5
In general, lockdep warnings are intended to be non-fatal, so I have put in various practical limits on internal data structure failure modes. We haven't had a /single/ lockdep-internal crash ever since lockdep went upstream [the unwinder crashes are outside of lockdep], and that's largely due to the good internal checks it does. Recursion within the dependency graph is currently limited to 20, that's probably not enough on some many-CPU boxes - this patch doubles it to 40. I have written the lockdep functions to have as small stackframes as possible, so 40 should be OK too. (The practical recursion limit should be somewhere between 100 and 200 entries. If we hit that then I'll change the algorithm to be iteration-based. Graph walking logic is so easy to program via recursion, so i'd like to keep recursion as long as possible.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-16[PATCH] fix epoll_pwait when EPOLL=nRandy Dunlap1-0/+1
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=7371 sys_epoll_pwait needs to be listed as a conditional (weak) entry point for CONFIG_EPOLL=n. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11[PATCH] lockdep: fix printk recursion logicIngo Molnar1-9/+2
Bug reported and fixed by Tilman Schmidt <tilman@imap.cc>: if lockdep is enabled then log messages make it to /var/log/messages belatedly. The reason is a missed wakeup of klogd. Initially there was only a lockdep_internal() protection against lockdep recursion within vprintk() - it grew the 'outer' lockdep_off()/on() protection only later on. But that lockdep_off() made the release_console_sem() within vprintk() always happen under the lockdep_internal() condition, causing the bug. The right solution to remove the inner protection against recursion here - the outer one is enough. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Tilman Schmidt <tilman@imap.cc> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11[PATCH] lockdep: use BUILD_BUG_ONAlexey Dobriyan1-4/+1
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11[PATCH] bitmap: parse input from kernel and user buffersReinette Chatre2-2/+2
lib/bitmap.c:bitmap_parse() is a library function that received as input a user buffer. This seemed to have originated from the way the write_proc function of the /proc filesystem operates. This has been reworked to not use kmalloc and eliminates a lot of get_user() overhead by performing one access_ok before using __get_user(). We need to test if we are in kernel or user space (is_user) and access the buffer differently. We cannot use __get_user() to access kernel addresses in all cases, for example in architectures with separate address space for kernel and user. This function will be useful for other uses as well; for example, taking input for /sysfs instead of /proc, so it was changed to accept kernel buffers. We have this use for the Linux UWB project, as part as the upcoming bandwidth allocator code. Only a few routines used this function and they were changed too. Signed-off-by: Reinette Chatre <reinette.chatre@linux.intel.com> Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com> Cc: Paul Jackson <pj@sgi.com> Cc: Joe Korty <joe.korty@ccur.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11[PATCH] sched: likely profilingNick Piggin1-3/+3
This likely profiling is pretty fun. I found a few possible problems in sched.c. This patch may be not measurable, but when I did measure long ago, nooping (un)likely cost a couple of % on scheduler heavy benchmarks, so it all adds up. Tweak some branch hints: - the 2nd 64 bits in the bitmask is likely to be populated, because it contains the first 28 bits (nearly 3/4) of the normal priorities. (ratio of 669669:691 ~= 1000:1). - it isn't unlikely that context switching switches to another process. it might be very rapidly switching to and from the idle process (ratio of 475815:419004 and 471330:423544). Let the branch predictor decide. - preempt_enable seems to be very often called in a nested preempt_disable or with interrupts disabled (ratio of 3567760:87965 ~= 40:1) Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Daniel Walker <dwalker@mvista.com> Cc: Hua Zhong <hzhong@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11[PATCH] fix Module taint flags listing in Oops/panicFlorin Malita1-45/+49
Module taint flags listing in Oops/panic has a couple of issues: * taint_flags() doesn't null-terminate the buffer after printing the flags * per-module taints are only set if the kernel is not already tainted (with that particular flag) => only the first offending module gets its taint info correctly updated Some additional changes: * 'license_gplok' is no longer needed - equivalent to !(taints & TAINT_PROPRIETARY_MODULE) - so we can drop it from struct module * exporting module taint info via /proc/module: pwc 88576 0 - Live 0xf8c32000 evilmod 6784 1 pwc, Live 0xf8bbf000 (PF) Signed-off-by: Florin Malita <fmalita@gmail.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11[PATCH] mm: kevent threads: use MPOL_DEFAULTChristoph Lameter1-0/+7
Switch the memory policy of the kevent threads to MPOL_DEFAULT while leaving the kzalloc of the workqueue structure on interleave. This means that all code executed in the context of the kevent thread is allocating node local. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Alok Kataria <alok.kataria@calsoftinc.com> Cc: Andi Kleen <ak@suse.de> Cc: <pj@sgi.com> Cc: <shai@scalex86.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11[PATCH] swsusp: Use suspend_consoleRafael J. Wysocki2-1/+15
Add suspend_console() and resume_console() to the suspend-to-disk code paths so that the users of netconsole can use swsusp with it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11Lockdep: add lockdep_set_class_and_subclass() and lockdep_set_subclass()Peter Zijlstra2-5/+7
This annotation makes it possible to assign a subclass on lock init. This annotation is meant to reduce the _nested() annotations by assigning a default subclass. One could do without this annotation and rely on lockdep_set_class() exclusively, but that would require a manual stack of struct lock_class_key objects. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2006-10-10[PATCH] cpuset ANSI prototypeAl Viro1-1/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Paul Jackson <pj@sgi.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-10[PATCH] make kernel/relay.c __user-cleanAl Viro1-19/+22
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-10[PATCH] __user annotations: futexAl Viro2-13/+14
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-07[PATCH] swsusp: Make userland suspend work on SMP againRafael J. Wysocki1-1/+1
Unfortunately one of the recent changes in swsusp has broken the userland suspend on SMP. Fix it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-06[PATCH] ixp4xxdefconfig arm fixesFrederik Deweerdt1-1/+1
With the following patch, the ixp4xxdefconfig builds correctly. I'll test some more configs if I get some time. Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-06[PATCH] kauditd_thread warning fixAndrew Morton1-1/+2
Squash this warning: kernel/audit.c: In function 'kauditd_thread': kernel/audit.c:367: warning: no return statement in function returning non-void We might as test kthread_should_stop(), although it's not very pointful at present. The code which starts this thread looks racy - the kernel could start multiple threads. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-05IRQ: Maintain regs pointer globally rather than passing to IRQ handlersDavid Howells5-40/+33
Maintain a per-CPU global "struct pt_regs *" variable which can be used instead of passing regs around manually through all ~1800 interrupt handlers in the Linux kernel. The regs pointer is used in few places, but it potentially costs both stack space and code to pass it around. On the FRV arch, removing the regs parameter from all the genirq function results in a 20% speed up of the IRQ exit path (ie: from leaving timer_interrupt() to leaving do_IRQ()). Where appropriate, an arch may override the generic storage facility and do something different with the variable. On FRV, for instance, the address is maintained in GR28 at all times inside the kernel as part of general exception handling. Having looked over the code, it appears that the parameter may be handed down through up to twenty or so layers of functions. Consider a USB character device attached to a USB hub, attached to a USB controller that posts its interrupts through a cascaded auxiliary interrupt controller. A character device driver may want to pass regs to the sysrq handler through the input layer which adds another few layers of parameter passing. I've build this code with allyesconfig for x86_64 and i386. I've runtested the main part of the code on FRV and i386, though I can't test most of the drivers. I've also done partial conversion for powerpc and MIPS - these at least compile with minimal configurations. This will affect all archs. Mostly the changes should be relatively easy. Take do_IRQ(), store the regs pointer at the beginning, saving the old one: struct pt_regs *old_regs = set_irq_regs(regs); And put the old one back at the end: set_irq_regs(old_regs); Don't pass regs through to generic_handle_irq() or __do_IRQ(). In timer_interrupt(), this sort of change will be necessary: - update_process_times(user_mode(regs)); - profile_tick(CPU_PROFILING, regs); + update_process_times(user_mode(get_irq_regs())); + profile_tick(CPU_PROFILING); I'd like to move update_process_times()'s use of get_irq_regs() into itself, except that i386, alone of the archs, uses something other than user_mode(). Some notes on the interrupt handling in the drivers: (*) input_dev() is now gone entirely. The regs pointer is no longer stored in the input_dev struct. (*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking. It does something different depending on whether it's been supplied with a regs pointer or not. (*) Various IRQ handler function pointers have been moved to type irq_handler_t. Signed-Off-By: David Howells <dhowells@redhat.com> (cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
2006-10-05IRQ: Typedef the IRQ handler function typeDavid Howells1-3/+1
Typedef the IRQ handler function type. Signed-Off-By: David Howells <dhowells@redhat.com> (cherry picked from 1356d1e5fd256997e3d3dce0777ab787d0515c7a commit)
2006-10-05IRQ: Typedef the IRQ flow handler function typeDavid Howells1-9/+3
Typedef the IRQ flow handler function type. Signed-Off-By: David Howells <dhowells@redhat.com> (cherry picked from 8e973fbdf5716b93a0a8c0365be33a31ca0fa351 commit)
2006-10-04Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/confighLinus Torvalds2-2/+0
* master.kernel.org:/pub/scm/linux/kernel/git/davej/configh: Remove all inclusions of <linux/config.h> Manually resolved trivial path conflicts due to removed files in the sound/oss/ subdirectory.
2006-10-04Merge branch 'audit.b32' of ↵Linus Torvalds2-5/+32
git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b32' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: [PATCH] message types updated [PATCH] name_count array overrun [PATCH] PPID filtering fix [PATCH] arch filter lists with < or > should not be accepted
2006-10-04[PATCH] rcu: simplify/improve batch tuningOleg Nesterov1-8/+3
Kill a hard-to-calculate 'rsinterval' boot parameter and per-cpu rcu_data.last_rs_qlen. Instead, it adds adds a flag rcu_ctrlblk.signaled, which records the fact that one of CPUs has sent a resched IPI since the last rcu_start_batch(). Roughly speaking, we need two rcu_start_batch()s in order to move callbacks from ->nxtlist to ->donelist. This means that when ->qlen exceeds qhimark and continues to grow, we should send a resched IPI, and then do it again after we gone through a quiescent state. On the other hand, if it was already sent, we don't need to do it again when another CPU detects overflow of the queue. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04[PATCH] rcu: add sched torture type to rcutortureJosh Triplett1-1/+39
Implement torture testing for the "sched" variant of RCU, which uses preempt_disable, preempt_enable, and synchronize_sched. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04[PATCH] rcu: add rcu_bh_sync torture type to rcutortureJosh Triplett1-1/+15
Use the newly-generic synchronous deferred free function to implement torture testing for rcu_bh using synchronize_rcu_bh rather than the asynchronous call_rcu_bh. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04[PATCH] rcu: add rcu_sync torture type to rcutortureJosh Triplett1-2/+15
Use the newly-generic synchronous deferred free function to implement torture testing for RCU using synchronize_rcu rather than the asynchronous call_rcu. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04[PATCH] rcu: refactor srcu_torture_deferred_free to work for any implementationJosh Triplett1-24/+29
Make srcu_torture_deferred_free use cur_ops->sync() so it will work for any implementation. Move and rename it in preparation for use in the ops of other implementations. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04[PATCH] RCU: add fake writers to rcutortureJosh Triplett1-5/+104
rcutorture currently has one writer and an arbitrary number of readers. To better exercise some of the code paths in RCU implementations, add fake writer threads which call the synchronize function for the RCU variant in a loop, with a delay between calls to arrange for different numbers of writers running in parallel. [bunk@stusta.de: cleanup] Acked-by: Paul McKenney <paulmck@us.ibm.com> Cc: Dipkanar Sarma <dipankar@in.ibm.com> Signed-off-by: Josh Triplett <josh@freedesktop.org> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04[PATCH] rcu: Fix sign bug making rcu_random always return the same sequenceJosh Triplett1-2/+2
rcu_random uses a counter rrs_count to occasionally mix data from get_random_bytes into the state of its pseudorandom generator. However, the rrs_counter gets declared as an unsigned long, and rcu_random checks for --rrs_count < 0, so this code will never mix any real random data into the state, and will thus always return the same sequence of random numbers. Also, change the return value of rcu_random from long to unsigned long, to avoid potential issues caused by the use of the % operator, which can return negative values for negative left operands. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04[PATCH] rcu: Avoid kthread_stop on invalid pointer if rcutorture reader ↵Josh Triplett1-1/+1
startup fails rcu_torture_init kmallocs the array of reader threads, then creates each one with kthread_run, cleaning up with rcu_torture_cleanup if this fails. rcu_torture_cleanup calls kthread_stop on any non-NULL pointer in the array; however, any readers after the one that failed to start up will have invalid pointers, not null pointers. Avoid this by using kzalloc instead. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04[PATCH] rcu: Mention rcu_bh in description of rcutorture's torture_type ↵Josh Triplett1-1/+1
parameter The comment for rcutorture's torture_type parameter only lists the RCU variants rcu and srcu, but not rcu_bh; add rcu_bh to the list. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04[PATCH] rcu: Add MODULE_AUTHOR to rcutorture moduleJosh Triplett1-0/+1
Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04[PATCH] SRCU: report out-of-memory errorsAlan Stern2-3/+5
Currently the init_srcu_struct() routine has no way to report out-of-memory errors. This patch (as761) makes it return -ENOMEM when the per-cpu data allocation fails. The patch also makes srcu_init_notifier_head() report a BUG if a notifier head can't be initialized. Perhaps it should return -ENOMEM instead, but in the most likely cases where this might occur I don't think any recovery is possible. Notifier chains generally are not created dynamically. [akpm@osdl.org: avoid statement-with-side-effect in macro] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04[PATCH] Add SRCU-based notifier chainsAlan Stern1-1/+123
This patch (as751) adds a new type of notifier chain, based on the SRCU (Sleepable Read-Copy Update) primitives recently added to the kernel. An SRCU notifier chain is much like a blocking notifier chain, in that it must be called in process context and its callout routines are allowed to sleep. The difference is that the chain's links are protected by the SRCU mechanism rather than by an rw-semaphore, so calling the chain has extremely low overhead: no memory barriers and no cache-line bouncing. On the other hand, unregistering from the chain is expensive and the chain head requires special runtime initialization (plus cleanup if it is to be deallocated). SRCU notifiers are appropriate for notifiers that will be called very frequently and for which unregistration occurs very seldom. The proposed "task notifier" scheme qualifies, as may some of the network notifiers. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Acked-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04[PATCH] srcu-3: add SRCU operations to rcutorturePaul E. McKenney1-10/+123
Adds SRCU operations to rcutorture and updates rcutorture documentation. Also increases the stress imposed by the rcutorture test. [bunk@stusta.de: make needlessly global code static] Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Cc: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04[PATCH] srcu-3: RCU variant permitting read-side blockingPaul E. McKenney2-1/+258
Updated patch adding a variant of RCU that permits sleeping in read-side critical sections. SRCU is as follows: o Each use of SRCU creates its own srcu_struct, and each srcu_struct has its own set of grace periods. This is critical, as it prevents one subsystem with a blocking reader from holding up SRCU grace periods for other subsystems. o The SRCU primitives (srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu()) all take a pointer to a srcu_struct. o The SRCU primitives must be called from process context. o srcu_read_lock() returns an int that must be passed to the matching srcu_read_unlock(). Realtime RCU avoids the need for this by storing the state in the task struct, but SRCU needs to allow a given code path to pass through multiple SRCU domains -- storing state in the task struct would therefore require either arbitrary space in the task struct or arbitrary limits on SRCU nesting. So I kicked the state-storage problem up to the caller. Of course, it is not permitted to call synchronize_srcu() while in an SRCU read-side critical section. o There is no call_srcu(). It would not be hard to implement one, but it seems like too easy a way to OOM the system. (Hey, we have enough trouble with call_rcu(), which does -not- permit readers to sleep!!!) So, if you want it, please tell me why... [josht@us.ibm.com: sparse notation] Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Josh Triplett <josh@freedesktop.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04[PATCH] msi: simplify msi sanity checks by adding with generic irq codeEric W. Biederman1-0/+7
Currently msi.c is doing sanity checks that make certain before an irq is destroyed it has no more users. By adding irq_has_action I can perform the test is a generic way, instead of relying on a msi specific data structure. By performing the core check in dynamic_irq_cleanup I ensure every user of dynamic irqs has a test present and we don't free resources that are in use. In msi.c this allows me to kill the attrib.state member of msi_desc and all of the assciated code to maintain it. To keep from freeing data structures when irq cleanup code is called to soon changing dyanamic_irq_cleanup is insufficient because there are msi specific data structures that are also not safe to free. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Tony Luck <tony.luck@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Greg KH <greg@kroah.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04[PATCH] genirq: irq: add a dynamic irq creation APIEric W. Biederman1-0/+56
With the msi support comes a new concept in irq handling, irqs that are created dynamically at run time. Currently the msi code allocates irqs backwards. First it allocates a platform dependent routing value for an interrupt the ``vector'' and then it figures out from the vector which irq you are on. This msi backwards allocator suffers from two basic problems. The allocator suffers because it is trying to do something that is architecture specific in a generic way making it brittle, inflexible, and tied to tightly to the architecture implementation. The alloctor also suffers from it's very backwards nature as it has tied things together that should have no dependencies. To solve the basic dynamic irq allocation problem two new architecture specific functions are added: create_irq and destroy_irq. create_irq takes no input and returns an unused irq number, that won't be reused until it is returned to the free poll with destroy_irq. The irq then can be used for any purpose although the only initial consumer is the msi code. destroy_irq takes an irq number allocated with create_irq and returns it to the free pool. Making this functionality per architecture increases the simplicity of the irq allocation code and increases it's flexibility. dynamic_irq_init() and dynamic_irq_cleanup() are added to automate the irq_desc initializtion that should happen for dynamic irqs. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rajesh Shah <rajesh.shah@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04[PATCH] genirq: irq: add moved_masked_irqEric W. Biederman1-7/+21
Currently move_native_irq disables and renables the irq we are migrating to ensure we don't take that irq when we are actually doing the migration operation. Disabling the irq needs to happen but sometimes doing the work is move_native_irq is too late. On x86 with ioapics the irq move sequences needs to be: edge_triggered: mask irq. move irq. unmask irq. ack irq. level_triggered: mask irq. ack irq. move irq. unmask irq. We can easily perform the edge triggered sequence, with the current defintion of move_native_irq. However the level triggered case does not map well. For that I have added move_masked_irq, to allow me to disable the irqs around both the ack and the move. Q: Why have we not seen this problem earlier? A: The only symptom I have been able to reproduce is that if we change the vector before acknowleding an irq the wrong irq is acknowledged. Since we currently are not reprogramming the irq vector during migration no problems show up. We have to mask the irq before we acknowledge the irq or else we could hit a window where an irq is asserted just before we acknowledge it. Edge triggered irqs do not have this problem because acknowledgements do not propogate in the same way. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rajesh Shah <rajesh.shah@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04[PATCH] genirq: irq: convert the move_irq flag from a 32bit word to a single bitEric W. Biederman1-3/+3
The primary aim of this patchset is to remove maintenances problems caused by the irq infrastructure. The two big issues I address are an artificially small cap on the number of irqs, and that MSI assumes vector == irq. My primary focus is on x86_64 but I have touched other architectures where necessary to keep them from breaking. - To increase the number of irqs I modify the code to look at the (cpu, vector) pair instead of just looking at the vector. With a large number of irqs available systems with a large irq count no longer need to compress their irq numbers to fit. Removing a lot of brittle special cases. For acpi guys the result is that irq == gsi. - Addressing the fact that MSI assumes irq == vector takes a few more patches. But suffice it to say when I am done none of the generic irq code even knows what a vector is. In quick testing on a large Unisys x86_64 machine we stumbled over at least one driver that assumed that NR_IRQS could always fit into an 8 bit number. This driver is clearly buggy today. But this has become a class of bugs that it is now much easier to hit. This patch: This is a minor space optimization. In practice I don't think this has any affect because of our alignment constraints and the other fields but there is not point in chewing up an uncessary word and since we already read the flag field this should improve the cache hit ratio of the irq handler. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rajesh Shah <rajesh.shah@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04[PATCH] name_count array overrunSteve Grubb1-2/+17
Hi, This patch removes the rdev logging from the previous patch The below patch closes an unbounded use of name_count. This can lead to oopses in some new file systems. Signed-off-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-10-04[PATCH] PPID filtering fixAlexander Viro1-2/+7
On Thu, Sep 28, 2006 at 04:03:06PM -0400, Eric Paris wrote: > After some looking I did not see a way to get into audit_log_exit > without having set the ppid. So I am dropping the set from there and > only doing it at the beginning. > > Please comment/ack/nak as soon as possible. Ehh... That's one hell of an overhead to be had ;-/ Let's be lazy. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-10-04[PATCH] arch filter lists with < or > should not be acceptedEric Paris1-1/+8
Currently the kernel audit system represents arch's as numbers and will gladly accept comparisons between archs using >, <, >=, <= when the only thing that makes sense is = or !=. I'm told that the next revision of auditctl will do this checking but this will provide enforcement in the kernel even for old userspace. A simple command to show the issue would be to run auditctl -d entry,always -F arch>i686 -S chmod with this patch the kernel will reject this with -EINVAL Please comment/ack/nak as soon as possible. -Eric kernel/auditfilter.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-10-04Remove all inclusions of <linux/config.h>Dave Jones2-2/+0
kbuild explicitly includes this at build time. Signed-off-by: Dave Jones <davej@redhat.com>
2006-10-03rcutorture: Fix incorrect description of default for nreaders parameterJosh Triplett1-1/+1
The comment for the nreaders parameter of rcutorture gives the default as 4*ncpus, but the value actually defaults to 2*ncpus; fix the comment. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03remove duplicate "until" from kernel/workqueue.cRolf Eike Beer1-1/+1
s/until until/until/ Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03fix file specification in commentsUwe Zeisberger1-1/+1
Many files include the filename at the beginning, serveral used a wrong one. Signed-off-by: Uwe Zeisberger <Uwe_Zeisberger@digi.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03[PATCH] scheduler: NUMA aware placement of sched_group_allnodesChristoph Lameter1-3/+4
When the per cpu sched domains are build then they also need to be placed on the node where the cpu resides otherwise we will have frequent off node accesses which will slow down the system. Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] sched: fixing wrong comment for find_idlest_cpu()Satoru Takeuchi1-1/+1
Fixing wrong comment for find_idlest_cpu(). Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] sched: cleanup sched_group cpu_power setupSiddha, Suresh B1-63/+82
Up to now sched group's cpu_power for each sched domain is initialized independently. This made the setup code ugly as the new sched domains are getting added. Make the sched group cpu_power setup code generic, by using domain child field and new domain flag in sched_domain. For most of the sched domains(except NUMA), sched group's cpu_power is now computed generically using the domain properties of itself and of the child domain. sched groups in NUMA domains are setup little differently and hence they don't use this generic mechanism. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] sched: introduce child field in sched_domainSiddha, Suresh B1-10/+30
Introduce the child field in sched_domain struct and use it in sched_balance_self(). We will also use this field in cleaning up the sched group cpu_power setup(done in a different patch) code. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] sched: don't print migration cost when only 1 CPUDave Jones1-6/+8
If only a single CPU is present, printing this doesn't make much sense. Signed-off-by: Dave Jones <davej@redhat.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] sched: remove unnecessary sched group allocationsSiddha, Suresh B1-66/+38
Remove dynamic sched group allocations for MC and SMP domains. These allocations can easily fail on big systems(1024 or so CPUs) and we can live with out these dynamic allocations. [akpm@osdl.org: build fix] Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] sched: force /sbin/init off isolated cpusNick Piggin1-2/+11
Force /sbin/init off isolated cpus (unless every CPU is specified as an isolcpu). Users seem to think that the isolated CPUs shouldn't have much running on them to begin with. That's fair enough: intuitive, I guess. It also means that the cpu affinity masks of tasks will not include isolcpus by default, which is also more intuitive, perhaps. /sbin/init is spawned from the boot CPU's idle thread, and /sbin/init starts the rest of userspace. So if the boot CPU is specified to be an isolcpu, then prior to this patch, all of userspace will be run there. (throw in a couple of plausible devinit -> cpuinit conversions I spotted while we're here). Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Acked-by: Paul Jackson <pj@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] kernel-doc for kernel/resource.cRandy Dunlap1-11/+72
Add kernel-doc function headers in kernel/resource.c and use them in DocBook. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] kernel-doc for kernel/dma.cRandy Dunlap1-1/+9
Add kernel-doc function headers in kernel/dma.c and use it in DocBook. Clean up kernel-doc in mca_dma.h (the colon (':') represents a section header). Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] Create kallsyms_lookup_size_offset()Franck Bui-Huu2-44/+82
Some uses of kallsyms_lookup() do not need to find out the name of a symbol and its module's name it belongs. This is specially true in arch specific code, which needs to unwind the stack to show the back trace during oops (mips is an example). In this specific case, we just need to retreive the function's size and the offset of the active intruction inside it. Adds a new entry "kallsyms_lookup_size_offset()" This new entry does exactly the same as kallsyms_lookup() but does not require any buffers to store any names. It returns 0 if it fails otherwise 1. Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] BLOCK: Revert patch to hack around undeclared sigset_t in linux/compat.hDavid Howells1-2/+0
Revert Andrew Morton's patch to temporarily hack around the lack of a declaration of sigset_t in linux/compat.h to make the block-disablement patches build on IA64. This got accidentally pushed to Linus and should be fixed in a different manner. Also make linux/compat.h #include asm/signal.h to gain a definition of sigset_t so that it can externally declare sigset_from_compat(). This has been compile-tested for i386, x86_64, ia64, mips, mips64, frv, ppc and ppc64 and run-tested on frv. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] replace cad_pid by a struct pidCedric Le Goater2-6/+30
There are a few places in the kernel where the init task is signaled. The ctrl+alt+del sequence is one them. It kills a task, usually init, using a cached pid (cad_pid). This patch replaces the pid_t by a struct pid to avoid pid wrap around problem. The struct pid is initialized at boot time in init() and can be modified through systctl with /proc/sys/kernel/cad_pid [ I haven't found any distro using it ? ] It also introduces a small helper routine kill_cad_pid() which is used where it seemed ok to use cad_pid instead of pid 1. [akpm@osdl.org: cleanups, build fix] Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] introduce get_task_pid() to fix unsafe get_pid()Oleg Nesterov1-0/+9
proc_pid_make_inode: ei->pid = get_pid(task_pid(task)); I think this is not safe. get_pid() can be preempted after checking "pid != NULL". Then the task exits, does detach_pid(), and RCU frees the pid. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] introduce kernel_execveArnd Bergmann1-3/+2
The use of execve() in the kernel is dubious, since it relies on the __KERNEL_SYSCALLS__ mechanism that stores the result in a global errno variable. As a first step of getting rid of this, change all users to a global kernel_execve function that returns a proper error code. This function is a terrible hack, and a later patch removes it again after the kernel syscalls are gone. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Andi Kleen <ak@muc.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Hirokazu Takata <takata.hirokazu@renesas.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] nsproxy cloning error path fixPavel1-1/+1
This patch fixes copy_namespaces()'s error path. when new nsproxy (new_ns) is created pointers to namespaces (ipc, uts) are copied from the old nsproxy. Later in copy_utsname, copy_ipcs, etc. according namespaces are get-ed. On error path needed namespaces are put-ed, so there's no need to put new nsproxy itelf as it woud cause putting namespaces for the second time. Found when incorporating namespaces into OpenVZ kernel. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] IPC namespace - sysctlsKirill Korotaev1-28/+88
Sysctl tweaks for IPC namespace Signed-off-by: Pavel Emelianiov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] IPC namespace - utilsKirill Korotaev1-0/+10
This patch adds basic IPC namespace functionality to IPC utils: - init_ipc_ns - copy/clone/unshare/free IPC ns - /proc preparations Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] IPC namespace coreKirill Korotaev2-17/+46
This patch set allows to unshare IPCs and have a private set of IPC objects (sem, shm, msg) inside namespace. Basically, it is another building block of containers functionality. This patch implements core IPC namespace changes: - ipc_namespace structure - new config option CONFIG_IPC_NS - adds CLONE_NEWIPC flag - unshare support [clg@fr.ibm.com: small fix for unshare of ipc namespace] [akpm@osdl.org: build fix] Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] uts: copy nsproxy only when neededSerge Hallyn1-6/+14
The nsproxy was being copied in unshare() when anything was being unshared, even if it was something not referenced from nsproxy. This should end up in some cases with far more memory usage than necessary. Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] namespaces: utsname: implement CLONE_NEWUTS flagSerge E. Hallyn3-4/+71
Implement a CLONE_NEWUTS flag, and use it at clone and sys_unshare. [clg@fr.ibm.com: IPC unshare fix] [bunk@stusta.de: cleanup] Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] namespaces: utsname: sysctlSerge E. Hallyn1-19/+119
Sysctl uts patch. This will need to be done another way, but since sysctl itself needs to be container aware, 'the right thing' is a separate patchset. [akpm@osdl.org: ia64 build fix] [sam.vilain@catalyst.net.nz: cleanup] [sam.vilain@catalyst.net.nz: add proc_do_utsns_string] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] namespaces: utsname: implement utsname namespacesSerge E. Hallyn3-0/+57
This patch defines the uts namespace and some manipulators. Adds the uts namespace to task_struct, and initializes a system-wide init namespace. It leaves a #define for system_utsname so sysctl will compile. This define will be removed in a separate patch. [akpm@osdl.org: build fix, cleanup] Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] namespaces: utsname: use init_utsname when appropriateSerge E. Hallyn2-8/+8
In some places, particularly drivers and __init code, the init utsns is the appropriate one to use. This patch replaces those with a the init_utsname helper. Changes: Removed several uses of init_utsname(). Hope I picked all the right ones in net/ipv4/ipconfig.c. These are now changed to utsname() (the per-process namespace utsname) in the previous patch (2/7) [akpm@osdl.org: CIFS fix] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] namespaces: utsname: switch to using uts namespacesSerge E. Hallyn1-7/+7
Replace references to system_utsname to the per-process uts namespace where appropriate. This includes things like uname. Changes: Per Eric Biederman's comments, use the per-process uts namespace for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c [jdike@addtoit.com: UML fix] [clg@fr.ibm.com: cleanup] [akpm@osdl.org: build fix] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] namespaces: exit_task_namespaces() invalidates nsproxyCedric Le Goater1-1/+1
exit_task_namespaces() has replaced the former exit_namespace(). It invalidates task->nsproxy and associated namespaces. This is an issue for the (futur) pid namespace which is required to be valid in exit_notify(). This patch moves exit_task_namespaces() after exit_notify() to keep nsproxy valid. Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] namespaces: incorporate fs namespace into nsproxySerge E. Hallyn3-16/+37
This moves the mount namespace into the nsproxy. The mount namespace count now refers to the number of nsproxies point to it, rather than the number of tasks. As a result, the unshare_namespace() function in kernel/fork.c no longer checks whether it is being shared. Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] nsproxy: move init_nsproxy into kernel/nsproxy.cSerge E. Hallyn1-0/+3
Move the init_nsproxy definition out of arch/ into kernel/nsproxy.c. This avoids all arches having to be updated. Compiles and boots on s390. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] namespaces: add nsproxySerge E. Hallyn4-2/+102
This patch adds a nsproxy structure to the task struct. Later patches will move the fs namespace pointer into this structure, and introduce a new utsname namespace into the nsproxy. The vserver and openvz functionality, then, would be implemented in large part by virtualizing/isolating more and more resources into namespaces, each contained in the nsproxy. [akpm@osdl.org: build fix] Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] make kernel/sysctl.c:_proc_do_string() staticAdrian Bunk1-2/+3
This patch makes the needlessly global _proc_do_string() static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] proc: sysctl: add _proc_do_string helperSam Vilain1-29/+36
The logic in proc_do_string is worth re-using without passing in a ctl_table structure (say, we want to calculate a pointer and pass that in instead); pass in the two fields it uses from that structure as explicit arguments. Signed-off-by: Sam Vilain <sam.vilain@catalyst.net.nz> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] cpumask: export cpu_online_map and cpu_possible_map consistentlyGreg Banks1-0/+3
cpumask: ensure that the cpu_online_map and cpu_possible_map bitmasks, and hence all the macros in <linux/cpumask.h> that require them, are available to modules for all supported combinations of architecture and CONFIG_SMP. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] kretprobe spinlock deadlock patchbibo,mao1-4/+11
kprobe_flush_task() possibly calls kfree function during holding kretprobe_lock spinlock, if kfree function is probed by kretprobe that will incur spinlock deadlock. This patch moves kfree function out scope of kretprobe_lock. Signed-off-by: bibo, mao <bibo.mao@intel.com> Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] disallow kprobes on notifier_call_chainbibo,mao1-1/+1
When kprobe is re-entered, the re-entered kprobe kernel path will will call atomic_notifier_call_chain function, if this function is kprobed that will incur numerous kprobe recursive fault. This patch disallows kprobes on atomic_notifier_call_chain function. Signed-off-by: bibo, mao <bibo.mao@intel.com> Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] kprobe whitespace cleanupbibo,mao1-8/+8
Whitespace is used to indent, this patch cleans up these sentences by kernel coding style. Signed-off-by: bibo, mao <bibo.mao@intel.com> Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] Kprobes: Make kprobe modules more portableAnanth N Mavinakayanahalli2-1/+26
In an effort to make kprobe modules more portable, here is a patch that: o Introduces the "symbol_name" field to struct kprobe. The symbol->address resolution now happens in the kernel in an architecture agnostic manner. 64-bit powerpc users no longer have to specify the ".symbols" o Introduces the "offset" field to struct kprobe to allow a user to specify an offset into a symbol. o The legacy mechanism of specifying the kprobe.addr is still supported. However, if both the kprobe.addr and kprobe.symbol_name are specified, probe registration fails with an -EINVAL. o The symbol resolution code uses kallsyms_lookup_name(). So CONFIG_KPROBES now depends on CONFIG_KALLSYMS o Apparantly kprobe modules were the only legitimate out-of-tree user of the kallsyms_lookup_name() EXPORT. Now that the symbol resolution happens in-kernel, remove the EXPORT as suggested by Christoph Hellwig o Modify tcp_probe.c that uses the kprobe interface so as to make it work on multiple platforms (in its earlier form, the code wouldn't work, say, on powerpc) Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] usb: fixup usb so it uses struct pidEric W. Biederman1-4/+4
The problem with remembering a user space process by its pid is that it is possible that the process will exit, pid wrap around will occur. Converting to a struct pid avoid that problem, and paves the way for implementing a pid namespace. Also since usb is the only user of kill_proc_info_as_uid rename kill_proc_info_as_uid to kill_pid_info_as_uid and have the new version take a struct pid. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] Use struct pspace in next_pidmap and find_ge_pidEric W. Biederman1-6/+7
This updates my proc: readdir race fix (take 3) patch to account for the changes made by: Sukadev Bhattiprolu <sukadev@us.ibm.com> to introduce struct pspace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] Define struct pspaceSukadev Bhattiprolu1-22/+31
Define a per-container pid space object. And create one instance of this object, init_pspace, to define the entire pid space. Subsequent patches will provide/use interfaces to create/destroy pid spaces. Its a subset/rework of Eric Biederman's patch http://lkml.org/lkml/2006/2/6/285 . Signed-off-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Kirill Korotaev <dev@sw.ru> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] Move pidmap to pspace.hSukadev Bhattiprolu1-6/+1
Move struct pidmap and PIDMAP_ENTRIES to a new file, include/linux/pspace.h where it will be used in subsequent patches to define pid spaces. Its a subset of Eric Biederman's patch http://lkml.org/lkml/2006/2/6/285 [akpm@osdl.org: cleanups] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] pids coding style use struct pidmap in next_pidmapEric W. Biederman1-1/+1
Use struct pidmap instead of pidmap_t. This updates my proc: readdir race fix (take 3) patch to account for the changes made by: Sukadev Bhattiprolu <sukadev@us.ibm.com> to kill pidmap_t. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] pids: coding style: use struct pidmapSukadev Bhattiprolu1-5/+5
Use struct pidmap instead of pidmap_t. Its a subset of Eric Biederman's patch http://lkml.org/lkml/2006/2/6/271. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] file: modify struct fown_struct to use a struct pidEric W. Biederman1-1/+1
File handles can be requested to send sigio and sigurg to processes. By tracking the destination processes using struct pid instead of pid_t we make the interface safe from all potential pid wrap around problems. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] pid: export the symbols needed to use struct pid *Eric W. Biederman1-0/+3
pids aren't something that drivers should care about. However there are a lot of helper layers in the kernel that do care, and are built as modules. Before I can convert them to using struct pid instead of pid_t I need to export the appropriate symbols so they can continue to be built. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] pid: implement signal functions that take a struct pid *Eric W. Biederman1-10/+47
Currently the signal functions all either take a task or a pid_t argument. This patch implements variants that take a struct pid *. After all of the users have been update it is my intention to remove the variants that take a pid_t as using pid_t can be more work (an extra hash table lookup) and difficult to get right in the presence of multiple pid namespaces. There are two kinds of functions introduced in this patch. The are the general use functions kill_pgrp and kill_pid which take a priv argument that is ultimately used to create the appropriate siginfo information, Then there are _kill_pgrp_info, kill_pgrp_info, kill_pid_info the internal implementation helpers that take an explicit siginfo. The distinction is made because filling out an explcit siginfo is tricky, and will be even more tricky when pid namespaces are introduced. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] proc: readdir race fix (take 3)Eric W. Biederman1-0/+36
The problem: An opendir, readdir, closedir sequence can fail to report process ids that are continually in use throughout the sequence of system calls. For this race to trigger the process that proc_pid_readdir stops at must exit before readdir is called again. This can cause ps to fail to report processes, and it is in violation of posix guarantees and normal application expectations with respect to readdir. Currently there is no way to work around this problem in user space short of providing a gargantuan buffer to user space so the directory read all happens in on system call. This patch implements the normal directory semantics for proc, that guarantee that a directory entry that is neither created nor destroyed while reading the directory entry will be returned. For directory that are either created or destroyed during the readdir you may or may not see them. Furthermore you may seek to a directory offset you have previously seen. These are the guarantee that ext[23] provides and that posix requires, and more importantly that user space expects. Plus it is a simple semantic to implement reliable service. It is just a matter of calling readdir a second time if you are wondering if something new has show up. These better semantics are implemented by scanning through the pids in numerical order and by making the file offset a pid plus a fixed offset. The pid scan happens on the pid bitmap, which when you look at it is remarkably efficient for a brute force algorithm. Given that a typical cache line is 64 bytes and thus covers space for 64*8 == 200 pids. There are only 40 cache lines for the entire 32K pid space. A typical system will have 100 pids or more so this is actually fewer cache lines we have to look at to scan a linked list, and the worst case of having to scan the entire pid bitmap is pretty reasonable. If we need something more efficient we can go to a more efficient data structure for indexing the pids, but for now what we have should be sufficient. In addition this takes no additional locks and is actually less code than what we are doing now. Also another very subtle bug in this area has been fixed. It is possible to catch a task in the middle of de_thread where a thread is assuming the thread of it's thread group leader. This patch carefully handles that case so if we hit it we don't fail to return the pid, that is undergoing the de_thread dance. Thanks to KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> for providing the first fix, pointing this out and working on it. [oleg@tv-sign.ru: fix it] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Jean Delvare <jdelvare@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] list module taint flags in Oops/panicRandy Dunlap1-3/+34
When listing loaded modules during an oops or panic, also list each module's Tainted flags if non-zero (P: Proprietary or F: Forced load only). If a module is did not taint the kernel, it is just listed like usbcore but if it did taint the kernel, it is listed like wizmodem(PF) Example: [ 3260.121718] Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: [ 3260.121729] [<ffffffff8804c099>] :dump_test:proc_dump_test+0x99/0xc8 [ 3260.121742] PGD fe8d067 PUD 264a6067 PMD 0 [ 3260.121748] Oops: 0002 [1] SMP [ 3260.121753] CPU 1 [ 3260.121756] Modules linked in: dump_test(P) snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device ide_cd generic ohci1394 snd_hda_intel snd_hda_codec snd_pcm snd_timer snd ieee1394 snd_page_alloc piix ide_core arcmsr aic79xx scsi_transport_spi usblp [ 3260.121785] Pid: 5556, comm: bash Tainted: P 2.6.18-git10 #1 [Alternatively, I can look into listing tainted flags with 'lsmod', but that won't help in oopsen/panics so much.] [akpm@osdl.org: cleanup] Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01[PATCH] Support piping into commands in /proc/sys/kernel/core_patternAndi Kleen2-1/+5
Using the infrastructure created in previous patches implement support to pipe core dumps into programs. This is done by overloading the existing core_pattern sysctl with a new syntax: |program When the first character of the pattern is a '|' the kernel will instead threat the rest of the pattern as a command to run. The core dump will be written to the standard input of that program instead of to a file. This is useful for having automatic core dump analysis without filling up disks. The program can do some simple analysis and save only a summary of the core dump. The core dump proces will run with the privileges and in the name space of the process that caused the core dump. I also increased the core pattern size to 128 bytes so that longer command lines fit. Most of the changes comes from allowing core dumps without seeks. They are fairly straight forward though. One small incompatibility is that if someone had a core pattern previously that started with '|' they will get suddenly new behaviour. I think that's unlikely to be a real problem though. Additional background: > Very nice, do you happen to have a program that can accept this kind of > input for crash dumps? I'm guessing that the embedded people will > really want this functionality. I had a cheesy demo/prototype. Basically it wrote the dump to a file again, ran gdb on it to get a backtrace and wrote the summary to a shared directory. Then there was a simple CGI script to generate a "top 10" crashes HTML listing. Unfortunately this still had the disadvantage to needing full disk space for a dump except for deleting it afterwards (in fact it was worse because over the pipe holes didn't work so if you have a holey address map it would require more space). Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as cores (at least it worked with zsh's =(cat core) syntax), so it would be likely possible to do it without temporary space with a simple wrapper that calls it in the right way. I ran out of time before doing that though. The demo prototype scripts weren't very good. If there is really interest I can dig them out (they are currently on a laptop disk on the desk with the laptop itself being in service), but I would recommend to rewrite them for any serious application of this and fix the disk space problem. Also to be really useful it should probably find a way to automatically fetch the debuginfos (I cheated and just installed them in advance). If nobody else does it I can probably do the rewrite myself again at some point. My hope at some point was that desktops would support it in their builtin crash reporters, but at least the KDE people I talked too seemed to be happy with their user space only solution. Alan sayeth: I don't believe that piping as such as neccessarily the right model, but the ability to intercept and processes core dumps from user space is asked for by many enterprise users as well. They want to know about, capture, analyse and process core dumps, often centrally and in automated form. [akpm@osdl.org: loff_t != unsigned long] Signed-off-by: Andi Kleen <ak@suse.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01[PATCH] Create call_usermodehelper_pipe()Andi Kleen1-1/+54
A new member in the ever growing family of call_usermode* functions is born. The new call_usermodehelper_pipe() function allows to pipe data to the stdin of the called user mode progam and behaves otherwise like the normal call_usermodehelp() (except that it always waits for the child to finish) Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>