aboutsummaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2006-03-27[PATCH] Notifier chain update: API changesAlan Stern11-61/+134
The kernel's implementation of notifier chains is unsafe. There is no protection against entries being added to or removed from a chain while the chain is in use. The issues were discussed in this thread: http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 We noticed that notifier chains in the kernel fall into two basic usage classes: "Blocking" chains are always called from a process context and the callout routines are allowed to sleep; "Atomic" chains can be called from an atomic context and the callout routines are not allowed to sleep. We decided to codify this distinction and make it part of the API. Therefore this set of patches introduces three new, parallel APIs: one for blocking notifiers, one for atomic notifiers, and one for "raw" notifiers (which is really just the old API under a new name). New kinds of data structures are used for the heads of the chains, and new routines are defined for registration, unregistration, and calling a chain. The three APIs are explained in include/linux/notifier.h and their implementation is in kernel/sys.c. With atomic and blocking chains, the implementation guarantees that the chain links will not be corrupted and that chain callers will not get messed up by entries being added or removed. For raw chains the implementation provides no guarantees at all; users of this API must provide their own protections. (The idea was that situations may come up where the assumptions of the atomic and blocking APIs are not appropriate, so it should be possible for users to handle these things in their own way.) There are some limitations, which should not be too hard to live with. For atomic/blocking chains, registration and unregistration must always be done in a process context since the chain is protected by a mutex/rwsem. Also, a callout routine for a non-raw chain must not try to register or unregister entries on its own chain. (This did happen in a couple of places and the code had to be changed to avoid it.) Since atomic chains may be called from within an NMI handler, they cannot use spinlocks for synchronization. Instead we use RCU. The overhead falls almost entirely in the unregister routine, which is okay since unregistration is much less frequent that calling a chain. Here is the list of chains that we adjusted and their classifications. None of them use the raw API, so for the moment it is only a placeholder. ATOMIC CHAINS ------------- arch/i386/kernel/traps.c: i386die_chain arch/ia64/kernel/traps.c: ia64die_chain arch/powerpc/kernel/traps.c: powerpc_die_chain arch/sparc64/kernel/traps.c: sparc64die_chain arch/x86_64/kernel/traps.c: die_chain drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list kernel/panic.c: panic_notifier_list kernel/profile.c: task_free_notifier net/bluetooth/hci_core.c: hci_notifier net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain net/ipv6/addrconf.c: inet6addr_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain net/netlink/af_netlink.c: netlink_chain BLOCKING CHAINS --------------- arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain arch/s390/kernel/process.c: idle_chain arch/x86_64/kernel/process.c idle_notifier drivers/base/memory.c: memory_chain drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list drivers/macintosh/adb.c: adb_client_list drivers/macintosh/via-pmu.c sleep_notifier_list drivers/macintosh/via-pmu68k.c sleep_notifier_list drivers/macintosh/windfarm_core.c wf_client_list drivers/usb/core/notify.c usb_notifier_list drivers/video/fbmem.c fb_notifier_list kernel/cpu.c cpu_chain kernel/module.c module_notify_list kernel/profile.c munmap_notifier kernel/profile.c task_exit_notifier kernel/sys.c reboot_notifier_list net/core/dev.c netdev_chain net/decnet/dn_dev.c: dnaddr_chain net/ipv4/devinet.c: inetaddr_chain It's possible that some of these classifications are wrong. If they are, please let us know or submit a patch to fix them. Note that any chain that gets called very frequently should be atomic, because the rwsem read-locking used for blocking chains is very likely to incur cache misses on SMP systems. (However, if the chain's callout routines may sleep then the chain cannot be atomic.) The patch set was written by Alan Stern and Chandra Seetharaman, incorporating material written by Keith Owens and suggestions from Paul McKenney and Andrew Morton. [jes@sgi.com: restructure the notifier chain initialization macros] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] lightweight robust futexes updates 2Ingo Molnar1-10/+4
futex.h updates: - get rid of FUTEX_OWNER_PENDING - it's not used - reduce ROBUST_LIST_LIMIT to a saner value Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] lightweight robust futexes updatesIngo Molnar7-7/+7
- fix: initialize the robust list(s) to NULL in copy_process. - doc update - cleanup: rename _inuser to _inatomic - __user cleanups and other small cleanups Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] lightweight robust futexes: x86_64Ingo Molnar2-2/+27
x86_64: add the futex_atomic_cmpxchg_inuser() assembly implementation, and wire up the new syscalls. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@infradead.org> Acked-by: Ulrich Drepper <drepper@redhat.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] lightweight robust futexes: i386Ingo Molnar2-2/+25
i386: add the futex_atomic_cmpxchg_inuser() assembly implementation, and wire up the new syscalls. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@infradead.org> Acked-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] lightweight robust futexes: compatIngo Molnar2-0/+21
32-bit syscall compatibility support. (This patch also moves all futex related compat functionality into kernel/futex_compat.c.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@infradead.org> Acked-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] lightweight robust futexes: coreIngo Molnar3-1/+100
Add the core infrastructure for robust futexes: structure definitions, the new syscalls and the do_exit() based cleanup mechanism. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@infradead.org> Acked-by: Ulrich Drepper <drepper@redhat.com> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] lightweight robust futexes: arch defaultsIngo Molnar7-0/+42
This patchset provides a new (written from scratch) implementation of robust futexes, called "lightweight robust futexes". We believe this new implementation is faster and simpler than the vma-based robust futex solutions presented before, and we'd like this patchset to be adopted in the upstream kernel. This is version 1 of the patchset. Background ---------- What are robust futexes? To answer that, we first need to understand what futexes are: normal futexes are special types of locks that in the noncontended case can be acquired/released from userspace without having to enter the kernel. A futex is in essence a user-space address, e.g. a 32-bit lock variable field. If userspace notices contention (the lock is already owned and someone else wants to grab it too) then the lock is marked with a value that says "there's a waiter pending", and the sys_futex(FUTEX_WAIT) syscall is used to wait for the other guy to release it. The kernel creates a 'futex queue' internally, so that it can later on match up the waiter with the waker - without them having to know about each other. When the owner thread releases the futex, it notices (via the variable value) that there were waiter(s) pending, and does the sys_futex(FUTEX_WAKE) syscall to wake them up. Once all waiters have taken and released the lock, the futex is again back to 'uncontended' state, and there's no in-kernel state associated with it. The kernel completely forgets that there ever was a futex at that address. This method makes futexes very lightweight and scalable. "Robustness" is about dealing with crashes while holding a lock: if a process exits prematurely while holding a pthread_mutex_t lock that is also shared with some other process (e.g. yum segfaults while holding a pthread_mutex_t, or yum is kill -9-ed), then waiters for that lock need to be notified that the last owner of the lock exited in some irregular way. To solve such types of problems, "robust mutex" userspace APIs were created: pthread_mutex_lock() returns an error value if the owner exits prematurely - and the new owner can decide whether the data protected by the lock can be recovered safely. There is a big conceptual problem with futex based mutexes though: it is the kernel that destroys the owner task (e.g. due to a SEGFAULT), but the kernel cannot help with the cleanup: if there is no 'futex queue' (and in most cases there is none, futexes being fast lightweight locks) then the kernel has no information to clean up after the held lock! Userspace has no chance to clean up after the lock either - userspace is the one that crashes, so it has no opportunity to clean up. Catch-22. In practice, when e.g. yum is kill -9-ed (or segfaults), a system reboot is needed to release that futex based lock. This is one of the leading bugreports against yum. To solve this problem, 'Robust Futex' patches were created and presented on lkml: the one written by Todd Kneisel and David Singleton is the most advanced at the moment. These patches all tried to extend the futex abstraction by registering futex-based locks in the kernel - and thus give the kernel a chance to clean up. E.g. in David Singleton's robust-futex-6.patch, there are 3 new syscall variants to sys_futex(): FUTEX_REGISTER, FUTEX_DEREGISTER and FUTEX_RECOVER. The kernel attaches such robust futexes to vmas (via vma->vm_file->f_mapping->robust_head), and at do_exit() time, all vmas are searched to see whether they have a robust_head set. Lots of work went into the vma-based robust-futex patch, and recently it has improved significantly, but unfortunately it still has two fundamental problems left: - they have quite complex locking and race scenarios. The vma-based patches had been pending for years, but they are still not completely reliable. - they have to scan _every_ vma at sys_exit() time, per thread! The second disadvantage is a real killer: pthread_exit() takes around 1 microsecond on Linux, but with thousands (or tens of thousands) of vmas every pthread_exit() takes a millisecond or more, also totally destroying the CPU's L1 and L2 caches! This is very much noticeable even for normal process sys_exit_group() calls: the kernel has to do the vma scanning unconditionally! (this is because the kernel has no knowledge about how many robust futexes there are to be cleaned up, because a robust futex might have been registered in another task, and the futex variable might have been simply mmap()-ed into this process's address space). This huge overhead forced the creation of CONFIG_FUTEX_ROBUST, but worse than that: the overhead makes robust futexes impractical for any type of generic Linux distribution. So it became clear to us, something had to be done. Last week, when Thomas Gleixner tried to fix up the vma-based robust futex patch in the -rt tree, he found a handful of new races and we were talking about it and were analyzing the situation. At that point a fundamentally different solution occured to me. This patchset (written in the past couple of days) implements that new solution. Be warned though - the patchset does things we normally dont do in Linux, so some might find the approach disturbing. Parental advice recommended ;-) New approach to robust futexes ------------------------------ At the heart of this new approach there is a per-thread private list of robust locks that userspace is holding (maintained by glibc) - which userspace list is registered with the kernel via a new syscall [this registration happens at most once per thread lifetime]. At do_exit() time, the kernel checks this user-space list: are there any robust futex locks to be cleaned up? In the common case, at do_exit() time, there is no list registered, so the cost of robust futexes is just a simple current->robust_list != NULL comparison. If the thread has registered a list, then normally the list is empty. If the thread/process crashed or terminated in some incorrect way then the list might be non-empty: in this case the kernel carefully walks the list [not trusting it], and marks all locks that are owned by this thread with the FUTEX_OWNER_DEAD bit, and wakes up one waiter (if any). The list is guaranteed to be private and per-thread, so it's lockless. There is one race possible though: since adding to and removing from the list is done after the futex is acquired by glibc, there is a few instructions window for the thread (or process) to die there, leaving the futex hung. To protect against this possibility, userspace (glibc) also maintains a simple per-thread 'list_op_pending' field, to allow the kernel to clean up if the thread dies after acquiring the lock, but just before it could have added itself to the list. Glibc sets this list_op_pending field before it tries to acquire the futex, and clears it after the list-add (or list-remove) has finished. That's all that is needed - all the rest of robust-futex cleanup is done in userspace [just like with the previous patches]. Ulrich Drepper has implemented the necessary glibc support for this new mechanism, which fully enables robust mutexes. (Ulrich plans to commit these changes to glibc-HEAD later today.) Key differences of this userspace-list based approach, compared to the vma based method: - it's much, much faster: at thread exit time, there's no need to loop over every vma (!), which the VM-based method has to do. Only a very simple 'is the list empty' op is done. - no VM changes are needed - 'struct address_space' is left alone. - no registration of individual locks is needed: robust mutexes dont need any extra per-lock syscalls. Robust mutexes thus become a very lightweight primitive - so they dont force the application designer to do a hard choice between performance and robustness - robust mutexes are just as fast. - no per-lock kernel allocation happens. - no resource limits are needed. - no kernel-space recovery call (FUTEX_RECOVER) is needed. - the implementation and the locking is "obvious", and there are no interactions with the VM. Performance ----------- I have benchmarked the time needed for the kernel to process a list of 1 million (!) held locks, using the new method [on a 2GHz CPU]: - with FUTEX_WAIT set [contended mutex]: 130 msecs - without FUTEX_WAIT set [uncontended mutex]: 30 msecs I have also measured an approach where glibc does the lock notification [which it currently does for !pshared robust mutexes], and that took 256 msecs - clearly slower, due to the 1 million FUTEX_WAKE syscalls userspace had to do. (1 million held locks are unheard of - we expect at most a handful of locks to be held at a time. Nevertheless it's nice to know that this approach scales nicely.) Implementation details ---------------------- The patch adds two new syscalls: one to register the userspace list, and one to query the registered list pointer: asmlinkage long sys_set_robust_list(struct robust_list_head __user *head, size_t len); asmlinkage long sys_get_robust_list(int pid, struct robust_list_head __user **head_ptr, size_t __user *len_ptr); List registration is very fast: the pointer is simply stored in current->robust_list. [Note that in the future, if robust futexes become widespread, we could extend sys_clone() to register a robust-list head for new threads, without the need of another syscall.] So there is virtually zero overhead for tasks not using robust futexes, and even for robust futex users, there is only one extra syscall per thread lifetime, and the cleanup operation, if it happens, is fast and straightforward. The kernel doesnt have any internal distinction between robust and normal futexes. If a futex is found to be held at exit time, the kernel sets the highest bit of the futex word: #define FUTEX_OWNER_DIED 0x40000000 and wakes up the next futex waiter (if any). User-space does the rest of the cleanup. Otherwise, robust futexes are acquired by glibc by putting the TID into the futex field atomically. Waiters set the FUTEX_WAITERS bit: #define FUTEX_WAITERS 0x80000000 and the remaining bits are for the TID. Testing, architecture support ----------------------------- I've tested the new syscalls on x86 and x86_64, and have made sure the parsing of the userspace list is robust [ ;-) ] even if the list is deliberately corrupted. i386 and x86_64 syscalls are wired up at the moment, and Ulrich has tested the new glibc code (on x86_64 and i386), and it works for his robust-mutex testcases. All other architectures should build just fine too - but they wont have the new syscalls yet. Architectures need to implement the new futex_atomic_cmpxchg_inuser() inline function before writing up the syscalls (that function returns -ENOSYS right now). This patch: Add placeholder futex_atomic_cmpxchg_inuser() implementations to every architecture that supports futexes. It returns -ENOSYS. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@infradead.org> Acked-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] mips: add ptr_to_compat()Ingo Molnar1-0/+5
Add ptr_to_compat() - needed by the new robust futex code. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] parisc: add ptr_to_compat()Ingo Molnar1-0/+5
Add ptr_to_compat() to parisc - needed by the new robust futex code. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Grant Grundler <iod00d@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] s390: add ptr_to_compat()Ingo Molnar1-0/+5
Add ptr_to_compat() to s390 - needed by the new robust-futex code. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> untested. CHECKME: am i right about the 0x7fffffffUL masking? Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] ia64: add ptr_to_compat()Ingo Molnar1-0/+6
Add ptr_to_compat() to ia64 - needed by the robust-futex code. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify PFN_* macrosDave Hansen4-12/+10
Just about every architecture defines some macros to do operations on pfns. They're all virtually identical. This patch consolidates all of them. One minor glitch is that at least i386 uses them in a very skeletal header file. To keep away from #include dependency hell, I stuck the new definitions in a new, isolated header. Of all of the implementations, sh64 is the only one that varied by a bit. It used some masks to ensure that any sign-extension got ripped away before the arithmetic is done. This has been posted to that sh64 maintainers and the development list. Compiles on x86, x86_64, ia64 and ppc64. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] uninline zone helpersKAMEZAWA Hiroyuki1-35/+3
Helper functions for for_each_online_pgdat/for_each_zone look too big to be inlined. Speed of these helper macro itself is not very important. (inner loops are tend to do more work than this) This patch make helper function to be out-of-lined. inline out-of-line .text 005c0680 005bf6a0 005c0680 - 005bf6a0 = FE0 = 4Kbytes. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] for_each_online_pgdat: remove pgdat_listKAMEZAWA Hiroyuki1-3/+0
By using for_each_online_pgdat(), pgdat_list is not necessary now. This patch removes it. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] for_each_online_pgdat: for_each_bootmemKAMEZAWA Hiroyuki1-0/+1
Add a list_head to bootmem_data_t and make bootmems use it. bootmem list is sorted by node_boot_start. Only nodes against which init_bootmem() is called are linked to the list. (i386 allocates bootmem only from one node(0) not from all online nodes.) A summary: 1. for_each_online_pgdat() traverses all *online* nodes. 2. alloc_bootmem() allocates memory only from initialized-for-bootmem nodes. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] define for_each_online_pgdatKAMEZAWA Hiroyuki2-51/+61
This patch defines for_each_online_pgdat() as a replacement of for_each_pgdat() Now, online nodes are managed by node_online_map. But for_each_pgdat() uses pgdat_link to iterate over all nodes(pgdat). This means management structure for online pgdat is duplicated. I think using node_online_map for for_each_pgdat() is simple and sane rather ather than pgdat_link. New macro is named as for_each_online_pgdat(). Following patch will fix callers of for_each_pgdat(). The bootmem allocater uses for_each_pgdat() before pgdat initialization. I don't think it's sane. Following patch will fix it. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] remove zone_mem_mapKAMEZAWA Hiroyuki3-8/+6
This patch removes zone_mem_map. pfn_to_page uses pgdat, page_to_pfn uses zone. page_to_pfn can use pgdat instead of zone, which is only one user of zone_mem_map. By modifing it, we can remove zone_mem_map. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Christoph Lameter <christoph@lameter.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify pfn_to_page: ia64 pfn_to_pageKAMEZAWA Hiroyuki1-5/+13
ia64 has special config CONFIG_VIRTUAL_MEM_MAP. CONFIG_DISCONTIGMEM=y && CONFIG_VIRTUAL_MEM_MAP!=y is bug ? Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify pfn_to_page: xtensa pfn_to_pageKAMEZAWA Hiroyuki1-4/+2
xtensa can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify pfn_to_page: v850 pfn_to_pageKAMEZAWA Hiroyuki1-2/+2
v850 can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify pfn_to_page: uml pfn_to_pageKAMEZAWA Hiroyuki1-3/+1
UML can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify pfn_to_page: sparc pfn_to_pageKAMEZAWA Hiroyuki1-2/+2
sparc can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify pfn_to_page: sh64 pfn_to_pageKAMEZAWA Hiroyuki1-3/+2
sh64 can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Richard Curnow <rc@rc0.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify pfn_to_page: sh pfn_to_pageKAMEZAWA Hiroyuki1-3/+2
sh can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify pfn_to_page: s390 pfn_to_pageKAMEZAWA Hiroyuki1-2/+1
s390 can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify pfn_to_page: ppc pfn_to_pageKAMEZAWA Hiroyuki1-2/+2
PPC can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify pfn_to_page: parisc pfn_to_pageKAMEZAWA Hiroyuki2-19/+1
PARISC can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify pfn_to_page: mips pfn_to_pageKAMEZAWA Hiroyuki2-16/+1
MIPS can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify pfn_to_page: m32r pfn_to_pageKAMEZAWA Hiroyuki2-17/+2
m32r can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hirokazu Takata <takata.hirokazu@renesas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify pfn_to_page: h8300 pfn_to_pageKAMEZAWA Hiroyuki1-2/+2
H8300 can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify pfn_to_page: FRV pfn_to_pageKAMEZAWA Hiroyuki1-5/+2
FRV can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify pfn_to_page: cris pfn_to_pageKAMEZAWA Hiroyuki1-2/+2
cris can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mikael Starvik <starvik@axis.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify pfn_to_page: arm26 pfn_to_pageKAMEZAWA Hiroyuki1-2/+2
arm26 can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ian Molton <spyro@f2s.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify pfn_to_page: arm pfn_to_pageKAMEZAWA Hiroyuki1-10/+5
ARM can use generic funcs. PFN_TO_NID, LOCAL_MAP_NR are defined by sub-archs. Signed-off-by: KAMEZAWA Hirotuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify pfn_to_page: alpha pfn_to_pageKAMEZAWA Hiroyuki2-18/+2
Alpha can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify pfn_to_page: powerpc pfn_to_pageKAMEZAWA Hiroyuki1-2/+1
PowerPC can use generic ones. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify pfn_to_page: x86_64 pfn_to_pageKAMEZAWA Hiroyuki2-6/+1
x86_64 can use generic funcs. For DISCONTIGMEM, CONFIG_OUT_OF_LINE_PFN_TO_PAGE is selected. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify pfn_to_page: i386 pfn_to_pageKAMEZAWA Hiroyuki2-19/+1
i386 can use generic funcs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] unify pfn_to_page: generic functionsKAMEZAWA Hiroyuki3-11/+79
There are 3 memory models, FLATMEM, DISCONTIGMEM, SPARSEMEM. Each arch has its own page_to_pfn(), pfn_to_page() for each models. But most of them can use the same arithmetic. This patch adds asm-generic/memory_model.h, which includes generic page_to_pfn(), pfn_to_page() definitions for each memory model. When CONFIG_OUT_OF_LINE_PFN_TO_PAGE=y, out-of-line functions are used instead of macro. This is enabled by some archs and reduces text size. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Andi Kleen <ak@muc.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Hirokazu Takata <takata.hirokazu@renesas.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] sched: new sched domain for representing multi-coreSiddha, Suresh B6-0/+23
Add a new sched domain for representing multi-core with shared caches between cores. Consider a dual package system, each package containing two cores and with last level cache shared between cores with in a package. If there are two runnable processes, with this appended patch those two processes will be scheduled on different packages. On such systems, with this patch we have observed 8% perf improvement with specJBB(2 warehouse) benchmark and 35% improvement with CFP2000 rate(with 2 users). This new domain will come into play only on multi-core systems with shared caches. On other systems, this sched domain will be removed by domain degeneration code. This new domain can be also used for implementing power savings policy (see OLS 2005 CMP kernel scheduler paper for more details.. I will post another patch for power savings policy soon) Most of the arch/* file changes are for cpu_coregroup_map() implementation. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] fs/nfsd/export.c,net/sunrpc/cache.c: make needlessly global code staticAdrian Bunk2-5/+1
We can now make some code static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] knfsd: Convert sunrpc_cache to use krefsNeilBrown2-10/+7
.. it makes some of the code nicer. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] knfsd: Unexport cache_fresh and fix a small raceNeilBrown1-2/+0
Cache_fresh is now only used in cache.c, so unexport it. Part of cache_fresh (setting CACHE_VALID) should really be done under the lock, while part (calling cache_revisit_request etc) must be done outside the lock. So we split it up appropriately. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] knfsd: Remove DefineCacheLookupNeilBrown1-113/+0
This has been replaced by more traditional code. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] knfsd: Create cache_lookup function instead of using a macro to ↵NeilBrown1-0/+12
declare one The C++-like 'template' approach proves to be too ugly and hard to work with. The old 'template' won't go away until all users are updated. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] knfsd: Get rid of 'inplace' sunrpc cachesNeilBrown1-17/+11
These were an unnecessary wart. Also only have one 'DefineSimpleCache..' instead of two. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] knfsd: Break the hard linkage from svc_expkey to svc_exportNeilBrown1-16/+4
Current svc_expkey holds a pointer to the svc_export structure, so updates to that structure have to be in-place, which is a wart on the whole cache infrastruct. So we break that linkage and just do a second lookup. If this became a performance issue, it would be possible to put a direct link back in which was only used conditionally. i.e. when an object is replaced in the cache, we set a flag in the old object. When dereferencing the link from svc_expkey, if the flag is set, we drop the reference and do a fresh lookup. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] knfsd: Change the store of auth_domains to not be a 'cache'NeilBrown1-5/+7
The 'auth_domain's are simply handles on internal data structures. They do not cache information from user-space, and forcing them into the mold of a 'cache' misrepresents their true nature and causes confusion. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] autofs4: add new packet type for v5 communicationsIan Kent1-5/+46
This patch define a new autofs packet for autofs v5 and updates the waitq.c functions to handle the additional packet type. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] autofs4: increase module versionIan Kent1-1/+1
Update autofs4 version. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] uml: fix build warnings in __get_userJeff Dike1-21/+10
Fix a gcc warning about losing qualifiers to the first argument of copy_from_user. The typeof change for correctness, and fixes a lot of the warnings, but there are some cases where x has some extra qualifiers, like volatile, which copy_from_user can't know about. For these, the void * cast seems to be necessary. Also cleaned up some of the whitespace and got rid of the emacs comment at the bottom. Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: remove unused generic bitops in include/linux/bitops.hAkinobu Mita1-123/+1
generic_{ffs,fls,fls64,hweight{64,32,16,8}}() were moved into include/asm-generic/bitops.h. So all architectures don't use them. Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: sh: make thread_info.flags an unsigned longAkinobu Mita1-1/+1
The test_bit() routines are defined to work on a pointer to unsigned long. But thread_info.flags is __u32 (unsigned int) on sh and it is passed to flag set/clear/test wrappers in include/linux/thread_info.h. So the compiler will print warnings. This patch changes to unsigned long instead. Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: update include/asm-generic/bitops.hAkinobu Mita1-63/+13
Currently include/asm-generic/bitops.h is not referenced from anywhere. But it will be the benefit of those who are trying to port Linux to another architecture. So update it by same manner Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: xtensa: use generic bitopsAkinobu Mita1-332/+8
- remove {,test_and_}{set,clear,change}_bit() - remove __{,test_and_}{set,clear,change}_bit() and test_bit() - remove generic_fls64() - remove find_{next,first}{,_zero}_bit() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - remove generic_hweight{32,16,8}() - remove sched_find_first_bit() - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: x86_64: use generic bitopsAkinobu Mita1-36/+6
- remove sched_find_first_bit() - remove generic_hweight{64,32,16,8}() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: v850: use generic bitopsAkinobu Mita1-209/+11
- remove ffz() - remove find_{next,first}{,_zero}_bit() - remove generic_ffs() - remove generic_fls() - remove generic_fls64() - remove __ffs() - remove sched_find_first_bit() - remove generic_hweight{32,16,8}() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: sparc64: use generic bitopsAkinobu Mita1-206/+13
- remove __{,test_and_}{set,clear,change}_bit() and test_bit() - remove ffz() - remove __ffs() - remove generic_fls() - remove generic_fls64() - remove sched_find_first_bit() - remove ffs() - unless defined(ULTRA_HAS_POPULATION_COUNT) - remove generic_hweight{64,32,16,8}() - remove find_{next,first}{,_zero}_bit() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: sparc: use generic bitopsAkinobu Mita1-376/+12
- remove __{,test_and_}{set,clear,change}_bit() and test_bit() - remove ffz() - remove __ffs() - remove sched_find_first_bit() - remove ffs() - remove generic_fls() - remove generic_fls64() - remove generic_hweight{32,16,8}() - remove find_{next,first}{,_zero}_bit() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - remove ext2_{set,clear}_bit_atomic() - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: sh64: use generic bitopsAkinobu Mita1-373/+11
- remove __{,test_and_}{set,clear,change}_bit() and test_bit() - remove __ffs() - remove find_{next,first}{,_zero}_bit() - remove generic_hweight{32,16,8}() - remove sched_find_first_bit() - remove generic_ffs() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - remove ext2_{set,clear}_bit_atomic() - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() - remove generic_fls() - remove generic_fls64() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Richard Curnow <rc@rc0.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: sh: use generic bitopsAkinobu Mita1-332/+10
- remove __{,test_and_}{set,clear,change}_bit() and test_bit() - remove find_{next,first}{,_zero}_bit() - remove generic_ffs() - remove generic_hweight{32,16,8}() - remove sched_find_first_bit() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - remove ext2_{set,clear}_bit_atomic() - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() - remove generic_fls() - remove generic_fls64() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: s390: use generic bitopsAkinobu Mita1-39/+5
- remove generic_ffs() - remove generic_fls() - remove generic_fls64() - remove generic_hweight{64,32,16,8}() - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: powerpc: use generic bitopsAkinobu Mita1-101/+4
- remove __{,test_and_}{set,clear,change}_bit() and test_bit() - remove generic_fls64() - remove generic_hweight{64,32,16,8}() - remove sched_find_first_bit() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: parisc: use generic bitopsAkinobu Mita1-277/+9
- remove __{,test_and_}{set,clear,change}_bit() and test_bit() - remove ffz() - remove generic_fls64() - remove generic_hweight{32,16,8}() - remove generic_hweight64() - remove sched_find_first_bit() - remove find_{next,first}{,_zero}_bit() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: mips: use generic bitopsAkinobu Mita1-449/+16
- remove __{,test_and_}{set,clear,change}_bit() and test_bit() - unless defined(CONFIG_CPU_MIPS32) or defined(CONFIG_CPU_MIPS64) - remove __ffs() - remove ffs() - remove ffz() - remove fls() - remove fls64() - remove find_{next,first}{,_zero}_bit() - remove sched_find_first_bit() - remove generic_hweight64() - remove generic_hweight{32,16,8}() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - remove ext2_{set,clear}_bit_atomic() - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: m68knommu: use generic bitopsAkinobu Mita1-212/+9
- remove ffs() - remove __ffs() - remove sched_find_first_bit() - remove ffz() - remove find_{next,first}{,_zero}_bit() - remove generic_hweight() - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() - remove generic_fls() - remove generic_fls64() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] m68k: fix undefined reference to generic_find_next_zero_le_bitAkinobu Mita1-2/+52
This patch reverts ext2 bitmap functions. Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: m68k: use generic bitopsAkinobu Mita1-81/+5
- remove generic_fls64() - remove sched_find_first_bit() - remove generic_hweight() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: m32r: use generic bitopsAkinobu Mita1-445/+12
- remove __{,test_and_}{set,clear,change}_bit() and test_bit() - remove ffz() - remove find_{next,first}{,_zero}_bit() - remove __ffs() - remove generic_fls() - remove generic_fls64() - remove sched_find_first_bit() - remove generic_ffs() - remove generic_hweight{32,16,8}() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - remove ext2_{set,clear}_bit_atomic() - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: ia64: use generic bitopsAkinobu Mita1-50/+17
- remove generic_fls64() - remove find_{next,first}{,_zero}_bit() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() - remove sched_find_first_bit() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: i386: use generic bitopsAkinobu Mita1-47/+8
- remove generic_fls64() - remove sched_find_first_bit() - remove generic_hweight{32,16,8}() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: h8300: use generic bitopsAkinobu Mita1-213/+9
- remove generic_ffs() - remove find_{next,first}{,_zero}_bit() - remove sched_find_first_bit() - remove generic_hweight{32,16,8}() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - remove ext2_{set,clear}_bit_atomic() - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() - remove generic_fls() - remove generic_fls64() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: frv: use generic bitopsAkinobu Mita1-161/+9
- remove ffz() - remove find_{next,first}{,_zero}_bit() - remove generic_ffs() - remove __ffs() - remove generic_fls64() - remove sched_find_first_bit() - remove generic_hweight{32,16,8}() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: cris: use generic bitopsAkinobu Mita1-226/+8
- remove __{,test_and_}{set,clear,change}_bit() and test_bit() - remove generic_fls() - remove generic_fls64() - remove generic_hweight{32,16,8}() - remove find_{next,first}{,_zero}_bit() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() - remove sched_find_first_bit() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Acked-by: Mikael Starvik <starvik@axis.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: arm26: use generic bitopsAkinobu Mita1-136/+10
- remove __{,test_and_}{set,clear,change}_bit() and test_bit() - remove ffz() - remove __ffs() - remove generic_fls() - remove generic_fls64() - remove generic_ffs() - remove sched_find_first_bit() - remove generic_hweight{32,16,8}() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Ian Molton <spyro@f2s.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: arm: use generic bitopsAkinobu Mita1-136/+10
- remove __{,test_and_}{set,clear,change}_bit() and test_bit() - if __LINUX_ARM_ARCH__ < 5 - remove ffz() - remove __ffs() - remove generic_fls() - remove generic_ffs() - remove generic_fls64() - remove sched_find_first_bit() - remove generic_hweight{32,16,8}() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: alpha: use generic bitopsAkinobu Mita1-116/+7
- unless defined(__alpha_cix__) and defined(__alpha_fix__) - remove generic_fls() - remove generic_hweight{64,32,16,8}() - remove generic_fls64() - remove find_{next,first}{,_zero}_bit() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: generic ↵Akinobu Mita2-0/+32
minix_{test,set,test_and_clear,test,find_first_zero}_bit() This patch introduces the C-language equivalents of the functions below: int minix_test_and_set_bit(int nr, volatile unsigned long *addr); int minix_set_bit(int nr, volatile unsigned long *addr); int minix_test_and_clear_bit(int nr, volatile unsigned long *addr); int minix_test_bit(int nr, const volatile unsigned long *addr); unsigned long minix_find_first_zero_bit(const unsigned long *addr, unsigned long size); In include/asm-generic/bitops/minix.h and include/asm-generic/bitops/minix-le.h This code largely copied from: include/asm-sparc/bitops.h Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: generic ext2_{set,clear}_bit_atomic()Akinobu Mita1-0/+22
This patch introduces the C-language equivalents of the functions below: int ext2_set_bit_atomic(int nr, volatile unsigned long *addr); int ext2_clear_bit_atomic(int nr, volatile unsigned long *addr); In include/asm-generic/bitops/ext2-atomic.h This code largely copied from: include/asm-sparc/bitops.h Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: generic ↵Akinobu Mita2-0/+71
ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() This patch introduces the C-language equivalents of the functions below: int ext2_set_bit(int nr, volatile unsigned long *addr); int ext2_clear_bit(int nr, volatile unsigned long *addr); int ext2_test_bit(int nr, const volatile unsigned long *addr); unsigned long ext2_find_first_zero_bit(const unsigned long *addr, unsigned long size); unsinged long ext2_find_next_zero_bit(const unsigned long *addr, unsigned long size); In include/asm-generic/bitops/ext2-non-atomic.h This code largely copied from: include/asm-powerpc/bitops.h include/asm-parisc/bitops.h Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] fix error: __u32 undeclaredAkinobu Mita2-0/+4
Build fix for s390 declare __u32 and __u64. Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: generic hweight{64,32,16,8}()Akinobu Mita1-0/+9
This patch introduces the C-language equivalents of the functions below: unsigned int hweight32(unsigned int w); unsigned int hweight16(unsigned int w); unsigned int hweight8(unsigned int w); unsigned long hweight64(__u64 w); In include/asm-generic/bitops/hweight.h This code largely copied from: include/linux/bitops.h Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: generic ffs()Akinobu Mita1-0/+41
This patch introduces the C-language equivalent of the function: int ffs(int x); In include/asm-generic/bitops/ffs.h This code largely copied from: include/linux/bitops.h Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: generic sched_find_first_bit()Akinobu Mita1-0/+36
This patch introduces the C-language equivalent of the function: int sched_find_first_bit(const unsigned long *b); In include/asm-generic/bitops/sched.h This code largely copied from: include/asm-powerpc/bitops.h Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: generic find_{next,first}{,_zero}_bit()Akinobu Mita1-0/+13
This patch introduces the C-language equivalents of the functions below: unsigned logn find_next_bit(const unsigned long *addr, unsigned long size, unsigned long offset); unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size, unsigned long offset); unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size); unsigned long find_first_bit(const unsigned long *addr, unsigned long size); In include/asm-generic/bitops/find.h This code largely copied from: arch/powerpc/lib/bitops.c Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: generic fls64()Akinobu Mita1-0/+12
This patch introduces the C-language equivalent of the function: int fls64(__u64 x); In include/asm-generic/bitops/fls64.h This code largely copied from: include/linux/bitops.h Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: generic fls()Akinobu Mita1-0/+41
This patch introduces the C-language equivalent of the function: int fls(int x); In include/asm-generic/bitops/fls.h This code largely copied from: include/linux/bitops.h Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: generic ffz()Akinobu Mita1-0/+12
This patch introduces the C-language equivalent of the function: unsigned long ffz(unsigned long word); In include/asm-generic/bitops/ffz.h This code largely copied from: include/asm-parisc/bitops.h Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: generic __ffs()Akinobu Mita1-0/+43
This patch introduces the C-language equivalent of the function: unsigned long __ffs(unsigned long word); In include/asm-generic/bitops/__ffs.h This code largely copied from: include/asm-sparc/bitops.h Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: generic __{,test_and_}{set,clear,change}_bit() and test_bit()Akinobu Mita1-0/+111
This patch introduces the C-language equivalents of the functions below: void __set_bit(int nr, volatile unsigned long *addr); void __clear_bit(int nr, volatile unsigned long *addr); void __change_bit(int nr, volatile unsigned long *addr); int __test_and_set_bit(int nr, volatile unsigned long *addr); int __test_and_clear_bit(int nr, volatile unsigned long *addr); int __test_and_change_bit(int nr, volatile unsigned long *addr); int test_bit(int nr, const volatile unsigned long *addr); In include/asm-generic/bitops/non-atomic.h This code largely copied from: asm-powerpc/bitops.h Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: generic {,test_and_}{set,clear,change}_bit()Akinobu Mita1-0/+191
This patch introduces the C-language equivalents of the functions below: void set_bit(int nr, volatile unsigned long *addr); void clear_bit(int nr, volatile unsigned long *addr); void change_bit(int nr, volatile unsigned long *addr); int test_and_set_bit(int nr, volatile unsigned long *addr); int test_and_clear_bit(int nr, volatile unsigned long *addr); int test_and_change_bit(int nr, volatile unsigned long *addr); In include/asm-generic/bitops/atomic.h This code largely copied from: include/asm-powerpc/bitops.h include/asm-parisc/bitops.h include/asm-parisc/atomic.h Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: use non atomic operations for minix_*_bit() and ext2_*_bit()Akinobu Mita15-72/+60
Bitmap functions for the minix filesystem and the ext2 filesystem except ext2_set_bit_atomic() and ext2_clear_bit_atomic() do not require the atomic guarantees. But these are defined by using atomic bit operations on several architectures. (cris, frv, h8300, ia64, m32r, m68k, m68knommu, mips, s390, sh, sh64, sparc, sparc64, v850, and xtensa) This patch switches to non atomic bit operation. Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: cris: remove unnecessary local_irq_restore()Akinobu Mita1-1/+0
remove unnecessary local_irq_restore() after cris_atomic_restore() in test_and_set_bit(). Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Mikael Starvik <starvik@axis.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: parisc: add ()-pair in __ffz() macroAkinobu Mita1-1/+1
Noticed by Michael Tokarev add missing ()-pair in __ffz() macro for parisc Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] bitops: alpha: use config options instead of __alpha_fix__ and ↵Akinobu Mita2-7/+7
__alpha_cix__ Use config options instead of gcc builtin definition to tell the use of instruction set extensions (CIX and FIX). This is introduced to tell the kbuild system the use of opmized hweight*() routines on alpha architecture. Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] arm: fix undefined reference to generic_flsAkinobu Mita1-1/+30
This patch defines constant_fls() instead of removed generic_fls(). Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] more s/fucn/func/ typo fixesAkinobu Mita2-5/+5
s/fucntion/function/ typo fixes Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] isdn4linux: Siemens Gigaset drivers - tty interfaceHansjoerg Lipp1-0/+32
And: Tilman Schmidt <tilman@imap.cc> This patch adds the tty interface to the gigaset module. The tty interface provides direct access to the AT command set of the Gigaset devices. Signed-off-by: Hansjoerg Lipp <hjlipp@web.de> Signed-off-by: Tilman Schmidt <tilman@imap.cc> Cc: Karsten Keil <kkeil@suse.de> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] x86: kprobes-boosterMasami Hiramatsu1-0/+6
Current kprobe copies the original instruction at the probe point and replaces it with a breakpoint instruction (int3). When the kernel hits the probe point, kprobe handler is invoked. And the copied instruction is single-step executed on the copied buffer (not on the original address) by kprobe. After that, the kprobe checks registers and modify it (if need) as if the instructions was executed on the original address. My proposal is based on the fact there are many instructions which do NOT require the register modification after the single-step execution. When the copied instruction is a kind of them, kprobe just jumps back to the next instruction after single-step execution. If so, why don't we execute those instructions directly? With kprobe-booster patch, kprobes will execute a copied instruction directly and (if need) jump back to original code. This direct execution is executed when the kprobe don't have both post_handler and break_handler, and the copied instruction can be executed directly. I sorted instructions which can be executed directly or not; - Call instructions are NG(can not be executed directly). We should correct the return address pushed into top of stack. - Indirect instructions except for absolute indirect-jumps are NG. Those instructions changes EIP randomly. We should check EIP and correct it. - Instructions that change EIP beyond the range of the instruction buffer are NG. - Instructions that change EIP to tail 5 bytes of the instruction buffer (it is the size of a jump instruction). We must write a jump instruction which backs to original kernel code in the instruction buffer. - Break point instruction is NG. We should not touch EIP and pass to other handlers. - Absolute direct/indirect jumps are OK.- Conditional Jumps are NG. - Halt and software-interruptions are NG. Because it will stay on the instruction buffer of kprobes. - Prefixes are NG. - Unknown/reserved opcode is NG. - Other 1 byte instructions are OK. But those instructions need a jump back code. - 2 bytes instructions are mapped sparsely. So, in this release, this patch don't boost those instructions. >From Intel's IA-32 opcode map described in IA-32 Intel Architecture Software Developer's Manual Vol.2 B, I determined that following opcodes are not boostable. - 0FH (2byte escape) - 70H - 7FH (Jump on condition) - 9AH (Call) and 9CH (Pushf) - C0H-C1H (Grp 2: includes reserved opcode) - C6H-C7H (Grp11: includes reserved opcode) - CCH-CEH (Software-interrupt) - D0H-D3H (Grp2: includes reserved opcode) - D6H (Reserved) - D8H-DFH (Coprocessor) - E0H-E3H (loop/conditional jump) - E8H (Call) - F0H-F3H (Prefixes and reserved) - F4H (Halt) - F6H-F7H (Grp3: includes reserved opcode) - FEH-FFH(Grp4,5: includes reserved opcode) Kprobe-booster checks whether target instruction can be boosted (can be executed directly) at arch_copy_kprobe() function. If the target instruction can be boosted, it clears "boostable" flag. If not, it sets "boostable" flag -1. This is disabled status. In resume_execution() function, If "boostable" flag is cleared, kprobe-booster measures the size of the target instruction and sets "boostable" flag 1. In kprobe_handler(), kprobe checks the "boostable" flag. If the flag is 1, it resets current kprobe and executes instruction buffer directly instead of single stepping. When unregistering a boosted kprobe, it calls synchronize_sched() after "int3" is removed. So we can ensure followings after the synchronize_sched() called. - interrupt handlers are finished on all CPUs. - instruction buffer is not executed on all CPUs. And we can release the boosted kprobe safely. And also, on preemptible kernel, the booster is not enabled where the kernel preemption is enabled. So, there are no preempted threads on the instruction buffer. The description of kretprobe-booster: ==================================== In the normal operation, kretprobe make a target function return to trampoline code. And a kprobe (called trampoline_probe) have been inserted at the trampoline code. When the kernel hits this kprobe, it calls kretprobe's handler and it returns to original return address. Kretprobe-booster patch removes the trampoline_probe. It allows the trampoline code to call kretprobe's handler directly instead of invoking kprobe. And tranpoline code returns to original return address. This new trampoline code stores and restores registers, so the kretprobe handler is still able to access those registers. Current kprobe has about 1.3 usec/probe(*) overhead, and kprobe-booster patch reduces it to 0.6 usec/probe(*). Also current kretprobe has about 2.0 usec/probe(*) overhead. Kprobe-booster patch reduces it to 1.3 usec/probe(*), and the combination of both kprobe-booster patch and kretprobe-booster patch reduces it to 0.9 usec/probe(*). I expect the combination of both patches can reduce half of a probing overhead. Performance numbers strongly depend on the processor model. Andrew Morton wrote: > These preempt tricks look rather nasty. Can you please describe what the > problem is, precisely? And how this code avoids it? Perhaps we can find > something cleaner. The problem is how to remove the copied instructions of the kprobe *safely* on the preemptable kernel (CONFIG_PREEMPT=y). Kprobes basically executes the following actions; (1)int3 (2)preempt_disable() (3)kprobe_prehandler() (4)copied instructioin(single step) (5)kprobe_posthandler() (6)preempt_enable() (7)return to the original code During the execution of copied instruction, preemption is disabled (from step (2) to (6)). When unregistering the probes, Kprobe waits for RCU quiescent state by using synchronize_sched() after removing int3 instruction. Thus we can ensure the copied instruction is not executed. On the other hand, kprobe-booster executes the following actions; (1)int3 (2)preempt_disable() (3)kprobe_prehandler() (4)preempt_enable() <-- this one is added by my patch (5)copied instruction(direct execution) (6)jmp back to the original code The problem is that we have no way to prevent preemption on step (5) or (6). We cannot call preempt_disable() after step (6), because there are no rooms to do that. Thus, some other processes may be preempted at step(5) or (6) on preemptable kernel. And I couldn't find the easy way to ensure that other processes' stack do *not* have the address of them. (I thought some way to do that, but those are very costly.) So currently, I simply boost the kprobe only when the probe point is already preemption disabled. > Also, the patch adds a preempt_enable() but I don't see a corresponding > preempt_disable(). Am I missing something? It is corresponding to the preempt_disable() in the top of kprobe_handler(). I copied the code of kprobe_handler() here: static int __kprobes kprobe_handler(struct pt_regs *regs) { struct kprobe *p; int ret = 0; kprobe_opcode_t *addr = NULL; unsigned long *lp; struct kprobe_ctlblk *kcb; /* * We don't want to be preempted for the entire * duration of kprobe processing */ preempt_disable(); <-- HERE kcb = get_kprobe_ctlblk(); Signed-off-by: Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] hrtimers: remove data fieldRoman Zippel3-5/+4
The nanosleep cleanup allows to remove the data field of hrtimer. The callback function can use container_of() to get it's own data. Since the hrtimer structure is anyway embedded in other structures, this adds no overhead. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] hrtimers: remove nsec_t typedefRoman Zippel1-12/+6
nsec_t predates ktime_t and has mostly been superseded by it. In the few places that are left it's better to make it explicit that we're dealing with 64 bit values here. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] hrtimers: remove DEFINE_KTIME and ktime_to_clock_t()Roman Zippel1-20/+0
Now that it_real_value is gone, the last user of DEFINE_KTIME and ktime_to_clock_t are also gone, so remove it before someone starts using it again. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] hrtimers: remove state fieldRoman Zippel1-9/+2
Remove the state field and encode this information in the rb_node similiar to normal timer. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] hrtimers: simplify nanosleepRoman Zippel1-3/+1
nanosleep is the only user of the expired state, so let it manage this itself, which makes the hrtimer code a bit simpler. The remaining time is also only calculated if requested. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] hrtimers: pass current time to hrtimer_forward()Roman Zippel1-1/+2
Pass current time to hrtimer_forward(). This allows to use the softirq time in the timer base when the forward function is called from the timer callback. Other places pass current time with a call to timer->base->get_time(). Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] hrtimers: optimize softirq runqueuesThomas Gleixner1-8/+12
The hrtimer softirq is called from the timer softirq every tick. Retrieve the current time from xtime and wall_to_monotonic instead of calling base->get_time() for each timer base. Store the time in the base structure and provide a hook once clock source abstractions are in place and to keep the code open for new base clocks. Based on a patch from: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] remove ->get_blocks() supportBadari Pulavarty1-10/+7
Now that get_block() can handle mapping multiple disk blocks, no need to have ->get_blocks(). This patch removes fs specific ->get_blocks() added for DIO and makes it users use get_block() instead. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] pass b_size to ->get_block()Badari Pulavarty1-0/+1
Pass amount of disk needs to be mapped to get_block(). This way one can modify the fs ->get_block() functions to map multiple blocks at the same time. [akpm@osdl.org: performance tweak] [akpm@osdl.org: remove unneeded assignments] Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] change buffer_head.b_size to size_tBadari Pulavarty1-8/+11
Increase the size of the buffer_head b_size field (only) for 64 bit platforms. Update some old and moldy comments in and around the structure as well. The b_size increase allows us to perform larger mappings and allocations for large I/O requests from userspace, which tie in with other changes allowing the get_block_t() interface to map multiple blocks at once. Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] ext3_get_blocks: support multiple blocks allocation in ext3_new_block()Mingming Cao1-1/+4
Change ext3_try_to_allocate() (called via ext3_new_blocks()) to try to allocate the requested number of blocks on a best effort basis: After allocated the first block, it will always attempt to allocate the next few(up to the requested size and not beyond the reservation window) adjacent blocks at the same time. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] ext3_get_blocks: Mapping multiple blocks at a onceMingming Cao1-3/+3
Currently ext3_get_block() only maps or allocates one block at a time. This is quite inefficient for sequential IO workload. I have posted a early implements a simply multiple block map and allocation with current ext3. The basic idea is allocating the 1st block in the existing way, and attempting to allocate the next adjacent blocks on a best effort basis. More description about the implementation could be found here: http://marc.theaimsgroup.com/?l=ext2-devel&m=112162230003522&w=2 The following the latest version of the patch: break the original patch into 5 patches, re-worked some logicals, and fixed some bugs. The break ups are: [patch 1] Adding map multiple blocks at a time in ext3_get_blocks() [patch 2] Extend ext3_get_blocks() to support multiple block allocation [patch 3] Implement multiple block allocation in ext3-try-to-allocate (called via ext3_new_block()). [patch 4] Proper accounting updates in ext3_new_blocks() [patch 5] Adjust reservation window size properly (by the given number of blocks to allocate) before block allocation to increase the possibility of allocating multiple blocks in a single call. Tests done so far includes fsx,tiobench and dbench. The following numbers collected from Direct IO tests (1G file creation/read) shows the system time have been greatly reduced (more than 50% on my 8 cpu system) with the patches. 1G file DIO write: 2.6.15 2.6.15+patches real 0m31.275s 0m31.161s user 0m0.000s 0m0.000s sys 0m3.384s 0m0.564s 1G file DIO read: 2.6.15 2.6.15+patches real 0m30.733s 0m30.624s user 0m0.000s 0m0.004s sys 0m0.748s 0m0.380s Some previous test we did on buffered IO with using multiple blocks allocation and delayed allocation shows noticeable improvement on throughput and system time. This patch: Add support of mapping multiple blocks in one call. This is useful for DIO reads and re-writes (where blocks are already allocated), also is in line with Christoph's proposal of using getblocks() in mpage_readpage() or mpage_readpages(). Signed-off-by: Mingming Cao <cmm@us.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] 2TB files: change type of kstatfs entriesTakashi Sato1-5/+5
This fix was proposed by Trond Myklebust. He says: The type "sector_t" is heavily tied in to the block layer interface as an offset/handle to a block, and is subject to a supposedly block-specific configuration option: CONFIG_LBD. Despite this, it is used in struct kstatfs to save a couple of bytes on the stack whenever we call the filesystems' ->statfs(). So kstatfs's entries related to blocks are invalid on statfs64 for a network filesystem which has more than 2^32-1 blocks when CONFIG_LBD is disabled. - struct kstatfs Change the type of following entries from sector_t to u64. f_blocks f_bfree f_bavail f_files f_ffree Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] 2TB files: add blkcnt_tTakashi Sato8-1/+33
Add blkcnt_t as the type of inode.i_blocks. This enables you to make the size of blkcnt_t either 4 bytes or 8 bytes on 32 bits architecture with CONFIG_LSF. - CONFIG_LSF Add new configuration parameter. - blkcnt_t On h8300, i386, mips, powerpc, s390 and sh that define sector_t, blkcnt_t is defined as u64 if CONFIG_LSF is enabled; otherwise it is defined as unsigned long. On other architectures, it is defined as unsigned long. - inode.i_blocks Change the type from sector_t to blkcnt_t. Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] 2TB files: st_blocks is invalid when calling stat64Takashi Sato5-13/+5
This patch series fixes the following problems on 32 bits architecture. o stat64 returns the lower 32 bits of blocks, although userland st_blocks has 64 bits, because i_blocks has only 32 bits. The ioctl with FIOQSIZE has the same problem. o As Dave Kleikamp said, making >2TB file on JFS results in writing an invalid block number to disk inode. The cause is the same as above too. o In generic quota code dquot_transfer(), the file usage is calculated from i_blocks via inode_get_bytes(). If the file is over 2TB, the change of usage is less than expected. The cause is the same as above too. o As Trond Myklebust said, statfs64's entries related to blocks are invalid on statfs64 for a network filesystem which has more than 2^32-1 blocks with CONFIG_LBD disabled. [PATCH 3/3] We made patches to fix problems that occur when handling a large filesystem and a large file. It was discussed on the mails titled "stat64 for over 2TB file returned invalid st_blocks". Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Jan Kara <jack@ucw.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] mempool: use mempool_create_slab_pool()Matthew Dobson1-3/+1
Modify well over a dozen mempool users to call mempool_create_slab_pool() rather than calling mempool_create() with extra arguments, saving about 30 lines of code and increasing readability. Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] mempool: add mempool_create_slab_pool()Matthew Dobson1-0/+8
Create a simple wrapper function for the common case of creating a slab-based mempool. Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] mempool: add kzalloc allocatorMatthew Dobson1-2/+8
Add another allocator to the common mempool code: a kzalloc/kfree allocator This will be used by the next patch in the series to replace a mempool-backed kzalloc allocator. It is also very likely that there will be more users in the future. Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] mempool: add kmalloc allocatorMatthew Dobson1-0/+12
Add another allocator to the common mempool code: a kmalloc/kfree allocator This will be used by the next patch in the series to replace duplicate mempool-backed kmalloc allocators in several places in the kernel. It is also very likely that there will be more users in the future. Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] mempool: add page allocatorMatthew Dobson1-0/+12
This will be used by the next patch in the series to replace duplicate mempool-backed page allocators in 2 places in the kernel. It is also likely that there will be more users in the future. Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] Use loff_t for size in struct proc_dir_entryManeesh Soni1-1/+1
Change proc_dir_entry->size to be loff_t to represent files like /proc/vmcore for 32bit systems with more than 4G memory. Needed for seeing correct size for /proc/vmcore for 32-bit systems with > 4G RAM. Signed-off-by: Maneesh Soni <maneesh@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] consolidate sys32/compat_adjtimexStephen Rothwell2-0/+4
Create compat_sys_adjtimex and use it an all appropriate places. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] create struct compat_timex and use it everywhereStephen Rothwell1-0/+26
We had a copy of the compatibility version of struct timex in each 64 bit architecture. This patch just creates a global one and replaces all the usages of the old ones. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Kyle McMartin <kyle@parisc-linux.org> Acked-by: Tony Luck <tony.luck@intel.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] VFS,fs/locks.c,NFSD4: add race_free posix_lock_file_conf() interfaceAndy Adamson1-0/+1
Lockd and the NFSv4 server both exercise a race condition where posix_test_lock() is called either before or after posix_lock_file() to deal with a denied lock request due to a conflicting lock. Remove the race condition for the NFSv4 server by adding a new conflicting lock parameter to __posix_lock_file() , changing the name to __posix_lock_file_conf(). Keep posix_lock_file() interface, add posix_lock_conf() interface, both call __posix_lock_file_conf(). [akpm@osdl.org: Put the EXPORT_SYMBOL() where it belongs] Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] hpet header sanitizationRandy Dunlap1-16/+20
Add __KERNEL__ block. Use __KERNEL__ to allow ioctl interface to be usable. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] ipmi: add full sysfs supportCorey Minyard3-3/+48
Add full driver model support for the IPMI driver. It links in the proper bus and device support. It adds an "ipmi" driver interface that has each BMC discovered by the driver (as a device). These BMCs appear in the devices/platform directory. If there are multiple interfaces to the same BMC, the driver should discover this and will only have one BMC entry. The BMC entry will have pointers to each interface device that connects to it. The device information (statistics and config information) has not yet been ported over to the driver model from proc, that will come later. This work was based on work by Yani Ioannou. I basically rewrote it using that code as a guide, but he still deserves credit :). [bunk@stusta.de: make ipmi_find_bmc_guid() static] Signed-off-by: Corey Minyard <minyard@acm.org> Signed-off-by: Yani Ioannou <yani.ioannou@gmail.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] cleanup smp_call_function UP buildCon Kolivas1-1/+5
net/core/flow.c: In function 'flow_cache_flush': net/core/flow.c:299: warning: statement with no effect Signed-off-by: Con Kolivas <kernel@kolivas.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] Make address_space_operations->invalidatepage return voidNeilBrown3-4/+4
The return value of this function is never used, so let's be honest and declare it as void. Some places where invalidatepage returned 0, I have inserted comments suggesting a BUG_ON. [akpm@osdl.org: JBD BUG fix] [akpm@osdl.org: rework for git-nfs] [akpm@osdl.org: don't go BUG in block_invalidate_page()] Signed-off-by: Neil Brown <neilb@suse.de> Acked-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] Make address_space_operations->sync_page return voidNeilBrown2-2/+2
The only user ignores the return value, and the only instanace (block_sync_page) always returns 0... Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] EFI: keep physical table addresses in efi structureBjorn Helgaas2-9/+11
Almost all users of the table addresses from the EFI system table want physical addresses. So rather than doing the pa->va->pa conversion, just keep physical addresses in struct efi. This fixes a DMI bug: the efi structure contained the physical SMBIOS address on x86 but the virtual address on ia64, so dmi_scan_machine() used ioremap() on a virtual address on ia64. This is essentially the same as an earlier patch by Matt Tolentino: http://marc.theaimsgroup.com/?l=linux-kernel&m=112130292316281&w=2 except that this changes all table addresses, not just ACPI addresses. Matt's original patch was backed out because it caused MCAs on HP sx1000 systems. That problem is resolved by the ioremap() attribute checking added for ia64. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Matt Domsch <Matt_Domsch@dell.com> Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com> Cc: "Brown, Len" <len.brown@intel.com> Cc: Andi Kleen <ak@muc.de> Acked-by: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] ia64: ioremap: check EFI for valid memory attributesBjorn Helgaas1-13/+2
Check the EFI memory map so we can use the correct memory attributes for ioremap(). Previously, we always used uncacheable access, which blows up on some machines for regular system memory. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Matt Domsch <Matt_Domsch@dell.com> Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com> Cc: "Brown, Len" <len.brown@intel.com> Cc: Andi Kleen <ak@muc.de> Acked-by: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] EFI, /dev/mem: simplify efi_mem_attribute_range()Bjorn Helgaas2-2/+4
Pass the size, not a pointer to the size, to efi_mem_attribute_range(). This function validates memory regions for the /dev/mem read/write/mmap paths. The pointer allows arches to reduce the size of the range, but I think that's unnecessary complexity. Simplifying it will let me use efi_mem_attribute_range() to improve the ia64 ioremap() implementation. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Matt Domsch <Matt_Domsch@dell.com> Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com> Cc: "Brown, Len" <len.brown@intel.com> Cc: Andi Kleen <ak@muc.de> Acked-by: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] ia64: use i386 dmi_scan.cMatt Domsch2-0/+11
Enable DMI table parsing on ia64. Andi Kleen has a patch in his x86_64 tree which enables the use of i386 dmi_scan.c on x86_64. dmi_scan.c functions are being used by the drivers/char/ipmi/ipmi_si_intf.c driver for autodetecting the ports or memory spaces where the IPMI controllers may be found. This patch adds equivalent changes for ia64 as to what is in the x86_64 tree. In addition, I reworked the DMI detection, such that on EFI-capable systems, it uses the efi.smbios pointer to find the table, rather than brute-force searching from 0xF0000. On non-EFI systems, it continues the brute-force search. My test system, an Intel S870BN4 'Tiger4', aka Dell PowerEdge 7250, with latest BIOS, does not list the IPMI controller in the ACPI namespace, nor does it have an ACPI SPMI table. Also note, currently shipping Dell x8xx EM64T servers don't have these either, so DMI is the only method for obtaining the address of the IPMI controller. Signed-off-by: Matt Domsch <Matt_Domsch@dell.com> Acked-by: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] Add flush_kernel_dcache_page() APIJames Bottomley1-0/+6
We have a problem in a lot of emulated storage in that it takes a page from get_user_pages() and does something like kmap_atomic(page) modify page kunmap_atomic(page) However, nothing has flushed the kernel cache view of the page before the kunmap. We need a lightweight API to do this, so this new API would specifically be for flushing the kernel cache view of a user page which the kernel has modified. The driver would need to add flush_kernel_dcache_page(page) before the final kunmap. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] Add API for flushing Anon pagesJames Bottomley1-0/+6
Currently, get_user_pages() returns fully coherent pages to the kernel for anything other than anonymous pages. This is a problem for things like fuse and the SCSI generic ioctl SG_IO which can potentially wish to do DMA to anonymous pages passed in by users. The fix is to add a new memory management API: flush_anon_page() which is used in get_user_pages() to make anonymous pages coherent. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26[PATCH] protect remove_proc_entrySteven Rostedt1-0/+3
It has been discovered that the remove_proc_entry has a race in the removing of entries in the proc file system that are siblings. There's no protection around the traversing and removing of elements that belong in the same subdirectory. This subdirectory list is protected in other areas by the BKL. So the BKL was at first used to protect this area too, but unfortunately, remove_proc_entry may be called with spinlocks held. The BKL may schedule, so this was not a solution. The final solution was to add a new global spin lock to protect this list, called proc_subdir_lock. This lock now protects the list in remove_proc_entry, and I also went around looking for other areas that this list is modified and added this protection there too. Care must be taken since these locations call several functions that may also schedule. Since I don't see any location that these functions that modify the subdirectory list are called by interrupts, the irqsave/restore versions of the spin lock was _not_ used. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25Merge master.kernel.org:/home/rmk/linux-2.6-serialLinus Torvalds1-0/+1
* master.kernel.org:/home/rmk/linux-2.6-serial: [ARM] 3383/3: ixp2000: ixdp2x01 platform serial conversion [SERIAL] amba-pl010: Remove accessor macros [SERIAL] remove 8250_acpi (replaced by 8250_pnp and PNPACPI) [SERIAL] icom: select FW_LOADER
2006-03-25Merge master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds8-88/+234
* master.kernel.org:/home/rmk/linux-2.6-arm: [ARM] 3030/2: fix permission check in the obscur cmpxchg syscall [ARM] nommu: rename compressed/head.S symbols to a new style [ARM] select TLS_REG_EMUL and NEEDS_SYSCALL_FOR_CMPXCHG [ARM] nommu: Move hardware page table definitions to pgtable-hwdef.h [ARM] Move read of processor ID out of lookup_processor_type() [ARM] Fix typo in tlbflush.h [ARM] noMMU: removes TLB codes in nommu mode [ARM] noMMU: block sys_fork in nommu mode [ARM] 3399/1: Fix link problem when CONFIG_PRINTK is disabled [ARM] 3398/1: Fix the VFP registers loading/storing base address [ARM] 3397/1: AT91RM9200 Header update [ARM] 3385/1: Battery support for sharp zaurus sl-5500 (collie) [ARM] SMP: don't set cpu_*_map in smp_prepare_boot_cpu include/linux/clk.h is betraying its ARM origins [ARM] Move enable_irq and disable_irq to assembler.h [ARM] 3391/1: use PLAT8250_DEV_PLATFORM{,1} for platform device id instead of 0/1
2006-03-25[ARM] 3383/3: ixp2000: ixdp2x01 platform serial conversionLennert Buytenhek1-0/+1
Patch from Lennert Buytenhek Add a PLAT8250_DEV_PLATFORM2, and convert the two ixdp2x01 CPLD serial ports to use platform serial devices with ids PLAT8250_DEV_PLATFORM[12]. (The on-chip xscale UART is PLAT8250_DEV_PLATFORM, id #0.) Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-25Merge nommu treeRussell King5-80/+111
Fix merge conflict in arch/arm/mm/proc-xscale.S Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-25[ARM] 3397/1: AT91RM9200 Header updateAndrew Victor1-1/+99
Patch from Andrew Victor This patch updates the hardware header to include definitions for the Memory Controller registers. Signed-off-by: Andrew Victor <andrew@sanpeople.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-25include/linux/clk.h is betraying its ARM originsTodd Poynor1-2/+2
include/linux/clk.h is betraying its ARM origins. Signed-off-by: Todd Poynor <tpoynor@mvista.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-25powerpc: fix strncasecmp prototypeLinus Torvalds1-1/+1
It takes a size_t, not an int, as its third argument. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25Merge branch 'audit.b3' of ↵Linus Torvalds3-30/+147
git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: (22 commits) [PATCH] fix audit_init failure path [PATCH] EXPORT_SYMBOL patch for audit_log, audit_log_start, audit_log_end and audit_format [PATCH] sem2mutex: audit_netlink_sem [PATCH] simplify audit_free() locking [PATCH] Fix audit operators [PATCH] promiscuous mode [PATCH] Add tty to syscall audit records [PATCH] add/remove rule update [PATCH] audit string fields interface + consumer [PATCH] SE Linux audit events [PATCH] Minor cosmetic cleanups to the code moved into auditfilter.c [PATCH] Fix audit record filtering with !CONFIG_AUDITSYSCALL [PATCH] Fix IA64 success/failure indication in syscall auditing. [PATCH] Miscellaneous bug and warning fixes [PATCH] Capture selinux subject/object context information. [PATCH] Exclude messages by message type [PATCH] Collect more inode information during syscall processing. [PATCH] Pass dentry, not just name, in fsnotify creation hooks. [PATCH] Define new range of userspace messages. [PATCH] Filter rule comparators ... Fixed trivial conflict in security/selinux/hooks.c
2006-03-25Merge git://git.linux-nfs.org/pub/linux/nfs-2.6Linus Torvalds14-128/+153
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (103 commits) SUNRPC,RPCSEC_GSS: spkm3--fix config dependencies SUNRPC,RPCSEC_GSS: spkm3: import contexts using NID_cast5_cbc LOCKD: Make nlmsvc_traverse_shares return void LOCKD: nlmsvc_traverse_blocks return is unused SUNRPC,RPCSEC_GSS: fix krb5 sequence numbers. NFSv4: Dont list system.nfs4_acl for filesystems that don't support it. SUNRPC,RPCSEC_GSS: remove unnecessary kmalloc of a checksum SUNRPC: Ensure rpc_call_async() always calls tk_ops->rpc_release() SUNRPC: Fix memory barriers for req->rq_received NFS: Fix a race in nfs_sync_inode() NFS: Clean up nfs_flush_list() NFS: Fix a race with PG_private and nfs_release_page() NFSv4: Ensure the callback daemon flushes signals SUNRPC: Fix a 'Busy inodes' error in rpc_pipefs NFS, NLM: Allow blocking locks to respect signals NFS: Make nfs_fhget() return appropriate error values NFSv4: Fix an oops in nfs4_fill_super lockd: blocks should hold a reference to the nlm_file NFSv4: SETCLIENTID_CONFIRM should handle NFS4ERR_DELAY/NFS4ERR_RESOURCE NFSv4: Send the delegation stateid for SETATTR calls ...
2006-03-25[PATCH] x86_64: Removed duplicated declaration of force_iommuAndi Kleen1-1/+0
Noticed by Andrew Morton. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] x86_64: group memnodemap and memnodeshift in a memnode structureEric Dumazet1-2/+7
pfn_to_page() and others need to access both memnode_shift and the very first bytes of memnodemap[]. If we force memnode_shift to be just before the memnodemap array, we can reduce the memory footprint to one cache line instead of two for most setups. This patch introduce a 'memnode' structure where shift and map[] are carefully placed. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] x86_64: Make local_t 64bit instead of 32bitAndi Kleen1-5/+5
For consistency with other architectures Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] x86_64: Remove CONFIG_UNORDERED_IOAndi Kleen1-18/+0
It was a failed experiment - all benchmarks done with it on both AMD and Intel showed it was a loss. That was probably because the store buffers of the CPUs for write combining traffic weren't large enough. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] x86_64: timer interrupt lockup due to pending interruptVivek Goyal1-0/+1
o check_timer() routine fails while second kernel is booting after a crash on an opetron box. Problem happens because timer vector (0x31) seems to be locked. o After a system crash, it is not safe to service interrupts any more, hence interrupts are disabled. This leads to pending interrupts at LAPIC. LAPIC sends these interrupts to the CPU during early boot of second kernel. Other pending interrupts are discarded saying unexpected trap but timer interrupt is serviced and CPU does not issue an LAPIC EOI because it think this interrupt came from i8259 and sends ack to 8259. This leads to vector 0x31 locking as LAPIC does not clear respective ISR and keeps on waiting for EOI. o This patch issues extra EOI for the pending interrupts who have ISR set. o Though today only timer seems to be the special case because in early boot it thinks interrupts are coming from i8259 and uses mask_and_ack_8259A() as ack handler and does not issue LAPIC EOI. But probably doing it in generic manner for all vectors makes sense. Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] x86_64: Try to allocate node memmap near the end of nodeAndi Kleen1-0/+3
This fixes problems with very large nodes (over 128GB) filling up all of the first 4GB with their mem_map and not leaving enough space for the swiotlb. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] x86_64: Reorder one field of the PDA to reduce paddingArjan van de Ven1-1/+1
This reorders the mmu_state int in the pda, such that there is no more padding (there currently is 4 bytes of padding). Boot tested. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] x86_64: Don't invoke OOM killer while allocating floppy DMA buffersAndi Kleen1-1/+1
Floppy can fall back to smaller buffers, so don't do OOM killing. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] x86_64: Implement early DMI scanningAndi Kleen3-5/+41
There are more and more cases where we need to know DMI information early to work around bugs. i386 already had early DMI scanning, but x86-64 didn't. Implement this now. This required some cleanup in the i386 code. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] x86_64: Clean up and tweak ACPI blacklist year codeAndi Kleen1-0/+2
- Move the core parser into dmi_scan.c. It can be useful for other subsystems too. - Differentiate between field doesn't exist and field is 0 or unparseable. The first case is likely an old BIOS with broken ACPI, the later is likely a slightly buggy BIOS where someone forget to edit the date. Don't blacklist in the later case. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] x86_64: Don't define string functions to builtinAndi Kleen1-14/+3
gcc should handle this anyways, and it causes problems when sprintf is turned into strcpy by gcc behind our backs and the C fallback version of strcpy is actually defining __builtin_strcpy Then drop -ffreestanding from the main Makefile because it isn't needed anymore and implies -fno-builtin, which is wrong now. (it was only added for x86-64, so dropping it should be safe) Noticed by Roman Zippel Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] x86_64: remove dead do_softirq_thunkJan Beulich1-2/+0
Appearantly a left-over... Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] x86_64: actively synchronize vmalloc area when registering certain ↵Jan Beulich2-0/+32
callbacks While the modular aspect of the respective i386 patch doesn't apply to x86-64 (as the top level page directory entry is shared between modules and the base kernel), handlers registered with register_die_notifier() are still under similar constraints for touching ioremap()ed or vmalloc()ed memory. The likelihood of this problem becoming visible is of course significantly lower, as the assigned virtual addresses would have to cross a 2**39 byte boundary. This is because the callback gets invoked (a) in the page fault path before the top level page table propagation gets carried out (hence a fault to propagate the top level page table entry/entries mapping to module's code/data would nest infinitly) and (b) in the NMI path, where nested faults must absolutely not happen, since otherwise the IRET from the nested fault re-enables NMIs, potentially resulting in nested NMI occurences. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] x86_64: eliminate set_debug()Jan Beulich2-9/+1
For consistency and to have only a single place of definition, replace set_debug() uses with set_debugreg(), and eliminate the definition of thj former. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] x86_64: Rename struct node in x86-64 NUMA code to struct bootnodeAndi Kleen1-2/+2
It conflicts with the struct node in node.h Actually the x86-64 version was there first, but .. Suggested by Jan Beulich Cc: jbeulich@novell.com Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] x86_64: Increase the variability of the process stack on 64bit ↵Andi Kleen1-0/+4
architectures 8MB is not really very random, use 1GB (or more with larger page sizes) instead. Also use the low bits of the random generator output now instead of throwing them away. Only enabled on x86-64 right now. Other architectures need to add a suitable STACK_RND_MASK Cc: mingo@elte.hu Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25Merge branch 'release' of ↵Linus Torvalds13-25/+47
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] New IA64 core/thread detection patch [IA64] Increase max node count on SN platforms [IA64] Increase max node count on SN platforms [IA64] Increase max node count on SN platforms [IA64] Increase max node count on SN platforms [IA64] Tollhouse HP: IA64 arch changes [IA64] cleanup dig_irq_init [IA64] MCA recovery: kernel context recovery table IA64: Use early_parm to handle mvec_name and nomca [IA64] move patchlist and machvec into init section [IA64] add init declaration - nolwsys [IA64] add init declaration - gate page functions [IA64] add init declaration to memory initialization functions [IA64] add init declaration to cpu initialization functions [IA64] add __init declaration to mca functions [IA64] Ignore disabled Local SAPIC Affinity Structure in SRAT [IA64] sn_check_intr: use ia64_get_irr() [IA64] fix ia64 is_hugepage_only_range
2006-03-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivialLinus Torvalds1-3/+3
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (21 commits) BUG_ON() Conversion in drivers/video/ BUG_ON() Conversion in drivers/parisc/ BUG_ON() Conversion in drivers/block/ BUG_ON() Conversion in sound/sparc/cs4231.c BUG_ON() Conversion in drivers/s390/block/dasd.c BUG_ON() Conversion in lib/swiotlb.c BUG_ON() Conversion in kernel/cpu.c BUG_ON() Conversion in ipc/msg.c BUG_ON() Conversion in block/elevator.c BUG_ON() Conversion in fs/coda/ BUG_ON() Conversion in fs/binfmt_elf_fdpic.c BUG_ON() Conversion in input/serio/hil_mlc.c BUG_ON() Conversion in md/dm-hw-handler.c BUG_ON() Conversion in md/bitmap.c The comment describing how MS_ASYNC works in msync.c is confusing rcu: undeclared variable used in documentation fix typos "wich" -> "which" typo patch for fs/ufs/super.c Fix simple typos tabify drivers/char/Makefile ...
2006-03-25Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6Linus Torvalds3-11/+24
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6: [SPARC]: Try to start getting SMP back into shape.
2006-03-25Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds3-2/+3
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [NETFILTER] x_table.c: sem2mutex [IPV4]: Aggregate route entries with different TOS values [TCP]: Mark tcp_*mem[] __read_mostly. [TCP]: Set default max buffers from memory pool size [SCTP]: Fix up sctp_rcv return value [NET]: Take RTNL when unregistering notifier [WIRELESS]: Fix config dependencies. [NET]: Fill in a 32-bit hole in struct sock on 64-bit platforms. [NET]: Ensure device name passed to SO_BINDTODEVICE is NULL terminated. [MODULES]: Don't allow statically declared exports [BRIDGE]: Unaligned accesses in the ethernet bridge
2006-03-25Merge master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvbLinus Torvalds12-45/+462
* master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (33 commits) V4L/DVB (3604): V4l printk fix V4L/DVB (3599c): Whitespace cleanups under Documentation/video4linux V4L/DVB (3599b): Whitespace cleanups under drivers/media V4L/DVB (3599a): Move drivers/usb/media to drivers/media/video V4L/DVB (3599): Implement new routing commands for wm8775 and cs53l32a. V4L/DVB (3598): Add bit algorithm adapter for the Conexant CX2341X boards. V4L/DVB (3597): Vivi: fix warning: implicit declaration of function 'in_interrupt' V4L/DVB (3588): Remove VIDIOC_G/S_AUDOUT from msp3400 V4L/DVB (3587): Always wake thread after routing change. V4L/DVB (3584): Implement V4L2_TUNER_MODE_LANG1_LANG2 audio mode V4L/DVB (3582): Implement correct msp3400 input/output routing V4L/DVB (3581): Add new media/msp3400.h header containing the routing macros V4L/DVB (3580): Last round of msp3400 cleanups before adding routing commands V4L/DVB (3579): Move msp_modus to msp3400-kthreads, add JP and KR std detection V4L/DVB (3578): Make scart definitions easier to handle V4L/DVB (3577): Cleanup audio input handling V4L/DVB (3575): Cxusb: fix i2c debug messages for bluebird devices V4L/DVB (3574): Cxusb: fix debug messages V4L/DVB (3573): Cxusb: remove FIXME: comment in bluebird_patch_dvico_firmware_download V4L/DVB (3572): Cxusb: conditionalize gpio write for the medion box ...
2006-03-25[PATCH] remove pps supportRoman Zippel1-41/+0
This removes the support for pps. It's completely unused within the kernel and is basically in the way for further cleanups. It should be easier to readd proper support for it after the rest has been converted to NTP4 (where the pps mechanisms are quite different from NTP3 anyway). Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: Adrian Bunk <bunk@stusta.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] udf: remove duplicate definitionsPekka Enberg1-21/+0
This patch removes duplicate definitions from include/linux/udf_fs_i.h which are already defined in fs/udf/ecma_167.h. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] Check if cpu can be onlined before calling smp_prepare_cpu()Ashok Raj1-1/+0
- Moved check for online cpu out of smp_prepare_cpu() - Moved default declaration of smp_prepare_cpu() to kernel/cpu.c - Removed lock_cpu_hotplug() from smp_prepare_cpu() to around it, since its called from cpu_up() as well now. - Removed clearing from cpu_present_map during cpu_offline as it breaks using cpu_up() directly during a subsequent online operation. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: "Li, Shaohua" <shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] cpumask: uninline any_online_cpu()Andrew Morton1-10/+3
text data bss dec hex filename before: 3605597 1363528 363328 5332453 515de5 vmlinux after: 3605295 1363612 363200 5332107 515c8b vmlinux 218 bytes saved. Also, optimise any_online_cpu() out of existence on CONFIG_SMP=n. This function seems inefficient. Can't we simply AND the two masks, then use find_first_bit()? Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] cpumask: uninline highest_possible_processor_id()Andrew Morton1-9/+6
Shrinks the only caller (net/bridge/netfilter/ebtables.c) by 174 bytes. Also, optimise highest_possible_processor_id() out of existence on CONFIG_SMP=n. Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] cpumask: uninline next_cpu()Andrew Morton1-7/+4
text data bss dec hex filename before: 3488027 1322496 360128 5170651 4ee5db vmlinux after: 3485112 1322480 359968 5167560 4ed9c8 vmlinux 2931 bytes saved Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] cpumask: uninline first_cpu()Andrew Morton1-5/+6
text data bss dec hex filename before: 3490577 1322408 360000 5172985 4eeef9 vmlinux after: 3488027 1322496 360128 5170651 4ee5db vmlinux Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] radix-tree documentation cleanupsJonathan Corbet1-5/+8
Documentation changes to help radix tree users avoid overrunning the tags array. RADIX_TREE_TAGS moves to linux/radix-tree.h and is now known as RADIX_TREE_MAX_TAGS (Nick Piggin's idea). Tag parameters are changed to unsigned, and some comments are updated. Signed-off-by: Jonathan Corbet <corbet@lwn.net> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] roundup_pow_of_two() 64-bit fixAndrew Morton2-2/+10
fls() takes an integer, so roundup_pow_of_two() is busted for ulongs larger than 2^32-1. Fix this by implementing and using fls_long(). (Why does roundup_pow_of_two() return a long?) (Why is roundup_pow_of_two() __attribute_const__ whereas long_log2() is __attribute_pure__?) (Why does long_log2() suck so much? Because we were missing fls_long()?) Cc: Roland Dreier <rdreier@cisco.com> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Cc: John Hawkes <hawkes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] refactor capable() to one implementation, add __capable() helperChris Wright2-7/+18
Move capable() to kernel/capability.c and eliminate duplicate implementations. Add __capable() function which can be used to check for capabiilty of any process. Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] add a proper prototype for setup_arch()Adrian Bunk1-0/+4
This patch adds a proper prototype for setup_arch() in init.h. This patch is based on a patch by Ben Dooks <ben-linux@fluff.org>. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] POLLRDHUP/EPOLLRDHUP handling for half-closed devices notificationsDavide Libenzi21-0/+22
Implement the half-closed devices notifiation, by adding a new POLLRDHUP (and its alias EPOLLRDHUP) bit to the existing poll/select sets. Since the existing POLLHUP handling, that does not report correctly half-closed devices, was feared to be changed, this implementation leaves the current POLLHUP reporting unchanged and simply add a new bit that is set in the few places where it makes sense. The same thing was discussed and conceptually agreed quite some time ago: http://lkml.org/lkml/2003/7/12/116 Since this new event bit is added to the existing Linux poll infrastruture, even the existing poll/select system calls will be able to use it. As far as the existing POLLHUP handling, the patch leaves it as is. The pollrdhup-2.6.16.rc5-0.10.diff defines the POLLRDHUP for all the existing archs and sets the bit in the six relevant files. The other attached diff is the simple change required to sys/epoll.h to add the EPOLLRDHUP definition. There is "a stupid program" to test POLLRDHUP delivery here: http://www.xmailserver.org/pollrdhup-test.c It tests poll(2), but since the delivery is same epoll(2) will work equally. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] irq: uninline migration functionsAndrew Morton1-47/+2
Uninline some massive IRQ migration functions. Put them in the new kernel/irq/migration.c. Cc: Andi Kleen <ak@muc.de> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] amiga: fix driver_register() return handling, remove zorro_module_init()Bjorn Helgaas1-33/+0
Remove the assumption that driver_register() returns the number of devices bound to the driver. In fact, it returns zero for success or a negative error value. zorro_module_init() used the device count to automatically unregister and unload drivers that found no devices. That might have worked at one time, but has been broken for some time because zorro_register_driver() returned either a negative error or a positive count (never zero). So it could only unregister on failure, when it's not needed anyway. This functionality could be resurrected in individual drivers by counting devices in their .probe() methods. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] hp300: fix driver_register() return handling, remove dio_module_init()Bjorn Helgaas1-32/+0
Remove the assumption that driver_register() returns the number of devices bound to the driver. In fact, it returns zero for success or a negative error value. dio_module_init() used the device count to automatically unregister and unload drivers that found no devices. That might have worked at one time, but has been broken for some time because dio_register_driver() returned either a negative error or a positive count (never zero). So it could only unregister on failure, when it's not needed anyway. This functionality could be resurrected in individual drivers by counting devices in their .probe() methods. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Philip Blundell <philb@gnu.org> Cc: Jochen Friedrich <jochen@scram.de> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] reiserfs: cleanupsVladimir V. Saveliev1-3/+3
Clean up several places where gcc issues warnings when -W is specified. Thanks to Neil for finding that. Signed-off-by: Vladimir V. Saveliev <vs@namesys.com> Cc: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Hans Reiser <reiser@namesys.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] parport: move PP_MAJOR from ppdev.h to major.hRene Herman2-2/+1
Today I wondered about /dev/parport<n> after not seeing anything in drivers/parport register char-major-99. Having PP_MAJOR in include/linux/major.h would've allowed me to more quickly determine that it was the ppdev driver driving these. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] inotify: lock avoidance with parent watch status in dentryNick Piggin3-0/+32
Previous inotify work avoidance is good when inotify is completely unused, but it breaks down if even a single watch is in place anywhere in the system. Robin Holt notices that udev is one such culprit - it slows down a 512-thread application on a 512 CPU system from 6 seconds to 22 minutes. Solve this by adding a flag in the dentry that tells inotify whether or not its parent inode has a watch on it. Event queueing to parent will skip taking locks if this flag is cleared. Setting and clearing of this flag on all child dentries versus event delivery: this is no in terms of race cases, and that was shown to be equivalent to always performing the check. The essential behaviour is that activity occuring _after_ a watch has been added and _before_ it has been removed, will generate events. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Robert Love <rml@novell.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] kernel/params.c: make param_array() staticAdrian Bunk1-7/+0
param_array() in kernel/params.c can now become static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] Remove MODULE_PARMRusty Russell2-22/+0
MODULE_PARM was actually breaking: recent gcc version optimize them out as unused. It's time to replace the last users, which are generally in the most unloved drivers anyway. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] constify tty flip buffer handlingThomas Koeller1-2/+2
Add a couple of 'const' qualifiers to the TTY flip buffer APIs, where appropriate. Signed-off-by: Thomas Koeller <thomas@koeller.dyndns.org> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] Introduce FMODE_EXEC file flagOleg Drokin1-0/+5
Introduce FMODE_EXEC file flag, to indicate that file is being opened for execution. This is useful for distributed filesystems to maintain consistent behavior for returning ETXTBUSY when opening for write and execution happens on different nodes. akpm: Needed by Lustre at present. I assume their objective to to work towards being able to install Lustre on an unmodified distro kernel, which seems sane. It should have zero runtime cost. Trond and Chuck indicate that NFS4 can probably use this too, for the same thing. Steven says it's also on the GFS todo list. Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Chuck Lever <cel@citi.umich.edu> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] reiserfs: fix transaction overflowingAlexander Zarochentzev1-0/+5
This patch fixes a bug in reiserfs truncate. A transaction might overflow when truncating long highly fragmented file. The fix is to split truncation into several transactions to avoid overflowing. Signed-off-by: Vladimir V. Saveliev <vs@namesys.com> Cc; Charles McColgan <cm@chuck.net> Cc: Alexander Zarochentsev <zam@namesys.com> Cc: Hans Reiser <reiser@namesys.com> Cc: Chris Mason <mason@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] fs/inode.c: make iprune_mutex staticAdrian Bunk1-1/+0
There's no reason for iprune_mutex being global. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] Small cleanup in quota.hJan Kara1-1/+0
Remove unused quota flag. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] jbd: embed j_commit_timer in journal structAndrew Morton1-1/+3
The kjournald timer is currently on the kernel thread's stack and the journal structure points at it. Save a pointer hop by moving the timer into the journal structure. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] slab: optimize constant-size kzalloc callsPekka Enberg1-3/+27
As suggested by Eric Dumazet, optimize kzalloc() calls that pass a compile-time constant size. Please note that the patch increases kernel text slightly (~200 bytes for defconfig on x86). Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] slab: introduce kmem_cache_zalloc allocatorPekka Enberg1-0/+2
Introduce a memory-zeroing variant of kmem_cache_alloc. The allocator already exits in XFS and there are potential users for it so this patch makes the allocator available for the general public. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] slab: implement /proc/slab_allocatorsAl Viro1-2/+4
Implement /proc/slab_allocators. It produces output like: idr_layer_cache: 80 idr_pre_get+0x33/0x4e buffer_head: 2555 alloc_buffer_head+0x20/0x75 mm_struct: 9 mm_alloc+0x1e/0x42 mm_struct: 20 dup_mm+0x36/0x370 vm_area_struct: 384 dup_mm+0x18f/0x370 vm_area_struct: 151 do_mmap_pgoff+0x2e0/0x7c3 vm_area_struct: 1 split_vma+0x5a/0x10e vm_area_struct: 11 do_brk+0x206/0x2e2 vm_area_struct: 2 copy_vma+0xda/0x142 vm_area_struct: 9 setup_arg_pages+0x99/0x214 fs_cache: 8 copy_fs_struct+0x21/0x133 fs_cache: 29 copy_process+0xf38/0x10e3 files_cache: 30 alloc_files+0x1b/0xcf signal_cache: 81 copy_process+0xbaa/0x10e3 sighand_cache: 77 copy_process+0xe65/0x10e3 sighand_cache: 1 de_thread+0x4d/0x5f8 anon_vma: 241 anon_vma_prepare+0xd9/0xf3 size-2048: 1 add_sect_attrs+0x5f/0x145 size-2048: 2 journal_init_revoke+0x99/0x302 size-2048: 2 journal_init_revoke+0x137/0x302 size-2048: 2 journal_init_inode+0xf9/0x1c4 Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Alexander Nyberg <alexn@telia.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> DESC slab-leaks3-locking-fix EDESC From: Andrew Morton <akpm@osdl.org> Update for slab-remove-cachep-spinlock.patch Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Alexander Nyberg <alexn@telia.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] sys_alarm() unsigned signed conversion fixupThomas Gleixner1-0/+1
alarm() calls the kernel with an unsigend int timeout in seconds. The value is stored in the tv_sec field of a struct timeval to setup the itimer. The tv_sec field of struct timeval is of type long, which causes the tv_sec value to be negative on 32 bit machines if seconds > INT_MAX. Before the hrtimer merge (pre 2.6.16) such a negative value was converted to the maximum jiffies timeout by the timeval_to_jiffies conversion. It's not clear whether this was intended or just happened to be done by the timeval_to_jiffies code. hrtimers expect a timeval in canonical form and treat a negative timeout as already expired. This breaks the legitimate usage of alarm() with a timeout value > INT_MAX seconds. For 32 bit machines it is therefor necessary to limit the internal seconds value to avoid API breakage. Instead of doing this in all implementations of sys_alarm the duplicated sys_alarm code is moved into a common function in itimer.c Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[IPV4]: Aggregate route entries with different TOS valuesIlia Sotnikov1-1/+1
When we get an ICMP need-to-frag message, the original TOS value in the ICMP payload cannot be used as a key to look up the routes to update. This is because the TOS field may have been modified by routers on the way. Similarly, ip_rt_redirect should also ignore the TOS as the router that gave us the message may have modified the TOS value. The patch achieves this objective by aggregating entries with different TOS values (but are otherwise identical) into the same bucket. This makes it easy to update them at the same time when an ICMP message is received. In future we should use a twin-hashing scheme where teh aggregation occurs at the entry level. That is, the TOS goes back into the hash for normal lookups while ICMP lookups will end up with a node that gives us a list that contains all other route entries that differ only by TOS. Signed-off-by: Ilia Sotnikov <hostcc@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-24[NET]: Fill in a 32-bit hole in struct sock on 64-bit platforms.David S. Miller1-1/+1
This makes struct sock 8 bytes smaller on 64-bit. Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-24[MODULES]: Don't allow statically declared exportsPatrick McHardy1-0/+1
Add an extern declaration for exported symbols to make the compiler warn on symbols declared statically. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-24[IA64] New IA64 core/thread detection patchFenghua Yu1-2/+1
IPF SDM 2.2 changes definition of PAL_LOGICAL_TO_PHYSICAL to add proc_number=-1 to get core/thread mapping info on the running processer. Based on this change, we had better to update existing core/thread detection in IA64 kernel correspondingly. The attached patch implements this change. It simplifies detection code and eliminates potential race condition. It also runs a bit faster and has better scalability especially when cores and threads number grows up in one package. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>