aboutsummaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2006-01-09[PKT_SCHED]: Convert tc action functions to single skb pointersPatrick McHardy1-1/+1
tcf_action_exec only gets a single skb pointer and doesn't own the skb, but passes double skb pointers (to a local variable) to the action functions. Change to use single skb pointers everywhere. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09[PKT_SCHED]: Use USEC_PER_SECPatrick McHardy1-11/+12
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09[CRYPTO] Allow multiple implementations of the same algorithmHerbert Xu1-0/+5
This is the first step on the road towards asynchronous support in the Crypto API. It adds support for having multiple crypto_alg objects for the same algorithm registered in the system. For example, each device driver would register a crypto_alg object for each algorithm that it supports. While at the same time the user may load software implementations of those same algorithms. Users of the Crypto API may then select a specific implementation by name, or choose any implementation for a given algorithm with the highest priority. The priority field is a 32-bit signed integer. In future it will be possible to modify it from user-space. This also provides a solution to the problem of selecting amongst various AES implementations, that is, aes vs. aes-i586 vs. aes-padlock. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2006-01-09Merge master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvbLinus Torvalds9-106/+150
2006-01-09V4L/DVB (3325): WSS output interface for av7110Oliver Endriss1-0/+2
- Implemented v4l2 api for sliced vbi data output to pass WSS data from userspace to the av7110 Signed-off-by: Oliver Endriss <o.endriss@gmx.de> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
2006-01-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-mergeLinus Torvalds100-229/+2012
2006-01-09[PATCH] Update cyblafb driverKnut Petersen1-0/+4
This is a major update to the cyblafb framebuffer driver. Most of the stuff has been tested in the mm tree. Main advantages: ============ - vxres > xres support - ywrap and xpan support - much faster for almost all modes (e.g. 1280x1024-16bpp draws more than 41 full screens of text instead of about 25 full screens of text per second on authors Epia 5000) - module init/exit code fixed - bugs triggered by console rotation fixed - lots of minor improvements - startup modes suitable for high performance scrolling in all directions This diff also contains a lot of white space fixes. No side effects are possible, only one single graphics core is affected. Signed-off-by: Knut Petersen <Knut_Petersen@t-online.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-09V4L/DVB (3307): Some cleanups at I2C modulesMauro Carvalho Chehab2-2/+1
- i2c names shorten - removed obsoleted flags on newer modules - small cleanups Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
2006-01-09[PATCH] rcu: uninline __rcu_pending()Oleg Nesterov1-30/+1
__rcu_pending() is rather fat and called twice from rcu_pending(). rcu_pending() has multiple callers, and not that small too. This patch uninlines both of them. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-09V4L/DVB (3278): convert diagnostics over to the new v4l2-common.h macros.Hans Verkuil2-2/+3
- Convert diagnostics over to the new v4l2-common.h macros. - deprecated tuner_debug option, the new option is debug. - renamed cx25840_debug to debug. Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
2006-01-09V4L/DVB (3276): Added new diagnositics macros, convert msp3400 to the new ↵Hans Verkuil1-12/+51
macros. - Added new v4l_err, v4l_warn, v4l_info and v4l_dbg macros to v4l2-common.h for use in v4l-dvb i2c drivers. This ensures a unique prefix for each device instance. - At a later stage these macros may be reimplemented using the device-generic macros from device.h. - Converted the msp3400 driver to the new macros. Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
2006-01-09V4L/DVB (3269): ioctls cleanups.Michael Krufky5-10/+27
- Now, all internal ioctls are at v4l2-common.h - removed unused ioctl at saa6752hs.h - all debug ioctl code moved to v4l2-common.c - removed duplicated stuff from other cards Signed-off-by: Michael Krufky <mkrufky@m1k.net> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
2006-01-09Merge branch 'blk-softirq' of git://brick.kernel.dk/data/git/linux-2.6-blockLinus Torvalds3-4/+20
Manual merge for trivial #include changes
2006-01-09V4L/DVB (3214): Calculate the saa7115 AMCLK regs instead of using fixed valuesHans Verkuil1-10/+5
- Calculate the audio master clock registers from the actual frequencies. This simplifies the code and it also prepares for adding CGC2 support. - VIDIOC_INT_AUDIO_CLOCK_FREQ now receives an u32 instead of an enum. It is more generic and actually easier to implement. Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
2006-01-09V4L/DVB (3245): Added some comments about multiple tuner support.Hans Verkuil1-3/+19
- Added some comments to make clearer how to use ioctl api to handle multiple tuners at the same board. Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
2006-01-09V4L/DVB (3234): Included advanced debug option to tvp5150.cMauro Carvalho Chehab1-0/+1
- Included advanced debug option to tvp5150.c - Now, advanced debug info is the first item at V4L menu. Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
2006-01-09V4L/DVB (3233): Fixed API to set I2S speed controlMauro Carvalho Chehab2-1/+7
- Created a new ioctl to control I2S speed. Old calls to an inadequate V4L2 API replaced. Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
2006-01-09V4L/DVB (3196): correct Thomson DTT 761x frequency rangesMichael Krufky1-1/+1
- Corrected Thomson DTT 7611 tuner programming, based on spec sheet - renamed to Thomson DTT 761x - applies to DTT 7611 7611A 7612 7613 7613A 7614 7615 7615A (DTT 7610 is similar, but slightly different programming) - corrected frequency ranges for analog and digital modes Signed-off-by: Michael Krufky <mkrufky@m1k.net> Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
2006-01-09V4L/DVB (3167): added ntsc parameter to tuner and more standardized debugsMauro Carvalho Chehab1-5/+5
- Debug message changed to be coherent with other modules - added ntsc parameter - parameters moved to the beginning at the file - tuner_status moved to a better position. Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
2006-01-09V4L/DVB (3116): tda9887 improvements: better defaults, better configurability.Hans Verkuil1-13/+18
- Set the tuner takeover point to 0x10 for NTSC/radio and 0x14 for PAL/SECAM. - Allow override through TDA9887_SET_CONFIG - PAL-N belongs with PAL-BG as does PAL-H. PAL-Nc belongs to PAL-M - Add SECAM-BGH - Set video freq to cVideoIF_38_90 for DK standards. - Add cTunerGainLow to radio, change deemphasis to 75 for mono. - Add ntsc module param for 'M' and 'J' (Japanese) standards. - Fix module handling for 2.4. - Now able to select all standards through pal/secam/ntsc module options Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
2006-01-09V4L/DVB (3105): Remove AUDC_CONFIG_PINNACLE horror, fix mt20xx radio support.Hans Verkuil1-11/+0
- Remove AUDC_CONFIG_PINNACLE horror. This also fixes radio support for mt20xx tuners. Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
2006-01-09V4L/DVB (3062): Fix wrong tuner.h define for tuner 46Hans Verkuil1-2/+2
- Fix wrong tuner.h define for tuner 46 (it's a Panasonic, not a Microtune). Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
2006-01-09DVB (2444): Implement frontend-specific tuning and the ability to disable zigzagAndrew de Quincey1-0/+10
- Implement frontend-specific tuning and the ability to disable zigzag Signed-off-by: Andrew de Quincey <adq_dvb@lidskialf.net> Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
2006-01-09V4L (1021): Tuner description now follows the same CodingStyle as the othersMauro Carvalho Chehab1-22/+0
- Tuner description now follows the same CodingStyle as the others Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
2006-01-09V4L (0987): Added Secam L' std on tda9887 and common macros moved to videodev2.hMauro Carvalho Chehab1-1/+10
- Added SECAM L' video standard - Common std macros moved to videodev2.h Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
2006-01-09V4L (926_2): Moves compat32 functions from fs to v4l subsystemArnd Bergmann2-26/+3
This moves the 32 bit ioctl compatibility handlers for Video4Linux into a new file and adds explicit calls to them to each v4l device driver. Unfortunately, there does not seem to be any code handling the v4l2 ioctls, so quite often the code goes through two separate conversions, first from 32 bit v4l to 64 bit v4l, and from there to 64 bit v4l2. My patch does not change that, so there is still much room for improvement. Also, some drivers have additional ioctl numbers, for which the conversion should be handled internally to that driver. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
2006-01-09[IDE] Use the block layer deferred softirq request completionJens Axboe1-0/+1
This patch makes IDE use the new blk_complete_request() interface. There's still room for improvement, as __ide_end_request() really could drop the lock after getting HWGROUP->rq (why does it need to hold it in the first place? If ->rq access isn't serialized, we are screwed anyways). Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-09[SCSI] Kill the SCSI softirq handlingJens Axboe1-1/+0
This patch moves the SCSI softirq handling to the block layer version. There should be no functional changes. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-09[BLOCK] ll_rw_blk: Enable out-of-order request completions through softirqJens Axboe2-3/+19
Request completion can be a quite heavy process, since it needs to iterate through the entire request and complete the bio's it holds. This patch adds blk_complete_request() which moves this processing into a dedicated block softirq. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-09[BLOCK] Kill blk_attempt_remerge()Jens Axboe1-1/+0
It's a broken interface, it's done way too late. And apparently it triggers slab problems in recent kernels as well (most likely after the generic dispatch code was merged). So kill it, ide-cd is the only user of it. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-09make elv_try_merge() static, kill the dead declaration ofCoywolf Qi Hunt1-2/+0
elv_try_last_merge(). Signed-off-by: Coywolf Qi Hunt <qiyong@fc-cn.com> Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-09[PATCH] ppc64: Fix oprofile when compiled as a moduleAnton Blanchard1-2/+9
My recent changes to oprofile broke it when built as a module. Fix it by using an enum instead of a function pointer. This way we still retain the oprofile configuration in the cputable. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] 3/5 powerpc: Add platform functions interpreterBenjamin Herrenschmidt3-0/+275
This is the platform function interpreter itself along with the backends for UniN/U3/U4, mac-io, GPIOs and i2c. It adds the ability to execute those do-platform-* scripts in the device-tree (at least for most devices for which a backend is provided). This should replace the clock spreading hacks properly. It might also have an impact on all sort of machines since some of the scripts marked "at init" will now be executed on boot (or some other on sleep/wakeup), those will possibly do things that the kernel didn't do at all, like setting some values into some i2c devices (changing thermal sensor calibration or conversion rate) etc... Thus regression testing is MUCH welcome. Also loook for errors in dmesg. That's also why I've left rather verbose debugging enabled in this version of the patch. (I do expect some Windtunnel G4s to show some errors as they have an i2c clock chip on the PMU bus that uses some primitives that the i2c backend doesn't implement yet. I really need users that have one of those machine to come back to me so we can get that done right, though the errors themselves should be harmless, I suspect the machine might not run at full speed). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] 2/5 powerpc: Rework PowerMac i2c part 2Benjamin Herrenschmidt1-0/+2
This is the continuation of the previous patch. This one removes the old PowerMac i2c drivers (i2c-keywest and i2c-pmac-smu) and replaces them both with a single stub driver that uses the new PowerMac low i2c layer. Now that i2c-keywest is gone, the low-i2c code is extended to support interrupt driver transfers. All i2c busses now appear as platform devices. Compatibility with existing drivers should be maintained as the i2c bus names have been kept identical, except for the SMU bus but in that later case, all users has been fixed. With that patch added, matching a device node to an i2c_adapter becomes trivial. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] 1/5 powerpc: Rework PowerMac i2c part 1Benjamin Herrenschmidt4-31/+78
This is the first part of a rework of the PowerMac i2c code. It completely reworks the "low_i2c" layer. It is now more flexible, supports KeyWest, SMU and PMU i2c busses, and provides functions to match device nodes to i2c busses and adapters. This patch also extends & fix some bugs in the SMU driver related to i2c support and removes the clock spreading hacks from the pmac feature code rather than adapting them to the new API since they'll be replaced by the platform function code completely in patch 3/5 Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] spufs: set irq affinity for running threadsArnd Bergmann1-0/+1
For far, all SPU triggered interrupts always end up on the first SMT thread, which is a bad solution. This patch implements setting the affinity to the CPU that was running last when entering execution on an SPU. This should result in a significant reduction in IPI calls and better cache locality for SPE thread specific data. Signed-off-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] spufs: fix allocation on 64k pagesArnd Bergmann1-3/+1
The size of the local store is architecture defined and independent from the page size, so it should not be defined in terms of pages in the first place. This mistake broke a few places when building for 64kb pages. Signed-off-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] spufs: abstract priv1 register access.Arnd Bergmann1-6/+25
In a hypervisor based setup, direct access to the first priviledged register space can typically not be allowed to the kernel and has to be implemented through hypervisor calls. As suggested by Masato Noguchi, let's abstract the register access trough a number of function calls. Since there is currently no public specification of actual hypervisor calls to implement this, I only provide a place that makes it easier to hook into. Cc: Masato Noguchi <Masato.Noguchi@jp.sony.com> Cc: Geoff Levand <geoff.levand@am.sony.com> Signed-off-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] spufs: clean up use of bitopsArnd Bergmann1-4/+2
checking bits manually might not be synchonized with the use of set_bit/clear_bit. Make sure we always use the correct bitops by removing the unnecessary identifiers. Signed-off-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] cell: enable pause(0) in cpu_idleArnd Bergmann3-7/+21
This patch enables support for pause(0) power management state for the Cell Broadband Processor, which is import for power efficient operation. The pervasive infrastructure will in the future enable us to introduce more functionality specific to the Cell's pervasive unit. From: Maximino Aguilar <maguilar@us.ibm.com> Signed-off-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] Small fix in eeh definitions when CONFIG_EEH not enabledHaren Myneni1-0/+3
Undefined symbols (eeh_add_device_tree_early and eeh_remove_bus_device) when EEH is not enabled. This small patch will fix this. Acked-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: added a udbg_progressKumar Gala1-0/+1
Added a common udbg_progress for use by ppc_md.progress() Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-08[PATCH] Make vm86 support optionalMatt Mackall2-2/+20
This adds an option to remove vm86 support under CONFIG_EMBEDDED. Saves about 5k. This version eliminates most of the #ifdefs of the previous version and instead uses function stubs in vm86.h. Also, release_vm86_irqs is moved from asm-i386/irq.h to a more appropriate home in vm86.h so that the stubs can live together. $ size vmlinux-baseline vmlinux-novm86 text data bss dec hex filename 2920821 523232 190652 3634705 377611 vmlinux-baseline 2916268 523100 190492 3629860 376324 vmlinux-novm86 Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] remove semicolons from save_flags()Andrew Morton1-1/+1
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] Eliminate __attribute__ ((packed)) warnings for gcc-4.1Jan Blunck10-293/+293
Since version 4.1 the gcc is warning about ignored attributes. This patch is using the equivalent attribute on the struct instead of on each of the structure or union members. GCC Manual: "Specifying Attributes of Types packed This attribute, attached to struct or union type definition, specifies that each member of the structure or union is placed to minimize the memory required. When attached to an enum definition, it indicates that the smallest integral type should be used. Specifying this attribute for struct and union types is equivalent to specifying the packed attribute on each of the structure or union members." Signed-off-by: Jan Blunck <jblunck@suse.de> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] parport: bring back an unused phase for ppdev ioctlMarko Kohtala1-1/+3
Earlier fix removed unused phase, but that changed the values for other phases. Since these are exposed to userspace through ppdev, it is safer not to change them. Restore the unused phase value. Signed-off-by: Marko Kohtala <marko.kohtala@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-09powerpc: Fix some #ifndef __KERNEL__ that should be #ifdefPaul Mackerras3-3/+3
Grrr.... Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-08[PATCH] Split out screen_info from tty.hBrian Gerst2-71/+78
This makes it possible for boot code to use screen_info without dragging in all of tty.h. Signed-off-by: Brian Gerst <bgerst@didntduck.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] PTRACE_SYSEMU is only for i386 and clashes with other ptrace codes ↵Paolo 'Blaisorblade' Giarrusso2-2/+3
of other archs PTRACE_SYSEMU{,_SINGLESTEP} is actually arch specific, for now, and the current allocated number clashes with a ptrace code of frv, i.e. PTRACE_GETFDPIC. I should have submitted this much earlier, anyway we get no breakage for this. CC: Daniel Jacobowitz <dan@debian.org> Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] shrink struct pageAndrew Morton1-19/+22
Reduce the size of the pageframe for NR_CPUS>4, CONFIG_PREEMPT back to the minimal size by unionising both ->private and ->mapping with the pagetable lock. It uses an anonymous struct and hence requires gcc-3.x. Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] aio: reorder kiocb structure elements to make sync iocb setup fasterBenjamin LaHaise1-5/+7
Reorder members of the kiocb structure to make sync kiocb setup faster. By setting the elements sequentially, the write combining buffers on the CPU are able to combine the writes into a single burst, which results in fewer cache cycles being consumed, freeing them up for other code. This results in a 10-20KB/s[*] increase on the bw_unix part of LMbench on my test system. * The improvement varies based on what other patches are in the system, as there are a number of bottlenecks, so this number is not absolutely accurate. Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] /dev/mem: validate mmap requestsBjorn Helgaas1-0/+1
Add a hook so architectures can validate /dev/mem mmap requests. This is analogous to validation we already perform in the read/write paths. The identity mapping scheme used on ia64 requires that each 16MB or 64MB granule be accessed with exactly one attribute (write-back or uncacheable). This avoids "attribute aliasing", which can cause a machine check. Sample problem scenario: - Machine supports VGA, so it has uncacheable (UC) MMIO at 640K-768K - efi_memmap_init() discards any write-back (WB) memory in the first granule - Application (e.g., "hwinfo") mmaps /dev/mem, offset 0 - hwinfo receives UC mapping (the default, since memmap says "no WB here") - Machine check abort (on chipsets that don't support UC access to WB memory, e.g., sx1000) In the scenario above, the only choices are - Use WB for hwinfo mmap. Can't do this because it causes attribute aliasing with the UC mapping for the VGA MMIO space. - Use UC for hwinfo mmap. Can't do this because the chipset may not support UC for that region. - Disallow the hwinfo mmap with -EINVAL. That's what this patch does. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] remove gcc-2 checksAndrew Morton16-103/+15
Remove various things which were checking for gcc-1.x and gcc-2.x compilers. From: Adrian Bunk <bunk@stusta.de> Some documentation updates and removes some code paths for gcc < 3.2. Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] Abandon gcc-2.95.xAndrew Morton2-31/+0
There's one scsi driver which doesn't compile due to weird __VA_ARGS__ tricks and the rather useful scsi/sd.c is currently getting an ICE. None of the new SAS code compiles, due to extensive use of anonymous unions. The V4L guys are very good at exploiting the gcc-2.95.x macro expansion bug (_why_ does each driver need to implement its own debug macros?) and various people keep on sneaking in anonymous unions, which are rather nice. Plus anonymous unions are rather useful. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] remove unused blkp field in percpu_dataEric Dumazet1-1/+0
I found that blkp field was not used in kernel tree. As most of the times NR_CPUS is a power of two and kmalloc() memory blocks too, this extra field basically doubles the memory space allocated in __alloc_percpu() to store the 'struct percpu_data' (for example, if NR_CPUS=8 on i386, kmalloc(4*8+4) returns a 64 bytes block instead of a 32 bytes block after this patch) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] fs: remove s_old_blocksize from struct super_blockPekka Enberg1-1/+0
This patch inlines the single user of struct super_block field s_old_blocksize and removes the field. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] shrink dentry structEric Dumazet1-2/+7
Some long time ago, dentry struct was carefully tuned so that on 32 bits UP, sizeof(struct dentry) was exactly 128, ie a power of 2, and a multiple of memory cache lines. Then RCU was added and dentry struct enlarged by two pointers, with nice results for SMP, but not so good on UP, because breaking the above tuning (128 + 8 = 136 bytes) This patch reverts this unwanted side effect, by using an union (d_u), where d_rcu and d_child are placed so that these two fields can share their memory needs. At the time d_free() is called (and d_rcu is really used), d_child is known to be empty and not touched by the dentry freeing. Lockless lookups only access d_name, d_parent, d_lock, d_op, d_flags (so the previous content of d_child is not needed if said dentry was unhashed but still accessed by a CPU because of RCU constraints) As dentry cache easily contains millions of entries, a size reduction is worth the extra complexity of the ugly C union. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Maneesh Soni <maneesh@in.ibm.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Ian Kent <raven@themaw.net> Cc: Paul Jackson <pj@sgi.com> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@cse.unsw.edu.au> Cc: James Morris <jmorris@namei.org> Cc: Stephen Smalley <sds@epoch.ncsc.mil> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] shared mounts: cleanupMiklos Szeredi2-2/+3
Small cleanups in shared mounts code. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Ram Pai <linuxram@us.ibm.com> Cc: <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] Add block_device_operations.getgeo block device methodChristoph Hellwig1-0/+2
HDIO_GETGEO is implemented in most block drivers, and all of them have to duplicate the code to copy the structure to userspace, as well as getting the start sector. This patch moves that to common code [1] and adds a ->getgeo method to fill out the raw kernel hd_geometry structure. For many drivers this means ->ioctl can go away now. [1] the s390 block drivers are odd in this respect. xpram sets ->start to 4 always which seems more than odd, and the dasd driver shifts the start offset around, probably because of it's non-standard sector size. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@suse.de> Cc: <mike.miller@hp.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo Giarrusso <blaisorblade@yahoo.it> Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl> Cc: Neil Brown <neilb@cse.unsw.edu.au> Cc: Markus Lidel <Markus.Lidel@shadowconnect.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] sigaction should clear all signals on SIG_IGN, not just < 32George Anzinger1-0/+17
While rooting aroung in the signal code trying to understand how to fix the SIG_IGN ploy (set sig handler to SIG_IGN and flood system with high speed repeating timers) I came across what, I think, is a problem in sigaction() in that when processing a SIG_IGN request it flushes signals from 1 to SIGRTMIN and leaves the rest. Attempt to fix this. Signed-off-by: George Anzinger <george@mvista.com> Cc: Roland McGrath <roland@redhat.com> Cc: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] keys: Permit running process to instantiate keysDavid Howells3-0/+15
Make it possible for a running process (such as gssapid) to be able to instantiate a key, as was requested by Trond Myklebust for NFS4. The patch makes the following changes: (1) A new, optional key type method has been added. This permits a key type to intercept requests at the point /sbin/request-key is about to be spawned and do something else with them - passing them over the rpc_pipefs files or netlink sockets for instance. The uninstantiated key, the authorisation key and the intended operation name are passed to the method. (2) The callout_info is no longer passed as an argument to /sbin/request-key to prevent unauthorised viewing of this data using ps or by looking in /proc/pid/cmdline. This means that the old /sbin/request-key program will not work with the patched kernel as it will expect to see an extra argument that is no longer there. A revised keyutils package will be made available tomorrow. (3) The callout_info is now attached to the authorisation key. Reading this key will retrieve the information. (4) A new field has been added to the task_struct. This holds the authorisation key currently active for a thread. Searches now look here for the caller's set of keys rather than looking for an auth key in the lowest level of the session keyring. This permits a thread to be servicing multiple requests at once and to switch between them. Note that this is per-thread, not per-process, and so is usable in multithreaded programs. The setting of this field is inherited across fork and exec. (5) A new keyctl function (KEYCTL_ASSUME_AUTHORITY) has been added that permits a thread to assume the authority to deal with an uninstantiated key. Assumption is only permitted if the authorisation key associated with the uninstantiated key is somewhere in the thread's keyrings. This function can also clear the assumption. (6) A new magic key specifier has been added to refer to the currently assumed authorisation key (KEY_SPEC_REQKEY_AUTH_KEY). (7) Instantiation will only proceed if the appropriate authorisation key is assumed first. The assumed authorisation key is discarded if instantiation is successful. (8) key_validate() is moved from the file of request_key functions to the file of permissions functions. (9) The documentation is updated. From: <Valdis.Kletnieks@vt.edu> Build fix. Signed-off-by: David Howells <dhowells@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Alexander Zangerl <az@bond.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] keys: Permit key expiry time to be setDavid Howells1-0/+1
Add a new keyctl function that allows the expiry time to be set on a key or removed from a key, provided the caller has attribute modification access. Signed-off-by: David Howells <dhowells@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Alexander Zangerl <az@bond.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] Fix some problems with truncate and mtime semantics.NeilBrown1-1/+2
SUS requires that when truncating a file to the size that it currently is: truncate and ftruncate should NOT modify ctime or mtime O_TRUNC SHOULD modify ctime and mtime. Currently mtime and ctime are always modified on most local filesystems (side effect of ->truncate) or never modified (on NFS). With this patch: ATTR_CTIME|ATTR_MTIME are sent with ATTR_SIZE precisely when an update of these times is required whether size changes or not (via a new argument to do_truncate). This allows NFS to do the right thing for O_TRUNC. inode_setattr nolonger forces ATTR_MTIME|ATTR_CTIME when the ATTR_SIZE sets the size to it's current value. This allows local filesystems to do the right thing for f?truncate. Also, the logic in inode_setattr is changed a bit so there are two return points. One returns the error from vmtruncate if it failed, the other returns 0 (there can be no other failure). Finally, if vmtruncate succeeds, and ATTR_SIZE is the only change requested, we now fall-through and mark_inode_dirty. If a filesystem did not have a ->truncate function, then vmtruncate will have changed i_size, without marking the inode as 'dirty', and I think this is wrong. Signed-off-by: Neil Brown <neilb@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] Permit multiple inclusion of linux/pagevec.hDavid Howells1-0/+5
Make it possible to include linux/pagevec.h multiple times without incurring errors due to duplicate definitions. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] use ptrace_get_task_struct in various placesChristoph Hellwig1-0/+2
The ptrace_get_task_struct() helper that I added as part of the ptrace consolidation is useful in variety of places that currently opencode it. Switch them to the common helpers. Add a ptrace_traceme() helper that needs to be explicitly called, and simplify the ptrace_get_task_struct() interface. We don't need the request argument now, and we return the task_struct directly, using ERR_PTR() for error returns. It's a bit more code in the callers, but we have two sane routines that do one thing well now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] relayfs: cleanup, change relayfs_file_* to relay_file_*Tom Zanussi1-3/+2
This patch renames relayfs_file_operations to relay_file_operations, and the file operations themselves from relayfs_XXX to relay_file_XXX, to make it more clear that they refer to relay files. Signed-off-by: Tom Zanussi <zanussi@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] relayfs: add support for global relay buffersTom Zanussi1-1/+7
This patch adds the optional is_global outparam to the create_buf_file() callback. This can be used by clients to create a single global relayfs buffer instead of the default per-cpu buffers. This was suggested as being useful for certain debugging applications where it's more convenient to be able to get all the data from a single channel without having to go to the bother of dealing with per-cpu files. Signed-off-by: Tom Zanussi <zanussi@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] relayfs: add support for relay files in other filesystemsTom Zanussi1-0/+34
This patch adds a couple of callback functions that allow a client to hook into relay_open()/close() and supply the files that will be used to represent the channel buffers; the default implementation if no callbacks are defined is to create the files in relayfs. This is to support the creation and use of relay files in other filesystems such as debugfs, as implied by the fact that relayfs_file_operations are exported. Signed-off-by: Tom Zanussi <zanussi@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] relayfs: remove unused alloc/destroy_inode()Tom Zanussi1-14/+0
Since we're no longer using relayfs_inode_info, remove relayfs_alloc_inode() and relayfs_destroy_inode() along with the relayfs inode cache. Signed-off-by: Tom Zanussi <zanussi@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] relayfs: add relayfs_remove_file()Tom Zanussi1-0/+1
This patch adds and exports relayfs_remove_file(), for API symmetry (with relayfs_create_file()). Signed-off-by: Tom Zanussi <zanussi@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] relayfs: export relayfs_create_file() with fileops paramTom Zanussi1-1/+6
This patch adds a mandatory fileops param to relayfs_create_file() and exports that function so that clients can use it to create files defined by their own set of file operations, in relayfs. The purpose is to allow relayfs applications to create their own set of 'control' files alongside their relay files in relayfs rather than having to create them in /proc or debugfs for instance. relayfs_create_file() is also used by relay_open_buf() to create the relay files for a channel. In this case, a pointer to relayfs_file_operations is passed in, along with a pointer to the buffer associated with the file. Signed-off-by: Tom Zanussi <zanussi@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] ELF: symbol table type additionsJan Beulich1-0/+2
Needed for the Novell kernel debugger and perhaps some per-cpu data on x86_64 in the future. Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] rcu file: use atomic primitivesNick Piggin3-222/+2
Use atomic_inc_not_zero for rcu files instead of special case rcuref. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] move rtc_interrupt() prototype to rtc.hAdrian Bunk1-0/+3
This patch moves the rtc_interrupt() prototype to rtc.h and removes the prototypes from C files. It also renames static rtc_interrupt() functions in arch/arm/mach-integrator/time.c and arch/sh64/kernel/time.c to avoid compile problems. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Paul Gortmaker <p_gortmaker@yahoo.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] fat: support a truncate() for expanding size (generic_cont_expand)OGAWA Hirofumi1-1/+2
This patch changes generic_cont_expand(), in order to share the code with fatfs. - Use vmtruncate() if ->prepare_write() returns a error. Even if ->prepare_write() returns an error, it may already have added some blocks. So, this truncates blocks outside of ->i_size by vmtruncate(). - Add generic_cont_expand_simple(). The generic_cont_expand_simple() assumes that ->prepare_write() can handle the block boundary. With this, we don't need to care the extra byte. And for expanding a file size by truncate(), fatfs uses the added generic_cont_expand_simple(). Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] export/change sync_page_range/_nolock()OGAWA Hirofumi1-1/+3
This exports/changes the sync_page_range/_nolock(). The fatfs needs sync_page_range/_nolock() for expanding truncate, and changes "size_t count" to "loff_t count". Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] fat: support ->direct_IO()OGAWA Hirofumi1-1/+2
This patch add to support of ->direct_IO() for mostly read. The user of this seems to want to use for streaming read. So, current direct I/O has limitation, it can only overwrite. (For write operation, mainly we need to handle the hole etc..) Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] IRQ type flagsRussell King2-4/+21
Some ARM platforms have the ability to program the interrupt controller to detect various interrupt edges and/or levels. For some platforms, this is critical to setup correctly, particularly those which the setting is dependent on the device. Currently, ARM drivers do (eg) the following: err = request_irq(irq, ...); set_irq_type(irq, IRQT_RISING); However, if the interrupt has previously been programmed to be level sensitive (for whatever reason) then this will cause an interrupt storm. Hence, if we combine set_irq_type() with request_irq(), we can then safely set the type prior to unmasking the interrupt. The unfortunate problem is that in order to support this, these flags need to be visible outside of the ARM architecture - drivers such as smc91x need these flags and they're cross-architecture. Finally, the SA_TRIGGER_* flag passed to request_irq() should reflect the property that the device would like. The IRQ controller code should do its best to select the most appropriate supported mode. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] new driver synclink_gtPaul Fulghum1-1/+8
New character device driver for the SyncLink GT and SyncLink AC families of synchronous and asynchronous serial adapters Signed-off-by: Paul Fulghum <paulkf@microgate.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] fix more missing includesTim Schmielau1-0/+1
Include fixes for 2.6.14-git11. Should allow to remove sched.h from module.h on i386, x86_64, arm, ia64, ppc, ppc64, and s390. Probably more to come since I haven't yet checked the other archs. Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] cpuset: remove test for null cpuset from alloc code pathPaul Jackson1-0/+2
Remove a couple of more lines of code from the cpuset hooks in the page allocation code path. There was a check for a NULL cpuset pointer in the routine cpuset_update_task_memory_state() that was only needed during system boot, after the memory subsystem was initialized, before the cpuset subsystem was initialized, to catch a NULL task->cpuset pointer. Add a cpuset_init_early() routine, just before the mem_init() call in init/main.c, that sets up just enough of the init tasks cpuset structure to render cpuset_update_task_memory_state() calls harmless. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] cpuset: rebind vma mempolicies fixPaul Jackson1-0/+18
Fix more of longstanding bug in cpuset/mempolicy interaction. NUMA mempolicies (mm/mempolicy.c) are constrained by the current tasks cpuset to just the Memory Nodes allowed by that cpuset. The kernel maintains internal state for each mempolicy, tracking what nodes are used for the MPOL_INTERLEAVE, MPOL_BIND or MPOL_PREFERRED policies. When a tasks cpuset memory placement changes, whether because the cpuset changed, or because the task was attached to a different cpuset, then the tasks mempolicies have to be rebound to the new cpuset placement, so as to preserve the cpuset-relative numbering of the nodes in that policy. An earlier fix handled such mempolicy rebinding for mempolicies attached to a task. This fix rebinds mempolicies attached to vma's (address ranges in a tasks address space.) Due to the need to hold the task->mm->mmap_sem semaphore while updating vma's, the rebinding of vma mempolicies has to be done when the cpuset memory placement is changed, at which time mmap_sem can be safely acquired. The tasks mempolicy is rebound later, when the task next attempts to allocate memory and notices that its task->cpuset_mems_generation is out-of-date with its cpusets mems_generation. Because walking the tasklist to find all tasks attached to a changing cpuset requires holding tasklist_lock, a spinlock, one cannot update the vma's of the affected tasks while doing the tasklist scan. In general, one cannot acquire a semaphore (which can sleep) while already holding a spinlock (such as tasklist_lock). So a list of mm references has to be built up during the tasklist scan, then the tasklist lock dropped, then for each mm, its mmap_sem acquired, and the vma's in that mm rebound. Once the tasklist lock is dropped, affected tasks may fork new tasks, before their mm's are rebound. A kernel global 'cpuset_being_rebound' is set to point to the cpuset being rebound (there can only be one; cpuset modifications are done under a global 'manage_sem' semaphore), and the mpol_copy code that is used to copy a tasks mempolicies during fork catches such forking tasks, and ensures their children are also rebound. When a task is moved to a different cpuset, it is easier, as there is only one task involved. It's mm->vma's are scanned, using the same mpol_rebind_policy() as used above. It may happen that both the mpol_copy hook and the update done via the tasklist scan update the same mm twice. This is ok, as the mempolicies of each vma in an mm keep track of what mems_allowed they are relative to, and safely no-op a second request to rebind to the same nodes. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] cpuset: number_of_cpusets optimizationPaul Jackson1-1/+9
Easy little optimization hack to avoid actually having to call cpuset_zone_allowed() and check mems_allowed, in the main page allocation routine, __alloc_pages(). This saves several CPU cycles per page allocation on systems not using cpusets. A counter is updated each time a cpuset is created or removed, and whenever there is only one cpuset in the system, it must be the root cpuset, which contains all CPUs and all Memory Nodes. In that case, when the counter is one, all allocations are allowed. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] cpuset: numa_policy_rebind cleanupPaul Jackson1-2/+10
Cleanup, reorganize and make more robust the mempolicy.c code to rebind mempolicies relative to the containing cpuset after a tasks memory placement changes. The real motivator for this cleanup patch is to lay more groundwork for the upcoming patch to correctly rebind NUMA mempolicies that are attached to vma's after the containing cpuset memory placement changes. NUMA mempolicies are constrained by the cpuset their task is a member of. When either (1) a task is moved to a different cpuset, or (2) the 'mems' mems_allowed of a cpuset is changed, then the NUMA mempolicies have embedded node numbers (for MPOL_BIND, MPOL_INTERLEAVE and MPOL_PREFERRED) that need to be recalculated, relative to their new cpuset placement. The old code used an unreliable method of determining what was the old mems_allowed constraining the mempolicy. It just looked at the tasks mems_allowed value. This sort of worked with the present code, that just rebinds the -task- mempolicy, and leaves any -vma- mempolicies broken, referring to the old nodes. But in an upcoming patch, the vma mempolicies will be rebound as well. Then the order in which the various task and vma mempolicies are updated will no longer be deterministic, and one can no longer count on the task->mems_allowed holding the old value for as long as needed. It's not even clear if the current code was guaranteed to work reliably for task mempolicies. So I added a mems_allowed field to each mempolicy, stating exactly what mems_allowed the policy is relative to, and updated synchronously and reliably anytime that the mempolicy is rebound. Also removed a useless wrapper routine, numa_policy_rebind(), and had its caller, cpuset_update_task_memory_state(), call directly to the rewritten policy_rebind() routine, and made that rebind routine extern instead of static, and added a "mpol_" prefix to its name, making it mpol_rebind_policy(). Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] cpuset: implement cpuset_mems_allowedPaul Jackson1-1/+7
Provide a cpuset_mems_allowed() method, which the sys_migrate_pages() code needed, to obtain the mems_allowed vector of a cpuset, and replaced the workaround in sys_migrate_pages() to call this new method. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] cpuset: combine refresh_mems and update_memsPaul Jackson1-2/+2
The important code paths through alloc_pages_current() and alloc_page_vma(), by which most kernel page allocations go, both called cpuset_update_current_mems_allowed(), which in turn called refresh_mems(). -Both- of these latter two routines did a tasklock, got the tasks cpuset pointer, and checked for out of date cpuset->mems_generation. That was a silly duplication of code and waste of CPU cycles on an important code path. Consolidated those two routines into a single routine, called cpuset_update_task_memory_state(), since it updates more than just mems_allowed. Changed all callers of either routine to call the new consolidated routine. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] cpuset: memory pressure meterPaul Jackson1-0/+11
Provide a simple per-cpuset metric of memory pressure, tracking the -rate- that the tasks in a cpuset call try_to_free_pages(), the synchronous (direct) memory reclaim code. This enables batch managers monitoring jobs running in dedicated cpusets to efficiently detect what level of memory pressure that job is causing. This is useful both on tightly managed systems running a wide mix of submitted jobs, which may choose to terminate or reprioritize jobs that are trying to use more memory than allowed on the nodes assigned them, and with tightly coupled, long running, massively parallel scientific computing jobs that will dramatically fail to meet required performance goals if they start to use more memory than allowed to them. This patch just provides a very economical way for the batch manager to monitor a cpuset for signs of memory pressure. It's up to the batch manager or other user code to decide what to do about it and take action. ==> Unless this feature is enabled by writing "1" to the special file /dev/cpuset/memory_pressure_enabled, the hook in the rebalance code of __alloc_pages() for this metric reduces to simply noticing that the cpuset_memory_pressure_enabled flag is zero. So only systems that enable this feature will compute the metric. Why a per-cpuset, running average: Because this meter is per-cpuset, rather than per-task or mm, the system load imposed by a batch scheduler monitoring this metric is sharply reduced on large systems, because a scan of the tasklist can be avoided on each set of queries. Because this meter is a running average, instead of an accumulating counter, a batch scheduler can detect memory pressure with a single read, instead of having to read and accumulate results for a period of time. Because this meter is per-cpuset rather than per-task or mm, the batch scheduler can obtain the key information, memory pressure in a cpuset, with a single read, rather than having to query and accumulate results over all the (dynamically changing) set of tasks in the cpuset. A per-cpuset simple digital filter (requires a spinlock and 3 words of data per-cpuset) is kept, and updated by any task attached to that cpuset, if it enters the synchronous (direct) page reclaim code. A per-cpuset file provides an integer number representing the recent (half-life of 10 seconds) rate of direct page reclaims caused by the tasks in the cpuset, in units of reclaims attempted per second, times 1000. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] cpuset: mempolicy one more nodemask conversionPaul Jackson1-2/+3
Finish converting mm/mempolicy.c from bitmaps to nodemasks. The previous conversion had left one routine using bitmaps, since it involved a corresponding change to kernel/cpuset.c Fix that interface by replacing with a simple macro that calls nodes_subset(), or if !CONFIG_CPUSET, returns (1). Signed-off-by: Paul Jackson <pj@sgi.com> Cc: Christoph Lameter <christoph@lameter.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] slob: introduce the SLOB allocatorMatt Mackall1-0/+35
configurable replacement for slab allocator This adds a CONFIG_SLAB option under CONFIG_EMBEDDED. When CONFIG_SLAB is disabled, the kernel falls back to using the 'SLOB' allocator. SLOB is a traditional K&R/UNIX allocator with a SLAB emulation layer, similar to the original Linux kmalloc allocator that SLAB replaced. It's signicantly smaller code and is more memory efficient. But like all similar allocators, it scales poorly and suffers from fragmentation more than SLAB, so it's only appropriate for small systems. It's been tested extensively in the Linux-tiny tree. I've also stress-tested it with make -j 8 compiles on a 3G SMP+PREEMPT box (not recommended). Here's a comparison for otherwise identical builds, showing SLOB saving nearly half a megabyte of RAM: $ size vmlinux* text data bss dec hex filename 3336372 529360 190812 4056544 3de5e0 vmlinux-slab 3323208 527948 190684 4041840 3dac70 vmlinux-slob $ size mm/{slab,slob}.o text data bss dec hex filename 13221 752 48 14021 36c5 mm/slab.o 1896 52 8 1956 7a4 mm/slob.o /proc/meminfo: SLAB SLOB delta MemTotal: 27964 kB 27980 kB +16 kB MemFree: 24596 kB 25092 kB +496 kB Buffers: 36 kB 36 kB 0 kB Cached: 1188 kB 1188 kB 0 kB SwapCached: 0 kB 0 kB 0 kB Active: 608 kB 600 kB -8 kB Inactive: 808 kB 812 kB +4 kB HighTotal: 0 kB 0 kB 0 kB HighFree: 0 kB 0 kB 0 kB LowTotal: 27964 kB 27980 kB +16 kB LowFree: 24596 kB 25092 kB +496 kB SwapTotal: 0 kB 0 kB 0 kB SwapFree: 0 kB 0 kB 0 kB Dirty: 4 kB 12 kB +8 kB Writeback: 0 kB 0 kB 0 kB Mapped: 560 kB 556 kB -4 kB Slab: 1756 kB 0 kB -1756 kB CommitLimit: 13980 kB 13988 kB +8 kB Committed_AS: 4208 kB 4208 kB 0 kB PageTables: 28 kB 28 kB 0 kB VmallocTotal: 1007312 kB 1007312 kB 0 kB VmallocUsed: 48 kB 48 kB 0 kB VmallocChunk: 1007264 kB 1007264 kB 0 kB (this work has been sponsored in part by CELF) From: Ingo Molnar <mingo@elte.hu> Fix 32-bitness bugs in mm/slob.c. Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] remove get_task_struct_rcu()Paul E. McKenney1-12/+0
The latest set of signal-RCU patches does not use get_task_struct_rcu(). Attached is a patch that removes it. Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] RCU signal handlingIngo Molnar1-2/+30
RCU tasklist_lock and RCU signal handling: send signals RCU-read-locked instead of tasklist_lock read-locked. This is a scalability improvement on SMP and a preemption-latency improvement under PREEMPT_RCU. Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: William Irwin <wli@holomorphy.com> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] consolidate asm/futex.hJeff Dike18-777/+72
Most of the architectures have the same asm/futex.h. This consolidates them into asm-generic, with the arches including it from their own asm/futex.h. In the case of UML, this reverts the old broken futex.h and goes back to using the same one as almost everyone else. Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] Kill L1_CACHE_SHIFT_MAXRavikiran G Thirumalai23-36/+4
Kill L1_CACHE_SHIFT from all arches. Since L1_CACHE_SHIFT_MAX is not used anymore with the introduction of INTERNODE_CACHE, kill L1_CACHE_SHIFT_MAX. Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] Change maxaligned_in_smp alignemnt macros to internodealigned_in_smp ↵Ravikiran G Thirumalai4-8/+17
macros ____cacheline_maxaligned_in_smp is currently used to align critical structures and avoid false sharing. It uses per-arch L1_CACHE_SHIFT_MAX and people find L1_CACHE_SHIFT_MAX useless. However, we have been using ____cacheline_maxaligned_in_smp to align structures on the internode cacheline size. As per Andi's suggestion, following patch kills ____cacheline_maxaligned_in_smp and introduces INTERNODE_CACHE_SHIFT, which defaults to L1_CACHE_SHIFT for all arches. Arches needing L3/Internode cacheline alignment can define INTERNODE_CACHE_SHIFT in the arch asm/cache.h. Patch replaces ____cacheline_maxaligned_in_smp with ____cacheline_internodealigned_in_smp With this patch, L1_CACHE_SHIFT_MAX can be killed Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] frv: miscellaneous changesDavid Howells9-3/+50
Fix a number of miscellanous items: (1) Declare lock sections in the linker script. (2) Recurse in the correct manner in the arch makefile. (3) asm/bug.h requires asm/linkage.h to be included first. One C file puts asm/bug.h first. (4) Add an empty RTC header file to avoid missing header file errors. (5) sg_dma_address() should use the dma_address member of a scatter list. (6) Add trivial pci_unmap support. (7) Add pgprot_noncached() (8) Discard u_quad_t. (9) Use ~0UL rather than ULONG_MAX in unistd.h in case the latter isn't declared. (10) Add an empty VGA header file to avoid missing header file errors. (11) Add an XOR header file to use the generic XOR stuff. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] frv: make get_user macro cast pointersDavid Howells1-4/+4
Make the get_user macro cast the source pointer to an appropriate type for the specified size. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] frv: add module support stubsDavid Howells1-4/+12
Add stubs for FRV module support. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] frv: supply various missing I/O access primitivesDavid Howells2-0/+127
Supply various I/O access primitives that are missing for the FRV arch: (*) mmiowb() (*) read*_relaxed() (*) ioport_*map() (*) ioread*(), iowrite*(), ioread*_rep() and iowrite*_rep() (*) pci_io*map() (*) check_signature() The patch also makes __is_PCI_addr() more efficient. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] frv: drop 8/16-bit xchg and cmpxchgDavid Howells1-91/+5
Drop support for 8-bit and 16-bit xchg and cmpxchg emulation and implements 32-bit xchg with the SWAP/SWAPI instruction. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-09[PATCH] powerpc: sanitize header files for user space includesArnd Bergmann75-29/+183
include/asm-ppc/ had #ifdef __KERNEL__ in all header files that are not meant for use by user space, include/asm-powerpc does not have this yet. This patch gets us a lot closer there. There are a few cases where I was not sure, so I left them out. I have verified that no CONFIG_* symbols are used outside of __KERNEL__ any more and that there are no obvious compile errors when including any of the headers in user space libraries. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-08[PATCH] mempolicies: unexport get_vma_policy()Christoph Lameter1-3/+0
Since the numa_maps functionality is now in mempolicy.c we no longer need to export get_vma_policy(). Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] set_page_count() macro safetyAvishay Traeger1-1/+1
Fix set_page_count() macro to handle complex arguments. Signed-off-by: Avishay Traeger <atraeger@cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] cpusets: swap migration interfacePaul Jackson1-0/+7
Add a boolean "memory_migrate" to each cpuset, represented by a file containing "0" or "1" in each directory below /dev/cpuset. It defaults to false (file contains "0"). It can be set true by writing "1" to the file. If true, then anytime that a task is attached to the cpuset so marked, the pages of that task will be moved to that cpuset, preserving, to the extent practical, the cpuset-relative placement of the pages. Also anytime that a cpuset so marked has its memory placement changed (by writing to its "mems" file), the tasks in that cpuset will have their pages moved to the cpusets new nodes, preserving, to the extent practical, the cpuset-relative placement of the moved pages. Signed-off-by: Paul Jackson <pj@sgi.com> Cc: Christoph Lameter <christoph@lameter.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] SwapMig: Extend parameters for migrate_pages()Christoph Lameter1-1/+2
Extend the parameters of migrate_pages() to allow the caller control over the fate of successfully migrated or impossible to migrate pages. Swap migration and direct migration will have the same interface after this patch so that patches can be independently applied to the policy layer and the core migration code. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] SwapMig: add_to_swap() avoid atomic allocationsChristoph Lameter1-1/+1
Add gfp_mask to add_to_swap add_to_swap does allocations with GFP_ATOMIC in order not to interfere with swapping. During migration we may have use add_to_swap extensively which may lead to out of memory errors. This patch makes add_to_swap take a parameter that specifies the gfp mask. The page migration code can then make add_to_swap use GFP_KERNEL. Signed-off-by: Hirokazu Takahashi <taka@valinux.co.jp> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] SwapMig: CONFIG_MIGRATION fixesChristoph Lameter1-2/+1
Move move_to_lru, putback_lru_pages and isolate_lru in section surrounded by CONFIG_MIGRATION saving some codesize for single processor kernels. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] Swap Migration V5: sys_migrate_pages interfaceChristoph Lameter6-4/+14
sys_migrate_pages implementation using swap based page migration This is the original API proposed by Ray Bryant in his posts during the first half of 2005 on linux-mm@kvack.org and linux-kernel@vger.kernel.org. The intent of sys_migrate is to migrate memory of a process. A process may have migrated to another node. Memory was allocated optimally for the prior context. sys_migrate_pages allows to shift the memory to the new node. sys_migrate_pages is also useful if the processes available memory nodes have changed through cpuset operations to manually move the processes memory. Paul Jackson is working on an automated mechanism that will allow an automatic migration if the cpuset of a process is changed. However, a user may decide to manually control the migration. This implementation is put into the policy layer since it uses concepts and functions that are also needed for mbind and friends. The patch also provides a do_migrate_pages function that may be useful for cpusets to automatically move memory. sys_migrate_pages does not modify policies in contrast to Ray's implementation. The current code here is based on the swap based page migration capability and thus is not able to preserve the physical layout relative to it containing nodeset (which may be a cpuset). When direct page migration becomes available then the implementation needs to be changed to do a isomorphic move of pages between different nodesets. The current implementation simply evicts all pages in source nodeset that are not in the target nodeset. Patch supports ia64, i386 and x86_64. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] Swap Migration V5: MPOL_MF_MOVE interfaceChristoph Lameter1-0/+3
Add page migration support via swap to the NUMA policy layer This patch adds page migration support to the NUMA policy layer. An additional flag MPOL_MF_MOVE is introduced for mbind. If MPOL_MF_MOVE is specified then pages that do not conform to the memory policy will be evicted from memory. When they get pages back in new pages will be allocated following the numa policy. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] Swap Migration V5: Add CONFIG_MIGRATION for page migration supportChristoph Lameter1-0/+2
Include page migration if the system is NUMA or having a memory model that allows distinct areas of memory (SPARSEMEM, DISCONTIGMEM). And: - Only include lru_add_drain_per_cpu if building for an SMP system. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] Swap Migration V5: migrate_pages() functionChristoph Lameter1-0/+2
This adds the basic page migration function with a minimal implementation that only allows the eviction of pages to swap space. Page eviction and migration may be useful to migrate pages, to suspend programs or for remapping single pages (useful for faulty pages or pages with soft ECC failures) The process is as follows: The function wanting to migrate pages must first build a list of pages to be migrated or evicted and take them off the lru lists via isolate_lru_page(). isolate_lru_page determines that a page is freeable based on the LRU bit set. Then the actual migration or swapout can happen by calling migrate_pages(). migrate_pages does its best to migrate or swapout the pages and does multiple passes over the list. Some pages may only be swappable if they are not dirty. migrate_pages may start writing out dirty pages in the initial passes over the pages. However, migrate_pages may not be able to migrate or evict all pages for a variety of reasons. The remaining pages may be returned to the LRU lists using putback_lru_pages(). Changelog V4->V5: - Use the lru caches to return pages to the LRU Changelog V3->V4: - Restructure code so that applying patches to support full migration does require minimal changes. Rename swapout_pages() to migrate_pages(). Changelog V2->V3: - Extract common code from shrink_list() and swapout_pages() Signed-off-by: Mike Kravetz <kravetz@us.ibm.com> Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: "Michael Kerrisk" <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] Swap Migration V5: PF_SWAPWRITE to allow writing to swapChristoph Lameter1-0/+1
Add PF_SWAPWRITE to control a processes permission to write to swap. - Use PF_SWAPWRITE in may_write_to_queue() instead of checking for kswapd and pdflush - Set PF_SWAPWRITE flag for kswapd and pdflush Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] Swap Migration V5: LRU operationsChristoph Lameter2-0/+25
This is the start of the `swap migration' patch series. Swap migration allows the moving of the physical location of pages between nodes in a numa system while the process is running. This means that the virtual addresses that the process sees do not change. However, the system rearranges the physical location of those pages. The main intent of page migration patches here is to reduce the latency of memory access by moving pages near to the processor where the process accessing that memory is running. The patchset allows a process to manually relocate the node on which its pages are located through the MF_MOVE and MF_MOVE_ALL options while setting a new memory policy. The pages of process can also be relocated from another process using the sys_migrate_pages() function call. Requires CAP_SYS_ADMIN. The migrate_pages function call takes two sets of nodes and moves pages of a process that are located on the from nodes to the destination nodes. Manual migration is very useful if for example the scheduler has relocated a process to a processor on a distant node. A batch scheduler or an administrator can detect the situation and move the pages of the process nearer to the new processor. sys_migrate_pages() could be used on non-numa machines as well, to force all of a particualr process's pages out to swap, if someone thinks that's useful. Larger installations usually partition the system using cpusets into sections of nodes. Paul has equipped cpusets with the ability to move pages when a task is moved to another cpuset. This allows automatic control over locality of a process. If a task is moved to a new cpuset then also all its pages are moved with it so that the performance of the process does not sink dramatically (as is the case today). Swap migration works by simply evicting the page. The pages must be faulted back in. The pages are then typically reallocated by the system near the node where the process is executing. For swap migration the destination of the move is controlled by the allocation policy. Cpusets set the allocation policy before calling sys_migrate_pages() in order to move the pages as intended. No allocation policy changes are performed for sys_migrate_pages(). This means that the pages may not faulted in to the specified nodes if no allocation policy was set by other means. The pages will just end up near the node where the fault occurred. There's another patch series in the pipeline which implements "direct migration". The direct migration patchset extends the migration functionality to avoid going through swap. The destination node of the relation is controllable during the actual moving of pages. The crutch of using the allocation policy to relocate is not necessary and the pages are moved directly to the target. Its also faster since swap is not used. And sys_migrate_pages() can then move pages directly to the specified node. Implement functions to isolate pages from the LRU and put them back later. This patch: An earlier implementation was provided by Hirokazu Takahashi <taka@valinux.co.jp> and IWAMOTO Toshihiro <iwamoto@valinux.co.jp> for the memory hotplug project. From: Magnus This breaks out isolate_lru_page() and putpack_lru_page(). Needed for swap migration. Signed-off-by: Magnus Damm <magnus.damm@gmail.com> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] add schedule_on_each_cpu()Christoph Lameter1-0/+1
swap migration's isolate_lru_page() currently uses an IPI to notify other processors that the lru caches need to be drained if the page cannot be found on the LRU. The IPI interrupt may interrupt a processor that is just processing lru requests and cause a race condition. This patch introduces a new function run_on_each_cpu() that uses the keventd() to run the LRU draining on each processor. Processors disable preemption when dealing the LRU caches (these are per processor) and thus executing LRU draining from another process is safe. Thanks to Lee Schermerhorn <lee.schermerhorn@hp.com> for finding this race condition. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] Make high and batch sizes of per_cpu_pagelists configurableRohit Seth2-0/+3
As recently there has been lot of traffic on the right values for batch and high water marks for per_cpu_pagelists. This patch makes these two variables configurable through /proc interface. A new tunable /proc/sys/vm/percpu_pagelist_fraction is added. This entry controls the fraction of pages at most in each zone that are allocated for each per cpu page list. The min value for this is 8. It means that we don't allow more than 1/8th of pages in each zone to be allocated in any single per_cpu_pagelist. The batch value of each per cpu pagelist is also updated as a result. It is set to pcp->high/4. The upper limit of batch is (PAGE_SHIFT * 8) Signed-off-by: Rohit Seth <rohit.seth@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] drop-pagecacheAndrew Morton2-0/+8
Add /proc/sys/vm/drop_caches. When written to, this will cause the kernel to discard as much pagecache and/or reclaimable slab objects as it can. THis operation requires root permissions. It won't drop dirty data, so the user should run `sync' first. Caveats: a) Holds inode_lock for exorbitant amounts of time. b) Needs to be taught about NUMA nodes: propagate these all the way through so the discarding can be controlled on a per-node basis. This is a debugging feature: useful for getting consistent results between filesystem benchmarks. We could possibly put it under a config option, but it's less than 300 bytes. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] slab: remove unused align parameter from alloc_percpuPekka Enberg1-4/+3
__alloc_percpu and alloc_percpu both take an 'align' argument which is completely ignored. snmp6_mib_init() in net/ipv6/af_inet6.c attempts to use it, but it will be ignored. Therefore, remove the 'align' argument and fixup the lone caller. Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] Fix compilation with CONFIG_MEMORY_HOTPLUG=y and gcc41.Olaf Hering1-7/+1
Fix compilation with CONFIG_MEMORY_HOTPLUG=y and gcc41. Also remove unneeded declations, add a public function. drivers/base/memory.c:53: error: static declaration of 'register_memory_notifier' follows non-static declaration include/linux/memory.h:85: error: previous declaration of 'register_memory_notifier' was here drivers/base/memory.c:58: error: static declaration of 'unregister_memory_notifier' follows non-static declaration include/linux/memory.h:86: error: previous declaration of 'unregister_memory_notifier' was here drivers/base/memory.c:68: error: static declaration of 'register_memory' follows non-static declaration include/linux/memory.h:73: error: previous declaration of 'register_memory' was here Signed-off-by: Olaf Hering <olh@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] asm-generic/atomic.h needs types.hAndrew Morton1-0/+1
For BITS_PER_LONG Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-09[PATCH] powerpc: G4+ oprofile supportAndy Fleming2-16/+51
This patch adds oprofile support for the 7450 and all its multitudinous derivatives. * Added 7450 (and derivatives) support for oprofile * Changed e500 cputable to have oprofile model and cpu_type fields * Added support for classic 32-bit performance monitor interrupt * Cleaned up common powerpc oprofile code to be as common as possible * Cleaned up oprofile_impl.h to reflect 32 bit classic code * Added 32-bit MMCRx bitfield definitions and SPR numbers Signed-off-by: Andy Fleming <afleming@freescale.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: pci_address_to_pio fixBenjamin Herrenschmidt2-6/+6
This fixes pci_address_to_pio() to return an unsigned long (to be safe) and fixes a bug in the implementation that caused it to return a bogus IO port number Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Replace VMALLOCBASE with VMALLOC_STARTDavid Gibson2-10/+11
On ppc64, we independently define VMALLOCBASE and VMALLOC_START to be the same thing: the start of the vmalloc() area at 0xd000000000000000. VMALLOC_START is used much more widely, including in generic code, so this patch gets rid of the extraneous VMALLOCBASE. This does require moving the definitions of region IDs from page_64.h to pgtable.h, but they don't clearly belong in the former rather than the latter, anyway. While we're moving them, clean up the definitions of the REGION_IDs: - Abolish REGION_SIZE, it was only used once, to define REGION_MASK anyway - Define the specific region ids in terms of the REGION_ID() macro. - Define KERNEL_REGION_ID in terms of PAGE_OFFSET rather than KERNELBASE. It amounts to the same thing, but conceptually this is about the region of the linear mapping (which starts at PAGE_OFFSET) rather than of the kernel text itself (which is at KERNELBASE). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Experimental support for new G5 Macs (#2)Benjamin Herrenschmidt4-4/+8
This adds some very basic support for the new machines, including the Quad G5 (tested), and other new dual core based machines and iMac G5 iSight (untested). This is still experimental ! There is no thermal control yet, there is no proper handing of MSIs, etc.. but it boots, I have all 4 cores up on my machine. Compared to the previous version of this patch, this one adds DART IOMMU support for the U4 chipset and thus should work fine on setups with more than 2Gb of RAM. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: export PCI fixup routinelinas1-0/+1
There is code in the RPAPHP directory that is identical to this routine; I'll be removing that code in an upcoming patch, but this patch is needed to expose the function to make it callable. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Update MPIC workaroundsSegher Boessenkool1-1/+2
Cleanup the MPIC IO-APIC workarounds, make them a bit more generic, smaller and faster. Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Remove device_node addrs/n_addrBenjamin Herrenschmidt2-41/+9
The pre-parsed addrs/n_addrs fields in struct device_node are finally gone. Remove the dodgy heuristics that did that parsing at boot and remove the fields themselves since we now have a good replacement with the new OF parsing code. This patch also fixes a bunch of drivers to use the new code instead, so that at least pmac32, pseries, iseries and g5 defconfigs build. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Dont set 32bit cputable bits on 64bitAnton Blanchard1-9/+11
Milton and I were looking at the cputable code and it looks like we can set spurious bits on 64bit. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] ppc64: Add NUMA cpu summary at bootAnton Blanchard1-0/+4
We used to print a NUMA cpu summary at boot before the hotplug cpu code was added. This has been useful for catching machine configuration as well as firmware bugs in the past. This patch restores that functionality. An example of the output is: Node 0 CPUs: 0-7 Node 1 CPUs: 8-15 Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] ppc32: Add TQM85xx (8540/8541/8555/8560) board supportKumar Gala1-0/+4
This patch adds support for the TQ Components TQM85xx modules. Currently the modules TQM8540/8541/8555/8560 are supported. Signed-off-by: Stefan Roese <sr@denx.de> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] spufs: Improved SPU preemptability [part 2].Arnd Bergmann1-0/+1
This patch reduces lock complexity of SPU scheduler, particularly for involuntary preemptive switches. As a result the new code does a better job of mapping the highest priority tasks to SPUs. Lock complexity is reduced by using the system default workqueue to perform involuntary saves. In this way we avoid nasty lock ordering problems that the previous code had. A "minimum timeslice" for SPU contexts is also introduced. The intent here is to avoid thrashing. While the new scheduler does a better job at prioritization it still does nothing for fairness. From: Mark Nutter <mnutter@us.ibm.com> Signed-off-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] spufs: Improved SPU preemptability.Arnd Bergmann1-2/+3
This patch makes it easier to preempt an SPU context by having the scheduler hold ctx->state_sema for much shorter periods of time. As part of this restructuring, the control logic for the "run" operation is moved from arch/ppc64/kernel/spu_base.c to fs/spufs/file.c. Of course the base retains "bottom half" handlers for class{0,1} irqs. The new run loop will re-acquire an SPU if preempted. From: Mark Nutter <mnutter@us.ibm.com> Signed-off-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Add arch-dependent copy_oldmem_pageMichael Ellerman1-0/+2
Signed-off-by: Haren Myneni <haren@us.ibm.com> Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Add arch dependent basic infrastructure for Kdump.Michael Ellerman1-1/+9
Implementing the machine_crash_shutdown which will be called by crash_kexec (called in case of a panic, sysrq etc.). Disable the interrupts, shootdown cpus using debugger IPI and collect regs for all CPUs. elfcorehdr= specifies the location of elf core header stored by the crashed kernel. This command line option will be passed by the kexec-tools to capture kernel. savemaxmem= specifies the actual memory size that the first kernel has and this value will be used for dumping in the capture kernel. This command line option will be passed by the kexec-tools to capture kernel. Signed-off-by: Haren Myneni <haren@us.ibm.com> Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Fixups for kernel linked at 32 MBMichael Ellerman1-1/+2
There's a few places where we need to fix things up for the kernel to work if it's linked at 32MB: - platforms/powermac/smp.c To start secondary cpus on pmac we patch the reset vector, which is fine. Except if we're above 32MB we don't have enough bits for an absolute branch, it needs to relative. - kernel/head_64.s - A few branches in the cpu hold code need to load the full target address and do a bctr. - after_prom_start needs to load PHYSICAL_START as the dest address, not 0. - The exception prolog needs to load the low word of the target adddress, not just the low halfword. - Fixup handling of the initial stab address. - kernel/setup_64.c smp_release_cpus() needs to write 1 to the spinloop flag near 0, not 32 MB. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Reroute interrupts from 0 + offset to PHYSICAL_START + offsetMichael Ellerman1-0/+13
Regardless of where the kernel's linked we always get interrupts at low addresses. This patch creates a trampoline in the first 3 pages of memory, where interrupts land, and patches those addresses to jump into the real kernel code at PHYSICAL_START. We also need to reserve the trampoline code and a bit more in prom.c Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Create a trampoline for the fwnmi vectorsMichael Ellerman1-0/+6
The fwnmi vectors can be anywhere < 32 MB, so we need to use a trampoline for them. The kdump kernel will register the trampoline addresses, which will then jump up to the real code above 32 MB. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Add CONFIG_CRASH_DUMPMichael Ellerman1-1/+9
This patch adds a Kconfig variable, CONFIG_CRASH_DUMP, which configures the built kernel for use as a Kdump kernel. Currently "all" this involves is changing the value of KERNELBASE to 32 MB. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: numa placement for dynamically added memoryMike Kravetz1-0/+8
This places dynamically added memory within the appropriate numa node. A new routine hot_add_scn_to_nid() replicates most of the memory scanning code in parse_numa_properties(). Signed-off-by: Mike Kravetz <kravetz@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Separate usage of KERNELBASE and PAGE_OFFSETMichael Ellerman1-1/+15
This patch separates usage of KERNELBASE and PAGE_OFFSET. I haven't looked at any of the PPC32 code, if we ever want to support Kdump on PPC we'll have to do another audit, ditto for iSeries. This patch makes PAGE_OFFSET the constant, it'll always be 0xC * 1 gazillion for 64-bit. To get a physical address from a virtual one you subtract PAGE_OFFSET, _not_ KERNELBASE. KERNELBASE is the virtual address of the start of the kernel, it's often the same as PAGE_OFFSET, but _might not be_. If you want to know something's offset from the start of the kernel you should subtract KERNELBASE. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Add a is_kernel_addr() macroMichael Ellerman1-0/+6
There's a bunch of code that compares an address with KERNELBASE to see if it's a "kernel address", ie. >= KERNELBASE. The proper test is actually to compare with PAGE_OFFSET, since we're going to change KERNELBASE soon. So replace all of them with an is_kernel_addr() macro that does that. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Propagate regs through to machine_crash_shutdownMichael Ellerman1-1/+1
Currently machine_crash_shutdown() gets a struct pt_regs, but doesn't pass it through to the ppc_md function, it should. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Update OF address parsersBenjamin Herrenschmidt4-6/+68
This updates the OF address parsers to return the IO flags indicating the type of address obtained. It also adds a PCI call for converting physical addresses that hit IO space into into IO tokens, and add routines that return the translated addresses into struct resource Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: udbg updatesBenjamin Herrenschmidt1-1/+1
The udbg low level io layer has an issue with udbg_getc() returning a char (unsigned on ppc) instead of an int, thus the -1 if you had no available input device could end up turned into 0xff, filling your display with bogus characters. This fixes it, along with adding a little blob to xmon to do a delay before exiting when getting an EOF and fixing the detection of ADB keyboards in udbg_adb.c Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: migrate common PCI hotplug codeLinas Vepstas1-0/+9
23-rpaphp-migrate.patch (parts) This patch moves some pci device add & remove code from the PCI hotplug directory to the arch/powerpc/kernel directory, and cleans it up a tad. The primary reason for this is that the code performs some fairly generic operations that are shared with the PCI error recovery code (living in the arch/powerpc/kernel directory). Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: make pcibios_claim_one_bus available to other codeLinas Vepstas1-0/+2
22-rpaphp-eliminate-dupe-code.patch (parts) The RPAPHP code contains two routines that appear to be gratuitous copies of very similar pci code. In particular, rpaphp_claim_resource ~~ pci_claim_resource rpadlpar_claim_one_bus == pcibios_claim_one_bus This makes pcibios_claim_one_bus from arch/powerpc/kernel/pci_64.c available to the RPAPHP code. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: PCI hotplug common code eliminationLinas Vepstas1-0/+10
20-rpaphp-eeh-cleanup.patch This patch move some code from the rpaphp directory, to the powerpc directory, where it should have been all along (Among other things, I need it in the powerpc directory for the PCI error recovery.) Please note that patch affects TWO maintainers: Paul, after applying the powerpc part, please ask that GregKH appli the PCI part. It is safe to have the powerpc part go in first. It would be bad to have the PCI part go in first. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Remove some unneeded fields from the pacaDavid Gibson1-5/+0
This patch removes several unnecessary fields from the paca: - next_jiffy_update_tb was simply unused. Remove trivially. - The exdsi exception save area was not used. There were plans to use it, but they never seem to have gone anywhere. If they ever do, we can put it back. Remove from the paca, and from asm-offsets.c - The default_decr field was used from asm, but was only ever assigned the value of tb_ticks_per_jiffy. Just access tb_ticks_per_jiffy from asm directly instead. Built and booted on POWER5 LPAR and iSeries RS64. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Remove ItLpRegSave area from the pacaDavid Gibson2-8/+8
On iSeries, the paca contains, amongst other things an ItLpRegSave structure used by the hypervisor to save registers. The hypervisor locates this area through a pointer at the beginning of the paca, so the structure itself can be located elsewhere. This patch moves the reg_save area out into its own array. This reduces the amount of iSeries specific gunk which is visible to general powerpc code via paca.h Built and booted on POWER5 LPAR and iSeries RS64. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Add back support for booting from BootX (#2)Benjamin Herrenschmidt1-0/+166
ARCH=powerpc couldn't boot from BootX as it uses a "different" way of getting in the kernel. This patch adds the necessary trampolines, creating a flattened device-tree from the tree passed from MacOS, and initializing the btext engine early for really-early debugging. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Unify udbg (#2)Benjamin Herrenschmidt4-14/+20
This patch unifies udbg for both ppc32 and ppc64 when building the merged achitecture. xmon now has a single "back end". The powermac udbg stuff gets enriched with some ADB capabilities and btext output. In addition, the early_init callback is now called on ppc32 as well, approx. in the same order as ppc64 regarding device-tree manipulations. The init sequences of ppc32 and ppc64 are getting closer, I'll unify them in a later patch. For now, you can force udbg to the scc using "sccdbg" or to btext using "btextdbg" on powermacs. I'll implement a cleaner way of forcing udbg output to something else than the autodetected OF output device in a later patch. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: serial port discovery (#2)Benjamin Herrenschmidt3-1/+9
This moves the discovery of legacy serial ports to a separate file, makes it common to ppc32 and ppc64, and reworks it to use the new OF address translators to get to the ports early. This new version can also detect some PCI serial cards using legacy chips and will probably match those discovered port with the default console choice. Only ppc64 gets udbg still yet, unifying udbg isn't finished yet. It also adds some speed-probing code to udbg so that the default console can come up at the same speed it was set to by the firmware. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Add OF address parsing code (#2)Benjamin Herrenschmidt3-0/+23
Parsing addresses extracted from Open Firmware isn't a simple matter. We have various bits of code that try to do it in various place, including some heuristics in prom.c that pre-parse addresses at boot and fill device-nodes "addrs", but those are dodgy at best and I want to deprecate them. So this patch introduces a new set of routines that should be capable of parsing most types of addresses and translating them into CPU physical addresses. It currently works for things on PCI busses and ISA busses and should work on "standard" busses like the root bus or the MacIO bus that don't put funky flags in addresses. If you have other bus types that do use funky flags, you'll have to add new bus type translators, which is fairly easy. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09ppc: remove duplicate bseip.hPaul Mackerras1-38/+0
include/asm-ppc/bseip.h is a duplicate of arch/ppc/platforms/bseip.h and is not referenced anywhere, so get rid of it. Pointed out by Marcelo Tosatti. Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09powerpc: Update __NR_syscalls to account for SPU syscallsPaul Mackerras1-1/+1
A previous patch ended up not increasing __NR_syscalls to account for the new SPU syscalls (probably my fault). Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] spufs: cooperative scheduler supportArnd Bergmann2-7/+13
This adds a scheduler for SPUs to make it possible to use more logical SPUs than physical ones are present in the system. Currently, there is no support for preempting a running SPU thread, they have to leave the SPU by either triggering an event on the SPU that causes it to return to the owning thread or by sending a signal to it. This patch also adds operations that enable accessing an SPU in either runnable or saved state. We use an RW semaphore to protect the state of the SPU from changing underneath us, while we are holding it readable. In order to change the state, it is acquired writeable and a context save or restore is executed before downgrading the semaphore to read-only. From: Mark Nutter <mnutter@us.ibm.com>, Uli Weigand <Ulrich.Weigand@de.ibm.com> Signed-off-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] kernel-side context switch code for spufsMark Nutter1-1/+0
This adds the code needed to perform a context switch from spufs, following the recommended 76-step sequence. Signed-off-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] spufs: switchable spu contextsMark Nutter2-0/+332
Add some infrastructure for saving and restoring the context of an SPE. This patch creates a new structure that can hold the whole state of a physical SPE in memory. It also contains code that avoids races during the context switch and the binary code that is loaded to the SPU in order to access its registers. The actual PPE- and SPE-side context switch code are two separate patches. Signed-off-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] spufs: The SPU file system, baseArnd Bergmann3-0/+505
This is the current version of the spu file system, used for driving SPEs on the Cell Broadband Engine. This release is almost identical to the version for the 2.6.14 kernel posted earlier, which is available as part of the Cell BE Linux distribution from http://www.bsc.es/projects/deepcomputing/linuxoncell/. The first patch provides all the interfaces for running spu application, but does not have any support for debugging SPU tasks or for scheduling. Both these functionalities are added in the subsequent patches. See Documentation/filesystems/spufs.txt on how to use spufs. Signed-off-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: IBMEBUS bus supportHeiko J Schick1-0/+83
This patch adds the necessary core bus support used by device drivers that sit on the IBM GX bus on modern pSeries machines like the Galaxy infiniband for example. It provide transparent DMA ops (the low level driver works with virtual addresses directly) along with a simple bus layer using the Open Firmware matching routines. Signed-off-by: Heiko J Schick <schickhj@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] syscall entry/exit revampDavid Woodhouse2-4/+10
This cleanup patch speeds up the null syscall path on ppc64 by about 3%, and brings the ppc32 and ppc64 code slightly closer together. The ppc64 code was checking current_thread_info()->flags twice in the syscall exit path; once for TIF_SYSCALL_T_OR_A before disabling interrupts, and then again for TIF_SIGPENDING|TIF_NEED_RESCHED etc after disabling interrupts. Now we do the same as ppc32 -- check the flags only once in the fast path, and re-enable interrupts if necessary in the ptrace case. The patch abolishes the 'syscall_noerror' member of struct thread_info and replaces it with a TIF_NOERROR bit in the flags, which is handled in the slow path. This shortens the syscall entry code, which no longer needs to clear syscall_noerror. The patch adds a TIF_SAVE_NVGPRS flag which causes the syscall exit slow path to save the non-volatile GPRs into a signal frame. This removes the need for the assembly wrappers around sys_sigsuspend(), sys_rt_sigsuspend(), et al which existed solely to save those registers in advance. It also means I don't have to add new wrappers for ppoll() and pselect(), which is what I was supposed to be doing when I got distracted into this... Finally, it unifies the ppc64 and ppc32 methods of handling syscall exit directly into a signal handler (as required by sigsuspend et al) by introducing a TIF_RESTOREALL flag which causes _all_ the registers to be reloaded from the pt_regs by taking the ret_from_exception path, instead of the normal syscall exit path which stomps on the callee-saved GPRs. It appears to pass an LTP test run on ppc64, and passes basic testing on ppc32 too. Brief tests of ptrace functionality with strace and gdb also appear OK. I wouldn't send it to Linus for 2.6.15 just yet though :) Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: moved ipic code to arch/powerpcKumar Gala1-0/+0
Moved 83xx and QUICC Engine interrupt handling code into arch/powerpc as a precursor of getting 83xx sub-arch building in arch/powerpc. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Merge kexecMichael Ellerman2-6/+9
This patch merges, to some extent, the PPC32 and PPC64 kexec implementations. We adopt the PPC32 approach of having ppc_md callbacks for the kexec functions. The current PPC64 implementation becomes the "default" implementation for PPC64 which platforms can select if they need no special treatment. I've added these default callbacks to pseries/maple/cell/powermac, this means iSeries no longer supports kexec - but it never worked anyway. I've renamed PPC32's machine_kexec_simple to default_machine_kexec, inline with PPC64. Judging by the comments it might be better named machine_kexec_non_of, or something, but at the moment it's the only implementation for PPC32 so it's the "default". Kexec requires machine_shutdown(), which is in machine_kexec.c on PPC32, but we already have in setup-common.c on powerpc. All this does is call ppc_md.nvram_sync, which only powermac implements, so instead make machine_shutdown a ppc_md member and have it call core99_nvram_sync directly on powermac. I've also stuck relocate_kernel.S into misc_32.S for powerpc. Built for ARCH=ppc, and 32 & 64 bit ARCH=powerpc, with KEXEC=y/n. Booted on P5 LPAR and successfully kexec'ed. Should apply on top of 493f25ef4087395891c99fcfe2c72e62e293e89f. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] PPC_PREP: remove unneeded exportsAdrian Bunk1-1/+0
This patch removes the EXPORT_SYMBOL'ed but completely unused variable ucSystemType and removes the unneeded EXPORT_SYMBOL(_prep_type). Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Tom Rini <trini@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-07[IPV4]: make ip_fragment() staticAdrian Bunk1-1/+0
Since there's no longer any external user of ip_fragment() we can make it static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07[NETFILTER]: Add dummy nf_hook{_thresh}() when NETFILTER is disabled.David S. Miller1-0/+15
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07[NETFILTER]: Add ipt_policy/ip6t_policy matchesPatrick McHardy2-0/+104
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07[NETFILTER]: Handle NAT in IPsec policy checksPatrick McHardy1-0/+16
Handle NAT of decapsulated IPsec packets by reconstructing the struct flowi of the original packet from the conntrack information for IPsec policy checks. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07[NETFILTER]: Redo policy lookups after NAT when neccessaryPatrick McHardy1-0/+1
When NAT changes the key used for the xfrm lookup it needs to be done again. If a new policy is returned in POST_ROUTING the packet needs to be passed to xfrm4_output_one manually after all hooks were called because POST_ROUTING is called with fixed okfn (ip_finish_output). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07[NETFILTER]: Fix xfrm lookup in ip_route_me_harder/ip6_route_me_harderPatrick McHardy4-3/+7
ip_route_me_harder doesn't use the port numbers of the xfrm lookup and uses ip_route_input for non-local addresses which doesn't do a xfrm lookup, ip6_route_me_harder doesn't do a xfrm lookup at all. Use xfrm_decode_session and do the lookup manually, make sure both only do the lookup if the packet hasn't been transformed already. Makeing sure the lookup only happens once needs a new field in the IP6CB, which exceeds the size of skb->cb. The size of skb->cb is increased to 48b. Apparently the IPv6 mobile extensions need some more room anyway. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07[IPV4]: reset IPCB flags when neccessaryPatrick McHardy1-5/+3
Reset IPSKB_XFRM_TUNNEL_SIZE flags in ipip and ip_gre hard_start_xmit function before the packet reenters IP. This is neccessary so the encapsulated packets are checked not to be oversized in xfrm4_output.c again. Reset all flags in sit when a packet changes its address family. Also remove some obsolete IPSKB flags. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07[IPV4/6]: Netfilter IPsec input hooksPatrick McHardy1-0/+2
When the innermost transform uses transport mode the decapsulated packet is not visible to netfilter. Pass the packet through the PRE_ROUTING and LOCAL_IN hooks again before handing it to upper layer protocols to make netfilter-visibility symetrical to the output path. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07[IPV6]: Move nextheader offset to the IP6CBPatrick McHardy3-4/+5
Move nextheader offset to the IP6CB to make it possible to pass a packet to ip6_input_finish multiple times and have it skip already parsed headers. As a nice side effect this gets rid of the manual hopopts skipping in ip6_input_finish. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07[XFRM]: Netfilter IPsec output hooksPatrick McHardy2-34/+38
Call netfilter hooks before IPsec transforms. Packets visit the FORWARD/LOCAL_OUT and POST_ROUTING hook before the first encapsulation and the LOCAL_OUT and POST_ROUTING hook before each following tunnel mode transform. Patch from Herbert Xu <herbert@gondor.apana.org.au>: Move the loop from dst_output into xfrm4_output/xfrm6_output since they're the only ones who need to it. xfrm{4,6}_output_one() processes the first SA all subsequent transport mode SAs and is called in a loop that calls the netfilter hooks between each two calls. In order to avoid the tail call issue, I've added the inline function nf_hook which is nf_hook_slow plus the empty list check. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07Merge master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds64-302/+291
2006-01-07Merge master.kernel.org:/home/rmk/linux-2.6-serialLinus Torvalds1-0/+17
2006-01-07Merge master.kernel.org:/home/rmk/linux-2.6-mmcLinus Torvalds1-0/+5
2006-01-07[ARM] byteorder.h needs linux/compiler.hRussell King1-0/+1
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-07[ARM] Move asm/hardware/clock.h to linux/clk.hRussell King2-2/+3
This is needs to be visible to other architectures using the AMBA bus and peripherals. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-07Merge with Linus' kernel.Russell King315-5850/+6299
2006-01-07[ARM] Move AMBA include files to include/linux/amba/Russell King7-3/+3
Since the ARM AMBA bus is used on MIPS as well as ARM, we need to make the bus available for other architectures to use. Move the AMBA include files from include/asm-arm/hardware/ to include/linux/amba/ Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-07[ARM] 3239/1: Add ARM optimised swab32Andre McCurdy1-1/+14
Patch from Andre McCurdy Replaces generic swab32 routine with a more ARM friendly version. Reduces kernel text size by approx 1200 bytes when compiled with 3.4.4 and approx 2400 bytes with 4.0.2 Probably some performance benefit as well. Signed-off-by: Andre McCurdy <armccurdy@yahoo.co.uk> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-06Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/i2c-2.6Linus Torvalds5-55/+20
2006-01-06Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds4-9/+9
2006-01-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuildLinus Torvalds2-4/+4
2006-01-06[NET]: Endian-annotate in_aton()Alexey Dobriyan1-1/+1
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-06[NET]: Endian-annotate struct iphdrAlexey Dobriyan1-5/+5
And fix trivial warnings that emerged. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-06[NET]: Change sk_run_filter()'s return type in net/core/filter.cKris Katterjohn2-3/+3
It should return an unsigned value, and fix sk_filter() as well. Signed-off-by: Kris Katterjohn <kjak@ispwest.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-06Merge ../torvalds-2.6/Greg Kroah-Hartman124-1969/+1843
2006-01-06kbuild: un-stringnify KBUILD_MODNAMESam Ravnborg2-4/+4
Now when kbuild passes KBUILD_MODNAME with "" do not __stringify it when used. Remove __stringnify for all users. This also fixes the output of: $ ls -l /sys/module/ drwxr-xr-x 4 root root 0 2006-01-05 14:24 pcmcia drwxr-xr-x 4 root root 0 2006-01-05 14:24 pcmcia_core drwxr-xr-x 3 root root 0 2006-01-05 14:24 "processor" drwxr-xr-x 3 root root 0 2006-01-05 14:24 "psmouse" The quoting of the module names will be gone again. Thanks to GregKH + Kay Sievers for reproting this. Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2006-01-06SUNRPC: Update the spkm3 code to use the make_checksum interfaceJ. Bruce Fields1-1/+1
Also update the tokenlen calculations to accomodate g_token_size(). Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFSv4: Allow entries in the idmap cache to expireTrond Myklebust1-0/+2
If someone changes the uid/gid mapping in userland, then we do eventually want those changes to be propagated to the kernel. Currently the kernel assumes that it may cache entries forever. Add an expiration time + garbage collector for idmap entries. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06SUNRPC: Ensure client closes the socket when server initiates a closeTrond Myklebust1-0/+1
If the server decides to close the RPC socket, we currently don't actually respond until either another RPC call is scheduled, or until xprt_autoclose() gets called by the socket expiry timer (which may be up to 5 minutes later). This patch ensures that xprt_autoclose() is called much sooner if the server closes the socket. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06SUNRPC: get rid of cl_chattyChuck Lever1-1/+0
Clean up: Every ULP that uses the in-kernel RPC client, except the NLM client, sets cl_chatty. There's no reason why NLM shouldn't set it, so just get rid of cl_chatty and always be verbose. Test-plan: Compile with CONFIG_NFS enabled. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06SUNRPC: transport switch API for setting port numberChuck Lever1-0/+1
At some point, transport endpoint addresses will no longer be IPv4. To hide the structure of the rpc_xprt's address field from ULPs and port mappers, add an API for setting the port number during an RPC bind operation. Test-plan: Destructive testing (unplugging the network temporarily). Connectathon with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked. Probably need to rig a server where certain services aren't running, or that returns an error for some typical operation. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06SUNRPC: new interface to force an RPC rebindChuck Lever1-0/+1
We'd like to hide fields in rpc_xprt and rpc_clnt from upper layer protocols. Start by creating an API to force RPC rebind, replacing logic that simply sets cl_port to zero. Test-plan: Destructive testing (unplugging the network temporarily). Connectathon with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked. Probably need to rig a server where certain services aren't running, or that returns an error for some typical operation. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06SUNRPC: switchable buffer allocationChuck Lever2-7/+6
Add RPC client transport switch support for replacing buffer management on a per-transport basis. In the current IPv4 socket transport implementation, RPC buffers are allocated as needed for each RPC message that is sent. Some transport implementations may choose to use pre-allocated buffers for encoding, sending, receiving, and unmarshalling RPC messages, however. For transports capable of direct data placement, the buffers can be carved out of a pre-registered area of memory rather than from a slab cache. Test-plan: Millions of fsx operations. Performance characterization with "sio" and "iozone". Use oprofile and other tools to look for significant regression in CPU utilization. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NLM: Further cancel fixesJ. Bruce Fields1-1/+1
If the server receives an NLM cancel call and finds no waiting lock to cancel, then chances are the lock has already been applied, and the client just hadn't yet processed the NLM granted callback before it sent the cancel. The Open Group text, for example, perimts a server to return either success (LCK_GRANTED) or failure (LCK_DENIED) in this case. But returning an error seems more helpful; the client may be able to use it to recognize that a race has occurred and to recover from the race. So, modify the relevant functions to return an error in this case. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06SUNRPC: net/sunrpc/xdr.c: remove xdr_decode_string()Adrian Bunk1-1/+0
This patch removes ths unused function xdr_decode_string(). Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Neil Brown <neilb@suse.de> Acked-by: Charles Lever <Charles.Lever@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFSv4: Allow user to set the port used by the NFSv4 callback channelTrond Myklebust1-0/+11
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFSv4: Ensure DELEGRETURN returns attributesTrond Myklebust1-0/+6
Upon return of a write delegation, the server will almost always bump the change attribute. Ensure that we pick up that change so that we don't invalidate our data cache unnecessarily. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFS: Make stat() return updated mtimes after a write()Trond Myklebust1-0/+1
The SuS states that a call to write() will cause mtime to be updated on the file. In order to satisfy that requirement, we need to flush out any cached writes in nfs_getattr(). Speed things up slightly by not committing the writes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>