commit d223a60106891bfe46febfacf46b20cd8509aaad
Author: Linus Torvalds
Date:   Wed Oct 4 19:57:05 2006 -0700

    Linux 2.6.19-rc1

    Merge window closed..

commit 77dc2db6d1d2703ee4e83d4b3dbecf4e06a910e6
Author: Mark Assad
Date:   Thu Oct 5 12:25:05 2006 +1000

    [PATCH] itmtouch: fix inverted flag to indicate touch location correctly, correct white space

    There is a bug in the current version of the itmtouch USB touchscreen
    driver. The if statement that checks if pressure is being applied to
    the touch screen is now missing a ! (not), so events are no longer
    being reported correctly.

    The original source code for this line was as follows:

        #define UCP(x) ((unsigned char*)(x))
        #define UCOM(x,y,z) ((UCP((x)->transfer_buffer)[y]) & (z))
        ...
        if (!UCOM(urb, 7, 0x20)) {

    And was cleaned to:

        unsigned char *data = urb->transfer_buffer;
        ....
        if (data[7] & 0x20) {

    (note the lack of '!')

    This has been tested on an LG L1510BF and an LG1510SF touch screen.

    Signed-off-by: Mark Assad
    Signed-off-by: Linus Torvalds

commit 1604f31895dcdb42edf6511ce7ef0546ff92c8e5
Author: Matthew Wilcox
Date:   Wed Oct 4 15:12:52 2006 -0600

    [PA-RISC] Fix time.c for new do_timer() calling convention

    do_timer now wants to know how many ticks have elapsed. Now that we
    have to calculate that, we can eliminate some of the clever code that
    avoided having to calculate that. Also add some more documentation.
    I'd like to thank Grant Grundler for helping me with this.

    Signed-off-by: Matthew Wilcox

commit 1070c9655b90016ec4c9b59c402292e57ee15885
Author: Matthew Wilcox
Date:   Wed Oct 4 13:37:41 2006 -0600

    [PA-RISC] Fix must_check warnings in drivers.c

    Panic if we can't register the parisc bus or the root parisc device.
    There's no way we can boot without them, so let the user know ASAP.
    If we can't register a parisc device, handle the failure gracefully.

    Signed-off-by: Matthew Wilcox

commit f64ef295032d07345ca26bf4876a1577c4dccb37
Author: Matthew Wilcox
Date:   Wed Oct 4 13:33:53 2006 -0600

    [PA-RISC] Fix parisc_newuname()

    The utsname virtualisation broke parisc_newuname compilation. Rewrite
    the implementation to call sys_newuname() like sparc64 does.

    Signed-off-by: Matthew Wilcox

commit ccd6c355e89a21d9047ae19471629758d3a01959
Author: Matthew Wilcox
Date:   Wed Oct 4 13:27:45 2006 -0600

    [PA-RISC] Remove warning from pci.c

    max() doesn't like comparing an unsigned long and a resource_size_t,
    so make the local variables resource_size_t too.

    Signed-off-by: Matthew Wilcox

commit 15c130c1cde38da528f82efce882e8d7632f4d91
Author: Matthew Wilcox
Date:   Wed Oct 4 13:18:25 2006 -0600

    [PA-RISC] Fix filldir warnings

    filldir_t now takes a u64, not an ino_t.

    Signed-off-by: Matthew Wilcox

commit 17cca07237617a2d712eb44cffd8720055e61291
Author: Matthew Wilcox
Date:   Wed Oct 4 13:16:10 2006 -0600

    [PA-RISC] Fix sys32_sysctl

    When CONFIG_SYSCTL_SYSCALL isn't defined, do_sysctl doesn't exist and
    we fail to link. Fix with an ifdef, the same way sparc64 did. Also add
    some minor changes to be more like sparc64.

    Signed-off-by: Matthew Wilcox

commit ee9f4b5d95d03d1546f0d06cbe384bd4ab97bcba
Author: Matthew Wilcox
Date:   Wed Oct 4 13:08:33 2006 -0600

    [PA-RISC] Fix sba_iommu compilation

    klist_iter_exit() only takes one parameter. Also fix warning by adding
    additional brackets.

    Signed-off-by: Matthew Wilcox

commit 43b4f4061cf54aa225a1e94a969450ccf5305cd9
Author: Arnd Bergmann
Date:   Wed Oct 4 17:26:24 2006 +0200

    [POWERPC] cell: fix bugs found by sparse

    - Some long constants should be marked 'ul'.
    - When using desc->handler_data to pass an __iomem register area, we
      need to add casts to and from __iomem.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Paul Mackerras

commit f7e2ce788677ca0996d360202b91524db894c7b2
Author: Arnd Bergmann
Date:   Wed Oct 4 17:26:23 2006 +0200

    [POWERPC] spiderpic: enable new style devtree support

    This enables support for new firmware test releases.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Paul Mackerras

commit 68272047c51145a8aa4f3b6ae27edae6986c28cc
Author: Arnd Bergmann
Date:   Wed Oct 4 17:26:22 2006 +0200

    [POWERPC] Update cell_defconfig

    This adds defaults for new configuration options added since 2.6.18
    and it enables the option for 64kb pages by default.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Paul Mackerras

commit 867672777964b9309e4e914fe097648c938b67b2
Author: Arnd Bergmann
Date:   Wed Oct 4 17:26:21 2006 +0200

    [POWERPC] spufs: add infrastructure for finding elf objects

    This adds an 'object-id' file that the spe library can use to store a
    pointer to its ELF object. This was originally meant for use by
    oprofile, but is now also used by the GNU debugger, if available.

    In order for oprofile to find the location in an spu-elf binary where
    an event counter triggered, we need a way to identify the binary in
    the first place.

    Unfortunately, that binary itself can be embedded in a powerpc ELF
    binary. Since we can assume it is mapped into the effective address
    space of the running process, have that one write the pointer value
    into a new spufs file.

    When a context switch occurs, pass the user value to the profiler so
    that it can look at the mapped file (with some care).

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Paul Mackerras

commit 7650f2f2c367242a2092908794b4486876baf6c7
Author: Arnd Bergmann
Date:   Wed Oct 4 17:26:20 2006 +0200

    [POWERPC] spufs: support new OF device tree format

    The properties we used traditionally in the device tree are somewhat
    nonstandard. This adds support for a more conventional format using
    'interrupts' and 'reg' properties.
    The interrupts are specified in three cells (class 0, 1 and 2) and
    registered at the interrupt-parent. The reg property contains either
    three or four register areas in the order 'local-store', 'problem',
    'priv2', and 'priv1', so the priv1 one can be left out in case of
    hypervisor driven systems that access these through hcalls.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Paul Mackerras

commit e1dbff2bafa83f839ef15f51904b0cce9fc89387
Author: Arnd Bergmann
Date:   Wed Oct 4 17:26:19 2006 +0200

    [POWERPC] spufs: add support for read/write on cntl

    Writing to cntl can be used to stop execution on the spu and to
    restart it, reading from cntl gives the contents of the current
    status register.

    The access is always in ascii, as for most other files.

    This was always meant to be there, but we had a little problem with
    writing to runctl so it was left out so far.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Paul Mackerras

commit 772920e594df25f2011ca49abd9c8b85c4820cdc
Author: Arnd Bergmann
Date:   Wed Oct 4 17:26:18 2006 +0200

    [POWERPC] spufs: remove support for ancient firmware

    Any firmware that still uses the 'spc' nodes already stopped running
    for other reasons, so let's get rid of this.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Paul Mackerras

commit cdcc89bb1c6e886a55fe00e2de3b9c65d41674c2
Author: Arnd Bergmann
Date:   Wed Oct 4 17:26:17 2006 +0200

    [POWERPC] spufs: make mailbox functions handle multiple elements

    Since libspe2 will provide a function that can read/write multiple
    mailbox elements at once, the kernel should handle that efficiently.

    read/write on the three mailbox files can now access the spe context
    multiple times to operate on any number of mailbox data elements.

    If the spu application keeps writing to its outbound mailbox, the
    read call will pick up all the data in a single system call.

    Unfortunately, if the user passes an invalid pointer, we may lose a
    mailbox element on read, since we can't put it back.
    This is probably impossible to solve, if the user also accesses the
    mailbox through direct register access.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Paul Mackerras

commit ac91cb8dae061ced64e475d0d70fac4a95298819
Author: Arnd Bergmann
Date:   Wed Oct 4 17:26:16 2006 +0200

    [POWERPC] spufs: use correct pg_prot for mapping SPU local store

    This hopefully fixes a long-standing bug in the spu file system.
    An spu context comes with local memory that can be either saved in
    kernel pages or point directly to a physical SPE.

    When mapping the physical SPE, that mapping needs to be
    cache-inhibited. For simplicity, we used to map the kernel backing
    memory that way too, but unfortunately that was not only inefficient,
    but also incorrect because the same page could then be accessed
    simultaneously through a cacheable and a cache-inhibited mapping,
    which is not allowed by the powerpc specification and in our case
    caused data inconsistency for which we did a really ugly workaround
    in user space.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Paul Mackerras

commit 6263203ed6e9ff107129a1ebe613290b342a4465
Author: Arnd Bergmann
Date:   Wed Oct 4 17:26:15 2006 +0200

    [POWERPC] spufs: Add infrastructure needed for gang scheduling

    Add the concept of a gang to spufs as a new type of object.
    So far, this has no impact whatsoever on scheduling, but makes
    it possible to add that later.

    A new type of object in spufs is now a spu_gang. It is created with
    the spu_create system call with the flags argument set to
    SPU_CREATE_GANG (0x2). Inside of a spu_gang, it is then possible to
    create spu_context objects, which until now was only possible at the
    root of spufs.

    There is a new member in struct spu_context pointing to the spu_gang
    it belongs to, if any. The spu_gang maintains a list of spu_context
    structures that are its children. This information can then be used
    in the scheduler in the future.

    There is still a bug that needs to be resolved in this basic
    infrastructure regarding the order in which objects are removed.
    When the spu_gang file descriptor is closed before the spu_context
    descriptors, we leak the dentry and inode for the gang. Any ideas
    how to cleanly solve this are appreciated.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Paul Mackerras

commit 9add11daeee2f6d69f6b86237f197824332a4a3b
Author: Arnd Bergmann
Date:   Wed Oct 4 17:26:14 2006 +0200

    [POWERPC] spufs: implement error event delivery to user space

    This tries to fix spufs so we have an interface closer to what is
    specified in the man page for events returned in the third argument
    of spu_run.

    Fortunately, libspe has never been using the returned contents of
    that register, as they were the same as the return code of spu_run
    (duh!).

    Unlike the specification that we never implemented correctly, we now
    require a SPU_CREATE_EVENTS_ENABLED flag passed to spu_create, in
    order to get the new behavior. When this flag is not passed, spu_run
    will simply ignore the third argument now.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Paul Mackerras

commit 28347bce8a837258e737873a55d31f2f424a6ea6
Author: HyeonSeung Jang
Date:   Wed Oct 4 17:26:13 2006 +0200

    [POWERPC] spufs: fix context switch during page fault

    For better explanation, I break down the page fault handling into
    steps:

    1) There is a page fault caused by DMA operation initiated by SPU
       and DMA is suspended.

    2) The interrupt handler 'spu_irq_class_1()/__spu_trap_data_map()'
       is called and it just wakes up the sleeping spe-manager thread.

    3) By PPE scheduler, the corresponding bottom half,
       spu_irq_class_1_bottom() is called in process context and DMA is
       restarted.

    There can be a quite large time gap between 2) and 3) and I found
    the following problem:

    Between 2) and 3) if the context becomes unbound, 3) is not executed
    because when the spe-manager thread is awakened, the context is
    already saved.
    (This situation can happen, for example, when a high priority spe
    thread newly started in that time gap)

    But the actual problem is that the corresponding SPU context does
    not work even if it is bound again to a SPU.

    Besides I can see the following warning in mambo simulator when the
    context becomes unbound (in save_mfc_cmd()), i.e. when unbind() is
    called for the context after step 2) before 3):

    'WARNING: 61392752237: SPE2: MFC_CMD_QUEUE channel count of 15 is
    inconsistent with number of available DMA queue entries of 16'

    After I go through available documents, I found that the problem is
    because the suspended DMA is not restarted when it is bound again.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Paul Mackerras

commit a68cf983f635930ea35f9e96b27d96598550dea0
Author: Mark Nutter
Date:   Wed Oct 4 17:26:12 2006 +0200

    [POWERPC] spufs: scheduler support for NUMA.

    This patch adds NUMA support to the spufs scheduler.

    The new arch/powerpc/platforms/cell/spufs/sched.c is greatly
    simplified, in an attempt to reduce complexity while adding support
    for NUMA scheduler domains. SPUs are allocated starting from the
    calling thread's node, moving to others as supported by
    current->cpus_allowed. Preemption is gone as it was buggy, but
    should be re-enabled in another patch when stable.

    The new arch/powerpc/platforms/cell/spu_base.c maintains idle lists
    on a per-node basis, and allows caller to specify which node(s) an
    SPU should be allocated from, while passing -1 tells spu_alloc()
    that any node is allowed.

    Since the patch removes the currently implemented preemptive
    scheduling, it is technically a regression, but practically all
    users have since migrated to this version, as it is part of the IBM
    SDK and the yellowdog distribution, so there is not much point
    holding it back while the new preemptive scheduling patch gets
    delayed further.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Paul Mackerras

commit 27d5bf2a35c0762f1358e9ef39776733cd942121
Author: Benjamin Herrenschmidt
Date:   Wed Oct 4 17:26:11 2006 +0200

    [POWERPC] spufs: cell spu problem state mapping updates

    This patch adds a new "psmap" file to spufs that allows mmap of all
    of the problem state mapping of SPEs. It is compatible with 64k
    pages. In addition, it removes mmap ability of individual files when
    using 64k pages, with the exception of signal1 and signal2 which
    will both map the entire 64k page holding both registers. It also
    removes CONFIG_SPUFS_MMAP as there is no point in not building mmap
    support in spufs.

    It goes along a separate patch to libspe implementing usage of that
    new file to access problem state registers. Another patch will
    follow up to fix races opened up by accessing the 'runcntl' register
    directly, which is made possible with this patch.

    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Paul Mackerras

commit afaf5a2d341d33b66b47c2716a263ce593460a08
Author: David Somayajulu
Date:   Tue Sep 19 10:28:00 2006 -0700

    [SCSI] Initial Commit of qla4xxx

    open-iSCSI driver for Qlogic Corporation's iSCSI HBAs

    Signed-off-by: Ravi Anand
    Signed-off-by: David Somayajulu
    Signed-off-by: Doug Maxey
    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

commit ed542bed126caeefc6546b276e4af852d4d34f33
Author: Jeff Garzik
Date:   Wed Oct 4 07:05:11 2006 -0400

    [SCSI] raid class: handle component-add errors

    Signed-off-by: Jeff Garzik
    Signed-off-by: James Bottomley

commit 83aabc1be551dd1f07266c125ff48ec62a2ce515
Author: Jeff Garzik
Date:   Wed Oct 4 06:34:03 2006 -0400

    [SCSI] SCSI megaraid_sas: handle thrown errors

    - handle clear_user() error
    - handle and properly unwind from sysfs errors thrown during mod init
    - adjust order of calls in megasas_exit() to precisely match
      registration order in megasas_init()

    Signed-off-by: Jeff Garzik

    Updated for extra attribute and
    Signed-off-by: James Bottomley

commit bb0766204c81d6bd01532476aec4e512c960fb4d
Author: Jeff Garzik
Date:   Wed Oct 4 06:19:18 2006 -0400

    [SCSI] SCSI aic94xx: handle sysfs errors

    Handle and unwind from errors returned by driver model functions.

    Signed-off-by: Jeff Garzik
    Signed-off-by: James Bottomley

commit 13026a6b985b9d1e19330d5656e211f15b5aca3b
Author: Jeff Garzik
Date:   Wed Oct 4 06:00:38 2006 -0400

    [SCSI] SCSI st: fix error handling in module init, sysfs

    - Notice and handle sysfs errors in module init, tape init
    - Properly unwind errors in module init
    - Remove bogus st_sysfs_class==NULL test, it is guaranteed !NULL at
      that point

    Signed-off-by: Jeff Garzik
    Signed-off-by: James Bottomley

commit 5e4009ba3d5af40f5615fdb4304cc4a9947cca0a
Author: Jeff Garzik
Date:   Wed Oct 4 05:32:54 2006 -0400

    [SCSI] SCSI sd: fix module init/exit error handling

    - Properly handle and unwind errors in init_sd(). Fixes leaks on
      error, if class_register() or scsi_register_driver() failed.
    - Ensure that exit_sd() execution order is the perfect inverse of
      initialization order.

    FIXME: If some-but-not-all register_blkdev() calls fail, we wind up
    calling unregister_blkdev() for block devices we did not register.
    This was a pre-existing bug.

    Signed-off-by: Jeff Garzik
    Signed-off-by: James Bottomley

commit 37e0333c68ca9cbddfc0108e1889556287563df0
Author: Jeff Garzik
Date:   Wed Oct 4 05:23:04 2006 -0400

    [SCSI] SCSI osst: add error handling to module init, sysfs

    - check all sysfs-related return codes, and propagate them back to
      callers
    - properly unwind errors in osst_probe(), init_osst(). This fixes a
      leak that occurred if scsi driver registration failed, and fixes
      an oops if sysfs creation returned an error.
    - (unrelated) kzalloc() cleanup in new_tape_buf()

    Signed-off-by: Jeff Garzik
    Signed-off-by: James Bottomley

commit de77aaff5f0178f44867f131deb5e2cb1610fe6b
Author: Henne
Date:   Wed Oct 4 10:22:09 2006 +0200

    [SCSI] scsi: remove hosts.h

    Remove the obsolete hosts.h file under drivers/scsi.

    Signed-off-by: Henrik Kretzschmar
    Signed-off-by: James Bottomley

commit c1278289363d9976c81b3b2512621fe152280e82
Author: Henne
Date:   Wed Oct 4 09:33:47 2006 +0200

    [SCSI] scsi: Scsi_Cmnd conversion in aic7xxx_old.c

    Changes the obsolete Scsi_Cmnd to struct scsi_cmnd in aic7xxx_old.c.
    Also replaces lots of whitespace with tabs in structures and
    functions which have been changed.

    Signed-off-by: Henrik Kretzschmar
    Signed-off-by: James Bottomley

commit 3bdc9d0b408e01c4e556daba0035ba37f603e920
Author: Peter Oberparleiter
Date:   Wed Oct 4 20:02:30 2006 +0200

    [S390] cio: improve unit check handling for internal operations

    Retry internal operations after unit check instead of aborting them.

    Signed-off-by: Peter Oberparleiter
    Signed-off-by: Martin Schwidefsky

commit 3230015e15d4cf48e1df80fcf70d150f490cffe6
Author: Peter Oberparleiter
Date:   Wed Oct 4 20:02:26 2006 +0200

    [S390] cio: add timeout handler for internal operations.

    Add timeout handler for common-I/O-layer-internal I/O operations.

    Signed-off-by: Peter Oberparleiter
    Signed-off-by: Martin Schwidefsky

commit 0b2b6e1ddce4696cb7afcbb15a654fe95428a498
Author: Heiko Carstens
Date:   Wed Oct 4 20:02:23 2006 +0200

    [S390] Remove open-coded mem_map usage.

    Use page_to_phys and pfn_to_page to avoid open-coded mem_map usage.

    Signed-off-by: Heiko Carstens

commit 7676bef9c183fd573822cac9992927ef596d584c
Author: Heiko Carstens
Date:   Wed Oct 4 20:02:19 2006 +0200

    [S390] Have s390 use add_active_range() and free_area_init_nodes.

    Size zones and holes in an architecture independent manner for s390.

    Signed-off-by: Heiko Carstens

commit cb601d41c175b7419efc91075a714d6a157bb0ac
Author: Heiko Carstens
Date:   Wed Oct 4 20:02:15 2006 +0200

    [S390] Remove crept in whitespace from head*.S again.

    Signed-off-by: Heiko Carstens

commit 42e47eeb8fb3f9d2abe653cc7f185816068a057d
Author: Martin Schwidefsky
Date:   Wed Oct 4 20:02:12 2006 +0200

    [S390] incorrect placement of include.

    The include of linux/smp.h needs to be done before the #if that
    checks for the compiler version. Seems like fallout from the inline
    assembly cleanup patch vs. the directed yield patch.

    Signed-off-by: Martin Schwidefsky

commit 8abfe01dae8c0ed7ca6bfb153a7fcab47df72a52
Author: Heiko Carstens
Date:   Wed Oct 4 20:02:09 2006 +0200

    [S390] Wire up sys_getcpu system call.

    Signed-off-by: Heiko Carstens

commit 4e56296d471a827fdd244cfdb6a1e62fc3af7af0
Author: Ralph Wuerthner
Date:   Wed Oct 4 20:02:05 2006 +0200

    [S390] zcrypt device registration/unregistration race.

    Fix a race condition during AP device registration and
    unregistration.

    Signed-off-by: Ralph Wuerthner
    Signed-off-by: Martin Schwidefsky

commit f1ee3281bedbbca70a1f53bc715ea6f27c616052
Author: Cornelia Huck
Date:   Wed Oct 4 20:02:02 2006 +0200

    [S390] Add timeouts during sense PGID, path verification and disband PGID.

    While the machine owes us an interrupt in these cases (and we should
    get one), reality isn't always like that...

    Signed-off-by: Cornelia Huck
    Signed-off-by: Martin Schwidefsky

commit b05e37035298148b6c311eccf06ac50fd389f0b2
Author: Martin Schwidefsky
Date:   Wed Oct 4 20:01:58 2006 +0200

    [S390] user-copy optimization fallout.

    Fix new restore_sigregs function. It copies the user space copy of
    the old psw without correcting the psw.mask and the psw.addr high
    order bit. While we are at it, simplify save_sigregs a bit.

    Signed-off-by: Martin Schwidefsky

commit aa97b102527ff94fe04930a660f897ef2bafb2a8
Author: Martin Schwidefsky
Date:   Wed Oct 4 20:01:52 2006 +0200

    [S390] update default configuration

    Signed-off-by: Martin Schwidefsky

commit 2a3681e56e825bce469d2ccf3c85741b5005e1f1
Author: Sumant Patro
Date:   Tue Oct 3 13:19:21 2006 -0700

    [SCSI] megaraid_sas: sets ioctl timeout and updates version,changelog

    This patch sets timeout of max 180 seconds for ioctl completion.
    It also updates the Changelog and hikes the version to 3.05.

    Signed-off-by: Sumant Patro
    Signed-off-by: James Bottomley

commit 5d018ad057347995e5c4564b3e43339e6497f839
Author: Sumant Patro
Date:   Tue Oct 3 13:13:18 2006 -0700

    [SCSI] megaraid_sas: adds tasklet for cmd completion

    This patch adds a tasklet for command completion.

    Signed-off-by: Sumant Patro
    Signed-off-by: James Bottomley

commit 658dcedb4e35d77f7f6552b5a640d7d82c372053
Author: Sumant Patro
Date:   Tue Oct 3 13:09:14 2006 -0700

    [SCSI] megaraid_sas: prints pending cmds before setting hw_crit_error

    This patch adds a function to print the pending frame details before
    returning failure from the reset routine. It also exposes a new
    variable megasas_dbg_lvl that allows the user to set the debug level
    for logging.

    Signed-off-by: Sumant Patro
    Signed-off-by: James Bottomley

commit b274cab779219325fd480cc696a456d1c3893bd8
Author: Sumant Patro
Date:   Tue Oct 3 12:52:12 2006 -0700

    [SCSI] megaraid_sas: function pointer for disable interrupt

    This patch adds a function pointer to invoke disable interrupt for
    xscale and ppc IOP based controllers. Removes old implementation
    that checks for controller type in megasas_disable_intr.

    Signed-off-by: Sumant Patro
    Signed-off-by: James Bottomley

commit b1df99d9434edf3fc26f9e36ee6a443e3611e829
Author: Sumant Patro
Date:   Tue Oct 3 12:40:47 2006 -0700

    [SCSI] megaraid_sas: frame count optimization

    This patch removes duplicated code in frame calculation & adds
    megasas_get_frame_count() that also takes into account the number of
    frames that can be contained in the Main frame.
    FW uses the frame count to pull sufficient number of frames from
    host memory.

    Signed-off-by: Sumant Patro
    Signed-off-by: James Bottomley

commit e3bbff9f3cf91c84c76cfdd5e80041ad1b487192
Author: Sumant Patro
Date:   Tue Oct 3 12:28:49 2006 -0700

    [SCSI] megaraid_sas: FW transition and q size changes

    This patch has the following enhancements:
    a. Handles new transition states of FW to support controller
       hotplug.
    b. Reduces by 1 the maximum cmds that the driver may send to FW.
    c. Sends "Stop Processing" cmd to FW before returning failure from
       reset routine.
    d. Adds print in megasas_transition routine.
    e. Sends "RESET" flag to FW to do a soft reset of controller to move
       from Operational to Ready state.
    f. Sends correct pointer (cmd->sense) to pci_pool_free.

    Signed-off-by: Sumant Patro
    Signed-off-by: James Bottomley

commit 2c2345c2b4fec30d12e1e1a6ee153a80af101e32
Author: Roger Gammans
Date:   Wed Oct 4 13:37:45 2006 +0200

    [PATCH] Document bi_sector and sector_t

    Signed-Off-By: Roger Gammans
    Signed-off-by: Jens Axboe

commit f583f4924d669d36de677e0cc2422ee95203d444
Author: David C Somayajulu
Date:   Wed Oct 4 08:27:25 2006 +0200

    [PATCH] helper function for retrieving scsi_cmd given host based block layer tag

    This was necessitated by the need for a function to get back to a
    scsi_cmnd, when an hba posts its (corresponding) completion
    interrupt with a block layer tag as its reference.

    Signed-off-by: Mike Christie
    Signed-off-by: David Somayajulu
    Signed-off-by: Jens Axboe

commit 4d5e392c33820dc8861423bb1b8dae205ea0ad3d
Author: Haavard Skinnemoen
Date:   Wed Oct 4 16:02:11 2006 +0200

    [PATCH] atmel_serial: Fix roundoff error in atmel_console_get_options

    The atmel_console_get_options() function initializes the baud,
    parity and bits settings from the actual hardware setup, in case it
    has been initialized by e.g. a boot loader.

    The baud rate, however, is not necessarily exactly equal to one of
    the standard baud rates (115200, etc.) This means that the baud rate
    calculated by this function may be slightly higher or slightly lower
    than one of the standard baud rates.

    If the baud rate is slightly lower than the target, this causes
    problems when uart_set_options() tries to match the detected baud
    rate against the standard baud rate, as it will always select a baud
    rate that is lower or equal to the target rate. For example if the
    detected baud rate is slightly lower than 115200,
    uart_set_options() will select 57600.

    This patch fixes the problem by subtracting 1 from the value in BRGR
    when calculating the baud rate. The detected baud rate will thus
    always be higher than the nearest standard baud rate, and
    uart_set_options() will end up doing the right thing.

    Tested on ATSTK1000 and AT91RM9200-EK boards. Both are broken
    without this patch.

    Signed-off-by: Haavard Skinnemoen
    Acked-by: Andrew Victor
    Signed-off-by: Linus Torvalds

commit c194588dba968510b5aa7a1818bd2c8b36a416f7
Author: Haavard Skinnemoen
Date:   Wed Oct 4 16:02:10 2006 +0200

    [PATCH] AVR32: Allow renumbering of serial devices

    Allow the board to remap actual USART peripheral devices to serial
    devices by calling at32_map_usart(hw_id, serial_line). This ensures
    that even though ATSTK1002 uses USART1 as the first serial port, it
    will still have a ttyS0 device.

    This also adds a board-specific early setup hook and moves the
    at32_setup_serial_console() call there from the platform code.

    Signed-off-by: Haavard Skinnemoen
    Signed-off-by: Linus Torvalds

commit acca9b83acfe89fbb7421d5412176dee2ad2959a
Author: Haavard Skinnemoen
Date:   Wed Oct 4 16:02:09 2006 +0200

    [PATCH] atmel_serial: Support AVR32

    Make CONFIG_SERIAL_ATMEL selectable on AVR32 and #ifdef out some
    ARM-specific code in the driver.

    Signed-off-by: Haavard Skinnemoen
    Acked-by: Andrew Victor
    Signed-off-by: Linus Torvalds

commit 75d35213777e2b278db57a420efbce2bdb61da93
Author: Haavard Skinnemoen
Date:   Wed Oct 4 16:02:08 2006 +0200

    [PATCH] atmel_serial: Pass fixed register mappings through platform_data

    In order to initialize the serial console early, the atmel_serial
    driver had to do a hack where it compared the physical address of
    the port with an address known to be permanently mapped, and used it
    as a virtual address. This got around the limitation that ioremap()
    isn't always available when the console is being initialized.

    This patch removes that hack and replaces it with a new "regs" field
    in struct atmel_uart_data that the board-specific code can
    initialize to a fixed virtual mapping for platform devices where
    this is possible. It also initializes the DBGU's regs field with the
    address the driver used to check against.

    On AVR32, the "regs" field is initialized from the physical base
    address when it can be accessed through a permanently 1:1 mapped
    segment, i.e. the P4 segment.

    If regs is NULL, the console initialization is delayed until the
    "real" driver is up and running and ioremap() can be used.

    Signed-off-by: Haavard Skinnemoen
    Acked-by: Andrew Victor
    Signed-off-by: Linus Torvalds

commit 71f2e2b8783f7b270b673e31e2322572057b286a
Author: Haavard Skinnemoen
Date:   Wed Oct 4 16:02:07 2006 +0200

    [PATCH] atmel_serial: Rename at91_register_uart_fns

    Rename at91_register_uart_fns and associated structs and variables
    to make it consistent with the atmel_ prefix used by the rest of the
    driver.

    Signed-off-by: Haavard Skinnemoen
    Acked-by: Andrew Victor
    Signed-off-by: Linus Torvalds

commit 9ab4f88b7ffdf338773e7755f923bc6b9e079834
Author: Haavard Skinnemoen
Date:   Wed Oct 4 16:02:06 2006 +0200

    [PATCH] serial: Rename PORT_AT91 -> PORT_ATMEL

    The at91_serial driver can be used with both AT32 and AT91 devices
    from Atmel and has therefore been renamed atmel_serial. The only
    thing left is to rename PORT_AT91 PORT_ATMEL.

    Signed-off-by: Haavard Skinnemoen
    Acked-by: Andrew Victor
    Signed-off-by: Linus Torvalds

commit 7192f92c799e4bf4943e3e233d6e4d786ac4d8a4
Author: Haavard Skinnemoen
Date:   Wed Oct 4 16:02:05 2006 +0200

    [PATCH] at91_serial -> atmel_serial: Internal names

    Prefix all internal functions and variables with atmel_ instead of
    at91_.

    The at91_register_uart_fns() stuff is left as is since I can't find
    any actual users of it.

    Signed-off-by: Haavard Skinnemoen
    Acked-by: Andrew Victor
    Signed-off-by: Linus Torvalds

commit 73e2798b0f3f4fa8ff7d3e8138027a8352359bb5
Author: Haavard Skinnemoen
Date:   Wed Oct 4 16:02:04 2006 +0200

    [PATCH] at91_serial -> atmel_serial: Public definitions

    Rename the following public definitions:
      * AT91_NR_UART -> ATMEL_MAX_UART
      * struct at91_uart_data -> struct atmel_uart_data
      * at91_default_console_device -> atmel_default_console_device

    Signed-off-by: Haavard Skinnemoen
    Acked-by: Andrew Victor
    Signed-off-by: Linus Torvalds

commit 1e8ea80219564c3433dbca7afe075ced9eb8117c
Author: Haavard Skinnemoen
Date:   Wed Oct 4 16:02:03 2006 +0200

    [PATCH] at91_serial -> atmel_serial: Platform device name

    Rename the "at91_usart" platform driver "atmel_usart" and update
    platform devices accordingly.

    Signed-off-by: Haavard Skinnemoen
    Acked-by: Andrew Victor
    Signed-off-by: Linus Torvalds

commit 749c4e60334580ee0f1427eb90ad006fecbffd21
Author: Haavard Skinnemoen
Date:   Wed Oct 4 16:02:02 2006 +0200

    [PATCH] at91_serial -> atmel_serial: Kconfig symbols

    Rename the following Kconfig symbols:
      * CONFIG_SERIAL_AT91 -> CONFIG_SERIAL_ATMEL
      * CONFIG_SERIAL_AT91_CONSOLE -> CONFIG_SERIAL_ATMEL_CONSOLE
      * CONFIG_SERIAL_AT91_TTYAT -> CONFIG_SERIAL_ATMEL_TTYAT

    Signed-off-by: Haavard Skinnemoen
    Acked-by: Andrew Victor
    Signed-off-by: Linus Torvalds

commit b6156b4e2e3b725ae3549882f59c82ab5b5048a5
Author: Haavard Skinnemoen
Date:   Wed Oct 4 16:02:01 2006 +0200

    [PATCH] at91_serial -> atmel_serial: at91_serial.c

    Rename at91_serial.c atmel_serial.c

    Signed-off-by: Haavard Skinnemoen
    Acked-by: Andrew Victor
    Signed-off-by: Linus Torvalds

commit 5b34821a601ea079184efba2f9c7c7af61241bde
Author: Haavard Skinnemoen
Date:   Wed Oct 4 16:02:00 2006 +0200

    [PATCH] at91_serial -> atmel_serial: at91rm9200_usart.h

    Move include/asm/arch/at91rm9200_usart.h into drivers/serial and
    rename it atmel_usart.h. Also delete AVR32's version of this file.

    Signed-off-by: Haavard Skinnemoen
    Acked-by: Andrew Victor
    Signed-off-by: Linus Torvalds

commit c4710e65c005339b5979fa258bf89940dc2a700b
Author: David Woodhouse
Date:   Wed Oct 4 17:32:21 2006 +0100

    [MIPS] Remove remaining reference to ite_gpio.h from Kbuild

    Signed-off-by: David Woodhouse
    Signed-off-by: Ralf Baechle

commit 7009af8cd37f7904939aec6bd2325c581abd7cac
Author: Vitaly Wool
Date:   Wed Oct 4 19:19:58 2006 +0400

    [MIPS] PNX8550 fixups

    This patch fixes the compilation errors on PNX8550 and a
    hard-to-track bug in interrupt handling. It also corresponds to the
    latest changes in the PNX8550 serial driver.

    Signed-off-by: Vitaly Wool
    Signed-off-by: Ralf Baechle

commit a0a00cbf8ae5cea3d72e28982c06f3270420c657
Author: Alan Cox
Date:   Wed Oct 4 12:47:14 2006 +0100

    [PATCH] pata: teach ali about rev C8, keep pcmcia driver in sync

    This fixes support for rev c8 of the ALi/ULi PATA, and keeps pcmcia
    in sync so ide_cs and pata_pcmcia are interchangeable, both are only
    changes to constants.

    Right now rev 0xC8 and higher don't work with libata but 0xc8 is in
    the field now.

    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

commit 11966adc33fa1504c2d9a78e6fc129e5c87bdee1
Author: Jeff Garzik
Date:   Wed Oct 4 04:41:53 2006 -0400

    [PATCH] RTC: build fixes

    Fix obvious build breakage revealed by 'make allyesconfig' in
    current -git.

    Signed-off-by: Jeff Garzik
    Signed-off-by: Linus Torvalds

commit 5b9b5572c87b460cd91f7722ac233d1873cfb084
Author: Andrew Morton
Date:   Wed Oct 4 02:17:32 2006 -0700

    [PATCH] git-powerpc: wrapper: don't require execute permissions

    If you lose the x bit (eg: by using patch(1)), powerpc won't build.
    Be defensive about it...

    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Sam Ravnborg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

commit ece7f77b86e53bfe14699fdbcb0f03fdad0a01d6
Author: Adrian Bunk
Date:   Wed Oct 4 02:17:31 2006 -0700

    [PATCH] kill sound/oss/*_syms.c

    Move all EXPORT_SYMBOL's from sound/oss/*_syms.c to the files with
    the actual functions.
Signed-off-by: Adrian Bunk Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit d56b9b9c464a10ab1ee51a4c6190a2b57b8ef7a6 Author: Adrian Bunk Date: Wed Oct 4 02:17:22 2006 -0700 [PATCH] The scheduled removal of some OSS drivers This patch contains the scheduled removal of OSS drivers that: - have ALSA drivers for the same hardware without known regressions and - whose Kconfig options have been removed in 2.6.17. [michal.k.k.piotrowski@gmail.com: build fix] Signed-off-by: Adrian Bunk Signed-off-by: Michal Piotrowski Cc: David Woodhouse Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 595182bcdf64fbfd7ae22c67ea6081b7d387d246 Author: Josh Triplett Date: Wed Oct 4 02:17:21 2006 -0700 [PATCH] RCU: CREDITS and MAINTAINERS Add MAINTAINERS entry for Read-Copy Update (RCU), listing Dipankar Sarma as maintainer, and giving the URL for Paul McKenney's RCU site. Add MAINTAINERS entry for rcutorture, listing myself as maintainer. Add CREDITS entries for developers of RCU, RCU variants, and rcutorture. Use Paul McKenney's preferred email address in include/linux/rcupdate.h . Signed-off-by: Josh Triplett Cc: Paul McKenney Cc: Dipankar Sarma Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 20e9751bd9dd6b832fd84ada27840360f7e877f1 Author: Oleg Nesterov Date: Wed Oct 4 02:17:17 2006 -0700 [PATCH] rcu: simplify/improve batch tuning Kill a hard-to-calculate 'rsinterval' boot parameter and per-cpu rcu_data.last_rs_qlen. Instead, it adds a flag rcu_ctrlblk.signaled, which records the fact that one of the CPUs has sent a resched IPI since the last rcu_start_batch(). Roughly speaking, we need two rcu_start_batch()s in order to move callbacks from ->nxtlist to ->donelist. This means that when ->qlen exceeds qhimark and continues to grow, we should send a resched IPI, and then do it again after we have gone through a quiescent state.
On the other hand, if it was already sent, we don't need to do it again when another CPU detects overflow of the queue. Signed-off-by: Oleg Nesterov Acked-by: Paul E. McKenney Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 4b6c2cca6eef9cc4a15350bf1c61839e12e08b84 Author: Josh Triplett Date: Wed Oct 4 02:17:16 2006 -0700 [PATCH] rcu: add sched torture type to rcutorture Implement torture testing for the "sched" variant of RCU, which uses preempt_disable, preempt_enable, and synchronize_sched. Signed-off-by: Josh Triplett Acked-by: Paul E. McKenney Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 11a147013e39ff4cb031395cb78a9d307c4799cd Author: Josh Triplett Date: Wed Oct 4 02:17:16 2006 -0700 [PATCH] rcu: add rcu_bh_sync torture type to rcutorture Use the newly-generic synchronous deferred free function to implement torture testing for rcu_bh using synchronize_rcu_bh rather than the asynchronous call_rcu_bh. Signed-off-by: Josh Triplett Acked-by: Paul E. McKenney Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 20d2e4283a97665a3db78c60dfa342a0c7c1b180 Author: Josh Triplett Date: Wed Oct 4 02:17:15 2006 -0700 [PATCH] rcu: add rcu_sync torture type to rcutorture Use the newly-generic synchronous deferred free function to implement torture testing for RCU using synchronize_rcu rather than the asynchronous call_rcu. Signed-off-by: Josh Triplett Acked-by: Paul E. McKenney Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit e3033736581f125ba5fd6c0760e0d430d54fb5c0 Author: Josh Triplett Date: Wed Oct 4 02:17:14 2006 -0700 [PATCH] rcu: refactor srcu_torture_deferred_free to work for any implementation Make srcu_torture_deferred_free use cur_ops->sync() so it will work for any implementation. Move and rename it in preparation for use in the ops of other implementations. Signed-off-by: Josh Triplett Acked-by: Paul E. 
McKenney Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit b772e1dd4b1e60a7a160f7bd4ea08e28394ceb54 Author: Josh Triplett Date: Wed Oct 4 02:17:13 2006 -0700 [PATCH] RCU: add fake writers to rcutorture rcutorture currently has one writer and an arbitrary number of readers. To better exercise some of the code paths in RCU implementations, add fake writer threads which call the synchronize function for the RCU variant in a loop, with a delay between calls to arrange for different numbers of writers running in parallel. [bunk@stusta.de: cleanup] Acked-by: Paul McKenney Cc: Dipankar Sarma Signed-off-by: Josh Triplett Signed-off-by: Adrian Bunk Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 75cfef32f26d03f5d0a0833572d52f94ad858a36 Author: Josh Triplett Date: Wed Oct 4 02:17:12 2006 -0700 [PATCH] rcu: Fix sign bug making rcu_random always return the same sequence rcu_random uses a counter rrs_count to occasionally mix data from get_random_bytes into the state of its pseudorandom generator. However, rrs_count gets declared as an unsigned long, and rcu_random checks for --rrs_count < 0, so this code will never mix any real random data into the state, and will thus always return the same sequence of random numbers. Also, change the return value of rcu_random from long to unsigned long, to avoid potential issues caused by the use of the % operator, which can return negative values for negative left operands. Signed-off-by: Josh Triplett Acked-by: Paul E. McKenney Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 2860aaba4dc87fa43c08724434b87a8650f3bff5 Author: Josh Triplett Date: Wed Oct 4 02:17:11 2006 -0700 [PATCH] rcu: Avoid kthread_stop on invalid pointer if rcutorture reader startup fails rcu_torture_init kmallocs the array of reader threads, then creates each one with kthread_run, cleaning up with rcu_torture_cleanup if this fails.
rcu_torture_cleanup calls kthread_stop on any non-NULL pointer in the array; however, any readers after the one that failed to start up will have invalid pointers, not null pointers. Avoid this by using kzalloc instead. Signed-off-by: Josh Triplett Acked-by: Paul E. McKenney Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 3c29e03d9121e07714fb9e5303d9b026800ffd5a Author: Josh Triplett Date: Wed Oct 4 02:17:10 2006 -0700 [PATCH] rcu: Mention rcu_bh in description of rcutorture's torture_type parameter The comment for rcutorture's torture_type parameter only lists the RCU variants rcu and srcu, but not rcu_bh; add rcu_bh to the list. Signed-off-by: Josh Triplett Acked-by: Paul E. McKenney Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit ff2c93a5373f12f86f3281705d11278a9f2334e2 Author: Josh Triplett Date: Wed Oct 4 02:17:09 2006 -0700 [PATCH] rcu: Add MODULE_AUTHOR to rcutorture module Signed-off-by: Josh Triplett Acked-by: "Paul E. McKenney" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit b4dfdbb3c707474a2254c5b4d7e62be31a4b7da9 Author: Alan Stern Date: Wed Oct 4 02:17:06 2006 -0700 [PATCH] cpufreq: make the transition_notifier chain use SRCU This patch (as762) changes the cpufreq_transition_notifier_list from a blocking_notifier_head to an srcu_notifier_head. This will prevent errors caused by attempting to call down_read() to access the notifier chain at a time when interrupts must remain disabled, during system suspend. It's not clear to me whether this is really necessary; perhaps the chain could be made into an atomic_notifier. However a couple of the callout routines do use blocking operations, so this approach seems safer. The head of the notifier chain needs to be initialized before use; this is done by an __init routine at core_initcall time. If this turns out not to be a good choice, it can easily be changed. Signed-off-by: Alan Stern Cc: "Paul E.
McKenney" Cc: Jesse Brandeburg Cc: Dave Jones Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit e6a92013ba458804161c0c5b6d134d82204dc233 Author: Alan Stern Date: Wed Oct 4 02:17:05 2006 -0700 [PATCH] SRCU: report out-of-memory errors Currently the init_srcu_struct() routine has no way to report out-of-memory errors. This patch (as761) makes it return -ENOMEM when the per-cpu data allocation fails. The patch also makes srcu_init_notifier_head() report a BUG if a notifier head can't be initialized. Perhaps it should return -ENOMEM instead, but in the most likely cases where this might occur I don't think any recovery is possible. Notifier chains generally are not created dynamically. [akpm@osdl.org: avoid statement-with-side-effect in macro] Signed-off-by: Alan Stern Acked-by: Paul E. McKenney Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit eabc069401bcf45bcc3f19e643017bf761780aa8 Author: Alan Stern Date: Wed Oct 4 02:17:04 2006 -0700 [PATCH] Add SRCU-based notifier chains This patch (as751) adds a new type of notifier chain, based on the SRCU (Sleepable Read-Copy Update) primitives recently added to the kernel. An SRCU notifier chain is much like a blocking notifier chain, in that it must be called in process context and its callout routines are allowed to sleep. The difference is that the chain's links are protected by the SRCU mechanism rather than by an rw-semaphore, so calling the chain has extremely low overhead: no memory barriers and no cache-line bouncing. On the other hand, unregistering from the chain is expensive and the chain head requires special runtime initialization (plus cleanup if it is to be deallocated). SRCU notifiers are appropriate for notifiers that will be called very frequently and for which unregistration occurs very seldom. The proposed "task notifier" scheme qualifies, as may some of the network notifiers. Signed-off-by: Alan Stern Acked-by: Paul E. 
McKenney Acked-by: Chandra Seetharaman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit b2896d2e75c87ea6a842c088db730b03c91db737 Author: Paul E. McKenney Date: Wed Oct 4 02:17:03 2006 -0700 [PATCH] srcu-3: add SRCU operations to rcutorture Adds SRCU operations to rcutorture and updates rcutorture documentation. Also increases the stress imposed by the rcutorture test. [bunk@stusta.de: make needlessly global code static] Signed-off-by: Paul E. McKenney Cc: Paul E. McKenney Signed-off-by: Adrian Bunk Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 621934ee7ed5b073c7fd638b347e632c53572761 Author: Paul E. McKenney Date: Wed Oct 4 02:17:02 2006 -0700 [PATCH] srcu-3: RCU variant permitting read-side blocking Updated patch adding a variant of RCU that permits sleeping in read-side critical sections. SRCU is as follows: o Each use of SRCU creates its own srcu_struct, and each srcu_struct has its own set of grace periods. This is critical, as it prevents one subsystem with a blocking reader from holding up SRCU grace periods for other subsystems. o The SRCU primitives (srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu()) all take a pointer to a srcu_struct. o The SRCU primitives must be called from process context. o srcu_read_lock() returns an int that must be passed to the matching srcu_read_unlock(). Realtime RCU avoids the need for this by storing the state in the task struct, but SRCU needs to allow a given code path to pass through multiple SRCU domains -- storing state in the task struct would therefore require either arbitrary space in the task struct or arbitrary limits on SRCU nesting. So I kicked the state-storage problem up to the caller. Of course, it is not permitted to call synchronize_srcu() while in an SRCU read-side critical section. o There is no call_srcu(). It would not be hard to implement one, but it seems like too easy a way to OOM the system. 
(Hey, we have enough trouble with call_rcu(), which does -not- permit readers to sleep!!!) So, if you want it, please tell me why... [josht@us.ibm.com: sparse notation] Signed-off-by: Paul E. McKenney Signed-off-by: Josh Triplett Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 95d77884c77beed676036d2f74d10b470a483c63 Author: Eric W. Biederman Date: Wed Oct 4 02:17:01 2006 -0700 [PATCH] htirq: tidy up the htirq code This moves the declarations for the architecture helpers into include/linux/htirq.h from the generic include/linux/pci.h. Hopefully this will make this distinction clearer. htirq.h is included where it is needed. The dependency on the msi code is fixed and removed. The Makefile is tidied up. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Tony Luck Cc: Andi Kleen Cc: Thomas Gleixner Cc: Greg KH Cc: Benjamin Herrenschmidt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 03571e11c4a6d08230657f80970f0a5cc7820471 Author: Eric W. Biederman Date: Wed Oct 4 02:17:00 2006 -0700 [PATCH] msi: move the ia64 code into arch/ia64 This is just a few makefile tweaks and some file renames. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Tony Luck Cc: Andi Kleen Cc: Thomas Gleixner Cc: Greg KH Cc: Benjamin Herrenschmidt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 3b7d1921f4cdd6d6ddb7899ae7a8d413991c5cf4 Author: Eric W. Biederman Date: Wed Oct 4 02:16:59 2006 -0700 [PATCH] msi: refactor and move the msi irq_chip into the arch code It turns out msi_ops was simply not enough to abstract the architecture specific details of msi. So I have moved the responsibility of constructing the struct irq_chip to the architectures, and have two architecture specific functions arch_setup_msi_irq, and arch_teardown_msi_irq. For simple architectures those functions can do all of the work. For architectures with platform dependencies they can call into the appropriate platform code.
With this msi.c is finally free of assuming you have an apic, and this actually takes less code. The helpers for the architecture specific code are declared in linux/msi.h to keep them separate from the msi functions used by drivers in linux/pci.h. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Tony Luck Cc: Andi Kleen Cc: Thomas Gleixner Cc: Greg KH Cc: Benjamin Herrenschmidt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 277bc33bc2479707e88b0b2ae6fe56e8e4aabe81 Author: Eric W. Biederman Date: Wed Oct 4 02:16:57 2006 -0700 [PATCH] msi: only use a single irq_chip for msi interrupts The logic works like this. Since we no longer track the state logic by hand in msi.c, startup and shutdown are no longer needed. By updating msi_set_mask_bit to work on msi devices that do not implement a mask bit we can always call the mask/unmask functions. What we really have are mask and unmask so we use them to implement the .mask and .unmask functions instead of .enable and .disable. By switching to the handle_edge_irq handler we only need an ack function that moves the irq if necessary. This removes the old end and ack functions and their peculiar logic of sometimes disabling an irq. This removes the reliance on pre genirq irq handling methods. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Tony Luck Cc: Andi Kleen Cc: Thomas Gleixner Cc: Greg KH Cc: Benjamin Herrenschmidt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 1f80025e624bb14fefadfef7e80fbfb9740d4714 Author: Eric W. Biederman Date: Wed Oct 4 02:16:56 2006 -0700 [PATCH] msi: simplify msi sanity checks by adding with generic irq code Currently msi.c is doing sanity checks that make certain before an irq is destroyed it has no more users. By adding irq_has_action I can perform the test in a generic way, instead of relying on a msi specific data structure.
By performing the core check in dynamic_irq_cleanup I ensure every user of dynamic irqs has a test present and we don't free resources that are in use. In msi.c this allows me to kill the attrib.state member of msi_desc and all of the associated code to maintain it. To keep from freeing data structures when irq cleanup code is called too soon, changing dynamic_irq_cleanup is insufficient because there are msi specific data structures that are also not safe to free. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Tony Luck Cc: Andi Kleen Cc: Thomas Gleixner Cc: Greg KH Cc: Benjamin Herrenschmidt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 8b955b0dddb35e398b07e217a81f8bd49400796f Author: Eric W. Biederman Date: Wed Oct 4 02:16:55 2006 -0700 [PATCH] Initial generic hypertransport interrupt support This patch implements two functions ht_create_irq and ht_destroy_irq for use by drivers. Several other functions are implemented as helpers for arch specific irq_chip handlers. The driver for the card I tested this on isn't yet ready to be merged. However this code is and hypertransport irqs are in use in a few other places in the kernel. Not that any of this will get merged before 2.6.19. Because the ipath-ht400 is slightly out of spec this code will need to be generalized to work there. I think all of the powerpc uses are for a plain interrupt controller in a chipset so support for native hypertransport devices is a little less interesting. However I think this is a half way decent model on how to separate arch specific and generic helper code, and I think this is a functional model of how to get the architecture dependencies out of the msi code. [akpm@osdl.org: Kconfig fix] Signed-off-by: Eric W. Biederman Cc: Greg KH Cc: Andi Kleen Cc: Benjamin Herrenschmidt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit e78d01693be38bf93dd6bb49b86e143da450de86 Author: Eric W.
Biederman Date: Wed Oct 4 02:16:54 2006 -0700 [PATCH] Add Hypertransport capability defines This adds defines for the hypertransport capability subtypes and starts using them a little. [akpm@osdl.org: fix typo] Signed-off-by: Eric W. Biederman Acked-by: Benjamin Herrenschmidt Cc: Greg KH Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit cd1182f56a064d42d10e289ef4018f9c2230247d Author: Eric W. Biederman Date: Wed Oct 4 02:16:53 2006 -0700 [PATCH] genirq: x86_64 irq: Kill irq compression With more irqs in the system we don't need this. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit f023d764cc6165eb4f1cad6b2b0882ce0660764a Author: Eric W. Biederman Date: Wed Oct 4 02:16:52 2006 -0700 [PATCH] genirq: x86_64 irq: Kill gsi_irq_sharing After raising the number of irqs the system supports this function is no longer necessary. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 550f2299ac8ffaba943cf211380d3a8d3fa75301 Author: Eric W. Biederman Date: Wed Oct 4 02:16:51 2006 -0700 [PATCH] genirq: x86_64 irq: make vector_irq per cpu This refactors the irq handling code to make the vectors a per cpu resource so the same vector number can be simultaneously used on multiple cpus for different irqs. This should make systems that were hitting limits on the total number of irqs much more livable. [akpm@osdl.org: build fix] [akpm@osdl.org: __target_IO_APIC_irq is unneeded on UP] Signed-off-by: Eric W. 
Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit e500f57436b9056a245216c53113613928155eba Author: Eric W. Biederman Date: Wed Oct 4 02:16:50 2006 -0700 [PATCH] genirq: x86_64 irq: Make the external irq handlers report their vector, not the irq number This is a small pessimization but it paves the way for making this information per cpu. Which allows the maximum number of IRQS to become NR_CPUS*224. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 23d0b8b053391afe15c9667d80de77ca88e18b8b Author: Eric W. Biederman Date: Wed Oct 4 02:16:49 2006 -0700 [PATCH] genirq: irq: generalize the check for HARDIRQ_BITS This patch adds support for systems that cannot receive every interrupt on a single cpu simultaneously, in the check to see if we have enough HARDIRQ_BITS. MAX_HARDIRQS_PER_CPU becomes the count of the maximum number of hardware generated interrupts per cpu. On architectures that support per cpu interrupt delivery this can be a significant space savings and scalability bonus. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 323a01c50832749d23664954f91df6fc43e73975 Author: Eric W. Biederman Date: Wed Oct 4 02:16:48 2006 -0700 [PATCH] genirq: irq: remove msi hacks Because of the nasty way that CONFIG_PCI_MSI was implemented we wound up with set_irq_info and set_native_irq_info, with move_irq and move_native_irq.
Both functions did the same thing but they were built and called under different circumstances. Now that the msi hacks are gone we can kill move_irq and set_irq_info. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit ace80ab796ae30d2c9ee8a84ab6f608a61f8b87b Author: Eric W. Biederman Date: Wed Oct 4 02:16:47 2006 -0700 [PATCH] genirq: i386 irq: Remove the msi assumption that irq == vector This patch removes the change in behavior of the irq allocation code when CONFIG_PCI_MSI is defined, removing all instances of the assumption that irq == vector. create_irq is rewritten to first allocate a free irq and then to assign that irq a vector. assign_irq_vector is made static and the AUTO_ASSIGN case which allocates a vector not bound to an irq is removed. The ioapic vector methods are removed, and everything now works with irqs. The definition of NR_IRQS no longer depends on CONFIG_PCI_MSI. [akpm@osdl.org: cleanup] Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 04b9267b15206fc902a18de1f78de6c82ca47716 Author: Eric W. Biederman Date: Wed Oct 4 02:16:46 2006 -0700 [PATCH] genirq: x86_64 irq: Remove the msi assumption that irq == vector This patch removes the change in behavior of the irq allocation code when CONFIG_PCI_MSI is defined, removing all instances of the assumption that irq == vector. create_irq is rewritten to first allocate a free irq and then to assign that irq a vector. assign_irq_vector is made static and the AUTO_ASSIGN case which allocates a vector not bound to an irq is removed. The ioapic vector methods are removed, and everything now works with irqs.
The definition of NR_IRQS no longer depends on CONFIG_PCI_MSI. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 4b2fabb9ec9b3b1cf5cf848a678058fb20c4d552 Author: Eric W. Biederman Date: Wed Oct 4 02:16:45 2006 -0700 [PATCH] genirq: msi: only build msi-apic.c on ia64 After the previous changes ia64 is the only architecture using msi-apic.c. [akpm@osdl.org: unbreak MSI on ia64] Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 2d3fcc1c54df2f49674e1f7c99d4800ed1d51902 Author: Eric W. Biederman Date: Wed Oct 4 02:16:43 2006 -0700 [PATCH] genirq: i386 irq: Move msi message composition into io_apic.c This removes the hardcoded assumption that irq == vector in the msi composition code, and it allows the msi message composition to setup logical mode, or lowest priority delivery mode as we do for other apic interrupts, and with the same selection criteria. Basically this moves the problem of what is in the msi message into the architecture irq management code where it belongs, not in a generic layer that doesn't have enough information to compose msi messages properly. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 589e367f9b9117b3412da0d4e10ea6882db8da84 Author: Eric W.
Biederman Date: Wed Oct 4 02:16:42 2006 -0700 [PATCH] genirq: x86_64 irq: Move msi message composition into io_apic.c This removes the hardcoded assumption that irq == vector in the msi composition code, and it allows the msi message composition to setup logical mode, or lowest priority delivery mode as we do for other apic interrupts, and with the same selection criteria. Basically this moves the problem of what is in the msi message into the architecture irq management code where it belongs, not in a generic layer that doesn't have enough information to compose msi messages properly. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 1ce03373a7f4b5fa8ca5be02ff35229800a6e12b Author: Eric W. Biederman Date: Wed Oct 4 02:16:41 2006 -0700 [PATCH] genirq: msi: make the msi code irq based and not vector based The msi code currently allocates irqs backwards. First it allocates a platform dependent routing value for an interrupt, the ``vector'', and then it figures out from the vector which irq you are on. For ia64 this is fine. For x86 and x86_64 this is complete nonsense and makes an enormous mess of the irq handling code and prevents some pretty significant cleanups in the code for handling large numbers of irqs. This patch refactors msi.c to work in terms of irqs and create_irq/destroy_irq for dynamically managing irqs. Hopefully this is finally a version of msi.c that is useful on more than just x86 derivatives. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit c4fa0bbf384496ae4acc0a150719d9d8fa8d11b3 Author: Eric W.
Biederman Date: Wed Oct 4 02:16:40 2006 -0700 [PATCH] genirq: x86_64 irq: Dynamic irq support The current implementation of create_irq() is a hack but it is the current hack that msi.c uses, and unfortunately the ``generic'' apic msi ops depend on this hack. Thus we are stuck with this hack of assuming irq == vector until the dependencies in the generic irq code are removed. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 3fc471ede99579211c44b6a64829c4318976990f Author: Eric W. Biederman Date: Wed Oct 4 02:16:39 2006 -0700 [PATCH] genirq: i386 irq: Dynamic irq support The current implementation of create_irq() is a hack but it is the current hack that msi.c uses, and unfortunately the ``generic'' apic msi ops depend on this hack. Thus we are stuck with this hack of assuming irq == vector until the dependencies in the generic msi code are removed. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit b6cf2583ba026ca563ff8b15805fcf30b8e192a7 Author: Eric W. Biederman Date: Wed Oct 4 02:16:38 2006 -0700 [PATCH] genirq: ia64 irq: Dynamic irq support [akpm@osdl.org: build fix] Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 3a16d713626735f3016da0521b7bf251cd78e836 Author: Eric W. Biederman Date: Wed Oct 4 02:16:37 2006 -0700 [PATCH] genirq: irq: add a dynamic irq creation API With the msi support comes a new concept in irq handling, irqs that are created dynamically at run time. Currently the msi code allocates irqs backwards.
First it allocates a platform dependent routing value for an interrupt, the ``vector'', and then it figures out from the vector which irq you are on. This msi backwards allocator suffers from two basic problems. The allocator suffers because it is trying to do something that is architecture specific in a generic way, making it brittle, inflexible, and tied too tightly to the architecture implementation. The allocator also suffers from its very backwards nature as it has tied things together that should have no dependencies. To solve the basic dynamic irq allocation problem two new architecture specific functions are added: create_irq and destroy_irq. create_irq takes no input and returns an unused irq number that won't be reused until it is returned to the free pool with destroy_irq. The irq then can be used for any purpose although the only initial consumer is the msi code. destroy_irq takes an irq number allocated with create_irq and returns it to the free pool. Making this functionality per architecture increases the simplicity of the irq allocation code and increases its flexibility. dynamic_irq_init() and dynamic_irq_cleanup() are added to automate the irq_desc initialization that should happen for dynamic irqs. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 92db6d10bc1bc43330a4c540fa5b64c83d9d865f Author: Eric W. Biederman Date: Wed Oct 4 02:16:35 2006 -0700 [PATCH] genirq: msi: simplify the msi irq limit policy Currently we attempt to predict how many irqs we will be able to allocate with msi using pci_vector_resources and some complicated accounting, and then we only allow each device as many irqs as we think are available on average.
Only the s2io driver even takes advantage of this feature; all other drivers have a fixed number of irqs they need and bail if they can't get them. pci_vector_resources is inaccurate if anyone ever frees an irq. The whole implementation is racy. The current irq limit policy does not appear to make sense with current drivers. So I have simplified things. We can revisit this when we need a more sophisticated policy. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 38bc0361303535c86f6b67b151a541728d7bdae6 Author: Eric W. Biederman Date: Wed Oct 4 02:16:34 2006 -0700 [PATCH] genirq: msi: refactor the msi_ops The current msi_ops are short sighted in a number of ways; this patch attempts to fix the glaring deficiencies. - Report in msi_ops if a 64bit address is needed in the msi message, so we can fail 32bit only msi structures. - Send and receive a full struct msi_msg in both setup and target. This is a little cleaner and allows for architectures that need to modify the data to retarget the msi interrupt to a different cpu. - In target pass in the full cpu mask instead of just the first cpu in case we can make use of the full cpu mask. - Operate in terms of irqs and not vectors, currently there is still a 1-1 relationship but on architectures other than ia64 I expect this will change. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 0366f8f7137deb072991e4c50769c6da31f8940c Author: Eric W.
Biederman Date: Wed Oct 4 02:16:33 2006 -0700 [PATCH] genirq: msi: implement helper functions read_msi_msg and write_msi_msg In support of this I also add a struct msi_msg that captures the two address fields and one data field in a typical msi message, and I remember the pos and if the address is 64bit in struct msi_desc. This makes the code a little more readable and easier to maintain, and paves the way to further simplifications. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit dd159eeca971d594fa30176733b66d37acda82a3 Author: Eric W. Biederman Date: Wed Oct 4 02:16:32 2006 -0700 [PATCH] genirq: msi: make the msi boolean tests return either 0 or 1 This allows the output of the msi tests to be stored directly in a bit field. If you don't do this a value greater than one can be truncated to 0, changing true to false with bizarre consequences. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 7bd007e480672c99d8656c7b7b12ef0549432c37 Author: Eric W. Biederman Date: Wed Oct 4 02:16:31 2006 -0700 [PATCH] genirq: msi: simplify msi enable and disable The problem. Because the disable routines leave the msi interrupts in all sorts of half enabled states the enable routines become impossible to implement correctly, and almost impossible to understand. Simplifying this allows me to simply kill the buggy reroute_msix_table, and generally makes the code more maintainable. Signed-off-by: Eric W.
Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Cc: Rajesh Shah Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 0be6652f1e61b647f738eb25af057bf9551a9841 Author: Eric W. Biederman Date: Wed Oct 4 02:16:30 2006 -0700 [PATCH] genirq: x86_64 irq: Reenable migrating irqs to other cpus In the latest changes the code for migrating x86_64 irqs was dropped. This re-adds it in a fashion that will work even if we change the vector on level triggered irqs when we migrate them. [akpm@osdl.org: build fix] Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit e7b946e98a456077dd6897f726f3d6197bd7e3b9 Author: Eric W. Biederman Date: Wed Oct 4 02:16:29 2006 -0700 [PATCH] genirq: irq: add moved_masked_irq Currently move_native_irq disables and reenables the irq we are migrating to ensure we don't take that irq when we are actually doing the migration operation. Disabling the irq needs to happen but sometimes doing the work in move_native_irq is too late. On x86 with ioapics the irq move sequence needs to be: edge_triggered: mask irq. move irq. unmask irq. ack irq. level_triggered: mask irq. ack irq. move irq. unmask irq. We can easily perform the edge triggered sequence, with the current definition of move_native_irq. However the level triggered case does not map well. For that I have added move_masked_irq, to allow me to disable the irqs around both the ack and the move. Q: Why have we not seen this problem earlier? A: The only symptom I have been able to reproduce is that if we change the vector before acknowledging an irq the wrong irq is acknowledged. Since we currently are not reprogramming the irq vector during migration, no problems show up.
We have to mask the irq before we acknowledge the irq or else we could hit a window where an irq is asserted just before we acknowledge it. Edge triggered irqs do not have this problem because acknowledgements do not propagate in the same way. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit a24ceab4f44f21749aa0b6bd38bee37c775e036f Author: Eric W. Biederman Date: Wed Oct 4 02:16:27 2006 -0700 [PATCH] genirq: irq: convert the move_irq flag from a 32bit word to a single bit The primary aim of this patchset is to remove maintenance problems caused by the irq infrastructure. The two big issues I address are an artificially small cap on the number of irqs, and that MSI assumes vector == irq. My primary focus is on x86_64 but I have touched other architectures where necessary to keep them from breaking. - To increase the number of irqs I modify the code to look at the (cpu, vector) pair instead of just looking at the vector. With a large number of irqs available, systems with a large irq count no longer need to compress their irq numbers to fit, removing a lot of brittle special cases. For acpi guys the result is that irq == gsi. - Addressing the fact that MSI assumes irq == vector takes a few more patches. But suffice it to say when I am done none of the generic irq code even knows what a vector is. In quick testing on a large Unisys x86_64 machine we stumbled over at least one driver that assumed that NR_IRQS could always fit into an 8 bit number. This driver is clearly buggy today. But this has become a class of bugs that it is now much easier to hit. This patch: This is a minor space optimization.
In practice I don't think this has any effect because of our alignment constraints and the other fields but there is no point in chewing up an unnecessary word and since we already read the flag field this should improve the cache hit ratio of the irq handler. Signed-off-by: Eric W. Biederman Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Rajesh Shah Cc: Andi Kleen Cc: "Protasevich, Natalie" Cc: "Luck, Tony" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit f5b9ed7acdcfea4bf73a70dececa7483787503ed Author: Ingo Molnar Date: Wed Oct 4 02:16:26 2006 -0700 [PATCH] genirq: convert the i386 architecture to irq-chips This patch converts all the i386 PIC controllers (except VisWS and Voyager, which I could not test - but which should still work as old-style IRQ layers) to the new and simpler irq-chip interrupt handling layer. [akpm@osdl.org: build fix] [mingo@elte.hu: enable fasteoi handler for i386 level-triggered IO-APIC irqs] Signed-off-by: Ingo Molnar Signed-off-by: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Roland Dreier Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit f29bd1ba68c8c6a0f50bd678bbd5a26674018f7c Author: Ingo Molnar Date: Wed Oct 4 02:16:25 2006 -0700 [PATCH] genirq: convert the x86_64 architecture to irq-chips This patch converts all the x86_64 PIC controller layers to the new and simpler irq-chip interrupt handling layer.
[mingo@elte.hu: The patch also enables the fasteoi handler for x86_64] Signed-off-by: Ingo Molnar Signed-off-by: Thomas Gleixner Cc: Benjamin Herrenschmidt Cc: Roland Dreier Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 0271eb947db2704a0ff8be68d72915ab021d1ead Author: Andrew Morton Date: Wed Oct 4 02:16:24 2006 -0700 [PATCH] fbdev: riva warning fix drivers/video/riva/fbdev.c: In function `riva_get_EDID_OF': drivers/video/riva/fbdev.c:1846: warning: assignment discards qualifiers from pointer target type This code is being bad: copying a pointer to read-only OF data into a non-const pointer. Cc: Paul Mackerras Cc: Benjamin Herrenschmidt Cc: "Antonino A. Daplas" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 237fead619984cc48818fe12ee0ceada3f55b012 Author: Michael Halcrow Date: Wed Oct 4 02:16:22 2006 -0700 [PATCH] ecryptfs: fs/Makefile and fs/Kconfig eCryptfs is a stacked cryptographic filesystem for Linux. It is derived from Erez Zadok's Cryptfs, implemented through the FiST framework for generating stacked filesystems. eCryptfs extends Cryptfs to provide advanced key management and policy features. eCryptfs stores cryptographic metadata in the header of each file written, so that encrypted files can be copied between hosts; the file will be decryptable with the proper key, and there is no need to keep track of any additional information aside from what is already in the encrypted file itself. 
[akpm@osdl.org: updates for ongoing API changes] [bunk@stusta.de: cleanups] [akpm@osdl.org: alpha build fix] [akpm@osdl.org: cleanups] [tytso@mit.edu: inode-diet updates] [pbadari@us.ibm.com: generic_file_*_read/write() interface updates] [rdunlap@xenotime.net: printk format fixes] [akpm@osdl.org: make slab creation and teardown table-driven] Signed-off-by: Phillip Hellewell Signed-off-by: Michael Halcrow Signed-off-by: Erez Zadok Signed-off-by: Adrian Bunk Signed-off-by: Stephan Mueller Signed-off-by: "Theodore Ts'o" Signed-off-by: Badari Pulavarty Signed-off-by: Randy Dunlap Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit f7aa2638f288f4c67acdb55947472740bd27d27a Author: Cedric Le Goater Date: Wed Oct 4 02:16:21 2006 -0700 [PATCH] Fix linux/nfsd/const.h for make headers_check make headers_check fails on linux/nfsd/const.h. Since linux/sunrpc/msg_prot.h does not seem to export anything interesting for userspace, this patch moves it in the __KERNEL__ protected section. Signed-off-by: Cedric Le Goater Cc: David Woodhouse Cc: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 42ca09938157105c1f573c831a35e9c3e02eb354 Author: J.Bruce Fields Date: Wed Oct 4 02:16:20 2006 -0700 [PATCH] knfsd: nfsd4: actually use all the pieces to implement referrals Use all the pieces set up so far to implement referral support, allowing return of NFS4ERR_MOVED and fs_locations attribute. Signed-off-by: Manoj Naik Signed-off-by: Fred Isaman Signed-off-by: J. Bruce Fields Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 81c3f4130202a1dcb2b28ab56684eb5e9d43d8c1 Author: J.Bruce Fields Date: Wed Oct 4 02:16:19 2006 -0700 [PATCH] knfsd: nfsd4: xdr encoding for fs_locations Encode fs_locations attribute. Signed-off-by: Manoj Naik Signed-off-by: Fred Isaman Signed-off-by: J. 
Bruce Fields Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 933469190ed5915b0568bc564346bb8db718f460 Author: Manoj Naik Date: Wed Oct 4 02:16:18 2006 -0700 [PATCH] knfsd: nfsd4: fslocations data structures Define FS locations structures, some functions to manipulate them, and add code to parse FS locations in downcall and add to the exports structure. [bfields@fieldses.org: bunch of fixes and cleanups] Signed-off-by: Manoj Naik Signed-off-by: Fred Isaman Signed-off-by: J. Bruce Fields Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit b009a873de05c6e0d7613df3584b6dcb2e4280ee Author: J.Bruce Fields Date: Wed Oct 4 02:16:17 2006 -0700 [PATCH] knfsd: nfsd: store export path in export Store the export path in the svc_export structure instead of storing only the dentry. This will prevent the need for additional d_path calls to provide NFSv4 fs_locations support. Signed-off-by: J. Bruce Fields Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 21c0d8fdd95024ffa708a938099148b8f1078d46 Author: NeilBrown Date: Wed Oct 4 02:16:16 2006 -0700 [PATCH] knfsd: close a race-opportunity in d_splice_alias There is a possible race in d_splice_alias. Though __d_find_alias(inode, 1) will only return a dentry with DCACHE_DISCONNECTED set, it is possible for it to get cleared before the BUG_ON, and it is not possible to lock against that. There are a couple of problems here. Firstly, the code doesn't match the comment. The comment describes a 'disconnected' dentry as being IS_ROOT as well as DCACHE_DISCONNECTED, however there is no testing of IS_ROOT anywhere. A dentry is marked DCACHE_DISCONNECTED when allocated with d_alloc_anon, and remains DCACHE_DISCONNECTED while a path is built up towards the root. So a dentry can have a valid name and a valid parent and even grandparent, but will still be DCACHE_DISCONNECTED until a path to the root is created.
Once the path to the root is complete, everything in the path gets DCACHE_DISCONNECTED cleared. So the fact that DCACHE_DISCONNECTED is set isn't enough to say that a dentry is free to be spliced in with a given name. This can only be allowed if the dentry does not yet have a name, so the IS_ROOT test is needed too. However even adding that test to __d_find_alias isn't enough. As d_splice_alias drops dcache_lock before calling d_move to perform the splice, it could race with another thread calling d_splice_alias to splice the inode in with a different name in a different part of the tree (in the case where a file has hard links). So that splicing code is only really safe for directories (as we know that directories only have one link). For directories, the caller of d_splice_alias will be holding i_mutex on the (unique) parent so there is no room for a race. A consequence of this is that a non-directory will never benefit from being spliced into a pre-existing dentry, but that isn't a problem. It is perfectly OK for a non-directory to have multiple dentries, some anonymous, some not. And the comment for d_splice_alias says that it only happens for directories anyway. Signed-off-by: Neil Brown Cc: Christoph Hellwig Cc: Al Viro Cc: Dipankar Sarma Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 44c556000a31e8079cfbb9a42a7edb93ca6b589a Author: NeilBrown Date: Wed Oct 4 02:16:15 2006 -0700 [PATCH] knfsd: fix auto-sizing of nfsd request/reply buffers totalram is measured in pages, not bytes, so PAGE_SHIFT must be used when trying to find 1/4096 of RAM. Cc: "J. Bruce Fields" Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 6b54dae2b0defb30babb0fe87b13463b9f4b2907 Author: NeilBrown Date: Wed Oct 4 02:16:15 2006 -0700 [PATCH] knfsd: lockd: fix refount on nsm If nlm_lookup_host finds what it is looking for it exits with an extra reference on the matching 'nsm' structure.
So don't actually count the reference until we are (fairly) sure it is going to be used. Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit b66285cee3f9abad26cca6c9b848e1ad6b792d94 Author: J.Bruce Fields Date: Wed Oct 4 02:16:14 2006 -0700 [PATCH] knfsd: nfsd4: acls: fix handling of zero-length acls It is legal to have zero-length NFSv4 acls; they just deny everything. Also, nfs4_acl_nfsv4_to_posix will always return with pacl and dpacl set on success, so the caller doesn't need to check this. Signed-off-by: J. Bruce Fields Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit f3b64eb6efb1ef46f6629b66a429e7f2b5955003 Author: J.Bruce Fields Date: Wed Oct 4 02:16:13 2006 -0700 [PATCH] knfsd: nfsd4: acls: simplify nfs4_acl_nfsv4_to_posix interface There's no need to handle the case where the caller passes in null for pacl or dpacl; no caller does that, because it would be a dumb thing to do. Signed-off-by: J. Bruce Fields Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit b548edc2dd9440c561f3302cb9f212ef2d06a8ef Author: J.Bruce Fields Date: Wed Oct 4 02:16:12 2006 -0700 [PATCH] knfsd: nfsd4: acls: fix inheritance We can be a little more flexible about the flags allowed for inheritance (in particular, we can deal with either the presence or the absence of INHERIT_ONLY), but we should probably reject other combinations that we don't understand. Signed-off-by: J. Bruce Fields Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 09229edb68a3961db54174a2725055bd1589b4b8 Author: J.Bruce Fields Date: Wed Oct 4 02:16:11 2006 -0700 [PATCH] knfsd: nfsd4: acls: relax the nfsv4->posix mapping Use a different nfsv4->(draft posix) acl mapping which is 1. completely backwards compatible, 2. accepts any nfsv4 acl, and 3. errs on the side of restricting permissions. In detail: 1. 
completely backwards compatible: The new mapping produces the same result on any acl produced by the existing (draft posix)->nfsv4 mapping; the one exception is that we no longer attempt to guess the value of the mask by assuming certain denies represent the mask. Since the server still keeps track of the mask locally, sequences of chmod's will still be handled fine; the only thing this will change is sequences of chmod's with intervening read-modify-writes of the acl. That last case just isn't worth the trouble and the possible misrepresentations of the user's intent (if we guess that a certain deny indicates masking is in effect when it really isn't). 2. accepts any nfsv4 acl: That's not quite true: we still reject acls that use combinations of inheritance flags that we don't support. We also reject acls that attempt to explicitly deny read_acl or read_attributes permissions, or that attempt to deny write_acl or write_attributes permissions to the owner of the file. 3. errs on the side of restricting permissions: one exception to this last rule: we totally ignore some bits (write_owner, synchronize, read_named_attributes, etc.) that are completely alien to our filesystem semantics, in some cases even if that would mean ignoring an explicit deny that we have no intention of enforcing. Excepting that, the posix acl produced should be the most permissive acl that is not more permissive than the given nfsv4 acl. And the new code's shorter, too. Neato. Signed-off-by: J. Bruce Fields Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit d0ebd9c0e71d20ea8c2b4a071d2a2b4878ef07d6 Author: J.Bruce Fields Date: Wed Oct 4 02:16:10 2006 -0700 [PATCH] knfsd: nfsd4: clean up exp_pseudoroot The previous patch enables some minor simplification here. Signed-off-by: J. 
Bruce Fields Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit f38b20c64519bb812a49b9ef4e10d90367a5af5c Author: J.Bruce Fields Date: Wed Oct 4 02:16:09 2006 -0700 [PATCH] knfsd: nfsd4: refactor exp_pseudoroot We could be using more common code in exp_pseudoroot(). This will also simplify some changes we need to make later. Signed-off-by: J. Bruce Fields Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 8f8e05c5708d7e9017c47f395f8b1498f7f52922 Author: J.Bruce Fields Date: Wed Oct 4 02:16:08 2006 -0700 [PATCH] knfsd: svcrpc: use consistent variable name for the reply state The rpc reply has multiple levels of error returns. The code here contributes to the confusion by using "accept_statp" for a pointer to what the rfc (and wireshark, etc.) refer to as the "reply_stat". (The confusion is compounded by the fact that the rfc also has an "accept_stat" which follows the reply_stat in the successful case.) Signed-off-by: J. Bruce Fields Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 5b304bc5bfccc82b856e876e829c260df8e67ff2 Author: J.Bruce Fields Date: Wed Oct 4 02:16:07 2006 -0700 [PATCH] knfsd: svcrpc: gss: fix failure on SVC_DENIED in integrity case If the request is denied after gss_accept was called, we shouldn't try to wrap the reply. We were checking the accept_stat but not the reply_stat. To check the reply_stat in _release, we need a pointer to before (rather than after) the verifier, so modify body_start appropriately. Signed-off-by: J. Bruce Fields Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 3c15a486643a103eaf068e5fb3b7f9d720d579a7 Author: J.Bruce Fields Date: Wed Oct 4 02:16:06 2006 -0700 [PATCH] knfsd: svcrpc: gss: factor out some common wrapping code Factor out some common code from the integrity and privacy cases. Signed-off-by: J.
Bruce Fields Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 89e63ef609fb0064a47281e31e38010159c32d57 Author: Neil Brown Date: Wed Oct 4 02:16:06 2006 -0700 [PATCH] Convert lockd to use the newer mutex instead of the older semaphore Both the (recently introduced) nsm_sema and the older f_sema are converted over. Cc: Olaf Kirch Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit bc5fea4299b8bda5f73c6f79dc35d388caf8bced Author: Olaf Kirch Date: Wed Oct 4 02:16:05 2006 -0700 [PATCH] knfsd: register all RPC programs with portmapper by default The NFSACL patches introduced support for multiple RPC services listening on the same transport. However, only the first of these services was registered with portmapper. This was perfectly fine for nfsacl, as you traditionally do not want these to show up in a portmapper listing. The patch below changes the default behavior to always register all services listening on a given transport, but retains the old behavior for nfsacl services. Signed-off-by: Olaf Kirch Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 0ade060ee51b9b6cf18d580405dc9ab90067f69f Author: Olaf Kirch Date: Wed Oct 4 02:16:04 2006 -0700 [PATCH] knfsd: lockd: fix use of h_nextrebind nlmclnt_recovery would try to force a portmap rebind by setting host->h_nextrebind to 0. The right thing to do here is to set it to the current time. Signed-off-by: Olaf Kirch Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 460f5cac1e24e947509b6112c99c5bc9ff687b45 Author: Olaf Kirch Date: Wed Oct 4 02:16:03 2006 -0700 [PATCH] knfsd: export nsm_local_state to user space via sysctl Every NLM call includes the client's NSM state. Currently, the Linux client always reports 0 - which seems not to cause any problems, but is not what the protocol says.
This patch exposes the kernel's internal variable to user space via a sysctl, which can be set at system boot time by statd. Signed-off-by: Olaf Kirch Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 39be4502cb75dc26007fe1659735b26c8e63fcc6 Author: Olaf Kirch Date: Wed Oct 4 02:16:03 2006 -0700 [PATCH] knfsd: match GRANTED_RES replies using cookies When we send a GRANTED_MSG call, we currently copy the NLM cookie provided in the original LOCK call - because in 1996, some broken clients seemed to rely on this bug. However, this means the cookies are not unique, so that when the client's GRANTED_RES message comes back, we cannot simply match it based on the cookie, but have to use the client's IP address in addition. Which breaks when you have a multi-homed NFS client. The X/Open spec explicitly mentions that clients should not expect the same cookie; so one may hope that any clients that were broken in 1996 have either been fixed or rendered obsolete. Signed-off-by: Olaf Kirch Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 031d869d0e0be18cfe35526be5608225b8f0a7be Author: Olaf Kirch Date: Wed Oct 4 02:16:02 2006 -0700 [PATCH] knfsd: make nlmclnt_next_cookie SMP safe The way we incremented the NLM cookie in nlmclnt_next_cookie was not thread safe. This patch changes the counter to an atomic_t. Signed-off-by: Olaf Kirch Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit abd1f50094cad9dff6d68ada98b495549f52fc30 Author: Olaf Kirch Date: Wed Oct 4 02:16:01 2006 -0700 [PATCH] knfsd: lockd: optionally use hostnames for identifying peers This patch adds the nsm_use_hostnames sysctl and module param. If set, lockd will use the client's name (as given in the NLM arguments) to find the NSM handle. This makes recovery work when the NFS peer is multi-homed, and the reboot notification arrives from a different IP than the original lock calls.
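A minimal userspace sketch of the name-versus-address lookup described for nsm_use_hostnames above. Everything in it is an assumption made for illustration: struct nsm_handle_sketch, find_handle() and the use_hostnames parameter are hypothetical stand-ins, not lockd's actual data structures or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an NSM handle. */
struct nsm_handle_sketch {
    const char  *name;  /* peer name as given in the NLM arguments */
    unsigned int addr;  /* peer IPv4 address, host byte order */
};

/*
 * Find the handle for a peer. With use_hostnames set, match on the
 * name the peer presented; otherwise match on its address only.
 * Returns NULL when nothing matches.
 */
static struct nsm_handle_sketch *
find_handle(struct nsm_handle_sketch *tbl, size_t n,
            const char *name, unsigned int addr, int use_hostnames)
{
    assert(name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (use_hostnames) {
            if (strcmp(tbl[i].name, name) == 0)
                return &tbl[i];
        } else if (tbl[i].addr == addr) {
            return &tbl[i];
        }
    }
    return NULL;
}
```

In this sketch a multi-homed peer that took its locks from one address and sends its reboot notification from another is only found when use_hostnames is non-zero, which is the case the commit message describes.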
Signed-off-by: Olaf Kirch Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 350fce8dbf43f7d441b77366851c9ce3cd28d6dc Author: NeilBrown Date: Wed Oct 4 02:16:00 2006 -0700 [PATCH] knfsd: simplify nlmsvc_invalidate_all As a result of previous patches, the loop in nlmsvc_invalidate_all just sets h_expires for all client/hosts to 0 (though does it in a very complicated way). This was possibly meant to trigger early garbage collection but half the time '0' is in the future and so it in fact delays garbage collection. Pre-aging the 'hosts' is not really needed at this point anyway so we throw out the loop and nlm_find_client which is no longer needed. Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit c53c1bb94f30cecee79ca0a8e9977640338283be Author: Olaf Kirch Date: Wed Oct 4 02:16:00 2006 -0700 [PATCH] knfsd: lockd: Add nlm_destroy_host This patch moves the host destruction code out of nlm_host_gc into a function of its own. Signed-off-by: Olaf Kirch Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit f2af793db02d2c2f677bdb5bf8e0efdcbf9c0256 Author: Olaf Kirch Date: Wed Oct 4 02:15:59 2006 -0700 [PATCH] knfsd: lockd: make nlm_traverse_* more flexible This patch makes nlm_traverse{locks,blocks,shares} and friends use a function pointer rather than an "action" enum. This function pointer is given two nlm_hosts (one given by the caller, the other taken from the lock/block/share currently visited), and is free to do with them as it wants. If it returns a non-zero value, the lock/block/share is released.
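The callback-driven traversal described above can be sketched in plain userspace C as follows. This is only an illustration under stated assumptions: struct host, struct lock_entry, match_fn, traverse_locks(), same_host() and any_host() are hypothetical names, not the lockd definitions the patch touches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for nlm_host and a per-file lock entry. */
struct host { int id; };

struct lock_entry {
    struct host *owner;  /* host that holds this entry */
    int released;        /* set once the entry has been dropped */
};

/*
 * Callback type: receives the caller's host and the host taken from
 * the entry being visited. A non-zero return asks for a release.
 */
typedef int (*match_fn)(const struct host *given, const struct host *other);

/* Release every live entry the callback selects; return the count. */
static int traverse_locks(struct lock_entry *entries, size_t n,
                          const struct host *given, match_fn match)
{
    int count = 0;

    assert(match != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!entries[i].released && match(given, entries[i].owner)) {
            entries[i].released = 1;
            count++;
        }
    }
    return count;
}

/* Policy: release entries owned by exactly the given host. */
static int same_host(const struct host *given, const struct host *other)
{
    return given == other;
}

/* Policy: release everything, ignoring both hosts. */
static int any_host(const struct host *given, const struct host *other)
{
    (void)given;
    (void)other;
    return 1;
}
```

The point of the design is that a new release policy becomes one more small function rather than one more enum value plus a new branch inside the traversal loop.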
Signed-off-by: Olaf Kirch Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 07ba80635117c136714084e019375aa508365375 Author: Olaf Kirch Date: Wed Oct 4 02:15:58 2006 -0700 [PATCH] knfsd: change nlm_file to use a hlist This changes struct nlm_file and the nlm_files hash table to use a hlist instead of the home-grown lists. This allows us to remove f_hash which was only used to find the right hash chain to delete an entry from. It also increases the size of the nlm_files hash table from 32 to 128. Signed-off-by: Olaf Kirch Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 68a2d76cea4234bc027df23085d9df4f2171f7fc Author: Olaf Kirch Date: Wed Oct 4 02:15:57 2006 -0700 [PATCH] knfsd: lockd: Change list of blocked list to list_node This patch changes the nlm_blocked list to use a list_node instead of homegrown linked list handling. Signed-off-by: Olaf Kirch Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 0cea32761a2f954c6d42ca79d7d1e6b9663b1e4a Author: Olaf Kirch Date: Wed Oct 4 02:15:56 2006 -0700 [PATCH] knfsd: lockd: make the hash chains use a hlist_node Get rid of the home-grown singly linked lists for the nlm_host hash table. Signed-off-by: Olaf Kirch Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 9502c52259f7038b6c1e31532e22884716a56b1a Author: Olaf Kirch Date: Wed Oct 4 02:15:56 2006 -0700 [PATCH] knfsd: lockd: make the nsm upcalls use the nsm_handle This converts the statd upcalls to use the nsm_handle. This means that we only register each host once with statd, rather than registering each host/vers/protocol triple.
Signed-off-by: Olaf Kirch Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 5c8dd29ca7fc7483690cef4306549742d534f2a2 Author: Olaf Kirch Date: Wed Oct 4 02:15:55 2006 -0700 [PATCH] knfsd: lockd: Make nlm_host_rebooted use the nsm_handle This patch makes the SM_NOTIFY handling understand and use the nsm_handle. To make it a bit clearer what is happening: nlmclnt_prepare_reclaim and nlmclnt_finish_reclaim get open-coded into 'reclaimer'. The result is tidied up. Then some of that functionality is moved out into nlm_host_rebooted (which calls nlmclnt_recovery which starts a thread which runs reclaimer). Also host_rebooted now finds an nsm_handle rather than a host, then iterates over all hosts and deals with each host that shares that nsm_handle. Signed-off-by: Olaf Kirch Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit f0737a39a64a9df32bb045c54e1cdf6cecdcbdd7 Author: Olaf Kirch Date: Wed Oct 4 02:15:54 2006 -0700 [PATCH] knfsd: misc minor fixes, indentation changes cleans up some code in lockd/host.c, fixes an error printk and makes it a fatal BUG if nlmsvc_free_host_resources fails. Signed-off-by: Olaf Kirch Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 8dead0dbd478f35fd943f3719591e5af1ac0950d Author: Olaf Kirch Date: Wed Oct 4 02:15:53 2006 -0700 [PATCH] knfsd: lockd: introduce nsm_handle This patch introduces the nsm_handle, which is shared by all nlm_host objects referring to the same client. With this patch applied, all nlm_hosts from the same address will share the same nsm_handle. A future patch will add sharing by name. Note: this patch changes h_name so that it is no longer guaranteed to be an IP address of the host. When the host represents an NFS server, h_name will be the name passed in the mount call. When the host represents a client, h_name will be the name presented in the lock request received from the client.
As h_name is only used for printing informational messages, this change should not be significant. Signed-off-by: Olaf Kirch Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit db4e4c9a9e741ee812e1febf5e386d6a24218a71 Author: Olaf Kirch Date: Wed Oct 4 02:15:52 2006 -0700 [PATCH] knfsd: when looking up a lockd host, pass hostname & length This patch adds the peer's hostname (and name length) to all calls to nlm*_lookup_host functions. A subsequent patch will make use of these (if requested by a sysctl). Signed-off-by: Olaf Kirch Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit cf712c24d72341effcfd28330b83b49f77cb627b Author: Olaf Kirch Date: Wed Oct 4 02:15:52 2006 -0700 [PATCH] knfsd: consolidate common code for statd->lockd notification Common code from nlm4svc_proc_sm_notify and nlmsvc_proc_sm_notify is moved into a new nlm_host_rebooted. This is in preparation of a patch that will change the reboot notification handling entirely. Signed-off-by: Olaf Kirch Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 977faf392fc898407554bbe7338d57b29e3660cf Author: Olaf Kirch Date: Wed Oct 4 02:15:51 2006 -0700 [PATCH] knfsd: hide use of lockd's h_monitored flag This patch moves all checks of the h_monitored flag into the nsm_monitor/unmonitor functions. A subsequent patch will replace the mechanism by which we mark a host as being monitored. There is still one occurrence of h_monitored outside of mon.c and that is in clntlock.c where we respond to a reboot. The subsequent patch will modify this too.
Signed-off-by: Olaf Kirch Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 7b2b1fee30df7e2165525cd03f7d1d01a3a56794 Author: Greg Banks Date: Wed Oct 4 02:15:50 2006 -0700 [PATCH] knfsd: knfsd: cache ipmap per TCP socket Speed up high call-rate workloads by caching the struct ip_map for the peer on the connected struct svc_sock instead of looking it up in the ip_map cache hashtable on every call. This helps workloads using AUTH_SYS authentication over TCP. Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16 synthetic client threads simulating an rsync (i.e. recursive directory listing) workload reading from an i386 RH9 install image (161480 regular files in 10841 directories) on the server. That tree is small enough to fit in the server's RAM so no disk traffic was involved. This setup gives a sustained call rate in excess of 60000 calls/sec before being CPU-bound on the server. Profiling showed strcmp(), called from ip_map_match(), was taking 4.8% of each CPU, and ip_map_lookup() was taking 2.9%. This patch drops both contributions into the profile noise. Note that the above result overstates the value of this patch for most workloads. The synthetic clients are all using separate IP addresses, so there are 64 entries in the ip_map cache hash. Because the kernel measured contained the bug fixed in commit 1f1e030bf75774b6a283518e1534d598e14147d4 and was running on a 64bit little-endian machine, probably all of those 64 entries were on a single chain, thus increasing the cost of ip_map_lookup(). With a modern kernel you would need more clients to see the same amount of performance improvement. This patch has helped to scale knfsd to handle a deployment with 2000 NFS clients.
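A minimal sketch of the per-connection caching idea described above: do the string-compare lookup once and remember the result on the connection object. The names here (struct peer_entry, peer_table, table_lookup(), struct conn, conn_peer(), slow_lookups) are hypothetical and exist only for this illustration; the sketch also assumes entries are never invalidated, and it does no reference counting, both of which real kernel code would have to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a cached ip_map entry. */
struct peer_entry {
    const char *addr;  /* peer address in text form */
    int domain_id;     /* stand-in for the mapped auth domain */
};

static struct peer_entry peer_table[] = {
    { "10.0.0.1", 100 },
    { "10.0.0.2", 200 },
    { "10.0.0.3", 300 },
};

/* Counts how often the expensive path runs, for demonstration. */
static unsigned long slow_lookups;

/* Slow path: linear search with strcmp() on every call. */
static struct peer_entry *table_lookup(const char *addr)
{
    slow_lookups++;
    for (size_t i = 0; i < sizeof(peer_table) / sizeof(peer_table[0]); i++)
        if (strcmp(peer_table[i].addr, addr) == 0)
            return &peer_table[i];
    return NULL;
}

/* Hypothetical stand-in for a connected socket. */
struct conn {
    const char *peer_addr;
    struct peer_entry *cached;  /* NULL until the first successful lookup */
};

/* Fast path: reuse the entry remembered on the connection. */
static struct peer_entry *conn_peer(struct conn *c)
{
    assert(c != NULL);
    if (c->cached == NULL)
        c->cached = table_lookup(c->peer_addr);
    return c->cached;
}
```

Because a connected TCP socket always has the same peer, every call after the first skips the table search entirely, which is where the strcmp() and lookup cost in the profile above would go.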
Signed-off-by: Greg Banks Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit fce1456a19f5c08b688c29f00ef90fdfa074c79b Author: Greg Banks Date: Wed Oct 4 02:15:49 2006 -0700 [PATCH] knfsd: make nfsd readahead params cache SMP-friendly Make the nfsd read-ahead params cache more SMP-friendly by changing the single global list and lock into a fixed 16-bucket hashtable with per-bucket locks. This reduces spinlock contention in nfsd_read() on read-heavy workloads on multiprocessor servers. Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients each doing 1K streaming reads at full line rate. The server had 128 nfsd threads, which sizes the RA cache at 256 entries, of which only a handful were used. Flat profiling shows nfsd_read(), including the inlined nfsd_get_raparms(), taking 10.4% of each CPU. This patch drops the contribution from nfsd() to 1.71% for each CPU. Signed-off-by: Greg Banks Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 596bbe53eb3abfe7326b2f5e8afd614265c319c8 Author: NeilBrown Date: Wed Oct 4 02:15:48 2006 -0700 [PATCH] knfsd: Allow max size of NFSd payload to be configured The max possible is the maximum RPC payload. The default depends on amount of total memory. The value can be set within reason as long as no nfsd threads are currently running. The value can also be read, allowing the default to be determined after nfsd has started. Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 7adae489fe794e3e203ff168595f635d0b845e59 Author: Greg Banks Date: Wed Oct 4 02:15:47 2006 -0700 [PATCH] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP The limit over UDP remains at 32K. Also, make some of the apparently arbitrary sizing constants clearer. The biggest change here involves replacing NFSSVC_MAXBLKSIZE by a function of the rqstp.
This allows it to be different for different protocols (udp/tcp) and also allows it to depend on the server's declared sv_bufsz. Note that we don't actually increase sv_bufsz for nfs yet. That comes next. Signed-off-by: Greg Banks Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 3cc03b164cf01c6f36e64720b58610d292fb26f7 Author: NeilBrown Date: Wed Oct 4 02:15:47 2006 -0700 [PATCH] knfsd: Avoid excess stack usage in svc_tcp_recvfrom .. by allocating the array of 'kvec' in 'struct svc_rqst'. As we plan to increase RPCSVC_MAXPAGES from 8 up to 256, we can no longer allocate an array of this size on the stack. So we allocate it in 'struct svc_rqst'. However svc_rqst contains (indirectly) an array of the same type and size (actually several, but they are in a union). So rather than waste space, we move those arrays out of the separately allocated union and into svc_rqst to share with the kvec moved out of svc_tcp_recvfrom (various arrays are used at different times, so there is no conflict). Signed-off-by: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 4452435948424e5322c2a2fefbdc2cf3732cc45d Author: NeilBrown Date: Wed Oct 4 02:15:46 2006 -0700 [PATCH] knfsd: Replace two page lists in struct svc_rqst with one We are planning to increase RPCSVC_MAXPAGES from about 8 to about 256. This means we need to be a bit careful about arrays of size RPCSVC_MAXPAGES. struct svc_rqst contains two such arrays. However, there are never more than RPCSVC_MAXPAGES pages in the two arrays together, so only one array is needed. The two arrays are for the pages holding the request, and the pages holding the reply. Instead of two arrays, we can simply keep an index into where the first reply page is. This patch also removes a number of small inline functions that probably serve to obscure what is going on rather than clarify it, and open-codes the needed functionality.
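The single-array layout described above can be illustrated with a small userspace sketch. It is not the actual struct svc_rqst: struct rqst, MAXPAGES and the helper names are hypothetical, and the sketch assumes all request pages are placed before any reply page is added.

```c
#include <assert.h>
#include <stddef.h>

#define MAXPAGES 8   /* illustrative size; stands in for RPCSVC_MAXPAGES */

/* Hypothetical request structure: one array shared by request and reply. */
struct rqst {
    void  *pages[MAXPAGES];
    size_t first_reply;   /* index of the first reply page */
    size_t used;          /* total pages in use (request + reply) */
};

static void rqst_init(struct rqst *r)
{
    r->first_reply = 0;
    r->used = 0;
}

/* Request pages fill the front of the array; none may follow a reply page. */
static int rqst_add_request_page(struct rqst *r, void *page)
{
    if (r->used != r->first_reply || r->used == MAXPAGES)
        return -1;
    r->pages[r->used++] = page;
    r->first_reply = r->used;
    return 0;
}

/* Reply pages are appended after the request pages. */
static int rqst_add_reply_page(struct rqst *r, void *page)
{
    if (r->used == MAXPAGES)
        return -1;
    r->pages[r->used++] = page;
    return 0;
}

/* The reply "array" is just a view starting at first_reply. */
static void **rqst_reply_pages(struct rqst *r)
{
    return &r->pages[r->first_reply];
}

static size_t rqst_reply_count(const struct rqst *r)
{
    return r->used - r->first_reply;
}
```

Because request and reply together never exceed MAXPAGES, one array plus an index carries the same information as two arrays of that size, at half the memory.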
Also remove the 'rq_restailpage' variable as it is *always* 0. i.e. if the response 'xdr' structure has a non-empty tail it is always in the same pages as the head. check counters are initialised and incr properly check for consistent usage of ++ etc maybe extract some inlines for common approach general review Signed-off-by: Neil Brown Cc: Magnus Maatta Sig