path: root/fs
Age | Commit message | Author | Files | Lines
2006-01-08 | [PATCH] ext3: external journal device as a mount option | Johann Lombardi | 1 | -10/+44
The patch below adds a new mount option to allow the external journal device to be specified. The syntax is as follows: # mount -t ext3 -o journal_dev=0x0820 ... where 0x0820 means major=8 and minor=32. Signed-off-by: Johann Lombardi <johann.lombardi@bull.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
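For illustration, here is a minimal userspace sketch of how a journal_dev value such as 0x0820 splits into major 8 and minor 32. It assumes the value uses the same bit layout as the kernel's new_decode_dev() (minor in bits 0-7 and 20-31, major in bits 8-19), which for small values like 0x0820 coincides with the classic 8:8 split. decode_journal_dev() is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper. Assumes the layout of the kernel's new_decode_dev():
 * minor number in bits 0-7 and 20-31, major number in bits 8-19. */
static void decode_journal_dev(uint32_t dev, unsigned *major, unsigned *minor)
{
	*major = (dev & 0xfff00) >> 8;
	*minor = (dev & 0xff) | ((dev >> 12) & 0xfff00);
}
```

With journal_dev=0x0820 this yields device 8:32, matching the example in the commit message.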
2006-01-08 | [PATCH] shared mounts: cleanup | Miklos Szeredi | 2 | -2/+2
Small cleanups in shared mounts code. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Ram Pai <linuxram@us.ibm.com> Cc: <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 | [PATCH] pivot_root: add comment | Neil Brown | 1 | -0/+4
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 | [PATCH] do_coredump() should reset group_stop_count earlier | Oleg Nesterov | 1 | -1/+1
__group_complete_signal() sets ->group_stop_count in the sig_kernel_coredump() path and marks the target thread as ->group_exit_task. So any thread except group_exit_task will go to handle_group_stop()->finish_stop(). However, when group_exit_task actually starts do_coredump(), it sets SIGNAL_GROUP_EXIT but does not reset ->group_stop_count while killing other threads. If there are threads in the same thread group that have not yet stopped, they will all spin in kernel mode until group_exit_task sends them SIGKILL, because ->group_stop_count > 0 means:
- recalc_sigpending_tsk() never clears TIF_SIGPENDING;
- get_signal_to_deliver() goes to handle_group_stop();
- handle_group_stop() returns when SIGNAL_GROUP_EXIT is set;
- syscall_exit/resume_userspace notice TIF_SIGPENDING and call get_signal_to_deliver() again.
So we are wasting CPU cycles, and if one of these threads is an rt_task() this may be a serious problem. NOTE: do_coredump() holds ->mmap_sem, so threads that have not stopped can't escape coredumping after ->group_stop_count is cleared. See also this thread: http://marc.theaimsgroup.com/?t=112739139900002 Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 | [PATCH] fix possible PAGE_CACHE_SHIFT overflows | Andrew Morton | 8 | -17/+17
We've had two instances recently of overflows when doing 64_bit_value = (32_bit_value << PAGE_CACHE_SHIFT). I did a tree-wide grep of `<<.*PAGE_CACHE_SHIFT' and this is the result:
- afs_rxfs_fetch_descriptor.offset is of type off_t, which seems broken.
- jfs and jffs are limited to 4GB anyway.
- reiserfs map_block_for_writepage() takes an unsigned long for the block; it should take sector_t. (It'll fail for huge filesystems with blocksize<PAGE_CACHE_SIZE.)
- cramfs_read() needs to use sector_t (I think cramfs is busted on large filesystems anyway).
- affs is limited in file size anyway.
- I generally didn't fix 32-bit overflows in directory operations.
- arm's __flush_dcache_page() is peculiar. What if the page lies beyond 4G?
- gss_wrap_req_priv() needs checking (snd_buf->page_base).
Cc: Oleg Drokin <green@linuxhacker.ru> Cc: David Howells <dhowells@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: <reiserfs-dev@namesys.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: <linux-fsdevel@vger.kernel.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
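The class of bug being hunted fits in a few lines. This sketch assumes 4 KiB pages (PAGE_CACHE_SHIFT = 12) and a 32-bit page index, as with pgoff_t on a 32-bit kernel; the helper names are hypothetical. The fix is to widen to 64 bits before shifting, not after.

```c
#include <stdint.h>

#define PAGE_CACHE_SHIFT 12   /* assumption: 4 KiB pages */

/* Buggy: the shift happens in 32-bit arithmetic and only the result is
 * widened, so the high bits are already gone. */
static uint64_t page_offset_buggy(uint32_t index)
{
	return index << PAGE_CACHE_SHIFT;
}

/* Fixed: widen first, then shift in 64-bit arithmetic. */
static uint64_t page_offset_fixed(uint32_t index)
{
	return (uint64_t)index << PAGE_CACHE_SHIFT;
}
```

For page index 0x100000 (the first page at 4 GiB), the buggy version wraps to 0 while the fixed version returns 4294967296.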
2006-01-08 | [PATCH] Fix overflow tests for compat_sys_fcntl64 locking | NeilBrown | 1 | -4/+18
When making an fcntl locking call through compat_sys_fcntl64 (i.e. a 32-bit app on a 64-bit kernel), the syscall can return a locking range that is in conflict with the queried lock. If some aspect of this range does not fit in the 32-bit structure, something needs to be done. The current code is wrong in several respects:
- It returns data to userspace even if no conflict was found, i.e. it should check l_type for F_UNLCK.
- It returns -EOVERFLOW too aggressively. A lock range covering the last possible byte of the file (start = COMPAT_OFF_T_MAX, len = 1) should be possible, but is rejected with the current test.
- An extra-long 'len' should not be a problem. Only the part of the conflicting lock that would be visible to the 32-bit app needs to be reported to it anyway.
This patch addresses those three issues and adds a comment to (hopefully) record it for posterity. Note: this patch mainly affects test cases. Real applications rarely, if ever, see these problems. This patch has been tested (LSB test suite), and works. Signed-off-by: Neil Brown <neilb@suse.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Matthew Wilcox <willy@debian.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
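A simplified sketch of the F_GETLK copy-out rules described above, assuming a 32-bit signed compat off_t (COMPAT_OFF_T_MAX = INT32_MAX) and ignoring negative l_len values. The structures and compat_fill_getlk() are hypothetical stand-ins, not the kernel's compat code.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>

#define COMPAT_OFF_T_MAX INT32_MAX   /* assumption: 32-bit signed compat off_t */

/* Hypothetical stand-ins for the 64-bit and 32-bit flock layouts,
 * reduced to the fields that matter here. */
struct flock64_sketch { short l_type; int64_t l_start, l_len; };
struct flock32_sketch { short l_type; int32_t l_start, l_len; };

/* Copy an F_GETLK result into the 32-bit layout. Returns 0 or -EOVERFLOW. */
static int compat_fill_getlk(const struct flock64_sketch *k,
			     struct flock32_sketch *u)
{
	u->l_type = k->l_type;
	if (k->l_type == F_UNLCK)            /* no conflict: skip range checks */
		return 0;
	if (k->l_start > COMPAT_OFF_T_MAX)   /* start itself cannot be represented */
		return -EOVERFLOW;
	u->l_start = (int32_t)k->l_start;
	/* Clamp an over-long len: the 32-bit app can only see this much anyway. */
	u->l_len = (int32_t)(k->l_len > COMPAT_OFF_T_MAX ? COMPAT_OFF_T_MAX : k->l_len);
	return 0;
}
```

Note that start = COMPAT_OFF_T_MAX with len = 1 is accepted, which is the case the old test rejected.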
2006-01-08 | [PATCH] Fix some problems with truncate and mtime semantics. | NeilBrown | 4 | -22/+15
SUS requires that when truncating a file to the size that it currently is:
- truncate and ftruncate should NOT modify ctime or mtime;
- O_TRUNC SHOULD modify ctime and mtime.
Currently mtime and ctime are always modified on most local filesystems (side effect of ->truncate) or never modified (on NFS). With this patch: ATTR_CTIME|ATTR_MTIME are sent with ATTR_SIZE precisely when an update of these times is required, whether the size changes or not (via a new argument to do_truncate). This allows NFS to do the right thing for O_TRUNC. inode_setattr no longer forces ATTR_MTIME|ATTR_CTIME when ATTR_SIZE sets the size to its current value. This allows local filesystems to do the right thing for f?truncate. Also, the logic in inode_setattr is changed a bit so there are two return points. One returns the error from vmtruncate if it failed, the other returns 0 (there can be no other failure). Finally, if vmtruncate succeeds and ATTR_SIZE is the only change requested, we now fall through and mark_inode_dirty. If a filesystem did not have a ->truncate function, then vmtruncate will have changed i_size without marking the inode as 'dirty', and I think this is wrong. Signed-off-by: Neil Brown <neilb@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
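The intended semantics can be summarised as a small decision function. This is a hedged sketch: the ATTR_* names mirror the kernel's, but the bit values and truncate_attr_flags() are illustrative assumptions, and the kernel reaches the same outcome through do_truncate() and inode_setattr() rather than through a single helper.

```c
#include <stdbool.h>

/* Names mirror the kernel's ATTR_* flags; the values are assumptions. */
#define ATTR_SIZE  (1u << 3)
#define ATTR_MTIME (1u << 5)
#define ATTR_CTIME (1u << 6)

/* Hypothetical helper: which attributes a size change should carry.
 * truncate()/ftruncate() mark the times only if the size changes;
 * open(O_TRUNC) always marks them for update. */
static unsigned truncate_attr_flags(long long old_size, long long new_size,
				    bool from_open_trunc)
{
	unsigned flags = ATTR_SIZE;

	if (from_open_trunc || old_size != new_size)
		flags |= ATTR_MTIME | ATTR_CTIME;
	return flags;
}
```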
2006-01-08 | [PATCH] udf: remove bogus inode == NULL check in inode_bmap | Christoph Hellwig | 1 | -5/+0
inode can never be NULL when calling this function. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 | [PATCH] relayfs: cleanup, change relayfs_file_* to relay_file_* | Tom Zanussi | 2 | -43/+48
This patch renames relayfs_file_operations to relay_file_operations, and the file operations themselves from relayfs_XXX to relay_file_XXX, to make it more clear that they refer to relay files. Signed-off-by: Tom Zanussi <zanussi@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 | [PATCH] relayfs: add support for global relay buffers | Tom Zanussi | 1 | -10/+25
This patch adds the optional is_global outparam to the create_buf_file() callback. This can be used by clients to create a single global relayfs buffer instead of the default per-cpu buffers. This was suggested as being useful for certain debugging applications where it's more convenient to be able to get all the data from a single channel without having to go to the bother of dealing with per-cpu files. Signed-off-by: Tom Zanussi <zanussi@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
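A userspace simulation of what is_global changes, under stated assumptions: a fixed CPU count, a callback type loosely modelled on create_buf_file(), and hypothetical names throughout. When the callback sets *is_global on the first CPU, every later CPU shares buffer 0 instead of getting its own.

```c
#include <stddef.h>

#define NR_CPUS_SKETCH 4   /* assumption: fixed CPU count for the sketch */

struct buf_sketch { int id; };

/* Hypothetical callback, loosely modelled on create_buf_file(): the client
 * may set *is_global to request one buffer shared by every CPU. */
typedef void (*create_buf_cb)(int cpu, int *is_global);

static struct buf_sketch pool[NR_CPUS_SKETCH];

/* Example client callback that always asks for a global buffer. */
static void want_global(int cpu, int *is_global)
{
	(void)cpu;
	*is_global = 1;
}

/* Fill bufs[] with one buffer per CPU, or the same buffer for all CPUs if
 * the callback requested a global one. Returns the number created. */
static int open_channel_sketch(create_buf_cb cb,
			       struct buf_sketch *bufs[NR_CPUS_SKETCH])
{
	int is_global = 0, created = 0;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		if (is_global) {               /* share buffer 0 from now on */
			bufs[cpu] = bufs[0];
			continue;
		}
		if (cb)                        /* no callback: default per-cpu */
			cb(cpu, &is_global);
		pool[cpu].id = cpu;
		bufs[cpu] = &pool[cpu];
		created++;
	}
	return created;
}
```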
2006-01-08 | [PATCH] relayfs: add support for relay files in other filesystems | Tom Zanussi | 2 | -3/+29
This patch adds a couple of callback functions that allow a client to hook into relay_open()/close() and supply the files that will be used to represent the channel buffers; the default implementation if no callbacks are defined is to create the files in relayfs. This is to support the creation and use of relay files in other filesystems such as debugfs, as implied by the fact that relayfs_file_operations are exported. Signed-off-by: Tom Zanussi <zanussi@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 | [PATCH] relayfs: remove unused alloc/destroy_inode() | Tom Zanussi | 1 | -45/+1
Since we're no longer using relayfs_inode_info, remove relayfs_alloc_inode() and relayfs_destroy_inode() along with the relayfs inode cache. Signed-off-by: Tom Zanussi <zanussi@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 | [PATCH] relayfs: use generic_ip for private data | Tom Zanussi | 1 | -8/+9
Use inode->u.generic_ip instead of relayfs_inode_info to store a pointer to user data. Clients using relayfs_file_create() to create their own files would probably expect their data to be stored in generic_ip; we also intend in the next set of patches to get rid of relayfs-specific stuff in the file operations, so we might as well do it here. Signed-off-by: Tom Zanussi <zanussi@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 | [PATCH] relayfs: add relayfs_remove_file() | Tom Zanussi | 1 | -0/+12
This patch adds and exports relayfs_remove_file(), for API symmetry (with relayfs_create_file()). Signed-off-by: Tom Zanussi <zanussi@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 | [PATCH] relayfs: export relayfs_create_file() with fileops param | Tom Zanussi | 3 | -20/+28
This patch adds a mandatory fileops param to relayfs_create_file() and exports that function so that clients can use it to create files defined by their own set of file operations, in relayfs. The purpose is to allow relayfs applications to create their own set of 'control' files alongside their relay files in relayfs rather than having to create them in /proc or debugfs for instance. relayfs_create_file() is also used by relay_open_buf() to create the relay files for a channel. In this case, a pointer to relayfs_file_operations is passed in, along with a pointer to the buffer associated with the file. Signed-off-by: Tom Zanussi <zanussi@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 | [PATCH] relayfs: decouple buffer creation from inode creation | Tom Zanussi | 4 | -26/+19
The patch series implements or fixes 3 things that were specifically requested or suggested by relayfs users:
- support for non-relay files (patches 1-6): Currently, the relayfs API only supports the creation of directories (relayfs_create_dir()) and relay files (relay_open()). These patches add support for non-relay files (relayfs_create_file()). This is so relayfs applications can create 'control files' in relayfs itself rather than in /proc or via a netlink channel, as is currently done in the relay-app examples. Basically what this amounts to is exporting relayfs_create_file() with an additional file_ops param that clients can use to supply file operations for their own special-purpose files in relayfs.
- make exported relay file ops useful (patches 7-8): The relayfs relay_file_operations have always been exported, the intent being to make it possible to create relay files in other filesystems such as debugfs. The problem, though, is that currently the file operations are too tightly coupled to relayfs to actually be used for this purpose. This patch fixes that by adding a couple of callback functions that allow a client to hook into relay_open()/close() and supply the files that will be used to represent the channel buffers; the default implementation if no callbacks are defined is to create the files in relayfs.
- add an option to create a global relay buffer (patches 9-10): The file creation callback also supplies an optional param, is_global, that can be used by clients to create a single global relayfs buffer instead of the default per-cpu buffers. This was suggested as being useful for certain debugging applications where it's more convenient to be able to get all the data from a single channel without having to go to the bother of dealing with per-cpu files.
- cleanup, some renaming and Documentation updates (patches 11-12): There were several comments that the use of netlink in the example code was non-intuitive and in fact the whole relay-app business was needlessly confusing. Based on that feedback, the example code has been completely converted over to relayfs control files as supported by this patch, and has also been made completely self-contained. The converted examples, along with a couple of new examples that demonstrate using exported relay files, can be found in the relay-apps tarball: http://prdownloads.sourceforge.net/relayfs/relay-apps-0.9.tar.gz?download
This patch: Separate buffer create/destroy from inode create/destroy. We want to be able to associate other data and not just relay buffers with inodes. Buffer create/destroy is moved out of inode.c and into relayfs core code. Signed-off-by: Tom Zanussi <zanussi@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 | [PATCH] rcu file: use atomic primitives | Nick Piggin | 2 | -6/+5
Use atomic_inc_not_zero for rcu files instead of special case rcuref. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
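atomic_inc_not_zero() takes a reference only if the count is still nonzero, so an RCU reader cannot resurrect a file whose last reference is already being dropped. Here is a userspace sketch of the same semantics with C11 atomics; inc_not_zero() is a hypothetical name, not the kernel primitive.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if the count has not already reached zero. */
static bool inc_not_zero(atomic_int *count)
{
	int old = atomic_load(count);

	while (old != 0) {
		/* On failure, old is reloaded with the current value and we retry. */
		if (atomic_compare_exchange_weak(count, &old, old + 1))
			return true;
	}
	return false;   /* object is on its way to being freed */
}
```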
2006-01-08 | [PATCH] Fix and add EXPORT_SYMBOL(filemap_write_and_wait) | OGAWA Hirofumi | 15 | -48/+21
This patch adds EXPORT_SYMBOL(filemap_write_and_wait) and uses it. See mm/filemap.c. It also changes filemap_write_and_wait() and filemap_write_and_wait_range(). Currently, filemap_write_and_wait() doesn't wait if filemap_fdatawrite() returns an error. However, even if filemap_fdatawrite() returned an error, it may already have submitted some of the data pages to the device (e.g. in the case of -ENOSPC). <quotation> Andrew Morton writes, If filemap_fdatawrite() returns an error, this might be due to some I/O problem: dead disk, unplugged cable, etc. Given the generally crappy quality of the kernel's handling of such exceptions, there's a good chance that the filemap_fdatawait() will get stuck in D state forever. </quotation> So, this patch doesn't wait if filemap_fdatawrite() returns -EIO. Trond, could you please review the nfs part? In particular, I'm not sure whether nfs must use "filemap_fdatawrite(inode->i_mapping) == 0" or not. Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
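The resulting error policy can be sketched as follows. write_and_wait_sketch() and the demo stubs are hypothetical stand-ins for filemap_fdatawrite()/filemap_fdatawait(); the sketch only models whether to wait and which error to return.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one I/O step on a mapping. */
typedef int (*io_step)(void *mapping);

/* Wait even after a write error such as -ENOSPC, since some pages may
 * already have been submitted; skip the wait only for -EIO, where the
 * device may be gone and the wait could hang. The first error wins. */
static int write_and_wait_sketch(void *mapping, io_step do_write, io_step do_wait)
{
	int err = do_write(mapping);

	if (err != -EIO) {
		int err2 = do_wait(mapping);
		if (!err)
			err = err2;
	}
	return err;
}

/* Demo stubs for exercising the policy. */
static int demo_waits;
static int demo_write_enospc(void *m) { (void)m; return -ENOSPC; }
static int demo_write_eio(void *m)    { (void)m; return -EIO; }
static int demo_wait_ok(void *m)      { (void)m; demo_waits++; return 0; }
```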
2006-01-08 | [PATCH] fat: support a truncate() for expanding size (generic_cont_expand) | OGAWA Hirofumi | 2 | -17/+74
This patch changes generic_cont_expand() in order to share the code with fatfs.
- Use vmtruncate() if ->prepare_write() returns an error. Even if ->prepare_write() returns an error, it may already have added some blocks, so this truncates blocks outside of ->i_size by vmtruncate().
- Add generic_cont_expand_simple(). generic_cont_expand_simple() assumes that ->prepare_write() can handle the block boundary. With this, we don't need to care about the extra byte.
For expanding a file size by truncate(), fatfs uses the added generic_cont_expand_simple(). Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 | [PATCH] fat: support ->direct_IO() | OGAWA Hirofumi | 3 | -15/+87
This patch adds ->direct_IO() support, mostly for reads. Its users seem to want it for streaming reads. The current direct I/O has a limitation: it can only overwrite. (For write operations, we mainly need to handle holes etc.) Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 | [PATCH] fat: s/EXPORT_SYMBOL/EXPORT_SYMBOL_GPL/ | OGAWA Hirofumi | 5 | -17/+17
All of fatfs's EXPORT_SYMBOLs exist only for vfat/msdos, so _GPL would be proper. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 | [PATCH] fat: add the read/writepages() | OGAWA Hirofumi | 1 | -1/+16
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 | [PATCH] fat: use sb_find_get_block() instead of sb_getblk() | OGAWA Hirofumi | 1 | -2/+2
We don't need to allocate a buffer just to check whether the buffer is uptodate. This uses sb_find_get_block() instead; if it returns NULL, the buffer is not uptodate. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 | [PATCH] fat: move fat_clusters_flush() to write_super() | OGAWA Hirofumi | 3 | -6/+14
It is overkill to update FS_INFO whenever prev_free/free_clusters is modified, because those values are just hints. So, this patch uses ->write_super() to update FS_INFO instead. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 | [PATCH] slob: introduce the SLOB allocator | Matt Mackall | 1 | -0/+4
configurable replacement for slab allocator
This adds a CONFIG_SLAB option under CONFIG_EMBEDDED. When CONFIG_SLAB is disabled, the kernel falls back to using the 'SLOB' allocator. SLOB is a traditional K&R/UNIX allocator with a SLAB emulation layer, similar to the original Linux kmalloc allocator that SLAB replaced. It's significantly smaller code and is more memory efficient. But like all similar allocators, it scales poorly and suffers from fragmentation more than SLAB, so it's only appropriate for small systems. It's been tested extensively in the Linux-tiny tree. I've also stress-tested it with make -j 8 compiles on a 3G SMP+PREEMPT box (not recommended). Here's a comparison for otherwise identical builds, showing SLOB saving nearly half a megabyte of RAM:
$ size vmlinux*
   text    data     bss     dec     hex filename
3336372  529360  190812 4056544  3de5e0 vmlinux-slab
3323208  527948  190684 4041840  3dac70 vmlinux-slob
$ size mm/{slab,slob}.o
   text    data     bss     dec     hex filename
  13221     752      48   14021    36c5 mm/slab.o
   1896      52       8    1956     7a4 mm/slob.o
/proc/meminfo:
                    SLAB       SLOB    delta
MemTotal:       27964 kB   27980 kB   +16 kB
MemFree:        24596 kB   25092 kB  +496 kB
Buffers:           36 kB      36 kB     0 kB
Cached:          1188 kB    1188 kB     0 kB
SwapCached:         0 kB       0 kB     0 kB
Active:           608 kB     600 kB    -8 kB
Inactive:         808 kB     812 kB    +4 kB
HighTotal:          0 kB       0 kB     0 kB
HighFree:           0 kB       0 kB     0 kB
LowTotal:       27964 kB   27980 kB   +16 kB
LowFree:        24596 kB   25092 kB  +496 kB
SwapTotal:          0 kB       0 kB     0 kB
SwapFree:           0 kB       0 kB     0 kB
Dirty:              4 kB      12 kB    +8 kB
Writeback:          0 kB       0 kB     0 kB
Mapped:           560 kB     556 kB    -4 kB
Slab:            1756 kB       0 kB -1756 kB
CommitLimit:    13980 kB   13988 kB    +8 kB
Committed_AS:    4208 kB    4208 kB     0 kB
PageTables:        28 kB      28 kB     0 kB
VmallocTotal: 1007312 kB 1007312 kB     0 kB
VmallocUsed:       48 kB      48 kB     0 kB
VmallocChunk: 1007264 kB 1007264 kB     0 kB
(this work has been sponsored in part by CELF)
From: Ingo Molnar <mingo@elte.hu>
Fix 32-bitness bugs in mm/slob.c.
Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
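For readers unfamiliar with the K&R style mentioned above, here is a minimal first-fit free-list allocator over a static arena, with block splitting on allocation and coalescing on free. It illustrates the general technique only; it is not mm/slob.c, and all names and the arena size are assumptions.

```c
#include <stddef.h>

/* Each block starts with a header: next free block and size in units. */
typedef union unit {
	struct { union unit *next; size_t units; } h;
	max_align_t align;            /* force worst-case alignment */
} unit;

#define ARENA_UNITS 1024              /* assumption: tiny fixed arena */
static unit arena[ARENA_UNITS];
static unit *free_list;               /* free blocks, sorted by address */

/* Must be called once before kr_alloc()/kr_free(). */
static void kr_init(void)
{
	free_list = arena;
	free_list->h.units = ARENA_UNITS;
	free_list->h.next = NULL;
}

static void *kr_alloc(size_t bytes)
{
	size_t need = (bytes + sizeof(unit) - 1) / sizeof(unit) + 1; /* +header */
	unit **link = &free_list;

	for (unit *b = free_list; b; link = &b->h.next, b = b->h.next) {
		if (b->h.units < need)
			continue;                 /* first fit: keep looking */
		if (b->h.units == need) {
			*link = b->h.next;        /* exact fit: unlink whole block */
		} else {
			b->h.units -= need;       /* split: hand out the tail */
			b += b->h.units;
			b->h.units = need;
		}
		return b + 1;                     /* memory just past the header */
	}
	return NULL;
}

static void kr_free(void *p)
{
	unit *b = (unit *)p - 1, *prev = NULL, *cur = free_list;

	while (cur && cur < b) {                  /* find address-ordered slot */
		prev = cur;
		cur = cur->h.next;
	}
	if (cur && b + b->h.units == cur) {       /* merge with next block */
		b->h.units += cur->h.units;
		b->h.next = cur->h.next;
	} else {
		b->h.next = cur;
	}
	if (prev && prev + prev->h.units == b) {  /* merge with previous block */
		prev->h.units += b->h.units;
		prev->h.next = b->h.next;
	} else if (prev) {
		prev->h.next = b;
	} else {
		free_list = b;
	}
}
```

The linear first-fit scan and the fragmentation it leaves behind are exactly why, as the commit says, this style of allocator scales poorly compared with SLAB.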
2006-01-08 | [PATCH] RCU signal handling | Ingo Molnar | 1 | -2/+2
RCU tasklist_lock and RCU signal handling: send signals RCU-read-locked instead of tasklist_lock read-locked. This is a scalability improvement on SMP and a preemption-latency improvement under PREEMPT_RCU. Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: William Irwin <wli@holomorphy.com> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 | [PATCH] frv: suppress configuration of certain features for FRV | David Howells | 1 | -1/+1
Suppress configuration of certain features for the FRV arch as they can't be built for FRV at the moment:
(*) RTC
(*) HISAX_*
(*) PARPORT_PC
(*) VGA_CONSOLE
(*) BINFMT_ELF
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 | [PATCH] Fold numa_maps into mempolicies.c | Christoph Lameter | 1 | -122/+5
First discussed at http://marc.theaimsgroup.com/?t=113149255100001&r=1&w=2 - Use the check_range() in mempolicy.c to gather statistics. - Improve the numa_maps code in general and fix some comments. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 | [PATCH] drop-pagecache | Andrew Morton | 2 | -1/+69
Add /proc/sys/vm/drop_caches. When written to, this will cause the kernel to discard as much pagecache and/or reclaimable slab objects as it can. This operation requires root permissions. It won't drop dirty data, so the user should run `sync' first. Caveats:
a) Holds inode_lock for exorbitant amounts of time.
b) Needs to be taught about NUMA nodes: propagate these all the way through so the discarding can be controlled on a per-node basis.
This is a debugging feature: useful for getting consistent results between filesystem benchmarks. We could possibly put it under a config option, but it's less than 300 bytes. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
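A minimal C sketch of using the knob from a program. The meaning of the values (1 = pagecache, 2 = reclaimable slab objects such as dentries and inodes, 3 = both) comes from the kernel's vm sysctl documentation rather than from this commit message. The helper names are hypothetical, and the path is a parameter so the helper can be tried on an ordinary file without root.

```c
#define _XOPEN_SOURCE 700   /* for sync() under strict compiler modes */
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* 1 = pagecache, 2 = reclaimable slab objects, 3 = both. */
static int drop_caches_value(bool pagecache, bool slab)
{
	return (pagecache ? 1 : 0) | (slab ? 2 : 0);
}

/* Flush dirty data first (dirty pages are never dropped), then write the
 * value. path is normally "/proc/sys/vm/drop_caches", which needs root. */
static int drop_caches(const char *path, int value)
{
	FILE *f;

	sync();
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%d\n", value) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```

For benchmarking, drop_caches("/proc/sys/vm/drop_caches", drop_caches_value(true, true)) would be called as root between runs.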
2006-01-06 | Merge git://git.linux-nfs.org/pub/linux/nfs-2.6 | Linus Torvalds | 35 | -1016/+1678
2006-01-06 | [PATCH] fs/ufs: debug mode compilation failure | Evgeniy | 1 | -1/+1
This patch should fix compilation failure of fs/ufs/dir.c with defined UFS_DIR_DEBUG Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 | NFSv4: Fix an Oops in nfs_do_expire_all_delegations | Trond Myklebust | 1 | -4/+2
If the loop errors, we need to exit. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 | NFSv4: Allow entries in the idmap cache to expire | Trond Myklebust | 3 | -0/+33
If someone changes the uid/gid mapping in userland, then we do eventually want those changes to be propagated to the kernel. Currently the kernel assumes that it may cache entries forever. Add an expiration time + garbage collector for idmap entries. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
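One way to express "expiration time + garbage collector" is sketched below: a lookup treats an expired entry as a miss, which sends the caller back to the userland mapper, and a sweep clears expired slots. The entry layout, the 600-second lifetime, and all names are assumptions, not the kernel's idmap code.

```c
#include <stdbool.h>
#include <string.h>
#include <time.h>

/* Hypothetical cache entry: name-to-id mapping plus its expiry time. */
struct idmap_entry {
	char name[64];
	unsigned id;
	time_t expires;               /* 0 = empty slot */
};

#define IDMAP_TTL 600             /* assumption: 10-minute lifetime */

static void idmap_store(struct idmap_entry *e, const char *name,
			unsigned id, time_t now)
{
	strncpy(e->name, name, sizeof(e->name) - 1);
	e->name[sizeof(e->name) - 1] = '\0';
	e->id = id;
	e->expires = now + IDMAP_TTL;
}

/* A hit counts only if the entry has not expired; on a miss the caller
 * asks userland again, which picks up any changed mapping. */
static bool idmap_lookup(const struct idmap_entry *e, const char *name,
			 time_t now, unsigned *id)
{
	if (e->expires == 0 || now >= e->expires || strcmp(e->name, name) != 0)
		return false;
	*id = e->id;
	return true;
}

/* Garbage collection: clear every expired slot. */
static void idmap_gc(struct idmap_entry *tbl, size_t n, time_t now)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].expires != 0 && now >= tbl[i].expires)
			tbl[i].expires = 0;
}
```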
2006-01-06 | NFS: get rid of some needless code obfuscation in xdr_encode_sattr(). | Trond Myklebust | 1 | -11/+10
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 | NFS: Send valid mode bits to the server | Trond Myklebust | 3 | -2/+5
inode->i_mode contains a lot more than just the mode bits. Make sure that we mask away this extra stuff in SETATTR calls to the server. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
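The masking itself is simple. This sketch keeps only the permission and setuid/setgid/sticky bits (07777) and drops the file-type bits; the kernel defines S_IALLUGO for this mask, while the sketch spells it out with POSIX macros. setattr_mode() is a hypothetical name.

```c
#define _XOPEN_SOURCE 700   /* for S_ISVTX under strict compiler modes */
#include <sys/stat.h>

/* Permission plus setuid/setgid/sticky bits only; no S_IFMT type bits. */
#define MODE_BITS (S_ISUID | S_ISGID | S_ISVTX | S_IRWXU | S_IRWXG | S_IRWXO)

/* Hypothetical helper: the mode value that is safe to put in a SETATTR. */
static unsigned setattr_mode(mode_t i_mode)
{
	return (unsigned)(i_mode & MODE_BITS);
}
```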
2006-01-06 | SUNRPC: get rid of cl_chatty | Chuck Lever | 5 | -7/+1
Clean up: Every ULP that uses the in-kernel RPC client, except the NLM client, sets cl_chatty. There's no reason why NLM shouldn't set it, so just get rid of cl_chatty and always be verbose. Test-plan: Compile with CONFIG_NFS enabled. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 | SUNRPC: new interface to force an RPC rebind | Chuck Lever | 1 | -2/+2
We'd like to hide fields in rpc_xprt and rpc_clnt from upper layer protocols. Start by creating an API to force RPC rebind, replacing logic that simply sets cl_port to zero. Test-plan: Destructive testing (unplugging the network temporarily). Connectathon with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked. Probably need to rig a server where certain services aren't running, or that returns an error for some typical operation. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 | NFSv3: try get_root user-supplied security_flavor | J. Bruce Fields | 1 | -7/+19
Thanks to Ed Keizer for bug and root cause. He says: "... we could only mount the top-level Solaris share. We could not mount deeper into the tree. Investigation showed that Solaris allows UNIX authenticated FSINFO only on the top level of the share. This is a problem because we share/export our home directories one level higher than we mount them. I.e. we share the partition and not the individual home directories. This prevented access to home directories." We still may need to try auth_sys for the case where the client doesn't have appropriate credentials. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 | NLM: fix parsing of sm notify procedure | J. Bruce Fields | 1 | -1/+3
The procedure that decodes the statd sm_notify call seems to be skipping a few arguments. How did this ever work? From folks at Polyserve. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 | NLM: Further cancel fixes | J. Bruce Fields | 2 | -6/+16
If the server receives an NLM cancel call and finds no waiting lock to cancel, then chances are the lock has already been applied, and the client just hadn't yet processed the NLM granted callback before it sent the cancel. The Open Group text, for example, permits a server to return either success (LCK_GRANTED) or failure (LCK_DENIED) in this case. But returning an error seems more helpful; the client may be able to use it to recognize that a race has occurred and to recover from the race. So, modify the relevant functions to return an error in this case. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 | NLM: clean up nlmsvc_delete_block | J. Bruce Fields | 1 | -2/+1
The fl_next check here is superfluous (and possibly a layering violation). Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 | NLM: don't unlock on cancel requests | J. Bruce Fields | 2 | -16/+2
Currently when lockd gets an NLM_CANCEL request, it also does an unlock for the same range. This is incorrect. The Open Group documentation says that "This procedure cancels an *outstanding* blocked lock request." (Emphasis mine.) Also, consider a client that holds a lock on the first byte of a file, and requests a lock on the entire file. If the client cancels that request (perhaps because the requesting process is signalled), the server shouldn't perform an unlock on the entire file, since that will also remove the previous lock that the client was already granted. Or consider a lock request that actually *downgraded* an exclusive lock to a shared lock. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 | NLM: Clean up nlmsvc_grant_reply locking | J. Bruce Fields | 1 | -4/+3
Slightly simpler logic here makes it more trivial to verify that the up's and down's are balanced here. Break out an assignment from a conditional while we're at it. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 | NFSv4: Allow user to set the port used by the NFSv4 callback channel | Trond Myklebust | 5 | -3/+113
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 | NFS: Clean up weak cache consistency code | Trond Myklebust | 1 | -20/+40
...and ensure that nfs_update_inode() respects wcc Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 | NFSv4: Ensure DELEGRETURN returns attributes | Trond Myklebust | 3 | -17/+35
Upon return of a write delegation, the server will almost always bump the change attribute. Ensure that we pick up that change so that we don't invalidate our data cache unnecessarily. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 | NFSv4: Ensure change attribute returned by GETATTR callback conforms to spec | Trond Myklebust | 3 | -1/+5
According to RFC3530 we're supposed to cache the change attribute at the time the client receives a write delegation. If the inode is clean, a CB_GETATTR callback by the server to the client is supposed to return the cached change attribute. If, OTOH, the inode is dirty, the client should bump the cached change attribute by 1. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
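The rule reduces to a one-line decision. A sketch with a hypothetical delegation structure and helper name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-delegation state. */
struct deleg_sketch {
	uint64_t cached_change;   /* change attribute when the write delegation was granted */
};

/* Value to report in a CB_GETATTR reply: the cached change attribute if the
 * inode is clean, bumped by one if it is dirty. */
static uint64_t cb_getattr_change(const struct deleg_sketch *d, bool inode_dirty)
{
	return inode_dirty ? d->cached_change + 1 : d->cached_change;
}
```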
2006-01-06 | NFS: Make directIO aware of compound pages... | Trond Myklebust | 1 | -3/+4
...and avoid calling set_page_dirty on them Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 | NFS: Make stat() return updated mtimes after a write() | Trond Myklebust | 2 | -11/+14
The SuS states that a call to write() will cause mtime to be updated on the file. In order to satisfy that requirement, we need to flush out any cached writes in nfs_getattr(). Speed things up slightly by not committing the writes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 | NFSv4: Ensure that we return the delegation on the target of a rename too. | Trond Myklebust | 1 | -1/+3
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 | NFS: support large reads and writes on the wire | Chuck Lever | 5 | -29/+40
Most NFS server implementations allow up to 64KB reads and writes on the wire. The Solaris NFS server allows up to a megabyte, for instance. Now the Linux NFS client supports transfer sizes up to 1MB, too. This will help reduce protocol and context switch overhead on read/write intensive NFS workloads, and support larger atomic read and write operations on servers that support them. Test-plan: Connectathon and iozone on mount point with wsize=rsize>32768 over TCP. Tests with NFS over UDP to verify the maximum RPC payload size cap. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 | NFS: make "inode number mismatch" message more useful | Chuck Lever | 1 | -8/+9
To help NFS users and server developers, make the "inode number mismatch" message display more useful information. Test-plan: None. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 | NFS: get rid of useless kernel log message | Chuck Lever | 1 | -2/+1
nfs_statfs() generates a log message when GETATTR returns an error. This is usually a useless message. Make it a dprintk. Test plan: None Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 | NFS: Fix error recovery code in fs/nfs/inode.c:__init_nfs() | Chuck Lever | 1 | -2/+2
Red Hat found a problem in the error recovery logic in __init_nfs. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 | NFS: use generic_write_checks() to sanity check direct writes | Chuck Lever | 1 | -24/+19
Replace ad hoc write parameter sanity checking in nfs_file_direct_write() with a call to generic_write_checks(). This should make the proper checks modulo the O_LARGEFILE flag, and should catch NFSv2-specific limitations by virtue of i_sb->s_maxbytes. Test plan: Posix compliance testing with both NFSv2 and NFSv3. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 | NFSv4: Remove requirement for machine creds for the "setclientid" operation | Trond Myklebust | 4 | -49/+52
Use a cred from the nfs4_client->cl_state_owners list. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFSv4: Remove requirement for machine creds for the "renew" operationTrond Myklebust4-25/+41
In RFC3530, the RENEW operation is allowed to use either the same principal, RPC security flavour and (if RPCSEC_GSS), the same mechanism and service that was used for SETCLIENTID_CONFIRM OR Any principal, RPC security flavour and service combination that currently has an OPEN file on the server. Choose the latter since that doesn't require us to keep credentials for the same principal for the entire duration of the mount. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFSv4: Send RENEW requests to the server only when we're holding stateTrond Myklebust5-2/+75
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFS: Convert instances of kernel_thread() to kthread()Trond Myklebust1-30/+16
Convert private implementations in NFSv4 state recovery and delegation code to use kthreads. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFSv4: State recovery cleanupTrond Myklebust3-25/+28
Use wait_on_bit() when waiting for state recovery to complete. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFSv4: OPEN/LOCK/LOCKU/CLOSE will automatically renew the NFSv4 leaseTrond Myklebust1-3/+31
Cut down on the number of unnecessary RENEW requests on the wire. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
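The idea can be sketched as simple lease bookkeeping: every successful stateful operation records when it was sent, and an explicit RENEW is only needed once enough of the lease has elapsed without one. The structure, function names and the two-thirds threshold are assumptions for illustration, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-client lease bookkeeping. */
struct lease_sketch {
    time_t last_renewal;   /* send time of the latest lease-renewing op */
    time_t lease_time;     /* lease period advertised by the server (s) */
};

/* A successful OPEN/LOCK/LOCKU/CLOSE implicitly renews the lease. Record
 * the time the request was sent, since the server renewed it no earlier
 * than that; never move the timestamp backwards. */
static void lease_note_renewal(struct lease_sketch *l, time_t sent_at)
{
    if (sent_at > l->last_renewal)
        l->last_renewal = sent_at;
}

/* Only send an explicit RENEW once two thirds of the lease has passed
 * without an implicit renewal (threshold is an assumption). */
static bool lease_needs_renew(const struct lease_sketch *l, time_t now)
{
    assert(l->lease_time > 0);
    return now - l->last_renewal >= (l->lease_time * 2) / 3;
}
```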
2006-01-06SUNRPC: Ensure that SIGKILL will always terminate a synchronous RPC call.Trond Myklebust1-2/+2
...and make sure that the "intr" flag also enables SIGHUP and SIGTERM to interrupt RPC calls too (as per the Solaris implementation). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFSv4: Make DELEGRETURN an interruptible operation.Trond Myklebust1-8/+60
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFSv4: Convert LOCK rpc call into an asynchronous RPC callTrond Myklebust2-75/+175
In order to allow users to interrupt/cancel it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFSv4: locking XDR cleanupTrond Myklebust2-168/+155
Get rid of some unnecessary intermediate structures Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFSv4: Make open recovery track O_RDWR, O_RDONLY and O_WRONLY correctlyTrond Myklebust2-131/+156
When recovering from a delegation recall or a network partition, we need to replay open(O_RDWR), open(O_RDONLY) and open(O_WRONLY) separately. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFSv4: Make nfs4_state track O_RDWR, O_RDONLY and O_WRONLY separatelyTrond Myklebust3-28/+42
A closer reading of RFC3530 reveals that OPEN_DOWNGRADE must always specify access modes that have been the argument of a previous OPEN operation. IOW: doing OPEN(O_RDWR) and then OPEN_DOWNGRADE(O_WRONLY) is forbidden unless the user called OPEN(O_WRONLY). In order to fix that, we really need to track the three possible open states separately. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
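Tracking the three open modes separately can be sketched with one counter per mode; a downgrade target is then legal only if that exact mode was opened before. The structure and function names are hypothetical and do not reflect the real nfs4_state layout.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical per-state open counters, one per access mode. */
struct open_state_sketch {
    unsigned n_rdonly, n_wronly, n_rdwr;
};

static void state_open(struct open_state_sketch *s, int accmode)
{
    switch (accmode) {
    case O_RDONLY: s->n_rdonly++; break;
    case O_WRONLY: s->n_wronly++; break;
    case O_RDWR:   s->n_rdwr++;   break;
    default:       assert(!"invalid access mode");
    }
}

/* OPEN_DOWNGRADE may only request a mode that was itself the argument
 * of an earlier OPEN, so check that mode's own counter. */
static bool can_downgrade_to(const struct open_state_sketch *s, int accmode)
{
    switch (accmode) {
    case O_RDONLY: return s->n_rdonly > 0;
    case O_WRONLY: return s->n_wronly > 0;
    case O_RDWR:   return s->n_rdwr > 0;
    default:       return false;
    }
}
```

With a single combined mode, OPEN(O_RDWR) alone would appear to "cover" O_WRONLY; separate counters make the RFC's restriction directly checkable.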
2006-01-06NFSv4: Make open_confirm() asynchronous tooTrond Myklebust2-28/+80
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFSv4: Convert open() into an asynchronous RPC callTrond Myklebust2-66/+130
OPEN is a stateful operation, so we must ensure that it always completes. In order to allow users to interrupt the operation, we need to make the RPC call asynchronous, and then wait on completion (or cancel). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFSv4: Allocate OPEN call RPC arguments using kmalloc()Trond Myklebust1-96/+117
Cleanup in preparation for making OPEN calls interruptible by the user. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFSv4: Make locku use the new RPC "wait on completion" interface.Trond Myklebust1-29/+36
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFSv4: stateful NFSv4 RPC call interfaceTrond Myklebust1-1/+0
The NFSv4 model requires us to complete all RPC calls that might establish state on the server whether or not the user wants to interrupt it. We may also need to schedule new work (including new RPC calls) in order to cancel the new state. The asynchronous RPC model will allow us to ensure that RPC calls always complete, but in order to allow for "synchronous" RPC, we want to add the ability to wait for completion. The waits are, of course, interruptible. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06SUNRPC: Further cleanupsTrond Myklebust2-18/+16
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06RPC: Clean up RPC task structureTrond Myklebust12-140/+181
Shrink the RPC task structure. Instead of storing separate pointers for task->tk_exit and task->tk_release, put them in a structure. Also pass the user data pointer as a parameter instead of passing it via task->tk_calldata. This enables us to nest callbacks. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFS: Work correctly with single-page ->writepage() callsTrond Myklebust1-11/+5
Ensure that we always initiate flushing of data before we exit a single-page ->writepage() call. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06Merge branch 'post-2.6.15' of git://brick.kernel.dk/data/git/linux-2.6-blockLinus Torvalds1-2/+24
Manual fixup for merge with Jens' "Suspend support for libata", commit ID 9b847548663ef1039dd49f0eb4463d001e596bc3. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] knfsd: reduce stack consumptionNeil Brown1-8/+12
A typical nfsd call trace is nfsd -> svc_process -> nfsd_dispatch -> nfsd3_proc_write -> nfsd_write -> nfsd_vfs_write -> vfs_writev. These add up to over 300 bytes on the stack. Looking at each of these, I see that nfsd_write (which includes nfsd_vfs_write) contributes 0x8c to stack usage itself!! It turns out this is because it puts a 'struct iattr' on the stack so it can kill suid if needed. The following patch saves about 50 bytes off the stack in this call path. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] knfsd: check error status from vfs_getattr and i_op->fsyncDavid Shaw4-55/+71
Both vfs_getattr and i_op->fsync return error statuses which nfsd was largely ignoring. This was noticed when exporting directories using fuse. This patch cleans up most of the offences, which involves moving the call to vfs_getattr out of the xdr encoding routines (where it is too late to report an error) into the main NFS procedure handling routines. There is still a call to vfs_getattr (related to the ACL code) where the status is ignored, and calls to nfsd_sync_dir don't check the return status either. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] jbd: split checkpoint listsJan Kara1-177/+241
Split the checkpoint list of the transaction into two lists. In the first list we keep the buffers that need to be submitted for IO. In the second list are kept buffers that were already submitted and we just have to wait for the IO to complete. This should simplify the handling of checkpoint lists a bit and may eventually also bring a performance gain. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] fuse: check file type in lookupMiklos Szeredi2-13/+22
Previously invalid types were quietly changed to regular files, but at revalidation the inode was changed to bad. This was rather inconsistent behavior. Now check if the type is valid on initial lookup, and return -EIO if not. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] fuse: ensure progress in read and writeMiklos Szeredi1-4/+3
In direct_io mode, send at least one page per request. Previously it was possible that requests with zero data were sent, and hence the read/write didn't make any progress, resulting in an infinite (though interruptible) loop. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
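One way such a zero-length request can arise is rounding a per-request byte limit down to whole pages; the sketch below shows that failure mode and the "at least one page" guard. It is an illustration under assumptions, not the fuse code: the helper names, the 4096-byte page size and the `limit` parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

/* Bytes to put in the next direct-I/O request. Rounding `limit` down to
 * whole pages yields 0 when limit < one page, which would make the
 * caller's loop spin without progress, so always allow one page. */
static size_t next_chunk(size_t remaining, size_t limit)
{
    size_t max_bytes = (limit / SKETCH_PAGE_SIZE) * SKETCH_PAGE_SIZE;

    if (max_bytes < SKETCH_PAGE_SIZE)
        max_bytes = SKETCH_PAGE_SIZE;   /* guarantee forward progress */
    return remaining < max_bytes ? remaining : max_bytes;
}

/* Count the requests a transfer takes; terminates because every chunk
 * is non-zero while bytes remain. */
static unsigned count_requests(size_t total, size_t limit)
{
    unsigned n = 0;

    while (total > 0) {
        size_t chunk = next_chunk(total, limit);
        assert(chunk > 0);
        total -= chunk;
        n++;
    }
    return n;
}
```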
2006-01-06[PATCH] fuse: make maximum write data configurableMiklos Szeredi3-23/+32
Make the maximum size of write data configurable by the filesystem. The previous fixed 4096 limit only worked on architectures where the page size is less than or equal to this. This change makes writing work on other architectures too, and also lets the filesystem receive bigger write requests in direct_io mode. Normal writes which go through the page cache are still limited to a page sized chunk per request. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] fuse: clean up request size limit checkingMiklos Szeredi4-27/+24
Change the way a too large request is handled. Until now in this case the device read returned -EINVAL and the operation returned -EIO. Make it more flexible by not returning -EINVAL from the read, but restarting it instead. Also remove the fixed limit on setxattr data and let the filesystem provide as large a read buffer as it needs to handle the extended attribute data. The symbolic link length is already checked by VFS to be less than PATH_MAX, so the extra check against FUSE_SYMLINK_MAX is not needed. The check in fuse_create_open() against FUSE_NAME_MAX is not needed, since the dentry has already been looked up, and hence the name already checked. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] fuse: fail file operations on bad inodeMiklos Szeredi2-5/+37
Make file operations on a bad inode fail. This just makes things a bit more consistent. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] fuse: add code documentationMiklos Szeredi1-9/+90
Document some not-so-trivial functions. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] fuse: support caching negative dentriesMiklos Szeredi1-21/+43
Add support for caching negative dentries. Up till now, ->d_revalidate() always forced a new lookup on these. Now let the lookup method return a zero node ID (not used for anything else) meaning a negative entry, but with a positive cache timeout. The old way of signaling negative entry (replying ENOENT) still works. Userspace should check the ABI minor version to see whether sending a zero ID is allowed by the kernel or not. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
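The reply rules above can be summarized as a small classifier, together with the userspace-side version check. This is a hedged sketch: the enum and function names are hypothetical, treating a zero node ID with a zero timeout as uncached is an assumption, and so is taking ABI 7.4 (the interface bump in this log) as the cutoff for sending a zero node ID.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

enum lookup_result { LOOKUP_POSITIVE, LOOKUP_NEGATIVE_CACHED,
                     LOOKUP_NEGATIVE_UNCACHED, LOOKUP_ERROR };

/* error: error code from the reply (0 on success); nodeid: node ID from
 * the entry reply; entry_valid: entry cache timeout in seconds. */
static enum lookup_result classify_lookup(int error, uint64_t nodeid,
                                          uint64_t entry_valid)
{
    if (error == -ENOENT)
        return LOOKUP_NEGATIVE_UNCACHED;   /* old-style negative reply */
    if (error)
        return LOOKUP_ERROR;
    if (nodeid == 0)                       /* new-style negative entry */
        return entry_valid ? LOOKUP_NEGATIVE_CACHED : LOOKUP_NEGATIVE_UNCACHED;
    return LOOKUP_POSITIVE;
}

/* Userspace: only reply with a zero node ID if the kernel's ABI is new
 * enough (7.4 cutoff assumed). */
static int kernel_accepts_zero_nodeid(uint32_t major, uint32_t minor)
{
    return major > 7 || (major == 7 && minor >= 4);
}
```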
2006-01-06[PATCH] fuse: add frsize to statfs replyMiklos Szeredi1-1/+4
Add 'frsize' member to the statfs reply. I'm not sure if sending f_fsid will ever be needed, but just in case leave some space at the end of the structure, so less compatibility mess would be required. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] fuse: bump interface versionMiklos Szeredi2-0/+5
Change interface version to 7.4. Following changes will need backward compatibility support, so store the minor version returned by userspace. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] fuse: clean up page offset calculationMiklos Szeredi1-4/+3
Use page_offset() instead of doing page offset calculation by hand. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] fuse: clean up fuse_lookup()Miklos Szeredi1-52/+23
Simplify fuse_lookup() and related functions. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] s390: cleanup KconfigMartin Schwidefsky2-2/+2
Sanitize some s390 Kconfig options. We have ARCH_S390, ARCH_S390X, ARCH_S390_31, 64BIT, S390_SUPPORT and COMPAT. Replace these 6 options by S390, 64BIT and COMPAT. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] s390: cms volume label definitionsPeter Oberparleiter1-15/+15
Move the definition of the CMS volume label to vtoc.h and modify partitions/ibm.c to use this volume label definition instead of an anonymous array. Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] NOMMU: Provide shared-writable mmap support on ramfsDavid Howells5-22/+368
The attached patch makes ramfs support shared-writable mmaps by: (1) Attempting to perform a contiguous block allocation to the requested size when truncate attempts to increase the file from zero size, such as happens when: fd = shm_open("/file/on/ramfs", ...); ftruncate(fd, size_requested); addr = mmap(NULL, subsize, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED, fd, offset); (2) Permitting any shared-writable mapping over any contiguous set of extant pages. get_unmapped_area() will return the address into the actual ramfs pages. The mapping may start anywhere and be of any size, but may not go over the end of file. Multiple mappings may overlap in any way. (3) Not permitting a file to be shrunk if it would truncate any shared mappings (private mappings are copied). Thus this patch provides support for POSIX shared memory on NOMMU kernels, with certain limitations such as there being a large enough block of pages available to support the allocation and it only working on directly mappable filesystems. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] mm: rmap optimisationNick Piggin1-1/+1
Optimise rmap functions by minimising atomic operations when we know there will be no concurrent modifications. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] Hugetlb: Copy on Write supportDavid Gibson1-3/+0
Implement copy-on-write support for hugetlb mappings so MAP_PRIVATE can be supported. This helps us to safely use hugetlb pages in many more applications. The patch makes the following changes. If needed, I also have it broken out according to the following paragraphs. 1. Add a pair of functions to set/clear write access on huge ptes. The writable check in make_huge_pte is moved out to the caller for use by COW later. 2. Hugetlb copy-on-write requires special case handling in the following situations: - copy_hugetlb_page_range() - Copied pages must be write protected so a COW fault will be triggered (if necessary) if those pages are written to. - find_or_alloc_huge_page() - Only MAP_SHARED pages are added to the page cache. MAP_PRIVATE pages still need to be locked however. 3. Provide hugetlb_cow() and calls from hugetlb_fault() and hugetlb_no_page() which handle the COW fault by making the actual copy. 4. Remove the check in hugetlbfs_file_map() so that MAP_PRIVATE mmaps will be allowed. Make MAP_HUGETLB exempt from the deprecated VM_RESERVED mapping check. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Adam Litke <agl@us.ibm.com> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "Seth, Rohit" <rohit.seth@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] hfsplus oops fixJoshua Kwan1-1/+1
nls_utf8 is available, and the check in hfsplus_fill_super checks the wrong pointer for NULLness (it checks the saved nls, not the new one that it needs to use.) Signed-off-by: Joshua Kwan <joshk@triplehelix.org> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[BLOCK] bio: check for same page merge possibilities in __bio_add_page()Jens Axboe1-2/+24
For filesystems with a blocksize < page size, we can merge same page calls into the bio_vec at the end of the bio. This saves segments on systems with a page size > the "normal" 4kb fs block size. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@suse.de>
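The same-page merge can be sketched with simplified stand-ins for a bio and its vector: if the new fragment continues the last segment within the same page, that segment grows instead of a new one being used. The structures and function are hypothetical, and the real code also has to respect queue limits such as maximum segment size.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-ins for a bio vector entry and a bio (hypothetical). */
struct bvec_sketch { const void *page; unsigned len, offset; };
struct bio_sketch {
    struct bvec_sketch vec[16];
    unsigned vcnt;
};

/* Add (page, len, offset); merge into the last entry when it is the
 * same page and the new data starts where that entry ends. */
static bool bio_add_sketch(struct bio_sketch *bio, const void *page,
                           unsigned len, unsigned offset)
{
    if (bio->vcnt > 0) {
        struct bvec_sketch *prev = &bio->vec[bio->vcnt - 1];
        if (prev->page == page && prev->offset + prev->len == offset) {
            prev->len += len;          /* same-page merge: no new segment */
            return true;
        }
    }
    if (bio->vcnt == sizeof(bio->vec) / sizeof(bio->vec[0]))
        return false;                  /* out of vector slots */
    bio->vec[bio->vcnt++] = (struct bvec_sketch){ page, len, offset };
    return true;
}
```

On a system with 64 KB pages and a 4 KB filesystem block size, sixteen contiguous block-sized additions to one page would then occupy a single segment rather than sixteen.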
2006-01-05Merge http://oss.oracle.com/git/ocfs2Linus Torvalds104-12/+45227
2006-01-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6Linus Torvalds3-20/+34
Trivial manual merge fixup for usb_find_interface clashes.
2006-01-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivialLinus Torvalds19-83/+5
2006-01-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6Linus Torvalds1-8/+30
2006-01-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2-0/+4
2006-01-04Relax the rw_verify_area() error checking.Linus Torvalds2-10/+26
In particular, allow over-large read- or write-requests to be downgraded to a more reasonable range, rather than considering them outright errors. We want to protect lower layers from (the sadly all too common) overflow conditions, but prefer to do so by chopping the requests up, rather than just refusing them outright. Cc: Peter Anvin <hpa@zytor.com> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Andi Kleen <ak@suse.de> Cc: Al Viro <viro@ftp.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
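The "chop rather than refuse" policy can be sketched as a function that returns the byte count lower layers should actually be asked for. The helper name is hypothetical, and the cap used here (INT_MAX rounded down to a 4 KB page boundary) is an assumption chosen to illustrate the idea, not a quote of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <sys/types.h>

#define SKETCH_PAGE_MASK (~(size_t)4095)               /* assumes 4 KB pages */
#define SKETCH_MAX_RW ((size_t)INT_MAX & SKETCH_PAGE_MASK)

/* Over-large requests are trimmed to a page-aligned maximum below
 * INT_MAX instead of failing; a negative position is still an error. */
static ssize_t verify_rw_sketch(long long pos, size_t count)
{
    if (pos < 0)
        return -EINVAL;
    return count > SKETCH_MAX_RW ? (ssize_t)SKETCH_MAX_RW : (ssize_t)count;
}
```

Callers then see a short read or write, which POSIX already permits, instead of an error for sizes that overflow somewhere below.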
2006-01-04[PATCH] sysfs: handle failures in sysfs_make_direntSteven Rostedt1-1/+5
I noticed that if sysfs_make_dirent fails to allocate the sd, then a null will be passed to sysfs_put. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-01-04[PATCH] Driver core: Make block devices create the proper symlink nameGreg Kroah-Hartman1-2/+25
Block devices need to add the block device name to the symlink they put in the device directory, otherwise multiple symlinks of the same name can be created. This matches the class system, which works the same way, we just forgot to convert block at the same time. Cc: Pete Zaitcev <zaitcev@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-01-04[PATCH] driver core: replace "hotplug" by "uevent"Kay Sievers1-3/+3
Leave the overloaded "hotplug" word to subsystems which are handling real devices. The driver core does not "plug" anything, it just exports the state to userspace and generates events. Signed-off-by: Kay Sievers <kay.sievers@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-01-04[PATCH] remove mount/umount uevents from superblock handlingKay Sievers1-14/+1
The names of these events have been confusing from the beginning, as they have been more like claim/release events. We needed these events for notifying HAL when storage devices have been mounted. Thanks to Al, we have the proper solution now and can poll() /proc/mounts instead to get notified about mount tree changes. Signed-off-by: Kay Sievers <kay.sievers@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
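A minimal userspace sketch of the poll() approach follows; the helper name is hypothetical. Per proc(5), /proc/mounts is pollable since Linux 2.6.15: kernels before 2.6.30 report a change as POLLERR, later ones as POLLPRI, so the sketch checks both.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for a mount or unmount in this namespace.
 * fd must be an open descriptor for /proc/mounts.
 * Returns 1 if the mount table changed, 0 on timeout, -1 on error. */
static int wait_for_mount_change(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLPRI };
    int n;

    assert(fd >= 0);
    n = poll(&pfd, 1, timeout_ms);
    if (n <= 0)
        return n;                       /* 0: timeout, -1: error */
    return (pfd.revents & (POLLERR | POLLPRI)) ? 1 : 0;
}
```

A caller would typically loop on this with a negative timeout and re-read /proc/mounts each time it returns 1.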
2006-01-03[INET_SOCK]: Move struct inet_sock & helper functions to net/inet_sock.hArnaldo Carvalho de Melo2-0/+4
To help in reducing the number of include dependencies, several files were touched as they were getting needed headers indirectly for stuff they use. Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c included linux/dccp.h twice. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03[PATCH] o Update Kconfig documentation to reflect support for readonly mounts.Mark Fasheh1-1/+0
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-01-03[PATCH] This patch contains the following cleanups:Adrian Bunk3-8/+3
- cluster/sys.c: make needlessly global code static - dlm/: "extern" declarations for variables belong in header files (and in this case, they are already in dlmdomain.h) Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-01-03[PATCH] OCFS2: The Second Oracle Cluster FilesystemMark Fasheh2-11/+43
Link the code into the kernel build system. OCFS2 is marked as experimental. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
2006-01-03[PATCH] OCFS2: The Second Oracle Cluster FilesystemMark Fasheh52-0/+24438
The OCFS2 file system module. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
2006-01-03[PATCH] OCFS2: The Second Oracle Cluster FilesystemMark Fasheh6-1/+1485
dlmfs: A minimal dlm userspace interface implemented via a virtual file system. Most of the OCFS2 tools make use of this to take cluster locks when doing operations on the file system. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
2006-01-03[PATCH] OCFS2: The Second Oracle Cluster FilesystemKurt Hackel17-0/+10830
A distributed lock manager built with the cluster file system use case in mind. The OCFS2 dlm exposes a VMS style API, though things have been simplified internally. The only lock levels implemented currently are NLMODE, PRMODE and EXMODE. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
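For readers new to VMS-style lock modes, the compatibility rules for the three implemented modes fit in a small table. The enum names below are hypothetical (the dlm's own constants may use different names and values); the table is the standard VMS compatibility relation restricted to NL, PR and EX.

```c
#include <assert.h>
#include <stdbool.h>

/* The three modes named above, in increasing strength. */
enum lock_mode_sketch { MODE_NL, MODE_PR, MODE_EX };

/* NL (null) conflicts with nothing, PR (protected read) is shared among
 * readers, and EX (exclusive) is compatible only with NL. */
static bool modes_compatible(enum lock_mode_sketch held,
                             enum lock_mode_sketch wanted)
{
    static const bool compat[3][3] = {
        /*            NL     PR     EX   */
        /* NL */ { true,  true,  true  },
        /* PR */ { true,  true,  false },
        /* EX */ { true,  false, false },
    };

    assert(held <= MODE_EX && wanted <= MODE_EX);
    return compat[held][wanted];
}
```

A request is grantable when the wanted mode is compatible with every mode currently held on the resource.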
2006-01-03[PATCH] OCFS2: The Second Oracle Cluster FilesystemZach Brown7-0/+2624
Node messaging via tcp. Used by the dlm and the file system for point to point communication between nodes. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
2006-01-03[PATCH] OCFS2: The Second Oracle Cluster FilesystemMark Fasheh3-0/+1916
Disk based heartbeat. Configured and started from userspace, the kernel component handles I/O submission and event generation via callback mechanism. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
2006-01-03[PATCH] OCFS2: The Second Oracle Cluster FilesystemKurt Hackel7-0/+1001
A simple node information service, filled and updated from userspace. The rest of the stack queries this service for simple node information. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
2006-01-03[PATCH] OCFS2: The Second Oracle Cluster FilesystemZach Brown2-0/+441
Very simple printk wrapper which adds the ability to enable various sets of debug messages at run-time. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
2006-01-03[PATCH] add AOP_TRUNCATED_PAGE, prepend AOP_ to WRITEPAGE_ACTIVATEZach Brown1-1/+1
readpage(), prepare_write(), and commit_write() callers are updated to understand the special return code AOP_TRUNCATED_PAGE in the style of writepage() and WRITEPAGE_ACTIVATE. AOP_TRUNCATED_PAGE tells the caller that the callee has unlocked the page and that the operation should be tried again with a new page. OCFS2 uses this to detect and work around a lock inversion in its aop methods. There should be no change in behaviour for methods that don't return AOP_TRUNCATED_PAGE. WRITEPAGE_ACTIVATE is also prepended with AOP_ for consistency and they are made enums so that kerneldoc can be used to document their semantics. Signed-off-by: Zach Brown <zach.brown@oracle.com>
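The caller-side contract can be sketched as a retry loop: when the callee reports that it unlocked the page, the caller looks the page up and locks it again, then retries. Everything here is a hypothetical model (including the return-code value); the real callers use the page cache rather than a flag on a struct.

```c
#include <assert.h>

/* Hypothetical stand-in for AOP_TRUNCATED_PAGE: "I unlocked the page;
 * get a fresh one and try again". */
enum { SKETCH_AOP_TRUNCATED_PAGE = 1 };

struct page_sketch { int locked; int generation; };

/* readpage-like callee: on the first call it hits a lock inversion,
 * drops the page lock and asks for a retry. */
static int readpage_sketch(struct page_sketch *page, int *attempts)
{
    assert(page->locked);
    if ((*attempts)++ == 0) {
        page->locked = 0;                  /* callee unlocked the page */
        return SKETCH_AOP_TRUNCATED_PAGE;
    }
    page->locked = 0;                      /* completion unlocks as usual */
    return 0;
}

/* Caller: any value other than the retry code is the final result. */
static int read_with_retry(struct page_sketch *page, int *attempts)
{
    int ret;

    do {
        page->locked = 1;                  /* models re-finding and locking */
        page->generation++;
        ret = readpage_sketch(page, attempts);
    } while (ret == SKETCH_AOP_TRUNCATED_PAGE);
    return ret;
}
```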
2006-01-03[PATCH] configfs: User-driven configuration filesystemJoel Becker10-0/+2455
Configfs, a file system for userspace-driven kernel object configuration. The OCFS2 stack makes extensive use of this for propagation of cluster configuration information into kernel. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2006-01-03update the email address of Randy DunlapAdrian Bunk1-1/+1
This patch removes all references to the bouncing address rddunlap@osdl.org and one dead web page from the kernel. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Randy Dunlap <rdunlap@xenotime.net>
2006-01-03s/retreiv/retriev/gMatt Mackall1-2/+2
As everyone knows, the rule is: "i before e.. um.. always." Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-01-03fs/qnx4/bitmap.c: #if 0 qnx4_new_block()Adrian Bunk1-0/+2
qnx4_new_block() is neither implemented nor used. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Anders Larsen <al@alarsen.net>
2006-01-03remove pointers to the defunct UDF mailing listAdrian Bunk16-80/+0
This patch removes pointers to the defunct UDF mailing list. Signed-off-by: Adrian Bunk <bunk@stusta.de>
2005-12-30Insanity avoidance in /procLinus Torvalds1-24/+23
The old /proc interfaces were never updated to use loff_t, and are just generally broken. Now, we should be using the seq_file interface for all of the proc files, but converting the legacy functions is more work than most people care for and has little upside. But at least we can make the non-LFS rules explicit, rather than just insanely wrapping the offset or something. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-29[PATCH] uml: hostfs - fix possible PAGE_CACHE_SHIFT overflowsPaolo 'Blaisorblade' Giarrusso1-1/+6
Prevent page->index << PAGE_CACHE_SHIFT from overflowing. There is a cast there, but it was added carelessly and is in the wrong place. Note the extra parens around the shift - "+" has higher precedence than "<<", leading to a GCC warning which saved us all. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
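The difference between casting after and before the shift can be shown in userspace by modeling a 32-bit page index. The helper names and the 12-bit shift are illustrative assumptions; the point is only where the widening cast goes.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* assumed 4 KB pages */

/* Page index as it would be on a 32-bit host. */
typedef uint32_t index32_t;

/* Buggy: the shift happens in 32 bits, so offsets >= 4 GB wrap before
 * the cast widens the result. */
static uint64_t offset_buggy(index32_t index, uint32_t in_page)
{
    return (uint64_t)(index << SKETCH_PAGE_SHIFT) + in_page;
}

/* Fixed: widen first, then shift. The outer parentheses matter because
 * "+" binds tighter than "<<": "a << s + b" means "a << (s + b)". */
static uint64_t offset_fixed(index32_t index, uint32_t in_page)
{
    return ((uint64_t)index << SKETCH_PAGE_SHIFT) + in_page;
}
```

For page index 0x100000 (the page at 4 GB), the buggy form wraps to an offset of just in_page, while the fixed form yields 2^32 + in_page.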
2005-12-29[PATCH] Hostfs: remove unused varPaolo 'Blaisorblade' Giarrusso1-2/+0
Trivial removal of unused variable from this file - doesn't even change the generated assembly code, in fact (gcc should trigger a warning for unused value here). Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-22[SPARC]: introduce a SPARC Kconfig symbolAdrian Bunk1-1/+1
Introduce a Kconfig symbol SPARC that is defined on both the sparc and sparc64 architectures. This symbol makes some dependencies more readable. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-22[PATCH] fix posix lock on NFSASANO Masahiro1-1/+2
The NFS client prevents mandatory locks, but there is a flaw in it: locks may be left behind if the mode is changed while locking. This patch permits unlocking even if the mandatory lock bits are set. Signed-off-by: ASANO Masahiro <masano@tnes.nec.co.jp> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-20[PATCH] relayfs: remove warning printk() in relay_switch_subbuf()Tom Zanussi1-2/+6
There's currently a diagnostic printk in relay_switch_subbuf() meant as a warning if you accidentally try to log an event larger than the sub-buffer size. The problem is if this happens while logging from somewhere it's not safe to be doing printks, such as in the scheduler, you can end up with a deadlock. This patch removes the warning from relay_switch_subbuf() and instead prints some diagnostic info when the channel is closed. Thanks to Mathieu Desnoyers for pointing out the problem and suggesting a fix. Signed-off-by: Tom Zanussi <zanussi@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-20[PATCH] nfsd: check for read-only exports before setting aclsAndreas Gruenbacher2-2/+2
We must check for MAY_SATTR before setting acls, which includes checking for read-only exports: the lower-level setxattr operation that eventually sets the acl cannot check export-level restrictions. Bug reported by Martin Walter <mawa@uni-freiburg.de>. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-19NLM: Fix Oops in nlmclnt_mark_reclaim()Trond Myklebust1-0/+4
When mixing -olock and -onolock mounts on the same client, we have to check that fl->fl_u.nfs_fl.owner is set before dereferencing it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-12-19NFS: Fix another O_DIRECT raceTrond Myklebust3-42/+33
Ensure we call unmap_mapping_range() and sync dirty pages to disk before doing an NFS direct write. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-12-15Merge by hand (conflicts in scsi_lib.c)James Bottomley3-14/+34
This merge is pretty extensive. The conflict is over the new req->retries parameter, so I had to change the prototype to scsi_setup_blk_pc_cmnd() and the usage in sd, sr and st. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2005-12-15[SCSI] seperate max_sectors from max_hw_sectorsMike Christie1-9/+11
- export __blk_put_request and blk_execute_rq_nowait needed for async REQ_BLOCK_PC requests - separate max_hw_sectors and max_sectors for block/scsi_ioctl.c and SG_IO bio.c helpers per Jens's last comments. Since block/scsi_ioctl.c SG_IO was already testing against max_sectors and SCSI-ml was setting max_sectors and max_hw_sectors to the same value this does not change any scsi SG_IO behavior. It only prepares ll_rw_blk.c, scsi_ioctl.c and bio.c for when SCSI-ml begins to set a valid max_hw_sectors for all LLDs. Today if an LLD does not set it SCSI-ml sets it to a safe default and some LLDs set it to an artificially low value to overcome memory and feedback issues. Note: Since we now cap max_sectors to BLK_DEF_MAX_SECTORS, which is 1024, drivers that used to call blk_queue_max_sectors with a large value of max_sectors will now see the fs requests capped to BLK_DEF_MAX_SECTORS. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
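The note above can be read as the following rule: keep the driver's value as the hardware limit, but cap the limit used for filesystem requests at BLK_DEF_MAX_SECTORS. The struct and function below are a hypothetical model of that rule, not the block layer's code.

```c
#include <assert.h>

#define SKETCH_BLK_DEF_MAX_SECTORS 1024u   /* value quoted above */

struct queue_limits_sketch { unsigned max_sectors, max_hw_sectors; };

/* Record the driver-supplied limit as the hardware limit, and cap the
 * default limit used for fs requests. */
static void set_max_sectors_sketch(struct queue_limits_sketch *q,
                                   unsigned max)
{
    assert(max > 0);
    q->max_hw_sectors = max;
    q->max_sectors = max < SKETCH_BLK_DEF_MAX_SECTORS
                         ? max : SKETCH_BLK_DEF_MAX_SECTORS;
}
```

With 512-byte sectors, the 1024-sector cap means filesystem requests of at most 512 KB, while SG_IO users can still be checked against the larger hardware limit.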
2005-12-15[PATCH] xfs: missing gfp_t annotationsAl Viro1-2/+2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-14[SCSI] Convert SCSI mid-layer to scsi_execute_asyncMike Christie1-0/+20
Add scsi helpers to create really-large-requests and convert scsi-ml to scsi_execute_async(). Per Jens's previous comments, I placed this function in scsi_lib.c. I made it follow all the queue's limits - I think I did at least :), so I removed the warning on the function header. I think the scsi_execute_* functions should eventually take a request_queue and be placed some place where the dm-multipath hw_handler can use them if that failover code is going to stay in the kernel. That conversion patch will be sent in another mail though. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2005-12-14[PATCH] reiserfs: close open transactions on error pathJeff Mahoney1-8/+18
The following patch fixes a bug where if the journal is aborted, it can leave a transaction open. The result will be a BUG when another code path attempts to start a transaction and will get a "nesting into different fs" error, since current->journal_info will be left non-NULL. Original fix against SUSE kernel by Chris Mason <mason@suse.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-14[PATCH] reiserfs: skip commit on io errorJeff Mahoney1-4/+14
This should have been part of the original io error patch, but got dropped somewhere along the way. It's extremely important when handling the i/o error in the journal to not commit the transaction with corrupt data. This patch adds that code back in. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12 | [PATCH] inotify: add two inotify_add_watch flags | John McCutchan | 1 | -3/+10
The below patch lets userspace have more control over the inodes that inotify will watch. It introduces two new flags:
IN_ONLYDIR -- only watch the inode if it is a directory. This is needed to avoid the race that can occur when we want to be sure that we are watching a directory.
IN_DONT_FOLLOW -- don't follow a symlink. In combination with IN_ONLYDIR we can make sure that we don't watch the target of symlinks.
The issues the flags fix came up when writing the gnome-vfs inotify backend. Default behaviour is unchanged.
Signed-off-by: John McCutchan <ttb@tentacle.dhs.org>
Acked-by: Robert Love <rml@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
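A minimal userspace sketch of how a client such as a desktop file monitor might use the two flags, assuming a glibc `<sys/inotify.h>` that defines `IN_ONLYDIR` and `IN_DONT_FOLLOW`. The helpers `add_dir_watch()` and `open_dir_watcher()` and the chosen event mask are illustrative, not taken from gnome-vfs.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Watch `path` only if it is itself a directory. IN_ONLYDIR makes the
 * kernel perform the "is it a directory?" check atomically with adding
 * the watch (no stat()-then-watch race); IN_DONT_FOLLOW stops it from
 * resolving a trailing symlink, so a symlink's target is never watched. */
static int add_dir_watch(int inotify_fd, const char *path)
{
    uint32_t mask = IN_CREATE | IN_DELETE | IN_MOVED_FROM | IN_MOVED_TO
                  | IN_ONLYDIR | IN_DONT_FOLLOW;

    return inotify_add_watch(inotify_fd, path, mask);
}

/* Usage: create an inotify instance watching `dir`. Returns the inotify
 * fd, or -1 with errno set (ENOTDIR if `dir` is not a real directory). */
static int open_dir_watcher(const char *dir)
{
    int fd = inotify_init();

    if (fd < 0)
        return -1;
    if (add_dir_watch(fd, dir) < 0) {
        int saved = errno;   /* close() may clobber errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Letting the kernel reject non-directories avoids a separate stat() call whose answer could already be stale by the time the watch is added.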
2005-12-12 | [PATCH] Fix listxattr() for generic security attributes | Daniel Drake | 1 | -1/+1
Commit f549d6c18c0e8e6cf1bf0e7a47acc1daf7e2cec1 introduced a generic fallback for security xattrs, but appears to include a subtle bug. Gentoo users with kernels with selinux compiled in, and coreutils compiled with acl support, noticed that they could not copy files on tmpfs using 'cp'. cp (compiled with acl support) copies the file, lists the extended attributes on the old file, copies them all to the new file, and then exits. However, the listxattr() calls were failing with this odd behaviour:
llistxattr("a.out", (nil), 0) = 17
llistxattr("a.out", 0x7fffff8c6cb0, 17) = -1 ERANGE (Numerical result out of range)
I believe this is a simple problem in the logic used to check the buffer sizes; if the user sends a buffer of exactly the size of the data, that should be accepted :) This change solves the problem. More info can be found at http://bugs.gentoo.org/113138
Signed-off-by: Daniel Drake <dsd@gentoo.org>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
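To make the off-by-one concrete, here is a hypothetical listxattr-style helper (not the fs/xattr.c code) that reports a single attribute name. A size of 0 is a size query, as in the first llistxattr() call in the trace; the correct check fails only when the buffer is strictly smaller than the data. For illustration, a 16-character name such as `security.selinux` needs 17 bytes with its NUL, the same size as in the trace; which attribute was actually listed is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: copy one attribute name, including its NUL,
 * into a caller-supplied buffer. size == 0 only asks for the size. */
static ssize_t list_one_name(char *buffer, size_t size, const char *name)
{
    size_t len = strlen(name) + 1;   /* count the trailing NUL */

    if (size == 0)
        return (ssize_t)len;         /* size query: report bytes needed */
    /* Fail only if the buffer is too SMALL. An off-by-one such as
     * "len >= size" would also reject an exactly-sized buffer. */
    if (len > size)
        return -ERANGE;
    memcpy(buffer, name, len);
    return (ssize_t)len;
}
```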
2005-12-03 | NFSv4: Fix an Oops in the synchronous write path | Trond Myklebust | 1 | -1/+10
- Missing initialisation of attribute bitmask in _nfs4_proc_write()
- On success, _nfs4_proc_write() must return number of bytes written.
- Missing post_op_update_inode() in _nfs4_proc_write()
- Missing initialisation of attribute bitmask in _nfs4_proc_commit()
- Missing post_op_update_inode() in _nfs4_proc_commit()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-12-03 | NFS: Fix post-op attribute revalidation... | Trond Myklebust | 2 | -0/+4
- Missing nfs_mark_for_revalidate in nfs_proc_link()
- Missing nfs_mark_for_revalidate in nfs_rename()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-12-03 | NFS: use set_page_writeback() in the appropriate places | Trond Myklebust | 1 | -2/+4
Ensure that we use set_page_writeback() in the appropriate places to help the VM in keeping its page radix_tree in sync. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-12-03 | NFS: Fix a few further cache consistency regressions | Trond Myklebust | 1 | -34/+20
Steve Dickson writes:
Doing the following:
1. On server:
    $ mkdir ~/t
    $ echo Hello > ~/t/tmp
2. On client, wait for a string to appear in this file:
    $ until grep -q foo t/tmp ; do echo -n . ; sleep 1 ; done
3. On server, create a *new* file with the same name containing that string:
    $ mv ~/t/tmp ~/t/tmp.old; echo foo > ~/t/tmp
will show how the client will never (and I mean never ;-) ) see the updated file.
The problem is that we do not update nfsi->cache_change_attribute when the file changes on the server (we only update it when our client makes the changes). This again means that functions like nfs_check_verifier() will fail to register when the parent directory has changed and should trigger a dentry lookup revalidation.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
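A simplified model of the verifier scheme described above, with hypothetical struct and function names; only `cache_change_attribute` is named in the commit, and the real NFS client code differs. A cached dentry records the directory's counter at lookup time and is trusted only while that counter is unchanged, so if changes seen from the server never bump the counter (the regression), a rename on the server is never noticed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: each directory keeps a change counter, and each
 * cached dentry remembers the counter value seen at lookup time. */
struct dir_sketch    { unsigned long cache_change_attribute; };
struct dentry_sketch { unsigned long verifier; };

static void remember_lookup(struct dentry_sketch *d, const struct dir_sketch *dir)
{
    d->verifier = dir->cache_change_attribute;
}

/* A cached entry is trusted only while the directory's counter is unchanged. */
static bool dentry_verified(const struct dentry_sketch *d, const struct dir_sketch *dir)
{
    return d->verifier == dir->cache_change_attribute;
}

/* Must run whenever the directory changes, whether the change was made
 * by this client or detected from the server's attributes. */
static void dir_changed(struct dir_sketch *dir, unsigned long now)
{
    dir->cache_change_attribute = now;
}
```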
2005-12-03 | NFS: Fix cache consistency regression | Steve Dickson | 1 | -0/+1
Make sure cache_change_attribute is initialized to jiffies so when the mtime changes on directory, the directory will be refreshed. Signed-off by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-11-29 | [CIFS] For previous fix, mode on mkdir needed S_IFDIR left out. | Steve French | 1 | -0/+1
Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-11-29 | [CIFS] Missing parenthesis and typo in previous fix | Steve French | 2 | -2/+3
Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-11-29 | Merge with /pub/scm/linux/kernel/git/torvalds/linux-2.6.git | Steve French | 14 | -35/+84
2005-11-29 | [CIFS] Fix umount --force to wake up the pending response queue, not just the request queue. | Steve French | 5 | -8/+62
Also periodically wake up response_q so threads can check if stuck requests have timed out. Work around the illegal SMB length that Windows servers send in the transact2 findfirst response. Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-11-29 | [CIFS] Fix missing permission check on setattr when noperm mount option is disabled. | Steve French | 5 | -13/+60
Also set mode, uid, gid better on mkdir and create for the case when Unix Extensions is not enabled and setuids is enabled. This is necessary to fix the hole in which chown could be allowed for non-root users in some cases if root mounted, and also to display the mode and uid properly in some cases. Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-11-29 | [PATCH] hfsplus: don't modify journaled volume | Roman Zippel | 4 | -6/+33
Access to a journaled HFS+ volume is not officially supported under Linux, so mount such a volume read-only, but users can override this behaviour using the "force" mount option. The minimum requirement to relax this check is to at least check that the journal is empty and so nothing needs to be replayed to make sure the volume is consistent. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-29 | [PATCH] reiserfs: handle cnode allocation failure gracefully | Jeff Mahoney | 1 | -0/+9
If an external device is used for a journal, by default it will use the entire device. The reiserfs journal code allocates structures per journal block when it mounts the file system. If the journal device is too large, and memory cannot be allocated for the structures, it will continue and ultimately panic when it can't pull one off the free list. This patch handles the allocation failure gracefully and prints an error message at mount time. Changes: Updated error message to be more descriptive to the user. Discussed and approved on ReiserFS Mailing List, Nov 28. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-29 | VM: add common helper function to create the page tables | Linus Torvalds | 1 | -11/+1
This logic was duplicated four times, for no good reason. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-29 | [JFFS2] Fix the slab cache constructor of 'struct jffs2_inode_info' objects. | Thomas Gleixner | 2 | -1/+3
JFFS2 initializes the f->sem mutex as "locked" in the slab constructor, which is a bug. Objects are freed with f->sem unlocked. So, when they are allocated again, f->sem is unlocked because the slab cache constructor is not called for them. The constructor is called only once when memory pages are allocated for objects (namely, when the slab layer allocates new slabs). So, sometimes 'struct jffs2_inode_info' objects are allocated with f->sem unlocked, and sometimes with it locked. This is a bug. Instead, initialize f->sem as unlocked in the constructor, i.e., in the "constructed" state f->sem must be unlocked. From: Keijiro Yano <keijiro_yano@yahoo.co.jp> Acked-by: Artem B. Bityutskiy <dedekind@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
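A toy userspace model of the constructor contract described above; the names and the fixed-size pool are hypothetical and this is not the kernel slab API. The constructor runs only when an object is first created, freed objects are recycled as-is, so whatever state objects are freed in must equal the state the constructor produces. With the old JFFS2 code the constructor produced "locked" while objects were freed "unlocked", so fresh and recycled objects differed; the fixed constructor produces "unlocked".

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only (hypothetical names, not the kernel slab API). */
struct inode_info_sketch {
    int sem_locked;                 /* stands in for f->sem */
};

#define POOL_SIZE 4
static struct inode_info_sketch pool[POOL_SIZE];
static struct inode_info_sketch *free_list[POOL_SIZE];
static size_t nfree, ncreated;

/* Constructor: defines the "constructed" state, here unlocked. */
static void ctor(struct inode_info_sketch *o)
{
    o->sem_locked = 0;
}

static struct inode_info_sketch *cache_alloc(void)
{
    if (nfree > 0)
        return free_list[--nfree];  /* recycled: ctor is NOT re-run */
    if (ncreated == POOL_SIZE)
        return NULL;                /* toy cache is full */
    struct inode_info_sketch *o = &pool[ncreated++];
    ctor(o);                        /* fresh object: ctor runs once */
    return o;
}

/* Objects must come back in the constructed state, or the next
 * cache_alloc() that recycles them sees a different initial state. */
static void cache_free(struct inode_info_sketch *o)
{
    assert(o->sem_locked == 0);
    free_list[nfree++] = o;
}
```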
2005-11-28 | [PATCH] fuse: check for invalid node ID in fuse_create_open() | Miklos Szeredi | 1 | -3/+8
Check for invalid node ID values in the new atomic create+open method. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 | [PATCH] fuse: check directory aliasing in mkdir | Miklos Szeredi | 1 | -9/+17
Check the created directory inode for aliases in the mkdir() method. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 | [PATCH] Fix oops in vfs_quotaon_mount() | Jan Kara | 1 | -0/+6
When the quota file specified in the mount options did not exist, we later tried to dereference a NULL pointer. Fix it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 | [PATCH] v9fs: fix memory leak in v9fs dentry code | Latchesar Ionkov | 1 | -0/+2
Assign the appropriate dentry operations to the dentry. Fixes memory leak. Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Cc: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 | [PATCH] ext3: Wrong return value for EXT3_IOC_GROUP_ADD | Glauber de Oliveira Costa | 1 | -0/+1
This patch corrects the return value for EXT3_IOC_GROUP_ADD in case it fails due to the presence of multiple resizers on the filesystem. The problem is a little bit more serious than a wrong return value in this case, since the clause err=0 in the exit_journal path will lead to a call to update_backups, which in turn causes a NULL pointer dereference. Signed-off-by: Glauber de Oliveira Costa <glommer@br.ibm.com> Cc: "Stephen C. Tweedie" <sct@redhat.com> Cc: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 | [PATCH] reiserfs: fix 32-bit overflow in map_block_for_writepage() | Oleg Drokin | 1 | -1/+1
I now see another overflow in reiserfs that should lead to data corruptions with files that are bigger than 4G under certain circumstances when using mmap. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
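The commit does not show the changed line, so this is only a generic illustration of the bug class, assuming 4 KiB blocks and hypothetical helper names: shifting a 32-bit block index into a byte offset wraps at 4 GiB unless the index is widened to 64 bits first.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SHIFT 12   /* assumed 4096-byte blocks */

/* Buggy pattern: the shift happens in 32-bit arithmetic and wraps
 * modulo 2^32 before the result is widened. */
static uint64_t offset_32bit(uint32_t block)
{
    uint32_t off = block << BLOCK_SHIFT;
    return off;
}

/* Fixed pattern: widen first, then shift. */
static uint64_t offset_64bit(uint32_t block)
{
    return (uint64_t)block << BLOCK_SHIFT;
}
```

Block 2^20 is the first one at the 4 GiB mark: the 32-bit version maps it back to offset 0, which is how data past 4G can land on the wrong place.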
2005-11-28 | mm: re-architect the VM_UNPAGED logic | Linus Torvalds | 1 | -4/+3
This replaces the (in my opinion horrible) VM_UNMAPPED logic with very explicit support for a "remapped page range" aka VM_PFNMAP. It allows a VM area to contain an arbitrary range of page table entries that the VM never touches, and never considers to be normal pages. Any user of "remap_pfn_range()" automatically gets this new functionality, and doesn't even have to mark the pages reserved or indeed mark them any other way. It just works. As a side effect, doing mmap() on /dev/mem works for arbitrary ranges. Sparc update from David in the next commit. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 | [CIFS] When file is deleted locally but later recreated on the server, fix cifs negative dentries so they are freed faster (e.g. without requiring umount or readdir), so that the client recognizes the new file on the server more quickly. | Steve French | 3 | -20/+35
Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-11-25 | NFS: Fix a spinlock recursion inside nfs_update_inode() | Trond Myklebust | 1 | -14/+12
In cases where the server has gone insane, nfs_update_inode() may end up calling nfs_invalidate_inode(), which again calls stuff that takes the inode->i_lock that we're already holding. In addition, given the sort of things we have in NFS these days that need to be cleaned up on inode release, I'm not sure we should ever be calling make_bad_inode(). Fix up spinlock recursion, and limit nfs_invalidate_inode() to clearing the caches, and marking the inode as being stale. Thanks to Steve Dickson <SteveD@redhat.com> for spotting this. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-11-25 | NFSv4: Fix typo in lock caching | Trond Myklebust | 1 | -3/+3
When caching locks due to holding a file delegation, we must always check against local locks before sending anything to the server. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-11-25 | NFSv4: Fix buggy nfs_wait_on_sequence() | Trond Myklebust | 1 | -10/+10
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-11-25 | [XFS] Resolve the xlog_grant_log_space hang, revert inline to macro. | Nathan Scott | 1 | -24/+12
SGI-PV: 946205 SGI-Modid: xfs-linux-melb:xfs-kern:24567a Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-11-25 | [XFS] Fix a case where attr2 format was being used unconditionally. | Nathan Scott | 1 | -3/+8
SGI-PV: 941645 SGI-Modid: xfs-linux-melb:xfs-kern:24566a Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-11-25 | [XFS] Tight loop in xfs_finish_reclaim_all prevented xfslogd from running its queue of IO completion callbacks, thus creating the deadlock between umount and xfslogd. | Felix Blyakher | 1 | -2/+3
Breaking the loop solves the problem. SGI-PV: 943821 SGI-Modid: xfs-linux-melb:xfs-kern:202363a Signed-off-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-11-25 | [XFS] Fix a 32 bit value wraparound when providing a mapping for a large direct write. | Nathan Scott | 1 | -7/+6
SGI-PV: 944820 SGI-Modid: xfs-linux-melb:xfs-kern:24351a Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-11-25 | [XFS] handle error returns from freeze_bdev | Christoph Hellwig | 1 | -1/+1
SGI-PV: 945483 SGI-Modid: xfs-linux-melb:xfs-kern:201884a Signed-off-by: Christoph Hellwig <hch@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-11-25 | [XFS] Fix potential overflow in xfs_iomap_t delta for very large extents | Eric Sandeen | 1 | -1/+1
SGI-PV: 945311 SGI-Modid: xfs-linux-melb:xfs-kern:201708a Signed-off-by: Eric Sandeen <sandeen@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-11-23 | [PATCH] jffs2 debug gcc-2.9x fix | Andrew Morton | 1 | -4/+4
Work around gcc-2.95.x macro expansion bug. Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-23 | [PATCH] fix do_wait() vs exec() race | Oleg Nesterov | 1 | -4/+4
When a non-leader thread does exec, de_thread adds the old leader to init's ->children list in EXIT_ZOMBIE state and drops tasklist_lock. This means that release_task(leader) in de_thread() is racy vs do_wait() from the init task. I think de_thread() should set the old leader's state to EXIT_DEAD instead. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: george anzinger <george@mvista.com> Cc: Roland Dreier <rolandd@cisco.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22 | [PATCH] Fix hugetlbfs_statfs() reporting of block limits | David Gibson | 1 | -4/+8
Currently, if a hugetlbfs is mounted without limits (the default), statfs() will return -1 for max/free/used blocks. This does not appear to be in line with normal convention: simple_statfs() and shmem_statfs() both return 0 in similar cases. Worse, it confuses the translation logic in put_compat_statfs(), causing it to return -EOVERFLOW on such a mount. This patch alters hugetlbfs_statfs() to return 0 for max/free/used blocks on a mount without limits. Note that we need the test in the patch below, rather than just using 0 in the sbinfo structure, because the -1 marked in the free blocks field is used internally to tell the Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
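A sketch of the reporting rule described above, with hypothetical structs in place of the hugetlbfs superblock info and the statfs buffer: -1 stays the internal "no limit" marker, but statfs() reports 0 for an unlimited mount, as simple_statfs() and shmem_statfs() do. Keying the test on `max_blocks` is an assumption of the sketch.

```c
#include <assert.h>

/* Hypothetical superblock info: max_blocks == -1 means "no limit",
 * and the -1 values are kept internally rather than replaced by 0. */
struct hugetlb_sb_sketch {
    long max_blocks;
    long free_blocks;
};

struct statfs_sketch {
    long f_blocks, f_bfree, f_bavail;
};

static void fill_statfs(const struct hugetlb_sb_sketch *sb, struct statfs_sketch *buf)
{
    if (sb->max_blocks >= 0) {          /* limited mount: report real numbers */
        buf->f_blocks = sb->max_blocks;
        buf->f_bfree  = sb->free_blocks;
        buf->f_bavail = sb->free_blocks;
    } else {                            /* unlimited: report 0, never -1 */
        buf->f_blocks = buf->f_bfree = buf->f_bavail = 0;
    }
}
```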
2005-11-22 | [PATCH] Fix error handling with put_compat_statfs() | David Gibson | 1 | -8/+8
In fs/compat.c, whenever put_compat_statfs() returns an error, the containing syscall returns -EFAULT. This is presumably by analogy with the non-compat case, where any non-zero code from copy_to_user() should be translated into an EFAULT. However, put_compat_statfs() can also return -EOVERFLOW. The same applies for put_compat_statfs64(). This bug can be observed with a statfs() on a hugetlbfs directory. hugetlbfs, when mounted without limits, reports available, free and total blocks as -1 (itself a bug, another patch coming). statfs() will mysteriously return EFAULT although its parameters are perfectly valid addresses. This patch causes the compat versions of statfs() and statfs64() to correctly propagate the return values from put_compat_statfs() and put_compat_statfs64(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
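A reduced sketch of the error-handling fix, using hypothetical one-field structs in place of the real statfs layouts: narrowing a 64-bit count into a 32-bit compat field fails with -EOVERFLOW when the value does not fit (as -1 from an unlimited hugetlbfs mount does), and the fixed caller passes that error through instead of turning it into -EFAULT. The real helpers also copy to user space, which is where -EFAULT legitimately comes from; that part is left out here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical 64-bit and 32-bit (compat) layouts, reduced to one field. */
struct kstatfs_sketch       { uint64_t f_blocks; };
struct compat_statfs_sketch { uint32_t f_blocks; };

/* Narrow into the compat struct, refusing values that do not fit. */
static int put_compat_sketch(struct compat_statfs_sketch *out,
                             const struct kstatfs_sketch *in)
{
    if (in->f_blocks > UINT32_MAX)
        return -EOVERFLOW;
    out->f_blocks = (uint32_t)in->f_blocks;
    return 0;
}

/* Old behaviour: every failure is reported as -EFAULT. */
static int compat_statfs_buggy(struct compat_statfs_sketch *out,
                               const struct kstatfs_sketch *in)
{
    return put_compat_sketch(out, in) ? -EFAULT : 0;
}

/* Fixed behaviour: propagate the helper's own error code. */
static int compat_statfs_fixed(struct compat_statfs_sketch *out,
                               const struct kstatfs_sketch *in)
{
    return put_compat_sketch(out, in);
}
```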
2005-11-21 | Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 | Linus Torvalds | 1 | -5/+3
2005-11-20 | [COMPAT] net: SIOCGIFCONF data corruption | Alexandra Kossovsky | 1 | -5/+3
From: Alexandra Kossovsky <Alexandra.Kossovsky@oktetlabs.ru>
From http://bugzilla.kernel.org/show_bug.cgi?id=4746
There is user data corruption when using ioctl(SIOCGIFCONF) in a 32-bit application running on an amd64 kernel. I do not think that this problem is exploitable, but any data corruption may lead to security problems. The following code demonstrates the problem:

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/time.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <sys/ioctl.h>

    char buf[256];

    int main(void)
    {
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        struct ifconf req;
        int i;

        /* Offer only 41 bytes; every byte after them must stay zero. */
        req.ifc_buf = buf;
        req.ifc_len = 41;
        printf("Result %d\n", ioctl(s, SIOCGIFCONF, &req));
        printf("Len %d\n", req.ifc_len);
        for (i = 41; i < 256; i++)
            if (buf[i] != 0)
                printf("Byte %d is corrupted\n", i);
        return 0;
    }

Steps to reproduce: compile the code above into a 32-bit ELF binary and run it. You'll get:

    Result 0
    Len 32
    Byte 48 is corrupted
    Byte 52 is corrupted
    Byte 53 is corrupted
    Byte 54 is corrupted
    Byte 55 is corrupted

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-20 | [PATCH] Remove compat ioctl semaphore | Andi Kleen | 1 | -7/+0
Originally for 2.6.16, but the semaphore causes problems for some people so get rid of it now. It's not needed anymore because the ioctl hash table is never changed at run time now. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-19 | Merge with /pub/scm/linux/kernel/git/torvalds/linux-2.6.git | Steve French | 1 | -0/+169
2005-11-19 | [CIFS] Fix setattr of mode only (e.g. in some chmod cases) to Windows so it does not return EACCES (unless server really returns that). | Steve French | 1 | -0/+1
Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-11-18 | [CIFS] Fix mknod of block and chardev over SFU mounts | Steve French | 3 | -13/+62
Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-11-18 | [COMPAT]: EXT3_IOC_SETVERSION is _IOW() not _IOR(). | David S. Miller | 1 | -1/+1
Noticed by Helge Deller. Signed-off-by: David S. Miller <davem@davemloft.net>
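The direction bits are part of an ioctl command number, so `_IOR()` and `_IOW()` with the same type, number and size produce different commands; a compat translation entry built with the wrong macro presumably never matches what userspace actually passes. A small check using the standard `<linux/ioctl.h>` macros; the `'f'`/`4` pair and the `SKETCH_*` names are illustrative, not copied from the ext3 headers.

```c
#include <assert.h>
#include <linux/ioctl.h>

/* Illustrative command numbers in the style of the ext3 ioctls. */
#define SKETCH_IOC_SETVERSION_WRONG _IOR('f', 4, long)  /* kernel -> user */
#define SKETCH_IOC_SETVERSION       _IOW('f', 4, long)  /* user -> kernel */

/* Decode the data-transfer direction encoded in a command number.
 * _IOC_WRITE means userspace writes data that the kernel reads. */
static const char *ioc_direction(unsigned int cmd)
{
    switch (_IOC_DIR(cmd)) {
    case _IOC_NONE:  return "none";
    case _IOC_READ:  return "read (kernel -> user)";
    case _IOC_WRITE: return "write (user -> kernel)";
    default:         return "read/write";
    }
}
```

A "set" operation passes a value into the kernel, so `_IOW()` is the matching macro.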
2005-11-18 | [CIFS] Missing part of previous patch | Steve French | 2 | -0/+3
Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-11-18 | [CIFS] Fix scheduling while atomic when pending writes at file close time | Steve French | 2 | -22/+56
Fix the case in which readdir reset file type when SFU mount option specified. Also fix sfu related functions to not request EAs (xattrs) when not configured in Kconfig Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-11-17 | [CIFS] Vectored and async i/o turned on and correct the writev and aio_write to flush properly. | Steve French | 1 | -72/+45
This is Christoph's patch merged with the new nobrl file operations
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
From: Christoph Hellwig <hch@lst.de>
- support vectored and async aio ops unconditionally - this is above the pagecache and transparent to the fs
- remove cifs_read_wrapper. It was only doing silly checks and calling generic_file_write in all cases.
- use do_sync_read/do_sync_write as read/write operations. They call ->readv/->writev which we now always implement.
- add the filemap_fdatawrite calls to writev/aio_write which were missing previously compared to plain write. No idea what the point behind them is, but let's be consistent at least.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2005-11-17 | [CIFS] Recognize properly symlinks and char/blk devices (not just FIFOs) created by SFU (part 2 of 2). | Steve French | 1 | -4/+44
Thanks to Martin Koeppe for useful analysis. Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-11-17 | [COMPAT]: Add ext3 ioctl translations. | David S. Miller | 1 | -0/+36
So things like on-line resizing et al. work. Based almost entirely upon a patch by Guido Günther <agx@sigxcpu.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-16 | [CIFS] Fix sparse warnings on smb bcc (byte count) | Steve French | 2 | -3/+3
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-11-16 | [DVB]: Add compat ioctl handling. | David S. Miller | 1 | -0/+133
Based upon a patch by Guido Guenther <agx@sigxcpu.org>. Some of these ioctls had embedded time_t objects or pointers, so needed translation. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-15 | [CIFS] Fix endian errors (setfacl/getfacl failures) in handling ACLs (and a ppc64 compiler warning) | Steve French | 1 | -14/+17
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-11-15 | [CIFS] Recognize properly symlinks and char/blk devices (not just FIFOs) created by SFU (part 1 of 2). | Steve French | 2 | -10/+66
Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-11-13 | Merge with /pub/scm/linux/kernel/git/torvalds/linux-2.6.git | Steve French | 6 | -31/+27
2005-11-13 | [PATCH] ext2: remove duplicate newlines in ext2_fill_super | Johann Lombardi | 1 | -1/+1
ext2_warning() already adds a newline. Signed-off-by: Johann Lombardi <johann.lombardi@bull.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-13 | [PATCH] aio: replace locking comments with assert_spin_locked() | Zach Brown | 1 | -5/+12
aio: replace locking comments with assert_spin_locked() Signed-off-by: Zach Brown <zach.brown@oracle.com> Acked-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-13 | [PATCH] aio: remove kioctx from mm_struct | Zach Brown | 1 | -18/+9
Sync iocbs have a life cycle that doesn't need a kioctx. Their retrying, if any, is done in the context of their owner who has allocated them on the stack. The sole user of a sync iocb's ctx reference was aio_complete() checking for an elevated iocb ref count that could never happen. No path which grabs an iocb ref has access to sync iocbs. If we were to implement sync iocb cancelation it would be done by the owner of the iocb using its on-stack reference. Removing this chunk from aio_complete allows us to remove the entire kioctx instance from mm_struct, reducing its size by a third. On an i386 testing box the slab size went from 768 to 504 bytes and from 5 to 8 per page. Signed-off-by: Zach Brown <zach.brown@oracle.com> Acked-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
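A back-of-the-envelope check of the figures above, assuming a 4096-byte page and ignoring slab bookkeeping overhead; the helper names are hypothetical.

```c
#include <assert.h>

/* Whole objects that fit in one page (integer division drops the rest). */
static unsigned int objs_per_page(unsigned int page_size, unsigned int obj_size)
{
    return page_size / obj_size;
}

/* Size reduction in percent, rounded down. */
static unsigned int percent_saved(unsigned int before, unsigned int after)
{
    return (before - after) * 100 / before;
}
```

4096 / 768 leaves room for 5 whole objects and 4096 / 504 for 8, and (768 - 504) / 768 is about 34 %, consistent with "reducing its size by a third".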
2005-11-13 | [PATCH] Fix sparse warning in proc/task_mmu.c | Luiz Fernando Capitulino | 1 | -1/+1
fs/proc/task_mmu.c:198:33: warning: Using plain integer as NULL pointer Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-13 | [PATCH] ext3: journal handling on error path in ext3_journalled_writepage() | Denis Lunev | 1 | -1/+3
This patch fixes a lost reference to the current ext3 handle in ext3_journalled_writepage(). Signed-Off-By: Denis Lunev <den@sw.ru> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-13 | Merge master.kernel.org:/pub/scm/linux/kernel/git/tglx/mtd-2.6 | Linus Torvalds | 1 | -3/+0
2005-11-13 | [JFFS2] Remove broken and useless debug code | Thomas Gleixner | 1 | -3/+0
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>