From: David Mosberger

I believe this is the last outstanding piece that prevents ia64 from being
fully in sync with Linus' tree (yes, there are some minor ACPI changes
outstanding and a toolchain bug that's left to fix, but other than that, I
think we're clean).

Many architectures (alpha, ia64, ppc, ppc64, sparc, and sparc64 at least)
use a syscall convention which provides for a return value and a separate
error flag. On those architectures, it can be beneficial if the kernel
provides a mechanism to signal that a syscall has completed successfully,
even when the returned value is potentially a (small) negative number. The
patch below provides a hook for such a mechanism via a macro called
force_successful_syscall_return().

On x86, this would be simply a no-op (because on x86, user-level has to be
hacked to handle such cases). On Alpha, it would be something along the
lines of:

	#define force_successful_syscall_return()	ptregs->r0 = 0

where "ptregs" is a pointer to the user's ptregs structure of the current
task. On ia64, we have been using this for a long time:

	static inline void force_successful_syscall_return (void)
	{
		ia64_task_regs(current)->r8 = 0;
	}

The other architectures (ppc, ppc64, sparc, and sparc64) currently have no
mechanism to force a syscall return to be successful. But since the syscall
convention already provides for a separate error flag, the arch maintainers
could change this if they wanted to.

There are only 3 places in the platform-independent portion of the kernel
that need this macro:

	- memory_lseek() in drivers/char/mem.c
	- fs/fcntl.c for F_GETOWN
	- lseek for /proc/mem in fs/proc/base.c

Ideally, there are a couple of other places that could benefit from this
macro:

	- sys_getpriority()
	- sys_shmat()
	- sys_brk()
	- do_mmap2()
	- do_mremap()

but these are not so critical, because they can be worked around in
platform-specific code (e.g., see arch/ia64/kernel/sys_ia64.c).

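
To make the convention concrete, here is a toy user-space model of the
"return value plus separate error flag" scheme. It is only a sketch: struct
sim_regs, sim_syscall_exit() and the other sim_* names are made up for
illustration, and the rule "any negative result is an error unless success
was forced" is an assumption of the model, not a description of any
architecture's real exit path.

```c
#include <stdbool.h>

/* Toy model only: every sim_* name here is hypothetical, not kernel API. */
#define SIM_EINVAL 22

struct sim_regs {
	long ret;		/* return-value register seen by user level */
	bool error;		/* separate error flag seen by user level */
	bool force_success;	/* set by the hook, consumed at syscall exit */
};

/* Stand-in for force_successful_syscall_return(). */
static void sim_force_successful_syscall_return(struct sim_regs *regs)
{
	regs->force_success = true;
}

/*
 * Stand-in for an arch-specific syscall exit path.  Assumption of this
 * model: a negative result means "error" unless success was forced.
 */
static void sim_syscall_exit(struct sim_regs *regs, long result)
{
	if (result < 0 && !regs->force_success) {
		regs->error = true;
		regs->ret = -result;	/* user level sees a positive errno */
	} else {
		regs->error = false;
		regs->ret = result;	/* negative values pass through intact */
	}
	regs->force_success = false;	/* one-shot: reset for the next call */
}

/* Stand-in for an absolute lseek on /dev/mem: any offset is a valid result. */
static void sim_mem_lseek(struct sim_regs *regs, long offset, int orig)
{
	if (orig != 0) {
		sim_syscall_exit(regs, -SIM_EINVAL);
		return;
	}
	sim_force_successful_syscall_return(regs);
	sim_syscall_exit(regs, offset);
}
```

In this model, sim_mem_lseek(&regs, -16, 0) leaves regs.error clear and
regs.ret at -16, while the same value passed through sim_syscall_exit()
without the hook is reported as an error with regs.ret set to 16.
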
Note that for the above 3 cases, handling them in user level is rather
suboptimal:

	- it would affect all lseek() syscalls, even though only /proc/mem and
	  /dev/mem need the special treatment (at least until there are
	  filesystems that can handle files >= 2^63 bytes)
	- all fcntl() calls would be affected, even though only F_GETOWN needs
	  the special treatment

so I think handling these in the kernel for the platforms that can makes
tons of sense.

The only limitation of force_successful_syscall_return() is that it doesn't
help with system calls performed by the kernel. But the kernel does that so
rarely and for such a limited set of syscalls that this is not a real
problem.


 25-akpm/drivers/char/mem.c        |    6 ++++--
 25-akpm/fs/fcntl.c                |    1 +
 25-akpm/fs/proc/base.c            |   17 +++++++++++++++++
 25-akpm/include/asm-i386/ptrace.h |    1 +
 4 files changed, 23 insertions(+), 2 deletions(-)

diff -puN drivers/char/mem.c~force_successful_syscall_return drivers/char/mem.c
--- 25/drivers/char/mem.c~force_successful_syscall_return	Tue Jun 3 16:04:00 2003
+++ 25-akpm/drivers/char/mem.c	Tue Jun 3 16:04:48 2003
@@ -524,20 +524,22 @@ static loff_t memory_lseek(struct file * 
 {
 	loff_t ret;
 
-	lock_kernel();
+	down(&file->f_dentry->d_inode->i_sem);
 	switch (orig) {
 		case 0:
 			file->f_pos = offset;
 			ret = file->f_pos;
+			force_successful_syscall_return();
 			break;
 		case 1:
 			file->f_pos += offset;
 			ret = file->f_pos;
+			force_successful_syscall_return();
 			break;
 		default:
 			ret = -EINVAL;
 	}
-	unlock_kernel();
+	up(&file->f_dentry->d_inode->i_sem);
 	return ret;
 }
 
diff -puN fs/fcntl.c~force_successful_syscall_return fs/fcntl.c
--- 25/fs/fcntl.c~force_successful_syscall_return	Tue Jun 3 16:04:00 2003
+++ 25-akpm/fs/fcntl.c	Tue Jun 3 16:04:00 2003
@@ -318,6 +318,7 @@ static long do_fcntl(unsigned int fd, un
 			 * to fix this will be in libc.
 			 */
 			err = filp->f_owner.pid;
+			force_successful_syscall_return();
 			break;
 		case F_SETOWN:
 			err = f_setown(filp, arg, 1);
diff -puN fs/proc/base.c~force_successful_syscall_return fs/proc/base.c
--- 25/fs/proc/base.c~force_successful_syscall_return	Tue Jun 3 16:04:00 2003
+++ 25-akpm/fs/proc/base.c	Tue Jun 3 16:04:00 2003
@@ -557,7 +557,24 @@ static ssize_t mem_write(struct file * f
 }
 #endif
 
+static loff_t mem_lseek(struct file * file, loff_t offset, int orig)
+{
+	switch (orig) {
+	case 0:
+		file->f_pos = offset;
+		break;
+	case 1:
+		file->f_pos += offset;
+		break;
+	default:
+		return -EINVAL;
+	}
+	force_successful_syscall_return();
+	return file->f_pos;
+}
+
 static struct file_operations proc_mem_operations = {
+	.llseek		= mem_lseek,
 	.read		= mem_read,
 	.write		= mem_write,
 	.open		= mem_open,
diff -puN include/asm-i386/ptrace.h~force_successful_syscall_return include/asm-i386/ptrace.h
--- 25/include/asm-i386/ptrace.h~force_successful_syscall_return	Tue Jun 3 16:04:00 2003
+++ 25-akpm/include/asm-i386/ptrace.h	Tue Jun 3 16:04:00 2003
@@ -57,6 +57,7 @@ struct pt_regs {
 #ifdef __KERNEL__
 #define user_mode(regs) ((VM_MASK & (regs)->eflags) || (3 & (regs)->xcs))
 #define instruction_pointer(regs) ((regs)->eip)
+#define force_successful_syscall_return()	do { } while (0)
 #endif
 
 #endif

_