From: Andreas Gruenbacher , "Michael Kerrisk" After Andreas Gruenbacher suggested a change to vfs_permission() ( http://marc.theaimsgroup.com/?l=linux-kernel&m=106631710402696&w=2 ) and thus also the access() system call, we've done some further research on what happens with access() on other Unix implementations. The conclusions we've reached are that on Linux the operation of access(): a) doesn't conform to SUSv3, and b) isn't consistent with other Unix implementations (results from various implementations are shown below). First a little delving into SUSv3: [[ >From the specification for : The constants F_OK, R_OK, W_OK, and X_OK and the expressions R_OK|W_OK, R_OK|X_OK, and R_OK|W_OK|X_OK shall all have distinct values. >From the specification for access(): If any access permissions are checked, each shall be checked individually, as described in the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 3, Definitions. If the process has appropriate privileges, an implementation may indicate success for X_OK even if none of the execute file permission bits are set. ERRORS [EACCES] Permission bits of the file mode do not permit the requested access, or search permission is denied on a component of the path prefix. >From the rationale for access(): In early proposals, some inadequacies in the access() function led to the creation of an eaccess() function because: 1. Historical implementations of access() do not test file access correctly when the process' real user ID is superuser. In particular, they always return zero when testing execute permissions without regard to whether the file is executable. [...] However, the historical model of eaccess() does not resolve problem (1), so this volume of IEEE Std 1003.1-2001 now allows access() to behave in the desired way because several implementations have corrected the problem. [...] The sentence concerning appropriate privileges and execute permission bits reflects the two possibilities implemented by historical implementations when checking superuser access for X_OK. New implementations are discouraged from returning X_OK unless at least one execution permission bit is set. ]] The implication of all this is that for the call: access(files, mode) it is bits/masks that are relevant for the check, rather than a direct test for equality of the form `mode == X_OK', which is currently what vfs_permission() in effect implements. Furthermore, when testing for X_OK **for a privileged process**, an implementation of access() is permitted two possibilities: [a] always say that X_OK is allowed [b] allow X_OK only if execute permission is granted to at least one permission class (user/group/other) for the file. SUSv3 encourages this latter implementation. Using a test program (see below) run under the superuser account, we tested what access() returns on various implementations when applied to a file on which all permissions are disabled: ---------- 1 500 100 0 Oct 26 10:39 /tmp/t_access On FreeBSD 4.8, AIX 4.3 and 5.1, and Solaris 9, we see the following: access(/tmp/t_access, 0) returns 0 access(/tmp/t_access, R_OK) returns 0 access(/tmp/t_access, W_OK) returns 0 access(/tmp/t_access, X_OK) returns 0 access(/tmp/t_access, R_OK | W_OK) returns 0 access(/tmp/t_access, R_OK | X_OK) returns 0 access(/tmp/t_access, W_OK | X_OK) returns 0 access(/tmp/t_access, R_OK | W_OK | X_OK) returns 0 These results are consistent with interpretation [a] above. On HP-UX 11 and Tru64 4.0f and 5.1B, we see: access(/tmp/t_access, 0) returns 0 access(/tmp/t_access, R_OK) returns 0 access(/tmp/t_access, W_OK) returns 0 access(/tmp/t_access, X_OK) returns -1 access(/tmp/t_access, R_OK | W_OK) returns 0 access(/tmp/t_access, R_OK | X_OK) returns -1 access(/tmp/t_access, W_OK | X_OK) returns -1 access(/tmp/t_access, R_OK | W_OK | X_OK) returns -1 This is consistent with the (SUSv3 preferred) interpretation [b] above. On Linux 2.4.21 and 2.6.0-test9, we see the following: access(/tmp/t_access, 0) returns 0 access(/tmp/t_access, R_OK) returns 0 access(/tmp/t_access, W_OK) returns 0 access(/tmp/t_access, X_OK) returns -1 access(/tmp/t_access, R_OK | W_OK) returns 0 access(/tmp/t_access, R_OK | X_OK) returns 0 access(/tmp/t_access, W_OK | X_OK) returns 0 access(/tmp/t_access, R_OK | W_OK | X_OK) returns 0 In other words, LinuxÂs access() / vfs_permission() is treating the case Âmode == X_OK as a special case, distinct from all of the other bit-mask combinations that include X_OK. Really it should operate using either interpretation [a] or (preferably) [b] above. I have appended Andreas patch against 2.6.0-test9, which implements interpretation [b]. Please apply. Cheers, Michael Kerrisk ##### Test program ##### /* t_access_root.c */ #include #include #include #include #include #include #define UID 500 #define GID 100 #define PERM 0 #define TESTPATH "/tmp/t_access" static void errExit(char *msg) { perror(msg); exit(EXIT_FAILURE); } /* errExit */ static void accessTest(char *file, int mask, char *mstr) { printf("access(%s, %s) returns %d\n", file, mstr, access(file, mask)); } /* accessTest */ int main(int argc, char *argv[]) { int fd, perm, uid, gid; char *testpath; char cmd[PATH_MAX + 20]; testpath = (argc > 1) ? argv[1] : TESTPATH; perm = (argc > 2) ? strtoul(argv[2], NULL, 8) : PERM; uid = (argc > 3) ? atoi(argv[3]) : UID; gid = (argc > 4) ? atoi(argv[4]) : GID; unlink(testpath); fd = open(testpath, O_RDWR | O_CREAT, 0); if (fd == -1) errExit("open"); if (fchown(fd, uid, gid) == -1) errExit("fchown"); if (fchmod(fd, perm) == -1) errExit("fchown"); close(fd); snprintf(cmd, sizeof(cmd), "ls -l %s", testpath); system(cmd); if (seteuid(uid) == -1) errExit("seteuid"); accessTest(testpath, 0, "0"); accessTest(testpath, R_OK, "R_OK"); accessTest(testpath, W_OK, "W_OK"); accessTest(testpath, X_OK, "X_OK"); accessTest(testpath, R_OK | W_OK, "R_OK | W_OK"); accessTest(testpath, R_OK | X_OK, "R_OK | X_OK"); accessTest(testpath, W_OK | X_OK, "W_OK | X_OK"); accessTest(testpath, R_OK | W_OK | X_OK, "R_OK | W_OK | X_OK"); exit(EXIT_SUCCESS); } /* main */ 25-akpm/fs/ext2/acl.c | 2 +- 25-akpm/fs/ext3/acl.c | 2 +- 25-akpm/fs/jfs/acl.c | 2 +- 25-akpm/fs/namei.c | 2 +- 25-akpm/fs/xfs/xfs_inode.c | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff -puN fs/ext2/acl.c~access-vfs_permission-fix fs/ext2/acl.c --- 25/fs/ext2/acl.c~access-vfs_permission-fix Wed Nov 12 11:46:11 2003 +++ 25-akpm/fs/ext2/acl.c Wed Nov 12 11:46:11 2003 @@ -322,7 +322,7 @@ check_groups: check_capabilities: /* Allowed to override Discretionary Access Control? */ - if ((mask & (MAY_READ|MAY_WRITE)) || (inode->i_mode & S_IXUGO)) + if (!(mask & MAY_EXEC) || (inode->i_mode & S_IXUGO)) if (capable(CAP_DAC_OVERRIDE)) return 0; /* Read and search granted if capable(CAP_DAC_READ_SEARCH) */ diff -puN fs/ext3/acl.c~access-vfs_permission-fix fs/ext3/acl.c --- 25/fs/ext3/acl.c~access-vfs_permission-fix Wed Nov 12 11:46:11 2003 +++ 25-akpm/fs/ext3/acl.c Wed Nov 12 11:46:11 2003 @@ -327,7 +327,7 @@ check_groups: check_capabilities: /* Allowed to override Discretionary Access Control? */ - if ((mask & (MAY_READ|MAY_WRITE)) || (inode->i_mode & S_IXUGO)) + if (!(mask & MAY_EXEC) || (inode->i_mode & S_IXUGO)) if (capable(CAP_DAC_OVERRIDE)) return 0; /* Read and search granted if capable(CAP_DAC_READ_SEARCH) */ diff -puN fs/jfs/acl.c~access-vfs_permission-fix fs/jfs/acl.c --- 25/fs/jfs/acl.c~access-vfs_permission-fix Wed Nov 12 11:46:11 2003 +++ 25-akpm/fs/jfs/acl.c Wed Nov 12 11:46:11 2003 @@ -191,7 +191,7 @@ check_capabilities: * Read/write DACs are always overridable. * Executable DACs are overridable if at least one exec bit is set. */ - if ((mask & (MAY_READ|MAY_WRITE)) || (inode->i_mode & S_IXUGO)) + if (!(mask & MAY_EXEC) || (inode->i_mode & S_IXUGO)) if (capable(CAP_DAC_OVERRIDE)) return 0; diff -puN fs/namei.c~access-vfs_permission-fix fs/namei.c --- 25/fs/namei.c~access-vfs_permission-fix Wed Nov 12 11:46:11 2003 +++ 25-akpm/fs/namei.c Wed Nov 12 11:46:11 2003 @@ -190,7 +190,7 @@ int vfs_permission(struct inode * inode, * Read/write DACs are always overridable. * Executable DACs are overridable if at least one exec bit is set. */ - if ((mask & (MAY_READ|MAY_WRITE)) || (inode->i_mode & S_IXUGO)) + if (!(mask & MAY_EXEC) || (inode->i_mode & S_IXUGO)) if (capable(CAP_DAC_OVERRIDE)) return 0; diff -puN fs/xfs/xfs_inode.c~access-vfs_permission-fix fs/xfs/xfs_inode.c --- 25/fs/xfs/xfs_inode.c~access-vfs_permission-fix Wed Nov 12 11:46:11 2003 +++ 25-akpm/fs/xfs/xfs_inode.c Wed Nov 12 11:46:11 2003 @@ -3722,7 +3722,7 @@ xfs_iaccess( * Read/write DACs are always overridable. * Executable DACs are overridable if at least one exec bit is set. */ - if ((orgmode & (S_IRUSR|S_IWUSR)) || (inode->i_mode & S_IXUGO)) + if (!(orgmode & S_IXUSR) || (inode->i_mode & S_IXUGO)) if (capable_cred(cr, CAP_DAC_OVERRIDE)) return 0; _