Linux-Privs
<author>Andrew G. Morgan, <tt>morgan@parc.power.net</tt>
<date>DRAFT v0.10 1997/4/21
<abstract>
This is the specification document for Linux-Privs. It is a "standard" describing the kernel component of a secure system based around a Linux kernel, and is designed to be as compatible with POSIX.1e as is possible without open access to the actual specifications.
</abstract>
<toc>
<chapt>Introduction
<p>
The Linux-Privs project is an effort to implement a POSIX.1e (formerly POSIX 6) security model under Linux.
<p>
At the heart of the changes to the Linux kernel from the historical model is the separation of identity and privilege. Historically, root (UID=0) was all powerful and other users (UID!=0) had power that was limited to that associated with their identity and their group memberships.
<p>
The Linux-Privs scheme is to implement a set of independent capabilities that can be given to any user. In point of fact, the capabilities are associated with applications and can only be used within the confines of the functionality of such applications. In this way individual capabilities can be restricted to a trusted set of applications. Typically, the user will have to authenticate himself to such an application before it will perform its privileged task.
<p>
This new scheme for system privilege lends itself well to restricting privileged access to the system and reduces the risk of intruders or poorly written applications running amok on the system.
<chapt>Capabilities
<p>
In this chapter we list the capabilities known to the Linux kernel. First, we list the POSIX-defined capabilities, and then those specific to Linux.
<sect>POSIX capabilities
<p>
Here we list the POSIX capabilities honored by Linux.
<sect1>CAP_CHOWN
<p>
The <tt/#define/ for the symbol <tt/_POSIX_CHOWN_RESTRICTED/ indicates that this capability (<tt/CAP_CHOWN/) is known.
<p>
This capability enables the current process to change the owner of a file.
Generally, file ownership is not changeable by a user: it is determined by the user who creates the file.
<sect1>CAP_DAC_OVERRIDE
<p>
The <tt/#define/ for the symbol <bf/_POSIX_ACL/ indicates that Access Control Lists (an implementation of Discretionary Access Control) are supported by the kernel and that the following capability is known: <tt/CAP_DAC_OVERRIDE/.
<p>
This capability overrides all DAC access restrictions, including ACL execute access.
<sect1>CAP_DAC_READ_SEARCH
<p>
This capability overrides all DAC restrictions regarding read and search on files and directories, including ACLs.
<sect1>CAP_FOWNER
<p>
Overrides all restrictions about allowed operations on files, where file owner ID must be equal to the user ID, except where CAP_FSETID is applicable. It doesn't override MAC and DAC restrictions.
<sect1>CAP_FSETID
<p>
Overrides the following restrictions: that the effective user ID shall match the file owner ID when setting the S_ISUID and S_ISGID bits on that file; that the effective group ID (or one of the supplementary group IDs) shall match the file's group owner ID when setting the S_ISGID bit on that file; and that the S_ISUID and S_ISGID bits are cleared on successful return from chown(2).
<sect1>CAP_KILL
<p>
Overrides the restriction that the real or effective user ID of a process sending a signal must match the real or effective user ID of the process receiving the signal.
<sect1>CAP_LINK_DIR
<p>
Overrides the restriction that a process cannot create or delete a hard link to a directory. This shall not override MAC and DAC policies.
<sect1>CAP_SETFCAP
<p>
Allows the (re)setting of a file's capabilities.
<sect1>CAP_SETGID
<p>
Allows setgid(2) manipulation.
<sect1>CAP_SETUID
<p>
Allows setuid(2) manipulation.
<sect1>CAP_SIGMASK
<p>
Overrides the restriction that no process may block <tt/SIGKILL/ and <tt/SIGSTOP/.
<sect1>CAP_MAC_DOWNGRADE
<p>
This capability is available if <tt/_POSIX_MAC/ is <tt/#define/'d. This capability allows a process to downgrade an object's information label.
<sect1>CAP_MAC_READ
<p>
This capability is available if <tt/_POSIX_MAC/ is <tt/#define/'d. Allows a process to override MAC read restrictions.
<sect1>CAP_MAC_RELABEL_SUBJ
<p>
This capability is available if <tt/_POSIX_MAC/ is <tt/#define/'d. Allows a process to modify its own label.
<sect1>CAP_MAC_UPGRADE
<p>
This capability is available if <tt/_POSIX_MAC/ is <tt/#define/'d. This capability allows a process to upgrade an object's information label.
<sect1>CAP_MAC_WRITE
<p>
This capability is available if <tt/_POSIX_MAC/ is <tt/#define/'d. This capability overrides the MAC restrictions on writes.
<sect1>CAP_INF_NOFLOAT_OBJ
<p>
This capability is available if <tt/_POSIX_INF/ is <tt/#define/'d. This capability prevents a process' information label from floating during writes.
<sect1>CAP_INF_NOFLOAT_SUBJ
<p>
This capability is available if <tt/_POSIX_INF/ is <tt/#define/'d. This capability prevents the process' information label from floating during reads or executes.
<sect1>CAP_INF_RELABEL_OBJ
<p>
This capability is available if <tt/_POSIX_INF/ is <tt/#define/'d. This capability allows a process to change an object's information label.
<sect1>CAP_INF_RELABEL_SUBJ
<p>
This capability is available if <tt/_POSIX_INF/ is <tt/#define/'d. This capability allows a process to modify its own information label in violation of the overriding policy.
<sect1>CAP_AUDIT_CONTROL
<p>
This capability is available if <tt/_POSIX_AUD/ is <tt/#define/'d. This capability allows a process to modify the audit control parameters.
<sect1>CAP_AUDIT_WRITE
<p>
This capability is available if <tt/_POSIX_AUD/ is <tt/#define/'d. This capability allows a process to write data to the audit trail.
<sect>Linux specific capabilities
<p>
This section lists additional capabilities that are specific to Linux or not covered by the POSIX capability definitions.
<sect1>CAP_LINUX_IMMUTABLE
<p>
Allow modification of <tt/S_IMMUTABLE/ and <tt/S_APPEND/ file attributes.
<sect1>CAP_LINUX_KERNELD
<p>
Permission to act as kerneld.
<sect1>CAP_LINUX_INSMOD
<p>
Allow installation of kernel modules.
<sect1>CAP_LINUX_RMMOD
<p>
Allow removal of kernel modules.
<sect1>CAP_LINUX_RAWIO
<p>
Allow ioperm/iopl access.
<sect1>CAP_LINUX_ATTENTION
<p>
Allow configuration of the secure attention key.
<sect1>CAP_LINUX_RANDOM
<p>
Allow administration of the random device.
<sect>Other capabilities
<p>
This section lists those capabilities commonly found on other systems besides Linux, but which are not specified by POSIX.
<sect1>CAP_NET_BIND_SERVICE
<p>
Allow binding to TCP/UDP ports below 1024.
<sect1>CAP_NET_BROADCAST
<p>
Allow broadcasting.
<sect1>CAP_NET_DEBUG
<p>
Allow setting the debug option on sockets.
<sect1>CAP_NET_FIREWALL
<p>
Allow configuration of the firewall.
<sect1>CAP_NET_IFCONFIG
<p>
Allow interface configuration.
<sect1>CAP_NET_PACKET
<p>
Allow use of PACKET sockets.
<sect1>CAP_NET_RAW
<p>
Allow use of RAW sockets.
<sect1>CAP_NET_ROUTE
<p>
Allow modification of routing tables.
<sect1>CAP_NET_SETID
<p>
CAP.FIXME: what is this about?
<sect1>CAP_IPC_LOCK
<p>
Allow locking of segments in memory.
<sect1>CAP_IPC_OWNER
<p>
Override IPC ownership checks.
<sect1>CAP_SYS_CHROOT
<p>
Allow use of chroot().
<sect1>CAP_SYS_PTRACE
<p>
Allow ptrace() of any process.
<sect1>CAP_SYS_ACCOUNT
<p>
Allow configuration of process accounting.
<sect1>CAP_SYS_ADMIN
<p>
System Admin functions: mount et al.
<sect1>CAP_SYS_BOOT
<p>
Allow use of reboot().
<sect1>CAP_SYS_DEVICES
<p>
Allow device administration.
<sect1>CAP_SYS_NICE
<p>
Allow use of renice() on others, and raising of priority.
<sect1>CAP_SYS_RESOURCE
<p>
Override resource limits.
<sect1>CAP_SYS_TIME
<p>
Allow manipulation of system clock.
<sect1>CAP_SYS_TTY_CONFIG
<p>
Allow configuration of tty devices.
<sect1>CAP_SYS_QUOTA
<p>
Allow examination and configuration of disk quotas.
<chapt>Capability sets
<p>
In this chapter we introduce the concept of sets of capabilities. We also discuss the way in which such sets are maintained by the kernel and their association with system objects.
<sect>Sets of capabilities
<p>
Capabilities are independent and an application may have any number of them raised. The total selection of raised capabilities constitutes a capability set.
<sect1>The capability set type
<p>
The kernel represents a capability set with the structure <tt/struct __cap_s/, which may be thought of as a bitmap. The number of capability bits is currently 128.
<p>
There is a special, empty, capability set. This set has no capabilities raised.
<sect1>Capability set macros
<p>
To manipulate capabilities, a kernel task uses one of the following three macros:
<sect2>_cap_raise(cap_value_t capability)
<p>
This macro raises a capability in a capability set. It is used (with respect to a capability set "struct __cap_s c") in the following manner:
<verb>
c._cap_raise(CAP_AUDIT_CONTROL);
</verb>
<p>
Note, the macro is used in place of a member of the <tt/__cap_s/.
<sect2>_cap_lower(cap_value_t capability)
<p>
This macro lowers a capability in a capability set. It is used (with respect to a capability set "struct __cap_s c") in the following manner:
<verb>
c._cap_lower(CAP_AUDIT_CONTROL);
</verb>
<p>
Note, the macro is used in place of a member of the <tt/__cap_s/.
<sect2>_cap_raised(cap_value_t capability)
<p>
This macro returns <em/true/(!=0) if the capability is raised, and <em/false/(=0) if the capability is lowered. It is used in the following manner:
<p>
<verb>
if (c._cap_raised(CAP_AUDIT_CONTROL)) {
    /* .... action in case that capability is raised .... */
}
</verb>
<p>
Note, this macro is used as if it were a member of the capability structure.
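<p>
To illustrate how macros can read like members of the structure, here is a minimal sketch of one possible layout. It is not the kernel's actual definition: the member name <tt/_cap/, the word size, the numeric capability values and the helpers <tt/cap_empty()/ and <tt/cap_raise_checked()/ are all assumptions made for the example. Each macro expands to an expression that begins with the member name, so it can follow "c." or "c->".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CAP_BITS   128                /* number of capability bits, per the text */
#define CAP_WORDS  (CAP_BITS / 32)

/* Hypothetical numbering: this document does not assign numeric values. */
#define CAP_CHOWN          0
#define CAP_AUDIT_CONTROL  40

struct __cap_s {
    uint32_t _cap[CAP_WORDS];         /* assumed member name and layout */
};

/*
 * Each macro expands to an expression starting with the member "_cap",
 * so "c._cap_raise(X)" becomes "c._cap[X >> 5] |= ...".
 * Caveat: _cap_raised() cannot be parenthesised as a whole, so use it
 * directly as a condition rather than inside a larger expression.
 */
#define _cap_raise(cap)   _cap[(cap) >> 5] |=  (UINT32_C(1) << ((cap) & 31))
#define _cap_lower(cap)   _cap[(cap) >> 5] &= ~(UINT32_C(1) << ((cap) & 31))
#define _cap_raised(cap)  _cap[(cap) >> 5] &   (UINT32_C(1) << ((cap) & 31))

/* Return the special empty set: no capability raised. */
static struct __cap_s cap_empty(void)
{
    struct __cap_s c;
    memset(&c, 0, sizeof c);
    return c;
}

/* Raise a capability through a pointer, refusing out-of-range numbers. */
static void cap_raise_checked(struct __cap_s *c, int cap)
{
    assert(cap >= 0 && cap < CAP_BITS);
    c->_cap_raise(cap);
}
```

Starting from <tt/cap_empty()/, raising <tt/CAP_AUDIT_CONTROL/ sets exactly one bit and leaves every other capability lowered. Lowering it again returns the set to the empty state.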
<sect>Filesystem capability sets
<p>
Files on the system have the following capability sets.
<sect1>Inheritable (fI)
<p>
This is the set of capabilities that the program is permitted to inherit. In this way, the program can be configured to be safe for operation with a fixed number of capabilities. (This has the feature of eliminating the total collapse of system security in the face of an insecurely written game.)
<sect1>Permitted (fP)
<p>
This is the set of capabilities that are explicitly raised when the new process is <tt/exec()/'d.
<sect1>Effective (fE)
<p>
This is a single valued capability set (0 or ~0) that is required to be set if the process (after an exec) is to have all of its capabilities automatically raised. It is intended to be used with programs that were written before the concept of capabilities was introduced. Newly written programs should raise their own capabilities manually.
<sect>Process/task capabilities
<p>
Processes (also known as kernel tasks) have the following capability sets. They are copied unchanged when the process performs a <tt/fork()/ (or more generally, a <tt/clone()/) function call.
<p>
Capability sets for a process are always affected by a call to <tt/exec()/. Additionally, user level programs can manipulate their own capabilities with system calls to <tt/sys_[gs]etproccap()/.
<p>
Capabilities are grouped into sets and come in three forms: Effective, Inheritable, and Permitted.
<sect1>Effective (pE)
<p>
This is the set of capabilities that are currently <em/raised/, or in effect.
<p>
This is set upon <tt/exec()/ to
<verb>
pE' = pP' & fE.
</verb>
<p>
In other words, the new effective set is the new permitted set masked with the <tt/exec()/uted file's effective set.
<p>
Capabilities may be raised in the Effective set only if they are already raised in the Permitted set. Capabilities not in the Effective set are "lowered", or are inactive.
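<p>
The rule above, together with the rules given in this section for the Inheritable and Permitted sets (pI' = pI and pP' = fP | (fI & pI)), can be sketched in a few lines of C. This is an illustration only: a 32-bit mask stands in for <tt/struct __cap_s/, and the type and function names (<tt/capset_t/, <tt/caps_after_exec()/, <tt/raise_effective()/) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a small bitmask stands in for the 128-bit struct __cap_s. */
typedef uint32_t capset_t;
#define CAPSET_NONE ((capset_t)0)
#define CAPSET_ALL  ((capset_t)~(capset_t)0)

struct file_caps { capset_t fI, fP, fE; };   /* sets attached to a file    */
struct proc_caps { capset_t pI, pP, pE; };   /* sets attached to a process */

/* Compute the process sets that result from exec()ing file f. */
static struct proc_caps caps_after_exec(struct proc_caps old, struct file_caps f)
{
    struct proc_caps next;

    /* fE is single valued: either nothing or everything. */
    assert(f.fE == CAPSET_NONE || f.fE == CAPSET_ALL);

    next.pI = old.pI;                     /* pI' = pI               */
    next.pP = f.fP | (f.fI & old.pI);     /* pP' = fP | (fI & pI)   */
    next.pE = next.pP & f.fE;             /* pE' = pP' & fE         */
    return next;
}

/* Raise capability bits in pE; allowed only if they are present in pP. */
static int raise_effective(struct proc_caps *p, capset_t cap_bits)
{
    if ((p->pP & cap_bits) != cap_bits)
        return -1;                        /* not permitted: refuse  */
    p->pE |= cap_bits;
    return 0;
}
```

With fE = ~0 (a program written before capabilities existed) every permitted capability is raised immediately after the exec. With fE = 0 the process starts with an empty Effective set and must raise what it needs itself, which <tt/raise_effective()/ allows only for capabilities already in pP.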
<sect1>Inheritable (pI)
<p>
This is the set of capabilities that can be passed through an <tt/exec()/. These capabilities do not have to be raised in the predecessor to be passed, just present in the old process sets.
<p>
Following an exec(), the Inheritable set becomes
<verb>
pI' = pI.
</verb>
<p>
Or in other words, the Inheritable set is passed unchanged through an <tt/exec()/.
<sect1>Permitted (pP)
<p>
These are the capabilities that are potentially available to the current task.
<p>
Following an <tt/exec()/, the process' Permitted set is constructed from the following combination of the former process' and the invoked file's capability sets:
<verb>
pP' = fP | ( fI & pI ).
</verb>
<p>
Or in other words, the process' permitted set becomes the combination of the permitted set of the <tt/exec()/'d file and those inheritable capabilities of the <tt/exec()/ing process that are also inheritable by the file.
<p>
Turning off a capability in the Permitted set does not affect its status in the Inheritable set. The following is apparently a quote from draft 13 of POSIX.1e, section B.15.1.9:
<p>
"The corresponding inheritable flag is not affected, so that privileges can be conditionally transmitted along a process chain whose intermediate processes may themselves have no privileges" (N.B. At this stage in the development of the POSIX standard, capabilities were termed privileges.) So if you want to turn off a capability in the Inheritable set, you should explicitly turn it off there, not just turn it off in the Permitted set.
<chapt>Access Control Lists (ACL)
<chapt>Mandatory Access Control (MAC)
<chapt>Information Labels (IL)
<chapt>Sensitivity Labels (SL)
<chapt>Integrity checking; system recovery
<chapt>Auditing
<p>
An auditing facility is provided. Its design is intended to be simple but flexible, in such a way that it can be robustly implemented and will not impinge on the general responsiveness of the Linux kernel.
<p>
The following is a summary of the recent discussion on the linux-privs list. I am in the process of making a proposal that will be general enough to handle all of the following features:
<sect> First some kernel infrastructure:
<p>
<itemize>
<item> basic (atomic) auditing facility
  <itemize>
  <item> two modes of access: blocking and non-blocking
  </itemize>
</itemize>
<itemize>
<item> some sort of circular buffer arrangement with hooks for a module of some sort to store the audit data on some medium.
  <itemize>
  <item> such a module will accept rules that tell it what to audit and what to discard.
  <item> perhaps we adopt the model of the firewalling rules.
  </itemize>
<item> audit - session id for following login etc. process trees.
<item> minimal audit records include audit-id, uid and time plus raw numerical data (for off-line translation to readable text)
<item> all audit records have the same prefix-header
  <itemize>
  <item> Fixed Header prefix contains a version number (audit readers will know how to deal with historical formats).
  </itemize>
</itemize>
<sect>What audit records should contain?
<p>
<itemize>
<item> Events to be audited are selected by event masks: union of system/user/file event masks.
<item> Proposed audit prefix header:
  <itemize>
  <item> audit header version (audit generation) __u8
  <item> length of audit record (bytes) __u16
  <item> machine id: __u16
  <item> kind of event (e.g. system call code): __u8
  <item> reason for logging (bitmask: audited syscall, audited file, audited user, audited group) __u32
  <item> timestamp (when audited) __u32
  <item> audit id (set by init or login etc., for following process evolution) __u32
  <item> user id, group id, program id, ... __u32
  </itemize>
</itemize>
<sect>Next some Policy (when/what should we audit):
<p>
<itemize>
<item> Classes of event
  <itemize>
  <item> Group individual events into classes: file-access, system tweaking, etc.
  <item> Audit primarily on class of event.
  </itemize>
<item> Events that will always be audited (fixed events):
  <itemize>
  <item> all actions relating to the audit subsystem itself
  <item> all attempts to change the system date
  <item> all actions relating to group and user attributes
  <item> all definitions and deletions of MAC level names and level identifiers
  <item> changes of init states
  <item> all actions relating to loadable modules
  </itemize>
<item> User level event classes:
  <itemize>
  <item> some method for defining user-level event groups: administrator can select etc.
  </itemize>
<item>Events that may be audited
  <itemize>
  <item> file-access (open, connect, accept, close)
  <item> changing of file attributes (chmod, chattr, umask, create)
  <item> program execution (exec, fork, vfork, exit)
  <item> "exits" - administrator hook to deny requests/grant requests/modify system etc., in the light of the event.
  </itemize>
</itemize>
<sect>User level auditing (POSIX):
<p>
<itemize>
<item> POSIX only defines a method for writing audit records.
<item> Audit records are variable length
<item> each record contains a header
  <itemize>
  <item> event type - indicates minimal information contained in record
  <item> time
  <item> result (success/failure)
  <item> audit - id
  <item> other stuff...(?)
  </itemize>
<item> subject attributes
  <itemize>
  <item> information about the calling process.
    <itemize>
    <item> pid
    </itemize>
  </itemize>
<item> object structures
  <itemize>
  <item> contain information on objects associated with the audited event:
    <itemize>
    <item> files
    <item> processes
    <item> etc.
    </itemize>
  </itemize>
<item> event specific data: as directed by the event type.
<item> Functions:
  <itemize>
  <item> int aud_write(int fildes, aud_rec_t ar)
    <itemize>
    <item> special fildes = AUD_SYSTEM_LOG, to indicate that data is written to the system audit record.
    </itemize>
  </itemize>
<item> Policy: here we list audit event types (and the functions that they refer to)
  <itemize>
  <item> AUD_AET_AUD_SWITCH <itemize> <item> aud_switch() </itemize>
  <item> AUD_AET_AUD_WRITE <itemize> <item> aud_write() </itemize>
  <item> AUD_AET_CHDIR <itemize> <item> chdir() </itemize>
  <item> AUD_AET_CHMOD <itemize> <item> chmod() </itemize>
  <item> AUD_AET_CHOWN <itemize> <item> chown() </itemize>
  <item> AUD_AET_CREAT <itemize> <item> creat() </itemize>
  <item> AUD_AET_DUP <itemize> <item> dup(), dup2() </itemize>
  <item> AUD_AET_EXEC <itemize> <item> exec(), execl(), execlp(), execv(), execvp(), execle(), execve() </itemize>
  <item> AUD_AET_EXIT <itemize> <item> _exit() </itemize>
  <item> AUD_AET_FORK <itemize> <item> fork() </itemize>
  <item> AUD_AET_KILL <itemize> <item> kill() </itemize>
  <item> AUD_AET_LINK <itemize> <item> link() </itemize>
  <item> AUD_AET_MKDIR <itemize> <item> mkdir() </itemize>
  <item> AUD_AET_MKFIFO <itemize> <item> mkfifo() </itemize>
  <item> AUD_AET_OPEN <itemize> <item> open(), opendir() </itemize>
  <item> AUD_AET_PIPE <itemize> <item> pipe() </itemize>
  <item> AUD_AET_RENAME <itemize> <item> rename() </itemize>
  <item> AUD_AET_RMDIR <itemize> <item> rmdir() </itemize>
  <item> AUD_AET_SETGID <itemize> <item> setgid() </itemize>
  <item> AUD_AET_SETUID <itemize> <item> setuid() </itemize>
  <item> AUD_AET_UNLINK <itemize> <item> unlink() </itemize>
  <item> AUD_AET_UTIME <itemize> <item> utime() </itemize>
  </itemize>
<item> Define a minimum set of events that cannot be turned off: these preserve the integrity of the audit log.
<item> user and file bitmasks are provided to trace the behavior of the system with respect to specific files and users. These are OR'd together to determine if an event is to be logged.
</itemize>
<sect>Internal kernel auditing facility
<p>
Here we discuss the manner in which the kernel records audited events.
<p>
Any part of the kernel may audit an event.
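<p>
As a rough illustration of the prefix header proposed earlier in this chapter, and of how OR'd event masks could decide whether a record is written, consider the sketch below. Nothing here is an agreed interface: the field names, the <tt/AUDIT_WHY_*/ flags, the version value and both functions are hypothetical. It also assumes that each mask holds one bit per event class, which the discussion above does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AUDIT_HDR_VERSION 1u                  /* hypothetical generation number */

/* Reason-for-logging bits, one per source named in the proposal. */
#define AUDIT_WHY_SYSCALL (UINT32_C(1) << 0)
#define AUDIT_WHY_FILE    (UINT32_C(1) << 1)
#define AUDIT_WHY_USER    (UINT32_C(1) << 2)
#define AUDIT_WHY_GROUP   (UINT32_C(1) << 3)

/* Fields in the order proposed; the compiler may insert padding, and the
 * proposal's trailing "..." (further ids) is deliberately left open. */
struct audit_prefix {
    uint8_t  version;      /* audit header version     */
    uint16_t length;       /* length of record (bytes) */
    uint16_t machine_id;
    uint8_t  event;        /* kind of event            */
    uint32_t reason;       /* AUDIT_WHY_* bitmask      */
    uint32_t timestamp;
    uint32_t audit_id;
    uint32_t uid, gid, program_id;
};

/* OR together the reasons for which this event class is selected. */
static uint32_t audit_reason(unsigned int event_class,
                             uint32_t system_mask, uint32_t file_mask,
                             uint32_t user_mask, uint32_t group_mask)
{
    uint32_t bit, why = 0;

    assert(event_class < 32);
    bit = UINT32_C(1) << event_class;
    if (system_mask & bit) why |= AUDIT_WHY_SYSCALL;
    if (file_mask & bit)   why |= AUDIT_WHY_FILE;
    if (user_mask & bit)   why |= AUDIT_WHY_USER;
    if (group_mask & bit)  why |= AUDIT_WHY_GROUP;
    return why;
}

/* Returns 1 if a header was filled in, 0 if the event is not audited,
 * and -1 if the record would not fit in the 16-bit length field. */
static int audit_fill_prefix(struct audit_prefix *h, uint8_t event,
                             uint32_t why, uint32_t now, uint32_t audit_id,
                             uint32_t uid, size_t payload_len)
{
    assert(h != NULL);
    if (why == 0)
        return 0;
    if (payload_len > UINT16_MAX - sizeof *h)
        return -1;
    memset(h, 0, sizeof *h);
    h->version   = AUDIT_HDR_VERSION;
    h->length    = (uint16_t)(sizeof *h + payload_len);
    h->event     = event;
    h->reason    = why;
    h->timestamp = now;
    h->audit_id  = audit_id;
    h->uid       = uid;
    return 1;
}
```

An event is recorded as soon as any one mask selects it, and the <tt/reason/ field then tells an off-line reader which of the masks were responsible.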
<p>
Only those user-level processes with their auditing capability set may audit an event with the kernel.
<p>
Contingency is made for the possibility of the system log filling up.
<sect>Preserving the audit log
<p>
This section discusses the kernel hooks provided for making a hard-copy of the system log.
<sect>Auditing policy
<p>
This section describes the policies used to define what sorts of events are logged. It also introduces the preferred format for recording these audited events.
<chapt>Process/task credentials
<p>
In this section we cover the freedoms of a given process. These include items like resource limits.
<chapt>Acknowledgements
<p>
This document is edited by Andrew G. Morgan. Amendments and corrections should be emailed to <tt><linux-privs@mit.edu></tt>.
<p>
Contributors to the text of this document include: Hildo Biersma, Roland Buresund, Dave Dillow, Julie Haugh, David Holland, Alexander Kjeldaas, Andi Kleen, Darren Moffat, Ingo Molnar, Aleph One, Christos Ricudis, Ken Seefried, Theodore Ts'o, and Zefram (Andrew Main).
<chapt>Copyright/license for this document
<p>
Copyright (c) 1997, Andrew G. Morgan <tt><morgan@parc.power.net></tt>. All rights reserved.
<p>
Redistribution and use in source (sgml) and binary (derived) forms, with or without modification, are permitted provided that the following conditions are met:
<p>
1. Redistributions of source code must retain the above copyright notice, and the entire permission notice in its entirety, including the disclaimer of warranties.
<p>
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
<p>
3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission.
<p>
Alternatively, this product may be distributed under the terms of the GNU General Public License (GPL), in which case the provisions of the GNU GPL are required instead of the above restrictions. (This clause is necessary due to a potential bad interaction between the GNU GPL and the restrictions contained in a BSD-style copyright.)
<p>
THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
</report>