Linux-Privs <author>Andrew G. Morgan, <tt>morgan@parc.power.net</tt> <date>DRAFT v0.10 1997/4/21 <abstract> This is the specification document for Linux-Privs. It is a "standard" describing the kernel component of a secure system based around a Linux Kernel and is designed to be as compatible with POSIX.1e as is possible without open access to the actual specifications. </abstract> <toc> <chapt>Introduction The Linux-Privs project is an effort to implement a POSIX.1e (formerly POSIX 6) security model under Linux. At the heart of the changes to the Linux kernel from the historical model is the separation of identity and privilege. Historically, root (UID=0) was all-powerful and other users (UID!=0) had power that was limited to that associated with their identity and their group memberships. The Linux-Privs scheme is to implement a set of independent capabilities that can be given to any user. In point of fact, the capabilities are associated with applications and can only be used within the confines of the functionality of such applications. In this way individual capabilities can be restricted to a trusted set of applications. Typically, the user will have to authenticate himself to such an application before it will perform its privileged task. This new scheme for system privilege lends itself well to restricting privileged access to the system and reduces the risk of intruders or poorly written applications running amok on the system. <chapt>Capabilities In this chapter we list the capabilities known to the Linux Kernel. Firstly, we list the POSIX defined capabilities, and then those specific to Linux. <sect>POSIX capabilities Here we list the POSIX capabilities honored by Linux. <sect1>CAP_CHOWN The <tt/#define/ for the symbol <tt/_POSIX_CHOWN_RESTRICTED/ indicates that this capability (<tt/CAP_CHOWN/) is known. This capability enables the current process to change the owner of a file.
Generally, file ownership is not changeable by a user: it is determined by the user who creates the file. <sect1>CAP_DAC_OVERRIDE The <tt/#define/ for the symbol <bf/_POSIX_ACL/ indicates that Access Control Lists (an implementation of Discretionary Access Control) are supported by the kernel and that the following capability is known: <tt/CAP_DAC_OVERRIDE/. This capability overrides all DAC access restrictions on files and directories, including ACL's. <sect1>CAP_DAC_READ_SEARCH Overrides all DAC restrictions regarding read and search on files and directories, including ACL's. <sect1>CAP_FOWNER Overrides all restrictions about allowed operations on files, where file owner ID must be equal to the user ID, except where CAP_FSETID is applicable. It doesn't override MAC and DAC restrictions. <sect1>CAP_FSETID Overrides the following restrictions: that the effective user ID shall match the file owner ID when setting the S_ISUID and S_ISGID bits on that file; that the effective group ID (or one of the supplementary group IDs) shall match the file's group owner ID when setting the S_ISGID bit on that file; that the S_ISUID and S_ISGID bits are cleared on successful return from chown(2). <sect1>CAP_KILL Overrides the restriction that the real or effective user ID of a process sending a signal must match the real or effective user ID of the process receiving the signal. <sect1>CAP_LINK_DIR Overrides the restriction that a process cannot create or delete a hard link to a directory. This shall not override MAC and DAC policies. <sect1>CAP_SETFCAP Allows the (re)setting of a file's capabilities. <sect1>CAP_SETGID Allows setgid(2) manipulation. <sect1>CAP_SETUID Allows setuid(2) manipulation. <sect1>CAP_SIGMASK Overrides the restriction that no process may block <tt/SIGKILL/ and <tt/SIGSTOP/. <sect1>CAP_MAC_DOWNGRADE This capability is available if <tt/_POSIX_MAC/ is <tt/#define/'d.
This capability allows a process to downgrade an object's information label. <sect1>CAP_MAC_READ This capability is available if <tt/_POSIX_MAC/ is <tt/#define/'d. Allows a process to override MAC read restrictions. <sect1>CAP_MAC_RELABEL_SUBJ This capability is available if <tt/_POSIX_MAC/ is <tt/#define/'d. Allows a process to modify its own label. <sect1>CAP_MAC_UPGRADE This capability is available if <tt/_POSIX_MAC/ is <tt/#define/'d. This capability allows a process to upgrade an object's information label. <sect1>CAP_MAC_WRITE This capability is available if <tt/_POSIX_MAC/ is <tt/#define/'d. This capability overrides the MAC restrictions on writes. <sect1>CAP_INF_NOFLOAT_OBJ This capability is available if <tt/_POSIX_INF/ is <tt/#define/'d. This capability prevents a process' information label from floating during writes. <sect1>CAP_INF_NOFLOAT_SUBJ This capability is available if <tt/_POSIX_INF/ is <tt/#define/'d. This capability prevents the process' information label from floating during reads or executes. <sect1>CAP_INF_RELABEL_OBJ This capability is available if <tt/_POSIX_INF/ is <tt/#define/'d. This capability allows a process to change an object's information label. <sect1>CAP_INF_RELABEL_SUBJ This capability is available if <tt/_POSIX_INF/ is <tt/#define/'d. This capability allows a process to modify its own information label in violation of the overriding policy. <sect1>CAP_AUDIT_CONTROL This capability is available if <tt/_POSIX_AUD/ is <tt/#define/'d. This capability allows a process to modify the audit control parameters. <sect1>CAP_AUDIT_WRITE This capability is available if <tt/_POSIX_AUD/ is <tt/#define/'d. This capability allows a process to write data to the audit trail. <sect>Linux specific capabilities This section lists additional capabilities that are specific to Linux or not covered by the POSIX capability definitions. <sect1>CAP_LINUX_IMMUTABLE Allow modification of <tt/S_IMMUTABLE/ and <tt/S_APPEND/ file attributes. 
<sect1>CAP_LINUX_KERNELD Permission to act as kerneld. <sect1>CAP_LINUX_INSMOD Allow installation of kernel modules. <sect1>CAP_LINUX_RMMOD Allow removal of kernel modules. <sect1>CAP_LINUX_RAWIO Allow ioperm/iopl access. <sect1>CAP_LINUX_ATTENTION Allow configuration of the secure attention key. <sect1>CAP_LINUX_RANDOM Allow administration of the random device. <sect>Other capabilities This section lists those capabilities commonly found on other systems besides Linux, but which are not specified by POSIX. <sect1>CAP_NET_BIND_SERVICE Allows binding to TCP/UDP ports below 1024. <sect1>CAP_NET_BROADCAST Allow broadcasting. <sect1>CAP_NET_DEBUG Allow setting debug option on sockets. <sect1>CAP_NET_FIREWALL Allow configuration of the firewall. <sect1>CAP_NET_IFCONFIG Allow interface configuration. <sect1>CAP_NET_PACKET Allow use of PACKET sockets. <sect1>CAP_NET_RAW Allow use of RAW sockets. <sect1>CAP_NET_ROUTE Allow modification of routing tables. <sect1>CAP_NET_SETID CAP.FIXME: what is this about? <sect1>CAP_IPC_LOCK Allow locking of segments in memory. <sect1>CAP_IPC_OWNER Override IPC ownership checks. <sect1>CAP_SYS_CHROOT Allow use of chroot(). <sect1>CAP_SYS_PTRACE Allow ptrace() of any process. <sect1>CAP_SYS_ACCOUNT Allow configuration of process accounting. <sect1>CAP_SYS_ADMIN System Admin functions: mount et al. <sect1>CAP_SYS_BOOT Allow use of reboot(). <sect1>CAP_SYS_DEVICES Allow device administration. <sect1>CAP_SYS_NICE Allow use of renice() on others, and raising of priority. <sect1>CAP_SYS_RESOURCE Override resource limits. <sect1>CAP_SYS_TIME Allow manipulation of system clock. <sect1>CAP_SYS_TTY_CONFIG Allow configuration of tty devices. <sect1>CAP_SYS_QUOTA Allow examination and configuration of disk quotas. <chapt>Capability sets In this chapter we introduce the concept of sets of capabilities. We also discuss the way in which such sets are maintained by the kernel and their association with system objects.
<sect>Sets of capabilities Capabilities are independent and an application may have any number of them raised. The total selection of raised capabilities constitutes a capability set. <sect1>The capability set type The kernel represents a capability set with the structure <tt/struct __cap_s/, which may be thought of as a bitmap. The number of capability bits is currently 128. There is a special, empty, capability set. This set has no capabilities raised. <sect1>Capability set macros To manipulate capabilities, a kernel task uses one of the following three macros: <sect2>_cap_raise(cap_value_t capability) This macro raises a capability in a capability set. It is used (with respect to a capability set "struct __cap_s c") in the following manner: <verb> c._cap_raise(CAP_AUDIT_CONTROL); </verb> Note, the macro is used in place of a member of the <tt/__cap_s/ structure. <sect2>_cap_lower(cap_value_t capability) This macro lowers a capability in a capability set. It is used (with respect to a capability set "struct __cap_s c") in the following manner: <verb> c._cap_lower(CAP_AUDIT_CONTROL); </verb> Note, the macro is used in place of a member of the <tt/__cap_s/ structure. <sect2>_cap_raised(cap_value_t capability) This macro returns <em/true/ (!=0) if the capability is raised, and <em/false/ (=0) if the capability is lowered. It is used in the following manner: <verb> if (c._cap_raised(CAP_AUDIT_CONTROL)) { /* .... action in case that capability is raised .... */ } </verb> Note, this macro is used as if it were a member of the capability structure. <sect>Filesystem capability sets Files on the system have the following capability sets. <sect1>Inheritable (fI) This is the set of capabilities that the program is permitted to inherit. In this way, the program can be configured to be safe for operation with a fixed number of capabilities. (This has the feature of eliminating the total collapse of system security in the face of an insecurely written game.)
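As a rough illustration of the capability set type and the three macros described above under "Capability set macros", the following C sketch stores 128 capability bits in four 32-bit words. Its macros, like those above, are written after "c." so that they expand into a member expression. The structure name, the member name <tt/bits/, the <tt/sk_/ prefixed names and the numeric capability values are all assumptions made for this example; the kernel's own definitions may well differ.

```c
#include <assert.h>
#include <stdint.h>

#define SK_CAP_BITS  128                 /* number of capability bits, per the text */
#define SK_CAP_WORDS (SK_CAP_BITS / 32)  /* stored here as four 32-bit words */

/* Hypothetical stand-in for struct __cap_s; the member name is assumed. */
struct sk_cap_s {
    uint32_t bits[SK_CAP_WORDS];
};

/* The special empty set: no capability raised. */
#define SK_CAP_EMPTY { { 0, 0, 0, 0 } }

/* Illustrative capability numbers only; the text does not assign values. */
enum { SK_CAP_CHOWN = 0, SK_CAP_AUDIT_CONTROL = 37 };

/*
 * Each macro expands to a member expression, so it is written after
 * "c." (or "p->"): c.sk_cap_raise(x) becomes c.bits[...] |= ...
 * Parenthesise a use of sk_cap_raised() before comparing its result.
 */
#define sk_cap_raise(cap)  bits[(cap) >> 5] |=  (UINT32_C(1) << ((cap) & 31))
#define sk_cap_lower(cap)  bits[(cap) >> 5] &= ~(UINT32_C(1) << ((cap) & 31))
#define sk_cap_raised(cap) bits[(cap) >> 5] &   (UINT32_C(1) << ((cap) & 31))

/* Returns non-zero if cap is a valid bit index for the set. */
static int sk_cap_valid(int cap)
{
    return cap >= 0 && cap < SK_CAP_BITS;
}

/* Bounds-checked wrapper around the raise macro. */
static void sk_cap_raise_checked(struct sk_cap_s *c, int cap)
{
    assert(sk_cap_valid(cap));
    c->sk_cap_raise(cap);
}

/* Returns non-zero if no capability is raised in the set. */
static int sk_cap_is_empty(const struct sk_cap_s *c)
{
    uint32_t any = 0;
    for (int i = 0; i < SK_CAP_WORDS; i++)
        any |= c->bits[i];
    return any == 0;
}
```

With the illustrative value 37 used here, raising SK_CAP_AUDIT_CONTROL sets bit 5 of the second word. Expanding to a bare member expression keeps the calling style shown in the text, at the cost that the result of the test macro has to be parenthesised before it is compared with anything.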
<sect1>Permitted (fP) This is the set of capabilities that are explicitly raised when the new process is <tt/exec()/'d. <sect1>Effective (fE) This is a single valued capability set (0 or ~0) that is required to be set if the process (after an exec) is to have all of its capabilities automatically raised. It is intended to be used with programs that were written before the concept of capabilities was introduced. Newly written programs should raise their own capabilities manually. <sect>Process/task capabilities Processes (also known as kernel tasks) have the following capability sets. They are copied unchanged when the process performs a <tt/fork()/ (or more generally, a <tt/clone()/) function call. Capability sets for a process are always affected by a call to <tt/exec()/. Additionally, user level programs can manipulate their own capabilities with system calls to <tt/sys_[gs]etproccap()/. Capabilities are grouped into sets and come in three forms: Effective; Inheritable; and Permitted. <sect1>Effective (pE) This is the set of capabilities that are currently <em/raised/, or in effect. This is set upon <tt/exec()/ to <verb> pE' = pP' & fE. </verb> In other words, the new effective set is the new permitted set masked with the <tt/exec()/uted file's effective set. Capabilities may be raised in the Effective set only if they are already raised in the Permitted set. Capabilities not in the Effective set are "lowered", or are inactive. <sect1>Inheritable (pI) This is the set of capabilities that can be passed through an <tt/exec()/. These capabilities do not have to be raised in the predecessor to be passed, just present in the old process sets. Following an exec(), the Inheritable set becomes <verb> pI' = pI. </verb> Or in other words, the Inheritable set is passed unchanged through an <tt/exec()/. <sect1>Permitted (pP) These are the capabilities that are potentially available to the current task.
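The exec() rules of this section can be collected into a short C sketch. It applies pI' = pI and pE' = pP' & fE as given above, together with the Permitted rule pP' = fP | (fI & pI) that is spelled out in the remainder of this section. The type and function names are hypothetical, chosen only for illustration, and the layout of four 32-bit words merely follows the 128-bit figure quoted earlier; this is not taken from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

#define SK_CAP_WORDS 4   /* 128 capability bits as four 32-bit words */

/* Hypothetical bitmap type; all names here are assumptions for the sketch. */
struct sk_caps {
    uint32_t w[SK_CAP_WORDS];
};

/* Capability sets attached to a file: fI, fP and the single-valued fE. */
struct sk_file_caps {
    struct sk_caps inheritable;   /* fI */
    struct sk_caps permitted;     /* fP */
    int effective_all;            /* fE: 0 stands for 0, non-zero for ~0 */
};

/* Capability sets attached to a process: pE, pI and pP. */
struct sk_proc_caps {
    struct sk_caps effective;     /* pE */
    struct sk_caps inheritable;   /* pI */
    struct sk_caps permitted;     /* pP */
};

/* Computes the process sets that result from exec()ing file f. */
static struct sk_proc_caps sk_exec(const struct sk_proc_caps *old,
                                   const struct sk_file_caps *f)
{
    struct sk_proc_caps next;
    uint32_t fE = f->effective_all ? ~UINT32_C(0) : UINT32_C(0);

    for (int i = 0; i < SK_CAP_WORDS; i++) {
        uint32_t pI = old->inheritable.w[i];

        /* pI' = pI */
        next.inheritable.w[i] = pI;
        /* pP' = fP | (fI & pI) */
        next.permitted.w[i] = f->permitted.w[i] | (f->inheritable.w[i] & pI);
        /* pE' = pP' & fE */
        next.effective.w[i] = next.permitted.w[i] & fE;

        /* The Effective set never exceeds the Permitted set. */
        assert((next.effective.w[i] & ~next.permitted.w[i]) == 0);
    }
    return next;
}

/* A capability may be raised in pE only if it is raised in pP. */
static int sk_may_raise_effective(const struct sk_proc_caps *p, int cap)
{
    assert(cap >= 0 && cap < 32 * SK_CAP_WORDS);
    return (int)((p->permitted.w[cap >> 5] >> (cap & 31)) & 1u);
}
```

For example, with pI = 0x3, fI = 0x6 and fP = 0x8 in the lowest word, the new Permitted set is 0x8 | (0x6 & 0x3) = 0xA; the new Effective set is 0xA if fE is set and 0 otherwise. Note that the old Permitted set plays no part in the result.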
Following an <tt/exec()/, the process' Permitted set is constructed from the following combination of the former process' and the invoked file's capability sets: <verb> pP' = fP | ( fI & pI ). </verb> Or in other words, the process' permitted set becomes the combination of the permitted set of the <tt/exec()/'d file and those inheritable capabilities of the <tt/exec()/ing process that are also inheritable by the file. Turning off a capability in the Permitted set does not affect its status in the Inheritable set. The following is apparently a quote (draft 13 of POSIX.1e, B.15.1.9): "The corresponding inheritable flag is not affected, so that privileges can be conditionally transmitted along a process chain whose intermediate processes may themselves have no privileges" (N.B. At this stage in the development of the POSIX standard, capabilities were termed privileges.) So if you want to turn off a capability in the Inheritable set, you should turn it off there explicitly, not just in the Permitted set. <chapt>Access Control Lists (ACL) <chapt>Mandatory Access Control (MAC) <chapt>Information Labels (IL) <chapt>Sensitivity Labels (SL) <chapt>Integrity checking; system recovery <chapt>Auditing An auditing facility is provided. Its design is intended to be simple but flexible, in such a way that it can be robustly implemented and will not impinge on the general responsiveness of the Linux kernel. The following is a summary of the recent discussion on the linux-privs list. I am in the process of making a proposal that will be general enough to handle all of the following features: <sect> First some kernel infrastructure: <itemize> <item> basic (atomic) auditing facility <itemize> <item> two modes of access: blocking and non-blocking </itemize> </itemize> <itemize> <item> some sort of circular buffer arrangement with hooks for a module of some sort to store the audit data on some medium.
<itemize> <item> such a module will accept rules that tell it what to audit and what to discard. <item> perhaps we adopt the model of the firewalling rules. </itemize> <item> audit - session id for following login etc. process trees. <item> minimal audit records include audit-id, uid and time plus raw numerical data (for off-line translation to readable text) <item> all audit records have the same prefix-header <itemize> <item> Fixed Header prefix contains a version number (audit readers will know how to deal with historical formats). </itemize> </itemize> <sect>What should audit records contain? <itemize> <item> Events to be audited are selected by event masks: union of system/user/file event masks. <item> Proposed audit prefix header: <itemize> <item> audit header version (audit generation) __u8 <item> length of audit record (bytes) __u16 <item> machine id: __u16 <item> kind of event (e.g. system call code): __u8 <item> reason for logging (bitmask: audited syscall, audited file, audited user, audited group) __u32 <item> timestamp (when audited) __u32 <item> audit id (set by init or login etc., for following process evolution) __u32 <item> user id, group id, program id, ... __u32 </itemize> </itemize> <sect>Next some Policy (when/what should we audit): <itemize> <item> Classes of event <itemize> <item> Group individual events into classes: file-access/system tweaking etc. <item> Audit primarily on class of event. </itemize> <item> Events that will always be audited (fixed events): <itemize> <item> all actions relating to the audit subsystem itself <item> all attempts to change the system date <item> all actions relating to group and user attributes <item> all definitions and deletions of MAC level names and level identifiers <item> changes of init states <item> all actions relating to loadable modules </itemize> <item> User level event classes: <itemize> <item> some method for defining user-level event groups: administrator can select etc.
</itemize> <item>Events that may be audited <itemize> <item> file-access (open, connect, accept, close) <item> changing of file attributes (chmod, chattr, umask, create) <item> program execution (exec, fork, vfork, exit) <item> "exits" - administrator hook to deny requests/grant requests/modify system etc., in the light of the event. </itemize> </itemize> <sect>User level auditing (POSIX): <itemize> <item> POSIX only defines a method for writing audit records. <item> Audit records are variable length <item> each record contains a header <itemize> <item> event type - indicates minimal information contained in record <item> time <item> result (success/failure) <item> audit - id <item> other stuff...(?) </itemize> <item> subject attributes <itemize> <item> information about the calling process. <itemize> <item> pid </itemize> </itemize> <item> object structures <itemize> <item> contain information on objects associated with the audited event: <itemize> <item> files <item> processes <item> etc. </itemize> </itemize> <item> event specific data: as directed by the event type. <item> Functions: <itemize> <item> int aud_write(int fildes, aud_rec_t ar) <itemize> <item> special fildes = AUD_SYSTEM_LOG, to indicate that data is written to the system audit record.
</itemize> </itemize> <item> Policy: here we list audit event types (and the functions that they refer to) <itemize> <item> AUD_AET_AUD_SWITCH <itemize> <item> aud_switch() </itemize> <item> AUD_AET_AUD_WRITE <itemize> <item> aud_write() </itemize> <item> AUD_AET_CHDIR <itemize> <item> chdir() </itemize> <item> AUD_AET_CHMOD <itemize> <item> chmod() </itemize> <item> AUD_AET_CHOWN <itemize> <item> chown() </itemize> <item> AUD_AET_CREAT <itemize> <item> creat() </itemize> <item> AUD_AET_DUP <itemize> <item> dup(), dup2() </itemize> <item> AUD_AET_EXEC <itemize> <item> exec(), execl(), execlp(), execv(), execvp(), execle(), execve() </itemize> <item> AUD_AET_EXIT <itemize> <item> _exit() </itemize> <item> AUD_AET_FORK <itemize> <item> fork() </itemize> <item> AUD_AET_KILL <itemize> <item> kill() </itemize> <item> AUD_AET_LINK <itemize> <item> link() </itemize> <item> AUD_AET_MKDIR <itemize> <item> mkdir() </itemize> <item> AUD_AET_MKFIFO <itemize> <item> mkfifo() </itemize> <item> AUD_AET_OPEN <itemize> <item> open(), opendir() </itemize> <item> AUD_AET_PIPE <itemize> <item> pipe() </itemize> <item> AUD_AET_RENAME <itemize> <item> rename() </itemize> <item> AUD_AET_RMDIR <itemize> <item> rmdir() </itemize> <item> AUD_AET_SETGID <itemize> <item> setgid() </itemize> <item> AUD_AET_SETUID <itemize> <item> setuid() </itemize> <item> AUD_AET_UNLINK <itemize> <item> unlink() </itemize> <item> AUD_AET_UTIME <itemize> <item> utime() </itemize> </itemize> <item> Define a minimum set of events that cannot be turned off: these preserve the integrity of the audit log <item> user and file bitmasks are provided to trace the behavior of the system with respect to specific files and users. These are OR'd together to determine if an event is to be logged. </itemize> <sect>Internal kernel auditing facility Here we discuss the manner in which the kernel records audited events. Any part of the kernel may audit an event.
Only those user-level processes with their auditing capability set may audit an event with the kernel. Contingency is made for the possibility of the system log filling up. <sect>Preserving the audit log This section discusses the kernel hooks provided for making a hard-copy of the system log. <sect>Auditing policy This section describes the policies used to define what sorts of events are logged. It also introduces the preferred format for recording these audited events. <chapt>Process/task credentials In this section we cover the freedoms of a given process. These include items like resource limits. <chapt>Acknowledgements This document is edited by Andrew G. Morgan. Amendments and corrections should be emailed to <tt><linux-privs@mit.edu></tt>. Contributors to the text of this document include: Hildo Biersma, Roland Buresund, Dave Dillow, Julie Haugh, David Holland, Alexander Kjeldaas, Andi Kleen, Darren Moffat, Ingo Molnar, Aleph One, Christos Ricudis, Ken Seefried, Theodore Ts'o, and Zefram (Andrew Main). <chapt>Copyright/license for this document Copyright (c) 1997, Andrew G. Morgan <tt><morgan@parc.power.net></tt>. All rights reserved. Redistribution and use in source (sgml) and binary (derived) forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, and the entire permission notice in its entirety, including the disclaimer of warranties. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission. 
Alternatively, this product may be distributed under the terms of the GNU General Public License (GPL), in which case the provisions of the GNU GPL are required instead of the above restrictions. (This clause is necessary due to a potential bad interaction between the GNU GPL and the restrictions contained in a BSD-style copyright.) THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. </report>