This is the Linux kernel capabilities FAQ

Its history, to the extent that I am able to reconstruct it is that
v2.0 was posted to the Linux kernel list on 1999/04/02 by Boris
Tobotras. Thanks to Denis Ducamp for forwarding me a copy.

Cheers

Andrew

Linux Capabilities FAQ 0.2
==========================

1) What is a capability?

The name "capabilities" as used in the Linux kernel can be confusing.
First there are Capabilities as defined in computer science. A
capability is a token used by a process to prove that it is allowed to
do an operation on an object.  The capability identifies the object
and the operations allowed on that object.  A file descriptor is a
capability.  You create the file descriptor with the "open" call and
request read or write permissions.  Later, when doing a read or write
operation, the kernel uses the file descriptor as an index into a
data structure that indicates what operations are allowed.  This is an
efficient way to check permissions.  The necessary data structures are
created once during the "open" call.  Later read and write calls only
have to do a table lookup.  Operations on capabilities include copying
capabilities, transferring capabilities between processes, modifying a
capability, and revoking a capability.  Modifying a capability can be
something like taking a read-write filedescriptor and making it
read-only.  A capability often has a notion of an "owner" which is
able to invalidate all copies and derived versions of a capability.
Entire OSes are based on this "capability" model, with varying degrees
of purity.  There are other ways of implementing capabilities than the
file descriptor model - traditionally special hardware has been used,
but modern systems also use the memory management unit of the CPU.

Then there is something quite different called "POSIX capabilities"
which is what Linux uses.  These capabilities are a partitioning of
the all powerful root privilege into a set of distinct privileges (but
look at securelevel emulation to find out that this isn't necessary
the whole truth).  Users familiar with VMS or "Trusted" versions of
other UNIX variants will know this under the name "privileges".  The
name "capabilities" comes from the now defunct POSIX draft 1003.1e
which used this name.

2) So what is a "POSIX capability"?

A process has three sets of bitmaps called the inheritable(I),
permitted(P), and effective(E) capabilities.  Each capability is
implemented as a bit in each of these bitmaps which is either set or
unset.  When a process tries to do a privileged operation, the
operating system will check the appropriate bit in the effective set
of the process (instead of checking whether the effective uid of the
process i 0 as is normally done).  For example, when a process tries
to set the clock, the Linux kernel will check that the process has the
CAP_SYS_TIME bit (which is currently bit 25) set in its effective set.

The permitted set of the process indicates the capabilities the
process can use.  The process can have capabilities set in the
permitted set that are not in the effective set.  This indicates that
the process has temporarily disabled this capability.  A process is
allowed to set a bit in its effective set only if it is available in
the permitted set.  The distinction between effective and permitted
exists so that processes can "bracket" operations that need privilege.

The inheritable capabilities are the capabilities of the current
process that should be inherited by a program executed by the current
process.  The permitted set of a process is masked against the
inheritable set during exec().  Nothing special happens during fork()
or clone().  Child processes and threads are given an exact copy of
the capabilities of the parent process.

3) What about other entities in the system? Users, Groups, Files?

Files have capabilities.  Conceptually they have the same three
bitmaps that processes have, but to avoid confusion we call them by
other names.  Only executable files have capabilities, libraries don't
have capabilities (yet).  The three sets are called the allowed set,
the forced set, and the effective set.

The allowed set indicates what capabilities the executable is allowed
to receive from an execing process.  This means that during exec(),
the capabilities of the old process are first masked against a set
which indicates what the process gives away (the inheritable set of
the process), and then they are masked against a set which indicates
what capabilities the new process image is allowed to receive (the
allowed set of the executable).

The forced set is a set of capabilities created out of thin air and
given to the process after execing the executable.  The forced set is
similar in nature to the setuid feature.  In fact, the setuid bit from
the filesystem is "read" as a full forced set by the kernel.

The effective set indicates which bits in the permitted set of the new
process should be transferred to the effective set of the new process.
The effective set is best thought of as a "capability aware" set.  It
should consist of only 1s if the executable is capability-dumb, or
only 0s if the executable is capability-smart.  Since the effective
set consists of only 0s or only 1s, the filesystem can implement this
set using a single bit.

NOTE: Filesystem support for capabilities is not part of Linux 2.2.

Users and Groups don't have associated capabilities from the kernel's
point of view, but it is entirely reasonable to associate users or
groups with capabilities.  By letting the "login" program set some
capabilities it is possible to make role users such as a backup user
that will have the CAP_DAC_READ_SEARCH capability and be able to do
backups.  This could also be implemented as a PAM module, but nobody
has implemented one yet.

4) What capabilities exist?

The capabilities available in Linux are listed and documented in the
file /usr/src/linux/include/linux/capability.h.

5) Are Linux capabilities hierarchical?

No, you cannot make a "subcapability" out of a Linux capability as in
capability-based OSes.

6) How can I use capabilities to make sure Mr. Evil Luser (eluser)
can't exploit my "suid" programs?

This is the general outline of how this works given filesystem
capability support exists.  First, you have a PAM module that sets the
inheritable capabilities of the login-shell of eluser.  Then for all
"suid" programs on the system, you decide what capabilities they need
and set the _allowed_ set of the executable to that set of
capabilities.  The capability rules

   new permitted = forced | (allowed & inheritable)

means that you should be careful about setting forced capabilities on
executables.  In a few cases, this can be useful though.  For example
the login program needs to set the inheritable set of the new user and
therefore needs an almost full permitted set.  So if you want eluser
to be able to run login and log in as a different user, you will have
to set some forced bits on that executable.

7) What about passing capabilities between processes?

Currently this is done by the system call "setcap" which can set the
capabilities of another process.  This requires the CAP_SETPCAP
capability which you really only want to grant a _few_ processes.
CAP_SETPCAP was originally intended as a workaround to be able to
implement filesystem support for capabilities using a daemon outside
the kernel.

There has been discussions about implementing socket-level capability
passing.  This means that you can pass a capability over a socket.  No
support for this exists in the official kernel yet.

8) I see securelevel has been removed from 2.2 and are superceeded by
capabilities.  How do I emulate securelevel using capabilities?

The setcap system call can remove a capability from _all_ processes on
the system in one atomic operation.  The setcap utility from the
libcap distribution will do this for you.  The utility requires the
CAP_SETPCAP privilege to do this.  The CAP_SETPCAP capability is not
enabled by default.

libcap is available from
ftp://ftp.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.2/

9) I noticed that the capability.h file lacks some capabilities that
are needed to fully emulate 2.0 securelevel.  Is there a patch for
this?

Actually yes - funny you should ask :-).  The problem with 2.0
securelevel is that they for example stop root from accessing block
devices.  At the same time they restrict the use of iopl.  These two
changes are fundamentally different.  Blocking access to block devices
means restricting something that usually isn't restricted.
Restricting access to the use of iopl on the other hand means
restricting (blocking) access to something that is already blocked.
Emulating the parts of 2.0 securelevel that restricts things that are
normally not restricted means that the capabilites in the kernel has
to have a set of capabilities that are usually _on_ for a normal
process (note that this breaks the explanation that capabilities are a
partitioning of the root privileges).  There is an experimental patch at

ftp://ftp.guardian.no/pub/free/linux/capabilities/patch-cap-exp-1

which implements a set of capabilities with the "CAP_USER" prefix:

cap_user_sock  - allowed to use socket()
cap_user_dev   - allowed to open char/block devices
cap_user_fifo  - allowed to use pipes

These should be enough to emulate 2.0 securelevel (tell me if we need
something more).

10) Seems I need a CAP_SETPCAP capability that I don't have to make use
of capabilities.  How do I enable this capability?

Change the definition of CAP_INIT_EFF_SET and CAP_INIT_INH_SET to the
following in include/linux/capability.h:

#define CAP_INIT_EFF_SET    { ~0 }
#define CAP_INIT_INH_SET    { ~0 }

This will start init with a full capability set and not with
CAP_SETPCAP removed.

11) How do I start a process with a limited set of capabilities?

Get the libcap library and use the execcap utility.  The following
example starts the update daemon with only the CAP_SYS_ADMIN
capability.

execcap 'cap_sys_admin=eip' update

12) How do I start a process with a limited set of capabilities under
another uid?

Use the sucap utility which changes uid from root without loosing any
capabilities.  Normally all capabilities are cleared when changing uid
from root.  The sucap utility requires the CAP_SETPCAP capability.
The following example starts updated under uid updated and gid updated
with CAP_SYS_ADMIN raised in the Effective set.

sucap updated updated execcap 'cap_sys_admin=eip' update

[ Sucap is currently available from
ftp://ftp.guardian.no/pub/free/linux/capabilities/sucap.c. Put it in
the progs directory of libcap to compile.]

13) What are the "capability rules"

The capability rules are the rules used to set the capabilities of the
new process image after an exec.  They work like this:

        pI' = pI
  (***) pP' = fP | (fI & pI)
        pE' = pP' & fE          [NB. fE is 0 or ~0]

  I=Inheritable, P=Permitted, E=Effective // p=process, f=file
  ' indicates post-exec().

Now to make sense of the equations think of fP as the Forced set of
the executable, and fI as the Allowed set of the executable.  Notice
how the Inheritable set isn't touched at all during exec().

14) What are the laws for setting capability bits in the Inheritable,
Permitted, and Effective sets?

Bits can be transferred from Permitted to either Effective or
Inheritable set.

Bits can be removed from all sets.

15) Where is the standard on which the Linux capabilities are based?

There used to be a POSIX draft called POSIX.6 and later POSIX 1003.1e.
However after the committee had spent over 10 years, POSIX decided
that enough is enough and dropped the draft.  There will therefore not
be a POSIX standard covering security anytime soon.  This may lead to
that the POSIX draft is available for free, however.

--
        Best regards, -- Boris.