Subject: kauth, securelevel, and "run levels"
To: None <tech-kern@netbsd.org, tech-security@netbsd.org>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: tech-kern
Date: 03/25/2006 12:37:07

I think much of the confusion in discussing Elad's proposed change has
come from conflating a particular abstract state of the system
(the "security level") with the name of the mechanism used to check
that state (the kernel "securelevel" global variable).  We should try
to disambiguate this so that we don't lose any important functionality
as Elad integrates his new code.

Here, as I understand it, is what the "system security level" is, and
always has been: essentially what Jonathan says, one of a set of states
of the system which are ordered by monotonically decreasing access to
privileged functionality in the kernel.

Initially (and I know this because of conversations with the authors of
the original code; but it also "just makes sense") the idea was to have
only two states: "traditional Unix permission model" (securelevel = 0)
and "all operations allowing persistent system state to be written
prohibited" (securelevel = 1).  The core idea here is to allow the
system -- even if known-compromised -- to be rebooted into a state that
is known-safe (it may have whatever bugs allowed the initial compromise;
but its key executables and data, the "TCB", will be as they were at
first deployment).

That's *all* the "security level" was originally intended for.  Let's
keep that in mind.  As Kirk said to me years ago, the idea was to
provide a simple, even provably-correct, means of dramatically limiting
the extent of any system compromise, by prohibiting all and only the
operations that would allow an attacker with root privileges to write
the disk at will.

The rest got bolted on later.  I'll address that in a couple of
paragraphs.

Of course, one wants some way to update or modify the system beyond
"total overwrite of boot media".  To make this possible, a mechanism
clearly based on the idea of "run levels" from System V was implemented
in the kernel and in init.  The system boots into "security level 0",
"traditional Unix permission model".  When the first shell spawned by
init exits, init yanks the security level to 1, "all operations allowing
persistent state to be overwritten prohibited".  When we drop from
so-called "multi-user mode" to "single-user mode" -- in other words,
when it is guaranteed that init is the *only* process running on the
system -- we return to "security level 0".
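
The transitions just described can be sketched as a tiny state machine.
This is only an illustration under stated assumptions: the enum and
function names are hypothetical, and it is not the actual init or kernel
code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; not taken from init(8) or the kernel. */
enum sec_level { SEC_PERMISSIVE = 0, SEC_LOCKED = 1 };

enum sys_event {
    EV_BOOT,               /* kernel hands control to init */
    EV_FIRST_SHELL_EXIT,   /* the first shell spawned by init exits */
    EV_ENTER_SINGLE_USER   /* init is again the only process running */
};

/* Return the security level init would select after an event. */
enum sec_level
next_sec_level(enum sec_level cur, enum sys_event ev)
{
    assert(cur == SEC_PERMISSIVE || cur == SEC_LOCKED);

    switch (ev) {
    case EV_BOOT:
        return SEC_PERMISSIVE;   /* traditional Unix permission model */
    case EV_FIRST_SHELL_EXIT:
        return SEC_LOCKED;       /* going multi-user: lock things down */
    case EV_ENTER_SINGLE_USER:
        return SEC_PERMISSIVE;   /* only init is left, so this is safe */
    }
    return cur;                  /* unknown event: keep the current level */
}

/* Persistent state may be written only at level 0. */
bool
may_write_persistent_state(enum sec_level lvl)
{
    return lvl == SEC_PERMISSIVE;
}
```

The point of the sketch is that the level is a function of the run
state alone: nothing a process does in multi-user mode appears as an
event that returns the system to level 0.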

What this achieves is very simple: it guarantees that the persistent
state of the system can only be changed _if one has access to the
communication stream provided by init in "single-user mode"_, since
init is the only process running on the system.  That stream, of
course, is the system console.  Essentially this allows one to require
physical access to the machine in order to modify its software state.
(There are additional complications resulting from networked consoles,
but for any such configuration the basic invariant "requires access to
init's communication stream" constrains the potential sources of
compromise).

We should note that this whole model turns on the "security level"
being yanked up _and down_ as the system transitions between what,
for lack of a better term, I will call "run levels": states in which
we know certain upper bounds on the set of code that may be running
when we enter the state.  If you don't have those bounds, it is not
safe to transition to a more-permissive "security level" state,
and pointless to transition to a less-permissive one.

Over time, bugs in the initial security level implementation were
found.  These were essentially mistakes *in the specification of
level 1*: operations allowing persistent compromise had been
overlooked.  Unfortunately, when we found those, we did not fix
them in a consistent way.  Instead, we sometimes took advantage of
the implementation quirk that anyone could _raise_ the value of
the "securelevel" variable (though only init could lower it) and
created a new state, "security level 2", which was something like
"really we mean it, golly we're being honest now, don't let anything
be done that could write the disks."
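
The quirk in question, that anyone may raise the value but only init
may lower it, fits in a few lines.  This is a hypothetical model of the
rule (struct and function names are invented for illustration), not the
kernel's actual handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the level; the real kernel uses a global. */
struct sec_state {
    int level;
};

/*
 * Apply a requested level change.  Raising (or keeping) the level is
 * always allowed; lowering it is allowed only when the caller is init.
 * Returns true if the change was applied.
 */
bool
sec_set_level(struct sec_state *s, int new_level, bool caller_is_init)
{
    assert(s != NULL);

    if (new_level < s->level && !caller_is_init)
        return false;            /* refuse: only init may lower */

    s->level = new_level;
    return true;
}
```

Because nothing bounds the value from above, any caller can move the
system from 1 to 2, which is what made it possible to bolt on a new
level without touching init.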

Worse, now that we had 'security level 2', we piled on all kinds of
other random prohibitions that were useful things to prohibit in
various running systems which required serious hardening, but which
had nothing to do with the basic goal of the framework, which was
to make it possible to return the system to a known state by
dropping it to security level 0 (or rebooting it, and not letting
it proceed to level 1).

Of course, what we _ought_ to have done was fix level 1 -- the claim
was that if we prohibited everything we needed to, users would feel
too much pain -- and define some better mechanism, like kauth, for
the kind of prohibitions we wanted for "level 2".  Basically, I picked
the wrong fight to fight and backed down to the wrong compromise
position (peanut gallery: this is a habit of mine.  sigh.).

Now, what does this imply about what we should do _now_, keeping in
mind the basic goal of the old framework: to provide two system
states, one in which even root can't arbitrarily overwrite the TCB,
and one in which root _can_, and known sets of code that can be
executing at the transition into each state?

I think it implies this:

1) We should factor out exactly what operations may allow persistent
   compromise, and produce a kauth mask that prohibits them.  This
   should correspond to the old "security level 1", plus what should
   have gone into level 1, but went into level 2 because I was a dumbass.

2) We should replace this confusion of run-level and security-state
   in init with an ordered set of "run levels" implemented
   by init and the kernel cooperatively, so that if we're in "run level
   0", we know that everything's been killed off and init has started
   with a fresh slate.  Note that this would allow implementing intermediate
   or higher "run levels".  That's important.  See point 3.

3) We should make the kernel, at the transition into each "run level", set
   a specific permission mask that is the maximal set of permissions any
   process may have at that run level.  The permission mask for any run
   level should be settable only at a lower run level.

4) Since only init can lower the run level, it follows that only init can
   cause the permission mask for a lower run level to be loaded.  With
   correctly constructed masks (two sets of which we should supply with
   the OS distribution: one that is _exactly_ the old 0, 1, 2 "securelevel"
   sets of operations, and one that is a _correct_ implementation of the
   old "no persistent compromise" goal using only levels 0 and 1) this, it
   seems obvious to me, gives an exact implementation of the old policy,
   including its goals of (relatively) easy analysis and known scope and
   source of compromise.
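
Points 2 through 4 can be sketched roughly as follows.  Everything here
is an assumption made for illustration: the names, the three run
levels, the operation bits, and the use of a plain 32-bit mask in place
of whatever kauth would really provide.  The sketch also assumes that
anyone may raise the run level, as with the old securelevel; the
proposal above only says who may lower it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RL_COUNT 3   /* hypothetical number of run levels */

/* Hypothetical operation bits; a real kauth mask would be far richer. */
#define OP_WRITE_RAW_DISK   (UINT32_C(1) << 0)
#define OP_LOAD_MODULE      (UINT32_C(1) << 1)
#define OP_CHANGE_FW_RULES  (UINT32_C(1) << 2)

struct rl_state {
    int      cur;              /* current run level */
    uint32_t mask[RL_COUNT];   /* permission ceiling configured per level */
    uint32_t active;           /* ceiling loaded at the last transition */
};

/*
 * A level's mask may be changed only while running at a lower level.
 * Level 0 has no lower level, so its mask is fixed at boot.
 */
bool
rl_set_mask(struct rl_state *s, int level, uint32_t mask)
{
    assert(s != NULL);
    if (level < 0 || level >= RL_COUNT || s->cur >= level)
        return false;
    s->mask[level] = mask;
    return true;
}

/* Only init may lower the run level; entering a level loads its mask. */
bool
rl_transition(struct rl_state *s, int level, bool caller_is_init)
{
    assert(s != NULL);
    if (level < 0 || level >= RL_COUNT)
        return false;
    if (level < s->cur && !caller_is_init)
        return false;
    s->cur = level;
    s->active = s->mask[level];
    return true;
}

/* An operation is permitted only if the active ceiling includes it. */
bool
rl_permits(const struct rl_state *s, uint32_t op)
{
    assert(s != NULL);
    return (s->active & op) == op;
}
```

In this model a process running at level 1 cannot widen its own
ceiling: rl_set_mask() refuses to touch the mask for level 1 or below,
and rl_transition() refuses to go down unless the caller is init.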

This also allows, using intermediate run levels, tricks like allowing
certain operations only in a defined state in which only the "network
filter update daemon" is running; and so forth.  That's all just the
gravy you get from cooking the basic meat of the design; but I think,
actually, it's fairly tasty gravy...

-- 
  Thor Lancelot Simon	                                     tls@rek.tjls.com

  "We cannot usually in social life pursue a single value or a single moral
   aim, untroubled by the need to compromise with others."      - H.L.A. Hart