tech-kern: Re: ptrace() vs. SIGKILL?

Subject: Re: ptrace() vs. SIGKILL?
To: None <tech-security@netbsd.org, tech-kern@netbsd.org>
From: Greg A. Woods <woods@weird.com>
List: tech-kern
Date: 12/07/2002 20:58:55
[ On Saturday, December 7, 2002 at 23:42:42 (+0100), der Mouse wrote: ]
> Subject: Re: ptrace() vs. SIGKILL?
>
> > However I mean very literally "if ptrace() can _prevent_ SIGKILL from
> > ultimately taking effect" then it's a security bug waiting to happen.
> 
> I repeat: Greg, meet Reality.

Wait a minute here.  I'm not talking about "reality", as in existing
implementations, I'm talking about an ideal world where ptrace() is safe
to exist even on security-sensitive machines.

> Try ftp.netbsd.org:/pub/NetBSD/misc/mouse/sigkill.c on your favorite
> system.

Hmmm!  Thanks for that!

Not my favourite system by a long shot, but for example (after making
the necessary portability adjustments):

$ uname -srm
SunOS 5.6 sun4m
$ ./tsigkill-vs-ptrace
p1 started, pid 20058
p1 tick
p2 started, pid 20059
p3 started, pid 20060
p2 attaching to 20058
p2 PTRACE_ATTACH failed: No such process
protocol error, unexpected EOF from p2 (20059)


> Here's what I get:
> 
> [Red] 135> sigkill
> p1 started, pid 21811
> p2 started, pid 21812
> p3 started, pid 21813
> p1 tick
> p2 attaching to 21811
> p2 waiting for 21811 to stop
> p2 stop wait shows signal 17
> p2 telling 21811 to continue
> p1 tick
> p3 SIGKILLing 21811
> p2 waiting for 21811 to stop
> p2 stop wait shows signal 9
> p3 sent SIGKILL
> p2 telling 21811 to continue, signal 0
> p1 tick
> p2 waiting for 21811 to stop
> p1 tick
> p1 tick

Looks like the bug I feared is real all right.  P1 should have died when
p2 sent it the PT_CONTINUE, and then stayed dead!
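
For contrast, here is a minimal C sketch of the untraced baseline, i.e.
the outcome argued for above: the target is terminated by the signal
and the parent's wait reports exactly that.  The function name
sigkill_untraced_child() is hypothetical, and no tracer is involved, so
this shows only the expected result, not the bug itself.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>

/*
 * Fork a child that just sleeps, send it SIGKILL, and return the signal
 * that terminated it as seen by waitpid(), or -1 on any failure.
 * (Hypothetical helper for illustration; not part of sigkill.c.)
 */
int
sigkill_untraced_child(void)
{
    pid_t pid;
    int status;

    pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: wait for signals forever; only SIGKILL will end it. */
        for (;;)
            pause();
    }

    /* Parent: kill the child, then reap it and inspect how it ended. */
    if (kill(pid, SIGKILL) == -1)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

With no tracer attached this should return SIGKILL; the complaint above
is that attaching a tracer can change that outcome.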

Traditionally ptrace() didn't work for non-parent processes, nor for
processes which were not started with the intention of being traced,
which is why there is a PT_TRACE_ME request in the first place.  Now we
have the ability to ptrace() unrelated processes using PT_ATTACH, and
that seems to be where this bug must have crept in.  On any older
system (e.g. any based on 4.3net2 or older, or any based on any AT&T
UNIX) I would hope your program would fail to restart p1: you don't use
PT_TRACE_ME and there is no PT_ATTACH, so all signals would be delivered
to p1 normally, without it first stopping and notifying the parent
process.
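
The traditional model described above can be sketched roughly as
follows in C.  This is only an illustration under assumptions: the
function name traced_child_stop_signal() is hypothetical, and it
assumes a system whose <sys/ptrace.h> provides the PT_TRACE_ME name
(the BSDs do; Linux C libraries generally offer it as an alias for
PTRACE_TRACEME).  The child volunteers to be traced, and from then on a
signal stops it and is reported to the parent through wait.

```c
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>

/*
 * Fork a child that asks to be traced with PT_TRACE_ME and then sends
 * itself SIGSTOP.  Returns the signal the parent sees the child stop
 * with, or -1 on failure (including ptrace() being refused).
 * (Hypothetical helper for illustration only.)
 */
int
traced_child_stop_signal(void)
{
    pid_t pid;
    int status, sig;

    pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: volunteer to be traced by the parent. */
        if (ptrace(PT_TRACE_ME, 0, 0, 0) == -1)
            _exit(1);
        raise(SIGSTOP);     /* stops the child; the parent is notified */
        _exit(0);
    }

    /* Parent: a traced child's stop is reported without WUNTRACED. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFSTOPPED(status))
        return -1;          /* child exited and was reaped already */
    sig = WSTOPSIG(status);

    /* Clean up: kill the stopped child and reap it. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return sig;
}
```

Without the PT_TRACE_ME call the parent has no say over signals sent to
the child, which is why the pre-PT_ATTACH systems mentioned above
should not be open to this problem.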

This apparent PT_ATTACH security bug is indeed exactly the same kind of
security bug that was introduced by the tracing facilities provided by
the SysVr4 /proc filesystem -- facilities that were introduced blindly
without taking into full consideration all the other potentially adverse
interactions with other kernel facilities and features.  (Ptrace() was
supposed to be reimplemented on top of /proc in 4.4BSD too, according to
McKusick et al., but apparently they never got around to it and so now
we're still stuck with this somewhat broken, slow, and limited
SunOS-4.x-like implementation.)

Unless I'm missing something, PT_ATTACH and the buggy SIGKILL behaviour
allow a set of co-operating rogues to keep each other going, just so
long as at least one of them gets a decent timeslice after any of the
others it's watching is SIGKILLed, and can then get in at least one
ptrace() call to restart that one before they have all been SIGKILLed
(and around they'd go, restarting each other as fast as you could try to
kill them).  Maybe if the SIGKILLs are all delivered in exactly the
right order then such a set could be killed, but I suspect that if the
system's busy doing anything else at all then the likelihood of success
would be rather low.  I guess the important question though is whether a
circle of co-operating processes like that could manage to do anything
else at the same time and still have enough response time to restart one
of its siblings -- maybe it would be easier on a system with
kernel-supported user-level threads (one thread waits and ptraces, the
other does whatever nasty business the program is designed to do).  Note
though that IEEE Std 1003.1-2001 says:

   When a signal is delivered to a thread, if the action of that signal
   specifies termination, stop, or continue, the entire process shall be
   terminated, stopped, or continued, respectively.

Regardless, I think this needs to be fixed.  Processes which are sent
SIGKILL must die before ever executing even one more instruction in
user-land, whether they get to sit motionless in purgatory for a while
at the graces of a debugger or not.  IEEE Std 1003.1-2001 says, for
instance:

   The system shall not allow a process to catch the signals SIGKILL and
   SIGSTOP.

This, along with the additional restriction that prevents use of SIG_IGN
for SIGKILL, suggests that even with the assistance of another process
it should not be possible to avoid the required action for SIGKILL,
which is to terminate the process.


> I would love to hear about any ptrace-supporting OS which doesn't work
> this way.

It wouldn't be the first cross-platform ptrace() bug.....

Perhaps ptrace() should have been included in POSIX despite being called
"obsolete" since long before POSIX got started -- maybe then it would
have received a bit more scrutiny, and more of its side effects and
interactions with other facilities such as signals would have been more
safely specified.  (We now have the POSIX Tracing option in 1003.1-2001,
but it doesn't seem quite analogous to ptrace(); it looks more like
KTRACE.)

> > The [ptrace(2)] manual warns that such child processes [which have
> > called PTRACE_TRACE_ME but have not been traced by their parents]
> > cannot be made to continue without using ptrace(), which if true
> > would make it much harder to clean up after such an attack.
> 
> It's sort-of true.  If their parent is killed, they are nuked at the
> same time as they are reparented to init.  UTSL again, this time
> exit1(), in kern_exit.c:

Ah, well, it's not quite so bad as I feared then.  Still, it should be
better -- after all a rogue process could reparent itself to '1' before
it calls ptrace(PT_TRACE_ME), and IIRC that means it can never be
reparented again (at least not without crashing the system :-).

-- 
								Greg A. Woods

+1 416 218-0098;            <g.a.woods@ieee.org>;           <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>