tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: In-kernel process exit hooks?



On Fri, Jan 08, 2016 at 07:08:19AM +0000, David Holland wrote:
> On Thu, Jan 07, 2016 at 07:34:33AM +0800, Paul Goyette wrote:
>  > Based on internal implementation of filemon(4), there is an ordering
>  > requirement imposed on the sequence of events that occur when a process
>  > using /dev/filemon exits.  In particular, the file descriptor on which the
>  > device is open needs to be closed prior to closing the descriptor to which
>  > avent/activity records are being logged.  (Otherwise, because of an extra
>  > reference on the log file, fd_close() will hang indefinitely.)
> 
> Looking at the filemon code... it is completely wrong. It should not
> be using file handles like that, especially not in a device driver.
> Rather than adding new hacks to the system to work around it being
> silly, it should be redone properly.
> 
> For an example of the right way to do this kind of thing, look in
> kern_acct.c.
> 

I would like to add my $0,03 seeing that there is time invested into
what I consider a lost cause.

I have to agree that filemon is misdesigned and of low quality in
general.

While I don't know all the details, it is clear that the purpose would
be much better served by ktrace and I would argue efforts should be
spent there.

bmake takes filemon output and copies it into some file.

>From usability perspective what seems to be needed is few hacks to
ktrace to only log the stuff make is interested in and parsing code for
bmake to translate to what looks like current filemon format for
aformentioned copying.

I only had a brief look at ktrace implementation. It uses global locks,
so that would have to be worked on, but it should be trivial to at least
make tracees using different output files not compete with each other.

So, what's so fitting about ktrace design: tracing is tied to the target
process, can react to things like changing credentials and reparenting
(fork + parent exit), but most importantly it does not make assumptions
about things which can change without its control (see below).

Filemon monitors stuff at syscall level, stores the fd and remembers
pids.

Let's see how this leads to trouble. Some of these problems were also
present in FreeBSD implementation (and some still are). A bonus
interesting fact is that implementations have diverged, although are
still fundamentally broken.

http://nxr.netbsd.org/xref/src/sys/dev/filemon/filemon.c#filemon_ioctl
>        case FILEMON_SET_PID:
>                /* Set the monitored process ID - if allowed. */
>                mutex_enter(proc_lock);
>                tp = proc_find(*((pid_t *) data));

There are no steps taken to keep tp around nor in a certain state (see
below). proc_find only finds the pointer.

>                mutex_exit(proc_lock);
>                if (tp == NULL ||
>                    tp->p_emul != &emul_netbsd) {

tp could have exited and be freed by now. Now that the comparison was
done it could have p_emul changed.

>                        error = ESRCH;
>                        break;
>                }
>
>                error = kauth_authorize_process(curproc->p_cred,
>                    KAUTH_PROCESS_CANSEE, tp,
>                    KAUTH_ARG(KAUTH_REQ_PROCESS_CANSEE_ENTRY), NULL, NULL);

Now that the check is pased the process could have executed something
setuid.

I'm somewhat surprised this code is here. Should not
kauth_authorize_process assert that tp has stable credentials in some
way, in effect just crashing filemon consumers?

>                if (!error) {
>                        filemon->fm_pid = tp->p_pid;
>                }

Pid is stored, but tp is not held. What if the 'traced' process exits?
grep only reveals the following bit in filemon_wrapper_sys_exit:

>                if (filemon->fm_pid == curproc->p_pid)
>                        filemon_printf(filemon, "# Bye bye\n");

So filemon will continue tracing anything which happened to reuse the
pid.

"Tracing points" are at syscall layer. With only pid remembered this
adds up to serious breakage induced by the need to look the filemon structure
up each time, and based on unreliable data.

>static struct filemon *
>filemon_pid_check(struct proc * p)
>{       
>        struct filemon *filemon;
>        struct proc * lp;
>        
>        if (!TAILQ_EMPTY(&filemons_inuse)) {
>                while (p) {
>                        /*
>                         * make sure p cannot exit
>                         * until we have moved on to p_pptr
>                         */
>                        rw_enter(&p->p_reflock, RW_READER);
>                        TAILQ_FOREACH(filemon, &filemons_inuse, fm_link) {
>                                if (p->p_pid == filemon->fm_pid) {
>                                        rw_exit(&p->p_reflock);
>                                        return (filemon);
>                                }
>                        }

But p_pid should be constant for 'struct proc' lifetime, so locking this
early is wasteful.

>                        lp = p;
>                        p = p->p_pptr;
>                        rw_exit(&lp->p_reflock);

I guess the lock is needed to read the pointer to the parent.

Nothing was done to keep the parent around, which means it could have
exited and be freed by now. Which in turn makes the second loop
iteration a use-after-free.

The first iteration is safe if the argument is curproc (which it is), as
it clearly cannot disappear.

Turns out this traversal is even more wrong since p_pptr does not have
to be the process you are looking for - ptrace is reparenting tracees.

>                }
>        }
>        return (NULL);
>}

-- 
Mateusz Guzik <mjguzik gmail.com>



Home | Main Index | Thread Index | Old Index