Subject: Re: Intercepting system calls
To: None <tech-kern@netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 04/22/2002 23:59:22
>> I've still got the patches [for PT_SYSCALL].
> Can you describe *precisely* what PT_SYSCALL does?

Yes.  Of course, a 100% precise description would amount to quoting the
code.  I'll try to steer a middle course between that and being too
vague.  If I leave out something important, of course, I'll be happy to
elaborate.

PT_SYSCALL is just like PT_CONTINUE except that the process will stop
the next time it enters or exits a syscall.  There is a structure
(which is inherently machine-dependent to at least a slight degree)
describing the syscall, and PT_ requests to read and write it.  The
structure includes an indication whether the stop is at entry or exit;
on entry, the structure includes the syscall number and arguments,
whereas on exit, it includes the error code and rval[] values.  When a
process is stopped at entry or exit, its memory space may be examined
with PT_READ_* and changed with PT_WRITE_*, same as when it's stopped
for any other reason; these allow inspecting and modifying syscall
arguments or returns that are actually in memory (such as ioctl
argument structures, or I/O data buffers).  Additionally, the interface
structure can be written, thereby changing the syscall number and/or
arguments on entry, or return value and error on exit.  Also, if an
"exit" structure is written while the process is stopped at syscall
entry, continuing does not actually perform the syscall; it simply
passes back the values set in the structure.  (At exit, you
cannot write an "entry" structure and get another syscall performed,
though that would be a conceptually sensible thing to support, and may
even be reasonable and/or useful.)
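
To make the rules above concrete, here is a minimal C sketch of what
such an interface structure and the "write the structure" request might
look like.  Every name in it (sc_info, sc_when, sc_write,
sc_call_will_run) is made up for illustration and is not taken from the
patch, and returning EINVAL for an "entry" structure written at exit is
an assumption about how the refusal is reported.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; not the struct from the actual patch. */
enum sc_when { SC_ENTRY, SC_EXIT };

struct sc_info {
	enum sc_when when;		/* which kind of stop this describes */
	union {
		struct {		/* valid when when == SC_ENTRY */
			int code;	/* syscall number */
			int nargs;
			intptr_t args[8]; /* MD: count and width vary */
		} call;
		struct {		/* valid when when == SC_EXIT */
			int error;	/* 0, or an errno value */
			intptr_t rval[2];
		} ret;
	} u;
};

/*
 * The "write the structure" request, applied to a stopped process.
 * Returns 0 on success or an errno value.
 */
int
sc_write(struct sc_info *cur, const struct sc_info *want)
{
	assert(cur != NULL && want != NULL);
	if (cur->when == SC_EXIT && want->when == SC_ENTRY)
		return EINVAL;	/* assumption: refused with an error */
	*cur = *want;		/* entry->entry, entry->exit, exit->exit */
	return 0;
}

/* After a write at entry: will continuing really perform a syscall? */
int
sc_call_will_run(const struct sc_info *cur)
{
	assert(cur != NULL);
	return cur->when == SC_ENTRY;
}
```

With this shape, a tracer that wants to deny a call would, at the entry
stop, write a structure with when = SC_EXIT and u.ret.error = EPERM and
then continue the process.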

That's the basics.  I can of course describe exactly what the struct
tags and member names and such in my implementation are, but it seems
to me that would be unnecessary fluff here, especially since (as I
mention below) they need some work.

It does require some MD hooks; the exact interface structure is at
least slightly machine-dependent, and it requires code in the syscall
trap handler (replacing (*callp->sy_call)(...) with about twenty lines
of code if the process is marked as trace-on-syscall).
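
As a rough userland model of that hook (not the patch itself), the
dispatch might be sketched as below.  The names sysent_sketch,
proc_sketch, sc_state and syscall_dispatch are hypothetical, the stop
callback merely stands in for stopping the process and letting the
tracer read and write the structure, and the sketch assumes that a
call faked at entry does not get a separate exit stop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SC_MAXARGS 4

struct sysent_sketch {
	int (*sy_call)(const long *args, long *rval);
};

enum sc_when { SC_ENTRY, SC_EXIT };

struct sc_state {
	enum sc_when when;
	int code;
	long args[SC_MAXARGS];
	int error;
	long rval[2];
};

struct proc_sketch {
	int trace_on_syscall;
	/* Stands in for stopping and letting the tracer read/write sc. */
	void (*stop)(struct sc_state *sc);
};

static int
do_call(const struct sysent_sketch *tab, int ntab, int code,
    const long *args, long *rval)
{
	if (code < 0 || code >= ntab || tab[code].sy_call == NULL)
		return ENOSYS;
	return (*tab[code].sy_call)(args, rval);
}

int
syscall_dispatch(struct proc_sketch *p, const struct sysent_sketch *tab,
    int ntab, int code, const long *args, long *rval)
{
	struct sc_state sc = { .when = SC_ENTRY, .code = code };
	int i;

	assert(p != NULL && tab != NULL && args != NULL && rval != NULL);
	if (!p->trace_on_syscall)	/* the ordinary, untraced path */
		return do_call(tab, ntab, code, args, rval);

	for (i = 0; i < SC_MAXARGS; i++)
		sc.args[i] = args[i];
	p->stop(&sc);			/* entry stop */
	if (sc.when == SC_ENTRY) {	/* tracer kept it a call */
		sc.error = do_call(tab, ntab, sc.code, sc.args, sc.rval);
		sc.when = SC_EXIT;
		p->stop(&sc);		/* exit stop */
	}
	/* Otherwise the tracer wrote an "exit" structure: call skipped. */
	rval[0] = sc.rval[0];
	rval[1] = sc.rval[1];
	return sc.error;
}

/* Demo pieces: one toy syscall and one toy tracer. */
int demo_calls;

int
demo_add(const long *args, long *rval)
{
	demo_calls++;
	rval[0] = args[0] + args[1];
	return 0;
}

const struct sysent_sketch demo_table[] = { { demo_add } };

void
demo_fake_result(struct sc_state *sc)
{
	if (sc->when == SC_ENTRY) {
		sc->when = SC_EXIT;	/* skip the call, return 42 */
		sc->error = 0;
		sc->rval[0] = 42;
	}
}
```

Note that the syscall number is range-checked after the entry stop,
since the tracer may have rewritten it.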

The interface probably could use some work.  In fact, it certainly
could, since I have barely touched it since it was rejected.  One thing
I know it needs is a way to tell for sure whether the process stopped
because of a syscall or for some other reason; another is that it needs
some indication of not just syscall number but the emulation in use,
since syscall numbers are meaningless except in the context of an
emulation package.
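
One possible shape for those two additions, purely as a sketch: a stop
report that says why the process stopped and carries the emulation
along with the syscall number.  The names (stop_report, emul_sketch,
stop_sysname) are hypothetical, and the two syscall name tables are
made up solely to show the same number meaning different calls.

```c
#include <assert.h>
#include <stddef.h>

struct emul_sketch {
	const char *e_name;
	const char *const *e_sysnames;
	int e_nsys;
};

/* Made-up tables: the same number names different calls. */
static const char *const emul_a_names[] = { "syscall", "exit", "read" };
static const char *const emul_b_names[] = { "syscall", "read", "exit" };
const struct emul_sketch emul_a = { "emul_a", emul_a_names, 3 };
const struct emul_sketch emul_b = { "emul_b", emul_b_names, 3 };

enum stop_why { STOP_SIGNAL, STOP_SYSCALL };

struct stop_report {
	enum stop_why why;		/* syscall stop, or something else */
	const struct emul_sketch *emul;	/* context for code */
	int code;			/* syscall number */
};

/* Name of the syscall, or NULL if this was not a syscall stop. */
const char *
stop_sysname(const struct stop_report *r)
{
	assert(r != NULL);
	if (r->why != STOP_SYSCALL || r->emul == NULL)
		return NULL;
	if (r->code < 0 || r->code >= r->emul->e_nsys)
		return "?";
	return r->emul->e_sysnames[r->code];
}
```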

It is completely orthogonal to ktrace.  In my implementation, the
KTRPOINTs are before and after the code that handles syscall tracing,
meaning that KTR_SYSCALL and KTR_SYSRET records reflect the call the
process tried to make and the results it actually saw, rather than the
call (if any) actually performed.  This could of course be changed.
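
The ordering can be modelled in a few lines of C.  This is a toy, not
kernel code: call_sketch, ret_sketch, ktr_log_sketch and traced_syscall
are hypothetical names, the two log fields merely stand in for the
KTR_SYSCALL and KTR_SYSRET records, and the "kernel" knows only two
made-up calls.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct call_sketch { int code; long arg; };
struct ret_sketch { int error; long rval; };

struct ktr_log_sketch {
	struct call_sketch syscall_rec;	/* stands in for KTR_SYSCALL */
	struct ret_sketch sysret_rec;	/* stands in for KTR_SYSRET */
};

typedef void (*entry_hook)(struct call_sketch *);
typedef void (*exit_hook)(struct ret_sketch *);

/* Toy "kernel": call 1 increments, call 2 doubles. */
static struct ret_sketch
perform(struct call_sketch c)
{
	struct ret_sketch r = { 0, 0 };

	switch (c.code) {
	case 1: r.rval = c.arg + 1; break;
	case 2: r.rval = c.arg * 2; break;
	default: r.error = ENOSYS; break;
	}
	return r;
}

struct ret_sketch
traced_syscall(struct call_sketch c, entry_hook at_entry,
    exit_hook at_exit, struct ktr_log_sketch *log)
{
	struct ret_sketch r;

	assert(log != NULL);
	log->syscall_rec = c;	/* logged before tracing: what was asked */
	if (at_entry != NULL)
		at_entry(&c);	/* tracer may rewrite the call */
	r = perform(c);		/* the call actually performed */
	if (at_exit != NULL)
		at_exit(&r);	/* tracer may rewrite the result */
	log->sysret_rec = r;	/* logged after tracing: what is seen */
	return r;
}

/* Demo tracer: turn whatever was asked for into call 2. */
void
demo_redirect(struct call_sketch *c)
{
	c->code = 2;
}
```

With demo_redirect as the entry hook, a request for call 1 with
argument 10 is logged as call 1, yet the logged and returned value is
20, the result of call 2.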

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B