Subject: Re: ptrace(2) & PT_SYSCALL does not stop before executing syscall ?
To: Christos Zoulas <christos@astron.com>
From: Andrew Doran <ad@netbsd.org>
List: current-users
Date: 04/14/2007 13:02:30
On Fri, Apr 13, 2007 at 08:05:54PM +0000, Christos Zoulas wrote:

> In article <20070413161219.GA262083@medusa.sis.pasteur.fr>,
> Nicolas Joly  <njoly@pasteur.fr> wrote:
> >
> >Hi,
> >
> >I just noticed that tracing syscalls with ptrace(2) & PT_SYSCALL does
> >not seem to work as expected ... The debugged process seems to be
> >stopped only after executing a syscall, but not before.
> >
> >I made the attached code to illustrate that problem (seen on -current
> >i386 and amd64). The same program, on FreeBSD/i386 6.1, shows 2 ptrace
> >calls for each syscall as I expected.
> >
> >njoly@hal [~]> uname -a
> >NetBSD hal.sis.pasteur.fr 4.99.17 NetBSD 4.99.17 (HAL) #2: Wed Apr 11 15:01:57 CEST 2007 njoly@hal.sis.pasteur.fr:/local/src/NetBSD/obj/i386/sys/arch/i386/compile/HAL i386
> >
> >njoly@hal [~]> ./ptrace >/dev/null
> >syscall 0x0 (0).
> >syscall 0xbbbea000 (-1145135104).
> >syscall 0x3 (3).
> >syscall 0x0 (0).
> >syscall 0xbbbe9000 (-1145139200).
> >syscall 0x0 (0).
> >syscall 0x0 (0).
> >syscall 0x2 (2).
> >syscall 0x2 (2).
> >syscall 0x2 (2).
> >[...]
> >njoly@hal [~]> ktrace -di /bin/echo foo
> >foo
> >njoly@hal [~]> kdump | grep RET
> >  7677      1 echo     RET   execve JUSTRETURN
> >  7677      1 echo     RET   mmap -1145135104/0xbbbea000
> >  7677      1 echo     RET   open 3
> >  7677      1 echo     RET   __fstat30 0
> >  7677      1 echo     RET   mmap -1145139200/0xbbbe9000
> >  7677      1 echo     RET   close 0
> >  7677      1 echo     RET   munmap 0
> >  7677      1 echo     RET   open -1 errno 2 No such file or directory
> >  7677      1 echo     RET   open -1 errno 2 No such file or directory
> >  7677      1 echo     RET   open -1 errno 2 No such file or directory
> >[...]
> >
> >Am I missing something?
> >Thanks.
> 
> No, it seems to have broken after Andy's changes. The calls to stop the
> process are still there... Either the flags are not set properly,
> or the signal is not transmitted.

The issue here is that stopping is now always deferred until the LWP sleeps
interruptibly or returns to userspace. That's so that any locks held over a
sleep can be released by the LWP before it comes to a halt.

I think the solution is to add a proc_stop_now() that checks for a request
to stop from the debugger, and makes it happen immediately. That would be
called from process_stoptrace() in place of the call to mi_switch() that's
there now. I'll see about changing it to do that.

Cheers,
Andrew