5.1 vs gdb

To: tech-kern%netbsd.org@localhost
Subject: 5.1 vs gdb
From: Mouse <mouse%Rodents-Montreal.ORG@localhost>
Date: Sun, 14 Oct 2012 19:18:27 -0400 (EDT)
I've run into an issue with gdb on 5.1, and ktrace leads me to think
it's likely a kernel issue (hence this list).  It wouldn't surprise me
too much if I were wrong, though; feel free to point me elsewhere if
appropriate.

The surface manifestation is straightforward:

% cat gdbtest.c
int main(void);
int main(void)
{
 return(0);
}
% cc -o gdbtest gdbtest.c -g
% gdb gdbtest
GNU gdb 6.5
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386--netbsdelf"...
(gdb) run
Starting program: /home/mouse/gdbtest 

at which point nothing I've tried will wake it up, except for
SIGKILLing gdb from another shell, which produces a "sorry, pid %d was
killed: orphaned traced process" message from the kernel and a "Killed"
from my shell, neither of which is surprising.

ps shows the gdb process, a copy of my shell, and a dead zombie, as in

10467 ttyp7 ZW+  0:00.00 (linktarget)
10702 ttyp7 I    0:00.01 gdb gdbtest 
24466 ttyp7 IX+  0:00.00 -local/bin/mcsh -c exec /home/mouse/gdbtest  

My shell does run linktarget as part of its startup script, so its
presence is not that surprising; its presence as a zombie for more than
the barest moment is what's surprising.  Running gdb under ktrace -i
makes me think the SIGCHLD the shell is waiting for is getting lost:

 25022      1 linktarget CALL  write(1,0xbb902000,7)
 25022      1 linktarget GIO   fd 1 wrote 7 bytes
       "/local\n"
 25022      1 linktarget RET   write 7
 25022      1 linktarget CALL  exit(0)
 24312      1 mcsh     GIO   fd 4 read 7 bytes
       "/local\n"
 24312      1 mcsh     RET   read 7
 24312      1 mcsh     CALL  read(4,0xbfbf1c90,0x4000)
 24312      1 mcsh     GIO   fd 4 read 0 bytes
       ""
 24312      1 mcsh     RET   read 0
 24312      1 mcsh     CALL  close(4)
 24312      1 mcsh     RET   close 0
 24312      1 mcsh     CALL  __sigprocmask14(1,0xbfbf1c50,0xbfbf1c40)
 24312      1 mcsh     RET   __sigprocmask14 0
 24312      1 mcsh     CALL  __sigprocmask14(3,0xbfbf1c40,0)
 24312      1 mcsh     RET   __sigprocmask14 0
 24312      1 mcsh     CALL  __sigprocmask14(1,0xbfbf1bf4,0xbfbf1be4)
 24312      1 mcsh     RET   __sigprocmask14 0
 24312      1 mcsh     CALL  __sigprocmask14(1,0xbfbf1bf4,0)
 24312      1 mcsh     RET   __sigprocmask14 0
 24312      1 mcsh     CALL  __sigsuspend14(0xbfbf1bf4)
 10674      1 gdb      RET   wait4 24312/0x5ef8
 10674      1 gdb      CALL  ptrace(PT_GETREGS,0x5ef8,0xbfbfe19c,0)
 10674      1 gdb      RET   ptrace 0
 10674      1 gdb      CALL  ptrace(PT_CONTINUE,0x5ef8,1,0x14)
 10674      1 gdb      RET   ptrace 0
 24312      1 mcsh     RET   __sigsuspend14 -1 errno 4 Interrupted  system call
 24312      1 mcsh     CALL  __sigprocmask14(1,0xbfbf1bf4,0)
 24312      1 mcsh     RET   __sigprocmask14 0
 24312      1 mcsh     CALL  __sigsuspend14(0xbfbf1bf4)
 10674      1 gdb      CALL  wait4(0xffffffff,0xbfbfe408,0,0)

(I SIGKILL gdb at this point)

 10674      1 gdb      RET   wait4 RESTART
 10674      1 gdb      PSIG  SIGKILL SIG_DFL: code=SI_USER sent by pid=14918, uid=101)
 24312      1 mcsh     RET   __sigsuspend14 -1 errno 4 Interrupted system call
 24312      1 mcsh     PSIG  SIGKILL SIG_DFL: code=SI_NOINFO

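The PT_CONTINUE call (whose last argument, 0x14, is signal 20, SIGCHLD)
does make it look as though gdb is doing the right thing here, but the
signal never gets delivered: mcsh's sigsuspend returns EINTR with no
PSIG SIGCHLD record following it, and the shell just goes back to
waiting.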
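
To make the numbers in that ptrace line easier to read, here is a small
C sketch that decodes them.  It assumes the NetBSD conventions
documented in ptrace(2): an addr of 1 means "continue where it stopped"
and data is the signal to deliver on resume, 0 for none.  The signal
numbers are hard-coded NetBSD values so the result is the same on any
host; the names nbsd_signame and describe_pt_continue are made up for
this illustration and are not part of gdb or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* NetBSD signal numbers, hard-coded on purpose (assumption: i386 NetBSD,
 * where SIGCHLD is 20; other systems number it differently). */
enum { NBSD_SIGINT = 2, NBSD_SIGKILL = 9, NBSD_SIGCHLD = 20 };

/* Hypothetical helper: name the few signals that appear in the traces. */
const char *nbsd_signame(int sig)
{
    switch (sig) {
    case 0:            return "no signal";
    case NBSD_SIGINT:  return "SIGINT";
    case NBSD_SIGKILL: return "SIGKILL";
    case NBSD_SIGCHLD: return "SIGCHLD";
    default:           return "some other signal";
    }
}

/* Hypothetical helper: render the three PT_CONTINUE arguments as text.
 * Returns what snprintf returns (characters needed, or negative on error). */
int describe_pt_continue(unsigned long pid, unsigned long addr,
                         unsigned long data, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "resume pid %lu %s, delivering %s",
                    pid,
                    addr == 1 ? "where it stopped" : "at a new address",
                    nbsd_signame((int)data));
}
```

Fed the values from the trace (0x5ef8, 1, 0x14), this describes
resuming pid 24312, the mcsh process, where it stopped, with SIGCHLD
to be delivered.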

Running that mcsh -c exec command under control of ktrace _without_ gdb
being involved produces

 25339      1 linktarget CALL  write(1,0xbb902000,7)
 25339      1 linktarget GIO   fd 1 wrote 7 bytes
       "/local\n"
 25339      1 linktarget RET   write 7
 25339      1 linktarget CALL  exit(0)
 25061      1 mcsh     GIO   fd 4 read 7 bytes
       "/local\n"
 25061      1 mcsh     RET   read 7
 25061      1 mcsh     CALL  read(4,0xbfbf1cb0,0x4000)
 25061      1 mcsh     GIO   fd 4 read 0 bytes
       ""
 25061      1 mcsh     RET   read 0
 25061      1 mcsh     CALL  close(4)
 25061      1 mcsh     RET   close 0
 25061      1 mcsh     CALL  __sigprocmask14(1,0xbfbf1c70,0xbfbf1c60)
 25061      1 mcsh     RET   __sigprocmask14 0
 25061      1 mcsh     CALL  __sigprocmask14(3,0xbfbf1c60,0)
 25061      1 mcsh     RET   __sigprocmask14 0
 25061      1 mcsh     CALL  __sigprocmask14(1,0xbfbf1c14,0xbfbf1c04)
 25061      1 mcsh     RET   __sigprocmask14 0
 25061      1 mcsh     CALL  __sigprocmask14(1,0xbfbf1c14,0)
 25061      1 mcsh     RET   __sigprocmask14 0
 25061      1 mcsh     CALL  __sigsuspend14(0xbfbf1c14)
 25061      1 mcsh     RET   __sigsuspend14 -1 errno 4 Interrupted  system call
 25061      1 mcsh     PSIG  SIGCHLD caught handler=0x806a110  mask=(2,20): code=CLD_EXITED child pid=25339, uid=101,  status=0,  utime=0, stime=0)
 25061      1 mcsh     CALL  wait4(0xffffffff,0xbfbf1818,3,0xbfbf17d0)
 25061      1 mcsh     RET   wait4 25339/0x62fb

and everything carries on correctly.

So it looks to me as though something's busted somewhere around
PT_CONTINUE and signal delivery, at least in the case of SIGCHLD.

Any thoughts?

I have a workaround - "env SHELL=/bin/sh gdb ..." - that presumably
works because I have no startup script for /bin/sh, so it doesn't need
SIGCHLD to work.  (In passing, is there an equivalent setting from
within gdb?  I haven't found one, but gdb's documentation is remarkably
difficult to use.  The most I've found is a variable that says whether
to use a shell, not what shell to use.  I tried gdb's environment
setting but that didn't help.)
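
To spell out why the workaround plausibly works: judging by the ps
line above, gdb starts the program as "<shell> -c exec <program>", and
the shell appears to come from the SHELL environment variable.  A
minimal C sketch of that selection follows.  The fallback to /bin/sh
when SHELL is unset or empty is an assumption, and the function names
are hypothetical; this is not gdb's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Pick the shell used to start the debugged program: $SHELL if set and
 * non-empty, else /bin/sh (the fallback is an assumption). */
const char *startup_shell(void)
{
    const char *shell = getenv("SHELL");
    return (shell != NULL && shell[0] != '\0') ? shell : "/bin/sh";
}

/* Format the command line as it shows up in ps: "<shell> -c exec <prog>".
 * Returns 0 on success, -1 if buf is too small or formatting fails. */
int format_startup_command(const char *program, char *buf, size_t len)
{
    int n;

    assert(program != NULL && buf != NULL);
    n = snprintf(buf, len, "%s -c exec %s", startup_shell(), program);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

With SHELL=/bin/sh in the environment, the formatted command is
"/bin/sh -c exec /home/mouse/gdbtest", so no mcsh startup script (and
hence no linktarget child to wait for) is involved.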

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mouse%rodents-montreal.org@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B