Subject: Re: Cross-building
To: None <fvdl@netbsd.org>
From: Mark Kettenis <kettenis@chello.nl>
List: port-amd64
Date: 10/22/2003 23:04:36
   Date: Tue, 21 Oct 2003 13:52:27 +0200
   From: Frank van der Linden <fvdl@netbsd.org>

   On Mon, Oct 20, 2003 at 11:43:26PM +0200, Mark Kettenis wrote:
   > The behaviour has changed.  The SIGALRM test passes now, but
   > unfortunately there is a test further down in signal.exp that hangs
   > the machine; I can't ping the machine and the console is dead.  I'll
   > need to investigate this, since I don't even know exactly what test is
   > failing.

   Let me know if I can do anything to help, or how to reproduce the
   problem, so that I can debug the possible kernel issue lurking here.
   I appreciate your work on this.

I have experimented a bit and have pinpointed the test that crashes
the machine.  The following scenario can be used to reproduce the problem:

$ gdb sandbox/virgin-gdb/obj/gdb/testsuite/gdb.base/run 
GNU gdb 5.3nb1
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64--netbsd"...
(gdb) b main
Breakpoint 1 at 0x400a9f: file ../../../src/gdb/testsuite/gdb.base/run.c, line 59.
(gdb) r
Starting program: /home/kettenis/sandbox/virgin-gdb/obj/gdb/testsuite/gdb.base/run 

Breakpoint 1, main (argc=1, argv=0x7f7ffffffa48, envp=0x7f7ffffffa58)
    at ../../../src/gdb/testsuite/gdb.base/run.c:59
59      ../../../src/gdb/testsuite/gdb.base/run.c: No such file or directory.
        in ../../../src/gdb/testsuite/gdb.base/run.c
(gdb) signal 5
Continuing with signal SIGTRAP.

At this point the machine crashes.  Sometimes I get a message on the
console that starts with:

fatal protection fault in supervisor mode
rip ffffffff80100594

which corresponds to the 

   movq    %rcx,%cr0

instruction in locore.S:switch_exited.

The system call that crashes the machine is:

  ptrace (PT_CONTINUE, PID, 1, SIGTRAP);

I also tried a few other signals, and it seems that only those signals
whose default behaviour is to produce a core dump (i.e. SIGQUIT,
SIGILL, SIGTRAP, SIGABRT, SIGEMT, SIGFPE, SIGBUS, SIGSEGV, SIGSYS)
produce the crash.
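
For anyone who wants to try this without GDB, something along the
following lines might exercise the same ptrace call.  It is only a rough
sketch: the function name continue_with_signal and the two macros are made
up for illustration, and the child stops itself with SIGSTOP rather than
at a breakpoint, so it is an assumption that this takes the same path
through the kernel.  On an affected kernel it may well hang the machine,
so run it only on a box you can afford to lose.

```c
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

#if defined(__linux__)
/* Linux spelling (assumption for portability): signal goes in the data pointer. */
#define TRACE_ME()            ptrace(PTRACE_TRACEME, 0, NULL, NULL)
#define CONTINUE_WITH(pid, s) ptrace(PTRACE_CONT, (pid), NULL, (void *)(long)(s))
#else
/* BSD spelling, as in the call quoted above; addr 1 means "resume in place". */
#define TRACE_ME()            ptrace(PT_TRACE_ME, 0, NULL, 0)
#define CONTINUE_WITH(pid, s) ptrace(PT_CONTINUE, (pid), (void *)1, (s))
#endif

/*
 * Hypothetical helper: fork a traced child, wait for it to stop, then
 * resume it with signal `sig`.  Returns the signal that terminated the
 * child, 0 if the child exited normally, or -1 if tracing could not be
 * set up.
 */
int continue_with_signal(int sig)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: ask to be traced, then stop so the parent gets control. */
        if (TRACE_ME() < 0)
            _exit(127);
        raise(SIGSTOP);
        _exit(0);               /* reached only if resumed without a fatal signal */
    }

    /* Parent: wait for the child to stop. */
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    /* This is the call that takes the machine down here. */
    if (CONTINUE_WITH(pid, sig) < 0) {
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
        return -1;
    }

    /* Collect the child's fate. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On a healthy kernel I would expect continue_with_signal(SIGTRAP) to simply
return SIGTRAP, with the child dying from the signal.  Calling it with each
of the signals listed above should show whether the crash really is tied
to the core-dumping ones.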

Hope this bit of info helps.

Mark