Subject: Some feedback about netbsd on indigo --> part I: a kernel panic
To: None <port-sgimips@netbsd.org>
From: Pierre Letouzey <Pierre.Letouzey@pps.jussieu.fr>
List: port-sgimips
Date: 10/30/2006 18:50:28
Hi all

*** Some context: ***

I'm currently teaching a compiler course whose practical sessions
are based on mips assembly. We normally use various emulators (mainly
spim, but also gxemul), but for the fun of it I recently looked for and
found some real mips hardware: an sgi indigo (IP20) and an indigo 2
(IP22). I'm now running netbsd 3.0.1 on both, after quite a fight
(well, I'm new to netbsd...). In the process I've encountered some
issues I would like to share with you:

1) a kernel panic when trying to play with syscalls in a non-PIC way
2) a keyboard that isn't working on the IP20 ... until a kernel panic
3) some instability when testing netbsd 3.1-RC4 on the IP22 (not IP20)

I'll give some details concerning points 2) and 3) in two separate
emails; meanwhile, let me describe point 1):


*** The issue: ***

On both IP20 and IP22 machines, with netbsd 3.0.1, kernel
GENERIC32_IP2x (ecoff flavour), I get a kernel crash when running
an application, test_triv, built from the following file ...

------------------ test_triv.s ------------------------
        .text
        .align 2
        .globl  __start

__start: 
        li $2,1    # $v0 = 1 i.e. syscall exit
        syscall
--------------------------------------------------------

... and compiled in the following way:

  gcc -nostdlib -mno-abicalls -fno-pic -static -o test_triv test_triv.s

Here are the messages displayed when the crash occurs:

TLB miss (load or instr. fetch) in kernel mode
status=0xff23, cause=0x20000008, epc=0x0, vaddr=0x0
pid=447 cmd=test_triv usp=0x7fffde10 ksp=0xc41b7e78
Stopped in pid 447.1 (test_triv) at     0:     invalid address.

I then end up in what looks like a kernel debugger. I tried to print
various things, but didn't learn much. Typing "continue" twice then
led to a reboot. By the way, I should of course mention that I was
running this lethal program as an unprivileged user.

Now, a few words on why on earth I tried such assembly code:
our compiler course is normally the first encounter with assembly for
almost all our students, so we try to keep things simple (but
realistic enough). Here in particular I was trying to run an example
in a non-PIC mode (without $gp handling and .cpload / .cprestore and
so on). I first tried to link my hand-coded assembly with some stuff
from the libc (mainly printf), which produced only a warning from ld about
mixing PIC and non-PIC code, and then a segfault at run time. Then I
tried to bypass the libc by writing direct syscalls like read and
write. The exit syscall above is only a simplified version of that.

Ok, what I'm doing is probably evil. But the funny thing is that it's
actually working on the gxemul mips emulator, simulating a pmax
machine (DECstation), with either netbsd/pmax or debian as OS.
Of course, success on an emulator doesn't mean much, since trickier
things can happen in real life, but I would be interested in any
insight about this different behavior.

Finally, I've tried both an older and a more recent netbsd:
- On 1.6.2, I get an error when launching my weird binary:
    ./test_triv: Exec format error. Wrong Architecture.
- On 3.1-RC4, same behavior as on 3.0.1

Best regards,


Pierre Letouzey

PS: please CC me, since I'm not subscribed to this list

PPS: I've seen that the preferred way to report bugs is via the
send-pr tool, but my experimental netbsd systems have no mailer
configured...