Subject: port-sparc/5016: 1.3/sparc suffers "everything dumps core" too!?!?!
To: None <gnats-bugs@gnats.netbsd.org>
From: None <woods@always.weird.com>
List: netbsd-bugs
Date: 02/18/1998 22:58:44
>Number:         5016
>Category:       port-sparc
>Synopsis:       1.3/sparc suffers "everything dumps core" too!?!?!
>Confidential:   no
>Severity:       critical
>Priority:       low
>Responsible:    gnats-admin (GNATS administrator)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Feb 18 20:05:00 1998
>Last-Modified:
>Originator:     Greg A. Woods
>Organization:
Planix, Inc.; Toronto, Ontario; Canada
>Release:        NetBSD-1.3
>Environment:

Hardware: sparcstation-2, 64MB RAM, 2x2.14GB SCSI
System: NetBSD always 1.3 NetBSD 1.3 (GENERIC_SCSI3) #0: Thu Jan 1 19:03:39 MET 1998 pk@flambard:/usr/src1/sys/arch/sparc/compile/GENERIC_SCSI3 sparc

>Description:

I've just acquired a SparcStation-2 and have installed NetBSD-1.3/sparc
downloaded from ftp.netbsd.org onto it.

It is headless, has 64MB RAM, and has two external 2.14GB Quantum SCSI
drives attached.

The machine has been running now for less than 24 hours:

18:38 [4] $ w
 6:38PM  up 17:52, 6 users, load averages: 2.15, 2.25, 1.87

... and has been used to move some files around, unpack the source tree,
sup the pkgsrc collection, and do a little bit of compiling.

This evening I found I had to re-run 'ldconfig -m /usr/X11R6/lib'
(because otherwise the X11 libraries were no longer found -- I had
done this manually last night and had successfully started Xterms).

Unfortunately I didn't collect enough information to prove that I
had not inadvertently typed a bare 'ldconfig' in the meantime (I've
since learned about the badly named /etc/ld.so.conf file, which
eliminates the need to remember command-line arguments).

I then started 'make' in two /usr/pkgsrc directories (ctwm & ssh).

Wanting to check on a few other things, I tried rsh'ing over to start
another Xterm (with -ls).  While executing parts of my .profile, ksh
dumped core at least twice.  This is very abnormal, and no, there were no
parity errors recorded by the kernel (I'm assuming they are detected on
SS2s).  Subsequent xterms, started after the compiles finished, work fine.

The stack backtrace is essentially useless, but does hint at some kind
of memory corruption problem:

18:49 [16] $ gdb /bin/ksh ksh.core
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (sparc-netbsd), Copyright 1996 Free Software Foundation, Inc...
(no debugging symbols found)...
Core was generated by `ksh'.
Program terminated with signal 4, Illegal instruction.
#0  0x80 in ?? ()
(gdb) where
#0  0x80 in ?? ()
#1  0xc in ?? ()
(gdb) quit


(I fail to see why binaries should be stripped in the distribution.  I'm
not suggesting full '-g' symbols, just the normal symbols for stack trace
usage.  People really short of disk can strip things after they install.)


The combination of the two problems above is extremely reminiscent of
similar problems that the Sun3 port was suffering from until recently,
and which it still suffers from in diskless configurations.  Luckily this
seems less common so far than on the Sun3.  I'll continue working away
to build this into a configured server and will report any new errors....

>How-To-Repeat:

Work a NetBSD-1.3/sparc system really hard and watch for failures.

>Fix:

Unknown.

>Audit-Trail:
>Unformatted: