Subject: Re: Upgrading to -current
To: der Mouse <mouse@Holo.Rodents.Montreal.QC.CA>
From: Mark Newton <newton@cleese.apana.org.au>
List: port-sun3
Date: 08/28/1996 00:32:46
der Mouse wrote:

 > This sounds like the "everything dumps core" bug.  It happens on some
 > machines with depressing reproducibility; it doesn't happen on others.
 > For example, I have a -3/150 that it happens on and a -3/260 that has
 > been up for a couple of weeks (of software building, too) without a
 > single strange coredump.  It's a hard bug and AFAIK nobody knows what's
 > doing it, though I think gwr has some vague guesses.
 
I'd be interested to see if the 3-chip-SIMM vs 9-chip-SIMM problems
that used to plague a few Sun-3 platforms under SunOS could cause
spurious bugs under NetBSD.  Shrug.

'Tis strange:  I ran "make build" over the weekend and it built the
entire system without triggering the bug.  Yesterday I brought home
a tape with X11R6 on it and started up the X server, and "everything
dumps core" within minutes.  <sigh>

 > > ... from which I surmise that exec() has stopped working again :-)
 > 
 > It's not really exec().  It appears to have something to do with shared
 > libraries; an executable that uses no shared libraries will
 > (invariably, in my experience) work, even when "everything dumps core".
 
A Sun-3 dependency in mmap()?  That might also explain why it seems
to occur more often in times of high memory demand...
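
One way to poke at the mmap() theory would be to compare what mmap()
and plain read() return for the same file, since the run-time linker
maps shared libraries rather than reading them.  What follows is only
a sketch in Python, not anything from the NetBSD tree:  the function
name mapped_matches_read is made up for illustration, and it assumes
the fault would show up as mapped pages differing from the file
contents, which may well not be how this bug behaves.

```python
import mmap


def mapped_matches_read(path):
    """Return True if the bytes seen through mmap() equal those from read().

    Hypothetical helper: it only tests the assumption that a broken
    mapping would show up as corrupted file contents.
    """
    with open(path, "rb") as f:
        via_read = f.read()
        if not via_read:
            # A zero-length file cannot be mapped, so there is nothing to compare.
            return True
        # Length 0 asks for the whole file; ACCESS_READ gives a read-only mapping.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            via_mmap = m[:]
    return via_read == via_mmap
```

Calling it on a few shared libraries while the machine is misbehaving
would be the obvious use.  A False result would point at the mapping
path;  a True result proves little, since the pages may simply not
have been affected at that moment.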

It seems to be uid-dependent too:  although the first core dump can
come from virtually any process, I've found that only processes owned
by root (or setuid root) dump core once things start falling over.  This
is a damn shame, because "login" and "getty" both run as root:  typing
a username on a spare terminal when "everything is dumping core" sends
the terminal into a catatonic state :-/  Non-root processes don't seem
to care:  I've had no problems running all kinds of dynamically linked
programs as a pleb user.

Bearing in mind that I only picked up NetBSD/sun3 a week or two
ago:  you mentioned that the bug occurs "with depressing reproducibility."
Does anyone know what, exactly, triggers it?  I've seen no pattern thus
far...  (not that I was looking that hard until I read your email this
evening, mind you...)

 > I'm surprised the binary snapshot you got is that old.  If you want, I
 > can give someone a tar of binaries built from -current, and/or a dd of
 > a zip disk I keep as a disaster-recovery disk....
 
Too late :-)  "make install" finished at about 1:00am Monday, according
to my build log.  Yawn.

    - mark

--------------------------------------------------------------------
I tried an internal modem,                newton@cleese.apana.org.au
     but it hurt when I walked.                          Mark Newton
----- Voice: +61-8-3732429 --------------- Data: +61-8-3736006 -----