Subject: [repost] --
To: None <netbsd-help@NetBSD.ORG>
From: steve farrell <sfarrell@healthquiz.com>
List: netbsd-help
Date: 02/20/1997 06:56:37
--sorry-- not trying to be obnoxious, but i didn't get any responses
to this when i sent it a couple of days ago.  i'm thinking maybe
it never really made it to the list (?), or am i just coming from
another planet here? thanks for your help & time.

------original-----

i posted a while ago about core dumps with ranlib while attempting to
compile ssh on an out-of-the-box NetBSD 1.2/sparc (SS1) box.  unfortunately,
i didn't get any response but i promised to come back with the backtrace
from the core dump after i got a chance.  so today i downloaded all the
sources and recompiled ranlib with debugging symbols on, and proceeded
to reproduce the error.

it bails with a bus error (signal 10).

the error actually occurs in usr.bin/ar/archive.c, in a function called
get_arobj().  i did this twice, so it's not just randomly coredumping
(as some people have had happen with netbsd/sparc, but i think mostly
on SS10s...)
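
for anyone trying to rule out a damaged input archive: get_arobj() is the
routine that reads each member header, so one cheap check is to walk the
archive independently and see whether every header parses.  the sketch below
is not the code from archive.c; it only assumes the common ar(5) layout
(8-byte "!<arch>\n" magic, then 60-byte headers ending in "`\n", with member
data padded to an even length).  the name walk_ar is made up for illustration.

```python
import io
import struct

ARMAG = b"!<arch>\n"          # global archive magic
ARFMAG = b"`\n"               # trailer that ends every member header
# name, date, uid, gid, mode, size, trailer: 60 bytes in total
HDR = struct.Struct("16s12s6s6s8s10s2s")


def walk_ar(stream):
    """Yield (name, size) per member; raise ValueError on a bad header."""
    if stream.read(len(ARMAG)) != ARMAG:
        raise ValueError("missing global archive magic")
    while True:
        raw = stream.read(HDR.size)
        if not raw:
            return                      # clean end of archive
        if len(raw) < HDR.size:
            raise ValueError("truncated member header")
        name, _date, _uid, _gid, _mode, size, fmag = HDR.unpack(raw)
        if fmag != ARFMAG:
            raise ValueError("bad header trailer for %r" % name)
        try:
            nbytes = int(size.decode("ascii").strip())
        except ValueError:
            raise ValueError("bad size field for %r" % name)
        yield name.decode("ascii", "replace").rstrip(), nbytes
        # member data is padded to an even byte boundary
        stream.seek(nbytes + (nbytes & 1), io.SEEK_CUR)
```

open the library with open(path, "rb") and iterate over walk_ar(f); a
ValueError part-way through would point at the archive rather than at ranlib.
a clean walk proves little either way, since it does not exercise the same
code path as the real tool.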

when i first posted i assumed this was a known problem, but i guess not;
does this give enough info that someone can recognize what's going on?

incidentally, the machine in question was up for about a month without
incident, albeit under relatively light load (email, dns, light web,
user-ftp).  but when generating these core dumps, i actually got a panic!
i was very surprised.  here's the trace:

[some inane blabbering deleted]


Feb 18 03:27:55 xxxxxxxx /netbsd: /usr: bad dir ino 42276 at offset 0: mangled entry
Feb 18 03:27:55 xxxxxxxx /netbsd: panic: bad dir
Feb 18 03:27:55 xxxxxxxx /netbsd: syncing disks... 3 3 esp: invalid reselect (idbit=0x 8)
Feb 18 03:27:55 xxxxxxxx /netbsd: esp0: identify failed
Feb 18 03:27:56 xxxxxxxx /netbsd: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 giving up
Feb 18 03:27:56 xxxxxxxx /netbsd: Frame pointer is at 0xf97b4a08
Feb 18 03:27:56 xxxxxxxx /netbsd: Call traceback:
Feb 18 03:27:56 xxxxxxxx /netbsd:   pc = f80f5b68  args = (0, 901fe5, f8130c00, f97b4b28, ffffffff, 97, f97b4a70) fp = 0xf97b4a70
Feb 18 03:27:56 xxxxxxxx /netbsd:   pc = f8025dd0  args = (100, f8121c00, 1, f97b4b98, 0, 200, f97b4ad8) fp = 0xf97b4ad8
Feb 18 03:27:56 xxxxxxxx /netbsd:   pc = f80b38b8  args = (100, f8121c00, 1, 0, f80b2ab8, 3ff, f97b4b40) fp = 0xf97b4b40
Feb 18 03:27:56 xxxxxxxx /netbsd:   pc = f80b302c  args = (f8618900, 0, f80b2ab8, 400, 0, 0, f97b4ba8) fp = 0xf97b4ba8
Feb 18 03:27:56 xxxxxxxx /netbsd:   pc = f803d37c  args = (0, 0, 0, 4044, 400, 1, f97b4c80) fp = 0xf97b4c80
Feb 18 03:27:56 xxxxxxxx /netbsd:   pc = f803ce18  args = (f8122400, f85b4280, f8621500, f97b4e28, f97b3000, f85e8800, f97b4d08) fp = 0xf97b4d08
Feb 18 03:27:56 xxxxxxxx /netbsd:   pc = f8041c84  args = (0, 77b80, 1b420, 77200, ffffffff, 7, f97b4db0) fp = 0xf97b4db0
Feb 18 03:27:56 xxxxxxxx /netbsd:   pc = f8102af0  args = (f8621500, f97b4f28, f97b4f20, f8041c60, ffffffff, 97, f97b4ec0) fp = 0xf97b4ec0
Feb 18 03:27:56 xxxxxxxx /netbsd:   pc = f8007800  args = (bc, f97b4fb0, 0, 103, 6b884, 100, f97b4f50) fp = 0xf97b4f50
Feb 18 03:27:56 xxxxxxxx /netbsd:   pc = 4eb0  args = (77b80, f7fff458, 6ef0d, 6eee8, 9584c, f97b4fb0, f7fff398) fp = 0xf7fff398
Feb 18 03:27:56 xxxxxxxx /netbsd: 
Feb 18 03:27:56 xxxxxxxx /netbsd: dumping to dev 701, offset 8
Feb 18 03:27:56 xxxxxxxx /netbsd: dump esp: message queue not empty: 4!
Feb 18 03:27:56 xxxxxxxx /netbsd: succeeded
Feb 18 03:27:57 xxxxxxxx /netbsd: rebooting
Feb 18 03:27:57 xxxxxxxx /netbsd: 
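
to make a traceback like the one above easier to work with, the pc, args and
fp values can be pulled out of the syslog lines mechanically.  this is only a
rough sketch: the regular expression is fitted to the line format shown in
this log, and parse_frames is a made-up helper name, not part of any NetBSD
tool.

```python
import re

# matches lines such as:
#   pc = f80f5b68  args = (0, 901fe5, ...) fp = 0xf97b4a70
FRAME_RE = re.compile(
    r"pc = ([0-9a-f]+)\s+args = \(([^)]*)\)\s+fp = 0x([0-9a-f]+)"
)


def parse_frames(log_text):
    """Return a list of (pc, args, fp) tuples, all parsed as hex integers."""
    frames = []
    for m in FRAME_RE.finditer(log_text):
        pc = int(m.group(1), 16)
        args = [int(a.strip(), 16) for a in m.group(2).split(",")]
        fp = int(m.group(3), 16)
        frames.append((pc, args, fp))
    return frames
```

the resulting pc values can then be compared by hand against a numerically
sorted symbol listing of the running kernel (for example from nm -n /netbsd)
to guess which function each frame was in.  the last frame's pc (4eb0) is far
below the others, so it is presumably a userland address and would not appear
in the kernel's symbol table.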

...and for more info about my kernel and machine:

Feb 18 03:27:57 xxxxxxxx /netbsd: Copyright (c) 1982, 1986, 1989, 1991, 1993
Feb 18 03:27:57 xxxxxxxx /netbsd: 	The Regents of the University of California.  All rights reserved.
Feb 18 03:27:57 xxxxxxxx /netbsd: 
Feb 18 03:27:57 xxxxxxxx /netbsd: NetBSD 1.2 (GENERIC_SCSI3) #6: Fri Sep 27 22:01:55 MET DST 1996
Feb 18 03:27:57 xxxxxxxx /netbsd:     pk@kwik:/usr/src1/sys/arch/sparc/compile/GENERIC_SCSI3
Feb 18 03:27:57 xxxxxxxx /netbsd: real mem = 16728064
Feb 18 03:27:57 xxxxxxxx /netbsd: avail mem = 14163968
Feb 18 03:27:57 xxxxxxxx /netbsd: using 204 buffers containing 835584 bytes of memory
Feb 18 03:27:57 xxxxxxxx /netbsd: bootpath: /sbus0/esp0/sd@0,0
Feb 18 03:27:57 xxxxxxxx /netbsd: mainbus0 (root)
Feb 18 03:27:58 xxxxxxxx /netbsd: cpu0 at mainbus0: Sun 4/60 (MB86900/1A or L64801 @ 20 MHz, WTL3170/2 FPU)
Feb 18 03:27:58 xxxxxxxx /netbsd: cpu0: 65536 byte write-through, 16 bytes/line, sw flush cache enabled
Feb 18 03:27:58 xxxxxxxx /netbsd: memreg0 at mainbus0 ioaddr 0xf4000000
Feb 18 03:27:58 xxxxxxxx /netbsd: clock0 at mainbus0 ioaddr 0xf2000000: mk48t02 (eeprom)
Feb 18 03:27:59 xxxxxxxx /netbsd: timer0 at mainbus0 ioaddr 0xf3000000 delay constant 7
Feb 18 03:27:59 xxxxxxxx /netbsd: auxreg0 at mainbus0 ioaddr 0xf7400000
Feb 18 03:28:00 xxxxxxxx /netbsd: zs0 at mainbus0 ioaddr 0xf1000000 pri 12, softpri 6
Feb 18 03:28:00 xxxxxxxx /netbsd: zs0a: console i/o
Feb 18 03:28:00 xxxxxxxx /netbsd: zs1 at mainbus0 ioaddr 0xf0000000 pri 12, softpri 6
Feb 18 03:28:00 xxxxxxxx /netbsd: fdc0 at mainbus0 ioaddr 0xf7200000 pri 11, softpri 4: chip 82072
Feb 18 03:28:00 xxxxxxxx /netbsd: fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
Feb 18 03:28:01 xxxxxxxx /netbsd: audio0 at mainbus0 ioaddr 0xf7201000 pri 13, softpri 4
Feb 18 03:28:01 xxxxxxxx /netbsd: sbus0 at mainbus0 ioaddr 0xf8000000: clock = 25 MHz
Feb 18 03:28:02 xxxxxxxx /netbsd: dma0 at sbus0 slot 0 offset 0x400000: rev 1
Feb 18 03:28:02 xxxxxxxx /netbsd: esp0 at sbus0 slot 0 offset 0x800000 pri 3: ESP100 25Mhz, target 7
Feb 18 03:28:02 xxxxxxxx /netbsd: scsibus0 at esp0
Feb 18 03:28:02 xxxxxxxx /netbsd: sd0 at scsibus0 targ 3 lun 0: <MICROP, 2105-08MZ1001002, HZ48> SCSI1 0/direct fixed
Feb 18 03:28:02 xxxxxxxx /netbsd: sd0: 532MB, 1760 cyl, 8 head, 77 sec, 512 bytes/sec
Feb 18 03:28:02 xxxxxxxx /netbsd: le0 at sbus0 slot 0 offset 0xc00000 pri 5: address 08:00:20:07:49:04
Feb 18 03:28:03 xxxxxxxx /netbsd: le0: 8 receive buffers, 2 transmit buffers
Feb 18 03:28:04 xxxxxxxx /netbsd: root on sd0a
Feb 18 03:28:04 xxxxxxxx /netbsd: /dev/sd0a: file system not clean; please fsck(8)
Feb 18 03:27:57 xxxxxxxx savecore: reboot after panic: bad dir
Feb 18 03:27:57 xxxxxxxx savecore: /var/crash/bounds: No such file or directory
Feb 18 03:27:57 xxxxxxxx savecore: writing core to /var/crash/netbsd.0.core
Feb 18 03:28:25 xxxxxxxx savecore: writing kernel to /var/crash/netbsd.0

so... what's the moral here?  is my kernel screwed up?  my disk?  also,
with respect to ranlib core dumps (assuming they're unrelated to this
panic...)  should i grab the ar and ranlib from -current?  is there a
known bug in the 1.2 distribution here?

thanks -- steve farrell

btw -- i should probably just read up on this, but do y'all have a way
to track bugfixed sources like freebsd-stable, or only the -current stuff?


------- End of Forwarded Message