netbsd-bugs: kern/9075: heavy use of FFS on Sparc machines causes kernel faults

Subject: kern/9075: heavy use of FFS on Sparc machines causes kernel faults
To: None <gnats-bugs@netbsd.org>
From: None <buhrow@lothlorien.nfbcal.org>
List: netbsd-bugs
Date: 12/29/1999 17:19:25
>Number:         9075
>Category:       kern
>Synopsis:       heavy use of FFS on Sparc machines causes kernel faults
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Dec 29 17:18:00 1999
>Last-Modified:
>Originator:     Brian Buhrow
>Organization:
National Federation of the Blind of California
	
>Release:        1.4.2_ALPHA [also 1.4.1, 1.4, ...]
>Environment:
    	
system: NetBSD/Sparc 1.4.2_ALPHA (sun4m, Sparc station 5, 256MB memory)


>Description:
	Hello.  I've been having a long-standing problem with NetBSD on the 
Sparc architecture, which should be archived on the port-sparc list.
However, my search for the cure has led me to a problem that might affect
all ports.  Thus, I am forwarding my description of the problem to the
NetBSD community in the hope that some might have seen the problem
elsewhere, and to give non-Sparc users a heads up that there
might be a problem in the machine-independent portion of the code.
For those who are interested, I will happily provide the earlier mail
which describes my problem in more detail, including tracebacks and the
like.

	OK.  I've narrowed the problem further, but my limited understanding of
what's going on simultaneously in the kernel is slowing my progress.  I believe
there is a race condition in the kernel whenever a new inode is created in
an FFS filesystem on Sparc machines.  I do not know if this problem extends
to architectures outside the Sparc.  On 1.4.1 and 1.4.2, however, if many
inodes are being created and destroyed simultaneously on an FFS filesystem,
as might happen on a news server, there is some sort of context race:
when ffs_valloc() calls ffs_hashalloc() with the allocator pointed
at ffs_nodealloccg(), a memory exception occurs just at the point where
ffs_nodealloccg() would call skpc() for the first time.  If inodes are
created on a quiescent system, all works fine.  This
condition isn't extremely predictable, but I have a system now where it
occurs quite consistently.  Also, because it only happens when
ffs_nodealloccg() is used to allocate a new inode, the problem doesn't
occur immediately on a new filesystem.  As inodes are allocated and freed,
and the location of free inodes becomes fragmented with respect to the
cylinder groups, or at least as the likelihood of the preferred inode being
free becomes less and less, the conditions causing this panic become more
and more prevalent.
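
For readers unfamiliar with the code path named above, the cylinder-group
scan boils down to skipping bitmap bytes that are fully allocated (0xff)
until one with a free bit turns up.  What follows is a minimal user-space
sketch of that idea, not the kernel source.  skip_equal() is a hypothetical
stand-in modeled on what skpc() does; find_free_inode(), the one-bit-per-inode
layout, and the rotor parameter are assumptions made for illustration.  It
suggests why a corrupted start offset or length would send the scan off
into unmapped memory.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel's skpc(): skip leading bytes equal
 * to `mask` and return how many bytes remain, counting the first byte
 * that differs.  Returns 0 if every byte matched.
 */
static size_t
skip_equal(unsigned char mask, size_t size, const unsigned char *cp)
{
	const unsigned char *end = cp + size;

	while (cp < end && *cp == mask)
		cp++;
	return (size_t)(end - cp);
}

/*
 * Assumed layout: one bit per inode, bit set means the inode is in use.
 * Scan from the byte holding `rotor` to the end of the map, then wrap
 * around to the bytes before it.  Returns a free inode index, or -1.
 */
static long
find_free_inode(const unsigned char *map, size_t ninodes, size_t rotor)
{
	size_t start, len, left, byte;
	int bit;

	assert(ninodes % 8 == 0);	/* keeps the sketch free of padding bits */
	assert(rotor < ninodes);	/* a bad rotor would scan wild memory */

	start = rotor / 8;
	len = ninodes / 8 - start;
	left = skip_equal(0xff, len, map + start);
	if (left == 0) {
		/* Nothing free at or past the rotor: rescan the earlier bytes. */
		len = start;
		start = 0;
		left = skip_equal(0xff, len, map);
		if (left == 0)
			return -1;
	}

	/* `byte` is the first byte in the scanned range that is not 0xff. */
	byte = start + len - left;
	for (bit = 0; bit < 8; bit++) {
		if ((map[byte] & (1u << bit)) == 0)
			return (long)(byte * 8 + (size_t)bit);
	}
	return -1;
}
```

Both the pointer handed to the scan and its length are derived from the
rotor, so in this model a single damaged field is enough to produce a scan
over an arbitrary address range.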
	Could someone explain to me how the locking mechanism works for inode 
allocation/deallocation for the FFS filesystem?  I don't see many sleeps in
the FFS or UFS code, but I don't know what steps are taken to keep the
bottom half of the kernel from corrupting the top structures.
	If someone could suggest a document to read that might help with this, or
if someone could suggest a fix, that would be great.
I'm nearly at my wit's end on this machine, and I think this has been my
problem since 1.3, though I couldn't pinpoint it this clearly before.

Any help would be greatly appreciated.
-thanks
-Brian



From buhrow Mon Dec 27 17:03:52 1999
          by lothlorien.nfbcal.org (8.9.3/8.8.4.nfbcal.org)
	  id RAA18961; Mon, 27 Dec 1999 17:03:44 -0800 (PST)
Message-Id: <199912280103.RAA18961@lothlorien.nfbcal.org>
From: buhrow@lothlorien.nfbcal.org (Brian Buhrow)
Date: Mon, 27 Dec 1999 17:03:44 -0800
In-Reply-To: buhrow@lothlorien.nfbcal.org (Brian Buhrow)
       "Re: Problems with NetBSD 1.4.1+ on Sparc 5" (Dec 27,  1:36pm)
To: root@ihack.net (Charles M. Hannum)
Subject: Re: Problems with NetBSD 1.4.1+ on Sparc 5
Cc: David Brownlee <abs@mono.org>, port-sparc@netbsd.org, buhrow

	To follow up on my own post of earlier today, I tried the 1.4.2_ALPHA kernel
from ftp.netbsd.org and find that it panics regularly every 30 minutes,
just when innd gets ready to start receiving articles en masse.  Again, I've
captured a backtrace and find that ffs_nodealloccg causes the machine to
throw a memfault_sun4m exception.  I think there's potentially a real
problem here.  Is anyone able to shed light on what it is?
-Brian

Script started on Mon Dec 27 15:47:05 1999
%ftp  nf  news.zodal^C
%

NetBSD/sparc (news) (console)

login: data fault: pc=0xf01c463c addr=0xef4f25d9 sfsr=226<PERR=0,LVL=2,AT=1,FT=1,FAV,OW>
panic: kernel fault
syncing disks... panic: lockmgr: locking against myself
Frame pointer is at 0xf8a7c440
Call traceback:
  pc = 0xf01988bc [_cpu_reboot] args = (0x40010e6, 0x4001fe6, 0x0, 0x0, 0xf8a7c558, 0x49010e6, 0xf8a7c4a8) fp = 0xf8a7c4a8
  pc = 0xf004ec88 [_panic] args = (0x104, 0x0, 0x0, 0xf8a7c4ec, 0xf0797f00, 0x16b6b74, 0xf8a7c510) fp = 0xf8a7c510
  pc = 0xf00418c4 [trapbase_sun4] args = (0xf00414b8, 0x104, 0xf0041880, 0x2, 0x0, 0x1000, 0xf8a7c578) fp = 0xf8a7c578
  pc = 0xf01380bc [_ufs_lock] args = (0xf8c09704, 0x10, 0xf8c08510, 0x0, 0xf8f59c84, 0xf0002000, 0xf8a7c5e0) fp = 0xf8a7c5e0
  pc = 0xf00746d0 [_vn_lock] args = (0xf8a7c6a8, 0xf01380a8, 0x78, 0xf082de00, 0xf014b410, 0x0, 0xf8a7c648) fp = 0xf8a7c648
  pc = 0xf006d17c [_vget] args = (0xf8c08480, 0x10012, 0xf0041a70, 0x6, 0x31, 0x0, 0xf8a7c6c0) fp = 0xf8a7c6c0
  pc = 0xf01261e8 [_ffs_sync] args = (0xf8c08480, 0x10012, 0x0, 0x31, 0x2, 0x1, 0xf8a7c728) fp = 0xf8a7c728
  pc = 0xf006fcc0 [_sys_sync] args = (0x0, 0x2, 0xf081d700, 0xf0203848, 0xf0126144, 0xf020c8d0, 0xf8a7c7a8) fp = 0xf8a7c7a8
  pc = 0xf006ed24 [_vfs_shutdown] args = (0xf0203848, 0x0, 0x0, 0xf876a8c0, 0xf020c000, 0x31, 0xf8a7c810) fp = 0xf8a7c810
  pc = 0xf0198888 [_cpu_reboot] args = (0xf0203800, 0xf0002000, 0xf01e3800, 0x0, 0xf8a7c928, 0x49010e7, 0xf8a7c878) fp = 0xf8a7c878
  pc = 0xf004ec88 [_panic] args = (0x100, 0x0, 0x0, 0x0, 0xf8a7c99c, 0x49010e0, 0xf8a7c8e0) fp = 0xf8a7c8e0
  pc = 0xf01a8f00 [_mem_access_fault4m] args = (0xf01a8600, 0x100, 0xef4f25d9, 0xf8a7c9a8, 0x1e, 0x1, 0xf8a7c948) fp = 0xf8a7c948
  pc = 0xf0008518 [_memfault_sun4m] args = (0x0, 0x226, 0xef4f25d9, 0xf8a7ca50, 0xf8a7b564, 0x2710, 0xf8a7c9f0) fp = 0xf8a7c9f0
  pc = 0xf0121a08 [_ffs_nodealloccg] args = (0xff, 0x1f1ed20, 0xef4f25d9, 0x90255, 0xffffffff, 0xf80, 0xf8a7caa0) fp = 0xf8a7caa0
  pc = 0xf012048c [_ffs_hashalloc] args = (0x1f1ed20, 0xc7, 0x2, 0x81b4, 0x2, 0xf0866600, 0xf8a7cb10) fp = 0xf8a7cb10
  pc = 0xf0120068 [_ffs_valloc] args = (0xf8c096d0, 0xc7, 0xc0c82, 0x81b4, 0xf012183c, 0xf80, 0xf8a7cb78) fp = 0xf8a7cb78
  pc = 0xf0138978 [_ufs_makeinode] args = (0xf8a7cc58, 0xf011ffdc, 0x98, 0xf8a7cc54, 0xf082de00, 0xf083e400, 0xf8a7cbf0) fp = 0xf8a7cbf0
  pc = 0xf0135864 [_ufs_create] args = (0x81b4, 0xf8c08480, 0xf8a7ce68, 0xf8a7ce7c, 0x8000, 0xf8a7cd58, 0xf8a7cc78) fp = 0xf8a7cc78
  pc = 0xf00738b0 [_vn_open] args = (0xf8a7cd40, 0x10, 0xf0135838, 0xf082de00, 0xf01d1400, 0xf8c08480, 0xf8a7cce0) fp = 0xf8a7cce0
  pc = 0xf00704b4 [_sys_open] args = (0x0, 0x602, 0x1b4, 0x1b4, 0x208, 0x0, 0xf8a7cdc8) fp = 0xf8a7cdc8
  pc = 0xf01a92a0 [_syscall] args = (0x0, 0xf877ad70, 0xf8a7cf20, 0xf0070438, 0x68, 0x26848, 0xf8a7cec0) fp = 0xf8a7cec0
  pc = 0xf00087b0 [syscall] args = (0x5, 0xf8a7cfb0, 0x10098898, 0xa, 0x0, 0x12d1, 0xf8a7cf50) fp = 0xf8a7cf50
  pc = 0x36d0  args = (0xeffff3a0, 0x601, 0x1b4, 0xeffff3c4, 0x6, 0xffffffff, 0xefffea50) fp = 0xefffea50

dump to dev 7,1 not possible
sd4: WARNING: cache synchronization failed
sd3: WARNING: cache synchronization failed
rebooting

Resetting ... 
SPARCstation 5, No Keyboard
ROM Rev. 2.15, 256 MB memory installed, Serial #7788135.
Ethernet address 8:0:20:76:d6:67, Host ID: 8076d667.



Initializing Memory 
Rebooting with command: 
Boot device: /iommu/sbus/espdma@5,8400000/esp@5,8800000/sd@3,0  File and args: 
 >> NetBSD/sparc Secondary Boot, Revision 1.9
 >> (pk@flambard, Tue Oct 19 11:10:24 MEST 1999)
Booting netbsd
entry: 0x4000, bootinfo: 0x245b20
bootinfo[0]=0x245f20; bootinfo[1]=0x245b28
nsym=0x1881c, ssym=0x2103c8, esym=0x245b1c
OBP version 3, revision 2.15 (plugin rev 2)
console is ttya
Copyright (c) 1996, 1997, 1998, 1999
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 1.4.2_ALPHA (GENERIC_SCSI3) #2: Tue Dec 21 16:15:10 MET 1999
    he@ken.runit.sintef.no:/usr/src/sys/arch/sparc/compile/GENERIC_SCSI3
real mem = 268029952
avail mem = 252063744
using 1792 buffers containing 7340032 bytes of memory
bootpath: /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000/sd@3,0
mainbus0 (root): SUNW,SPARCstation-5
cpu0 at mainbus0: MB86904 @ 85 MHz, on-chip FPU
cpu0: 16K instruction (32 b/l), 8K data (16 b/l): cache enabled
obio0 at mainbus0
clock0 at obio0 slot 0 offset 0x200000: mk48t08 (eeprom)
timer0 at obio0 slot 0 offset 0xd00000 delay constant 40
zs0 at obio0 slot 0 offset 0x100000 level 12 softpri 6
zstty0 at zs0 channel 0 (console)
zstty1 at zs0 channel 1
zs1 at obio0 slot 0 offset 0x0 level 12 softpri 6
kbd0 at zs1 channel 0
ms0 at zs1 channel 1
slavioconfig at obio0 slot 0 offset 0x800000 not configured
auxreg0 at obio0 slot 0 offset 0x900000
power0 at obio0 slot 0 offset 0x910000 level 2
fdc0 at obio0 slot 0 offset 0x400000 level 11 softpri 4: chip 82077
iommu0 at mainbus0 addr 0x10000000: version 0x4/0x0, page-size 4096, range 64MB
sbus0 at iommu0: clock = 21.250 MHz
dma0 at sbus0 slot 5 offset 0x8400000: rev 2
esp0 at dma0 slot 5 offset 0x8800000 level 4: ESP200, 40MHz, SCSI ID 7
scsibus0 at esp0: 8 targets, 8 luns per target
probe(esp0:1:0): max sync rate 10.00Mb/s
sd1 at scsibus0 targ 1 lun 0: <SEAGATE, ST39102LCSUN9.0G, 0828> SCSI2 0/direct fixed
sd1: 8637MB, 4926 cyl, 27 head, 133 sec, 512 bytes/sect x 17689267 sectors
probe(esp0:2:0): max sync rate 10.00Mb/s
sd2 at scsibus0 targ 2 lun 0: <QUANTUM, FIREBALL_TM2110S, 300X> SCSI2 0/direct fixed
sd2: 2014MB, 6810 cyl, 4 head, 151 sec, 512 bytes/sect x 4124736 sectors
probe(esp0:3:0): max sync rate 10.00Mb/s
sd0 at scsibus0 targ 3 lun 0: <SEAGATE, ST5660N  SUN0535, 0522> SCSI2 0/direct fixed
sd0: 520MB, 3002 cyl, 4 head, 88 sec, 512 bytes/sect x 1065664 sectors
SUNW,bpp at sbus0 slot 5 offset 0xc800000 level 2 (ipl 3) not configured
ledma0 at sbus0 slot 5 offset 0x8400010: rev 2
le0 at ledma0 slot 5 offset 0x8c00000 level 6: address 08:00:20:76:d6:67
le0: 8 receive buffers, 2 transmit buffers
audiocs0 at sbus0 slot 4 offset 0xc000000 level 9: CS4231A
audio0 at audiocs0: full duplex
power-management at sbus0 slot 4 offset 0xa000000 not configured
dma1 at sbus0 slot 1 offset 0x100000: rev 1+
esp1 at sbus0 slot 1 offset 0x200000 level 5: ESP100A, 25MHz, SCSI ID 7
scsibus1 at esp1: 8 targets, 8 luns per target
probe(esp1:0:0): max sync rate 5.00Mb/s
sd3 at scsibus1 targ 0 lun 0: <SEAGATE, ST15150N, 0020> SCSI2 0/direct fixed
sd3: 4101MB, 3712 cyl, 21 head, 107 sec, 512 bytes/sect x 8399448 sectors
dma2 at sbus0 slot 2 offset 0x100000: rev 1+
esp2 at sbus0 slot 2 offset 0x200000 level 5: ESP100A, 25MHz, SCSI ID 7
scsibus2 at esp2: 8 targets, 8 luns per target
probe(esp2:3:0): max sync rate 5.00Mb/s
sd4 at scsibus2 targ 3 lun 0: <SEAGATE, ST19171N, 0024> SCSI2 0/direct fixed
sd4: 8683MB, 5268 cyl, 20 head, 168 sec, 512 bytes/sect x 17783112 sectors
root on sd0a dumps on sd0b
root file system type: ffs
WARNING: ccd0: end of partition `a' exceeds the size of ccd (35320320)
swapctl: adding /dev/sd0b as swap device at priority 0
swapctl: adding /dev/sd2b as swap device at priority 0
Automatic boot in progress: starting file system checks.
/dev/rsd0a: 2889 files, 99785 used, 152470 free (2710 frags, 18720 blocks, 1.1% fragmentation)
/dev/rsd0a: MARKING FILE SYSTEM CLEAN
WARNING: ccd0: end of partition `a' exceeds the size of ccd (35320320)
/dev/rsd2a: 37895 files, 669815 used, 879501 free (48949 frags, 103819 blocks, 3.2% fragmentation)
/dev/rsd2a: MARKING FILE SYSTEM CLEAN
/dev/rsd3c: FREE BLK COUNT(S) WRONG IN SUPERBLK (SALVAGED)
/dev/rsd3c: 46534 files, 427132 used, 3638006 free (94470 frags, 442942 blocks, 2.3% fragmentation)
/dev/rsd3c: MARKING FILE SYSTEM CLEAN
/dev/rccd0c: FREE BLK COUNT(S) WRONG IN SUPERBLK (SALVAGED)
/dev/rccd0c: BLK(S) MISSING IN BIT MAPS (SALVAGED)
/dev/rccd0c: SUMMARY INFORMATION BAD (SALVAGED)
/dev/rccd0c: 1003148 files, 10717203 used, 6390988 free (232956 frags, 769754 blocks, 1.4% fragmentation)
/dev/rccd0c: MARKING FILE SYSTEM CLEAN
setting tty flags
starting network
hostname: news
configuring network interfaces: le0.
adding interface aliases:
WARNING: ccd0: end of partition `a' exceeds the size of ccd (35320320)
starting system logger
checking for core dump...
savecore: no core dump
starting rpc daemons: portmap.
starting nfs daemons:.
creating runtime link editor directory cache.
checking quotas: done.
building databases...
clearing /tmp
updating motd.
turning on accounting
standard daemons: update cron.
starting network daemons: inetd.
starting local daemons:Setting autonicetime to 0
Setting kern.maxvnodes to 10832
Starting news services
INND:  PID file exists -- unclean shutdown!
send-mail: mailwrapper: can't open /etc/mailer.conf: No such file or directory
Starting innd.
.
Mon Dec 27 15:58:12 PST 1999
Dec 27 15:58:13 news init: kernel security level changed from 0 to 1



Script done on Mon Dec 27 16:07:35 1999

From buhrow Tue Dec 28 16:19:12 1999
	  id QAA27846; Tue, 28 Dec 1999 16:19:06 -0800 (PST)
Message-Id: <199912290019.QAA27846@lothlorien.nfbcal.org>
From: buhrow@lothlorien.nfbcal.org (Brian Buhrow)
Date: Tue, 28 Dec 1999 16:19:05 -0800
In-Reply-To: buhrow@lothlorien.nfbcal.org (Brian Buhrow)
       "Re: Problems with NetBSD 1.4.1+ on Sparc 5" (Dec 27,  1:36pm)
To: root@ihack.net (Charles M. Hannum)
Subject: Re: Problems with NetBSD 1.4.1+ on Sparc 5
Cc: David Brownlee <abs@mono.org>, port-sparc@netbsd.org, buhrow,
        netbsd-help@netbsd.org, current-users@netbsd.org


	Hello.  I've been having a long-standing problem with NetBSD on the 
Sparc architecture, which should be archived on the port-sparc list.
However, my search for the cure has led me to a problem that might affect
all ports.  Thus, I am forwarding my description of the problem to the
NetBSD community in the hope that some might have seen the problem
elsewhere, and to give non-Sparc users a heads up that there
might be a problem in the machine-independent portion of the code.
For those who are interested, I will happily provide the earlier mail
which describes my problem in more detail, including tracebacks and the
like.

	OK.  I've narrowed the problem further, but my limited understanding of
what's going on simultaneously in the kernel is slowing my progress.  I believe
there is a race condition in the kernel whenever a new inode is created in
an FFS filesystem on Sparc machines.  I do not know if this problem extends
to architectures outside the Sparc.  On 1.4.1 and 1.4.2, however, if many
inodes are being created and destroyed simultaneously on an FFS filesystem,
as might happen on a news server, there is some sort of context race:
when ffs_valloc() calls ffs_hashalloc() with the allocator pointed
at ffs_nodealloccg(), a memory exception occurs just at the point where
ffs_nodealloccg() would call skpc() for the first time.  If inodes are
created on a quiescent system, all works fine.  This
condition isn't extremely predictable, but I have a system now where it
occurs quite consistently.  Also, because it only happens when
ffs_nodealloccg() is used to allocate a new inode, the problem doesn't
occur immediately on a new filesystem.  As inodes are allocated and freed,
and the location of free inodes becomes fragmented with respect to the
cylinder groups, or at least as the likelihood of the preferred inode being
free becomes less and less, the conditions causing this panic become more
and more prevalent.
	Could someone explain to me how the locking mechanism works for inode 
allocation/deallocation for the FFS filesystem?  I don't see many sleeps in
the FFS or UFS code, but I don't know what steps are taken to keep the
bottom half of the kernel from corrupting the top structures.
	If someone could suggest a document to read that might help with this, or
if someone could suggest a fix, that would be great.
I'm nearly at my wit's end on this machine, and I think this has been my
problem since 1.3, though I couldn't pinpoint it this clearly before.

Any help would be greatly appreciated.
-thanks
-Brian


From buhrow Mon Dec 27 13:36:40 1999
          by lothlorien.nfbcal.org (8.9.3/8.8.4.nfbcal.org)
	  id NAA18208; Mon, 27 Dec 1999 13:36:17 -0800 (PST)
Message-Id: <199912272136.NAA18208@lothlorien.nfbcal.org>
From: buhrow@lothlorien.nfbcal.org (Brian Buhrow)
Date: Mon, 27 Dec 1999 13:36:16 -0800
In-Reply-To: root@ihack.net (Charles M. Hannum)
       "Re: Problems with NetBSD 1.4.1+ on Sparc 5" (Nov 22,  5:18pm)
To: root@ihack.net (Charles M. Hannum)
Subject: Re: Problems with NetBSD 1.4.1+ on Sparc 5
Cc: David Brownlee <abs@mono.org>, port-sparc@netbsd.org, buhrow

	Hello folks.  Well, I struggle on with this Sparc 5 that won't stay 
running for more than a few hours.  I've tried the NetBSD-1.4.2_ALPHA
kernel to no avail.  However, I've captured a traceback through a script
session, and filled in the appropriate function names in the trace.  It
appears that I'm suffering from a double panic: one in the lockmgr, and
another caused by a call to memfault_sun4m from ffs_nodealloccg.  This
probably isn't a call, but rather some sort of hardware trap.  Might anyone
be able to shed light on what's going on here?  Here's the traceback, along
with the dmesg output for the kernel.  If anyone cares, I also have the kernel
core image from the panic session, which I'd be happy to provide for
analysis.
-thanks
-Brian
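
One plausible reading of the second panic, offered as an assumption rather
than a diagnosis: the process that took the fault was still holding an
exclusive lock from the create path, and the sync attempted during shutdown
then asked for the same lock again from the same process.  The sketch below
is a toy user-space model of that kind of owner check.  The names toy_lock,
toy_lock_acquire() and the result codes are hypothetical and do not
reproduce lockmgr().

```c
#include <assert.h>

#define TOY_NOOWNER	(-1)

enum toy_result {
	TOY_ACQUIRED,		/* lock was free and is now ours */
	TOY_WOULD_SLEEP,	/* held by someone else; caller would wait */
	TOY_SELF_DEADLOCK	/* held by the caller already */
};

struct toy_lock {
	int owner;		/* pid of the exclusive holder, or TOY_NOOWNER */
};

static void
toy_lock_init(struct toy_lock *lk)
{
	lk->owner = TOY_NOOWNER;
}

/* Request the lock exclusively on behalf of process `pid`. */
static enum toy_result
toy_lock_acquire(struct toy_lock *lk, int pid)
{
	assert(pid >= 0);
	if (lk->owner == TOY_NOOWNER) {
		lk->owner = pid;
		return TOY_ACQUIRED;
	}
	if (lk->owner == pid) {
		/*
		 * Waiting here could never end, since the only process
		 * able to release the lock is the one asking for it.
		 * A kernel has little choice but to panic at this point.
		 */
		return TOY_SELF_DEADLOCK;
	}
	return TOY_WOULD_SLEEP;
}

/* Release the lock; only the current holder may do so. */
static void
toy_lock_release(struct toy_lock *lk, int pid)
{
	assert(lk->owner == pid);
	lk->owner = TOY_NOOWNER;
}
```

Under that reading the lockmgr panic is a side effect of the first panic,
and the data fault in ffs_nodealloccg is the one worth chasing.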

Script started on Mon Dec 27 10:56:22 1999
panic: kernel fault
syncing disks... panic: lockmgr: locking against myself
Frame pointer is at 0xf9172440
Call traceback:
  pc = 0xf010cf4c [cpu_reboot] args = (0x40000e6, 0x4000fe6, 0x0, 0x0, 0xf9172558, 0x49000e6, 0xf91724a8) fp = 0xf91724a8
  pc = 0xf003caf8 [panic] args = (0x104, 0x0, 0x0, 0xf91724ec, 0xf070af00, 0x16b6b74, 0xf9172510) fp = 0xf9172510
  pc = 0xf002faec [lockmgr] args = (0xf002f6e0, 0x104, 0xf002faa8, 0x2, 0x0, 0x1000, 0xf9172578) fp = 0xf9172578
  pc = 0xf00c8564 [ufs_rmdir] args = (0xf93b5164, 0x10, 0xf93b45b4, 0x0, 0xf9655a18, 0xf0002000, 0xf91725e0) fp = 0xf91725e0
  pc = 0xf005f9c0 [vn_lock] args = (0xf91726a8, 0xf00c8550, 0x78, 0xf07a2f00, 0xf00d9d18, 0x0, 0xf9172648) fp = 0xf9172648
  pc = 0xf0058b9c [vget] args = (0xf93b4528, 0x10012, 0xf002fc98, 0x6, 0x31, 0x0, 0xf91726c0) fp = 0xf91726c0
  pc = 0xf00c11a0 [ffs_sync] args = (0xf93b4528, 0x10012, 0x0, 0x31, 0x2, 0x1, 0xf9172728) fp = 0xf9172728
  pc = 0xf005b390 [sys_sync] args = (0x0, 0x2, 0xf0790700, 0xf016fbe8, 0xf00c10fc, 0xf0174730, 0xf91727a8) fp = 0xf91727a8
  pc = 0xf005a3f4 [vfs_shutdown] args = (0xf016fbe8, 0x0, 0x0, 0xf8e608c0, 0xf0173c00, 0x31, 0xf9172810) fp = 0xf9172810
  pc = 0xf010cf18 [cpu_reboot] args = (0xf016f800, 0xf0002000, 0xf0142c00, 0x0, 0xf9172928, 0x49000e7, 0xf9172878) fp = 0xf9172878
  pc = 0xf003caf8 [panic] args = (0x100, 0x0, 0x0, 0x0, 0xf917299c, 0x49000e0, 0xf91728e0) fp = 0xf91728e0
  pc = 0xf01186b0 [mem_access_fault4m] args = (0xf0118340, 0x100, 0xef3385d9, 0xf91729a8, 0x1e, 0x9c, 0xf9172948) fp = 0xf9172948
  pc = 0xf0006254 [memfault_sun4m] args = (0x0, 0x226, 0xef3385d9, 0xf9172a50, 0xf9171564, 0xf8e5cd20, 0xf91729f0) fp = 0xf91729f0
  pc = 0xf00bcc60 [ffs_nodealloccg] args = (0xff, 0x1f1ed20, 0xef3385d9, 0x90255, 0xffffffff, 0xf80, 0xf9172aa0) fp = 0xf9172aa0
  pc = 0xf00bb6e4 [ffs_hashalloc] args = (0x1f1ed20, 0xc7, 0x0, 0x81b4, 0x2, 0xf07e3500, 0xf9172b10) fp = 0xf9172b10
  pc = 0xf00bb2c0 [ffs_valloc] args = (0xf93b5130, 0xc7, 0xc0c80, 0x81b4, 0xf00bca94, 0xf80, 0xf9172b78) fp = 0xf9172b78
  pc = 0xf00c8e20 [ufs_makeinode] args = (0xf9172c58, 0xf00bb234, 0x98, 0xf9172c54, 0xf07a2f00, 0xf07bce00, 0xf9172bf0) fp = 0xf9172bf0
  pc = 0xf00c609c [ufs_create] args = (0x81b4, 0xf93b4528, 0xf9172e68, 0xf9172e7c, 0x8000, 0xf9172d58, 0xf9172c78) fp = 0xf9172c78
  pc = 0xf005ecf4 [vn_open] args = (0xf9172d40, 0x10, 0xf00c6070, 0xf07a2f00, 0xf013c400, 0xf93b4528, 0xf9172ce0) fp = 0xf9172ce0
  pc = 0xf005bb84 [sys_mknod] args = (0x0, 0x602, 0x1b4, 0x1b4, 0x208, 0x0, 0xf9172dc8) fp = 0xf9172dc8
  pc = 0xf0118a50 [syscall] args = (0x0, 0xf8e70dc0, 0xf9172f20, 0xf005bb08, 0x68, 0x26848, 0xf9172ec0) fp = 0xf9172ec0
  pc = 0xf00064ec [syscall] args = (0x5, 0xf9172fb0, 0x10098898, 0xa, 0x0, 0x14d7, 0xf9172f50) fp = 0xf9172f50
  pc = 0x36d0  args = (0xeffff3a0, 0x601, 0x1b4, 0xeffff3b9, 0x6, 0xffffffff, 0xefffea50) fp = 0xefffea50

dumping to dev 7,17 offset 296291
dump succeeded
rebooting

Resetting ... 
SPARCstation 5, No Keyboard
ROM Rev. 2.15, 256 MB memory installed, Serial #7788135.
Ethernet address 8:0:20:76:d6:67, Host ID: 8076d667.



Initializing Memory 
Boot device: /iommu/sbus/espdma@5,8400000/esp@5,8800000/sd@3,0  File and args: 
 >> NetBSD/sparc Secondary Boot, Revision 1.9
  >> (pk@flambard, Tue Oct 19 11:10:24 MEST 1999)
Booting netbsd
entry: 0x4000, bootinfo: 0x198e50
bootinfo[0]=0x199250; bootinfo[1]=0x198e58
nsym=0x1044c, ssym=0x175cc0, esym=0x198e50
OBP version 3, revision 2.15 (plugin rev 2)
console is ttya
Copyright (c) 1996, 1997, 1998, 1999
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 1.4.1 (NEWS_ZOCALO) #1: Sat Nov 20 13:14:26 PST 1999
    buhrow@news:/usr/src/sys/arch/sparc/compile/NEWS_ZOCALO
real mem = 268029952
avail mem = 252669952
using 1792 buffers containing 7340032 bytes of memory
bootpath: /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000/sd@3,0
mainbus0 (root): SUNW,SPARCstation-5
cpu0 at mainbus0: MB86904 @ 85 MHz, on-chip FPU
cpu0: 16K instruction (32 b/l), 8K data (16 b/l): cache enabled
obio0 at mainbus0
clock0 at obio0 slot 0 offset 0x200000: mk48t08 (eeprom)
timer0 at obio0 slot 0 offset 0xd00000 delay constant 40
zs0 at obio0 slot 0 offset 0x100000 level 12 softpri 6
zstty0 at zs0 channel 0 (console)
zstty1 at zs0 channel 1
zs1 at obio0 slot 0 offset 0x0 level 12 softpri 6
kbd0 at zs1 channel 0
ms0 at zs1 channel 1
slavioconfig at obio0 slot 0 offset 0x800000 not configured
auxreg0 at obio0 slot 0 offset 0x900000
power0 at obio0 slot 0 offset 0x910000 level 2
fdc0 at obio0 slot 0 offset 0x400000 level 11 softpri 4: chip 82077
iommu0 at mainbus0 addr 0x10000000: version 0x4/0x0, page-size 4096, range 64MB
sbus0 at iommu0: clock = 21.250 MHz
dma0 at sbus0 slot 5 offset 0x8400000: rev 2
esp0 at dma0 slot 5 offset 0x8800000 level 4: ESP200, 40MHz, SCSI ID 7
scsibus0 at esp0: 8 targets, 8 luns per target
probe(esp0:1:0): max sync rate 10.00Mb/s
sd1 at scsibus0 targ 1 lun 0: <SEAGATE, ST39102LCSUN9.0G, 0828> SCSI2 0/direct fixed
sd1: 8637MB, 4926 cyl, 27 head, 133 sec, 512 bytes/sect x 17689267 sectors
probe(esp0:2:0): max sync rate 10.00Mb/s
sd2 at scsibus0 targ 2 lun 0: <QUANTUM, FIREBALL_TM2110S, 300X> SCSI2 0/direct fixed
sd2: 2014MB, 6810 cyl, 4 head, 151 sec, 512 bytes/sect x 4124736 sectors
probe(esp0:3:0): max sync rate 10.00Mb/s
sd0 at scsibus0 targ 3 lun 0: <SEAGATE, ST5660N  SUN0535, 0522> SCSI2 0/direct fixed
sd0: 520MB, 3002 cyl, 4 head, 88 sec, 512 bytes/sect x 1065664 sectors
SUNW,bpp at sbus0 slot 5 offset 0xc800000 level 2 (ipl 3) not configured
ledma0 at sbus0 slot 5 offset 0x8400010: rev 2
le0 at ledma0 slot 5 offset 0x8c00000 level 6: address 08:00:20:76:d6:67
le0: 8 receive buffers, 2 transmit buffers
audiocs0 at sbus0 slot 4 offset 0xc000000 level 9: CS4231A
audio0 at audiocs0: full duplex
power-management at sbus0 slot 4 offset 0xa000000 not configured
dma1 at sbus0 slot 1 offset 0x100000: rev 1+
esp1 at sbus0 slot 1 offset 0x200000 level 5: ESP100A, 25MHz, SCSI ID 7
scsibus1 at esp1: 8 targets, 8 luns per target
probe(esp1:0:0): max sync rate 5.00Mb/s
sd3 at scsibus1 targ 0 lun 0: <SEAGATE, ST15150N, 0020> SCSI2 0/direct fixed
sd3: 4101MB, 3712 cyl, 21 head, 107 sec, 512 bytes/sect x 8399448 sectors
dma2 at sbus0 slot 2 offset 0x100000: rev 1+
esp2 at sbus0 slot 2 offset 0x200000 level 5: ESP100A, 25MHz, SCSI ID 7
scsibus2 at esp2: 8 targets, 8 luns per target
probe(esp2:3:0): max sync rate 5.00Mb/s
sd4 at scsibus2 targ 3 lun 0: <SEAGATE, ST19171N, 0024> SCSI2 0/direct fixed
sd4: 8683MB, 5268 cyl, 20 head, 168 sec, 512 bytes/sect x 17783112 sectors
root on sd0a dumps on sd2b
root file system type: ffs
WARNING: ccd0: end of partition `a' exceeds the size of ccd (35320320)
swapctl: adding /dev/sd0b as swap device at priority 0
swapctl: adding /dev/sd2b as swap device at priority 0
Automatic boot in progress: starting file system checks.
/dev/rsd0a: 2888 files, 97467 used, 154788 free (2708 frags, 19010 blocks, 1.1% fragmentation)
/dev/rsd0a: MARKING FILE SYSTEM CLEAN
WARNING: ccd0: end of partition `a' exceeds the size of ccd (35320320)
/dev/rsd2a: BLK(S) MISSING IN BIT MAPS (SALVAGED)
/dev/rsd2a: SUMMARY INFORMATION BAD (SALVAGED)
/dev/rsd2a: 37891 files, 389739 used, 1159577 free (48953 frags, 138828 blocks, 3.2% fragmentation)
/dev/rsd2a: MARKING FILE SYSTEM CLEAN
/dev/rsd3c: 46534 files, 426322 used, 3638816 free (94328 frags, 443061 blocks, 2.3% fragmentation)
/dev/rsd3c: MARKING FILE SYSTEM CLEAN
/dev/rccd0c: DIRECTORY /alt/tv/melrose-place: LENGTH 1040 NOT MULTIPLE OF 512 (ADJUSTED)
/dev/rccd0c: FREE BLK COUNT(S) WRONG IN SUPERBLK (SALVAGED)
/dev/rccd0c: BLK(S) MISSING IN BIT MAPS (SALVAGED)
/dev/rccd0c: SUMMARY INFORMATION BAD (SALVAGED)
/dev/rccd0c: 1001419 files, 10697167 used, 6411024 free (235408 frags, 771952 blocks, 1.4% fragmentation)
/dev/rccd0c: MARKING FILE SYSTEM CLEAN
setting tty flags
starting network
hostname: news
configuring network interfaces: le0.
WARNING: ccd0: end of partition `a' exceeds the size of ccd (35320320)
starting system logger
checking for core dump...
savecore: reboot after panic: kernel fault
savecore: system went down at Mon Dec 27 11:01:35 1999
savecore: writing core to /var/crash/netbsd.101.core
starting rpc daemons: portmap.
starting nfs daemons:.
creating runtime link editor directory cache.
checking quotas: done.
building databases...
clearing /tmp
updating motd.
turning on accounting
standard daemons: update cron.
starting network daemons: inetd.
starting local daemons:Setting autonicetime to 0
Setting kern.maxvnodes to 10832
Starting news services
INND:  PID file exists -- unclean shutdown!
send-mail: mailwrapper: can't open /etc/mailer.conf: No such file or directory
Starting innd.
.
Mon Dec 27 11:19:09 PST 1999
Dec 27 11:19:10 news init: kernel security level changed from 0 to 1
%
%exit
%exit

Script done on Mon Dec 27 12:01:53 1999

>How-To-Repeat:
	
	I believe you'll be able to reproduce the problem on a sun4m machine
by allocating a lot of inodes simultaneously over a period of time.
Deepening your directory structure on the filesystem, as exists on a fairly
complete news server, will help, as it will exercise the hashing algorithm
used to spread used inodes over the cylinder groups.  The more I think
about it, the more I'm convinced this might be some sort of CPU cache
corruption problem, as I'm able to reproduce the problem with a lot of
simultaneous creates, as opposed to simultaneous creates and removes.  I'm
also willing to perform tests on a system with a filesystem exactly suited
to cause this panic.
-Brian
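
A load generator along these lines might serve as a starting point.  It is
only a sketch: the function names, directory naming and counts are all
assumptions, it has not been confirmed to trigger the panic, and it creates
a single directory level per worker rather than the deep tree of a real
news spool.  Each forked worker creates its own directory of empty files,
so many creates are in flight at once.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Create `nfiles` empty files named f0, f1, ... under `dir`. */
static int
make_files(const char *dir, int nfiles)
{
	char path[1024];
	int i, fd, n;

	if (mkdir(dir, 0755) == -1 && errno != EEXIST)
		return -1;
	for (i = 0; i < nfiles; i++) {
		n = snprintf(path, sizeof(path), "%s/f%d", dir, i);
		if (n < 0 || (size_t)n >= sizeof(path))
			return -1;
		fd = open(path, O_WRONLY | O_CREAT, 0644);
		if (fd == -1)
			return -1;
		close(fd);
	}
	return 0;
}

/*
 * Fork `nworkers` children; child w fills `root`/w<w> with `nfiles`
 * files.  Returns 0 if every worker succeeded, -1 otherwise.
 */
static int
create_burst(const char *root, int nworkers, int nfiles)
{
	char dir[512];
	int w, n, status, failed = 0;
	pid_t pid;

	if (mkdir(root, 0755) == -1 && errno != EEXIST)
		return -1;
	for (w = 0; w < nworkers; w++) {
		pid = fork();
		if (pid == -1) {
			failed = 1;
			break;
		}
		if (pid == 0) {
			n = snprintf(dir, sizeof(dir), "%s/w%d", root, w);
			if (n < 0 || (size_t)n >= sizeof(dir))
				_exit(1);
			_exit(make_files(dir, nfiles) == 0 ? 0 : 1);
		}
	}
	/* Reap every child, retrying if a signal interrupts wait(). */
	for (;;) {
		pid = wait(&status);
		if (pid == -1) {
			if (errno == EINTR)
				continue;
			break;
		}
		if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
			failed = 1;
	}
	return failed ? -1 : 0;
}
```

Calling create_burst() repeatedly with a root on the affected FFS
filesystem, for instance with a few dozen workers and several thousand
files each, would approximate the create-only load described above.  On a
machine that really has this bug the run may end in a panic, so it belongs
on a scratch system.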
>Fix:
	
>Audit-Trail:
>Unformatted: