Subject: Re: Possible serious bug in NetBSD-1.6.1_RC2
To: Greg Oster <oster@cs.usask.ca>
From: Brian Buhrow <buhrow@lothlorien.nfbcal.org>
List: current-users
Date: 03/13/2003 03:14:11
	Hello Greg.  Here's another data point.  I believe I've found another
condition which can cause a similar hang.  Because of the panics I've been
getting, and writing about on another thread on the list, I've been running
the parity checker a lot.  This evening, when the machine panicked, it
rebooted, and began to run normally, but after just a few minutes, it hung,
just like when paging was enabled to the raid 5 device.  There was no
paging to the raid 5 device, however, so I wondered what it might be.  Then
I remembered that when this machine starts up, it runs bind, which fires up
about 200 zone transfers for domains I secondary.  So, I suspect that the
combination of creating, modifying, and deleting a large number of small
files while the parity checker is running can lead to the same kind of
starvation condition.

My setup:
/dev/rraid0 with 11 partitions, 5 of them mounted simultaneously.
Softdep is disabled on all filesystems.  The raid is a 3-drive raid5 set.

My guess at how to reproduce it:

1.  Write a script which creates a new file, puts a few hundred bytes in
it, renames it, and then deletes it.

2.  Start the parity checker -- I don't know how to force a check if one
isn't needed, but I bet there's a way. :)

3.  Run about 20 instances of your script, possibly more.  I've not counted
the number of named-xfer's going on at once on this machine, but I believe
it's more than 20, less than 100.
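
The script from step 1 and the parallel runs from step 3 could be sketched
roughly as follows.  This is only an illustration, assuming a Python 3
interpreter is available on the machine; the function names, file names,
and default counts (20 workers, 1000 iterations, 300 bytes) are made up for
the example and are not taken from the setup described above.

```python
import os
import sys


def churn(directory, iterations=1000, payload_size=300):
    """Create a small file, rename it, then delete it, once per iteration."""
    payload = b"x" * payload_size  # "a few hundred bytes"
    pid = os.getpid()              # keeps file names unique per worker
    completed = 0
    for i in range(iterations):
        tmp_path = os.path.join(directory, "churn.%d.%d.tmp" % (pid, i))
        final_path = os.path.join(directory, "churn.%d.%d.dat" % (pid, i))
        with open(tmp_path, "wb") as f:
            f.write(payload)
        os.rename(tmp_path, final_path)
        os.unlink(final_path)
        completed += 1
    return completed


def run_workers(directory, workers=20, iterations=1000):
    """Fork `workers` child processes that each run churn(), then wait."""
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            # Child: do the file churn and exit without returning to the caller.
            try:
                churn(directory, iterations)
            finally:
                os._exit(0)
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)
    return len(pids)


# Hypothetical invocation: python3 churn.py /some/dir/on/the/raid5/set
if __name__ == "__main__" and len(sys.argv) == 2:
    run_workers(sys.argv[1])
```

The directory argument should point at one of the filesystems on the raid 5
set, and the script should be started while the parity check is in progress.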


	My guess is that, before long, you'll get a hang.

-Brian