Subject: Re: Possible serious bug in NetBSD-1.6.1_RC2
To: Greg Oster <firstname.lastname@example.org>
From: Brian Buhrow <email@example.com>
Date: 03/13/2003 03:14:11
Hello Greg. Here's another data point. I believe I've found another
condition which can cause a similar hang. Because of the panics I've been
getting (and writing about in another thread on the list), I've been running
the parity checker a lot. This evening, when the machine panicked, it
rebooted and began to run normally, but after just a few minutes it hung,
just as it did when paging to the raid 5 device was enabled. There was no
paging to the raid 5 device this time, however, so I wondered what the cause might be. Then
I remembered that when this machine starts up, it runs bind, which fires up
about 200 zone transfers for domains I secondary. So, I suspect that the
combination of creating, modifying and deleting a large number of small
files while the parity checker is running can lead to the same kind of
hang.
My configuration is /dev/rraid0 with 11 partitions, 5 of them mounted simultaneously.
Softdep is disabled on all filesystems. The raid is a 3-drive raid5 set.
My guess at how to reproduce it:
1. Write a script which creates a new file, puts a few hundred bytes in
it, renames it, and then deletes it.
2. Start the parity checker. I don't know how to force a check if one
isn't needed, but I bet there's a way. :)
3. Run about 20 instances of your script, possibly more. I haven't counted
how many named-xfer processes run at once on this machine, but I believe
it's more than 20 and fewer than 100.
My guess is that before long you'll get a hang.
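For what it's worth, steps 1 and 3 above could be scripted roughly like the
sketch below. It assumes a Python with os.fork() on the test box; the file
names, the 300-byte payload, the worker and iteration counts, and the
churn.py name are all hypothetical choices, not anything taken from
named-xfer. It does not start the parity check (step 2); that has to be
kicked off separately, e.g. through raidctl(8). I haven't verified that this
synthetic load triggers the hang; it only imitates the zone-transfer file
churn.

```python
import os
import sys


def churn(workdir, worker_id, iterations, payload_size=300):
    """One instance of step 1, repeated: create a file, write a few hundred
    bytes to it, rename it, then delete it."""
    payload = b"x" * payload_size
    for i in range(iterations):
        tmp_path = os.path.join(workdir, "w%d-%d.tmp" % (worker_id, i))
        final_path = os.path.join(workdir, "w%d-%d.zone" % (worker_id, i))
        # No fsync(): this sketch assumes the create/rename/unlink metadata
        # traffic is what matters, as with many small zone files.
        with open(tmp_path, "wb") as f:
            f.write(payload)
        os.rename(tmp_path, final_path)
        os.unlink(final_path)


def run_stress(workdir, workers=20, iterations=1000):
    """Step 3: fork `workers` concurrent churn loops and wait for all of them.
    Returns one exit code per worker (0 = finished, 1 = error, -1 = killed)."""
    pids = []
    for n in range(workers):
        pid = os.fork()
        if pid == 0:
            # Child: run one loop, then leave via os._exit() so the parent's
            # buffered output and cleanup handlers are not run twice.
            try:
                churn(workdir, n, iterations)
                os._exit(0)
            except BaseException:
                os._exit(1)
        pids.append(pid)
    codes = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        codes.append(os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1)
    return codes


if __name__ == "__main__":
    # Usage (hypothetical): python churn.py /path/on/the/raid5/filesystem
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print("worker exit codes:", run_stress(target))
```

Forking separate processes rather than using threads keeps each worker an
independent process, which is closer to a pile of concurrent named-xfer
runs. Bump workers toward 100 if 20 doesn't do it.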