current-users: Re: Possible serious bug in NetBSD-1.6.1

Subject: Re: Possible serious bug in NetBSD-1.6.1_RC2
To: Paul Ripke <stixpjr@ozemail.com.au>
From: Greg Oster <oster@cs.usask.ca>
List: current-users
Date: 03/11/2003 08:06:21

Paul Ripke writes:
> [ Greg, I've CC'ed you in on this, hope you don't mind, 

No problem...

> just in case you've
> missed this thread, here's some more info. And, IMNSHO, an almost 100%
> reproducible test case, if my understanding is anywhere close to the 
> mark. ]
> 
> On Tuesday, Mar 11, 2003, at 21:18 Australia/Sydney, Brian Buhrow wrote:
> 
> > 	Hello Paul.  I believe it is the same bug you encountered.  I've a few
> > observations about the bug, which I hope will be helpful in resolving it.
> >
> > 1.  It's definitely related to the use of the raid5 level device, 
> > including raw partitions.

RAID 5 is the hardest on kernel memory... and from both reports, it sounds 
like the kernel is running out of memory...

> > 2.  the bug exists in NetBSD 1.6, but not in 1.5R, early 2001 code.
> 
> Good to know - this will definitely help, I'm sure.

Interesting.  2001 was a "slow year" for RAIDframe development, so I'm not 
sure why the problem wouldn't have been in 1.5R if it's in 1.6.

> > 3.  To reliably reproduce the hang:
> > A.  Define a swap partition on your raid5 device.
> >
> > B.  Turn that partition onto the system with swapctl.
> >
> > C.  Watch the system go into the deep freezer when you try to link a
> > kernel with debugging symbols turned on and swap is needed.
> 
> Hmm... swap on RAID5... never thought of doing that. Think I've only ever
> seen it mirrored... nup, correction, have seen swap on hardware RAID5 on
> Tru64 with internal PCI RAID controller. OK, back to NetBSD, yes, I can
> see how swap on RAIDframe RAID5 would exacerbate this problem!

There is a maximum amount of kernel memory that a RAID 5 set should use 
(unless it has a leak, but I doubt that).  The actual amount will depend on 
stripe sizes, the number of partial stripe writes, and stuff like that.
 
> > 4.  Softdep exacerbates the problem, but it's not softdep which is to
> > blame here.
> >
> > 	Has Greg indicated whether or not he has any ideas on the matter?
> 
> I'm CC'ing Greg in, I have a hunch he understands the problem, but the
> fix is more a design problem than bug squashing. Greg, correct me if I'm
> wrong...

The RAID code (like some other kernel code) doesn't handle "no memory" 
conditions very gracefully.  Making it handle "no memory" conditions 
gracefully is on my list, but requires some Major Changes to RAIDframe code.

A couple of things to try:
1) Bump up the amount of memory your kernel has using:

 options NKMEMPAGES=8192

in your kernel config file.  (Yes, that should give you *LOTS* of kernel 
memory, but if the problem still happens with that much, then it really is 
more than just an "out of kernel memory" problem.)

2) Add the following option to your kernel config:

 options RAIDOUTSTANDING=3

This will limit the amount of IO going to each RAID set.  

If my guesses are correct, 1) will be more effective than 2), but a kernel 
using 2) should run for quite a bit longer...
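
For reference, here is a minimal sketch of how the two suggestions above 
might look in a kernel config file.  The values are the ones given above; 
the file name MYKERNEL is hypothetical and stands for a copy of whatever 
config you already build from.  Trying one option at a time makes it easier 
to tell which one changed the behaviour.

```
# MYKERNEL (hypothetical name): a copy of your existing kernel config

# Suggestion 1: give the kernel much more memory.
options 	NKMEMPAGES=8192

# Suggestion 2: limit the amount of IO going to each RAID set.
# Left commented out here so that each option can be tested on its own.
#options 	RAIDOUTSTANDING=3
```

After changing the config, the kernel has to be rebuilt and the machine 
rebooted into it before either option takes effect.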

Later...

Greg Oster