current-users: Re: Possible serious bug in NetBSD-1.6.1

Subject: Re: Possible serious bug in NetBSD-1.6.1_RC2
To: Brian Buhrow <buhrow@lothlorien.nfbcal.org>
From: Greg Oster <oster@cs.usask.ca>
List: current-users
Date: 03/11/2003 15:53:44
Brian Buhrow writes:
> 	Hello Greg.  Another thought I had is that since kernel memory is
> sized dynamically according to the total amount of RAM, I may not have seen
> this problem before because the machine I was using had 256MB of RAM.

Ah... yes :)

>  If,
> as you say, and I tend to agree, the raidframe drivers don't exhibit
> much in the way of memory leaks, it makes sense that if the space available
> is sufficient, it will always be sufficient.  

It needs to be sufficient to handle the maximum number of simultaneous 
IO's that have the "worst-case" memory usage patterns.  (In most cases, that 
will be the amount of memory needed to do a "RAIDOUTSTANDING" number of IO's 
per degraded RAID 5, multiplied by the number of RAID sets configured.  
Exactly what that amount will be depends on the number of RAID sets, 
the number of components in those RAID sets, and the RAIDOUTSTANDING value.)
But yes, as long as it has at least that much kernel memory available, it will 
always be happy... 
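
As a rough illustration of the sizing rule above, here is a minimal sketch
in Python.  The per-IO figure and the sample values are placeholders
(assumptions for illustration only, not measurements from RAIDframe), and
the sketch assumes every RAID set has the same per-IO cost, which in
practice depends on the number of components in each set.

```python
def worst_case_kmem_bytes(per_io_bytes, raid_outstanding, num_raid_sets):
    """Estimate worst-case kernel memory needed by the RAID sets.

    per_io_bytes     -- hypothetical worst-case memory for one IO on a
                        degraded RAID 5 set (must be measured; not known here)
    raid_outstanding -- the RAIDOUTSTANDING value (IO's in flight per set)
    num_raid_sets    -- number of RAID sets configured
    """
    if min(per_io_bytes, raid_outstanding, num_raid_sets) < 0:
        raise ValueError("all arguments must be non-negative")
    # Memory for a RAIDOUTSTANDING number of IO's on a single set...
    per_set_bytes = per_io_bytes * raid_outstanding
    # ...multiplied by the number of RAID sets configured.
    return per_set_bytes * num_raid_sets


if __name__ == "__main__":
    # Placeholder inputs: 64 KB per IO, 6 outstanding IO's, 2 RAID sets.
    print(worst_case_kmem_bytes(64 * 1024, 6, 2), "bytes")
```

The result is only a lower bound on what the kernel needs to have free for
RAIDframe; other kernel consumers share the same pool.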

(Note: none of this should be taken to say that I agree with the current 
behaviour of RAIDframe in low-memory conditions.  I don't, and it's a serious 
problem that needs to be fixed.  The fix, however, is non-trivial, and will
take time :( )

> Is there some part of vmstat,
> or some other command, which will tell how much kernel RAM is currently in
> use?

'vmstat -m' :)  (With KMEMSTATS in the kernel, of course...)

> On Mar 11,  2:13pm, Greg Oster wrote:
> } Subject: Re: Possible serious bug in NetBSD-1.6.1_RC2
> } Brian Buhrow writes:
> } > 	Hello Greg.  If I understand your message correctly, then I have a
> } > couple of questions and observations.
> } > 
> } > 1.  According to sysctl, nkmempages is already at 8102.  This is about 33 MB
> } > of memory, if my calculations are correct.  Using the value 8192 would be
> } > about 35MB of memory, not much more than is currently in use.  Is there a
> } > limit to the number of pages I can allocate?  Must it be a power of 2? 
> } 
> } Hmmmmm... On a machine w/ 512MB RAM I see:
> } 
> } oster@merlin-39> sysctl -a | grep kmem
> } vm.nkmempages = 32739
> } 
> } On one with 256MB RAM, I get:
> } cs# sysctl -a | grep kmem
> } vm.nkmempages = 16354
> } 
> } So maybe try 16384 instead of 8192?  (I would have thought 8192 should have 
> } been more than sufficient tho!!)  I'm not sure what the limits are, nor whether
> } the values need to be a power of two or not.
> } 
> } >  In
> } > case it helps with the sizing, right now, under normal operation, the
> } > machine lasts 28-36 hours before it hangs or panics.  If I perform the
> } > exercise I listed in the previous e-mail, it hangs immediately.
> } 
> } I'll try to dig up some time this evening to try to reproduce the problem on 
> } my test box. (It only has 128MB RAM too...)  It's sounding like it's seen
> } mostly (exclusively?) when swap is on a RAID 5 set.
> } 
> } > 2.  I'm not certain, but my guess is that the reason raid5 works under 1.5R
> } > and not 1.6 is not so much due to the changes in raidframe itself, but,
> } > rather, changes in the way kernel memory is managed by uvm.  They may be
> } > only tuning changes, but something definitely changed.
> } [snip]
> } 
> } That sounds suspiciously correct... (depending on how dynamic some kernel 
> } memory allocation might be, it could be that the RAID driver exhausts kernel
> } memory when trying to page out stuff...  I need to do some testing it seems...)
> } 
> } Later...
> } 
> } Greg Oster
> } 
> } 
> >-- End of excerpt from Greg Oster
> 
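
For reference, the nkmempages values quoted in this thread can be converted
to bytes as sketched below.  The 4096-byte page size is an assumption (it is
the i386 page size; the thread does not say which port is in use), so adjust
PAGE_SIZE for other hardware.

```python
PAGE_SIZE = 4096  # bytes per page; assumed (i386), not stated in the thread


def kmem_pages_to_bytes(nkmempages, page_size=PAGE_SIZE):
    """Convert a vm.nkmempages value to bytes."""
    return nkmempages * page_size


def bytes_to_mib(nbytes):
    """Convert bytes to mebibytes (units of 2**20 bytes)."""
    return nbytes / (1024 * 1024)


if __name__ == "__main__":
    # Values that appear in the messages above.
    for pages in (8102, 8192, 16354, 16384, 32739):
        mib = bytes_to_mib(kmem_pages_to_bytes(pages))
        print(f"{pages:6d} pages = {mib:7.2f} MiB")
```

Under that assumption, 8192 pages is exactly 32 MiB and 16384 pages is
exactly 64 MiB.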

Later...

Greg Oster