Subject: Re: PR kern/20191, plea for fix in 1.6.2!
To: Nicolai Stevens <nicolai_stevens@yahoo.com>
From: Greg Oster <oster@cs.usask.ca>
List: tech-kern
Date: 11/16/2003 10:12:00
Nicolai Stevens writes:
> I noticed that the release process for 1.6.2 has begun
> and is in the release candidate stage.  However, it
> doesn't look like there has been any fix committed to
> the branch for the RAID5/low kernel memory lockup
> described in PR 20191.  I seem to be running across
> this crash/hang quite often on one particular
> production server. 

Assuming you need the RAID more than you need softdeps, 
a workaround is to just turn off softdeps.

> There is a yet uncommitted patch
> attached to the PR.  Is there any chance this patch
> can be merged into 1.6.2 before release time?  

No.

On the up-side, I have managed to get a few hours of RAIDframe
hacking in this weekend.  I've categorized the ~200 memory 
allocations (as to whether they can wait, or whether they absolutely 
need to succeed) and have done some preliminary analysis on the 
memory access patterns.  As expected, things are a jumbled mess
and still need more sorting.  But this is progress...

One "easy" solution may be to pre-allocate chunks of memory that can 
be handed out when malloc(..., M_RAIDFRAME, M_NOWAIT) fails.  One 
problem is that this pool would have to be "global" to all RAID sets.
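
To make the idea concrete, here is a rough user-space sketch of such a fallback reserve. None of this is RAIDframe code: the names reserve_alloc(), reserve_free(), try_alloc_nowait(), the chunk count and the chunk size are all made up for illustration, and plain malloc() stands in for the kernel's malloc(..., M_RAIDFRAME, M_NOWAIT). It also does no locking, which a real kernel version would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define RESERVE_CHUNKS     8    /* hypothetical: number of pre-allocated chunks */
#define RESERVE_CHUNK_SIZE 256  /* hypothetical: bytes per chunk */

/* One reserve shared by every RAID set (the "global" pool). */
static union {
    max_align_t   align;                     /* keep chunks suitably aligned */
    unsigned char bytes[RESERVE_CHUNK_SIZE];
} reserve[RESERVE_CHUNKS];
static unsigned char reserve_used[RESERVE_CHUNKS];

/* Test knob: pretend the system is out of memory. */
static int simulate_low_memory;

/* Stand-in for a non-blocking kernel allocation that may fail. */
static void *try_alloc_nowait(size_t size)
{
    return simulate_low_memory ? NULL : malloc(size);
}

/* Try the normal allocator first; fall back to the reserve on failure. */
void *reserve_alloc(size_t size)
{
    void *p = try_alloc_nowait(size);
    if (p != NULL)
        return p;
    if (size > RESERVE_CHUNK_SIZE)
        return NULL;                         /* too big for a reserve chunk */
    for (int i = 0; i < RESERVE_CHUNKS; i++) {
        if (!reserve_used[i]) {
            reserve_used[i] = 1;
            return reserve[i].bytes;
        }
    }
    return NULL;                             /* reserve exhausted as well */
}

/* Return memory to wherever it came from. */
void reserve_free(void *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t base = (uintptr_t)reserve;

    if (addr >= base && addr < base + sizeof(reserve)) {
        size_t off = (size_t)(addr - base);
        assert(off % sizeof(reserve[0]) == 0);   /* must be a chunk start */
        reserve_used[off / sizeof(reserve[0])] = 0;
    } else {
        free(p);
    }
}
```

Note that the reserve can still run dry, so callers that "absolutely need to succeed" would have to be few enough, and short-lived enough, that the chunks get returned before the next one is needed. Sizing that reserve across all RAID sets is exactly the "global" problem mentioned above.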

I still need to analyse the use of pool_get() as well.  It's quite 
possible that the number of elements in the pool is not sufficient 
in low-memory conditions.

Later...

Greg Oster