Re: Problem with raidframe under NetBSD-3 and NetBSD-4

To: buhrow%lothlorien.nfbcal.org@localhost (Brian Buhrow)
Subject: Re: Problem with raidframe under NetBSD-3 and NetBSD-4
From: Greg Oster <oster%cs.usask.ca@localhost>
Date: Sun, 06 Apr 2008 20:00:22 -0600

Brian Buhrow writes:
>       Hello.  Following up on my own message, I can now say it's a memory
> deadlock issue.  If I try removing the swap device from the system, which is
> the b partition of the raid set, and then issue the raidctl -F  component0
> command to get the construction going, I get:
> panic: malloc: out of space in kmem_map
>       
>       Since I assume it's a lot of work to change raidframe to use MALLOC,
> and check to see if it failed, perhaps a reasonable work around, although
> I'd prefer to see a real fix, is to note in the raidctl man page that users
> who are swapping to raid sets may need to attach temporary swap devices to
> their systems when attempting to reconstruct raid sets with large disks.
> I'd also be happy with a kernel message saying that the allocation failed
> and that the construction could not be completed due to a lack of memory.

I think I've tracked this down.... 

rf_reconstruct.c:rf_ContinueReconstructFailedDisk() is going to suspend 
IOs via rf_SuspendNewRequestsAndWait() and will then call
rf_reconutil.c:rf_MakeReconControl().  That, in turn, is going to call
rf_reconmap.c:rf_MakeReconMap() which is going to do this:

RF_Malloc(p->status, num_rus * sizeof(RF_ReconMapListElem_t *), 
          (RF_ReconMapListElem_t **));

For your array, it is going to be asking to malloc() something like:

 1953524992 / 64 * 4 =~ 116MB

which a) is just plain silly and b) is something that malloc() is 
willing to wait for.  This, of course, causes your system to fairly 
quickly grind to a halt since IOs have been stopped and the kernel 
isn't going to get that much memory! :(  

A workaround (untested) might be to bump up SUsPerRU (Stripe Units per 
Reconstruction Unit) to, say, 128...  That'd at least get the above
malloc() down to a less-silly size... (As far as I know this should 
work -- I believe I tested it many years ago, but I know I haven't 
tested it in quite some time...)
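
To put rough numbers on that, here is a small sketch of the arithmetic.
It is not the RAIDframe code: recon_map_bytes() and its parameter names
are hypothetical, and it assumes that 1953524992 is the component size
in sectors, 64 is the number of sectors per stripe unit, 4 is the size
of a pointer on this machine, and that the number of reconstruction
units is the component size divided by (sectors per stripe unit *
SUsPerRU), rounded up.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of RAIDframe): estimate the size of the
 * status array that the RF_Malloc() call above asks for.
 * Assumption: one pointer-sized entry per reconstruction unit.
 */
uint64_t recon_map_bytes(uint64_t disk_sectors, uint64_t sectors_per_su,
                         uint64_t sus_per_ru, uint64_t ptr_size)
{
    /* Sectors covered by a single reconstruction unit. */
    uint64_t ru_sectors = sectors_per_su * sus_per_ru;

    /* Number of reconstruction units, rounded up. */
    uint64_t num_rus = (disk_sectors + ru_sectors - 1) / ru_sectors;

    return num_rus * ptr_size;
}
```

Under those assumptions, recon_map_bytes(1953524992, 64, 1, 4) gives
122095312 bytes (the ~116MB above), while
recon_map_bytes(1953524992, 64, 128, 4) gives 953872 bytes, i.e. a bit
under 1MB.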

The fix is to re-work the reconstruction code so that it doesn't need
to preallocate so much space... that's going to be a major undertaking, 
but one that appears to be necessary :( :(  

Later...

Greg Oster
