NetBSD-Users archive


Re: Problem with raidframe under NetBSD-3 and NetBSD-4



Brian Buhrow writes:
>       Hello Greg.  I think I understand this e-mail.  However, I have a
> question about changing the SUsPerReconUnit value.  Is there a way to do
> this without unconfiguring the raid set and then reconfiguring it?  I can't
> think of a way, but I thought I'd ask.

That's probably the "easiest" way...  Another way is to edit 
sys/dev/raidframe/rf_layout.c:rf_ConfigureLayout() to change:

layoutPtr->SUsPerRU = cfgPtr->SUsPerRU;

to

layoutPtr->SUsPerRU = 128;

Not optimal, I know, but I don't have a better fix at this time.... :(

Later...

Greg Oster

> On Apr 6,  8:00pm, Greg Oster wrote:
> } Subject: Re: Problem with raidframe under NetBSD-3 and NetBSD-4
> } Brian Buhrow writes:
> } >   Hello.  Following up on my own message, I can now say it's a memory
> } > deadlock issue.  If I try removing the swap device from the system, which
> } > is the b partition of the raid set, and then issue the
> } > raidctl -F  component0 command to get the reconstruction going, I get:
> } > panic: malloc: out of space in kmem_map
> } >   
> } >   Since I assume it's a lot of work to change raidframe to use MALLOC,
> } > and check to see if it failed, perhaps a reasonable workaround, although
> } > I'd prefer to see a real fix, is to note in the raidctl man page that
> } > users who are swapping to raid sets may need to attach temporary swap
> } > devices to their systems when attempting to reconstruct raid sets with
> } > large disks.  I'd also be happy with a kernel message saying that the
> } > allocation failed and that the reconstruction could not be completed due
> } > to a lack of memory.
> } 
> } I think I've tracked this down.... 
> } 
> } rf_reconstruct.c:rf_ContinueReconstructFailedDisk() is going to suspend
> } IOs via rf_SuspendNewRequestsAndWait() and will call
> } rf_reconutil.c:rf_MakeReconControl().  That, in turn, is going to call
> } rf_reconmap.c:rf_MakeReconMap() which is going to do this:
> } 
> } RF_Malloc(p->status, num_rus * sizeof(RF_ReconMapListElem_t *), 
> }           (RF_ReconMapListElem_t **));
> } 
> } For your array, it is going to be asking to malloc() something like:
> } 
> }  1953524992 / 64 * 4 =~ 116MB
> } 
> } which a) is just plain silly and b) is something that malloc() is
> } willing to wait for.  This, of course, causes your system to fairly
> } quickly grind to a halt, since IOs have been stopped and the kernel
> } isn't going to get that much memory! :(
> } 
> } A workaround (untested) might be to bump up SUsPerRU (Stripe Units per
> } Reconstruction Unit) to, say, 128...  That'd at least get the above
> } malloc() down to a less-silly size... (As far as I know this should
> } work -- I believe I tested it many years ago, but I know I haven't
> } tested it in quite some time...)
> } 
> } The fix is to re-work the reconstruction code so that it doesn't need
> } to preallocate so much space... that's going to be a major undertaking, 
> } but one that appears to be necessary :( :(  
> } 
> } Later...
> } 
> } Greg Oster
> } 
> } 
> >-- End of excerpt from Greg Oster
> 




