Re: Problem with raidframe under NetBSD-3 and NetBSD-4

To: buhrow%lothlorien.nfbcal.org@localhost (Brian Buhrow)
Subject: Re: Problem with raidframe under NetBSD-3 and NetBSD-4
From: Greg Oster <oster%cs.usask.ca@localhost>
Date: Fri, 11 Apr 2008 10:09:55 -0600

Brian Buhrow writes:
>       Hello Greg.  OK, I've got the problematic box out of production and
> now have a chance to fool around with it.  Here's what I've done so far.
> 
> 1.  I reconfigured the raid set with a raid0.conf file that looks like:
> 
> 
> #Raid Configuration File for asterisk.via.net (2/18/2005)
> #Raid for root partition.
> #Brian Buhrow
> #Describe the size of the array, including spares
> START array
> #numrow numcol numspare
> 1 2 0
> 
> #Disk section
> START disks
> /dev/wd0a
> /dev/wd1a
> 
> #Layout section.  We'll use 64 sectors per stripe unit, 1 stripe unit per 
> #parity unit, 128 stripe units per reconstruction unit, and raid level 1.
> START layout
> #SectperSu SusperParityUnit SusperReconUnit Raid_level
> 64 1 128 1
> 
> #Fifo section.  We'll use 100 outstanding requests as a start.
> START queue
> fifo 100
> 
> #spare section
> #We have no spares in this  raid.
> #START spare
> 
> 
> 2.  I initialized the raid set with raidctl -I and raidctl -i to ensure
> parity was good.
> 
> 3.  I ran raidctl -f /dev/wd1a to fail one of the disks.
> 
> 4.  I then ran
> raidctl -R /dev/wd1a raid0
> I got the following:
> 
> raid0: initiating in-place reconstruction on column 1
> raid0: Recon read failed!
> raid0: reconstruction failed.

Blah...  Unfortunately not the world's most descriptive/useful output :( 

>       I'm now trying with the reconstruction unit set to 256 to see what
> that does.
>       Also, I'm using a NetBSD-4.0-stable kernel for testing, though I don't
> think that makes a huge difference in terms of the raidframe code we're
> talking about.

4.0 should be fine enough for this..

>       Any thoughts?

Yes... It looks like I'm going to actually have to understand what 
this code is doing :-}

(I have no idea why the read would have failed (or thinks it's 
failed)... I need to dig a bit and attempt to figure out the 
implications of changing SUsPerRU... Fortunately I have a weekend 
right around the corner! )

Later...

Greg Oster

> On Apr 8,  1:41pm, Greg Oster wrote:
> } Subject: Re: Problem with raidframe under NetBSD-3 and NetBSD-4
> } Brian Buhrow writes:
> } >   Hello Greg.  I think I understand this e-mail.  However, I have a
> } > question about changing the SusperReconUnit value.  Is there a way to do
> } > this without unconfiguring the raid set and then reconfiguring it?  I can't
> } > think of a way, but I thought I'd ask.
> } 
> } That's probably the "easiest" way...  Another way is to edit 
> } sys/dev/raidframe/rf_layout.c:rf_ConfigureLayout() to change:
> } 
> } layoutPtr->SUsPerRU = cfgPtr->SUsPerRU;
> } 
> } to
> } 
> } layoutPtr->SUsPerRU = 128;
> } 
> } Not optimal, I know, but I don't have a better fix at this time.... :(
> } 
> } Later...
> } 
> } Greg Oster
> } 
> } > On Apr 6,  8:00pm, Greg Oster wrote:
> } > } Subject: Re: Problem with raidframe under NetBSD-3 and NetBSD-4
> } > } Brian Buhrow writes:
> } > } >       Hello.  Following up on my own message, I can now say it's a memory
> } > } > deadlock issue.  If I try removing the swap device from the system, which is
> } > } > the b partition of the raid set, and then issue the raidctl -F component0
> } > } > command to get the reconstruction going, I get:
> } > } > panic: malloc: out of space in kmem_map
> } > } >       
> } > } >       Since I assume it's a lot of work to change raidframe to use MALLOC,
> } > } > and check to see if it failed, perhaps a reasonable workaround, although
> } > } > I'd prefer to see a real fix, is to note in the raidctl man page that users
> } > } > who are swapping to raid sets may need to attach temporary swap devices to
> } > } > their systems when attempting to reconstruct raid sets with large disks.
> } > } > I'd also be happy with a kernel message saying that the allocation failed
> } > } > and that the reconstruction could not be completed due to a lack of memory.
> } > } 
> } > } I think I've tracked this down.... 
> } > } 
> } > } rf_reconstruct.c:rf_ContinueReconstructFailedDisk() is going to suspend 
> } > } IO's via rf_SuspendNewRequestsAndWait() and will call
> } > } rf_reconutil.c:rf_MakeReconControl().  That, in turn, is going to call
> } > } rf_reconmap.c:rf_MakeReconMap() which is going to do this:
> } > } 
> } > } RF_Malloc(p->status, num_rus * sizeof(RF_ReconMapListElem_t *), 
> } > }           (RF_ReconMapListElem_t **));
> } > } 
> } > } For your array, it is going to be asking to malloc() something like:
> } > } 
> } > }  1953524992 / 64 * 4 =~ 116MB
> } > } 
> } > } which a) is just plain silly and b) that malloc() is willing to wait 
> } > } for.  This, of course, causes your system to fairly quickly grind to 
> } > } a halt since IOs have been stopped and the kernel isn't going to get 
> } > } that much memory! :(  
> } > } 
> } > } A workaround (untested) might be to bump up SUsPerRU (StripeUnits per 
> } > } Reconstruction Units) to say 128...  That'd at least get the above
> } > } malloc() down to a less-silly size... (As far as I know this should 
> } > } work -- I believe I tested it many years ago, but I know I haven't 
> } > } tested it in quite some time...)
> } > } 
> } > } The fix is to re-work the reconstruction code so that it doesn't need
> } > } to preallocate so much space... that's going to be a major undertaking,
> } > } but one that appears to be necessary :( :(  
> } > } 
> } > } Later...
> } > } 
> } > } Greg Oster
> } > } 
> } > } 
> } > >-- End of excerpt from Greg Oster
> } > 
> } 
> } 
> } 
> >-- End of excerpt from Greg Oster
>
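
Editor's note: the allocation arithmetic quoted in the Apr 6 message
(1953524992 / 64 * 4 =~ 116MB) can be reproduced with a small user-space
sketch. This is not RAIDframe code. The function names below are
hypothetical, and the sketch assumes that 1953524992 is the component size
in sectors, that pointers are 4 bytes, that the original calculation
corresponds to one stripe unit per reconstruction unit, and that a partial
trailing reconstruction unit is rounded up. The kernel's own computation in
rf_MakeReconMap() may differ in detail.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a RAIDframe function): estimate the size in
 * bytes of an array holding one pointer per reconstruction unit (RU).
 */
uint64_t
recon_status_bytes(uint64_t sectors, uint64_t sect_per_su,
    uint64_t sus_per_ru, uint64_t ptr_size)
{
	uint64_t num_sus = sectors / sect_per_su;

	/* Assumption: round up so a partial trailing RU still gets a slot. */
	uint64_t num_rus = (num_sus + sus_per_ru - 1) / sus_per_ru;

	return num_rus * ptr_size;
}

/* Print the estimate for the SUsPerRU values discussed in this thread. */
void
print_recon_status_sizes(void)
{
	const uint64_t sectors = 1953524992ULL;	/* figure from the message */
	const uint64_t sus_per_ru[] = { 1, 128, 256 };
	unsigned int i;

	for (i = 0; i < sizeof(sus_per_ru) / sizeof(sus_per_ru[0]); i++) {
		uint64_t bytes =
		    recon_status_bytes(sectors, 64, sus_per_ru[i], 4);

		printf("SUsPerRU=%" PRIu64 ": %" PRIu64 " bytes\n",
		    sus_per_ru[i], bytes);
	}
}
```

Under those assumptions, calling print_recon_status_sizes() from a main()
gives 122095312 bytes (about 116MB) for SUsPerRU = 1, and less than 1MB for
SUsPerRU = 128 or 256, which is why raising SUsPerRU was suggested as a
workaround for the oversized malloc().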

