current-users: Re: RAIDFrame and RAID-5

Subject: Re: RAIDFrame and RAID-5
To: NetBSD-current Discussion List <current-users@NetBSD.org>
From: Frederick Bruckman <fredb@immanent.net>
List: current-users
Date: 10/27/2003 15:31:21

On Mon, 27 Oct 2003, Greg A. Woods wrote:

> [ On Wednesday, September 10, 2003 at 07:50:43 (-0600), Greg Oster wrote: ]
> > Subject: Re: RAIDFrame and RAID-5
> >
> > "Thomas Hertz" writes:
> > >
> > > I haven't been able to get a kernel core dump, since the system just
> > > freezes. I have noted that just moments before the system freezes, it's
> > > not possible to start new processes. The already running processes, will
> > > continue to run normally for some minute more. Most of the time the
> > > console prints out a few "cannot allocate Tx mbuf" for the various
> > > network interfaces just before the final freeze.

> I just this morning encountered a system freeze that appears to have
> been caused by RAIDframe.
>
> Oddly enough it recovered all by itself.

> Note this is on my development system running a 1.6.1_STABLE kernel from
> about a month ago.  The system has 320MB of RAM.  I have one RAID-1 set
> and two RAID-5 sets:
>
> 	total memory = 319 MB
> 	avail memory = 275 MB
> 	using 1000 buffers containing 32720 KB of memory

That should give the maximum of 64MB of kvm pages (16K pages of 4K
each), which sounds like plenty, but I suppose it could have become
too fragmented. Hey, you're not swapping to RAID-5, are you? That
configuration is known to cause problems.
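
For what it's worth, here is the arithmetic behind that 64MB figure as
a small Python sketch. The helper name pages_to_mb is made up for
illustration, and the 4K page size is simply the one quoted above.

```python
PAGE_SIZE = 4096  # bytes; the 4K page size mentioned above (an assumption for other ports)


def pages_to_mb(npages, page_size=PAGE_SIZE):
    """Convert a number of kernel memory pages to megabytes."""
    return npages * page_size / (1024 * 1024)


# 16K pages of 4K each
print(pages_to_mb(16 * 1024))
```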

> > You might want to add KMEMSTAT (or whatever it is) to the kernel
> > config, and then do a bunch of "vmstat -m" while causing the machine
> > to crash.  That might indicate whether you're actually out of kernel
> > memory or not....
>
> I can only show you what it looks like now, i.e. a couple of hours after
> it came back to life:

> Memory resource pool statistics
> Name        Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
> phpool        40     6772    0     6128    27     6    21    24     0   inf    0
> pcgpool       76     3195    0     3190     2     1     1     2     0   inf    0
> pmappl        68   880701    0   880597    47    45     2    10     0   inf    0
> pdppl       4096     3455    0     3351  1648  1540   108   583     0   inf    4
> vmsppl       188   880701    0   880597    86    80     6    28     0   inf    0
> vmmpepl       64 24244269    0 24243118   481   460    21    79     0   inf    0
> vmmpekpl      64  1277696    0  1276760    22     6    16    16     0   inf    0
> uaoeltpl      84      164    0      129     1     0     1     1     0   inf    0
> aobjpl        52        1    0        0     1     0     1     1     0   inf    0
> amappl        40 11247213    0 11246520    61    52     9    31     0   inf    0
> mbpl         256    19224 10295   19087   203   186    17    44     1   inf    0
                            ^^^^^
                             |||

Greg ran out of mbufs, all right: the Fail column for mbpl shows
10295 failed requests.

> mclpl       2048     7827    0     7745  1872  1827    45    93     4 16384    4
> sockpl       168   438465    0   438268    83    72    11    29     0   inf    0
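
With a pile of "vmstat -m" snapshots to go through, the non-zero Fail
counts can also be picked out mechanically. A rough Python sketch,
assuming the pool table keeps the twelve-column layout shown above;
failed_pools is a hypothetical helper, not part of any NetBSD tool,
and the sample rows are copied from the quoted output.

```python
def failed_pools(text):
    """Return {pool_name: fail_count} for pools whose Fail column is non-zero."""
    failures = {}
    for raw in text.splitlines():
        line = raw.lstrip("> \t")  # drop mail quoting, if any
        fields = line.split()
        # Data rows have 12 columns: Name Size Requests Fail Releases ...
        # The header row is skipped because its Size field is not a number.
        if len(fields) != 12 or not fields[1].isdigit():
            continue
        fail = int(fields[3])
        if fail > 0:
            failures[fields[0]] = fail
    return failures


SAMPLE = """\
> Name        Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
> amappl        40 11247213    0 11246520    61    52     9    31     0   inf    0
> mbpl         256    19224 10295   19087   203   186    17    44     1   inf    0
> mclpl       2048     7827    0     7745  1872  1827    45    93     4 16384    4
"""

print(failed_pools(SAMPLE))
```

Run against a series of snapshots, this would show whether the
failures keep climbing or were a one-time event.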

What I would try is to increase NKMEMPAGES until the problem isn't
reproducible anymore. I just had to increase NKMEMPAGES to 5000 or
6000 on my 486 to get the ISA ethernet card to configure (from the
calculated default of 4096 for 64MB RAM), even though by the time I
log in to view it, it's hardly using more than 1 MB, so I know that you
need a lot of headroom. This pig has never had more than a month of
uptime before getting weird file system errors that go away on reboot,
so I'm anxious to see if the new configuration does any better.

Frederick