Subject: Re: RAIDFrame and RAID-5
To: None <thomas@hz.se>
From: Greg Oster <oster@cs.usask.ca>
List: current-users
Date: 09/10/2003 07:50:43
"Thomas Hertz" writes:
> > What sorts of problems are you seeing? Panics? Freezes? Can we
> > please get 'dmesg' output, 'raidctl -s' output, raid config
> > files, etc, etc?
>
> I haven't been able to get a kernel core dump, since the system just
> freezes. I have noticed that just moments before the system freezes,
> it's not possible to start new processes. The already-running
> processes will continue to run normally for a minute or so more.
> Most of the time the console prints a few "cannot allocate Tx mbuf"
> messages for the various network interfaces just before the final
> freeze.
Hmm... I wonder which one of the "out of kernel memory" problems
applies to this case...
> It seems (obviously) to be a kernel memory problem. I have
> experimented a little with chunk sizes, and the system will stay up
> a little longer with smaller chunks (it crashes within seconds with
> any chunk size of 256k or larger). Also, cranking vm.nkmempages up
> to 64k (with options NKMEMPAGES) will keep the system running even
> longer.
It could be the case that you are just plain running out of kernel
memory, especially with all the NICs you have in that box!!!!!
You might want to add KMEMSTATS to the kernel config, and then do a
bunch of "vmstat -m" runs while driving the machine towards the
crash.  That might indicate whether you're actually out of kernel
memory or not....
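For reference, that's something like the following in the kernel
config (option names as of the 1.6-era source tree -- do double-check
them against your version):

	options 	NKMEMPAGES=65536	# what vm.nkmempages gets set to
	options 	KMEMSTATS		# per-type stats for "vmstat -m"

and then just run "vmstat -m" every second or so, logging to a file,
until the box locks up -- the last few snapshots should show which
malloc type (if any) is eating the kernel memory.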
I haven't had time to look at this in quite a while, but here's my
take on what else might be going on:
1) The kernel is trying to "do something" and runs out of "free
pages".
2) The pager then runs through a number of pages, and either frees
them outright, or schedules a paging operation on them (e.g. schedules
a write of a page that contains the most recent directory contents or
something).
3) The number of pages "freed outright" (i.e. pages already marked
PG_CLEAN) is very low (or zero).
4) The device being paged to doesn't have a "malloc free" codepath,
and it ends up waiting for free pages to do its thing -- DEADLOCK.
5) The pagedaemon doesn't go looking for any more pages to free up,
since it figures that between what it has already freed and what it
has scheduled to be freed it should have enough to be above its
low-water mark. This is why the system freezes -- the pagedaemon
thinks that it'll be getting more pages soon, and RAIDframe is unable
to provide a path to get those dirty pages paged out. (A rough sketch
of the cycle is below.)
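In rough (pseudo-)C, the cycle looks something like this -- every
name here is invented for illustration, these are not the real
UVM/RAIDframe functions:

/*
 * Pseudocode sketch of the deadlock cycle.  All names invented.
 */
struct page { int flags; };
#define PG_CLEAN	0x0001

extern int free_pages, low_water_mark;
extern struct page *next_inactive_page(void);
extern void free_page(struct page *);
extern void schedule_pageout(struct page *);	/* queues a driver write */
extern void sleep_until_woken(void);
extern void *alloc_waitok(unsigned);	/* i.e. malloc(9) with M_WAITOK */
extern void start_io(void *, struct page *);

/* Pagedaemon side: free what's clean, schedule the rest, then sleep. */
void
pagedaemon_sketch(void)
{
	int scheduled = 0;

	/* Stops as soon as freed + in-flight pages reach the mark... */
	while (free_pages + scheduled < low_water_mark) {
		struct page *pg = next_inactive_page();

		if (pg->flags & PG_CLEAN)
			free_page(pg);		/* step 3: often ~zero of these */
		else {
			schedule_pageout(pg);	/* step 2: needs a driver write */
			scheduled++;
		}
	}
	/* Step 5: ...and then sleeps, counting on those writes to finish. */
	sleep_until_woken();
}

/* Driver side: the scheduled write itself needs kernel memory. */
void
driver_write_sketch(struct page *pg)
{
	/*
	 * Step 4: the only thing that can make memory available is the
	 * pagedaemon -- which is asleep waiting for THIS write to
	 * finish.  Deadlock.
	 */
	void *req = alloc_waitok(64);
	start_io(req, pg);
}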
The "fix" to this (at least for RAIDframe!) is to pre-allocate the
storage needed to do the IOs, and then change all 200+ "RF_Malloc()"
places to request the appropriate memory chunks. This is right on
the top of my RAIDframe TODO list, but that, unfortunatly, is lower
in priority right now than putting flooring in my basement :-/
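Very roughly, the pre-allocation would look like this (invented names
again -- this is just the shape of it, not actual RAIDframe code, and
locking is omitted):

/* Sketch of a pre-allocated request pool; all names made up. */
#define RF_POOL_SIZE	128	/* sized for the worst-case concurrent I/O */

struct io_req {
	struct io_req	*next;
	/* ... per-I/O state ... */
};

static struct io_req	rf_pool[RF_POOL_SIZE];
static struct io_req	*rf_freelist;

/* Fill the freelist once, at configure time, while malloc is still safe. */
void
rf_pool_init(void)
{
	int i;

	for (i = 0; i < RF_POOL_SIZE; i++) {
		rf_pool[i].next = rf_freelist;
		rf_freelist = &rf_pool[i];
	}
}

/*
 * At I/O time: take from the freelist instead of calling RF_Malloc().
 * A NULL return means "wait for one of our own I/Os to complete",
 * which is guaranteed progress -- not "wait for the pagedaemon".
 */
struct io_req *
rf_req_get(void)
{
	struct io_req *req = rf_freelist;

	if (req != NULL)
		rf_freelist = req->next;
	return req;
}

void
rf_req_put(struct io_req *req)
{
	req->next = rf_freelist;
	rf_freelist = req;
}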
In theory, however, this problem can happen for any underlying device
that doesn't have a "malloc free" path... and I believe that some of
the softdep codepaths aren't "malloc free" either. I've been talking
with another guy about a "general solution" to the problem
(basically, that the page cleaner needs to make sure it frees a
certain number (or percentage) of PG_CLEAN pages), and that *should*
reduce or even eliminate the problem in all code paths... I need more
evidence that I'm barking up the right tree though :) (and I haven't
had time to dig far enough :( )
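For what it's worth, that "general solution" amounts to something
like this in the page cleaner loop (pseudocode only, using the same
invented helpers as the earlier sketch, and MIN_CLEAN_PAGES is a
made-up tunable):

#define MIN_CLEAN_PAGES	32	/* made-up tunable */

void
page_cleaner_sketch(int target)
{
	int freed_clean = 0, scheduled = 0;

	/*
	 * As above, the current loop stops once freed + scheduled
	 * reaches the target.  The change: also insist on a minimum
	 * number of pages freed *outright*, so forward progress never
	 * depends solely on writes that may themselves need memory.
	 */
	while (freed_clean < MIN_CLEAN_PAGES ||
	    freed_clean + scheduled < target) {
		struct page *pg = next_inactive_page();

		if (pg == NULL)
			break;		/* nothing left to scan */
		if (pg->flags & PG_CLEAN) {
			free_page(pg);
			freed_clean++;
		} else {
			schedule_pageout(pg);
			scheduled++;
		}
	}
}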
I'm also not sure why I've never encountered this problem on either
my test boxes or my production boxes... I'm guessing that I've never
filled up memory with 99% dirty pages....
> > Both of these are on 1.6 boxes... (Hmm.. I wonder if something has
> > changed since 1.6 that is causing problems in low-kernel-memory
> > conditions...)
>
>
> I have tried running kernels 1.6, 1.6.1 and now 1.6-current (1.6W and
> 1.6Z). They have all given the exact same behaviour!
:( 'vmstat -m' on my main box says that RAIDframe has used at most
957K (3 RAID 1 sets, and 1 RAID 5 set). And I've been building source
trees, making ISOs, dropping 40GB disk images on the set, and generally
not worrying about abusing it, all without a single problem....
Later...
Greg Oster