Subject: Re: LFS and Xen3 testing
To: None <current-users@netbsd.org, perseant@netbsd.org, bouyer@netbsd.org>
From: Blair Sadewitz <blair.sadewitz@gmail.com>
List: current-users
Date: 09/16/2006 22:25:31
This was many months ago (March, maybe), so I apologize that I cannot
be more specific: I was using an LFS filesystem on a ccd at the time
(amd64, most likely 3.99.xx), and I had some lockups that resembled
what is described here (I'm almost positive the cleaner was stuck in
biowait).

I do not want to waste anyone's time or mislead anyone with incorrect
information, but I wanted to add that this may not be just a Xen
issue.
On 9/16/06, Daniel Carosone <dan@geek.com.au> wrote:
> A general update.
>
> On Tue, Sep 05, 2006 at 01:56:44PM +1000, Daniel Carosone wrote:
> > * Sometimes, all disk activity will stop, and something (usually the
> > cleaner) is stuck in biowait. I suspect this to be a Xen issue.
> > Dom0 is linux with LVM2 volumes for the xbd backend, domU is
> > -current a day or two old. It seems most easily (or even only?)
> > triggered when dom0 is busy with CPU-heavy tasks. I saw a commit
> > go by recently that looked promising for something like this, but
> > it doesn't seem to have helped this case.
>
> Consensus seems to be that this is a Xen issue.
>
> Just for clarification: I'm aware of the xen scheduler aspect, but
> it's more than this. The disk really is stuck, and while it might be
> more likely to get stuck when the dom0 is stealing all the cycles,
> it's not just a simple cpu starvation issue: it doesn't get unstuck
> when the dom0 finishes.
>
> > * if I run screen, the screen process takes 100% of the cpu, in state
> > either "lfs sb" or "lfs_ioco", and can't be killed. The cleaner
> > and several other things are then in "lfs segl" and the system gets
> > generally unhappier from there. The whole system (including /tmp)
> > is all on one root lfs, perhaps this is related to screen's socket
> > usage in /tmp?
>
> Fixed with the latest lfs commit regarding mknod, thanks Konrad!
>
> > * the kernel prints "lfs_segwrite: loopcount=2" every so often, and
> > just once or twice "lfs_writeinode: looping count=2".
>
> Still get these.
>
> > * resize_lfs produces an almost instant, repeatable panic trying to
> > shrink a filesystem:
>
> Haven't tried this again, but I now have some damage to the filesystem
> that fsck and the cleaner can't seem to resolve. I assume it may have
> happened as a result of this.
>
> fsck -f produces the same set of errors about unlinked files each time
> it is run, and the cleaner runs constantly and complains about not
> making forward progress in the logs.
>
> I'll copy across to a new disk image, but will keep the old one around
> for a while in case it may contain something interesting.
>
> --
> Dan.
>