current-users: Re: LFS and Xen3 testing

Subject: Re: LFS and Xen3 testing
To: None <current-users@netbsd.org, perseant@netbsd.org, bouyer@netbsd.org>
From: Blair Sadewitz <blair.sadewitz@gmail.com>
List: current-users
Date: 09/16/2006 22:25:31
I apologize that I cannot be more specific, since this was many months
ago (March, maybe): I was using an LFS filesystem on a ccd at the time
(amd64, most likely 3.99.xx), and I had some lockups that resembled
what is described here. I'm almost positive the cleaner was stuck in
biowait.

I do not want to waste anyone's time or mislead anyone with incorrect
information, but I wanted to add that this may not be just a Xen
issue.
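
If anyone wants to check for the same symptom, here is a rough sketch
(my own illustration, not something taken from the reports below) of
listing processes asleep on a given wait channel. It assumes that
`ps -axo pid,wchan,comm` prints a PID, a wait channel and a command
name per line, with `-` when there is no wait channel; please check
ps(1) on your system. The helper names are hypothetical, and wait
channels that contain spaces (such as "lfs segl") will not parse with
this simple splitter.

```python
import subprocess


def snapshot():
    """Return raw ps output (assumed columns: PID, WCHAN, COMMAND)."""
    result = subprocess.run(
        ["ps", "-axo", "pid,wchan,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_ps(text):
    """Turn ps output into a list of (pid, wchan, command) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        # Skip the header and any line that does not have three fields.
        if len(parts) != 3 or not parts[0].isdigit():
            continue
        rows.append((int(parts[0]), parts[1], parts[2].strip()))
    return rows


def stuck_in(rows, wchan="biowait"):
    """Keep only the processes sleeping on the given wait channel."""
    return [row for row in rows if row[1] == wchan]
```

Calling `stuck_in(parse_ps(snapshot()))` a few times, some seconds
apart, should show whether the same PID stays in biowait. A process
that appears there only once is probably just doing ordinary I/O.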



On 9/16/06, Daniel Carosone <dan@geek.com.au> wrote:
> A general update.
>
> On Tue, Sep 05, 2006 at 01:56:44PM +1000, Daniel Carosone wrote:
> >  * Sometimes, all disk activity will stop, and something (usually the
> >    cleaner) is stuck in biowait.  I suspect this to be a Xen issue.
> >    Dom0 is linux with LVM2 volumes for the xbd backend, domU is
> >    -current a day or two old.  It seems most easily (or even only?)
> >    triggered when dom0 is busy with CPU-heavy tasks.  I saw a commit
> >    go by recently that looked promising for something like this, but
> >    it doesn't seem to have helped this case.
>
> Consensus seems to be that this is a Xen issue.
>
> Just for clarification: I'm aware of the xen scheduler aspect, but
> it's more than this.  The disk really is stuck, and while it might be
> more likely to get stuck when the dom0 is stealing all the cycles,
> it's not just a simple cpu starvation issue: it doesn't get unstuck
> when the dom0 finishes.
>
> >  * if I run screen, the screen process takes 100% of the cpu, in state
> >    either "lfs sb" or "lfs_ioco", and can't be killed.  The cleaner
> >    and several other things are then in "lfs segl" and the system gets
> >    generally unhappier from there.  The whole system (including /tmp)
> >    is all on one root lfs; perhaps this is related to screen's socket
> >    usage in /tmp?
>
> Fixed with the latest lfs commit regarding mknod, thanks Konrad!
>
> >  * the kernel prints "lfs_segwrite: loopcount=2" every so often, and
> >    just once or twice "lfs_writeinode: looping count=2".
>
> Still get these.
>
> >  * resize_lfs produces an almost instant, repeatable panic trying to
> >    shrink a filesystem:
>
> Haven't tried this again, but I now have some damage to the filesystem
> that fsck and the cleaner can't seem to resolve.  I assume it may have
> happened as a result of this.
>
> fsck -f produces the same set of errors about unlinked files each time
> it is run, and the cleaner runs constantly and complains about not
> making forward progress in the logs.
>
> I'll copy across to a new disk image, but will keep the old one around
> for a while in case it may contain something interesting.
>
> --
> Dan.
>