Subject: Re: LFS and Xen3 testing
To: None <current-users@netbsd.org, perseant@netbsd.org, bouyer@netbsd.org>
From: Daniel Carosone <dan@geek.com.au>
List: current-users
Date: 09/17/2006 11:06:40

A general update.

On Tue, Sep 05, 2006 at 01:56:44PM +1000, Daniel Carosone wrote:
>  * Sometimes, all disk activity will stop, and something (usually the
>    cleaner) is stuck in biowait.  I suspect this to be a Xen issue.
>    Dom0 is linux with LVM2 volumes for the xbd backend, domU is
>    -current a day or two old.  It seems most easily (or even only?)
>    triggered when dom0 is busy with CPU-heavy tasks.  I saw a commit
>    go by recently that looked promising for something like this, but
>    it doesn't seem to have helped this case.

Consensus seems to be that this is a Xen issue.

Just for clarification: I'm aware of the Xen scheduler aspect, but
it's more than this.  The disk really is stuck, and while it might be
more likely to get stuck when the dom0 is stealing all the cycles,
it's not just a simple CPU starvation issue: it doesn't get unstuck
when the dom0 finishes.

>  * if I run screen, the screen process takes 100% of the cpu, in state
>    either "lfs sb" or "lfs_ioco", and can't be killed.  The cleaner
>    and several other things are then in "lfs segl" and the system gets
>    generally unhappier from there.  The whole system (including /tmp)
>    is all on one root lfs, perhaps this is related to screen's socket
>    usage in /tmp?

Fixed with the latest LFS commit regarding mknod, thanks Konrad!

>  * the kernel prints "lfs_segwrite: loopcount=2" every so often, and
>    just once or twice "lfs_writeinode: looping count=2".

Still get these.

>  * resize_lfs produces an almost instant, repeatable panic trying to
>    shrink a filesystem:

Haven't tried this again, but I now have some damage to the filesystem
that fsck and the cleaner can't seem to resolve.  I assume it may have
happened as a result of this.

fsck -f produces the same set of errors about unlinked files each time
it is run, and the cleaner runs constantly and complains about not
making forward progress in the logs.

I'll copy across to a new disk image, but will keep the old one around
for a while in case it may contain something interesting.

--
Dan.