Subject: Re: LFS and Xen3 testing
To: current-users@netbsd.org, perseant@netbsd.org
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: current-users
Date: 09/10/2006 18:13:10
Hi,
Sorry for the delay in replying to this.

On Tue, Sep 05, 2006 at 01:56:44PM +1000, Daniel Carosone wrote:
> 
> I reinstated my LFS-testing setup from a while ago.  For convenience
> it seemed easier, this time around, to test on a Xen3 domU - but now
> it's not clear to me whether the problems I find are due to LFS or
> Xen.  So, sorry, I'm going to mix together both.
> 
>  * Sometimes, all disk activity will stop, and something (usually the
>    cleaner) is stuck in biowait.  I suspect this to be a Xen issue.
>    Dom0 is linux with LVM2 volumes for the xbd backend, domU is
>    -current a day or two old.  It seems most easily (or even only?) 
>    triggered when dom0 is busy with CPU-heavy tasks.  I saw a commit
>    go by recently that looked promising for something like this, but
>    it doesn't seem to have helped this case.

It may indeed be a Xen issue; see port-xen/34005.

> 
>  * if I run screen, the screen process takes 100% of the cpu, in state
>    either "lfs sb" or "lfs_ioco", and can't be killed.  The cleaner
>    and several other things are then in "lfs segl" and the system gets
>    generally unhappier from there.  The whole system (including /tmp)
>    is all on one root lfs, perhaps this is related to screen's socket
>    usage in /tmp?  It doesn't matter whether screen is run on the xm
>    console or in a sshd pty.  I probably wouldn't have found this if
>    I'd remembered to enable tmpfs in the kernel, and I'll confirm
>    whether that affects the issue.

I do use screen on Xen systems and haven't noticed this issue, so I'm tempted
to blame LFS for this one :)

> 
>  * the kernel prints "lfs_segwrite: loopcount=2" every so often, and
>    just once or twice "lfs_writeinode: looping count=2". This happens
>    every few minutes as the cleaner is running after a crash (via xm
>    destroy) after one of the above.  If this is a diagnostic for
>    something, it seems to be happening here, in case that's
>    interesting.
> 
>  * resize_lfs produces an almost instant, repeatable panic trying to 
>    shrink a filesystem:
> 
>     panic: lfs_rescount
> [...]
>    I recall this appearing to work last time I tried it, but I may not
>    have had DIAGNOSTIC in that kernel, more fool me :)

Note that the XEN3_DOM0 and XEN3_DOMU kernels are built with DIAGNOSTIC and
DEBUG.

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--