current-users: Re: LFS and Xen3 testing

Subject: Re: LFS and Xen3 testing
To: Blair Sadewitz <blair.sadewitz@gmail.com>
From: Luke Crawford <lsc@prgmr.com>
List: current-users
Date: 09/16/2006 19:34:32
Yes, I got very similar issues with a Xen 3.0.2-2 Linux Dom0 with LVM on 
an MD backend.

The NetBSD domU would freeze halfway through the install. I figured I'd 
compile it from source (I had been grabbing a nightly build binary) before 
I spent too much time on it.

Recompiling the NetBSD domU using 3.1rc2 from CVS seemed to fix it. The 
DomU images I compiled and used successfully are here:

http://prgmr.com/~lsc/netbsd-INSTALL_XEN3_DOMU
http://prgmr.com/~lsc/netbsd-XEN3_DOMU

On Sat, 16 Sep 2006, Blair Sadewitz wrote:

> Date: Sat, 16 Sep 2006 22:25:31 -0400
> From: Blair Sadewitz <blair.sadewitz@gmail.com>
> To: current-users@NetBSD.org, perseant@NetBSD.org, bouyer@NetBSD.org
> Subject: Re: LFS and Xen3 testing
> 
> It was many months ago (March, maybe), and so I apologize that I
> cannot be more specific about this: I was using an LFS filesystem on a
> ccd at the time (amd64, 3.99.xx most likely), and I had some lockups
> that resembled what was described here (I'm almost positive the
> cleaner was stuck in biowait).
>
> I do not want to waste anyone's time or mislead anyone with incorrect
> information, but I just wanted to add that it's possible that's not
> just a Xen issue.
>
>
>
> On 9/16/06, Daniel Carosone <dan@geek.com.au> wrote:
>> A general update.
>> 
>> On Tue, Sep 05, 2006 at 01:56:44PM +1000, Daniel Carosone wrote:
>> >  * Sometimes, all disk activity will stop, and something (usually the
>> >    cleaner) is stuck in biowait.  I suspect this to be a Xen issue.
>> >    Dom0 is linux with LVM2 volumes for the xbd backend, domU is
>> >    -current a day or two old.  It seems most easily (or even only?)
>> >    triggered when dom0 is busy with CPU-heavy tasks.  I saw a commit
>> >    go by recently that looked promising for something like this, but
>> >    it doesn't seem to have helped this case.
>> 
>> Consensus seems to be that this is a Xen issue.
>> 
>> Just for clarification: I'm aware of the xen scheduler aspect, but
>> it's more than this.  The disk really is stuck, and while it might be
>> more likely to get stuck when the dom0 is stealing all the cycles,
>> it's not just a simple cpu starvation issue: it doesn't get unstuck
>> when the dom0 finishes.
>> 
>> >  * if I run screen, the screen process takes 100% of the cpu, in state
>> >    either "lfs sb" or "lfs_ioco", and can't be killed.  The cleaner
>> >    and several other things are then in "lfs segl" and the system gets
>> >    generally unhappier from there.  The whole system (including /tmp)
>> >    is all on one root lfs, perhaps this is related to screen's socket
>> >    usage in /tmp?
>> 
>> Fixed with the latest lfs commit regarding mknod, thanks Konrad!
>> 
>> >  * the kernel prints "lfs_segwrite: loopcount=2" every so often, and
>> >    just once or twice "lfs_writeinode: looping count=2".
>> 
>> Still get these.
>> 
>> >  * resize_lfs produces an almost instant, repeatable panic trying to
>> >    shrink a filesystem:
>> 
>> Haven't tried this again, but I now have some damage to the filesystem
>> that fsck and the cleaner can't seem to resolve.  I assume it may have
>> happened as a result of this..
>> 
>> fsck -f produces the same set of errors about unlinked files each time
>> it is run, and the cleaner runs constantly and complains about not
>> making forward progress in the logs.
>> 
>> I'll copy across to a new disk image, but will keep the old one around
>> for a while in case it may contain something interesting.
>> 
>> --
>> Dan.
>> 
>