Subject: Re: LFS and Xen3 testing
To: Blair Sadewitz <blair.sadewitz@gmail.com>
From: Luke Crawford <lsc@prgmr.com>
List: current-users
Date: 09/16/2006 19:34:32
Yes, I saw very similar issues with a Xen 3.0.2-2 Linux Dom0 using an
LVM-on-MD backend.
It would freeze halfway through the install. I figured I'd compile it
from source (I had been grabbing a nightly build binary) before I spent
too much time on it.
Recompiling the NetBSD domU using 3.1rc2 from CVS seemed to fix it. The
domU images I compiled and used successfully are here:
http://prgmr.com/~lsc/netbsd-INSTALL_XEN3_DOMU
http://prgmr.com/~lsc/netbsd-XEN3_DOMU
On Sat, 16 Sep 2006, Blair Sadewitz wrote:
> Date: Sat, 16 Sep 2006 22:25:31 -0400
> From: Blair Sadewitz <blair.sadewitz@gmail.com>
> To: current-users@NetBSD.org, perseant@NetBSD.org, bouyer@NetBSD.org
> Subject: Re: LFS and Xen3 testing
>
> It was many months ago (March, maybe), and so I apologize that I
> cannot be more specific about this: I was using an LFS filesystem on a
> ccd at the time (amd64, 3.99.xx most likely), and I had some lockups
> that resembled what was described here (I'm almost positive the
> cleaner was stuck in biowait).
>
> I do not want to waste anyone's time or mislead anyone with incorrect
> information, but I just wanted to add that it's possible that's not
> just a Xen issue.
>
>
>
> On 9/16/06, Daniel Carosone <dan@geek.com.au> wrote:
>> A general update.
>>
>> On Tue, Sep 05, 2006 at 01:56:44PM +1000, Daniel Carosone wrote:
>> > * Sometimes, all disk activity will stop, and something (usually the
>> > cleaner) is stuck in biowait. I suspect this to be a Xen issue.
>> > Dom0 is linux with LVM2 volumes for the xbd backend, domU is
>> > -current a day or two old. It seems most easily (or even only?)
>> > triggered when dom0 is busy with CPU-heavy tasks. I saw a commit
>> > go by recently that looked promising for something like this, but
>> > it doesn't seem to have helped this case.
>>
>> Consensus seems to be that this is a Xen issue.
>>
>> Just for clarification: I'm aware of the xen scheduler aspect, but
>> it's more than this. The disk really is stuck, and while it might be
>> more likely to get stuck when the dom0 is stealing all the cycles,
>> it's not just a simple cpu starvation issue: it doesn't get unstuck
>> when the dom0 finishes.
>>
>> > * if I run screen, the screen process takes 100% of the cpu, in state
>> > either "lfs sb" or "lfs_ioco", and can't be killed. The cleaner
>> > and several other things are then in "lfs segl" and the system gets
>> > generally unhappier from there. The whole system (including /tmp)
>> > is all on one root lfs, perhaps this is related to screen's socket
>> > usage in /tmp?
>>
>> Fixed with the latest lfs commit regarding mknod, thanks Konrad!
>>
>> > * the kernel prints "lfs_segwrite: loopcount=2" every so often, and
>> > just once or twice "lfs_writeinode: looping count=2".
>>
>> Still get these.
>>
>> > * resize_lfs produces an almost instant, repeatable panic trying to
>> > shrink a filesystem:
>>
>> Haven't tried this again, but I now have some damage to the filesystem
>> that fsck and the cleaner can't seem to resolve. I assume it may have
>> happened as a result of this..
>>
>> fsck -f produces the same set of errors about unlinked files each time
>> it is run, and the cleaner runs constantly and complains about not
>> making forward progress in the logs.
>>
>> I'll copy across to a new disk image, but will keep the old one around
>> for a while in case it may contain something interesting.
>>
>> --
>> Dan.
>>
>