Subject: LFS and Xen3 testing
To: None <current-users@netbsd.org>
From: Daniel Carosone <dan@geek.com.au>
List: current-users
Date: 09/05/2006 13:56:44

I reinstated my LFS-testing setup from a while ago.  For convenience
it seemed easier, this time around, to test on a Xen3 domU - but now
it's not clear to me whether the problems I find are due to LFS or
Xen.  So, sorry, I'm going to mix together both.

 * Sometimes, all disk activity will stop, and something (usually the
   cleaner) is stuck in biowait.  I suspect this to be a Xen issue.
   Dom0 is Linux with LVM2 volumes for the xbd backend; domU is
   -current a day or two old.  It seems most easily (or even only?)
   triggered when dom0 is busy with CPU-heavy tasks.  I saw a commit
   go by recently that looked promising for something like this, but
   it doesn't seem to have helped this case.

 * If I run screen, the screen process takes 100% of the CPU, in state
   either "lfs sb" or "lfs_ioco", and can't be killed.  The cleaner
   and several other things are then in "lfs segl" and the system gets
   generally unhappier from there.  The whole system (including /tmp)
   is on one root LFS; perhaps this is related to screen's socket
   usage in /tmp?  It doesn't matter whether screen is run on the xm
   console or in an sshd pty.  I probably wouldn't have found this if
   I'd remembered to enable tmpfs in the kernel, and I'll confirm
   whether that affects the issue.

 * The kernel prints "lfs_segwrite: loopcount=2" every so often, and
   just once or twice "lfs_writeinode: looping count=2".  This happens
   every few minutes while the cleaner is running after a crash (via
   xm destroy) following one of the problems above.  If this is a
   diagnostic for something, it seems to be happening here, in case
   that's interesting.

 * resize_lfs produces an almost instant, repeatable panic when trying
   to shrink a filesystem:

    panic: lfs_rescount
    Stopped in pid 138.1 (lfs_cleanerd) at  netbsd:cpu_Debugger+0x4:        popl    %ebp
    db> tr
    cpu_Debugger(c03fc581,d3b3aa48,d3b3aa4c,c0282ade,ccadde8c) at netbsd:cpu_Debugger+0x4
    panic(c03f3849,0,0,200,c1f69218) at netbsd:panic+0x155
    lfs_reserve(c1f69000,ccadde8c,0,ffffffb8,cd511900) at netbsd:lfs_reserve+0x2c1
    lfs_create(d3b3aab8,d3b54f50,0,0,1b0713) at netbsd:lfs_create+0x135
    VOP_CREATE(ccadde8c,d3b3abb8,d3b3abcc,d3b3aafc,d3b54f50) at netbsd:VOP_CREATE+0x31
    vn_open(d3b3aba8,602,1a4,d3b41c44,bbbd5000) at netbsd:vn_open+0x274
    sys_open(d3b54f50,d3b3ac48,d3b3ac68,0,bbbd5098) at netbsd:sys_open+0xb6
    syscall_plain() at netbsd:syscall_plain+0xb3
    --- syscall (number 5) ---
    0xbbb206cb:
    db>

   I recall this appearing to work last time I tried it, but I may not
   have had DIAGNOSTIC in that kernel, more fool me :)

--
Dan,
