Subject: Re: problems attempting remote NFS root
To: Neil Ludban <nludban@columbus.rr.com>
From: Christian Limpach <chris@pin.lu>
List: port-xen
Date: 04/12/2004 21:39:12
Hi,

> [1] nfs_boot: root=192.168.1.17:/export/xen31
> [1] pending 00000000 mask1 80000012 mask2 00000012
> [1] nfs_boot_setaddress: sleeping (150)
>
> It hangs here, no ping!!! messages except the one that slipped in
> before IPSec, and no ICMP pings (not responding to ARP requests).

So interrupts work and then it gets stuck in the context switch code
somewhere.  Could you try the following so we can narrow down where it gets
stuck:
in arch/xen/xen/xen_debug.c: change the initialization of xen_once so that
it is set to 1
in arch/xen/i386/locore.S: add a call to xen_dbg0 before the yield trap,
around line 1653:
    jmp idle_start
4:
    call _C_LABEL(xen_dbg0)       # add this
    movl $__HYPERVISOR_yield,%eax
    TRAP_INSTR

Rebuild locore.o, xen_debug.o and vector.o with -DXENDEBUG_LOW.  The easiest
way to do this is to go to the kernel build directory, remove the three
files and run make (probably via the makewrapper) with DBG=-DXENDEBUG_LOW.
(Alternatively, grab the kernel from
ftp://lola.pin.lu/pub/NetBSD/test/netbsd-xen-neil.gz)
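
The rebuild step could be sketched roughly as below.  This is only an
illustration: the function name and its arguments are made up, and it
assumes that a plain make (or whatever wrapper you pass as the second
argument) picks up DBG from the command line in your build directory.

```shell
# Sketch only: rebuild_xen_debug_objs is a hypothetical helper, not part of
# the NetBSD tree.  $1 is the kernel build directory, $2 is the make command
# to run (defaults to plain "make"; pass your makewrapper if you use one).
rebuild_xen_debug_objs() (
    builddir=$1
    make_cmd=${2:-make}

    # Run in a subshell so the caller's working directory is untouched.
    cd "$builddir" || exit 1

    # Remove the three objects so make is forced to recompile them.
    rm -f locore.o xen_debug.o vector.o

    # Rebuild with the low-level Xen debug code enabled.
    $make_cmd DBG=-DXENDEBUG_LOW
)
```

You would call it as rebuild_xen_debug_objs followed by the path to your own
kernel build directory.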

> I disabled tsleep() in nfs_boot_setaddress() just to see what
> happens, and it's a bit different:

Yes, if you disable the tsleep() it won't do a context switch.  If you had a
root filesystem ready it would probably mount it and then hang when
additional kernel threads are created, which happens shortly after the root
filesystem is mounted.

Could you send me the dmesg from when you boot regular NetBSD (or Linux) on
the machine?  I've tried the Xen 1.2 DemoCD on several machines today and
all managed to boot past the tsleep().

    christian