Subject: port-xen/30635: restarting xend causes assertion failure in dom0 pmap
To: port-xen-maintainer@netbsd.org, gnats-admin@netbsd.org
From: jld@panix.com
List: netbsd-bugs
Date: 06/29/2005 21:42:00
>Number:         30635
>Category:       port-xen
>Synopsis:       restarting xend causes assertion failure in dom0 pmap
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-xen-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jun 29 21:42:00 +0000 2005
>Originator:     Jed Davis
>Release:        NetBSD 3.0_BETA (2005-06-28)
>Organization:
PANIX Public Access Internet and UNIX, NYC
>Environment:
System: NetBSD Fairhaven.xlerb.net 3.0_BETA NetBSD 3.0_BETA (XEN0) #3: Wed Jun 29 14:34:58 EDT 2005  jdev@planetarium.xlerb.net:/usr/src/sys/arch/i386/compile/XEN0 i386
Architecture: i386
Machine: i386
>Description:

My dom0 kernel is from the netbsd-3 branch, with xentools-2.0.6nb1.
"/etc/rc.d/xend stop" stops xend, with no output to its log; starting it
again gives this, in xend.log:

  [2005-06-23 12:17:02 xend] INFO (SrvDaemon:610) Xend Daemon started
  [2005-06-23 12:17:02 xend] INFO (console:94) Created console id=14 domain=1 port=9601
  [2005-06-23 12:17:02 xend] DEBUG (XendDomainInfo:1130) Creating vbd dom=1 uname=phy:/dev/ccd0i
  [2005-06-23 12:17:02 xend] DEBUG (XendDomainInfo:1130) Creating vbd dom=1 uname=phy:/dev/wd0g
  [2005-06-23 12:17:02 xend] DEBUG (XendDomainInfo:1107) Creating vif dom=1 vif=0 mac=aa:00:00:5f:ce:de
  [2005-06-23 12:17:02 xend] DEBUG (XendDomainInfo:665) Destroying vifs for domain 1
  [2005-06-23 12:17:02 xend] DEBUG (XendDomainInfo:674) Destroying vbds for domain 1
  [2005-06-23 12:17:02 xend] DEBUG (blkif:552) Destroying blkif domain=1
  [2005-06-23 12:17:02 xend] DEBUG (blkif:408) Destroying vbd domain=1 idx=0
  [2005-06-23 12:17:02 xend] DEBUG (blkif:408) Destroying vbd domain=1 idx=1
  [2005-06-23 12:17:02 xend] DEBUG (XendDomainInfo:634) Closing console, domain 1
  [2005-06-23 12:17:02 xend] DEBUG (XendDomainInfo:622) Closing channel to domain 1

And this on the console:

  xbd backend: detach device ccd0h for domain 1
  xbd backend: detach device wd0g for domain 1

and xend exits.  Trying to start xend again yields, in xend.log:

  [2005-06-23 12:17:47 xend] INFO (SrvDaemon:610) Xend Daemon started

and, on the console, this panic:

  panic: kernel diagnostic assertion "pmap->pm_obj.uo_npages == 0" failed: file "../../../../arch/xen/i386/pmap.c", line 2091
  Stopped in pid 891.1 (python2.3) at     netbsd:cpu_Debugger+0x4:        leave
  cpu_Debugger(cae61298,cae5d804,0,c9be96d4,0) at netbsd:cpu_Debugger+0x4
  panic(c05eaca0,c0569b9d,c057ee75,c05c8c40,82b) at netbsd:panic+0x121
  __main(c0569b9d,c05c8c40,82b,c057ee75,c072b000) at netbsd:__main
  pmap_destroy(c9be96d4,0,cae87dc8,c03c9666,1) at netbsd:pmap_destroy+0x8b
  pmap_load(c04afcb2,cae87e04,bfbfe4ec,70,cae87ef4) at netbsd:pmap_load+0x2d9
  copyout(cae87ef4,cae5d92c,cae87ecc,ca9df800,ca9d8d00) at netbsd:copyout+0xe
  kpsendsig(cae61298,cae87ef4,cae5d92c,cae87f20,c066b700) at netbsd:kpsendsig+0xbe
  postsig(14,cae87f64,cae87f5c,0,c06256a0) at netbsd:postsig+0x22d
  syscall_plain() at netbsd:syscall_plain+0xec
  --- syscall (number 4) ---
  0xbd9728df:
  ds          0x11
  es          0x11
  fs          0x31
  gs          0x11
  edi         0x1
  esi         0x100
  ebp         0xcae87d28
  ebx         0x1
  edx         0
  ecx         0xfffffffe
  eax         0x20ae
  eip         0xc03c1bf0  cpu_Debugger+0x4
  cs          0x9
  eflags      0x1202
  esp         0xcae87d28
  ss          0x11
  netbsd:cpu_Debugger+0x4:        leave
  Stopped in pid 891.1 (python2.3) at     netbsd:cpu_Debugger+0x4:        leave

That panic has also been seen to happen in the context of other
processes (e.g., init).

>How-To-Repeat:

Start at least one domU, then run "/etc/rc.d/xend restart"; if that
does not panic the dom0, run "/etc/rc.d/xend start" again afterwards.
This doesn't always trigger the panic, for reasons I don't begin to
understand, but I do have at least one system where I can reliably
reproduce the problem that way.
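
For illustration, the steps above could be scripted roughly as follows.
This is only a sketch and is not taken from the original report: the
names DRY_RUN, run and repro are hypothetical, the 5-second pause is an
arbitrary guess, and it assumes a domU is already running.  By default
it only prints the commands; set DRY_RUN=0 on a disposable dom0 to run
them for real, since doing so may panic the kernel.

```shell
#!/bin/sh
# Sketch of the reproduction steps (hypothetical helper, not from the report).
# Assumption: at least one domU has already been started on this dom0.
# DRY_RUN defaults to 1, so the commands are printed rather than executed.
DRY_RUN=${DRY_RUN:-1}
XEND=/etc/rc.d/xend

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

repro() {
    run "$XEND" restart   # stop xend, then start it again
    run sleep 5           # arbitrary pause (assumption) before the next start
    run "$XEND" start     # second start, where the panic was observed
}

repro
```

Whether the second start is needed seems to vary from run to run, so
the script always issues it.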

>Fix: