Port-xen archive


Re: Dom0 PAE panic when starting xend



On Wednesday 18 February 2009 17:31:47 Christoph Egger wrote:
> On Wednesday 18 February 2009 14:57:49 Christoph Egger wrote:
> > Hi,
> >
> >
> > I can boot i386 Dom0 PAE with Xen 3.3.1.
> > When I launch xend, I get a panic:
> >
> > # xend start
> > Feb 18 13:22:59 fricka xenstored: Checking store ...
> > Feb 18 13:23:02 fricka xenstored: Checking store complete.
> > (XEN) mm.c:1777:d0 Error pfn 55555: rd=ff2b8100, od=00000000,
> > caf=00000000, taf=00000000
> > (XEN) mm.c:708:d0 Error getting mfn 55555 (pfn 55555555) from L1 entry
> > 0000000055555067 for dom0
> > xpq_flush_queue: 1 entries
> > 0x0000000102fd1608: 0x0000000055555067
> > panic: HYPERVISOR_mmu_update failed
> >
> > fatal breakpoint trap in supervisor mode
> > trap type 1 code 0 eip c02125c4 cs 9 eflags 246 cr2 bb6c1800 ilevel 6
> > Stopped in pid 415.1 (xenstored) at     netbsd:breakpoint+0x4:  popl %ebp
> > db> bt
> > breakpoint(c09987fe,cdfe9988,c09b7080,c05fce0b,c099f49d,5,0,0,cdfe9998,ffffffea) at
> > netbsd:breakpoint+0x4
> > panic(c099f4b3,2fd1608,1,55555067,0,cdfe99ac,0,c07619c6,cabfd484,c0a9105c) at
> > netbsd:panic+0x1a4
> > xpq_update_foreign(2fd1608,1,55555067,0,cdb6bce0,0,cdfe9a1c,c05f5b0c,cdb48910,cdb6bce0) at
> > netbsd:xpq_update_foreign
> > pmap_enter_ma(cda80934,bb6c1000,55555000,0,55555000,0,3,23,7ff0,3) at
> > netbsd:pmap_enter_ma+0x580
> > pmap_enter(cda80934,bb6c1000,55555000,0,3,23,ce0f7f50,0,c16fa640,bb6c1000) at
> > netbsd:pmap_enter+0xd3
> > udv_fault(cdfe9c70,bb6c1000,cdfe9c30,1,0,1,5,ffffffff,c06fb56b,0) at
> > netbsd:udv_fault+0x491
> > uvm_fault_internal(ca89f4e0,bb6c1000,1,0,0,0,0,cdf9aa20,0,c09f7ff0) at
> > netbsd:uvm_fault_internal+0x8e5
> > trap() at netbsd:trap+0x6e0
> > --- trap (number 6) ---
> > 0x804d53a:
> > db>
> >
> >
> > In Xen, said function in mm.c:1777 is this:
> >
> > int get_page(struct page_info *page, struct domain *domain)
> > {
> >     u32 x, nx, y = page->count_info;
> >     u32 d, nd = page->u.inuse._domain;
> >     u32 _domain = pickle_domptr(domain);
> >
> >     do {
> >         x  = y;
> >         nx = x + 1;
> >         d  = nd;
> >         if ( unlikely((x & PGC_count_mask) == 0) ||  /* Not allocated? */
> >              unlikely((nx & PGC_count_mask) == 0) || /* Count overflow? */
> >              unlikely(d != _domain) )                /* Wrong owner? */
> >         {
> >             if ( !_shadow_mode_refcounts(domain) && !domain->is_dying )
> >                 gdprintk(XENLOG_INFO,
> >                          "Error pfn %lx: rd=%p, od=%p, caf=%08x, taf=%"
> >                          PRtype_info "\n",
> >                          page_to_mfn(page), domain, unpickle_domptr(d),
> >                          x, page->u.inuse.type_info);
> >             return 0;
> >         }
> >         asm volatile (
> >             LOCK_PREFIX "cmpxchg8b %2"
> >             : "=d" (nd), "=a" (y),
> >               "=m" (*(volatile u64 *)(&page->count_info))
> >             : "0" (d), "1" (x), "c" (d), "b" (nx) );
> >     }
> >     while ( unlikely(nd != d) || unlikely(y != x) );
> >
> >     return 1;
> > }
> >
> > I added additional debug output to see why get_page()
> > returns 0:
> >
> > (XEN) get_page: (x & PGC_count_mask) = 0
> > (XEN) get_page: (nx & PGC_count_mask) = 1
> > (XEN) get_page: wrong owner
> >
> > So the accessed page a) is not allocated (its reference count is zero),
> > b) does not overflow, and c) doesn't belong to Dom0.
> >
> > I added a BUG(); right before 'return 0;' to get a backtrace:
> >
> > (XEN) Xen call trace:
> > (XEN)    [<ff13d169>] get_page+0x11e/0x15a
> > (XEN)    [<ff13b5d2>] get_page_from_l1e+0x284/0x43f
> > (XEN)    [<ff13c98a>] mod_l1_entry+0x3c5/0x4a3
> > (XEN)    [<ff13eb74>] do_mmu_update+0x44d/0x76a
> > (XEN)    [<ff1a58a8>] hypercall+0xb8/0xd8
> >
> > I'm not sure whether I hit a bug in Xen or in NetBSD/Xen.
>
> I figured out that it is xenstored that triggers the issue.
> Starting xenstored manually triggers it.
> Looking into the source, I found a bunch of undocumented
> options. xenstored -D  skips some domain initialization code
> and this does NOT trigger the issue. Interesting...


It happens when xenstored starts up. In xenstored_domain.c:domain_can_read(),
a page fault happens when conn->domain->interface is accessed.
Accessing conn and conn->domain does not trigger a page fault.
While handling the page fault, Dom0 tries to map the page via an mmu hypercall.


PFN 55555 is obviously bogus. It gets used as poison in machine-to-phys and
phys-to-machine tables in some cases. domain->interface should have been
mapped at setup time by a call to xc_map_foreign_range() in
xenstored_domain.c:do_introduce(). An attempt to demand-map the interface
page probably means something went wrong earlier.

jym: Does your save/restore/migration work include fixes related to the
machine-to-phys / phys-to-machine tables?
If so, can you commit them, please?

Christoph

