Subject: Re: bouyer-xenamd64 merge (xen roadmap)
To: Adam Hamsik <haaaad@gmail.com>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: port-xen
Date: 11/20/2007 22:40:48
On Tue, Nov 20, 2007 at 10:03:45PM +0100, Adam Hamsik wrote:
> >>
> >Of course I updated my source tree since I built this kernel, so the stack
> >trace isn't so useful.
> >Can you try again with the kernel I've put at
> >ftp://asim.lip6.fr/outgoing/bouyer/amd64/netbsd-INSTALL_XEN3_DOMU.gz
> >
> >thanks
> >
> >BTW, what is your hardware, Xen version and domU config ?
> >
> 
> This machine had some hardware issues with memory. That could be the source
> of my problems. I thought they were resolved, but nobody knows.

Well, I had some issues with my devel box too, which seem to be showing up
again. But if 2 systems have hardware issues, maybe it's not hardware?
I'll run memtest tomorrow; it was successful at pointing out the issue
last time. In my case, I also get panics in the hypervisor itself, at
various places, so I suspect it's really hardware in my case.
> 
> 
> hardware:
> 
> Processor is:
> 
>  cat /proc/cpuinfo
> processor       : 0
> vendor_id       : AuthenticAMD
> cpu family      : 15
> model           : 75
> model name      : AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
> stepping        : 2
> cpu MHz         : 2000.000
> cache size      : 512 KB
> physical id     : 0
> siblings        : 1
> core id         : 0
> cpu cores       : 1
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 1
> wp              : yes
> flags           : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36  
> clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext  
> 3dnow pni cx16 lahf_lm cmp_legacy svm cr8_legacy
> bogomips        : 4038.90
> TLB size        : 1024 4K pages
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 40 bits physical, 48 bits virtual
> power management: ts fid vid ttp tm stc
> 
> Dmesg attached.
> 
> XEN :
>  # xm info
> host                   : xena2
> release                : 2.6.20.4
> version                : #3 SMP Tue Apr 10 18:27:16 Local time zone  
> must be set--see zic
> machine                : x86_64
> nr_cpus                : 2
> nr_nodes               : 1
> sockets_per_node       : 1
> cores_per_socket       : 2
> threads_per_core       : 1
> cpu_mhz                : 2009
> hw_caps                : 178bfbff:ebd3fbff: 
> 00000000:00000010:00002001:00000000:0000001f
> total_memory           : 4031

I'll have to try on a system with that much memory. Hopefully I can get
to this tomorrow.

> free_memory            : 969
> xen_major              : 3
> xen_minor              : 1
> xen_extra              : .0

Same as mine, it seems.

> xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32  
> hvm-3.0-x86_32p hvm-3.0-x86_64
> xen_scheduler          : credit
> xen_pagesize           : 4096
> platform_params        : virt_start=0xffff800000000000
> xen_changeset          : unavailable
> cc_compiler            : gcc version 4.1.1 (Gentoo 4.1.1-r3)
> cc_compile_by          : root
> cc_compile_domain      : at.fiit.stuba.sk
> cc_compile_date        : Wed Jul  4 09:56:13 CEST 2007
> xend_config_format     : 4
> 
> DomU config attached.
> 
> I got this panic with your new kernel.
> 
> Kernelized RAIDframe activated
>      Status: Finishedpanic: HYPERVISOR_mmu_update failed
>     Command: /sbin/dhclient -q -pf /tmp/dhclnt.pid -lf /tmp/ 
> dhclient.leases xenn
> Stopped in pid 34.1 (dhclient) at       0xffffffff8026d9b9:     ret

So it did boot and started some programs

> db> bt
> ?() at 0xffffffff8026d9b9

breakpoint()

> ?() at 0xffffffff8027b1f9
xpq_flush_queue()

> ?() at 0xffffffff80274954
pmap_map_ptes() (pmap_pte_flush)

> ?() at 0xffffffff802767c5
pmap_do_remove()

> ?() at 0xffffffff801c6a09
uvm_unmap_remove()

> ?() at 0xffffffff801ca692
uvmspace_free()

> ?() at 0xffffffff801f1408
exit1()

> ?() at 0xffffffff801ff0ba
sigexit()

> ?() at 0xffffffff80200526
postsig()

> ?() at 0xffffffff801f607c
lwp_userret()

> ?() at 0xffffffff8027835d
child_return()
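
The names written under each address in the trace come from matching the
address against the kernel's symbol table: the function is the nearest
symbol that starts at or below the address. A minimal sketch of that lookup,
not necessarily how it was done here, assuming a listing in the
"address type name" format that `nm -n` prints. The start addresses in the
sample table are invented for illustration and are not taken from this
kernel; only the function names and the three looked-up addresses come from
the trace.

```python
import bisect


def parse_nm(text):
    """Parse "<hex addr> <type> <name>" lines into a list sorted by address."""
    syms = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines and undefined symbols (no address)
        addr, _type, name = parts
        syms.append((int(addr, 16), name))
    syms.sort()
    return syms


def resolve(syms, addr):
    """Return "name+0xoff" for the nearest symbol at or below addr, else None."""
    addrs = [a for a, _ in syms]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # address is below the first known symbol
    base, name = syms[i]
    return f"{name}+{addr - base:#x}"


# Hypothetical symbol table: names from the trace, start addresses invented.
SAMPLE_NM = """\
ffffffff8026d9b0 T breakpoint
ffffffff8027b100 T xpq_flush_queue
ffffffff80276500 t pmap_do_remove
"""

if __name__ == "__main__":
    syms = parse_nm(SAMPLE_NM)
    for a in (0xffffffff8026d9b9, 0xffffffff8027b1f9, 0xffffffff802767c5):
        print(f"{a:#x} -> {resolve(syms, a)}")
```

With a real kernel, the sample table would be replaced by the symbol listing
of the exact kernel binary that panicked; a listing from a different build
gives wrong names, which is why a trace taken against an updated source tree
is of limited use.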

> 
> I got this panic only when I run dhclient from sysinst. If I configure  
> network from /bin/sh and then run sysinst everything works fine. I  

I didn't try DHCP in my domU, only fixed network configs.
I just tested and got the same panic as you did, with the same backtrace.
I'll probably be able to debug this tomorrow then ...

> have installed the system with this little hack and tested your domU kernel
> from yesterday. I got this panic with it.
> 
> Starting file system checks:
> uvm_fault(0xffffa00009931758, 0x0, 1) -> e
> kernel: page fault trap, code=0
> Stopped in pid 17.1 (mount_ffs) at      0xffffffff80245a23:      

This is a null pointer dereference. I didn't see this one.
I'll see if I can get something from the trace.
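
The null pointer reading comes from the second field of the uvm_fault line,
which is 0x0. A minimal sketch of pulling the fields out of such a line,
assuming they are (map pointer, faulting address, access type) followed by
the error code printed in hex; the helper name and the "below one page"
test for a null dereference are choices made for this example, with the
page size taken from the xen_pagesize value in the xm info output.

```python
import re

# Matches e.g. "uvm_fault(0xffffa00009931758, 0x0, 1) -> e"
FAULT_RE = re.compile(
    r"uvm_fault\((0x[0-9a-fA-F]+),\s*(0x[0-9a-fA-F]+),\s*(\d+)\)"
    r"\s*->\s*([0-9a-fA-F]+)"
)

PAGE_SIZE = 4096  # assumption: same as xen_pagesize reported by xm info


def parse_uvm_fault(line):
    """Return the fields of a uvm_fault console line, or None if no match."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    map_ptr, va, access, err = m.groups()
    va = int(va, 16)
    return {
        "map": int(map_ptr, 16),
        "va": va,
        "access": int(access),
        "error": int(err, 16),       # assumed to be printed in hex
        "null_deref": va < PAGE_SIZE,  # fault inside the first page
    }


if __name__ == "__main__":
    print(parse_uvm_fault("uvm_fault(0xffffa00009931758, 0x0, 1) -> e"))
```

Under those assumptions the line above decodes to a fault at address 0 with
error 14, which is EFAULT on NetBSD.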

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--