Port-xen archive


Re: NetBSD DomU MP freeze under Linux Dom0



On Tue, Sep 18, 2012 at 12:06:37PM +0200, Roger Pau Monne wrote:
> > Do you have a way to know what hypercall thread 4 is doing ?
> > it looks like it's doing a hypercall with the kernel_lock held,
> > and this hypercall blocks.
> 
> I'm not so sure this is related to Xen, I've been trying to debug this,
> in the case above the hypercall was a do_console_io, but I've been
> having a lot more of these crashes, and they all seem to be related to
> the filesystem (probably related to the bug that I've emailed to
> tech-kern "Panic when deleting large number of files inside DomU").

I've seen what looks like the same problem on a non-Xen system.

> 
> Here is another crash, this time the hypercall is a do_sched_op_compat:

I'm not sure I follow: is it a crash or a hang?

> 
> Thread 4:
> 
> #0  0xffffffff801010ca in hypercall_page ()
> #1  0xffffffff807db030 in ?? ()
> #2  0x0000000000000001 in ?? ()
> #3  0xffffffff803b03ee in xenconscn_getc ()
> #4  0xffffffff8013be10 in db_readline ()
> #5  0xffffffff8013c934 in db_read_line ()
> #6  0xffffffff80139eb5 in db_command_loop ()
> #7  0xffffffff8013f43d in db_trap ()
> #8  0xffffffff8013c7da in kdb_trap ()
> #9  0xffffffff8034a525 in trap ()
> #10 0xffffffff8010340f in calltrap ()
> #11 0xffffffff80130bf5 in breakpoint ()
> #12 0xffffffff803172f1 in vpanic ()
> #13 0xffffffff80317410 in panic ()
> #14 0xffffffff803a2ae6 in wapbl_register_deallocation ()
> #15 0xffffffff8015ef1b in ffs_indirtrunc ()
> #16 0xffffffff8015eec2 in ffs_indirtrunc ()
> #17 0xffffffff8015eec2 in ffs_indirtrunc ()
> #18 0xffffffff8016007f in ffs_truncate ()
> #19 0xffffffff803575ef in ufs_inactive ()
> #20 0xffffffff803a817d in VOP_INACTIVE ()
> #21 0xffffffff8039f28c in vrelel ()
> #22 0xffffffff8039c31c in do_sys_stat ()
> #23 0xffffffff8039c3c9 in sys___lstat50 ()
> #24 0xffffffff8032c2e4 in syscall ()
> #25 0xffffffff8010221d in Xsyscall ()
> 
> Thread 3:
> 
> #0  0xffffffff8013c58f in ddb_suspend ()
> #1  0xffffffff8013c898 in ddb_ipi ()
> #2  0xffffffff803abae6 in xen_ipi_ddb ()
> #3  0xffffffff803aba91 in xen_ipi_handler ()
> #4  0xffffffff8014bc9b in evtchn_do_event ()
> #5  0xffffffff801027ed in call_evtchn_do_event ()
> #6  0xffffffff8017b76d in do_hypervisor_callback ()
> #7  0xffffffff80105bae in hypervisor_callback ()
> #8  0x00000000deadbeef in ?? ()
> #9  0x00000000deadbeef in ?? ()
> #10 0x0000000000000000 in ?? ()
> 
> Thread 2:
> 
> #0  0xffffffff8013c58f in ddb_suspend ()
> #1  0xffffffff8013c898 in ddb_ipi ()
> #2  0xffffffff803abae6 in xen_ipi_ddb ()
> #3  0xffffffff803aba91 in xen_ipi_handler ()
> #4  0xffffffff8014bc9b in evtchn_do_event ()
> #5  0xffffffff801027ed in call_evtchn_do_event ()
> #6  0xffffffff8017b76d in do_hypervisor_callback ()
> #7  0xffffffff80105bae in hypervisor_callback ()
> #8  0x00000000deadbeef in ?? ()
> #9  0x00000000deadbeef in ?? ()
> #10 0x0000000000000000 in ?? ()
> 
> Thread 1:
> 
> #0  0xffffffff8013c58f in ddb_suspend ()
> #1  0xffffffff8013c898 in ddb_ipi ()
> #2  0xffffffff803abae6 in xen_ipi_ddb ()
> #3  0xffffffff803aba91 in xen_ipi_handler ()
> #4  0xffffffff8014bc9b in evtchn_do_event ()
> #5  0xffffffff801027ed in call_evtchn_do_event ()
> #6  0xffffffff8017b76d in do_hypervisor_callback ()
> #7  0xffffffff80105bae in hypervisor_callback ()
> #8  0x00000000deadbeef in ?? ()
> #9  0x00000000deadbeef in ?? ()
> #10 0x0000000000000000 in ?? ()
> 
> This time I was able to get a ddb session also, here is the output:
> 
> panic: wapbl_register_deallocation: out of resources
> fatal breakpoint trap in supervisor mode
> trap type 1 code 0 rip ffffffff80130bf5 cs e030 rflags 246 cr2
> 7f7ff7b1f000 cpl 0 rsp ffffa0005b03b490
> Stopped in pid 1425.1 (find) at netbsd:breakpoint+0x5:  leave
> breakpoint() at netbsd:breakpoint+0x5
> vpanic() at netbsd:vpanic+0x1f2
> printf_nolog() at netbsd:printf_nolog
> wapbl_register_inode() at netbsd:wapbl_register_inode

The system has panicked with a clear message. This is a recurring issue
with WAPBL ...

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 years of experience will always make the difference
--

