Re: NetBSD DomU MP freeze under Linux Dom0

To: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
Subject: Re: NetBSD DomU MP freeze under Linux Dom0
From: Roger Pau Monne <roger.pau%citrix.com@localhost>
Date: Tue, 18 Sep 2012 12:16:23 +0200

Manuel Bouyer wrote:
> On Tue, Sep 18, 2012 at 12:06:37PM +0200, Roger Pau Monne wrote:
>>> Do you have a way to know which hypercall thread 4 is doing?
>>> It looks like it's doing a hypercall with the kernel_lock held,
>>> and this hypercall blocks.
>> I'm not so sure this is related to Xen. I've been trying to debug this;
>> in the case above the hypercall was a do_console_io, but I've been
>> having a lot more of these crashes, and they all seem to be related to
>> the filesystem (probably related to the bug that I've emailed to
>> tech-kern, "Panic when deleting large number of files inside DomU").
> 
> I've seen what looks like the same problem on a non-Xen system.
> 
>> Here is another crash, this time the hypercall is a do_sched_op_compat:
> 
> I'm not sure I follow: is it a crash or a hang?

This is a crash; I was able to get a ddb session. Sorry for not making
that clear.

>> Thread 4:
>>
>> #0  0xffffffff801010ca in hypercall_page ()
>> #1  0xffffffff807db030 in ?? ()
>> #2  0x0000000000000001 in ?? ()
>> #3  0xffffffff803b03ee in xenconscn_getc ()
>> #4  0xffffffff8013be10 in db_readline ()
>> #5  0xffffffff8013c934 in db_read_line ()
>> #6  0xffffffff80139eb5 in db_command_loop ()
>> #7  0xffffffff8013f43d in db_trap ()
>> #8  0xffffffff8013c7da in kdb_trap ()
>> #9  0xffffffff8034a525 in trap ()
>> #10 0xffffffff8010340f in calltrap ()
>> #11 0xffffffff80130bf5 in breakpoint ()
>> #12 0xffffffff803172f1 in vpanic ()
>> #13 0xffffffff80317410 in panic ()
>> #14 0xffffffff803a2ae6 in wapbl_register_deallocation ()
>> #15 0xffffffff8015ef1b in ffs_indirtrunc ()
>> #16 0xffffffff8015eec2 in ffs_indirtrunc ()
>> #17 0xffffffff8015eec2 in ffs_indirtrunc ()
>> #18 0xffffffff8016007f in ffs_truncate ()
>> #19 0xffffffff803575ef in ufs_inactive ()
>> #20 0xffffffff803a817d in VOP_INACTIVE ()
>> #21 0xffffffff8039f28c in vrelel ()
>> #22 0xffffffff8039c31c in do_sys_stat ()
>> #23 0xffffffff8039c3c9 in sys___lstat50 ()
>> #24 0xffffffff8032c2e4 in syscall ()
>> #25 0xffffffff8010221d in Xsyscall ()
>>
>> Thread 3:
>>
>> #0  0xffffffff8013c58f in ddb_suspend ()
>> #1  0xffffffff8013c898 in ddb_ipi ()
>> #2  0xffffffff803abae6 in xen_ipi_ddb ()
>> #3  0xffffffff803aba91 in xen_ipi_handler ()
>> #4  0xffffffff8014bc9b in evtchn_do_event ()
>> #5  0xffffffff801027ed in call_evtchn_do_event ()
>> #6  0xffffffff8017b76d in do_hypervisor_callback ()
>> #7  0xffffffff80105bae in hypervisor_callback ()
>> #8  0x00000000deadbeef in ?? ()
>> #9  0x00000000deadbeef in ?? ()
>> #10 0x0000000000000000 in ?? ()
>>
>> Thread 2:
>>
>> #0  0xffffffff8013c58f in ddb_suspend ()
>> #1  0xffffffff8013c898 in ddb_ipi ()
>> #2  0xffffffff803abae6 in xen_ipi_ddb ()
>> #3  0xffffffff803aba91 in xen_ipi_handler ()
>> #4  0xffffffff8014bc9b in evtchn_do_event ()
>> #5  0xffffffff801027ed in call_evtchn_do_event ()
>> #6  0xffffffff8017b76d in do_hypervisor_callback ()
>> #7  0xffffffff80105bae in hypervisor_callback ()
>> #8  0x00000000deadbeef in ?? ()
>> #9  0x00000000deadbeef in ?? ()
>> #10 0x0000000000000000 in ?? ()
>>
>> Thread 1:
>>
>> #0  0xffffffff8013c58f in ddb_suspend ()
>> #1  0xffffffff8013c898 in ddb_ipi ()
>> #2  0xffffffff803abae6 in xen_ipi_ddb ()
>> #3  0xffffffff803aba91 in xen_ipi_handler ()
>> #4  0xffffffff8014bc9b in evtchn_do_event ()
>> #5  0xffffffff801027ed in call_evtchn_do_event ()
>> #6  0xffffffff8017b76d in do_hypervisor_callback ()
>> #7  0xffffffff80105bae in hypervisor_callback ()
>> #8  0x00000000deadbeef in ?? ()
>> #9  0x00000000deadbeef in ?? ()
>> #10 0x0000000000000000 in ?? ()
>>
>> This time I was also able to get a ddb session; here is the output:
>>
>> panic: wapbl_register_deallocation: out of resources
>> fatal breakpoint trap in supervisor mode
>> trap type 1 code 0 rip ffffffff80130bf5 cs e030 rflags 246 cr2 7f7ff7b1f000 cpl 0 rsp ffffa0005b03b490
>> Stopped in pid 1425.1 (find) at netbsd:breakpoint+0x5:  leave
>> breakpoint() at netbsd:breakpoint+0x5
>> vpanic() at netbsd:vpanic+0x1f2
>> printf_nolog() at netbsd:printf_nolog
>> wapbl_register_inode() at netbsd:wapbl_register_inode
> 
> The system has panicked with a clear message. This is a recurring issue
> with WAPBL ...
>
