Port-xen archive


Re: System hangs during daily jobs



Manuel Bouyer wrote:
> On Fri, Jun 09, 2006 at 07:24:23AM -0700, Jeff Rizzo wrote:
>   
>
> Please note (in case you didn't) that the magic string is +++++ on a
> Xen kernel, and not break (the serial console is managed by xen, so we
> can't see the break).
>   

Doh!  I knew it was probably something like that; I should have looked
harder.  :}  I managed to get a backtrace from the dom0 kernel this time.

>
> Could you try running in UP mode (I think it's 'nosmp' on the xen command
> line, or something like that) and see if it helps ? Next thing to try
> is to run a SMP Xen, but with both domains forced on cpu 0.
>   

I may try that, if I can set up a situation where I can force the crash
at will (since I don't really want to wait 24h each time I tweak
something if I can help it).  Since it seems to happen consistently
during the daily jobs, I will see whether running them from the command
line triggers the hang.
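
One way to run the daily jobs by hand while keeping a record of how far they got is sketched below. It is only an illustration under assumptions: run_job is a hypothetical helper, and the log path is whatever you choose to pass in. The start marker is synced to disk before the job begins, so it should survive if the machine hangs partway through.

```python
import os
import subprocess
import time


def run_job(cmd, log_path):
    """Run cmd, append its combined stdout/stderr to log_path, return the exit status."""
    with open(log_path, "a") as log:
        # Write a start marker and push it to disk before the job begins,
        # so the log shows the job was started even if the system hangs.
        log.write("=== %s: starting %s\n" % (time.ctime(), " ".join(cmd)))
        log.flush()
        os.fsync(log.fileno())
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

On a stock NetBSD install, where the daily script lives at /etc/daily, a call such as run_job(["/bin/sh", "/etc/daily"], "/var/tmp/daily-manual.log") would run it as root outside of cron; the log location here is just an example.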

> Also, you could try using 'q' after ^A^A^A, to see the state of
> domains, and other useful info (the NetBSD dom0 kernel should print
> a few things too, it can be an indication of how hard it's hung)
>
>   


Below is the backtrace from ddb, and the output from the Xen kernel.  (I
don't know anything about the Xen output - I assume the
apparently-interlaced-with-other-stuff output is due to both dom0 and
Xen writing to the console at the same time.)

Stopped at      netbsd:cpu_Debugger+0x4:        leave
db> bt
cpu_Debugger(6dcc1c80,c09e2000,c09e2150,1,10) at netbsd:cpu_Debugger+0x4
xencons_tty_input(c0a6dc00,c055a930,1,10,7) at netbsd:xencons_tty_input+0xa9
xencons_intr(c0a6dc00,c062ab1c,0,c0aec100,0) at netbsd:xencons_intr+0x47
evtchn_do_event(4,c062ab1c,0,ab24,0) at netbsd:evtchn_do_event+0x9f
do_hypervisor_callback(c062ab1c,0,3b9a0011,31,11) at netbsd:do_hypervisor_callback+0xad
hypervisor_callback(c0574c80,0,0,c02f472d,c0575000) at netbsd:hypervisor_callback+0x64
cpu_switch(c0575000,0,cbcd7000,c02a99fe,c054fbc0) at netbsd:cpu_switch+0xd7
ltsleep(c0574c80,4,c04d464f,0,0) at netbsd:ltsleep+0x427
uvm_scheduler(c0573288,0,c0572b18,c04b80d4,c037351c) at netbsd:uvm_scheduler+0xaa
main(c0100177,c010017f,0,0,0) at netbsd:main+0x4f1
db> (XEN) *** Serial input -> Xen (type 'CTRL-a' three times to switch input to DOM0).
(XEN) 'q' pressed -> dumping task queues (now=0x547C:6E183AC1)
(XEN) Xen: DOM 0, CPU 0 [has=T] flags=106d refcnt=2 nr_pages=49135 xenheap_pages=2
(XEN) Shared_info@00be6000: caf=80000003, taf=f0000003
(XEN) Guest: upcall_pend = 00, upcall_mask = 00
(XEN) Notifying guest...
MdXeEbNu) Xegn :eve nDOt
3i_i, CPlUe v1el  0[hxac sci_i=peT]n difngla 0x83g0s= 10ci_0fide prtehf
c1nt=
kr_paegevstchn_u=pc6al5l_pen536ding xe n0 heevatpc_hpan_upcgaell_msa=2s
  (X1 EevN)t chSn_hpared_einfo@0n0dinbdgd0_se00l: caf= 080x0
evtchn00_m003, atsakf =f00ff00ff90503
b(2X ffENf) Gufffefsft f:f uffpfcffaf fllfffff_pffe ndf f= 0f0,f ufpfcff
allffff_mafskfff = ffff 0f0fff ff
f(XEfffN)f f Noftifyifng fguffefsfft .fff..
fffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
evtchn_pending 1410 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


And for good measure (not sure whether it's useful here), here's the dump
of the run queues:
(XEN) Scheduler: Borrowed Virtual Time (bvt)
(XEN) BVT: mcu=0x000186A0ns ctx_allow=0x004C4B40ns NOW=0x000054FDC442960F
(XEN) CPU[00] svt=0x3D1C6C6C QUEUE rq fcffd120   n: fcffc084, p: fcffc278
(XEN)   0: 32767 has=F mcua=10 ev=0xFFFFFFFF av=0xFFFFFFFF c=0x4E2DEE8EE4F9
(XEN)          l: fcffc084 n: fcffc278  p: fcffd120
(XEN)   1: 0 has=T mcua=10 ev=0x3D1C6C6C av=0x3D1C6C6C c=0x6CFDD13C31C
(XEN)          l: fcffc278 n: fcffd120  p: fcffc084
(XEN) CPU[01] svt=0x88F35DCC QUEUE rq fcffd140   n: fcffc214, p: fcffc214
(XEN)   0: 32767 has=T mcua=10 ev=0xFFFFFFFF av=0xFFFFFFFF c=0x3FE7740AFDE1
(XEN)          l: fcffc214 n: fcffd140  p: fcffd140
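
For anyone reading the BVT line, the hex values can be converted to more familiar units. The small sketch below assumes the "ns" suffix really does mean nanoseconds, and that NOW is Xen's system time in nanoseconds since boot (an assumption, not something the output states). Under that reading, mcu is 100000 ns (100 microseconds) and ctx_allow is 5000000 ns (5 ms).

```python
# Values copied from the "(XEN) BVT:" line in the dump above.
# Treating them as nanoseconds is an assumption based on the "ns" suffix.
MCU_NS = 0x000186A0
CTX_ALLOW_NS = 0x004C4B40
# Assumed (not confirmed by the output) to be nanoseconds since Xen booted.
NOW_NS = 0x000054FDC442960F


def ns_to_ms(ns):
    """Convert nanoseconds to milliseconds."""
    return ns / 1_000_000


def ns_to_hours(ns):
    """Convert nanoseconds to hours."""
    return ns / 1_000_000_000 / 3600


print("mcu       = %d ns (%.1f ms)" % (MCU_NS, ns_to_ms(MCU_NS)))
print("ctx_allow = %d ns (%.1f ms)" % (CTX_ALLOW_NS, ns_to_ms(CTX_ALLOW_NS)))
print("NOW       = %d ns (%.2f hours)" % (NOW_NS, ns_to_hours(NOW_NS)))
```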


Unfortunately, I never set up a dump device on this machine, so I can't
get a crash dump.  (Would that even help?)

thanks,

+j




