Port-xen archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: Some strange messages from the kernel



On 06/29/11 13:58, Hugo Silva wrote:
> Hello,
> 
> For my latest project I'm simulating many networks with virtual routers,
> subnets etc.
> 
> The dom0 is crashing occasionally, and I think it may be related to vm
> start/stop activity.
> 
> Here are some strange messages that I've been getting lately:
> 
> xvif17i2: can't map TX grant ref: err 0 status -1
> xvif17i0 GNTTABOP_transfer[0] -1
> xvif17i0: req_prod 256 req_cons 28 rsp_prod 27 rsp_prod_pvt 27 i 1
> 
> 
> At this moment I can't seem to start a NetBSD domU PV, it gets stuck
> during boot, last message is xennet2 being detected. It may be related
> to the above, especially when considering that it was working before,
> and no changes were made to the dom0/domU config.
> 
> Maybe someone with more knowledge of the code could comment on the above
> messages?
> 
> Regards,
> 
> Hugo

The dom0 just crashed. I have some more information.

I now suspect the NetBSD domU crashing is related to storage, not
network. Tried to start the VM without any vifs, it still hung. Then I
realized from the logs that it was only attaching 1 of two virtual disks
(the one it managed to attached is swap space), so I was about to test
running the netbsd/xen install image with and without the 1st virtual
disk in the config (the one that wasn't being attached).

Am I correct to deduce that explains why it "crashed" after xennet2?
That should be when the kernel starts going for the disks.

The machine panicked after I changed the config file, right after I
started it with xm create -c.



There seems to be a pattern in the dmesg output, have a look. The first
entry was logged many hours before the actual panic. rsp_prod etc appear
to keep on increasing. They appear to have started with the "can't map
TX grant ref" error, but I'm not 100% sure.

xvif17i2: can't map TX grant ref: err 0 status -1
xvif17i0 GNTTABOP_transfer[0] -1
xvif17i0: req_prod 256 req_cons 28 rsp_prod 27 rsp_prod_pvt 27 i 1
xvif17i0 GNTTABOP_transfer[0] -1
xvif17i0: req_prod 256 req_cons 29 rsp_prod 28 rsp_prod_pvt 28 i 1
xvif17i0 GNTTABOP_transfer[0] -1
xvif17i0: req_prod 256 req_cons 30 rsp_prod 29 rsp_prod_pvt 29 i 1


<---- many many other messages (the dom0 was operational for a few
hours, while messages like the following kept being logged ---->

xvif17i0: req_prod 256 req_cons 60 rsp_prod 59 rsp_prod_pvt 59 i 1
xvif17i0 GNTTABOP_transfer[0] -1
xvif17i0: req_prod 256 req_cons 61 rsp_prod 60 rsp_prod_pvt 60 i 1


...

xvif17i0: req_prod 256 req_cons 124 rsp_prod 123 rsp_prod_pvt 123 i 1
xvif17i0 GNTTABOP_transfer[0] -1
xvif17i0: req_prod 256 req_cons 125 rsp_prod 124 rsp_prod_pvt 124 i 1
xvif17i0 GNTTABOP_transfer[0] -1
xvif17i0: req_prod 256 req_cons 126 rsp_prod 125 rsp_prod_pvt 125 i 1



Eventually all the way down to when it crashes (where I destroyed the
NetBSD domU PV VM because it appeared to be stuck with xennet2, as
reported in the original email; then I xm create -c domu.conf, and
that's where it panicked:

xbd backend: detach device dm0n for domain 19
xvif19i0: disconnecting
xvif19i1: disconnecting
xvif19i2: disconnecting
xbdback backend/vbd/20/1: can't VOP_OPEN device 0xa907: 16
xbdback backend/vbd/20/1: can't VOP_OPEN device 0xa907: 16
xbd backend: attach device vg0-sslaccel1_s (size 131072) for domain 20
xbd backend 0x1 for domain 20 using event channel 40, protocol x86_64-abi
xbdback backend/vbd/20/2: can't map grant ref: 0/-1
xbdback backend/vbd/20/2: can't map grant ref: 0/-1
xbd backend: detach device dm0n for domain 20
xvif21i0: Ethernet address 00:16:3e:ab:03:03
xbdback backend/vbd/21/1: can't VOP_OPEN device 0xa907: 16
xvif21i1: Ethernet address 00:16:3e:f2:01:03
xvif21i2: Ethernet address 00:16:3e:db:02:03
xbdback backend/vbd/21/1: can't VOP_OPEN device 0xa907: 16
uvm_fault(0xffffffff805fca60, 0xffffa00046e99000, 2) -> e
fatal page fault in supervisor mode
trap type 6 code 3 rip ffffffff8020ecd1 cs e030 rflags 10286 cr2
ffffa00046e990
00 cpl 7 rsp ffffa0003dc10890

Backtrace from ddb:
mi_switch() at netbsd:mi_switch+0x1b4
sleepq_block() at netbsd:sleepq_block+0xdc
cv_timedwait_sig() at netbsd:cv_timedwait_sig+0x98
sbwait() at netbsd:sbwait+0x66
soreceive() at netbsd:soreceive+0xac1
do_sys_recvmsg() at netbsd:do_sys_recvmsg+0x185
sys_recvfrom() at netbsd:sys_recvfrom+0x66
syscall() at netbsd:syscall+0x9b


The process table:
PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
18418    1 3   0         0   ffffa00046a29060      xenstore-read tstile
19556    1 3   0        80   ffffa00046ae7100                 sh pipe_rd
5750 >   1 7   0     40000   ffffa00046b595a0          python2.6
18191    2 3   0        80   ffffa00046c28aa0         xenconsole netio
18191    1 3   0     40080   ffffa000467479c0         xenconsole select
8162     1 3   0        80   ffffa00046b4e160                ksh ttyraw
17557    3 3   0   1000080   ffffa0003d74eb80            qemu-dm aiowork
17557    2 3   0   1000080   ffffa00046b599c0            qemu-dm netio
17557    1 3   0        80   ffffa00046be3620            qemu-dm select
16320    1 3   0        80   ffffa00046a93900             pickup kqueue
22609    3 3   0   1000080   ffffa000467595e0            qemu-dm aiowork
22609    2 3   0   1000080   ffffa00046d3db60            qemu-dm netio
22609    1 3   0        80   ffffa00046998760            qemu-dm select
17413    1 3   0        80   ffffa00046a9c920               tail kqueue
16495    1 3   0        80   ffffa00046cb1300                ksh pause
24732    1 3   0        80   ffffa000468846a0               tmux kqueue
23201    1 3   0        80   ffffa00046a354a0                ksh pause
24773    1 3   0        80   ffffa0003dc35140               sshd select
24242    1 3   0        80   ffffa0003dc20100               sshd netio
14674    1 3   0        80   ffffa00046a3e0a0               ntpd pause
13065    1 3   0        80   ffffa0003d4562e0                ksh ttyraw
13024    1 3   0        80   ffffa00046c2fac0                 su wait
5909     1 3   0        80   ffffa0003ccac5e0              getty ttyraw
3487     1 3   0        80   ffffa0003dc20520                ksh ttyraw
697      1 3   0        80   ffffa0003d91a060                ksh pause
72       1 3   0        80   ffffa0003d88e840                ksh pause
71       1 3   0        80   ffffa0003d89c860                 su wait
926      1 3   0        80   ffffa0003d74e760                ksh pause
539      1 3   0        80   ffffa0003d75fba0               tmux kqueue
1291     1 3   0        80   ffffa0003cbb41a0               qmgr kqueue
230      1 3   0        80   ffffa0003d75f780               cron nanoslp
1245     1 3   0        80   ffffa0003d4e4740             master kqueue
1038     1 3   0        80   ffffa0003d456700               sshd select
534      3 3   0   1000080   ffffa0003d4b1720            qemu-dm aiowork
534      2 3   0   1000080   ffffa0003d456b20            qemu-dm netio
534      1 3   0        80   ffffa0003d42e6e0            qemu-dm select
402      2 3   0        80   ffffa0003ccb9a20        xenbackendd netio
402      1 3   0        80   ffffa0003d1c8640        xenbackendd wait
373      2 3   0        80   ffffa0003d3d1260        xenconsoled netio
373      1 3   0        80   ffffa0003d3dd6a0        xenconsoled select
377    101 5   0         0   ffffa00046cb1b40          python2.6
377    100 3   0        80   ffffa00044ffb220          python2.6 netio
377     86 3   0        80   ffffa0003d4b1b40          python2.6 netio
377     77 3   0        80   ffffa0003dc2c540          python2.6 netio
377      9 3   0        80   ffffa0003d42e2c0          python2.6 netio
377      7 3   0        80   ffffa0003d42eb00          python2.6 select
377      6 3   0        80   ffffa0003d4202a0          python2.6 select
377      5 3   0        80   ffffa0003d4206c0          python2.6 netio
377      4 3   0        80   ffffa0003d420ae0          python2.6 parked
377      3 3   0        80   ffffa0003d3dd280          python2.6 select
377      2 3   0        80   ffffa0003d3ddac0          python2.6 socket
377      1 3   0        80   ffffa0003d3d1680          python2.6 select
379      1 3   0        80   ffffa0003d3b7240          python2.6 wait
372      1 2   0         0   ffffa0003d1c8220          xenstored
362      1 3   0        80   ffffa0003cd24200             pflogd bpf
346      1 3   0        80   ffffa0003ccb9600             pflogd netio
240      1 2   0         0   ffffa0003cd24620            syslogd
1        1 3   0        80   ffffa0003437e920               init wait
0       59 5   0       200   ffffa0003437b4a0           (zombie)
0       54 3   0       200   ffffa0003d3d1aa0        xen_balloon xen_balloon
0       53 3   0       200   ffffa0003ccb91e0              ipmi0 ipmi0
0       52 3   0       200   ffffa0003cd24a40            physiod physiod
0       51 3   0       200   ffffa0003cbb49e0           aiodoned aiodoned
0       50 3   0       200   ffffa0003cbaf5a0            ioflush syncer
0       49 3   0       200   ffffa0003cbaf180           pgdaemon pgdaemon
0       46 3   0       200   ffffa0003cbaf9c0            raidio1 raidiow
0       45 3   0       200   ffffa0003cbae160              raid1 rfnodeq
0       44 3   0       200   ffffa0003cbae580            raidio0 raidiow
0       43 3   0       200   ffffa0003cbae9a0              raid0 rfnodeq
0       42 3   0       200   ffffa0003cbad140           xbdbackd xbdbackd
0       41 3   0       200   ffffa0003cbad560            pfpurge pftm
0       40 3   0       200   ffffa0003cbad980          cryptoret crypto_w
0       39 3   0       200   ffffa0003cba9120           scsibus2 sccomp
0       38 3   0       200   ffffa0003437d4e0           scsibus1 sccomp
0       37 3   0       200   ffffa0003437d900               usb3 usbevt
0       36 3   0       200   ffffa0003437c0a0               usb2 usbevt
0       35 3   0       200   ffffa0003437c4c0               usb1 usbevt
0       34 3   0       200   ffffa0003cba9540         usbtask-dr usbtsk
0       33 3   0       200   ffffa0003cba9960         usbtask-hc usbtsk
0       32 3   0       200   ffffa0003437d0c0               usb0 usbevt
0       31 3   0       200   ffffa0003437f100          atapibus0 sccomp
0       29 3   0       200   ffffa0003437e500              unpgc unpgc
0       28 3   0       200   ffffa0003437e0e0        vmem_rehash vmem_rehash
0       27 3   0       200   ffffa0003437f520             xenbus rdst
0       26 3   0       200   ffffa0003437f940           xenwatch rplq
0       17 3   0       200   ffffa0003437c8e0            atabus0 atath
0       16 3   0       200   ffffa0003437b080           scsibus0 sccomp
0       14 3   0       200   ffffa0003437b8c0               ipmi ipmipoll
0       13 3   0       200   ffffa00034377060             sysmon smtaskq
0       12 3   0       200   ffffa00034377480         pmfsuspend pmfsuspend
0       11 3   0       200   ffffa000343778a0           pmfevent pmfevent
0       10 3   0       200   ffffa00034371040            cachegc cachegc
0        9 3   0       200   ffffa00034371460              vrele vrele
0        8 3   0       200   ffffa00034371880          modunload mod_unld
0        7 3   0       200   ffffa0003436f020            xcall/0 xcall
0        6 1   0       200   ffffa0003436f440          softser/0
0        5 1   0       200   ffffa0003436f860          softclk/0
0        4 1   0       200   ffffa0003436d000          softbio/0
0        3 1   0       200   ffffa0003436d420          softnet/0
0        2 1   0       201   ffffa0003436d840             idle/0
0        1 3   0       200   ffffffff805db720            swapper uvm


Callouts:
    ticks  wheel               arg  func
        2 -1/-256 ffffa00046998760  sleepq_timeout
        1  0/45   ffffa00046be3620  sleepq_timeout
        1  0/45   ffffa0003d42e6e0  sleepq_timeout
        4  0/48   ffffa0003d75fba0  sleepq_timeout
       14  0/58                  0  pffasttimo
       15  0/59   ffffa00001809c00  uhci_poll_hub
       15  0/59   ffffa00001809800  uhci_poll_hub
       15  0/59   ffffa00001809000  uhci_poll_hub
       19  0/63   ffffa0003cd24200  sleepq_timeout
       28  0/72                  0  if_slowtimo
       28  0/72                  0  nd6_timer
       28  0/72                  0  rt_timer_timer
       40  0/84   ffffffff805db720  sleepq_timeout
       40  0/84   ffffa00034371460  sleepq_timeout
       40  0/84   ffffa00034371040  sleepq_timeout
       46  0/90   ffffa0003d1c5738  realtimerexpire
       49  0/93                  0  pfslowtimo
       53  0/97   ffffa0003cbad560  sleepq_timeout
       86  0/130  ffffa000380a0000  bnx_tick
       97  0/141  ffffa0003cbaf5a0  sleepq_timeout
      178  0/222  ffffa0003d3dd280  sleepq_timeout
    13297  1/266  ffffa00001fcb000  bridge_timer
    13299  1/266  ffffa00001fcb800  bridge_timer
    13300  1/266  ffffa00002134800  bridge_timer
    13300  1/266  ffffa00002134000  bridge_timer
    13301  1/266  ffffa0000213c800  bridge_timer
    13302  1/266  ffffa00002140800  bridge_timer
    13304  1/266  ffffa00002140000  bridge_timer
    13394  1/266                 0  arptimer
    15293  1/273  ffffa0003cbb41a0  sleepq_timeout
    18693  1/287  ffffa0003d1c5b58  realtimerexpire
    31396  1/336  ffffa0003d1c5c60  realtimerexpire
      224  1/471  ffffa0003437b8c0  sleepq_timeout
      613  1/472  ffffffff805e8640  xen_timepush
      594  1/472                 0  vmem_rehash_all_kick
      834  1/473  ffffa0003cd24620  sleepq_timeout
     1012  1/474  ffffa0003cc26408  rf_paritymap_tick
     1319  1/475  ffffa00034916258  rf_paritymap_tick
     1500  1/476  ffffa0003cc38400  sme_events_check
     2097  1/478  ffffa0003d75f780  sleepq_timeout
     2520  1/480  ffffa0003d1c5e70  realtimerexpire
     4096  1/486  ffffa0003d4e4740  sleepq_timeout
     4670  1/488  ffffa0003437d0c0  sleepq_timeout
     4670  1/488  ffffa0003437c4c0  sleepq_timeout
     4670  1/488  ffffa0003437c0a0  sleepq_timeout
     5276  1/490  ffffa0003437d900  sleepq_timeout
     5948  1/493  ffffa0003dc35140  sleepq_timeout
     8096  1/501  ffffa00046a93900  sleepq_timeout
   100592  2/616                 0  nd6_slowtimo
   719968  2/622  ffffa00002448950  tcp_timer_keep
   598096  2/623  ffffa0003d1c5a50  realtimerexpire
  1884992  2/643                 0  in6_tmpaddrtimer
2140747400  3/896  ffffa0003d3d1680  sleepq_timeout


When it crashed, 3 HVM machines (all OpenBSD) were running, no issues.

I had to reboot the machine and have no access to ddb for the time
being. Hope the info provided is enough to track it down; should more
debugger info be required, I'm likely to run into this one again anyway.

Best regards,

Hugo


Home | Main Index | Thread Index | Old Index