Subject: Re: Xen3 DomU spins 100% busy during console access
To: None <port-xen@netbsd.org>
From: Simon Truss <simon@bigble.demon.co.uk>
List: port-xen
Date: 12/04/2006 08:11:28
Simon Truss wrote:

> I was in vim in insert mode and just pasted a large selection into the 
> xenconsole - instant busy lockup. Once I figured out what caused it I 
> was able to repeat this 5 out of 5 times.


ddb worked! and the stack trace said it was in xenconscn_getc

well I found my magic number :-)

xen3-public/io/console.h
struct xencons_interface {
     char in[1024];

I was a little surprised to see the DomU still spinning at 100% cpu, I 
know this as it dipped down to 99.x% for a moment when the Dom0 was 
busy. does ddb spin busy in normal operation or was the cpu consumed by 
something else?

while I do not understand the full context of how these functions are 
called and I have not tracked the spltty usage etc the following looks 
suspicious as the potential for infinite loops.

Is it not possible for the timing of these calls to conspire to produce 
a loop in getc? handler finishes, getc starts and never gets past the 
while loop. This can only happen if a shared resource is locked.


xencons_handler(void *arg)
{

         while (cons != prod) {
		...
                 if (__predict_false(xencons_interface->in_cons != 
cons)) 		{
                         /* catch up with xenconscn_getc() */
                         cons = xencons_interface->in_cons;
                         prod = xencons_interface->in_prod;
                         ...
                 } else
	...
	}
	...
	xencons_interface->in_cons = cons;
	...
}

xenconscn_getc(dev_t dev)
{
...
         cons = xencons_interface->in_cons;
         prod = xencons_interface->in_prod;
	...
         while (cons == prod) {
		HYPERVISOR_yield();
                 prod = xencons_interface->in_prod;
         }

lastly could someone explain the semantics of cons, prod indexes. are 
they next_free/last_full next_free/next_full or something else. I ask 
this as its very easy to get off by one errors in 
free/full/bytes_in_buffer tests. I've written enough ring buffers to 
always write the invariants in the header file before I start coding.


hope all this helps

Simon


db> ps
  PID           PPID     PGRP        UID S   FLAGS LWPS          COMMAND 
    WAIT
  591            639      591          0 2  0x4002    1               vi
  639            351      639          0 2  0x4002    1              csh 
   pause
  1424           444      444         12 2  0x4108    1           pickup 
  select
  297            228      228       1001 2  0x4100    1       imap-login 
  kqread
  224            228      228       1001 2  0x4100    1       imap-login 
  kqread
  230            228      228       1001 2  0x4100    1       imap-login 
  kqread
  1282           228      228          0 2  0x4000    1     dovecot-auth 
  kqread
  228              1      228          0 2       0    1          dovecot 
  kqread
  492            444      444         12 2  0x4108    1             qmgr 
  select
  351              1      351          0 2  0x4102    1            login 
    wait
  494              1      494          0 2       0    1             cron 
nanosle
  469              1      469          0 2       0    1            inetd 
  kqread
  444              1      444          0 2  0x4108    1           master 
  select
  330              1      330          0 2       0    1             sshd 
  select
  303              1      303          0 2       0    1           powerd 
  kqread
  151              1      151          0 2       0    1          syslogd 
  kqread
  30               0        0          0 2 0x20200    1          physiod 
physiod
  8                0        0          0 2 0x20200    1         aiodoned 
aiodone
  7                0        0          0 2 0x20200    1          ioflush 
  syncer
  6                0        0          0 2 0x20200    1       pagedaemon 
pgdaemo
  5                0        0          0 2 0x20200    1           xenbus 
    rdst
  4                0        0          0 2 0x20200    1         xenwatch 
   evtsq
  3                0        0          0 2 0x20200    1        cryptoret 
crypto_
  1                0        1          0 2  0x4001    1             init 
    wait
  0               -1        0          0 2 0x20200    1          swapper 
schedul
         dce
cpu_Debugger(41,4,2,c,c040ef41) at netbsd:cpu_Debugger+0x4
xenconscn_getc(8f00,2b,c0508804,c0198865,2b) at netbsd:xenconscn_getc+0x38f
cngetc(2b,10,0,0,c0508858) at netbsd:cngetc+0x1b
db_readline(c0449b80,78,0,c0508830,0) at netbsd:db_readline+0x35
db_read_line(c0401a2a,c0401c31,c04c9a54,0,c05089e0) at 
netbsd:db_read_line+0x1c
db_command_loop(c037ecf4,c050888c,c0508894,c03933ed,8f00) at 
netbsd:db_command_loop+0xc9
db_trap(1,0,58,0,64) at netbsd:db_trap+0xe1
kdb_trap(1,0,c05089e0,8,c07e0f60) at netbsd:kdb_trap+0xec
trap() at netbsd:trap+0xe1
--- trap (number 1) ---
cpu_Debugger(2b,ca223e70,3f1,23,400) at netbsd:cpu_Debugger+0x4
xencons_tty_input(c07d3e80,c050f000,23,c0508a74,c0397371) at 
netbsd:xencons_tty_input+0xa6
xencons_handler(c07d3e80,c0508b0c,4,c0508b04,b2cb) at 
netbsd:xencons_handler+0x10f
evtchn_do_event(2,c0508b0c,1,2,0) at netbsd:evtchn_do_event+0xbd
do_hypervisor_callback(c0508b0c,0,11,ca000031,c0500011) at 
netbsd:do_hypervisor_callback+0xfd
hypervisor_callback(0,c0440684,c0440660,c02e5591,c047eb60) at 
netbsd:hypervisor_callback+0x63
cpu_switch(c047eb60,0,c08305c0,c02c96ae,c047d860) at netbsd:cpu_switch+0x10e
ltsleep(c047e9a0,4,c0411d15,0,0) at netbsd:ltsleep+0x57c
uvm_scheduler(c047d0e8,0,c047c968,c0408f3e,0) at netbsd:uvm_scheduler+0x159
main(c01001df,c01001e7,0,0,0) at netbsd:main+0x59f