Port-xen archive


Re: dom0 kernel profiling on Xen



On Sun, Apr 03, 2016 at 08:45:28PM -0400, Thor Lancelot Simon wrote:
> On Sun, Apr 03, 2016 at 07:34:25PM -0400, Thor Lancelot Simon wrote:
> > Before I dig into this too much -- is this expected to work?  I'm at wits'
> > end trying to track down the consumer of 50-80% "interrupt" time under I/O
> > load on my dom0 elsewise.
> 
> Well, that was cool:
> 
> # kgmon -r -b
> (XEN) Pagetable walk from ffffa0006050ffc8:
> (XEN)  L4[0x140] = 000000120b3af027 00000000000013af
> (XEN)  L3[0x001] = 00000010400fd027 000000000007fff8
> (XEN)  L2[0x102] = 000000107c7d8027 0000000000011cc3 
> (XEN)  L1[0x10f] = 0000000000000000 ffffffffffffffff
> (XEN) domain_crash_sync called from entry.S: fault at ffff82d080215bfe create_bounce_frame+0x66/0x13a
> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
> (XEN) ----[ Xen-4.5.2  x86_64  debug=n  Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e033:[<ffffffff8075f8c4>]
> (XEN) RFLAGS: 0000000000010296   EM: 1   CONTEXT: pv guest (d0v0)
> (XEN) rax: ffffffff813a8000   rbx: 0000000000000001   rcx: 0000000000000000
> (XEN) rdx: 0000000000000004   rsi: ffffffff8020aad2   rdi: ffffffff8075f962
> (XEN) rbp: ffffa00060510010   rsp: ffffa0006050ffd8   r8:  00007f8000000000
> (XEN) r9:  0000000000000000   r10: ffffa00003a555a0   r11: ffffa000605100a0
> (XEN) r12: ffffffff8020aad2   r13: ffffffff8075f962   r14: ffffffff80c8d020
> (XEN) r15: 0000000000000004   cr0: 000000008005003b   cr4: 00000000000026f0
> (XEN) cr3: 000000120b37f000   cr2: ffffa0006050ffd8
> (XEN) ds: 003f   es: 003f   fs: 0000   gs: 0000   ss: e02b   cs: e033
> (XEN) Guest stack trace from rsp=ffffa0006050ffd8:
> (XEN)   Fault while accessing guest memory.
> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.

I can reproduce this with a domU as well.

It looks like we run out of stack in mcount (the faulting address is right at the guest's rsp, on an unmapped page).
I found what looks like a loop:
mcount:
    [...]
0xffffffff8049a2ca <+54>:    callq  0xffffffff8049a2f3 <_mcount>
    [...]

_mcount:
    [...]
0xffffffff8049a335 <+66>:    callq  0xffffffff80106478 <x86_lfence>
    [...]
0xffffffff8049a341 <+78>:    callq  0xffffffff8010c7c0 <__cpu_simple_lock>
    [...]
0xffffffff8049a443 <+336>:   callq  0xffffffff8010c7e0 <__cpu_simple_unlock>
    [...]
0xffffffff8049a461 <+366>:   jmpq   0xffffffff80106478 <x86_lfence>

x86_lfence:
0xffffffff80106478 <+0>:     push   %rbp
0xffffffff80106479 <+1>:     lea    (%rsp),%rbp
0xffffffff8010647d <+5>:     callq  0xffffffff8049a294 <mcount>
0xffffffff80106482 <+10>:    pop    %rbp
0xffffffff80106483 <+11>:    lfence 
0xffffffff80106486 <+14>:    retq   
0xffffffff80106487 <+15>:    nop

__cpu_simple_unlock:
0xffffffff8010c7e0 <+0>:     push   %rbp
0xffffffff8010c7e1 <+1>:     lea    (%rsp),%rbp
0xffffffff8010c7e5 <+5>:     callq  0xffffffff8049a294 <mcount>
0xffffffff8010c7ea <+10>:    pop    %rbp
0xffffffff8010c7eb <+11>:    movb   $0x0,(%rdi)
0xffffffff8010c7ee <+14>:    retq   

It seems that __cpu_simple_lock doesn't call mcount.
Now I have to find out why this doesn't loop on bare metal. Any ideas before
I keep looking?
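
For what it's worth, here is a minimal C sketch of the cycle the disassembly
above suggests, assuming both functions are built with -pg so the compiler
inserts an mcount call in every prologue. The names mirror the listings;
this is not the actual NetBSD code:

/*
 * Sketch only: both functions compiled with -pg, so each prologue gets a
 * compiler-generated call to mcount().  Not the real NetBSD implementation.
 */

void mcount(void);

/* Memory-barrier helper: its -pg prologue calls mcount(). */
void
x86_lfence(void)
{
        mcount();                       /* inserted by -pg */
        __asm__ __volatile__("lfence" ::: "memory");
}

/* Profiling hook: uses the barrier, which calls straight back in. */
void
mcount(void)
{
        x86_lfence();                   /* mcount -> x86_lfence -> mcount -> ... */
        /* ... record the arc, protected by __cpu_simple_lock() ... */
}

The usual way to break such a cycle is to keep the helpers mcount itself
depends on out of the profiled set, e.g. by building them without -pg or
tagging them with something like GCC's __attribute__((no_instrument_function))
(a generic compiler mechanism, not something I checked in our tree); whether
either is the right fix for our mcount, I don't know.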

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference