Port-sparc64 archive


Re: [RESEND] User-level window trap when booting NetBSD kernel under QEMU SPARC64



On 19/05/14 17:14, Eduardo Horvath wrote:

Here is the gdb session showing the openfirmware() function after the NetBSD
kernel has called SUNW,set-trap-table:


(gdb) disas 0x1009478, 0x10094f8
Dump of assembler code from 0x1009478 to 0x10094f8:
    0x0000000001009478:  sethi  %hi(0x1800000), %o4
    0x000000000100947c:  btst  1, %sp
    0x0000000001009480:  be  %icc, 0x10094f8
    0x0000000001009484:  ldx  [ %o4 ], %o4
    0x0000000001009488:  save  %sp, -176, %sp
    0x000000000100948c:  rdpr  %pil, %i2
    0x0000000001009490:  mov  0xf, %i3
    0x0000000001009494:  cmp  %i3, %i2
    0x0000000001009498:  movle  %icc, %i2, %i3
    0x000000000100949c:  wrpr  %g0, %i3, %pil
    0x00000000010094a0:  mov  %i0, %o0
    0x00000000010094a4:  mov  %g1, %l1
    0x00000000010094a8:  mov  %g2, %l2
    0x00000000010094ac:  mov  %g3, %l3
    0x00000000010094b0:  mov  %g4, %l4
    0x00000000010094b4:  mov  %g5, %l5
    0x00000000010094b8:  mov  %g6, %l6
    0x00000000010094bc:  mov  %g7, %l7
    0x00000000010094c0:  rdpr  %pstate, %l0
    0x00000000010094c4:  call  %i4
    0x00000000010094c8:  wrpr  6, %pstate
=> 0x00000000010094cc:  wrpr  %l0, %pstate
    0x00000000010094d0:  mov  %l1, %g1
    0x00000000010094d4:  mov  %l2, %g2
    0x00000000010094d8:  mov  %l3, %g3
    0x00000000010094dc:  mov  %l4, %g4
    0x00000000010094e0:  mov  %l5, %g5
    0x00000000010094e4:  mov  %l6, %g6
    0x00000000010094e8:  mov  %l7, %g7
    0x00000000010094ec:  wrpr  %i2, 0, %pil
    0x00000000010094f0:  ret
    0x00000000010094f4:  restore  %o0, %g0, %o0
End of assembler dump.


I'm not sure what we're looking at here.  Is this kernel code or OpenBIOS
code?  I assume the machine is OK at this point?

Yes. This is a dump of the openfirmware() function from the NetBSD kernel, and everything is okay until we hit the restore at the very end.

(gdb) info regi
g0             0x0      0
g1             0x1      1
g2             0x7e50000        132448256
g3             0x18d1c00        26024960
g4             0x1ae8000        28213248
g5             0x1000   4096
g6             0x0      0
g7             0x0      0
o0             0x0      0
o1             0x1      1
o2             0xfffffffffffffff8       -8
o3             0xffffffff00000000       -4294967296
o4             0x1c14230        29442608
o5             0x1000000        16777216
sp             0x1c054a1        0x1c054a1
o7             0x10094c4        16815300
l0             0x16     22
l1             0x1      1
l2             0x7e50000        132448256
l3             0x18d1c00        26024960
l4             0x1ae8000        28213248
l5             0x1000   4096
l6             0x0      0
l7             0x0      0
i0             0x1c05e00        29384192
i1             0x7e50000        132448256
i2             0xd      13
i3             0xf      15
i4             0xffd0fe60       4291886688
i5             0x18d1800        26023936
fp             0x1c05551        0x1c05551
i7             0x135fbc0        20315072
pc             0x10094cc        0x10094cc
npc            0x10094d0        0x10094d0
state          0x4482000604     294238815748
fsr            0x0      [ ]
fprs           0x4      [ FEF ]
y              0x0      0
cwp            0x4      4
pstate         0x6      [ IE PRIV ]
asi            0x82     130
ccr            0x44     68
(gdb)


The MMU TLB entries look like this:


QEMU 2.0.50 monitor - type 'help' for more information
(qemu) info tlb
MMU contexts: Primary: 0, Secondary: 0
DMMU dump
[00] VA: ffe00000, PA: 7f00000, 512k, priv, RW, locked, ctx 0 local
[01] VA: ffe80000, PA: 7f80000, 512k, priv, RW, locked, ctx 0 local
[02] VA: ffd00000, PA: 1fff0000000, 512k, priv, RO, locked, ctx 0 local
[03] VA: ffd80000, PA: 1fff0080000, 512k, priv, RO, locked, ctx 0 local
[04] VA: ffc80000, PA: 7e80000, 512k, priv, RW, locked, ctx 0 local
[05] VA: 4000, PA: 4000,   8k, priv, RW, unlocked, ctx 0 local
[06] VA: 6000, PA: 6000,   8k, priv, RW, unlocked, ctx 0 local
[07] VA: 8000, PA: 8000,   8k, priv, RW, unlocked, ctx 0 local
[08] VA: a000, PA: a000,   8k, priv, RW, unlocked, ctx 0 local
[09] VA: c000, PA: c000,   8k, priv, RW, unlocked, ctx 0 local
[10] VA: e000, PA: e000,   8k, priv, RW, unlocked, ctx 0 local
[11] VA: 10000, PA: 10000,   8k, priv, RW, unlocked, ctx 0 local
[12] VA: 12000, PA: 12000,   8k, priv, RW, unlocked, ctx 0 local
[13] VA: 14000, PA: 14000,   8k, priv, RW, unlocked, ctx 0 local
[14] VA: 16000, PA: 16000,   8k, priv, RW, unlocked, ctx 0 local
[15] VA: 18000, PA: 18000,   8k, priv, RW, unlocked, ctx 0 local
[16] VA: 1a000, PA: 1a000,   8k, priv, RW, unlocked, ctx 0 local
[17] VA: 100000, PA: 100000,   8k, priv, RW, unlocked, ctx 0 local
[18] VA: 102000, PA: 102000,   8k, priv, RW, unlocked, ctx 0 local
[19] VA: 104000, PA: 104000,   8k, priv, RW, unlocked, ctx 0 local
[20] VA: 106000, PA: 106000,   8k, priv, RW, unlocked, ctx 0 local
[21] VA: 108000, PA: 108000,   8k, priv, RW, unlocked, ctx 0 local
[22] VA: 10a000, PA: 10a000,   8k, priv, RW, unlocked, ctx 0 local
[23] VA: 10c000, PA: 10c000,   8k, priv, RW, unlocked, ctx 0 local
[24] VA: 10e000, PA: 10e000,   8k, priv, RW, unlocked, ctx 0 local
[25] VA: 110000, PA: 110000,   8k, priv, RW, unlocked, ctx 0 local
[26] VA: 112000, PA: 112000,   8k, priv, RW, unlocked, ctx 0 local
[27] VA: 114000, PA: 114000,   8k, priv, RW, unlocked, ctx 0 local
[28] VA: ffc7e000, PA: 7e7e000,   8k, priv, RW, unlocked, ctx 0 local
[29] VA: ffc7a000, PA: 7e7a000,   8k, priv, RW, unlocked, ctx 0 local
[30] VA: ffc7c000, PA: 7e7c000,   8k, priv, RW, unlocked, ctx 0 local
[31] VA: ffc78000, PA: 7e78000,   8k, priv, RW, unlocked, ctx 0 local
[32] VA: ffc76000, PA: 7e76000,   8k, priv, RW, unlocked, ctx 0 local
[33] VA: ffc72000, PA: 7e72000,   8k, priv, RW, unlocked, ctx 0 local
[34] VA: ffc70000, PA: 7e70000,   8k, priv, RW, unlocked, ctx 0 local
[35] VA: ffc6e000, PA: 7e6e000,   8k, priv, RW, unlocked, ctx 0 local
[36] VA: ffc64000, PA: 7e64000,   8k, priv, RW, unlocked, ctx 0 local
[37] VA: ffc66000, PA: 7e66000,   8k, priv, RW, unlocked, ctx 0 local
[38] VA: ffc68000, PA: 7e68000,   8k, priv, RW, unlocked, ctx 0 local
[39] VA: ffc6a000, PA: 7e6a000,   8k, priv, RW, unlocked, ctx 0 local
[40] VA: ffc6c000, PA: 7e6c000,   8k, priv, RW, unlocked, ctx 0 local
[41] VA: ffc62000, PA: 7e62000,   8k, priv, RW, unlocked, ctx 0 local
[42] VA: 1000000, PA: 7800000,   4M, priv, RO, locked, ctx 0 local
[43] VA: 1400000, PA: 7400000,   4M, priv, RO, locked, ctx 0 local
[44] VA: 1800000, PA: 7000000,   4M, priv, RW, locked, ctx 0 local
[45] VA: ffc60000, PA: 7e60000,   8k, priv, RW, unlocked, ctx 0 local
[46] VA: 7ffc000, PA: 7e5c000,   8k, priv, RW, unlocked, ctx 0 local
[47] VA: 7ffe000, PA: 7e5e000,   8k, priv, RW, unlocked, ctx 0 local
[48] VA: 7ffa000, PA: 7e5a000,   8k, priv, RW, unlocked, ctx 0 local
[49] VA: 1c0c000, PA: 7e40000,   8k, priv, RW, unlocked, ctx 0 local
[50] VA: 1c0e000, PA: 7e42000,   8k, priv, RW, unlocked, ctx 0 local
[51] VA: 1c10000, PA: 7e44000,   8k, priv, RW, unlocked, ctx 0 local
[52] VA: 1c12000, PA: 7e46000,   8k, priv, RW, unlocked, ctx 0 local
[53] VA: 1c14000, PA: 7e48000,   8k, priv, RW, unlocked, ctx 0 local
[54] VA: 1c16000, PA: 7e4a000,   8k, priv, RW, unlocked, ctx 0 local
[55] VA: 1c18000, PA: 7e4c000,   8k, priv, RW, unlocked, ctx 0 local
[56] VA: 1c1a000, PA: 7e4e000,   8k, priv, RW, unlocked, ctx 0 local
[57] VA: e0010000, PA: 7e40000,  64k, priv, RW, locked, ctx 0 local
[58] VA: 1c04000, PA: 14000,   8k, priv, RW, unlocked, ctx 0 local
IMMU dump
[00] VA: ffd00000, PA: 1fff0000000, 512k, priv, locked, ctx 0 local
[01] VA: ffc80000, PA: 7e80000, 512k, priv, locked, ctx 0 local
[02] VA: 100000, PA: 100000,   8k, priv, unlocked, ctx 0 local
[03] VA: 102000, PA: 102000,   8k, priv, unlocked, ctx 0 local
[04] VA: 10a000, PA: 10a000,   8k, priv, unlocked, ctx 0 local
[05] VA: 10c000, PA: 10c000,   8k, priv, unlocked, ctx 0 local
[06] VA: 110000, PA: 110000,   8k, priv, unlocked, ctx 0 local
[07] VA: 104000, PA: 104000,   8k, priv, unlocked, ctx 0 local
[08] VA: 108000, PA: 108000,   8k, priv, unlocked, ctx 0 local
[09] VA: 10e000, PA: 10e000,   8k, priv, unlocked, ctx 0 local
[10] VA: 106000, PA: 106000,   8k, priv, unlocked, ctx 0 local
[11] VA: 1000000, PA: 7800000,   4M, priv, locked, ctx 0 local
[12] VA: 1400000, PA: 7400000,   4M, priv, locked, ctx 0 local
(qemu)


As soon as I hit the restore at 0x10094f4 in gdb, I get a fill_0_normal trap
which vectors to 0x1001800:

So the fault happens on the last instruction of the previous routine?  And
this is *after* the call to SUNW,set-trap-table?  This means you should be
running with the kernel's trap table, right?

Yes, that's correct.

(gdb) disas 0x1001800, 0x100184c
Dump of assembler code from 0x1001800 to 0x100184c:
=> 0x0000000001001800:  wr  %g0, 0x11, %asi
    0x0000000001001804:  ldxa  [ %sp + 0x7ff ] %asi, %l0
    0x0000000001001808:  ldxa  [ %sp + 0x807 ] %asi, %l1
    0x000000000100180c:  ldxa  [ %sp + 0x80f ] %asi, %l2
    0x0000000001001810:  ldxa  [ %sp + 0x817 ] %asi, %l3
    0x0000000001001814:  ldxa  [ %sp + 0x81f ] %asi, %l4
    0x0000000001001818:  ldxa  [ %sp + 0x827 ] %asi, %l5
    0x000000000100181c:  ldxa  [ %sp + 0x82f ] %asi, %l6
    0x0000000001001820:  ldxa  [ %sp + 0x837 ] %asi, %l7
    0x0000000001001824:  ldxa  [ %sp + 0x83f ] %asi, %i0
    0x0000000001001828:  ldxa  [ %sp + 0x847 ] %asi, %i1
    0x000000000100182c:  ldxa  [ %sp + 0x84f ] %asi, %i2
    0x0000000001001830:  ldxa  [ %sp + 0x857 ] %asi, %i3
    0x0000000001001834:  ldxa  [ %sp + 0x85f ] %asi, %i4
    0x0000000001001838:  ldxa  [ %sp + 0x867 ] %asi, %i5
    0x000000000100183c:  ldxa  [ %sp + 0x86f ] %asi, %fp
    0x0000000001001840:  ldxa  [ %sp + 0x877 ] %asi, %i7
    0x0000000001001844:  restored
    0x0000000001001848:  retry
End of assembler dump.

So this is presumably fill_0_normal?

Yes, this is a dump of fill_0_normal from the kernel trap table.

(gdb) info regi
g0             0x0      0
g1             0x1f61ec8c2      8424179906
g2             0x1f60f8682      8423179906
g3             0xffe11df8       4292943352
g4             0x0      0
g5             0x0      0
g6             0x0      0
g7             0x0      0
o0             0x1c05e00        29384192
o1             0x7e50000        132448256
o2             0xd      13
o3             0xf      15
o4             0xffd0fe60       4291886688
o5             0x18d1800        26023936
sp             0x1c05551        0x1c05551
o7             0x135fbc0        20315072
l0             0xffffffffffe30c38       -1897416
l1             0xffe8ac38       4293438520
l2             0x17500f0        24445168
l3             0x1746c78        24407160
l4             0x1816400        25256960
l5             0x18c0800        25954304
l6             0x18c0800        25954304
l7             0x19cd570        27055472
i0             0xa      10
i1             0xffe8b0f0       4293439728
i2             0x20     32
i3             0xffd0fe60       4291886688
i4             0x17502f8        24445688
i5             0x0      0
fp             0xffe85219       0xffe85219
i7             0xffd0a988       4291864968
pc             0x1001800        0x1001800
npc            0x1001804        0x1001804
state          0x4482001503     294238819587
fsr            0x0      [ ]
fprs           0x4      [ FEF ]
y              0x0      0
cwp            0x3      3
pstate         0x15     [ AG PRIV PEF ]
asi            0x82     130
ccr            0x44     68
(gdb)


You should learn how to use ddb.  It has lots of nifty MD commands to dump
supervisor state registers, such as the trap stack.

Okay, that's definitely useful to know. Can it show any of the privileged registers? That would be quite helpful. Not that it matters too much, as I can easily extract the information from QEMU's internal variables as needed.

(cut)

If you've made it this far, then thank you for your time and I look forward to
hearing from you further.

It's been a while since I last looked at the SPARC V9 manual, but ISTR
%wstate register controls which of the window fill/spill traps is taken
for regular and "other" states.

You need to dump the contents of %wstate.

I know that when the fill_0_normal trap is taken on the restore at the end of openfirmware(), the QEMU internal variables look like this:

(gdb) p/x env->cwp
$7 = 0x3
(gdb) p/x env->canrestore
$8 = 0x0
(gdb) p/x env->cansave
$9 = 0x6
(gdb) p/x env->cleanwin
$10 = 0x7
(gdb) p/x env->otherwin
$11 = 0x0
(gdb) p/x env->wstate
$12 = 0x0
(gdb)

There are 16 window trap vectors for each operation in both the normal and
nucleus trap tables.  I think both trap tables should be pretty much the
same.

The first 8 are for "normal" traps.  These are called when %otherwin is 0.
This occurs when all the windows are from the same address space, either
kernel or userland.

The second 8 are for "other" traps.  When a process traps from userland to
the kernel, the kernel sets %otherwin to the number of userland
stackframes.  Every time the kernel spills a frame, if %otherwin is not
zero the CPU calls one of the "other" traps, and then decrements
%otherwin.  But you probably don't care about this right now.

Of those trap vectors:

        0 is used for 32-bit userland stackframes
        1 is used for 64-bit userland stackframes
        2 will check the stack alignment and call one of the above routines.

        4 is used for 32-bit kernel stackframes
        5 is used for 64-bit kernel stackframes
        6 will check the stack alignment and call one of the above routines.

When running user mode we set %wstate to 022, which means it will call
fill_2_normal and fill_2_other.  When running in kernel mode we set
%wstate to 066 which means it will call fill_6_normal and fill_6_other.

The sun4u code in locore.s does this:

         /* sun4u */
         set     _C_LABEL(trapbase), %l1
         call    _C_LABEL(prom_set_trap_table_sun4u)     ! Now we should be running 100% from our handlers
          mov    %l1, %o0
7:
         wrpr    %l1, 0, %tba                    ! Make sure the PROM didn't foul up.

         /*
          * Switch to the kernel mode and run away.
          */
         wrpr    %g0, WSTATE_KERN, %wstate

So right after installing the trap table it sets the %wstate register to
use trap vector 6 so it will use fill_6_normal.

Thanks for clarifying this - the information is really helpful and I now understand what is happening here.

I'm not entirely sure what's going on here since you didn't have symbols
in the disassembly and there's no stack trace, but I assume the routine
generating the fault is openfirmware() in the kernel.

Yes, that's correct. Unfortunately I don't have a NetBSD build environment so I've been doing most of the work by disassembling the kernel via QEMU's gdbstub and comparing against the source in a web browser. So based upon what you're saying it appears we have a stack like this (indented to show window saves/restores):

  cpu_initialize() {
    prom_set_trap_table() {
      openfirmware() {
          /* OpenBIOS C code */
          of_client_interface() {
             enter_forth() {
               set_trap_table() {
                  SUNW,set-trap-table
               }
             }
          }
      }
      /* fill_0_normal trap occurs here */
    }

    /* Switch to kernel mode */
    wrpr %g0, WSTATE_KERN, %wstate
  }

For this to work without errors, the assumption has to be that no window or fault traps can occur between calling SUNW,set-trap-table in OpenBIOS and getting back to cpu_initialize() to set the correct value for %wstate, which is quite a few window levels. AIUI data faults can't happen because the ASI is set to 0x82 (no fault), which is why it is the fill_0_normal window trap that triggers this.

I'm starting to wonder if setting %wstate to use trap vector 6 should happen *before* calling prom_set_trap_table()? At the point SUNW,set-trap-table is called, the kernel is effectively saying "I am taking responsibility for handling all traps from now on", so if the kernel cannot handle traps after this point for any reason, it is not honouring its contract to manage the trap table.

Regardless of this, now that I understand this better I need to look into the OpenBIOS CIF to see if I can preserve the entire window state across CIF calls, which I suspect might be what Sun's OBP does. Otherwise it would not be possible to run many versions of NetBSD (and OpenBSD, which suffers from the same problem) under emulation :/

My guess is either QEMU is ignoring the contents of the %wstate register,
or OpenBIOS is changing the contents of the %wstate register and not
restoring it before returning to the kernel.

FWIW I've double-checked the OpenBIOS source code and the only changes to %wstate other than system reset are to preserve the values during data and instruction faults.


Many thanks,

Mark.


