On 19/05/14 17:14, Eduardo Horvath wrote:
Here is the gdb session showing the openfirmware() function after the NetBSD kernel has called SUNW,set-trap-table:

(gdb) disas 0x1009478, 0x10094f8
Dump of assembler code from 0x1009478 to 0x10094f8:
   0x0000000001009478:  sethi  %hi(0x1800000), %o4
   0x000000000100947c:  btst  1, %sp
   0x0000000001009480:  be  %icc, 0x10094f8
   0x0000000001009484:  ldx  [ %o4 ], %o4
   0x0000000001009488:  save  %sp, -176, %sp
   0x000000000100948c:  rdpr  %pil, %i2
   0x0000000001009490:  mov  0xf, %i3
   0x0000000001009494:  cmp  %i3, %i2
   0x0000000001009498:  movle  %icc, %i2, %i3
   0x000000000100949c:  wrpr  %g0, %i3, %pil
   0x00000000010094a0:  mov  %i0, %o0
   0x00000000010094a4:  mov  %g1, %l1
   0x00000000010094a8:  mov  %g2, %l2
   0x00000000010094ac:  mov  %g3, %l3
   0x00000000010094b0:  mov  %g4, %l4
   0x00000000010094b4:  mov  %g5, %l5
   0x00000000010094b8:  mov  %g6, %l6
   0x00000000010094bc:  mov  %g7, %l7
   0x00000000010094c0:  rdpr  %pstate, %l0
   0x00000000010094c4:  call  %i4
   0x00000000010094c8:  wrpr  6, %pstate
=> 0x00000000010094cc:  wrpr  %l0, %pstate
   0x00000000010094d0:  mov  %l1, %g1
   0x00000000010094d4:  mov  %l2, %g2
   0x00000000010094d8:  mov  %l3, %g3
   0x00000000010094dc:  mov  %l4, %g4
   0x00000000010094e0:  mov  %l5, %g5
   0x00000000010094e4:  mov  %l6, %g6
   0x00000000010094e8:  mov  %l7, %g7
   0x00000000010094ec:  wrpr  %i2, 0, %pil
   0x00000000010094f0:  ret
   0x00000000010094f4:  restore  %o0, %g0, %o0
End of assembler dump.

I'm not sure what we're looking at here. Is this kernel code or OpenBIOS code? I assume the machine is OK at this point?
Yes. This is a dump of the openfirmware() function from the NetBSD kernel and everything is okay until we hit the restore at the very end.
(gdb) info regi
g0      0x0                 0
g1      0x1                 1
g2      0x7e50000           132448256
g3      0x18d1c00           26024960
g4      0x1ae8000           28213248
g5      0x1000              4096
g6      0x0                 0
g7      0x0                 0
o0      0x0                 0
o1      0x1                 1
o2      0xfffffffffffffff8  -8
o3      0xffffffff00000000  -4294967296
o4      0x1c14230           29442608
o5      0x1000000           16777216
sp      0x1c054a1           0x1c054a1
o7      0x10094c4           16815300
l0      0x16                22
l1      0x1                 1
l2      0x7e50000           132448256
l3      0x18d1c00           26024960
l4      0x1ae8000           28213248
l5      0x1000              4096
l6      0x0                 0
l7      0x0                 0
i0      0x1c05e00           29384192
i1      0x7e50000           132448256
i2      0xd                 13
i3      0xf                 15
i4      0xffd0fe60          4291886688
i5      0x18d1800           26023936
fp      0x1c05551           0x1c05551
i7      0x135fbc0           20315072
pc      0x10094cc           0x10094cc
npc     0x10094d0           0x10094d0
state   0x4482000604        294238815748
fsr     0x0                 [ ]
fprs    0x4                 [ FEF ]
y       0x0                 0
cwp     0x4                 4
pstate  0x6                 [ IE PRIV ]
asi     0x82                130
ccr     0x44                68
(gdb)

The MMU TLB entries look like this:

QEMU 2.0.50 monitor - type 'help' for more information
(qemu) info tlb
MMU contexts: Primary: 0, Secondary: 0
DMMU dump
[00] VA: ffe00000, PA: 7f00000, 512k, priv, RW, locked, ctx 0 local
[01] VA: ffe80000, PA: 7f80000, 512k, priv, RW, locked, ctx 0 local
[02] VA: ffd00000, PA: 1fff0000000, 512k, priv, RO, locked, ctx 0 local
[03] VA: ffd80000, PA: 1fff0080000, 512k, priv, RO, locked, ctx 0 local
[04] VA: ffc80000, PA: 7e80000, 512k, priv, RW, locked, ctx 0 local
[05] VA: 4000, PA: 4000, 8k, priv, RW, unlocked, ctx 0 local
[06] VA: 6000, PA: 6000, 8k, priv, RW, unlocked, ctx 0 local
[07] VA: 8000, PA: 8000, 8k, priv, RW, unlocked, ctx 0 local
[08] VA: a000, PA: a000, 8k, priv, RW, unlocked, ctx 0 local
[09] VA: c000, PA: c000, 8k, priv, RW, unlocked, ctx 0 local
[10] VA: e000, PA: e000, 8k, priv, RW, unlocked, ctx 0 local
[11] VA: 10000, PA: 10000, 8k, priv, RW, unlocked, ctx 0 local
[12] VA: 12000, PA: 12000, 8k, priv, RW, unlocked, ctx 0 local
[13] VA: 14000, PA: 14000, 8k, priv, RW, unlocked, ctx 0 local
[14] VA: 16000, PA: 16000, 8k, priv, RW, unlocked, ctx 0 local
[15] VA: 18000, PA: 18000, 8k, priv, RW, unlocked, ctx 0 local
[16] VA: 1a000, PA: 1a000, 8k, priv, RW, unlocked, ctx 0 local
[17] VA: 100000, PA: 100000, 8k, priv, RW, unlocked, ctx 0 local
[18] VA: 102000, PA: 102000, 8k, priv, RW, unlocked, ctx 0 local
[19] VA: 104000, PA: 104000, 8k, priv, RW, unlocked, ctx 0 local
[20] VA: 106000, PA: 106000, 8k, priv, RW, unlocked, ctx 0 local
[21] VA: 108000, PA: 108000, 8k, priv, RW, unlocked, ctx 0 local
[22] VA: 10a000, PA: 10a000, 8k, priv, RW, unlocked, ctx 0 local
[23] VA: 10c000, PA: 10c000, 8k, priv, RW, unlocked, ctx 0 local
[24] VA: 10e000, PA: 10e000, 8k, priv, RW, unlocked, ctx 0 local
[25] VA: 110000, PA: 110000, 8k, priv, RW, unlocked, ctx 0 local
[26] VA: 112000, PA: 112000, 8k, priv, RW, unlocked, ctx 0 local
[27] VA: 114000, PA: 114000, 8k, priv, RW, unlocked, ctx 0 local
[28] VA: ffc7e000, PA: 7e7e000, 8k, priv, RW, unlocked, ctx 0 local
[29] VA: ffc7a000, PA: 7e7a000, 8k, priv, RW, unlocked, ctx 0 local
[30] VA: ffc7c000, PA: 7e7c000, 8k, priv, RW, unlocked, ctx 0 local
[31] VA: ffc78000, PA: 7e78000, 8k, priv, RW, unlocked, ctx 0 local
[32] VA: ffc76000, PA: 7e76000, 8k, priv, RW, unlocked, ctx 0 local
[33] VA: ffc72000, PA: 7e72000, 8k, priv, RW, unlocked, ctx 0 local
[34] VA: ffc70000, PA: 7e70000, 8k, priv, RW, unlocked, ctx 0 local
[35] VA: ffc6e000, PA: 7e6e000, 8k, priv, RW, unlocked, ctx 0 local
[36] VA: ffc64000, PA: 7e64000, 8k, priv, RW, unlocked, ctx 0 local
[37] VA: ffc66000, PA: 7e66000, 8k, priv, RW, unlocked, ctx 0 local
[38] VA: ffc68000, PA: 7e68000, 8k, priv, RW, unlocked, ctx 0 local
[39] VA: ffc6a000, PA: 7e6a000, 8k, priv, RW, unlocked, ctx 0 local
[40] VA: ffc6c000, PA: 7e6c000, 8k, priv, RW, unlocked, ctx 0 local
[41] VA: ffc62000, PA: 7e62000, 8k, priv, RW, unlocked, ctx 0 local
[42] VA: 1000000, PA: 7800000, 4M, priv, RO, locked, ctx 0 local
[43] VA: 1400000, PA: 7400000, 4M, priv, RO, locked, ctx 0 local
[44] VA: 1800000, PA: 7000000, 4M, priv, RW, locked, ctx 0 local
[45] VA: ffc60000, PA: 7e60000, 8k, priv, RW, unlocked, ctx 0 local
[46] VA: 7ffc000, PA: 7e5c000, 8k, priv, RW, unlocked, ctx 0 local
[47] VA: 7ffe000, PA: 7e5e000, 8k, priv, RW, unlocked, ctx 0 local
[48] VA: 7ffa000, PA: 7e5a000, 8k, priv, RW, unlocked, ctx 0 local
[49] VA: 1c0c000, PA: 7e40000, 8k, priv, RW, unlocked, ctx 0 local
[50] VA: 1c0e000, PA: 7e42000, 8k, priv, RW, unlocked, ctx 0 local
[51] VA: 1c10000, PA: 7e44000, 8k, priv, RW, unlocked, ctx 0 local
[52] VA: 1c12000, PA: 7e46000, 8k, priv, RW, unlocked, ctx 0 local
[53] VA: 1c14000, PA: 7e48000, 8k, priv, RW, unlocked, ctx 0 local
[54] VA: 1c16000, PA: 7e4a000, 8k, priv, RW, unlocked, ctx 0 local
[55] VA: 1c18000, PA: 7e4c000, 8k, priv, RW, unlocked, ctx 0 local
[56] VA: 1c1a000, PA: 7e4e000, 8k, priv, RW, unlocked, ctx 0 local
[57] VA: e0010000, PA: 7e40000, 64k, priv, RW, locked, ctx 0 local
[58] VA: 1c04000, PA: 14000, 8k, priv, RW, unlocked, ctx 0 local
IMMU dump
[00] VA: ffd00000, PA: 1fff0000000, 512k, priv, locked, ctx 0 local
[01] VA: ffc80000, PA: 7e80000, 512k, priv, locked, ctx 0 local
[02] VA: 100000, PA: 100000, 8k, priv, unlocked, ctx 0 local
[03] VA: 102000, PA: 102000, 8k, priv, unlocked, ctx 0 local
[04] VA: 10a000, PA: 10a000, 8k, priv, unlocked, ctx 0 local
[05] VA: 10c000, PA: 10c000, 8k, priv, unlocked, ctx 0 local
[06] VA: 110000, PA: 110000, 8k, priv, unlocked, ctx 0 local
[07] VA: 104000, PA: 104000, 8k, priv, unlocked, ctx 0 local
[08] VA: 108000, PA: 108000, 8k, priv, unlocked, ctx 0 local
[09] VA: 10e000, PA: 10e000, 8k, priv, unlocked, ctx 0 local
[10] VA: 106000, PA: 106000, 8k, priv, unlocked, ctx 0 local
[11] VA: 1000000, PA: 7800000, 4M, priv, locked, ctx 0 local
[12] VA: 1400000, PA: 7400000, 4M, priv, locked, ctx 0 local
(qemu)

As soon as I hit the restore at 0x10094f4 in gdb, I get a fill_0_normal trap which vectors to 0x1001800:

So the fault happens on the last instruction of the previous routine? And this is *after* the call to SUNW,set-trap-table? This means you should be running with the kernel's trap table, right?
Yes, that's correct.
(gdb) disas 0x1001800, 0x100184c
Dump of assembler code from 0x1001800 to 0x100184c:
=> 0x0000000001001800:  wr  %g0, 0x11, %asi
   0x0000000001001804:  ldxa  [ %sp + 0x7ff ] %asi, %l0
   0x0000000001001808:  ldxa  [ %sp + 0x807 ] %asi, %l1
   0x000000000100180c:  ldxa  [ %sp + 0x80f ] %asi, %l2
   0x0000000001001810:  ldxa  [ %sp + 0x817 ] %asi, %l3
   0x0000000001001814:  ldxa  [ %sp + 0x81f ] %asi, %l4
   0x0000000001001818:  ldxa  [ %sp + 0x827 ] %asi, %l5
   0x000000000100181c:  ldxa  [ %sp + 0x82f ] %asi, %l6
   0x0000000001001820:  ldxa  [ %sp + 0x837 ] %asi, %l7
   0x0000000001001824:  ldxa  [ %sp + 0x83f ] %asi, %i0
   0x0000000001001828:  ldxa  [ %sp + 0x847 ] %asi, %i1
   0x000000000100182c:  ldxa  [ %sp + 0x84f ] %asi, %i2
   0x0000000001001830:  ldxa  [ %sp + 0x857 ] %asi, %i3
   0x0000000001001834:  ldxa  [ %sp + 0x85f ] %asi, %i4
   0x0000000001001838:  ldxa  [ %sp + 0x867 ] %asi, %i5
   0x000000000100183c:  ldxa  [ %sp + 0x86f ] %asi, %fp
   0x0000000001001840:  ldxa  [ %sp + 0x877 ] %asi, %i7
   0x0000000001001844:  restored
   0x0000000001001848:  retry
End of assembler dump.

So this is presumably fill_0_normal?
Yes, this is a dump of fill_0_normal from the kernel trap table.
(gdb) info regi
g0      0x0                 0
g1      0x1f61ec8c2         8424179906
g2      0x1f60f8682         8423179906
g3      0xffe11df8          4292943352
g4      0x0                 0
g5      0x0                 0
g6      0x0                 0
g7      0x0                 0
o0      0x1c05e00           29384192
o1      0x7e50000           132448256
o2      0xd                 13
o3      0xf                 15
o4      0xffd0fe60          4291886688
o5      0x18d1800           26023936
sp      0x1c05551           0x1c05551
o7      0x135fbc0           20315072
l0      0xffffffffffe30c38  -1897416
l1      0xffe8ac38          4293438520
l2      0x17500f0           24445168
l3      0x1746c78           24407160
l4      0x1816400           25256960
l5      0x18c0800           25954304
l6      0x18c0800           25954304
l7      0x19cd570           27055472
i0      0xa                 10
i1      0xffe8b0f0          4293439728
i2      0x20                32
i3      0xffd0fe60          4291886688
i4      0x17502f8           24445688
i5      0x0                 0
fp      0xffe85219          0xffe85219
i7      0xffd0a988          4291864968
pc      0x1001800           0x1001800
npc     0x1001804           0x1001804
state   0x4482001503        294238819587
fsr     0x0                 [ ]
fprs    0x4                 [ FEF ]
y       0x0                 0
cwp     0x3                 3
pstate  0x15                [ AG PRIV PEF ]
asi     0x82                130
ccr     0x44                68
(gdb)

You should learn how to use ddb. It has lots of nifty MD commands to dump supervisor state registers, such as the trap stack.
Okay, that's definitely useful to know. Can it show any of the privileged registers? That would be quite helpful. Not that it matters too much, as I can easily extract the information from QEMU's internal variables as needed.
(cut)
If you've made it this far, then thank you for your time; I look forward to hearing from you further.
It's been a while since I last looked at the SPARC V9 manual, but ISTR the %wstate register controls which of the window fill/spill traps is taken for regular and "other" states. You need to dump the contents of %wstate.
I know that when the fill_0_normal trap is taken on the restore at the end of openfirmware(), the QEMU internal variables look like this:
(gdb) p/x env->cwp
$7 = 0x3
(gdb) p/x env->canrestore
$8 = 0x0
(gdb) p/x env->cansave
$9 = 0x6
(gdb) p/x env->cleanwin
$10 = 0x7
(gdb) p/x env->otherwin
$11 = 0x0
(gdb) p/x env->wstate
$12 = 0x0
(gdb)
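For what it's worth, those values already explain why a fill trap fires at all on that restore: per the SPARC V9 spec, a restore with %canrestore == 0 has no register window left to return into and must raise a fill trap, and with %otherwin == 0 it is the "normal" flavour. Here is a tiny illustrative C model of that check (my own naming, not QEMU code):

    #include <stdio.h>

    /*
     * Toy model of the check the CPU makes on a RESTORE instruction,
     * per the SPARC V9 spec: CANRESTORE == 0 means there is no saved
     * window to return into, so a fill trap is raised, and OTHERWIN
     * selects the "other" vs. "normal" vector group.
     */
    enum fill_trap { NO_TRAP, FILL_NORMAL, FILL_OTHER };

    static enum fill_trap restore_check(unsigned canrestore, unsigned otherwin)
    {
        if (canrestore != 0)
            return NO_TRAP;                   /* restore completes normally */
        return otherwin ? FILL_OTHER : FILL_NORMAL;
    }

    int main(void)
    {
        /* Values dumped from QEMU above: canrestore = 0, otherwin = 0. */
        printf("%d\n", restore_check(0, 0));  /* prints 1, i.e. FILL_NORMAL */
        return 0;
    }

With %wstate also 0, the trap vector selected within the "normal" group is 0, hence fill_0_normal.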
There are 16 window trap vectors for each operation in both the normal and nucleus trap tables. I think both trap tables should be pretty much the same. The first 8 are for "normal" traps. These are called when %otherwin is 0. This occurs when all the windows are from the same address space, either kernel or userland. The second 8 are for "other" traps. When a process traps from userland to the kernel, the kernel sets %otherwin to the number of userland stackframes. Every time the kernel spills a frame, if %otherwin is not zero the CPU calls one of the "other" traps, and then decrements %otherwin. But you probably don't care about this right now.

Of those trap vectors:

0 is used for 32-bit userland stackframes
1 is used for 64-bit userland stackframes
2 will check the stack alignment and call one of the above routines.
4 is used for 32-bit kernel stackframes
5 is used for 64-bit kernel stackframes
6 will check the stack alignment and call one of the above routines.

When running in user mode we set %wstate to 022, which means the CPU will call fill_2_normal and fill_2_other. When running in kernel mode we set %wstate to 066, which means it will call fill_6_normal and fill_6_other.

The sun4u code in locore.s does this:

	/* sun4u */
	set	_C_LABEL(trapbase), %l1
	call	_C_LABEL(prom_set_trap_table_sun4u)	! Now we should be running 100% from our handlers
	mov	%l1, %o0
7:
	wrpr	%l1, 0, %tba		! Make sure the PROM didn't foul up.
	/*
	 * Switch to the kernel mode and run away.
	 */
	wrpr	%g0, WSTATE_KERN, %wstate

So right after installing the trap table it sets the %wstate register to use trap vector 6 so it will use fill_6_normal.
Thanks for clarifying this - the information is really helpful and I now understand what is happening here.
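To convince myself of the vectoring arithmetic, I worked it through with a small standalone C sketch (my own, based only on the SPARC V9 trap table layout rather than on NetBSD or QEMU code): fill_N_normal is trap type 0x0C0 + 4*N, fill_N_other is 0x0E0 + 4*N, the vector N comes from WSTATE.NORMAL (bits 2:0) when %otherwin is 0 or WSTATE.OTHER (bits 5:3) otherwise, and each TL=0 trap table entry is 32 bytes.

    #include <inttypes.h>
    #include <stdio.h>

    /*
     * Illustrative sketch only: compute the handler address a fill
     * trap vectors to, given %tba, %wstate and %otherwin.
     */
    static uint64_t fill_handler(uint64_t tba, unsigned wstate, unsigned otherwin)
    {
        unsigned vec = otherwin ? ((wstate >> 3) & 7) : (wstate & 7);
        unsigned tt  = (otherwin ? 0x0e0 : 0x0c0) + 4 * vec;
        return tba + tt * 32;   /* 32 bytes (8 instructions) per TL=0 entry */
    }

    int main(void)
    {
        /* Kernel trapbase 0x1000000, %wstate = 0, %otherwin = 0 as above. */
        printf("0x%" PRIx64 "\n", fill_handler(0x1000000, 0, 0));
        return 0;
    }

This prints 0x1001800, which is exactly where the fill_0_normal dump earlier in this mail came from, so the vectoring itself looks consistent; with %wstate set to 066 the same arithmetic would land on fill_6_normal instead.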
I'm not entirely sure what's going on here since you didn't have symbols in the disassembly and there's no stack trace, but I assume the routine generating the fault is openfirmware() in the kernel.
Yes, that's correct. Unfortunately I don't have a NetBSD build environment so I've been doing most of the work by disassembling the kernel via QEMU's gdbstub and comparing against the source in a web browser. So based upon what you're saying it appears we have a stack like this (indented to show window saves/restores):
cpu_initialize() {
    prom_set_trap_table() {
        openfirmware() {
            /* OpenBIOS C code */
            of_client_interface() {
                enter_forth() {
                    set_trap_table() {
                        SUNW,set-trap-table
                    }
                }
            }
        }
        /* fill_0_normal trap occurs here */
    }
    /* Switch to kernel mode */
    wrpr %g0, WSTATE_KERN, %wstate
}

The assumption has to be that, for this to work without errors, no window or fault traps can occur between calling SUNW,set-trap-table in OpenBIOS and getting back to cpu_initialize() to set the correct value for %wstate, which is quite a few window levels. AIUI data faults can't happen because the ASI is set to 0x82 (no-fault), which is why it is the fill_0_normal window trap that is triggering this.
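The window arithmetic behind "quite a few window levels" can be seen with a toy model of the SPARC register-window accounting (purely illustrative; the window count and names are my assumptions, not NetBSD or QEMU internals): a call chain deeper than the window file forces spills on the way down, so the returns on the way back up inevitably hit a restore with canrestore == 0 and have to take fill traps, and after SUNW,set-trap-table those fills are serviced by the freshly installed kernel table with whatever %wstate happens to contain.

    #include <stdio.h>

    /*
     * Toy model: NWINDOWS - 2 windows are available to the save/restore
     * chain.  A save with cansave == 0 spills the oldest frame; a
     * restore with canrestore == 0 has to fill one back in.
     */
    #define NWINDOWS 8

    static unsigned cansave = NWINDOWS - 2, canrestore, spills, fills;

    static void do_save(void)
    {
        if (cansave == 0) { spills++; canrestore--; cansave++; }  /* spill trap */
        cansave--; canrestore++;
    }

    static void do_restore(void)
    {
        if (canrestore == 0) { fills++; cansave--; canrestore++; } /* fill trap */
        canrestore--; cansave++;
    }

    int main(void)
    {
        for (int depth = 0; depth < 10; depth++)   /* deep call chain in */
            do_save();
        for (int depth = 0; depth < 10; depth++)   /* and back out again */
            do_restore();
        printf("spills=%u fills=%u cansave=%u canrestore=%u\n",
               spills, fills, cansave, canrestore);
        return 0;
    }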
I'm starting to wonder whether setting %wstate to use trap vector 6 should happen *before* calling prom_set_trap_table(). At the point SUNW,set-trap-table is called, the kernel is effectively saying "I am taking responsibility for handling all traps from now on", and so if the kernel cannot handle traps after this point for any reason, then it is not honouring its contract to manage the trap table.
Regardless of this, now that I understand this further, I need to look into the OpenBIOS CIF interface to see if I can preserve the entire window state across CIF calls, which I suspect might be what Sun's OBP does. Otherwise it would not be possible to run many versions of NetBSD (and OpenBSD, which suffers from the same problem) under emulation :/
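By "the entire window state" I mean roughly this set of privileged registers, captured on CIF entry and written back just before returning to the client. This is only a sketch of the idea (all names here are hypothetical and the real thing would live in the assembly CIF entry/exit code using rdpr/wrpr, not in C):

    /* Sketch only - not OpenBIOS code. */
    struct cif_window_state {
        unsigned cwp, cansave, canrestore, otherwin, cleanwin, wstate;
    };

    static struct cif_window_state saved_state;

    static void cif_save_window_state(const struct cif_window_state *cur)
    {
        saved_state = *cur;     /* rdpr each register in the real entry stub */
    }

    static void cif_restore_window_state(struct cif_window_state *cur)
    {
        *cur = saved_state;     /* wrpr each register before returning */
    }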
My guess is that either QEMU is ignoring the contents of the %wstate register, or OpenBIOS is changing the contents of the %wstate register and not restoring it before returning to the kernel.
FWIW I've double-checked the OpenBIOS source code, and the only changes it makes to %wstate outside of system reset are to preserve its value across data and instruction faults.
Many thanks,

Mark.