Re: [RESEND] User-level window trap when booting NetBSD kernel under QEMU SPARC64

To: Mark Cave-Ayland <mark.cave-ayland%ilande.co.uk@localhost>
Subject: Re: [RESEND] User-level window trap when booting NetBSD kernel under QEMU SPARC64
From: Eduardo Horvath <eeh%NetBSD.org@localhost>
Date: Mon, 19 May 2014 16:14:41 +0000 (UTC)

On Mon, 19 May 2014, Mark Cave-Ayland wrote:

> Hi all,
> 
> I'm one of the QEMU SPARC/OpenBIOS maintainers and I've been spending my time
> over the past few weeks (and possibly longer!) working on patches so that
> NetBSD kernels will boot under QEMU SPARC64.
> 
> I've made some good progress recently, however I'm still experiencing a user
> trap during boot which I don't understand. I've had some previous
> correspondence with Martin on this, but it requires a deep-level understanding
> as to how the SPARC64 memory management code works so I was hoping that you'd
> be able to provide some help with this.
> 
> So far I have a set of patches for OpenBIOS which get my 6.1.2 ISO image to
> boot to this point:
> 
> 
> build@kentang:~/rel-qemu-git/bin$ ./qemu-system-sparc64 -cdrom
> /home/build/src/qemu/image/sparc64/NetBSD-6.1.2-sparc64.iso -bios
> /home/build/src/openbios/openbios-git/openbios-devel/obj-sparc64/openbios-builtin.elf.nostrip
> -boot d -nographic
> OpenBIOS for Sparc64
> Configuration device id QEMU version 1 machine id 0
> kernel cmdline
> CPUs: 1 x SUNW,UltraSPARC-IIi
> UUID: 00000000-0000-0000-0000-000000000000
> Welcome to OpenBIOS v1.1 built on May 12 2014 21:33
>   Type 'help' for detailed information
> Trying cdrom:f...
> Not a bootable ELF image
> Not a bootable a.out image
> 
> Loading FCode image...
> Loaded 7478 bytes
> entry point is 0x4000
> NetBSD IEEE 1275 Multi-FS Bootblock
> Version $NetBSD: bootblk.fth,v 1.13 2010/06/24 00:54:12 eeh Exp $
> ..
> Jumping to entry point 0000000000100000 for type 0000000000000001...
> switching to new context: entry point 0x100000 stack 0x00000000ffe8aa09
> >> NetBSD/sparc64 OpenFirmware Boot, Revision 1.16
> =0x8870a0
> Loading netbsd: 8071888+553056+339856 [601032+393301]=0x9cd528
> Unimplemented service set-symbol-lookup ([2] -- [0])
> 
> Unexpected client interface exception: -1
> 1 tt=30 tstate=4482000605 tpc=0x14984f4 tnpc=0x14984f8
> 2 tt=30 tstate=4411001503 tpc=0x1001804 tnpc=0x1001808
> 3 tt=c0 tstate=4482001604 tpc=0x10094f4 tnpc=0x135fbc8
> Stopped in pid 0.1 (system) at  1008528:        nop
> db{0}>
> 
> 
> The problem is that I'm getting a data_access_exception on the first window
> fill trap executed after the kernel takes over the trap table with
> SUNW,set-trap-table.
> 
> Here is the gdb session showing the openfirmware() function after the NetBSD
> kernel has called SUNW,set-trap-table:
> 
> 
> (gdb) disas 0x1009478, 0x10094f8
> Dump of assembler code from 0x1009478 to 0x10094f8:
>    0x0000000001009478:  sethi  %hi(0x1800000), %o4
>    0x000000000100947c:  btst  1, %sp
>    0x0000000001009480:  be  %icc, 0x10094f8
>    0x0000000001009484:  ldx  [ %o4 ], %o4
>    0x0000000001009488:  save  %sp, -176, %sp
>    0x000000000100948c:  rdpr  %pil, %i2
>    0x0000000001009490:  mov  0xf, %i3
>    0x0000000001009494:  cmp  %i3, %i2
>    0x0000000001009498:  movle  %icc, %i2, %i3
>    0x000000000100949c:  wrpr  %g0, %i3, %pil
>    0x00000000010094a0:  mov  %i0, %o0
>    0x00000000010094a4:  mov  %g1, %l1
>    0x00000000010094a8:  mov  %g2, %l2
>    0x00000000010094ac:  mov  %g3, %l3
>    0x00000000010094b0:  mov  %g4, %l4
>    0x00000000010094b4:  mov  %g5, %l5
>    0x00000000010094b8:  mov  %g6, %l6
>    0x00000000010094bc:  mov  %g7, %l7
>    0x00000000010094c0:  rdpr  %pstate, %l0
>    0x00000000010094c4:  call  %i4
>    0x00000000010094c8:  wrpr  6, %pstate
> => 0x00000000010094cc:  wrpr  %l0, %pstate
>    0x00000000010094d0:  mov  %l1, %g1
>    0x00000000010094d4:  mov  %l2, %g2
>    0x00000000010094d8:  mov  %l3, %g3
>    0x00000000010094dc:  mov  %l4, %g4
>    0x00000000010094e0:  mov  %l5, %g5
>    0x00000000010094e4:  mov  %l6, %g6
>    0x00000000010094e8:  mov  %l7, %g7
>    0x00000000010094ec:  wrpr  %i2, 0, %pil
>    0x00000000010094f0:  ret
>    0x00000000010094f4:  restore  %o0, %g0, %o0
> End of assembler dump.


I'm not sure what we're looking at here.  Is this kernel code or OpenBIOS 
code?  I assume the machine is OK at this point?

> (gdb) info regi
> g0             0x0      0
> g1             0x1      1
> g2             0x7e50000        132448256
> g3             0x18d1c00        26024960
> g4             0x1ae8000        28213248
> g5             0x1000   4096
> g6             0x0      0
> g7             0x0      0
> o0             0x0      0
> o1             0x1      1
> o2             0xfffffffffffffff8       -8
> o3             0xffffffff00000000       -4294967296
> o4             0x1c14230        29442608
> o5             0x1000000        16777216
> sp             0x1c054a1        0x1c054a1
> o7             0x10094c4        16815300
> l0             0x16     22
> l1             0x1      1
> l2             0x7e50000        132448256
> l3             0x18d1c00        26024960
> l4             0x1ae8000        28213248
> l5             0x1000   4096
> l6             0x0      0
> l7             0x0      0
> i0             0x1c05e00        29384192
> i1             0x7e50000        132448256
> i2             0xd      13
> i3             0xf      15
> i4             0xffd0fe60       4291886688
> i5             0x18d1800        26023936
> fp             0x1c05551        0x1c05551
> i7             0x135fbc0        20315072
> pc             0x10094cc        0x10094cc
> npc            0x10094d0        0x10094d0
> state          0x4482000604     294238815748
> fsr            0x0      [ ]
> fprs           0x4      [ FEF ]
> y              0x0      0
> cwp            0x4      4
> pstate         0x6      [ IE PRIV ]
> asi            0x82     130
> ccr            0x44     68
> (gdb)
> 
> 
> The MMU TLB entries look like this:
> 
> 
> QEMU 2.0.50 monitor - type 'help' for more information
> (qemu) info tlb
> MMU contexts: Primary: 0, Secondary: 0
> DMMU dump
> [00] VA: ffe00000, PA: 7f00000, 512k, priv, RW, locked, ctx 0 local
> [01] VA: ffe80000, PA: 7f80000, 512k, priv, RW, locked, ctx 0 local
> [02] VA: ffd00000, PA: 1fff0000000, 512k, priv, RO, locked, ctx 0 local
> [03] VA: ffd80000, PA: 1fff0080000, 512k, priv, RO, locked, ctx 0 local
> [04] VA: ffc80000, PA: 7e80000, 512k, priv, RW, locked, ctx 0 local
> [05] VA: 4000, PA: 4000,   8k, priv, RW, unlocked, ctx 0 local
> [06] VA: 6000, PA: 6000,   8k, priv, RW, unlocked, ctx 0 local
> [07] VA: 8000, PA: 8000,   8k, priv, RW, unlocked, ctx 0 local
> [08] VA: a000, PA: a000,   8k, priv, RW, unlocked, ctx 0 local
> [09] VA: c000, PA: c000,   8k, priv, RW, unlocked, ctx 0 local
> [10] VA: e000, PA: e000,   8k, priv, RW, unlocked, ctx 0 local
> [11] VA: 10000, PA: 10000,   8k, priv, RW, unlocked, ctx 0 local
> [12] VA: 12000, PA: 12000,   8k, priv, RW, unlocked, ctx 0 local
> [13] VA: 14000, PA: 14000,   8k, priv, RW, unlocked, ctx 0 local
> [14] VA: 16000, PA: 16000,   8k, priv, RW, unlocked, ctx 0 local
> [15] VA: 18000, PA: 18000,   8k, priv, RW, unlocked, ctx 0 local
> [16] VA: 1a000, PA: 1a000,   8k, priv, RW, unlocked, ctx 0 local
> [17] VA: 100000, PA: 100000,   8k, priv, RW, unlocked, ctx 0 local
> [18] VA: 102000, PA: 102000,   8k, priv, RW, unlocked, ctx 0 local
> [19] VA: 104000, PA: 104000,   8k, priv, RW, unlocked, ctx 0 local
> [20] VA: 106000, PA: 106000,   8k, priv, RW, unlocked, ctx 0 local
> [21] VA: 108000, PA: 108000,   8k, priv, RW, unlocked, ctx 0 local
> [22] VA: 10a000, PA: 10a000,   8k, priv, RW, unlocked, ctx 0 local
> [23] VA: 10c000, PA: 10c000,   8k, priv, RW, unlocked, ctx 0 local
> [24] VA: 10e000, PA: 10e000,   8k, priv, RW, unlocked, ctx 0 local
> [25] VA: 110000, PA: 110000,   8k, priv, RW, unlocked, ctx 0 local
> [26] VA: 112000, PA: 112000,   8k, priv, RW, unlocked, ctx 0 local
> [27] VA: 114000, PA: 114000,   8k, priv, RW, unlocked, ctx 0 local
> [28] VA: ffc7e000, PA: 7e7e000,   8k, priv, RW, unlocked, ctx 0 local
> [29] VA: ffc7a000, PA: 7e7a000,   8k, priv, RW, unlocked, ctx 0 local
> [30] VA: ffc7c000, PA: 7e7c000,   8k, priv, RW, unlocked, ctx 0 local
> [31] VA: ffc78000, PA: 7e78000,   8k, priv, RW, unlocked, ctx 0 local
> [32] VA: ffc76000, PA: 7e76000,   8k, priv, RW, unlocked, ctx 0 local
> [33] VA: ffc72000, PA: 7e72000,   8k, priv, RW, unlocked, ctx 0 local
> [34] VA: ffc70000, PA: 7e70000,   8k, priv, RW, unlocked, ctx 0 local
> [35] VA: ffc6e000, PA: 7e6e000,   8k, priv, RW, unlocked, ctx 0 local
> [36] VA: ffc64000, PA: 7e64000,   8k, priv, RW, unlocked, ctx 0 local
> [37] VA: ffc66000, PA: 7e66000,   8k, priv, RW, unlocked, ctx 0 local
> [38] VA: ffc68000, PA: 7e68000,   8k, priv, RW, unlocked, ctx 0 local
> [39] VA: ffc6a000, PA: 7e6a000,   8k, priv, RW, unlocked, ctx 0 local
> [40] VA: ffc6c000, PA: 7e6c000,   8k, priv, RW, unlocked, ctx 0 local
> [41] VA: ffc62000, PA: 7e62000,   8k, priv, RW, unlocked, ctx 0 local
> [42] VA: 1000000, PA: 7800000,   4M, priv, RO, locked, ctx 0 local
> [43] VA: 1400000, PA: 7400000,   4M, priv, RO, locked, ctx 0 local
> [44] VA: 1800000, PA: 7000000,   4M, priv, RW, locked, ctx 0 local
> [45] VA: ffc60000, PA: 7e60000,   8k, priv, RW, unlocked, ctx 0 local
> [46] VA: 7ffc000, PA: 7e5c000,   8k, priv, RW, unlocked, ctx 0 local
> [47] VA: 7ffe000, PA: 7e5e000,   8k, priv, RW, unlocked, ctx 0 local
> [48] VA: 7ffa000, PA: 7e5a000,   8k, priv, RW, unlocked, ctx 0 local
> [49] VA: 1c0c000, PA: 7e40000,   8k, priv, RW, unlocked, ctx 0 local
> [50] VA: 1c0e000, PA: 7e42000,   8k, priv, RW, unlocked, ctx 0 local
> [51] VA: 1c10000, PA: 7e44000,   8k, priv, RW, unlocked, ctx 0 local
> [52] VA: 1c12000, PA: 7e46000,   8k, priv, RW, unlocked, ctx 0 local
> [53] VA: 1c14000, PA: 7e48000,   8k, priv, RW, unlocked, ctx 0 local
> [54] VA: 1c16000, PA: 7e4a000,   8k, priv, RW, unlocked, ctx 0 local
> [55] VA: 1c18000, PA: 7e4c000,   8k, priv, RW, unlocked, ctx 0 local
> [56] VA: 1c1a000, PA: 7e4e000,   8k, priv, RW, unlocked, ctx 0 local
> [57] VA: e0010000, PA: 7e40000,  64k, priv, RW, locked, ctx 0 local
> [58] VA: 1c04000, PA: 14000,   8k, priv, RW, unlocked, ctx 0 local
> IMMU dump
> [00] VA: ffd00000, PA: 1fff0000000, 512k, priv, locked, ctx 0 local
> [01] VA: ffc80000, PA: 7e80000, 512k, priv, locked, ctx 0 local
> [02] VA: 100000, PA: 100000,   8k, priv, unlocked, ctx 0 local
> [03] VA: 102000, PA: 102000,   8k, priv, unlocked, ctx 0 local
> [04] VA: 10a000, PA: 10a000,   8k, priv, unlocked, ctx 0 local
> [05] VA: 10c000, PA: 10c000,   8k, priv, unlocked, ctx 0 local
> [06] VA: 110000, PA: 110000,   8k, priv, unlocked, ctx 0 local
> [07] VA: 104000, PA: 104000,   8k, priv, unlocked, ctx 0 local
> [08] VA: 108000, PA: 108000,   8k, priv, unlocked, ctx 0 local
> [09] VA: 10e000, PA: 10e000,   8k, priv, unlocked, ctx 0 local
> [10] VA: 106000, PA: 106000,   8k, priv, unlocked, ctx 0 local
> [11] VA: 1000000, PA: 7800000,   4M, priv, locked, ctx 0 local
> [12] VA: 1400000, PA: 7400000,   4M, priv, locked, ctx 0 local
> (qemu)
> 
> 
> As soon as I hit the restore at 0x10094f4 in gdb, I get a fill_0_normal trap
> which vectors to 0x1001800:

So the fault happens on the last instruction of the previous routine?  And 
this is *after* the call to SUNW,set-trap-table?  This means you should be 
running with the kernel's trap table, right?

> 
> 
> (gdb) disas 0x1001800, 0x100184c
> Dump of assembler code from 0x1001800 to 0x100184c:
> => 0x0000000001001800:  wr  %g0, 0x11, %asi
>    0x0000000001001804:  ldxa  [ %sp + 0x7ff ] %asi, %l0
>    0x0000000001001808:  ldxa  [ %sp + 0x807 ] %asi, %l1
>    0x000000000100180c:  ldxa  [ %sp + 0x80f ] %asi, %l2
>    0x0000000001001810:  ldxa  [ %sp + 0x817 ] %asi, %l3
>    0x0000000001001814:  ldxa  [ %sp + 0x81f ] %asi, %l4
>    0x0000000001001818:  ldxa  [ %sp + 0x827 ] %asi, %l5
>    0x000000000100181c:  ldxa  [ %sp + 0x82f ] %asi, %l6
>    0x0000000001001820:  ldxa  [ %sp + 0x837 ] %asi, %l7
>    0x0000000001001824:  ldxa  [ %sp + 0x83f ] %asi, %i0
>    0x0000000001001828:  ldxa  [ %sp + 0x847 ] %asi, %i1
>    0x000000000100182c:  ldxa  [ %sp + 0x84f ] %asi, %i2
>    0x0000000001001830:  ldxa  [ %sp + 0x857 ] %asi, %i3
>    0x0000000001001834:  ldxa  [ %sp + 0x85f ] %asi, %i4
>    0x0000000001001838:  ldxa  [ %sp + 0x867 ] %asi, %i5
>    0x000000000100183c:  ldxa  [ %sp + 0x86f ] %asi, %fp
>    0x0000000001001840:  ldxa  [ %sp + 0x877 ] %asi, %i7
>    0x0000000001001844:  restored
>    0x0000000001001848:  retry
> End of assembler dump.

So this is presumably fill_0_normal?
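
A quick way to sanity-check that guess is to recompute where the CPU should
vector for a given trap type.  The sketch below follows the SPARC V9
trap-table layout as I remember it (TBA<63:15>, then one bit for "TL was
already > 0", then the trap type shifted left by 5).  The TBA value of
0x1000000 is an assumption inferred from the handler addresses in the
quoted dumps, not something read out of the machine, and the function name
is made up for illustration.

```python
def trap_vector_address(tba: int, tt: int, tl_before_trap: int) -> int:
    """Return the address a SPARC V9 CPU vectors to for trap type `tt`.

    tba            -- trap base address register (only bits 63:15 are used)
    tt             -- 9-bit trap type
    tl_before_trap -- trap level at the moment the trap was taken
    """
    base = tba & ~0x7FFF                          # keep TBA<63:15>
    tl_bit = 0x4000 if tl_before_trap > 0 else 0  # bit 14 selects the TL>0 table
    return base | tl_bit | ((tt & 0x1FF) << 5)    # 32 bytes per trap type


# Assumption: the kernel trap table sits at the start of the kernel text
# mapping shown in the TLB dump.
TBA = 0x1000000

# tt=0xc0 taken at TL=0, then tt=0x30 taken while already at TL=1.
print(hex(trap_vector_address(TBA, 0xC0, 0)))
print(hex(trap_vector_address(TBA, 0x30, 1)))
```

If that layout is right, tt=c0 at TL=0 lands on 0x1001800 and tt=30 at TL>0
lands on 0x1004600, which matches both the disassembly address above and
the data_access_exception address quoted further down.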

> (gdb) info regi
> g0             0x0      0
> g1             0x1f61ec8c2      8424179906
> g2             0x1f60f8682      8423179906
> g3             0xffe11df8       4292943352
> g4             0x0      0
> g5             0x0      0
> g6             0x0      0
> g7             0x0      0
> o0             0x1c05e00        29384192
> o1             0x7e50000        132448256
> o2             0xd      13
> o3             0xf      15
> o4             0xffd0fe60       4291886688
> o5             0x18d1800        26023936
> sp             0x1c05551        0x1c05551
> o7             0x135fbc0        20315072
> l0             0xffffffffffe30c38       -1897416
> l1             0xffe8ac38       4293438520
> l2             0x17500f0        24445168
> l3             0x1746c78        24407160
> l4             0x1816400        25256960
> l5             0x18c0800        25954304
> l6             0x18c0800        25954304
> l7             0x19cd570        27055472
> i0             0xa      10
> i1             0xffe8b0f0       4293439728
> i2             0x20     32
> i3             0xffd0fe60       4291886688
> i4             0x17502f8        24445688
> i5             0x0      0
> fp             0xffe85219       0xffe85219
> i7             0xffd0a988       4291864968
> pc             0x1001800        0x1001800
> npc            0x1001804        0x1001804
> state          0x4482001503     294238819587
> fsr            0x0      [ ]
> fprs           0x4      [ FEF ]
> y              0x0      0
> cwp            0x3      3
> pstate         0x15     [ AG PRIV PEF ]
> asi            0x82     130
> ccr            0x44     68
> (gdb)


You should learn how to use ddb.  It has lots of nifty MD commands to dump 
supervisor state registers, such as the trap stack.  


> 
> 
> From here you can see that %sp is 0x1c05551, so the first access at %sp +
> 0x7ff bias = 0x1c05d50 which is mapped just before the call to
> SUNW,set-trap-table. But because the access is made using ASI 0x11 which is a
> user ASI then the fill_0_normal invokes a further data_access_exception trap,
> which takes roughly the following path:
> 
> 
> -> trap 0x30, data_access_exception (0x1004600)
>   -> winfault: 0x00000000010081cc
>      http://nxr.netbsd.org/xref/src/sys/arch/sparc64/sparc64/locore.s#1721
> 
>      #1737 we did previously take a datafault, so go to winfixfill
> 
>   -> winfixfill: 0x000000000100822c
>      http://nxr.netbsd.org/xref/src/sys/arch/sparc64/sparc64/locore.s#1756
> 
>      #1770: we are in PRIV mode, so carry on
> 
>      #1819: not at trap level 3, so invoke software trap 1 (0x101)
> 
>      Trap 0x101 invokes the panic/debugger
> 
> 
> This shows that the 0x101 is being invoked deliberately because a kernel
> mapping is being accessed by a user ASI while the processor is in PSTATE.PRIV
> == 1 mode.
> 
> AFAICT the basic logic looks correct, so I am wondering if anyone can comment
> as to what should happen on real hardware? My current thoughts are that the
> initial fill_0_normal trap is incorrect, and a supervisor fill trap should
> be used instead, but I can't quite understand how this is supposed to
> happen.
> 
> If anyone has any ideas as to why this is happening and/or what the intended
> behaviour is then I would be very interested to try and understand the memory
> management algorithms. And of course, when it all works then you get the warm
> feeling of being able to add a SPARC64 machine to your buildfarm!
> 
> If you've made it this far, then thank you for your time and I look forward to
> hearing from you further

It's been a while since I last looked at the SPARC V9 manual, but ISTR the 
%wstate register controls which of the window fill/spill traps is taken 
for regular and "other" states.  

You need to dump the contents of %wstate.  

There are 16 window trap vectors for each operation in both the normal and 
nucleus trap tables.  I think both trap tables should be pretty much the 
same.  

The first 8 are for "normal" traps.  These are called when %otherwin is 0.  
This occurs when all the windows are from the same address space, either 
kernel or userland.

The second 8 are for "other" traps.  When a process traps from userland to 
the kernel, the kernel sets %otherwin to the number of userland 
stackframes.  Every time the kernel spills a frame, if %otherwin is not 
zero the CPU calls one of the "other" traps, and then decrements 
%otherwin.  But you probably don't care about this right now.

Of those trap vectors:

        0 is used for 32-bit userland stackframes
        1 is used for 64-bit userland stackframes
        2 will check the stack alignment and call one of the above routines.

        4 is used for 32-bit kernel stackframes
        5 is used for 64-bit kernel stackframes
        6 will check the stack alignment and call one of the above routines.
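
The alignment check in vectors 2 and 6 relies on the 64-bit ABI keeping an
odd, biased stack pointer (the same thing the `btst 1, %sp` in the quoted
openfirmware() code is testing).  Here is a rough sketch of that decision
using the %sp from the quoted register dump.  The helper names are
hypothetical, the 32-bit branch is simplified to "use %sp as is", and the
page bounds are taken from DMMU entry [58] in the quoted TLB dump.

```python
STACK_BIAS = 2047  # 0x7ff, the SPARC V9 64-bit ABI stack bias


def frame_kind(sp: int) -> str:
    """Classify a stack frame by the low bit of %sp (odd means 64-bit, biased)."""
    return "64-bit" if sp & 1 else "32-bit"


def save_area_address(sp: int) -> int:
    """Address of the register save area a fill/spill handler would touch."""
    if sp & 1:
        return sp + STACK_BIAS  # undo the bias for a 64-bit frame
    return sp                   # simplified: 32-bit frames are unbiased


sp = 0x1C05551                  # %sp at the time of the fill trap
addr = save_area_address(sp)
print(frame_kind(sp), hex(addr))

# DMMU entry [58] in the quoted dump: VA 0x1c04000, 8k page.
page_start, page_size = 0x1C04000, 0x2000
print(page_start <= addr < page_start + page_size)
```

With those inputs the frame is classified as 64-bit and the first load goes
to 0x1c05d50, which falls inside that 8k mapping.  So the address itself
looks fine; what matters is which handler and which ASI get used to reach
it.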

When running in user mode we set %wstate to 022, which means it will call 
fill_2_normal and fill_2_other.  When running in kernel mode we set 
%wstate to 066 which means it will call fill_6_normal and fill_6_other.
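
To make the %wstate to vector mapping concrete, here is a small sketch of
how I understand the V9 selection rule: the low three bits of %wstate pick
the "normal" vector, bits 5:3 pick the "other" vector, and %otherwin
decides which field is used.  The base trap types (0x080 for spill, 0x0c0
for fill, plus 0x20 for the "other" variants, four trap-table slots per
vector) are from memory of the V9 manual, and the function name is just
for illustration.

```python
def window_trap_type(op: str, wstate: int, otherwin: int):
    """Return (trap type, vector name) for a window spill or fill trap.

    op       -- "spill" or "fill"
    wstate   -- contents of %wstate (bits 2:0 NORMAL, bits 5:3 OTHER)
    otherwin -- contents of %otherwin
    """
    base = {"spill": 0x080, "fill": 0x0C0}[op]
    if otherwin == 0:
        kind = "normal"
        n = wstate & 0x7          # WSTATE.NORMAL
    else:
        kind = "other"
        base += 0x20              # the *_other vectors follow the *_normal ones
        n = (wstate >> 3) & 0x7   # WSTATE.OTHER
    return base + 4 * n, f"{op}_{n}_{kind}"


# The values mentioned above, plus an all-zero %wstate for comparison.
for wstate in (0o66, 0o22, 0):
    tt, name = window_trap_type("fill", wstate, otherwin=0)
    print(f"wstate={wstate:03o} -> tt={tt:#x} ({name})")
```

Under these assumptions a kernel-mode %wstate of 066 would give tt=d8
(fill_6_normal), whereas the tt=c0 in the trap dump is what you would get
if the NORMAL field were 0 at the time of the restore.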

The sun4u code in locore.s does this:

        /* sun4u */
        set     _C_LABEL(trapbase), %l1
        call    _C_LABEL(prom_set_trap_table_sun4u)     ! Now we should be running 100% from our handlers
         mov    %l1, %o0
7:
        wrpr    %l1, 0, %tba                    ! Make sure the PROM didn't foul up.

        /*
         * Switch to the kernel mode and run away.
         */
        wrpr    %g0, WSTATE_KERN, %wstate

So right after installing the trap table it sets the %wstate register to 
use trap vector 6 so it will use fill_6_normal.

I'm not entirely sure what's going on here since you didn't have symbols 
in the disassembly and there's no stack trace, but I assume the routine 
generating the fault is openfirmware() in the kernel.

My guess is either QEMU is ignoring the contents of the %wstate register, 
or OpenBIOS is changing the contents of the %wstate register and not 
restoring it before returning to the kernel.

Eduardo