Subject: Re: ddb help
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: Eduardo Horvath <eeh@NetBSD.org>
List: port-sparc64
Date: 10/25/2004 18:46:53
On Sun, Oct 24, 2004 at 10:19:02PM +0200, Manuel Bouyer wrote:
> Hi,
> I can't understand how this can happen. Is it possible that ddb is printing
> the wrong address here, or is missing a function call in the stack frame ?
> This is a current GENERIC32 kernel, recompiled with -g

Stack traces are done by traversing the register windows saved to the stack
and printing out the linkage pointers.  A trace can be wrong or incomplete if
the register windows were never saved to the stack, if they were overwritten,
if the stack pointer is pointing to the wrong place, or if there have been
some tail calls and the bottom register window has been recycled.  In this
instance it is most likely the last of these.
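
To make the traversal concrete, here is a minimal sketch in C.  It is only a
model and not the ddb code: the struct layout, the name walk_frames and the
use of array indices in place of real stack addresses are all assumptions
made for illustration.  Each saved window holds 8 locals and 8 ins, with
%i6 as the caller's frame pointer and %i7 as the address of the call
instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of one register window spilled to the stack. */
struct saved_window {
    uint64_t local[8];  /* %l0 - %l7 */
    uint64_t in[8];     /* %i0 - %i7; in[6] = %i6 (caller's frame),
                           in[7] = %i7 (address of the call insn) */
};

/*
 * Hypothetical helper: follow the %i6 links and collect the %i7
 * linkage values.  The "stack" is an array of windows and a frame
 * pointer is an index + 1 into it, so 0 marks the end of the chain.
 * Returns the number of entries stored in pcs[].
 */
size_t walk_frames(const struct saved_window *stack, size_t nframes,
                   uint64_t fp, uint64_t *pcs, size_t max)
{
    size_t n = 0;

    assert(stack != NULL && pcs != NULL);

    /* Stop on a null or out-of-range frame pointer, or when pcs[]
     * is full (which also bounds a corrupted, looping chain). */
    while (fp != 0 && fp <= nframes && n < max) {
        const struct saved_window *w = &stack[fp - 1];

        pcs[n++] = w->in[7];   /* linkage pointer printed by the trace */
        fp = w->in[6];         /* move to the caller's frame */
    }
    return n;
}
```

The walker can only report what is in the saved windows.  A function whose
window was never spilled, or whose window has been reused, leaves no entry,
and a bad frame pointer simply ends the walk early.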

> 
> text_access_fault: pc=0 va=0
> kernel trap 64: +fast instruction access MMU miss
> Stopped in pid 4.1 (atabus0) at 0:      undefined
> db> tr
> wdc_ata_bio_start(1dd4150, 1e34000, 0, 0, 0, 1dd4180) at netbsd:wdc_ata_bio_start+0x48c
> atabus_thread(0, 0, ffff, 5bcd, 0, 0) at netbsd:atabus_thread+0x128
> proc_trampoline(0, 0, 0, 0, 0, 0) at netbsd:proc_trampoline+0x4
> db> examine/i wdc_ata_bio_start+0x480
> netbsd:wdc_ata_bio_start+0x480: st              %g1, [%l3 + 0x34]
> db> 
> netbsd:wdc_ata_bio_start+0x484: stb             %g0, [%l0 + 0xa]
> db> 
> netbsd:wdc_ata_bio_start+0x488: or              %g0, %i0, %o0
> db> 
> netbsd:wdc_ata_bio_start+0x48c: call            netbsd:wdc_ata_bio_done
> db> 

Here's a call to netbsd:wdc_ata_bio_done.  It probably makes a tail call to
something else just before returning, so that last call never got its own
stack frame.
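
A toy model in C of why a tail call leaves no entry in the trace.  This is
an illustration only: the struct and the function names ordinary_call and
tail_call are made up here, and the trace is reduced to a list of call-site
strings.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SITES 16

/* The trace is modelled as the list of call sites whose linkage is
 * still recorded in a live register window. */
struct call_trace {
    const char *site[MAX_SITES];
    size_t depth;
};

/* Ordinary call: the callee does a `save`, so its new window keeps
 * the address of the call instruction. */
void ordinary_call(struct call_trace *t, const char *call_site)
{
    assert(t->depth < MAX_SITES);
    t->site[t->depth++] = call_site;
}

/* Tail call: the caller gives up its window before jumping, and the
 * callee returns straight to the caller's caller.  Nothing new is
 * recorded, so this call site never shows up in the trace. */
void tail_call(struct call_trace *t, const char *call_site)
{
    (void)t;
    (void)call_site;
}
```

Recording proc_trampoline+0x4, atabus_thread+0x128 and
wdc_ata_bio_start+0x48c with ordinary_call(), then a tail_call() made from
inside wdc_ata_bio_done, leaves three entries with wdc_ata_bio_start+0x48c
on top, which is the shape of the trace above.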

> netbsd:wdc_ata_bio_start+0x490: or              %g0, %i1, %o1
> db> 
> netbsd:wdc_ata_bio_start+0x494: ldd             [%l2 + 0xc0], %o4
> db> 
> netbsd:wdc_ata_bio_start+0x498: ldsb            [%l2 + 0xc8], %g4
> db> 
> netbsd:wdc_ata_bio_start+0x49c: or              %g4, 0x9, %g1
> db> 
> netbsd:wdc_ata_bio_start+0x4a0: subcc           %g1, 0x1d, %g0
> db> show registers 
> tstate      0x44
> pc          0
> npc         0
> ipl         0x5
> y           0
> g0          0
> g1          0
> g2          0
> g3          0
> g4          0
> g5          0
> g6          0
> g7          0xffffffff
> o0          0
> o1          0
> o2          0
> o3          0
> o4          0
> o5          0
> o6          0
> o7          0
> l0          0
> l1          0
> l2          0
> l3          0
> l4          0
> l5          0
> l6          0
> l7          0
> i0          0
> i1          0
> i2          0
> i3          0
> i4          0
> i5          0
> i6          0
> i7          0

Hm.  The registers here seem invalid.  I can't
believe they can all be zero.

> f0          0x3fb96ced
> f2          0xffffffff
> f4          0xffffffff
> f6          0xffffffff
> f8          0x3fb0c299
> f10         0x3eef7510
> f12         0x41e00000
> f14         0x3ff00000
> f16         0
> f18         0x3eef7510
> f20         0x40000
> f22         0xffffffff
> f24         0xffffffff
> f26         0xffffffff
> f28         0xffffffff
> f30         0xffffffff
> f32         0xffffffff
> f34         0xffffffff
> f36         0xffffffff
> f38         0xffffffff
> f40         0xffffffff
> f42         0xffffffff
> f44         0xffffffff
> f46         0xffffffff
> f48         0xffffffff
> f50         0xffffffff
> f52         0xffffffff
> f54         0xffffffff
> f56         0xffffffff
> f58         0xffffffff
> f60         0xffffffff
> f62         0xffffffff
> fsr         0
> gsr         0
> 0:      undefined
> 
> 
> The matching lines in the sources would be:
> 0x1386f0c is in wdc_ata_bio_start (/local/pop1/bouyer/current/src/sys/dev/ata/ata_wdc.c:309).
> 304                     ata_bio->r_error = chp->ch_error;
> 305                     ata_bio->error = ERROR;
> 306             }
> 307     ctrldone:
> 308             drvp->state = 0;
> 309             wdc_ata_bio_done(chp, xfer);
> 310             bus_space_write_1(wdr->ctl_iot, wdr->ctl_ioh, wd_aux_ctlr, WDCTL_4BIT);
> 311             return;
> 312     }
> 
> Any idea how to debug this further ?

1) Enable traptrace.  It should give you a better idea of the calling sequence.

2) If you can find the end of the stack, dump out the bottom trapframe.  You
might get a better idea of the machine state from that.

Eduardo