Subject: Re: kernel: alignment fault trap on sparc
To: Eduardo Horvath <eeh@netbsd.org>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: tech-kern
Date: 06/07/2004 22:38:18
On Mon, Jun 07, 2004 at 06:26:38PM +0000, Eduardo Horvath wrote:
> >
> > How can I do that? I didn't find anything in ddb to do disassembly, but I
> > probably missed something.
>
> x/i
>
> (Or man ddb.)
Ok, I knew I missed something.
>
> > > Otherwise, it could be that the instruction in the instruction cache does not
> > > match the contents in memory,
> >
> > Software bug ? we have had cache issues on sun4c in the past ...
>
> Could be cache coherency issues.
Yes. But the fault address doesn't seem to be on a cache line boundary.
However, it's at a function call.
BTW, when I get
trap type 0x7: pc=0xf01c4090 npc=0xf01c4094
pc is the address of the instruction which caused the trap, right ?
This is the second instruction of uvmfault_anonget:
db> x/i 0xf01c408c
netbsd:uvmfault_anonget: save %sp, -0x70, %sp
db>
netbsd:uvmfault_anonget+0x4: sethi %hi(0xf02e3000), %l6
db>
netbsd:uvmfault_anonget+0x8: or %l6, 0x2c, %g1
db>
netbsd:uvmfault_anonget+0xc: ld [%g1 + 0x10c], %g2
db>
netbsd:uvmfault_anonget+0x10: or %g0, %i0, %l2
db>
netbsd:uvmfault_anonget+0x14: add %g2, 0x1, %g2
db>
netbsd:uvmfault_anonget+0x18: st %g2, [%g1 + 0x10c]
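
As a sanity check on the listing above, here is a minimal C sketch of the
address arithmetic. It assumes fixed 4-byte SPARC instructions and that %g1
really holds what the sethi/or pair computes. The helper names (insn_index,
ld_effective_addr, is_word_aligned) are made up for illustration and are not
kernel functions.

```c
#include <stdint.h>

/* Every SPARC V8 instruction is 4 bytes wide. */
#define SPARC_INSN_BYTES 4u

/* Zero-based index of the instruction at pc within the function
 * starting at func_start (hypothetical helper). */
static uint32_t insn_index(uint32_t func_start, uint32_t pc)
{
    return (pc - func_start) / SPARC_INSN_BYTES;
}

/* Effective address of "ld [%g1 + disp], ..." when %g1 was built with
 * "sethi %hi(sethi_hi), %l6; or %l6, or_imm, %g1" (hypothetical helper). */
static uint32_t ld_effective_addr(uint32_t sethi_hi, uint32_t or_imm,
                                  uint32_t disp)
{
    return (sethi_hi | or_imm) + disp;
}

/* A 32-bit ld/st needs a 4-byte aligned address. */
static int is_word_aligned(uint32_t addr)
{
    return (addr & 3u) == 0;
}
```

With the values from the trap message and the listing,
insn_index(0xf01c408c, 0xf01c4090) gives 1, i.e. the sethi, which does not
access memory. ld_effective_addr(0xf02e3000, 0x2c, 0x10c) gives 0xf02e3138,
which is word aligned, so under these assumptions the ld/st at +0xc and +0x18
would not be expected to fault on alignment either.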
The cache boundary may not be relevant: we jump from uvm_fault(), so
we could have a cache issue anyway.
Or the pc is off by one instruction when the trap occurs,
and it's the save which causes the trap.
I hope there's a way to look at the register contents when in ddb.
What does save %sp, -0x70, %sp do ?
The call to uvmfault_anonget() is:
uvmfault_anonget(&ufi, amap, anon)
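
For reference, a rough C model of the arithmetic part of
"save %sp, -0x70, %sp": the instruction adds the immediate to the caller's
%sp and writes the result to %sp in the new register window, so the callee
gets a 0x70-byte stack frame. The sketch leaves out the window rotation
itself and any window overflow trap. The function names and the sample stack
pointer values are hypothetical; the 8-byte stack alignment is the usual
32-bit SPARC ABI requirement.

```c
#include <stdint.h>

/* The 32-bit SPARC ABI keeps %sp 8-byte aligned. */
#define SPARC_STACK_ALIGN 8u

/* Model of "save %sp, simm13, %sp": the new window's %sp is the caller's
 * %sp plus the (negative) immediate. Register window rotation is not
 * modelled here (hypothetical helper). */
static uint32_t save_new_sp(uint32_t caller_sp, int32_t simm13)
{
    return caller_sp + (uint32_t)simm13;
}

/* Nonzero if sp satisfies the ABI stack alignment (hypothetical helper). */
static int sp_is_aligned(uint32_t sp)
{
    return (sp & (SPARC_STACK_ALIGN - 1u)) == 0;
}
```

For example, with an assumed caller %sp of 0xf0800000,
save_new_sp(0xf0800000, -0x70) yields 0xf07fff90, which is still 8-byte
aligned. If the caller's %sp were already misaligned, the new one would be
too, since 0x70 is a multiple of 8. Looking at %sp in ddb should tell.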
>
> > > or your CPU is getting old and flakey. I've seen
> > > this happen a lot with old machines.
> >
> > I haven't had many problems with sparc yet. And this box started doing this
> > right after the upgrade, it was solid under 1.6.2.
>
> Could be a coincidence that the hardware broke at around the same time you
> updated the software. I've seen that happen on occasion.
>
> In any case, you need a proper crashdump analysis.
Unfortunately I'm not familiar with assembly, and I couldn't get a dump to disk
yet.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--