Subject: Re: kernel: alignment fault trap on sparc
To: Eduardo Horvath <eeh@netbsd.org>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: tech-kern
Date: 06/07/2004 22:38:18
On Mon, Jun 07, 2004 at 06:26:38PM +0000, Eduardo Horvath wrote:
> > 
> > How can I do that? I didn't find anything in ddb to do disassembly, but I
> > probably missed something.
> 
> x/i
> 
> (Or man ddb.)

Ok, I knew I missed something.

> 
> > > Otherwise, it could be that the instruction in the instruction cache does not
> > > match the contents in memory,
> > 
> > Software bug ? we have had cache issues on sun4c in the past ...
> 
> Could be cache coherency issues.

Yes. But the fault address doesn't seem to be on a cache line boundary.
However, it's at a function call.


BTW, when I get
trap type 0x7: pc=0xf01c4090 npc=0xf01c4094
pc is the address of the instruction which caused the trap, right ?

This is the second instruction of uvmfault_anonget:
db> x/i 0xf01c408c
netbsd:uvmfault_anonget:        save            %sp, -0x70, %sp
db> 
netbsd:uvmfault_anonget+0x4:    sethi           %hi(0xf02e3000), %l6
db> 
netbsd:uvmfault_anonget+0x8:    or              %l6, 0x2c, %g1
db> 
netbsd:uvmfault_anonget+0xc:    ld              [%g1 + 0x10c], %g2
db> 
netbsd:uvmfault_anonget+0x10:   or              %g0, %i0, %l2
db> 
netbsd:uvmfault_anonget+0x14:   add             %g2, 0x1, %g2
db> 
netbsd:uvmfault_anonget+0x18:   st              %g2, [%g1 + 0x10c]
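
As a sanity check on the addresses above, here is a minimal C sketch that
computes where the trap pc falls relative to the start of uvmfault_anonget.
The helper names are hypothetical (they are not kernel functions), and it
assumes fixed 4-byte SPARC instructions, with 0xf01c408c taken as the function
start from the x/i output:

```c
#include <stdint.h>

/* Assumption: every SPARC instruction is 4 bytes long. */
#define SPARC_INSN_SIZE 4u

/* Byte offset of pc from the first instruction of the function. */
static uint32_t insn_offset(uint32_t func_start, uint32_t pc)
{
    return pc - func_start;
}

/* Zero-based index of the instruction at pc within the function. */
static uint32_t insn_index(uint32_t func_start, uint32_t pc)
{
    return insn_offset(func_start, pc) / SPARC_INSN_SIZE;
}

/* Nonzero if addr sits on a 4-byte instruction boundary. */
static int is_insn_aligned(uint32_t addr)
{
    return (addr & (SPARC_INSN_SIZE - 1u)) == 0;
}
```

With func_start = 0xf01c408c and pc = 0xf01c4090, insn_offset() gives 0x4 and
insn_index() gives 1, i.e. the second instruction (the sethi). Both pc and
npc = 0xf01c4094 pass is_insn_aligned().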

The cache boundary may not be relevant: we jump from uvm_fault(), so
we could have a cache issue anyway.

Or the pc is off by one instruction when the trap occurs,
and it's the save which causes the trap.
I hope there's a way to look at the register contents when in ddb.

What does save %sp, -0x70, %sp do ?
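
For reference, a rough C model of the architectural effect of that
instruction, as far as I understand it: save rotates to a new register window,
so the caller's %o registers are seen as the callee's %i registers (the old
%sp becomes %fp), and the sum is computed from the old window's %sp and
written to the new window's %sp. With -0x70 this allocates a 112-byte stack
frame. The struct and function names are hypothetical, and window overflow
traps are not modelled:

```c
#include <stdint.h>

/* Only the two registers of interest in the callee's window. */
struct sparc_window_model {
    uint32_t sp;    /* %sp, i.e. %o6 of the new window */
    uint32_t fp;    /* %fp, i.e. %i6 of the new window */
};

/*
 * Model of "save %sp, imm, %sp".
 * caller_sp is the value of %sp before the instruction executes.
 */
static struct sparc_window_model model_save(uint32_t caller_sp, int32_t imm)
{
    struct sparc_window_model callee;

    /* The caller's %o6 (%sp) shows up as %i6 (%fp) in the new window. */
    callee.fp = caller_sp;
    /* The source operand is read from the old window, the result is
     * written to the new one; unsigned addition wraps modulo 2^32. */
    callee.sp = caller_sp + (uint32_t)imm;
    return callee;
}
```

For example, with a made-up caller %sp of 0xf0a00000, model_save(0xf0a00000,
-0x70) yields fp = 0xf0a00000 and sp = 0xf09fff90.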

The call to uvmfault_anonget() is:
uvmfault_anonget(&ufi, amap, anon)

> 
> > > or your CPU is getting old and flakey.  I've seen
> > > this happen a lot with old machines.
> > 
> > I haven't had many problems with sparc yet. And this box started doing this
> > right after the upgrade, it was solid under 1.6.2.
> 
> Could be a coincidence that the hardware broke at around the same time you
> updated the software.  I've seen that happen on occasion.
> 
> In any case, you need a proper crashdump analysis.

Unfortunately I'm not familiar with assembly, and I couldn't get a dump to disk
yet.

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference