Subject: Re: kernel: alignment fault trap on sparc
To: None <tech-kern@netbsd.org>
From: Eduardo Horvath <eeh@NetBSD.org>
List: tech-kern
Date: 06/05/2004 19:16:25
On Sat, Jun 05, 2004 at 03:28:35PM +0200, Juergen Hannken-Illjes wrote:
> On Sat, Jun 05, 2004 at 03:08:06PM +0200, Manuel Bouyer wrote:
> > Hi,
> > I initially posted this to port-sparc, but I wonder if it may be a MI problem
> > that could affect other ports with alignment constraints.
> > 
> > I get this under moderate load on a sparc IPX 2.0_BETA (moderate = one or 2
> > kernel compiles, + some perl processes from mrtg or spamassassin) which was
> > solid when it was running 1.6.x.
> > 
> > login: trap type 0x7: pc=0xf01c4090 npc=0xf01c4094 psr=ffffffff908010c3<EF,S,PS>
> > kernel: alignment fault trap
> > Stopped in pid 24633.1 (perl) at      netbsd:uvmfault_anonget+0x4:    sethi   %hi(0xf02e3000), %l6
> > db> tr
> > uvmfault_anonget(0xf335ee50, 0xf33a0d78, 0xf2fe1790, 0x5, 0x9, 0xf347c330) at netbsd:uvm_fault+0x464
> > uvm_fault(0xf335ede4, 0x5, 0x7, 0xf335ee70, 0x1, 0xf335ee10) at netbsd:mem_access_fault+0x178
> > mem_access_fault(0x9, 0x80, 0x10259364, 0x101cfeb0, 0x2400042, 0xf335efb0) at 0xf00062f4
> > 
> > Any ideas?
> > 
> > Here is what GDB says about the tr:
> > (gdb) l *(uvm_fault+0x464)
> > 0xf01c48dc is in uvm_fault (/local/pop1/bouyer/netbsd-2-0/src/sys/uvm/uvm_fault.c:1052).
> > 1047             * also, if it is OK, then the anon's page is on the queues.
> > 1048             * if the page is on loan from a uvm_object, then anonget will
> > 1049             * lock that object for us if it does not fail
> > 1050             */
> > 1051    
> > 1052            error = uvmfault_anonget(&ufi, amap, anon);
> > 1053            switch (error) {
> > 1054            case 0:
> > 1055                    break;
> > 1056    
> > (gdb) l *(mem_access_fault+0x178)
> > 0xf0201f6c is in mem_access_fault (/local/pop1/bouyer/netbsd-2-0/src/sys/arch/sparc/sparc/trap.c:1010).
> > 1005            }
> > 1006            if (rv > 0)
> > 1007                    goto out;
> > 1008    
> > 1009            /* alas! must call the horrible vm code */
> > 1010            rv = uvm_fault(&vm->vm_map, (vaddr_t)va, 0, atype);
> > 1011    
> > 1012            /*
> > 1013             * If this was a stack access we keep track of the maximum
> > 1014             * accessed stack size.  Also, if vm_fault gets a protection
> 
> The instruction "sethi %hi(0xf02e3000), %l6" should load the high 22 bits
> of 0xf02e3000 into register l6 and is the first part of "uvmexp.fltanget++;".
> No idea how this instruction could cause an alignment trap.

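sethi %hi(0xf02e3000), %l6 cannot generate an alignment fault; it only writes
a constant into a register.  Alignment faults come from loads and stores whose
address is not a multiple of the access size, and from register-indirect jumps
(jmpl, rett) to an address that is not word-aligned.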

You can try disassembling a few instructions surrounding the one that allegedly
caused the trap and see if one of them was the real culprit.  

Otherwise, it could be that the instruction in the instruction cache does not
match the contents of memory, or that your CPU is getting old and flaky.  I've
seen this happen a lot with old machines.

Eduardo