Port-amd64 archive


Re: Repeatable crash

On Mon, 4 Feb 2008, Paul Goyette wrote:

On Mon, 4 Feb 2008, Andrew Doran wrote:

On Mon, Feb 04, 2008 at 03:40:02AM -0800, Paul Goyette wrote:

This is from a 4.99.49 kernel and userland built from sources dated
2008-01-24 21:14:50 UTC.

Usually, I run build.sh on the machine that actually contains the
source, but this time I ran it on another host.  The entire /usr/src and
/usr/obj directories were NFS-mounted.  The crash happens about 30 or 40
minutes after starting build.sh and it happens at different places, so I
don't think it's data-specific; rather, I suspect some strange race
condition.  The back-traces don't seem terribly useful (maybe gdb is
out of sync with the trap stack frame again?).

The trap frame layout changed, I think dsl%netbsd.org@localhost is looking at 
Did you get anything out of ddb?

Transcribed by hand since I don't have a serial console:

kernel: protection fault trap, code=0
Stopped in pid 29318.1 (x86_64--netbsd-g) at netbsd:nfs_loadattrcache+0x13c: cmpl %ecx,0x10(%rbx)
nfs_loadattrcache() at netbsd:nfs_loadattrcache+0x13c
nfsm_loadattrcache() at netbsd:nfs_loadattrcache+0x70
nfs_lookup() at netbsd:nfs_lookup+0xdbd
VOP_LOOKUP() at netbsd:VOP_LOOKUP+0x49
lookup() at netbsd:lookup+0x345
namei() at netbsd:namei+0x1a1
sys_access() at netbsd:sys_access+0x97
syscall() at netbsd:syscall+0xa9

Unable to enter any commands at the ddb prompt; it appears that the (USB) keyboard is dead or interrupts are blocked.

Looking at the sources, nfs_loadattrcache+0x13c is here:

(gdb) list * nfs_loadattrcache + 0x13c
0xffffffff8019d44c is in nfs_loadattrcache (/usr/src/sys/nfs/nfs_subs.c:1687).
1682            vap = np->n_vattr;
1684            /*
1685             * Invalidate access cache if uid, gid, mode or ctime changed.
1686             */
1687            if (np->n_accstamp != -1 &&
1688                (gid != vap->va_gid || uid != vap->va_uid || vmode != 
1689                || timespeccmp(&ctime, &vap->va_ctime, !=)))
1690                    np->n_accstamp = -1;
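To make the invalidation logic in that listing easier to follow, here is a minimal user-space sketch of the same check. The structs `cached_attr` and `node` and the function `check_access_cache` are hypothetical, cut-down stand-ins for the kernel's `struct vattr` and `struct nfsnode`, not the real definitions. The mode comparison against a `va_mode` field is an assumption, because the quoted source line is cut off at that point. `ts_differ` is a portable replacement for the BSD `timespeccmp(a, b, !=)` macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical, cut-down stand-ins for struct vattr and struct nfsnode:
 * only the fields the check reads are modelled. */
struct cached_attr {
    uid_t va_uid;
    gid_t va_gid;
    mode_t va_mode;              /* assumed field; the listing is truncated */
    struct timespec va_ctime;
};

struct node {
    long n_accstamp;             /* -1 means "access cache invalid" */
    struct cached_attr *n_vattr; /* cached attributes, read without checks */
};

/* Portable stand-in for the BSD timespeccmp(a, b, !=) macro. */
static bool ts_differ(const struct timespec *a, const struct timespec *b)
{
    return a->tv_sec != b->tv_sec || a->tv_nsec != b->tv_nsec;
}

/* Invalidate the access cache if uid, gid, mode or ctime changed.
 * Short-circuit evaluation means vap is only dereferenced when
 * n_accstamp != -1. */
static void check_access_cache(struct node *np, uid_t uid, gid_t gid,
                               mode_t vmode, const struct timespec *ctime)
{
    const struct cached_attr *vap = np->n_vattr;

    if (np->n_accstamp != -1 &&
        (gid != vap->va_gid || uid != vap->va_uid ||
         vmode != vap->va_mode || ts_differ(ctime, &vap->va_ctime)))
        np->n_accstamp = -1;
}
```

Calling it with attributes identical to the cached ones leaves `n_accstamp` alone; a different gid or ctime resets it to -1.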

Disassembling this area gives the following (offset 0x13c is +316 in decimal):

0xffffffff8019d416 <nfs_loadattrcache+262>:     mov    %r9d,0xa0(%r13)
0xffffffff8019d41d <nfs_loadattrcache+269>:     mov    %rax,0xa8(%r13)
0xffffffff8019d424 <nfs_loadattrcache+276>:     mov    0xc(%r12),%eax
0xffffffff8019d429 <nfs_loadattrcache+281>:     mov    %eax,%esi
0xffffffff8019d42b <nfs_loadattrcache+283>:     bswap  %esi
0xffffffff8019d42d <nfs_loadattrcache+285>:     mov    0x10(%r12),%ecx
0xffffffff8019d432 <nfs_loadattrcache+290>:     bswap  %ecx
0xffffffff8019d434 <nfs_loadattrcache+292>:     cmpl   
0xffffffff8019d43c <nfs_loadattrcache+300>:     mov    0x80(%r13),%rbx
0xffffffff8019d443 <nfs_loadattrcache+307>:     movzwl 
0xffffffff8019d447 <nfs_loadattrcache+311>:     je     0xffffffff8019d461 
0xffffffff8019d449 <nfs_loadattrcache+313>:     cmp    %ecx,0x10(%rbx)
0xffffffff8019d44c <nfs_loadattrcache+316>:     mov    %eax,%edx
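ddb prints the offset in hex (+0x13c), while gdb's disassembly labels offsets in decimal (+316), so both refer to the same address. A small C sketch of that arithmetic follows. It uses only the addresses quoted above. The start address of nfs_loadattrcache (0xffffffff8019d310) is derived from them, not reported by either tool. The helper names `symbol_start` and `print_gdb_label` are made up for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Values copied from the gdb and ddb output above. */
#define FAULT_PC   UINT64_C(0xffffffff8019d44c) /* gdb: nfs_loadattrcache + 0x13c */
#define DDB_OFFSET UINT64_C(0x13c)              /* ddb prints offsets in hex */

/* "symbol+offset" notation implies: symbol start = pc - offset. */
static uint64_t symbol_start(uint64_t pc, uint64_t offset)
{
    return pc - offset;
}

/* Print an address the way gdb's disassembly labels it (decimal offset). */
static void print_gdb_label(uint64_t start, uint64_t pc)
{
    printf("0x%016" PRIx64 " <nfs_loadattrcache+%" PRIu64 ">\n",
           pc, pc - start);
}
```

Calling `print_gdb_label(symbol_start(FAULT_PC, DDB_OFFSET), FAULT_PC)` prints `0xffffffff8019d44c <nfs_loadattrcache+316>`. The same start address plus 313 gives 0xffffffff8019d449, the `cmp %ecx,0x10(%rbx)` line.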

So I'm guessing that np (in %rbx?) contained something invalid...
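The two mov/bswap pairs at +276 through +290 load 32-bit words from offsets 0xc and 0x10 of %r12 and byte-swap them. That is what converting big-endian XDR data to host order looks like on little-endian amd64. The NFSv3 fattr3 structure (RFC 1813), like its NFSv2 counterpart, begins with five 32-bit words: type, mode, nlink, uid and gid. Offsets 0xc and 0x10 would therefore be uid and gid, which matches the gid/uid comparisons in the C source. Two things are assumptions here: that %r12 points at such an attribute block, and the struct and helper names in this sketch.

```c
#include <arpa/inet.h>   /* ntohl */
#include <assert.h>
#include <stddef.h>      /* offsetof, size_t */
#include <stdint.h>
#include <string.h>

/* Hypothetical view of the first five words of an on-the-wire fattr3:
 * all big-endian 32-bit values, so there is no padding. */
struct wire_fattr3_head {
    uint32_t fa_type;   /* offset 0x00 */
    uint32_t fa_mode;   /* offset 0x04 */
    uint32_t fa_nlink;  /* offset 0x08 */
    uint32_t fa_uid;    /* offset 0x0c */
    uint32_t fa_gid;    /* offset 0x10 */
};

/* Read one XDR word and convert it to host order; on amd64, compilers
 * typically turn ntohl into a single bswap instruction. */
static uint32_t xdr_u32(const unsigned char *buf, size_t off)
{
    uint32_t w;
    memcpy(&w, buf + off, sizeof w);   /* avoid unaligned access */
    return ntohl(w);
}

/* Mirrors "mov 0xc(%r12),%eax; bswap" and "mov 0x10(%r12),%ecx; bswap". */
static uint32_t wire_uid(const unsigned char *fa)
{
    return xdr_u32(fa, offsetof(struct wire_fattr3_head, fa_uid));
}

static uint32_t wire_gid(const unsigned char *fa)
{
    return xdr_u32(fa, offsetof(struct wire_fattr3_head, fa_gid));
}
```

For example, a buffer whose words are 1, 0x1a4, 1, 0x3e8, 0x64 (big-endian) decodes to mode 0644, uid 1000 and gid 100.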

|   Paul Goyette   | PGP DSS Key fingerprint: |  E-mail addresses:   |
| Customer Service | FA29 0E3B 35AF E8AE 6651 |  paul%whooppee.com@localhost   |
| Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette%juniper.net@localhost |
