Subject: bin/37236: Mac OS X NFS client frequently crashes rpc.lockd(8) on NetBSD
To: None <gnats-admin@netbsd.org, netbsd-bugs@netbsd.org>
From: None <tron@zhadum.org.uk>
List: netbsd-bugs
Date: 10/27/2007 17:00:01
>Number: 37236
>Category: bin
>Synopsis: Mac OS X NFS client frequently crashes rpc.lockd(8) on NetBSD
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: bin-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Oct 27 17:00:00 +0000 2007
>Originator: Matthias Scheler
>Release: NetBSD 4.0_RC3
>Organization:
Matthias Scheler http://zhadum.org.uk/
>Environment:
System: NetBSD colwyn.zhadum.org.uk 4.0_RC3 NetBSD 4.0_RC3 (COLWYN) #0: Mon Oct 22 08:33:43 BST 2007 tron@colwyn.zhadum.org.uk:/src/sys/compile/COLWYN i386
Architecture: i386
Machine: i386
>Description:
My desktop is a Mac OS X (PowerPC, 10.4.10) machine. My NetBSD server provides
accounts via LDAP and home directories via NFS to the Mac. After I moved the
"Library" directory of my personal account from the Mac's local hard disk
back to the NFS server (which now has more disk space), the Mac started
complaining about NFS locking problems frequently. The symptoms are the
same as those John D. Baker described over a year ago:
http://mail-index.netbsd.org/port-sparc/2006/05/21/0001.html
Running "/etc/rc.d/nfslocking restart" on the NetBSD system fixes
the problem for a few hours. The problem is caused by "rpc.lockd"
crashing. I've managed to get a crash dump. Here is the stack trace:
#0 0xbbb825b8 in strcmp () from /usr/lib/libc.so.12
#1 0x0804c8ef in unlock ()
#2 0x0804b4d8 in nlm4_unlock_msg_4_svc ()
#3 0x0804995d in nlm_prog_4 ()
#4 0xbbb3ef48 in svc_getreq_common () from /usr/lib/libc.so.12
#5 0xbbb3f04f in svc_getreqset () from /usr/lib/libc.so.12
#6 0xbbae368b in svc_run () from /usr/lib/libc.so.12
#7 0x0804a434 in main ()
Here is the register dump:
eax            0x8051320        134550304
ecx            0x0              0
edx            0x6602024c       1711407692
ebx            0x66020248       1711407688
esp            0xbfbfe794       0xbfbfe794
ebp            0xbfbfe7e8       0xbfbfe7e8
esi            0x805131c        134550300
edi            0xbfbfe86c       -1077942164
eip            0xbbb825b8       0xbbb825b8 <strcmp+48>
eflags         0x10212          [ AF IF RF ]
cs             0x17             23
ss             0x1f             31
ds             0x1f             31
es             0x1f             31
fs             0x1f             31
gs             0x1f             31
If I understand the assembler code of strcmp() correctly, %ebx should
point to a valid memory address but apparently doesn't. Looking at the
code of unlock() ...
	if (strcmp(fl->client_name, lck->caller_name) ||
	    fhcmp(&filehandle, &fl->filehandle) != 0 ||
... I would guess that "fl->client_name" hasn't been initialized properly.
The lalloc() function could be causing this:
static struct file_lock *
lalloc(void)
{
	struct file_lock *fl;

	fl = malloc(sizeof(*fl));
	if (fl != NULL) {
		fl->addr = NULL;
		fl->client.oh.n_bytes = NULL;
		fl->client_cookie.n_bytes = NULL;
		fl->filehandle.fhdata = NULL;
	}
	return fl;
}
Why was this function written at all? "calloc(1, sizeof(struct file_lock))"
would IMHO do the job much better.
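A minimal sketch of what a calloc()-based lalloc() could look like. The
"struct file_lock" below is a hypothetical, simplified stand-in that models
only the members named in this report (the "netobj_sketch" type and the
128-byte "client_name" array are assumptions, not the real rpc.lockd
definitions). It only illustrates that every member starts out zeroed; it
has not been tested against the actual crash.

```c
#include <stdlib.h>

/*
 * Hypothetical stand-in for the netobj type used by rpc.lockd.
 * The name and layout are assumptions made for this sketch.
 */
struct netobj_sketch {
	unsigned int n_len;
	char *n_bytes;
};

/*
 * Hypothetical, simplified stand-in for rpc.lockd's struct file_lock.
 * Only the members mentioned in this report are modelled; the real
 * structure has more members and may use different types.
 */
struct file_lock {
	void *addr;
	struct {
		struct netobj_sketch oh;
	} client;
	struct netobj_sketch client_cookie;
	struct {
		void *fhdata;
	} filehandle;
	char client_name[128];	/* assumed size */
};

/*
 * Allocate a lock structure with all bytes set to zero, so no member
 * is left holding stale heap contents.  On the platforms NetBSD runs
 * on, an all-zero pointer is a null pointer.
 */
static struct file_lock *
lalloc(void)
{
	return calloc(1, sizeof(struct file_lock));
}
```

With this variant a freshly allocated lock has null pointers in all pointer
members and an empty string in "client_name", including any member that the
hand-written initialization might have missed.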
>How-To-Repeat:
Run Firefox 2.0.0.x under Mac OS X with an NFS mounted home directory.
>Fix:
Not known.