Subject: bin/37236: Mac OS X NFS client frequently crashes rpc.lockd(8) on NetBSD
To: None <gnats-admin@netbsd.org, netbsd-bugs@netbsd.org>
From: None <tron@zhadum.org.uk>
List: netbsd-bugs
Date: 10/27/2007 17:00:01
>Number:         37236
>Category:       bin
>Synopsis:       Mac OS X NFS client frequently crashes rpc.lockd(8) on NetBSD
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Oct 27 17:00:00 +0000 2007
>Originator:     Matthias Scheler
>Release:        NetBSD 4.0_RC3
>Organization:
Matthias Scheler                                  http://zhadum.org.uk/
>Environment:
System: NetBSD colwyn.zhadum.org.uk 4.0_RC3 NetBSD 4.0_RC3 (COLWYN) #0: Mon Oct 22 08:33:43 BST 2007 tron@colwyn.zhadum.org.uk:/src/sys/compile/COLWYN i386
Architecture: i386
Machine: i386
>Description:
My desktop is a Mac OS X (PowerPC, 10.4.10) machine. My NetBSD server provides
accounts via LDAP and home directories via NFS to the Mac. After I moved the
"Library" directory of my personal account from the Mac's local hard disk
back to the NFS server (which now has more disk space), the Mac started
complaining about NFS locking problems frequently. The symptoms are the
same as those John D. Baker described over a year ago:

http://mail-index.netbsd.org/port-sparc/2006/05/21/0001.html

Running "/etc/rc.d/nfslocking restart" on the NetBSD system fixes
the problem for a few hours. The problem is caused by "rpc.lockd"
crashing. I've managed to get a crash dump. Here is the stack trace:

#0  0xbbb825b8 in strcmp () from /usr/lib/libc.so.12
#1  0x0804c8ef in unlock ()
#2  0x0804b4d8 in nlm4_unlock_msg_4_svc ()
#3  0x0804995d in nlm_prog_4 ()
#4  0xbbb3ef48 in svc_getreq_common () from /usr/lib/libc.so.12
#5  0xbbb3f04f in svc_getreqset () from /usr/lib/libc.so.12
#6  0xbbae368b in svc_run () from /usr/lib/libc.so.12
#7  0x0804a434 in main ()

Here is the register dump:

eax            0x8051320        134550304
ecx            0x0      0
edx            0x6602024c       1711407692
ebx            0x66020248       1711407688
esp            0xbfbfe794       0xbfbfe794
ebp            0xbfbfe7e8       0xbfbfe7e8
esi            0x805131c        134550300
edi            0xbfbfe86c       -1077942164
eip            0xbbb825b8       0xbbb825b8 <strcmp+48>
eflags         0x10212  [ AF IF RF ]
cs             0x17     23
ss             0x1f     31
ds             0x1f     31
es             0x1f     31
fs             0x1f     31
gs             0x1f     31

If I understand the assembler code of strcmp() correctly, %ebx should
point to a valid memory address but apparently doesn't. Looking at the
code of unlock() ...

                if (strcmp(fl->client_name, lck->caller_name) ||
                    fhcmp(&filehandle, &fl->filehandle) != 0 ||

... I would guess that "fl->client_name" hasn't been initialized properly.
The lalloc() function could be causing this, because it initializes a few
pointer members but never touches "client_name":

static struct file_lock *
lalloc(void)
{
        struct file_lock *fl;

        fl = malloc(sizeof(*fl));
        if (fl != NULL) {
                fl->addr = NULL;
                fl->client.oh.n_bytes = NULL;
                fl->client_cookie.n_bytes = NULL;
                fl->filehandle.fhdata = NULL;
        }
        return fl;
}

Why was this function written at all? "calloc(1, sizeof(struct file_lock))"
would IMHO do the job much better, since it zeroes the whole structure,
including "client_name".

>How-To-Repeat:
Run Firefox 2.0.0.x under Mac OS X with an NFS-mounted home directory.

>Fix:
Not known.