Subject: Re: PR/37236 CVS commit: src/usr.sbin/rpc.lockd
To: None <gnats-admin@netbsd.org, netbsd-bugs@netbsd.org>
From: Matthias Scheler <tron@zhadum.org.uk>
List: netbsd-bugs
Date: 11/04/2007 19:55:02
The following reply was made to PR bin/37236; it has been noted by GNATS.
From: Matthias Scheler <tron@zhadum.org.uk>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc:
Subject: Re: PR/37236 CVS commit: src/usr.sbin/rpc.lockd
Date: Sun, 4 Nov 2007 19:52:50 +0000
On Thu, Nov 01, 2007 at 03:00:14PM +0000, Matthias Scheler wrote:
> This time the loop in unlock() was executed although fl is NULL. The
> crash looks like a race between sigchild_handler() and one of the
> dispatch procedures. That should however not happen because of the
> calls to siglock() and sigunlock().
I've changed lalloc() to add redzones before and after each "struct file_lock",
and added checks of those redzones in several places, e.g. in lfree() and in
each LIST_FOREACH loop. "rpc.lockd" crashed again, with this stack trace:
#0 0x0804bc39 in get_alloc (fl=0x8b030210) at lockd_lock.c:470
470 assert(memcmp(fla->redzone_head, redzone_head_pattern,
(gdb) where
#0 0x0804bc39 in get_alloc (fl=0x8b030210) at lockd_lock.c:470
#1 0x0804cb82 in unlock (lck=0xbfbfe0e4, flags=2) at lockd_lock.c:413
#2 0x0804b518 in nlm4_unlock_msg_4_svc (arg=0xbfbfe0dc, rqstp=0xbfbfe198)
    at lock_proc.c:1044
#3 0x0804999d in nlm_prog_4 (rqstp=0xbfbfe198, transp=0x8063080)
    at nlm_prot_svc.c:469
#4 0xbbb3ef48 in svc_getreq_common () from /usr/lib/libc.so.12
#5 0xbbb3f04f in svc_getreqset () from /usr/lib/libc.so.12
#6 0xbbae368b in svc_run () from /usr/lib/libc.so.12
#7 0x0804a474 in main (argc=Cannot access memory at address 0x20
) at lockd.c:211
The reason is heap corruption:
(gdb) print lcklst_head
$2 = {lh_first = 0x8b030210}
(gdb) print *(struct file_lock *)0x8b030210
Cannot access memory at address 0x8b030210
(gdb) print hostlst_head
$3 = {lh_first = 0x76b5bb51}
(gdb) print *(struct host *)0x76b5bb51
Cannot access memory at address 0x76b5bb51
Both list heads point to unmapped memory, i.e. these pointers are
completely invalid. This rules out the theory about a race condition.
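
For readers who have not seen the patch, the redzone checks mentioned above
work roughly as in the sketch below. This is only an illustration, not the
code that is actually in lockd_lock.c: the wrapper "struct file_lock_alloc",
the "redzone_tail" member, REDZONE_SIZE, the pattern contents, the helper
redzones_intact() and the one-member "struct file_lock" are all assumptions
made up for this example. Only the names lalloc(), lfree(), get_alloc(),
redzone_head and redzone_head_pattern are taken from the text and the trace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define REDZONE_SIZE 16	/* hypothetical guard size */

/* Hypothetical stand-in for the real "struct file_lock". */
struct file_lock {
	int dummy;
};

/* Hypothetical wrapper: the payload sits between two guard areas. */
struct file_lock_alloc {
	unsigned char redzone_head[REDZONE_SIZE];
	struct file_lock fl;
	unsigned char redzone_tail[REDZONE_SIZE];
};

/* Known byte patterns; any overwrite of a guard area changes them. */
static const unsigned char redzone_head_pattern[REDZONE_SIZE] =
    "HEADHEADHEADHEA";
static const unsigned char redzone_tail_pattern[REDZONE_SIZE] =
    "TAILTAILTAILTAI";

/* Return non-zero if both guard areas still hold their patterns. */
static int
redzones_intact(const struct file_lock_alloc *fla)
{
	return memcmp(fla->redzone_head, redzone_head_pattern,
	    REDZONE_SIZE) == 0 &&
	    memcmp(fla->redzone_tail, redzone_tail_pattern,
	    REDZONE_SIZE) == 0;
}

/* Map a payload pointer back to its wrapper and verify the guards. */
static struct file_lock_alloc *
get_alloc(struct file_lock *fl)
{
	struct file_lock_alloc *fla;

	fla = (struct file_lock_alloc *)(void *)((char *)fl -
	    offsetof(struct file_lock_alloc, fl));
	assert(redzones_intact(fla));
	return fla;
}

/* Allocate a zeroed payload with freshly written guard areas. */
static struct file_lock *
lalloc(void)
{
	struct file_lock_alloc *fla;

	fla = calloc(1, sizeof(*fla));
	if (fla == NULL)
		return NULL;
	memcpy(fla->redzone_head, redzone_head_pattern, REDZONE_SIZE);
	memcpy(fla->redzone_tail, redzone_tail_pattern, REDZONE_SIZE);
	return &fla->fl;
}

/* Verify the guards one last time, then release the whole wrapper. */
static void
lfree(struct file_lock *fl)
{
	if (fl != NULL)
		free(get_alloc(fl));
}
```

A check of this kind can only report an overrun if "fl" still points at
mapped memory. With a list head such as 0x8b030210 the memcmp() itself
presumably faults before the assertion can evaluate, which would match the
crash location at lockd_lock.c:470 shown above.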
Kind regards
--
Matthias Scheler http://zhadum.org.uk/