Subject: kern/3860: kernel file locking functions not robust vs. debugger
To: None <gnats-bugs@gnats.netbsd.org>
From: John Kohl <jtk@kolvir.arlington-heights.ma.us>
List: netbsd-bugs
Date: 07/13/1997 23:26:44
>Number:         3860
>Category:       kern
>Synopsis:       kernel file locking functions not robust vs. debugger
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Jul 13 20:35:01 1997
>Last-Modified:
>Originator:     John Kohl
>Organization:
NetBSD Kernel Hackers `R` Us
>Release:        NetBSD-current, 1997/07/13
>Environment:
	
System: NetBSD pattern.arlington-heights.ma.us 1.2G NetBSD 1.2G (PATTERN) #24: Sun Jul 13 23:05:45 EDT 1997 jtk@pattern.arlington-heights.ma.us:/u4/sandbox/src/sys/arch/i386/compile/PATTERN i386


>Description:
If you debug a program that uses flock() or lockf() and interrupt it
while it is blocked waiting for a lock, then when it is continued the
lock request is retried and wedges the top half of the kernel.  This is
a handy denial-of-service attack for anybody with debugger access.

The call to tsleep() inside vfs_lockf.c:lf_setlock() can return with no
error even though nothing has called wakeup() for the blocked lock.
Existing implementations of sleep()/tsleep() implicitly allow such
wakeups, yet the code assumes they cannot happen.  The code then loops
around and adds the lock to the holder's blocked list a second time.
Since the lock is already on that list, this creates a cycle, which
hangs the top half of the kernel when lf_addblock() tries to find the
end of the blocked list.

>How-To-Repeat:
Run a program that blocks on a lock (see the test program below) under
a debugger.  Hit ^C, then continue the program.  Watch your machine wedge :(
Here is the output from such a run with LOCKF_DEBUG set and lockf_debug=3
(I added some KASSERTs and extra prints to track down this bug):

lf_setlock: lock 0xf880bd80 for id 0x0xf880be40 shared, start 0, end ffffffffffffffff
lf_setlock: got the lock: lock 0xf880bd80 for id 0x0xf880be40 shared, start 0, end ffffffffffffffff
lf_setlock: Lock list:
        lock 0xf880bd80 for id 0x0xf880be40, shared, start 0, end ffffffffffffffff
lf_setlock: lock 0xf880bd40 for id 0x0xf880bdc0 exclusive, start 0, end ffffffffffffffff
lf_findoverlap: looking for overlap in: lock 0xf880bd40 for id 0x0xf880bdc0 exclusive, start 0, end ffffffffffffffff
        checking: lock 0xf880bd80 for id 0x0xf880be40 shared, start 0, end ffffffffffffffff
overlap == lock
lf_clearlock: lock 0xf880bd40 for id 0x0xf880bdc0 unlock, start 0, end ffffffffffffffff
lf_findoverlap: looking for overlap in: lock 0xf880bd40 for id 0x0xf880bdc0 unlock, start 0, end ffffffffffffffff
lf_clearlock: Lock list:
        lock 0xf880bd40 for id 0x0xf880bdc0, unlock, start 0, end ffffffffffffffff
addblock: adding: lock 0xf880bd40 for id 0x0xf880bdc0 exclusive, start 0, end ffffffffffffffff
to blocked list of: lock 0xf880bd80 for id 0x0xf880be40 shared, start 0, end ffffffffffffffff
lf_setlock: blocking on: lock 0xf880bd80 for id 0x0xf880be40 shared, start 0, end ffffffffffffffff block 0xf880bd40
lf_setlock: Lock list:
        lock 0xf880bd80 for id 0x0xf880be40, shared, start 0, end ffffffffffffffff block 0xf880bd40
        lock 0xf880bd40 for id 0x0xf880bdc0, exclusive, start 0, end ffffffffffffffff
lf_setlock: wakeup, no error: lock 0xf880bd40 for id 0x0xf880bdc0 exclusive, start 0, end ffffffffffffffff
lf_findoverlap: looking for overlap in: lock 0xf880bd40 for id 0x0xf880bdc0 exclusive, start 0, end ffffffffffffffff
        checking: lock 0xf880bd80 for id 0x0xf880be40 shared, start 0, end ffffffffffffffff block 0xf880bd40
overlap == lock
lf_clearlock: lock 0xf880bd40 for id 0x0xf880bdc0 unlock, start 0, end ffffffffffffffff
lf_findoverlap: looking for overlap in: lock 0xf880bd40 for id 0x0xf880bdc0 unlock, start 0, end ffffffffffffffff
lf_clearlock: Lock list:
        lock 0xf880bd40 for id 0x0xf880bdc0, unlock, start 0, end ffffffffffffffff
addblock: adding: lock 0xf880bd40 for id 0x0xf880bdc0 exclusive, start 0, end ffffffffffffffff
to blocked list of: lock 0xf880bd80 for id 0x0xf880be40 shared, start 0, end ffffffffffffffff block 0xf880bd40
panic: kernel diagnostic assertion "lf != blocked" failed: file "../../../../kern/vfs_lockf.c", line 675

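#include <err.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Open the same file twice and take conflicting flock() locks.
 * The second flock() blocks forever; interrupt it under a debugger,
 * then continue it, to trigger the hang.
 */
int
main(int argc, char *argv[])
{
    int fd1, fd2;

    if (argc < 2)
	errx(1, "usage: %s file", argv[0]);
    fd1 = open(argv[1], O_RDWR);
    if (fd1 == -1)
	err(1, "%s", argv[1]);

    /* A second open() gets its own open file, and so its own flock() lock. */
    fd2 = open(argv[1], O_RDWR);
    if (fd2 == -1)
	err(1, "%s", argv[1]);

    if (flock(fd1, LOCK_SH) == -1)
	err(1, "flock 1");
    /* Conflicts with the shared lock held through fd1: blocks here. */
    if (flock(fd2, LOCK_EX) == -1)
	err(1, "flock 2");
    close(fd1);
    close(fd2);
    return 0;
}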

>Fix:
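	Not sure exactly how to code this, but the sleep loop in
lf_setlock() should be smarter: it has to cope with a wakeup that
returns no error without the lock having been granted, and must not add
a lock that is already on a blocked list a second time.  Maybe
4.4BSD-Lite2 has fixed this bug?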

>Audit-Trail:
>Unformatted: