NetBSD-Bugs archive


kern/50627: filemon can hang the process



>Number:         50627
>Category:       kern
>Synopsis:       filemon can hang a process
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jan 06 01:45:00 +0000 2016
>Originator:     Paul Goyette
>Release:        NetBSD 7.99.25
>Organization:
+------------------+--------------------------+------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:      |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
+------------------+--------------------------+------------------------+
>Environment:
	
	
System: NetBSD pokey.whooppee.com 7.99.25 NetBSD 7.99.25 (POKEY 2015-12-23 05:05:48) #9: Wed Dec 23 15:10:44 PHT 2015 paul%pokey.whooppee.com@localhost:/build/netbsd-local/obj/amd64/sys/arch/amd64/compile/POKEY amd64
Architecture: x86_64
Machine: amd64
>Description:
If the file descriptor on which a process opens /dev/filemon is numerically
larger than the descriptor to which the activity entries are being logged
(as set with the ioctl(filemon_fd, FILEMON_SET_FD) call), the monitoring process
will hang when it tries to exit.  The process exit code closes all open file
descriptors in ascending numerical order, but closing the fd to which the
activity entries are being written hangs, waiting for its reference count to
drop to zero.  An example sequence of events is:

        1. Process opens /dev/filemon and gets fd #3
        2. Process tells filemon to log activity to fd #1 (stdout)
        3. Process calls sys_exit(), which starts process cleanup
        4. Clean-up code tries to fd_close all open descriptors, in
           order, so handles fd #0 and then fd #1
        5. fd #1 has another reference, so we wait on the condvar,
           which never gets broadcast since there's no other thread
           to run.  We hang here forever.

When the activity log file's descriptor is numerically greater than the
fd on which /dev/filemon is open, the filemon descriptor gets closed
first, which removes the additional reference to the log file:

        1. Process opens /dev/filemon and gets fd #3
        2. Process opens up a temp file (or simply calls dup(STDOUT_FILENO))
           and gets fd #4; the process tells filemon to log activity
           to fd #4
        3. Process calls sys_exit(), which starts process cleanup
        4. Clean-up code tries to fd_close all open descriptors, in
           order, so handles fd #0 and then fd #1
        5. In this scenario, fd #1 has no extra references, so it can
           close normally.
        6. Cleanup proceeds with fd #2, and then gets to fd #3, where
           /dev/filemon is open
        7. We call filemon_close(), which calls fd_putfile() on fd #4.
           This removes the additional reference on fd #4
        8. Cleanup moves on to fd #4 which now has only a single
           reference, so it, too, can be successfully closed!
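
The two sequences above can be modelled in user space.  This is a toy
sketch, not kernel code: struct model_file and simulate_exit_cleanup() are
hypothetical names, and a single per-slot counter stands in for the
kernel's real descriptor reference counting.  It closes descriptors in
ascending order, as the exit path does, and reports the descriptor on
which close would block forever.

```c
#include <stdbool.h>

#define MODEL_NFDS 8	/* size of the toy descriptor table */

/* Toy stand-in for one descriptor-table slot (not the kernel's struct). */
struct model_file {
	bool open;	/* slot is in use */
	int refs;	/* 1 for the descriptor itself, +1 while filemon logs to it */
	int holds;	/* fd on which this slot holds an extra reference, or -1 */
};

/*
 * Hypothetical model of exit-time cleanup for a process with fds 0-2 open,
 * /dev/filemon open on filemon_fd, and filemon logging to log_fd.
 * Returns the fd whose close would wait forever, -1 if every close
 * succeeds, or -2 for arguments the model does not handle.
 */
static int
simulate_exit_cleanup(int filemon_fd, int log_fd)
{
	struct model_file tab[MODEL_NFDS];

	if (filemon_fd < 0 || filemon_fd >= MODEL_NFDS ||
	    log_fd < 0 || log_fd >= MODEL_NFDS || filemon_fd == log_fd)
		return -2;

	for (int fd = 0; fd < MODEL_NFDS; fd++) {
		tab[fd].open = fd <= 2 || fd == filemon_fd || fd == log_fd;
		tab[fd].refs = tab[fd].open ? 1 : 0;
		tab[fd].holds = -1;
	}

	/* FILEMON_SET_FD: filemon keeps an extra reference on the log fd. */
	tab[filemon_fd].holds = log_fd;
	tab[log_fd].refs++;

	/* Exit-time cleanup closes descriptors in ascending order. */
	for (int fd = 0; fd < MODEL_NFDS; fd++) {
		if (!tab[fd].open)
			continue;
		/*
		 * Close waits for other references to go away; in an exiting
		 * single-threaded process nothing else will ever drop them.
		 */
		if (tab[fd].refs > 1)
			return fd;
		/* Closing filemon releases its reference (cf. fd_putfile()). */
		if (tab[fd].holds >= 0)
			tab[tab[fd].holds].refs--;
		tab[fd].refs = 0;
		tab[fd].open = false;
	}
	return -1;
}
```

With filemon on fd 3 logging to fd 1, simulate_exit_cleanup(3, 1) reports
a hang on fd 1; with the log on fd 4, simulate_exit_cleanup(3, 4) returns
-1 because closing fd 3 drops the extra reference before fd 4 is reached.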

This bug will be documented in the filemon(4) man page.
	
>How-To-Repeat:
	
See above.
>Fix:
Fix not yet known.  However, the problem can be worked around by using an
atexit() handler that closes the filemon device's fd first (or that uses
FILEMON_SET_FD to set the log fd to an invalid value, which disassociates
filemon from the log without setting up a new one).
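
A minimal sketch of the atexit() work-around, assuming the filemon(4)
definitions are installed as <dev/filemon/filemon.h> and that
FILEMON_SET_FD takes a pointer to the log descriptor.  filemon_attach()
and close_filemon_at_exit() are hypothetical helper names, not part of any
NetBSD API.  Because exit() runs atexit() handlers before the kernel starts
its descriptor cleanup, closing the filemon descriptor there drops its
extra reference on the log file in time.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>
#if defined(__NetBSD__)
#include <sys/ioctl.h>
#include <dev/filemon/filemon.h>	/* assumed header location */
#endif

/* Descriptor of /dev/filemon, or -1 when it is not open. */
static int filemon_fd = -1;

/*
 * atexit() handler: close /dev/filemon before the kernel's exit-time
 * descriptor cleanup, so filemon drops its extra reference on the log fd.
 */
static void
close_filemon_at_exit(void)
{
	if (filemon_fd >= 0) {
		(void)close(filemon_fd);
		filemon_fd = -1;
	}
}

/*
 * Hypothetical helper: open /dev/filemon, log its activity to log_fd and
 * register the work-around.  Returns the filemon fd, or -1 on failure
 * (always -1 on systems other than NetBSD).
 */
int
filemon_attach(int log_fd)
{
#if defined(__NetBSD__)
	static bool registered = false;
	int fd;

	if (filemon_fd >= 0)
		return -1;	/* this sketch tracks only one filemon fd */
	if (!registered) {
		if (atexit(close_filemon_at_exit) != 0)
			return -1;
		registered = true;
	}
	if ((fd = open("/dev/filemon", O_RDWR)) < 0)
		return -1;
	/* Assumed: the ioctl argument is a pointer to the log fd. */
	if (ioctl(fd, FILEMON_SET_FD, &log_fd) < 0) {
		(void)close(fd);
		return -1;
	}
	filemon_fd = fd;
	return fd;
#else
	(void)log_fd;
	return -1;	/* filemon(4) is not available here */
#endif
}
```

A program would call filemon_attach(STDOUT_FILENO) once at startup.  The
handler does not run when the process ends via _exit() or a fatal signal,
so those exit paths are not covered by this work-around.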
	

>Unformatted:
 	
 	

