Subject: kern/14640: kernel hangs in syncing disks
To: None <gnats-bugs@gnats.netbsd.org>
From: None <mrauch@netbsdorg.fs.tum.de>
List: netbsd-bugs
Date: 11/19/2001 13:07:46
>Number:         14640
>Category:       kern
>Synopsis:       kernel hangs in syncing disks
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Nov 19 05:08:00 PST 2001
>Closed-Date:
>Last-Modified:
>Originator:     Michael Rauch
>Release:        1.5Y (2001/11/18)
>Organization:
>Environment:
NetBSD i386, syssrc cvs update'd at 2001/11/18 about 12:00 GMT,
custom kernel (mainly GENERIC with unneeded drivers commented out)

>Description:
	After heavy disk I/O the kernel hangs in a loop that it never exits.
	Invoking ddb (and switching virtual consoles) is still possible.
	The kernel is stuck in the function sched_sync
	(sys/miscfs/syncfs/sync_subr.c), in the first while loop (starting
	at line 185 in rev. 1.10), executing the following functions over
	and over:

	,>sched_sync
	| `-> vn_lock
	      `-> VOP_LOCK
	          `-> genfs_lock
		      `-> lockmgr
		       <--'
		   <--'
	       <--'
	   <--'
	  `-> VOP_FSYNC
	      `-> genfs_fsync
	          `-> vflushbuf
		   <--'
	          `-> VOP_UPDATE
		      `-> ext2fs_update
		       <--'
		   <--'
	       <--'
	   <--'
	  `-> VOP_UNLOCK
	      `-> genfs_unlock
	          `-> lockmgr
	^	   <--'
	|      <--'
	`-----'
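
	The trace above shows the same lock/fsync/unlock cycle repeating
	without end. As a rough illustration only, here is a user-space toy
	model of how such a loop can spin forever when the fsync step stops
	reducing the pending work. This is a sketch under assumptions, not
	the code in sync_subr.c: toy_vnode, toy_fsync, toy_drain,
	dirty_bufs, fsync_makes_progress and max_passes are all hypothetical
	names, and the report does not establish that this is the actual
	cause of the hang.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a vnode; NOT the kernel's struct vnode. */
struct toy_vnode {
	int  dirty_bufs;            /* buffers still waiting to be written */
	bool fsync_makes_progress;  /* assumption: can the toy fsync flush? */
};

/* Toy fsync: flushes one buffer per call, but only if it can progress. */
static void
toy_fsync(struct toy_vnode *vp)
{
	if (vp->fsync_makes_progress && vp->dirty_bufs > 0)
		vp->dirty_bufs--;
}

/*
 * Toy drain loop shaped like the lock/fsync/unlock cycle in the trace.
 * Returns the number of passes needed to drain the vnode, or -1 if
 * max_passes is reached first (where a real loop would spin forever).
 */
int
toy_drain(struct toy_vnode *vp, int max_passes)
{
	int passes = 0;

	while (vp->dirty_bufs > 0) {
		if (passes == max_passes)
			return -1;
		/* vn_lock() would go here */
		toy_fsync(vp);
		/* VOP_UNLOCK() would go here */
		passes++;
	}
	return passes;
}
```

	In this model a vnode with three dirty buffers and
	fsync_makes_progress set drains in three passes, while the same
	vnode with fsync_makes_progress clear never drains and only stops
	because of the max_passes cap, which the kernel loop does not have.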

	mount: 
        /dev/wd0a on / type ffs (local)
        /dev/wd0e on /usr type ffs (local)
        /dev/wd0f on /windows type msdos (local)
        /dev/wd0g on /usr/src type ext2fs (local)
	mfs:118 on /tmp type mfs (asynchronous, local)

	The heavy disk i/o was on the /usr/src partition (ext2fs filesystem). 

	Running `sync` from within ddb gives
            panic: lockmgr: locking against myself
	and drops back into ddb; a second `sync` reboots the machine.
	Slight disk corruption can occur, although in most cases fsck
	reports no errors on the disk.

	Others have reported the same problem; see
	http://mail-index.netbsd.org/current-users/2001/11/14/0006.html
	for the start of the thread.

>How-To-Repeat:
	Perform operations that require a lot of disk I/O (in this case on
	an ext2fs partition). The system suddenly locks up.
>Fix:
        n/a
>Release-Note:
>Audit-Trail:
>Unformatted: