Subject: kern/10731: writing to ffs on vnd can hang all processes
To: None <gnats-bugs@gnats.netbsd.org>
From: Atsushi Onoe <onoe@sm.sony.co.jp>
List: netbsd-bugs
Date: 08/01/2000 03:52:17
>Number:         10731
>Category:       kern
>Synopsis:       writing to ffs on vnd can hang all processes
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Aug 01 03:53:01 PDT 2000
>Closed-Date:
>Last-Modified:
>Originator:     Atsushi Onoe
>Release:        NetBSD-current 20000728
>Organization:
	Sony Corporation
>Environment:
System: NetBSD duplo.sm.sony.co.jp 1.5C NetBSD 1.5C (DUPLO) #21: Mon Jul 31 03:38:37 EDT 2000 onoe@duplo.sm.sony.co.jp:/work/netbsd/obj/DUPLO i386

>Description:
	Writing many files to an ffs on vnd eventually requires writing
	blocks to the file that backs the vnd.  In that circumstance,
	getblk() in kern/vfs_bio.c is called.  The function sets B_BUSY on
	some blocks of the backing file, and then calls allocbuf().

	allocbuf() may then find a bp with the B_DELWRI bit set that is
	also a block of the ffs on vnd.  Flushing it requires writing to the
	same file that backs the vnd.  Since that block is already marked
	B_BUSY, getblk() deadlocks waiting for the B_BUSY bit to be cleared.

	In this state, pagedaemon() will be blocked, and all processes
	can hang eventually.

	Below is a backtrace, transcribed by hand.
		getblk()
		ufs_bmaparray()
		ufs_bmap()
		vndstrategy()
		spec_strategy()
		ufs_strategy()
		bwrite()
		vn_bwrite()
		bawrite()
		getnewbuf()
		allocbuf()
		getblk()
		ufs_bmaparray()
		ufs_bmap()
		vndstrategy()
		spec_strategy()
		ufs_strategy()
		bwrite()
		vn_bwrite()
		bawrite()
		getnewbuf()
		getblk()
		ffs_balloc()
		ffs_write()
		vn_write()
		dofilewrite()
		sys_write()
		syscall()
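
	The recursion in the trace above can be illustrated with a minimal
	user-space sketch.  This is not kernel code: the model_* functions,
	the M_* flags and the two static buffers are hypothetical stand-ins
	for getblk(), allocbuf(), bawrite(), B_BUSY and B_DELWRI, and the
	model starts at the middle getblk() of the trace (the one on the
	backing file).  Where the kernel would sleep waiting for B_BUSY to
	clear, the model returns -1 instead.

```c
/*
 * User-space model of the suspected deadlock; NOT NetBSD kernel code.
 * Every model_* name and M_* flag is a hypothetical stand-in.
 */
#include <assert.h>
#include <stddef.h>

#define M_BUSY   0x1	/* stands in for B_BUSY */
#define M_DELWRI 0x2	/* stands in for B_DELWRI */

struct model_buf {
	int flags;
	int on_vnd;	/* nonzero: block belongs to the ffs on vnd */
};

static struct model_buf backing;	/* block of the file backing the vnd */
static struct model_buf dirty;		/* delayed-write block of the ffs on vnd */

static int model_getblk(struct model_buf *bp);

/*
 * Flushing a block of the ffs on vnd goes through vndstrategy(), which
 * must look up (and lock) the corresponding block of the backing file.
 */
static int
model_bawrite(struct model_buf *bp)
{
	assert(bp->flags & M_DELWRI);
	if (bp->on_vnd && model_getblk(&backing) != 0)
		return -1;
	bp->flags &= ~M_DELWRI;
	return 0;
}

/* Reclaiming buffer space may push out a delayed-write buffer. */
static int
model_allocbuf(void)
{
	if (dirty.flags & M_DELWRI)
		return model_bawrite(&dirty);
	return 0;
}

/* Returns 0 on success, -1 where the kernel would sleep forever. */
static int
model_getblk(struct model_buf *bp)
{
	int error;

	if (bp->flags & M_BUSY)
		return -1;	/* kernel: wait for B_BUSY to clear */
	bp->flags |= M_BUSY;
	error = model_allocbuf();
	bp->flags &= ~M_BUSY;	/* model only: reset state for the next run */
	return error;
}

/* Model one write; vnd_block_dirty selects whether a DELWRI buffer exists. */
int
model_write(int vnd_block_dirty)
{
	backing.flags = 0;
	backing.on_vnd = 0;
	dirty.flags = vnd_block_dirty ? M_DELWRI : 0;
	dirty.on_vnd = 1;
	return model_getblk(&backing);
}
```

	In this model, model_write(1) returns -1 because the nested
	model_getblk() finds the backing block already busy, while
	model_write(0) returns 0 because no delayed write has to be flushed.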

>How-To-Repeat:
	Not all of the following may be required, but this is what I did:

	/work is an ffs mounted with soft updates.

	# dd if=/dev/zero of=/work/fs bs=1024k count=1536	(yes, 1.5GB)
	# vnconfig vnd0 /work/fs
	# newfs /dev/rvnd0a
	# mount -o async /dev/vnd0a /mnt
				-o async is not required to reproduce the
				problem, but it shortens the time needed.
	# rsync -apv /cvsroot /mnt

>Fix:
	Not provided.

	# mount -o sync /dev/vnd0a /mnt
	may be an effective workaround.
>Release-Note:
>Audit-Trail:
>Unformatted: