Subject: kern/20081: lfs_cleanerd deadlock
To: None <gnats-bugs@gnats.netbsd.org>
From: None <karkn443@student.liu.se>
List: netbsd-bugs
Date: 01/27/2003 08:57:39
>Number:         20081
>Category:       kern
>Synopsis:       lfs_cleanerd deadlock
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jan 27 08:58:00 PST 2003
>Closed-Date:
>Last-Modified:
>Originator:     Karl Knutsson
>Release:        NetBSD 1.6M
>Organization:
>Environment:
NetBSD debug 1.6M NetBSD 1.6M (LOVE.MP) #0: Fri Jan 24 09:42:50 UTC 2003     wbagger@debug:/mass/netbsd_src/sys/arch/sparc/compile/LOVE.MP sparc
>Description:
lfs_cleanerd can deadlock while waiting on fs->lfs_iocount. This happens
occasionally on my system (SS10) during heavy cleaning activity.
I believe the solution is to protect every access to lfs_iocount
with splbio(), since the counter is also modified in interrupt context by
LFS's callback routines.
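
The suspected lost update can be illustrated with a small userland
simulation. This is only a sketch under stated assumptions, not kernel
code: struct sim_fs, sim_splbio(), sim_splx() and sim_io_done() are
hypothetical stand-ins for the real lfs structure, splbio()/splx() and
the I/O completion callback. The increment is written out as a separate
load and store, as it would be on a load/store architecture such as
sparc, and the "interrupt" is delivered between the two.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in the NetBSD sources. */
struct sim_fs {
	int  iocount;     /* plays the role of fs->lfs_iocount */
	bool io_blocked;  /* true between sim_splbio() and sim_splx() */
	int  pending;     /* completion interrupts deferred while blocked */
};

/* What an I/O completion callback does to the counter: drop it by one. */
static void
sim_io_done(struct sim_fs *fs)
{
	if (fs->io_blocked) {
		fs->pending++;	/* held off until sim_splx() */
		return;
	}
	--fs->iocount;
}

/* Simulated splbio(): block the simulated completion interrupt. */
static void
sim_splbio(struct sim_fs *fs)
{
	fs->io_blocked = true;
}

/* Simulated splx(): unblock and deliver anything that was deferred. */
static void
sim_splx(struct sim_fs *fs)
{
	assert(fs->io_blocked);
	fs->io_blocked = false;
	while (fs->pending > 0) {
		fs->pending--;
		--fs->iocount;
	}
}

/* ++iocount with a completion arriving between the load and the store. */
static int
increment_unprotected(struct sim_fs *fs)
{
	int tmp = fs->iocount;	/* load */
	sim_io_done(fs);	/* interrupt fires here */
	fs->iocount = tmp + 1;	/* store overwrites the decrement */
	return fs->iocount;
}

/* The same sequence bracketed by the simulated splbio()/splx(). */
static int
increment_protected(struct sim_fs *fs)
{
	sim_splbio(fs);
	int tmp = fs->iocount;	/* load */
	sim_io_done(fs);	/* interrupt is deferred */
	fs->iocount = tmp + 1;	/* store */
	sim_splx(fs);		/* deferred decrement is applied now */
	return fs->iocount;
}
```

Starting from a count of 1, increment_unprotected() leaves 2 because
the decrement is lost, while increment_protected() leaves 1. A count
that stays one too high would be consistent with a sleeper on
lfs_iocount never being woken, though I have not confirmed that this is
the exact interleaving that occurs.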

>How-To-Repeat:
Concurrently unpack two or more versions of pkgsrc.tar.gz into
different directories, delete one of the directories, and watch the
cleaner work and eventually sleep forever. This triggers the bug in
roughly one out of three runs.

>Fix:
Index: lfs_segment.c
===================================================================
RCS file: /usr/src/sys/ufs/lfs/lfs_segment.c,v
retrieving revision 1.94
diff -u -r1.94 lfs_segment.c
--- lfs_segment.c	25 Jan 2003 23:00:09 -0000	1.94
+++ lfs_segment.c	27 Jan 2003 15:53:31 -0000
@@ -1876,7 +1876,9 @@
 #endif
 			tsleep(&fs->lfs_iocount, PRIBIO+1, "lfs_throttle", 0);
 		}
+		s = splbio();
 		++fs->lfs_iocount;
+		splx(s);
 
 		for (p = cbp->b_data; i && cbp->b_bcount < CHUNKSIZE; i--) {
 			bp = *bpp;
@@ -2050,8 +2052,8 @@
 	vop_strategy_a.a_bp = bp;
 	s = splbio();
 	++bp->b_vp->v_numoutput;
-	splx(s);
 	++fs->lfs_iocount;
+	splx(s);
 	(strategy)(&vop_strategy_a);
 }
 

Index: lfs_subr.c
===================================================================
RCS file: /usr/src/sys/ufs/lfs/lfs_subr.c,v
retrieving revision 1.29
diff -u -r1.29 lfs_subr.c
--- lfs_subr.c	24 Jan 2003 21:55:28 -0000	1.29
+++ lfs_subr.c	27 Jan 2003 15:52:54 -0000
@@ -131,6 +131,7 @@
 lfs_seglock(struct lfs *fs, unsigned long flags)
 {
 	struct segment *sp;
+	int s;
 	
 	if (fs->lfs_seglock) {
 		if (fs->lfs_lockpid == curproc->p_pid) {
@@ -163,7 +164,9 @@
 	 * so we artificially increment it by one until we've scheduled all of
 	 * the writes we intend to do.
 	 */
+	s = splbio();
 	++fs->lfs_iocount;
+	splx(s);
 }
 
 /*
@@ -243,6 +246,8 @@
 	}
 
 	if (fs->lfs_seglock == 1) {
+		int s;
+		
 		sync = sp->seg_flags & SEGM_SYNC;
 		ckp = sp->seg_flags & SEGM_CKP;
 		if (sp->bpp != sp->cbpp) {
@@ -275,7 +280,10 @@
 		 * At the moment, the user's process hangs around so we can
 		 * sleep.
 		 */
-		if (--fs->lfs_iocount < LFS_THROTTLE)
+		s = splbio();
+		--fs->lfs_iocount;
+		splx(s);
+		if (fs->lfs_iocount < LFS_THROTTLE)
 			wakeup(&fs->lfs_iocount);
 		if(fs->lfs_iocount == 0) {
 			lfs_countlocked(&locked_queue_count,
>Release-Note:
>Audit-Trail:
>Unformatted: