Subject: port-i386/3773: locking bug in wd.c
To: None <gnats-bugs@gnats.netbsd.org>
From: None <onno@simplex.nl>
List: netbsd-bugs
Date: 06/22/1997 21:58:32
>Number:         3773
>Category:       port-i386
>Synopsis:       locking bug in wd.c
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    gnats-admin (GNATS administrator)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Jun 22 13:05:01 1997
>Last-Modified:
>Originator:     Onno van der Linden
>Organization:
	
>Release:        June 22nd	<NetBSD-current source date>
>Environment:
	
System: NetBSD sheep 1.2G NetBSD 1.2G (SHEEP) #17: Sun Jun 22 20:18:15 MEST 1997 onno@sheep:/usr/src/sys/arch/i386/compile/SHEEP i386


>Description:
There's a bug regarding the locking mechanism in wd_get_parms() from
/sys/dev/isa/wd.c which makes it possible to crash the startup of NetBSD/i386.
You'll only notice this if you have TWO drives at ONE controller. Besides that
this bug is very timing dependent, so not everyone will have a crash at startup.
The wakeup() statement in wdcstart() can happen in the middle of a transfer
((wdc->sc_flags & WDCF_ACTIVE) == 0 && wd->sc_skip != 0) to the `other' drive
at the wd controller.

NOTE: this may effect the arm32 port and other ports that use the wd driver.
	
>How-To-Repeat:
On my system
1. Boot single user mode
2. Hit ENTER for sh
3. fsck -p

Results in the following output of ddb

kernel: integer divide overflow trap, code = 0
stopped at _readdisklabel+0xae divl 0x38(%ecx), %eax


Also possible is a normal startup with the wd driver saying

wd0: wdcintr read intr before drq status 0 error 0
	
>Fix:
The *real* fix is to implement what Charles wrote in the wd_get_parms()
comment. For the moment the fix below will do. It will only wakeup()
the wd_get_parms() request if the controller that the drive is attached
to has *nothing* (i.e. request queue is empty) to do. Even with this patch
there's still a race between wdcstart() and wd_get_parms() to be the first
to enter a command. If wdcstart() is able to get there first and issue
a multi sector read command, we're still in trouble ......

*** /sys/dev/isa/wd.c.old	Sun Jun 22 13:53:45 1997
--- /sys/dev/isa/wd.c	Sun Jun 22 20:17:19 1997
***************
*** 538,556 ****
  #endif
  
  	/*
  	 * XXX
  	 * This is a kluge.  See comments in wd_get_parms().
  	 */
! 	if ((wdc->sc_flags & WDCF_WANTED) != 0) {
! 		wdc->sc_flags &= ~WDCF_WANTED;
! 		wakeup(wdc);
  		return;
  	}
- 
- loop:
- 	/* Is there a drive for the controller to do a transfer with? */
- 	wd = wdc->sc_drives.tqh_first;
- 	if (wd == NULL)
- 		return;
      
  	/* Is there a transfer to this drive?  If not, deactivate drive. */
--- 538,556 ----
  #endif
  
+ 
+ loop:
+ 	/* Is there a drive for the controller to do a transfer with? */
+ 	wd = wdc->sc_drives.tqh_first;
+ 	if (wd == NULL) {
  	/*
  	 * XXX
  	 * This is a kluge.  See comments in wd_get_parms().
  	 */
! 		if ((wdc->sc_flags & WDCF_WANTED) != 0) {
! 			wdc->sc_flags &= ~WDCF_WANTED;
! 			wakeup(wdc);
! 		}
  		return;
  	}
      
  	/* Is there a transfer to this drive?  If not, deactivate drive. */
	
>Audit-Trail:
>Unformatted: