Subject: Re: netbsd-2 system crash: wdc?
To: None <netbsd-users@netbsd.org>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: netbsd-users
Date: 05/20/2005 16:23:47
--fdj2RfSjLxBAspz7
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Fri, May 20, 2005 at 01:25:37PM +0200, Jukka Salmi wrote:
> Hello,
> 
> today I found a NetBSD/i386 2.0_STABLE system dropped into ddb:
> 
> db> dmesg
> [...]
> ex0: uplistptr was 0
> ex0: uplistptr was 0
> viaide0:0:1: lost interrupt
>         type: ata tc_bcount: 0 tc_skip: 0
> db> trace
> ltsleep(c038adb8,10,c02c6de8,1,0) at netbsd:ltsleep+0x6b
> wdcwait(c0c2b8fc,48,40,3e8,9) at netbsd:wdcwait+0xa5
> __wdccommand_start(c0c2b8fc,c0d76000,0,c0d77904,c0d77904) at netbsd:__wdccommand_start+0x73
> wdcstart(c0c2b8fc,c0d76034,c038ae50,c0d1dce0,800) at netbsd:wdcstart+0xb1
> wdc_ata_bio_intr(c0c2b8fc,c0d76034,1,c0d1a000,c038ae7c) at netbsd:wdc_ata_bio_intr+0x18a
> wdcintr(c0c2b8fc,0,ad510010,30030,580e0010) at netbsd:wdcintr+0xda
> Xintr_legacy14() at netbsd:Xintr_legacy14+0xa8

OK, this is the problem, tsleep() called from interrupt context.
I can see a race condition where this could happen: in atabus_thread(),
there is a window (2 in the netbsd-2 sources) where interrupts are enabled
while the WDCF_TH_RUN flag is still set. The attached patch fixes this for
netbsd-2. I'm going to commit a similar change to current.

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference
--

--fdj2RfSjLxBAspz7
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=diff

Index: ata.c
===================================================================
RCS file: /cvsroot/src/sys/dev/ata/ata.c,v
retrieving revision 1.27.2.1
diff -u -r1.27.2.1 ata.c
--- ata.c	18 Apr 2004 02:23:45 -0000	1.27.2.1
+++ ata.c	20 May 2005 14:15:12 -0000
@@ -129,8 +129,8 @@
 	/* Configure the devices on the bus. */
 	atabusconfig(sc);
 
+	s = splbio();
 	for (;;) {
-		s = splbio();
 		if ((chp->ch_flags & (WDCF_TH_RESET | WDCF_SHUTDOWN)) == 0 &&
 		    ((chp->ch_flags & WDCF_ACTIVE) == 0 ||
 		     chp->ch_queue->queue_freeze == 0)) {
@@ -138,10 +138,8 @@
 			(void) tsleep(&chp->ch_thread, PRIBIO, "atath", 0);
 			chp->ch_flags |= WDCF_TH_RUN;
 		}
-		splx(s);
 		if (chp->ch_flags & WDCF_SHUTDOWN)
 			break;
-		s = splbio();
 		if (chp->ch_flags & WDCF_TH_RESET) {
 			int drive;
 
@@ -162,8 +160,8 @@
 			(*xfer->c_start)(chp, xfer);
 		} else if (chp->ch_queue->queue_freeze > 1)
 			panic("ata_thread: queue_freeze");
-		splx(s);
 	}
+	splx(s);
 	chp->ch_thread = NULL;
 	wakeup((void *)&chp->ch_flags);
 	kthread_exit(0);

--fdj2RfSjLxBAspz7--