Subject: kern/494: Long SCSI Select returns cause timeout and Kernel panic
To: None <gnats-admin@sun-lamp.cs.berkeley.edu>
From: Mark P. Gooderum <mark@nirvana.good.com>
List: netbsd-bugs
Date: 09/23/1994 04:50:04
>Number:         494
>Category:       kern
>Synopsis:       Long SCSI Select returns cause timeout and Kernel panic
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    gnats-admin (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Sep 23 04:50:03 1994
>Originator:     Mark P. Gooderum
>Organization:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mark P. Gooderum			   USSnail:  Good Creations
Senior Consultant - Operating Systems Group	     3029 Blackstone Ave. So.
  "Working hard to be hardly working..."	     St. Louis Park, MN 55416
EMail:	     mark@Good.com		   Voice:    (612) 922-3953
Interactive: mark@nirvana.Good.com	   Fax:	     (612) 922-2676
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>Release:        NetBSD 1.0 Beta
>Environment:
486DLC/40
		aic6360 (SoundBlaster SCSI-2)
System: NetBSD nirvana 1.0_BETA NetBSD 1.0_BETA (NIRVANA_GATEWAY.1024) #12: Mon Sep 19 21:12:17 CDT 1994 mark@nirvana:/export/usr/src/sys/arch/i386/compile/NIRVANA_GATEWAY.1024 i386
>Description:
A SCSI command that takes a long time to return the SELECT causes a timeout.
After this timeout the device continues to do its work, and when it
does return the SELECT, the kernel panics with a data fault in
supervisor mode (actually several of them, 3-4...).
Note that in this case the mt is part of a script of successive dumps,
so the mt fails and returns and the script moves on to a sequence of
dump and more mt commands. The bug may therefore be triggered by the
other activity occurring at the same time.
>How-To-Repeat:
The easiest way to reproduce this is an "mt fsf 1" through a long file
on a tape drive.  On my Viper 150, a 100MB file takes about 10 minutes to
get through, which is long enough to cause the timeout, followed by the
panic when the device actually completes the command.
>Fix:
Workaround:
Use dd instead of fsf to skip files; it is slow, but it works.
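
The workaround amounts to reading the tape file and throwing the data
away, which is what dd does when its output goes to /dev/null. Below is a
minimal C sketch of the same idea, for illustration only. The function
name skip_tape_file is hypothetical, and the sketch assumes that the tape
driver reports a filemark as a read() return of 0 and that bufsize is at
least as large as the largest record on the tape.

```c
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: skip forward over one file by reading and
 * discarding data until read() returns 0 (assumed here to be how the
 * tape driver reports a filemark).
 * Returns the number of bytes discarded, or -1 on error.
 */
long skip_tape_file(int fd, size_t bufsize)
{
    char *buf = malloc(bufsize);
    long total = 0;
    ssize_t n;

    if (buf == NULL)
        return -1;

    /* Each read is a short, ordinary command, so none of them runs
     * long enough to hit the command timeout. */
    while ((n = read(fd, buf, bufsize)) > 0)
        total += n;

    free(buf);
    return (n < 0) ? -1 : total;
}
```

It would be called on a descriptor opened on the no-rewind tape device
(for example /dev/nrst0; the device name is an assumption, not taken from
this report).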
Fix:

Lengthen the timeout, maybe for tape commands only, since nothing else
really takes that long to come back legitimately.  This is an easy fix,
and it needs to be part of any fix so that long mt commands do succeed.
Fix the (apparent) dangling pointer bug caused by the timeout (or is it
just an unexpected SELECT return from the device?).
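
To make the two suggestions concrete, here is a rough user-space sketch in
C. It is not taken from the aic6360 driver or any NetBSD source; every
name, the state machine, and both timeout values are assumptions made for
illustration. It shows a longer timeout chosen for tape commands, and a
timeout handler that marks the command as timed out instead of releasing
it, so that a late completion from the device finds valid state and is
dropped rather than following a stale pointer.

```c
/* Hypothetical device classes and timeout values (not NetBSD's). */
enum dev_class { DEV_DISK, DEV_TAPE };

#define DISK_TIMEOUT_MS (10 * 1000)       /* assumed: 10 seconds */
#define TAPE_TIMEOUT_MS (30 * 60 * 1000)  /* assumed: 30 minutes */

enum cmd_state { CMD_FREE, CMD_ACTIVE, CMD_TIMED_OUT };

struct scsi_cmd {
    enum cmd_state state;
    enum dev_class dev;
    int timeout_ms;
    int result;
};

/* Start a command, picking the timeout from the device class. */
void cmd_start(struct scsi_cmd *c, enum dev_class dev)
{
    c->state = CMD_ACTIVE;
    c->dev = dev;
    c->timeout_ms = (dev == DEV_TAPE) ? TAPE_TIMEOUT_MS : DISK_TIMEOUT_MS;
    c->result = 0;
}

/*
 * Timeout handler: report failure to the caller, but keep the slot
 * reserved so that a late completion still finds valid state.
 */
void cmd_timeout(struct scsi_cmd *c)
{
    if (c->state == CMD_ACTIVE) {
        c->state = CMD_TIMED_OUT;
        c->result = -1;
    }
}

/*
 * Completion handler, run when the device finally comes back.
 * Returns 1 if the result was delivered to a waiting caller,
 * 0 if the completion was stale and was quietly dropped.
 */
int cmd_complete(struct scsi_cmd *c)
{
    if (c->state == CMD_ACTIVE) {
        c->state = CMD_FREE;
        return 1;
    }
    if (c->state == CMD_TIMED_OUT)
        c->state = CMD_FREE;  /* only now is the slot safe to reuse */
    return 0;
}
```

Whether the real panic comes from a freed command structure or from an
unexpected return from the device is exactly what is not known here; the
sketch only shows one common way to make a late completion harmless.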
>Audit-Trail:
>Unformatted: