Subject: port-pmax/13595: asc_timeout produces a diagnostic-free panic.
To: None <gnats-bugs@gnats.netbsd.org>
From: John Hawkinson <jhawk@mit.edu>
List: netbsd-bugs
Date: 07/30/2001 18:39:20
>Number:         13595
>Category:       port-pmax
>Synopsis:       asc_timeout produces a diagnostic-free panic.
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    port-pmax-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Jul 29 15:38:00 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator:     John Hawkinson
>Release:        NetBSD 1.5.1
>Organization:
	MIT
>Environment:
	
System: NetBSD sotuhgoing-zax.zocalo.terc.edu 1.5.1 NetBSD 1.5.1 (GENERIC) #3: Tue Jul 3 23:41:20 EST 2001 root@medusa.thistledown.com.au:/usr/obj/NetBSD/src15/sys/arch/pmax/compile/GENERIC pmax


>Description:
	The tc scsi driver can fail with an asc_timeout in a really annoying
way. It just printf()s "asc_timeout" and then calls cpu_reboot(). This is
really unacceptable. It should provide some indication to the user that
this is a significant fault and that a reboot is going to happen; otherwise
one stares at the message buffer, sees that the machine has rebooted, and has
to go UTSLing to figure out why.

It seems like there's some reason that panic() isn't being used, but I'm
puzzled by it. "Why not?"


Beyond that, why is an asc_timeout a panic-worthy condition?

>How-To-Repeat:
	Watch your machine take an asc_timeout and reboot, like this:

Jul 30 14:00:07 sotuhgoing-zax syslogd: restart
Jul 30 14:12:55 sotuhgoing-zax /netbsd: rz0: Recoverable error
Jul 30 14:30:43 sotuhgoing-zax syslogd: restart
Jul 30 14:30:44 sotuhgoing-zax /netbsd: asc_timeout: cmd 0x80254790 drive 0
Jul 30 14:30:44 sotuhgoing-zax /netbsd: rebooting...

Code inspection shows:

void
asc_timeout(arg)
	void *arg;
{
	int s = splbio();
	ScsiCmd *scsicmd = (ScsiCmd *) arg;

	printf("asc_timeout: cmd %p drive %d\n", scsicmd, scsicmd->sd->sd_drive);
#ifdef DEBUG
	asc_DumpLog("asc_timeout");
#endif
#if 0
	panic("asc_timeout");
#else
	cpu_reboot(RB_NOSYNC, NULL); /* XXX */
#endif
	splx(s);
}


>Fix:
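	Punt the #if 0 so that panic() is actually used? Or, failing that,
make the printf more verbose so it says that the timeout is fatal and the
machine is about to reboot. And add a note to asc(4).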
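
As a rough illustration of the first option, here is a userland sketch of
what the handler could look like with panic() carrying the diagnostic.
Everything kernel-side is mocked: mock_panic(), struct scsi_dev, last_panic
and the asc_timeout_sketch name are stand-ins invented for this example, not
the real driver code, and the message wording is only a suggestion.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical userland stand-ins for the kernel types quoted above. */
struct scsi_dev {
	int sd_drive;
};
typedef struct ScsiCmd {
	struct scsi_dev *sd;
} ScsiCmd;

/* Mock of panic(9): records the message instead of halting the machine. */
char last_panic[128];

void
mock_panic(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(last_panic, sizeof(last_panic), fmt, ap);
	va_end(ap);
}

/*
 * Sketch of the timeout handler with the "#if 0" removed: the diagnostic
 * travels with the panic itself rather than in a separate bare printf.
 */
void
asc_timeout_sketch(void *arg)
{
	ScsiCmd *scsicmd = arg;

	mock_panic("asc_timeout: SCSI command %p on drive %d timed out",
	    (void *)scsicmd, scsicmd->sd->sd_drive);
}
```

In the kernel proper the mock would simply be panic(), which prints its
message before going down, so the log would at least say why the machine
went away. Whether a timeout deserves that at all is a separate question.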
>Release-Note:
>Audit-Trail:
>Unformatted: