Subject: Re: ncr hangs system
To: None <joe@js.ne.mediaone.net, port-alpha@netbsd.org>
From: Ross Harvey <ross@ghs.com>
List: port-alpha
Date: 03/18/1999 12:37:05
> From: Joseph Sarkes <joe@js.ne.mediaone.net>

> I am having a set of 4 ncr timeouts occur and then the system
> hangs. The system is still responsive to keyboard input, but 
> obviously is blocked in the scsi system waiting for accesses.
>
> The system is a Multia 233MHz/64MB running -current of a couple
> days ago. The problem occurs while using the internal ncr810
> SCSI controller, the internal 530MB SCSI drive that came with
> the system, and an external SyJet 1.5G drive and Ricoh CD-RW
> drive. The system seems to work OK as long as I don't access
> the 2.5" SCSI internal drive. If I use that drive, however, in
> some variable length of time the system will wedge.
>
> I can build new kernels, -current, etc. as long as I stay with
> just the syjet. While moving data to the internal drive it runs
> until the hang, from 50 to 250 Meg of data transfer prior to the
> death. I've tried options FAILSAFE in the config file to no
> result. Any approach to debugging this would be helpful. The 
> failure is repeatable, and not dependent on the type of filesystem
> on the 2.5" drive. (I tried lfs also). 
>
> If the debugger is in the kernel, how do I get it to start via
> the keyboard? Perhaps some stack trace would be available. Also
> I would like to use a serial console vice the tga to allow logging
> of the results. How do I do this?
>
> -- 
> Joseph Sarkes                   mailto:joe@mediaone.net
>

This sounds like a HW problem, but ncr(4) has lots of bugs and is always
one of the usual suspects.

	To start ddb from display keyboard:	ctrl-alt-esc

	To start ddb from a serial console:	send a break

						real terminal:	[break] key
						with tip(1):	~#

	To use a serial console:		* halt system
						* unplug your keyboard
						* reset
						SRM will come up on the serial
						line and NetBSD will follow
						suit

Caveats: ddb(4) does not know how to do stack traces on alpha, and ncr.c
is 8,000 lines, complex, and features a strange internal design. Lots of
external docs are needed to work on the script engine.

Does your system run reliably if you unplug the external SCSI devices?
Presumably, you need to turn off the terminator at the SCSI controller,
or is that automatic?
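
To compare the two setups (external devices attached vs. unplugged), it may
help to push a fixed amount of data at the internal drive and log progress
as you go, so the last line on the serial console shows how far the transfer
got before the wedge. A minimal sketch in Python follows; the function name,
the 1 MB chunk size, and the example path are my own assumptions, not
anything from the ncr driver or the NetBSD tree.

```python
import os
import sys


def write_until_failure(path, total_mb, chunk_mb=1, log=sys.stderr):
    """Write at least total_mb of data to path, syncing after every chunk.

    Returns the number of megabytes written and synced. If the machine
    wedges mid-run, the last progress line shows how far it got.
    """
    chunk = b"\xa5" * (chunk_mb * 1024 * 1024)
    written_mb = 0
    with open(path, "wb") as f:
        while written_mb < total_mb:
            f.write(chunk)
            f.flush()
            # Force the data out to the drive rather than leaving it
            # in the buffer cache, so the SCSI bus is actually exercised.
            os.fsync(f.fileno())
            written_mb += chunk_mb
            log.write("%d MB written\n" % written_mb)
            log.flush()
    return written_mb


# Hypothetical usage (the mount point is an assumption):
#   write_until_failure("/mnt/internal/junk", 300)
```

Run it once with the external devices attached and once without. 300 MB is
just past the 50 to 250 MB range you reported, so a run that completes is at
least suggestive.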

SCSI-2 was never intended to be used in fast synchronous modes with external
devices, and using two external SCSI devices is venturing onto somewhat
thin ice.  (Sure, people do it all the time, but people walk on thin ice
and railroad tracks, too. :-)

	Ross.Harvey@Computer.Org