Subject: Re: ncr hangs system
To: None <ross@ghs.com, port-alpha@netbsd.org>
From: Joseph Sarkes <joe@js.ne.mediaone.net>
List: port-alpha
Date: 03/18/1999 16:19:43
Ross Harvey writes:
> 
> : From: Joseph Sarkes <joe@js.ne.mediaone.net>
> 
> > I am having a set of 4 ncr timeouts occur and then the system
> > hangs. The system is still responsive to keyboard input, but 
> > obviously is blocked in the scsi system waiting for accesses.
> >
> >
> > If the debugger is in the kernel, how do I get it to start via
> > the keyboard? Perhaps some stack trace would be available. Also
> > I would like to use a serial console vice the tga to allow logging
> > of the results. How do I do this?
> >
> > -- 
> > Joseph Sarkes                   mailto:joe@mediaone.net
> >
> 
> This sounds like a HW problem, but ncr(4) has lots of bugs and is always
> one of the usual suspects.
> 
> 	To start ddb from display keyboard:	ctrl-alt-esc

I managed to dig around and find this... It works fine until I
try to examine memory (a bad address, I guess), at which point I
panic the kernel.

> 
> 	To start ddb from a serial console:	send a break
> 
> 						real terminal:	[break] key
> 						with tip(1):	~#

This is also in the very fine docs for ddb, if one manages to look
for them rather than poking about aimlessly.

> 
> 	To use a serial console:		* halt system
> 						* unplug your keyboard
> 						* reset

This was not documented anywhere I could find. Thanks, I will
do this... BTW, I have a Belkin OmniView box that I use to share the monitor
and keyboard/mouse. Sometimes after a crash, the thing won't switch
any more until I power off the Multia... some stream of garbage being
sent on the keyboard??? Who knows...

> 						SRM will come up on the serial
> 						line and NetBSD will follow
> 						suit

I may set this box up solely with a serial console until X works again.
Even with the keyboard connected, SRM works with the serial console.
NetBSD goes directly to the keyboard/TGA console, however... Can I configure
NetBSD to use a serial console even with the keyboard still connected?

> 
> Caveats: ddb(4) does not know how to do stack traces on alpha, and ncr.c
> is 8,000 lines, complex, and features a strange internal design. Lots of
> external docs are needed to work on the script engine.

I would think the "ccb timeout" handling should manage to reset the SCSI
bus and restart. I have been looking at the ncr.c file and would prefer
not to go near it... I also haven't been able to figure out how to turn
on any debugging in the ncr driver, or how to get SCSI debugging turned on.

The SCSI debugging ioctl returns -1 and no driver debugging info is printed.
Perhaps I don't have the proper incantation, but I suspect that ncr.c doesn't
even support debugging via the SCSI debugging ioctls.

I notice a bunch of debugging code in ncr.c, but don't see how to turn it
on, and there are no instructions... Is there some separate, non-included
utility that is used for debugging this driver?

> 
> Does your system run reliably if you unplug the external SCSI devices?
> Presumably, you need to turn off the terminator at the SCSI controller,
> or is that automatic?

I am totally unsure what termination the ncr controller has, and the
same goes for the internal HD. As long as I don't access the internal HD,
I have a pretty stable system. The drive is an odd one that I have
no other use for, so I don't really want to just unplug it, but I may do
it anyway.

> 
> SCSI-2 was never intended to be used in fast synch modes with external
> devices, and using two external scsi devices is venturing onto somewhat
> thin ice.  (Sure, people do it all the time, but people walk on thin ice
> and railroad tracks, too. :-)

With the "options FAILSAFE" config line, I would hope most features were
turned off. However, I have no idea how to probe what the driver is actually
doing, and whether sync transfers are in progress, tagged queueing is
occurring, or whatever else... All I know is that the controller gets
confused and then wedges instead of restarting. Perhaps somebody who
knows how the driver works could put a sledgehammer restart somewhere
in it, to restart the controller if it goes south.

Should anybody need info regarding this, I will set up a serial console,
log the results for them, and try out any attempted fixes. My only
other chance for a stable system at this time is likely to move
an Adaptec 2940UW card, currently in a PC, into the Multia.

> 
> 	Ross.Harvey@Computer.Org
> 


-- 
Joseph Sarkes                   mailto:joe@mediaone.net
P.O. Box 482
Ipswich, MA 01938
(978) 948-5017