Subject: Re: ncr hangs system
To: None <email@example.com, firstname.lastname@example.org>
From: Joseph Sarkes <email@example.com>
Date: 03/18/1999 16:19:43
Ross Harvey writes:
> : From: Joseph Sarkes <firstname.lastname@example.org>
> > I am having a set of 4 ncr timeouts occur and then the system
> > hangs. The system is still responsive to keyboard input, but
> > obviously is blocked in the scsi system waiting for accesses.
> > If the debugger is in the kernel, how do I get it to start via
> > the keyboard? Perhaps some stack trace would be available. Also
> > I would like to use a serial console vice the tga to allow logging
> > of the results. How do I do this?
> > --
> > Joseph Sarkes mailto:email@example.com
> This sounds like a HW problem, but ncr(4) has lots of bugs and is always
> one of the usual suspects.
> To start ddb from display keyboard: ctrl-alt-esc
I managed to dig around and find this... It works fine until I
try to examine memory (a bad address, I guess), at which point I
panic the kernel.
> To start ddb from a serial console: send a break
> real terminal: [break] key
> with tip(1): ~#
This is also in the very fine docs for ddb, if one manages to look
for them rather than poking about aimlessly.
> To use a serial console: * halt system
> * unplug your keyboard
> * reset
This was not documented in any place where I could find it. Thanks, I will
do this... BTW, I have a Belkin OmniView box that I use to share the monitor
and keyboard/mouse. Sometimes in a crash, the thing won't switch
any more until I power off the Multia... some stream of garbage being
sent on the keyboard??? Who knows...
> SRM will come up on the serial
> line and NetBSD will follow
I may set this box up solely with a serial console until X works again.
Even with the keyboard connected, SRM works with the serial console.
NetBSD goes directly to the keyboard/TGA console, however... Can I configure
NetBSD to use a serial console even with the keyboard connected and responsive?
> Caveats: ddb(4) does not know how to do stack traces on alpha, and ncr.c
> is 8,000 lines, complex, and features a strange internal design. Lots of
> external docs are needed to work on the script engine.
The "ccb timeout" handling should manage to reset the SCSI bus and restart,
I would think. I have been looking at the ncr.c file, and would prefer
not to go near it... I also haven't been able to figure out how to turn
on any debugging in the ncr driver, or get the SCSI debugging turned on.
The ioctl returns -1 and no SCSI driver debugging info is printed. Perhaps
I don't have the proper incantation, but I suspect that ncr.c doesn't even
support debugging using the SCSI debugging ioctls.
I notice a bunch of debugging stuff in ncr.c, but don't see how to turn it
on, and there are no instructions... Is there some type of non-included
utility that is used for debugging this driver?
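The bare -1 from the ioctl says little on its own; errno would narrow it
down (ENOTTY, for instance, usually means the driver does not implement the
request at all). A minimal sketch in C of how one might check, assuming the
request takes an int argument. The helper name try_int_ioctl is hypothetical,
and the request code is left to the caller; on NetBSD I would expect it to be
SCIOCDEBUG from <sys/scsiio.h>, but treat that as an assumption.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Issue an ioctl that takes a pointer to an int and report errno on failure.
 * try_int_ioctl is a hypothetical helper, not part of any driver or library.
 * The caller supplies the request code (assumed: SCIOCDEBUG on NetBSD).
 * Returns 0 on success, or the errno value on failure.
 */
static int try_int_ioctl(int fd, unsigned long request, int value)
{
    if (ioctl(fd, request, &value) == -1) {
        int saved = errno;  /* save before fprintf can change it */
        fprintf(stderr, "ioctl failed: %s (errno %d)\n",
                strerror(saved), saved);
        return saved;
    }
    return 0;
}
```

Called with an fd opened on the raw disk device, the returned errno should at
least distinguish "driver ignores this request" from a permissions or
bad-descriptor problem.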
> Does your system run reliably if you unplug the external SCSI devices?
> Presumably, you need to turn off the terminator at the SCSI controller,
> or is that automatic?
I am totally unsure about what termination the ncr controller has, and
the same goes for the internal hd. As long as I don't access the internal hd
I pretty much have a stable system. The drive is a weird one that I have
no other use for, so I don't really want to just unplug it, but I may do so.
> SCSI-2 was never intended to be used in fast synch modes with external
> devices, and using two external scsi devices is venturing onto somewhat
> thin ice. (Sure, people do it all the time, but people walk on thin ice
> and railroad tracks, too. :-)
With the options FAILSAFE config line, I would hope most stuff was turned
off. However, I have no idea how to probe what the driver is actually
doing, and whether sync transfers are in progress, tagged queueing is
occurring, or whatever else... All I know is that the controller gets
confused, then wedges, instead of restarting. Perhaps somebody who
knows how the driver works could put a sledgehammer restart into it
somewhere to restart it if it goes south.
Should anybody need info regarding this, I will set up a serial console
and log the results for them, and try out any attempted fixes. My only
other chance for a stable system at this time is probably to install
an Adaptec 2940UW card, currently in a PC, into the Multia.
Joseph Sarkes mailto:firstname.lastname@example.org
P.O. Box 482
Ipswich, MA 01938