Subject: Re: status of ncr 53c810 scsi adapter?
To: Cliff Romash <romash@BBN.COM>
From: Cliff Romash <romash@BBN.COM>
List: port-alpha
Date: 02/20/1998 11:25:04
After some further thinking and experimentation, I've come to the
conclusion that the problem that we are having here at GTE is caused by
using a serial console! A particular machine will reliably reboot with a
graphics console, but fail with a serial console.


The choice of disk drive also seems to affect this. Machines with Seagate
ST34501W drives exhibit the problem, while a machine with an ST32155W has
not shown the problem over several months.


So the possible workarounds seem to me to be either to use a graphics
console or to remove much of the printing in the NCR driver's
autoconfiguration.


I've submitted a bug report.


Cliff Romash





At 05:21 PM 2/19/98 -0500, Cliff Romash wrote:

>>>> 

<excerpt>Mark's description below is a problem I have recently seen here
at GTE. We have 8 systems with EB164 motherboards and Intraserver
ITI-3140U controllers with NCR 53c875 chips. All but one are using
Seagate ST34501W drives. Until two weeks ago, I had never seen a problem.
But two weeks ago, we received two new systems, and within 3 hours, were
seeing a boot time problem initializing the disks. I have since seen the
same problem on one of our old systems, but can offer no clues as to
what might be happening.


After several power off/on cycles, we have always been able to get the
machine to boot. 


Any clues as to how to make this problem go away would be appreciated.


I'm including the console log from one of the failed boots at the end of
this message.


Cliff Romash





At 01:09 PM 2/19/98 -0500, Mark H. Levine wrote:

>
>   Charles Lepple writes:
>   > What is the current status of the code for the 53c810? I keep seeing
>   > references to how it is buggy, and that it occasionally doesn't work.
>   >
>   > Essentially, I am trying to get an estimate of the percentage of uptime
>   > I would have with a UDB with said SCSI adapter :-)
>
>   The driver's reliability is bimodal. On a machine/disk combo on which
>   it works, it is nearly 100% reliable. On a machine/disk combo on which
>   it does not work, it will fail to work for more than a few seconds to
>   a few minutes. You will not operate the thing for several weeks only to
>   have a failure -- you'll know very fast if you have trouble.
>
>   Perry
>

>Hmm, that has not precisely been my experience with the NCR driver, although
>it did seem to mostly work on the UDB box.  On newer boxes, it exhibited
>the behavior of working until asked to do something like a large transfer,
>say tar/untar of the sources or toolchain sources, and then it would fail,
>in a way that indicates it has little or no error recovery code.  Typically
>the driver would log that it had received a scsi error, then start timing
>out and logging the timeouts, then hang forever instead of trying to
>restart the controller and continue, requiring a reboot.  We did not see
>that behavior at all on the 21064/66 boxes, but did see it on 21164 eb164s
>and pc164s.
>
>There was some loose talk here about giving this driver some priority because
>of its support in SRM and being built-in to eb164 systems... is there actually
>anyone doing development?  If not, is there anyone who can give pointers to
>documentation on how to correct NCR scripts?  I've recently heard of a new
>failure mode with Intraserver controllers and the pc164 that keeps the machine
>from bootstrapping once the kernel device driver has control, with the latest
>versions of pc164s and Seagate drives, so I have some interest in looking at
>same.
>


<bigger>8192 byte page size, 1 processor.

real mem = 536870912 (2490368 reserved for PROM, 534380544 used by NetBSD)

avail mem = 463233024

using 6523 buffers containing 53436416 bytes of memory

mainbus0 (root)

cpu0 at mainbus0: ID 0 (primary), 21164A (pass 2)

cia0 at mainbus0

pci0 at cia0 bus 0

ncr0 at pci0 dev 5 function 0: NCR 53c875 Wide SCSI

ncr0: interrupting at eb164 irq 2

        Delay (GEN=11): 236 msec

        Delay (GEN=11): 194 msec

        Delay (GEN=11): 194 msec

        NCR clock is 46871KHz, 46871KHz

        initial value of SCNTL3 = 05, final = 35

ncr0: restart (scsi reset).

scsibus0 at ncr0: 16 targets

sd0 at scsibus0 targ 0 lun 0: <SEAGATE, ST34501W, 0018> SCSI2 0/direct fixed

sd0: sd0(ncr0:0:0): WIDE SCSI (16 bit) enabled

sd0(ncr0:0:0): 20.0 MB/s (100 ns, offset 15)

ncr0: aborting job ...

ncr0:0: ERROR (10:0) (1-21-0) (f/3d) @ (d8c:1900001c).

        script cmd = 89030000

        reg:     da 10 80 3d 47 0f 00 0f 03 01 80 21 00 01 01 09.

ncr0: have to clear fifos.

ncr0: restart (fatal error).

sd0(ncr0:0:0): COMMAND FAILED (9 ff) @0xfffffe004a6d3400.

ncr0: aborting job ...

ncr0:0: ERROR (90:0) (0-21-27) (0/35) @ (418:430000b0).

        script cmd = 878b0000

        reg:     da 00 00 35 47 00 00 0f 71 00 00 21 80 01 00 0a.

ncr0: restart (fatal error).

sd0(ncr0:0:0): COMMAND FAILED (9 ff) @0xfffffe004a6d3400.

ncr0: aborting job ...

ncr0:0: ERROR (90:0) (0-21-27) (0/35) @ (418:430000b0).

        script cmd = 878b0000

        reg:     da 00 00 35 47 00 00 0f 71 00 00 21 80 01 00 0a.

ncr0: restart (fatal error).

sd0(ncr0:0:0): COMMAND FAILED (9 ff) @0xfffffe004a6d3400.

ncr0: aborting job ...

ncr0:0: ERROR (90:0) (0-21-27) (0/35) @ (418:430000b0).

        script cmd = 878b0000

        reg:     da 00 00 35 47 00 00 0f 71 00 00 21 80 01 00 0a.

ncr0: restart (fatal error).

sd0(ncr0:0:0): COMMAND FAILED (9 ff) @0xfffffe004a6d3400.

ncr0: aborting job ...

ncr0:0: ERROR (90:0) (0-21-27) (0/35) @ (418:430000b0).

        script cmd = 878b0000

        reg:     da 00 00 35 47 00 00 0f 71 00 00 21 80 01 00 0a.

ncr0: restart (fatal error).

sd0(ncr0:0:0): COMMAND FAILED (9 ff) @0xfffffe004a6d3400.

ncr0: aborting job ...

ncr0:0: ERROR (90:0) (0-21-27) (0/35) @ (418:430000b0).

        script cmd = 878b0000

        reg:     da 00 00 35 47 00 00 0f 71 00 00 21 80 01 00 0a.

ncr0: restart (fatal error).

sd0(ncr0:0:0): COMMAND FAILED (9 ff) @0xfffffe004a6d3400.

ncr0: aborting job ...

ncr0:0: ERROR (90:0) (0-21-27) (0/35) @ (418:430000b0).

        script cmd = 878b0000

        reg:     da 00 00 35 47 00 00 0f 71 00 00 21 80 01 00 0a.

ncr0: restart (fatal error).

sd0(ncr0:0:0): COMMAND FAILED (9 ff) @0xfffffe004a6d3400.

ncr0: aborting job ...

ncr0:0: ERROR (90:0) (0-21-27) (0/35) @ (418:430000b0).

        script cmd = 878b0000

        reg:     da 00 00 35 47 00 00 0f 71 00 00 21 80 01 00 0a.

ncr0: restart (fatal error).

sd0(ncr0:0:0): COMMAND FAILED (9 ff) @0xfffffe004a6d3400.

ncr0: aborting job ...

ncr0:0: ERROR (90:0) (0-21-27) (0/35) @ (418:430000b0).

        script cmd = 878b0000

        reg:     da 00 00 35 47 00 00 0f 71 00 00 21 80 01 00 0a.

ncr0: restart (fatal error).

sd0(ncr0:0:0): COMMAND FAILED (9 ff) @0xfffffe004a6d3400.

ncr0: aborting job ...

ncr0:0: ERROR (90:0) (0-21-27) (0/35) @ (418:430000b0).

        script cmd = 878b0000

        reg:     da 00 00 35 47 00 00 0f 71 00 00 21 80 01 00 0a.

ncr0: restart (fatal error).

sd0(ncr0:0:0): COMMAND FAILED (9 ff) @0xfffffe004a6d3400.

sd0: could not mode sense (4/5); using fictitious geometry

</bigger>


</excerpt><<<<<<<<