Subject: port-i386/1701: my system reproducibly crashes due to ncr problems which seem to be related to simultaneous ncr and ide disk access
To: None <gnats-bugs@gnats.netbsd.org>
From: None <tsm@cs.brown.edu>
List: netbsd-bugs
Date: 10/28/1995 18:23:16
>Number:         1701
>Category:       port-i386
>Synopsis:       my system repeatedly crashes due to ncr problems which may be related to simultaneous ncr and ide access
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    gnats-admin (GNATS administrator)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Oct 28 18:35:01 1995
>Last-Modified:
>Originator:     Timothy Miller
>Organization:
Brown University
>Release:        Oct 26 1995
>Environment:
P75 with NCR 825 

Here is the output from booting:

Oct 28 17:48:51 haywire /netbsd: NetBSD 1.1_ALPHA (HAYWIRE) #0: Sat Oct 28 17:21:04 EDT 1995
Oct 28 17:48:51 haywire /netbsd:     tim@haywire.pvd.ri.us:/usr/src/sys/arch/i386/compile/HAYWIRE
Oct 28 17:48:51 haywire /netbsd: CPU: Pentium (GenuineIntel 586-class CPU)
Oct 28 17:48:51 haywire /netbsd: real mem  = 16384000
Oct 28 17:48:52 haywire /netbsd: avail mem = 13930496
Oct 28 17:48:52 haywire /netbsd: using 225 buffers containing 921600 bytes of memory
Oct 28 17:48:52 haywire /netbsd: isa0 (root)
Oct 28 17:48:52 haywire /netbsd: com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
Oct 28 17:48:52 haywire /netbsd: com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
Oct 28 17:48:52 haywire /netbsd: lpt0 at isa0 port 0x378-0x37f irq 7
Oct 28 17:48:52 haywire /netbsd: wdc0 at isa0 port 0x1f0-0x1f7 irq 14
Oct 28 17:48:53 haywire /netbsd: wd0 at wdc0 drive 0: 325MB, 1010 cyl, 12 head, 55 sec, 512 bytes/sec <WDC AC2340>
Oct 28 17:48:53 haywire /netbsd: wd0: using 16-sector 16-bit pio transfers, chs addressing
Oct 28 17:48:53 haywire /netbsd: wdc1 at isa0 port 0x170-0x177 irq 15
Oct 28 17:48:53 haywire /netbsd: wd2 at wdc1 drive 0: 1033MB, 2100 cyl, 16 head, 63 sec, 512 bytes/sec <WDC AC31000H>
Oct 28 17:48:53 haywire /netbsd: wd2: using 16-sector 16-bit pio transfers, lba addressing
Oct 28 17:48:53 haywire /netbsd: ed0 at isa0 port 0x240-0x25f irq 10: address 00:40:33:29:b6:89, type NE2000 (16-bit)
Oct 28 17:48:53 haywire /netbsd: npx0 at isa0 port 0xf0-0xff: using exception 16
Oct 28 17:48:53 haywire /netbsd: pc0 at isa0 port 0x60-0x6f irq 1: color
Oct 28 17:48:54 haywire /netbsd: pms0 at isa0 port 0x60-0x67 irq 12
Oct 28 17:48:54 haywire /netbsd: fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
Oct 28 17:48:54 haywire /netbsd: fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
Oct 28 17:48:54 haywire /netbsd: fd1 at fdc0 drive 1: 1.2MB 80 cyl, 2 head, 15 sec
Oct 28 17:48:54 haywire /netbsd: root device eisa not configured
Oct 28 17:48:54 haywire /netbsd: pci0 (root): configuration mode 2
Oct 28 17:48:54 haywire /netbsd: pci0 bus 0 device 0: unknown vendor/product: 0x8086/0x04a3 (class: bridge, subclass: host, revision: 0x11) not configured
Oct 28 17:48:55 haywire /netbsd: pci0 bus 0 device 1: unknown vendor/product: 0x1042/0x1000 (class: mass storage, subclass: IDE, revision: 0x01) not configured
Oct 28 17:48:55 haywire /netbsd: pci0 bus 0 device 2: unknown vendor/product: 0x8086/0x0484 (class: prehistoric, subclass: miscellaneous, revision: 0x43) not configured
Oct 28 17:48:55 haywire /netbsd: pci0 bus 0 device 6: unknown vendor/product: 0x1002/0x4758 (class: display, subclass: VGA, revision: 0x01) not configured
Oct 28 17:48:55 haywire /netbsd: ncr0 at pci0 bus 0 device 14
Oct 28 17:48:55 haywire /netbsd: pci_map_mem: mapping memory at virtual f95ffc00, physical ffbffc00
Oct 28 17:48:55 haywire /netbsd: pci_map_int: pin A mapped to line 9
Oct 28 17:48:55 haywire /netbsd: ncr0: restart (scsi reset).
Oct 28 17:48:55 haywire /netbsd: scsibus0 at ncr0
Oct 28 17:48:56 haywire /netbsd: ncr0 targ 2 lun 0: <ARCHIVE, Python 25501-XXX, 2.26> SCSI2 1/sequential removable
Oct 28 17:48:56 haywire /netbsd: st0 at scsibus0: st0(ncr0:2:0): 200ns (5 Mb/sec) offset 8.
Oct 28 17:48:56 haywire /netbsd: drive empty
Oct 28 17:48:56 haywire /netbsd: ncr0 targ 5 lun 0: <DEC, DSP5400S, 427L> SCSI2 0/direct fixed
Oct 28 17:48:56 haywire /netbsd: sd0 at scsibus0sd0(ncr0:5:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
Oct 28 17:48:56 haywire /netbsd: : 3814MB, 3055 cyl, 26 head, 98 sec, 512 bytes/sec

System: NetBSD cis-ts5-slip5.cis.brown.edu 1.1_ALPHA NetBSD 1.1_ALPHA (HAYWIRE) #0: Sat Oct 28 17:21:04 EDT 1995 tim@haywire.pvd.ri.us:/usr/src/sys/arch/i386/compile/HAYWIRE i386

There are three disks in my system: one on each of the two IDE controllers and
one on the NCR controller. Swap is on the SCSI disk. Here are the relevant parts
of the disklabels for the disks:

wd0:

sectors/track: 55
tracks/cylinder: 12
sectors/cylinder: 660
cylinders: 1010
rpm: 3322

1 partitions:
#        size   offset    fstype   [fsize bsize   cpg]
  a:   666600        0    4.2BSD     1024  8192    16 	# (Cyl.    0 - 1009)

wd2:

sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 2100
rpm: 4495

1 partitions:
#        size   offset    fstype   [fsize bsize   cpg]
  a:  2116800        0    4.2BSD     1024  8192    16 	# (Cyl.    0 - 2099)

sd0:

sectors/track: 98
tracks/cylinder: 26
sectors/cylinder: 2548
cylinders: 3055
rpm: 5400

3 partitions:
#        size   offset    fstype   [fsize bsize   cpg]
  a:  7580300   203840    4.2BSD     1024  8192    16 	# (Cyl.   80 - 3054)
  b:   203840        0      swap                    	# (Cyl.    0 - 79)
  c:  7784140        0    unused        0     0       	# (Cyl.    0 - 3054)
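The geometry in these labels is internally consistent: sectors/cylinder is
sectors/track times tracks/cylinder, and each partition's cylinder range follows
from its offset and size. As a quick cross-check, here is a minimal Python
sketch (not part of the original report; the helper names are made up for
illustration) that recomputes those figures from the values above:

```python
from typing import Dict, Tuple

# Hypothetical helpers for cross-checking the disklabel excerpts in this report.

def sectors_per_cylinder(sectors_per_track: int, tracks_per_cylinder: int) -> int:
    """sectors/cylinder = sectors/track * tracks/cylinder."""
    return sectors_per_track * tracks_per_cylinder

def cylinder_range(offset: int, size: int, secpercyl: int) -> Tuple[int, int]:
    """First and last cylinder a partition touches (the '# (Cyl. x - y)' comment)."""
    first = offset // secpercyl
    last = (offset + size - 1) // secpercyl
    return first, last

# name: (sectors/track, tracks/cylinder, cylinders, {partition: (size, offset)})
LABELS: Dict[str, tuple] = {
    "wd0": (55, 12, 1010, {"a": (666600, 0)}),
    "wd2": (63, 16, 2100, {"a": (2116800, 0)}),
    "sd0": (98, 26, 3055, {"a": (7580300, 203840),
                           "b": (203840, 0),
                           "c": (7784140, 0)}),
}

if __name__ == "__main__":
    for name, (nsect, ntrack, ncyl, parts) in LABELS.items():
        spc = sectors_per_cylinder(nsect, ntrack)
        print(f"{name}: {spc} sectors/cylinder, {spc * ncyl} sectors in {ncyl} cylinders")
        for part, (size, offset) in parts.items():
            first, last = cylinder_range(offset, size, spc)
            print(f"  {part}: cylinders {first} - {last}")
```

For example, sd0 has 98 * 26 = 2548 sectors/cylinder, so the 203840-sector swap
partition b covers exactly cylinders 0-79, and partition a starts at cylinder 80.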

>Description:
My system reproducibly has problems with the NCR controller driving the SCSI
disk. With the version of the system from a few months ago, it was a rare
problem and easily avoided. I then upgraded to a newer version (Oct 1), and it
happened often enough to interfere with my work, so I got the latest version,
which supposedly incorporates a number of fixes to the ncr driver. Now it
happens all the time and essentially makes the machine unusable in its current
configuration.

One thing guaranteed to cause the problem is to make the machine run fsck -p on
all the disks at once, e.g. by resetting without a shutdown. The machine then
prints a message about an ncr0 restart ("ncr dead ?") and drops into
single-user mode, but sd0 cannot be accessed again until the machine is
restarted. Older versions of the system would just hang with the SCSI activity
light on. Another way to trigger it is to start X and then start xemacs and an
xterm at the same time. The output from starting xemacs and then an xterm
follows:

Oct 28 17:40:44 haywire /netbsd: sd0(ncr0:5:0): extraneous data discarded.
Oct 28 17:40:45 haywire /netbsd: sd0(ncr0:5:0): COMMAND FAILED (9 0) @f86c3200.
Oct 28 17:40:45 haywire /netbsd: swap_pager_clean: clean of page 3fa000 failed
Oct 28 17:40:57 haywire /netbsd: ncr0:5: ERROR (a0:10) (8-28-0) (8/13) @ (fc0:18f87d86).
Oct 28 17:40:57 haywire /netbsd: 	script cmd = 88030000
Oct 28 17:40:57 haywire /netbsd: 	reg:	 da 10 80 13 47 08 05 0f 03 08 85 28 80 00 00 00.
Oct 28 17:40:58 haywire /netbsd: ncr0: restart (fatal error).
Oct 28 17:40:58 haywire /netbsd: sd0(ncr0:5:0): COMMAND FAILED (9 ff) @f86cfc00.
Oct 28 17:40:58 haywire /netbsd: sd0(ncr0:5:0): COMMAND FAILED (9 ff) @f86cfe00.
Oct 28 17:40:58 haywire /netbsd: sd0(ncr0:5:0): COMMAND FAILED (9 ff) @f86c3000.
Oct 28 17:40:58 haywire /netbsd: sd0(ncr0:5:0): COMMAND FAILED (9 ff) @f86c3200.
Oct 28 17:40:58 haywire /netbsd: swap_pager_clean: clean of page 737000 failed
Oct 28 17:40:58 haywire /netbsd: swap_pager_clean: clean of page 7d7000 failed
Oct 28 17:40:58 haywire /netbsd: swap_pager_clean: clean of page 72a000 failed
Oct 28 17:40:58 haywire /netbsd: swap_pager_clean: clean of page 7d2000 failed
Oct 28 17:41:03 haywire /netbsd: ncr0: restart (ncr dead ?).
Oct 28 17:41:04 haywire /netbsd: sd0(ncr0:5:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
Oct 28 17:41:05 haywire /netbsd: swap_pager_clean: clean of page 795000 failed

[... many more clean of page X failed ...]

The machine then hung and I had to reboot it. In every case I have seen, the
error occurs only when the machine is doing something I/O-intensive with both
the SCSI disk and an IDE disk at the same time; it often works fine if I do
something intensive with the SCSI disk alone, or with the SCSI tape and anything
else. The card passes all its diagnostics, and the disk is new; it worked more
or less fine until the recent upgrade (and seems to work fine under something
like DOS). Everything also worked fine for a long time before I installed the
SCSI disk.
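The common trigger described above is heavy I/O on the SCSI disk and an IDE disk
at the same time. A minimal Python sketch of that kind of load, under stated
assumptions, starts one reader thread per raw device so that both controllers
are busy at once. The device paths are hypothetical examples (adjust them to the
local disklabels), the function names are made up for illustration, and the
script only reads. Reading raw devices normally requires root. It has not been
verified to reproduce this particular crash.

```python
import sys
import threading

# Hypothetical raw device paths: one SCSI partition and one IDE partition.
DEFAULT_PATHS = ["/dev/rsd0c", "/dev/rwd0c"]

def read_device(path, chunk_size, limit, results, index):
    """Read up to `limit` bytes from `path`; store the byte count or the OSError."""
    total = 0
    try:
        with open(path, "rb", buffering=0) as f:
            while total < limit:
                data = f.read(min(chunk_size, limit - total))
                if not data:  # end of device or file
                    break
                total += len(data)
        results[index] = total
    except OSError as exc:
        results[index] = exc

def concurrent_read(paths, chunk_size=64 * 1024, limit=64 * 1024 * 1024):
    """Run one reader thread per path so all the devices are read simultaneously."""
    results = [None] * len(paths)
    threads = [
        threading.Thread(target=read_device, args=(p, chunk_size, limit, results, i))
        for i, p in enumerate(paths)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    paths = sys.argv[1:] or DEFAULT_PATHS
    for path, result in zip(paths, concurrent_read(paths)):
        print(path, result)
```

Threads are enough here because Python releases its global interpreter lock
during blocking reads, so the reads on the two devices overlap. The default
chunk size and limit are multiples of 512 bytes, which keeps reads aligned to
whole sectors on raw devices.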
>How-To-Repeat:
See the description above.
>Fix:
>Audit-Trail:
>Unformatted: