Subject: kern/7694: Assertion "cp" fails in /sys/dev/pci/ncr.c
To: None <gnats-bugs@gnats.netbsd.org>
From: None <jarle@runit.sintef.no>
List: netbsd-bugs
Date: 06/03/1999 03:23:07
>Number:         7694
>Category:       kern
>Synopsis:       Assertion "cp" fails in /sys/dev/pci/ncr.c
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Jun  3 03:20:04 1999
>Last-Modified:
>Originator:     Jarle Greipsland
>Organization:
RUNIT
>Release:        1.4_BETA (1999-04-20)
>Environment:
	
System: NetBSD klodrik.uninett.no 1.4_BETA NetBSD 1.4_BETA (KLODRIK) #1: Wed Jun  2 22:36:24 MEST 1999     jarle@klodrik.uninett.no:/usr/src/sys/arch/i386/compile/KLODRIK i386

>Description:

There seems to be (yet another) problem with the NCR 53C8xx SCSI driver
(/* $NetBSD: ncr.c,v 1.80 1998/12/13 00:11:37 thorpej Exp $ */).
On this system we run a ccd across 5 SCSI disks, spread over 2 NCR 53C860
controllers.  After running for several days the system suddenly starts
spewing out error messages like those shown below.

The CCD is configured as follows:
ccd0    32      none    /dev/sd1e /dev/sd2e /dev/sd3e /dev/sd4e /dev/sd5e

The error messages that get logged are:

Jun  2 22:13:51 klodrik /netbsd: ccd0: error 5 on component 3
Jun  2 22:13:51 klodrik last message repeated 3 times
Jun  2 22:19:01 klodrik /netbsd: ccd0: error 5 on component 3
Jun  2 22:24:11 klodrik last message repeated 7 times
Jun  2 22:25:13 klodrik /netbsd: assertion "cp" failed: file "../../../../dev/pci/ncr.c", line 6693
Jun  2 22:25:13 klodrik /netbsd: sd4(ncr1:1:0): COMMAND FAILED (4 28) @0xf0771400.
Jun  2 22:25:13 klodrik /netbsd: assertion "cp" failed: file "../../../../dev/pci/ncr.c", line 6693
Jun  2 22:25:13 klodrik /netbsd: sd4(ncr1:1:0): COMMAND FAILED (4 28) @0xf0771800.
Jun  2 22:25:13 klodrik /netbsd: sd4(ncr1:1:0): COMMAND FAILED (4 28) @0xf0771c00.
Jun  2 22:25:13 klodrik /netbsd: assertion "cp" failed: file "../../../../dev/pci/ncr.c", line 6693
Jun  2 22:25:13 klodrik /netbsd: assertion "cp" failed: file "../../../../dev/pci/ncr.c", line 6693
Jun  2 22:25:13 klodrik /netbsd: sd4(ncr1:1:0): COMMAND FAILED (4 28) @0xf076e400.
Jun  2 22:25:13 klodrik /netbsd: assertion "cp" failed: file "../../../../dev/pci/ncr.c", line 6693

and the dmesg output of the system is:

NetBSD 1.4_BETA (KLODRIK) #1: Wed Jun  2 22:36:24 MEST 1999
    jarle@klodrik.uninett.no:/usr/src/sys/arch/i386/compile/KLODRIK
cpu0: family 6 model 5 step 1
cpu0: Intel Pentium II (686-class)
real mem  = 268029952
avail mem = 246931456
using 2822 buffers containing 13504512 bytes of memory
mainbus0 (root)
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o enabled, memory enabled
pchb0 at pci0 dev 0 function 0
pchb0: Intel 82443LX PCI AGP Controller (PAC) (rev. 0x03)
ppb0 at pci0 dev 1 function 0: Intel 82443LX AGP Interface (PAC) (rev. 0x03)
pci1 at ppb0 bus 1
pci1: i/o enabled, memory enabled
vga1 at pci1 dev 0 function 0: Matrox MGA Millennium II 2164WA-B AG (rev. 0x00)
wsdisplay0 at vga1: console (80x25, vt100 emulation)
pcib0 at pci0 dev 4 function 0
pcib0: Intel 82371AB PCI-to-ISA Bridge (PIIX4) (rev. 0x01)
pciide0 at pci0 dev 4 function 1: Intel 82371AB IDE controller (PIIX4)
pciide0: bus-master DMA support present
pciide0: primary channel wired to compatibility mode
wd0 at pciide0 channel 0 drive 0: <IBM-DTTA-351680>
wd0: drive supports 16-sector pio transfers, lba addressing
wd0: 16124MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 33022080 sectors
wd0: 32-bits data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2
pciide0: primary channel interrupting at irq 14
pciide0: secondary channel wired to compatibility mode
pciide0: disabling secondary channel (no drives)
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2 (using DMA data transfers)
uhci0 at pci0 dev 4 function 2: Intel 82371AB USB Host Controller (PIIX4) (rev. 0x01)
uhci0: interrupting at irq 9
uhci0: USB version 1.0
usb0 at uhci0
uhub0 at usb0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
Intel 82371AB Power Management Controller (PIIX4) (miscellaneous bridge, revision 0x01) at pci0 dev 4 function 3 not configured
ahc0 at pci0 dev 6 function 0
ahc0: interrupting at irq 9
ahc0: aic7880 Wide Channel, SCSI Id=7, 16 SCBs
scsibus0 at ahc0 channel 0: 16 targets, 8 luns per target
ahc0: target 0 using 16Bit transfers
ahc0: target 0 synchronous at 10.0MHz, offset = 0x8
sd0 at scsibus0 targ 0 lun 0: <SEAGATE, ST39173W, 5958> SCSI2 0/direct fixed
sd0: 8683MB, 7501 cyl, 10 head, 237 sec, 512 bytes/sect x 17783240 sectors
fxp0 at pci0 dev 10 function 0: Intel EtherExpress Pro 10+/100B Ethernet
fxp0: interrupting at irq 5
fxp0: Ethernet address 00:a0:c9:b6:df:33
inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ncr0 at pci0 dev 11 function 0: ncr 53c860 fast20 scsi
ncr0: interrupting at irq 10
ncr0: minsync=12, maxsync=137, maxoffs=8, 16 dwords burst, normal dma fifo
ncr0: single-ended, open drain IRQ driver
ncr0: restart (scsi reset).
scsibus1 at ncr0: 8 targets, 8 luns per target
sd1 at scsibus1 targ 0 lun 0: <DEC, RZ29B    (C) DEC, 0016> SCSI2 0/direct fixed
sd1(ncr0:0:0): 10.0 MB/s (100 ns, offset 8)
sd1: 4091MB, 3708 cyl, 20 head, 113 sec, 512 bytes/sect x 8380080 sectors
sd2 at scsibus1 targ 2 lun 0: <DEC, RZ29B    (C) DEC, 0009> SCSI2 0/direct fixed
sd2(ncr0:2:0): 10.0 MB/s (100 ns, offset 8)
sd2: 4091MB, 3708 cyl, 20 head, 113 sec, 512 bytes/sect x 8380080 sectors
sd3 at scsibus1 targ 4 lun 0: <DEC, RZ29B    (C) DEC, 0014> SCSI2 0/direct fixed
sd3(ncr0:4:0): 10.0 MB/s (100 ns, offset 8)
sd3: 4091MB, 3708 cyl, 20 head, 113 sec, 512 bytes/sect x 8380080 sectors
ncr1 at pci0 dev 12 function 0: ncr 53c860 fast20 scsi
ncr1: interrupting at irq 11
ncr1: minsync=12, maxsync=137, maxoffs=8, 16 dwords burst, normal dma fifo
ncr1: single-ended, open drain IRQ driver
ncr1: restart (scsi reset).
scsibus2 at ncr1: 8 targets, 8 luns per target
sd4 at scsibus2 targ 1 lun 0: <DEC, RZ29B    (C) DEC, 0007> SCSI2 0/direct fixed
sd4(ncr1:1:0): 10.0 MB/s (100 ns, offset 8)
sd4: 4091MB, 3708 cyl, 20 head, 113 sec, 512 bytes/sect x 8380080 sectors
sd5 at scsibus2 targ 3 lun 0: <DEC, RZ29B    (C) DEC, 0007> SCSI2 0/direct fixed
sd5(ncr1:3:0): 10.0 MB/s (100 ns, offset 8)
sd5: 4091MB, 3708 cyl, 20 head, 113 sec, 512 bytes/sect x 8380080 sectors
isa0 at pcib0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
lpt0 at isa0 port 0x378-0x37b irq 7
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard
opmsprobe: command error
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
sysbeep0 at pcppi0
npx0 at isa0 port 0xf0-0xff: using exception 16
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
biomask 4e40 netmask 4e60 ttymask 4ee2
wscons: wskbd0 glued to wsdisplay0 (console)
WARNING: old BSD partition ID!
boot device: wd0
root on wd0a dumps on wd0b
root file system type: ffs

	
>How-To-Repeat:

I have no idea.  It seems to recur on this system every few days, but I
cannot reproduce it at will.  Note that a couple of the disks that make up
the CCD have previously had some bad blocks, but I think we've remapped
those away.  However, it's entirely possible that either we haven't managed
to do that properly, or that new bad blocks have developed.  Still, if some
low-level SCSI error (noise, bad blocks, etc.) were causing this, I find it
a bit strange that no lower-level errors are logged.

If anyone wants more information please ask and I'll see what I can find.

	
>Fix:
	
No clue.
>Audit-Trail:
>Unformatted: