Subject: port-sparc/4777: 1.3 si0 SCSI driver bombs.
To: None <gnats-bugs@gnats.netbsd.org>
From: David Gilbert <dgilbert@jaywon.pci.on.ca>
List: netbsd-bugs
Date: 01/04/1998 23:32:39
>Number:         4777
>Category:       port-sparc
>Synopsis:       New in 1.3, si0 SCSI dies with flags 0x03
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    gnats-admin (GNATS administrator)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Jan  4 20:35:00 1998
>Last-Modified:
>Originator:     David Gilbert
>Organization:
============================================================================
|David Gilbert, Velocet Communications.       | Two things can only be     |
|Mail:       dgilbert@velocet.net             |  equal if and only if they |
|http://www.velocet.net/~dgilbert             |   are precisely opposite.  |
=========================================================GLO================
>Release:        Got 1.3 from master release site.
>Environment:
	
System: NetBSD repeat 1.3 NetBSD 1.3 (REPEAT) #0: Sun Jan 4 17:08:39 EST 1998 dgilbert@repeat:/u3/dgilbert/downloads/NetBSD/src/src/sys/arch/sparc/compile/REPEAT sparc

Machine is Sun4/260 with 5 SCSI drives and 4 'xd' drives.  The following
is the kernel config:


# 	$NetBSD: GENERIC,v 1.51.2.1 1997/11/20 08:46:57 mellon Exp $

include "arch/sparc/conf/std.sparc"

maxusers	128

# Options for variants of the Sun SPARC architecture.
# At least one is required.
options 	SUN4		# sun4/100, sun4/200, sun4/300
#options 	SUN4C		# sun4c - SS1, 1+, 2, ELC, SLC, IPC, IPX, etc.
#options 	SUN4M		# sun4m - SS10, SS20, Classic, etc.

#options 	MMU_3L		# 3-level MMU on sun4/400; (incomplete)

# Standard system options
options 	KTRACE		# system call tracing
options 	SYSVMSG		# System V message queues
options 	SYSVSEM		# System V semaphores
options 	SYSVSHM		# System V shared memory
#options 	SHMMAXPGS=1024	# 1024 pages is the default
options 	LKM		# loadable kernel modules
#options 	INSECURE	# disable kernel security level
#options 	UCONSOLE	# allow anyone to steal the virtual console

# Debugging options
options 	DDB		# kernel dynamic debugger
#options 	DEBUG		# kernel debugging code
#options 	DIAGNOSTIC	# extra kernel sanity checking
#options 	KGDB		# support for kernel gdb
#options 	KGDBDEV=0xc01	# kgdb device number (dev_t)
#options 	KGDBRATE=38400	# baud rate
#options 	SCSIVERBOSE	# Verbose SCSI errors

# Compatibility options
options 	COMPAT_43	# 4.3BSD system interfaces
options 	COMPAT_10	# NetBSD 1.0 binary compatibility
options 	COMPAT_11	# NetBSD 1.1 binary compatibility
options 	COMPAT_12	# NetBSD 1.2 binary compatibility
options 	COMPAT_SUNOS	# SunOS 4.x binary compatibility
options 	COMPAT_SVR4	# SunOS 5.x binary compatibility
options 	EXEC_ELF32	# Exec module for SunOS 5.x binaries.

# Filesystem options
file-system	FFS		# Berkeley Fast Filesystem
file-system	NFS		# Sun NFS-compatible filesystem client
file-system	KERNFS		# kernel data-structure filesystem
file-system	NULLFS		# NULL layered filesystem
file-system	MFS		# memory-based filesystem
file-system	FDESC		# user file descriptor filesystem
file-system	UMAPFS		# uid/gid remapping filesystem
file-system	LFS		# Log-based filesystem (still experimental)
file-system	PORTAL		# portal filesystem (still experimental)
file-system	PROCFS		# /proc
file-system	CD9660		# ISO 9660 + Rock Ridge file system
file-system	UNION		# union file system
file-system	MSDOSFS		# MS-DOS FAT filesystem(s).

options 	NFSSERVER	# Sun NFS-compatible filesystem server
options 	QUOTA		# FFS quotas
options 	FIFO		# POSIX fifo support (in all filesystems)

# Networking options
options 	INET		# IP stack
options 	TCP_COMPAT_42	# compatibility with 4.2BSD TCP/IP
options 	GATEWAY		# IP packet forwarding
#options 	ISO,TPIP	# OSI networking
#options 	EON		# OSI tunneling over IP
#options 	CCITT,LLC,HDLC	# X.25
#options 	PFIL_HOOKS	# pfil(9) packet filter hooks.

# Options for SPARCstation hardware
options 	RASTERCONSOLE	# fast rasterop console
options 	BLINK		# blink the led on supported machines

# Generic swap; second partition of root disk or network.
config		netbsd	root on ? type ?

# Main bus and CPU .. all systems.
mainbus0 at root
cpu0	at mainbus0

# Bus types found on SPARC systems.
#sbus0	at mainbus0				# sun4c
obio0	at mainbus0				# sun4 and sun4m
vmes0	at mainbus0				# sun4
vmel0	at mainbus0				# sun4
#iommu0	at mainbus0				# sun4m
#sbus0	at iommu0				# sun4m

#audioamd0	at mainbus0				# sun4c
#audio*	at audioamd0
#audioamd0	at obio0				# sun4m
#audio*	at audioamd0
#audioamd0	at sbus0 slot ? offset ?		# sun4m
#audio*	at audioamd0

#auxreg0	at mainbus0				# sun4c
#auxreg0	at obio0				# sun4m

# Power status and control register found on Sun4m systems
#power0	at obio0

# Mostek clock found on 4/300, sun4c, and sun4m systems.
# The Mostek clock NVRAM is the "eeprom" on sun4/300 systems.
#clock0	at mainbus0				# sun4c
#clock0	at obio0				# sun4m
#clock0	at obio0 addr 0xf2000000		# sun4/300

# Intersil clock found on 4/100 and 4/200 systems.
oclock0	at obio0 addr 0xf3000000		# sun4/200
#oclock0	at obio0 addr 0x03000000		# sun4/100

# Memory error registers.
#memreg0	at mainbus0				# sun4c
#memreg0	at obio0				# sun4m
memreg0	at obio0 addr 0xf4000000		# sun4/200 and sun4/300
#memreg0	at obio0 addr 0x04000000		# sun4/100

# Timer chip found on 4/300, sun4c, and sun4m systems.
#timer0	at mainbus0				# sun4c
#timer0	at obio0				# sun4m
#timer0	at obio0 addr 0xef000000		# sun4/300

# EEPROM found on 4/100 and 4/200 systems.  Note that the 4/300
# doesn't use this driver; the `EEPROM' is in the NVRAM on the
# Mostek clock chip on 4/300 systems.
eeprom0	at obio0 addr 0xf2000000		# sun4/200
#eeprom0	at obio0 addr 0x02000000		# sun4/100

# Zilog 8530 serial chips.  Each has two channels.
# zs0 is ttya and ttyb.  zs1 is the keyboard and mouse.
#zs0	at mainbus0					# sun4c
#zs0	at obio0					# sun4m
zs0	at obio0 addr 0xf1000000 level 12 flags 0x103	# sun4/200 and sun4/300
#zs0	at obio0 addr 0x01000000 level 12 flags 0x103	# sun4/100
zstty0	at zs0 channel 0	# ttya
zstty1	at zs0 channel 1	# ttyb

#zs1	at mainbus0					# sun4c
#zs1	at obio0					# sun4m
zs1	at obio0 addr 0xf0000000 level 12 flags 0x103	# sun4/200 and sun4/300
#zs1	at obio0 addr 0x00000000 level 12 flags 0x103	# sun4/100
kbd0	at zs1 channel 0	# keyboard
ms0	at zs1 channel 1	# mouse

#zs2	at obio0 addr 0xe0000000 level 12 flags 0x103	# sun4/300
#zstty2	at zs2 channel 0	# ttyc
#zstty3	at zs2 channel 1	# ttyd

#
# Note the flags on the esp entries below, that work around
# deficiencies in the current driver:
#	bits 0-7:  disable disconnect/reselect for the corresponding target
#	bits 8-15: disable synch negotiation for target [bit-8]
#	Note: targets 4-7 have disconnect/reselect enabled on the premise
#	      that tape devices normally have one of these targets. Tape
#	      devices should be allowed to disconnect for the SCSI bus
#	      to operate acceptably.
#

# sun4/300 SCSI - an NCR53c94 or equivalent behind
# an LSI Logic DMA controller
#dma0	at obio0 addr 0xfa001000 level 4		# sun4/300
#esp0	at obio0 addr 0xfa000000 level 4 flags 0x0000	#

# sun4c or sun4m SCSI - an NCR53c94 or equivalent behind
# specialized DMA glue
#dma0	at sbus0 slot ? offset ?			# on-board SCSI
#esp0	at sbus0 slot ? offset ? flags 0x0000		# sun4c
#esp0	at dma0 flags 0x0000				# sun4m

# FSBE/S SCSI - an NCR53c94 or equivalent behind
#dma*	at sbus? slot ? offset ?			# SBus SCSI
#esp*	at sbus? slot ? offset ? flags 0x0000		# two flavours
#esp*	at dma? flags 0x0000				# depending on model

# Qlogic ISP SBus SCSI Card
#isp*	at sbus? slot ? offset ?

# sun4m Ethernet - an AMD 7990 LANCE behind
# specialized DMA glue
#ledma0	at sbus0 slot ? offset ?			# sun4m on-board
#le0	at ledma0					#

# Additional SBus LANCE devices - glued on by lebuffer
#lebuffer0	at sbus0 slot ? offset ?		# sun4m SBus
#lebuffer*	at sbus? slot ? offset ?		# sun4m SBus
#le0	at lebuffer0					#
#le*	at lebuffer?					#

# sun4/300 and sun4c Ethernet - an AMD 7990 LANCE
#le0	at sbus0 slot ? offset ?			# sun4c on-board
#le*	at sbus? slot ? offset ?

#le0	at obio0 addr 0xf9000000 level 6		# sun4/300

# sun4/100 and sun4/200 Ethernet - an Intel 82586 on-board
# or on a Multibus/VME card.
ie0	at obio0 addr 0xf6000000 level 6		# sun4/200 on-board
#ie0	at obio0 addr 0x06000000 level 6		# sun4/100 on-board

ie1	at vmes0 addr 0xffe88000 level 5 vect 0x75
ie2	at vmel0 addr 0xff31ff02 level 5 vect 0x76
ie3	at vmel0 addr 0xff35ff02 level 5 vect 0x77
ie4	at vmel0 addr 0xff2dff02 level 5 vect 0x7c

# Xylogics 753 or 7053 VME SMD disk controllers and disks, found
# on sun4 systems.
xdc0	at vmel0 addr 0xffffee80 level 3 vect 0x44
xdc1	at vmel0 addr 0xffffee90 level 3 vect 0x45
xdc2	at vmel0 addr 0xffffeea0 level 3 vect 0x46
xdc3	at vmel0 addr 0xffffeeb0 level 3 vect 0x47
xd*	at xdc? drive ?

# Xylogics 450 or 451 VME SMD disk controllers and disks, found
# on sun4 systems.
xyc0	at vmes0 addr 0xffffee40 level 3 vect 0x48
xyc1	at vmes0 addr 0xffffee48 level 3 vect 0x49
xy*	at xyc? drive ?

# NCR5380-based "Sun SCSI 3" VME SCSI controller.
# This driver has several flags which may be enabled by OR'ing
# the values and using the "flags" directive.
# Valid flags are:
#
#	0x01		Use DMA (may be polled)
#	0x02		Use DMA completion interrupts
#	0x04		Allow disconnect/reselect
#
# E.g. the following would enable DMA, interrupts, and reselect:
# si0	at vmes0 addr 0xff200000 level 3 vect 0x40 flags 0x07
#
# By default, DMA is enabled in the driver.
si0	at vmes0 addr 0xff200000 level 3 vect 0x40 flags 0x03

# NCR5380-based "SCSI Weird" on-board SCSI interface found
# on sun4/100 systems.  The flags are the same as the "si"
# controller.  Note, while DMA is enabled by default, only
# polled DMA works at this time, and reselects do not work
# on this particular controller.
#sw0	at obio0 addr 0x0a000000 level 3

# Sun "bwtwo" black and white framebuffer, found on sun4, sun4c, and sun4m
# systems.  If your sun4 system has a cgfour installed in the P4 slot,
# the P4 entries for "bwtwo" will attach to the overlay plane of the
# "cgfour".
#bwtwo0	at sbus0 slot ? offset ?		# sun4c on-board
#bwtwo*	at sbus? slot ? offset ?		# sun4c and sun4m
bwtwo0	at obio0 addr 0xfd000000 level 4	# sun4/200
#bwtwo0	at obio0 addr 0xfb300000 level 4	# sun4/300 in P4 slot
#bwtwo0	at obio0 addr 0x0b300000 level 4	# sun4/100 in P4 slot

# Sun "cgtwo" VME color framebuffer
cgtwo0	at vmes0 addr 0xff400000 level 4 vect 0xa8

# Sun "cgthree" Sbus color framebuffer
#cgthree0 at sbus? slot ? offset ?
#cgthree* at sbus? slot ? offset ?
#cgthree0 at obio? slot ? offset ?		# sun4m
#cgthree* at obio? slot ? offset ?		# sun4m

# Sun "cgfour" color framebuffer with overlay plane.  See above comment
# regarding overlay plane.
#cgfour0	at obio0 addr 0xfb300000 level 4	# sun4/300 P4
#cgfour0	at obio0 addr 0x0b300000 level 4	# sun4/100 P4

# Sun "cgsix" accelerated color framebuffer.
#cgsix0	at sbus? slot ? offset ?
#cgsix*	at sbus? slot ? offset ?
#cgsix0	at obio0 addr 0xfb000000 level 4	# sun4/300 P4
#cgsix0	at obio0 addr 0x0b000000 level 4	# sun4/100 P4

# Sun "cgeight" 24-bit framebuffer
#cgeight0 at obio0 addr 0xfb300000 level 4	# sun4/300 P4
#cgeight0 at obio0 addr 0x0b300000 level 4	# sun4/100 P4

# Sun "tcx" accelerated color framebuffer.
#tcx0	at sbus? slot ? offset ?
#tcx*	at sbus? slot ? offset ?

# Sun "cgfourteen" accelerated 24-bit framebuffer.
#cgfourteen0	at obio0			# sun4m

# SCSI bus layer.  SCSI devices attach to the SCSI bus, which attaches
# to the underlying hardware controller.
#scsibus* at esp?
#scsibus* at isp?
scsibus* at si?
#scsibus* at sw?

# These entries find devices on all SCSI busses and assign
# unit numbers dynamically.
sd*	at scsibus? target ? lun ?		# SCSI disks
st*	at scsibus? target ? lun ?		# SCSI tapes
cd*	at scsibus? target ? lun ?		# SCSI CD-ROMs
ch*	at scsibus? target ? lun ?		# SCSI changer devices

# Floppy controller and drive found on SPARCstations.
#fdc0	at mainbus0				# sun4c controller
#fdc0	at obio0				# sun4m controller
#fd*	at fdc0					# the drive itself

pseudo-device	loop			# loopback interface; required
pseudo-device	pty		32	# pseudo-ttys (for network, etc.)
pseudo-device	sl		2	# SLIP interfaces
pseudo-device	ppp		2	# PPP interfaces
pseudo-device	tun		4	# Network "tunnel" device
pseudo-device	bpfilter	16	# Berkeley Packet Filter
pseudo-device	vnd		4	# disk-like interface to files
pseudo-device	ccd		4	# concatenated and striped disks
#pseudo-device	strip		1	# radio clock
#pseudo-device	ipfilter		# ip filter
# rnd is EXPERIMENTAL
pseudo-device	rnd			# /dev/random and in-kernel generator

>Description:
	The machine drops into DDB; I haven't noticed a panic message.  The calls
at the top of the 'trace' command output are:

ncr5380_data_xfer
ncr5380_machine
ncr5380_sched
ncr5380_scsi_cmd

	So far, the machine has come back without major disk lossage.  I
vaguely suspect that the loading of screensavers is hosing the system.
The system runs a small news feed, but /var and /var/news are both on the
xd drives (/var/news is a ccd across two of them).  /var/spool/uucp
is written from NFS.  Swapping is done to all four xd drives.  I don't
currently have a dump partition large enough for the 80 Megs RAM I have.

	So... the only thing running from SCSI disks should be the
screensavers (/usr/X11R6 is on a ccd covering two SCSI drives).

>How-To-Repeat:
	I don't know how much of the problem lies in the changes to the
SCSI driver.  It seems very different from the last time I looked at it.
One thing that cropped up in 1.2 was a cache flushing problem that
basically screwed my disks.  This smells similar since my install for
1.3 is acting strange for any portion of it that is residing on SCSI
disk.

	The machine is available for testing.  I even have an X86 and/or
a Sun3 that I can hook the console to (such that someone testing could
do useful debugging).
>Fix:
	Right now, I'm compiling a kernel (luckily that's on an xd
drive) that has flags 0x01 on si0.  This would suck in the long
term.  It is my perception (may not be accurate) that the SCSI
disks seem 25-50% slower (by one-quarter to one-half) than under
1.2.
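
	For reference, the workaround is a one-line change to the si0
entry in the config above.  This is only a sketch, and it assumes the
flag descriptions in the config comments are accurate (0x01 = use DMA,
possibly polled; 0x02 = use DMA completion interrupts).  Only the flags
value differs between the two lines:

```
# Current entry, which crashes: DMA plus DMA completion interrupts
#si0	at vmes0 addr 0xff200000 level 3 vect 0x40 flags 0x03

# Workaround under test: DMA only, no completion interrupts
si0	at vmes0 addr 0xff200000 level 3 vect 0x40 flags 0x01
```

	Whether this actually avoids the crash is not yet known; the
new kernel is still compiling.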

Dave.
>Audit-Trail:
>Unformatted: