Subject: Re: -current on Ultra 5+ - now it's major siop0 lossage
To: None <port-sparc@netbsd.org>
From: None <eeh@netbsd.org>
List: port-sparc
Date: 01/26/2001 01:06:40
	> As far as the siop problems are concerned, they are new to
	> me.  Manuel has added tag queuing to that driver recently,
	> so you should make certain he's aware of the problem.

	OK, I'm copying Manuel too  :-)

	I'm pretty sure I've only seen these (see above) siop0/SCSI bus DMA errors when
	booted under a post-1.5 kernel.  Was that when the tagged queueing stuff was
	added?  Is there a way to turn it off (I'm willing to try and build another
	-current kernel if I can figure out how ... I'd like to go to -current, but
	I need stability first and foremost on this system, and right now, with 1.5
	and my 2 hme.c changes, I have a sort of stable-but-delicate equilibrium)?

Tags were added after 1.5.  The problems you're seeing are
probably not directly due to tags but side effects of code
changes needed to add tags.  For that reason Manuel needs
to be aware of this so he can fix the problem for 1.6.

	> Having said all that, this could be an issue with the PCI
	> controller.  What sort of machine is this?  So far we have
	> mostly been dealing with machines that have UltraSPARC IIi
	> processors with the on-board PCI controller.  If you have 
	> a machine with an UltraSPARC II and a psycho or psycho+, 
	> (first congratulations for having gotten it to boot) 
	> you probably have issues with the PCI drivers and the
	> IOMMU's streaming buffer cache.  Try disabling it.

	I have a plain vanilla 333 MHz Ultra 5+.  Should I "try disabling" the IOMMU's
	streaming buffer cache anyway (not that I would know how to do that ... )?

	Here's the relevant boot messages:

	NetBSD 1.5 (NETBSD4ME) #0: Thu Jan 25 06:51:41 PST 2001
	    root@netbsd4me:/usr/src/1.5/sys/arch/sparc64/compile/NETBSD4ME
	total memory = 128 MB
	avail memory = 86880 KB
	using 4075 buffers containing 32600 KB of memory  [Why so much for bufcache??]
	bootpath: /pci@1f,0/pci@1,0/scsi@1,0/disk@0,0
	mainbus0 (root): SUNW,Ultra-5_10
	cpu0 at mainbus0: SUNW,UltraSPARC-IIi @ 333 MHz, version 0 FPU
	cpu0: physical 4K instruction (32 b/l), 4K data (32 b/l), 2048K external (64 b/l)

If this really is an Ultra 5, it has an UltraSPARC-IIi
and uses the PCI controller on the CPU.  That controller does
not have a streaming cache so disabling it should have no effect.

	psycho0 at mainbus0 addr 0xfffc4000
	sabre: bus range 0 to 2; simba b, PCI bus 1; simba a, PCI bus 2
	DVMA map: c0002000 to ffffe000
	pci0 at psycho0

This is strange.  Is it a psycho or a sabre?  psycho has a streaming
cache, but sabre is the on-board controller and does not.  You may as
well try disabling the streaming cache to see if that's the problem.
Go into iommu.c and look for `BUS_DMA_COHERENT'.  That is the code that
turns off the streaming cache.  Force that argument to 0, something
like this:

	tte = MAKEIOTTE(pa, !(flags&BUS_DMA_NOWRITE), !(flags&BUS_DMA_NOCACHE), 
-			!(flags&BUS_DMA_COHERENT));
+			0);
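
To see what that change does before touching iommu.c, here is a
minimal standalone sketch in C.  It is not the kernel code: the EX_
names, the bit positions and the flag values are all made up for
illustration, and only the shape of the MAKEIOTTE() call is taken from
the diff above.  The idea is that the last argument selects the
streaming bit in the IOMMU entry, so passing 0 leaves it clear no
matter what flags the caller passed.

```c
#include <stdint.h>

/* Hypothetical DMA flag values, for illustration only. */
#define EX_DMA_NOWRITE   0x1u
#define EX_DMA_NOCACHE   0x2u
#define EX_DMA_COHERENT  0x4u

/* Hypothetical IOMMU entry layout, for illustration only. */
#define EX_TTE_VALID     (UINT64_C(1) << 63)
#define EX_TTE_STREAM    (UINT64_C(1) << 60)
#define EX_TTE_CACHE     (UINT64_C(1) << 4)
#define EX_TTE_WRITE     (UINT64_C(1) << 1)
#define EX_TTE_PAMASK    UINT64_C(0x000001ffffffe000)

/* Stand-in for MAKEIOTTE(pa, w, c, s): build one translation entry. */
static uint64_t
ex_make_iotte(uint64_t pa, int writable, int cacheable, int stream)
{
	uint64_t tte = (pa & EX_TTE_PAMASK) | EX_TTE_VALID;

	if (writable)
		tte |= EX_TTE_WRITE;
	if (cacheable)
		tte |= EX_TTE_CACHE;
	if (stream)
		tte |= EX_TTE_STREAM;
	return tte;
}

/* Before the change: streaming is used unless COHERENT was requested. */
uint64_t
ex_tte_original(uint64_t pa, unsigned flags)
{
	return ex_make_iotte(pa, !(flags & EX_DMA_NOWRITE),
	    !(flags & EX_DMA_NOCACHE), !(flags & EX_DMA_COHERENT));
}

/* After the change: the streaming bit is never set. */
uint64_t
ex_tte_no_streaming(uint64_t pa, unsigned flags)
{
	return ex_make_iotte(pa, !(flags & EX_DMA_NOWRITE),
	    !(flags & EX_DMA_NOCACHE), 0);
}
```

With these made-up values, ex_tte_no_streaming() always returns the
entry that ex_tte_original() would return for the same address with
BUS_DMA_COHERENT set, which is the point of the experiment.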


	pci0: i/o space, memory space enabled
	Sun Microsystems product 0x2000 (miscellaneous prehistoric) at pci0 dev 0 function 0 not configured
	simba0 at pci0 dev 1 function 0: Sun Microsystems Simba PCI bridge (rev. 0x13)
	pci1 at simba0 bus 2
	pci1: i/o space, memory space enabled
	siop0 at pci1 dev 1 function 0: Symbios Logic 53c875 (ultra-wide scsi)
	siop0: using on-board RAM
	siop0: interrupting at vector 16
	scsibus0 at siop0: 16 targets, 8 luns per target
	[...]
	hme0 at pci2 dev 1 function 1: address 08:00:20:ac:52:d4
	nsphy0 at hme0 phy 1: DP83840 10/100 media interface, rev. 1
	nsphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
	hme0: using vector 33 for interrupt
	[...]
	scsibus0: waiting 2 seconds for devices to settle...
	sd0 at scsibus0 target 0 lun 0: <QUANTUM, XP39100S, LYK8> SCSI2 0/direct fixed
	siop0: target 0 now synchronous at 20.0Mhz, offset 16
	sd0: 8682 MB, 5899 cyl, 20 head, 150 sec, 512 bytes/sect x 17781520 sectors
	root on sd0a dumps on sd0b
	[...]

	Oh.  While I'm here.  Two other (non-related) things:

	[...]
	Starting mountd.
	Jan 25 16:15:37 netbsd4me /netbsd: Non-unique normal route, mask not entered
	Setting securelevel: kern.securelevel: 0 -> 1

Don't know about that.

	[...]
	Updating motd.
	Alignment error: dsfsr=00000000:00800001 dsfar=ffffffff:ffff98c4 \
	isfsr=00000000:00000000 pc=0x10c340
	Starting lpd.
	[...]

	I don't see anything in "netstat -r" to tell me what the cause of that
	"Non-unique normal route" message is.

	And what is the "Alignment error:" from?

Something being started is generating an alignment error.  I've
seen that myself recently but I have not diagnosed it.  It
probably is some kvm issue since it is kernel-version sensitive.
Probably `ntp' related.


Eduardo