Subject: Re: -current on Ultra 5+ - now it's major siop0 lossage
To: None <port-sparc@NetBSD.ORG>
From: Greg Earle <earle@isolar.DynDNS.ORG>
List: port-sparc
Date: 01/25/2001 18:31:16
> Tags were added after 1.5.  The problems you're seeing are
> probably not directly due to tags but side effects of code
> changes needed to add tags.  For that reason Manuel needs
> to be aware of this so he can fix the problem for 1.6.

Well, if it's any help, the problem's gotten *far* worse since the Jan. 14th
snapshot, from my Ultra 5+'s standpoint.

> 	NetBSD 1.5 (NETBSD4ME) #0: Thu Jan 25 06:51:41 PST 2001
> 	    root@netbsd4me:/usr/src/1.5/sys/arch/sparc64/compile/NETBSD4ME
> 	total memory = 128 MB
> 	avail memory = 86880 KB
> 	using 4075 buffers containing 32600 KB of memory
> 	bootpath: /pci@1f,0/pci@1,0/scsi@1,0/disk@0,0
> 	mainbus0 (root): SUNW,Ultra-5_10
> 	cpu0 at mainbus0: SUNW,UltraSPARC-IIi @ 333 MHz, version 0 FPU
> 	cpu0: physical 4K instruction (32 b/l), 4K data (32 b/l), 2048K external (64 b/l)
> 
> If this really is an Ultra 5, it has an UltraSPARC-IIi
> and uses the PCI controller on the CPU.  That controller does
> not have a streaming cache so disabling it should have no effect.

It really is an Ultra 5+, honest  :-)

> 	psycho0 at mainbus0 addr 0xfffc4000
> 	sabre: bus range 0 to 2; simba b, PCI bus 1; simba a, PCI bus 2
> 	DVMA map: c0002000 to ffffe000
> 	pci0 at psycho0
> 
> This is strange.  Is it a psycho or a sabre?  psycho has a streaming
> cache, but sabre is the on-board controller and does not.  You may as
> well try disabling it to see if that's the problem.  Go into iommu.c and
> look for `BUS_DMA_COHERENT'.  That is the code to turn off the streaming
> cache.  Do something like this:
> 
> 	tte = MAKEIOTTE(pa, !(flags&BUS_DMA_NOWRITE), !(flags&BUS_DMA_NOCACHE),
> -			!(flags&BUS_DMA_COHERENT));
> +			0);

Made this change; no impact whatsoever.   Same problem persists.
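
For anyone following along, here is a rough sketch of what that one-argument
change amounts to. Only the names BUS_DMA_NOWRITE, BUS_DMA_NOCACHE and
BUS_DMA_COHERENT come from the diff above; the flag values, the bit layout,
the 8K page mask and the make_iotte() helper are invented for illustration
and are not the real definitions from iommu.c or the bus headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values, NOT the real ones from the NetBSD headers. */
#define BUS_DMA_COHERENT 0x01
#define BUS_DMA_NOWRITE  0x02
#define BUS_DMA_NOCACHE  0x04

/* Hypothetical bit layout for the entry built by this sketch. */
#define SK_WRITABLE  (UINT64_C(1) << 0)
#define SK_CACHEABLE (UINT64_C(1) << 1)
#define SK_STREAM    (UINT64_C(1) << 2)
#define SK_PAGE_MASK (~UINT64_C(0x1fff))   /* assumes 8K pages */

/* Stand-in for MAKEIOTTE(): page address plus three attribute bits. */
static uint64_t make_iotte(uint64_t pa, int writable, int cacheable, int stream)
{
    uint64_t tte = pa & SK_PAGE_MASK;
    if (writable)
        tte |= SK_WRITABLE;
    if (cacheable)
        tte |= SK_CACHEABLE;
    if (stream)
        tte |= SK_STREAM;
    return tte;
}

/* Original call: streaming is requested unless the caller asks for COHERENT. */
static uint64_t tte_original(uint64_t pa, int flags)
{
    return make_iotte(pa, !(flags & BUS_DMA_NOWRITE),
        !(flags & BUS_DMA_NOCACHE), !(flags & BUS_DMA_COHERENT));
}

/* Patched call: the last argument is hard-wired to 0, so never streaming. */
static uint64_t tte_patched(uint64_t pa, int flags)
{
    return make_iotte(pa, !(flags & BUS_DMA_NOWRITE),
        !(flags & BUS_DMA_NOCACHE), 0);
}

static int uses_streaming(uint64_t tte)
{
    return (tte & SK_STREAM) != 0;
}

/* Sanity check: the patch only changes mappings that were not COHERENT. */
static void check_patch_effect(void)
{
    assert(uses_streaming(tte_original(0x2000, 0)));
    assert(!uses_streaming(tte_patched(0x2000, 0)));
    assert(tte_patched(0x2000, BUS_DMA_COHERENT) ==
        tte_original(0x2000, BUS_DMA_COHERENT));
}
```

In other words, with the patch every mapping is built the way a
BUS_DMA_COHERENT mapping would have been built before.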

>> siop0 at pci1 dev 1 function 0: Symbios Logic 53c875 (ultra-wide scsi)
>> siop0: using on-board RAM
>> siop0: interrupting at vector 16
>> scsibus0 at siop0: 16 targets, 8 luns per target
>> [...]
>> scsibus0: waiting 2 seconds for devices to settle...
>> sd0 at scsibus0 target 0 lun 0: <QUANTUM, XP39100S, LYK8> SCSI2 0/direct fixed
>> siop0: target 0 now synchronous at 20.0Mhz, offset 16
>> sd0: 8682 MB, 5899 cyl, 20 head, 150 sec, 512 bytes/sect x 17781520 sectors
>> root on sd0a dumps on sd0b

1.5Q messages:

scsibus0: waiting 2 seconds for devices to settle...
siop0: alloc newcdb at PHY addr 0xc006c000
siop0: target 0 using tagged queuing

[ARRGH - how do I turn this off???]

sd0 at scsibus0 target 0 lun 0: <QUANTUM, XP39100S, LYK8> SCSI2 0/direct fixed
sd0: 8682 MB, 5899 cyl, 20 head, 150 sec, 512 bytes/sect x 17781520 sectors
DMA IRQ: bus fault dma fifo empty, DSP=0xc0069fec DSA=0xc006df0:
last msg_in=0x0 status=0xff
siop0: unhandled scsi interrupt, sist=0x400 sstat1=0xf DSA=0xc006df00 DSP=0xc0069fec
IPsec: Initialized Security Association Processing.
root device: sd0a
[...]

So the problem happens right away, as soon as the disk is probed/id'd.

> 	Starting ntpd.
> 	Updating motd.
> 	Alignment error: dsfsr=00000000:00800001 dsfar=ffffffff:ffff98c4 \
> 	isfsr=00000000:00000000 pc=0x10c340
> 	Starting lpd.
> 	[...]
> 
> 	And what is the "Alignment error:" from?
> 
> Something being started is generating an alignment error.  I've 
> seen that myself recently, but I have not diagnosed it.  It
> probably is some kvm issue since it is kernel-version sensitive.
> Probably `ntp' related.

OK, thanks.

Matt, there's no core file generated from this "Alignment error", no.
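
One thing that can be read off the message itself, assuming dsfar is the
faulting data address printed as high:low 32-bit halves (my reading of the
output, not something confirmed in this thread): the address is 4-byte
aligned but not 8-byte aligned. A small sketch of that check, with made-up
helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Join the two 32-bit halves as printed in "dsfar=ffffffff:ffff98c4". */
static uint64_t join_halves(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* True if addr is a multiple of size; size must be a power of two. */
static int is_aligned(uint64_t addr, unsigned size)
{
    return (addr & (uint64_t)(size - 1)) == 0;
}

/* Check the address from the logged message against 4- and 8-byte access. */
static void check_logged_address(void)
{
    uint64_t addr = join_halves(0xffffffffu, 0xffff98c4u);

    assert(is_aligned(addr, 4));    /* fine for a 32-bit access */
    assert(!is_aligned(addr, 8));   /* would fault on a 64-bit access */
}
```

That would fit an 8-byte load or store through a pointer that is only
word-aligned, but that is a guess from the numbers, not a diagnosis.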

Sorry for the real-time debugging, folks ...

	- Greg