Subject: Re: -current on Ultra 5+ - now it's major siop0 lossage
To: None <port-sparc@NetBSD.ORG>
From: Greg Earle <earle@isolar.DynDNS.ORG>
List: port-sparc
Date: 01/25/2001 18:31:16
> Tags were added after 1.5. The problems you're seeing are
> probably not directly due to tags but side effects of code
> changes needed to add tags. For that reason Manuel needs
> to be aware of this so he can fix the problem for 1.6.
Well, if it's any help, the problem's gotten *far* worse since the Jan. 14th
snapshot, from my Ultra 5+'s standpoint.
> NetBSD 1.5 (NETBSD4ME) #0: Thu Jan 25 06:51:41 PST 2001
> root@netbsd4me:/usr/src/1.5/sys/arch/sparc64/compile/NETBSD4ME
> total memory = 128 MB
> avail memory = 86880 KB
> using 4075 buffers containing 32600 KB of memory
> bootpath: /pci@1f,0/pci@1,0/scsi@1,0/disk@0,0
> mainbus0 (root): SUNW,Ultra-5_10
> cpu0 at mainbus0: SUNW,UltraSPARC-IIi @ 333 MHz, version 0 FPU
> cpu0: physical 4K instruction (32 b/l), 4K data (32 b/l), 2048K external (64 b/l)
>
> If this really is an Ultra 5, it has an UltraSPARC-IIi
> and uses the PCI controller on the CPU. That controller does
> not have a streaming cache so disabling it should have no effect.
It really is an Ultra 5+, honest :-)
> psycho0 at mainbus0 addr 0xfffc4000
> sabre: bus range 0 to 2; simba b, PCI bus 1; simba a, PCI bus 2
> DVMA map: c0002000 to ffffe000
> pci0 at psycho0
>
> This is strange. Is it a psycho or a sabre? psycho has a streaming
> cache, but sabre is the on-board controller and does not. You may as
> well try disabling it to see if that's the problem. Go into iommu.c and
> look for `BUS_DMA_COHERENT'. That is the code to turn off the streaming
> cache. Do something like this:
>
> tte = MAKEIOTTE(pa, !(flags&BUS_DMA_NOWRITE), !(flags&BUS_DMA_NOCACHE),
> - !(flags&BUS_DMA_COHERENT));
> + 0);
Made this change; no impact whatsoever. Same problem persists.
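For anyone following along, here is a rough sketch of what that patch does to the mapping entry. The names, flag value and bit positions below (`EX_DMA_COHERENT`, `EX_TTE_STREAM`, `ex_make_tte`, etc.) are made up for illustration and are not the real definitions from iommu.c or the bus_dma headers; the only thing taken from the diff above is that the last MAKEIOTTE argument goes from `!(flags&BUS_DMA_COHERENT)` to a constant 0.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical values, for illustration only. The real flag and TTE bit
 * definitions live in the NetBSD bus_dma and sparc64 IOMMU headers.
 */
#define EX_DMA_COHERENT 0x04u          /* stand-in for BUS_DMA_COHERENT */
#define EX_TTE_VALID    (1ULL << 63)   /* assumed "entry valid" bit */
#define EX_TTE_STREAM   (1ULL << 60)   /* assumed "use streaming cache" bit */

/* Stand-in for MAKEIOTTE: build an entry for an 8K page at pa. */
static uint64_t ex_make_tte(uint64_t pa, int stream)
{
    return EX_TTE_VALID | (stream ? EX_TTE_STREAM : 0) | (pa & ~0x1fffULL);
}

/* Before the patch: streaming is requested unless the caller asked for
 * a coherent mapping. */
uint64_t ex_tte_original(uint64_t pa, unsigned flags)
{
    return ex_make_tte(pa, !(flags & EX_DMA_COHERENT));
}

/* After the patch: the streaming argument is hard-wired to 0, so the
 * flags no longer matter for this bit. */
uint64_t ex_tte_patched(uint64_t pa, unsigned flags)
{
    (void)flags;
    return ex_make_tte(pa, 0);
}

/* Report whether an entry has the (assumed) streaming bit set. */
int ex_is_streaming(uint64_t tte)
{
    return (tte & EX_TTE_STREAM) != 0;
}
```

With these stand-ins, `ex_tte_original(pa, 0)` yields an entry with the streaming bit set, while `ex_tte_patched(pa, 0)` never does, which is the whole point of the experiment.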
>> siop0 at pci1 dev 1 function 0: Symbios Logic 53c875 (ultra-wide scsi)
>> siop0: using on-board RAM
>> siop0: interrupting at vector 16
>> scsibus0 at siop0: 16 targets, 8 luns per target
>> [...]
>> scsibus0: waiting 2 seconds for devices to settle...
>> sd0 at scsibus0 target 0 lun 0: <QUANTUM, XP39100S, LYK8> SCSI2 0/direct fixed
>> siop0: target 0 now synchronous at 20.0Mhz, offset 16
>> sd0: 8682 MB, 5899 cyl, 20 head, 150 sec, 512 bytes/sect x 17781520 sectors
>> root on sd0a dumps on sd0b
1.5Q messages:
scsibus0: waiting 2 seconds for devices to settle...
siop0: alloc newcdb at PHY addr 0xc006c000
siop0: target 0 using tagged queuing
[ARRGH - how do I turn this off???]
sd0 at scsibus0 target 0 lun 0: <QUANTUM, XP39100S, LYK8> SCSI2 0/direct fixed
sd0: 8682 MB, 5899 cyl, 20 head, 150 sec, 512 bytes/sect x 17781520 sectors
DMA IRQ: bus fault dma fifo empty, DSP=0xc0069fec DSA=0xc006df0:
last msg_in=0x0 status=0xff
siop0: unhandled scsi interrupt, sist=0x400 sstat1=0xf DSA=0xc006df00
DSP=0xc0069fec
IPsec: Initialized Security Association Processing.
root device: sd0a
[...]
So the problem happens right away, as soon as the disk is probed/id'd.
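As a quick sanity check, the addresses in those messages can be compared against the "DVMA map: c0002000 to ffffe000" line from the 1.5 boot messages quoted above. This is only a sketch: it assumes the DVMA range is the same under 1.5Q and treats the upper bound as exclusive, neither of which I have confirmed, and the `ex_` names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Range copied from the "DVMA map:" line in the 1.5 boot messages.
 * Assumption: 1.5Q uses the same range. */
#define EX_DVMA_START 0xc0002000u
#define EX_DVMA_END   0xffffe000u   /* treated as exclusive (assumption) */

/* Return nonzero if addr falls inside the assumed DVMA window. */
int ex_in_dvma(uint32_t addr)
{
    return addr >= EX_DVMA_START && addr < EX_DVMA_END;
}
```

By this check the newcdb address 0xc006c000, DSP=0xc0069fec and DSA=0xc006df00 all land inside the window, which would be consistent with them being IOMMU-mapped DVMA addresses rather than raw physical ones.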
> Starting ntpd.
> Updating motd.
> Alignment error: dsfsr=00000000:00800001 dsfar=ffffffff:ffff98c4 \
> isfsr=00000000:00000000 pc=0x10c340
> Starting lpd.
> [...]
>
> And what is the "Alignment error:" from?
>
> Something being started is generating an alignment error. I've
> seen that myself recently, but I have not diagnosed it. It
> probably is some kvm issue since it is kernel-version sensitive.
> Probably `ntp' related.
OK, thanks.
Matt, no, there's no core file generated from this "Alignment error".
Sorry for the real-time debugging, folks ...
- Greg