Subject: Odd data faults on U5 with Promise U66 IDE controller
To: None <port-sparc64@netbsd.org>
From: Rafal Boni <rafal@pobox.com>
List: port-sparc64
Date: 08/27/2003 21:12:32
I've retooled my U5 to be a more useful server box (which is what it has
been doing anyway), and to that end I installed a Promise Ultra/66 IDE
controller with two Seagate 120GB IDE drives hanging off of it, which I
intended to use as a mirror set (I had to rip out the CDROM & floppy to
do this and be able to fit everything in, but those seldom got any use
anyway :-).

The configuration is this (dmesg is at the end of the email for all the gory
detail):
	Original Sun 8GB root disk hanging off the on-board IDE controller
	Two 120GB Seagate disks hanging off the Promise U66 card, one per
	   channel (each a master on its own channel).

Because one of the Seagates died and needed to be replaced, I've got data
on wd1 but nothing yet on wd2.  The plan is to mirror them using RAIDFrame
so that when the next one dies, I don't lose any data :-<

I've set up RAIDFrame using wd2 and wd3 (standing in for what will be wd1
once I can copy data off of it to the mirror set and re-label it as a part
of the RAID set) and now am trying to copy data from wd1 to the new RAID
set (raid0).

Each time I've tried this so far (two or three times), I've gotten an odd
panic from what looks like an asynchronous data error, like so:

    data error type 32 sfsr=0 sfva=778000 afsr=84000000 afva=1fe02000458 tf=0xe0017c30
    data fault: pc=116a808 addr=778000 sfsr=0<ASI=0> 
    kernel trap 32: data access error
    Stopped at      netbsd:pdc202xx_pci_intr+0x24:  subcc     %l3, %o1, %g0

The printed values have been identical each time.

Any ideas on how to debug this?  I'm guessing that this has something to do
with heavier concurrent access to both drives on the Promise controller,
as I've had no problems with a single drive in the box, nor with both
drives present but only one being accessed.  To make sure it's not
dump or restore doing something dumb, I suppose I could dump to /dev/null
and see if that breaks it (but only after I send this email out, since
the U5 is also my mail box :-).  Unfortunately, I don't have enough room
to keep the dump on the boot/root disk, but I could try that anyway and
see how far it gets.
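Another way to poke at the concurrent-access theory, with dump and restore
out of the picture entirely, is to read both Promise-attached drives at the
same time and throw the data away.  A minimal sketch, assuming a threaded
reader is enough to keep both channels busy at once; the device names in the
usage note are assumptions (use whichever raw partition covers the whole
disk on this port):

```python
import threading

# Hypothetical stress test: read several raw disks (or plain files) at the
# same time, discarding the data, so both controller channels do DMA at once.
CHUNK = 64 * 1024  # a multiple of 512, since raw disk reads must be sector-sized


def read_all(path, chunk=CHUNK):
    """Sequentially read `path` to EOF, discarding the data; return bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as dev:
        while True:
            buf = dev.read(chunk)
            if not buf:  # empty read means end of file/device
                break
            total += len(buf)
    return total


def concurrent_read(paths, chunk=CHUNK):
    """Read every path in its own thread; return {path: bytes_read}."""
    results = {}
    lock = threading.Lock()

    def worker(path):
        n = read_all(path, chunk)
        with lock:
            results[path] = n

    threads = [threading.Thread(target=worker, args=(p,)) for p in paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every drive has been read to the end
    return results
```

Run as root with something like
concurrent_read(["/dev/rwd1c", "/dev/rwd2c"]) (hypothetical paths).  If the
same trap shows up under this load but not when reading a single drive, that
points at the controller/driver path rather than at dump or restore.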

I rebuilt the kernel from today's CVS since I needed to add RAIDFrame anyway,
and I had noticed processes never making it off the run queue with the
previous kernel I was running, so I figured it was worth a try...

Thanks!
--rafal

dmesg follows:

NetBSD 1.6X (FEARLESS_VAMPIRE_KILLER) #11: Wed Aug 27 14:53:21 EDT 2003
	rafal@fearless-vampire-killer.waterside.net:/extra/sparc64/obj/sys/arch/sparc64/compile/FEARLESS_VAMPIRE_KILLER
total memory = 128 MB
avail memory = 110 MB
using 832 buffers containing 6656 KB of memory
bootpath: /pci@1f,0/pci@1,1/ide@3,0/disk@0,0
mainbus0 (root): SUNW,Ultra-5_10
cpu0 at mainbus0: SUNW,UltraSPARC-IIi @ 360 MHz, version 0 FPU
cpu0: 32K instruction (32 b/l), 16K data (32 b/l), 256K external (64 b/l)
psycho0 at mainbus0 addr 0xfffc4000
SUNW,sabre: impl 0, version 0: ign 7c0 bus range 0 to 2; PCI bus 0
DVMA map: c0000000 to e0000000
IOTSB: a46000 to ac6000
pci0 at psycho0
pci0: i/o space, memory space enabled
ppb0 at pci0 dev 1 function 1: Sun Microsystems, Inc. Simba PCI bridge (rev. 0x13)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled
ebus0 at pci1 dev 1 function 0
ebus0: Sun Microsystems, Inc. PCIO Ebus2, revision 0x01
auxio0 at ebus0 addr 726000-726003, 728000-728003, 72a000-72a003, 72c000-72c003, 72f000-72f003
power at ebus0 addr 724000-724003 ipl 37 not configured
SUNW,pll at ebus0 addr 504000-504002 not configured
sab0 at ebus0 addr 400000-40007f ipl 43: rev 3.2
sabtty0 at sab0 port 0
sabtty1 at sab0 port 1: console i/o
com0 at ebus0 addr 3083f8-3083ff ipl 41: ns16550a, working fifo       
kbd0 at com0
com1 at ebus0 addr 3062f8-3062ff ipl 42: ns16550a, working fifo       
ms0 at com1
lpt0 at ebus0 addr 3043bc-3043cb, 30015c-30015d, 700000-70000f ipl 34 
fdthree at ebus0 addr 3023f0-3023f7, 706000-70600f, 720000-720003 ipl 39 not configured
clock0 at ebus0 addr 0-1fff: mk48t59: hostid 80d164db
flashprom at ebus0 addr 0-fffff not configured
audiocs0 at ebus0 addr 200000-2000ff, 702000-70200f, 704000-70400f, 722000-722003 ipl 35 ipl 36: CS4231A
audio0 at audiocs0: full duplex
hme0 at pci1 dev 1 function 1: Sun Happy Meal Ethernet, rev. 1
hme0: interrupting at ivec 3021    
hme0: Ethernet address 08:00:20:XX:XX:XX
nsphy0 at hme0 phy 1: DP83840 10/100 media interface, rev. 1
nsphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pciide0 at pci1 dev 3 function 0: CMD Technology PCI0646 (rev. 0x03)  
pciide0: bus-master DMA support present
pciide0: primary channel configured to native-PCI mode
pciide0: using ivec 1820 for native-PCI interrupt
wd0 at pciide0 channel 0 drive 0: <ST38410A>
wd0: drive supports 32-sector PIO transfers, LBA addressing
wd0: 8223 MB, 16708 cyl, 16 head, 63 sec, 512 bytes/sect x 16841664 sectors  
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 4 (Ultra/66)
wd0(pciide0:0:0): using PIO mode 4, DMA mode 2 (using DMA data transfers)
pciide0: secondary channel configured to native-PCI mode
pciide0: disabling secondary channel (no drives)
ppb1 at pci0 dev 1 function 0: Sun Microsystems, Inc. Simba PCI bridge (rev. 0x13)
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled
pciide1 at pci2 dev 1 function 0: Promise Ultra66/ATA Bus Master IDE Accelerator (rev. 0x01)
pciide1: bus-master DMA support present
pciide1: primary channel configured to native-PCI mode
pciide1: using ivec 10 for native-PCI interrupt
wd1 at pciide1 channel 0 drive 0: <ST3120026A>
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 111 GB, 232581 cyl, 16 head, 63 sec, 512 bytes/sect x 234441648 sectors 
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd1(pciide1:0:0): using PIO mode 4, Ultra-DMA mode 4 (Ultra/66) (using DMA data transfers)
pciide1: secondary channel configured to native-PCI mode
wd2 at pciide1 channel 1 drive 0: <ST3120026A>
wd2: drive supports 16-sector PIO transfers, LBA48 addressing
wd2: 111 GB, 232581 cyl, 16 head, 63 sec, 512 bytes/sect x 234441648 sectors
wd2: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd2(pciide1:1:0): using PIO mode 4, Ultra-DMA mode 4 (Ultra/66) (using DMA data transfers)
wm0 at pci2 dev 2 function 0: Intel i82544EI 1000BASE-T Ethernet, rev. 2
wm0: interrupting at ivec 14
wm0: Ethernet address 00:02:b3:YY:YY:YY
makphy0 at wm0 phy 1: Marvell 88E1000 Gigabit PHY, rev. 0
makphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
uhci0 at pci2 dev 3 function 0: VIA Technologies VT83C572 USB Controller (rev. 0x50)
uhci0: interrupting at ivec 18
usb0 at uhci0: USB revision 1.0
uhub0 at usb0
uhub0: VIA Technologies UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1 at pci2 dev 3 function 1: VIA Technologies VT83C572 USB Controller (rev. 0x50)
uhci1: interrupting at ivec 19
usb1 at uhci1: USB revision 1.0
uhub1 at usb1
uhub1: VIA Technologies UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
ehci0 at pci2 dev 3 function 2: VIA Technologies VT8237 EHCI USB Controller (rev. 0x51)
ehci0: EHCI version 0.95
ehci0: companion controllers, 2 ports each: uhci0 uhci1
usb2 at ehci0: USB revision 2.0
uhub2 at usb2
uhub2: VIA Technologie EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub2: 4 ports with 4 removable, self powered
pcons at mainbus0 not configured   
No counter-timer -- using %tick at 360MHz as system clock.
Kernelized RAIDframe activated     
IPsec: Initialized Security Association Processing.
ehci0: handing over low speed device on port 1 to uhci0
uhub2: port 1, device disappeared after reset
ehci0: handing over full speed device on port 2 to uhci0
uhub2: port 2, device disappeared after reset
root on wd0a dumps on wd0b
[...]

(Yes, I know, it hardly looks like a U5 anymore... I should probably use a
 cheap PC for this instead, but the only PC I've got that's more powerful
 is a big honking machine that sounds like a 747, and I hate having it on
 all the time :-)

----
Rafal Boni                                                     rafal@pobox.com
We are all worms.  But I do believe I am a glowworm.  -- Winston Churchill