Subject: U10 stability problems
To: None <port-sparc64@netbsd.org>
From: Tomi Nylund <wizard@in.finland.invalid>
List: port-sparc64
Date: 02/04/2002 12:13:50
Hi all,

I was recently able to acquire a U10 with a broken hme0
interface. The problem manifests itself as "no carrier on
hme0 -link down or cable problem?" so the motherboard circuitry
is probably in good condition; I suspect a cold solder joint or
something similar.

I added a 3Com 3c905b NIC to a PCI slot and installed
a -current snapshot (1.5ZA) on it. After installation,
I left it running overnight, pinging another machine with
ping -i 60 -s 2500, to see whether it would stay up. Well, it didn't.
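
For anyone who wants to repeat the overnight test, here is a minimal sketch that timestamps each line of ping output and appends it to a log, so the time of the last reply before a crash is on record. The function names stamp_lines and soak_ping and the log path are hypothetical; only the ping flags come from the test described above.

```shell
#!/bin/sh
# Hypothetical helper: prefix every line read from stdin with a timestamp.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Hypothetical wrapper around the ping invocation from the post.
# $1 = target host, $2 = log file (both chosen by the caller).
soak_ping() {
    ping -i 60 -s 2500 "$1" 2>&1 | stamp_lines >> "$2"
}

# Example (not run automatically):
#   soak_ping 10.10.10.138 /var/tmp/ping-soak.log
```

Since the machine panics, the last few lines may never reach the disk; running this on the other machine, or capturing the serial console, would probably be more reliable.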

Below is my dmesg and ddb output:


console is /pci@1f,0/pci@1,1/ebus@1/se@14,400000:a
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 1.5ZA (GENERIC) #0: Sun Dec 23 03:12:29 PST 2001
    chs@ultra2:/build/obj/build/src/sys/arch/sparc64/compile/GENERIC
total memory = 512 MB
avail memory = 465 MB
using 3289 buffers containing 26312 KB of memory
bootpath: /pci@1f,0/pci@1,1/ide@3,0/disk@0,0
mainbus0 (root): SUNW,Ultra-5_10
cpu0 at mainbus0: SUNW,UltraSPARC-IIi @ 440 MHz, version 0 FPU
cpu0: physical 4K instruction (32 b/l), 4K data (32 b/l), 2048K external (64 b/l)
psycho0 at mainbus0 addr 0xfffc4000
SUNW,sabre: impl 0, version 0: ign 7c0 bus range 0 to 2; PCI bus 0
DVMA map: c0000000 to e0000000
pci0 at psycho0
pci0: i/o space, memory space enabled
ppb0 at pci0 dev 1 function 0: Sun Microsystems Simba PCI bridge (rev. 0x13)
pci1 at ppb0 bus 2
pci1: i/o space, memory space enabled
ex0 at pci1 dev 1 function 0: 3Com 3c905-TX 10/100 Ethernet (rev. 0x0)
ex0: interrupting at ivec 10
ex0: MAC address 00:60:97:ad:4d:c6
nsphy0 at ex0 phy 24: DP83840 10/100 media interface, rev. 1
nsphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ppb1 at pci0 dev 1 function 1: Sun Microsystems Simba PCI bridge (rev. 0x13)
pci2 at ppb1 bus 1
pci2: i/o space, memory space enabled
ebus0 at pci2 dev 1 function 0
ebus0: Sun Microsystems PCIO Ebus2, revision 0x01
auxio0 at ebus0 addr 726000-726003, 728000-728003, 72a000-72a003, 72c000-72c003, 72f000-72f003
power at ebus0 addr 724000-724003 ipl 37 not configured
SUNW,pll at ebus0 addr 504000-504002 not configured
se at ebus0 addr 400000-40007f ipl 43 not configured
com0 at ebus0 addr 3083f8-3083ff ipl 41: ns16550a, working fifo
kbd0 at com0
com1 at ebus0 addr 3062f8-3062ff ipl 42: ns16550a, working fifo
ms0 at com1
lpt0 at ebus0 addr 3043bc-3043cb, 30015c-30015d, 700000-70000f ipl 34
fdthree at ebus0 addr 3023f0-3023f7, 706000-70600f, 720000-720003 ipl 39 not configured
clock0 at ebus0 addr 0-1fff: mk48t59: hostid 80b580a8
flashprom at ebus0 addr 0-fffff not configured
SUNW,CS4231 at ebus0 addr 200000-2000ff, 702000-70200f, 704000-70400f, 722000-722003 ipl 35 ipl 36 not configured
hme0 at pci2 dev 1 function 1: Sun Happy Meal Ethernet, rev. 1
hme0: interrupting at ivec 3021
hme0: Ethernet address 08:00:20:b5:80:a8
nsphy1 at hme0 phy 1: DP83840 10/100 media interface, rev. 1
nsphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ATI Technologies Mach64 B (VGA display, revision 0x5c) at pci2 dev 2 function 0 not configured
pciide0 at pci2 dev 3 function 0: CMD Technology PCI0646 (rev. 0x03)
pciide0: bus-master DMA support present
pciide0: primary channel configured to native-PCI mode
pciide0: using ivec 1820 for native-PCI interrupt
wd0 at pciide0 channel 0 drive 0: <ST39140A>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 8693 MB, 17662 cyl, 16 head, 63 sec, 512 bytes/sect x 17803440 sectors
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
wd0(pciide0:0:0): using PIO mode 4, DMA mode 2 (using DMA data transfers)
pciide0: secondary channel configured to native-PCI mode
atapibus0 at pciide0 channel 1: 2 targets
cd0 at atapibus0 drive 0: <CRD-8322B, 1998/09/24, 1.05> type 5 cdrom removable
cd0: drive supports PIO mode 4, DMA mode 2
cd0(pciide0:1:0): using PIO mode 4, DMA mode 2 (using DMA data transfers)
pcons0 at mainbus0
No counter-timer -- using %tick at 440MHz as system clock.
Kernelized RAIDframe activated
root on wd0a dumps on wd0b
root file system type: ffs

Sometime during the night, it barfed:

panic: psycho0: uncorrectable DMA error AFAR 2524608 pa ffffffffffffffff
AFSR 4100ff0020800000:
4100ff0020800000<BLK,P_DTE,P_DRD>
kdb breakpoint at 132f8f0
Stopped at      cpu_Debugger+0x4:       nop
db>


Another one today:


# ping -s 2500 -i 60 10.10.10.138
PING 10.10.10.138 (10.10.10.138): 2500 data bytes
panic: psycho0: uncorrectable DMA error AFAR 2524608
pa ffffffffffffffff AFSR 4100ff0020800000:
4100ff0020800000<BLK,P_DTE,P_DRD>
kdb breakpoint at 132f8f0
Stopped at      cpu_Debugger+0x4:       nop
db> 
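
To check whether repeated panics hit the same address, the fields can be pulled out of a saved console capture. This is only a sketch that assumes the line layout shown in the two panics above; panic_afar and panic_flags are hypothetical names, and the flag names are copied as printed rather than decoded from the register value.

```shell
#!/bin/sh
# Print the AFAR value from every panic line in the capture read on stdin.
panic_afar() {
    sed -n 's/.*uncorrectable DMA error AFAR \([0-9a-fA-F]*\).*/\1/p'
}

# Print the flag list from lines such as
# "4100ff0020800000<BLK,P_DTE,P_DRD>", one list per matching line.
panic_flags() {
    sed -n 's/^[0-9a-fA-F]\{16\}<\(.*\)>[[:space:]]*$/\1/p'
}

# Example (console.log is a hypothetical capture file):
#   panic_afar < console.log | sort | uniq -c
```

Both panics quoted here report the same AFAR and the same flags, which is the kind of pattern this would make visible across a longer log.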

Is this somehow related to the UDMA problems I saw
mentioned in the mailing list archives, or to the
broken hme0 on this motherboard? I can provide stack
traces, etc., if necessary.


Tomi

---
PGP key at http://www.ee.oulu.fi/~wizard
---
	---------------------------------
	+          Tomi Nylund          +
	+                               +
	+  Grad. Student of Electronic  +
	+          Engineering          +
	---------------------------------
---