Subject: 2 x hang with RC4
To: None <current-users@netbsd.org>
From: Paul Ripke <stix@stix.homeunix.net>
List: current-users
Date: 11/02/2004 19:05:58
--bp/iNruPH9dso1Pn
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Yesterday I experienced one very bizarre hang with 2.0 RC4. This is x86, a P4
with HT+ACPI, which had been running quite stably for some time. At the time,
it was running 3 NetBSD builds and bacula full backups (using postgresql).
I arrived home to find the console (VGA) dead to the world (not even ddb),
the TX light on the fxp0 ethernet stuck hard on, and the AIT SCSI drive's
busy light madly flashing (with no tape movement).

Plugging in an ancient laptop via a crossover cable, I captured the following
packets being transmitted continually:

20:25:33.628113 0:50:8b:5d:d3:37 1:80:c2:0:0:1 8808 60: 
                         0001 011f 0000 0000 0000 0000 0000 0000
                         0000 0000 0000 0000 0000 0000 0000 0000
                         0000 0000 0000 0000 0000 0000 0000
20:25:33.629368 0:50:8b:5d:d3:37 1:80:c2:0:0:1 8808 60: 
                         0001 011f 0000 0000 0000 0000 0000 0000
                         0000 0000 0000 0000 0000 0000 0000 0000
                         0000 0000 0000 0000 0000 0000 0000
20:25:33.630683 0:50:8b:5d:d3:37 1:80:c2:0:0:1 8808 60: 
                         0001 011f 0000 0000 0000 0000 0000 0000
                         0000 0000 0000 0000 0000 0000 0000 0000
                         0000 0000 0000 0000 0000 0000 0000

The source MAC address is right (it is fxp0's), but I have no idea about the
destination or the packets themselves. I power cycled the devices on the SCSI
bus, with no change.
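
One possible reading of the capture above is IEEE 802.3x flow control:
EtherType 0x8808 is MAC Control, 01:80:c2:00:00:01 is the reserved multicast
address used for PAUSE frames, and opcode 0x0001 is PAUSE, followed by a
16-bit pause time counted in quanta of 512 bit times. The Python sketch below
decodes the first captured frame on that basis. The helper name
decode_mac_control is hypothetical, and the 100 Mb/s link speed is an
assumption (the negotiated speed of fxp0 is not stated in this mail).

```python
import struct

# Constants from IEEE 802.3x (MAC Control / PAUSE).
PAUSE_DEST = bytes.fromhex("0180c2000001")  # reserved multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
BITS_PER_QUANTUM = 512                      # one pause quantum = 512 bit times


def decode_mac_control(dest_mac, ethertype, payload_hex, link_bps=100_000_000):
    """Decode a MAC Control payload as printed by tcpdump (hypothetical helper).

    link_bps is an ASSUMPTION: 100 Mb/s is used here only for illustration.
    """
    # tcpdump prints MAC bytes without leading zeros, e.g. "1:80:c2:0:0:1".
    dest = bytes(int(part, 16) for part in dest_mac.split(":"))
    # Strip the whitespace between the 16-bit groups of the hex dump.
    payload = bytes.fromhex("".join(payload_hex.split()))
    # First two big-endian 16-bit words: opcode, then pause time in quanta.
    opcode, quanta = struct.unpack("!HH", payload[:4])
    is_pause = (
        dest == PAUSE_DEST
        and ethertype == MAC_CONTROL_ETHERTYPE
        and opcode == PAUSE_OPCODE
    )
    pause_us = quanta * BITS_PER_QUANTUM * 1e6 / link_bps
    return {
        "is_pause": is_pause,
        "opcode": opcode,
        "quanta": quanta,
        "pause_us": pause_us,
    }


# Values copied from the first frame in the capture above.
frame = decode_mac_control("1:80:c2:0:0:1", 0x8808, "0001 011f 0000 0000")
print(frame)
```

Under these assumptions the pause time 0x011f is 287 quanta, or about 1.47 ms
at 100 Mb/s. That is slightly longer than the roughly 1.3 ms between the
captured frames, which might be consistent with the sender continually
renewing a pause request before the previous one expires.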

Anyone ever seen anything vaguely like this before?

Next, I hit the reset button. Due to what I believe is a BIOS bug, the next 2
boots failed to enable the second processor, so I booted GENERIC just to
make sure things looked ok, and built a new kernel.  This is where I
tripped over the second problem: the system hung on reboot after
"syncing disks... done". The ddb trace was:

main...
uvm_scheduler(c07c9580,0,c07d60fc,c06cd258,0) at netbsd:uvm_scheduler+0x74
ltsleep(c07c9740,4,c06e8427,0,0) at netbsd:ltsleep+0x323
...

which I assume is the normal idle loop, and not that useful...

I noticed that the reboot process was sitting in the vnlock waitchannel,
but since I haven't set up partitions for "dumps on" yet (running raidframe
RAID1 root), I couldn't get a crash dump. I'll grab one next time. I'll
also print out ddb(4) so I have something to refer to :)
On reboot, the system did a full fsck of all filesystems and went through
the normal 3-hour raidframe parity scan. So, if this happens again,
apart from grabbing a dump, what else can I look at?

cheers,
-- 
stix

--bp/iNruPH9dso1Pn
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="dmesg.boot"

NetBSD 2.0_RC4 (ZION) #0: Tue Nov  2 07:44:13 EST 2004
	stix@zion.stix.org.au:/export/netbsd/netbsd-2-0/obj.i386/export/netbsd/netbsd-2-0/src/sys/arch/i386/compile/ZION
total memory = 1023 MB
avail memory = 997 MB
BIOS32 rev. 0 found at 0xf0010
mainbus0 (root)
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel Pentium 4 (686-class), 2798.78 MHz, id 0xf25
cpu0: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: I-cache 12K uOp cache 8-way, D-cache 8 KB 64B/line 4-way
cpu0: L2 cache 512 KB 64B/line 8-way
cpu0: ITLB 4K/4M: 64 entries
cpu0: DTLB 4K/4M: 64 entries
cpu0: calibrating local timer
cpu0: apic clock running at 199 MHz
cpu0: 16 page colors
cpu1 at mainbus0: apid 1 (application processor)
cpu1: starting
cpu1: Intel Pentium 4 (686-class), 2798.66 MHz, id 0xf25
cpu1: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu1: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu1: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu1: I-cache 12K uOp cache 8-way, D-cache 8 KB 64B/line 4-way
cpu1: L2 cache 512 KB 64B/line 8-way
cpu1: ITLB 4K/4M: 64 entries
cpu1: DTLB 4K/4M: 64 entries
ioapic0 at mainbus0 apid 2 (I/O APIC)
ioapic0: pa 0xfec00000, version 20, 24 pins
ioapic0: misconfigured as apic 13
ioapic0: remapped to apic 2
acpi0 at mainbus0
acpi0: using Intel ACPI CA subsystem version 20040211
acpi0: X/RSDT: OemId <A M I ,OEMRSDT ,02000425>, AslId <MSFT,00000097>
acpi0: SCI interrupting at int 9
acpi0: fixed-feature power button present
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
PNP0A03 [PCI Bus] at acpi0 not configured
PNP0000 [AT Interrupt Controller] at acpi0 not configured
PNP0200 [AT DMA Controller] at acpi0 not configured
PNP0100 [AT Timer] at acpi0 not configured
PNP0B00 [AT Real-Time Clock] at acpi0 not configured
pckbc0 at acpi0 (PNP0303): kbd port
pckbc0: io 0x60,0x64 irq 1
PNP0800 [AT-style speaker sound] at acpi0 not configured
npx0 at acpi0 (PNP0C04)
npx0: io 0xf0-0xff irq 13
npx0: using exception 16
com0 at acpi0 (PNP0501-1)
com0: io 0x3f8-0x3ff irq 4
com0: ns16550a, working fifo
com1 at acpi0 (PNP0501-2)
com1: io 0x2f8-0x2ff irq 3
com1: ns16550a, working fifo
fdc0 at acpi0 (PNP0700)
fdc0: io 0x3f0-0x3f5,0x3f7 irq 6 drq 2
fdc0: expected BUFFER, got 4
lpt0 at acpi0 (PNP0401-1)
lpt0: io 0x378-0x37f,0x778-0x77b irq 7 drq 3
PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
PNP0C01 [System Board] at acpi0 not configured
acpibut0 at acpi0 (PNP0C0C-170): ACPI Power Button
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: Intel 82865 Host (rev. 0x02)
pchb0: random number generator enabled
agp0 at pchb0: aperture at 0xf8000000, size 0x4000000
ppb0 at pci0 dev 1 function 0: Intel 82865 AGP (rev. 0x02)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled
vga0 at pci1 dev 0 function 0: Nvidia Corporation RIVA TNT2 Model 64 (rev. 0x15)
wsdisplay0 at vga0 kbdmux 1: console (80x25, vt100 emulation), using wskbd0
wsmux1: connecting to wsdisplay0
uhci0 at pci0 dev 29 function 0: Intel 82801EB/ER USB UHCI Controller #0 (rev. 0x02)
uhci0: interrupting at ioapic0 pin 16 (irq 10)
usb0 at uhci0: USB revision 1.0
uhub0 at usb0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1 at pci0 dev 29 function 1: Intel 82801EB/ER USB UHCI Controller #1 (rev. 0x02)
uhci1: interrupting at ioapic0 pin 19 (irq 5)
usb1 at uhci1: USB revision 1.0
uhub1 at usb1
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2 at pci0 dev 29 function 2: Intel 82801EB/ER USB UHCI Controller #2 (rev. 0x02)
uhci2: interrupting at ioapic0 pin 18 (irq 5)
usb2 at uhci2: USB revision 1.0
uhub2 at usb2
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3 at pci0 dev 29 function 3: Intel 82801EB/ER USB UHCI Controller #3 (rev. 0x02)
uhci3: interrupting at ioapic0 pin 16 (irq 10)
usb3 at uhci3: USB revision 1.0
uhub3 at usb3
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
ehci0 at pci0 dev 29 function 7: Intel 82801EB/ER USB EHCI Controller (rev. 0x02)
ehci0: interrupting at ioapic0 pin 23 (irq 11)
ehci0: EHCI version 1.0
ehci0: companion controllers, 2 ports each: uhci0 uhci1 uhci2 uhci3
usb4 at ehci0: USB revision 2.0
uhub4 at usb4
uhub4: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub4: 8 ports with 8 removable, self powered
ppb1 at pci0 dev 30 function 0: Intel 82801BA Hub-to-PCI Bridge (rev. 0xc2)
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled
fwohci0 at pci2 dev 3 function 0: VIA Technologies VT3606 OHCI IEEE 1394 Controller (rev. 0x80)
fwohci0: interrupting at ioapic0 pin 20 (irq 11)
fwohci0: OHCI 1.0, 00:e0:18:00:00:73:25:3d, 400Mb/s, 2048 max_rec, 4 ir_ctx, 8 it_ctx
skc0 at pci2 dev 5 function 0: ioapic0 pin 22 (irq 11)
skc0: Yukon Gigabit Ethernet 10/100/1000Base-T Adapter
sk0 at skc0 port A: Ethernet address 00:0e:a6:bb:ac:b4
makphy0 at sk0 phy 0: Marvell 88E1011 Gigabit PHY, rev. 5
makphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
fxp0 at pci2 dev 9 function 0: i82558 Ethernet, rev 5
fxp0: interrupting at ioapic0 pin 21 (irq 5)
fxp0: Ethernet address 00:50:8b:5d:d3:37
inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
wi0 at pci2 dev 10 function 0: Intersil Prism2.5 Wireless Lan
wi0: interrupting at ioapic0 pin 22 (irq 11)
wi0: 802.11 address 00:05:5d:5b:c5:f5
wi0: using RF:PRISM2.5 MAC:ISL3874A(Mini-PCI)
wi0: Intersil Firmware: Primary (1.0.5), Station (1.3.4)
wi0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
siop0 at pci2 dev 11 function 0: Symbios Logic 53c810a (fast scsi)
siop0: interrupting at ioapic0 pin 23 (irq 11)
scsibus0 at siop0: 8 targets, 8 luns per target
cmdide0 at pci2 dev 13 function 0
cmdide0: Silicon Image 0680 (rev. 0x02)
cmdide0: bus-master DMA support present
cmdide0: primary channel configured to native-PCI mode
cmdide0: using ioapic0 pin 21 (irq 5) for native-PCI interrupt
atabus0 at cmdide0 channel 0
cmdide0: secondary channel configured to native-PCI mode
atabus1 at cmdide0 channel 1
pcib0 at pci0 dev 31 function 0
pcib0: Intel 82801EB LPC Interface Bridge (rev. 0x02)
piixide0 at pci0 dev 31 function 1
piixide0: Intel 82801EB IDE Controller (ICH5) (rev. 0x02)
piixide0: bus-master DMA support present
piixide0: primary channel wired to compatibility mode
piixide0: primary channel interrupting at ioapic0 pin 14 (irq 14)
atabus2 at piixide0 channel 0
piixide0: secondary channel wired to compatibility mode
piixide0: secondary channel interrupting at ioapic0 pin 15 (irq 15)
atabus3 at piixide0 channel 1
Intel 82801EB/ER SMBus Controller (SMBus serial bus, revision 0x02) at pci0 dev 31 function 3 not configured
auich0 at pci0 dev 31 function 5: i82801EB (ICH5) AC-97 Audio
auich0: interrupting at ioapic0 pin 17 (irq 5)
auich0: ac97: Avance Logic ALC850 codec; no 3D stereo
auich0: ac97: ext id 9c6<AC97_23,LDAC,SDAC,CDAC,SPDIF,DRA>
isa0 at pcib0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
sysbeep0 at pcppi0
ioapic0: enabling
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
audio0 at auich0: full duplex, independent
Kernelized RAIDframe activated
IPsec: Initialized Security Association Processing.
fw0 at fwohci0: 00:e0:18:00:00:73:25:3d:0a:02:ff:ff:f0:01:00:00
scsibus0: waiting 2 seconds for devices to settle...
cd0 at scsibus0 target 0 lun 0: <RICOH, CD-R/RW MP7060S, 1.70> cdrom removable
cd0: async, 8-bit transfers
st0 at scsibus0 target 1 lun 0: <SONY, SDT-5000, 3.26> tape removable
st0: drive empty
st0: async, 8-bit transfers
sd0 at scsibus0 target 2 lun 0: <SyQuest, SQ5200C, 3CE4> disk removable
sd0: drive offline
sd0: sync (200.00ns offset 8), 8-bit (5.000MB/s) transfers
cd1 at scsibus0 target 4 lun 0: <PLEXTOR, CD-ROM PX-32TS, 1.03> cdrom removable
cd1: sync (100.00ns offset 8), 8-bit (10.000MB/s) transfers
ulpt0 at uhub3 port 2 configuration 1 interface 0
ulpt0: Canon BJC-2100SP, rev 1.00/1.02, addr 2, iclass 7/1
ulpt0: using bi-directional mode
wd0 at atabus0 drive 0: <ST3120026A>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 111 GB, 232581 cyl, 16 head, 63 sec, 512 bytes/sect x 234441648 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd0(cmdide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA data transfers)
wd1 at atabus1 drive 0: <ST3120026A>
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 111 GB, 232581 cyl, 16 head, 63 sec, 512 bytes/sect x 234441648 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd1(cmdide0:1:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA data transfers)
atapibus0 at atabus2: 2 targets
cd2 at atapibus0 drive 1: <CD-ROM 48X/AKU, , T3A> cdrom removable
cd2: 32-bit data port
cd2: drive supports PIO mode 4, DMA mode 2
wd2 at atabus2 drive 0: <ST340014A>
wd2: drive supports 16-sector PIO transfers, LBA48 addressing
wd2: 38166 MB, 77545 cyl, 16 head, 63 sec, 512 bytes/sect x 78165360 sectors
wd2: 32-bit data port
wd2: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd2(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA data transfers)
cd2(piixide0:0:1): using PIO mode 4, DMA mode 2 (using DMA data transfers)
wd3 at atabus3 drive 0: <ST340014A>
wd3: drive supports 16-sector PIO transfers, LBA48 addressing
wd3: 38166 MB, 77545 cyl, 16 head, 63 sec, 512 bytes/sect x 78165360 sectors
wd3: 32-bit data port
wd3: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd4 at atabus3 drive 1: <ST3120026A>
wd4: drive supports 16-sector PIO transfers, LBA48 addressing
wd4: 111 GB, 232581 cyl, 16 head, 63 sec, 512 bytes/sect x 234441648 sectors
wd4: 32-bit data port
wd4: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd3(piixide0:1:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA data transfers)
wd4(piixide0:1:1): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA data transfers)
st1 at scsibus0 target 5 lun 0: <SONY, SDX-300C, 04c2> tape removable
st1: drive empty
st1: sync (100.00ns offset 8), 8-bit (10.000MB/s) transfers
sd0(siop0:0:2:0):  Check Condition on CDB: 0x00 00 00 00 00 00
    SENSE KEY:  Not Ready
     ASC/ASCQ:  Medium Not Present

raid0: RAID Level 1
raid0: Components: /dev/wd2a /dev/wd3a
raid0: Total Sectors: 78165216 (38166 MB)
raid1: RAID Level 5
raid1: Components: /dev/wd0a /dev/wd1a /dev/wd4a
raid1: Total Sectors: 468883168 (228946 MB)
sd0(siop0:0:2:0):  Check Condition on CDB: 0x00 00 00 00 00 00
    SENSE KEY:  Not Ready
     ASC/ASCQ:  Medium Not Present

boot device: raid0
root on raid0a dumps on raid0b
root file system type: ffs
cpu1: CPU 1 running
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x50, vt100 emulation)
wsdisplay0: screen 3 added (80x50, vt100 emulation)
wsdisplay0: screen 5 added (80x25, vt100 emulation)
wsdisplay0: screen 6 added (80x25, vt100 emulation)
wsdisplay0: screen 7 added (80x25, vt100 emulation)

--bp/iNruPH9dso1Pn--