Subject: port-amd64/24908: amd64 spontaneously reboots under heavy disk i/o
To: None <gnats-bugs@gnats.NetBSD.org>
From: None <blymn@baea.com.au>
List: netbsd-bugs
Date: 03/25/2004 23:26:58
>Number:         24908
>Category:       port-amd64
>Synopsis:       performing a disk write intensive job causes a reboot
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-amd64-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Mar 25 12:58:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator:     Brett Lymn (Master of the Siren)
>Release:        NetBSD 1.6ZK (March 23 2004, updated about 2200 (GMT+10.5))
>Organization:
Brett Lymn
>Environment:
	
	
System: NetBSD siren 1.6ZK NetBSD 1.6ZK (SIREN.MP) #0: Tue Mar 23 22:40:26 CST 2004 root@:/usr/src/sys/arch/amd64/compile/SIREN.MP amd64
Architecture: x86_64
Machine: amd64
>Description:
	It appears that when I attempt a task that requires a high level
of disk writes, the machine spontaneously reboots.  I have only managed to
be watching the console once when this happened, and I saw no console
messages.  I can produce this reboot reliably in two ways:

1) build.sh -j 6 build causes a reboot during the cleandir phase.
2) Trying to install mrtg: one of the dependencies (freetype2, I think)
   causes the reboot during its install phase.

Attached is the dmesg of the machine.  Note that the disk used in this case
is the SATA drive (wd0) connected to the Silicon Image 3114 SATA controller
(satalink0).  Also, the machine is stable at lower loads (e.g. build.sh -j 2
build seems to run fine).  The reboot occurs on both SMP and UP kernels.


NetBSD 1.6ZK (SIREN.MP) #0: Tue Mar 23 22:40:26 CST 2004
	root@:/usr/src/sys/arch/amd64/compile/SIREN.MP
total memory = 1503 MB
avail memory = 1407 MB
mainbus0 (root)
mainbus0: scanning 0x9f400 to 0x9f7f0 for MP signature
mainbus0: scanning 0x9f000 to 0x9f3f0 for MP signature
mainbus0: scanning 0xf0000 to 0xffff0 for MP signature
mainbus0: MP floating pointer found in bios at 0xff780
mainbus0: MP config table at 0xf9b80, 380 bytes long
mainbus0: Intel MP Specification (Version 1.4) (TYAN     S2885       )
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Opteron(tm) Processor 242, 1594.20 MHz
cpu0: features: e7dbfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features: e7dbfbff<PGE,MCA,CMOV,PAT,PSE36,MPC,NOX,MMXX,MMX>
cpu0: features: e7dbfbff<FXSR,SSE,SSE2,LONG,3DNOW2,3DNOW>
cpu0: I-cache 64 KB 64b/line 2-way, D-cache 64 KB 64b/line 2-way
cpu0: L2 cache 1 MB 64b/line 16-way
cpu0: ITLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
cpu0: DTLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
cpu0: calibrating local timer
cpu0: apic clock running at 199 MHz
cpu0: 16 page colors
cpu0: kstack at 0xffff80000f820000 for 20480 bytes
cpu0: idle pcb at 0xffff80000f820000, idle sp at 0xffff80000f824ff0
cpu1 at mainbus0: apid 1 (application processor)
cpu1: starting
cpu1: AMD Opteron(tm) Processor 242, 1594.10 MHz
cpu1: features: e7dbfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu1: features: e7dbfbff<PGE,MCA,CMOV,PAT,PSE36,MPC,NOX,MMXX,MMX>
cpu1: features: e7dbfbff<FXSR,SSE,SSE2,LONG,3DNOW2,3DNOW>
cpu1: I-cache 64 KB 64b/line 2-way, D-cache 64 KB 64b/line 2-way
cpu1: L2 cache 1 MB 64b/line 16-way
cpu1: ITLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
cpu1: DTLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
cpu1: kstack at 0xffff80000f825000 for 20480 bytes
cpu1: idle pcb at 0xffff80000f825000, idle sp at 0xffff80000f829ff0
mpbios: bus 0 is type PCI   
mpbios: bus 1 is type PCI   
mpbios: bus 2 is type PCI   
mpbios: bus 3 is type PCI   
mpbios: bus 4 is type PCI   
mpbios: bus 5 is type PCI   
mpbios: bus 6 is type PCI   
mpbios: bus 7 is type ISA   
ioapic0 at mainbus0 apid 2 (I/O APIC)
ioapic0: pa 0xfec00000, virtual wire mode, version 11, 24 pins
ioapic1 at mainbus0 apid 3 (I/O APIC)
ioapic1: pa 0xff4fe000, virtual wire mode, version 11, 4 pins
ioapic2 at mainbus0 apid 4 (I/O APIC)
ioapic2: pa 0xff4ff000, virtual wire mode, version 11, 4 pins
ioapic0: int0 attached to ExtINT (type 3<type=3=ExtINT> flags 5<pol=1=Act Hi,trig=1=Edge>)
ioapic0: int1 attached to isa0 irq 1 (type 0<type=0> flags 5<pol=1=Act Hi,trig=1=Edge>)
ioapic0: int2 attached to isa0 irq 0 (type 0<type=0> flags 5<pol=1=Act Hi,trig=1=Edge>)
ioapic0: int3 attached to isa0 irq 3 (type 0<type=0> flags 5<pol=1=Act Hi,trig=1=Edge>)
ioapic0: int4 attached to isa0 irq 4 (type 0<type=0> flags 5<pol=1=Act Hi,trig=1=Edge>)
ioapic0: int6 attached to isa0 irq 6 (type 0<type=0> flags 5<pol=1=Act Hi,trig=1=Edge>)
ioapic0: int7 attached to isa0 irq 7 (type 0<type=0> flags 5<pol=1=Act Hi,trig=1=Edge>)
ioapic0: int8 attached to isa0 irq 8 (type 0<type=0> flags 5<pol=1=Act Hi,trig=1=Edge>)
ioapic0: int12 attached to isa0 irq 12 (type 0<type=0> flags 5<pol=1=Act Hi,trig=1=Edge>)
ioapic0: int13 attached to isa0 irq 13 (type 0<type=0> flags 5<pol=1=Act Hi,trig=1=Edge>)
ioapic0: int14 attached to isa0 irq 14 (type 0<type=0> flags 5<pol=1=Act Hi,trig=1=Edge>)
ioapic0: int15 attached to isa0 irq 15 (type 0<type=0> flags 5<pol=1=Act Hi,trig=1=Edge>)
ioapic0: int19 attached to pci0 device 7 INT_D (type 0<type=0> flags f<pol=3=Act Lo,trig=3=Level>)
ioapic0: int17 attached to pci0 device 7 INT_B (type 0<type=0> flags f<pol=3=Act Lo,trig=3=Level>)
ioapic0: int19 attached to pci4 device 0 INT_D (type 0<type=0> flags f<pol=3=Act Lo,trig=3=Level>)
ioapic0: int16 attached to pci6 device 0 INT_A (type 0<type=0> flags f<pol=3=Act Lo,trig=3=Level>)
ioapic0: int16 attached to pci4 device 10 INT_A (type 0<type=0> flags f<pol=3=Act Lo,trig=3=Level>)
ioapic0: int17 attached to pci4 device 10 INT_B (type 0<type=0> flags f<pol=3=Act Lo,trig=3=Level>)
ioapic0: int17 attached to pci4 device 11 INT_A (type 0<type=0> flags f<pol=3=Act Lo,trig=3=Level>)
ioapic0: int19 attached to pci4 device 12 INT_A (type 0<type=0> flags f<pol=3=Act Lo,trig=3=Level>)
ioapic1: int3 attached to pci2 device 8 INT_A (type 0<type=0> flags f<pol=3=Act Lo,trig=3=Level>)
ioapic1: int2 attached to pci3 device 4 INT_A (type 0<type=0> flags f<pol=3=Act Lo,trig=3=Level>)
ioapic1: int0 attached to pci3 device 5 INT_A (type 0<type=0> flags f<pol=3=Act Lo,trig=3=Level>)
ioapic1: int0 attached to pci2 device 9 INT_A (type 0<type=0> flags f<pol=3=Act Lo,trig=3=Level>)
local apic: int0 attached to ExtINT (type 3<type=3=ExtINT> flags 5<pol=1=Act Hi,trig=1=Edge>)
local apic: int1 attached to NMI (type 1<type=1=NMI> flags 5<pol=1=Act Hi,trig=1=Edge>)
mainbus0: MP WARNING: 220 bytes of extended entries not examined
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
ppb0 at pci0 dev 6 function 0: Advanced Micro Devices AMD8111 I/O Hub (rev. 0x07)
pci1 at ppb0 bus 4
pci1: i/o space, memory space enabled
ohci0 at pci1 dev 0 function 0: Advanced Micro Devices AMD8111 USB Host Controller (rev. 0x0b)
ohci0: interrupting at ioapic0 pin 19 (irq 9)
ohci0: OHCI version 1.0, legacy support
usb0 at ohci0: USB revision 1.0
uhub0 at usb0
uhub0: Advanced Micro OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 3 ports with 3 removable, self powered
ohci1 at pci1 dev 0 function 1: Advanced Micro Devices AMD8111 USB Host Controller (rev. 0x0b)
ohci1: interrupting at ioapic0 pin 19 (irq 9)
ohci1: OHCI version 1.0, legacy support
usb1 at ohci1: USB revision 1.0
uhub1 at usb1
uhub1: Advanced Micro OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 3 ports with 3 removable, self powered
Ricoh 5C476 PCI-CardBus bridge (CardBus bridge, revision 0x80) at pci1 dev 10 function 0 not configured
Ricoh 5C476 PCI-CardBus bridge (CardBus bridge, revision 0x80) at pci1 dev 10 function 1 not configured
satalink0 at pci1 dev 11 function 0
satalink0: Silicon Image SATALink 3114 (rev. 0x02)
satalink0: 33MHz PCI bus
satalink0: bus-master DMA support present
satalink0: using ioapic0 pin 17 (irq 10) for native-PCI interrupt
atabus0 at satalink0 channel 0
atabus1 at satalink0 channel 1
atabus2 at satalink0 channel 2
atabus3 at satalink0 channel 3
Texas Instruments TSB43AA22/A OHCI IEEE 1394 Host Controller (Firewire serial bus, interface 0x10) at pci1 dev 12 function 0 not configured
pcib0 at pci0 dev 7 function 0
pcib0: Advanced Micro Devices AMD8111 LPC Controller (rev. 0x05)
viaide0 at pci0 dev 7 function 1
viaide0: Advanced Micro Devices AMD8111 IDE Controller (rev. 0x03)
viaide0: bus-master DMA support present
viaide0: primary channel configured to compatibility mode
viaide0: primary channel interrupting at ioapic0 pin 14 (irq 14)
atabus4 at viaide0 channel 0
viaide0: secondary channel configured to compatibility mode
viaide0: secondary channel interrupting at ioapic0 pin 15 (irq 15)
atabus5 at viaide0 channel 1
Advanced Micro Devices AMD8111 SMBus Controller (SMBus serial bus, revision 0x02) at pci0 dev 7 function 2 not configured
Advanced Micro Devices AMD8111 ACPI Controller (miscellaneous bridge, revision 0x05) at pci0 dev 7 function 3 not configured
auich0 at pci0 dev 7 function 5: AMD8111 AC-97 Audio
auich0: interrupting at ioapic0 pin 17 (irq 10)
auich0: ac97: Analog Devices AD1981B codec; headphone, 20 bit DAC, no 3D stereo
auich0: ac97: ext id 605<AC97_22,AMAP,SPDIF,VRA>
ppb1 at pci0 dev 10 function 0: Advanced Micro Devices PCI-X Tunnel (rev. 0x12)
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled
intel NIC match hack
ppb2 at pci2 dev 7 function 0: Intel S21152BA,S21154AE/BE PCI to PCI Bridge (rev. 0x00)
pci3 at ppb2 bus 3
pci3: i/o space, memory space enabled
fxp0 at pci3 dev 4 function 0: i82550 Ethernet, rev 13
fxp0: interrupting at ioapic1 pin 2 (irq 5)
fxp0: Ethernet address 00:02:b3:a6:16:a3
inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp1 at pci3 dev 5 function 0: i82550 Ethernet, rev 13
fxp1: interrupting at ioapic1 pin 0 (irq 9)
fxp1: Ethernet address 00:02:b3:a6:16:a4
inphy1 at fxp1 phy 1: i82555 10/100 media interface, rev. 4
inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ahc0 at pci2 dev 8 function 0: Adaptec 29160 Ultra160 SCSI adapter
ahc0: interrupting at ioapic1 pin 3 (irq 9)
ahc0: aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
scsibus0 at ahc0: 16 targets, 8 luns per target
bge0 at pci2 dev 9 function 0: Broadcom BCM5703X Gigabit Ethernet
bge0: interrupting at ioapic1 pin 0 (irq 11)
bge0: ASIC BCM5703 A2 (0x1002), Ethernet address 00:e0:81:27:a1:e6
brgphy0 at bge0 phy 1: BCM5703 1000BASE-T media interface, rev. 2
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
aapic0 at pci0 dev 10 function 1: Advanced Micro Devices IO Apic (rev. 0x01)
ppb3 at pci0 dev 11 function 0: Advanced Micro Devices PCI-X Tunnel (rev. 0x12)
pci4 at ppb3 bus 1
pci4: memory space enabled
aapic1 at pci0 dev 11 function 1: Advanced Micro Devices IO Apic (rev. 0x01)
pchb0 at pci0 dev 24 function 0
pchb0: Advanced Micro Devices AMD64 HyperTransport configuration (rev. 0x00)
pchb1 at pci0 dev 24 function 1
pchb1: Advanced Micro Devices AMD64 Address Map configuration (rev. 0x00)
pchb2 at pci0 dev 24 function 2
pchb2: Advanced Micro Devices AMD64 DRAM configuration (rev. 0x00)
pchb3 at pci0 dev 24 function 3
pchb3: Advanced Micro Devices AMD64 Miscellaneous configuration (rev. 0x00)
pchb4 at pci0 dev 25 function 0
pchb4: Advanced Micro Devices AMD64 HyperTransport configuration (rev. 0x00)
pchb5 at pci0 dev 25 function 1
pchb5: Advanced Micro Devices AMD64 Address Map configuration (rev. 0x00)
pchb6 at pci0 dev 25 function 2
pchb6: Advanced Micro Devices AMD64 DRAM configuration (rev. 0x00)
pchb7 at pci0 dev 25 function 3
pchb7: Advanced Micro Devices AMD64 Miscellaneous configuration (rev. 0x00)
isa0 at pcib0
lpt0 at isa0 port 0x378-0x37b irq 7
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard
pms0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pms0 mux 0
lm0 at isa0 port 0x290-0x297: W83627HF
pcppi0 at isa0 port 0x61
sysbeep0 at pcppi0
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
pci5 at mainbus0 bus 5
pci5: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb8 at pci5 dev 0 function 0
pchb8: Advanced Micro Devices AMD8151 AGP Bridge (rev. 0x13)
ppb4 at pci5 dev 1 function 0: Advanced Micro Devices product 0x7455 (rev. 0x13)
pci6 at ppb4 bus 6
pci6: i/o space, memory space enabled
vga0 at pci6 dev 0 function 0: ATI Technologies product 0x4e4a (rev. 0x00)
wsdisplay0 at vga0 kbdmux 1: console (80x25, vt100 emulation), using wskbd0
wsmux1: connecting to wsdisplay0
wsdisplay0: screen 1-3 added (80x25, vt100 emulation)
ATI Technologies product 0x4e6a (miscellaneous display) at pci6 dev 0 function 1 not configured
cpu0: prelint0 700<vector=0,delmode=7,dest=0> 0<target=0>
cpu0: prelint1 400<vector=0,delmode=4,dest=0> 0<target=0>
cpu0: timer0 300c0<vector=c0,delmode=0,masked,dest=0> 0<target=0>
cpu0: pcint0 10000<vector=0,delmode=0,masked,dest=0> 0<target=0>
cpu0: lint0 10700<vector=0,delmode=7,masked,dest=0> 0<target=0>
cpu0: lint1 400<vector=0,delmode=4,dest=0> 0<target=0>
cpu0: err0 1000f<vector=f,delmode=0,masked,dest=0> 0<target=0>
ioapic2: enabling
ioapic1: enabling
ioapic1: int0 a171<vector=71,delmode=1,actlo,level,dest=0> 0<target=0>
ioapic1: int2 a170<vector=70,delmode=1,actlo,level,dest=0> 0<target=0>
ioapic1: int3 a164<vector=64,delmode=1,actlo,level,dest=0> 0<target=0>
ioapic0: enabling
ioapic0: int1 191<vector=91,delmode=1,dest=0> 0<target=0>
ioapic0: int3 1d1<vector=d1,delmode=1,dest=0> 0<target=0>
ioapic0: int4 1d0<vector=d0,delmode=1,dest=0> 0<target=0>
ioapic0: int6 165<vector=65,delmode=1,dest=0> 0<target=0>
ioapic0: int7 190<vector=90,delmode=1,dest=0> 0<target=0>
ioapic0: int12 192<vector=92,delmode=1,dest=0> 0<target=0>
ioapic0: int14 162<vector=62,delmode=1,dest=0> 0<target=0>
ioapic0: int15 163<vector=63,delmode=1,dest=0> 0<target=0>
ioapic0: int17 a161<vector=61,delmode=1,actlo,level,dest=0> 0<target=0>
ioapic0: int19 a160<vector=60,delmode=1,actlo,level,dest=0> 0<target=0>
auich0: measured ac97 link rate at 47912 Hz, will use 48000 Hz
audio0 at auich0: full duplex, mmap, independent
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
Kernelized RAIDframe activated
IPsec: Initialized Security Association Processing.
satalink0: port 0: device present, speed: 1.5Gb/s
wd0 at atabus0 drive 0: <ST3120026AS>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 111 GB, 232581 cyl, 16 head, 63 sec, 512 bytes/sect x 234441648 sectors
scsibus0: waiting 2 seconds for devices to settle...
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd0(satalink0:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA data transfers)
sd0 at scsibus0 target 0 lun 0: <SEAGATE, ST19171W, 2224> disk fixed
sd0: drive offline
sd0: sync (50.00ns offset 15), 16-bit (40.000MB/s) transfers, tagged queueing
sd1 at scsibus0 target 1 lun 0: <SEAGATE, ST19171W, 2219> disk fixed
sd1: drive offline
sd1: sync (50.00ns offset 15), 16-bit (40.000MB/s) transfers, tagged queueing
wd1 at atabus4 drive 0: <IBM-DTLA-307045>
wd1: drive supports 16-sector PIO transfers, LBA addressing
wd1: 43979 MB, 89355 cyl, 16 head, 63 sec, 512 bytes/sect x 90069840 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd1(viaide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA data transfers)
atapibus0 at atabus5: 2 targets
cd0 at scsibus0 target 6 lun 0: <PLEXTOR, CD-R   PX-R820T, 1.06> cdrom removable
cd1 at atapibus0 drive 0: <Pioneer DVD-ROM ATAPIModel DVD-121  010, , E1.02> cdrom removable
cd1: 32-bit data port
cd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 4 (Ultra/66)
cd1(viaide0:1:0): using PIO mode 4, Ultra-DMA mode 4 (Ultra/66) (using DMA data transfers)
cd0: sync (100.00ns offset 8), 8-bit (10.000MB/s) transfers
boot device: wd0
root on wd0a dumps on wd0b
root file system type: ffs
cpu1: prelint0 10000<vector=0,delmode=0,masked,dest=0> 0<target=0>
cpu1: prelint1 10000<vector=0,delmode=0,masked,dest=0> 0<target=0>
cpu1: timer0 200c0<vector=c0,delmode=0,dest=0> 0<target=0>
cpu1: pcint0 10000<vector=0,delmode=0,masked,dest=0> 0<target=0>
cpu1: lint0 10700<vector=0,delmode=7,masked,dest=0> 0<target=0>
cpu1: lint1 400<vector=0,delmode=4,dest=0> 0<target=0>
cpu1: err0 10000<vector=0,delmode=0,masked,dest=0> 0<target=0>
cpu1: CPU 1 running
IP Filter: v3.4.29 initialized.  Default = pass all, Logging = enabled
wsdisplay0: screen 4 added (80x25, vt100 emulation)

>How-To-Repeat:
	Run build.sh with a large number of parallel jobs (e.g. -j 6) on a
src tree that has previously been built, or try installing mrtg.
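
If a full build is inconvenient, a smaller stand-alone load generator may be
worth trying.  The following is only a sketch and has not been shown to
trigger the reboot: it assumes that the trigger is concurrent write load on
the affected disk.  The names run_stress and write_worker, and the default
job count, file count and file size, are made up for illustration and are
not taken from build.sh or pkgsrc.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_worker(directory, worker_id, count, size):
    """Write `count` files of `size` bytes each, forcing every one to disk."""
    block = b"\xa5" * size
    written = 0
    for i in range(count):
        path = os.path.join(directory, f"stress-{worker_id}-{i}.dat")
        with open(path, "wb") as f:
            f.write(block)
            f.flush()
            os.fsync(f.fileno())  # push the data out instead of leaving it cached
        written += size
    return written


def run_stress(directory, jobs=6, count=100, size=1 << 20):
    """Run `jobs` concurrent writers in `directory`; return total bytes written."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = [
            pool.submit(write_worker, directory, w, count, size)
            for w in range(jobs)
        ]
        return sum(f.result() for f in futures)


if __name__ == "__main__":
    # Assumption: the temporary directory lives on the disk under test.
    # Pass dir=... to TemporaryDirectory to point it at another file system.
    with tempfile.TemporaryDirectory() as d:
        print("wrote", run_stress(d), "bytes")
```

Lowering jobs to 2 would roughly mirror the -j 2 case that appears stable,
which might help confirm whether the load level is what matters.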

>Fix:
	The only workaround I have is to go gently on the machine: reduce
the number of parallel make jobs and do not install mrtg.

>Release-Note:
>Audit-Trail:
>Unformatted: