Subject: Re: Problems with SMP kernels on P4's with hyperthreading under 2.0 (kern/22551)
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: Brian Buhrow <buhrow@lothlorien.nfbcal.org>
List: port-i386
Date: 01/11/2005 11:01:22
	Hello Manuel.  Well, I compiled a new 2.0 kernel with options
DIAGNOSTIC, and options LOCKDEBUG, and gave it a test flight.  It died
within 20 minutes of booting with a diagnostic panic which matches
kern/22551.  Below is the relevant data I knew how to extract.  I tried to
force a crash dump, but the kernel reported that a dump was not possible.
The trace here looks similar to the one in the original bug report, in
that it fails somewhere in tcp_output().  This bug is different, though,
in that it seems to be triggered only on multiprocessor kernels.
	This machine is a production server, so I can't work on it all the
time and I have to leave it in a working state, but I'm happy to fly test
kernels on it when I'm available to watch it, and to do any
debugging/fixing I can to make this problem go away.  I have no idea
whether this problem is related to the hangs I was seeing under my
non-diagnostic kernel, but it seems worth trying to fix in any event.
	Do you have any suggestions on what I might do to provide more
information?  I believe I can reproduce the panic almost at will.  Are
there different commands I should use to look at the state of the two
processors, or alternate stacks I should examine?  Any pointers would be
greatly appreciated.  If I can help squash this two-year-old bug, that
would be good.
-thanks
-Brian

db{0}> bt
cpu_Debugger(c25e6100,c07c1e1c,c2615600,c25ea2f0,fffffed4) at netbsd:cpu_Debugger+0x4
panic(c07477a0,c06b2e58,c06c1fc4,c070b2a0,119) at netbsd:panic+0x121
__main(c06b2e58,c070b2a0,119,c06c1fc4,c25ea2e4) at netbsd:__main
callout_schedule(c25ea2f0,fffffed4,28,28,c07bf5a0) at netbsd:callout_schedule+0x109
tcp_output(c25ea2e4,c2615600,c2615600,0,5) at netbsd:tcp_output+0x6e9
tcp_usrreq(c2543cac,9,c2615600,0,0) at netbsd:tcp_usrreq+0x2c5
sosend(c2543cac,0,cddf2ec4,c2615600,0) at netbsd:sosend+0x366
soo_write(cdb424b8,cdb424e0,cddf2ec4,c2541880,1) at netbsd:soo_write+0x1e
dofilewrite(cdaad3a8,c,cdb424b8,813f000,3d) at netbsd:dofilewrite+0x85
sys_write(cdb0a3a4,cddf2f64,cddf2f5c,4,0) at netbsd:sys_write+0x75
syscall_plain() at netbsd:syscall_plain+0x17e
--- syscall (number 4) ---
0x48240df3:
db{0}>dmesg
NetBSD 2.0 (STATS_DEBUG) #0: Mon Jan 10 16:57:08 PST 2005
        buhrow@lothlorien.nfbcal.org:/usr/src/sys/arch/i386/compile/STATS_DEBUG
total memory = 1006 MB
avail memory = 966 MB
BIOS32 rev. 0 found at 0xf0010
PCI BIOS rev. 2.1 found at 0xf0031
PCI IRQ Routing Table rev. 1.0 found at 0xf3d00, size 224 bytes (12 entries)
PCI Interrupt Router at 000:31:0 (Intel product 0x8086 compatible)
mainbus0 (root)
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel (686-class), 2992.86 MHz, id 0xf33
cpu0: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: I-cache 12K uOp cache 8-way
cpu0: L2 cache 1 MB 64B/line 8-way
cpu0: ITLB 4K/4M: 64 entries
cpu0: DTLB 4K/4M: 64 entries
cpu0: calibrating local timer
cpu0: apic clock running at 199 MHz
cpu0: 32 page colors
cpu1 at mainbus0: apid 1 (application processor)
cpu1: starting
cpu1: Intel (686-class), 2992.71 MHz, id 0xf33
cpu1: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu1: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu1: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu1: I-cache 12K uOp cache 8-way
cpu1: L2 cache 1 MB 64B/line 8-way
cpu1: ITLB 4K/4M: 64 entries
cpu1: DTLB 4K/4M: 64 entries
ioapic0 at mainbus0 apid 2 (I/O APIC)
ioapic0: pa 0xfec00000, version 20, 24 pins
acpi0 at mainbus0
acpi0: using Intel ACPI CA subsystem version 20040211
acpi0: X/RSDT: OemId <INTEL ,D865GLC ,20040412>, AslId <MSFT,00000097>
acpi0: SCI interrupting at int 9
acpi0: fixed-feature power button present
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
acpi: activated PNP0C0F
acpi: activated PNP0C0F
PNP0A03 [PCI Bus] at acpi0 not configured
PNP0000 [AT Interrupt Controller] at acpi0 not configured
PNP0200 [AT DMA Controller] at acpi0 not configured
PNP0100 [AT Timer] at acpi0 not configured
PNP0B00 [AT Real-Time Clock] at acpi0 not configured
PNP0800 [AT-style speaker sound] at acpi0 not configured
PNP0C04 [Math Coprocessor] at acpi0 not configured
PNP0700 [PC standard floppy disk controller] at acpi0 not configured
PNP0400 [Standard LPT printer port] at acpi0 not configured
ACPI Object Type 'Power' (0x0b) at acpi0 not configured
ACPI Object Type 'Power' (0x0b) at acpi0 not configured
ACPI Object Type 'Power' (0x0b) at acpi0 not configured
PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
INT0800 at acpi0 not configured
PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
PNP0C01 [System Board] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
acpibut0 at acpi0 (PNP0C0E-29): ACPI Sleep Button
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: Intel 82865 Host (rev. 0x02)
pchb0: random number generator enabled
agp0 at pchb0: detected 16252k stolen memory
agp0: aperture at 0xf0000000, size 0x8000000
vga1 at pci0 dev 2 function 0: Intel 82865G Integrated Graphics Device (rev. 0x02)
wsdisplay0 at vga1 kbdmux 1
wsmux1: connecting to wsdisplay0
uhci0 at pci0 dev 29 function 0: Intel 82801EB/ER USB UHCI Controller #0 (rev. 0x02)
uhci0: interrupting at ioapic0 pin 16 (irq 11)
usb0 at uhci0: USB revision 1.0
uhub0 at usb0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1 at pci0 dev 29 function 1: Intel 82801EB/ER USB UHCI Controller #1 (rev. 0x02)
uhci1: interrupting at ioapic0 pin 19 (irq 5)
usb1 at uhci1: USB revision 1.0
uhub1 at usb1
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2 at pci0 dev 29 function 2: Intel 82801EB/ER USB UHCI Controller #2 (rev. 0x02)
uhci2: interrupting at ioapic0 pin 18 (irq 10)
usb2 at uhci2: USB revision 1.0
uhub2 at usb2
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3 at pci0 dev 29 function 3: Intel 82801EB/ER USB UHCI Controller #3 (rev. 0x02)
uhci3: interrupting at ioapic0 pin 16 (irq 11)
usb3 at uhci3: USB revision 1.0
uhub3 at usb3
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
Intel 82801EB/ER USB EHCI Controller (USB serial bus, interface 0x20, revision 0x02) at pci0 dev 29 function 7 not configured
ppb0 at pci0 dev 30 function 0: Intel 82801BA Hub-to-PCI Bridge (rev. 0xc2)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled
fxp0 at pci1 dev 8 function 0: Intel PRO/100 VM Network Controller with 82562ET/EZ PHY, rev 1
fxp0: interrupting at ioapic0 pin 20 (irq 9)
fxp0: Ethernet address 00:11:11:3f:46:56
inphy0 at fxp0 phy 1: i82562ET 10/100 media interface, rev. 0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pcib0 at pci0 dev 31 function 0
pcib0: Intel 82801EB LPC Interface Bridge (rev. 0x02)
piixide0 at pci0 dev 31 function 1
piixide0: Intel 82801EB IDE Controller (ICH5) (rev. 0x02)
piixide0: bus-master DMA support present
piixide0: primary channel wired to compatibility mode
piixide0: primary channel interrupting at ioapic0 pin 14 (irq 14)
atabus0 at piixide0 channel 0
piixide0: secondary channel wired to compatibility mode
piixide0: secondary channel interrupting at ioapic0 pin 15 (irq 15)
atabus1 at piixide0 channel 1
piixide1 at pci0 dev 31 function 2
piixide1: Intel 82801EB Serial ATA Controller (rev. 0x02)
piixide1: bus-master DMA support present
piixide1: primary channel configured to native-PCI mode
piixide1: using ioapic0 pin 18 (irq 10) for native-PCI interrupt
atabus2 at piixide1 channel 0
piixide1: secondary channel configured to native-PCI mode
atabus3 at piixide1 channel 1
Intel 82801EB/ER SMBus Controller (SMBus serial bus, revision 0x02) at pci0 dev 31 function 3 not configured
auich0 at pci0 dev 31 function 5: i82801EB (ICH5) AC-97 Audio
auich0: interrupting at ioapic0 pin 17 (irq 3)
auich0: ac97: Analog Devices AD1985 codec; headphone, 20 bit DAC, no 3D stereo
auich0: ac97: ext id 3c7<AMAP,LDAC,SDAC,CDAC,SPDIF,DRA,VRA>
isa0 at pcib0
lpt0 at isa0 port 0x378-0x37b irq 7
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
pckbc0 at isa0 port 0x60-0x64
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
spkr0 at pcppi0
sysbeep0 at pcppi0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
npx0 at isa0 port 0xf0-0xff: using exception 16
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
isapnp0: no ISA Plug 'n Play devices found
ioapic0: enabling
auich0: measured ac97 link rate at 49454 Hz, will use 48000 Hz
audio0 at auich0: full duplex, mmap, independent
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
Kernelized RAIDframe activated
wd0 at atabus0 drive 0: <WDC WD2500JB-00GVA0>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 232 GB, 484521 cyl, 16 head, 63 sec, 512 bytes/sect x 488397168 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd0(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA data transfers)
wd1 at atabus1 drive 0: <WDC WD2500JB-00FUA0>
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 232 GB, 484521 cyl, 16 head, 63 sec, 512 bytes/sect x 488397168 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd1(piixide0:1:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA data transfers)
raid0: RAID Level 1
raid0: Components: /dev/wd0e /dev/wd1e
raid0: Total Sectors: 482348916 (235521 MB)
boot device: raid0
root on raid0a dumps on raid0b
root file system type: ffs
cpu1: CPU 1 running
raid0: Device already configured!
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)
Accounting started
panic: kernel diagnostic assertion "to_ticks >= 0" failed: file "../../../../kern/kern_timeout.c", line 281
db{0}> sync
syncing disks... panic: kernel diagnostic assertion "to_ticks >= 0" failed: file "../../../../kern/kern_timeout.c", line 281
Stopped in pid 4914.1 (sendmail) at     netbsd:cpu_Debugger+0x4:
db{0}> cont

dump to dev 0,1 not possible
rebooting...

>> NetBSD/i386 BIOS Boot, Revision 2.13
>> (buhrow@lothlorien.nfbcal.org, Mon Apr 26 14:01:48 PDT 2004)
>> Memory: 639/1030400 k
Press return to boot now, any other key for boot menu
booting hd0a:netbsd - starting in 0