Subject: kern/34751: regular panics in tcp_sack_option on NetBSD/alpha 3.0_STABLE
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Eric Schnoebelen <eric@cirr.com>
List: netbsd-bugs
Date: 10/08/2006 01:50:01
>Number:         34751
>Category:       kern
>Synopsis:       panics in tcp_sack_option on NetBSD/alpha 3.0_STABLE
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Oct 08 01:50:00 +0000 2006
>Originator:     Eric Schnoebelen
>Release:        NetBSD 3.0_STABLE
>Organization:
>Environment:
System: NetBSD milo.cirr.com 3.0_STABLE NetBSD 3.0_STABLE (Milo: based on ALPHA-$Revision: 1.202.2.3 $) #2: Wed Jul 26 08:30:51 CDT 2006 root@milo.cirr.com:/usr/src/sys/arch/alpha/compile/MILO alpha
Architecture: alpha
Machine: alpha
>Description:
I'm running NetBSD/alpha on an assortment of alpha
hardware, but mostly DS10L's.  One of them, running 3.0_STABLE
(circa 26 July 2006) is seeing the following panics on a
semi-regular basis:

	[-- eric@localhost attached -- Tue Sep 26 19:09:14 2006]
	db> bt
	cpu_Debugger() at netbsd:cpu_Debugger+0x4
	panic() at netbsd:panic+0x1f8
	trap() at netbsd:trap+0x120
	XentUna() at netbsd:XentUna+0x20
	--- unaligned access fault (from ipl 1) ---
	tcp_sack_option() at netbsd:tcp_sack_option+0x13c
	tcp_dooptions() at netbsd:tcp_dooptions+0x278
	tcp_input() at netbsd:tcp_input+0xa20
	ip_input() at netbsd:ip_input+0xb4c
	ipintr() at netbsd:ipintr+0xa0
	netintr() at netbsd:netintr+0x158
	softintr_dispatch() at netbsd:softintr_dispatch+0x160
	exception_return() at netbsd:exception_return+0x7c
	--- root of call graph ---

	[-- eric@localhost attached -- Mon Aug 21 00:39:48 2006]
	db> bt
	cpu_Debugger() at netbsd:cpu_Debugger+0x4
	panic() at netbsd:panic+0x1f8
	pool_get() at netbsd:pool_get+0x1b8
	pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x170
	pmap_lev1map_create() at netbsd:pmap_lev1map_create+0x80
	pmap_create() at netbsd:pmap_create+0xe4
	uvmspace_init() at netbsd:uvmspace_init+0xa8
	uvmspace_alloc() at netbsd:uvmspace_alloc+0x58
	uvmspace_exec() at netbsd:uvmspace_exec+0x54
	sys_execve() at netbsd:sys_execve+0x6e0
	syscall_plain() at netbsd:syscall_plain+0xc4
	XentSys() at netbsd:XentSys+0x5c
	--- syscall (59) ---
	--- user mode ---

	[-- eric@localhost attached -- Thu Aug 10 22:37:52 2006]
	db> bt
	cpu_Debugger() at netbsd:cpu_Debugger+0x4
	panic() at netbsd:panic+0x1f8
	trap() at netbsd:trap+0x120
	XentUna() at netbsd:XentUna+0x20
	--- unaligned access fault (from ipl 1) ---
	tcp_sack_option() at netbsd:tcp_sack_option+0x13c
	tcp_dooptions() at netbsd:tcp_dooptions+0x278
	tcp_input() at netbsd:tcp_input+0xa20
	ip_input() at netbsd:ip_input+0xb4c
	ipintr() at netbsd:ipintr+0xa0
	netintr() at netbsd:netintr+0x158
	softintr_dispatch() at netbsd:softintr_dispatch+0x160
	exception_return() at netbsd:exception_return+0x7c
	--- root of call graph ---

	[-- eric@localhost attached -- Thu Aug 10 14:33:47 2006]
	db> bt
	cpu_Debugger() at netbsd:cpu_Debugger+0x4
	panic() at netbsd:panic+0x1f8
	trap() at netbsd:trap+0x120
	XentUna() at netbsd:XentUna+0x20
	--- unaligned access fault (from ipl 1) ---
	tcp_sack_option() at netbsd:tcp_sack_option+0x13c
	tcp_dooptions() at netbsd:tcp_dooptions+0x278
	tcp_input() at netbsd:tcp_input+0xa20
	ip_input() at netbsd:ip_input+0xb4c
	ipintr() at netbsd:ipintr+0xa0
	netintr() at netbsd:netintr+0x158
	softintr_dispatch() at netbsd:softintr_dispatch+0x160
	exception_return() at netbsd:exception_return+0x7c
	--- root of call graph ---

	[-- eric@localhost attached -- Mon Jul 24 17:52:22 2006]
	db> bt
	cpu_Debugger() at netbsd:cpu_Debugger+0x4
	panic() at netbsd:panic+0x1f8
	trap() at netbsd:trap+0x120
	XentUna() at netbsd:XentUna+0x20
	--- unaligned access fault (from ipl 1) ---
	tcp_sack_option() at netbsd:tcp_sack_option+0x13c
	tcp_dooptions() at netbsd:tcp_dooptions+0x278
	tcp_input() at netbsd:tcp_input+0xa20
	ip_input() at netbsd:ip_input+0xb4c
	ipintr() at netbsd:ipintr+0xa0
	netintr() at netbsd:netintr+0x158
	softintr_dispatch() at netbsd:softintr_dispatch+0x160
	exception_return() at netbsd:exception_return+0x7c
	--- root of call graph ---

dmesg:

	Loaded initial symtab at 0xfffffc0000b9ccc0, strtab at 0xfffffc0000c0ec50, # entries 19369
	consinit: not using prom console
	Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005
	    The NetBSD Foundation, Inc.  All rights reserved.
	Copyright (c) 1982, 1986, 1989, 1991, 1993
	    The Regents of the University of California.  All rights reserved.

	NetBSD 3.0_STABLE (Milo: based on ALPHA-$Revision: 1.202.2.3 $) #2: Wed Jul 26 08:30:51 CDT 2006
		root@milo.cirr.com:/usr/src/sys/arch/alpha/compile/MILO
	AlphaServer DS10L 617 MHz, s/n AY10605785
	8192 byte page size, 1 processor.
	total memory = 1024 MB
	(2912 KB reserved for PROM, 1021 MB used by NetBSD)
	avail memory = 993 MB
	mainbus0 (root)
	cpu0 at mainbus0: ID 0 (primary), 21264A-9
	cpu0: VAX FP support, IEEE FP support, Primary Eligible
	cpu0: Architecture extensions: 307<PAT,MVI,CIX,FIX,BWX>
	tsc0 at mainbus0: 21272 Core Logic Chipset, Cchip rev 0
	tsc0: 2 Dchips, 1 memory bus of 16 bytes
	tsc0: arrays present: 1024MB (split), 0MB, 0MB, 0MB, Dchip 0 rev 1
	tsp0 at tsc0
	pci0 at tsp0 bus 0
	pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
	sio0 at pci0 dev 7 function 0: Acer Labs M1543 PCI-ISA Bridge (rev. 0xc3)
	tlp0 at pci0 dev 9 function 0: DECchip 21143 Ethernet, pass 4.1
	tlp0: interrupting at dec 6600 irq 29
	tlp0: Ethernet address 00:10:64:30:1c:a9
	tlp0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
	tlp1 at pci0 dev 11 function 0: DECchip 21143 Ethernet, pass 4.1
	tlp1: interrupting at dec 6600 irq 30
	tlp1: Ethernet address 00:10:64:30:1c:ab
	tlp1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
	aceride0 at pci0 dev 13 function 0
	aceride0: Acer Labs M5229 UDMA IDE Controller (rev. 0xc1)
	aceride0: bus-master DMA support present
	aceride0: primary channel wired to compatibility mode
	aceride0: primary channel interrupting at isa irq 14
	atabus0 at aceride0 channel 0
	aceride0: secondary channel wired to compatibility mode
	aceride0: secondary channel interrupting at isa irq 15
	atabus1 at aceride0 channel 1
	isa0 at sio0
	lpt0 at isa0 port 0x3bc-0x3bf irq 7
	com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
	com0: console
	com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
	pckbc0 at isa0 port 0x60-0x64
	pckbdprobe: reset error 5
	pmsprobe: reset error 5
	pcppi0 at isa0 port 0x61
	midi0 at pcppi0: PC speaker
	spkr0 at pcppi0
	isabeep0 at pcppi0
	fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
	mcclock0 at isa0 port 0x70-0x71: mc146818 or compatible
	raidattach: Asked for 8 units
	Kernelized RAIDframe activated
	IPsec: Initialized Security Association Processing.
	wd0 at atabus0 drive 0: <Maxtor 53073H4>
	wd0: drive supports 16-sector PIO transfers, LBA addressing
	wd0: 28629 MB, 58168 cyl, 16 head, 63 sec, 512 bytes/sect x 58633344 sectors
	wd0: 32-bit data port
	wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
	wd0(aceride0:0:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)
	wd1 at atabus1 drive 0: <Maxtor 6B250R0>
	wd1: drive supports 16-sector PIO transfers, LBA48 addressing
	wd1: 233 GB, 486344 cyl, 16 head, 63 sec, 512 bytes/sect x 490234752 sectors
	wd1: 32-bit data port
	wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
	wd1(aceride0:1:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)
	Searching for RAID components...
	stray isa irq 14
	stray isa irq 15
	root on wd0a dumps on wd0b
	mountroot: trying nfs...
	mountroot: trying msdos...
	mountroot: trying cd9660...
	wd0: transfer error, downgrading to Ultra-DMA mode 1
	wd0(aceride0:0:0): using PIO mode 4, Ultra-DMA mode 1 (using DMA)
	wd0a: DMA error reading fsbn 64 of 64-67 (wd0 bn 64; cn 0 tn 1 sn 1), retrying
	stray isa irq 14
	wd0: soft error (corrected)
	mountroot: trying lfs...
	mountroot: trying ffs...
	root file system type: ffs
	readclock: 6/9/27/0/17/44=>1159316264 (1159311193)
	init: copying out path `/sbin/init' 11
	stray isa irq 15
	wd1: transfer error, downgrading to Ultra-DMA mode 1
	wd1(aceride0:1:0): using PIO mode 4, Ultra-DMA mode 1 (using DMA)
	wd1a: DMA error reading fsbn 16 of 16-31 (wd1 bn 16; cn 0 tn 0 sn 16), retrying
	stray isa irq 15
	stray isa irq 15
	wd1: soft error (corrected)
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	aceride0:1:0: lost interrupt
		type: ata tc_bcount: 8192 tc_skip: 0
	aceride0:1:0: bus-master DMA error: missing interrupt, status=0x21
	wd1: transfer error, downgrading to PIO mode 4
	wd1(aceride0:1:0): using PIO mode 4
	wd1f: DMA error reading fsbn 16 of 16-31 (wd1 bn 280132624; cn 312648 tn 0 sn 16), retrying
	stray isa irq 15
	stray isa irq 15
	wd1: soft error (corrected)
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	stray isa irq 15
	setclock: 6/9/27/0/59/47
	stray isa irq 15
	stray isa irq 15

>How-To-Repeat:
	Let it run under network load?
>Fix:
Simon Burge says:

This looks like it happened in netinet/tcp_sack.c at:

        for (i = 0; i < num_sack_blks; i++, lp += 2) {
                memcpy(&left, lp, sizeof(*lp));
                memcpy(&right, lp + 1, sizeof(*lp));
--->            left = ntohl(left);
                right = ntohl(right);

Disassembly of tcp_sack.o shows:

../../../../netinet/tcp_sack.c:225
 168:   a2 09 e4 43     cmplt   zero,t3,t1
../../../../netinet/tcp_sack.c:224
 16c:   8f 0c 61 44     cmovle  t2,t0,fp
../../../../netinet/tcp_sack.c:225
 170:   0e 04 ff 47     clr     s5
 174:   20 00 40 e4     beq     t1,1f8 <tcp_sack_option+0x1b8>
../../../../netinet/tcp_sack.c:228
 178:   00 00 0c a2     ldl     a0,0(s3)
../../../../netinet/tcp_sack.c:227
 17c:   04 00 2c a1     ldl     s0,4(s3)

I think gcc is optimising the memcpy out and doing a direct load (the
ldl instructions above), which faults when the pointer is not aligned.
We probably need some sort of qualifier on a variable somewhere?
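
One possible shape for a workaround, as an untested sketch and not the
actual change to tcp_sack.c: if the source pointer has a 32-bit type, the
compiler is allowed to assume it is aligned and can turn the memcpy into a
plain load. Reading the option through a byte pointer removes that
assumption. The names load_be32, parse_sack_blocks and struct sack_block
below are made up for illustration and do not exist in the kernel source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for one SACK block, in host byte order. */
struct sack_block {
	uint32_t left;
	uint32_t right;
};

/*
 * Hypothetical helper: assemble a 32-bit big-endian (network order) value
 * one byte at a time.  Because p is a byte pointer, the compiler cannot
 * assume 4-byte alignment, and no separate ntohl() is needed.
 */
static uint32_t
load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical parser: decode up to max SACK blocks from optlen bytes of
 * option payload (8 bytes per block).  Returns the number decoded.
 */
static size_t
parse_sack_blocks(const uint8_t *opt, size_t optlen,
    struct sack_block *out, size_t max)
{
	size_t n = optlen / 8;
	size_t i;

	if (n > max)
		n = max;
	for (i = 0; i < n; i++, opt += 8) {
		out[i].left = load_be32(opt);
		out[i].right = load_be32(opt + 4);
	}
	return n;
}
```

Whether the real code should use byte-wise loads like this, or keep the
memcpy and change the type of the source pointer, is a separate question;
the sketch only shows that the edges can be read without an aligned load.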