kern/53185: axe(4) on evbarm cause panic, possibly compiler-related

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: kern/53185: axe(4) on evbarm cause panic, possibly compiler-related
From: bouyer%antioche.eu.org@localhost
Date: Sun, 15 Apr 2018 16:35:01 +0000 (UTC)

>Number:         53185
>Category:       kern
>Synopsis:       axe(4) on evbarm cause panic, possibly compiler-related
>Confidential:   yes
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Apr 15 16:35:00 +0000 2018
>Originator:     Manuel Bouyer
>Release:        NetBSD 8.99.14
>Organization:
>Environment:
System: NetBSD chartplotter 8.99.14 NetBSD 8.99.14 (CHARTPLOTTER) #43: Fri Apr 13 19:38:31 CEST 2018 bouyer%bip.soc.lip6.fr@localhost:/dsk/l1/misc/bouyer/tmp/evbarm-earmhf/obj/dsk/l1/misc/bouyer/HEAD/clean/src/sys/arch/evbarm/compile/CHARTPLOTTER evbarm
Architecture: earmv7hf
Machine: evbarm
>Description:
	I have an axe(4) device connected to an Allwinner A20-based board (Olimex
	Lime2). CHARTPLOTTER is derived from sunxi:
axe0: flags=0x8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        ec_capabilities=1<VLAN_MTU>
        ec_enabled=0
        address: 38:c9:86:f1:6b:4d
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet6 fe80::362e:ebb2:14b5:fb93%axe0/64 flags 0x0 scopeid 0x2
        inet6 2001:41d0:fe9d:1100:5545:9079:39fa:e695/64 flags 0x0
        inet 10.0.0.3/24 broadcast 255.255.255.0 flags 0x0

	Since upgrading to an 8.99.14 kernel (from 8.99.12), I get a kernel
	panic when starting large transfers from a remote host to the axe(4)
	(such as scp or pkg_add):
[      256.958847916] data_abort_handler: data_aborts fsr=0x1 far=0x9cda25ee
[      256.958847916] Fatal kernel mode data abort: 'Alignment Fault 1'
[      256.958847916] trapframe: 0x99e37df0
[      256.958847916] FSR=00000001, FAR=9cda25ee, spsr=20070113
[      256.958847916] r0 =00000000, r1 =f0082000, r2 =00000004, r3 =00000004
[      256.958847916] r4 =915eff00, r5 =0000023e, r6 =00001002, r7 =910af808
[      256.958847916] r8 =9cda25ee, r9 =0000.824078999] uhid1 at uhidev2 reportid 3: input=2, output=0, feat6ec, ssp=99e37e40, slr=8000bf50, pc =80095134

	Light network operation (such as an interactive ssh session) doesn't
	cause this. I have never seen this with 8.99.12 or earlier, although
	I used it the same way (in particular, copying kernels to the
	local SD card while testing new sunxi drivers).

	0x80095134 points to the memcpy() call in axe_rxeof() at line 1251.
	More specifically:
0x80095124 <+272>:   ldr     r5, [r11, #-52] ; 0xffffffcc
0x80095128 <+276>:   b       0x80095274 <axe_rxeof+608>
0x8009512c <+280>:   cmp     r5, #3
0x80095130 <+284>:   bls     0x80095350 <axe_rxeof+828>
0x80095134 <+288>:   ldr     r3, [r8], #4
0x80095138 <+292>:   movw    r0, #2047       ; 0x7ff
0x8009513c <+296>:   sub     r2, r5, #4
0x80095140 <+300>:   str     r3, [r11, #-48] ; 0xffffffd0

	That would be the ldr which causes the trap, so it would be
	c->axe_buf which is misaligned, as confirmed by the r8 value
	(r8 = 0x9cda25ee, the same as FAR).

	I'm not sure yet how this could happen, but reading the sources,
	it looks like at line 1323:
                        buf += sizeof(csum_hdr);
	we're misaligning buf, as axe_csum_hdr is either 3 or 5 uint16_t
	(6 or 10 bytes, neither a multiple of 4).
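
	The arithmetic behind that suspicion can be sketched as follows.
	This is only an illustration: hyp_csum_hdr is a hypothetical
	stand-in with three uint16_t fields, not the driver's actual
	definition, and word_offset()/offset_after_header() are helper
	names made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the checksum header: three uint16_t, 6 bytes. */
struct hyp_csum_hdr {
	uint16_t a;
	uint16_t b;
	uint16_t c;
};

/* Offset of an address within a 4-byte word; 0 means word-aligned. */
static unsigned
word_offset(uintptr_t addr)
{
	return (unsigned)(addr & 3u);
}

/* Word offset of a pointer after skipping one header from 'base'. */
static unsigned
offset_after_header(uintptr_t base)
{
	return word_offset(base + sizeof(struct hyp_csum_hdr));
}
```

	Starting from a word-aligned base, skipping one 6-byte header
	leaves the pointer at offset 2 within a word. The faulting
	address 0x9cda25ee is also at offset 2, which is at least
	consistent with this explanation.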

	Another thing that changed is the compiler. I wonder whether the
	compiler could be optimising the memcpy() call incorrectly here,
	assuming buf is always aligned.
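
	For reference, a minimal sketch of a load that makes no
	alignment assumption about its source. This is not the driver
	code and not a proposed fix; load_u32_unaligned() is a
	hypothetical helper. The idea is that the source is typed as a
	byte pointer, so the compiler has no type-based reason to
	assume 4-byte alignment when it expands the memcpy().

```c
#include <stdint.h>
#include <string.h>

/*
 * Read a 32-bit value from a possibly misaligned address.
 * The source is a uint8_t pointer, so no alignment beyond 1 byte
 * is implied by its type.
 */
static uint32_t
load_u32_unaligned(const uint8_t *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

	Comparing the code generated for a helper like this with the
	ldr seen in axe_rxeof() might help tell whether the compiler is
	deriving an alignment assumption from the type of buf.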
>How-To-Repeat:
	Use an axe(4) on an ARM CPU?
>Fix:
