Re: toolchain/53185: axe(4) on evbarm cause panic, possibly compiler-related

To: toolchain-manager%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,bouyer%antioche.eu.org@localhost
Subject: Re: toolchain/53185: axe(4) on evbarm cause panic, possibly compiler-related
From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
Date: Mon, 16 Apr 2018 16:15:01 +0000 (UTC)

The following reply was made to PR toolchain/53185; it has been noted by GNATS.

From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: 
Subject: Re: toolchain/53185: axe(4) on evbarm cause panic, possibly
 compiler-related
Date: Mon, 16 Apr 2018 18:12:18 +0200

 On Sun, Apr 15, 2018 at 04:35:01PM +0000, bouyer%antioche.eu.org@localhost wrote:
 > System: NetBSD chartplotter 8.99.14 NetBSD 8.99.14 (CHARTPLOTTER) #43: Fri Apr 13 19:38:31 CEST 2018 bouyer%bip.soc.lip6.fr@localhost:/dsk/l1/misc/bouyer/tmp/evbarm-earmhf/obj/dsk/l1/misc/bouyer/HEAD/clean/src/sys/arch/evbarm/compile/CHARTPLOTTER evbarm
 > Architecture: earmv7hf
 > Machine: evbarm
 > >Description:
 > 	I have an axe(4) device connected to an Allwinner A20-based board (Olimex
 > 	Lime2). CHARTPLOTTER is derived from sunxi:
 > axe0: flags=0x8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
 >         ec_capabilities=1<VLAN_MTU>
 >         ec_enabled=0
 >         address: 38:c9:86:f1:6b:4d
 >         media: Ethernet autoselect (100baseTX full-duplex)
 >         status: active
 >         inet6 fe80::362e:ebb2:14b5:fb93%axe0/64 flags 0x0 scopeid 0x2
 >         inet6 2001:41d0:fe9d:1100:5545:9079:39fa:e695/64 flags 0x0
 >         inet 10.0.0.3/24 broadcast 255.255.255.0 flags 0x0
 > 
 > 	Since upgrading to an 8.99.14 kernel (from 8.99.12), I get a kernel
 > 	panic when starting large transfers from a remote host to the axe(4)
 > 	(such as scp or pkg_add):
 > [      256.958847916] data_abort_handler: data_aborts fsr=0x1 far=0x9cda25ee
 > [      256.958847916] Fatal kernel mode data abort: 'Alignment Fault 1'
 > [      256.958847916] trapframe: 0x99e37df0
 > [      256.958847916] FSR=00000001, FAR=9cda25ee, spsr=20070113
 > [      256.958847916] r0 =00000000, r1 =f0082000, r2 =00000004, r3 =00000004
 > [      256.958847916] r4 =915eff00, r5 =0000023e, r6 =00001002, r7 =910af808
 > [      256.958847916] r8 =9cda25ee, r9 =0000.824078999] uhid1 at uhidev2 reportid 3: input=2, output=0, feat6ec, ssp=99e37e40, slr=8000bf50, pc =80095134
 > 
 > 	Light network operation (such as an interactive ssh session) doesn't
 > 	cause this. I have never seen this with 8.99.12 or earlier, although
 > 	I used it the same way (in particular, copying kernels to the
 > 	local SD card while testing new sunxi drivers).
 > 
 > 	0x80095134 points to the memcpy() call in axe_rxeof() at line 1251.
 > 	More specifically:
 > 0x80095124 <+272>:   ldr     r5, [r11, #-52] ; 0xffffffcc
 > 0x80095128 <+276>:   b       0x80095274 <axe_rxeof+608>
 > 0x8009512c <+280>:   cmp     r5, #3
 > 0x80095130 <+284>:   bls     0x80095350 <axe_rxeof+828>
 > 0x80095134 <+288>:   ldr     r3, [r8], #4
 > 0x80095138 <+292>:   movw    r0, #2047       ; 0x7ff
 > 0x8009513c <+296>:   sub     r2, r5, #4
 > 0x80095140 <+300>:   str     r3, [r11, #-48] ; 0xffffffd0
 > 
 > 	That would be the ldr which causes the trap, so it would be
 > 	c->axe_buf which is misaligned, as confirmed by the r8 value.
 > 
 > 	I'm not sure how this could happen yet, but reading the sources,
 > 	it looks like at line 1323:
 >                         buf += sizeof(csum_hdr);
 > 	we're mis-aligning buf, as axe_csum_hdr is either 3 or 5 uint16_t.
 > 
 > 	Another thing that changed is the compiler. I wonder if the compiler
 > 	could be optimising the memcpy() call the wrong way here,
 > 	assuming buf is always aligned.
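 
 To make the alignment argument above concrete, here is a minimal C sketch
 (not taken from the driver). The struct is a hypothetical stand-in for a
 header made of three uint16_t, and the helper names are made up for
 illustration. It checks an address against a 4-byte boundary and shows what
 skipping one such header does to that offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 3 x uint16_t checksum header (6 bytes). */
struct csum_hdr_sketch {
	uint16_t len;
	uint16_t ilen;
	uint16_t cstatus;
};

/* Byte offset of addr within its 32-bit word; 0 means word-aligned. */
static unsigned
word_offset(uintptr_t addr)
{
	return (unsigned)(addr & 3u);
}

/* Offset within a word after skipping one header starting at addr. */
static unsigned
word_offset_after_hdr(uintptr_t addr)
{
	return word_offset(addr + sizeof(struct csum_hdr_sketch));
}
```

 With the FAR from the panic, word_offset(0x9cda25ee) is 2: the address is
 halfword-aligned but not word-aligned. That is at least consistent with a
 pointer that started on a word boundary and was advanced by a 6-byte header.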
 
 The compiler may be right after all.
 From http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15414.html:
 "When compiling for a ARMv6 or ARMv7-A/R processor, the ARM Compiler will assume that it can use unaligned accesses"
 
 And
 "Further, unaligned accesses are only allowed to regions marked as Normal memory type, and unaligned access support must be enabled by setting the SCTLR.A bit in the system control coprocessor. Attempts to perform unaligned accesses when not allowed will cause an alignment fault (data abort)."
 
 Are we setting the SCTLR.A bit? Also in kernel mode?
 
 If not, should the kernel be compiled with -mno-unaligned-access?
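 
 For reference, the pattern in question looks roughly like the sketch below.
 This is not the if_axe.c code: the struct and function names are
 hypothetical. The C source is well defined for any alignment of the source
 pointer, because memcpy() makes no alignment assumption. Whether the
 compiler lowers the 4-byte copy to a single word load (which traps if the
 hardware does not permit unaligned access) or to smaller accesses depends on
 the target and on flags such as -mno-unaligned-access.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-byte per-frame header, for illustration only. */
struct rx_hdr_sketch {
	uint16_t len;
	uint16_t ilen;
};

/*
 * Copy a header out of buf at byte offset off. The offset may leave the
 * source pointer only 2-byte aligned; memcpy() itself is fine with that.
 */
static struct rx_hdr_sketch
read_hdr(const uint8_t *buf, size_t off)
{
	struct rx_hdr_sketch h;

	memcpy(&h, buf + off, sizeof(h));
	return h;
}
```

 Calling read_hdr(buf, 6) on a word-aligned buf reads from an address that is
 2 bytes past a word boundary, which is the situation described in the PR.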
 
 -- 
 Manuel Bouyer <bouyer%antioche.eu.org@localhost>
      NetBSD: 26 years of experience will always make the difference
 --
