NetBSD-Bugs archive


Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes



Hi,

On Tue, Sep 1, 2015 at 12:40 PM,  <jdbaker%mylinuxisp.com@localhost> wrote:
>>Number:         50186
>>Category:       kern
>>Synopsis:       sparc memfault panic after 7.99.21 ARP changes
>>Confidential:   no
>>Severity:       critical
>>Priority:       high
>>Responsible:    kern-bug-people
>>State:          open
>>Class:          sw-bug
>>Submitter-Id:   net
>>Arrival-Date:   Tue Sep 01 03:40:00 +0000 2015
>>Originator:     John D. Baker
>>Release:        NetBSD/sparc-7.99.21
>>Organization:
>>Environment:
> NetBSD jean.technoskunk.fur 7.99.21 NetBSD 7.99.21 (JEAN) #0: Mon Aug 31 20:21:50 CDT 2015  sysop%skuld.technoskunk.fur@localhost:/d0/build/current/obj/sparc/sys/arch/sparc/compile/JEAN sparc
>
> NetBSD jean.technoskunk.fur 7.99.21 NetBSD 7.99.21 (GENERIC) #19: Mon Aug 31 20:03:50 CDT 2015  sysop%skuld.technoskunk.fur@localhost:/d0/build/current/obj/sparc/sys/arch/sparc/compile/GENERIC sparc
>
>>Description:
> Following the changes to ARP cache handling beginning with the
> following commit:
>
>   http://mail-index.netbsd.org/source-changes/2015/08/31/msg068612.html
>
> sparc platform will panic after an indeterminate time (probably when
> about to expire an ARP entry) as follows:
>
> From custom kernel JEAN:
>
> cpu0: data fault: pc=0xf008350c addr=0x10 sfsr=0x326<PERR=0x0,LVL=0x3,AT=0x1,FT=0x1,FAV,OW>
> panic: kernel fault
> Stopped in pid 0.5 (system) at  netbsd:cpu_Debugger+0x4:        or              %o7, %g0, %g1
> db> bt
> cpu_Debugger(0xf03a4758, 0xf99efd20, 0xf0432400, 0xf04331a8, 0xf0433000, 0x104) at netbsd:panic+0x20
> panic(0xf03a4758, 0x0, 0xf008350c, 0x10, 0xf99efd40, 0xf040cc00) at netbsd:mem_access_fault4m+0x5a4
> mem_access_fault4m(0x9, 0x326, 0x10, 0xf99efde0, 0xf0409ff0, 0xf0a0d540) at netbsd:memfault_sun4m+0xe8
> memfault_sun4m(0xf0b366ac, 0x1, 0x0, 0xf041e318, 0xf0a0d544, 0xf0a0d544) at netbsd:arptimer+0x6c
> arptimer(0xf0b36600, 0xf0a0d540, 0xf0b39008, 0x0, 0xf0b366ac, 0xf0437800) at netbsd:callout_softclock+0x154
> callout_softclock(0xf041e31c, 0x1000000, 0x10000, 0xf041e318, 0xf0b36600, 0xf0083478) at netbsd:softint_thread+0x94
> softint_thread(0xf0a0d540, 0x3000, 0x2000, 0x0, 0x0, 0xf99e8218) at netbsd:lwp_trampoline+0x8
> db>
>
>
> From GENERIC:
>
> cpu0: data fault: pc=0xf00a626c addr=0x10 sfsr=0x326<PERR=0x0,LVL=0x3,AT=0x1,FT=0x1,FAV,OW>
> panic: kernel fault
> Stopped in pid 0.5 (system) at  netbsd:cpu_Debugger+0x4:        or              %o7, %g0, %g1
> db> bt
> cpu_Debugger(0xf03efb58, 0xf9ac7d20, 0xf0482c00, 0xf0483a58, 0xf0483800, 0x104) at netbsd:panic+0x20
> panic(0xf03efb58, 0x0, 0xf00a626c, 0x10, 0xf9ac7d40, 0xf045c800) at netbsd:mem_access_fault4m+0x5b0
> mem_access_fault4m(0x9, 0x326, 0x10, 0xf9ac7de0, 0xf0459b20, 0xf0a60540) at netbsd:memfault_sun4m+0xe8
> memfault_sun4m(0xf0b8852c, 0x1, 0x0, 0xf04712a0, 0xf0a60544, 0xf0a60544) at netbsd:arptimer+0x6c
> arptimer(0xf0b88480, 0xf0a60540, 0xf0b8c808, 0x0, 0xf0b8852c, 0xf0488800) at netbsd:callout_softclock+0x154
> callout_softclock(0xf04712a4, 0x1000000, 0x10000, 0xf04712a0, 0xf0b88480, 0xf00a61d8) at netbsd:softint_thread+0x94
> softint_thread(0xf0a60540, 0x3000, 0x2000, 0x0, 0x0, 0xf9ac0218) at netbsd:lwp_trampoline+0x8
> db>
>
> Machine is SPARCstation 5, 110MHz, 256MB RAM.  Operating diskless.
> (NetBSD-7.0_RC3 on local disk)
>
> I hope to confirm this observation on another system, but it is
> engaged in another task at this time.
>>How-To-Repeat:
> Build sparc release from 201509010100 or later and boot GENERIC.
>>Fix:
>

I investigated where it happens:

----
$ ~/git/netbsd-src/work.tools/sparc--netbsdelf/bin/nm -n \
    work.sparc/sys/arch/sparc/compile/GENERIC/netbsd | grep arptimer
f00a61d8 t arptimer
$ ruby -e 'puts (0xf00a61d8 + 0x6c).to_s(16)'
f00a6244
$ ~/git/netbsd-src/work.tools/sparc--netbsdelf/bin/objdump -d -S \
    work.sparc/sys/arch/sparc/compile/GENERIC/netbsd.gdb | grep -10 f00a6244
        ifp = lle->lle_tbl->llt_ifp;
f00a6234:       c2 06 20 40     ld  [ %i0 + 0x40 ], %g1

        callout_stop(&lle->la_timer);
f00a6238:       90 10 00 1b     mov  %i3, %o0
f00a623c:       40 03 34 68     call  f01733dc <callout_stop>
f00a6240:       f4 00 60 10     ld  [ %g1 + 0x10 ], %i2

        /* XXX: LOR avoidance. We still have ref on lle. */
        LLE_WUNLOCK(lle);
f00a6244:       40 02 f7 63     call  f0163fd0 <rw_exit>
f00a6248:       90 10 00 1c     mov  %i4, %o0
/*
 * Free an arp entry.
 */
static void arptfree(struct llentry *la)
{
        struct rtentry *rt = la->la_rt;
f00a624c:       f6 06 20 b0     ld  [ %i0 + 0xb0 ], %i3

        KASSERT(rt != NULL);
----

Hmm, so arptimer+0x6c is the call to rw_exit made by LLE_WUNLOCK(lle).
Does the fault happen in that call, or just before or after it?
I'm not familiar with sparc, so my analysis may be wrong.
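
For comparison, the "data fault" line in the report also gives the
faulting pc directly, and it can be turned into an offset the same way.
A small Ruby sketch, assuming the arptimer start address from the nm
output above for GENERIC. For JEAN there is no nm output here, so the
start address 0xf0083478 is only a guess taken from the last argument
of the callout_softclock frame (in the GENERIC trace that same argument
is 0xf00a61d8, which matches nm):

```ruby
# Offset of an address inside a function, given the function's start address.
def offset_in(func_start, addr)
  addr - func_start
end

# GENERIC: start address from the nm output, pc from the "data fault" line.
ARPTIMER_GENERIC = 0xf00a61d8
FAULT_PC_GENERIC = 0xf00a626c
GENERIC_OFFSET = offset_in(ARPTIMER_GENERIC, FAULT_PC_GENERIC)

# JEAN: the start address is an ASSUMPTION (last argument of the
# callout_softclock frame in the JEAN backtrace), not confirmed by nm.
ARPTIMER_JEAN = 0xf0083478
FAULT_PC_JEAN = 0xf008350c
JEAN_OFFSET = offset_in(ARPTIMER_JEAN, FAULT_PC_JEAN)

puts format("GENERIC: arptimer+0x%x", GENERIC_OFFSET)
puts format("JEAN:    arptimer+0x%x", JEAN_OFFSET)
```

Both come out as arptimer+0x94, which is past the rw_exit call at +0x6c.
That may mean +0x6c in the backtrace is only the last call arptimer made
before the fault, but the instructions at +0x94 are not in the objdump
excerpt above, so this is only a hint.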

Thanks,
  ozaki-r

