NetBSD-Bugs archive


Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes



The following reply was made to PR kern/50186; it has been noted by GNATS.

From: Ryota Ozaki <ozaki-r%netbsd.org@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
Subject: Re: kern/50186: sparc memfault panic after 7.99.21 ARP changes
Date: Tue, 1 Sep 2015 15:52:45 +0900

 Hi,
 
 On Tue, Sep 1, 2015 at 12:40 PM,  <jdbaker%mylinuxisp.com@localhost> wrote:
 >>Number:         50186
 >>Category:       kern
 >>Synopsis:       sparc memfault panic after 7.99.21 ARP changes
 >>Confidential:   no
 >>Severity:       critical
 >>Priority:       high
 >>Responsible:    kern-bug-people
 >>State:          open
 >>Class:          sw-bug
 >>Submitter-Id:   net
 >>Arrival-Date:   Tue Sep 01 03:40:00 +0000 2015
 >>Originator:     John D. Baker
 >>Release:        NetBSD/sparc-7.99.21
 >>Organization:
 >>Environment:
 > NetBSD jean.technoskunk.fur 7.99.21 NetBSD 7.99.21 (JEAN) #0: Mon Aug 31 20:21:50 CDT 2015  sysop%skuld.technoskunk.fur@localhost:/d0/build/current/obj/sparc/sys/arch/sparc/compile/JEAN sparc
 >
 > NetBSD jean.technoskunk.fur 7.99.21 NetBSD 7.99.21 (GENERIC) #19: Mon Aug 31 20:03:50 CDT 2015  sysop%skuld.technoskunk.fur@localhost:/d0/build/current/obj/sparc/sys/arch/sparc/compile/GENERIC sparc
 >
 >>Description:
 > Following the changes to ARP cache handling beginning with the
 > following commit:
 >
 >   http://mail-index.netbsd.org/source-changes/2015/08/31/msg068612.html
 >
 > sparc platform will panic after an indeterminate time (probably when
 > about to expire an ARP entry) as follows:
 >
 > From custom kernel JEAN:
 >
 > cpu0: data fault: pc=0xf008350c addr=0x10 sfsr=0x326<PERR=0x0,LVL=0x3,AT=0x1,FT=0x1,FAV,OW>
 > panic: kernel fault
 > Stopped in pid 0.5 (system) at  netbsd:cpu_Debugger+0x4:        or              %o7, %g0, %g1
 > db> bt
 > cpu_Debugger(0xf03a4758, 0xf99efd20, 0xf0432400, 0xf04331a8, 0xf0433000, 0x104) at netbsd:panic+0x20
 > panic(0xf03a4758, 0x0, 0xf008350c, 0x10, 0xf99efd40, 0xf040cc00) at netbsd:mem_access_fault4m+0x5a4
 > mem_access_fault4m(0x9, 0x326, 0x10, 0xf99efde0, 0xf0409ff0, 0xf0a0d540) at netbsd:memfault_sun4m+0xe8
 > memfault_sun4m(0xf0b366ac, 0x1, 0x0, 0xf041e318, 0xf0a0d544, 0xf0a0d544) at netbsd:arptimer+0x6c
 > arptimer(0xf0b36600, 0xf0a0d540, 0xf0b39008, 0x0, 0xf0b366ac, 0xf0437800) at netbsd:callout_softclock+0x154
 > callout_softclock(0xf041e31c, 0x1000000, 0x10000, 0xf041e318, 0xf0b36600, 0xf0083478) at netbsd:softint_thread+0x94
 > softint_thread(0xf0a0d540, 0x3000, 0x2000, 0x0, 0x0, 0xf99e8218) at netbsd:lwp_trampoline+0x8
 > db>
 >
 >
 > From GENERIC:
 >
 > cpu0: data fault: pc=0xf00a626c addr=0x10 sfsr=0x326<PERR=0x0,LVL=0x3,AT=0x1,FT=0x1,FAV,OW>
 > panic: kernel fault
 > Stopped in pid 0.5 (system) at  netbsd:cpu_Debugger+0x4:        or              %o7, %g0, %g1
 > db> bt
 > cpu_Debugger(0xf03efb58, 0xf9ac7d20, 0xf0482c00, 0xf0483a58, 0xf0483800, 0x104) at netbsd:panic+0x20
 > panic(0xf03efb58, 0x0, 0xf00a626c, 0x10, 0xf9ac7d40, 0xf045c800) at netbsd:mem_access_fault4m+0x5b0
 > mem_access_fault4m(0x9, 0x326, 0x10, 0xf9ac7de0, 0xf0459b20, 0xf0a60540) at netbsd:memfault_sun4m+0xe8
 > memfault_sun4m(0xf0b8852c, 0x1, 0x0, 0xf04712a0, 0xf0a60544, 0xf0a60544) at netbsd:arptimer+0x6c
 > arptimer(0xf0b88480, 0xf0a60540, 0xf0b8c808, 0x0, 0xf0b8852c, 0xf0488800) at netbsd:callout_softclock+0x154
 > callout_softclock(0xf04712a4, 0x1000000, 0x10000, 0xf04712a0, 0xf0b88480, 0xf00a61d8) at netbsd:softint_thread+0x94
 > softint_thread(0xf0a60540, 0x3000, 0x2000, 0x0, 0x0, 0xf9ac0218) at netbsd:lwp_trampoline+0x8
 > db>
 >
 > Machine is SPARCstation 5, 110MHz, 256MB RAM.  Operating diskless.
 > (NetBSD-7.0_RC3 on local disk)
 >
 > I hope to confirm this observation on another system, but it is
 > engaged in another task at this time.
 >>How-To-Repeat:
 > Build sparc release from 201509010100 or later and boot GENERIC.
 >>Fix:
 >
 
 I investigated where it happens:
 
 ----
 $ ~/git/netbsd-src/work.tools/sparc--netbsdelf/bin/nm -n \
     work.sparc/sys/arch/sparc/compile/GENERIC/netbsd | grep arptimer
 f00a61d8 t arptimer
 $ ruby -e 'puts (0xf00a61d8 + 0x6c).to_s(16)'
 f00a6244
 $ ~/git/netbsd-src/work.tools/sparc--netbsdelf/bin/objdump -d -S \
     work.sparc/sys/arch/sparc/compile/GENERIC/netbsd.gdb | grep -10 f00a6244
         ifp = lle->lle_tbl->llt_ifp;
 f00a6234:       c2 06 20 40     ld  [ %i0 + 0x40 ], %g1
 
         callout_stop(&lle->la_timer);
 f00a6238:       90 10 00 1b     mov  %i3, %o0
 f00a623c:       40 03 34 68     call  f01733dc <callout_stop>
 f00a6240:       f4 00 60 10     ld  [ %g1 + 0x10 ], %i2
 
         /* XXX: LOR avoidance. We still have ref on lle. */
         LLE_WUNLOCK(lle);
 f00a6244:       40 02 f7 63     call  f0163fd0 <rw_exit>
 f00a6248:       90 10 00 1c     mov  %i4, %o0
 /*
  * Free an arp entry.
  */
 static void arptfree(struct llentry *la)
 {
         struct rtentry *rt = la->la_rt;
 f00a624c:       f6 06 20 b0     ld  [ %i0 + 0xb0 ], %i3
 
         KASSERT(rt != NULL);
 ----
 
 Hmm, so arptimer+0x6c is the call to rw_exit made by LLE_WUNLOCK(lle).
 Does the fault happen at that call, or just before or after it?
 I'm not familiar with sparc, so my investigation may be wrong.
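 
 The address arithmetic above can be cross-checked with a short Ruby
 sketch. It uses only numbers already quoted in this message: the nm
 address of arptimer, the arptimer+0x6c frame from the backtrace, and
 the pc= value from the GENERIC "data fault" line. The constant and
 method names are made up for illustration, and the sketch assumes the
 kernel disassembled here is the same build as the reporter's GENERIC.

```ruby
# Values copied from the GENERIC panic output and the nm listing above.
ARPTIMER_START = 0xf00a61d8 # address of arptimer reported by `nm -n`
FRAME_OFFSET   = 0x6c       # from "arptimer+0x6c" in the backtrace
FAULT_PC       = 0xf00a626c # pc= value in the "data fault" line

# Format an address the way nm and objdump print it (lowercase hex, no 0x).
def to_hex(value)
  value.to_s(16)
end

# Offset of an absolute address from the start of a symbol.
def offset_from(symbol_start, address)
  address - symbol_start
end

frame_address = ARPTIMER_START + FRAME_OFFSET
fault_offset  = offset_from(ARPTIMER_START, FAULT_PC)

puts "arptimer+0x#{to_hex(FRAME_OFFSET)} = #{to_hex(frame_address)}"
puts "fault pc = arptimer+0x#{to_hex(fault_offset)}"
```

 Under that assumption the reported fault pc works out to
 arptimer+0x94, which is past the last instruction shown in the
 excerpt (f00a624c), so it may be worth disassembling a little further
 than the rw_exit call.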
 
 Thanks,
   ozaki-r
 

