Re: Panic on evbarm triggered by dumpfs

To: Petri Laakso <petri.laakso%asd.fi@localhost>
Subject: Re: Panic on evbarm triggered by dumpfs
From: Eduardo Horvath <eeh%NetBSD.org@localhost>
Date: Wed, 15 Jan 2014 21:28:17 +0000 (UTC)

On Wed, 15 Jan 2014, Petri Laakso wrote:

> On Wed, 15 Jan 2014 19:05:47 +0000 (UTC)
> Eduardo Horvath <eeh%NetBSD.org@localhost> wrote:
> 
> > On Wed, 15 Jan 2014, Petri Laakso wrote:
> > 
> > > On Wed, 15 Jan 2014 16:43:39 +0000 (UTC)
> > > Eduardo Horvath <eeh%NetBSD.org@localhost> wrote:
> > > 
> > > > Can you toggle DIAGNOSTIC and see if there is a change in the behavior? 
> > > >  
> > > > If you hit the first panic, disabling DIAGNOSTIC will make it go away
> > > > (although things may crash a different way later).  If you hit the
> > > > second panic, DIAGNOSTIC may give more useful information about
> > > > something going wrong earlier.
> > > 
> > > Thank you!
> > > 
> > > Here is the panic with options DIAGNOSTIC. I'll try to understand the
> > > problem and provide more information before the weekend...
> > > 
> > > # panic: pool_get(pvepl): free list modified: magic=0; page 0xc82cc000; item addr 0xc82cc1e0
> > > 
> > > Stopped in pid 11.1 (sh) at     netbsd:cpu_Debugger+0x4:        bx      r14
> > > db> bt
> > > 0xcbab7d70: netbsd:vpanic+0x10
> > > 0xcbab7d88: netbsd:printf_nolog
> > > 0xcbab7dc8: netbsd:pool_get+0x548
> > > 0xcbab7e1c: netbsd:pmap_enter+0x770
> > > 0xcbab7f48: netbsd:uvm_fault_internal+0xe50
> > > 0xcbab7fac: netbsd:prefetch_abort_handler+0x174
> > > 
> > > # from ddb show pools
> > > POOL pvepl: size 16, align 4, ioff 0, roflags 0x00000040
> > >         alloc 0xc044e1f0
> > >         minitems 512, minpages 3, maxpages 4294967295, npages 6
> > >         itemsperpage 254, nitems 539, nout 985, hardlimit 4294967295
> > >         nget 3691, nfail 0, nput 2706
> > >         npagealloc 6, npagefree 0, hiwat 6, nidle 2
> > 
> > In this case it looks like the pool page header is ok but one of the free 
> > items has been stepped on.  Maybe it was accessed after it was freed.
> > 
> > I would suggest turning on DEBUG as well since it will add additional 
> > checking.
> > 
> > There are three modifiers to the ddb `show pool' command, `l', `c', and 
> > `p' to display the log, cache entries, and pages.  Maybe there's some 
> > interesting information there.  Can you try all three?  Printing the page 
> > list should dump the headers for each page.
> 
> options DEBUG on
> 
> # panic: pool_get(pvepl): free list modified: magic=0; page 0xc82c2000; item addr 0xc82c21e0
> 
> Stopped in pid 13.1 (sh) at     netbsd:cpu_Debugger+0x4:        bx      r14
> 
> show pools/c
> ...
> POOL pvepl: size 16, align 4, ioff 0, roflags 0x00000040
>         alloc 0xc044e1f4
>         minitems 512, minpages 3, maxpages 4294967295, npages 6
>         itemsperpage 254, nitems 539, nout 985, hardlimit 4294967295
>         nget 4269, nfail 0, nput 3284
>         npagealloc 6, npagefree 0, hiwat 6, nidle 2
> 
> show pools/l
> ...
> POOL pvepl: size 16, align 4, ioff 0, roflags 0x00000040
>         alloc 0xc044e1f4
>         minitems 512, minpages 3, maxpages 4294967295, npages 6
>         itemsperpage 254, nitems 539, nout 985, hardlimit 4294967295
>         nget 4269, nfail 0, nput 3284
>         npagealloc 6, npagefree 0, hiwat 6, nidle 2
> 
> show pools/p
> ...
> POOL pvepl: size 16, align 4, ioff 0, roflags 0x00000040
>         alloc 0xc044e1f4
>         minitems 512, minpages 3, maxpages 4294967295, npages 6
>         itemsperpage 254, nitems 539, nout 985, hardlimit 4294967295
>         nget 4269, nfail 0, nput 3284
>         npagealloc 6, npagefree 0, hiwat 6, nidle 2
> 
>         empty page list:
>                 page 0xcac44000, nmissing 0, time 1
>                 page 0xcac43000, nmissing 0, time 1
> 
>         full page list:
>                 page 0xc82ba000, nmissing 254, time 7
>                 page 0xc82ab000, nmissing 254, time 6
>                 page 0xcac42000, nmissing 254, time 1
> 
>         partial-page list:
>                 page 0xc82c2000, nmissing 223, time 15
>                         item 0xc82c21e0, magic 0x0
>                         item 0xc82c21d0, magic 0x0
>         curpage 0xc82c2000


Looks like two free items on this page are corrupt.  
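
For anyone following along, the check that fires here can be modelled roughly 
as below.  This is only a userland sketch, not the kernel's pool code: the 
struct layout, the name count_corrupt_items() and the magic value are 
assumptions made for illustration.  The idea is that every free item carries a 
magic word, and an item whose magic word has changed was written to after it 
was freed.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed value for illustration; the real constant lives in the pool code. */
#define ITEM_MAGIC 0xdeaddeadU

/* Hypothetical layout of a free item: a magic word followed by a link. */
struct free_item {
	uint32_t magic;
	struct free_item *next;
};

/*
 * Walk a free list and count the items whose magic word no longer matches.
 * A nonzero result means something wrote to an item after it was freed.
 */
static size_t
count_corrupt_items(const struct free_item *head)
{
	size_t bad = 0;

	for (const struct free_item *it = head; it != NULL; it = it->next) {
		if (it->magic != ITEM_MAGIC)
			bad++;
	}
	return bad;
}
```

With two items whose magic reads 0, as in the page list above, such a walk 
would report two corrupt entries.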


> 
> 
> > You have the address of the page and the item.  Try to dump the item and 
> > some of the surrounding memory.  I think it should be 16 bytes.  If we can 
> > see some pattern, such as a segment of memory that's been zero-ed, it 
> > might tell us something useful.  And depending on the results of dumping 
> > the page header, you might want to do the same thing for the header.
> 
> Item @ 0xc82c21e0
> 
> db> x/m 0xc82c21a0,20                                                   
> c82c21a0:       00000000 00000000 00000000 00000000     ................
> c82c21b0:       00000000 00000000 00000000 00000000     ................
> c82c21c0:       00000000 00000000 00000000 00000000     ................
> c82c21d0:       00000000 00000000 00000000 00000000     ................
> c82c21e0:       00000000 d0212cc8 fc2f2cc8 00000000     .....!,../,.....
> c82c21f0:       b0a62bc8 002026c8 00b01440 12000000     ..+.. &....@....
> c82c2200:       d0ab2bc8 002026c8 00f01440 12000000     ..+.. &....@....
> c82c2210:       50a52bc8 002026c8 00e01440 12000000     P.+.. &....@....

Looks like the stuff before c82c21e4 has been zeroed out.
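
To make that reading of the dump reproducible, here is a small sketch.  It 
assumes the words are printed in memory byte order (the ASCII column suggests 
so) and that the machine is little-endian; the helper names and the row_21e0 
array are mine, and the array is simply the row at c82c21e0 copied from the 
dump above.

```c
#include <stddef.h>
#include <stdint.h>

/* The 16 bytes shown at c82c21e0 in the dump, in memory order. */
static const uint8_t row_21e0[16] = {
	0x00, 0x00, 0x00, 0x00, 0xd0, 0x21, 0x2c, 0xc8,
	0xfc, 0x2f, 0x2c, 0xc8, 0x00, 0x00, 0x00, 0x00,
};

/* Return the offset of the first nonzero byte, or len if all are zero. */
static size_t
zero_prefix_len(const uint8_t *buf, size_t len)
{
	size_t i = 0;

	while (i < len && buf[i] == 0)
		i++;
	return i;
}

/* Assemble a 32-bit value from four bytes stored in little-endian order. */
static uint32_t
le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
```

For this row zero_prefix_len() gives 4, so the first surviving byte is at 
c82c21e4, and le32(row_21e0 + 4) gives 0xc82c21d0, which is the address of the 
other item reported with magic 0x0.  That would fit the list link being intact 
while the magic word in front of it was cleared.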

Looking at arch/arm/arm32/pmap.c, most of the memset() calls clear single 
pages; however, pmap_boot_pagealloc() does seem to have some interesting 
calls to memset() with pv entries and arbitrary sizes.  It looks like 
pmap_boot_pagealloc() should not be called after the VM subsystem has been 
initialized; if it is called after that, it may cause problems.  I'd 
recommend adding an assert in there that uvm.page_init_done must be false.
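
In the kernel this would presumably be a single KASSERT(!uvm.page_init_done) 
near the top of pmap_boot_pagealloc(); I have not tested that.  The userland 
sketch below only illustrates the shape of the check.  Everything in it 
(page_init_done, boot_arena, boot_pagealloc()) is a hypothetical stand-in, not 
the real pmap code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for uvm.page_init_done. */
static bool page_init_done = false;

/* Hypothetical bump allocator standing in for the boot-time allocator. */
static unsigned char boot_arena[4096];
static size_t boot_used = 0;

static void *
boot_pagealloc(size_t size)
{
	void *p;

	/* Boot-time allocation is only legal before the VM system is up. */
	assert(!page_init_done);

	if (size > sizeof(boot_arena) - boot_used)
		return NULL;
	p = &boot_arena[boot_used];
	boot_used += size;
	return p;
}
```

A call made after page_init_done becomes true then stops at the assertion 
rather than quietly handing out (and clearing) memory that is already in use.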

Beyond that I'm not sure what to recommend.  Something definitely looks 
like it's stomping on free pv entries, but figuring out exactly what is 
doing that will be very difficult.

Eduardo
