Re: Panic on evbarm triggered by dumpfs

To: Petri Laakso <petri.laakso%asd.fi@localhost>
Subject: Re: Panic on evbarm triggered by dumpfs
From: Eduardo Horvath <eeh%NetBSD.org@localhost>
Date: Wed, 15 Jan 2014 16:43:39 +0000 (UTC)

On Wed, 15 Jan 2014, Petri Laakso wrote:

> I was able to do that on second try. lots of output below.
> Yesterday I disabled logging, but it didn't change anything.
> 
> Now when I tried hard, I was able to crash the system without shutdown.
> 
> # Second try @ single user read-only
> 
> [repeated calls to dumpfs, only picked up interesting ones below]
> # dumpfs /dev/rld0a
> dumpfs: /dev/rld0a: could not find superblock, skipped
> # dumpfs /dev/rld0a
> [1]   Bus error               dumpfs /dev/rld0a
> # dumpfs /dev/rld0a
> dumpfs: (null): could not find superblock, skipped
> # dumpfs /dev/rld0a
> dumpfs: /dev/rld0a: could not find superblock, skipped
> # scan_ffs /dev/rld0a
> Disk: STORAGE DEVICE  fictitious
> Total sectors on disk: 3932160
> 
> panic: pool_get: pvepl: page empty
> Stopped in pid 46.1 (scan_ffs) at       netbsd:cpu_Debugger+0x4:        bx      r14
> db> bt
> 0xcbab9c1c: netbsd:vpanic+0x10
> 0xcbab9c34: netbsd:printf_nolog
> 0xcbab9c6c: netbsd:pool_get+0x304
> 0xcbab9cc8: netbsd:pmap_enter+0x748
> 0xcbab9d00: netbsd:vmapbuf+0xbc
> 0xcbab9d60: netbsd:physio+0x28c
> 0xcbab9d80: netbsd:ldread+0x40
> 0xcbab9da0: netbsd:cdev_read+0x40
> 0xcbab9e04: netbsd:spec_read+0x6c
> 0xcbab9e14: netbsd:ufsspec_read+0x44
> 0xcbab9e3c: netbsd:VOP_READ+0x38
> 0xcbab9e64: netbsd:vn_read+0x84
> 0xcbab9eb4: netbsd:dofileread+0x84
> 0xcbab9eec: netbsd:sys_pread+0xa0
> 0xcbab9f80: netbsd:syscall+0x88
> 0xcbab9fac: netbsd:swi_handler+0x9c
> db>

I don't think this really has much to do with the filesystem, other than 
that it triggers the latent problem.

This is coming from pmap_enter(), which is trying to allocate a pv structure 
to hold the physical->virtual mapping information for a page that is 
probably being added to the kernel pmap.  The pool used to allocate the pv 
entries (pvepl) has found an inconsistency in its own state.  There are two 
places in pool_get() where it will panic with this message:

        if (pp->pr_roflags & PR_NOTOUCH) {
#ifdef DIAGNOSTIC
                /* First panic: only compiled in with DIAGNOSTIC. */
                if (__predict_false(ph->ph_nmissing == pp->pr_itemsperpage)) {
                        mutex_exit(&pp->pr_lock);
                        panic("pool_get: %s: page empty", pp->pr_wchan);
                }
#endif
                v = pr_item_notouch_get(pp, ph);
        } else {
                /* Second panic: always compiled in; free list is empty. */
                v = pi = LIST_FIRST(&ph->ph_itemlist);
                if (__predict_false(v == NULL)) {
                        mutex_exit(&pp->pr_lock);
                        panic("pool_get: %s: page empty", pp->pr_wchan);
                }
#ifdef DIAGNOSTIC
                if (__predict_false(pp->pr_nitems == 0)) {
                        mutex_exit(&pp->pr_lock);
                        printf("pool_get: %s: items on itemlist, nitems %u\n",
                            pp->pr_wchan, pp->pr_nitems);
                        panic("pool_get: nitems inconsistent");
                }
#endif
                /* ... rest of the else branch omitted ... */
        }

Unfortunately, both have the same panic string, so it's difficult to tell 
them apart.  I think PR_NOTOUCH is probably not set for this pool, so it's 
likely the second panic.

Can you toggle DIAGNOSTIC and see if there is a change in the behavior?  
If you hit the first panic, disabling DIAGNOSTIC will make it go away
(although things may crash a different way later).  If you hit the second 
panic, DIAGNOSTIC may give more useful information about something going 
wrong earlier.

Anyway, let's assume you're hitting the second panic.  A pool should 
maintain a pointer to a page that has free entries.  That page has a pool 
header, which holds a list of the free entries on that page.  In this case 
the list is empty, probably because something has stomped on the page in 
question.
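To make that failure mode concrete, here is a minimal userland sketch of a 
page whose free items are threaded onto a list in a per-page header, 
loosely modelled on the LIST_FIRST(&ph->ph_itemlist) path above.  The 
names (struct item, struct page_hdr, page_init, page_get, NITEMS) are 
hypothetical and far simpler than the real pool code; the only point is 
that a stray write over the list head makes a page that still claims free 
items look empty.

```c
#include <assert.h>
#include <stddef.h>

#define NITEMS 8        /* hypothetical items per page */

/* Hypothetical item: while free, its first word links to the next free one. */
struct item {
        struct item *next;
        char payload[24];
};

/* Hypothetical page header: head of the free list plus a free-item count. */
struct page_hdr {
        struct item *freelist;
        unsigned nfree;
};

/* Thread every item of the page onto the header's free list. */
static void
page_init(struct page_hdr *ph, struct item *page)
{
        assert(ph != NULL && page != NULL);
        ph->freelist = NULL;
        ph->nfree = 0;
        for (int i = NITEMS - 1; i >= 0; i--) {
                page[i].next = ph->freelist;
                ph->freelist = &page[i];
                ph->nfree++;
        }
}

/*
 * Take one item off the page.  Where the kernel would panic with
 * "page empty", this sketch just returns NULL.
 */
static void *
page_get(struct page_hdr *ph)
{
        struct item *it = ph->freelist;

        if (it == NULL)
                return NULL;
        ph->freelist = it->next;
        ph->nfree--;
        return it;
}
```

For example, after page_init() and one page_get(), setting ph.freelist to 
NULL (standing in for something stomping on the header) makes the next 
page_get() return NULL even though ph.nfree still says 7 items are free, 
which is the kind of inconsistency the v == NULL check catches.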

The ARM pv pool is special in the sense that it has a custom page 
allocator so it can be used before the VM subsystem is initialized.  
Here's the routine in question:

static void *
pmap_bootstrap_pv_page_alloc(struct pool *pp, int flags)
{
        extern void *pool_page_alloc(struct pool *, int);
        vaddr_t new_page;
        void *rv;

        if (pmap_initialized)
                return (pool_page_alloc(pp, flags));

        if (free_bootstrap_pages) {
                rv = free_bootstrap_pages;
                free_bootstrap_pages = *((void **)rv);
                return (rv);
        }

        new_page = uvm_km_alloc(kernel_map, PAGE_SIZE, 0,
            UVM_KMF_WIRED | ((flags & PR_WAITOK) ? 0 : UVM_KMF_NOWAIT));

        KASSERT(new_page > last_bootstrap_page);
        last_bootstrap_page = new_page;
        return ((void *)new_page);
}

It may be that the page in question is one of the bootstrap pages, that 
the allocator has lost track of it, and that it is now being used for 
something else.
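The free_bootstrap_pages chain above is an intrusive list: the first word 
of each free page holds the pointer to the next free page.  The sketch 
below reproduces the pop from pmap_bootstrap_pv_page_alloc() in userland 
and pairs it with an assumed push side that stores the old head in the 
page's first word; bootstrap_page_free, bootstrap_page_alloc, free_pages 
and SKETCH_PAGE_SIZE are hypothetical names, not the pmap code.  It shows 
why a page that is still on the chain but also in use elsewhere is so 
destructive: one write to its first word redirects the whole chain.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for the sketch; the kernel code uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096

/* Head of the chain of free pages; plays the role of free_bootstrap_pages. */
static void *free_pages;

/*
 * Hypothetical "free" side: the page's first word becomes the link to
 * the previous head, so the chain needs no memory of its own.
 */
static void
bootstrap_page_free(void *page)
{
        assert(page != NULL);
        *(void **)page = free_pages;
        free_pages = page;
}

/* "Alloc" side: the same pop as the free_bootstrap_pages branch above. */
static void *
bootstrap_page_alloc(void)
{
        void *rv = free_pages;

        if (rv != NULL)
                free_pages = *(void **)rv;
        return rv;
}
```

As a usage example, free two pages so the chain is B -> A, then have 
someone who still holds page B write an unrelated pointer C into its first 
word.  The allocator then hands out B, then C (which was never freed), and 
then reports the chain as exhausted: page A has been lost and page C is now 
in use twice.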

Anyway, if you can figure out the address of the pool header that's 
causing the problem and dump its contents, maybe we can get some idea of 
what's stepping on that page.
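In ddb itself the examine command is the usual way to look at raw memory.  
For a debugging printf or a userland reproduction, a rough sketch of the 
arithmetic follows: round an item's address down to its page and print the 
words there.  It assumes the pool header lives on the same page as the 
items, which is not true for every pool, and page_base and dump_words are 
hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Round an address down to the start of its page (pagesz a power of two). */
static uintptr_t
page_base(uintptr_t addr, uintptr_t pagesz)
{
        assert(pagesz != 0 && (pagesz & (pagesz - 1)) == 0);
        return addr & ~(pagesz - 1);
}

/* Print nwords machine words starting at p, one "address: value" per line. */
static void
dump_words(FILE *fp, const void *p, size_t nwords)
{
        const uintptr_t *w = p;

        for (size_t i = 0; i < nwords; i++)
                fprintf(fp, "%p: %#jx\n", (const void *)&w[i], (uintmax_t)w[i]);
}
```

A freshly initialised header would show plausible list pointers and 
counts; pointers outside the kernel's address ranges or ASCII-looking 
words would hint at who is writing over the page.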

Eduardo

Follow-Ups:
- Re: Panic on evbarm triggered by dumpfs
  - From: Petri Laakso

References:
- Re: Panic on evbarm triggered by dumpfs
  - From: Christos Zoulas
- Re: Panic on evbarm triggered by dumpfs
  - From: Petri Laakso