Current-Users archive


Crash on -current in pool_drain()



Under heavy load, after several hours of building packages, I am
seeing the following crash.  I'm doing a bisect to narrow it down further,
but it was already happening at least a week ago, with the kernel and all
modules built from sources updated on 2015-10-13 at 08:30:00 UTC.

(This is on amd64)

Here's the backtrace from gdb:

(gdb) bt
#0  0xffffffff801196a5 in cpu_reboot (howto=howto@entry=256,
    bootstr=bootstr@entry=0x0)
    at /build/netbsd-local/src/sys/arch/amd64/amd64/machdep.c:671
#1  0xffffffff80237fd6 in db_sync_cmd (addr=<optimized out>,
    have_addr=<optimized out>, count=<optimized out>,
    modif=<optimized out>)
    at /build/netbsd-local/src/sys/ddb/db_command.c:1359
#2  0xffffffff80238797 in db_command (
    last_cmdp=last_cmdp@entry=0xffffffff806b90a0 <db_last_command>)
    at /build/netbsd-local/src/sys/ddb/db_command.c:908
#3  0xffffffff80238ade in db_command_loop ()
    at /build/netbsd-local/src/sys/ddb/db_command.c:566
#4  0xffffffff8023c16d in db_trap (type=type@entry=6, code=code@entry=0)
    at /build/netbsd-local/src/sys/ddb/db_trap.c:90
#5  0xffffffff80116130 in kdb_trap (type=type@entry=6,
    code=code@entry=0, regs=regs@entry=0xfffffe810f528ce0)
    at /build/netbsd-local/src/sys/arch/amd64/amd64/db_interface.c:227
#6  0xffffffff8011a82f in trap (frame=0xfffffe810f528ce0)
    at /build/netbsd-local/src/sys/arch/amd64/amd64/trap.c:287
#7  0xffffffff80100fde in alltraps ()
#8  0xffffffff80333415 in pool_drain (ppp=ppp@entry=0xfffffe810f528e30)
    at /build/netbsd-local/src/sys/kern/subr_pool.c:1429
#9  0xffffffff802d1791 in uvm_pageout (arg=<optimized out>)
    at /build/netbsd-local/src/sys/uvm/uvm_pdaemon.c:343
#10 0xffffffff80100807 in lwp_trampoline ()
#11 0x0000000000000000 in ?? ()
(gdb) fr 8
#8  0xffffffff80333415 in pool_drain (ppp=ppp@entry=0xfffffe810f528e30)
    at /build/netbsd-local/src/sys/kern/subr_pool.c:1429
1429                    if (drainpp == NULL) {
(gdb) disass pool_drain
Dump of assembler code for function pool_drain:
   0xffffffff803333da <+0>:     push   %rbp
   0xffffffff803333db <+1>:     mov    %rsp,%rbp
   0xffffffff803333de <+4>:     push   %r13
   0xffffffff803333e0 <+6>:     push   %r12
   0xffffffff803333e2 <+8>:     push   %rbx
   0xffffffff803333e3 <+9>:     mov    %rdi,%r12
   0xffffffff803333e6 <+12>:    cmpq   $0x0,0x38d732(%rip)        # 0xffffffff806c0b20 <pool_head>
   0xffffffff803333ee <+20>:    je     0xffffffff803334a2 <pool_drain+200>
   0xffffffff803333f4 <+26>:    mov    $0xffffffff80732f98,%rdi
   0xffffffff803333fb <+33>:    callq  0xffffffff8011bbc0 <mutex_enter>
   0xffffffff80333400 <+38>:    mov    0x38d719(%rip),%rcx        # 0xffffffff806c0b20 <pool_head>
   0xffffffff80333407 <+45>:    mov    0x3ffb92(%rip),%rax        # 0xffffffff80732fa0 <drainpp>
   0xffffffff8033340e <+52>:    xor    %ebx,%ebx
   0xffffffff80333410 <+54>:    test   %rax,%rax
   0xffffffff80333413 <+57>:    je     0xffffffff8033342a <pool_drain+80>
=> 0xffffffff80333415 <+59>:    mov    (%rax),%rdx
   0xffffffff80333418 <+62>:    mov    %rax,%rbx
   0xffffffff8033341b <+65>:    mov    0x58(%rbx),%eax
   0xffffffff8033341e <+68>:    test   %eax,%eax
   0xffffffff80333420 <+70>:    jne    0xffffffff80333442 <pool_drain+104>
   0xffffffff80333422 <+72>:    mov    %rdx,%rax
   0xffffffff80333425 <+75>:    test   %rax,%rax
...
(gdb) list pool_drain
1413     *
1414     * Note, must never be called from interrupt context.
1415     */
1416    bool
1417    pool_drain(struct pool **ppp)
1418    {
1419            bool reclaimed;
1420            struct pool *pp;
1421
1422            KASSERT(!TAILQ_EMPTY(&pool_head));
1423
1424            pp = NULL;
1425
1426            /* Find next pool to drain, and add a reference. */
1427            mutex_enter(&pool_head_lock);
1428            do {
1429                    if (drainpp == NULL) {
1430                            drainpp = TAILQ_FIRST(&pool_head);
1431                    }
1432                    if (drainpp != NULL) {
1433                            pp = drainpp;
1434                            drainpp = TAILQ_NEXT(pp, pr_poollist);
1435                    }
1436                    /*
1437                     * Skip completely idle pools.  We depend on at least
1438                     * one pool in the system being active.
1439                     */
1440            } while (pp == NULL || pp->pr_npages == 0);
1441            pp->pr_refcnt++;
1442            mutex_exit(&pool_head_lock);

Interestingly, the symbol pp doesn't seem to be available in this frame,
even though it (partially) controls the enclosing do ... while loop.

(gdb) print *pp
No symbol "pp" in current context.

The routine's argument doesn't seem to exist here, either:

(gdb) print **ppp
No symbol "ppp" in current context.

drainpp has the same value as pool_head.tqh_last, so it seems to point
to the end of the TAILQ at pool_head:

(gdb) print drainpp
$1 = (struct pool *) 0xffffffff8099fb40
(gdb) print pool_head
$2 = {tqh_first = 0xffffffff80724880 <uvm_amap_cache>,
  tqh_last = 0xffffffff8099fb40}

I'm not good enough at x86 assembler to decode much further...

Anyone got a clue?


+------------------+--------------------------+-------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:       |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com    |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
+------------------+--------------------------+-------------------------+

