tech-kern archive


Re: low memory problem - pool calls vmem calls uvm calls pool etc.



On 02/11/13 12:36, matthew green wrote:
> a sunblade 2500 with 4GB ram and one cpu hung earlier today. the
> stack trace of the active lwp from ddb shows me that there is a
> loop between pool/uvm/vmem in low memory condition.
>
> "show uvm" says that there is only 1 free page, and the bt is:
>
> db{0}> bt
> intr_list_handler(59c86d0, a, e0017ed0, 330, 1203f20, 0) at netbsd:intr_list_handler+0x10
> sparc_interrupt(0, a, 400cc000, 575dec0, 0, 0) at netbsd:sparc_interrupt+0x22c
> mutex_spin_enter(18aac80, ff070000000001, ffffffffffffffff, 4000001, 0, 0) at netbsd:mutex_spin_enter+0xa0
> bt_refill(18abd18, 1002, ff070000000001, 874061e8, 0, 0) at netbsd:bt_refill+0x100
> vmem_xalloc(18abd18, 2000, 2000, 0, 0, 0) at netbsd:vmem_xalloc+0x6c
> vmem_alloc(18abd18, 2000, 1002, 87405d68, 0, 0) at netbsd:vmem_alloc+0x94
> pool_page_alloc_meta(18a93f0, 2, ff070000000001, 87406aa8, 0, 0) at netbsd:pool_page_alloc_meta+0x2c
> pool_grow(18a93f0, 2, 2000, 0, 0, 0) at netbsd:pool_grow+0x1c
> pool_get(18a94a8, 2, ff070000000001, 330, 0, 0) at netbsd:pool_get+0x3c
> pool_cache_put_slow(18ad840, a, 400cc000, 575dec0, 0, 0) at netbsd:pool_cache_put_slow+0x160
> pool_cache_put_paddr(18ad600, 400cc000, ffffffffffffffff, 4000001, 0, 0) at netbsd:pool_cache_put_paddr+0xa4
> [ this repeats 30 more times
> uvm_km_kmem_alloc(c, 2000, 0, 87411628, 0, 0) at netbsd:uvm_km_kmem_alloc+0x104
> vmem_xalloc(18abd18, 18abfb0, 2000, 0, 0, 0) at netbsd:vmem_xalloc+0x8ac
> vmem_alloc(18abd18, 2000, 1002, 874117c8, ff7fffff, ffdfffff) at netbsd:vmem_alloc+0x94
> pool_page_alloc_meta(18a93f0, 2, ff070000000001, 13, 0, 0) at netbsd:pool_page_alloc_meta+0x2c
> pool_grow(18a93f0, 2, 59c2000, 13, 7d, 0) at netbsd:pool_grow+0x1c
> pool_get(18a94a8, 2, 59c2000, 5, 10c2660, ffffffffffffffff) at netbsd:pool_get+0x3c
> pool_cache_put_slow(57b7780, 0, 28f50940, 4, 0, 0) at netbsd:pool_cache_put_slow+0x160
> pool_cache_put_paddr(57b7540, 28f50940, ffffffffffffffff, 201b, 0, 0) at netbsd:pool_cache_put_paddr+0xa4
> ]
> ffs_reclaim(0, 59c2000, 59c2000, 0, 0, 0) at netbsd:ffs_reclaim+0xec
> VOP_RECLAIM(28f54e70, 1, 0, 59c2000, 0, 0) at netbsd:VOP_RECLAIM+0x28
> vclean(28f54e70, 8, 0, 0, ff7fffff, ffdfffff) at netbsd:vclean+0x134
> cleanvnode(1884500, 0, 64, 6, 28f54e94, 1884500) at netbsd:cleanvnode+0xc4
> vdrain_thread(59c2000, 59c2000, 0, 1c05d38, 7d, 0) at netbsd:vdrain_thread+0x90
> lwp_trampoline(f005d730, 113800, 113c00, 111880, 111ce0, 1117f8) at netbsd:lwp_trampoline+0x8

I believe I'm seeing pretty much the same problem, on a stock NetBSD 6.1 amd64 Xen domU. It usually happens during /etc/daily for some reason, but sadly not reliably, and not when I run /etc/daily in a tight loop. It does seem to happen every few days, though, and I can't drop into ddb when it does.

pool_cache_put_slow() has to allocate some administrative storage, which it does via vmem_alloc() and, from there, uvm_km_kmem_alloc(). uvm_km_kmem_alloc() first calls vmem_alloc() to get kernel virtual address space and then uvm_pagealloc() to back it with physical pages. If uvm_pagealloc() fails, it calls vmem_free() on the allocation it got from vmem_alloc() and returns ENOMEM.
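
For reference, here is roughly what that path looks like. This is a simplified paraphrase of uvm_km_kmem_alloc() in sys/uvm/uvm_km.c, not the literal source: the pmap and page-flag bookkeeping is elided, and the local variable names and the uvm_wait() wait channel string are illustrative only.

int
uvm_km_kmem_alloc(vmem_t *vm, vmem_size_t size, vm_flag_t flags,
    vmem_addr_t *addr)
{
        struct vm_page *pg;
        vmem_addr_t va;
        vaddr_t loopva;
        vsize_t left;
        int rc;

        /* First grab kernel virtual address space from the vmem arena... */
        rc = vmem_alloc(vm, size, flags, &va);
        if (rc != 0)
                return rc;

        /* ...then back it with physical pages, one page at a time. */
        loopva = va;
        left = size;
        while (left > 0) {
                pg = uvm_pagealloc(NULL, loopva, NULL, 0);
                if (pg == NULL) {
                        if ((flags & VM_SLEEP) != 0) {
                                uvm_wait("kmemalloc");  /* wait, retry same page */
                                continue;
                        }
                        /*
                         * No free pages and we may not sleep: unmap what we
                         * mapped so far and give the virtual space back.
                         * This vmem_free() is where the trouble starts.
                         */
                        uvm_km_pgremove_intrsafe(kernel_map, va, va + size);
                        vmem_free(kmem_va_arena, va, size);
                        return ENOMEM;
                }
                pmap_kenter_pa(loopva, VM_PAGE_TO_PHYS(pg),
                    VM_PROT_READ | VM_PROT_WRITE, PMAP_KMPAGE);
                loopva += PAGE_SIZE;
                left -= PAGE_SIZE;
        }

        *addr = va;
        return 0;
}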

The problem is that vmem_free() attempts to re-pool the allocation (when QCACHE is defined), which starts the whole process over again (see the call chain sketched after the function below).

void
vmem_free(vmem_t *vm, vmem_addr_t addr, vmem_size_t size)
{

        KASSERT(size > 0);

#if defined(QCACHE)
        if (size <= vm->vm_qcache_max) {
                int qidx = (size + vm->vm_quantum_mask) >> vm->vm_quantum_shift;
                qcache_t *qc = vm->vm_qcache[qidx - 1];

                pool_cache_put(qc->qc_cache, (void *)addr);
                return;
        }
#endif /* defined(QCACHE) */

        vmem_xfree(vm, addr, size);
}
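
Putting that together with the backtrace above, the loop closes like this (call chain reconstructed from the trace, so the exact intermediate steps are approximate):

vmem_free(kmem_va_arena, va, size)          /* ENOMEM path in uvm_km_kmem_alloc() */
 -> pool_cache_put(qc->qc_cache, ...)       /* QCACHE re-pools the freed VA       */
  -> pool_cache_put_slow()                  /* cache needs a fresh group (pcg)    */
   -> pool_get() -> pool_grow()             /* group pool is empty, must grow     */
    -> pool_page_alloc_meta()               /* needs a page for pool metadata     */
     -> vmem_alloc() -> vmem_xalloc()       /* which needs kernel VA again        */
      -> uvm_km_kmem_alloc()
       -> uvm_pagealloc() fails             /* "show uvm" reports 1 free page     */
        -> vmem_free(kmem_va_arena, ...)    /* and around we go again             */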

I'm going to try the patch below, which has the effect of never attempting to re-pool the freed allocation in the ENOMEM case.

Technically vmem_alloc() and vmem_xfree() should not be mixed, but in this case I see no functional problem with it: as the code above shows, vmem_xfree() is exactly what vmem_free() falls through to anyway for sizes above the quantum-cache limit, so the only difference is that the freed range bypasses the quantum cache and goes straight back to the arena. It's just awkward that the documentation and a sense of aesthetics tell us not to :)

--- sys/uvm/uvm_km.c.orig       2013-12-03 16:33:14.000000000 +1300
+++ sys/uvm/uvm_km.c    2013-12-03 16:34:16.000000000 +1300
@@ -787,7 +787,7 @@
                        } else {
                                uvm_km_pgremove_intrsafe(kernel_map, va,
                                    va + size);
-                               vmem_free(kmem_va_arena, va, size);
+                               vmem_xfree(kmem_va_arena, va, size);
                                return ENOMEM;
                        }
                }



