kern/41765: kernel panic while allocating swap space pages

>Number:         41765
>Category:       kern
>Synopsis:       kernel panic while allocating swap space pages
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jul 22 12:00:00 +0000 2009
>Originator:     Wolfgang Stukenbrock
>Release:        NetBSD 4.0
>Organization:
Dr. Nagler & Company GmbH
>Environment:
System: NetBSD s012 4.0 NetBSD 4.0 (NSW-S012) #9: Fri Mar 13 12:31:52 CET 2009 wgstuken@s012:/usr/src/sys/arch/amd64/compile/NSW-S012 amd64
Architecture: x86_64
Machine: amd64
>Description:
        There is a limitation in the blist implementation in kern/subr_blist.c:
        it cannot allocate more pages in a single call than BLIST_BMAP_RADIX.
        blist.h exports this limit as BLIST_MAX_ALLOC.
        uvm_swap_alloc() in uvm/uvm_swap.c does not check the requested number
        of slots against this limit, so if, e.g., the pagedaemon tries to swap
        out more than that at once, the system panics with
        "panic: blist_meta_alloc: allocation too large".
        Remark: this only happens if MAXPHYS is larger than the space covered
        by BLIST_MAX_ALLOC pages. Normally this is not the case (64k < 128k).
        We have changed MAXPHYS on this system to something more suitable for
        a backup server, because a 64k block size is much too small; we use 1MB.
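        The size limit can be checked with a little arithmetic. The user-space
        sketch below assumes 4 KB pages (as on amd64) and BLIST_MAX_ALLOC = 32,
        the value implied by "64k < 128k" above; one swap slot holds one page.
        The helpers slots_for_cluster() and would_panic() are hypothetical and
        not part of NetBSD.

```c
#include <stdbool.h>

/* Assumptions for this sketch, not taken from the kernel headers:
 * 4 KB pages and BLIST_MAX_ALLOC * page size == 128k. */
#define ASSUMED_PAGE_SIZE 4096L
#define BLIST_MAX_ALLOC   32

/* Number of swap slots (one per page) needed for maxphys bytes,
 * rounded up to whole pages. */
int slots_for_cluster(long maxphys)
{
    return (int)((maxphys + ASSUMED_PAGE_SIZE - 1) / ASSUMED_PAGE_SIZE);
}

/* True if passing this many slots unclamped to blist_alloc() would
 * exceed BLIST_MAX_ALLOC and hit the "allocation too large" panic. */
bool would_panic(long maxphys)
{
    return slots_for_cluster(maxphys) > BLIST_MAX_ALLOC;
}
```

        Under these assumptions the default 64k MAXPHYS needs 16 slots and
        stays within the limit, while a 1MB MAXPHYS needs 256 slots and would
        trigger the panic.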
>How-To-Repeat:
        Compile a kernel with MAXPHYS larger than 128k and allocate a large
        amount of memory in tmpfs. The system will try to allocate swap slots
        for MAXPHYS bytes in a single call and will panic.
>Fix:
        The fix is easy: check the number of pages in uvm_swap_alloc() before
        calling blist_alloc().
        Because uvm_swap_alloc() may fail and its callers must be able to
        handle that, we can abort the allocation if the caller does not accept
        fewer pages than requested (lessok is false).
        Otherwise we just clamp the request to BLIST_MAX_ALLOC.
        The fix below runs fine on our system; we have had no further problems
        with the pagedaemon.
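        As a user-space model of the check (not the kernel code itself), the
        hypothetical helper clamp_swap_request() below mirrors the patched
        logic: it returns 0 when the request is too large and lessok is false,
        and otherwise clamps *nslots. BLIST_MAX_ALLOC = 32 is again an assumed
        value; in the kernel it comes from blist.h.

```c
#include <stdbool.h>

/* Assumed value for this sketch; the kernel takes it from blist.h. */
#define BLIST_MAX_ALLOC 32

/*
 * Hypothetical model of the check added to uvm_swap_alloc().
 * Returns 0 ("allocation failed") if the request is too large and the
 * caller cannot accept fewer slots; otherwise clamps *nslots if needed
 * and returns 1, meaning "go on and call blist_alloc()".
 */
int clamp_swap_request(int *nslots, bool lessok)
{
    if (*nslots > BLIST_MAX_ALLOC) {
        if (!lessok)
            return 0;               /* caller must handle the failure */
        *nslots = BLIST_MAX_ALLOC;  /* hand out fewer slots than asked */
    }
    return 1;
}
```

        A caller that passes lessok = true must continue with the updated
        *nslots, not with its original request.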
        Remark: I assume that *nslots can be changed without holding the
        uvm.swap_data_lock lock. Otherwise this code must be moved after the
        simple_lock() call that comes two lines below the end of the diff, and
        the lock must be released before the "return 0;".
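        If the check did need the lock, the early return would have to release
        it first. The sketch below models that alternative placement in user
        space, with a pthread mutex standing in for simple_lock() on
        uvm.swap_data_lock; the function name swap_alloc_locked_check() and the
        value of BLIST_MAX_ALLOC are assumptions.

```c
#include <pthread.h>
#include <stdbool.h>

#define BLIST_MAX_ALLOC 32  /* assumed value */

/* Stand-in for uvm.swap_data_lock (the kernel uses simple_lock()). */
static pthread_mutex_t swap_data_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical model of the check placed after the lock is taken.
 * Every exit path must drop the lock, including the early "return 0".
 */
int swap_alloc_locked_check(int *nslots, bool lessok)
{
    pthread_mutex_lock(&swap_data_lock);        /* simple_lock() */
    if (*nslots > BLIST_MAX_ALLOC) {
        if (!lessok) {
            pthread_mutex_unlock(&swap_data_lock); /* release before return 0 */
            return 0;
        }
        *nslots = BLIST_MAX_ALLOC;
    }
    /* ... blist_alloc() would be called here, still under the lock ... */
    pthread_mutex_unlock(&swap_data_lock);      /* simple_unlock() */
    return 1;
}
```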

        Of course it would be better to fix the allocation problem in
        subr_blist.c, but this is a small and effective workaround.

        The following patch should be applied to uvm/uvm_swap.c:
--- uvm_swap.c  2009/07/22 11:30:38     1.1
+++ uvm_swap.c  2009/07/22 11:39:44
@@ -1447,6 +1447,14 @@
 	 */
 	if (uvmexp.nswapdev < 1)
 		return 0;
+	if (*nslots > BLIST_MAX_ALLOC) {
+		/* avoid panic in blist_alloc() below - cannot handle this amount of blocks
+		 * see comment in subr_blist.c
+		 */
+		if (!lessok) return 0; /* sorry - cannot handle this ... */
+		*nslots = BLIST_MAX_ALLOC;
+	}
+
 
 	/*
 	 * lock data lock, convert slots into blocks, and enter loop


