tech-kern archive


Re: netbsd-6: pagedaemon freeze when low on memory



On 2013-03-04 20:43, Richard Hansen wrote:
> Hi all,
> 
> I believe I have found a bug in the pagedaemon (uvm_pageout() in
> src/sys/uvm/uvm_pdaemon.c) that causes the system to freeze when the
> kmem_arena runs low (<10% free):
> 
>   1. line 254: uvm_km_va_starved_p() returns true
>   2. line 258: the !kmem_va_starved condition prevents the pagedaemon
> from sleeping
>   3. lines 330--346: no memory is freed -- it's all still in use
>   4. go to step #1
> 
> To reproduce the freeze:
> 
>   1. acquire an i386 system with 4GB of memory and lots of files in the
> filesystem
>   2. set the kern.maxvnodes sysctl as high as it will go
>   3. run 'du -skx /'
>   4. wait for the kernel to run low on memory (vnode allocations)
> 
> I believe the bug was introduced in this commit:
>   http://mail-index.netbsd.org/source-changes/2012/02/01/msg031411.html
>   http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/uvm/uvm_pdaemon.c#rev1.105
> 
> Attached is an initial attempt at fixing this.  The patch allows the
> pagedaemon to sleep if no memory was reclaimed the last time through the
> loop.  This is probably not a correct or complete fix; I don't yet have
> a comfortable understanding of the inner workings of the kernel.
> 
> With the patch applied, the pagedaemon no longer freezes.  However, LWPs
> start piling up in vmem_alloc() waiting for memory to become available.
>  So it seems like this change is necessary but not sufficient.
> 
> Thoughts?
> 
> Thanks,
> Richard
> 

Hi all,

the cause of this behavior is that kern.maxvnodes can be set too high
relative to the memory available in kmem.

Currently, at startup the size of the vnode cache is calculated as 10%
of physmem, capped at 20% of KVA (the kernel_map size).
see: http://nxr.netbsd.org/source/xref/src/sys/kern/init_main.c#447

This fits nicely, but the calculation is based on the kernel_map, while
the allocations are drawn from the kmem_arena, which is only a part of
the kernel_map.
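
The startup calculation can be sketched in userland C. This is a
simplified model based on the calc_cache_size() body visible in the
attached patch, not the kernel function itself. The name
model_cache_size and the use of uint64_t in place of the kernel types
are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified userland model of calc_cache_size() (hypothetical name).
 * Takes pct percent of physical memory and caps the result at va_pct
 * percent of vsize.  A vsize of 0 means "no VA cap".
 */
static uint64_t
model_cache_size(uint64_t physmem_pages, uint64_t page_size,
    uint64_t vsize, int pct, int va_pct)
{
	uint64_t t, cap;

	assert(pct >= 0 && pct <= 100);
	assert(va_pct >= 0 && va_pct <= 100);

	/* Share of physical memory, in bytes. */
	t = physmem_pages * (uint64_t)pct / 100 * page_size;

	if (vsize != 0) {
		/* Share of the virtual address range, in bytes. */
		cap = vsize * (uint64_t)va_pct / 100;
		if (t > cap)
			t = cap;
	}
	return t;
}
```

With 1048576 pages of 4096 bytes (4GB) and a hypothetical 1GB map,
pct=10 and va_pct=20 yield the VA cap, not the physmem share. The
result does not say whether the arena the allocations actually come
from is that large.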

The sysctl allows raising the vnode cache to 75% of physmem or 75% of
the kernel_map, whichever is smaller, which is too large on KVA-limited
archs with the default kmem size.
see: http://nxr.netbsd.org/xref/src/sys/kern/init_sysctl.c#965

Those calculations assume a cost of 2k per vnode, so there is some
safety margin, as the memory required per vnode seems to be slightly
above 1k.
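
A rough sketch of why the 75% limit can still overrun kmem despite that
margin. Only the 2k budget and the 75% limit come from the text above.
The 1GB kernel_map, the 340MB kmem_arena and the flat 1k real cost used
in the example are assumed numbers, and the function names are
hypothetical. The physmem term is ignored, as on a KVA-limited machine.

```c
#include <assert.h>
#include <stdint.h>

#define VNODE_BUDGET	2048u	/* per-vnode cost assumed by the limit ("2k") */

/* Highest vnode count the limit allows for a given VA size. */
static uint64_t
max_vnodes_from_va(uint64_t va_bytes, int va_pct)
{
	assert(va_pct >= 0 && va_pct <= 100);
	return va_bytes * (uint64_t)va_pct / 100 / VNODE_BUDGET;
}

/* Bytes of kmem that many vnodes consume at real_cost bytes each. */
static uint64_t
kmem_needed(uint64_t nvnodes, uint64_t real_cost)
{
	return nvnodes * real_cost;
}

/* Nonzero if the allowed vnode count would not fit into the arena. */
static int
overruns_kmem(uint64_t va_bytes, int va_pct, uint64_t real_cost,
    uint64_t kmem_bytes)
{
	uint64_t n = max_vnodes_from_va(va_bytes, va_pct);

	return kmem_needed(n, real_cost) > kmem_bytes;
}
```

With these assumed sizes, 75% of a 1GB kernel_map permits 393216
vnodes, which need 384MB at 1k each and so exceed a 340MB arena. Basing
the same limit on the arena size permits 130560 vnodes, which fit.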

The attached patch changes the behavior. The vnode cache size is now
calculated in terms of kmem. The default size is (nearly) unchanged
when limited by KVA, and unchanged when limited by physmem.
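
The "(nearly) unchanged" default can be checked with a little
arithmetic. The sketch below compares the old cap (VNODE_VA_MAXPCT, 20%
of kernel_map) with the new one (VNODE_KMEM_MAXPCT, 60% of kmem) from
the attached patch. That kmem_arena is about one third of kernel_map is
an assumption made here for illustration; the real ratio depends on the
port and configuration. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VNODE_COST		2048u	/* "2k" per vnode, as stated above */
#define VNODE_VA_MAXPCT		20	/* old: percent of kernel_map */
#define VNODE_KMEM_MAXPCT	60	/* new: percent of kmem_arena */

/* Old KVA-limited default: share of the whole kernel_map. */
static uint64_t
old_default_vnodes(uint64_t kernel_map_bytes)
{
	assert(kernel_map_bytes <= UINT64_MAX / 100);
	return kernel_map_bytes * VNODE_VA_MAXPCT / 100 / VNODE_COST;
}

/* New KVA-limited default: share of the kmem arena only. */
static uint64_t
new_default_vnodes(uint64_t kmem_bytes)
{
	assert(kmem_bytes <= UINT64_MAX / 100);
	return kmem_bytes * VNODE_KMEM_MAXPCT / 100 / VNODE_COST;
}
```

For an assumed 900MB kernel_map with a 300MB kmem_arena, both functions
give 92160 vnodes. The physmem-limited case does not change because the
10% physmem term is the same before and after the patch.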

If there are no objections I'll commit in a few days.

kind regards,
Lars


-- 
------------------------------------

Mystical explanations:
Mystical explanations are considered deep;
the truth is that they are not even superficial.

   -- Friedrich Nietzsche
   [ The Gay Science, Book 3, 126 ]
Index: sys/kern/init_main.c
===================================================================
RCS file: /cvsroot/src/sys/kern/init_main.c,v
retrieving revision 1.447
diff -u -r1.447 init_main.c
--- sys/kern/init_main.c        21 Feb 2013 01:39:55 -0000      1.447
+++ sys/kern/init_main.c        8 Mar 2013 16:25:48 -0000
@@ -444,8 +444,8 @@
         * 10% of memory for vnodes and associated data structures in the
         * assumed worst case.  Do not provide fewer than NVNODE vnodes.
         */
-       usevnodes =
-           calc_cache_size(kernel_map, 10, VNODE_VA_MAXPCT) / VNODE_COST;
+       usevnodes = calc_cache_size(vmem_size(kmem_arena, VMEM_FREE|VMEM_ALLOC),
+           10, VNODE_KMEM_MAXPCT) / VNODE_COST;
        if (usevnodes > desiredvnodes)
                desiredvnodes = usevnodes;
 #endif
@@ -1078,20 +1078,17 @@
 }
 
 /*
- * calculate cache size (in bytes) from physmem and vm_map size.
+ * calculate cache size (in bytes) from physmem and vsize.
  */
 vaddr_t
-calc_cache_size(struct vm_map *map, int pct, int va_pct)
+calc_cache_size(vsize_t vsize, int pct, int va_pct)
 {
        paddr_t t;
 
        /* XXX should consider competing cache if any */
        /* XXX should consider submaps */
        t = (uintmax_t)physmem * pct / 100 * PAGE_SIZE;
-       if (map != NULL) {
-               vsize_t vsize;
-
-               vsize = vm_map_max(map) - vm_map_min(map);
+       if (vsize != 0) {
                vsize = (uintmax_t)vsize * va_pct / 100;
                if (t > vsize) {
                        t = vsize;
Index: sys/kern/init_sysctl.c
===================================================================
RCS file: /cvsroot/src/sys/kern/init_sysctl.c,v
retrieving revision 1.196
diff -u -r1.196 init_sysctl.c
--- sys/kern/init_sysctl.c      7 Mar 2013 18:02:54 -0000       1.196
+++ sys/kern/init_sysctl.c      8 Mar 2013 16:25:49 -0000
@@ -961,8 +961,9 @@
        if (new_vnodes <= 0)
                return (EINVAL);
 
-       /* Limits: 75% of KVA and physical memory. */
-       new_max = calc_cache_size(kernel_map, 75, 75) / VNODE_COST;
+       /* Limits: 75% of kmem and physical memory. */
+       new_max = calc_cache_size(vmem_size(kmem_arena, VMEM_FREE|VMEM_ALLOC),
+           75, 75) / VNODE_COST;
        if (new_vnodes > new_max)
                new_vnodes = new_max;
 
Index: sys/kern/vfs_bio.c
===================================================================
RCS file: /cvsroot/src/sys/kern/vfs_bio.c,v
retrieving revision 1.242
diff -u -r1.242 vfs_bio.c
--- sys/kern/vfs_bio.c  30 Dec 2012 09:19:24 -0000      1.242
+++ sys/kern/vfs_bio.c  8 Mar 2013 16:25:49 -0000
@@ -396,6 +396,7 @@
 buf_memcalc(void)
 {
        u_long n;
+       vsize_t mapsz;
 
        /*
         * Determine the upper bound of memory to use for buffers.
@@ -417,7 +418,8 @@
                        printf("forcing bufcache %d -> 95", bufcache);
                        bufcache = 95;
                }
-               n = calc_cache_size(buf_map, bufcache,
+               mapsz = vm_map_max(buf_map) - vm_map_min(buf_map);
+               n = calc_cache_size(mapsz, bufcache,
                    (buf_map != kernel_map) ? 100 : BUFCACHE_VA_MAXPCT)
                    / PAGE_SIZE;
        }
Index: sys/rump/librump/rumpkern/emul.c
===================================================================
RCS file: /cvsroot/src/sys/rump/librump/rumpkern/emul.c,v
retrieving revision 1.154
diff -u -r1.154 emul.c
--- sys/rump/librump/rumpkern/emul.c    7 Mar 2013 19:07:05 -0000       1.154
+++ sys/rump/librump/rumpkern/emul.c    8 Mar 2013 16:25:50 -0000
@@ -192,7 +192,7 @@
 }
 
 vaddr_t
-calc_cache_size(struct vm_map *map, int pct, int va_pct)
+calc_cache_size(vsize_t vasz, int pct, int va_pct)
 {
        paddr_t t;
 
Index: sys/sys/param.h
===================================================================
RCS file: /cvsroot/src/sys/sys/param.h,v
retrieving revision 1.425
diff -u -r1.425 param.h
--- sys/sys/param.h     13 Feb 2013 14:03:49 -0000      1.425
+++ sys/sys/param.h     8 Mar 2013 16:25:50 -0000
@@ -149,8 +149,8 @@
 #define        NVNODE  (NPROC + NTEXT + 100)
 #define        NVNODE_IMPLICIT
 #endif
-#ifndef VNODE_VA_MAXPCT
-#define        VNODE_VA_MAXPCT 20
+#ifndef VNODE_KMEM_MAXPCT
+#define        VNODE_KMEM_MAXPCT       60
 #endif
 #ifndef BUFCACHE_VA_MAXPCT
 #define        BUFCACHE_VA_MAXPCT      20
Index: sys/sys/systm.h
===================================================================
RCS file: /cvsroot/src/sys/sys/systm.h,v
retrieving revision 1.257
diff -u -r1.257 systm.h
--- sys/sys/systm.h     3 Aug 2012 18:08:01 -0000       1.257
+++ sys/sys/systm.h     8 Mar 2013 16:25:50 -0000
@@ -62,7 +62,6 @@
 struct uio;
 struct vnode;
 struct vmspace;
-struct vm_map;
 
 extern const char *panicstr;   /* panic message */
 extern int doing_shutdown;     /* shutting down */
@@ -534,6 +533,6 @@
 #define        ASSERT_SLEEPABLE()      /* nothing */
 #endif /* defined(DEBUG) */
 
-vaddr_t calc_cache_size(struct vm_map *, int, int);
+vaddr_t calc_cache_size(vsize_t , int, int);
 
 #endif /* !_SYS_SYSTM_H_ */

