tech-kern archive


Re: Deadlock on fragmented memory?



> Date: Thu, 19 Oct 2017 09:57:02 +0200
> From: Martin Husemann <martin%duskware.de@localhost>
> 
> I sometimes see my shark (arm v4, strongarm 110, 96 MB of ram, main
> characteristics: very *slow* disk access) "hang" with many processes
> blocked on "vmem". All of etc.daily is hit by that, lots of cron instances,
> and no new exec(2) work (or so it seems).
> 
> Various suggestions have been raised off-list (like prepopulating the
> exec pool), but nothing concrete ever came of it.

Here's the quick and rather embarrassing history:

1. We used to carve out MAXEXEC*ARG_MAX bytes of KVA on boot in a VM
   submap exec_map of kernel_map.

2. In 2008, ad@ switched to using a pool of ARG_MAX-byte (= 256k)
   chunks of KVA:

   https://mail-index.netbsd.org/tech-kern/2008/06/25/msg001854.html

   With this change, we stopped carving out the KVA on boot and
   instead started allocating it at exec time.  That doesn't work so
   well once the system has been running for a while and KVA has
   become fragmented enough that there are no longer any contiguous
   ARG_MAX-byte chunks of it.  (A rough sketch contrasting the old and
   new arrangements follows this list.)

3. In 2011, I noticed a problem with similar symptoms and spent a long
   time analyzing it:

   https://gnats.netbsd.org/45718

4. In 2012, not knowing the history of exec_map, I sketched some ways
   to work around it:

   https://mail-index.netbsd.org/tech-kern/2012/12/09/msg014695.html

5. Since then, it's come up on IRC from time to time, but nobody has
   done anything about it.  I switched all my machines to 64-bit and
   stopped caring about KVA fragmentation.
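
For anyone without the history paged in, the pre-2008 arrangement
looked roughly like this -- a sketch from memory, with the size,
flags, and function name being illustrative rather than the exact old
code.  The point is that the KVA was reserved once, at boot, in a
dedicated submap:

	static struct vm_map *exec_map;

	static void
	exec_kva_reserve(void)		/* ran during boot */
	{
		vaddr_t minaddr, maxaddr;

		/*
		 * Carve the execargs KVA out of kernel_map while it is
		 * still unfragmented; exec then allocates from exec_map
		 * and can never lose the race to fragmentation.
		 */
		exec_map = uvm_km_suballoc(kernel_map, &minaddr, &maxaddr,
		    16 * NCARGS, VM_MAP_PAGEABLE, false, NULL);
	}

The current code (the pool_init() call visible in the patch context
below) still creates exec_pool at boot, but the KVA backing each
NCARGS-byte item is only allocated lazily, at exec time, by something
like pool_get(&exec_pool, PR_WAITOK) -- and a contiguous 256k stretch
of KVA is exactly what a long-running 32-bit machine may no longer
have.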

Attached is a patch that attempts to restore the original behaviour of
carving out enough KVA on boot, by setting the exec_pool low-water
mark to be the same as the hard limit.  Setting the low-water mark
should make the pool allocate that many items immediately and keep the
drain code from giving them back, so the KVA gets claimed while it is
still unfragmented.

If this helps, I suggest that we:

(a) Apply it to HEAD.

(b) Pull it up to all branches.

(c) Consider, in netbsd-9, replacing the contiguous ARG_MAX-byte
    execargs buffers with an array of page-sized buffers allocated
    from a pool like the current exec_pool, so that we don't have to
    worry about KVA fragmentation here at all.  This is a bit more
    work, of course, since it means reworking all the string hacking
    in kern_exec.c; a rough sketch follows below.
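
Purely as an illustration of (c) -- the names here (exec_chunk_pool,
execargs_chunks, and the helper functions) are invented for the
sketch, not in the tree -- the shape of it would be something like:

	#include <sys/param.h>
	#include <sys/pool.h>

	/* Assumes PAGE_SIZE is a compile-time constant on the ports we
	 * care about here. */
	#define	EXECARGS_NCHUNKS	(NCARGS / PAGE_SIZE)

	struct execargs_chunks {
		char	*ec_chunk[EXECARGS_NCHUNKS];	/* PAGE_SIZE bytes each */
	};

	static struct pool exec_chunk_pool;

	static void
	exec_chunk_pool_init(void)
	{

		/*
		 * No PR_NOALIGN/PR_NOTOUCH games needed: a page-sized item
		 * never needs more than one page of contiguous KVA.
		 */
		pool_init(&exec_chunk_pool, PAGE_SIZE, 0, 0, 0,
		    "execchunk", NULL, IPL_NONE);
	}

	static void
	execargs_chunks_alloc(struct execargs_chunks *ec)
	{
		u_int i;

		for (i = 0; i < EXECARGS_NCHUNKS; i++)
			ec->ec_chunk[i] = pool_get(&exec_chunk_pool, PR_WAITOK);
	}

	static void
	execargs_chunks_free(struct execargs_chunks *ec)
	{
		u_int i;

		for (i = 0; i < EXECARGS_NCHUNKS; i++)
			pool_put(&exec_chunk_pool, ec->ec_chunk[i]);
	}

The string-copying code in kern_exec.c would then copy at most
PAGE_SIZE bytes at a time and step from one chunk to the next instead
of assuming one flat NCARGS-byte buffer -- that's the "bit more work"
part.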

(And maybe we ought to set a fixed upper bound on the size of any
contiguous buffer in the kernel so that we can limit the damage of KVA
fragmentation.)

Thoughts?

Index: sys/kern/kern_exec.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_exec.c,v
retrieving revision 1.446
diff -p -u -r1.446 kern_exec.c
--- sys/kern/kern_exec.c	29 Sep 2017 17:47:29 -0000	1.446
+++ sys/kern/kern_exec.c	19 Oct 2017 14:36:26 -0000
@@ -1825,6 +1825,7 @@ exec_init(int init_boot)
 		pool_init(&exec_pool, NCARGS, 0, 0, PR_NOALIGN|PR_NOTOUCH,
 		    "execargs", &exec_palloc, IPL_NONE);
 		pool_sethardlimit(&exec_pool, maxexec, "should not happen", 0);
+		pool_setlowat(&exec_pool, maxexec);
 	} else {
 		KASSERT(rw_write_held(&exec_lock));
 	}

