NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

PR/59252 CVS commit: src/sys/rump/librump/rumpkern



The following reply was made to PR misc/59252; it has been noted by GNATS.

From: "Taylor R Campbell" <riastradh%netbsd.org@localhost>
To: gnats-bugs%gnats.NetBSD.org@localhost
Cc: 
Subject: PR/59252 CVS commit: src/sys/rump/librump/rumpkern
Date: Sun, 6 Apr 2025 01:13:55 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Apr  6 01:13:55 UTC 2025
 
 Modified Files:
 	src/sys/rump/librump/rumpkern: lwproc.c
 
 Log Message:
 rump: Nix leaked struct lwp on every rump_server syscall.
 
 This leak was introduced by lwproc.c rev. 1.51 back in 2020, when the
 kmem_zalloc(sizeof(*l), KM_SLEEP) was factored out of callers and
 into lwproc_makelwp:
 
 -static void
 -lwproc_makelwp(struct proc *p, struct lwp *l, bool doswitch, bool procmake)
 +static struct lwp *
 +lwproc_makelwp(struct proc *p, bool doswitch, bool procmake)
  {
 +	struct lwp *l = kmem_zalloc(sizeof(*l), KM_SLEEP);
 ...
 @@ -435,8 +431,12 @@ rump_lwproc_newlwp(pid_t pid)
  		kmem_free(l, sizeof(*l));
  		return EBUSY;
  	}
 +	mutex_exit(p->p_lock);
  	mutex_exit(&proc_lock);
 -	lwproc_makelwp(p, l, true, false);
 +
 +	/* XXX what holds proc? */
 +
 +	lwproc_makelwp(p, true, false);
 
 Unfortunately, the kmem_zalloc in rump_lwproc_newlwp was not deleted.
 So it just leaked.
 
 And this routine is called on _every_ syscall handled by rump_server
 (via struct rumpuser_hyperup::hyp_lwproc_newlwp):
 
     674 static void
     675 serv_handlesyscall(struct spclient *spc, struct rsp_hdr *rhdr, uint8_t *data)
     676 {
     677 	register_t retval[2] = {0, 0};
     678 	int rv, sysnum;
     679
     680 	sysnum = (int)rhdr->rsp_sysnum;
     681 	DPRINTF(("rump_sp: handling syscall %d from client %d\n",
     682 	    sysnum, spc->spc_pid));
     683
  => 684 	if (__predict_false((rv = lwproc_newlwp(spc->spc_pid)) != 0)) {
     685 		retval[0] = -1;
     686 		send_syscall_resp(spc, rhdr->rsp_reqno, rv, retval);
     687 		return;
     688 	}
     689 	spc->spc_syscallreq = rhdr->rsp_reqno;
     690 	rv = rumpsyscall(sysnum, data, retval);
 
 https://nxr.netbsd.org/xref/src/lib/librumpuser/rumpuser_sp.c?r=1.77#684
 
 So this leak would grow fairly quickly in processes issuing rump
 syscalls to rump_servers, which t_sp:sigsafe's helper h_sigcli does
 as fast as it can in a loop for 5sec -- which is just long enough for
 the i386 releng testbed with about 128 MB of RAM to run out of memory
 in roughly half of the test runs, but not long enough on my laptop
 with 64 GB of RAM to ever reproduce the problem.
 
 Found by:
 
 (a) running the test for 20sec rather than 5sec to amplify the leak;
 (b) counting bytes allocated by return addresses of rumpuser_malloc;
 (c) chasing that to rump_hypermalloc (culprit) vs uvm_km_alloc;
 (d) chasing that to pgctor, uvm_map, uvm_km_kmem_alloc (culprit),
     vmapbuf;
 (e) then to pool_page_alloc, then to pool_allocator_alloc, then to
     pool_grow, then to pool_get, then to pool_cache_get, then to
     kmem_zalloc;
 (f) finally to the call to kmem_zalloc in rump_lwproc_newlwp, which
     at last yielded to my stare.
 
 This instrumentation was extremely ad hoc -- I just created a table
 of 4096 entries for each routine, and populated by linear scan with
 atomics:
 
 struct {
 	void *volatile ra;
 	volatile unsigned long n;
 } pool_cache_get_bytes;
 
 void *
 kmem_zalloc(size_t size, km_flags_t flags)
 {
 	...
 	void *const ra = __builtin_return_address(0);
 	size_t i;
 	for (i = 0; i < __arraycount(kmem_zalloc_bytes); i++) {
 		if (pool_cache_get_bytes[i].ra == ra ||
 		    (pool_cache_get_bytes[i].ra == NULL &&
 			atomic_cas_ptr(&pool_cache_get_bytes[i].ra, NULL,
 			    ra) == NULL)) {
 			atomic_add_long(&pool_cache_get_bytes[i].n, size);
 			break;
 		}
 	}
 	...
 }
 
 Would be nice to systematize this.  Would also be nice to bring back
 malloc tags for accounting purposes so you don't need to match up the
 kmem_zalloc return addresses with the kmem_free return addresses to
 find leaks -- I got lucky here because there were very few return
 addresses to piece through.
 
 PR misc/59252: tests/rump/rumpkern/t_sp:sigsafe: out of memory
 
 
 To generate a diff of this commit:
 cvs rdiff -u -r1.58 -r1.59 src/sys/rump/librump/rumpkern/lwproc.c
 
 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
 


Home | Main Index | Thread Index | Old Index