Source-Changes archive
CVS commit: src/sys/rump/librump/rumpkern
Module Name: src
Committed By: riastradh
Date: Sun Apr 6 01:13:55 UTC 2025
Modified Files:
src/sys/rump/librump/rumpkern: lwproc.c
Log Message:
rump: Nix leaked struct lwp on every rump_server syscall.
This leak was introduced by lwproc.c rev. 1.51 back in 2020, when the
kmem_zalloc(sizeof(*l), KM_SLEEP) was factored out of callers and
into lwproc_makelwp:
-static void
-lwproc_makelwp(struct proc *p, struct lwp *l, bool doswitch, bool procmake)
+static struct lwp *
+lwproc_makelwp(struct proc *p, bool doswitch, bool procmake)
{
+ struct lwp *l = kmem_zalloc(sizeof(*l), KM_SLEEP);
...
@@ -435,8 +431,12 @@ rump_lwproc_newlwp(pid_t pid)
kmem_free(l, sizeof(*l));
return EBUSY;
}
+ mutex_exit(p->p_lock);
mutex_exit(&proc_lock);
- lwproc_makelwp(p, l, true, false);
+
+ /* XXX what holds proc? */
+
+ lwproc_makelwp(p, true, false);
Unfortunately, the kmem_zalloc in rump_lwproc_newlwp was not deleted.
So it just leaked.
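Concretely, the post-rev. 1.51 function had roughly this shape -- a
hedged sketch, not the literal rev. 1.58 source; the proc lookup and
locking are elided:

int
rump_lwproc_newlwp(pid_t pid)
{
	/* leftover allocation -- nothing consumes this l any more */
	struct lwp *l = kmem_zalloc(sizeof(*l), KM_SLEEP);
	struct proc *p;

	/* ... look up and lock p; the error paths still kmem_free(l) ... */

	/* lwproc_makelwp allocates its own struct lwp internally */
	lwproc_makelwp(p, true, false);

	/* success path: l is never freed, so one struct lwp leaks per call */
	return 0;
}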
And this routine is called on _every_ syscall handled by rump_server
(via struct rumpuser_hyperup::hyp_lwproc_newlwp):
674 static void
675 serv_handlesyscall(struct spclient *spc, struct rsp_hdr *rhdr, uint8_t *data)
676 {
677 register_t retval[2] = {0, 0};
678 int rv, sysnum;
679
680 sysnum = (int)rhdr->rsp_sysnum;
681 DPRINTF(("rump_sp: handling syscall %d from client %d\n",
682 sysnum, spc->spc_pid));
683
=> 684 if (__predict_false((rv = lwproc_newlwp(spc->spc_pid)) != 0)) {
685 retval[0] = -1;
686 send_syscall_resp(spc, rhdr->rsp_reqno, rv, retval);
687 return;
688 }
689 spc->spc_syscallreq = rhdr->rsp_reqno;
690 rv = rumpsyscall(sysnum, data, retval);
https://nxr.netbsd.org/xref/src/lib/librumpuser/rumpuser_sp.c?r=1.77#684
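The wiring between the two sides goes through the hypercall upcall
table; a hedged sketch (the hyp_lwproc_newlwp member name is taken
from the log above, but the initializer shape and the direct
assignment of rump_lwproc_newlwp are assumptions about the
surrounding code):

/* rump kernel side: the upcall table handed to rumpuser */
static const struct rumpuser_hyperup hyp = {
	/* ... other upcalls elided ... */
	.hyp_lwproc_newlwp = rump_lwproc_newlwp,	/* the leaky routine */
};

/* librumpuser side: the lwproc_newlwp() called in serv_handlesyscall
 * above is a thin wrapper around hyp.hyp_lwproc_newlwp(pid), so every
 * handled syscall leaks one struct lwp. */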
So this leak would grow fairly quickly in processes issuing rump
syscalls to rump_servers, which t_sp:sigsafe's helper h_sigcli does
as fast as it can in a loop for 5 sec.  That is just long enough for
the i386 releng testbed, with about 128 MB of RAM, to run out of
memory in roughly half of the test runs, but not long enough on my
laptop with 64 GB of RAM to ever reproduce the problem.
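For scale, a back-of-envelope estimate -- the struct lwp size and the
syscall rate here are assumptions for illustration, not measurements:

	sizeof(struct lwp)      ~ 1 KB        (assumed)
	h_sigcli syscall rate   ~ 25k/sec     (assumed)
	leak rate               ~ 25 MB/sec
	leaked over 5 sec       ~ 125 MB      -- enough to exhaust a
	                                         128 MB testbed, but
	                                         noise next to 64 GB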
Found by:
(a) running the test for 20sec rather than 5sec to amplify the leak;
(b) counting bytes allocated by return addresses of rumpuser_malloc;
(c) chasing that to rump_hypermalloc (culprit) vs uvm_km_alloc;
(d) chasing that to pgctor, uvm_map, uvm_km_kmem_alloc (culprit),
vmapbuf;
(e) then to pool_page_alloc, then to pool_allocator_alloc, then to
pool_grow, then to pool_get, then to pool_cache_get, then to
kmem_zalloc;
(f) finally to the call to kmem_zalloc in rump_lwproc_newlwp, which
at last yielded to my stare.
This instrumentation was extremely ad hoc -- I just created a table
of 4096 entries for each routine and populated it by linear scan with
atomics:
struct {
	void *volatile ra;		/* caller's return address */
	volatile unsigned long n;	/* bytes allocated from that ra */
} kmem_zalloc_bytes[4096];

void *
kmem_zalloc(size_t size, km_flag_t flags)
{
	...
	void *const ra = __builtin_return_address(0);
	size_t i;

	/* find or claim this return address's slot by linear scan */
	for (i = 0; i < __arraycount(kmem_zalloc_bytes); i++) {
		if (kmem_zalloc_bytes[i].ra == ra ||
		    (kmem_zalloc_bytes[i].ra == NULL &&
			atomic_cas_ptr(&kmem_zalloc_bytes[i].ra, NULL,
			    ra) == NULL)) {
			atomic_add_long(&kmem_zalloc_bytes[i].n, size);
			break;
		}
	}
	...
}
Would be nice to systematize this. Would also be nice to bring back
malloc tags for accounting purposes so you don't need to match up the
kmem_zalloc return addresses with the kmem_free return addresses to
find leaks -- I got lucky here because there were very few return
addresses to piece through.
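For reference, the pairing half would be a mirror of the table above
inside kmem_free -- a hedged sketch of what one would add, not code
that was actually run here:

struct {
	void *volatile ra;		/* caller's return address */
	volatile unsigned long n;	/* bytes freed from that ra */
} kmem_free_bytes[4096];

void
kmem_free(void *p, size_t size)
{
	...
	void *const ra = __builtin_return_address(0);
	size_t i;

	for (i = 0; i < __arraycount(kmem_free_bytes); i++) {
		if (kmem_free_bytes[i].ra == ra ||
		    (kmem_free_bytes[i].ra == NULL &&
			atomic_cas_ptr(&kmem_free_bytes[i].ra, NULL,
			    ra) == NULL)) {
			atomic_add_long(&kmem_free_bytes[i].n, size);
			break;
		}
	}
	...
}

A leak then shows up as an allocating return address whose total keeps
growing with no free-side total of comparable size -- but the matching
is still manual, since the free site's return address is unrelated to
the allocation site's, which is what malloc tags would have recorded.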
PR misc/59252: tests/rump/rumpkern/t_sp:sigsafe: out of memory
To generate a diff of this commit:
cvs rdiff -u -r1.58 -r1.59 src/sys/rump/librump/rumpkern/lwproc.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.