pg_jobc is a process group struct member counting the number of processes with a parent capable of doing job control. Once reaching 0, the process group is orphaned. With the addition of asserts checking for pg_jobc > 0 (by maxv@), we caught issues that pg_jobc can go negative. I have landed new ATF tests that trigger this reliably with ptrace(2) (src/tests/lib/libc/sys/t_ptrace_fork_wait.h r.1.7). The problem was originally triggered with GDB. There are also other sources of these asserts due to races The ptrace(2) ATF tests are reliable triggering a negative pg_jobc, however there are racy tests doing the same as triggered by syzkaller (mentioned at least in [1]). Should we allow pg_jobc going negative? (Other BSDs allow this.) Is going negative in the first place a real bug? Are the races triggered by syzkaller real bugs? [1] http://mail-index.netbsd.org/current-users/2020/05/04/msg038511.html On 20.04.2020 18:32, Maxime Villard wrote: > Module Name: src > Committed By: maxv > Date: Mon Apr 20 16:32:03 UTC 2020 > > Modified Files: > src/sys/kern: kern_proc.c > > Log Message: > Add three KASSERTs, to detect refcount bugs. > > This narrows down an unknown bug in some place near, that has manifested > itself in various forms (use-after-frees, uninit accesses, page faults, > segmentation faults), all pointed out by syzbot. > > The first KASSERT in fixjobc() fires when the bug is encountered. > > > To generate a diff of this commit: > cvs rdiff -u -r1.244 -r1.245 src/sys/kern/kern_proc.c > > Please note that diffs are not public domain; they are subject to the > copyright notices on the relevant files. > > > Modified files: > > Index: src/sys/kern/kern_proc.c > diff -u src/sys/kern/kern_proc.c:1.244 src/sys/kern/kern_proc.c:1.245 > --- src/sys/kern/kern_proc.c:1.244 Sun Apr 19 20:31:59 2020 > +++ src/sys/kern/kern_proc.c Mon Apr 20 16:32:03 2020 > @@ -1,4 +1,4 @@ > -/* $NetBSD: kern_proc.c,v 1.244 2020/04/19 20:31:59 thorpej Exp $ */ > +/* $NetBSD: kern_proc.c,v 1.245 2020/04/20 16:32:03 maxv Exp $ */ > > /*- > * Copyright (c) 1999, 2006, 2007, 2008, 2020 The NetBSD Foundation, Inc. > @@ -62,7 +62,7 @@ > */ > > #include <sys/cdefs.h> > -__KERNEL_RCSID(0, "$NetBSD: kern_proc.c,v 1.244 2020/04/19 20:31:59 thorpej Exp $"); > +__KERNEL_RCSID(0, "$NetBSD: kern_proc.c,v 1.245 2020/04/20 16:32:03 maxv Exp $"); > > #ifdef _KERNEL_OPT > #include "opt_kstack.h" > @@ -554,6 +554,7 @@ proc_sessrele(struct session *ss) > { > > KASSERT(mutex_owned(proc_lock)); > + KASSERT(ss->s_count > 0); > /* > * We keep the pgrp with the same id as the session in order to > * stop a process being given the same pid. Since the pgrp holds > @@ -1181,8 +1182,11 @@ fixjobc(struct proc *p, struct pgrp *pgr > if (entering) { > pgrp->pg_jobc++; > p->p_lflag &= ~PL_ORPHANPG; > - } else if (--pgrp->pg_jobc == 0) > - orphanpg(pgrp); > + } else { > + KASSERT(pgrp->pg_jobc > 0); > + if (--pgrp->pg_jobc == 0) > + orphanpg(pgrp); > + } > } > > /* > @@ -1197,8 +1201,11 @@ fixjobc(struct proc *p, struct pgrp *pgr > if (entering) { > child->p_lflag &= ~PL_ORPHANPG; > hispgrp->pg_jobc++; > - } else if (--hispgrp->pg_jobc == 0) > - orphanpg(hispgrp); > + } else { > + KASSERT(hispgrp->pg_jobc > 0); > + if (--hispgrp->pg_jobc == 0) > + orphanpg(hispgrp); > + } > } > } > } >
Attachment:
signature.asc
Description: OpenPGP digital signature