pg_jobc is a process group struct member counting the number of
processes with a parent capable of doing job control. Once reaching 0,
the process group is orphaned.
With the addition of asserts checking for pg_jobc > 0 (by maxv@), we
caught issues that pg_jobc can go negative. I have landed new ATF tests
that trigger this reliably with ptrace(2)
(src/tests/lib/libc/sys/t_ptrace_fork_wait.h r.1.7). The problem was
originally triggered with GDB.
There are also other sources of these asserts due to races
The ptrace(2) ATF tests are reliable triggering a negative pg_jobc,
however there are racy tests doing the same as triggered by syzkaller
(mentioned at least in [1]).
Should we allow pg_jobc going negative? (Other BSDs allow this.)
Is going negative in the first place a real bug?
Are the races triggered by syzkaller real bugs?
[1] http://mail-index.netbsd.org/current-users/2020/05/04/msg038511.html
On 20.04.2020 18:32, Maxime Villard wrote:
> Module Name: src
> Committed By: maxv
> Date: Mon Apr 20 16:32:03 UTC 2020
>
> Modified Files:
> src/sys/kern: kern_proc.c
>
> Log Message:
> Add three KASSERTs, to detect refcount bugs.
>
> This narrows down an unknown bug in some place near, that has manifested
> itself in various forms (use-after-frees, uninit accesses, page faults,
> segmentation faults), all pointed out by syzbot.
>
> The first KASSERT in fixjobc() fires when the bug is encountered.
>
>
> To generate a diff of this commit:
> cvs rdiff -u -r1.244 -r1.245 src/sys/kern/kern_proc.c
>
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.
>
>
> Modified files:
>
> Index: src/sys/kern/kern_proc.c
> diff -u src/sys/kern/kern_proc.c:1.244 src/sys/kern/kern_proc.c:1.245
> --- src/sys/kern/kern_proc.c:1.244 Sun Apr 19 20:31:59 2020
> +++ src/sys/kern/kern_proc.c Mon Apr 20 16:32:03 2020
> @@ -1,4 +1,4 @@
> -/* $NetBSD: kern_proc.c,v 1.244 2020/04/19 20:31:59 thorpej Exp $ */
> +/* $NetBSD: kern_proc.c,v 1.245 2020/04/20 16:32:03 maxv Exp $ */
>
> /*-
> * Copyright (c) 1999, 2006, 2007, 2008, 2020 The NetBSD Foundation, Inc.
> @@ -62,7 +62,7 @@
> */
>
> #include <sys/cdefs.h>
> -__KERNEL_RCSID(0, "$NetBSD: kern_proc.c,v 1.244 2020/04/19 20:31:59 thorpej Exp $");
> +__KERNEL_RCSID(0, "$NetBSD: kern_proc.c,v 1.245 2020/04/20 16:32:03 maxv Exp $");
>
> #ifdef _KERNEL_OPT
> #include "opt_kstack.h"
> @@ -554,6 +554,7 @@ proc_sessrele(struct session *ss)
> {
>
> KASSERT(mutex_owned(proc_lock));
> + KASSERT(ss->s_count > 0);
> /*
> * We keep the pgrp with the same id as the session in order to
> * stop a process being given the same pid. Since the pgrp holds
> @@ -1181,8 +1182,11 @@ fixjobc(struct proc *p, struct pgrp *pgr
> if (entering) {
> pgrp->pg_jobc++;
> p->p_lflag &= ~PL_ORPHANPG;
> - } else if (--pgrp->pg_jobc == 0)
> - orphanpg(pgrp);
> + } else {
> + KASSERT(pgrp->pg_jobc > 0);
> + if (--pgrp->pg_jobc == 0)
> + orphanpg(pgrp);
> + }
> }
>
> /*
> @@ -1197,8 +1201,11 @@ fixjobc(struct proc *p, struct pgrp *pgr
> if (entering) {
> child->p_lflag &= ~PL_ORPHANPG;
> hispgrp->pg_jobc++;
> - } else if (--hispgrp->pg_jobc == 0)
> - orphanpg(hispgrp);
> + } else {
> + KASSERT(hispgrp->pg_jobc > 0);
> + if (--hispgrp->pg_jobc == 0)
> + orphanpg(hispgrp);
> + }
> }
> }
> }
>
Attachment:
signature.asc
Description: OpenPGP digital signature