NetBSD-Bugs archive

bin/52640: /bin/sh can "lose" background children when waiting on foreground ones

>Number:         52640
>Category:       bin
>Synopsis:       /bin/sh can "lose" background children when waiting on foreground ones
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Oct 23 05:40:00 +0000 2017
>Originator:     Robert Elz
>Release:        NetBSD 8.99.1  (lots and lots of releases...)
>Environment:
System: NetBSD 8.99.1 NetBSD 8.99.1 (VBOX64-1.3-20170812) #39: Sat Aug 12 15:25:04 ICT 2017 amd64
Architecture: x86_64
Machine: amd64
>Description:
	The following script

#! /bin/sh

(sleep 3; exit 3) & PID=$!
sleep 10

(wait $PID; echo "In child:  status" $?)

wait $PID; echo "In parent: status" $?

	should print:

In child:  status 127
In parent: status 3

	as every other shell I could find to test does.  (The
	exceptions are bosh, which is just broken and appears to
	return status 0 from the wait command in all cases, and zsh,
	which is just weird, in this and so many other ways.)

	Instead, on every currently available NetBSD sh we see

In child:  status 127
In parent: status 127

	That's because the background job completes while sh is waiting
	for the later foreground job, and when that happens (at least
	in many cases) the background job is simply discarded.  (If it
	was killed by a signal, that is reported immediately, but the
	job is still discarded.)
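
	The signal case can be checked the same way.  What follows is
	only a sketch, not something from the original report: the
	helper name signal_check is made up for illustration and the
	timings are arbitrary.  POSIX requires a status greater than
	128 for a job killed by a signal; a shell that has discarded
	the job would be expected to return 127 instead.

```sh
#! /bin/sh

# Hypothetical helper (not part of the report): start a background job
# that kills itself with SIGTERM, keep the shell busy with a longer
# foreground job, then ask for the background job's status.
signal_check() {
	sh -c 'kill -TERM $$' & pid=$!
	sleep 1			# the background job dies during this foreground job
	wait "$pid"
	echo $?			# greater than 128 if the job was remembered
}

signal_check
```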

	Fix that problem, and we instead get

In child:  status 3
In parent: status 3

	!!!   The sub-shell has no children; it should not be
	      able to get status from one of its siblings.

	This second problem happens only when the background child
	has already exited before the sub-shell is forked, and only
	when the status of that child has not already been discarded
	(including incorrectly discarded, as above).

	This is because when a sub-shell is forked, the job table
	(which holds the results of completed tasks and the status
	of active ones) is only marked invalid; it is not actually
	cleared until a new job needs to be created.  The shell's
	"wait" command looks at the "invalid" flag only for a plain
	"wait" (i.e. not "wait pid"), which is backwards: the plain
	"wait" case does not really need the check (though it avoids
	wasting CPU time), whereas the "wait pid" case does.
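
	The difference between the two forms of "wait" can be seen
	from a script.  This is a sketch only: the helper name
	wait_forms and the timings are assumptions made for
	illustration.  On a shell with both problems fixed, the
	expectation is status 0 for the plain "wait" in the sub-shell
	(it has nothing to wait for), 127 for "wait pid" in the
	sub-shell, and 3 for "wait pid" in the parent.

```sh
#! /bin/sh

# Hypothetical helper (not part of the report): show what "wait" and
# "wait pid" return in a sub-shell, and what "wait pid" returns in the
# parent, once the background job has already exited.
wait_forms() {
	(sleep 1; exit 3) & pid=$!
	sleep 2			# the background job exits during this

	# No operands: the sub-shell has no children of its own.
	(wait; echo "sub-shell wait: $?")

	# The pid belongs to a sibling, not to a child of the sub-shell.
	(wait "$pid"; echo "sub-shell wait pid: $?")

	# The parent owns the job and should get its real exit status.
	wait "$pid"; echo "parent wait pid: $?"
}

wait_forms
```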


>How-To-Repeat:
	Write any script that runs a short background job, then a
	longer foreground one (which is probably why this hasn't
	been noticed - most commonly the timings are inverted),
	and observe what happens when the script eventually waits
	for the (already completed) background job.
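
	One possible shape for such a script is sketched below.  It is
	not from the original report: the helper name lost_child_check
	and the durations are assumptions.  It runs the same check with
	both orderings of the timings, so the two cases can be compared
	on the shell under test.

```sh
#! /bin/sh

# Hypothetical helper (not part of the report).
# $1 = background job duration, $2 = foreground job duration (seconds).
# The background job always exits with status 3.
lost_child_check() {
	(sleep "$1"; exit 3) & pid=$!
	sleep "$2"
	wait "$pid"
	status=$?
	if [ "$status" -eq 3 ]; then
		echo "ok: wait returned 3"
	else
		echo "lost: wait returned $status"
	fi
}

lost_child_check 1 2	# short background job, longer foreground job
lost_child_check 2 1	# inverted timings, the more common arrangement
```

	A shell that keeps the status of the completed background job
	should report "ok" for both calls.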

>Fix:
	Coming soon.  I will request a pullup to -8; the shells on
	the older systems are so out of date that they can just
	continue to suffer from this and many other problems, which
	are usually never noticed in the wild.
