tech-userlevel archive


Re: sh(1) wait builtin command and stopped jobs



    Date:        Tue, 15 Mar 2022 16:48:09 +0100
    From:        Edgar =?iso-8859-1?B?RnXf?= <ef%math.uni-bonn.de@localhost>
    Message-ID:  <YjC1OR7F2YJ4xMnO%trav.math.uni-bonn.de@localhost>

  | I guess "enters stopped state" includes the case where the process
  | already was in the stopped state when the wait command was issued?

Yes, sequencing doesn't matter (though it makes a difference with the
current code; that's a prime motivation for fixing it).

  | I don't have any strong opinion, but also find it slightly more natural 
  | that way.

[normal wait only waits for exited jobs]  Yes, that is in my sources now,
being tested (very slowly, as I find the time; this has been this way for
quite a while, so there's no great hurry to fix it, and if I miss the -10
branch, it can easily be pulled up).

So, as there were no comments or objections (until your comment just now,
and apart from a useful off-list discussion with Mouse which helped me
clarify some things I was thinking), that's the way things are going to go.
First commit will be to just make wait(1) (in sh) ignore stopped jobs.
(They will be treated identically to running ones).

Next an option will get added to allow wait to return stopped processes
as well as exited ones (I briefly considered allowing waiting for only
stopped processes, ignoring exited ones, but doing that makes no sense
at all, so that won't happen).   I will probably also implement a do-nothing
option to wait only for exited jobs, in the hope that those shells where
wait waits for stopped and exited jobs by default will pick up that
option, and wait only for exited ones - and ideally also our wait for
stopped or exited jobs option, which would be a no-op for them.  This
is just to allow portable scripts to be written.

Lastly, and just possibly (I am not sure of this yet, and probably won't
be until I implement it, and try it out for a while to see whether it
works sanely): if the shell is waiting for a job (without the option to
return stopped jobs) and that job stops, then resume it in the
foreground.   It has to be the foreground; if backgrounded, whatever made
it stop previously will probably make it stop again immediately, which
would result in the shell simply going into a loop continually restarting
that job.

The discussions with Mouse raised the question of waiting on multiple
jobs (as for example, the simple "wait" command with no args at all) if
several of them stop.   For that, we would need to restart them in 
foreground, serially.   As in the last paragraph, they cannot be in the
background, and since each separate job will be in its own process group,
only one of them can be foreground at a time (the controlling tty must be
in the foreground job's process group).   Hence, as unappealing as it
sounds, when waiting for more than one job, if more than one of them
stops, resume one, wait for that one to finish (any others that finish
without stopping are fine of course, and get included in the wait),
then restart the next, wait for it (restarting it again if it stops
again) until it completes, and on to the next...

Aside from not doing the restart at all, which is certainly still a
possible outcome, there really is no other choice (in that case, the
shell would simply wait forever, and the stopped job(s) would need to
be continued, or killed, externally).

And finally, to repeat something I said last time ... this is all just
an obscure corner case.   In practice, people rarely use wait interactively,
and if they do, and a job stops (and we don't do auto restart in that
case), the "stopped" message will still appear on stderr (or stdout, or
wherever those things appear now, I've forgotten) and the user can SIGINT
the wait and carry on.

None of this applies to normal scripts, which don't usually enable job
control, so the shell running the script, and everything it runs, are all
in the same process group.  If something stops one of the processes (other
than a SIGSTOP sent to just one process - which must be from some other agent,
which can be assumed will SIGCONT it when appropriate) then the whole process
group gets signalled, which stops the shell along with its children.  There's
no question of what the shell should do in this state - the only possibility
is nothing.

So, none of what happens as a result of this "discussion" should affect
anything that almost anyone ever sees.

kre
  | Long ago, I used processes stopping themselves as a primitive
  | synchronisation tool (not from a shell script, however).  I used an
  | ELC to feed four CD writers, which worked well when the four cdrdao
  | processes were in sync, but miserably failed otherwise.  So I added
  | a --stop option to cdrdao which stopped the process as soon as the
  | lengthy initialization was complete and then manually issued a
  | kill -CONT to make them continue.



