tech-userlevel archive


Re: sh: killing a pipe head from the tail

    Date:        Wed, 8 Dec 2021 08:42:21 -0500 (EST)
    From:        Mouse <mouse%Rodents-Montreal.ORG@localhost>
    Message-ID:  <202112081342.IAA29335%Stone.Rodents-Montreal.ORG@localhost>

  | > That would not have worked, non-interactive shells are forbidden from
  | > allowing the script to change the status of a signal which is ignored
  | > when the shell starts: [...POSIX...]
  | Then POSIX is broken and there needs to be a way to disable this
  | particular bit of braindamage;

POSIX is (once again) caught between its two masters there.

On one hand, it is providing a specification for application script
writers, telling them what the shell will do if they do xyz.  (That is
true in this case, and similarly for the rest of the standard, though
for other sections I would have worded this e-mail differently.)

For this, it must specify what (bug-free) shells actually do, which is why
we end up with a whole bunch of "... is unspecified" (or worse, "undefined"
in a few cases) when shells don't agree on what to do.

But this one is a case where the shells all do agree, as it has been
like this since Bourne added the trap command in the earliest (released)
Bourne shell.  (If you like, that means you have had more than 40 years to
submit a bug report about this; don't you think you're a little late now?)

So, POSIX needs to tell the script writers that if they attempt to
set a trap on a signal which was ignored when the shell started, it
will not work (similarly if they attempt to reset it to SIG_DFL).
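To see the rule from the script writer's side, here is a minimal C sketch, assuming /bin/sh is a Bourne-compatible shell that follows this rule (the function name run_sh_with_sigint is hypothetical).  It starts the shell with SIGINT either ignored or at its default, then runs a script that traps SIGINT and sends itself one.  The trap should fire only when SIGINT was not ignored on entry.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run `script` with /bin/sh, starting the shell
 * with SIGINT ignored (ignore_sigint != 0) or at SIG_DFL.
 * Returns the shell's exit status, 128 + signal number if it was
 * killed by a signal, or -1 if fork/wait failed.
 */
static int run_sh_with_sigint(const char *script, int ignore_sigint)
{
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* The disposition set here is what the shell sees "on entry". */
        signal(SIGINT, ignore_sigint ? SIG_IGN : SIG_DFL);
        execl("/bin/sh", "sh", "-c", script, (char *)NULL);
        _exit(127);             /* exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

With the script `trap 'exit 3' INT; kill -s INT $$; exit 0`, a conforming shell exits with 3 when SIGINT starts at default, and with 0 when it starts ignored: the trap is silently not installed, so the signal is simply ignored.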

On the other hand (the other master), POSIX is also a spec for shell
implementors, telling them what to implement.   It is sometimes
difficult to distinguish those two functions, but if it were done,
there are many places where the user could be told "this is what
works" and the implementors could be told "implement it better than
that, this is how it should work".  There seems to be no interest
in going down that path, and I kind of understand why: doing all of
it would be a huge task, and if it were only partly done, there would
be real confusion about how to interpret the sections which had not
been updated.

For what it is worth, I believe all this came about because, way back
in 6th edition days (or perhaps even earlier), and also in early 7th
edition and 32V (probably even 3BSD as well), the paradigm for setting
a signal handler in a C program was

	if (signal(s, SIG_IGN) != SIG_IGN)
		signal(s, handler);

(where handler could also be SIG_DFL, but it would obviously make no
sense for it to be SIG_IGN).

Often the result from the first signal() call would be saved so that
it could be restored when appropriate, but that depends upon the needs
of the application, and is irrelevant to this discussion.
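For readers who want the idiom as a compilable unit, here is a small C sketch of it, including the "save the old disposition" variant.  The names handler_fn, note_signal and catch_unless_ignored are hypothetical, chosen only for illustration.

```c
#include <signal.h>

/* Hypothetical name for "pointer to a signal handler". */
typedef void (*handler_fn)(int);

/* Example handler: just records which signal arrived. */
static volatile sig_atomic_t got_signal;

static void note_signal(int sig)
{
    got_signal = sig;
}

/*
 * The two-call idiom: set SIG_IGN and look at what the disposition
 * was before.  If the signal was already ignored (for instance because
 * the program was started in the background), leave it ignored;
 * otherwise install `handler`.  The previous disposition is returned
 * so the caller can save it and restore it later (SIG_ERR on failure).
 */
static handler_fn catch_unless_ignored(int sig, handler_fn handler)
{
    handler_fn old = signal(sig, SIG_IGN);

    if (old != SIG_ERR && old != SIG_IGN)
        signal(sig, handler);
    return old;
}
```

Passing the returned value back to signal() later restores whatever the program inherited.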

All this developed in the old days, when there was no way to block
signals, no process groups, no job control, ...

When a user typed the interrupt character, the kernel would simply
queue a SIGINT (similarly for SIGQUIT and SIGHUP in appropriate
circumstances) to all processes with that terminal as their controlling
tty.   Foreground processes, background processes, everything (not that
there was any other type).

When a user ran a command in the background, the shell would start it
with SIGINT and SIGQUIT ignored, so even though the kernel would send
the relevant signal to the process every time the relevant key was typed,
it would just be ignored.  That is, provided the program did not re-enable
the signals.   (SIGHUP was ignored by the nohup command.)
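A rough C sketch of that background launch, assuming a fork/exec shell without job control (start_background and wait_for are hypothetical names, and a real shell does more work here).  The point it shows is that an ignored disposition survives exec, so the new program inherits it.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Roughly what a pre-job-control shell did for `cmd &`: fork, ignore
 * the keyboard-generated SIGINT and SIGQUIT in the child, then exec.
 * Ignored dispositions survive exec, so the command never sees those
 * signals unless it deliberately changes them.  The parent does not
 * wait; it returns the child's pid, or -1 if fork() failed.
 */
static pid_t start_background(char *const argv[])
{
    pid_t pid = fork();

    if (pid == 0) {
        signal(SIGINT, SIG_IGN);
        signal(SIGQUIT, SIG_IGN);
        execvp(argv[0], argv);
        _exit(127);             /* exec failed */
    }
    return pid;
}

/* Collect the background job later, as the `wait` builtin would.
 * Returns the exit status, 128 + signal number if it was killed,
 * or -1 on error. */
static int wait_for(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

Launching `sh -c 'kill -s INT $$; kill -s QUIT $$; exit 5'` this way should yield exit status 5: both signals arrive and are ignored.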

So, for processes that wanted (or needed) to see SIGINT when run
in the foreground, and so wanted to do

	oldsigint = signal(SIGINT, intr_handler);

the procedure was to code it with the two-line sequence above, to make
sure that if the process happened to be run in the background, there was
not even the smallest window in which the user typing the intr char at
just the wrong time could affect it.

Of course, this left a (very small) window in which a foreground process
that was arranging to catch the signal would have a typed intr ignored,
if it just happened to occur during the small interval between the two
signal calls.  (Do remember that there was no practical way for a process
to determine whether it was in the foreground or background; the only
difference was whether the parent was waiting for it or not.)  That's a
much less serious problem: when the first intr the user types doesn't
work, they just send another one, which will work.

Now, really, all of this is only important for the tty-generated signals:
originally SIGHUP, SIGINT and SIGQUIT (perhaps worth noting that they are
1, 2 and 3 in the signal number list), with SIGTSTP, SIGTTIN and SIGTTOU
added since.  There's no real need to be concerned about a SIGSEGV or SIGEMT
(or even SIGPIPE) being accidentally sent to the process at just the wrong
time, when it wasn't wanted.  That just doesn't happen.  Never did.

But that kind of analysis wasn't common among (and perhaps wouldn't even
have been believed by) lots of people who had been affected by sloppy
programs which didn't use this technique for the tty signals, and which
caused background processes to just "mysteriously vanish" when the user
sent an interrupt signal intended for some other process.

The long and short of it is that the 2-line sequence became accepted as
the one true way to trap a signal, and anything else is broken and must
be fixed.

Dumb perhaps, and barely even relevant in these days of process groups, where
the tty signals are only sent to the foreground process group, and background
processes never see them at all, but that's what happened, as best I remember
and understood it all.

Anyway, what is clear is that this is exactly what Bourne wrote when his
shell was coded and the trap command was added.

That's why the standard says that there isn't required to be an error
message when a script attempts to set a trap on an ignored signal: the
shell just did all the normal trap-setting bookkeeping, and then called
its signal-setting code to catch the signal, and that code used the
sequence above, so an already ignored signal remained ignored.   (It is
actually a bit more complex than that, as the shell did allow, as do all
current ones, a script to ignore a signal and then later set it back to
default, or trap it, but that's all just implementation detail.  The
principle that a signal ignored on entry remains ignored was ingrained,
and remains that way.)

Good luck getting this one changed.   POSIX is certainly not going to even
think about changing this as long as shells continue to behave the way
they do (which in this case is to do what the standard says, which in turn
just copies what every other Bourne-compatible shell has done forever).

Getting shells to change would be difficult as well: when making a change
like this, it is never clear what scripts might exist that depend upon it,
and no-one wants to break backwards compatibility with old scripts, or not
without a very good reason and lots of advance planning.

The most likely route forward would be for shells to implement some new
builtin command, similar to trap, but with some of the current weirdness,
including this, fixed.   But to do that, how the new method interacts
with the old would need to be sorted out.
