Forthcoming shell (/bin/sh) changes

To: tech-userlevel%netbsd.org@localhost
Subject: Forthcoming shell (/bin/sh) changes
From: Robert Elz <kre%munnari.OZ.AU@localhost>
Date: Mon, 22 Apr 2019 00:43:31 +0700
I have a fix for PR bin/53550  wrong exit status of a command that is
(more or less) just a here-doc which contains a command substitution...

Yes, I know, no-one cares ...   that fix is done, but is waiting for
me to get up the energy to add some ATF tests for it (& perhaps various
similar related issues).

More importantly here, and completely unrelated to the previous issue,
I am planning to change the way traps (particularly interrupt traps)
work in interactive shells (plus one more trap related change).

First, in an interactive shell, a caught SIGINT will get the trap
executed, and then another prompt issued, if the SIGINT happens while
waiting to read the next command from the terminal.   Currently, the
trap does not fire until the user enters a \n to complete the current
command line (often an empty command line).

You can see the difference if you (using sh) do
	xxxx ^C
with SIGINT not trapped (you'll get a new prompt immediately).

On the other hand, if you do
	trap 'echo interrupted' INT
first, and then repeat the

	xxxx ^C

you're likely to see no response at all, until
you later type another ^C (wondering if you actually
typed the first...) which will give a new prompt.

In either case, when you (after this) type a \n the
trap (the "echo interrupted") will fire, and "interrupted"
will appear on the terminal.   If you waited a while between
the ^C and the \n, and forget the ^C was ever entered, you
might even wonder what the message is about, as nothing was
interrupted anytime particularly near when it appears.

Caution: if you do end up typing just one ^C (and assuming
you have command line editing enabled - but almost everyone
does) then the "xxxx" that had been typed will still be
executed as a command (a second ^C will abort that).

There are reasons for all this odd behaviour in the way that
traps are handled in the shell, but I don't really believe
that any of them are particularly good ones.   That is, I doubt
any of it is intentional, it all just happens because of the
way all this is implemented.


Second, in (some) other shells (but not all), if you are
executing a complex command, eg:

	for n in 1 2 3
	do
		sleep 10
	done

(in the foreground) and you interrupt (^C) (and SIGINT is being
trapped) then the current action (shared with other shells - this
is ancient behaviour) is for the interrupt to kill the running "sleep"
(that's normal, and required), then for the trap to fire (with the
trap command as above, "interrupted" will appear), and then the loop
will continue to run, starting the next sleep.   In this example,
three ^C's (one during each sleep) end the loop quickly.

That's exactly what should happen for a non-interactive shell (for
any trap) but for SIGINT in particular, in an interactive shell,
IMO the better behaviour (more intuitive) is for a trapped SIGINT
to act just like an untrapped one (except to also run the trap action
command of course) and abort the loop (any complex command) completely.

So, unless there are (reasoned) objections, that is what I am planning
to make happen.   It will probably not affect almost anyone, as setting
traps in interactive shells is not a common thing to do.
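
For comparison, the loop behaviour that stays as it is for a non-interactive
shell can be sketched with a small script.   This is an illustration only,
not code from the shell or its tests: the inner shell sends SIGINT to itself
with kill in place of a typed ^C (so no sleep is involved), and the names
trapped_loop_demo and count are made up for the example.

```shell
#!/bin/sh
# Sketch: in a non-interactive shell a trapped SIGINT runs the trap
# action, and the loop then carries on with its next iteration.
# The loop runs in its own "sh -c" so that $$ names that shell only.
trapped_loop_demo() {
	sh -c '
		trap "echo interrupted" INT

		count=0
		for n in 1 2 3
		do
			kill -INT $$		# stand-in for a typed ^C
			count=$((count + 1))	# still reached: loop not aborted
		done

		echo "loop body completed $count times"
	'
}

trapped_loop_demo
```

All three iterations complete.   The trap action is expected to run once per
signal, unless SIGINT was already ignored when the inner shell started, in
which case it never runs at all.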

Note: this looks like it might be related to PR bin/50431 ("sh job control
doesn't work for loops") - but it isn't.   That is a much more difficult
problem, which won't be attacked before NetBSD 9.   I thought I had found
a relatively easy to implement fix for that, but the idea was nonsense,
(I mean, complete unbelievable trash) so I abandoned that one.   Before
I went off the rails on that, I had conceived the basis of a different
fix for that PR, but that would be a major operation, which, while it
shouldn't affect any normal use of the shell (when you're not attempting
to use job control on loops etc) it might.   It will also take time to
code properly.

The changes above really only apply to SIGINT and are restricted to
that (SIGINT is handled differently in the shell to all other signals
already - though not when it is trapped, that is the root cause of the
weird differences in behaviour).   Also note that the two changes are
not completely independent.


I could be convinced to make SIGQUIT act the same, if people think that
is worth it - SIGQUIT isn't normally used very much, except when you're
running a command and want to force it to dump core (that works now,
and won't change) - an interactive shell largely just ignores (but not
in the SIG_IGN sense) SIGQUIT if it is not trapped.   If it is, it acts
like any other trapped signal (that is, its behaviour, when trapped,
is just like reported above for SIGINT when trapped, as it currently is.)


The unrelated change (just because I was looking at traps) is to alter
a long (very long) time behaviour related to traps and SIGKILL and SIGSTOP.

We know (I hope) that those signals are magic - no process can do anything
to alter their standard behaviour in any way at all, and the shell is not
exempt from that.   But the shell pretends to not know that, and allows
users to pretend to catch or ignore those signals via the trap command.
So do most (but not all) other shells (in POSIX attempting to do anything
to those two signals produces undefined behaviour).

I can't imagine a reason anything would ever even attempt to trap or ignore
either of those signals (I've never seen anything do that .. other than
for testing) but I am proposing to issue an error instead of simply accepting
the command, pretending it worked, but otherwise ignoring it.   That is
have the trap command exit with status 1 instead of 0, and issue an error
message (except in SMALL shells ... just because the code to generate a
nicely appropriate message is more than I want to add to a SMALL shell, it
could have a much simpler message if anyone really believes that anyone
would ever see it!)   Along with that, the trap command with no args would
no longer ever list those signals (as it only outputs anything for traps
not in their default state, and these would no longer be able to ever
depart that state after the change).   But "trap -p" (which POSIX is
planning to copy from us, and include in Issue 8) lists all signals
(POSIX are not planning to require these two to be included however)
so I am planning to not include them there (any more) either.   Including
them just makes more output, for no benefit...    However, "trap -p KILL"
will still work (but is guaranteed to generate "trap -- - KILL", same for
STOP) and those commands if executed (that is, explicitly setting SIGKILL
or SIGSTOP "back" to the default state) will continue to work without error.
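
A quick way to see how a given shell treats such attempts is sketched below.
The helper name try_trap is hypothetical, and what gets printed for KILL and
STOP depends on the shell being run: one that pretends the trap worked
reports "accepted", while a shell with the proposed change would be expected
to report "rejected".

```shell
#!/bin/sh
# Report whether this shell accepts an attempt to ignore a signal.
# The attempt runs in a subshell, so that a failing trap (a special
# builtin) cannot terminate the calling shell.
try_trap() {
	if (trap '' "$1") 2>/dev/null
	then
		echo "$1: accepted (exit status 0)"
	else
		echo "$1: rejected (non-zero exit status)"
	fi
}

# TERM is included as an ordinary signal, for comparison.
for sig in TERM KILL STOP
do
	try_trap "$sig"
done
```

Only the exit status is examined here; the wording of any error message is
discarded, since that is not specified anywhere.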

Any opinions on that one?

Lastly, a change I am not making (now) but perhaps might some day.

There has been a big argument (well, between me, and one other list
member) about what the standard requires as output from the "trap"
command for signals that are ignored on entry to a non-interactive
shell, but which the script has attempted to catch (or set back to,
effectively, SIG_DFL).   It is clear that the trap cannot work, if
the signal is ignored at entry, then it should stay ignored throughout
the life of the shell - the reasons for this relate mostly to terminal
generated signals and the way the terminal sent such signals in the
days before job control and process groups.   Yet all shells (including
us) make that happen - or at least pretend to - to the script.

However, ash based shells (which includes our shell of course) allow
the script to set an action for such signals - and just never arrange
to catch the signal to make the trap work, or to set the signal back
to SIG_DFL if that is what the trap command requested.   In this ash
copied the original 7th edition Bourne shell, which acted this way too
(though its output format was useless for anything, that ash also
originally copied, but that was fixed ages ago.)
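
The situation under discussion can be reproduced with a sketch like the one
below.   The names ignored_at_entry_demo and fired are hypothetical.   What
the bare "trap" prints is exactly the point in dispute and differs between
shells, so no particular output is claimed for that line; the last line only
shows whether the trap action ever ran.

```shell
#!/bin/sh
# Start a non-interactive shell with SIGINT already ignored, have it
# try to trap SIGINT, list its traps, then send itself the signal.
ignored_at_entry_demo() {
	(
		trap '' INT		# the child sh inherits "ignored"
		sh -c '
			trap "fired=yes" INT	# attempt to catch it anyway
			trap			# output varies by shell
			kill -INT $$		# ignored, so nothing happens
			echo "fired=${fired:-no}"
		'
	)
}

ignored_at_entry_demo
```

The signal stays ignored in the inner shell, so the final line reads
"fired=no" whatever the trap listing before it claims.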


There is some opinion that this is incorrect (according to POSIX),
and that any attempt to set a trap on a signal which was ignored at
the startup of a non-interactive shell should simply be ignored (have
no effect at all ... generating an error is clearly not required, and
probably should be forbidden - no-one does that, and doing so might
break scripts).   Shells that act this way (mostly) don't report such
signals in the output of the (no arg) trap command.   The one exception
is modern versions of bash (it changed sometime during the bash4 era I
believe) which reports such signals as ignored in "trap" output.   There
was some opinion (not from the current, and long time bash maintainer
incidentally) that POSIX actually requires this bash behaviour (despite
no shells at the time the relevant text was written - none - implementing
that).    I think (hope) that opinion has now been debunked...

But whether or not we are actually permitted to accept an attempt to
alter the trap for one of those signals, and in the trap command, report
its state, as if it worked, is less clear.   The "reference implementation"
shells from when the standard was created did not do that (they simply
ignore the trap command that attempts to alter the state, and emit nothing
related to such signals in the trap (without args) command).

Just in case, are there any opinions from people here about what is the
best behaviour here?   If people think it is worthwhile, we could change
now, we would certainly not be moving away from POSIX by doing that, and
might be moving closer.

What we should (according to the standard) report for "trap -p" (or
"trap -p SIGxxx" for such a signal) is not yet set in stone, but I
doubt that there is likely to be much difference (incidentally, the
-p arg to trap, with signal name args as well was invented in ksh93
I believe, and copied, with the same purpose, but different output format,
in bash ... what NetBSD added was a meaning for "trap -p" without args
in order to fix a problem with the "trap" command without args, which
cannot reasonably be fixed there - it is too much of a difference for
something so old and set in so much concrete).   Both forms of the -p
option are being added to the standard - with signal arg(s) which
copies the bash output format (which is the same as the no-arg format
which POSIX specifies) rather than the ksh93 format (which is arguably
easier to process) -- we copied bash's format as well.   Without signal
args the -p option is just like trap with no args at all - except that
the output is required to include all possible signals, except KILL and
STOP as an option .. if they are included it places constraints upon other
aspects of the trap command (which is why I am planning on no longer
including them .. not that it really makes a lot of difference.)

OK, enough for now, I am still working on verifying the trap changes
I am currently proposing making (making sure they work correctly, all
the time) and of course, there is plenty of opportunity to make
different (or even no, though I really hope we don't do that) changes
if that's the general opinion.

kre