tech-userlevel archive


Possible "new" redirect style for /bin/sh (needs a name)



Some time ago, in the now moderately distant past, I recall a notable
NetBSD developer saying that the one useful thing missing from our /bin/sh
that is present in bash & ksh (and zsh) is {var}>whatever type redirects.

I have been hesitant about implementing this as I could not think
of a way (in our shell) to do it safely.

For anyone unaware, this kind of redirect does the redirect, picks
a "random" fd, ("random" here just means unknown to the script writer, no
mathematical randomness properties implied, or implemented) and assigns
its value to the variable named in the braces.   It is particularly useful
in functions, where a temporary fd is needed, but where it is unknown
which fds the application might be using (so the function cannot simply
safely pick fd=7 (or some other number) and hope).
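For readers who have not met this kind of redirect, here is a minimal sketch of the function use case, assuming bash 4.1 or later (which already implements the syntax).  The function name append_line and its arguments are made up for the illustration.

```shell
# Append one line to a file through a descriptor chosen by the shell,
# so the function cannot collide with fds the caller already uses.
append_line() {
    local fd
    exec {fd}>>"$1"             # shell picks a free fd and stores it in $fd
    printf '%s\n' "$2" >&"$fd"  # write through that fd
    exec {fd}>&-                # close the fd named by $fd
}
```

The function never mentions a literal fd number, which is the whole point: the caller may be using fd 7 (or any other) for its own purposes.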

Our problem is what "random" fd to assign?    Traditionally, shells
have simply assigned anything (currently unused) >= 10, and with ksh
that works fine, as users are only allowed to use (in redirects) fds
between 0 and 9 (incl) - that is, a single digit.   We don't have that
limitation, so I have always been worried about how to avoid

	exec {var}>/some/hidden/file	; # sh assigns var=10 and fd 10 is open
	# some arbitrary amount of intervening code here
	exec 10>/other/public/file

	echo secret text >&${var}

I could never see a way to stop that from happening.  Other fds, used
internally by the shell, can simply be renumbered when the user attempts to
use the same fd for the script's own purposes (and we do that) - but that's
not possible here, as we cannot tell what the user might have done with the
contents of var.

Eg:
	exec {var}>/some/hidden/file
	echo $var >/some/hidden/fd-number

and then later

	echo whatever >&"$(cat /some/hidden/fd-number)"

This kind of thing might seem particularly perverse, but it is perfectly
legal, and should work, and be safe.

I am still unable to think of a way to make this safe ... bash is currently
the only shell that has this problem, and there it is simply ignored.
It hasn't caused any reported problems, but that may be because most
shells still don't allow explicit use of an fd >= 10, so most scripts
simply don't attempt to do that.

Relying on that kind of "works much of the time" has never been good
enough for me.
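The collision can be reproduced in bash today.  The following is a minimal sketch, assuming bash 4.1 or later; the temporary directory and the file names "hidden" and "public" are invented for the illustration.  The eval is there only so the script can name whatever fd bash happened to pick; it stands in for a literal "exec 10>..." written by someone who did not know that fd was already taken.

```shell
tmpdir=$(mktemp -d)

exec {var}>"$tmpdir/hidden"              # shell picks an fd, stores it in $var

# Later, unrelated code explicitly opens the same fd number.
eval "exec ${var}>\"\$tmpdir/public\""   # silently replaces the hidden file

echo "secret text" >&"${var}"            # now lands in the public file

public_contents=$(cat "$tmpdir/public")
hidden_contents=$(cat "$tmpdir/hidden")

exec {var}>&-                            # close the fd named by $var
rm -rf "$tmpdir"
```

After this runs, public_contents holds the secret text and hidden_contents is empty, with no diagnostic from the shell at any point.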

While I still cannot think of a way to automatically handle this, I
do have a way to allow the script to tell the shell the biggest fd
it will ever use, after which the shell simply always assigns fds
bigger than that when using this new kind of redirect.   If the script
never explicitly uses fds >= 10, then it need do nothing (this is the
normal case), otherwise it should make some explicit reference to the
fd in code that is seen by the parser (it doesn't need to actually be
executed) before the {var}> type redirect is first executed.

The simplest way would be via a command

	: 27>&-

(that is, run the ':' command (do nothing) with fd 27 closed - it is
immaterial for this whether fd 27 was open already or not).

It would be good enough to do

	false && : 27>&-

if you're worried about the cost or some potential error from this
(but there cannot really be one, at worst, a shell which does not
allow fd's >= 10 in places like this would parse that as ": 27 >&-"
that is, run the ':' command, with arg "27" and stdout closed.)

That tells the shell that the script might use any fd up to, and including,
fd 27 - so for {var}>type redirects the shell will assign fds >= 28.
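As a sketch of where such a hint might sit in a script (the fd number 27 is taken from the text above; the variable hint_status exists only to show that the hint is harmless).  In shells that do not implement the proposal, both lines are simply no-ops.

```shell
# Parsed but never executed: under the proposal, the parser seeing "27>"
# is enough for the shell to keep its own {var} fds above 27.
false && : 27>&-

# The unconditional form: runs ':' with fd 27 closed for that command only.
: 27>&-
hint_status=$?
```

Either form would need to appear before the first {var}> type redirect is executed.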

So, now that we have a (I believe) semi-reasonable way to avoid the
issue I was worried about, I have done an implementation of this kind
of redirect, and I'd like to know if the community would like this added
to our /bin/sh.

I have a (at least mostly) working implementation - but no doc for it yet.

The latter is partly because I have no idea what this kind of redirection
is called, and apparently, nor does anyone else.   I believe that it needs
a name, so it can be better documented, and talked about (including in
e-mail like this).

Any suggestions?

Chet Ramey (bash maintainer) said that he calls it (internally only) a
"varassign redirect" - which is kind of OK, but I don't like it, as to
me, a varassign is a variable assignment that precedes a command, as in

	TERM=vt100 xterm ...

or
	COLUMNS=40 ls

and I don't much like the confusion that use of "varassign" for this kind
of redirect would cause.

Given my objection, Chet says it is my turn to try picking a name... but
that's something I dislike doing, so I am asking for suggestions.


Anyway, that aside, the semantics I have implemented (slightly different
from those of bash, less different from ksh, but still slightly different)
are:

Any redirect can be of the form {var}> whatever (just like they can
be N> where N is a decimal number).

where {var} is a legal variable name, enclosed in braces, the whole thing
entirely unquoted (up to and including the redirect operator), and immediately
preceding the redirect operator - no spaces intervening (any of the redirect
operators is possible), and "whatever" can be anything appropriate (no
different from any other redirect).

In all but two cases, the effect is to do the redirect as normal, pick a
"random" fd (as discussed above) to use for it, and assign a string
representing the decimal encoding of that fd to the variable named in
the braces.   (This is the same everywhere this kind of thing is implemented).

The exceptions are

1. when the fd is being closed, as in >&- ... it makes no sense to assign
something to the variable in that case, so for {var}>&- or {var}<&- the
semantic is that the fd found in variable var (ie: ${var}) is closed.
If $var is not in the form of a valid fd, for us it is a redirect error
(so "var=foo; {var}>&-" will generate an error message - a non-interactive
shell would exit).   bash and ksh93 don't treat this as an error, but I
think that's a mistake.  I have no idea what zsh is doing, it seems weird.

2. when a script does {var}>&${var} that will (only in our shell, for now
anyway) be treated just like we treat 5>&5, as a signal to the shell to
disable "close on exec" if it is set on the file descriptor, but otherwise
do nothing.   Other shells treat this as "assign a different open fd to var"
(and unless the old value of var was saved somewhere else, simply lose
track of the old fd that $var used to reference).  That doesn't seem useful
to me, but our semantic does.

The reason for that is that, like any other fd assigned by the exec
command, in our shell, if the fd is > 2 (and these ones always are), we
set close on exec by default.   (Using the exec command, so the redirect
stays open in the shell, is the most common way to use {var}> type
redirects - and I wish I had a name, rather than having to keep referring
to it that way.)

If you want to do something like

	exec {fd}</my/config/file
	cat /dev/fd/${fd}

it would fail, as when cat runs, $fd will be closed (in it) because of the
close on exec.  The same happens with

	exec 5</my/file
	cat /dev/fd/5

The 5<&5 syntax (<& and >& do exactly the same thing, in all cases) was
invented (in our sh) to handle this

	cat /dev/fd/5 5<&5

works as expected - the reasoning being that if a command line explicitly
redirects fd 5, then it obviously expects it to be open, just as if the
command had been

	cat /dev/fd/5 5</my/file

which is more or less the same thing, in this case, but here fd 5 does
not remain open in the shell, it would need to be opened again if needed,
so file positioning is lost.

With {var}> type redirects, the only way to have done this would have
been something like

	eval "cat /dev/fd/${fd} ${fd}<&${fd}"
which turns into
	eval "cat /dev/fd/10 10<&10"
when expanded, and would work - but using "eval" correctly can be tricky
sometimes (easy in this trivial case, not so easy if there are other args
that would be expanded twice, and shouldn't be).  It can be handled, but
people often get it wrong.

So, I thought that just

	cat /dev/fd/${fd} {fd}<&${fd}

would be a good analog to the version using explicit fds.

Note that it is not possible to do

	cat /dev/fd/${fd} {fd}</some/file

as the prescribed order of evaluation (ie: POSIX) requires the args
be expanded first, and the redirects done later, so this would turn
into something like

	cat /dev/fd/ 10</some/file

which is not useful.   It is however (will be, in our shell) possible
to do

	export fd; command --use-fd-from-var=fd {fd}</some/file

The "export" causes fd=$fd to be placed in the environment of the
command when it is run, and that is almost the last thing that happens
before the exec of command - after the redirects are done, and the
file descriptor value has been stored in fd.   The command simply has to
be told (or know a-priori) what variable name to hunt for in the environment
to discover which fd was opened for it.

Note that

	fd=${fd} command use-fd=fd {fd}</some/file

is not guaranteed to work, and will not work in our shell, varassigns
before a command are performed before redirects (in some shells, for some
commands, the order is reversed).
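The "args are expanded before redirects are done" ordering can be observed in bash, which already has this syntax.  A minimal sketch, assuming bash 4.1 or later; the variable name seen is made up for the illustration.

```shell
unset fd

# ${fd} is expanded while fd is still unset; only afterwards does the
# redirect open /dev/null and assign the chosen descriptor to fd.
seen=$(echo "arg=[${fd}]" {fd}</dev/null)
```

Here seen ends up as "arg=[]", which is the same effect that turns the cat example above into "cat /dev/fd/".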

Those are the more or less standard semantics.

In addition to those, I have added a couple.

First, the variable named is kind of special - if it is unset or
a new value is assigned to it, the fd it used to reference is closed.
(This happens only when the variable used in the redirect that opened
the file descriptor is altered ... including fd=${fd}.)   If that variable
is copied to another
	fd2=${fd}
then fd2 is usable in redirects >&${fd2} - just as long as $fd is not
changed - if $fd loses its value, the file descriptor is closed.

I decided this was needed to allow

	fn() {
		local fd

		exec {fd}<&0	# save stdin
		# do more commands, presumably altering stdin
	}

and have the fd referred to by $fd inside the function not be
abandoned when the function returns (for any reason, including
some kind of error).  A perfect function would make sure to do

	exec {fd}>&-

before returning, but that can sometimes be hard to guarantee.
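A minimal sketch of the "perfect function" pattern, written for bash 4.1 or later (where the fd is not closed automatically when the local variable goes away, so the explicit close matters).  The function name first_line is invented for the illustration.

```shell
# Print the first line of the file named by $1, leaving the caller's
# stdin exactly as it was.
first_line() {
    local fd line
    exec {fd}<&0        # save the caller's stdin on a shell-chosen fd
    exec <"$1"          # point stdin at the file
    IFS= read -r line
    exec <&"$fd"        # restore the caller's stdin
    exec {fd}>&-        # close the saved copy before returning
    printf '%s\n' "$line"
}
```

If anything between the save and the close caused an early return, the saved fd would be left open in such a shell; that is the case the automatic close described here is meant to cover.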

When the local value of fd is lost (and any previous value is
restored to it) when the function returns, $fd will be closed,
meaning far less chance of "lost" file descriptors.   (If fd was
not local, then it will still exist globally, and nothing special
happens).

In addition, when the script does explicitly close the fd, using
{fd}>&- then fd will be unset.    On the other hand if it did
{fd2}>&- (assuming the "fd2=${fd}" from above) then the $fd is still
closed, but fd remains untouched (fd2 would be unset).  This is dangerous
however (and eventual doc will warn against it) as fd still remains magic
in that case, and if the script later did "{var}>/somewhere" var might be
set to the same value as fd, and then if fd is unset or reassigned,
the fd would magically close (${var} would no longer refer to a valid fd),
and that would continue happening.

Lastly, the fdflags builtin command has been extended to allow {var}
as an arg, so it can be used to manipulate the flags (including
close on exec) for the fd in ${var}.   This is something of a frill,
as ${var} would also work...   (The {var} form however gives a more
reasonable error message if var does not contain a fd).  Further, in
most common cases (this sometimes doesn't work) in output from a simple
"fdflags" (no args) command, the fd will be shown as {fd} (name as appropriate)
instead of as the value of the fd.   Eg:

	[jinx]+DEBUG{2}$ exec {foo}>/dev/null
	[jinx]+DEBUG{2}$ fdflags
	0: 
	1: 
	2: 
	{foo}: cloexec
	[jinx]+DEBUG{2}$ exec {bar}>/dev/null; exec {bar}>&${bar}; fdflags
	0: 
	1: 
	2: 
	{bar}: 
	{fd}: cloexec

This is a total kludge, and is the most likely thing to fail to survive.

That's mostly it I think (I may have forgotten some details, ask if something
appears to be missing, or confusing).

The implementation is done, and being (slowly) tested - and should be more
or less ready to make available in HEAD if there is some agreement that
doing so would be a good idea.  No guarantee at the minute that the new
parts are all completely bug free, but my current testing seems to be
indicating it is OK.   I am reasonably confident (but not certain) that
scripts (and users) that don't use {var}> type redirects (which means
everything currently using the NetBSD sh) will be unaffected.

Suggestions for improvements or changes welcome (no guarantee I will be
able to implement all though).

But (at least if you believe this currently unnamed thing is worth having)
please do suggest a name to give it.

kre


