Re: bin/53919: please revert changes that make /bin/sh evaluate $ENV non-interactively

To: kre%NetBSD.org@localhost, gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost, "Greg A. Woods" <woods%planix.ca@localhost>
Subject: Re: bin/53919: please revert changes that make /bin/sh evaluate $ENV non-interactively
From: Robert Elz <kre%munnari.OZ.AU@localhost>
Date: Wed, 30 Jan 2019 12:10:01 +0000 (UTC)
The following reply was made to PR bin/53919; it has been noted by GNATS.

From: Robert Elz <kre%munnari.OZ.AU@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: 
Subject: Re: bin/53919: please revert changes that make /bin/sh evaluate $ENV non-interactively
Date: Wed, 30 Jan 2019 19:06:40 +0700

     Date:        Tue, 29 Jan 2019 18:55:01 +0000 (UTC)
     From:        "Greg A. Woods" <woods%planix.ca@localhost>
     Message-ID:  <20190129185501.744187A1E5%mollari.NetBSD.org@localhost>
 
   |  Broken?  I think not.
 
 It is, and always was.   Not in the sense of not working, more in the
 sense of being something that is and always was unsafe and
 unpredictable, but just happened to work OK in the environments
 that you have encountered (perhaps even most that you could have
 encountered for a long time.)
 
 This is very much like the ancient C programs that used to assume that
 all chars were signed (so they were a safe place to put small negative
 integers, as well as small positive ones) or that computers are all
 little endian, so *(char *)&var = 1;  works to assign 1 to a var which
 was 0 before, regardless of the size of the int type that var happened to
 be, or that *(char *)NULL == '\0';
 
 All of those, for a long time, worked on many systems, and were used
 by many programs, and worked just fine - or seemed to.   Until they
 didn't.
 
 Those were always broken, never guaranteed to work, and as much as
 some people would have preferred that the underlying assumptions could
 have been converted into requirements, that was never going to happen.
 
 Your usage of ENV is just the same kind of thing.   It is making
 assumptions that neither are, nor ever have been, promised by
 anything, but which for a long time just happened to be satisfactory.
 
   |  However I'd like to prevent any /bin/sh in any current or future NetBSD
   |  release from being the first to so misbehave.
 
 It isn't misbehaving, it just is not doing what you want.
 
   |  So, with a bit more debugging, I see that the fundamental reason NetBSD
   |  /bin/sh hasn't been a problem for Korn and Korn-like Shell users since
   |  growing its $ENV hack is that it has been silently failing to do
   |  anything with David Korn's long-recommended setting for $ENV.
 
 Yes, if the file named fails to exist, sh (and all shells, I think this may
 even be part of the rules, and if not, probably should be) silently ignore
 attempts to open profile type files that don't exist.
 
 Note however, that if you don't cause $ENV to have a fully qualified
 file name then ...
 
   |   13571  1 sh  NAMI  "${ENVFILE[(_$-=1)+(_=0)-(_$-!=_${-%%*i*})]}"
 
 [I deleted some spaces for line length reasons]
 
 if you're in a directory (say /tmp) where others have write permission,
 and someone creates a file with just that name (which is easy to do,
 especially as you are so public about what you use) then that file is
 going to be read as a sh script and executed (as you).   I'd call that a
 security problem that you ought to address.   Doing so is easy, you
 just need to make sure that (no matter what) ENV is set to a path that
 starts with / and always goes to somewhere safe.   Regardless of how
 it is expanded (or is not expanded.)
 
 One might think that sh could help with this, by checking ownership or
 something, but it can't - there's no rule that says that a $ENV file
 needs to be owned by the user whose shell uses it, which is something
 that has been used for years, with less experienced users using the
 setup files of more experienced ones who do a better job of making a
 nice environment.    Sure, they are taking a risk - but that is just of
 trusting another (known, and specific) user not to be malicious, not
 simply running a strangely named named file that happens to be
 in $PWD when the shell starts.
 
   |  So, the thing that made my new -current system entirely unusable was a
   |  new error message, and the most frustrating thing was the futility and
   |  uselessness of that feckless error message.
 
 If all you need is for the error message to go away, that's probably
 something that could happen.
 
   |  One thing is certain:  The $ENV form using arrays is not going away.  It
   |  has worked for 30+ years, and is widely recommended, including in the
   |  "The Kornshell" book (since 1989).
 
 As I recall it (it has been a long time since I looked there), the
 Kornshell book is about using ksh right?   Obviously if you're using
 ksh (especially where /bin/sh is a ksh variant) that is going to work
 just fine.   And in 1989 or thereabouts, *(char *)NULL worked fine too...
 
   | This method has worked A-OK with
   |  _all_ Ksh derivatives I've ever encountered,
 
 Of course it would, it is ksh specific, any ksh variant, or anything that
 emulates ksh, is (or should) handle that (well, perhaps, see below.)
 
 How about if you were to stick zsh'isms in your $ENV (perhaps as
 recommended by some zsh book ... or these days, more likely, website)
 and then attempt to use those when the shell is ksh.   Would you
 complain just as loudly that ksh is not doing the right thing by not
 implementing the zsh extensions (there are *many*) or not interpreting
 them the same way ... or by giving an error message when it fails
 to comprehend what the zsh extension means?
 
   |  I'm rather sad that the POSIX committee was somehow able to justify, at
   |  least to to themselves, a half-baked appropriation of this ancient Ksh
   |  feature.  But that's water under the bridge.  (Perhaps I'm most sad that
   |  I personally wasn't able to be more involved in POSIX, but that's also
   |  flowed long ago into sea of time.)
 
 You could always try now, and good luck with it - I agree that the POSIX
 spec for ENV isn't really adequate, but nothing has, or is likely to, alter
 now.    This is just like the people who really believe that char should
 have been specified to always be a signed type.   Whoever is right about
 issues like this, it is just too late, the spec is set, and the chances of
 it changing now are close to 0 ... and if it did, all it could change to
 is "unspecified", which really helps no-one.
 
   |  So I think I would be happy enough if /bin/sh guaranteed never to make a
   |  peep about $ENV expansion or open() failures in all normal circumstances.
   |  (i.e. maintain the status quo of behaviour in existing releases).
 
 The open failures, sure, it has always been that way, I suspect might be
 required to be that way, and isn't going to change (for many reasons.)
 
 The other, yes, I could suppress the error msgs when an expansion fails,
 though in that case, I think I'd tend to make the result be something
 safe, like "" rather than the original value.    I will wait on opinions
 from others before making any changes here though, as I believe that
 some people are of the opinion that errors should always be reported.
 
   |  Of course I would also be happy if there were some command-line flag
   |  that made the shell very noisy about $ENV issues.
 
 There is no real need for that, if you suspect $ENV is not doing what
 you want, then:
 	eval printf '%s\\n' "${ENV}"
 would tell you soon enough (in the shell in question, both interactively
 and not, if it is intended that there be a difference).
 
 And
   |  Perhaps "-x" would suffice,
 
 that will let you see if anything that is in $ENV is being executed
 (if you're not expecting it to be, there is no issue with why it is not,
 and you would not be concerned) as...
 
   |  though I see it is already quelled by read_profile(),
 
 No, it isn't, unless you use -q, which exists for exactly that purpose -
 but which has unfortunate side effects which are hard to overcome,
 so I do not recommend it.
 
   |  so something stronger would probably be needed for profile and
   |  ENV file debugging.
 
 I doubt anything "stronger" is needed there.
 
   |  Sorry, I don't think that is a better solution at all.  It's a bad
   |  (inefficient) hack.  Change for the better is fine, but not change for
   |  the worse.
 
 It would be reliable, and use only defined mechanisms.  Your current
 method is even more broken than you suspect, it really is only working
 correctly because of a whole set of ksh design choices, some of which
 are unlikely to ever change in ksh, but others which easily could.
 
 For example, to use your explanation of why your hack (which is
 what it is) works, as  you included it in PR42828 in 2010  ...
 
 First:
         # It does work with all the versions of Ksh I've run across,
         # including ksh-85 and the one AIX-3.2, as well as pdksh.
 
 that I am not disputing, for for most of the rest of this, let's assume
 the shell evaluating the expression is not ksh or something which
 has deliberately copied it.
 
         # Note that any string
         # variable can be accessed as an array with one element in the
 
 Not if it was a shell with arrays that I had implemented (as unlikely as
 that is to ever happen).   Accessing a regular var as an array is more
 likely to be a mistake, and would quite likely just generate an error
 message.   The variable would need to be declared (somehow) as
 an array for it to be able to be accessed that way.
 
 Further, I am not sure I would use the conventional '[' ']' to indicate
 array subscripts are enclosed - I might decide to use '(' and ')'
 instead (as in fortran and tcl) as name(xxx) has no current defined
 meaning in sh (whether in a var de-ref or elsewhere) whereas
 when the $ is missing, name[xxx] is just a glob expression.   The
 only place where a name (that is, not a keyword) followed by a
 '(' now has any meaning at all in sh, is in a function definition, and
 in one of those the '(' is always followed by a ')' (with only white
 space intervening) which is not useful as an array index - that
 must be something, even if it evaluates to a null string).
 
 That is, when he added arrays, I think Korn got the syntax wrong.
 But then, he was trying to make it look good, and familiar, to
 C and csh programmers, for whom '[' ']' is the natural choice,
 even if it isn't really for sh.
 
         # Also remember that arithmetic expressions are evaluated
         # inside array index references (i.e. inside the square brackets)
 
 Again, not if I were implementing arrays - there would only be one
 type, associative arrays, which could be indexed by any string,
 which just might happen to be an integer for some uses.   Of course
 overcoming this one is as simple as enclosing the expression in
 $(( )) to force arithmetic evaluation, so this just requires a minor
 variation.   This "automatic arithmetic evaluation" is disgusting.
 
         # The flags variable ($-) is the key to forming our magic
         # expression.  If the shell is interactive, the flags will
         # contain an 'i'.
 
 Sorry, not required at all.   There is no 'i' flag or option in a purely
 posix shell.   Just about all shells have it, so this is a fairly safe
 assumption, but it is certainly not required to happen.   (Not all shells
 allow the 'i' flag, to be altered, but that's not relevant here.)
 
 In the following, I have, I hope, undone the QP encoding that
 persists in the PR (don't we all just love gnats!) but if I manage
 to miss some, or do it incorrectly, my apologies...
 
         # (_$-=1)     This creates a variable named "_BLAH", where
         #               "BLAH" is the value of the variable "$-", and
         #               assigns it the value "1".  The expression has
         #               the result "1" as well.
 
 That's correct.
 
         # (_=0)       This creates a variable named "_" and assigns
         #               it the value "0".  The expression has the result
         #               "0" as well.
 
 You'd think that would be OK, and safe, but $_ is kind of magic and
 almost anything is possible.  Especially in ksh which seems to have
 about 50 different uses for it, and add more every new revision...
 The "assigns 0 to it" part is likely to work OK, even ksh is unlikely to
 break that - but arithmetic is supposed to be evaluated according to
 how it would be done in C, and in C the value of an assignment operator
 is the value of the variable after the assignment.   Even assigning 0 to _
 does not guarantee that $_ will expand to 0.   Especially in ksh.
 Probably this will work, but no guarantees.
 
 Of course, this one is also easy to fix, you just use __ instead of _
 (or _X for any other letter, or digit, that you choose) throughout the
 expression, or anything else unlikely to matter instead of the _.
 
         # (_$- != _${-%%*i*}) This compares the value of _BLAH with
         #                       the value of another variable that
         #                       will either have the same name (and
         #                       thus the same value), or the name "_".
 
 Same problem with $_ but otherwise that's correct.   Of course,
 BLAH might be the empty string, so it would be comparing $_ to $_
 in all cases, but this version of that expression anticipates that
 
         # For an interactive shell these variables will be guaranteed
         # to have different names
 
 Outside ksh, that's not necessarily true.   If there's no 'i' in $-
 (ever) then the two vars will always be the same.
 
 But assuming the 'i' flag does exist, as is very common, the complete
 expression (after variable substitution) turns out to be either ...
 
         #       (_BLAH=1) + (_=0) - (_BLAH != _BLAH)
 or
         #       (_BLAH=1) + (_=0) - (_BLAH != _)
 
 (the latter if $- contains 'i').   That looks like it should be
 
        #	(1) + (0) - (1 != 1)
 or
        #	(1) + (0) - (1 != 0)
 
 which is 1 in the first case (no -i flag) and 0 in the second (-i)
 which is what you want.
 
 Except that part is not guaranteed (even ignoring the issues
 accessing the _ variable, and the perhaps non-existent 'i' flag)
 
 First, neither C, nor posix, guarantees anything about the order of
 evaluation of subexpressions in an expression like this, a shell could
 just as easily evaluate the sub-expressions right to left as left to right.
 In the first of them this makes no difference at all, as whatever value
 _BLAH has, it will not compare unequal to itself, so the third term is
 always 0 there, and that case works.
 
 But in the second one, we do not know what the relationship between
 the values of _BLAH and _ will be when that expression is evaluated.
 _ may not even be numeric, and you might get an arithmetic error (and
 POSIX requires that an error message be issued in such a case, and
 evaluation abort).    Bash on the other hand (maybe ksh as well) will,
 I think, just try to treat the value as another var name, and keep hunting
 till it finds either a number, or an unset var (which is interpreted as 0).
 The most likely outcome for this, if evaluated this way, is 0 != 0 which
 is not what you want.
 
 But even if you think that order of evaluation (which I suspect that even
 ksh does not guarantee) is unlikely to be anything except left to right in
 an expression like this, it has another problem.   Neither POSIX nor C
 require that the assignments to the variables be made during evaluation
 of the expression, merely that the values of the variables, after the
 expression has been evaluated, be as set by the expression.  That's
 a part of the fallout from unordered sub-expr evaluation, but is worthy
 of specific mention, as it means that you cannot expect an assignment
 to any var to be available for use anywhere else in the same expression.
 
 So:
         # Since
         # the first variable has been assigned the value "1" and the
         # second one has the value "0", result of this last expression
         # will always be 1 for any interactive shell.
 
 is simply not guaranteed, and even a simple change (perhaps
 optimisation) to how ksh evaluates arithmetic could make that fail.
 
         # Remember class, the test is tomorrow!
 
 And it always helps if the instructor gets the correct answer, and
 is not mistaken about the way that things are defined to work, as
 distinct from the way they just happen to work.
 
   |  The only good solution is to do something smart during the runtime
   |  expansion of $ENV to prevent the open from occurring or at least from
   |  succeeding,
 
 Personally, I think the number 1 rule in deciding what is good is to
 be correct first, and then once you're correct, you can worry about
 finding a way to be more efficient.   Being fast, but wrong, is not
 anyone I know's definition of good.
 
 kre
 
 ps: bosh (which prides itself on being a true SysVR4 /bin/sh clone,
 with some more recent posix additions), dash, the FreeBSD sh, and ours,
 all give a "bad substitution" type error when asked to evaluate your
 $ENV expression, though that might not appear in all (or perhaps even
 any) when it is de-referenced at sh startup, either because they do not
 param expand $ENV (in which case they will be attempting to open a file
 of that name in $PWD) or by just suppressing the errors at that point.
Prev by Date: Re: bin/53927 (bozohttpd(8) lacks documentation of -r option)
Next by Date: Re: kern/53928: modules/t_builtin:disable test case randomly fails
Previous by Thread: Re: bin/53919: please revert changes that make /bin/sh evaluate $ENV non-interactively
Next by Thread: Re: bin/53919: please revert changes that make /bin/sh evaluate $ENV non-interactively
Indexes:
Home | Main Index | Thread Index | Old Index