NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: bin/57773 [/bin/sh: Substring processing in assignments fail for quoted control characters]



The following reply was made to PR bin/57773; it has been noted by GNATS.

From: Robert Elz <kre%munnari.OZ.AU@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: 
Subject: Re: bin/57773 [/bin/sh: Substring processing in assignments fail for quoted control characters]
Date: Mon, 25 Dec 2023 11:43:35 +0700

 OK, I am about to commit the (very trivial) fix for this issue.
 
 But, unlike some of my previous commits, the plan this time is
 not to include a 50 line explantion of what went wrong and why the
 fix works in the commit message - but instead do that here, and
 then simply refer to this PR there.
 
 First, in all internal processing, sh needs to know what it is
 dealing with, so all strings being temporaily used contain
 markers that tell the shell what special processing to do
 with the string in question - those are (currently) the
 bytes 0201..0213 (129..139) incl - the ones identified in
 the PR as having the problem.
 
 In earlier versions of the shell values from 1 were used (there
 are ancient PRs about problems with ^A ec - all long since fixed).
 Any byte values should work for this purpose, as one of these
 CTL chars (sh terminology) means "treat the next byte literally"
 - just like \ in many places (incl sh input).
 
 The current values are used as they are less likely to occur in
 normal char strings for anyone using UTF-8 encoding.   That is,
 less escaping is typically needed.
 
 Normally whenever sh processes a string (like reading a command
 line, or expanding a variable) it inserts that CTL char (CTLESC)
 in the result before any byte value which would be interpreted
 as one of those CTL chars, and then removes it again when all
 processing is done.
 
 But sometimes when we "know" that there is no more processing
 going to happen, and if we inserted CTLESC all over the place,
 the only thing those would ever be good for is to be removed
 again, we just don't bother - and leave the string un-escaped.
 (Slighty simplified for this explanation).
 
 sh variable assignments, which are all always a single sh
 "word" of the form:
 	name=value
 are one such place, as once "value" is expanded nothing is done
 with the expansion, other than what posix calls "quote removal" 
 - removing any quote chars that were in the original command.
 In sh, those are long gone, replaced by CTL chars that achieve
 the same effect.
 
 Var assigns do no field splitting, no pathname expanion,
 nothing after the var expansions happen - except quote removal,
 which in sh is the removal of any CTL chars (which should only
 be CTLESC by that point) that remain.   Hence adding extra
 CTLESC chars seemed unnecessary, and generally is.
 
 [That's why quotes aren't needed around simple var=${other} type
  assignments or var=$(whatever) -- only when we have something like
  var="x y|x" with the value containing (as written) spaces or sh
  operator (or quote) chars are the quotes ever needed.]
 
 Thus in the v="X'${c}'" statement, even when c had the same
 value as a CTL code, there was no need for a CTLESC to be inserted
 before the value of ${c} even when that looked like a CTL char,
 and it all just worked.
 
 But in the assignment containing the substring assignment,
 there was a problem ... there the word after the % in the
 substring expansion is expanded the same as it always is,
 meaning a CTLESC is inserted before any bytes that could be
 interpreted as CTL codes.   Threin lies the rub...  The pattern
 match was looking for a value which contained a CTLESC char (to
 protect the 'magic' byte value that followd) in a string which
 didn't contain that value.   No hope of success.
 
 In the expansion as a regular command arg (as in the [ (test)
 example given) this isn't an issue, as the initial var expand
 also inserts a CTLESC so the matching simply has strings which
 seem to it to be 1 byte longer (in the examples given) than was
 given -- none of the CTL codes is anything but a simple byte to
 pattern matching, so that all just works.
 
 One possible fix to this would have been to make the pattern
 matching code understand the special meaning of CTLESC, and
 ignore it whenever seen - but that code is ugly enough as it
 is, brittle enough, and would have required many updates, that
 I didn't want to go there.
 
 An alternate fix is just to treat var assigns more like anything
 else when doing expansions - causing CTLESC bytes to be inserted
 wherever the look needed, just as is done with other expansions.
 That is what I have done ... the fix is just to add one more
 flag bit to the call of argstr() when expanding the value of
 a variable asignment.   Hence trivial (but can add a minor extra
 processing cost in these unusual cases).
 
 I have been testing this since the day after the PR was submitted
 -- all the NetBSD ATF tests pass (or rather those that failed had
 nothing to do with anything sh did) , and I have used this sh to
 build about 1000 packages from pkgsrc (including several of the big
 ones) without issue.
 
 I could do a system build using it - and eventually will - but
 build.sh is actually very gentle as far as sh usage goes, and
 is very unlikely to be affected (or prove anything at all).
 
 Thus I think this PR is fixed.   Thanks for finding the problem
 and telling us about it.
 
 I still need to look at whether a similar problem might occur when
 expanding the arg (usually filename) following a redirect operator,
 which is a somewhat similar case, I will get to that soon.
 
 kre
 
 ps: Merry Christmas to all who celebrate that.
 
 [This copy resent without the "cc", as apparently I canot
 spell netbsd ...]
 


Home | Main Index | Thread Index | Old Index