Re: PR bin/57773 [/bin/sh: Substring processing in assignments fail for quoted control characters]

To: gnats-bugs%nebsd.org@localhost
Subject: Re: PR bin/57773 [/bin/sh: Substring processing in assignments fail for quoted control characters]
From: Robert Elz <kre%munnari.OZ.AU@localhost>
Date: Mon, 25 Dec 2023 11:32:44 +0700

OK, I am about to commit the (very trivial) fix for this issue.

But, unlike some of my previous commits, the plan this time is
not to include a 50 line explantion of what went wrong and why the
fix works in the commit message - but instead do that here, and
then simply refer to this PR there.

First, in all internal processing, sh needs to know what it is
dealing with, so all strings being temporaily used contain
markers that tell the shell what special processing to do
with the string in question - those are (currently) the
bytes 0201..0213 (129..139) incl - the ones identified in
the PR as having the problem.

In earlier versions of the shell values from 1 were used (there
are ancient PRs about problems with ^A ec - all long since fixed).
Any byte values should work for this purpose, as one of these
CTL chars (sh terminology) means "treat the next byte literally"
- just like \ in many places (incl sh input).

The current values are used as they are less likely to occur in
normal char strings for anyone using UTF-8 encoding.   That is,
less escaping is typically needed.

Normally whenever sh processes a string (like reading a command
line, or expanding a variable) it inserts that CTL char (CTLESC)
in the result before any byte value which would be interpreted
as one of those CTL chars, and then removes it again when all
processing is done.

But sometimes when we "know" that there is no more processing
going to happen, and if we inserted CTLESC all over the place,
the only thing those would ever be good for is to be removed
again, we just don't bother - and leave the string un-escaped.
(Slighty simplified for this explanation).

sh variable assignments, which are all always a single sh
"word" of the form:
	name=value
are one such place, as once "value" is expanded nothing is done
with the expansion, other than what posix calls "quote removal" 
- removing any quote chars that were in the original command.
In sh, those are long gone, replaced by CTL chars that achieve
the same effect.

Var assigns do no field splitting, no pathname expanion,
nothing after the var expansions happen - except quote removal,
which in sh is the removal of any CTL chars (which should only
be CTLESC by that point) that remain.   Hence adding extra
CTLESC chars seemed unnecessary, and generally is.

[That's why quotes aren't needed around simple var=${other} type
 assignments or var=$(whatever) -- only when we have something like
 var="x y|x" with the value containing (as written) spaces or sh
 operator (or quote) chars are the quotes ever needed.]

Thus in the v="X'${c}'" statement, even when c had the same
value as a CTL code, there was no need for a CTLESC to be inserted
before the value of ${c} even when that looked like a CTL char,
and it all just worked.

But in the assignment containing the substring assignment,
there was a problem ... there the word after the % in the
substring expansion is expanded the same as it always is,
meaning a CTLESC is inserted before any bytes that could be
interpreted as CTL codes.   Threin lies the rub...  The pattern
match was looking for a value which contained a CTLESC char (to
protect the 'magic' byte value that followd) in a string which
didn't contain that value.   No hope of success.

In the expansion as a regular command arg (as in the [ (test)
example given) this isn't an issue, as the initial var expand
also inserts a CTLESC so the matching simply has strings which
seem to it to be 1 byte longer (in the examples given) than was
given -- none of the CTL codes is anything but a simple byte to
pattern matching, so that all just works.

One possible fix to this would have been to make the pattern
matching code understand the special meaning of CTLESC, and
ignore it whenever seen - but that code is ugly enough as it
is, brittle enough, and would have required many updates, that
I didn't want to go there.

An alternate fix is just to treat var assigns more like anything
else when doing expansions - causing CTLESC bytes to be inserted
wherever the look needed, just as is done with other expansions.
That is what I have done ... the fix is just to add one more
flag bit to the call of argstr() when expanding the value of
a variable asignment.   Hence trivial (but can add a minor extra
processing cost in these unusual cases).

I have been testing this since the day after the PR was submitted
-- all the NetBSD ATF tests pass (or rather those that failed had
nothing to do with anything sh did) , and I have used this sh to
build about 1000 packages from pkgsrc (including several of the big
ones) without issue.

I could do a system build using it - and eventually will - but
build.sh is actually very gentle as far as sh usage goes, and
is very unlikely to be affected (or prove anything at all).

Thus I think this PR is fixed.   Thanks for finding the problem
and telling us about it.

I still need to look at whether a similar problem might occur when
expanding the arg (usually filename) following a redirect operator,
which is a somewhat similar case, I will get to that soon.

kre

ps: Merry Christmas to all who celebrate that.

Follow-Ups:
- Re: bin/57773 [/bin/sh: Substring processing in assignments fail for quoted control characters]
  - From: Robert Elz

Prev by Date: Re: lib/57792: SIGSEGV in ld.elf_so / _rtld_call_ifunc
Next by Date: Re: bin/57773 [/bin/sh: Substring processing in assignments fail for quoted control characters]
Previous by Thread: PR/57767 CVS commit: src/external/bsd/blocklist/bin
Next by Thread: Re: bin/57773 [/bin/sh: Substring processing in assignments fail for quoted control characters]
Indexes:

Home | Main Index | Thread Index | Old Index