tech-userlevel archive


Re: Next steps for /bin/sh



    Date:        Fri, 11 Mar 2016 06:26:42 +0100
    From:        Joerg Sonnenberger <joerg%britannica.bec.de@localhost>
    Message-ID:  <20160311052642.GB27565%britannica.bec.de@localhost>

  | Three questions here. First is, how much work is it to go from NUL
  | delimited strings to explicitly sized strings?

Lots.   I would not even attempt that particular change alone.  While
certainly an attractive idea, it would (I think) require altering the
intermediate format so that it is no longer just strings representing
commands and, slightly later in the processing, redirects, assignments
and args (the cmd name is just arg 0).
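
For anyone unsure what "explicitly sized" means here, a minimal sketch
follows.  It is purely illustrative: struct sized_str and both helper
functions are made-up names, nothing like them exists in sh, and it says
nothing about how the intermediate format itself would have to change.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical: a word carried with an explicit length instead of a
 * terminating NUL.  Not taken from the sh sources.
 */
struct sized_str {
	const char *data;	/* bytes of the word, not NUL terminated */
	size_t len;		/* number of valid bytes in data */
};

/* Wrap an ordinary C string; the length is taken once, up front. */
static struct sized_str
sized_str_from_cstr(const char *s)
{
	struct sized_str r = { s, strlen(s) };
	return r;
}

/* Compare byte for byte; every byte value, even '\0', is plain data. */
static int
sized_str_eq(struct sized_str a, struct sized_str b)
{
	return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}
```

With the length held out of band, no byte value has to be reserved as a
terminator or marker, which is the attraction.  The cost is that every
piece of code that currently walks a string up to its NUL has to be
taught about the length.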

  | Second is, would that allow most of the special chars to go away?

If done properly, yes, they would all vanish, and be incorporated into
the tree structure (or something isomorphic.)   I think this could be a
long term project (perhaps even a GSoC in the future.)

  | Third is, alternatively, would it allow to move to a more consistent
  | scheme using NUL as escape character?

Unfortunately, no.  Believe it or not, sh already uses '\0' as a
separator in places (since the code knows that '\0' simply cannot be
a data character.)    And, no, you really don't want to know how that works...

And from a slightly earlier message ...

  | We can also switch to using isalpha_l and friends with explicit C locale. 

Yes, that would get about half the advantage (avoiding the locale dependent
syntax that it currently has) without the (slight) speedup.  For the shell
I doubt that's worth the bother - this stuff is only used for recognising shell 
syntax elements, for which the char set is (or should be) largely fixed.
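
For reference, the suggestion would look roughly like the sketch below.
It assumes a POSIX.1-2008 system where <locale.h> provides newlocale()
and <ctype.h> provides isalpha_l(); is_alpha_c and c_locale are made-up
names, not anything in sh.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <locale.h>

static locale_t c_locale;	/* created on first use, never freed */

/*
 * Hypothetical helper: non-zero if byte c is alphabetic in the "C"
 * locale, regardless of what setlocale() has been told.
 */
static int
is_alpha_c(int c)
{
	if (c_locale == (locale_t)0)
		c_locale = newlocale(LC_ALL_MASK, "C", (locale_t)0);
	if (c_locale == (locale_t)0)
		return 0;	/* no locale object: report "not alphabetic" */
	return isalpha_l((unsigned char)c, c_locale) != 0;
}
```

This removes the locale dependence, but every test still goes through
the C library's classification data rather than a table the shell owns.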

What the user data might be, the shell doesn't care (which is partly why
it is totally ignorant of any i18n issues.)  Those are just bytes...   This
means that doing our own char -> char_type mapping is just fine (I have no
idea why FreeBSD felt the need to change it, sometime in the mid 90's ... I
suspect that it came to NetBSD without much evaluation, just "they did it,
their shell has fewer bugs than ours" ... the NetBSD PR that caused this was
one of the myriad "set -e is broken" reports that NetBSD has had over the
years - the patches that were incorporated included stuff related to that,
and all kinds of other changes, including this one.)

The syntax tables & macros that the shell uses are actually one of the better
designed features of the implementation - or at least, once UPEOF goes
away they will be.  They actually work as they should, and are implemented
correctly (assuming that sh only needs to deal with characters that fit
in octets, which for shell syntax, is OK.)
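
To make the idea concrete, here is a minimal sketch of a private
char -> char_type table with classification macros on top of it.  It is
not the code from sh: the CT_* flags, char_type[], the sx_* macros and
both functions are names invented for this example.

```c
#include <limits.h>

/* Hypothetical character classes, one bit each. */
#define CT_DIGIT	0x01
#define CT_ALPHA	0x02
#define CT_UNDER	0x04

/* One entry per octet; bytes never marked below stay 0 (plain data). */
static unsigned char char_type[UCHAR_MAX + 1];

#define sx_type(c)	 (char_type[(unsigned char)(c)])
#define sx_is_digit(c)	 ((sx_type(c) & CT_DIGIT) != 0)
#define sx_is_name(c)	 ((sx_type(c) & (CT_ALPHA | CT_UNDER)) != 0)
#define sx_is_in_name(c) ((sx_type(c) & (CT_ALPHA | CT_UNDER | CT_DIGIT)) != 0)

/* Mark every byte of the string set with the given class flag. */
static void
mark(const char *set, unsigned char flag)
{
	for (; *set != '\0'; set++)
		char_type[(unsigned char)*set] |= flag;
}

/* Fill the table once at startup; no locale is ever consulted. */
static void
init_char_type(void)
{
	mark("0123456789", CT_DIGIT);
	mark("abcdefghijklmnopqrstuvwxyz", CT_ALPHA);
	mark("ABCDEFGHIJKLMNOPQRSTUVWXYZ", CT_ALPHA);
	mark("_", CT_UNDER);
}

/* Non-zero if s is a valid shell variable name. */
static int
valid_name(const char *s)
{
	if (!sx_is_name(*s))
		return 0;
	for (s++; *s != '\0'; s++)
		if (!sx_is_in_name(*s))
			return 0;
	return 1;
}
```

After init_char_type() has run, valid_name("foo_1") is true and
valid_name("1foo") is false.  Any byte with the top bit set is simply
"not a name character", whatever the user's locale says about it, which
is all the shell syntax needs.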

If it matters, making this change will make the shell slightly smaller, not
bigger: the char syntax array (even though unused) was never removed, and
the shell would no longer require the <ctype.h> arrays.

Switching which values represent the special characters should make no
externally visible change at all (other than, if done alone, which user
data characters get squashed.)

kre


