Source-Changes archive


CVS commit: src/bin/sh



Module Name:    src
Committed By:   kre
Date:           Mon Aug 21 13:20:49 UTC 2017

Modified Files:
        src/bin/sh: expand.c parser.c parser.h sh.1 syntax.c syntax.h

Log Message:
Add support for $'...' quoting (based upon C "..." strings, with \ expansions.)

Implementation largely obtained from FreeBSD, with adaptations to meet the
needs and style of this sh, some updates to agree with the current POSIX spec,
and a few other minor changes.

The POSIX spec for this ( http://austingroupbugs.net/view.php?id=249 )
[see note 2809 for the current proposed text] is yet to be approved,
so it might change.  It currently leaves several aspects unspecified;
this implementation handles those as follows:

Where more than 2 hex digits follow \x, this implementation processes the
first two as hex; the following characters are processed as if the \x
sequence were not present.  The value obtained from a \nnn octal sequence
is truncated to the low 8 bits (if a bigger value is written, e.g. \456).
Invalid escape sequences are errors.  Invalid \u (or \U) code points are
errors if known to be invalid; otherwise they can generate a '?' character.
Where any escape sequence generates nul ('\0'), that char and the rest of
the $'...' string are discarded, but anything remaining in the word is
processed, i.e. aaa$'bbb\0ccc'ddd produces the same as aaa'bbb'ddd.
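
The \x, \nnn and nul rules above can be illustrated with a minimal sketch.
This is not the code from parser.c; hexval() and decode_body() are
hypothetical names, only \xHH, \nnn and \\ are handled, and treating a \x
with no hex digit as an error is an assumption based on "invalid escape
sequences are errors".

```c
#include <stddef.h>

/* Hypothetical helper: value of a hex digit, or -1 if c is not one. */
static int
hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Decode the text between the quotes of a $'...' string into out.
 * Returns the number of bytes produced, or -1 on an invalid escape
 * or if out is too small.  Illustration only, not the sh source.
 */
static int
decode_body(const char *in, char *out, size_t outsz)
{
	size_t n = 0;

	if (outsz == 0)
		return -1;
	while (*in != '\0') {
		unsigned int v;
		int d, i;

		if (*in != '\\') {
			v = (unsigned char)*in++;
		} else if (in[1] == 'x') {
			in += 2;
			/* assumption: \x needs at least one hex digit */
			if ((d = hexval((unsigned char)*in)) < 0)
				return -1;
			v = (unsigned int)d;
			in++;
			/* at most one more digit; later ones are ordinary text */
			if ((d = hexval((unsigned char)*in)) >= 0) {
				v = v * 16 + (unsigned int)d;
				in++;
			}
		} else if (in[1] >= '0' && in[1] <= '7') {
			in++;
			v = 0;
			for (i = 0; i < 3 && *in >= '0' && *in <= '7'; i++)
				v = v * 8 + (unsigned int)(*in++ - '0');
			v &= 0xFF;	/* keep only the low 8 bits */
		} else if (in[1] == '\\') {
			v = '\\';
			in += 2;
		} else {
			return -1;	/* escape not handled by this sketch */
		}
		if (v == 0)
			break;		/* nul: drop it and the rest of the string */
		if (n + 1 >= outsz)
			return -1;
		out[n++] = (char)v;
	}
	out[n] = '\0';
	return (int)n;
}
```

With this sketch, the body \x414 decodes to "A4", \456 decodes to the
single byte 0x2E ('.'), and bbb\0ccc decodes to "bbb".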

Differences from FreeBSD:
  FreeBSD allows only exactly 4 or 8 hex digits for \u and \U (as does C,
  but the current sh proposal differs.)  FreeBSD also continues consuming
  as many hex digits as exist after \x (permitted by the spec, but insane),
  and rejects \u0000 as invalid.  Some of this is possibly because
  their implementation is based upon an earlier proposal, perhaps note 590 -
  though that has been updated several times.

Differences from the current POSIX proposal:
  We currently always generate UTF-8 for the \u & \U escapes.   We should
  generate the equivalent character from the current locale's character set
  (and UTF-8 only if that is what the current locale uses.)
  If anyone would like to correct that, go ahead.
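
  For reference, unconditional UTF-8 generation for a \u or \U code point
  can be sketched as below.  This is the standard UTF-8 encoding, not the
  code from this commit; utf8_encode() is a hypothetical name, and which
  code points this sh actually treats as "known to be invalid" is an
  assumption here (surrogates and values above 0x10FFFF).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode code point cp as UTF-8 into out.
 * Returns the number of bytes written (1 to 4), or 0 if cp is a
 * surrogate or lies beyond U+10FFFF (assumed invalid in this sketch).
 */
static size_t
utf8_encode(uint32_t cp, unsigned char out[4])
{
	if (cp < 0x80) {
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return 0;		/* UTF-16 surrogate, not a character */
	if (cp < 0x10000) {
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF) {
		out[0] = (unsigned char)(0xF0 | (cp >> 18));
		out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (unsigned char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;			/* beyond the Unicode range */
}
```

  A locale-aware version would convert the code point to the locale's own
  character set instead of always emitting these bytes.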

  We (and FreeBSD) generate (X & 0x1F) for \cX escapes where we should generate
  the appropriate control character (SOH for \cA for example) with whatever
  value that has in the current character set.   Apart from EBCDIC, which
  we do not support, I've never seen a case where they differ, so ...
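
  The (X & 0x1F) mapping can be shown in one line.  ctrl_char() is a
  hypothetical name used only for illustration, and the values in the
  comment assume an ASCII-compatible character set.

```c
/*
 * Map the X of a \cX escape to a control character by keeping the
 * low five bits: 'A' (0x41) gives 0x01 (SOH), '[' (0x5B) gives
 * 0x1B (ESC).  Upper and lower case letters give the same result.
 */
static int
ctrl_char(int x)
{
	return x & 0x1F;
}
```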


To generate a diff of this commit:
cvs rdiff -u -r1.119 -r1.120 src/bin/sh/expand.c
cvs rdiff -u -r1.143 -r1.144 src/bin/sh/parser.c
cvs rdiff -u -r1.23 -r1.24 src/bin/sh/parser.h
cvs rdiff -u -r1.163 -r1.164 src/bin/sh/sh.1
cvs rdiff -u -r1.4 -r1.5 src/bin/sh/syntax.c
cvs rdiff -u -r1.8 -r1.9 src/bin/sh/syntax.h

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.



