tech-userlevel archive


Re: Alternative to hash-bang



Hello,

Thomas Klausner <wiz%NetBSD.org@localhost> wrote:
 |On Sat, Jul 19, 2014 at 03:36:48PM +0200, Steffen Nurpmeso wrote:
 |> How about supporting the more-and-more common Unicode
 |> Byte-Order-Mark for UTF-8 encoded shell scripts?
 |
 |As wikipedia says:
 |
 |The Unicode Standard permits the BOM in UTF-8,[2] but does not require
 |or recommend its use.[3]
 |
 |[2] "The Unicode Standard 5.0, Chapter 2:General Structure" \
 |(PDF). p. 36. Retrieved 2009-03-29. "Table 2-4. The Seven \
 |Unicode Encoding Schemes"
 |
 |[3] "The Unicode Standard 5.0, Chapter 2:General Structure" \
 |(PDF). p. 36. Retrieved 2008-11-30. "Use of a BOM is neither \
 |required nor recommended for UTF-8, but may be encountered \
 |in contexts where UTF-8 data is converted from other encoding \

Yes.  Using a BOM doesn't work with `$ cat f1 f2 > f3' etc., which is why
I personally don't use them -- but I'm in a privileged situation
regarding my text and other files.

 |forms that use a BOM or where the BOM is used as a UTF-8 signature"

So this last part is the one I am thinking about.  Many text
editors handle it correctly and/or make use of it right away, as do
some scripting languages, like, say, Thomas Klausner, perl(1):

  =item C<BOM>-marked scripts and UTF-16 scripts autodetected

  If a Perl script begins marked with the Unicode C<BOM> (UTF-16LE, UTF16-BE,
  or UTF-8), or if the script looks like non-C<BOM>-marked UTF-16 of either
  endianness, Perl will correctly read in the script as Unicode.
  (C<BOM>less UTF-8 cannot be effectively recognized or differentiated from
  ISO 8859-1 or other eight-bit encodings.)

And because of this last part, again, I finally come to the
conclusion that the UTF-8 BOM will become a lasting part of the
future, because it carries information about a file's encoding
along with the file, as part of the encoding itself.
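A rough sketch of what the proposal could mean for an interpreter-line check: skip one leading UTF-8 BOM, then look for `#!'.  This is a simplified, hypothetical parser (parse_hashbang is a made-up name), not NetBSD's exec code; it assumes everything after the interpreter path is a single optional argument.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def parse_hashbang(head: bytes):
    """Return (interpreter, argument_or_None), or None if no hash-bang.

    Hypothetical sketch: one leading UTF-8 BOM is skipped before "#!"
    is looked for, and only the first line of `head` is examined."""
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    if not head.startswith(b"#!"):
        return None
    line = head[2:].split(b"\n", 1)[0]
    # Interpreter path, then the remainder of the line as one argument.
    parts = line.strip().split(None, 1)
    if not parts:
        return None
    interpreter = parts[0].decode("utf-8")
    argument = parts[1].decode("utf-8") if len(parts) > 1 else None
    return interpreter, argument
```

With this, a BOM-marked b"\xef\xbb\xbf#!/bin/sh\n" and a plain b"#!/bin/sh\n" both yield ("/bin/sh", None), while a BOM followed by anything other than `#!' is still not treated as a script.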

The real question is: what should be done with BOMs in `$ cat f1
f2 > f3'?  They cannot simply be stripped off, can they?
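To make the problem concrete, a small sketch: plain byte concatenation leaves the second file's BOM in the middle of the output, where it decodes as U+FEFF ZERO WIDTH NO-BREAK SPACE.  bom_aware_cat is a hypothetical policy (keep only a leading BOM) shown purely for comparison; note that it alters the data, which cat(1) as a byte copier never does.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def naive_cat(*blobs: bytes) -> bytes:
    """What `cat f1 f2 > f3` does: plain byte concatenation."""
    return b"".join(blobs)


def bom_aware_cat(*blobs: bytes) -> bytes:
    """Hypothetical policy: a BOM may appear only at the very start.

    Every input after the first has its leading UTF-8 BOM removed, so no
    U+FEFF ends up in the middle of the result."""
    out = []
    for i, blob in enumerate(blobs):
        if i > 0 and blob.startswith(UTF8_BOM):
            blob = blob[len(UTF8_BOM):]
        out.append(blob)
    return b"".join(out)


f1 = UTF8_BOM + b"a\n"
f2 = UTF8_BOM + b"b\n"
# The "utf-8" codec keeps BOMs as U+FEFF, so the embedded one is visible.
print(repr(naive_cat(f1, f2).decode("utf-8")))      # '\ufeffa\n\ufeffb\n'
print(repr(bom_aware_cat(f1, f2).decode("utf-8")))  # '\ufeffa\nb\n'
```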

--steffen

