tech-userlevel archive


Re: A draft for a multibyte and multi-codepoint C string interface



> Non-NUL UTF8 sequences can contain bytes with value 0,

How?  As far as I can see, the only way to get a 0 octet into a
UTF-8-encoded string is to encode Unicode codepoint 0.  RFC 3629 seems
to think so too:

   o  Character numbers from U+0000 to U+007F (US-ASCII repertoire)
      correspond to octets 00 to 7F (7 bit US-ASCII values).  [...]
[page break]
   o  US-ASCII octet values do not appear otherwise in a UTF-8 encoded
      character stream.  [...]

and, as far as I can see, the encoding actually described does indeed
have those properties.
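
For what it's worth, here is a minimal sketch in C of a brute-force
check of those two properties.  The encoder follows the bit layout in
RFC 3629 section 3 as I read it; the names utf8_encode,
count_zero_octet_codepoints and count_stray_ascii_octets are made up
for this sketch and come from no existing library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode scalar value as UTF-8 using
 * the bit layout from RFC 3629 section 3.  Returns the number of octets
 * written (1..4), or 0 if cp is a surrogate or lies beyond U+10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp <= 0x7F) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)       /* surrogates are not encodable */
        return 0;
    if (cp <= 0xFFFF) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Count the codepoints whose encoding contains at least one 0 octet. */
static unsigned long count_zero_octet_codepoints(void)
{
    unsigned long hits = 0;
    unsigned char buf[4];

    for (uint32_t cp = 0; cp <= 0x10FFFF; cp++) {
        size_t n = utf8_encode(cp, buf);
        for (size_t i = 0; i < n; i++) {
            if (buf[i] == 0) {
                hits++;
                break;
            }
        }
    }
    return hits;
}

/* Count the codepoints whose multi-octet encoding contains an octet
 * below 0x80, i.e. a US-ASCII value appearing "otherwise". */
static unsigned long count_stray_ascii_octets(void)
{
    unsigned long bad = 0;
    unsigned char buf[4];

    for (uint32_t cp = 0; cp <= 0x10FFFF; cp++) {
        size_t n = utf8_encode(cp, buf);
        if (n < 2)
            continue;       /* unencodable, or a plain ASCII octet */
        for (size_t i = 0; i < n; i++) {
            if (buf[i] < 0x80) {
                bad++;
                break;
            }
        }
    }
    return bad;
}
```

With this encoder, count_zero_octet_codepoints() should return 1 (for
U+0000 alone) and count_stray_ascii_octets() should return 0.  That
only says something about the encoding as specified, of course, not
about whatever variant the quoted claim may have had in mind.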

What am I missing?

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mouse%rodents-montreal.org@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

