Re: A draft for a multibyte and multi-codepoint C string interface
On Tue, Apr 02, 2013 at 12:21:03PM -0400, Thor Lancelot Simon wrote:
> On Tue, Apr 02, 2013 at 06:08:01PM +0200, tlaronde%polynum.com@localhost
> > That UTF-8 is the answer, since this allows to use C "char" (at least an
> > octet, signed or unsigned) programs.
> Except it can't, really, quite be UTF-8 -- it has to be "Modified UTF-8",
> because C strings can't contain 0.
Yes... But how could such a string, with an embedded '\0', reach a
filesystem as a filename after going through programs generally
written in C? And one could always decide that such an oddity be
converted to the two-octet sequence "^@"...
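
To make the point about '\0' concrete, here is a minimal sketch in C.
The helper name mutf8_escape_nul() is hypothetical (it comes from no
library); it only illustrates the "Modified UTF-8" convention mentioned
above, where U+0000 is written as the overlong pair 0xC0 0x80 so that
the result remains an ordinary NUL-terminated C string. It assumes the
caller knows the input length, since strlen() cannot see past an
embedded '\0'.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.
 * Copy `len` bytes from `src` to `dst`, replacing every 0x00 byte with
 * the two-byte sequence 0xC0 0x80 ("Modified UTF-8" for U+0000), then
 * NUL-terminate `dst`.
 * Returns the number of bytes written (terminator excluded), or
 * (size_t)-1 if `dstsize` is too small.
 */
size_t
mutf8_escape_nul(char *dst, size_t dstsize, const char *src, size_t len)
{
	size_t i, o = 0;

	for (i = 0; i < len; i++) {
		size_t need = (src[i] == '\0') ? 2 : 1;

		/* Always keep one byte free for the terminator. */
		if (o + need >= dstsize)
			return (size_t)-1;

		if (src[i] == '\0') {
			dst[o++] = (char)0xC0;
			dst[o++] = (char)0x80;
		} else {
			dst[o++] = src[i];
		}
	}

	if (o >= dstsize)	/* only reachable when len == 0 and dstsize == 0 */
		return (size_t)-1;
	dst[o] = '\0';
	return o;
}
```

Given the three bytes 'a', '\0', 'b', strlen() on the input stops after
'a', while the escaped copy is four bytes long and survives any
interface that takes a plain C string. Note that a strict UTF-8 decoder
will reject 0xC0 0x80 as an overlong form, so this is a convention
between cooperating programs and not standard UTF-8.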
I think the best explanation of the advantages of UTF is simply the
"Hello World, or..." paper by Rob Pike and Ken Thompson. (For example,
being a byte encoding, it is byte-order independent; when one
considers heterogeneous networks, it is not the
Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C