tech-userlevel archive


Re: A draft for a multibyte and multi-codepoint C string interface



On Tue, Apr 02, 2013 at 12:21:03PM -0400, Thor Lancelot Simon wrote:
> On Tue, Apr 02, 2013 at 06:08:01PM +0200, tlaronde%polynum.com@localhost 
> wrote:
> > 
> > That UTF-8 is the answer, since it allows C programs to keep using
> > "char" (at least an octet, signed or unsigned).
> 
> Except it can't, really, quite be UTF-8 -- it has to be "Modified UTF-8",
> because C strings can't contain 0.
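Here is a minimal C sketch of the "Modified UTF-8" idea. It assumes the only change from standard UTF-8 is the one described above: U+0000 is written as the overlong pair 0xC0 0x80, so no 0 octet ever appears and the output remains a valid C string. The name mutf8_encode is hypothetical. The term is best known from Java (DataInput/DataOutput, JNI), whose variant also encodes characters above U+FFFF as surrogate pairs. This sketch keeps the standard four-octet form and does not reject surrogate code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode one code point as "Modified UTF-8"
 * into buf and return the number of octets written (1 to 4).
 * The only deviation from standard UTF-8 handled here: U+0000
 * becomes the overlong pair 0xC0 0x80, so the output never
 * contains a 0 octet and can live inside a C string.
 */
size_t mutf8_encode(uint32_t cp, unsigned char buf[4])
{
    assert(cp <= 0x10FFFF);

    if (cp == 0) {                      /* the "modified" case */
        buf[0] = 0xC0;
        buf[1] = 0x80;
        return 2;
    }
    if (cp < 0x80) {                    /* ASCII: one octet */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* 1110xxxx + 2 continuations */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 11110xxx + 3 continuations (standard form, not surrogates) */
    buf[0] = (unsigned char)(0xF0 | (cp >> 18));
    buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For example, U+0000 yields C0 80, U+00E9 yields C3 A9, and U+20AC yields E2 82 AC.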

Yes... But how could such a string, with an embedded '\0', reach a
filesystem as a filename, when it has to pass through programs that are
generally written in C? And one could always decide that such a string
be converted to the two-octet sequence "^@"...
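The "^@" convention can be sketched as follows. The sketch assumes the input arrives as a buffer with an explicit length, since a plain C string could not carry the embedded '\0' in the first place. nul_to_caret is a hypothetical name. The mapping is lossy: a literal "^@" already present in the input becomes indistinguishable from an escaped NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: copy len octets from in (which may contain
 * embedded 0 octets) into a newly allocated C string, replacing
 * each 0 octet with the two octets '^' '@' (caret notation for
 * NUL). Returns NULL on allocation failure; the caller frees.
 */
char *nul_to_caret(const char *in, size_t len)
{
    assert(in != NULL || len == 0);

    size_t nuls = 0;
    for (size_t i = 0; i < len; i++)
        if (in[i] == '\0')
            nuls++;

    /* Need len + nuls + 1 octets; refuse if that would overflow. */
    if (nuls >= SIZE_MAX - len)
        return NULL;

    char *out = malloc(len + nuls + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == '\0') {
            *p++ = '^';
            *p++ = '@';
        } else {
            *p++ = in[i];
        }
    }
    *p = '\0';
    return out;
}
```

For example, nul_to_caret("a\0b", 3) returns a newly allocated "a^@b", which the caller must free().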

I think the best explanation of the advantages of UTF is simply the
"Hello World, or..." paper by Rob Pike and Ken Thompson. (For example,
being a byte encoding, it is independent of byte order; on a
heterogeneous network, that is not the least of its advantages.)
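Here is a minimal UTF-8 decoder sketch that illustrates the byte-order point. utf8_decode is a hypothetical name, not an existing library function. It reads octets strictly in stream order, so the same input decodes identically on big- and little-endian hosts, with no byte swapping and no byte-order mark. UTF-16 or UTF-32 exchanged over a network, by contrast, needs an agreed byte order or a BOM. A strict decoder like this one also rejects the overlong pair 0xC0 0x80, which is why "Modified UTF-8" is not quite UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode one code point from the n octets at s.
 * On success, store it in *cp and return the number of octets consumed.
 * Return 0 for a malformed, overlong, surrogate, out-of-range or
 * truncated sequence. Octets are consumed one at a time in stream order,
 * so no host byte-order assumption is involved.
 */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    assert(s != NULL && cp != NULL);
    if (n == 0)
        return 0;

    unsigned char c = s[0];
    size_t len;
    uint32_t v, min;

    if (c < 0x80) {                     /* ASCII */
        *cp = c;
        return 1;
    } else if ((c & 0xE0) == 0xC0) {
        len = 2; v = c & 0x1F; min = 0x80;
    } else if ((c & 0xF0) == 0xE0) {
        len = 3; v = c & 0x0F; min = 0x800;
    } else if ((c & 0xF8) == 0xF0) {
        len = 4; v = c & 0x07; min = 0x10000;
    } else {
        return 0;                       /* stray continuation or bad lead */
    }

    if (n < len)
        return 0;                       /* truncated sequence */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                   /* not a continuation octet */
        v = (v << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms (e.g. C0 80), surrogates, > U+10FFFF. */
    if (v < min || v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF))
        return 0;

    *cp = v;
    return len;
}
```

For example, the octets E2 82 AC decode to U+20AC (3 octets consumed) on any host, while C0 80 is rejected with 0.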
-- 
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C

