tech-userlevel archive


Re: A draft for a multibyte and multi-codepoint C string interface



tlaronde%polynum.com@localhost wrote:
  |On Mon, Apr 15, 2013 at 05:51:33PM -0400, James K. Lowden wrote:
  |> I'm interested in usability.
  |
  |To drop codepages, and to mandate that, on the user level, UTF-8
  |is the rule (bye-bye localization and so on) allows this. But this
  |has been done for Plan9 and this should be considered a reference.
  |Specifically, what has been done, and what has not been done.
 
But Unicode collation is somewhat locale-specific, in permanent
transition and very complicated.  Plan9 simply doesn't know about
locales at all (?) and doesn't give a s..t about any standards.
I think they're right: who can be creative and invent new things
with something like POSIX on their back?

  |What would be a great step, is the ability for the text utilities
  |to deal with UTF-8. But this is in user space, because it is where
  |it belongs.

I totally agree -- a round-trip through wchar_t for file I/O,
just to get access to proper character classification, is a
terrible thing to do; unless you store the files in UTF-32
(or, 0xDEADBEEF!, UTF-16), which is just as terrible.
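
For illustration, a minimal sketch of the round-trip in question,
using only standard C calls (mbrtowc, iswalpha).  The function name
count_alpha_via_wchar is hypothetical, and the result depends on the
LC_CTYPE locale in effect:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: count alphabetic characters in a multibyte
 * string by converting every character to wchar_t first, only so
 * that iswalpha() can classify it.  Interpretation of the bytes
 * depends on the current LC_CTYPE locale.
 * Returns (size_t)-1 on an invalid or incomplete sequence. */
size_t count_alpha_via_wchar(const char *s)
{
    mbstate_t st;
    size_t left, count = 0;

    assert(s != NULL);
    memset(&st, 0, sizeof st);      /* initial conversion state */
    left = strlen(s);

    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, left, &st);

        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;      /* invalid or truncated input */
        if (n == 0)
            break;                  /* NUL; not reachable given strlen */
        if (iswalpha((wint_t)wc))
            count++;
        s += n;
        left -= n;
    }
    return count;
}
```

In the default "C" locale only ASCII input is guaranteed to convert;
for UTF-8 input the caller would first have to call
setlocale(LC_CTYPE, "") under a UTF-8 locale, which is exactly the
locale dependency complained about above.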

  |Furthermore, Unicode is kind of a mess (for example, instead of

It's a pretty closed community, with a lot of noise from political
and economic interests.  Errors happened (they admit that), but
the promise of stability etc. is also a reason for some rough
edges.  But, you know -- I'm *far* from being an expert on all
the linguistic problems etc. that these linguists had to deal
with.  They surely would do things a bit differently if they could
start anew from scratch…

  |Unicode is not something fixed, perfect, set once and forever. It

Well, certainly not forever, but certainly for the next decades.
Where, except for ISO 10646 (which is almost brought into line
with Unicode), is the recognized intellectual power to deal with
the world's languages?  And who is willing to spend the money for
something new?  And why?  So: certainly for the next decades.

  |that could be adapted to whatever encoding appears later. Keep the
  |OS clean with nul terminated octet strings. Enhance utilities to
  |deal with UTF-8 (and not runes for exchange; runes are only
  |internal).
  |And keep it agnostic about the meaning of the glyphs.

I agree.
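
A minimal sketch of that split, assuming UTF-8 input: the exchange
format stays a NUL-terminated octet string, and runes (here plain
uint32_t code points) only exist inside the utility.  The names
utf8_decode and utf8_count are hypothetical, not an existing API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one code point from the NUL-terminated
 * octet string s into *rune.  Returns the number of octets consumed
 * (1..4), or 0 if s starts with an invalid, overlong or truncated
 * sequence.  A NUL octet decodes as rune 0 with length 1. */
size_t utf8_decode(const char *s, uint32_t *rune)
{
    const unsigned char *p = (const unsigned char *)s;
    uint32_t r, min;
    size_t len, i;

    assert(s != NULL && rune != NULL);

    if (p[0] < 0x80) {
        *rune = p[0];
        return 1;
    } else if ((p[0] & 0xE0) == 0xC0) {
        len = 2; r = p[0] & 0x1F; min = 0x80;
    } else if ((p[0] & 0xF0) == 0xE0) {
        len = 3; r = p[0] & 0x0F; min = 0x800;
    } else if ((p[0] & 0xF8) == 0xF0) {
        len = 4; r = p[0] & 0x07; min = 0x10000;
    } else {
        return 0;                   /* stray continuation or bad lead */
    }

    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80)  /* also stops at the terminating NUL */
            return 0;
        r = (r << 6) | (p[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates and values beyond U+10FFFF. */
    if (r < min || r > 0x10FFFF || (r >= 0xD800 && r <= 0xDFFF))
        return 0;

    *rune = r;
    return len;
}

/* Count the code points in s; (size_t)-1 if s is not valid UTF-8. */
size_t utf8_count(const char *s)
{
    size_t n = 0;

    while (*s != '\0') {
        uint32_t r;
        size_t len = utf8_decode(s, &r);

        if (len == 0)
            return (size_t)-1;
        s += len;
        n++;
    }
    return n;
}
```

Nothing here assigns a meaning to the decoded values; the sketch
only finds code point boundaries and leaves classification,
collation and rendering to whoever needs them.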

--steffen


