Re: wide characters and i18n

To: Erik Fair <fair%netbsd.org@localhost>
Subject: Re: wide characters and i18n
From: David Holland <dholland-tech%netbsd.org@localhost>
Date: Thu, 15 Jul 2010 20:42:43 +0000

On Wed, Jul 14, 2010 at 07:38:42PM -0700, Erik Fair wrote:
 > Theoretically, the POSIX locale stuff is supposed to handle things
 > beyond that, but it's a more complicated and subtle problem than those
 > POSIX committees really thought about.

Indeed.

 > I commend this well written paper to your attention:
 > 
 > http://plan9.bell-labs.com/sys/doc/utf.html
 > 
 > which discusses what the Plan 9 people (Rob Pike, Ken Thompson,
 > et al.) did about the software problem (and what they did about
 > it), and explicitly what they decided to punt on. A precis: "we
 > replaced the ASCII assumption with Unicode/UTF-8 because UTF-8 is a
 > proper superset of ASCII (i.e. backward compatible) and also
 > subsumes pretty much all other interesting character sets (with
 > some warts) so we can translate into it without (much) semantic
 > information loss."
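The two properties in that precis can be checked directly. Below is a minimal Python sketch; it is not from the paper or the thread. The function names are hypothetical, and Latin-1 is just an arbitrary stand-in for "some other character set":

```python
# Sketch (not from the Plan 9 paper): demonstrate the two claims above.
# 1) ASCII text has byte-identical ASCII and UTF-8 encodings.
# 2) Text in another charset (Latin-1, chosen only as an example) can be
#    transcoded into UTF-8 and back without losing anything.

def ascii_unchanged(text: str) -> bool:
    """Hypothetical helper: True if ASCII and UTF-8 encodings are identical."""
    return text.encode("ascii") == text.encode("utf-8")

def roundtrips_via_utf8(raw: bytes, charset: str) -> bool:
    """Hypothetical helper: decode from `charset`, go through UTF-8, compare."""
    text = raw.decode(charset)
    utf8 = text.encode("utf-8")
    return utf8.decode("utf-8").encode(charset) == raw

print(ascii_unchanged("hello, world"))             # True
print(roundtrips_via_utf8(b"caf\xe9", "latin-1"))  # True
print("caf\xe9".encode("utf-8"))                   # b'caf\xc3\xa9'
```

Note that the UTF-8 form of the Latin-1 text differs from the original bytes (one octet becomes two). That is the "translate into it" step, as opposed to the pure-ASCII case where nothing changes.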

The problem with UTF-8 in Unix is that it doesn't actually solve the
labeling problem: given comprehensive adoption, you no longer really
need to know what kind of text any given file or string is, but you
still need to know if the file contains text (UTF-8 encoded symbols)
or binary (octets), because not all octet sequences are valid UTF-8.
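To make the "not all octet sequences are valid UTF-8" point concrete, here is a minimal Python sketch using the standard codec's strict decoding as the validity check. The helper name and the sample inputs are illustrative assumptions, not anything proposed in this thread:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if `data` is well-formed UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True

samples = {
    "ascii text": b"plain text\n",
    "utf-8 text": "na\u00efve".encode("utf-8"),
    "lone continuation byte": b"\x80",
    "truncated 2-byte sequence": b"\xc3",
    "overlong encoding of '/'": b"\xc0\xaf",
    "latin-1 byte for e-acute": b"caf\xe9",
}

# Only the first two are accepted; the rest are octet strings that a
# program cannot safely treat as UTF-8 text.
for label, data in samples.items():
    print(f"{label:28} {is_valid_utf8(data)}")
```

A check like this can reject data that is definitely not UTF-8. It cannot prove that data which happens to validate was meant as text rather than binary, so it does not replace labeling.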

I don't see a viable way forward that doesn't involve labeling
everything.

-- 
David A. Holland
dholland%netbsd.org@localhost
