tech-userlevel archive


Re: wide characters and i18n

At 16 Jul 2010 18:33:42 +1000, Giles Lean wrote:
Subject: Re: wide characters and i18n
> 2. the idea that the use of Unicode is sufficient excuse to
>    provide any of the functionality of locales

This is why the problem is usually broken down into two separate parts,
which ideally overlap only occasionally!  :-)

Internationalisation (I18N) and Localisation (L10N)

The first time I learned that these two are better kept separate than
lumped together, I learned a whole lot of new things: many light bulbs
came on bright and bells rang clear for me.

There's also multilingualisation (M17N), which in a sense is a better
term than localisation, since it implies performing localisation for
every target locale at once.  That term only seems to get used in some
domains though, so perhaps it's best to stick to L10N.

Indeed Plan 9 did not address localisation at all (and sadly the paper
doesn't use that more formal term either) -- it was, after all,
initially built in America for Americans, by Americans.  ;-)  The paper
actually states in many places that Plan 9 (at the time) did not even
begin to address the issue of localisation.

One might say they even punted on I18N, but as others have pointed out,
the paper already mentions these caveats.

As the paper concludes, it "at least [has] the capacity to be

BTW, I think Plan 9's insistence that everything "textual" inside the
system always be Unicode encoded as UTF-8 is one of its key features.
That means _everything_ coming into the system has to be converted
before any application can make use of it, or indeed before it has any
meaning whatsoever.  This solves some of the niggles you worried about.

The combination of Plan 9's universal use of Unicode in UTF-8, and its
policy of requiring everything to be converted to Unicode in UTF-8
either on input, import, or at least before it can be used, makes for
the firm foundations of a system upon which one can _begin_ the next
task of localisation.
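
To make the "convert on input" policy concrete, here is a minimal sketch
in Python (not anything from Plan 9 itself).  The function name
import_text and the sample bytes are made up for illustration, and it
assumes the caller already knows the encoding of the incoming data,
which is the hard part in practice.

```python
def import_text(raw: bytes, source_encoding: str) -> bytes:
    """Convert externally supplied bytes to UTF-8 at the system boundary.

    Hypothetical helper: the caller must already know source_encoding.
    """
    # Decoding raises UnicodeDecodeError if the bytes are not valid in
    # the claimed source encoding, so bad input is rejected at the door.
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


# Legacy ISO 8859-1 (Latin-1) input: 0xEF is i-diaeresis, 0xE9 is e-acute.
latin1_bytes = b"na\xefve caf\xe9"
utf8_bytes = import_text(latin1_bytes, "latin-1")
```

Everything past this one function can then assume UTF-8 and never needs
to ask what character set a given string is in.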

This is where IEEE POSIX / UNIX(tm) _should_ go, IMNSHO.  Get rid of all
the old non-UTF-8 crap for different character sets.  Ideally get rid of
ANSI/ISO "wide char" crap too -- for the reasons given in the Plan 9
paper (though maybe choose 32 bits for Runes?).  Then, and only then,
begin thinking about how to do locales better.
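
As a rough illustration of why the Rune width matters, the following
Python sketch decodes UTF-8 into integer code points and checks which
of them fit in a given number of bits.  The helper names utf8_to_runes
and fits are hypothetical, and the sample string is chosen only because
its last character lies outside the 16-bit range.

```python
def utf8_to_runes(data: bytes) -> list:
    """Decode UTF-8 bytes into a list of integer code points ("runes")."""
    return [ord(ch) for ch in data.decode("utf-8")]


def fits(rune: int, bits: int) -> bool:
    """Report whether a code point fits in an unsigned integer of `bits` bits."""
    return 0 <= rune < (1 << bits)


# One character each of 1, 2, 3 and 4 UTF-8 bytes:
# U+0041, U+00E9, U+4E16, U+10400.
sample = "A\u00e9\u4e16\U00010400".encode("utf-8")
runes = utf8_to_runes(sample)

# Code points that would not fit in a 16-bit rune.
too_wide_for_16 = [hex(r) for r in runes if not fits(r, 16)]
```

Every Unicode code point fits in 32 bits, so a 32-bit Rune keeps the
"one Rune, one character code" property without surrogate pairs.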

(Yes, I know where to find Plan 9 and how to run it!  :-))

(BTW, it would be good to have a recording or transcript of Pike's talk
from when he presented the "Hello World" paper at Usenix '93.  It really
helped set the context and, I think, gave more advice than the paper
alone, though the paper stands up well on its own, and indeed tries to
teach us many lessons which we still have not come close to learning.)

> Which still leaves open the problem of locales and issues of
> multi-lingual documents and applications where a single
> Unicode glyph really should be represented differently
> depending upon what language it is being used for, but I did
> say at the start of this too-lengthy message that the issues
> get ugly.


                                                Greg A. Woods
                                                Planix, Inc.

                                                +1 250 762-7675

