Re: wide characters and i18n

To: tech-userlevel%NetBSD.org@localhost
Subject: Re: wide characters and i18n
From: Alan Barrett <apb%cequrux.com@localhost>
Date: Fri, 16 Jul 2010 15:51:17 +0200

On Fri, 16 Jul 2010, Ken Hornstein wrote:
> But every time I try to read something written
> by someone who understands what is going on, I get lost, and I have never
> really seen anyone explain the answers to some basic questions:

The "Terminology" section of the Wikipedia article on "Character encoding"
is not great, but it may help.
<http://en.wikipedia.org/wiki/Character_encoding#Terminology>.

> - How, exactly, are UTF-8 and Unicode related? 

Unicode is a lot of things, but for the purposes of contrasting Unicode
with UTF-8, think of Unicode as a mapping from integers in the range
0 to 0x10FFFF (which fit in 21 bits) to characters; UTF-8 is then a set
of rules for representing each of those integers as a sequence of one
to four 8-bit bytes, or octets.
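To make the split concrete, here is a minimal Python sketch that turns one of those integers into UTF-8 octets, using the bit layout from RFC 3629. It then checks the result against Python's built-in codec. The function name `utf8_encode` is hypothetical and is written only for illustration.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 bytes."""
    # Valid code points run from 0 to 0x10FFFF; the surrogate range
    # 0xD800-0xDFFF is reserved and may not be encoded (RFC 3629).
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not encodable as UTF-8: {cp:#x}")
    if cp < 0x80:        # up to 7 bits:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Compare against Python's built-in UTF-8 codec.
for ch in "A\u00e9\u20ac\U0001F600":
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')}")
```

The loop prints one line per character, ending with `U+1F600 -> f0 9f 98 80`: four octets for a code point that needs the full 21 bits. The same integer always maps to the same character; only the byte representation depends on the encoding.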

> - What exactly is a "code point"?

A code point is an integer that maps to a character in a coded
character set.  For example, the code point for the letter "A" in the
ASCII coded character set is 65 (0x41).  Every character in the ASCII
repertoire has the same code point in ASCII as in Unicode (modulo
quibbles about <hyphen> versus <minus sign> versus <hyphen-minus>, and
<apostrophe> versus <left single quote>).
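In Python, `ord()` and `chr()` convert between a character and its Unicode code point. That makes both claims above easy to check. The following is a small sketch, assuming Python 3 and its standard `unicodedata` module. The names `ascii_text`, `QUIBBLES` and `code_points` are illustrative only.

```python
import unicodedata

# A code point is just an integer; ord() and chr() map between them.
assert ord("A") == 65 == 0x41
assert chr(0x41) == "A"

# Every byte value 0..127 decoded as ASCII yields the Unicode character
# with the same number, so the two coded character sets agree there.
ascii_text = bytes(range(128)).decode("ascii")
assert [ord(ch) for ch in ascii_text] == list(range(128))

# Only HYPHEN-MINUS and APOSTROPHE exist in ASCII (at the same code
# points as in Unicode); Unicode adds separate characters for the others.
QUIBBLES = ["HYPHEN-MINUS", "HYPHEN", "MINUS SIGN",
            "APOSTROPHE", "LEFT SINGLE QUOTATION MARK"]
code_points = {name: ord(unicodedata.lookup(name)) for name in QUIBBLES}
for name, cp in code_points.items():
    print(f"U+{cp:04X}  {name}")
```

This prints `U+002D`, `U+2010`, `U+2212`, `U+0027` and `U+2018` for the five names, which shows where the quibbles come from.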

> - What, exactly, do people mean by "normalization" in this context?

Do you represent <capital letter FOO with accent BAR> as a single
character, or as the two-character sequence <capital letter
FOO><combining accent BAR>?  What about <capital letter FOO with
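accent BAR and accent BAZ>?  Is <ligature "ffi"> equivalent to <letter
"f"><letter "f"><letter "i">?  There are various types of normalisation
rules giving different answers to these and other questions.  Unicode
itself defines four normalisation forms: NFC and NFD apply only
canonical equivalence (composed versus decomposed accents), while NFKC
and NFKD also apply compatibility mappings, such as treating the "ffi"
ligature as three separate letters.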
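Python's standard `unicodedata.normalize()` implements the four Unicode normalisation forms (NFC, NFD, NFKC, NFKD), so the questions above can be tried directly. The sketch below is minimal. The helper `equivalent` is hypothetical and is written only for this illustration.

```python
import unicodedata


def equivalent(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalising both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


single = "\u00C9"          # LATIN CAPITAL LETTER E WITH ACUTE
pair = "E\u0301"           # E followed by COMBINING ACUTE ACCENT
assert single != pair                      # different code point sequences
assert equivalent(single, pair, "NFC")     # NFC composes the pair
assert unicodedata.normalize("NFD", single) == pair  # NFD splits it

# Two accents: U+1EA4 (A WITH CIRCUMFLEX AND ACUTE) decomposes to
# A + COMBINING CIRCUMFLEX ACCENT + COMBINING ACUTE ACCENT.
assert unicodedata.normalize("NFD", "\u1EA4") == "A\u0302\u0301"

ligature = "\uFB03"        # LATIN SMALL LIGATURE FFI
assert not equivalent(ligature, "ffi", "NFC")   # canonical forms keep it
assert equivalent(ligature, "ffi", "NFKC")      # compatibility forms expand it
```

Which form is "right" depends on the application. Comparing file names or identifiers usually wants a canonical form, while search often uses a compatibility form so that the ligature matches plain "ffi".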

--apb (Alan Barrett)

