tech-userlevel archive


Re: Unicode programming



On Wed, Oct 05, 2011 at 07:54:47PM -0400, Ken Hornstein wrote:
> >>   I'm also wondering what people do about things
> >>   like finding out how many columns a particular series of Unicode
> >>   codepoints occupies
> >
> >This is very much nontrivial. There are a certain number of codepoints
> >which have an ambiguous number of columns. You might also run into
> >situations where the renderer might not be able to display combining
> >diacritics in the expected way.
> 
> Is this true for stuff inside of the BMP?

Yeah, ambiguous-width codepoints exist within the BMP. They mostly matter
in CJK/East Asian contexts, where the same codepoint may be drawn narrow
or wide; see http://unicode.org/reports/tr11/#Ambiguous for some info.
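
To make the column question concrete, here is a rough sketch in Python
using the standard unicodedata module. The function name display_width
and the ambiguous_wide flag are made up for illustration. It assumes
combining marks take zero columns and that Wide/Fullwidth codepoints take
two, and it does not handle control or zero-width format characters, so
treat it as an approximation of what a terminal might do, not a definitive
answer.

```python
import unicodedata

def display_width(text, ambiguous_wide=False):
    """Estimate terminal columns for text (hypothetical helper).

    ambiguous_wide selects how East Asian Ambiguous codepoints are
    counted: 2 columns if True (CJK-style context), otherwise 1.
    """
    width = 0
    for ch in text:
        # Assumption: combining marks attach to the previous character
        # and occupy no column of their own.
        if unicodedata.combining(ch):
            continue
        eaw = unicodedata.east_asian_width(ch)
        if eaw in ("W", "F"):      # Wide or Fullwidth
            width += 2
        elif eaw == "A":           # Ambiguous: depends on context
            width += 2 if ambiguous_wide else 1
        else:                      # Narrow, Halfwidth, Neutral
            width += 1
    return width
```

For example, display_width("\u03b1") gives 1 by default and 2 with
ambiguous_wide=True, which is exactly the ambiguity TR11 describes.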

As far as surrogates go: yeah, surrogate pairs are only used in UTF-16;
they're one of the primary differences between UTF-16 and UCS-2. One of
the _other_ bugaboos with UTF-16 is that you need to keep track of the
byte order and/or insert a BOM to disambiguate what kind of stream you're
generating.
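
A small Python sketch of both points, for illustration only. The helper
name utf16_units is made up; the codec names and codecs.BOM_UTF16_LE are
from the standard library. It shows a codepoint outside the BMP being
split into a surrogate pair, and the same pair serialized in both byte
orders.

```python
import codecs

def utf16_units(ch):
    """Return the UTF-16 code units for one codepoint (hypothetical helper)."""
    cp = ord(ch)
    if cp < 0x10000:
        return [cp]                    # BMP: one unit, same as UCS-2
    cp -= 0x10000
    # High surrogate carries the top 10 bits, low surrogate the bottom 10.
    return [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]

clef = "\U0001D11E"                    # MUSICAL SYMBOL G CLEF, outside the BMP
units = utf16_units(clef)              # [0xD834, 0xDD1E]

be = clef.encode("utf-16-be")          # b'\xd8\x34\xdd\x1e', no BOM written
le = clef.encode("utf-16-le")          # b'\x34\xd8\x1e\xdd', no BOM written

# Prefixing a BOM lets a reader using the generic "utf-16" codec
# work out the byte order on its own.
with_bom = codecs.BOM_UTF16_LE + le
decoded = with_bom.decode("utf-16")
```

Without the BOM, the reader has to be told out of band whether the stream
is big- or little-endian.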


