tech-userlevel archive


Re: Unicode programming



On Wed, 05 Oct 2011 15:51:52 -0400
Ken Hornstein <kenh%pobox.com@localhost> wrote:

> - Assuming the above is correct ... what do programmers do in terms of
>   parsing things like UTF-8 into Unicode codepoints, since you don't
>   necessarily know that mbrtowc() will give you a Unicode codepoint on
>   some (looks like many) systems.  I guess iconv() looks like something
>   that handles a lot of encodings, and it seems to be available in lots of places;
>   I'm also aware of icu.  I'm also wondering what people do about things
>   like finding out how many columns a particular series of Unicode codepoints
>   occupies; I know about things like wcswidth(), but again you're not
>   guaranteed that wide characters are Unicode codepoints.

When doing it in C, I used a custom library
(http://cvs.pulsar-zone.net/cgi-bin/cvsweb.cgi/mmondor/mmsoftware/mmlib/utf8.c?rev=1.2;content-type=text%2Fplain
and
http://cvs.pulsar-zone.net/cgi-bin/cvsweb.cgi/mmondor/mmsoftware/mmlib/utf8.h?rev=1.1;content-type=text%2Fplain),
but I haven't used it in some time. More recently I've used a higher-level
language that supports Unicode and already includes the conversion
facilities (along with more advanced Unicode features than just
encoding and decoding). I did use iconv from the shell when I needed it,
and I remember using it from PHP (though I'm not sure whether that was
PHP's own implementation or libc's)...
-- 
Matt

