tech-userlevel archive


Re: wide characters and i18n



On Fri, 16 Jul 2010 16:50:12 +0100
Sad Clouds <cryintothebluesky%googlemail.com@localhost> wrote:

> 2. The interfaces for the C library multi-byte to wide, and wide to
> multi-byte conversion functions are so badly designed, it's not even
> funny. The biggest problem with those functions is that they expect
> NUL-terminated strings. If you have a partial (not NUL-terminated)
> string in the buffer, you can't call a string conversion function on it,
> because it won't stop until it finds a NUL and you end up with a buffer
> overrun. You cannot "artificially" NUL-terminate the string, because
> after reading the NUL character, the function will reset the mbstate_t
> object to the initial state. This will mess up the next sequence of
> multi-byte characters if the encoding has state.
> 
> I spent two days jumping through hoops, trying to figure out how to
> convert partial strings. I think I nailed it in the end with a 30%
> performance penalty, but it is still 3.5 times faster than iconv().
> 
> If anyone is interested, I can post the code for the wrapper
> functions...
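
For readers following the thread, here is one possible way around the
problem described above. It is only a sketch and not the wrapper code
offered in the quoted message: it walks the buffer one character at a
time with the standard mbrtowc(), which takes an explicit byte count and
so never reads past the end of a partial buffer. The helper name
chunk_mbstowcs() and its parameters are hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper (name and interface are assumptions, not from the
 * thread).  Converts at most srclen bytes of a multibyte buffer that need
 * not be NUL-terminated.  Returns the number of wide characters stored in
 * dst, or (size_t)-1 on an invalid sequence.  *consumed receives the number
 * of input bytes processed.  The caller zero-initializes *ps once and then
 * reuses it for every chunk of the same stream.
 */
size_t
chunk_mbstowcs(wchar_t *dst, size_t dstlen, const char *src, size_t srclen,
    mbstate_t *ps, size_t *consumed)
{
	size_t in = 0, out = 0;

	while (in < srclen && out < dstlen) {
		size_t n = mbrtowc(&dst[out], src + in, srclen - in, ps);

		if (n == (size_t)-1) {
			/* Invalid sequence; mbrtowc() has set errno to EILSEQ. */
			*consumed = in;
			return (size_t)-1;
		}
		if (n == (size_t)-2) {
			/*
			 * The chunk ends inside a character.  The trailing
			 * bytes have been absorbed into *ps, so the next call
			 * simply continues with the following chunk.
			 */
			in = srclen;
			break;
		}
		if (n == 0) {
			/*
			 * Embedded NUL: L'\0' was stored and *ps was reset.
			 * Assumption: the NUL occupies one byte, which holds
			 * for common encodings but is not reported by mbrtowc().
			 */
			n = 1;
		}
		in += n;
		out++;
	}
	*consumed = in;
	return out;
}
```

The same mbstate_t object is passed for every chunk, so a character split
across two reads is completed by the next call instead of being lost.
POSIX.1-2008 also provides mbsnrtowcs(), which accepts a source length,
but it still stops at a NUL byte like the other string functions.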

In case it is useful, I also wrote an implementation of UTF-8 <->
UTF-32 conversion and released it under a BSD-like license:

http://cvs.pulsar-zone.net/cgi-bin/cvsweb.cgi/~checkout~/mmondor/mmsoftware/mmlib/utf8.c?rev=1.2;content-type=text%2Fplain
http://cvs.pulsar-zone.net/cgi-bin/cvsweb.cgi/~checkout~/mmondor/mmsoftware/mmlib/utf8.h?rev=1.1;content-type=text%2Fplain

However, I have no benchmarks comparing it against other implementations.
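
To give an idea of what the UTF-8 to UTF-32 direction involves, here is a
minimal sketch. It is written for illustration only and is not the code
at the links above; the function name utf8_decode() and its return
convention are assumptions. It takes an explicit length, so it works on
partial buffers without any NUL terminator.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative sketch (hypothetical interface).  Decodes one code point
 * from the first len bytes of s.  Returns the number of bytes consumed
 * (1 to 4), 0 if the buffer ends in the middle of a sequence, or -1 if
 * the sequence is invalid.
 */
int
utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
	/* Smallest code point allowed for each sequence length. */
	static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
	uint32_t c;
	int need, i;

	if (len == 0)
		return 0;
	if (s[0] < 0x80) {
		*cp = s[0];			/* plain ASCII */
		return 1;
	} else if ((s[0] & 0xE0) == 0xC0) {
		need = 2; c = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		need = 3; c = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {
		need = 4; c = s[0] & 0x07;
	} else {
		return -1;			/* stray continuation or bad lead byte */
	}

	for (i = 1; i < need; i++) {
		if ((size_t)i >= len)
			return 0;		/* truncated: wait for more input */
		if ((s[i] & 0xC0) != 0x80)
			return -1;		/* not a continuation byte */
		c = (c << 6) | (s[i] & 0x3F);
	}

	/* Reject overlong forms, surrogates and values above U+10FFFF. */
	if (c < min_cp[need] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
		return -1;
	*cp = c;
	return need;
}
```

A return value of 0 lets the caller keep the unconsumed tail and retry
once more bytes arrive, so no conversion state has to be stored between
calls, because UTF-8 is a stateless encoding.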
-- 
Matt

