Re: wide characters and i18n

To: tech-userlevel%netbsd.org@localhost
Subject: Re: wide characters and i18n
From: Sad Clouds <cryintothebluesky%googlemail.com@localhost>
Date: Tue, 13 Jul 2010 12:48:06 +0100

OK, I wrote a few wrapper functions around the C library functions
mbsrtowcs(), mbrtowc(), wcsrtombs() and wcrtomb(). These let me convert
segments of non-null-terminated strings from multi-byte to wide
strings in the current locale, and vice versa.

The wrapper functions use the string conversion functions while the
strings are long enough not to cause a buffer overrun, and then fall
back to the character conversion functions to convert the remaining
data. The extra steps are needed because the C library string functions
expect source strings to be null-terminated, which may not be the case
if you have a string fragment in the buffer.
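
For illustration, here is a minimal sketch of the per-character fallback
only, not the actual wrapper code. It relies on mbrtowc() taking an
explicit byte limit, so it never reads past the end of the fragment.
The name frag_mbtowc() and its parameters are made up for this example,
and treating a converted null character as one source byte is an
assumption that holds for stateless encodings such as UTF-8.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert at most srclen bytes of a multi-byte
 * fragment (not necessarily null-terminated) into at most dstlen wide
 * characters, using the current locale.
 *
 * Returns the number of wide characters written, or (size_t)-1 on an
 * invalid sequence. *consumed receives the number of source bytes used.
 * The caller keeps *ps between calls so that a character split across
 * two fragments can be completed by the next call.
 */
static size_t
frag_mbtowc(wchar_t *dst, size_t dstlen, const char *src, size_t srclen,
    size_t *consumed, mbstate_t *ps)
{
	size_t in = 0, out = 0;

	while (in < srclen && out < dstlen) {
		/* Third argument bounds the read: no terminator required. */
		size_t n = mbrtowc(&dst[out], src + in, srclen - in, ps);

		if (n == (size_t)-1) {		/* invalid sequence */
			*consumed = in;
			return (size_t)-1;
		}
		if (n == (size_t)-2) {
			/* Incomplete character: the tail is now held in *ps. */
			in = srclen;
			break;
		}
		if (n == 0)
			n = 1;	/* null character; assumed to be one byte */
		in += n;
		out++;
	}
	*consumed = in;
	return out;
}
```

A caller would zero an mbstate_t with memset() before the first
fragment and pass the same object for every later fragment of that
stream.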

I did some quick benchmarks:

My wrapper functions are about 30% slower than simple function calls to
mbsrtowcs()/wcsrtombs().


Time for converting 1032 bytes of UTF-8 strings (mixed 1 and 2-byte
multi-byte characters) to UTF-32 strings with iconv() in a loop 100000
times: 19.04 seconds

Time for converting 1032 bytes of UTF-8 strings (mixed 1 and 2-byte
multi-byte characters) to wchar_t strings with my wrapper functions in a
loop 100000 times: 5.42 seconds

Using iconv() is about 3.5 times slower, which is a bit surprising.
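
For anyone who wants to repeat the comparison, a timing loop along
these lines should do. This is only a sketch: it measures CPU time with
clock(), which may not be how the figures above were taken, and the
names bench_fn, time_loop() and count_call() are made up for this
example.

```c
#include <time.h>

/* Hypothetical callback type: one conversion pass over the test buffer. */
typedef void (*bench_fn)(void *arg);

/*
 * Call fn(arg) `iterations` times and return the CPU time used, in
 * seconds. clock() measures processor time, not wall-clock time.
 */
static double
time_loop(bench_fn fn, void *arg, long iterations)
{
	clock_t start = clock();

	for (long i = 0; i < iterations; i++)
		fn(arg);

	return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Trivial stand-in workload: counts how many times it was called. */
static void
count_call(void *arg)
{
	(*(long *)arg)++;
}
```

To compare two converters, wrap each one in a function with the
bench_fn signature and pass both through time_loop() with the same
iteration count, for example 100000 as above.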
