tech-userlevel archive


Re: wide characters and i18n



OK, I wrote a few wrapper functions around the C library
mbsrtowcs(), mbrtowc(), wcsrtombs() and wcrtomb() functions. This
allows me to convert segments of non-null-terminated strings from
multibyte to wide strings in the current locale and vice versa.

The wrapper functions use the string conversion functions as long as the
remaining input is long enough that they cannot overrun the buffer, and
then fall back to the character conversion functions to convert the
remaining data. The extra steps are needed because the C library string
functions expect source strings to be null terminated, which may not be
the case if you have a string fragment in the buffer.
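
Roughly, the multibyte-to-wide direction could look like the sketch
below. This is not the actual wrapper code: frag_mbstowcs() and its
parameters are made-up names, and the srclen / MB_CUR_MAX bound for the
bulk phase is my assumption of what "long enough" means. It relies on
each wide character consuming at most MB_CUR_MAX bytes, which holds for
a stateless encoding such as UTF-8. The bulk phase is skipped if the
fragment contains a NUL byte, since mbsrtowcs() would stop there.

```c
#include <assert.h>
#include <stdlib.h>   /* MB_CUR_MAX */
#include <string.h>   /* memchr, memset */
#include <wchar.h>    /* mbsrtowcs, mbrtowc */

/*
 * Hypothetical helper: convert the srclen-byte fragment at src (not
 * necessarily NUL terminated) into at most dstlen wide characters.
 * Returns the number of wide characters stored, or (size_t)-1 on an
 * invalid sequence. *consumed (if non-NULL) receives the number of
 * source bytes used; a trailing incomplete character is left unconsumed.
 */
static size_t
frag_mbstowcs(wchar_t *dst, size_t dstlen, const char *src, size_t srclen,
              size_t *consumed)
{
    mbstate_t st;
    size_t out = 0;   /* wide characters written so far */
    size_t off = 0;   /* source bytes consumed so far */

    assert(dst != NULL && src != NULL);
    memset(&st, 0, sizeof st);

    /*
     * Bulk phase. Converting at most `bulk` characters consumes at most
     * bulk * MB_CUR_MAX <= srclen bytes (assumption: stateless encoding),
     * so mbsrtowcs() stops before the end of the fragment.
     */
    size_t bulk = srclen / MB_CUR_MAX;
    if (bulk > dstlen)
        bulk = dstlen;
    if (bulk > 0 && memchr(src, '\0', srclen) == NULL) {
        const char *p = src;
        size_t n = mbsrtowcs(dst, &p, bulk, &st);
        if (n == (size_t)-1)
            return (size_t)-1;
        out = n;
        off = (size_t)(p - src);
    }

    /* Fallback phase: one character at a time, with an explicit byte limit. */
    while (off < srclen && out < dstlen) {
        size_t r = mbrtowc(&dst[out], src + off, srclen - off, &st);
        if (r == (size_t)-1)
            return (size_t)-1;   /* invalid multibyte sequence */
        if (r == (size_t)-2)
            break;               /* character cut off at the end of the fragment */
        if (r == 0)
            r = 1;               /* embedded NUL: assumed to occupy one byte */
        off += r;
        out++;
    }

    if (consumed != NULL)
        *consumed = off;
    return out;
}
```

A caller would run setlocale(LC_CTYPE, "") first and, when *consumed is
smaller than srclen, carry the leftover bytes over to the next fragment.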

I did some quick benchmarks:

My wrapper functions are about 30% slower than simple function calls to
mbsrtowcs()/wcsrtombs().


Converting 1032 bytes of UTF-8 strings (a mix of 1-byte and 2-byte
multibyte characters) to UTF-32 strings with iconv(), in a loop of
100000 iterations, takes 19.04 seconds.

Converting the same 1032 bytes of UTF-8 strings to wchar_t strings with
my wrapper functions, in a loop of 100000 iterations, takes 5.42
seconds.

Using iconv() is about 3.5 times slower, which is a bit surprising.
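
For anyone who wants to reproduce this kind of measurement, a timing
loop along the following lines should do. It is only a sketch, not the
harness used for the numbers above: time_mbsrtowcs() is a made-up name,
it times plain mbsrtowcs() on a null-terminated string, and it measures
CPU time with clock(). The same loop shape would apply to iconv() or to
a wrapper function.

```c
#include <assert.h>
#include <string.h>   /* memset */
#include <time.h>     /* clock, CLOCKS_PER_SEC */
#include <wchar.h>    /* mbsrtowcs */

/*
 * Hypothetical harness: run `iters` conversions of the null-terminated
 * multibyte string src into dst (room for dstlen wide characters).
 * Returns elapsed CPU time in seconds, or -1.0 if a conversion fails.
 */
static double
time_mbsrtowcs(const char *src, wchar_t *dst, size_t dstlen, long iters)
{
    assert(src != NULL && dst != NULL && iters > 0);

    clock_t start = clock();
    for (long i = 0; i < iters; i++) {
        mbstate_t st;
        const char *p = src;   /* mbsrtowcs() advances p, so reset it each time */

        memset(&st, 0, sizeof st);
        if (mbsrtowcs(dst, &p, dstlen, &st) == (size_t)-1)
            return -1.0;
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call setlocale(LC_CTYPE, "") before timing so that the input is decoded
in the intended locale; the results depend heavily on the libc and the
locale in use.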

