tech-userlevel archive


Re: wide characters and i18n



On Sat, Jul 10, 2010 at 11:15:10PM +0100, Sad Clouds wrote:
> I'm not sure how portable it is to assume that input character data is
> in UTF-8 format. Some articles suggest letting the user set the locale
> environment variables and letting the C library routines perform the correct
> conversion from multi-byte to wchar_t characters. This should be
> MT-safe with restartable multi-byte functions, as long as setlocale()
> is not called. This basically binds you to one locale at run time.

Depending on your environment, the UTF-8 assumption is questionable.
In many European countries, either one of the ISO-8859 charsets or
Unicode (UTF-8 or UTF-16) is used. IIRC China also makes heavy use of
its own national character sets (e.g. GB 2312, GBK, GB18030).

You are correct about the setlocale() issue. There have been discussions
about supporting multiple locales at the same time, but nothing has been
implemented (yet).

> If you need to convert character encodings which are different from the
> current locale, then I guess the only option is to use something like
> iconv or custom conversion functions...

Use iconv. It is part of SUS (the Single UNIX Specification), and GNU
libiconv provides a portable implementation for systems that (still)
don't provide it natively.

Joerg

