tech-userlevel archive


Re: wide characters and i18n



On Sat, 10 Jul 2010 23:15:10 +0100
Sad Clouds <cryintothebluesky%googlemail.com@localhost> wrote:

> I'm not sure how portable it is to assume that input character data is
> in UTF-8 format. Some articles suggest to let the user set locale
> environment variables and let C library routines perform the correct
> conversion from multi-byte to wchar_t characters. This should be
> MT-safe with restartable multi-byte functions, as long as setlocale()
> is not called. This basically binds you to one locale at run time.

Indeed, assuming a UTF-8 external format is only valid for protocols
where UTF-8 is the norm (which was my use case, although it would also
have been all right otherwise, as I'm using a UTF-8 locale and UTF-8
aware tools and terminals).

But "locale -a" lists various encodings, and the C99 wide character
conversion functions most probably take those into consideration (after
checking, I now see that wcsrtombs(3), for instance, is implemented in
src/libc/locale/ from the Citrus project and seems to have
locale-specific handling), so I think you're right.
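
As a rough sketch of the locale-driven approach, the function below
decodes a multibyte string into wchar_t using the restartable mbrtowc(3)
with its own mbstate_t, so it keeps no hidden static state. The name
decode_locale_string and its buffer convention are made up for this
illustration. It assumes the program called setlocale(LC_ALL, "") once
at startup and does not change the locale afterwards.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: decode src, encoded according to the current
 * LC_CTYPE locale, into dst (room for dstlen wide characters, including
 * the terminator).  Returns the number of wide characters stored, or
 * (size_t)-1 on invalid input or if dst is too small.
 */
size_t
decode_locale_string(const char *src, wchar_t *dst, size_t dstlen)
{
	mbstate_t st;
	size_t n = 0, left;

	assert(src != NULL && dst != NULL && dstlen > 0);
	left = strlen(src);
	memset(&st, 0, sizeof(st));	/* initial conversion state */

	while (left > 0) {
		wchar_t wc;
		size_t r = mbrtowc(&wc, src, left, &st);

		if (r == (size_t)-1 || r == (size_t)-2)
			return (size_t)-1;	/* invalid or truncated sequence */
		if (r == 0)
			break;			/* decoded a null character */
		if (n + 1 >= dstlen)
			return (size_t)-1;	/* no room for wc plus terminator */
		dst[n++] = wc;
		src += r;
		left -= r;
	}
	dst[n] = L'\0';
	return n;
}
```

Since the conversion state lives in a local mbstate_t, several threads
can call this at the same time, provided nobody calls setlocale()
meanwhile.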

> If you need to convert character encodings which are different from the
> current locale, then I guess the only option is to use something like
> iconv or custom conversion functions...

I've had to use iconv(3) (and iconv(1)) at times and noticed that
it could be destructive (lossy) depending on the conversion, but it
seemed fine otherwise.
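
For what it's worth, a minimal sketch of converting between two explicit
encodings with iconv(3), independent of the current locale, could look
like the following. The name convert_string and the lossy flag are
invented for this example. It assumes the POSIX prototype, where the
input argument is a char **; some older systems declare it as
const char **, in which case the cast needs adjusting. What happens to
characters that have no equivalent in the target encoding varies between
implementations: some fail with EILSEQ, others substitute a replacement
and report it in the return value.

```c
#include <assert.h>
#include <iconv.h>
#include <string.h>

/*
 * Hypothetical helper: convert the NUL-terminated string src from
 * encoding "from" to encoding "to", writing at most dstlen - 1 bytes
 * plus a terminating NUL into dst.  Returns the number of bytes
 * written, or -1 on failure.  *lossy is set to nonzero when iconv
 * reports non-reversible conversions.
 */
long
convert_string(const char *to, const char *from, const char *src,
    char *dst, size_t dstlen, int *lossy)
{
	iconv_t cd;
	char *in = (char *)src;	/* POSIX iconv() takes char **, not const */
	char *out = dst;
	size_t inleft, outleft, r;

	assert(src != NULL && dst != NULL && dstlen > 0 && lossy != NULL);
	inleft = strlen(src);
	outleft = dstlen - 1;	/* keep one byte for the terminator */
	*lossy = 0;

	cd = iconv_open(to, from);
	if (cd == (iconv_t)-1)
		return -1;	/* conversion pair not supported */

	r = iconv(cd, &in, &inleft, &out, &outleft);
	if (r != (size_t)-1) {
		*lossy = (r > 0);
		/* Flush any pending shift sequence of stateful encodings. */
		r = iconv(cd, NULL, NULL, &out, &outleft);
	}
	iconv_close(cd);
	if (r == (size_t)-1)
		return -1;	/* EILSEQ, EINVAL or E2BIG */

	*out = '\0';
	return (long)(out - dst);
}
```

A caller would check both the return value and the lossy flag before
trusting the output, e.g. convert_string("UTF-8", "ASCII", input, buf,
sizeof(buf), &lossy).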
-- 
Matt

