Re: wide characters and i18n

To: tech-userlevel%netbsd.org@localhost
Subject: Re: wide characters and i18n
From: Matthew Mondor <mm_lists%pulsar-zone.net@localhost>
Date: Sat, 10 Jul 2010 19:08:32 -0400

On Sat, 10 Jul 2010 23:15:10 +0100
Sad Clouds <cryintothebluesky%googlemail.com@localhost> wrote:

> I'm not sure how portable it is to assume that input character data is
> in UTF-8 format. Some articles suggest to let the user set locale
> environment variables and let C library routines perform the correct
> conversion from multi-byte to wchar_t characters. This should be
> MT-safe with restartable multi-byte functions, as long as setlocale()
> is not called. This basically binds you to one locale at run time.

Indeed, assuming a UTF-8 external format is only valid for protocols
where UTF-8 is the norm (which was my use case, although it would also
have been all right otherwise, as I'm using a UTF-8 locale, UTF-8 aware
tools and terminals).

But "locale -a" lists various encodings, and the C99 wide character
conversion functions most probably take those into consideration (after
checking, I now see that wcsrtombs(3), for instance, is implemented in
src/libc/locale/ from the Citrus project and seems to have
locale-specific handling), so I think you're right.
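
As a minimal sketch of the locale-driven approach discussed above, the
following counts the characters in a multibyte string with the
restartable mbrtowc(3), using whatever encoding the current locale
specifies. The function name mb_char_count is made up for illustration,
and the sketch assumes the program called setlocale(LC_ALL, "") once at
startup and never again (without that call it runs in the "C" locale).

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: count the characters in the NUL-terminated
 * multibyte string `s`, decoded according to the current locale.
 * Returns (size_t)-1 on an invalid or truncated sequence.
 */
size_t
mb_char_count(const char *s)
{
	mbstate_t st;
	size_t len = strlen(s);
	size_t n = 0;

	/* A zeroed mbstate_t is the initial conversion state. */
	memset(&st, 0, sizeof(st));

	while (len > 0) {
		wchar_t wc;
		size_t r = mbrtowc(&wc, s, len, &st);

		/* (size_t)-1: invalid sequence; (size_t)-2: incomplete one. */
		if (r == (size_t)-1 || r == (size_t)-2)
			return (size_t)-1;
		if (r == 0)	/* decoded a NUL character */
			break;
		s += r;
		len -= r;
		n++;
	}
	return n;
}
```

The conversion state lives in a caller-owned mbstate_t rather than in
hidden static storage, which is what makes the restartable functions
usable from several threads as long as the locale is left alone.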

> If you need to convert character encodings which are different from the
> current locale, then I guess the only option is to use something like
> iconv or custom conversion functions...

I've had to use iconv(3) (and iconv(1)) at times and noticed that
it could be destructive depending on the conversion (characters with no
equivalent in the target encoding cannot be preserved), but it seemed
fine otherwise.
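
For conversions that are independent of the current locale, a rough
sketch of iconv(3) usage could look like the following. The function
name convert_encoding and its return convention are assumptions made for
illustration. The cast on the input pointer matches the POSIX prototype;
some systems declare that argument as const char ** instead, so it may
need adjusting.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: convert the NUL-terminated string `in` from
 * encoding `from` to encoding `to`, writing a NUL-terminated result
 * into `out` (capacity `outsz` bytes).
 * Returns the number of non-reversible conversions reported by
 * iconv(3) (0 for a lossless conversion), or -1 on error with errno set.
 */
long
convert_encoding(const char *from, const char *to,
    const char *in, char *out, size_t outsz)
{
	iconv_t cd;
	char *inp = (char *)in;	/* iconv does not modify the input bytes */
	char *outp = out;
	size_t inleft = strlen(in);
	size_t outleft, r, lossy;
	int saved;

	if (outsz == 0) {
		errno = E2BIG;
		return -1;
	}
	outleft = outsz - 1;	/* reserve room for the terminating NUL */

	cd = iconv_open(to, from);	/* note the order: target first */
	if (cd == (iconv_t)-1)
		return -1;

	r = iconv(cd, &inp, &inleft, &outp, &outleft);
	lossy = r;
	/* Flush any pending shift sequence for stateful encodings. */
	if (r != (size_t)-1)
		r = iconv(cd, NULL, NULL, &outp, &outleft);

	saved = errno;
	iconv_close(cd);
	if (r == (size_t)-1) {
		errno = saved;	/* EILSEQ, EINVAL or E2BIG from iconv */
		return -1;
	}
	*outp = '\0';
	return (long)lossy;
}
```

A nonzero return value, or an EILSEQ failure, is how the destructive
cases show up: the input contained something the target encoding cannot
represent exactly.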
-- 
Matt
