tech-userlevel archive


Re: wide characters and i18n



On Sat, 10 Jul 2010 15:33:34 -0400
Matthew Mondor <mm_lists%pulsar-zone.net@localhost> wrote:

> On Sat, 10 Jul 2010 10:32:39 +0100
> Sad Clouds <cryintothebluesky%googlemail.com@localhost> wrote:
> 
> > Hi, I'm trying to understand how to write portable C code that
> > supports international character sets. As I understand so far, it
> > has a lot to do with C library and current locale setting.
> > 
> > 1. What is the recommended way for user level applications to deal
> > with different character encodings? Treat all external character
> > data as multi-byte and use C library wchar_t and wide stdio
> > functions to convert multi-byte to wchar_t for internal character
> > processing?
> 
> Others might also have good suggestions, I only have some experience
> with UTF-8 and UTF-32/UCS-4 here, and they were using custom code
> rather than the wchar C99 related functions.  I can however share some
> of the "issues" I encountered.

OK thanks, I've spent hours searching the Internet for documentation
and howtos, and I think I'm beginning to understand how it fits
together on Unix systems.

I'm not sure how portable it is to assume that input character data is
in UTF-8 format. Some articles suggest letting the user set the locale
environment variables (LANG, LC_ALL, LC_CTYPE) and letting the C
library routines perform the correct conversion from multi-byte to
wchar_t characters. This should be MT-safe with the restartable
multi-byte functions (mbrtowc() and friends), as long as setlocale()
is not called again after the initial call at startup. This basically
binds you to one locale at run time.
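
To make the above concrete, here is a minimal sketch of that approach.
The helper names use_user_locale() and mb_to_wide() are made up for
illustration; only setlocale() and mbrtowc() are standard. It assumes
the program selects the locale once at startup and never changes it
afterwards, and that each caller keeps its own mbstate_t.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: adopt the locale named by the user's environment
 * (LANG, LC_ALL, ...). Meant to be called once, early, before any
 * threads start. Returns 1 on success, 0 on failure.
 */
int use_user_locale(void)
{
    return setlocale(LC_ALL, "") != NULL;
}

/*
 * Hypothetical helper: decode the NUL-terminated multi-byte string src,
 * encoded as the current locale expects, into dst.
 * At most dstlen - 1 wide characters are stored, then a terminator;
 * longer input is silently truncated.
 * Returns the number of wide characters stored, or (size_t)-1 if src
 * contains an invalid or incomplete multi-byte sequence.
 */
size_t mb_to_wide(const char *src, wchar_t *dst, size_t dstlen)
{
    mbstate_t st;               /* per-call state, so no shared data */
    size_t left = strlen(src);  /* bytes still to decode */
    size_t n = 0;               /* wide characters stored so far */

    assert(dstlen > 0);
    memset(&st, 0, sizeof st);  /* start in the initial shift state */

    while (left > 0 && n + 1 < dstlen) {
        size_t r = mbrtowc(&dst[n], src, left, &st);

        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;  /* invalid or truncated sequence */
        if (r == 0)
            break;              /* decoded a NUL; not expected here */
        src += r;
        left -= r;
        n++;
    }
    dst[n] = L'\0';
    return n;
}
```

The conversion state lives in a local mbstate_t rather than in hidden
static storage, which is what makes the restartable functions usable
from several threads at once. The result still depends on whatever
locale was in effect when use_user_locale() ran.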

If you need to convert to or from a character encoding that differs
from that of the current locale, then I guess the only option is to
use something like iconv or custom conversion functions...
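
For the iconv route, something along these lines might do. This is
only a sketch: convert_encoding() is a made-up name, the encoding
names accepted by iconv_open() vary between implementations, and the
cast on the input pointer assumes the POSIX prototype (some systems
declare that argument as const char ** instead).

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert inlen bytes at in from encoding `from`
 * to encoding `to`, independent of the current locale.
 * Output goes to out, which has room for outcap bytes.
 * Returns the number of bytes written, or (size_t)-1 on any failure
 * (unknown encoding, invalid input, or output buffer too small).
 */
size_t convert_encoding(const char *from, const char *to,
                        const char *in, size_t inlen,
                        char *out, size_t outcap)
{
    iconv_t cd;
    char *inp = (char *)in;     /* iconv() does not modify the input bytes */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outcap;
    size_t r;

    assert(in != NULL && out != NULL);

    cd = iconv_open(to, from);  /* note the order: target first */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    r = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (r != (size_t)-1)
        /* Flush any pending shift sequence for stateful encodings. */
        r = iconv(cd, NULL, NULL, &outp, &outleft);

    iconv_close(cd);

    if (r == (size_t)-1)
        return (size_t)-1;
    return outcap - outleft;
}
```

Called as convert_encoding("ISO-8859-1", "UTF-8", buf, len, out,
sizeof out), this would re-encode Latin-1 input as UTF-8 whatever
LANG happens to be set to. Each call opens its own conversion
descriptor, so nothing is shared between threads.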

