Re: wide characters and i18n

To: tech-userlevel%NetBSD.org@localhost
Subject: Re: wide characters and i18n
From: Sad Clouds <cryintothebluesky%googlemail.com@localhost>
Date: Sun, 11 Jul 2010 13:40:16 +0100

On Sun, 11 Jul 2010 07:19:12 -0400 (EDT)
der Mouse <mouse%Rodents-Montreal.ORG@localhost> wrote:

> > If you want to do something like regular expression string matching,
> > you would call mbsrtowcs() to convert multi-byte filename string to
> > a fixed wide character string.
> 
> Maybe.  If you want to do regular expression matching against
> _character_ strings, yes.  If _octet_ strings, no.

I'm not sure that simply comparing 8-bit units (octets) is going to work.
Some encodings (e.g. JIS, in its 7-bit ISO-2022-JP form) use escape
sequences to indicate a shift to a two-byte encoding.

If the escape sequence to shift to Kanji is '<ESC>$B' and you're
looking for the ASCII '$' character, then the '$' inside the escape
sequence will match.

That defeats the whole point of doing character comparison: you end up
matching control data (the shift sequence), which is not part of the
logical sequence of characters that the string represents.
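To make the problem concrete, here is a minimal C sketch (not from the
original post) contrasting a naive octet search with a search that
tracks the shift state. It assumes a heavily simplified subset of
ISO-2022-JP, recognising only ESC ( B (ASCII) and ESC $ B (two-byte
JIS X 0208). The names find_octet() and find_ascii_char() are
hypothetical, and the second function is an illustration, not a real
decoder. Note that the two-byte characters are themselves built from
octets in the 0x21-0x7E range, so hiragana such as あ (0x24 0x22) begin
with the same octet as '$'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Naive octet search: offset of the first byte equal to c, or -1. */
static long find_octet(const unsigned char *s, size_t len, unsigned char c)
{
    assert(s != NULL);
    const unsigned char *p = memchr(s, c, len);
    return p ? (long)(p - s) : -1;
}

/*
 * Character-aware search over a simplified ISO-2022-JP subset (an
 * assumption for illustration): only ESC ( B and ESC $ B are handled.
 * Returns the offset of the first ASCII character c, or -1.
 */
static long find_ascii_char(const unsigned char *s, size_t len,
                            unsigned char c)
{
    assert(s != NULL);
    int two_byte = 0;   /* shift state: 0 = ASCII, 1 = JIS X 0208 */
    size_t i = 0;

    while (i < len) {
        /* Consume a recognised escape sequence and update the state. */
        if (s[i] == 0x1B && i + 2 < len) {
            if (s[i + 1] == '$' && s[i + 2] == 'B') {
                two_byte = 1;
                i += 3;
                continue;
            }
            if (s[i + 1] == '(' && s[i + 2] == 'B') {
                two_byte = 0;
                i += 3;
                continue;
            }
        }
        if (two_byte) {
            i += 2;     /* skip one two-byte character as a unit */
        } else {
            if (s[i] == c)
                return (long)i;
            i += 1;
        }
    }
    return -1;
}
```

For the bytes 'a' ESC $ B 0x24 0x22 ESC ( B '$', find_octet() reports
offset 2 (inside the escape sequence), while find_ascii_char() reports
offset 9, the real '$' character. In practice one would let the C
library track the shift state, e.g. via mbsrtowcs() in a locale that
supports the encoding, rather than hand-rolling a decoder like this.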

Follow-Ups:
- Re: wide characters and i18n
  - From: Erik Fair

References:
- wide characters and i18n
  - From: Sad Clouds
- Re: wide characters and i18n
  - From: der Mouse
- Re: wide characters and i18n
  - From: Sad Clouds
- Re: wide characters and i18n
  - From: der Mouse