tech-userlevel archive


Re: wide characters and i18n



On Sun, 11 Jul 2010 07:19:12 -0400 (EDT)
der Mouse <mouse%Rodents-Montreal.ORG@localhost> wrote:

> > If you want to do something like regular expression string matching,
> > you would call mbsrtowcs() to convert multi-byte filename string to
> > a fixed wide character string.
> 
> Maybe.  If you want to do regular expression matching against
> _character_ strings, yes.  If _octet_ strings, no.

I'm not sure that simply comparing 8-bit units is going to work.
Some encodings (e.g. JIS, as in ISO-2022-JP) are stateful and use escape
sequences to indicate a shift to a two-byte encoding.

If the escape sequence to shift to Kanji is '<ESC>$B' and you're
looking for the ASCII '$' character, then a byte-wise search will match
the '$' inside the escape sequence.
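
A minimal sketch of that false match, assuming a plain byte-wise search.
The helper name find_byte and the sample buffer are made up for
illustration; the two bytes after the escape sequence merely stand in
for one double-byte character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sample octet string: "abc", then ESC $ B (shift to Kanji), two bytes
 * standing in for one double-byte character, then ESC ( B (shift back
 * to ASCII), then the terminating NUL. */
static const char sample[] = {
    'a', 'b', 'c',
    0x1b, '$', 'B',
    0x30, 0x6c,
    0x1b, '(', 'B',
    '\0'
};

/* Hypothetical helper: offset of the first byte equal to c, or -1 if
 * no such byte exists.  It knows nothing about shift states. */
static long find_byte(const char *s, char c)
{
    assert(s != NULL);
    const char *p = strchr(s, c);
    return p ? (long)(p - s) : -1L;
}
```

Here find_byte(sample, '$') returns 4, which is the second byte of the
escape sequence, even though the logical string contains no '$'
character at all.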

That seems to defeat the whole point of doing character comparison,
because you end up matching control data, which is not part of the
logical character sequence that the string represents.
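
For comparison, a rough sketch of the mbsrtowcs() route mentioned in the
quoted text. The wrapper name to_wide is hypothetical, and whether shift
sequences are actually consumed depends on the current locale's encoding
and on the libc supporting it; the mbstate_t argument is where the
conversion keeps its shift state.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical wrapper: convert the multi-byte string src into dst,
 * which has room for dstlen wide characters including the terminator.
 * Returns the number of wide characters written (not counting the
 * terminator), or (size_t)-1 on an invalid sequence or a zero-sized
 * buffer. */
static size_t to_wide(const char *src, wchar_t *dst, size_t dstlen)
{
    assert(src != NULL && dst != NULL);
    if (dstlen == 0)
        return (size_t)-1;

    mbstate_t st;
    memset(&st, 0, sizeof st);          /* start in the initial shift state */

    size_t n = mbsrtowcs(dst, &src, dstlen - 1, &st);
    if (n == (size_t)-1)
        return n;                       /* invalid multi-byte sequence */

    dst[n] = L'\0';                     /* terminate even if truncated */
    return n;
}
```

Matching would then be done on the resulting wchar_t array, one element
per logical character, rather than on the raw octets.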

