Re: wide characters and i18n

To: tech-userlevel%NetBSD.org@localhost
Subject: Re: wide characters and i18n
From: der Mouse <mouse%Rodents-Montreal.ORG@localhost>
Date: Sun, 11 Jul 2010 07:19:12 -0400 (EDT)

>> [...]: the file name is not a character sequence but an octet
>> sequence (which may or may not be an encoded character sequence).
> I guess this can be a problem if [...] different filenames are
> encoded in different encodings.

Yes, if you have to treat them as character sequences rather than octet
sequences.  (If treating them as opaque octet sequences is good enough
for your purposes, then there's no problem.)
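
As an illustration of the opaque-octet case, here is a minimal sketch of
a suffix check that never decodes anything.  The helper name
ends_with_octets() is made up for this example.  It compares raw octets,
so it is only meaningful when the suffix octets cannot also occur as the
tail of some longer encoded character; that holds for an ASCII suffix in
UTF-8, but not necessarily in other multibyte encodings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: report whether the octet string `name` ends with
 * the octets of `suffix`.  No decoding is done, so the result does not
 * depend on the locale or on whether `name` is validly encoded text.
 */
bool
ends_with_octets(const char *name, const char *suffix)
{
	size_t nlen = strlen(name);
	size_t slen = strlen(suffix);

	if (slen > nlen)
		return false;
	/* Compare the last slen octets of name against suffix. */
	return memcmp(name + nlen - slen, suffix, slen) == 0;
}
```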

> If you want to do something like regular expression string matching,
> you would call mbsrtowcs() to convert multi-byte filename string to a
> fixed wide character string.

Maybe.  If you want to do regular expression matching against
_character_ strings, yes.  If _octet_ strings, no.

> What I'm trying to figure out is this: if filename encoding does not
> match user's locale setting, mbsrtowcs() can stop on a character
> sequence it does not think is legal, how do you skip it?

That's exactly the kind of problem I was talking about: you are given
some data (a file name) which is an octet sequence, which may or may
not be an encoded character sequence, and if it is it may or may not be
in your, or the user's, preferred encoding, and you want to turn it
into a character sequence.

The right way to handle that is application-specific.
Sometimes something like what you sketch is a right answer.
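
One possible way to "skip" an undecodable sequence, sketched under
assumptions: instead of calling mbsrtowcs() on the whole string, convert
one character at a time with mbrtowc(), and when it reports an invalid
or truncated sequence, substitute a placeholder, step over a single
octet, and reset the conversion state.  The name decode_lossy() and the
choice of L'?' as the placeholder are made up for this example; whether
lossy substitution is acceptable at all is exactly the
application-specific part.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: decode the octets of `name` using the current
 * locale's LC_CTYPE encoding into `out` (capacity `outlen` wide
 * characters, including the terminator).  Each octet that cannot be
 * decoded is replaced with L'?' and counted in *nbad.  Returns the
 * number of wide characters stored, not counting the terminator.
 */
size_t
decode_lossy(const char *name, wchar_t *out, size_t outlen, size_t *nbad)
{
	mbstate_t st;
	size_t left = strlen(name);
	size_t n = 0;

	memset(&st, 0, sizeof st);
	*nbad = 0;
	while (left > 0 && n + 1 < outlen) {
		wchar_t wc;
		size_t r = mbrtowc(&wc, name, left, &st);

		if (r == (size_t)-1 || r == (size_t)-2) {
			/*
			 * Invalid (-1) or truncated (-2) sequence.  The
			 * state is no longer usable, so reset it, emit a
			 * placeholder, and retry from the next octet.
			 */
			memset(&st, 0, sizeof st);
			wc = L'?';
			r = 1;
			(*nbad)++;
		} else if (r == 0) {
			/* Decoded a null wide character; step past one octet. */
			r = 1;
		}
		out[n++] = wc;
		name += r;
		left -= r;
	}
	if (outlen > 0)
		out[n] = L'\0';
	return n;
}
```

The caller is assumed to have called setlocale(LC_CTYPE, "") (or
similar) beforehand; without that, decoding happens in the "C" locale.
A nonzero *nbad tells the caller that the wide string is not a faithful
rendering of the original octets, so it should not be converted back
and used to open the file.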

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mouse%rodents-montreal.org@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
