tech-userlevel archive

Re: wide characters and i18n



>> [...]: the file name is not a character sequence but an octet
>> sequence (which may or may not be an encoded character sequence).
> I guess this can be a problem if [...] different filenames are
> encoded in different encodings.

Yes, if you have to treat them as character sequences rather than octet
sequences.  (If treating them as opaque octet sequences is good enough
for your purposes, then there's no problem.)
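As an illustration of the "opaque octets" case, here is a minimal sketch in
C.  The helper name name_has_suffix is made up for this example; it compares
bytes only and never tries to decode the name, so the locale does not enter
into it at all.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not from any library): report whether the file
 * name ends with the given octets.  Both arguments are treated purely as
 * NUL-terminated octet strings; no character decoding takes place. */
bool name_has_suffix(const char *name, const char *suffix)
{
    size_t nlen = strlen(name);
    size_t slen = strlen(suffix);

    /* A suffix longer than the name cannot match. */
    return slen <= nlen && memcmp(name + nlen - slen, suffix, slen) == 0;
}
```

Whether an octet-level match like this means what you want still depends on
the encoding actually in use, which is the application-specific part.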

> If you want to do something like regular expression string matching,
> you would call mbsrtowcs() to convert multi-byte filename string to a
> fixed wide character string.

Maybe.  If you want to do regular expression matching against
_character_ strings, yes.  If _octet_ strings, no.

> What I'm trying to figure out is this: if filename encoding does not
> match user's locale setting, mbsrtowcs() can stop on a character
> sequence it does not think is legal, how do you skip it?

That's exactly the kind of problem I was talking about: you are given
some data (a file name) which is an octet sequence, which may or may
not be an encoded character sequence, and if it is it may or may not be
in your, or the user's, preferred encoding, and you want to turn it
into a character sequence.

The right way to handle that is application-specific.
Sometimes something like what you sketch is a right answer.
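
For the "how do you skip it" part, one possible approach (a sketch, not the
only right answer) is to drive the conversion one character at a time with
mbrtowc() instead of mbsrtowcs(), and substitute a placeholder whenever it
reports an invalid or truncated sequence.  The function name
lenient_mbstowcs and the choice of skipping exactly one octet per error are
assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not a standard function): convert the
 * NUL-terminated octet string src to wide characters using the current
 * locale's encoding.  Every octet that cannot start a valid character is
 * replaced by `bad` in the output.  At most dstlen - 1 wide characters
 * are stored, followed by a terminating L'\0'.  Returns the number of
 * octets that were replaced. */
size_t lenient_mbstowcs(wchar_t *dst, size_t dstlen,
                        const char *src, wchar_t bad)
{
    assert(dst != NULL && src != NULL && dstlen > 0);

    mbstate_t st;
    memset(&st, 0, sizeof st);          /* initial conversion state */

    size_t left = strlen(src);
    size_t out = 0;
    size_t replaced = 0;

    while (left > 0 && out + 1 < dstlen) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, src, left, &st);

        if (r == (size_t)-1 || r == (size_t)-2) {
            /* Invalid (-1) or incomplete (-2) sequence.  The conversion
             * state is unspecified after an error, so reset it, emit
             * the placeholder, and skip a single octet. */
            memset(&st, 0, sizeof st);
            wc = bad;
            r = 1;
            replaced++;
        } else if (r == 0) {
            /* Decoded a null wide character; always advance so the
             * loop cannot stall. */
            r = 1;
        }

        dst[out++] = wc;
        src += r;
        left -= r;
    }

    dst[out] = L'\0';
    return replaced;
}
```

The result depends on whatever LC_CTYPE is in effect (set with setlocale()),
and the substitution loses information, so the wide string is good for
display or matching but cannot be turned back into the original file name.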

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mouse%rodents-montreal.org@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

