tech-userlevel archive


Re: wide characters and i18n



On Fri, 16 Jul 2010 12:34:31 -0400
Ty Sarna <ty%sarna.org@localhost> wrote:

> On Jul 16, 2010, at 11:50 AM, Sad Clouds wrote:
> 
> > Sometimes I do it from left to right, but occasionally I may need
> > to do it from right to left. For example if you have a filename:
> > 
> > some_long_file_name.txt
> > 
> > To quickly extract the suffix '.txt' you just scan the string from
> > right to left, until you hit '.' char. I think with utf-8 this type
> > of string manipulation would be quite messy and you would have to
> > use a special library that understands utf-8 encodings, etc.
> 
> Nope, because:
> 
> - ASCII characters are expressed in utf-8 identically ('.' is '.')
> - No non-ASCII utf-8 character includes in its multibyte
> representation any byte which is also an ASCII character (all bytes
> of multibyte utf-8 characters have the high bit set). Thus, you can't
> accidentally mistake part of some other character as '.'

Yeah, you're right.
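
For the record, here is a minimal sketch of the right-to-left scan discussed above, working on plain bytes with no UTF-8-aware library. The function name find_suffix is made up for illustration. It relies only on the property quoted above: every byte of a multibyte UTF-8 sequence has the high bit set, so a byte equal to '.' can only be a real '.'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * find_suffix (hypothetical helper): scan a NUL-terminated, UTF-8 or
 * ASCII file name from right to left and return a pointer to the last
 * '.' byte, i.e. the start of the suffix. Returns NULL if the name
 * contains no '.'.
 *
 * The comparison is byte-wise. That is safe for UTF-8 because lead and
 * continuation bytes of multibyte characters are all >= 0x80, while
 * '.' is 0x2e.
 */
static const char *
find_suffix(const char *name)
{
	const char *p;

	assert(name != NULL);

	/* Start at the terminating NUL and walk backwards. */
	p = name + strlen(name);
	while (p > name) {
		p--;
		if (*p == '.')
			return p;
	}
	return NULL;
}
```

With "some_long_file_name.txt" this returns a pointer to ".txt", and the same holds for a name such as "r\xc3\xa9sum\xc3\xa9.txt" (the UTF-8 bytes of "résumé.txt"). The standard strrchr(name, '.') performs the same byte-wise search, so in practice it can be used directly.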

