tech-userlevel archive


Re: bin/57544: sed(1) and regex(3) problem with encoding



> This whole "i18n" and "l10n" is a nightmare---and this is a not
> english native speaker who writes it...

And as a native anglophone - who knows a smattering of assorted other
languages - I agree.

I recently had occasion to send mail to a domain whose mail is hosted
by Google.  I sent it as ISO 8859-14, because it involved a small
amount of text in one of the Gaelic dialects and I prefer to use
seanċló when I can.

The text included a ċ.  But apparently, despite my marking it as
8859-14, by the time it got displayed (in their webmail interface, I
think), it had been converted into U+0104, LATIN CAPITAL LETTER A WITH
OGONEK, rather than the correct mapping, U+010B, LATIN SMALL LETTER C
WITH DOT ABOVE.

So I sent a test mail, containing each of the accented vowels and each
of the dotted consonants (well, most of them; I forgot Ṫ and ṫ, but
that's minor).

That mail, for all that it was also marked as being 8859-14, got
displayed as if it were 8859-1.

Not even Google, apparently, can get it even vaguely right.
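
The second symptom above (text labelled 8859-14 but displayed as
8859-1) can be sketched in a few lines of Python.  This is only an
illustration, assuming Python's standard iso8859_14 and iso8859_1
codecs follow the published tables; the helper name describe() is made
up here, and nothing in it reflects what Google's software actually
does internally.

```python
# Illustration only: the same byte read under the declared charset
# versus under ISO 8859-1. describe() is a hypothetical helper.

def describe(s: str) -> str:
    """Return the code points of s as space-separated U+XXXX strings."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)

text = "\u010b"  # LATIN SMALL LETTER C WITH DOT ABOVE

# What the sender puts on the wire when the mail is marked 8859-14.
raw = text.encode("iso8859_14")

# A receiver that honours the charset label recovers the original.
as_labelled = raw.decode("iso8859_14")

# A receiver that ignores the label and assumes 8859-1 does not.
as_latin1 = raw.decode("iso8859_1")

print("bytes on the wire:  ", raw.hex())
print("decoded as 8859-14: ", describe(as_labelled))
print("decoded as 8859-1:  ", describe(as_latin1))
```

The byte never changes; only the table used to interpret it does, which
is why the charset label has to survive all the way to the display.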

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse%rodents-montreal.org@localhost
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

