Subject: Re: telnet problem
To: Bill Studenmund <skippy@macro.stanford.edu>
From: Erik E. Fair <fair@clock.org>
List: current-users
Date: 06/30/1998 00:12:14
At 22:29 -0700 6/29/98, Bill Studenmund wrote:
>On Mon, 29 Jun 1998, Erik E. Fair wrote:
>
>> There is a word that needs to be spoken that explains this:
>>
>> ISO-8859-1
>>
>> TELNET doesn't know from character sets beyond the original NETASCII. I
>> don't know if there is a TELNET option for specifying character set, but
>> it's something we should explicitly support, instead of "just send 8 bits."
>
>Should there be one? telnet in BINARY mode's supposed to just be a way of
>getting the bytes to the other side.
>
>Dealing with ISO character sets, etc, is the job of the terminal emulator,
>not something like telnet/telnetd.

The trouble is that the server and client must agree on what the
character set is ("code set", to be more precise) so that there is no
confusion about what the bytes really are. BINARY mode is just bits with no
particular interpretation attached. Where you get into trouble is in places
where the code sets don't match, like on the Mac and PC - there are
differing high-bit-on characters in quite a few code positions. Is that a
bullet? An Apple logo? Or something else?
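To make the bullet-versus-Apple-logo question concrete, here is a short Python sketch that decodes the same high-bit-on bytes under three code sets using Python's standard codecs: `latin-1` (ISO-8859-1), `mac_roman` (Mac OS Roman) and `cp437` (the original IBM PC set). The byte values 0xA5 and 0xF0 are illustrative picks: 0xA5 is a bullet in Mac OS Roman but a yen sign in ISO-8859-1, and 0xF0 is the Apple logo (a private-use code point in Unicode) in Mac OS Roman. The helper name `interpretations` is hypothetical.

```python
# Decode the same single byte under three legacy 8-bit code sets,
# using codecs that ship with Python. The byte values are illustrative.
CODE_SETS = ["latin-1", "mac_roman", "cp437"]  # ISO-8859-1, Mac OS Roman, IBM PC


def interpretations(byte_value):
    """Return {code set name: decoded character} for one byte."""
    raw = bytes([byte_value])
    return {cs: raw.decode(cs) for cs in CODE_SETS}


for b in (0xA5, 0xF0):
    for cs, ch in interpretations(b).items():
        # repr() keeps unprintable characters such as U+F8FF readable
        print(f"0x{b:02X} as {cs:9}: {ch!r} (U+{ord(ch):04X})")
```

Each byte comes out as three different characters, and nothing in the bytes themselves says which reading the sender intended; only an out-of-band label can settle it.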

Please believe me when I say that I know whereof I speak; I spent a few
years in the IETF MIME wars over this very issue. If we ever want even a
*prayer* of internationalizing NetBSD, it must be done in a
character-set-explicit manner. To "just send 8 bits" with no explicit
negotiation of character set leads to chaos and madness, because it is a
lead-pipe cinch that not everyone will agree on the interpretation of the
bits without a label identifying *positively* what they represent.
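One existing mechanism for this kind of explicit negotiation is the TELNET CHARSET option described in RFC 2066. The Python sketch below builds a CHARSET REQUEST offering a list of character set names and parses the peer's ACCEPTED or REJECTED reply. The option number 42 and the REQUEST/ACCEPTED/REJECTED codes (1, 2, 3) are assumed from that RFC; the helper names are hypothetical; the translation-table (TTABLE) variants, the preceding IAC WILL/DO CHARSET exchange, and IAC doubling inside subnegotiation data are left out.

```python
# TELNET command bytes (RFC 854) and CHARSET option codes (assumed per RFC 2066).
IAC, SB, SE = 255, 250, 240
CHARSET = 42                      # option number for CHARSET
REQUEST, ACCEPTED, REJECTED = 1, 2, 3


def charset_request(names, sep=b";"):
    """Build IAC SB CHARSET REQUEST <sep>name<sep>name... IAC SE."""
    # Each name is preceded by the separator chosen by the requester.
    body = b"".join(sep + n.encode("ascii") for n in names)
    return bytes([IAC, SB, CHARSET, REQUEST]) + body + bytes([IAC, SE])


def parse_reply(msg):
    """Return the agreed charset name, or None if the peer rejected all offers."""
    if msg[:3] != bytes([IAC, SB, CHARSET]) or msg[-2:] != bytes([IAC, SE]):
        raise ValueError("not a CHARSET subnegotiation")
    verb, payload = msg[3], msg[4:-2]
    if verb == ACCEPTED:
        return payload.decode("ascii")
    if verb == REJECTED:
        return None
    raise ValueError(f"unexpected CHARSET verb {verb}")


offer = charset_request(["ISO-8859-1", "US-ASCII"])
reply = bytes([IAC, SB, CHARSET, ACCEPTED]) + b"ISO-8859-1" + bytes([IAC, SE])
print(parse_reply(reply))
```

The point of the design is that the data stream carries a label both ends have agreed to, instead of a presumption about how the other side will read the bytes.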

This is such an Evil area, particularly where legacy systems like UNIX are
concerned - lots of software to fix. It would be so much easier if everyone
would simply capitulate to American Cultural Imperialism and use English
exclusively, but I think everyone here agrees that particular scenario is
not likely to come about in our lifetimes.

Fortunately for us, a lot of this work has already been done, APIs
specified, and so on. We just need to implement and integrate. We also need
to watch out for situations like this one with TELNET, where we're sending
unlabelled data (TELNET BINARY MODE) with a *presumption* of how it will be
interpreted on the other end (ISO-8859-1, I bet).

The funny thing is that the oldest PR in the GNATS database is an i18n
issue. It's been with us since the very beginning. We just need to be a
touch more sensitive to it.

please forgive the rant,
Erik <fair@clock.org>