Subject: Re: finger
To: SUNAGAWA Keiki <kei_sun@ba2.so-net.ne.jp>
From: Greg A. Woods <woods@weird.com>
List: tech-userlevel
Date: 09/11/2002 00:01:16
[ On Wednesday, September 11, 2002 at 08:26:01 (+0900), SUNAGAWA Keiki wrote: ]
> Subject: Re: finger
>
> > The failures with emacs dired vs. locale-sensitive ftpd and ls are
> > _NOT_, repeat _NOT_, a problem with use of locales in those utilities.
> > The bug is _entirely_ and _only_ in 'dired' not also being locale-
> > sensitive.
> 
> Ok, so you will set up the locale setting of your ftp
> (inetd, actually) server to EUC-JP for me, Thanks.  But you
> will want to read mangled characters of your terminal when
> connecting your local server, won't you?

No, you misunderstand.  The 'dired' _failures_ were/are primarily due
to _client_ code using locale settings that 'dired' did/does not take
proper control over.  That's why the failures occur for 'ls' as well.

The FTP server need only follow the FTP protocol and pass data
transparently to/from the client.  The FTP protocol relies entirely on
numeric response codes.  The text is just there for humans who might
find it interesting to read while they're waiting for their data to
transfer.  If the text from your FTP server happens to be in Japanese
and encoded using EUC-JP, well then that's too bad for me if I can't
read and understand it.  However, provided I can supply the correct
pathname for the file I wish to retrieve then I won't really be any
worse off.
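
To illustrate the point about numeric response codes, here is a minimal
sketch in Python.  The helper name reply_code is hypothetical, and the
Japanese reply text is made up for the example.  The client looks only
at the three leading digits and never needs to decode the text that
follows them:

```python
def reply_code(line: bytes) -> int:
    """Return the three-digit reply code from one FTP reply line.

    Only the first three bytes are examined; the human-readable text
    after them is never decoded, so its encoding does not matter.
    """
    code = line[:3]
    if len(code) != 3 or not code.isdigit():
        raise ValueError("not an FTP reply line")
    return int(code)


# Illustrative reply whose text happens to be Japanese encoded as EUC-JP.
reply = b"226 " + "転送完了".encode("euc_jp") + b"\r\n"
print(reply_code(reply))
```

The same function gives the same answer for an English reply such as
b"226 Transfer complete\r\n", which is all the client needs.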

(Well, there is the broken assumption made by many FTP users that the
directory listing returned by something like "dir -l" will be exactly as
if "ls -l" were run in the same directory directly on the server.
However, that's just a broken assumption about how to interpret the
output of the 'dir' command when giving it a magic option -- not a
failing in the protocol.  "dir -l" is a bad hack, and its misuse by
things like 'dired' is not really the fault of any FTP daemon.)

> > Indeed the network need not even be involved for this issue to arise.
> > The local 'finger' program may read data from someone's ~/.plan that is
> > not intended for the same locale as the terminal that data is about to
> > be displayed upon.
> > 
> > Of course on the receiving end the client program must take care to
> > ensure that the data received will not adversely affect the user's
> > terminal (and it must live with the assumption that the client user has
> > correctly set his or her locale to match the capabilities of the
> > terminal).  This same rule must apply regardless of whether the data
> > has been transferred over a network or just sucked from a local file.
> 
> Why you simply make a assumption that all the user use same
> locale in a machine?  No, you cannot.

I'm afraid I didn't make myself clear.  I was trying to say exactly what
you have said.  My point is that the 'finger' program will display its
prompts, error messages, and the content of another's ~/.plan, etc. in
the locale that the user running the program has configured (or if none
was set specifically then the default locale the administrator has
configured).  There is no way for even the local client to know what
locale the other user edited their ~/.plan in and thus what encoding it
is stored using.  The 'finger' program must simply do its best to ensure
that it does not send any data which could adversely affect the user's
terminal, no matter what byte values are passed to it, even if the
locale it is using is a multi-byte one.
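
A rough sketch of that kind of filtering, in Python, might look like the
following.  The function name safe_for_terminal is hypothetical, and the
policy shown (keep tab and newline, replace every other non-printable
character with '?') is only one assumption about what "safe" means.  In
a real program the encoding would come from the user's locale; here it
is passed in explicitly so the example stands alone:

```python
def safe_for_terminal(data: bytes, encoding: str) -> str:
    """Decode arbitrary bytes and neutralise anything non-printable.

    Bytes that are invalid in the given encoding become U+FFFD rather
    than raising an error, so any byte values at all are accepted.
    """
    text = data.decode(encoding, errors="replace")
    return "".join(
        ch if ch in "\n\t" or ch.isprintable() else "?"
        for ch in text
    )


# An escape sequence hidden in a ~/.plan loses its ESC byte:
print(safe_for_terminal(b"hi\x1b[2Jthere\n", "ascii"))
# Text that is valid in a multi-byte encoding passes through intact:
print(safe_for_terminal("日本語".encode("euc_jp"), "euc_jp"))
```

Decoding first, rather than filtering raw bytes, matters for multi-byte
locales: a byte that looks like a control code on its own may be a
legitimate part of a multi-byte character.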

>  It is same situation
> on ftpd also.  The idea of locale-sensitive daemon is just
> wrong.

We're not talking about daemons here, especially not any that try to
interpret a client's (or a local user's) locale.  We're talking about
simple data transfer programs which are used independently by users who may
have different ideas about what character encodings they are each using.
It doesn't matter if the data is read from a file on the local file
system or over the network from some finger server on a remote system --
both the filesystem and the network client/server implementations _must_
stay out of the way and simply pass all data without modification or
omission.

It is up to the users to decide if they can understand the text that is
ultimately presented on their terminal and, if not, to try to adjust
their terminal and their locale settings until they can (or possibly to
retrieve the raw data in some way and pass it through something like
'recode' so that it is in a form their terminal can properly display).
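
For what it's worth, the conversion step itself is small.  The following
Python sketch is not the 'recode' program, just an illustration of the
same idea using Python's built-in codecs; the function name and the
sample text are made up.  Note that the user has to know (or guess) the
source encoding, since nothing in the data says what it is:

```python
def recode(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode raw bytes from one character encoding to another."""
    return data.decode(src).encode(dst)


# Raw bytes as they might be retrieved from someone's ~/.plan (EUC-JP).
plan_eucjp = "こんにちは".encode("euc_jp")
# Converted into a form a UTF-8 terminal can display.
plan_utf8 = recode(plan_eucjp, "euc_jp", "utf-8")
print(plan_utf8.decode("utf-8"))
```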

Of course the character encoding used by a user to write a ~/.plan file
may not even be the same encoding used by default to present system
information about the user.

The only real comprehensive solution that will allow all users to see
all languages simultaneously is to force all users to read and store all
data in a unified encoding system that can simultaneously represent all
characters that any user might ever use anywhere.  Even then though the
job of the filesystem and of the network client/server programs remains
exactly the same:  pass all data transparently without modification or
omission.  It's the same idea as with something like timezones in some
shared collection such as a CVS repository.  There must be one universal
standard time that everyone agrees on and which all timestamps are
recorded relative to; and then for those who cannot deal directly with
the universal time in its native form there can be translations made at
presentation time to values that a given local user can understand.
However, these values must be stored in, and transmitted in, the
universal encoding.  To achieve this nirvana, though, we must all abandon
our localised encodings and agree upon one universal unified
all-encompassing standard encoding and convert all our stored data to that
new standard so that no matter who retrieves it, or how they mix it with
other data from anywhere else, it retains its original form and meaning.
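
The timestamp half of that analogy can be sketched in a few lines of
Python.  The stored value, the fixed zone offsets, and the function name
present are all illustrative assumptions, not taken from CVS or any
other tool.  The value is stored once in universal time and translated
only when it is shown to a particular user:

```python
from datetime import datetime, timedelta, timezone

# Stored and transmitted form: always universal time.
stored = datetime(2002, 9, 11, 4, 1, 16, tzinfo=timezone.utc)

# Fixed offsets chosen for illustration only.
JST = timezone(timedelta(hours=9), "JST")
EDT = timezone(timedelta(hours=-4), "EDT")


def present(ts: datetime, tz: timezone) -> str:
    """Translate a stored timestamp for one user at presentation time."""
    return ts.astimezone(tz).strftime("%Y-%m-%d %H:%M:%S %Z")


print(present(stored, JST))  # 2002-09-11 13:01:16 JST
print(present(stored, EDT))  # 2002-09-11 00:01:16 EDT
```

Both users see different text, but the stored value never changes, which
is exactly what a universal character encoding would have to provide.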

-- 
								Greg A. Woods

+1 416 218-0098;            <g.a.woods@ieee.org>;           <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>