IETF-SSH archive


Re: SFTP and unicode file names...



>>> [...filenames...UTF-8...character set...conversions...]

I see a philosophical problem here that I suspect is at the root of
many of these difficulties.

What are file names?

On most Unix variants, file names are opaque octet sequences (with 0x00
octets not permitted and 0x2f octets permitted only as component
separators).  (Some very old variants also forbid octets 0x80-0xff.)

Note that they are not character sequences.  Interpreting those octets
as characters is something done only for human-interaction purposes and
is done using whatever character set the display device is using.
Thus, even _contemplating_ doing character set conversions anywhere is,
from a Unix perspective, a grave mistake.
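
To make the octet-sequence view concrete, here is a minimal Python sketch.
The helper name is_valid_unix_component is hypothetical, and it checks only
the two restrictions mentioned above (no 0x00, no 0x2f inside a component)
plus non-emptiness; real systems may add limits such as a maximum length.

```python
def is_valid_unix_component(name: bytes) -> bool:
    """Check one path component under the rules described above.

    Hypothetical helper: a component is any non-empty octet sequence
    that contains neither 0x00 nor 0x2f ("/").
    """
    return len(name) > 0 and b"\x00" not in name and b"/" not in name


def decodes_as_utf8(name: bytes) -> bool:
    """Report whether the octets happen to be well-formed UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same "name" as a human reads it, stored as two different octet sequences.
latin1_name = b"caf\xe9"      # "cafe" with e-acute, as ISO 8859-1 octets
utf8_name = b"caf\xc3\xa9"    # the same text, as UTF-8 octets

# Both are acceptable file names; only one of them is valid UTF-8.
print(is_valid_unix_component(latin1_name), decodes_as_utf8(latin1_name))
print(is_valid_unix_component(utf8_name), decodes_as_utf8(utf8_name))
```

The point of the sketch is that validity as a file name and validity in any
particular character encoding are independent questions, and that the two
octet sequences name two different files even though they display alike.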

However, other systems complicate matters.  I have seen it said (with
what truth I know not - I am no Windows geek) that Windows filenames
are sequences of Unicode characters, or more precisely sequences of
Unicode codepoints, from which perspective character set conversions
are perfectly reasonable.

I see no good way to cater to both philosophies other than either to
have a bit indicating whether a given filename is an octet sequence or
(an encoding of) a character sequence, or perhaps to have a character
set indicator but with a special value reserved to indicate "raw octet
sequence".
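
As a rough illustration of the second option, here is a Python sketch of a
file name that carries a character set indicator, with one value reserved
to mean "raw octet sequence".  Nothing here comes from any SFTP draft: the
class TaggedFilename, its fields, and the choice of None as the reserved
value are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaggedFilename:
    """Hypothetical file name carrying its own interpretation.

    charset is a codec name such as "utf-8", or None, which this sketch
    reserves to mean "raw octet sequence, do not convert".
    """
    octets: bytes
    charset: Optional[str] = None

    def as_text(self) -> Optional[str]:
        """Return the character sequence, or None for raw octets."""
        if self.charset is None:
            return None
        return self.octets.decode(self.charset)

    def display(self) -> str:
        """Render for a human without changing the stored octets."""
        if self.charset is None:
            # Raw names: show non-ASCII octets as \xNN escapes.
            return self.octets.decode("ascii", errors="backslashreplace")
        return self.octets.decode(self.charset)


unix_name = TaggedFilename(b"caf\xe9")                         # opaque octets
text_name = TaggedFilename("caf\u00e9".encode("utf-8"), "utf-8")  # characters

print(unix_name.as_text(), unix_name.display())
print(text_name.as_text(), text_name.display())
```

With a tag like this, a receiver would convert character sets only for
names that declare one, and would pass raw names through untouched.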

Thoughts?  Agree?  Disagree?

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse%rodents.montreal.qc.ca@localhost
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


