Subject: Re: utf-8 and userland
To: None <tech-userlevel@NetBSD.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-userlevel
Date: 03/12/2004 21:32:21
> I wonder if the day isn't past that ls(1) should worry about this.

I am inclined to agree with this.  However...

> If the filenames are UTF-8 and the fonts are UTF-8 and the xterm is
> UTF-8 but LC_CTYPE is wrong, should ls refuse to copy the filenames
> to stdout unmolested?

Yes, I think so, unless it is decided that ls should (FWVO "should")
always pass all filenames to stdout unmolested.

> OTOH, if LC_CTYPE is UTF-8 but one of those other things is out of
> whack, then it should assume everything's kosher?

Again, yes, I believe so.

> Clearly, LC_CTYPE is a pretty poor window on what's what.

Yes, but it is the only one ls has.  Setting your environment variables
so as to lie to ls about what kind of display environment you have
counts, to my mind, as pilot error pure and simple.  LC_CTYPE and its
ilk exist in order to make it possible to communicate that information
to applications, after all.

> The we-sell-rope principle suggests ls(1) should write the filenames
> to its output, and let the bytes fall where they may.

Yes, and I'm at least somewhat inclined to that point of view.  But I'm
also somewhat inclined to the contrary; I don't like the idea of
someone naming a file with an escape sequence that programs a terminal's
answerback message and then requests that the terminal send its
answerback, then waiting until root does an ls on it.

I see no good balance between the two.  Lacking that, I prefer to err
on the side of more-secure operation, which means having ls, at the
very least, default to censoring octets that are known to be capable of
introducing such escape sequences in not-too-uncommon environments (ESC
and CSI come to mind immediately).
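
To make the censoring idea concrete, here is a minimal sketch, assuming
the only octets of interest are ESC (0x1B) and the 8-bit CSI (0x9B).
The name censor_name and the '?' replacement are hypothetical choices
for illustration; this is not the code ls actually uses.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not taken from ls(1).  Copies src into dst,
 * replacing ESC (0x1B) and the 8-bit CSI (0x9B) with '?'.  At most
 * dstlen - 1 octets are copied and dst is always NUL-terminated
 * when dstlen > 0.  Returns the number of octets written, not
 * counting the terminating NUL.
 */
size_t
censor_name(const char *src, char *dst, size_t dstlen)
{
	size_t i = 0;

	if (dstlen == 0)
		return 0;
	for (; src[i] != '\0' && i + 1 < dstlen; i++) {
		unsigned char c = (unsigned char)src[i];

		/* Assumption: only these two octets are censored. */
		dst[i] = (c == 0x1B || c == 0x9B) ? '?' : (char)c;
	}
	dst[i] = '\0';
	return i;
}
```

Note that 0x9B is also a legal UTF-8 continuation octet (U+00DB
encodes as C3 9B, for example), so a filter this blunt would damage
some valid UTF-8 names.  That is the tension discussed above: without
trusting LC_CTYPE, the filter cannot tell which reading is intended.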

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B