NetBSD-Users archive
UTF-8 (Re: QT4 file widget doesn't see image with %C3%AD in file name)
(Posted to netbsd-users since most of it isn't directly relevant on
pkgsrc-users. Also, it would be nice to see some wider discussion about
this sort of thing.)
On Wed, 22 Oct 2008, Jeremy C. Reed wrote:
> I don't know if this is a NetBSD problem, a QT4 problem or an LyX problem.
> (I have latest LyX ready to commit to pkgsrc.)
I haven't followed pkgsrc-users in a while, and I'm not familiar with QT,
LyX, or even much of NetBSD's handling of this kind of stuff.
But I know a little bit about text encoding!
So the problem here is that you have a filename with a non-ASCII character
in it. (well, *duh*, you say..)
http://en.wikipedia.org/wiki/Image:Presidio_La_Bah%C3%ADa.jpg
That's URL-escaped UTF-8 encoded Unicode. The "i"-with-accent character
can be represented in one byte in some encodings (0xED in ISO 8859-1), but
here it is UTF-8 encoded as two bytes, and since those bytes are not valid
ASCII they get escaped as %C3%AD in the URL.
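A quick way to see the byte-level picture is a short Python sketch (Python is just a convenient choice here, nothing from the thread). It undoes the URL escaping and compares the two encodings of the accented i:

```python
from urllib.parse import unquote_to_bytes

# Undo the URL escaping: %C3%AD becomes the two raw bytes 0xC3 0xAD.
raw = unquote_to_bytes("Presidio_La_Bah%C3%ADa.jpg")
print(raw)                       # b'Presidio_La_Bah\xc3\xada.jpg'

# Decoded as UTF-8, those two bytes are a single character, U+00ED.
name = raw.decode("utf-8")
print(len(raw), len(name))       # 22 21
print(f"U+{ord(name[15]):04X}")  # U+00ED

# The same character takes one byte in ISO 8859-1 and two in UTF-8.
print("\u00ed".encode("iso-8859-1"))  # b'\xed'
print("\u00ed".encode("utf-8"))       # b'\xc3\xad'
```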
> Seems that the filename is saved UTF-8 encoded (with 0xC3 0xAD) too:
> My xterm shows it as (with two question marks):
> Presidio_La_Bah??a.jpg
"ls" refuses to emit those non-ASCII bytes and goes with '?' mangling
instead. (That's -q, which the man page describes as "the default when
output is to a terminal".)
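To make the two-question-marks clue concrete, here is a simplified model of what ls -q does under a plain 7-bit ("C") locale. The function mangle_like_ls_q is hypothetical: it only approximates ls by treating every byte outside printable ASCII as non-printable, and it is not NetBSD's actual implementation.

```python
def mangle_like_ls_q(name: bytes) -> str:
    """Hypothetical approximation of `ls -q` in a 7-bit locale: any byte
    outside printable ASCII (0x20..0x7E) is shown as '?'."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "?" for b in name)

utf8_name = "Presidio_La_Bah\u00eda.jpg".encode("utf-8")
latin1_name = "Presidio_La_Bah\u00eda.jpg".encode("iso-8859-1")

print(mangle_like_ls_q(utf8_name))    # Presidio_La_Bah??a.jpg
print(mangle_like_ls_q(latin1_name))  # Presidio_La_Bah?a.jpg
```

The number of '?' marks gives the encoding away: two for UTF-8, one for ISO 8859-1. Either way, none of the non-ASCII bytes make it to the screen.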
> Copying and pasting the name from Mozilla beeps and loses the character:
> Presidio La Baha.jpg
And when you input that into the xterm directly, via pasting, your shell
takes it as a personal insult and refuses to deal with those weird foreign
bytes because they smell funny.
(Bytes, or just byte. Since both [0xC3 0xAD] and [0xED] are non-ASCII,
you'd see the same thing.)
> Is this a NetBSD problem that NetBSD should fix? A NetBSD problem that Qt
> should work around? A QT problem?
I'd personally guess that the immediate problem is in Qt. Both bytes
should be printable as regular characters on a system that handles
ISO 8859-1 (0xC3 is "Ã"), but the second (0xAD, "soft hyphen", normally
invisible) seems to offer some opportunity for getting it wrong. Or the
pair might even be recognized as UTF-8 by Qt...
(It's funny, though. Opera (which uses Qt) collapses the %C3%AD in the
URL to a single character in the address bar. Copy / paste from the page
works just fine, both to a UTF-8-using mlterm and to an ISO 8859-using
wterm (rxvt), which is kind of surprising if you think about it.)
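The soft-hyphen trap, and the "recognize it as UTF-8" alternative, can be sketched like this. decode_filename is a hypothetical heuristic for illustration only; it is not how Qt (or Opera) actually decides.

```python
def decode_filename(raw: bytes) -> str:
    """Hypothetical display heuristic: try UTF-8 first, and only if the
    bytes are not valid UTF-8 fall back to ISO 8859-1 (which accepts
    every possible byte, so it never fails)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")

raw = b"Presidio_La_Bah\xc3\xada.jpg"

# Read as ISO 8859-1, the pair becomes 'A-tilde' plus an invisible soft hyphen.
wrong = raw.decode("iso-8859-1")
print([f"U+{ord(c):04X}" for c in wrong[15:17]])  # ['U+00C3', 'U+00AD']

# The heuristic recovers the intended single character.
print(f"U+{ord(decode_filename(raw)[15]):04X}")   # U+00ED

# A genuine ISO 8859-1 name is not valid UTF-8, so it falls through.
latin1 = b"Bah\xeda.jpg"
print(f"U+{ord(decode_filename(latin1)[3]):04X}")  # U+00ED
```

The fallback is safe in the sense that it always produces something, and in practice most non-ASCII ISO 8859-1 text is not valid UTF-8, so the guess is usually right.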
But apart from that, there's still ls, the terminal emulator and the
shell. You can use a good terminal emulator (x11/mlterm is nice IMO), and
use "ls -w" (apparently, I just looked it up) and the filename should show
up correctly in a listing.
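ls -w and a UTF-8 terminal can work because nothing underneath has mangled the name: to the file system it is just a byte string with no encoding attached. A small sketch of that, assuming a Unix-like system whose file system stores names verbatim (some systems normalize Unicode names, in which case the bytes could differ):

```python
import os
import tempfile

# The name from the thread, as the raw UTF-8 bytes.
raw_name = b"Presidio_La_Bah\xc3\xada.jpg"

with tempfile.TemporaryDirectory() as d:
    dir_bytes = os.fsencode(d)
    # Create an empty file using the byte-string name directly.
    open(os.path.join(dir_bytes, raw_name), "wb").close()
    # Listing with a bytes path returns names exactly as stored, undecoded.
    entries = os.listdir(dir_bytes)

print(entries)                 # [b'Presidio_La_Bah\xc3\xada.jpg']
print(entries[0] == raw_name)  # True
```

Whether those bytes end up on screen as an accented i, as "Ã" plus a soft hyphen, or as "??" is decided later, by ls and the terminal emulator.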
If you can figure out how to tease your shell out of the 1970s, cut and
paste should work just fine too. It is not hard, but it probably differs
a lot from shell to shell.
> Any suggestions?
Round up everyone who still thinks 7-bit ASCII is a good idea and deport
them to Venus.
Magnus