Current-Users archive
Re: IDN hostname resolution in NetBSD
Geoff Adams wrote:
On 27 May 2010, at 3:59 AM, Johnny Billquist wrote:
Personally, I stay far away from UTF-8 whenever I can. It's not a good
solution, the only problem being that it's now the standard, so no other better
solution is going to come along. :-(
(Actually, Unicode is part of the problem, but that is here to stay as well.)
Hmmm. I don't particularly want to get into a character set or encoding war,
here, but I've been working extensively with multilingual systems for over a
decade, now, and I find the increasing adoption of Unicode to be a welcome
development. It's finally possible to operate seamlessly in multiple languages
at once, and I only wish there were more universal adoption.
Oh, no doubt about the fact that some solution was needed. It's just
that Unicode was/is a bad compromise in many ways. That was what my
complaint was about, not that a solution was needed, and Unicode fills
that need. I just wish that Unicode had had a concept instead of being
designed by a committee who couldn't decide what they wanted to do, or how.
I won't argue about the "correctness" or "bestness" of Unicode; it largely
doesn't matter. We needed a universal character set, and now we've got one. Its existence as a
standard is, in fact, its most useful property.
I agree. And since my complaint was about the "goodness" of Unicode, we
don't need to argue at all then.
UTF-8, though, is a fantastic thing. It's very well thought-out and has several
technical features that make it really useful. Combined with the feature that
US-ASCII, the ISO 8859-* character sets (including 8859-1, aka Latin 1), and
Unicode all have the same first 128 code points, UTF-8's encoding of the first
128 code points of Unicode using the exact same bytes means that any US-ASCII
text can be correctly interpreted as UTF-8. This makes it very easy to take
many systems that were implemented only understanding US-ASCII and convert them
to full Unicode support without breaking backward compatibility. In some cases,
code doesn't have to change at all.
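The compatibility property described above can be checked with a minimal Python sketch. The sample hostname is made up for illustration; only the standard `ascii` and `utf-8` codecs are used.

```python
# Any pure US-ASCII byte string is also valid UTF-8 with the same meaning.
ascii_bytes = b"netbsd.example.org"  # hypothetical sample text
as_ascii = ascii_bytes.decode("ascii")
as_utf8 = ascii_bytes.decode("utf-8")
same_text = as_ascii == as_utf8

# A code point above 127 is encoded as two or more bytes, each with the
# high bit set, so none of them can be mistaken for a US-ASCII byte.
encoded = "ö".encode("utf-8")
high_bits_set = all(b >= 0x80 for b in encoded)

print(same_text, encoded, high_bits_set)
```

Note that this only holds for the 7-bit range: bytes 128-255 of an ISO 8859-* text are not, in general, valid UTF-8.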
Bleh. And then total confusion ensues because someone throws in some
other encoding, and things just get messed up. Not to mention all the
programs that assume a fixed-width font and try to make output
line up in columns, which get totally lost when UTF-8 characters come
along, since the column count suddenly no longer matches the
number of bytes output. (And there is a lot of that out there.)
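Both complaints are easy to reproduce. The following is only a sketch: the sample word and the 8-column field width are arbitrary choices for illustration.

```python
name = "Malmö"  # arbitrary sample containing one non-ASCII character

byte_count = len(name.encode("utf-8"))  # bytes written to the terminal
char_count = len(name)                  # code points in the string

# Padding computed from the byte count leaves the field one column short;
# padding computed from the code point count fills all 8 columns here.
field_width = 8
padded_by_bytes = name + " " * (field_width - byte_count)
padded_by_chars = name.ljust(field_width)

# Mixing encodings: the Latin-1 bytes for the same word are not valid UTF-8.
latin1_bytes = name.encode("latin-1")
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

print(byte_count, char_count, latin1_is_valid_utf8)
```

Counting code points is itself only an approximation of display width, since combining characters and double-width East Asian characters do not occupy exactly one column each.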
All that aside, I really just wanted to make a point about why Chris's attempts
to set an IDN hostname are important. Now that the root name servers contain
some IDN zones, there are real-live non-US-ASCII domain names in the wild right
now. (Well, or punycode domain names, if you prefer.) For instance:
http://وزارة-الاتصالات.مصر/ (that's Arabic for <Ministry of
Communications>.<Egypt>)
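To illustrate the punycode form mentioned above, here is a small sketch using Python's built-in `idna` codec (which implements the older IDNA 2003 rules). The Arabic spelling of the name is an assumption reconstructed from the description given here.

```python
# Assumed spelling of <Ministry of Communications>.<Egypt>
host = "وزارة-الاتصالات.مصر"

# Each non-ASCII label is converted separately to an ASCII "xn--" label.
ace = host.encode("idna")
labels = ace.split(b".")

# The conversion is reversible, so the Unicode form can be recovered.
round_trip = ace.decode("idna")

print(ace)
print(round_trip == host)
```

Only the ASCII form ever appears in DNS queries and zone data; the Unicode form is a matter of presentation in applications.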
This is a mixed blessing.
I have no way of either typing that, understanding what it is, what it
points at, or even if some other URL might be the same one, since I
cannot even remember the glyphs long enough to compare it to something
I'm looking at 2 seconds later.
Now, is this a good thing? I'd hesitate to say yes. It's the same thing
with user names, and email addresses.
Adapting these things to every language and character set in the world
means that the Internet will eventually cease to be the Internet,
and will become more limited, narrow and local from everyone's view.
Of course, it's good from the point of view of those who currently don't
understand or write in English, and currently don't access the Internet
at all, but is the solution really to divide the whole idea up in small
subpartitions?
Of course, even today, that web server could live on NetBSD, since (a) it's
probably using virtual hosting (in fact, that server's PTR record is
'mcit.gov.eg'), and (b) it could have its hostname be the punycode version of
that name even if it weren't.
Still, it will be more and more desirable to have hostnames (which should,
really, match the host's name in DNS) be IDNs, and it would be really good if
NetBSD could support this.
The in-kernel hostname, ideally, should be in something like Unicode (as UTF-8
would be nice), and only DNS-resolving software would need to worry about the
conversion to punycode. Of course, there's a lot of DNS-aware software out
there. How much of it is also aware of the local hostname? Certainly, MTAs are.
And, of course, this also brings up all the issues of what encoding the user's
locale is set to, the other issues Chris brought up, and no doubt yet others.
It won't be a quick fix, but will take some careful planning and coordination.
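A rough sketch of that division of labour, in Python: the stored hostname is treated as opaque bytes in some encoding, and only the code about to talk to DNS converts it to the ASCII form. The function name `hostname_for_dns` and the `malmö.example` name are hypothetical, chosen for illustration; nothing here reflects an actual NetBSD interface.

```python
def hostname_for_dns(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Convert a stored hostname to the ASCII form used in DNS queries.

    `raw` is the hostname as stored (assumed UTF-8 unless told otherwise).
    Decoding with the wrong encoding either fails or produces a different
    name, which is why the encoding has to be agreed on system-wide.
    """
    text = raw.decode(encoding)
    return text.encode("idna")


# A plain ASCII hostname passes through unchanged.
plain = hostname_for_dns(b"mcit.gov.eg")

# The same non-ASCII name stored in two different encodings yields the same
# DNS form, but only if the caller knows which encoding was used.
from_utf8 = hostname_for_dns("malmö.example".encode("utf-8"))
from_latin1 = hostname_for_dns("malmö.example".encode("latin-1"), "latin-1")

print(plain, from_utf8, from_utf8 == from_latin1)
```

Software that never resolves the local hostname would keep passing the stored bytes around untouched under this scheme.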
Well, one part of me is saying that we should allow the hostname to be
anything. Why put limits on it? Another part of me really dislikes where
this will all end up.
But that is another story. :-)
Johnny
--
Johnny Billquist || "I'm on a bus
|| on a psychedelic trip
email: bqt%softjar.se@localhost || Reading murder books
pdp is alive! || tryin' to stay hip" - B. Idol