Current-Users archive


Re: IDN hostname resolution in NetBSD



Geoff Adams wrote:
On 27 May 2010, at 3:59 AM, Johnny Billquist wrote:

Personally, I stay far away from UTF-8 whenever I can. It's not a good 
solution, the only problem being that it's now the standard, so no other better 
solution is going to come along. :-(
(Actually, Unicode is part of the problem, but that is here to stay as well.)

Hmmm. I don't particularly want to get into a character set or encoding war, 
here, but I've been working extensively with multi-lingual systems for over a
decade, now, and I find the increasing adoption of Unicode to be a welcome 
development. It's finally possible to operate seamlessly in multiple languages 
at once, and I only wish there were more universal adoption.

Oh, no doubt about the fact that some solution was needed. It's just that Unicode was/is a bad compromise in many ways. That was what my complaint was about, not that a solution was needed, and Unicode fills that need. I just wish that Unicode had had a concept instead of being designed by a committee who couldn't decide what they wanted to do, or how.

I won't argue about the "correctness" or "bestness" of Unicode; it largely
doesn't matter. We needed a universal character set, and now we've got one. Its existence as a 
standard is, in fact, its most useful property.

I agree. And since my complaint was about the "goodness" of Unicode, we don't need to argue at all then.

UTF-8, though, is a fantastic thing. It's very well thought-out and has several 
technical features that make it really useful. Combined with the feature that 
US-ASCII, the ISO 8859-* character sets (including 8859-1, aka Latin 1), and
Unicode all have the same first 128 code points, UTF-8's encoding of the first
128 code points of Unicode using the exact same bytes means that any US-ASCII
text can be correctly interpreted as UTF-8. This makes it very easy to take 
many systems that were implemented only understanding US-ASCII and convert them 
to full Unicode support without breaking backward compatibility. In some cases, 
code doesn't have to change at all.
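
The compatibility property described above can be checked directly. What follows is only a minimal sketch in Python, not anything from NetBSD; the helper name is_pure_ascii and the sample strings are made up for illustration.

```python
def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte is below 0x80, i.e. the data is plain US-ASCII."""
    return all(byte < 0x80 for byte in data)


# Bytes written by an ASCII-only system (hypothetical sample).
legacy = b"Subject: hello from an ASCII-only system"

# The same bytes decode to the same text under either codec.
assert legacy.decode("ascii") == legacy.decode("utf-8")

# A non-ASCII character becomes a multi-byte sequence in which every
# byte has the high bit set, so it cannot be mistaken for ASCII.
mixed = "Malm\u00f6".encode("utf-8")
non_ascii = [byte for byte in mixed if byte >= 0x80]
```

Here non_ascii holds the two bytes that encode the "ö", and all other bytes in mixed are the unchanged ASCII codes for "Malm".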

Bleh. And then total confusion ensues because someone throws in some other encoding, and things just get messed up. Not to mention all the programs that assume a fixed-width font and try to make output line up in columns, which get totally lost when UTF-8 characters come along, since the column count suddenly no longer matches the number of bytes output. (And there is a lot of that out there.)
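
To illustrate the column problem: a rough sketch in Python of the difference between counting bytes and counting display columns. The helpers display_width and pad_to_column are hypothetical names, and the width rule used here (combining marks take zero columns, East Asian wide characters take two) is a simplification that real terminals do not always follow.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate the number of terminal columns that text occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn on the previous character
        # Wide and fullwidth East Asian characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def pad_to_column(text: str, columns: int) -> str:
    """Pad with spaces based on display width rather than byte count."""
    return text + " " * max(0, columns - display_width(text))


name = "Malm\u00f6"
byte_count = len(name.encode("utf-8"))  # what a byte-counting program sees
column_count = display_width(name)      # what the terminal actually shows
```

A program that pads using byte_count would leave this field one space short, which is exactly the misalignment described above.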

All that aside, I really just wanted to make a point about why Chris's attempts 
to set an IDN hostname are important. Now that the root name servers contain 
some IDN zones, there are real-live non-US-ASCII domain names in the wild right 
now. (Well, or punycode domain names, if you prefer.) For instance:

        http://وزارة-الاتصالات.مصر/        (that's Arabic for <Ministry of
Communications>.<Egypt>)

This is a mixed blessing.
I have no way of typing that, of understanding what it is or what it points at, or even of telling whether some other URL is the same one, since I cannot remember the glyphs long enough to compare them to something I'm looking at 2 seconds later.

Now, is this a good thing? I'd hesitate to say yes. It's the same thing with user names, and email addresses.

Adapting these things to every language and character set in the world means that the Internet will eventually cease to be the Internet, and will become more limited, narrow and local from everyone's point of view.

Of course, it's good from the point of view of those who currently don't understand or write in English, and currently don't access the Internet at all, but is the solution really to divide the whole idea up into small subpartitions?

Of course, even today, that web server could live on NetBSD, since (a) it's 
probably using virtual hosting (in fact, that server's PTR record is 
'mcit.gov.eg'), and (b) it could have its hostname be the punycode version of 
that name even if it weren't.

Still, it will be more and more desirable to have hostnames (which should, 
really, match the host's name in DNS) be IDNs, and it would be really good if 
NetBSD could support this.

The in-kernel hostname, ideally, should be in something like Unicode (as UTF-8 
would be nice), and only DNS-resolving software would need to worry about the 
conversion to punycode. Of course, there's a lot of DNS-aware software out 
there. How much of it is also aware of the local hostname? Certainly, MTAs are.
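
The split suggested above (UTF-8 for the stored hostname, punycode only at the DNS boundary) might look roughly like this. It is a sketch in Python rather than anything NetBSD does: the function names are hypothetical, the name bücher.example is invented, and Python's built-in "idna" codec implements the older IDNA 2003 rules, so treat it as an illustration only.

```python
def hostname_for_dns(stored: bytes) -> str:
    """Convert a UTF-8 hostname, as the kernel might store it, to its ASCII DNS form."""
    name = stored.decode("utf-8")
    # The idna codec converts each non-ASCII label to its "xn--" punycode form.
    return name.encode("idna").decode("ascii")


def hostname_from_dns(wire: str) -> str:
    """Convert an ASCII DNS name back to Unicode for display."""
    return wire.encode("ascii").decode("idna")


# Hypothetical stored hostname containing one non-ASCII label.
stored_hostname = "b\u00fccher.example".encode("utf-8")
wire_name = hostname_for_dns(stored_hostname)

# A plain ASCII hostname passes through unchanged.
ascii_name = hostname_for_dns(b"mcit.gov.eg")
```

Only the resolver-facing code would call hostname_for_dns; everything else would keep working with the UTF-8 form.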

And, of course, this also brings up all the issues of what encoding the user's 
locale is set to, the other issues Chris brought up, and no doubt yet others. 
It won't be a quick fix, but will take some careful planning and coordination.

Well, one part of me is saying that we should allow the hostname to be anything. Why put limits on it? Another part of me really dislikes where this will all end up.

But that is another story. :-)

        Johnny

--
Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: bqt%softjar.se@localhost             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol


