Re: IDN hostname resolution in NetBSD

To: Geoff Adams <gadams+netbsd%avernus.com@localhost>
Subject: Re: IDN hostname resolution in NetBSD
From: Johnny Billquist <bqt%softjar.se@localhost>
Date: Fri, 28 May 2010 14:03:18 +0200

Geoff Adams wrote:

On 27 May 2010, at 3:59 AM, Johnny Billquist wrote:

Personally, I stay far away from UTF-8 whenever I can. It's not a good 
solution, the only problem being that it's now the standard, so no other better 
solution is going to come along. :-(
(Actually, Unicode is part of the problem, but that is here to stay as well.)


Hmmm. I don't particularly want to get into a character set or encoding war, 
here, but I've been working extensively with multi-lingual systems for over a 
decade, now, and I find the increasing adoption of Unicode to be a welcome 
development. It's finally possible to operate seamlessly in multiple languages 
at once, and I only wish there were more universal adoption.

Oh, no doubt about the fact that some solution was needed. It's just that 
Unicode was/is a bad compromise in many ways. That was what my complaint was 
about, not that a solution was needed, and Unicode fills that need. I just 
wish that Unicode had had a concept instead of being designed by a committee 
who couldn't decide what they wanted to do, or how.

I won't argue about the "correctness" or "bestness" of Unicode; it largely 
doesn't matter. We needed a universal character set, and now we've got one. Its existence as a 
standard is, in fact, its most useful property.

I agree. And since my complaint was about the "goodness" of Unicode, we 
don't need to argue at all then.

UTF-8, though, is a fantastic thing. It's very well thought-out and has several 
technical features that make it really useful. Combined with the feature that 
US-ASCII, the ISO 8859-* character sets (including 8859-1, aka Latin 1), and 
Unicode all have the same first 128 code points, UTF-8's encoding of the first 
128 code points of Unicode using the exact same bytes means that any US-ASCII 
text can be correctly interpreted as UTF-8. This makes it very easy to take 
many systems that were implemented only understanding US-ASCII and convert them 
to full Unicode support without breaking backward compatibility. In some cases, 
code doesn't have to change at all.
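
A minimal sketch of the compatibility property described above, in Python. The sample hostname is a made-up illustration, not something from this thread:

```python
# Any pure US-ASCII byte string is also valid UTF-8 and decodes to the same text.
ascii_bytes = b"netbsd.example.org"   # hypothetical sample hostname

as_ascii = ascii_bytes.decode("ascii")
as_utf8 = ascii_bytes.decode("utf-8")
same_text = (as_ascii == as_utf8)

# The reverse holds too: code points below 128 encode to the same single bytes.
round_trip = (as_utf8.encode("utf-8") == ascii_bytes)

print(same_text, round_trip)
```

This only covers the 7-bit range. Bytes above 0x7F in an ISO 8859-* text are not valid UTF-8 as they stand, so such data still needs an explicit conversion.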

Bleh. And then total confusion ensues because someone throws in some other 
coding, and things just get messed up. Not to mention all the programs that 
assume a fixed-width font and try to make output line up in columns, which 
get totally lost when UTF-8 characters come along, since suddenly the column 
count doesn't match the number of bytes output. (And there is a lot of that 
out there.)
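
The column problem can be sketched roughly as follows, in Python. The helper names `display_columns` and `pad_to` are hypothetical, and the width rule used here (skip combining marks, count East Asian wide and fullwidth characters as two columns) is a simplifying assumption, not a complete terminal-width algorithm:

```python
import unicodedata

def display_columns(text):
    """Approximate the number of terminal columns a string occupies."""
    cols = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks share the column of their base character
        # Wide ("W") and fullwidth ("F") characters take two columns.
        cols += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return cols

def pad_to(text, width):
    """Pad with spaces based on display columns, not on byte length."""
    return text + " " * max(0, width - display_columns(text))

name = "Malm\u00f6"                    # 5 characters, one of them non-ASCII
byte_len = len(name.encode("utf-8"))   # bytes written to the output
col_len = display_columns(name)        # columns seen on screen

print(byte_len, col_len)
print(pad_to(name, 8) + "|")
```

A program that pads by `byte_len` would emit one space too few here, which is the kind of misalignment described above.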

All that aside, I really just wanted to make a point about why Chris's attempts 
to set an IDN hostname are important. Now that the root name servers contain 
some IDN zones, there are real-live non-US-ASCII domain names in the wild right 
now. (Well, or punycode domain names, if you prefer.) For instance:

        http://وزارة-الأتصالات.مصر/        (that's Arabic for <Ministry of 
Communications>.<Egypt>)


This is a mixed blessing.

I have no way of typing that, understanding what it is or what it points at, 
or even telling whether some other URL might be the same one, since I cannot 
even remember the glyphs long enough to compare it to something I'm looking 
at 2 seconds later.

Now, is this a good thing? I'd hesitate to say yes. It's the same thing with 
user names, and email addresses.

Adapting these things to every language and character set in the world means 
that the Internet will eventually cease to be the Internet, and will become 
more limited, narrow and local from everyone's view.

Of course, it's good from the point of view of those who currently don't 
understand or write in English, and currently don't access the Internet at 
all, but is the solution really to divide the whole idea up into small 
subpartitions?

Of course, even today, that web server could live on NetBSD, since (a) it's 
probably using virtual hosting (in fact, that server's PTR record is 
'mcit.gov.eg'), and (b) it could have its hostname be the punycode version of 
that name even if it weren't.

Still, it will be more and more desirable to have hostnames (which should, 
really, match the host's name in DNS) be IDNs, and it would be really good if 
NetBSD could support this.

The in-kernel hostname, ideally, should be in something like Unicode (as UTF-8 
would be nice), and only DNS-resolving software would need to worry about the 
conversion to punycode. Of course, there's a lot of DNS-aware software out 
there. How much of it is also aware of the local hostname? Certainly, MTAs are.
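
As a rough illustration of the split described above (a Unicode hostname stored locally, with conversion to punycode only at the DNS boundary), here is a sketch in Python. It relies on Python's built-in `idna` codec, which implements the older IDNA 2003 rules; the function names and the sample hostname are hypothetical, and nothing here reflects how NetBSD does or would do it:

```python
def to_dns_name(hostname):
    """Convert a Unicode hostname to its ASCII (punycode) form for DNS lookups."""
    return hostname.encode("idna").decode("ascii")

def from_dns_name(dns_name):
    """Convert an ASCII DNS name back to its Unicode form for display."""
    return dns_name.encode("ascii").decode("idna")

local_name = "m\u00fcnchen.example"   # hypothetical hostname as stored locally
wire_name = to_dns_name(local_name)   # what a resolver would actually query
print(wire_name)
print(from_dns_name(wire_name))

# Plain ASCII hostnames pass through unchanged.
print(to_dns_name("mcit.gov.eg"))
```

Only the labels that contain non-ASCII characters are rewritten with the `xn--` prefix; software that never touches DNS could keep working with the Unicode form.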

And, of course, this also brings up all the issues of what encoding the user's 
locale is set to, the other issues Chris brought up, and no doubt yet others. 
It won't be a quick fix, but will take some careful planning and coordination.

Well, one part of me is saying that we should allow the hostname to be 
anything. Why put limits on it? Another part of me really dislikes where 
this will all end up.


But that is another story. :-)

        Johnny

--
Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: bqt%softjar.se@localhost             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol

References:
- IDN hostname resolution in NetBSD
  - From: Chris Ross
- Re: IDN hostname resolution in NetBSD
  - From: Chris Ross
- Re: IDN hostname resolution in NetBSD
  - From: Johnny Billquist
- Re: IDN hostname resolution in NetBSD
  - From: Geoff Adams
