Current-Users archive


Re: IDN hostname resolution in NetBSD



On 27 May 2010, at 3:59 AM, Johnny Billquist wrote:

> Personally, I stay far away from UTF-8 whenever I can. It's not a good 
> solution, the only problem being that it's now the standard, so no other 
> better solution is going to come along. :-(
> (Actually, Unicode is part of the problem, but that is here to stay as well.)

Hmmm. I don't particularly want to get into a character set or encoding war 
here, but I've been working extensively with multilingual systems for over a 
decade now, and I find the increasing adoption of Unicode to be a welcome 
development. It's finally possible to operate seamlessly in multiple languages 
at once, and I only wish there were more universal adoption.

I won't argue about the "correctness" or "bestness" of Unicode; it largely 
doesn't matter. We needed a universal character set, and now we've got one. Its 
existence as a standard is, in fact, its most useful property.

UTF-8, though, is a fantastic thing. It's very well thought-out and has several 
technical features that make it really useful. US-ASCII, the ISO 8859-* 
character sets (including 8859-1, aka Latin 1), and Unicode all have the same 
first 128 code points, and UTF-8 encodes those first 128 code points of Unicode 
using the exact same bytes as US-ASCII, so any US-ASCII text can be correctly 
interpreted as UTF-8. This makes it very easy to take many systems that were 
implemented only understanding US-ASCII and convert them to full Unicode 
support without breaking backward compatibility. In some cases, code doesn't 
have to change at all.
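
To make the compatibility point concrete, here is a minimal Python sketch. The 
sample strings are just illustrations I picked; the only claim is that ASCII 
text encodes to the same bytes under UTF-8, and that non-ASCII characters use 
only bytes with the high bit set, so they can't be mistaken for ASCII.

```python
# Any US-ASCII text has exactly the same byte representation in UTF-8.
ascii_text = "netbsd.org"  # illustrative sample string
ascii_bytes = ascii_text.encode("ascii")
utf8_bytes = ascii_text.encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes

# An ASCII-only byte string therefore decodes cleanly as UTF-8.
round_trip = ascii_bytes.decode("utf-8")

# A non-ASCII character becomes a multi-byte sequence in which every
# byte is >= 0x80, so it never collides with an ASCII byte.
non_ascii = "é".encode("utf-8")
all_high = all(b >= 0x80 for b in non_ascii)
```

This is why byte-oriented code that only ever looks for ASCII delimiters (such 
as '.' or '/') often keeps working when handed UTF-8.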

All that aside, I really just wanted to make a point about why Chris's attempts 
to set an IDN hostname are important. Now that the root name servers contain 
some IDN zones, there are real, live non-US-ASCII domain names in the wild 
right now. (Well, or punycode domain names, if you prefer.) For instance:

        http://وزارة-الأتصالات.مصر/
        (that's Arabic for <Ministry of Communications>.<Egypt>)

Of course, even today, that web server could live on NetBSD, since (a) it's 
probably using virtual hosting (in fact, that server's PTR record is 
'mcit.gov.eg'), and (b) it could have its hostname be the punycode version of 
that name even if it weren't.

Still, it will be more and more desirable to have hostnames (which should, 
really, match the host's name in DNS) be IDNs, and it would be really good if 
NetBSD could support this.

The in-kernel hostname, ideally, should be in something like Unicode (as UTF-8 
would be nice), and only DNS-resolving software would need to worry about the 
conversion to punycode. Of course, there's a lot of DNS-aware software out 
there. How much of it is also aware of the local hostname? Certainly, MTAs are.
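
As a rough sketch of that split (hostname kept as UTF-8, converted only at the 
DNS boundary), here is how it might look in Python. The function names and the 
sample hostname are hypothetical, and nothing here reflects what NetBSD 
actually does; note also that Python's built-in "idna" codec implements the 
older IDNA 2003 rules.

```python
def to_dns_name(hostname_utf8: bytes) -> bytes:
    """Convert a UTF-8 hostname to its ASCII (punycode) form for DNS.

    Hypothetical helper: only code that talks to DNS would call this.
    """
    return hostname_utf8.decode("utf-8").encode("idna")


def from_dns_name(dns_name: bytes) -> str:
    """Convert an ASCII (punycode) DNS name back to Unicode text."""
    return dns_name.decode("idna")


# Assume the system stores the hostname as UTF-8 bytes (an assumption).
stored = "bücher.example".encode("utf-8")
wire = to_dns_name(stored)

# A plain US-ASCII hostname passes through unchanged.
plain = to_dns_name(b"mcit.gov.eg")
```

Everything that isn't DNS-aware would just see the UTF-8 name; something like 
an MTA would have to decide which form it wants in each context.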

And, of course, this also brings up all the issues of what encoding the user's 
locale is set to, the other issues Chris brought up, and no doubt yet others. 
It won't be a quick fix, but will take some careful planning and coordination.

- Geoff

