UTF-8 cleanliness

Is it assumed that any part of pkgsrc can contain byte sequences that
are not valid UTF-8?

There are a number of places where this is always going to be the
case: for example, some of the aspell-* packages install specific
language files, and some DESCR files contain author names.

Is there any reason why a package name couldn't contain such
characters?  Would we ever want it to?  The same questions apply to
anything else that might be meaningfully used.

Ideally we'd have some documentation which is explicit about what
formats are supported in various parts of the infrastructure, but I've
not found much.

(Background: I'm working on something that parses various things and
I'm now coming up against this.)
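
To make the problem concrete, here is a minimal sketch of the kind of
check a parser might need, assuming Python.  The function names are
hypothetical and not part of pkgsrc or of any existing tool; the
sample bytes are made up for illustration.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that is not valid UTF-8,
    or None if the whole buffer decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def decode_lossless(data: bytes) -> str:
    """Decode as UTF-8, mapping undecodable bytes to lone surrogates
    so the original bytes can be recovered on re-encoding."""
    return data.decode("utf-8", errors="surrogateescape")


def encode_lossless(text: str) -> bytes:
    """Inverse of decode_lossless()."""
    return text.encode("utf-8", errors="surrogateescape")
```

With something like this, a parser could either reject input where
first_invalid_utf8_offset() returns an offset, or carry the raw bytes
through unchanged via decode_lossless() and encode_lossless().  Which
of the two is appropriate depends on the answer to the questions above.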

Jonathan Perkin  -  Joyent, Inc.

