Re: UTF-8 cleanliness
* On 2019-09-17 at 14:30 BST, Martin Husemann wrote:
> On Tue, Sep 17, 2019 at 03:24:30PM +0200, Hauke Fath wrote:
> > On 2019-09-16 18:10, Kamil Rytarowski wrote:
> > > I would prefer to normalize to ASCII as such things tend to break across
> > > filesystems/setups/archives etc.
> > I would like to second that.
> You are talking about filenames, I am pretty sure the original question was
> about what is in the files.
Well, both really. If you are writing, for example, a pkg_summary
parser, then you need to support ISO-8859 characters from DESCR
files in the DESCRIPTION= fields (we currently have a few files that
contain them), but you would not really want to allow those in
PKGNAME= et al.
The same goes for PLIST files, where filenames are embedded alongside
other entries (@name, @*dep, etc.) that should (ideally) be stricter.
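As a rough illustration of the per-field strictness described above,
here is a minimal sketch in Python. It is only an assumption about how
such a parser might be structured: the function name decode_field and
the STRICT_ASCII_FIELDS table are hypothetical, and the choice of which
fields are strict is illustrative rather than taken from any
specification.

```python
from typing import Tuple

# Hypothetical policy table: fields whose values must be pure ASCII.
# The field names exist in pkg_summary; the policy itself is an assumption.
STRICT_ASCII_FIELDS = {"PKGNAME", "PKGPATH", "DEPENDS"}


def decode_field(line: bytes) -> Tuple[str, str]:
    """Split a raw 'KEY=value' line and decode the value per field policy."""
    key_raw, sep, value_raw = line.rstrip(b"\n").partition(b"=")
    if not sep:
        raise ValueError("not a KEY=value line")
    key = key_raw.decode("ascii")  # keys are expected to be plain ASCII
    if key in STRICT_ASCII_FIELDS:
        # Raises UnicodeDecodeError on any byte >= 0x80.
        return key, value_raw.decode("ascii")
    try:
        # Free-text fields: prefer UTF-8.
        return key, value_raw.decode("utf-8")
    except UnicodeDecodeError:
        # Fall back to ISO-8859-1 for legacy DESCR content. Every byte
        # maps to a code point, so this decode cannot fail.
        return key, value_raw.decode("iso-8859-1")
```

With this sketch, a DESCRIPTION= line containing a Latin-1 byte is
accepted, while the same byte in a PKGNAME= line is rejected with a
UnicodeDecodeError.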
> For pkgsrc filenames I agree - ascii should be enough. For content, I don't
> see why we would support anything but UTF8 (and that only in certain fields,
> so the original question IMHO was a very interesting one).
FWIW I'm in agreement with what others have proposed (permit ISO-8859
filenames, restrict package names and related fields to ASCII, UTF-8
everywhere else). I was just curious whether anyone knew of a
specification somewhere that spelled this out clearly.
Jonathan Perkin - Joyent, Inc. - www.joyent.com