tech-pkg archive

Re: UTF-8 cleanliness

* On 2019-09-17 at 14:30 BST, Martin Husemann wrote:

> On Tue, Sep 17, 2019 at 03:24:30PM +0200, Hauke Fath wrote:
> > On 2019-09-16 18:10, Kamil Rytarowski wrote:
> > > I would prefer to normalize to ASCII as such things tend to break across
> > > filesystems/setups/archives etc.
> > 
> > I would like to second that.
> You are talking about filenames, I am pretty sure the original question was
> about what is in the files.

Well, both really.  If you are writing, for example, a pkg_summary
parser, then you'd need to support ISO-8859 characters from DESCR
files in the DESCRIPTION= fields (we currently have a few files that
match this), but you wouldn't really want to allow them in PKGNAME=
et al.
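
To illustrate the kind of per-field policy I mean, here is a minimal
sketch in Python.  It is not code from any existing tool: the function
names and the choice of which fields count as "strict" are assumptions
made purely for illustration.  It decodes DESCRIPTION-style values as
UTF-8 with an ISO-8859-1 fallback and insists on ASCII for PKGNAME and
a couple of similar fields.

```python
# Hypothetical policy: which fields must be pure ASCII is an assumption,
# standing in for "PKGNAME= et al.".
ASCII_ONLY_FIELDS = {"PKGNAME", "PKGPATH", "DEPENDS"}


def decode_field(name, raw):
    """Decode one field value according to the per-field policy."""
    if name in ASCII_ONLY_FIELDS:
        # Raises UnicodeDecodeError if any byte is outside ASCII.
        return raw.decode("ascii")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Legacy DESCR content: ISO-8859-1 maps every byte, so this
        # fallback cannot fail.
        return raw.decode("iso-8859-1")


def parse_entry(block):
    """Parse one entry of VARIABLE=VALUE lines into a dict of lists."""
    fields = {}
    for line in block.split(b"\n"):
        if not line:
            continue
        key, sep, value = line.partition(b"=")
        if not sep:
            raise ValueError("not a VARIABLE=VALUE line: %r" % line)
        name = key.decode("ascii")
        # Fields such as DESCRIPTION may repeat, so collect values.
        fields.setdefault(name, []).append(decode_field(name, value))
    return fields
```

With this policy a Latin-1 byte in a DESCRIPTION= line is accepted and
converted, while the same byte in PKGNAME= is rejected outright.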

Same goes for PLIST files, where filenames are embedded alongside other
entries (@name, @*dep, etc.) that should (ideally) be stricter.
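
A similarly rough sketch for PLIST lines, again only an illustration:
the function name is made up, and both the "directives must be ASCII"
rule and the rejection of control bytes in filenames are assumptions
of mine, not anything pkgsrc currently enforces.

```python
def check_plist_line(raw):
    """Classify a PLIST line as 'directive' or 'file', or raise ValueError."""
    if raw.startswith(b"@"):
        # Assumed policy: @name, @*dep and friends are ASCII only.
        if not raw.isascii():
            raise ValueError("non-ASCII byte in directive: %r" % raw)
        return "directive"
    # Assumed policy: filenames may use ISO-8859 bytes, but not the
    # C0/C1 control ranges (0x00-0x1f, 0x7f-0x9f).
    for byte in raw:
        if byte < 0x20 or 0x7F <= byte <= 0x9F:
            raise ValueError("control byte in filename: %r" % raw)
    return "file"
```

Here `check_plist_line(b"share/doc/caf\xe9.txt")` is accepted as a
file, whereas the same byte inside an `@name` line is refused.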

> For pkgsrc filenames I agree - ascii should be enough. For content, I don't
> see why we would support anything but UTF8 (and that only in certain fields,
> so the original question IMHO was a very interesting one).

FWIW I'm in agreement with what others have proposed (permit ISO-8859
filenames, restrict package names and related fields to ASCII, UTF-8
everywhere else); I was just curious whether anyone knew of a
specification somewhere that spelled this out clearly.

Jonathan Perkin  -  Joyent, Inc.
