Subject: Re: CVS commit: src/sys/dev/usb
To: Dieter Baron <dillo@danbala.tuwien.ac.at>
From: Tom Spindler <dogcow@babymeat.com>
List: tech-kern
Date: 03/03/2007 22:20:01
> IN need not be NUL-terminated, in which case OUT will not be either.
> Neither the pathname component passed to VOP_LOOKUP, nor the on disk
> structure of at least HFS+ are NUL-terminated, so it makes little
> sense to require NUL-termination in the conversion routine.
Oh, yuck. In that case, yeah.
> > > I'm not sure how to handle invalid input.
> > >
> > > -) fs/unicode.h assumes invalid UTF-8 sequences to be ISO 8859-1
> > > (Latin 1). NB: ISO 8859-1 text has a very low likelihood of being
> > > valid UTF-8.
> >
> > I think this is not reasonable.
Oh, crap. I meant 'this is not UNreasonable'. I don't think a sysctl
knob is needed.
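For illustration, a rough sketch of that fallback. It assumes a length-delimited input buffer (so no NUL-termination is needed), and the function name is made up; this is not the actual fs/unicode.h code. Anything that is not a shortest-form UTF-8 sequence, including the overlong C0 AF, decodes as the single Latin-1 byte it starts with:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode one code point from a length-delimited
 * (not NUL-terminated) UTF-8 buffer. If in[0..] is not a valid,
 * shortest-form UTF-8 sequence, treat the single byte in[0] as an
 * ISO 8859-1 character instead. Surrogate values are passed through;
 * rejecting them is a separate policy decision.
 * Returns the number of input bytes consumed (always >= 1).
 */
static size_t
utf8_decode_latin1_fallback(const uint8_t *in, size_t len, uint32_t *cp)
{
	size_t n, i;
	uint32_t c, min;

	assert(len > 0);
	c = in[0];
	if (c < 0x80) {
		*cp = c;
		return 1;
	} else if ((c & 0xE0) == 0xC0) {
		n = 2; c &= 0x1F; min = 0x80;
	} else if ((c & 0xF0) == 0xE0) {
		n = 3; c &= 0x0F; min = 0x800;
	} else if ((c & 0xF8) == 0xF0) {
		n = 4; c &= 0x07; min = 0x10000;
	} else
		goto latin1;		/* stray continuation or invalid lead byte */

	if (n > len)
		goto latin1;		/* sequence truncated by end of buffer */
	for (i = 1; i < n; i++) {
		if ((in[i] & 0xC0) != 0x80)
			goto latin1;	/* not a continuation byte */
		c = (c << 6) | (in[i] & 0x3F);
	}
	/* Overlong forms and values beyond U+10FFFF are not valid UTF-8. */
	if (c < min || c > 0x10FFFF)
		goto latin1;
	*cp = c;
	return n;

latin1:
	*cp = in[0];		/* ISO 8859-1 maps bytes 1:1 onto U+0000..U+00FF */
	return 1;
}
```

Since the fallback consumes only one byte, the caller simply resumes decoding at the next byte, so a Latin-1 name and a UTF-8 name can both be converted without an error path.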
> > > -) What about UTF-16 surrogates that are not paired?
> >
> > [4] says "Therefore a converter must treat this as an error." I'm
> > inclined to agree.
>
> Okay, so do I. The question was how to treat the error. If it is a
> file name stored on disk, do we drop it and make it impossible to
> access that file? Or do we simply UTF-8 encode the single surrogate,
> returning invalid UTF-8 that can, however, be used to access the file?
I think the latter is preferable.
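Something along these lines, sketched as a hypothetical stand-alone converter (the name and interface are made up, not existing NetBSD code). It takes an explicit length, combines surrogate pairs where it can, and encodes any unpaired surrogate on its own as a three-byte sequence, so the result is not valid UTF-8 but still names the file:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical converter: UTF-16 (host byte order, inlen 16-bit units,
 * no NUL terminator required) to UTF-8. A high surrogate followed by a
 * low surrogate is combined into one 4-byte sequence. An unpaired
 * surrogate is encoded by itself as ED A0 80..ED BF BF, which is not
 * valid UTF-8 but keeps the name usable.
 * Returns the number of bytes written, or (size_t)-1 if out is too small.
 */
static size_t
utf16_to_utf8_lossless(const uint16_t *in, size_t inlen,
    uint8_t *out, size_t outlen)
{
	size_t i, o = 0;
	uint32_t c;

	assert(in != NULL || inlen == 0);
	for (i = 0; i < inlen; i++) {
		c = in[i];
		if (c >= 0xD800 && c <= 0xDBFF && i + 1 < inlen &&
		    in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
			/* high + low surrogate: combine into one code point */
			c = 0x10000 + ((c - 0xD800) << 10) + (in[i + 1] - 0xDC00);
			i++;
		}
		/* a surrogate still in c here is unpaired; encode it as is */
		if (c < 0x80) {
			if (o + 1 > outlen)
				return (size_t)-1;
			out[o++] = (uint8_t)c;
		} else if (c < 0x800) {
			if (o + 2 > outlen)
				return (size_t)-1;
			out[o++] = (uint8_t)(0xC0 | (c >> 6));
			out[o++] = (uint8_t)(0x80 | (c & 0x3F));
		} else if (c < 0x10000) {
			if (o + 3 > outlen)
				return (size_t)-1;
			out[o++] = (uint8_t)(0xE0 | (c >> 12));
			out[o++] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
			out[o++] = (uint8_t)(0x80 | (c & 0x3F));
		} else {
			if (o + 4 > outlen)
				return (size_t)-1;
			out[o++] = (uint8_t)(0xF0 | (c >> 18));
			out[o++] = (uint8_t)(0x80 | ((c >> 12) & 0x3F));
			out[o++] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
			out[o++] = (uint8_t)(0x80 | (c & 0x3F));
		}
	}
	return o;
}
```

For the lookup to work, the reverse conversion has to accept ED A0 80..ED BF BF as surrogates again; a strict UTF-8 decoder will refuse them.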
> The UTF-8 specification explicitly forbids decoding overlong sequences,
> since they may bypass character checks made by Unicode-unaware routines
> (e.g. the pathname splitting on '/'). If we decoded the overlong
> encoding above, we would create a file with '/' as part of its name.
> Problematic at best.
Perhaps (silently) elide them?
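If we do elide them, a hypothetical filter might look like this (name and interface made up for illustration). It drops complete overlong sequences, recognised by a lead byte of C0/C1, or by E0 followed by 80..9F, or F0 followed by 80..8F, and passes every other byte through untouched:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical filter: copy a length-delimited UTF-8 name from in to out,
 * silently dropping overlong encodings such as C0 AF (an overlong '/').
 * All other bytes, valid or not, are copied unchanged; out must have room
 * for len bytes. Returns the number of bytes written.
 */
static size_t
utf8_elide_overlong(const uint8_t *in, size_t len, uint8_t *out)
{
	size_t i = 0, o = 0, n, k;

	assert(out != NULL || len == 0);
	while (i < len) {
		uint8_t b = in[i];

		if (b == 0xC0 || b == 0xC1)
			n = 2;		/* every 2-byte form with C0/C1 is overlong */
		else if (b == 0xE0 && i + 1 < len && in[i + 1] < 0xA0)
			n = 3;		/* E0 80..9F: overlong 3-byte form */
		else if (b == 0xF0 && i + 1 < len && in[i + 1] < 0x90)
			n = 4;		/* F0 80..8F: overlong 4-byte form */
		else
			n = 0;

		if (n != 0 && i + n <= len) {
			/* drop only if all trailing bytes are continuation bytes */
			for (k = 1; k < n; k++)
				if ((in[i + k] & 0xC0) != 0x80)
					break;
			if (k == n) {
				i += n;
				continue;
			}
		}
		out[o++] = in[i++];
	}
	return o;
}
```

With this, "a" C0 AF "b" comes out as "ab", so no '/' ever reaches the pathname code.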