Subject: Re: CVS commit: src/sys/dev/usb
To: Dieter Baron <dillo@danbala.tuwien.ac.at>
From: Tom Spindler <dogcow@babymeat.com>
List: tech-kern
Date: 03/03/2007 22:20:01
>   IN need not be NUL-terminated, in which case OUT will not be either.
> Neither the pathname component passed to VOP_LOOKUP, nor the on disk
> structure of at least HFS+ are NUL-terminated, so it makes little
> sense to require NUL-termination in the conversion routine.

Oh, yuck. In that case, yeah.

> > >   I'm not sure how to handle invalid input.
> > > 
> > >     -) fs/unicode.h assumes invalid UTF-8 sequences to be ISO 8859-1
> > >        (Latin 1).  NB: ISO 8859-1 text has a very low likelihood of being
> > >        valid UTF-8.
> > 
> > I think this is not reasonable.

Oh, crap. I meant 'this is not UNreasonable'. I don't think a sysctl
knob is needed.
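
For concreteness, here is a minimal sketch of the Latin-1 fallback being discussed. It is not the code in fs/unicode.h: the function names are made up for illustration, and it emits plain code points rather than UTF-16 to keep it short. Buffers are length-counted, not NUL-terminated.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the fs/unicode.h API): decode one UTF-8
 * sequence from s[0..n).  Returns its length (1-4) and stores the code
 * point in *cp, or returns 0 if the bytes are not well-formed UTF-8
 * (truncated, bad continuation byte, overlong, surrogate, > U+10FFFF).
 */
static size_t
utf8_decode_one(const uint8_t *s, size_t n, uint32_t *cp)
{
	/* Smallest code point that legitimately needs `len' bytes. */
	static const uint32_t min[5] = { 0, 0, 0x80, 0x800, 0x10000 };
	size_t len, i;
	uint32_t c;

	if (n == 0)
		return 0;
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xE0) == 0xC0) {
		len = 2; c = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		len = 3; c = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {
		len = 4; c = s[0] & 0x07;
	} else
		return 0;	/* stray continuation byte or 0xF8..0xFF */

	if (len > n)
		return 0;
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0;
		c = (c << 6) | (s[i] & 0x3F);
	}
	if (c < min[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
		return 0;
	*cp = c;
	return len;
}

/*
 * Decode n bytes into code points.  A byte that does not start a valid
 * sequence is taken to be ISO 8859-1, i.e. code point == byte value.
 * out must have room for n entries.  Returns the number written.
 */
static size_t
utf8_to_ucs4_latin1_fallback(const uint8_t *in, size_t n, uint32_t *out)
{
	size_t i = 0, o = 0;

	while (i < n) {
		uint32_t cp;
		size_t len = utf8_decode_one(in + i, n - i, &cp);

		if (len == 0) {		/* invalid: fall back to Latin-1 */
			cp = in[i];
			len = 1;
		}
		out[o++] = cp;
		i += len;
	}
	return o;
}
```

With this policy a Latin-1 0xE9 and a UTF-8 0xC3 0xA9 both come out as U+00E9, and an overlong sequence is never decoded to the character it encodes, because each of its bytes is taken as a Latin-1 character instead.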
 
> > >     -) What about UTF-16 surrogates that are not paired?
> > 
> > [4] says "Therefore a converter must treat this as an error." I'm
> > inclined to agree.
> 
>   Okay, so do I.  The question was how to treat the error.  If it is a
> file name stored on disk, do we drop it and make it impossible to
> access that file?  Or do we simply UTF-8 encode the single surrogate,
> returning invalid UTF-8 that can, however, be used to access the file?

I think the latter is preferable.
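
A rough sketch of what "the latter" could look like, assuming a length-counted UTF-16 to UTF-8 routine. The name and signature are hypothetical, not an existing kernel interface. Proper surrogate pairs are combined as usual; a surrogate without a partner simply goes through the ordinary three-byte encoding, which yields bytes that are not valid UTF-8 but still identify the file.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert n UTF-16 code units to UTF-8.
 * Neither buffer is NUL-terminated.  An unpaired surrogate is encoded
 * as a three-byte sequence (invalid UTF-8, but it round-trips).
 * Returns the number of bytes written, or (size_t)-1 if out is too small.
 */
static size_t
utf16_to_utf8_lenient(const uint16_t *in, size_t n, uint8_t *out, size_t outsz)
{
	size_t i = 0, o = 0;

	while (i < n) {
		uint32_t c = in[i++];
		size_t len;

		/* High surrogate directly followed by a low one: combine. */
		if (c >= 0xD800 && c <= 0xDBFF && i < n &&
		    in[i] >= 0xDC00 && in[i] <= 0xDFFF) {
			c = 0x10000 + ((c - 0xD800) << 10) + (in[i] - 0xDC00);
			i++;
		}
		/* Otherwise a lone surrogate stays in c and is encoded as is. */

		len = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
		if (outsz - o < len)
			return (size_t)-1;

		switch (len) {
		case 1:
			out[o++] = (uint8_t)c;
			break;
		case 2:
			out[o++] = (uint8_t)(0xC0 | (c >> 6));
			out[o++] = (uint8_t)(0x80 | (c & 0x3F));
			break;
		case 3:
			out[o++] = (uint8_t)(0xE0 | (c >> 12));
			out[o++] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
			out[o++] = (uint8_t)(0x80 | (c & 0x3F));
			break;
		default:
			out[o++] = (uint8_t)(0xF0 | (c >> 18));
			out[o++] = (uint8_t)(0x80 | ((c >> 12) & 0x3F));
			out[o++] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
			out[o++] = (uint8_t)(0x80 | (c & 0x3F));
			break;
		}
	}
	return o;
}
```

For example, the units 0x0041 0xD800 0x0042 come out as 41 ED A0 80 42. The reverse direction would have to accept ED A0..BF xx again for the name to be usable in a lookup.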

>   The UTF-8 specification explicitly forbids decoding them, since they
> may bypass character checks made by Unicode unaware routines (like,
> e.g. the pathname splitting on '/').  If we decoded the overlong
> encoding above, we would create a file with '/' as part of its name.
> Problematic at best.

Perhaps (silently) elide them?
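
Something along these lines, as a sketch only. The helper names are invented, and I am using 0xC0 0xAF (the two-byte overlong form of '/') as the example input. It checks only for overlong forms and copies every other byte through untouched.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: if s[0..n) starts with an overlong UTF-8
 * sequence, return its length in bytes; otherwise return 0.
 */
static size_t
utf8_overlong_len(const uint8_t *s, size_t n)
{
	size_t len, i;

	if (n >= 2 && (s[0] == 0xC0 || s[0] == 0xC1))
		len = 2;	/* 2 bytes for something below U+0080 */
	else if (n >= 3 && s[0] == 0xE0 && s[1] < 0xA0)
		len = 3;	/* 3 bytes for something below U+0800 */
	else if (n >= 4 && s[0] == 0xF0 && s[1] < 0x90)
		len = 4;	/* 4 bytes for something below U+10000 */
	else
		return 0;

	/* Only count it if the trailing bytes really are continuations. */
	for (i = 1; i < len; i++)
		if ((s[i] & 0xC0) != 0x80)
			return 0;
	return len;
}

/*
 * Copy in[0..n) to out, silently dropping overlong sequences.
 * out must have room for n bytes.  Returns the number of bytes written.
 */
static size_t
utf8_elide_overlong(const uint8_t *in, size_t n, uint8_t *out)
{
	size_t i = 0, o = 0;

	while (i < n) {
		size_t skip = utf8_overlong_len(in + i, n - i);

		if (skip != 0)
			i += skip;		/* elide */
		else
			out[o++] = in[i++];	/* keep */
	}
	return o;
}
```

So 'a' 0xC0 0xAF 'b' becomes "ab". One thing to weigh: two names that differ only in an elided sequence collapse to the same name.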