Subject: Re: UTF-8 file names?
To: Bill Studenmund <wrstuden@netbsd.org>
From: Arto Huusko <arto.huusko@utu.fi>
List: tech-userlevel
Date: 09/13/2004 21:55:34
On Mon, 13 Sep 2004, Bill Studenmund wrote:

> On Mon, Sep 13, 2004 at 04:25:51PM +0200, Hubert Feyrer wrote:
> > On Mon, 13 Sep 2004, Thomas Klausner wrote:
> > > How can I make NetBSD's base system programs use UTF-8 file
> > > names by default?
> >
> > Use... how?
> >
> > In Unix, filenames can contain any chars except NUL and '/'.
> > Maybe start by using a terminal that's UTF-8 capable (uxterm, ...)?
>
> The one problem I see is that a multi-byte UTF-8 sequence can end in what
> looks like an ASCII '/'. Since it's part of a multi-byte sequence, that
> would be wrong.

No. One of the properties of UTF-8 is that every octet of a
character encoded using multiple octets has its highest bit set.
ASCII '/' is 0x2F, which has the highest bit clear, so it can never
appear as part of a multi-octet sequence.

Quoting RFC 3629:

"US-ASCII octet values do not appear otherwise in a UTF-8 encoded
character stream. This provides compatibility with file systems
or other software that parse based on US-ASCII values..."
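
For anyone who wants to convince themselves, here is a minimal sketch in
Python (my choice of language for illustration; the function names are
made up for this example). It encodes every code point that needs more
than one octet and checks that each octet has the high bit set, then
shows that splitting an encoded path on the '/' octet leaves the
multi-octet characters intact.

```python
def utf8_octets(code_point: int) -> bytes:
    """Encode a single Unicode code point as UTF-8."""
    return chr(code_point).encode("utf-8")


def high_bit_set_everywhere(encoded: bytes) -> bool:
    """True if every octet in the sequence has its highest bit set."""
    return all(octet & 0x80 for octet in encoded)


def check_all_multibyte() -> bool:
    """Check every code point from U+0080 up, i.e. every multi-octet one."""
    for cp in range(0x80, 0x110000):
        # Surrogates (U+D800..U+DFFF) are not encodable in UTF-8; skip them.
        if 0xD800 <= cp <= 0xDFFF:
            continue
        if not high_bit_set_everywhere(utf8_octets(cp)):
            return False
    return True


def split_components(raw_path: bytes) -> list:
    """Split on the ASCII '/' octet, as a byte-oriented parser would."""
    return raw_path.split(b"/")


if __name__ == "__main__":
    print(check_all_multibyte())
    # Path made of two non-ASCII characters separated by '/'.
    parts = split_components("\u00e4/\u00f6".encode("utf-8"))
    print([p.decode("utf-8") for p in parts])
```

The split is done purely on octets, with no knowledge of UTF-8, which is
the situation a path parser based on US-ASCII values is in.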