Subject: Re: utf-8 and userland
To: James K. Lowden
From: Bill Studenmund
List: tech-userlevel
Date: 03/17/2004 13:38:08

On Tue, Mar 16, 2004 at 12:45:42AM -0500, James K. Lowden wrote:
> On Mon, 15 Mar 2004, Bill Studenmund wrote:
> > > The filesystem has no label indicating the encoding of its metadata.
> > > It's not an error -- as far as the filesystem is concerned -- to have
> > > different filenames encoded differently.
> >
> > You are correct. However, when we fully support locales, I expect that we
> > will report directory entries to userland (or at least to a given user)
> > in a specific character encoding.
> >
> > So if I'm using a UTF-8 locale, ls will always see UTF-8 file names.
> I don't see how that's possible, Bill.  We can't "report directory entries
> to userland" in anything, unless we first know how they're encoded in
> situ.  Absent a label, we can only assume, and you know where that leads.

It's possible for a number of file systems. NTFS and HFS do it. We can add
it to ffs with a mount option or some encoding in the FS itself (say an
attribute off of the root inode). So it's not at all hard to deal with
knowing how the file names are encoded. Thus the kernel would be able to
correctly translate to userland.
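
To make the translation step concrete, here is a rough userland sketch in
Python. It is not kernel code and not an existing NetBSD interface: the
Mount class, its name_encoding field (standing in for the mount option or
root-inode attribute), and name_to_userland() are all hypothetical names
made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mount:
    """Hypothetical per-mount state, not a real NetBSD structure."""
    # Stands in for the label: a mount option or root-inode attribute.
    name_encoding: str


def name_to_userland(raw: bytes, mount: Mount, user_encoding: str) -> bytes:
    """Re-encode one on-disk file name into the caller's locale encoding."""
    # Raises UnicodeDecodeError if the bytes are not valid in the labelled
    # encoding, i.e. the label is wrong or the name is damaged.
    text = raw.decode(mount.name_encoding)
    # Raises UnicodeEncodeError if the user's encoding cannot represent
    # the name; a real design would need a policy for that case.
    return text.encode(user_encoding)


if __name__ == "__main__":
    mnt = Mount(name_encoding="iso-8859-1")
    on_disk = b"caf\xe9.txt"  # Latin-1 bytes as stored in the directory
    print(name_to_userland(on_disk, mnt, "utf-8"))  # b'caf\xc3\xa9.txt'
```

The point is only that once the encoding is labelled per file system, the
conversion is mechanical. What to do with names that fail to convert is a
separate policy question that this sketch leaves open.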

Take care,

