Subject: Re: utf-8 and userland
To: James K. Lowden <jklowden@schemamania.org>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-userlevel
Date: 03/17/2004 13:38:08
--qOrJKOH36bD5yhNe
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Mar 16, 2004 at 12:45:42AM -0500, James K. Lowden wrote:
> On Mon, 15 Mar 2004, Bill Studenmund <wrstuden@NetBSD.org> wrote:
> > > The filesystem has no label indicating the encoding of its metadata.
> > > It's not an error -- as far as the filesystem is concerned -- to have
> > > different filenames encoded differently.
> >
> > You are correct. However when we fully support locales, I expect that we
> > will report directory entries to userland (or at least to a given user)
> > in a specific character encoding.
> >
> > So if I'm using a UTF-8 locale, ls will always see UTF-8 file names.
>
> I don't see how that's possible, Bill.  We can't "report directory entries
> to userland" in anything, unless we first know how they're encoded in
> situ.  Absent a label, we can only assume, and you know where that leads.

It's possible for a number of file systems. NTFS and HFS do it. We can add
it to ffs with a mount option or some encoding in the FS itself (say an
attribute off of the root inode). So knowing how the file names are encoded
is not at all hard to arrange, and the kernel would then be able to
translate them correctly for userland.
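
To make the idea concrete, here is a rough sketch in Python rather than
kernel C. It assumes the file system carries an encoding label (the
fs_encoding argument, standing in for a mount option or root-inode
attribute); translate_name and its parameter names are hypothetical and
do not correspond to any existing NetBSD interface.

```python
# Hypothetical sketch: the per-filesystem encoding label is an assumption,
# e.g. supplied by a mount option. Nothing here is an existing kernel API.

def translate_name(raw_name: bytes, fs_encoding: str, user_encoding: str) -> bytes:
    """Re-encode one directory entry name from the file system's labelled
    encoding into the encoding of the user's locale."""
    # Raises UnicodeDecodeError if the bytes are invalid in fs_encoding.
    text = raw_name.decode(fs_encoding)
    # Raises UnicodeEncodeError if the locale cannot represent the name.
    return text.encode(user_encoding)


# A name stored as ISO-8859-1 on disk ("caf" followed by e-acute).
on_disk = b"caf\xe9"

# With a label, the translation to a UTF-8 locale is mechanical.
as_utf8 = translate_name(on_disk, "iso-8859-1", "utf-8")

# Without a label we can only guess; guessing UTF-8 here fails outright.
try:
    on_disk.decode("utf-8")
    guessed_ok = True
except UnicodeDecodeError:
    guessed_ok = False

print(as_utf8, guessed_ok)
```

The second half shows James's point: the same bytes are not valid UTF-8,
so the label is what makes the translation well defined.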

Take care,

Bill

--qOrJKOH36bD5yhNe
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (NetBSD)

iD8DBQFAWMVAWz+3JHUci9cRAn1gAJ4i66vYLl+6qOUsdrWxpnJG3SWJugCfVHuB
tRxrfBN8FQQfEDOlsWwchgE=
=9SiL
-----END PGP SIGNATURE-----

--qOrJKOH36bD5yhNe--