Re: A draft for a multibyte and multi-codepoint C string interface
On Tue, Apr 02, 2013 at 05:31:03PM +0200, Steffen Daode Nurpmeso wrote:
> So, for this, some locale-dependent pre/after parser is or would
> be necessary.
But doesn't this all mean that this handling is done at the user level?
That a filename is just a user's means of referring to some sequential set of
For this filename to be human readable, it has to be
interpreted through a mapping to glyphs.
That this means that the filename, as far as the kernel is concerned,
should be in a universal encoding and without making assumptions about
That UTF-8 is the answer, since it allows the use of programs built on
C "char" (at least an octet, signed or unsigned).
So that the kernel interface should take and give UTF-8, and that
filesystem drivers should take and give UTF-8, with user level utilities
converting from the current encoding to Unicode and UTF-8.
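As an illustration of such a user level conversion, here is a minimal
sketch for one legacy encoding only, ISO-8859-1, chosen because its byte
values map directly to the first 256 Unicode code points. The function
name latin1_to_utf8 is hypothetical; a real utility would more likely
rely on an existing conversion library and handle many encodings.

```c
#include <stddef.h>

/*
 * Convert a NUL-terminated ISO-8859-1 (Latin-1) string to UTF-8.
 * Hypothetical helper, for illustration only.
 * Returns the number of bytes written, not counting the terminating
 * NUL, or (size_t)-1 if dst is too small.
 */
size_t latin1_to_utf8(const char *src, char *dst, size_t dstsize)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (*s < 0x80) {
            /* ASCII: copied unchanged, one byte. */
            if (n + 1 >= dstsize)
                return (size_t)-1;
            dst[n++] = (char)*s;
        } else {
            /* U+0080..U+00FF: two bytes, 110xxxxx 10xxxxxx. */
            if (n + 2 >= dstsize)
                return (size_t)-1;
            dst[n++] = (char)(0xC0 | (*s >> 6));
            dst[n++] = (char)(0x80 | (*s & 0x3F));
        }
    }
    if (n >= dstsize)
        return (size_t)-1;
    dst[n] = '\0';
    return n;
}
```

The result is an ordinary C char string that the kernel side can pass
around without knowing which encoding it came from.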
But that's all. If one user really wants to take into account acrobatics
about collating sequences and the like, he can use/develop a program to
But as far as the kernel and the drivers are concerned, a filename is
uniquely defined by a C char string (happening to be UTF-8).
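A small sketch of what this implies, assuming equality is plain byte-wise
comparison of the two strings. The helper name name_equal is
hypothetical. The comment shows two UTF-8 spellings that render as the
same text but are different names under this rule.

```c
#include <string.h>

/*
 * Kernel/driver view of filename equality: two names are the same
 * if and only if their bytes are the same. Hypothetical helper.
 *
 * Example of names that look identical once rendered but differ here:
 *   "caf\xC3\xA9"    U+00E9 (precomposed e with acute accent)
 *   "cafe\xCC\x81"   'e' followed by U+0301 (combining acute accent)
 * Treating these as equal would be the job of a user level program.
 */
int name_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```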
UTF-8 has the same role as UTC time. There is one and only one canonical
representation, fixed. And the display of the information is customized
according to user level rules.
UTF-8 has the properties (designed for that) to be a historically
compatible encoding (C char strings), with a limited impact on size
for current names (ASCII; even in France, I rarely see accented
characters in filenames), and without limitation on what
can be encoded.
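To make the size argument concrete, here is a sketch of how many bytes
UTF-8 needs per code point. The function name utf8_encoded_len is
hypothetical; the ranges are those of the standard UTF-8 encoding.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Number of bytes UTF-8 uses for one Unicode code point.
 * Returns 0 for values that cannot be encoded (surrogates and
 * anything above U+10FFFF). Hypothetical helper.
 */
size_t utf8_encoded_len(uint32_t cp)
{
    if (cp < 0x80)
        return 1;               /* ASCII: same single byte as before */
    if (cp < 0x800)
        return 2;               /* e.g. accented Latin letters */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;               /* surrogates are not encodable */
    if (cp < 0x10000)
        return 3;
    if (cp <= 0x10FFFF)
        return 4;
    return 0;
}
```

A pure ASCII name therefore has exactly the same bytes and the same
length as before, and an accented Latin letter costs one extra byte.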
What I don't get is why the kernel should be plagued with user level or
Even an 8bit clean implementation can be "converted" to UTF-8 since it
is just a matter of convention: it can do UTF-8 without even knowing...
UTF-8, I can see. Unicode with 16bits or 24bits, I personally
don't want to be trapped into using. And, at the kernel or drivers
level, name equality being defined as strcmp(a, b) == 0 (if a user
wants something else at user level, that's his problem and his
Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C