tech-userlevel archive

Re: A draft for a multibyte and multi-codepoint C string interface

On Sat, Apr 06, 2013 at 07:47:00PM -0400, Mouse wrote:
> I'm talking about the way 0x00 and 0x2f octets are special in pathnames
> at the syscall interface.  This is annoying to applications that want
> to name files with fundamentally non-character-string data.  The live
> use case I mentioned a message or two ago is an example - that really
> _wants_ to name files with time_t values (actually, time_t plus a
> disambiguator serial number).

I think you will admit that your example of "binary" filenames is not a
common use. This means that, to use these filenames, the utilities have
to know about the convention anyway. A "binary" filename can be
represented without ado as a hexadecimal (or any other base) string.
Such a string is pure ASCII, hence also valid UTF-8, so this is already
possible now without changing everything (and if one really wants, the
binary filenames can carry a prefix giving the base: 0x...).
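
As an illustration only, the time_t-plus-serial use case could be
sketched as below. The function names, the fixed 64-bit and 32-bit
widths, and the "0x<time>-<serial>" layout are assumptions made for
this example; they are not a convention taken from the thread. The
decoder is deliberately lenient and is not a hardened parser.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical layout: "0x" + 16 hex digits + "-" + 8 hex digits. */
#define BINNAME_LEN 27

/* Encode a timestamp and a disambiguating serial as an ASCII filename.
 * Returns 0 on success, -1 if buf is too small. */
int binname_encode(char *buf, size_t buflen, time_t when, uint32_t serial)
{
    int n = snprintf(buf, buflen, "0x%016" PRIx64 "-%08" PRIx32,
                     (uint64_t)when, serial);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Decode a name produced by binname_encode().
 * Returns 0 on success, -1 if the name does not match the layout. */
int binname_decode(const char *name, time_t *when, uint32_t *serial)
{
    uint64_t t;
    uint32_t s;
    int end = 0;

    if (strlen(name) != BINNAME_LEN)
        return -1;
    if (sscanf(name, "0x%16" SCNx64 "-%8" SCNx32 "%n", &t, &s, &end) != 2)
        return -1;
    if (end != BINNAME_LEN)
        return -1;
    *when = (time_t)t;
    *serial = s;
    return 0;
}
```

The resulting name contains only the characters 0-9, a-f, 'x' and '-',
so it can never contain a 0x00 or 0x2f octet, and fixed-width names
also sort chronologically in a plain directory listing.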

The solution is mainly in userspace, not at the kernel level: UTF-8
allows "more", a hexadecimal encoding on top of it can represent any
data at all, and the result is compatible with all utilities expecting
strings.
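
To make the "hexadecimal encoding allowing all" point concrete, here is
a minimal userspace sketch that maps an arbitrary octet string,
including 0x00 and 0x2f, to a pure ASCII name. The function name and
its return convention are hypothetical, chosen for this example only.

```c
#include <stddef.h>

/* Write the lowercase hex form of in[0..inlen) into out, NUL-terminated.
 * Returns 0 on success, -1 if out cannot hold 2*inlen + 1 bytes. */
int hexname_encode(char *out, size_t outlen,
                   const unsigned char *in, size_t inlen)
{
    static const char digits[] = "0123456789abcdef";
    size_t i;

    if (outlen == 0 || inlen > (outlen - 1) / 2)
        return -1;
    for (i = 0; i < inlen; i++) {
        out[2 * i]     = digits[in[i] >> 4];   /* high nibble */
        out[2 * i + 1] = digits[in[i] & 0x0f]; /* low nibble  */
    }
    out[2 * inlen] = '\0';
    return 0;
}
```

The cost is a name twice as long as the data, which is the usual
trade-off for staying within what every string-oriented utility
already accepts.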

There is no panacea. But UTF-8 is the most interesting solution, because
it allows the existing names, allows names that are not possible today,
and does not cost a lot of modifications to the existing code base.
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C