tech-userlevel archive


Re: A draft for a multibyte and multi-codepoint C string interface



> I'm saying something very simple that you agree with: that it's
> impossible to interpret a string correctly without knowing its
> encoding.  You're just unwilling to tolerate any change to make that
> possible.

Eh.  No.

To make it possible, I'm fine with.  What I find intolerable (as in, I
would/will not tolerate it on my systems) is your apparent zeal to
mandate it, to impose it even on use cases that not only do not benefit
but would even be hampered by it.  It's only in the last message or two
that you've indicated you'd be okay with having equal support for ASCII
or 8859-1 or the like; up until then what I've seen from you has been
UTF-8 all the way all the time.

>> (It _would_ be nice if what we had were closer to a truly
>> encoding-blind opaque octet string model....)
> I am curious what you mean by that.  You've alluded to it a couple of
> times.

I'm talking about the way 0x00 and 0x2f octets are special in pathnames
at the syscall interface.  This is annoying to applications that want
to name files with fundamentally non-character-string data.  The live
use case I mentioned a message or two ago is an example - that really
_wants_ to name files with time_t values (actually, time_t plus a
disambiguator serial number).  I'd _like_ to see interfaces that,
rather than using C strings for things such as directory entry names or
pathnames, represent them in some form that's completely content-blind.
A single component might be a pointer-and-count pair; a pathname might be a
counted list of such things - or there might be a better way; I haven't
given the possibilities much thought.  (The existing interfaces would
presumably continue to be supported for compatibility and/or
convenience, but they would be somewhat in the nature of
abbreviations.)
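
To make the shape of that idea concrete, here is a minimal sketch of a
content-blind name representation.  Every name in it (struct
name_component, struct counted_path, encode_time_name,
component_is_cstring_safe) is hypothetical; no such interface exists at
the syscall level, and the 12-octet timestamp-plus-serial layout is just
an assumption for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: one directory-entry name as a counted octet string.
 * No octet value is special; 0x00 and 0x2f are ordinary data. */
struct name_component {
    const unsigned char *octets;
    size_t len;
};

/* Hypothetical: a pathname as a counted list of components, so no
 * separator octet is needed between them. */
struct counted_path {
    const struct name_component *comp;
    size_t ncomp;
};

/* Assumed layout: a 64-bit timestamp followed by a 32-bit disambiguator
 * serial number, both big-endian, 12 octets in total. */
static void encode_time_name(unsigned char out[12], uint64_t t, uint32_t serial)
{
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(t >> (56 - 8 * i));
    for (int i = 0; i < 4; i++)
        out[8 + i] = (unsigned char)(serial >> (24 - 8 * i));
}

/* Returns 1 if the component could also be passed through the existing
 * C-string interfaces (non-empty, no 0x00 and no 0x2f octet), else 0. */
static int component_is_cstring_safe(const struct name_component *c)
{
    if (c->len == 0)
        return 0;
    return memchr(c->octets, 0x00, c->len) == NULL &&
           memchr(c->octets, 0x2f, c->len) == NULL;
}
```

A binary timestamp name will almost always contain 0x00 octets, so
component_is_cstring_safe() rejects it; that is exactly the case the
existing C-string interfaces cannot express without some escaping
scheme.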

After all, there's no reason d_name[] has to have any forbidden octet
values (to pick the filesystem I know best), and, indeed, non-Unix
systems speaking to Unix NFS servers have been known to create
directory entries which (in Unix terms) have slashes in their d_name
strings, much to the consternation of people trying to work with them
from the Unix side.  (I've never heard of the analogous problem with
0x00, probably at least in part because the NFS clients tend to use
C-style strings too, probably at least in part because the NFS servers
_also_ tend to use C strings.)
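
For anyone who wants to check a directory for such entries from the Unix
side, a rough sketch using the standard opendir()/readdir() calls might
look like the following.  The function name count_slash_names is made up
for this example.  It cannot detect an embedded 0x00, since d_name is
NUL-terminated and anything after the first 0x00 is invisible through
this interface.

```c
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count directory entries whose d_name contains a
 * slash (0x2f).  Returns -1 if the directory cannot be opened. */
static long count_slash_names(const char *dirpath)
{
    DIR *d = opendir(dirpath);
    if (d == NULL)
        return -1;

    long n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        /* d_name is a C string, so strchr() scans up to the first NUL. */
        if (strchr(e->d_name, '/') != NULL)
            n++;
    }
    closedir(d);
    return n;
}
```

On a local filesystem this should always return 0; a nonzero count is
only expected on something like an NFS mount where a non-Unix client
created the entries.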

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mouse%rodents-montreal.org@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

