Re: A draft for a multibyte and multi-codepoint C string interface
>> It costs quite a lot, actually, because it would mean that
>> everything working with pathnames as other than opaque octet strings
>> has to be aware of its idiosyncrasies, such as the normalization
>> rules, as I mentioned above.
> No. Did you read Rob Pike and Ken Thompson's paper about UTF-8? They
> decided to not take into account the way the characters may be
> rendered or collating sequences and so on.
That's fine for them, but that's not the position this thread started
out being about and it's not the position I've been arguing against.
The position I've been arguing against is the one that wants UTF-8
awareness, normalization in particular, in the kernel, or at least on
the called, not caller, side of open(2) and related calls.
If Pike and Thompson think the syscall interface should be opaque octet
strings, with UTF-8 awareness limited to userland, I agree with them.
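To make the distinction concrete, here is a minimal sketch in C. It is an illustration under stated assumptions, not anything proposed in the thread. The two file names are made-up examples, and `same_name_opaque` and `toy_compose` are hypothetical helpers. In particular, `toy_compose` rewrites only one decomposed sequence and stands in for a real normalizer, which would need full Unicode tables.

```c
#include <stddef.h>
#include <string.h>

/* Two UTF-8 spellings of the same visible name (illustrative only). */
static const char NAME_NFC[] = "\xc3\xa9.txt";  /* U+00E9, precomposed */
static const char NAME_NFD[] = "e\xcc\x81.txt"; /* U+0065 U+0301, decomposed */

/* The callee-side view argued for here: names are opaque octet strings,
 * so two names match only if their bytes match. */
static int same_name_opaque(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Hypothetical caller-side policy. A toy composer that rewrites only
 * 'e' followed by U+0301 (bytes 65 CC 81) into U+00E9 (bytes C3 A9).
 * Returns the number of bytes written, excluding the terminating NUL,
 * or (size_t)-1 if dst is too small. */
static size_t toy_compose(const char *src, char *dst, size_t dstsize)
{
    size_t n = 0;

    while (*src != '\0') {
        if (src[0] == 'e' &&
            (unsigned char)src[1] == 0xcc &&
            (unsigned char)src[2] == 0x81) {
            if (n + 2 >= dstsize)       /* two bytes plus room for NUL */
                return (size_t)-1;
            dst[n++] = (char)0xc3;
            dst[n++] = (char)0xa9;
            src += 3;
        } else {
            if (n + 1 >= dstsize)       /* one byte plus room for NUL */
                return (size_t)-1;
            dst[n++] = *src++;
        }
    }
    if (n >= dstsize)
        return (size_t)-1;
    dst[n] = '\0';
    return n;
}
```

Under this sketch, same_name_opaque(NAME_NFC, NAME_NFD) is false, which is all the callee side of open(2) would ever know. A program that wants the two spellings treated as one name runs its own policy (here, toy_compose) on the caller side and passes the resulting bytes down.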
> What I don't understand is [...] a proposal to put encodings
> considerations in the filesystems or in the file handling system
I think I understand it. I just think it's a wrong choice - or, since
none of the people involved in the discussion are stupid, it might be
more useful to say that I disagree with the relevant people (mostly
jkl, I think) about the relative values of the various things being
traded off.
> [...]. And I hate case insensitive filesystems, so I don't want an
> enforcement of some cryptic policy deciding that two distinct strings
> passed are, indeed, the same thing: this is a user decision, not a
> system one.
You and I are in furious agreement here. But people - as I said,
mostly jkl I think - have been arguing that this is indeed something
that belongs in the kernel.
/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mouse%rodents-montreal.org@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B