tech-userlevel archive


Re: A draft for a multibyte and multi-codepoint C string interface



James K. Lowden wrote:
>> [...] it allows to avoid an entire class of confusions which
>> happen when glob() performs as the specifications say but against the
>> user's intent.
> 
> I suspect that "entire class of confusions" is the empty set.

A well-known one was the choice by Red Hat (IIRC) to enable the "user"
locale rather than "C"/"POSIX" in /etc/profile: it suddenly broke a
large number of system scripts which were silently assuming that things
like [a-b] would include neither A nor B.


> What "intent" could he have regarding their encoding?  

While I agree he should not, the fact is he does. An example could be the
encoding of the month within a file name. Notwithstanding the existence of
standards like ISO 8601, which here gives 04 (two-digit form), practice
shows creativity in this field. People often realize quickly that APR is
not what they want because of the collation disorder which quickly results,
so they start to use numbers. A common "solution" is to use 4, which of
course becomes interesting in October. Again, the use of two-digit 10 is
not perfect, which might lead to using the letter A to mean October, B for
November and C for December.
And that user has put some intent into the encoding.


Antoine

