tech-userlevel archive


Re: A draft for a multibyte and multi-codepoint C string interface



On Sun, 14 Apr 2013 11:17:16 +0200
tlaronde%polynum.com@localhost wrote:

> On Sat, Apr 13, 2013 at 07:58:19PM -0400, James K. Lowden wrote:
> > The user should not be forced to use a wildcard specification to 
> > match semantically equivalent strings.
> 
> No, the policy shall be that the filenames are encoded to be strings
> of language graphics real atoms ("letters" or ideograms or whatever)
> without any _rendering_ effect. "oe" is not a letter, but typographic
> sugar adding etymology. The same goes for "ffi" "ffl" that are
> typographic sugar (purely visual ones) and, furthermore, that do
> not exist in every font!  There is no ligature in the Computer
> Modern fixed size fonts, so entering the codepoint for the ligature
> "ffi" will only do for one font, and not for another; while keeping
> 'f', 'f', 'i' in the source allows a different rendering depending
> on the font without losing information (because the font is organized
> by its designer to give the best visual result; the user shall not
> enforce something that the font designer did not want).

Hmm.  I appreciate that you've dealt with this, and after all it's
your language, not mine.  But I don't think we can insist that "oe" is
mere typography.  

French schoolchildren, as you well know, are taught to overlap the "oe"
in coeur when they write it.  Granted the grapheme collates as two
letters.  But I tried a little test:

$ echo 'c\(oeur \(finance' | groff -Tps > oe.ps
$ pstopdf -o oe.pdf oe.ps

If I open the pdf, copy the text, and paste it here, I get "cœur
finance".  That's consistent with my experience: I can't think of one
time when ligatures in English text weren't recognized as separate
letters when "the formatting was removed" (however that
might have been done).  But ISTM that "oe" is more than just a
ligature, both because a separated "oe" in "coeur" is *wrong*
linguistically, and because as a technical matter the grapheme is (at
least sometimes) treated as a unit, not as mere prettification of two
letters.  
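
The same asymmetry can be seen in Unicode itself. What follows is only a
minimal sketch, assuming Python 3 and its standard unicodedata module; it
is not what groff or the PDF viewer does internally, and the helper name
strip_compat is made up for the illustration. Compatibility normalization
(NFKC) splits the "fi" presentation ligature (U+FB01) back into two
letters, but leaves the "oe" character (U+0153) alone, because Unicode
defines no decomposition for it.

```python
import unicodedata


def strip_compat(text):
    """Apply NFKC normalization (hypothetical helper for this sketch).

    NFKC replaces compatibility characters, such as presentation
    ligatures, with their plain equivalents where Unicode defines one.
    """
    return unicodedata.normalize("NFKC", text)


word_oe = "c\u0153ur"    # 'c' + LATIN SMALL LIGATURE OE (U+0153) + 'ur'
word_fi = "\ufb01nance"  # LATIN SMALL LIGATURE FI (U+FB01) + 'nance'

# U+0153 has no decomposition, so the word comes back unchanged.
print(strip_compat(word_oe) == word_oe)

# U+FB01 has a compatibility decomposition to 'f' + 'i'.
print(strip_compat(word_fi))
```

The first line printed is True and the second is "finance", which matches
the paste result above: the typographic ligature dissolves, the "oe"
grapheme stays a unit.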

--jkl


