tech-userlevel archive


Re: A draft for a multibyte and multi-codepoint C string interface



On Tue, Apr 02, 2013 at 08:01:42PM -0400, James K. Lowden wrote:
> 
> You can't fob it off to userspace.  At least I don't think so.  
> 

The best review of this is the Rob Pike and Ken Thompson paper. If Plan 9
has largely solved the problem, it is because they drew the line
somewhere. Where they drew it is in the paper.

As far as the kernel and the drivers are concerned, the UTF-8 string is
just an identifier, with no typographic or semantic meaning attached.

If a user specifies a name through an interface that uses special glyphs
and so on, the interface "knows" what these conventions are. It is its
responsibility to convert this representation to what it is supposed to
"mean".

I know that nowadays, a browser is universal. But I don't want to have
to use a typesetting system to create files, depending on the fonts, the
kerning, the pretty printing, the ligatures, the size of spaces, the
unbreakable spaces and so on.

The OS should allow every policy. This means: I take the UTF-8 string as
is and I leave it alone. Feed me what you feel is sensible, and
don't expect me to guess.
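
To make that concrete, here is a minimal sketch of what "leave it alone"
implies for name comparison. It is only an illustration, not code from
Plan 9 or from any kernel: name_equal() and the two sample names are
hypothetical. The two names would normally render identically, yet they
are different byte sequences, so a layer that does no normalization
treats them as two distinct names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: two names are the same only if their bytes are
 * identical. No normalization, no case folding, no locale. */
static bool name_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* "cafe" with a precomposed e-acute: U+00E9, encoded as C3 A9. */
static const char precomposed[] = "caf\xC3\xA9";

/* "cafe" followed by a combining acute accent: U+0301, encoded as CC 81. */
static const char decomposed[] = "cafe\xCC\x81";
```

With these definitions, name_equal(precomposed, decomposed) is false,
and strlen() reports 5 and 6 bytes respectively. If the two spellings
are supposed to "mean" the same file, it is up to the interface sitting
above to pick one form before handing the string down.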

Again: all the discussion is already in the "Hello World, or..." paper.
And if the people who defined this solution settled on _this_ one, my
gut feeling is that there are sensible engineering reasons for it...

-- 
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C

