tech-userlevel archive


Re: A draft for a multibyte and multi-codepoint C string interface



>If Pike and Thompson think the syscall interface should be opaque octet
>strings, with UTF-8 awareness limited to userland, I agree with them.

In the "Hello World" paper they technically did not address the system
call interface.  From that paper:

    Little change was required: null-terminated UTF strings are
    equivalent to null-terminated ASCII strings for most purposes of the
    operating system.

Note that they said ASCII, not "opaque octet strings".  Also:

    There are a couple of aspects of the Unicode Standard we have not
    faced. One is the issue of right-to-left text such as Hebrew or
    Arabic. Since that is an issue of display, not representation, we
    believe we can defer that problem for the moment without affecting
    our ability to solve it later. Another issue is diacriticals and
    ‘combining characters’, which cause overstriking of multiple
    Unicode characters. Although necessary for some scripts, such as
    Thai, Arabic, and Hebrew, such characters confuse the issues for
    Latin languages because they generate multiple representations
    for accented characters. ISO 10646 describes three levels
    of implementation; in Plan 9 we decided not to address the
    issue. Again, this can be labeled as a display issue and its
    finer points are still being debated, so we felt comfortable
    deferring. Mañana.

So they knew it was an issue, and decided not to deal with it.
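
To make the "multiple representations for accented characters" point concrete, here is a small illustrative sketch. It is not from the paper; it is in Python for brevity, and the variable names are hypothetical. It shows that a precomposed "é" and an "e" followed by a combining acute accent encode to different UTF-8 octet strings, so an interface that treats strings as opaque octets sees two different strings. The NFC normalization step at the end is only one possible way to reconcile them, not something the paper or this thread prescribes.

```python
import unicodedata

# Two spellings of what a reader sees as the same text, "é".
precomposed = "\u00e9"    # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

a = precomposed.encode("utf-8")   # two octets: C3 A9
b = decomposed.encode("utf-8")    # three octets: 65 CC 81

# An octet-string interface can only compare bytes, and these differ.
same_bytes = (a == b)

# Neither encoding contains a zero octet, which is why NUL-terminated
# handling keeps working for UTF-8 just as it does for ASCII.
no_nul = (0 not in a) and (0 not in b)

# Assumption for illustration: normalizing both to NFC in userland
# makes the two spellings compare equal.
same_after_nfc = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)

print(a.hex(), b.hex(), same_bytes, no_nul, same_after_nfc)
```

Whether that kind of reconciliation belongs in the kernel or in userland is exactly the question being argued here; the sketch only shows that the octets differ.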

--Ken


