Subject: Restart: better XI18N future API
To: None <i18n@XFree86.Org>
From: None <hiura@unicode.org>
List: tech-x11
Date: 12/21/2000 16:01:55
It is unfortunate that we did not have this discussion much earlier,
but it is not too late to design and implement a better solution.
I'd like to shift into forward gear and move the XI18N future API
design discussion along by resetting the Xutf8* discussion once,
while keeping the useful information from both the pro-Xutf8* and
the anti-Xutf8* sides.

First of all, there are a couple of limitations in the current XI18N
API that we are aware of and would like to address in some form.
One of the approaches presented for evaluation is the Xutf8*
functions.

Here is what I consolidated as the important requirements from our
previous discussion:

#1. Thread-safe, multi-locale-equivalent capability in X for
    applications willing to do the extra work for I18N without
    changing the system's global locale.
#2. Processing text in encodings different from the encoding
    set by the system's global locale, particularly UTF-8.
#3. Upward compatibility with the existing API, avoiding unnecessary
    duplication for short-circuiting.
#4. Support for multiple encodings, including Unicode and legacy
    ones; provide options for both users and *programmers* to choose
    which encoding(s) to deal with, with no hardwiring to a
    particular encoding.

Neither of the two approaches under discussion, (1) a UTF-8 hardwired
API and (2) switching the global locale to *.UTF-8, satisfies all of
the requirements above. As a result, the two parties have been at a
standstill, in major disagreement.

It is not optimal to create multiple incompatible API sets to
meet those requirements.

Once we re-evaluate those requirements with an open mind, I believe
we can design a better solution.

As a first step, let's summarize what the Xutf8* functions really are.
They are just a set of Xmb* functions hardwired to the UTF-8 encoding.

Here is the consideration.
If XmbDrawString took an "encoding" parameter, as in
XmbDrawString(....., encoding, string, ..), it would perform
the same as
   Xutf8DrawString(....., string, ..)
when the encoding parameter is set to UTF-8.

For the sake of discussion, let's call
XmbDrawString(....., encoding, string, ..) XembDrawString for now,
which stands for "X encoded multi-byte DrawString".

The XembDrawString approach provides a much more flexible option
while achieving everything Xutf8* does. We XI18N designers do not have
to force Unicode on programmers who want to meet requirement #2
but for whom Unicode is not sufficient.
If a programmer wants to use TRON code, Mule code, ISO-2022,
etc. as the encoding, independent of the system's global locale,
they can specify it.
We can make the encoding handling modules dynamically
pluggable. As long as we provide a UTF-8 module, we don't
lose any important feature that the Xutf8* approach can provide.
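
To make the pluggable-module idea concrete, here is a minimal sketch
in plain C. It is not Xlib code: every demo_* name is hypothetical,
only the text-decoding half of an XembDrawString-style call is
modeled (no drawing), the modules live in a static table instead of
being loaded dynamically, and the UTF-8 decoder is simplified.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding module: decode one character from s (n bytes
 * available), store its code point in *cp, and return the number of
 * bytes consumed, or 0 on malformed input. */
typedef size_t (*demo_decode_fn)(const unsigned char *s, size_t n,
                                 unsigned long *cp);

typedef struct {
    const char    *name;    /* encoding name the caller passes in */
    demo_decode_fn decode;
} demo_encoding_module;

static size_t demo_decode_latin1(const unsigned char *s, size_t n,
                                 unsigned long *cp)
{
    if (n == 0) return 0;
    *cp = s[0];             /* ISO-8859-1 bytes map 1:1 to code points */
    return 1;
}

/* Simplified UTF-8 decoder: checks continuation bytes, but does not
 * reject overlong forms or surrogates. */
static size_t demo_decode_utf8(const unsigned char *s, size_t n,
                               unsigned long *cp)
{
    size_t len, i;
    unsigned long c;

    if (n == 0) return 0;
    if (s[0] < 0x80)                { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else return 0;
    if (n < len) return 0;
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    *cp = c;
    return len;
}

#define DEMO_MAX_MODULES 16
static demo_encoding_module demo_modules[DEMO_MAX_MODULES] = {
    { "UTF-8",      demo_decode_utf8   },
    { "ISO-8859-1", demo_decode_latin1 },
};
static size_t demo_module_count = 2;

/* Plug in another encoding (TRON, Mule, ...). Returns 0 on success,
 * -1 if the table is full. */
int demo_register_encoding(const char *name, demo_decode_fn decode)
{
    if (demo_module_count >= DEMO_MAX_MODULES) return -1;
    demo_modules[demo_module_count].name = name;
    demo_modules[demo_module_count].decode = decode;
    demo_module_count++;
    return 0;
}

/* Decode `string` according to `encoding` into out[]. Returns the
 * number of characters decoded, or -1 for an unknown encoding,
 * malformed input, or an output buffer that is too small. */
long demo_emb_decode(const char *encoding, const char *string,
                     size_t nbytes, unsigned long *out, size_t out_cap)
{
    size_t m, pos = 0, count = 0;

    for (m = 0; m < demo_module_count; m++)
        if (strcmp(demo_modules[m].name, encoding) == 0) break;
    if (m == demo_module_count) return -1;

    while (pos < nbytes) {
        unsigned long cp;
        size_t used = demo_modules[m].decode(
            (const unsigned char *)string + pos, nbytes - pos, &cp);
        if (used == 0 || count == out_cap) return -1;
        out[count++] = cp;
        pos += used;
    }
    return (long)count;
}
```

With "UTF-8" passed as the encoding, this behaves like a UTF-8
hardwired entry point would; any other registered name goes through
the same call without a new function family.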

Another consideration is that both the XembDrawString approach and
the Xutf8DrawString approach require a full copy of the Xmb*
functions as new function entry points.
It would be nice if we could avoid that.

As the Xutf8* proposal demonstrated, the encoding option may be
global within one application, so we may not need to specify it
every single time as a parameter if we store that information as
part of a context.

Since XmbDrawString(...., OC, ...., string, ..) takes an OC,
another possibility is to add an attribute to the OC that stores
the encoding information independently of the global locale.

Suppose we have an XNEncoding attribute for
XCreateOC/XSetOCValues/XGetOCValues that specifies an encoding,
different from the one set by the system's global locale, for all
subsequent operations on the OC.
When you specify UTF-8 for an OC, XmbDrawString with that OC
works identically to Xutf8DrawString.
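
As a rough illustration of how such an attribute could look to a
programmer, the sketch below mimics the calling convention of
XSetOCValues (NULL-terminated name/value pairs; NULL returned on
success, otherwise the name of the first attribute that could not be
set). It is a stand-alone mock, not Xlib code: demo_oc,
demo_set_oc_values and the DEMO_XNEncoding name string are all
assumptions made for this example.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute name; no such XNEncoding exists in Xlib. */
#define DEMO_XNEncoding "encoding"

/* Mock output context: only the proposed encoding attribute. */
typedef struct {
    const char *encoding;   /* NULL = follow the global locale */
} demo_oc;

/* Takes NULL-terminated (name, value) string pairs. Returns NULL on
 * success, or the first attribute name that is not recognized. */
const char *demo_set_oc_values(demo_oc *oc, ...)
{
    va_list ap;
    const char *name;
    const char *failed = NULL;

    va_start(ap, oc);
    while ((name = va_arg(ap, char *)) != NULL) {
        const char *value = va_arg(ap, char *);
        if (strcmp(name, DEMO_XNEncoding) == 0) {
            oc->encoding = value;   /* used by later Xmb* calls */
        } else {
            failed = name;          /* unknown attribute: stop here */
            break;
        }
    }
    va_end(ap);
    return failed;
}
```

A caller would write something like
demo_set_oc_values(&oc, DEMO_XNEncoding, "UTF-8", (char *)NULL)
once, and every later operation on that OC would interpret its
strings as UTF-8 without any extra parameter.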

To make this more convenient for programmers who create
multiple output contexts, it would be better to allow the
same operation on XSetOMValues/XGetOMValues, so that
the default encoding of an OC is inherited from its OM
instead of being taken from the encoding set by the system's
global locale.
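
The lookup order this implies (OC first, then its OM, then the
global locale) can be sketched as follows. Again this is a mock in
plain C, not Xlib: the demo_om_t and demo_oc_t types and the
function name are hypothetical, and the locale's encoding is passed
in by the caller (on a POSIX system it could come from
nl_langinfo(CODESET), for instance).

```c
#include <stddef.h>

/* Mock output method carrying the proposed encoding attribute. */
typedef struct {
    const char *encoding;      /* NULL = not set, follow the locale */
} demo_om_t;

/* Mock output context created from an OM. */
typedef struct {
    const demo_om_t *om;
    const char      *encoding; /* NULL = not set, inherit from OM */
} demo_oc_t;

/* Resolve the encoding an Xmb* call on this OC would use:
 *   1. the OC's own encoding, if set;
 *   2. otherwise the OM's encoding, if set;
 *   3. otherwise the encoding of the global locale. */
const char *demo_oc_effective_encoding(const demo_oc_t *oc,
                                       const char *locale_encoding)
{
    if (oc->encoding != NULL)
        return oc->encoding;
    if (oc->om != NULL && oc->om->encoding != NULL)
        return oc->om->encoding;
    return locale_encoding;
}
```

An application that wants UTF-8 everywhere would set the encoding
once on the OM; an OC that needs something else overrides it
locally, and code that sets nothing keeps today's locale-driven
behaviour.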

Since this approach binds the encoding context to the OM and OC,
which are not global state, it achieves part of requirement #1,
thread safety, just as Xutf8* does.

To be continued....

What do you think?

--
hiura@{sun.com,li18nux.org,kondara.org,unicode.org} http://www.li18nux.org
Chair, Li18nux/Linux Internationalization Initiative, Free Standards Group
Architect/Sr. Staff Engineer, Sun Microsystems, Inc,      FAX 650-786-9553