Subject: Re: [Summer of Code]Wide Character Support in curses
To: Brett Lymn <blymn@baesystems.com.au>
From: Ruibiao Qiu <ruibiao@arl.wustl.edu>
List: tech-userlevel
Date: 06/13/2005 10:15:55
On Mon, 13 Jun 2005, Brett Lymn wrote:
> On Sun, Jun 12, 2005 at 04:23:42PM -0500, Ruibiao Qiu wrote:
>> To improve the memory usage, I propose a different structure than my
>> original structure. Essentially, it is about the same as the existing
>> storage structure.  That is, the character value is still an 8-bit
>> character.
>
> No. A wide character is 32bits.

Brett,

Thanks for your feedback.

I guess I did not make that quite clear.  Sorry about the confusion.  What
I really meant is to keep one storage structure for each display position on
the screen.  The 32-bit value of a wide character is split across the value
fields of four character display cells, using bit-mask and shift operations.
Similarly, a 16-bit wide character is split across two cells.

For example, a wide character of width 2 with a value of 0x9ABC would occupy
two storage cells, one with a character value of 0x9A and the other with
0xBC.  In addition, the attribute of the first half-character indicates that
it is the beginning of a wide character, and that of the second marks the end.
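
To make the splitting concrete, here is a minimal sketch in C.  The names
(struct cell, the align field, put_wide, get_wide) are hypothetical and only
illustrate the idea; they are not the existing curses structures, and the
real cell would carry more attribute bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical alignment markers for one display cell. */
enum { WC_NONE = 0, WC_BEGIN = 1, WC_MIDDLE = 2, WC_END = 3 };

/* Hypothetical per-position storage cell. */
struct cell {
	uint8_t ch;	/* 8-bit slice of the character value */
	uint8_t align;	/* position of this cell within a wide character */
};

/*
 * Split the low `ncells` bytes of `wc` over consecutive cells starting
 * at `col`, most significant byte first.  The caller must make sure the
 * row has room for `ncells` cells.
 */
static void
put_wide(struct cell *row, size_t col, uint32_t wc, size_t ncells)
{
	assert(ncells >= 1 && ncells <= 4);
	for (size_t i = 0; i < ncells; i++) {
		unsigned shift = 8 * (unsigned)(ncells - 1 - i);

		row[col + i].ch = (uint8_t)((wc >> shift) & 0xFF);
		if (ncells == 1)
			row[col + i].align = WC_NONE;
		else if (i == 0)
			row[col + i].align = WC_BEGIN;
		else if (i == ncells - 1)
			row[col + i].align = WC_END;
		else
			row[col + i].align = WC_MIDDLE;
	}
}

/* Reassemble the value stored in `ncells` cells starting at `col`. */
static uint32_t
get_wide(const struct cell *row, size_t col, size_t ncells)
{
	uint32_t wc = 0;

	assert(ncells >= 1 && ncells <= 4);
	for (size_t i = 0; i < ncells; i++)
		wc = (wc << 8) | row[col + i].ch;
	return wc;
}
```

With this sketch, put_wide(row, 1, 0x9ABC, 2) stores 0x9A in row[1] and 0xBC
in row[2], marked as the beginning and the end respectively.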

IMHO, this storage structure does not make cursor positioning more difficult.
Because there will be characters of different widths on a screen (see the
reasons below), moving the cursor up or down simply takes it to the display
cell at the current column, without summing up the widths of all the
preceding cells in the current and next lines.  It may land in the middle of
a wide character, but with the help of the alignment field, the start of the
wide character can be easily located, if necessary.
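
As a rough sketch of that lookup, assuming the same hypothetical cell layout
(an 8-bit value plus an alignment marker per display position), the start of
a wide character can be found by stepping left until a cell that is not a
middle or end cell is reached.  The names below are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical alignment markers for one display cell. */
enum { WC_NONE = 0, WC_BEGIN = 1, WC_MIDDLE = 2, WC_END = 3 };

/* Hypothetical per-position storage cell. */
struct cell {
	uint8_t ch;	/* 8-bit slice of the character value */
	uint8_t align;	/* position of this cell within a wide character */
};

/*
 * Return the column at which the character covering `col` begins.
 * Single-width cells and begin cells are their own start; middle and
 * end cells are walked leftwards, at most three steps for a character
 * of four cells.
 */
static size_t
wide_start(const struct cell *row, size_t col)
{
	while (col > 0 &&
	    (row[col].align == WC_MIDDLE || row[col].align == WC_END))
		col--;
	return col;
}

/*
 * Vertical cursor motion: keep the column as it is, then snap to the
 * start of whatever character occupies that column in the target row.
 * No widths are summed along either row.
 */
static size_t
move_vertical(const struct cell *target_row, size_t col)
{
	return wide_start(target_row, col);
}
```

The cost of the lookup depends only on the width of one character, not on
how many characters precede it on the line.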

Besides, there were suggestions in the discussion that all wide characters on
the same screen should have the same width.  From the perspective of a user of
wide-character applications, I think that is too restrictive.  A user may want
to mix single-width and wide characters on the same screen because it saves
screen space and looks nicer.  For example, phone numbers and English
addresses are normally displayed in single-width form.  There is a
single/wide character switch function in all Chinese input method modules
just for this purpose.  So, I think the storage structure should have a way to
indicate the width of the character, although it does not necessarily need a
whole byte.
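
One possible way to avoid spending a whole byte, sketched under the
assumption that the cell already has a 32-bit attribute word with two spare
high bits (the bit positions and function names here are hypothetical, not
taken from the existing curses code), is to keep the width in those two bits:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the top two bits of the attribute word hold width - 1. */
#define WIDTH_SHIFT	30u
#define WIDTH_MASK	((uint32_t)0x3 << WIDTH_SHIFT)

/* Return `attr` with its width bits set to encode `width` (1 to 4). */
static uint32_t
attr_set_width(uint32_t attr, unsigned width)
{
	assert(width >= 1 && width <= 4);
	return (attr & ~WIDTH_MASK) | ((uint32_t)(width - 1) << WIDTH_SHIFT);
}

/* Extract the width (1 to 4) encoded in `attr`. */
static unsigned
attr_get_width(uint32_t attr)
{
	return (unsigned)((attr & WIDTH_MASK) >> WIDTH_SHIFT) + 1;
}
```

Two bits cover widths of one to four cells, and the remaining attribute bits
are left untouched.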

Anyway, it is always good to have more people discuss the proposed solutions.
I really appreciate it.  I plan to implement the several viable alternative
storage cell structures discussed here, and compare their memory usage and
performance to find the best solution.

Please keep sending your comments and feedback.  Thanks.

Ruibiao