Subject: Re: [Summer of Code]Wide Character Support in curses
To: Julian Coleman <jdc@coris.org.uk>
From: Ruibiao Qiu <ruibiao@arl.wustl.edu>
List: tech-userlevel
Date: 06/12/2005 16:23:42
On Tue, 7 Jun 2005, Julian Coleman wrote:

> In order to support these functions, the curses internal storage of
> characters and attributes needs to be modified.  For example, each
> character position might be described by a structure containing:
>
> 	character value (32 bits)
> 	character attributes (32 bits)
> 	character width
> 	non-spacing character list/pointer

As Thor pointed out, this layout could quadruple the memory footprint.  It can
be argued that wide characters are normally also multi-column characters, so
there are fewer characters in a line, and the memory footprint would not
necessarily quadruple.  However, it could still more than double the memory
for certain character sets, e.g. two-column character sets such as simplified
and traditional Chinese and possibly Japanese Kanji.
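
As a rough illustration of this arithmetic, here is a minimal sketch in C.
The struct names, field types and helper functions are my own assumptions
for the sake of the example, not the actual curses definitions, and the
real sizes depend on the platform's padding and pointer width.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-character layout following the four quoted fields. */
struct wide_cell {
    uint32_t  ch;     /* character value (32 bits) */
    uint32_t  attr;   /* character attributes (32 bits) */
    uint8_t   width;  /* number of columns occupied */
    uint32_t *nsp;    /* non-spacing character list, NULL if none */
};

/* Hypothetical compact cell: 8-bit character value plus attributes. */
struct narrow_cell {
    uint8_t  ch;
    uint32_t attr;
};

/*
 * Bytes for one line of `cols` columns when every character is `width`
 * columns wide and each character gets one wide_cell.
 */
size_t wide_line_bytes(size_t cols, size_t width)
{
    return (cols / width) * sizeof(struct wide_cell);
}

/* Bytes for one line of `cols` columns with one narrow_cell per column. */
size_t narrow_line_bytes(size_t cols)
{
    return cols * sizeof(struct narrow_cell);
}
```

Comparing wide_line_bytes(80, 2) with narrow_line_bytes(80) shows the
effect: halving the number of cells only partly offsets the larger cell.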

To reduce memory usage, I propose a structure different from my original
one.  Essentially, it is about the same as the existing storage structure:
the character value is still an 8-bit character.  In this case we don't need
a width field, and an m-column wide character uses m storage structures.  To
preserve the meaning of wide characters, we need to add an alignment (or
position-in-word) attribute that indicates the start of a wide character.
The value of a wide character can be recovered with fast bit operations from
its storage structures, given the correct alignment and order.  Similarly,
when inserting a wide character, bit operations can distribute the character
value across the m structures and set the alignment attribute accordingly.
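
To make the idea more concrete, here is a minimal sketch in C.  Everything
in it is an assumption for illustration: the struct cell layout, the choice
of the top two attribute bits for the position field, the most-significant-
byte-first order, and the put_wide()/get_wide() names.  It also assumes the
character value fits in m bytes and that m is at most 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: top two attribute bits hold the position-in-character. */
#define POS_SHIFT 30u
#define POS_MASK  (3u << POS_SHIFT)

struct cell {
    uint8_t  ch;    /* one byte of the character value */
    uint32_t attr;  /* attributes, including the position field */
};

static unsigned cell_pos(const struct cell *c)
{
    return (unsigned)((c->attr & POS_MASK) >> POS_SHIFT);
}

/*
 * Store an m-column character (1 <= m <= 4) in cells[0..m-1], most
 * significant byte first.  Position 0 marks the start of the character.
 */
void put_wide(struct cell *cells, uint32_t value, unsigned m, uint32_t attr)
{
    assert(m >= 1 && m <= 4);
    for (unsigned i = 0; i < m; i++) {
        unsigned shift = 8u * (m - 1u - i);
        cells[i].ch = (uint8_t)(value >> shift);
        cells[i].attr = (attr & ~POS_MASK) | ((uint32_t)i << POS_SHIFT);
    }
}

/*
 * Recover the character that starts at cells[0], with n cells available.
 * Returns the number of columns consumed, or 0 if cells[0] is not the
 * start of a character.
 */
unsigned get_wide(const struct cell *cells, size_t n, uint32_t *value)
{
    if (n == 0 || cell_pos(&cells[0]) != 0)
        return 0;
    uint32_t v = cells[0].ch;
    unsigned m = 1;
    /* Continuation cells carry positions 1, 2, 3 in order. */
    while (m < n && m < 4 && cell_pos(&cells[m]) == m) {
        v = (v << 8) | cells[m].ch;
        m++;
    }
    *value = v;
    return m;
}
```

With this layout a narrow character is simply a one-cell character at
position 0, so existing 8-bit text needs no special case.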

These are just some initial ideas of mine, and I may have overlooked some
points.  Any suggestions and feedback are highly appreciated.  Please cc your
reply to me, as I am not currently subscribed to the tech-userlevel mailing
list.  Thanks.

Ruibiao