tech-userlevel: Re: [Summer of Code]Wide Character Support in curses

Subject: Re: [Summer of Code]Wide Character Support in curses
To: Ruibiao Qiu <ruibiao@arl.wustl.edu>
From: Brett Lymn <blymn@baesystems.com.au>
List: tech-userlevel
Date: 06/13/2005 17:58:22

On Sun, Jun 12, 2005 at 04:23:42PM -0500, Ruibiao Qiu wrote:
> 
> As Thor pointed out, this layout could quadruple the memory
> footprint. 
>

Yes, it would more than quadruple it, but no matter.  There is no real
way of avoiding this if wide characters are going to be supported.
Even just using the wide character type quadruples memory
utilisation.  This is why Julian was suggesting a method of tailoring
out the wide character support, so that low-memory machines would
still function, albeit without wide character support.
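
As a rough illustration of the tailoring idea, here is a minimal
sketch.  CURSES_WIDE, cell_char_t and screen_char_bytes() are made-up
names for this example, not existing curses identifiers, and the
32-bit type simply models the wide character size discussed in this
thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical build switch: CURSES_WIDE is an assumed name, not a
 * real curses option.  With it defined each character value takes
 * 32 bits, without it the value stays at 8 bits.
 */
#ifdef CURSES_WIDE
typedef uint32_t cell_char_t;
#else
typedef uint8_t cell_char_t;
#endif

/*
 * Bytes needed for the character values alone of a rows x cols
 * screen.  Attributes and any other per-cell data are not counted.
 */
static size_t
screen_char_bytes(size_t rows, size_t cols)
{
	/* Guard against overflow in the multiplication below. */
	assert(rows == 0 || cols <= SIZE_MAX / rows / sizeof(cell_char_t));
	return rows * cols * sizeof(cell_char_t);
}
```

For a 24x80 screen this gives 1920 bytes for the narrow build and 7680
bytes for the wide one, which is the quadrupling mentioned above
before anything else is added to the cell.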

> It 
> can be argued that wide characters normally are also multi-column 
> characters, therefore there are less characters in a line, and thus the 
> memory footprint may not necessarily quadruple.  However, it still could 
> more than double the memory for certain character sets, e.g. 2-character 
> sets like simplified and traditional Chinese and possibly Japanese Kanji.
> 

One thing that you need to be mindful of is that you don't create
yourself a cursor positioning hell.  If you don't have character cells
that are all the same size (or can be handled consistently) then
positioning the cursor on the screen will turn into a nightmare.  This
is why Julian and I came up with using a structure to hold the
character, the attributes and the associated non-spacing characters -
it makes addressing the character array consistent.  It was the best
we could think of... there may be a better way but we couldn't see it.
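
A sketch of what such a structure could look like follows.  The
layout is an assumption for illustration only: the names wcell,
MAX_NSP and wcell_add_nsp() are hypothetical, and the fixed-size array
for non-spacing characters is just one possible choice, not
necessarily the design Julian and I settled on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NSP 4	/* hypothetical cap on non-spacing chars per cell */

/*
 * One screen cell.  Every cell has the same size, so the screen can
 * be treated as a plain array of cells.
 */
typedef struct {
	uint32_t ch;		/* base (spacing) wide character */
	uint32_t attr;		/* attributes for this cell */
	uint32_t nsp[MAX_NSP];	/* non-spacing characters on top of ch */
	uint8_t  nsp_count;	/* number of entries used in nsp[] */
} wcell;

/*
 * Attach a non-spacing character to a cell.
 * Returns 0 on success, -1 if the cell has no room left.
 */
static int
wcell_add_nsp(wcell *c, uint32_t mark)
{
	assert(c != NULL);
	if (c->nsp_count >= MAX_NSP)
		return -1;
	c->nsp[c->nsp_count++] = mark;
	return 0;
}
```

The cost of the consistent addressing is visible here: each cell
carries the non-spacing storage whether it is used or not, which is
part of why the footprint more than quadruples.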

> To improve the memory usage, I propose a different structure than my 
> original structure. Essentially, it is about the same as the existing 
> storage structure.  That is, the character value is still an 8-bit 
> character.

No.  A wide character is 32 bits.
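
To put numbers on that, here is a small sketch.  It assumes my reading
of the proposal is right, namely that each column cell contributes 8
bits of character value, so an m-column character has 8*m bits
available.  fits_in_cells() is a hypothetical helper written for this
example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Can the 32-bit wide character value wc be stored in m cells that
 * each hold 8 bits of character value?  Returns 1 if so, 0 if not.
 */
static int
fits_in_cells(uint32_t wc, unsigned int m)
{
	assert(m > 0);
	if (m >= 4)
		return 1;	/* 32 bits or more are available */
	return wc < ((uint32_t)1 << (8 * m));
}
```

With this, fits_in_cells(0x4E2D, 2) is true but
fits_in_cells(0x20000, 2) is false: a two-column character whose
value needs more than 16 bits cannot be split over two 8-bit cells.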

> In this case, we don't need a width field, and a m-column wide 
> character uses m storage structures.  To make it represent the correct 
> meaning of the wide characters, we need to add an attribute of alignment or 
> position-in-word to indicate the start of a wide character.  The value of a 
> wide character can be recovered with fast bit operations from all 
> characters with the correct alignment and order.  Similarly, when inserting 
> a wide character, a bit operation can put the character values in the m 
> structure, and set up the alignment attribute right.
> 

Inserting characters is not the hard bit.  Probably the worst bit is
working out what part of the character array you need to address when
you move the cursor down one row (for example).  Any scheme you come
up with must be able to simply and quickly determine what row/column
maps to what bit of memory in the curses in-memory screen
representation.
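
With cells that are all the same size, that mapping can be a single
multiply and add.  The following is a sketch under that assumption
only; scr_cell, scr_buf, cell_at() and cell_below() are names invented
for this example and are not the real curses internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size cell: one per screen column. */
typedef struct {
	uint32_t ch;	/* wide character value */
	uint32_t attr;	/* attributes */
} scr_cell;

/* Hypothetical screen: rows * cols cells stored row by row. */
typedef struct {
	size_t    rows;
	size_t    cols;
	scr_cell *cells;
} scr_buf;

/* Map (row, col) to its cell, or NULL if off the screen. */
static scr_cell *
cell_at(scr_buf *s, size_t row, size_t col)
{
	assert(s != NULL);
	if (row >= s->rows || col >= s->cols)
		return NULL;
	return &s->cells[row * s->cols + col];
}

/* Moving down one row is just a step of cols cells. */
static scr_cell *
cell_below(scr_buf *s, size_t row, size_t col)
{
	return cell_at(s, row + 1, col);
}
```

If cells could vary in size, cell_below() would have to walk the row
from its start to find the right column, which is exactly the
positioning cost I would like to avoid.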

-- 
Brett Lymn