Subject: Re: [Summer of Code]Wide Character Support in curses
To: None <tech-userlevel@netbsd.org>
From: James K. Lowden <jklowden@schemamania.org>
List: tech-userlevel
Date: 06/12/2005 20:44:45
Ruibiao Qiu wrote:
> On Tue, 7 Jun 2005, Julian Coleman wrote:
> 
> > In order to support these functions, the curses internal storage of
> > characters and attributes needs to be modified.  For example, each
> > character position might be described by a structure containing:
> >
> > 	character value (32 bits)
> > 	character attributes (32 bits)
> > 	character width
> > 	non-spacing character list/pointer

One way to address Thor's concern would be simply to make the size of the
character value a compile-time constant.  Effectively it's 1 byte today;
using 2 bytes would meet the vast majority of needs.  It's hard to imagine
UTF-32 *curses* applications.

ISTM that wide characters have the same attributes as "narrow" ones, so
that storage requirement doesn't change.  Looking at
/usr/include/curses.h, attributes in __LDATA are 18 bits (and data 8).  A
little preprocessor magic should let you dereference __LDATA differently,
depending on whether you're using 1-byte characters (data in bits 0-7 of
__LDATA) or wider ones (data in the next word).
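
To make the "preprocessor magic" concrete, here is a rough sketch of what I
have in mind.  Everything in it is made up for illustration: cell_t,
CELL_CHAR_BYTES and the CELL_* macros are hypothetical names, not the real
__LDATA layout or any existing curses option.  The only number taken from
curses.h is the 18 attribute bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical build-time knob: bytes per character value (1, 2 or 4). */
#ifndef CELL_CHAR_BYTES
#define CELL_CHAR_BYTES 2
#endif

#if CELL_CHAR_BYTES != 1 && CELL_CHAR_BYTES != 2 && CELL_CHAR_BYTES != 4
#error "CELL_CHAR_BYTES must be 1, 2 or 4"
#endif

#define CELL_ATTR_BITS 18u                       /* attribute bits, as above */
#define CELL_ATTR_MASK ((1u << CELL_ATTR_BITS) - 1u)

#if CELL_CHAR_BYTES == 1
/* Narrow build: character in bits 0-7, attributes packed above it. */
typedef struct { uint32_t bits; } cell_t;
#define CELL_CH_MAX 0xffu
#define CELL_CH(c)   ((uint32_t)((c)->bits & 0xffu))
#define CELL_ATTR(c) (((c)->bits >> 8) & CELL_ATTR_MASK)
#define CELL_SET(c, v, a) \
    ((c)->bits = ((uint32_t)(v) & 0xffu) | \
                 (((uint32_t)(a) & CELL_ATTR_MASK) << 8))
#else
/* Wide build: attributes keep their own word, character goes in the next. */
#if CELL_CHAR_BYTES == 2
typedef uint16_t cell_char_t;
#define CELL_CH_MAX 0xffffu
#else
typedef uint32_t cell_char_t;
#define CELL_CH_MAX 0xffffffffu
#endif
typedef struct { uint32_t attr; cell_char_t ch; } cell_t;
#define CELL_CH(c)   ((uint32_t)(c)->ch)
#define CELL_ATTR(c) ((c)->attr & CELL_ATTR_MASK)
#define CELL_SET(c, v, a) \
    ((c)->ch = (cell_char_t)(v), \
     (c)->attr = (uint32_t)(a) & CELL_ATTR_MASK)
#endif

/* Store a character and its attributes; callers never see the layout. */
static void cell_put(cell_t *c, uint32_t ch, uint32_t attr)
{
    assert(ch <= CELL_CH_MAX);   /* value must fit the configured size */
    CELL_SET(c, ch, attr);
}
```

Code that goes through CELL_CH()/CELL_ATTR() compiles unchanged for either
layout, which is the whole point; whether that is worth the extra #ifdefs
is the question I raise next.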

(I'm not sure this is worth the effort and complexity, really.  I wonder
what platform/application Thor has in mind that would be materially
affected by even a 4x increase in curses memory?  How much memory are we
talking about?  The smaller the device, the more languages it needs to
support....)  
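
For a sense of scale, a back-of-the-envelope calculation.  The numbers are
my assumptions, not measurements: an 80x24 window, and a 4-byte cell today
(attributes and data sharing one 32-bit word).  window_bytes() is just a
helper for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to store one window's worth of cells. */
static size_t window_bytes(size_t rows, size_t cols, size_t cell_bytes)
{
    assert(cell_bytes > 0);
    return rows * cols * cell_bytes;
}

/* Assumed figures: 80x24 window, 4-byte cell today, 16-byte cell at 4x. */
enum { ROWS = 24, COLS = 80, CELL_NOW = 4, CELL_4X = 16 };
```

Under those assumptions window_bytes(ROWS, COLS, CELL_NOW) is 7680 bytes
and window_bytes(ROWS, COLS, CELL_4X) is 30720 bytes per window.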

The width of a given character is fixed; it needn't be stored per cell.
E.g., 'aaaaaa' has only one character width, that of 'a'.  And what's the
domain of character widths?  0-3 cells, no?  Can any character be wider
than that?  That needs only 2 bits/character, or 16 KB for the whole of
UCS-2.
You can use the character value to index into the width map.  Actually,
there *is* room in __LDATA for 2 bits of width data, but I'm from the
"never copy, always reference" school of data management, so I would
always refer to the map.  
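
A minimal sketch of such a width map, assuming UCS-2 (65536 values) and 2
bits per character.  The names width_map, width_get and width_set are
hypothetical, and a real table would presumably be generated from the
Unicode data rather than filled in by hand as here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WIDTH_BITS      2u
#define UCS2_COUNT      65536u
#define WIDTH_MAP_BYTES (UCS2_COUNT * WIDTH_BITS / 8u)   /* 16384 bytes */

/* Four 2-bit width entries per byte, indexed by character value. */
static uint8_t width_map[WIDTH_MAP_BYTES];

/* Default every character to width 1 (0x55 is binary 01 01 01 01). */
static void width_map_init(void)
{
    memset(width_map, 0x55, sizeof width_map);
}

/* Record the width (0-3 cells) of one character value. */
static void width_set(uint16_t ch, unsigned w)
{
    unsigned shift = (ch & 3u) * WIDTH_BITS;

    assert(w <= 3u);
    width_map[ch >> 2] =
        (uint8_t)((width_map[ch >> 2] & ~(3u << shift)) | (w << shift));
}

/* Look up the width of a character value; nothing is stored per cell. */
static unsigned width_get(uint16_t ch)
{
    return (width_map[ch >> 2] >> ((ch & 3u) * WIDTH_BITS)) & 3u;
}
```

After width_map_init(), width_set(0x4e2d, 2) marks a CJK ideograph as two
cells wide and width_set(0x0301, 0) marks the combining acute accent as
zero cells wide; width_get('a') still returns 1.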

Nonspacing characters have a width of zero.  I don't see any advantage to
maintaining a separate list of them.  If you do, though, the list will be
short; there aren't many.  

Sounds like an interesting project.  

--jkl