tech-userlevel: Re: utf-8 and userland

Subject: Re: utf-8 and userland
To: None <tech-userlevel@NetBSD.org>
From: James K. Lowden <jklowden@schemamania.org>
List: tech-userlevel
Date: 03/12/2004 20:02:45

On Sat, 13 Mar 2004, Noriyuki Soda <soda@sra.co.jp> wrote:
> >>>>> On Fri, 12 Mar 2004 13:23:09 -0800, "Wolfgang S. Rupprecht"
> 	<wolfgang+gnus20040312T095618@dailyplanet.dontspam.wsrcc.com>
> 	said:
> 
> > I wonder if there is already a table of tables listing the chars that
> > can safely be output for each codeset.
> 
> Yes, you can use iswprint(3) by converting the multibyte characters
> to wide characters.

I don't see how that can be right.  iswprint(3) takes a wint_t argument;
the UTF-8 character will be a sequence of 1-4 bytes.  Even if you redefine
the argument, how is ls(1) supposed to know where the character boundaries
are?  
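For comparison, here is a C sketch of the approach suggested in the quoted reply. mbrtowc(3) finds the character boundaries: it decodes one multibyte character at a time and returns how many bytes it consumed. The caller steps through the string by that count and tests each decoded wide character with iswprint(3). The sketch assumes the program has already called setlocale(LC_CTYPE, "") so that decoding follows the user's codeset. The function name sanitize_name and the use of '?' as the replacement are hypothetical choices for illustration, not ls(1)'s actual code.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: copy the multibyte string s into out, replacing
 * each nonprintable character (and each byte of an invalid or truncated
 * sequence) with '?'.  Printable characters keep their original bytes and
 * each replacement is one byte, so out needs at most strlen(s) + 1 bytes.
 * Assumes setlocale(LC_CTYPE, ...) has selected the codeset to decode.
 */
void
sanitize_name(const char *s, char *out)
{
	mbstate_t st;
	size_t len;

	assert(s != NULL && out != NULL);
	len = strlen(s);
	memset(&st, 0, sizeof st);

	while (len > 0) {
		wchar_t wc;
		/* n = number of bytes making up the next character */
		size_t n = mbrtowc(&wc, s, len, &st);

		if (n == (size_t)-1 || n == (size_t)-2) {
			/* invalid or incomplete sequence: replace one byte */
			*out++ = '?';
			s++;
			len--;
			memset(&st, 0, sizeof st);	/* resynchronize */
			continue;
		}
		if (n == 0)		/* decoded a NUL; cannot occur before len hits 0 */
			break;
		if (iswprint((wint_t)wc)) {
			memcpy(out, s, n);	/* keep all n bytes of the character */
			out += n;
		} else {
			*out++ = '?';
		}
		s += n;
		len -= n;
	}
	*out = '\0';
}
```

For example, in a UTF-8 locale "caf\xc3\xa9" is copied unchanged, because the two bytes of é decode to one printable character. A tab, or a stray 0xff byte, comes out as '?'.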

It's my understanding that "wide characters" refer to a class of encodings
that predate Unicode and UTF-8.  New times, new features....

--jkl