Subject: Re: isprint() and isblank()
To: Noriyuki Soda <soda@sra.co.jp>
From: None <itojun@iijlab.net>
List: tech-userlevel
Date: 01/21/2001 11:33:29
>So, the "_B" bit in _ctype_[] is only used for the isprint() test,
>and never for the isblank() test.
>As itojun-san pointed out, we cannot set the _B bit for the '\t' character,
>because it breaks the isprint() implementation.
>But that is not a problem, because we do not actually use the _B bit,
>and don't have to use it in the future. We will have to use another
>mechanism for the isblank() implementation with loadable LC_CTYPE,
>though.

	let me recap.

	- isblank() is hardcoded to the C locale definition (' ' or '\t').
	  isblank() is a function.
	- _B is used for isprint(): if _B is set, isprint() returns true.
	  isprint() is a macro.
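	a minimal sketch of the current split, with hypothetical names
	(ctype_tab, CT_*, my_isprint, my_isblank) standing in for _ctype_,
	the _U/_L/_N/_S/_P/_C/_X/_B bits and the real ctype.h definitions;
	the bit values here are assumptions, not the ones in ctype.h.  the
	table lookup is baked into the caller by the macro, while isblank()
	is an ordinary function comparing against ' ' and '\t':

```c
#include <assert.h>
#include <stdio.h>   /* EOF */

/* hypothetical bits modelled on the BSD <ctype.h> flags; values are illustrative */
#define CT_U 0x01  /* upper case */
#define CT_L 0x02  /* lower case */
#define CT_N 0x04  /* digit */
#define CT_S 0x08  /* white space */
#define CT_P 0x10  /* punctuation */
#define CT_C 0x20  /* control */
#define CT_X 0x40  /* hex digit */
#define CT_B 0x80  /* "blank" */

/* 257 entries: slot 0 is EOF (-1), slot c + 1 is character c */
static unsigned char ctype_tab[1 + 256];

/* table-driven macro, like the one compiled into existing binaries:
   _B is part of the mask, so any character carrying _B is "printable" */
#define my_isprint(c) \
    ((int)((ctype_tab + 1)[(c)] & (CT_P | CT_U | CT_L | CT_N | CT_B)))

/* hardcoded C locale definition, independent of the table */
int my_isblank(int c)
{
    assert(c == EOF || (c >= 0 && c <= 255));
    return c == ' ' || c == '\t';
}

/* fill in just enough of a C-locale table for the example:
   ' ' gets _S|_B, '\t' gets _C|_S (no _B), as in the current table */
void init_table(void)
{
    for (int i = 0; i < 1 + 256; i++)
        ctype_tab[i] = 0;
    for (int c = 'a'; c <= 'z'; c++) ctype_tab[c + 1] |= CT_L;
    for (int c = 'A'; c <= 'Z'; c++) ctype_tab[c + 1] |= CT_U;
    for (int c = '0'; c <= '9'; c++) ctype_tab[c + 1] |= CT_N;
    ctype_tab[' ' + 1]  = CT_S | CT_B;
    ctype_tab['\t' + 1] = CT_C | CT_S;
}
```

	with this table, my_isprint(' ') is nonzero only because of _B,
	my_isprint('\t') is 0, and my_isblank('\t') is 1 without ever
	looking at the table.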

	so, with the current macro and libc function, _B means "isprint() is
	true even if isgraph() is not".  _B does not mean "isblank() is
	true" (which is what _B should mean - chrtbl(8) and many comments
	say so).
	the real problem is in isprint(), not isblank(), if we keep the
	current locale table.  and isprint() cannot be replaced in already
	compiled binaries, since the macro is expanded inline.

	now, locale files:

	- if we load INCORRECT locale table (which sets _B for ' ' only)
	  isprint() will behave correctly (true for ' ', false for '\t')
	- if we load CORRECT locale table (which sets _B for ' ' and '\t')
	  isprint() will behave incorrectly (true for both ' ' and '\t')

	- the problem has been hidden because the C locale _ctype_ table and
	  the old locale table files are all INCORRECT.
	- we now ship with a correct locale table, and
	  lib/libc/locale/runeglue.c converts it into _ctype_.  now _ctype_
	  can carry the correct locale bit declarations, and isprint() can
	  behave strangely.
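	to make the effect concrete, a sketch assuming a hypothetical
	conversion that, in the spirit of runeglue.c, maps a "blank" class
	in the locale file straight onto _B (RC_*, CT_*, convert_to_ctype()
	and compiled_isprint() are all made up for illustration; this is not
	the real runeglue.c code):

```c
#include <assert.h>
#include <stdio.h>   /* EOF */
#include <string.h>

/* hypothetical class bits as read from a locale declaration file */
#define RC_BLANK 0x1u
#define RC_SPACE 0x2u
#define RC_CNTRL 0x4u

/* hypothetical _ctype_-style bits (cf. _S, _C, _B in BSD <ctype.h>) */
#define CT_S 0x08
#define CT_C 0x20
#define CT_B 0x80
#define CT_PRINT_MASK (0x01 | 0x02 | 0x04 | 0x10 | CT_B) /* _U|_L|_N|_P|_B */

/* naive conversion: the locale's "blank" class becomes _B */
void convert_to_ctype(const unsigned rune[256], unsigned char tab[257])
{
    memset(tab, 0, 257);   /* slot 0 is EOF */
    for (int c = 0; c < 256; c++) {
        if (rune[c] & RC_SPACE) tab[c + 1] |= CT_S;
        if (rune[c] & RC_CNTRL) tab[c + 1] |= CT_C;
        if (rune[c] & RC_BLANK) tab[c + 1] |= CT_B;
    }
}

/* what the isprint() macro compiled into old binaries computes */
int compiled_isprint(const unsigned char tab[257], int c)
{
    assert(c == EOF || (c >= 0 && c <= 255));
    return ((tab + 1)[c] & CT_PRINT_MASK) != 0;
}
```

	fed an INCORRECT description ('\t' not blank), compiled_isprint()
	is false for '\t'; fed a CORRECT one, it turns true, which is the
	breakage described above.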

	i can think of a couple of workarounds.  the goal is to not break
	compiled binaries.  now my question is, which looks best?  for me
	(1a) is the best, but a little bit slower than the current code
	(since it avoids the macro).

	1. keep _ctype_ broken.
	1a. change ctype.h and/or lib/libc/gen/isctype.c.  basically, do:
		#define isprint(x)	iswprint(x)
	    this is not a problem since (1) isprint() is defined only for
	    0x00 to 0xff and -1 (EOF), and (2) wint_t is really an int.  new
	    binaries will always refer to the correct multibyte locale table.
	    when we load a locale declaration file, we apply some trick to _B.
	    PROS: no table-lookup macro, so we no longer need to worry about
		compiled-in macro issues in the future.
	    CONS: slower.
	1b. same as (1a), but expand lib/libc/locale/iswctype.c inline
	    as a macro.
	    PROS: comparable to the current performance (assuming _CACHED_RUNES
		is 255)
	    CONS: macro issue remains.
	1c. don't change ctype.h declarations.
	    when we load a locale declaration file, we apply some trick to _B.
	    PROS: smallest amount of change.
	    CONS: new binaries will have incorrect isprint() and isblank(),
		forever.

	2. fix _ctype_.
	2a. split it into two versions.  when we load a locale declaration
	    file, we apply some trick to _B on the old _ctype_ if the locale
	    declaration file is correct, and on the new _ctype_ if the
	    locale declaration file is incorrect.
	    discourage people from using old locale declaration files.
	    CONS: why should we maintain two ctype tables when we could just
		change the code?  (1a) or (1b) looks much better.
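	a rough sketch of the pieces the options above talk about, with
	every name hypothetical (rune_locale, CACHED_RUNES, RC_*, CT_*,
	isprint_func, isprint_inline): a per-character class table as
	loaded from a locale declaration file, a function-based isprint()
	standing in for iswprint() (1a), the same lookup inlined as a macro
	over the cached range (1b), and one possible form of the load-time
	"trick about _B" (an assumption on my part): set _B in the legacy
	table only for blanks that are also printable, so the isprint()
	macro in old binaries keeps returning false for '\t'.

```c
#include <assert.h>
#include <stdio.h>   /* EOF */
#include <string.h>

/* hypothetical rune-class bits; names and values are illustrative */
#define RC_PRINT 0x1u
#define RC_BLANK 0x2u
#define RC_SPACE 0x4u

#define CACHED_RUNES 256   /* hypothetical: cache covers 0x00..0xff only */

/* hypothetical loaded locale: class bits per cached character */
struct rune_locale {
    unsigned long runetype[CACHED_RUNES];
};

static struct rune_locale cur_locale;

/* (1a) a real function, standing in for iswprint(): libc can change it */
int isprint_func(int c)
{
    assert(c == EOF || (c >= 0 && c < CACHED_RUNES));
    if (c == EOF)
        return 0;
    return (cur_locale.runetype[c] & RC_PRINT) != 0;
}

/* (1b) the same lookup inlined; evaluates c twice and bakes the
   table layout into every caller */
#define isprint_inline(c) \
    ((c) < 0 ? 0 : (int)(cur_locale.runetype[(c)] & RC_PRINT))

/* hypothetical legacy _ctype_-style bits read by old binaries */
#define CT_S 0x08
#define CT_B 0x80
static unsigned char legacy_ctype[1 + CACHED_RUNES];   /* slot 0 = EOF */

/* a CORRECT C-locale description: both ' ' and '\t' are blank */
void load_c_locale(void)
{
    memset(&cur_locale, 0, sizeof cur_locale);
    for (int c = 0x20; c < 0x7f; c++)
        cur_locale.runetype[c] |= RC_PRINT;
    cur_locale.runetype[' '] |= RC_BLANK | RC_SPACE;
    cur_locale.runetype['\t'] |= RC_BLANK | RC_SPACE;
}

/* build the legacy table, applying the _B trick */
void load_legacy_table(void)
{
    memset(legacy_ctype, 0, sizeof legacy_ctype);
    for (int c = 0; c < CACHED_RUNES; c++) {
        unsigned long t = cur_locale.runetype[c];
        if (t & RC_SPACE)
            legacy_ctype[c + 1] |= CT_S;
        /* old binaries read _B as "printable", so only set it for
           blanks that are also printable (' ', not '\t') */
        if ((t & RC_BLANK) && (t & RC_PRINT))
            legacy_ctype[c + 1] |= CT_B;
    }
}
```

	the trade-off between (1a) and (1b) shows up directly: isprint_func()
	can change behind a shared libc without recompiling callers, while
	isprint_inline() avoids the call but hardcodes the table layout into
	every caller.  characters outside the cached range would need the
	full rune lookup, which the sketch omits.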

itojun