Subject: Re: binary/character floating point conversion
To: None <tech-userlevel@netbsd.org>
From: Christos Zoulas <christos@tac.gw.com>
List: tech-userlevel
Date: 03/03/2005 00:13:26
In article <87zmxle6er.fsf@orac.acorntoolworks.com>,
J.T. Conklin <jtc@acorntoolworks.com> wrote:
>One of my co-workers plans to use the C++ log4cxx logging library,
>which requires wide character strings (std::wstring).  I have compared
>the relative complexity of adding wide strings to NetBSD with the work
>necessary to modify log4cxx to only use narrow strings; adding support
>to NetBSD makes the most long term sense.
>
>I've been working on the *wprintf functions (adapted from the narrow
>character implementations in libc), since they are prerequisites for
>libstdc++'s wide string implementation.
>
>I'm about done except for floating point support, which I deferred
>because *printf uses the buffer (of narrow characters) returned by
>__dtoa() directly.  Now that I'm here, it's probably a good idea to
>look at binary/character conversion more generally.
>
>As mentioned in the PRs lib/14168 and lib/18803, our *printf does not
>support long double.  It also appears to have a problem with thread-
>safety (__dtoa() returns a Bigint / char* that is freed the next time
>it is called, if one thread is time-sliced out before it is done with
>the buffer and another calls *printf with a floating point argument,
>it looks like bad things may happen).  David M. Gay's gdtoa binary/
>character library could be used to fix both problems.

Sounds good to me. I don't know how hard it would be to add long double
support to our dtoa() though, and changing our dtoa() API to be thread
safe does not seem that difficult. Our compiler long double support is
a bit in flux too, so I am not sure if that is going to be a short term
win. It looks like this library is supported, and it would be a good
thing for the long term. Someone with floating point clue should give
an opinion here.
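
To make the thread-safety concern concrete, here is a minimal sketch of the
two ownership patterns. It is not the libc or gdtoa code: fmt_shared(),
fmt_owned() and fmt_release() are hypothetical names, and snprintf() stands
in for the real conversion. The first function hands back storage that the
next call overwrites, which is the hazard described above; the second gives
every caller its own buffer plus an explicit release step.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FMT_BUFSZ 32	/* enough for "%.17g" output of a double */

/*
 * Hypothetical stand-in for the shared-result pattern: every call
 * reuses the same storage, so a result is only valid until the next
 * call, from any thread.
 */
const char *
fmt_shared(double d)
{
	static char buf[FMT_BUFSZ];

	snprintf(buf, sizeof(buf), "%.17g", d);
	return buf;
}

/*
 * Hypothetical stand-in for the caller-owned pattern: each call
 * allocates a fresh buffer, so concurrent callers never share storage.
 * Returns NULL if the allocation fails.
 */
char *
fmt_owned(double d)
{
	char *buf = malloc(FMT_BUFSZ);

	if (buf != NULL)
		snprintf(buf, FMT_BUFSZ, "%.17g", d);
	return buf;
}

/* Explicit release step that the caller-owned pattern requires. */
void
fmt_release(char *s)
{
	free(s);
}
```

With fmt_shared(), a pointer saved from one call shows the value of the
next call; with fmt_owned(), each result stays intact until it is passed
to fmt_release().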

>I think the first step would be to integrate the new library and make
>the slight modifications to *printf() to use it (adding an explicit
>freegdtoa() for thread-safety), and adding long double and *wprintf
>support once that's solid.  I'll make a patch to do that first, but
>have some questions.
>
>* Where the dtoa implementation was combined with strtod.c in a single
>  file, the full gdtoa library is ~50 files.  It probably makes sense
>  to import it into dist/gdtoa.  But in the past (long past), it was
>  a requirement for libc and the kernel subtrees to be self contained
>  without vpaths out.  What are the current guidelines?

When I replaced the bind code in libc, I copied the files into the
libc directories, instead of making reach-over changes to the Makefiles.
My motivation was:
	- to be able to upgrade bind independently from the libc
	  resolver.
	- to keep all the libc code in one place
	- to clearly separate out parts of the bind code that were
	  not used in libc and would only contribute to bloat.
I don't know what is best in this case, but I think that following
what FreeBSD did [s/contrib/dist] and providing the whole package
is reasonable. On the other hand there are only 16 files used from
the gdtoa package... I am a bit torn on that. If I were really pressed
to vote, I'd vote for splitting it like FreeBSD did. But don't take
my word on that.

>* The gdtoa library contains a lot of non-standard strto* functions 
>  with options for specifying rounding direction, computing intervals, 
>  etc.  FreeBSD's integration of gdtoa does not include these, I don't
>  think we want to include them in libc either.  Comments?

I agree.

>* The current __dtoa function is treated as an internal implementation
>  detail and is not exported.  I assume that we want to do the same
>  for a gdtoa based implementation, and the gdtoa.h header would not
>  be installed?

Yes, I don't think that exposing the internal implementation details
is beneficial in this case.

>* dtoa's strtod.c was modified with platform specific #define's that
>  describe the floating point type, etc.  The gdtoa library includes
>  the "arithchk" program which figures it out and generates "arith.h";
>  Similarly, it includes the "qnan" program which figures out the bit
>  patterns for quiet NANs.  Like gdtoa.h, I think these headers would 
>  be private to libc.
>
>  Should we have a single "arith.h", or use arithchk to generate one
>  header per architecture and check it in libc/arch/<cpu>/arith.h or 
>  libc/arch/<cpu>/gdtoa/arith.h?  Likewise for gd_qnan.h?

It depends on how different they are. Is every platform different, or
do they fall into categories? How ugly would the ifdefs be if we put
them into a single file? I can't answer that without looking into this
in detail.
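
As a rough illustration of the single-file option, the sketch below assumes
the platforms fall into just two categories, little-endian and big-endian
IEEE doubles, and selects between the IEEE_8087 and IEEE_MC68k macros that
the dtoa/gdtoa sources test for. The __BYTE_ORDER__ tests rely on macros
predefined by GCC and Clang, high_word() is a hypothetical helper added
only to show the effect, and everything else arithchk would emit is left
out. Any platform whose floating point word order differs from its integer
byte order would need a further category.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Sketch of a shared arith.h: choose the word-order macro from the
 * compiler's predefined byte-order macros instead of running arithchk.
 */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define IEEE_8087	/* least significant word of a double comes first */
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define IEEE_MC68k	/* most significant word of a double comes first */
#else
#error "unknown floating point layout; needs a per-architecture arith.h"
#endif

/*
 * Hypothetical helper: return the 32-bit word of a double that holds
 * the sign, the exponent and the top bits of the mantissa.
 */
uint32_t
high_word(double d)
{
	uint32_t w[2];

	assert(sizeof(d) == sizeof(w));
	memcpy(w, &d, sizeof(w));
#ifdef IEEE_8087
	return w[1];
#else
	return w[0];
#endif
}
```

If the per-architecture output of arithchk turns out to differ in more than
word order, the chain of conditionals grows with it, which is the ugliness
trade-off in question.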

Thanks for doing all that! 

christos

PS: Any updates on your string regression tests? I'd really like to add
them to src/regress when they are ready!