current-users: Re: sort manpage is obscure on this point

Subject: Re: sort manpage is obscure on this point
To: None <vax@linkdead.paranoia.com>
From: Jim Meyering <meyering@asic.sc.ti.com>
List: current-users
Date: 06/25/1996 21:40:48

From: VaX#n8 <vax@linkdead.paranoia.com>
Date: Tue, 25 Jun 1996 00:31:24 -0500

| NetBSD uses the GNU sort from the textutils.

In textutils-1.18 (the most recent release), sort -g does what you want,
in that it converts strings to doubles using strtod.

This option is mentioned in the output from `sort --help':

  % sort --h |grep -e -g
    -g               compare according to general numerical value, imply -b
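
As a rough illustration of what "converts strings to doubles using
strtod" means for ordering, here is a minimal C sketch. It is not the
textutils source; the names compare_general and compare_general_qsort
are hypothetical, and only the standard strtod and qsort calls are
assumed.

```c
#include <stdlib.h>

/* Hypothetical helper (not from textutils): order two strings by the
   double that strtod() produces for each.  strtod skips leading white
   space and accepts an optional sign and an exponent. */
static int compare_general(const char *a, const char *b)
{
    double x = strtod(a, NULL);
    double y = strtod(b, NULL);

    /* Yields -1, 0, or 1 without risking overflow from subtraction. */
    return (x > y) - (x < y);
}

/* Adapter so the helper can be passed to qsort() on an array of
   C strings (each element is a const char *). */
static int compare_general_qsort(const void *pa, const void *pb)
{
    const char *const *a = pa;
    const char *const *b = pb;

    return compare_general(*a, *b);
}
```

With this comparison, the strings "-4", "1", "+2", and "1e3" come out
in that order, because strtod reads "+2" as 2 and "1e3" as 1000.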

| What the sort manpage does not say is that its numeric comparison does
| not use atoi() or any of the normal string-to-machine conversions but
| rather uses its own (admittedly fast) algorithm.
|
| However, it does not handle leading "+"s very well, where the other ones do.

POSIX says that sort -n should handle an optional leading `-',
but doesn't mention `+'.  I just tried /bin/sort on SunOS4.1.3,
HPUX-9.0.5, and HPUX-10.10.  None of them recognize a leading `+'
as part of a numeric field.  All did this:

  % printf '+2\n1\n' |/bin/sort -n
  +2
  1

On what system did the vendor-supplied sort -n handle a leading `+'
differently?
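
The output above is consistent with a comparison that recognizes only
an optional `-' followed by digits, so that "+2" has no numeric prefix
and counts as zero.  The following C sketch shows that idea under
stated assumptions: leading_integer is a hypothetical name, it handles
integers only, it does no overflow checking, and it is not the code
used by GNU sort or by any of the vendor sorts mentioned here.

```c
#include <ctype.h>

/* Hypothetical parser: skip blanks, accept an optional '-', then read
   decimal digits.  A leading '+' is not accepted, so a string such as
   "+2" has no numeric prefix and yields 0. */
static long leading_integer(const char *s)
{
    long value = 0;
    int negative = 0;

    while (*s == ' ' || *s == '\t')
        s++;
    if (*s == '-') {
        negative = 1;
        s++;
    }
    while (isdigit((unsigned char)*s)) {
        value = value * 10 + (*s - '0');
        s++;
    }
    return negative ? -value : value;
}
```

Under this rule "+2" evaluates to 0 and "1" to 1, which would place
the `+2' line first, as in the transcript.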

| It also does not do scientific notation.
| In any case, the man page ought to say this.

Regarding the man page, I am no longer keeping it up to date with
the code.  The authoritative documentation is in texinfo.  In the
distribution it's the file doc/textutils.texi; once you've run
`make install', it's also in $(prefix)/info/textutils.info*
(where prefix is often /usr/local).

If someone is willing to keep the man pages in sync with the `real'
documentation, *and* they are prepared to deal with paperwork disclaiming
their changes, I'll gladly accept patches.

Here's an excerpt from what I get when I type `info sort'
(the GNU info program is part of the texinfo distribution):

`-g'
     Sort numerically, but use strtod(3) to arrive at the numeric
     values.  This allows floating point numbers to be specified in
     scientific notation, like `1.0e-34' and `10e100'.  Use this option
     only if there is no alternative;  it is much slower than `-n' and
     numbers with too many significant digits will be compared as if
     they had been truncated.  In addition, numbers outside the range
     of representable double precision floating point numbers are
     treated as if they were zeroes; overflow and underflow are not
     reported.
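
The range and precision caveats in that excerpt can be sketched in C
as follows.  This is a hypothetical helper written from the
description, not the textutils implementation: the name general_value
is invented, and using errno == ERANGE as the out-of-range test is an
assumption.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper modeled on the excerpt: convert with strtod()
   and quietly map any out-of-range result to zero. */
static double general_value(const char *s)
{
    double v;

    errno = 0;                 /* strtod only sets errno on failure */
    v = strtod(s, NULL);
    if (errno == ERANGE)       /* overflow or underflow was detected */
        return 0.0;
    return v;
}
```

Here "1e999" maps to zero rather than to a huge value, and
"1.00000000000000000001" compares equal to "1.0" because a double
cannot hold that many significant digits.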