Subject: Re: sort(1) behavior?
To: NetBSD-current Discussion List <current-users@NetBSD.org>
From: Greg A. Woods <email@example.com>
Date: 04/11/2004 15:59:41
[ On Saturday, April 10, 2004 at 10:21:14 (-0400), Steven M. Bellovin wrote: ]
> Subject: Re: sort(1) behavior?
> In message <m1BC8y6-0002VGC@proven.weird.com>, "Greg A. Woods" writes:
> > [ On Thursday, April 8, 2004 at 16:54:43 (-0500), MLH wrote: ]
> > > Subject: sort(1) behavior?
> > >
> > > There is a high degree of variability in the results of sort(1)
> > > on various systems, particularly when using the -k field specifications.
> > I'll note first off that you're using '-n' and you've got what appear to
> > be decimal fractions in your test data. I haven't looked into all the
> > issues surrounding the interpretation of decimal numbers with decimal
> > fractions by sort, but I have found that it doesn't always do what one
> > might expect despite claims in the NetBSD manual page.
> > Note also that it appears as if POSIX doesn't require support of decimal
> > numbers (quoted from SUSv3, aka IEEE Std 1003.1-2001):
(I meant of course to say "decimal fractions", not "decimal numbers")
> > -n
> > Restrict the sort key to an initial numeric string, consisting
> > of optional <blank>s, optional minus sign, and zero or more
> > digits with an optional radix character and thousands separators
> > (as defined in the current locale), which shall be sorted by
> > arithmetic value. An empty digit string shall be treated as
> > zero. Leading zeros and signs on zeros shall not affect
> > ordering.
> I don't read it that way -- I read that as saying that the radix point
> need not appear in the actual number being sorted, not that the sort
> command can ignore it if it occurs.
Hmm... It does seem that I was confused over the use of "radix
character" in the above quote from the standard. I've since found the
POSIX definition of this phrase:
    The character that separates the integer part of a number from the
    fractional part.
I was confusing this strange POSIX definition of "radix character" with
something closer to the traditional meaning of "radix" combined with the
word "character", i.e. "an identifier specifying the base of the
numbering system", e.g. "D" for decimal, "H" for hexadecimal, "O" for
octal, "B" for binary, etc. I should know by now never to assume
anything about what a standards document means when it uses what I would
consider a common phrase. :-)
(I'm guessing the POSIX folks decided they had to choose some phrase
other than "decimal point" because that one might confuse people in
locales which don't use a "period or full stop" as the decimal point
character. Still, the phrase "decimal point" should be unambiguous
regardless of which character a given locale uses for it. Even
Webster's 1913 edition clearly defines "decimal point", as do all of my
more modern dictionaries. Being English-language dictionaries, they
describe the separator only as "a dot or full stop", but I'm sure a more
international dictionary would include the characters used by other
locales.)
This is starting to split hairs, since the example originally posted
didn't use any non-zero fractional parts. Still, I don't see anything
there which mentions any significance for the decimal fractions, though
that might just be my bias for integers showing through. :-)
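To make the two readings concrete, here is a rough Python sketch of how
the quoted "-n" text could be interpreted. It is only an illustration,
not the code of NetBSD's sort(1) or any other implementation. The
function name numeric_key, its parameters, and the ignore_fraction
switch are all hypothetical, and the "." and "," defaults merely stand
in for whatever the current locale defines.

```python
import re
from decimal import Decimal


def numeric_key(field, radix=".", thousands=",", ignore_fraction=False):
    """Arithmetic value of the initial numeric string of `field`.

    Hypothetical helper: `radix` and `thousands` stand in for the
    locale's radix character and thousands separator.
    """
    # Optional blanks, optional minus sign, zero or more digits (with
    # thousands separators), then an optional radix character and digits.
    pattern = re.compile(
        r"[ \t]*(-?)([0-9" + re.escape(thousands) + r"]*)"
        r"(?:" + re.escape(radix) + r"([0-9]*))?"
    )
    match = pattern.match(field)  # always matches: every part is optional
    sign = match.group(1)
    integer = match.group(2).replace(thousands, "")
    fraction = match.group(3) or ""
    if ignore_fraction:
        # The integer-only reading: digits after the radix are dropped.
        fraction = ""
    # An empty digit string is treated as zero.
    value = Decimal((integer or "0") + "." + (fraction or "0"))
    return -value if sign else value


lines = ["10", "9", "2.5", "2.25", " -1", "abc", "1,000"]

# Reading 1: the fractional part takes part in the arithmetic value.
by_fraction = sorted(lines, key=numeric_key)

# Reading 2: only the integer part counts, so "2.5" and "2.25" tie and
# keep their input order (Python's sort is stable).
by_integer = sorted(lines, key=lambda s: numeric_key(s, ignore_fraction=True))
```

Under the first reading "2.25" sorts before "2.5"; under the second
they compare equal, which is the kind of difference that only shows up
once the data has non-zero fractional parts.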
Greg A. Woods
+1 416 218-0098 VE3TCP RoboHack <firstname.lastname@example.org>
Planix, Inc. <email@example.com> Secrets of the Weird <firstname.lastname@example.org>