Subject: Re: sort(1) behavior?
To: Steven M. Bellovin <smb@research.att.com>
From: MLH <mlh@goathill.org>
List: current-users
Date: 04/11/2004 17:26:45
> 
> In message <m1BC8y6-0002VGC@proven.weird.com>, "Greg A. Woods" writes:
> >[ On Thursday, April 8, 2004 at 16:54:43 (-0500), MLH wrote: ]
> >> Subject: sort(1) behavior?
> >>
> >> There is seen a high degree of variability in results with sort(1)
> >> on various systems, particularly when using the -k field specifications.
> >
> >I'll note first off that you're using '-n' and you've got what appear to
> >be decimal fractions in your test data.  I haven't looked into all the
> >issues surrounding the interpretation of decimal numbers with decimal
> >fractions by sort, but I have found that it doesn't always do what one
> >might expect despite claims in the NetBSD manual page.
> >
> >Note also that it appears as if POSIX doesn't require support of decimal
> >numbers (quoted from SUSv3, aka IEEE Std 1003.1-2001):
> >
> >     -n
> >             Restrict the sort key to an initial numeric string, consisting
> >             of optional <blank>s, optional minus sign, and zero or more
> >             digits with an optional radix character and thousands separators
> >             (as defined in the current locale), which shall be sorted by
> >             arithmetic value. An empty digit string shall be treated as
> >             zero. Leading zeros and signs on zeros shall not affect
> >             ordering.
> 
> I don't read it that way -- I read that as saying that the radix point 
> need not appear in the actual number being sorted, not that the sort 
> command can ignore it if it occurs.  (OTOH, in the past I've noticed 
> significant non-portability of sort command option strings between 
> different operating systems.)

So you can understand my concern: it isn't even clear precisely what
sort is supposed to be doing, much less whether it (or any sort I've
tried) does that correctly.

The way I read all of this is that when using -n (numeric), -b
shouldn't matter, or that the behavior should be identical.  One
problem with having empty digit strings being treated the same as
zero is in the case (data that I included) of a completely empty
numeric string. If an empty field is evaluated as anything other
than infinitely less than zero, problems abound one way; if it is
evaluated as zero, they abound the other.
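
To make that reading concrete, here is a minimal sketch in Python (not
sort(1) itself, and not NetBSD's implementation) of how I understand the
quoted -n text: skip leading blanks, take an optional minus sign and an
initial run of digits with an optional radix, and treat an empty digit
string as zero. The function name and the sample fields are made up for
illustration, "." is assumed as the radix character, and thousands
separators are ignored.

```python
import re
from decimal import Decimal

# Optional blanks, optional minus sign, zero or more digits, then an
# optional "." and fraction digits. Every part is optional, so the
# pattern always matches at the start of the field.
_NUMERIC_PREFIX = re.compile(r"[ \t]*(-?)(\d*)(?:\.(\d*))?")


def posix_numeric_key(field: str) -> Decimal:
    """Value of the initial numeric string; an empty digit string is zero."""
    sign, whole, frac = _NUMERIC_PREFIX.match(field).groups()
    value = Decimal((whole or "0") + "." + (frac or "0"))
    return -value if sign else value


# Hypothetical sample fields, including empty and all-blank ones.
fields = ["  1.5", "", "-2", "0.25", "   ", "-0", "abc", "10"]
for f in sorted(fields, key=posix_numeric_key):
    print(repr(f), posix_numeric_key(f))
```

Under this reading leading blanks never change the value, so -b makes no
difference, and an empty field compares equal to 0, -0 and any
non-numeric text.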

I mainly wanted to:

1) determine what the behavior is supposed to be, especially with
-n(b): how to handle leading spaces and empty fields in numeric
fields

2) try to help get NetBSD's sort(1) to abide by that behavior
specification

Then I can figure out what to do with my program to get the results
I need.
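
For the "infinitely less than zero" case, one option would be to do the
ordering outside sort(1). This is only a sketch under assumptions: the
":" delimiter, the sample records and the function name are hypothetical,
and any field that does not parse as a finite number is treated the same
as an empty one.

```python
from decimal import Decimal, InvalidOperation


def empty_first_key(field: str):
    """Sort key that puts blank or unparsable fields before every number."""
    text = field.strip()
    try:
        value = Decimal(text)
    except InvalidOperation:
        return (0, Decimal(0))
    # NaN and infinities are grouped with the empty fields as well.
    return (1, value) if value.is_finite() else (0, Decimal(0))


# Hypothetical records: name, then a numeric field that may be empty.
records = ["a:1.5", "b:", "c:-2", "d:0.25", "e:  7"]
ordered = sorted(records, key=lambda r: empty_first_key(r.split(":")[1]))
print(ordered)
```

The tuple key keeps the "is it empty" decision separate from the numeric
comparison, so an empty field can never tie with a real zero.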

Is there a process we can start to try to accomplish these two
goals?

Thanks for your input