tech-userlevel archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: Minor updates to sort ?



Living on the edge and top posting aggresively, these changes would be good
to have.

Thanks,
Alistair

On Saturday, May 28, 2016, Robert Elz <kre%munnari.oz.au@localhost> wrote:

> Inspired by Paul Goyette's question (on netbsd-users) I took a look at
> sort,
> and I'd like to commit the following updates if no-one objects.
>
> The only changes that should affect anything are the addition of the posix
> C option, which is identical to c, but doesn't write messages to stderr
> if the input file is not sorted, and fixing bugs in the processing of -R
> such that if -R is used (and without setting it to \n - the processing of
> which is also fixed in case it is set that wat using -R 10) \n does not
> become a field separator regardless of what might be set later with -t (if
> -t preceded -R it would have worked correctly, but not the other way.)
>
> Aside from that the changes are more or less cosmetic - they enforce using
> only one of -c -C and -m (which make no sense used together), make the
> usage() reflect reality (including formatting it to stop assuming it
> is outputting to an 80 column display..., and reflect the man page changes
> mentioned next), and fix a minor bug in a comment, removed the unused 'x'
> option (what was that?) from SORT_OPTS (no effect, generates usage() either
> way) and sorted the option processing (R comes before S...)
>
> In the man page, -C is documented, the synopsis is split to show the
> (only one file allowed) different usage for -C or -c, and perhaps most
> importantly, the names "field1" and "filed2" are changed to "kstart" and
> "kend" to make it (a little more) clear that the -k argument does not
> specify
> or use a field as such, but designates the start and end of the sort keys
> (with the designators using fields as an addressing object - which is all
> fields are used for in sort, unlike awk, cut, etc.) and -R is fully
> documented.
>
> There are no changes (at all) to anything actually related to sorting...
>
> Anyone object to these changes?   (patch appended)
>
> kre
>
> Index: msort.c
> ===================================================================
> RCS file: /cvsroot/src/usr.bin/sort/msort.c,v
> retrieving revision 1.30
> diff -u -r1.30 msort.c
> --- msort.c     5 Feb 2010 21:58:42 -0000       1.30
> +++ msort.c     29 May 2016 05:00:34 -0000
> @@ -365,7 +365,7 @@
>   * check order on one file
>   */
>  void
> -order(struct filelist *filelist, struct field *ftbl)
> +order(struct filelist *filelist, struct field *ftbl, int quiet)
>  {
>         get_func_t get = SINGL_FLD ? makeline : makekey;
>         RECHEADER *crec, *prec, *trec;
> @@ -387,10 +387,14 @@
>                 exit(0);
>         while (get(fp, crec, crec_end, ftbl) == 0) {
>                 if (0 < (c = cmp(prec, crec))) {
> +                       if (quiet)
> +                               exit(1);
>                         crec->data[crec->length-1] = 0;
>                         errx(1, "found disorder: %s",
> crec->data+crec->offset);
>                 }
>                 if (UNIQUE && !c) {
> +                       if (quiet)
> +                               exit(1);
>                         crec->data[crec->length-1] = 0;
>                         errx(1, "found non-uniqueness: %s",
>                             crec->data+crec->offset);
> Index: sort.1
> ===================================================================
> RCS file: /cvsroot/src/usr.bin/sort/sort.1,v
> retrieving revision 1.34
> diff -u -r1.34 sort.1
> --- sort.1      29 May 2013 15:00:35 -0000      1.34
> +++ sort.1      29 May 2016 05:00:34 -0000
> @@ -67,16 +67,26 @@
>  .Nd sort or merge text files
>  .Sh SYNOPSIS
>  .Nm
> -.Op Fl bcdfHilmnrSsu
> +.Op Fl bdfHilmnrSsu
>  .Oo
>  .Fl k
> -.Ar field1 Ns Op Li \&, Ns Ar field2
> +.Ar kstart Ns Op Li \&, Ns Ar kend
>  .Oc
>  .Op Fl o Ar output
>  .Op Fl R Ar char
>  .Op Fl T Ar dir
>  .Op Fl t Ar char
>  .Op Ar
> +.Nm
> +.Fl c|C
> +.Op Fl bdfilnru
> +.Oo
> +.Fl k
> +.Ar kstart Ns Op Li \&, Ns Ar kend
> +.Op Fl t Ar char
> +.Oc
> +.Op Fl R Ar char
> +.Op Ar file
>  .Sh DESCRIPTION
>  The
>  .Nm
> @@ -101,6 +111,10 @@
>  produces no output.
>  See also
>  .Fl u .
> +.It Fl C
> +Identical to
> +.Fl c
> +without the error messages in the case of unsorted input.
>  .It Fl H
>  Ignored for compatibility with earlier versions of
>  .Nm .
> @@ -137,9 +151,13 @@
>  option, check that there are no lines with duplicate keys.
>  .El
>  .Pp
> -The following options override the default ordering rules.
> -When ordering options appear independent of key field
> -specifications, the requested field ordering rules are
> +The following options,
> +which should be given before any
> +.Fl k
> +options, override the default ordering rules.
> +When ordering options appear independent of,
> +and before, key field specifications,
> +the requested field ordering rules are
>  applied globally to all sort keys.
>  When attached to a specific key (see
>  .Fl k ) ,
> @@ -224,12 +242,21 @@
>  This should be used with discretion;
>  .Fl R Aq Ar alphanumeric
>  usually produces undesirable results.
> +If char is not a single character, then it
> +specifies the value of the desired record
> +separator as an integer specified in any
> +of the normal NNN, 0ooo, or 0xXXX ways,
> +or as an octal value preceded by \e.
> +Caution: do not attempt to specify Ctl-A
> +as
> +.Dq -R 1
> +which will not do what was intended at all!
>  The default record separator is newline.
> -.It Fl k Ar field1 Ns Op Li \&, Ns Ar field2
> +.It Fl k Ar kstart Ns Op Li \&, Ns Ar kend
>  Designates the starting position,
> -.Ar field1 ,
> +.Ar kstart ,
>  and optional ending position,
> -.Ar field2 ,
> +.Ar kend ,
>  of a key field.
>  The
>  .Fl k
> @@ -265,16 +292,16 @@
>  Fields are specified
>  by the
>  .Fl k
> -.Ar field1 Ns Op \&, Ns Ar field2
> +.Ar kstart Ns Op \&, Ns Ar kend
>  argument.
>  A missing
> -.Ar field2
> +.Ar kend
>  argument defaults to the end of a line.
>  .Pp
>  The arguments
> -.Ar field1
> +.Ar kstart
>  and
> -.Ar field2
> +.Ar kend
>  have the form
>  .Ar m Ns Li \&. Ns Ar n
>  and can be followed by one or more of the letters
> @@ -284,7 +311,7 @@
>  .Cm r ,
>  which correspond to the options discussed above.
>  A
> -.Ar field1
> +.Ar kstart
>  position specified by
>  .Ar m Ns Li \&. Ns Ar n
>  .Pq Ar m , n No \*[Gt] 0
> @@ -296,7 +323,7 @@
>  A missing
>  .Li \&. Ns Ar n
>  in
> -.Ar field1
> +.Ar kstart
>  means
>  .Ql \&.1 ,
>  indicating the first character of the
> @@ -314,7 +341,7 @@
>  field.
>  .Pp
>  A
> -.Ar field2
> +.Ar kend
>  position specified by
>  .Ar m Ns Li \&. Ns Ar n
>  is interpreted as
> @@ -451,7 +478,7 @@
>  Thus performance depends highly on efficient choice of sort keys, and the
>  .Fl b
>  option and the
> -.Ar field2
> +.Ar kend
>  argument of the
>  .Fl k
>  option should be used whenever possible.
> Index: sort.c
> ===================================================================
> RCS file: /cvsroot/src/usr.bin/sort/sort.c,v
> retrieving revision 1.61
> diff -u -r1.61 sort.c
> --- sort.c      16 Sep 2011 15:39:29 -0000      1.61
> +++ sort.c      29 May 2016 05:00:34 -0000
> @@ -117,7 +117,7 @@
>  main(int argc, char *argv[])
>  {
>         int ch, i, stdinflag = 0;
> -       char cflag = 0, mflag = 0;
> +       char mode = 0;
>         char *outfile, *outpath = 0;
>         struct field *fldtab;
>         size_t fldtab_sz, fld_cnt;
> @@ -145,9 +145,9 @@
>         fldtab = emalloc(fldtab_sz * sizeof(*fldtab));
>         memset(fldtab, 0, fldtab_sz * sizeof(*fldtab));
>
> -#define SORT_OPTS "bcdD:fHik:lmno:rR:sSt:T:ux"
> +#define SORT_OPTS "bcCdD:fHik:lmno:rR:sSt:T:u"
>
> -       /* Convert "+field" args to -f format */
> +       /* Convert "+field" args to -k format */
>         fixit(&argc, argv, SORT_OPTS);
>
>         if (!(tmpdir = getenv("TMPDIR")))
> @@ -158,8 +158,10 @@
>                 case 'b':
>                         fldtab[0].flags |= BI | BT;
>                         break;
> -               case 'c':
> -                       cflag = 1;
> +               case 'c': case 'C': case 'm':
> +                       if (mode)
> +                               usage("Incompatible operation modes");
> +                       mode = ch;
>                         break;
>                 case 'D': /* Debug flags */
>                         for (i = 0; optarg[i]; i++)
> @@ -179,15 +181,33 @@
>
>                         setfield(optarg, &fldtab[++fld_cnt],
> fldtab[0].flags);
>                         break;
> -               case 'm':
> -                       mflag = 1;
> -                       break;
>                 case 'o':
>                         outpath = optarg;
>                         break;
>                 case 'r':
>                         REVERSE = 1;
>                         break;
> +               case 'R':
> +                       if (REC_D != '\n')
> +                               usage("multiple record delimiters");
> +                       REC_D = *optarg;
> +                       if (optarg[1] != '\0') {
> +                               char *ep;
> +                               int t = 0;
> +
> +                               if (optarg[0] == '\\')
> +                                       optarg++, t = 8;
> +                               REC_D = (int)strtol(optarg, &ep, t);
> +                               if (*ep != '\0' || REC_D < 0 ||
> +                                   REC_D >= (int)__arraycount(d_mask))
> +                                       errx(2, "invalid record delimiter
> %s",
> +                                           optarg);
> +                       }
> +                       if (REC_D == '\n')
> +                               break;
> +                       d_mask['\n'] = d_mask[' '];
> +                       d_mask[REC_D] = REC_D_F;
> +                       break;
>                 case 's':
>                         /*
>                          * Nominally 'stable sort', keep lines with equal
> keys
> @@ -213,30 +233,11 @@
>                         SEP_FLAG = 1;
>                         d_mask[' '] &= ~FLD_D;
>                         d_mask['\t'] &= ~FLD_D;
> +                       d_mask['\n'] &= ~FLD_D;
>                         d_mask[(u_char)*optarg] |= FLD_D;
>                         if (d_mask[(u_char)*optarg] & REC_D_F)
>                                 errx(2, "record/field delimiter clash");
>                         break;
> -               case 'R':
> -                       if (REC_D != '\n')
> -                               usage("multiple record delimiters");
> -                       REC_D = *optarg;
> -                       if (REC_D == '\n')
> -                               break;
> -                       if (optarg[1] != '\0') {
> -                               char *ep;
> -                               int t = 0;
> -                               if (optarg[0] == '\\')
> -                                       optarg++, t = 8;
> -                               REC_D = (int)strtol(optarg, &ep, t);
> -                               if (*ep != '\0' || REC_D < 0 ||
> -                                   REC_D >= (int)__arraycount(d_mask))
> -                                       errx(2, "invalid record delimiter
> %s",
> -                                           optarg);
> -                       }
> -                       d_mask['\n'] = d_mask[' '];
> -                       d_mask[REC_D] = REC_D_F;
> -                       break;
>                 case 'T':
>                         /* -T tmpdir */
>                         tmpdir = optarg;
> @@ -254,13 +255,13 @@
>                 /* Don't sort on raw record if keys match */
>                 posix_sort = 0;
>
> -       if (cflag && argc > optind+1)
> +       if ((mode == 'c' || mode == 'C') && argc > optind+1)
>                 errx(2, "too many input files for -c option");
>         if (argc - 2 > optind && !strcmp(argv[argc-2], "-o")) {
>                 outpath = argv[argc-1];
>                 argc -= 2;
>         }
> -       if (mflag && argc - optind > (MAXFCT - (16+1))*16)
> +       if (mode == 'm' && argc - optind > (MAXFCT - (16+1))*16)
>                 errx(2, "too many input files for -m option");
>
>         for (i = optind; i < argc; i++) {
> @@ -309,8 +310,8 @@
>                 num_input_files = argc - optind;
>         }
>
> -       if (cflag) {
> -               order(&filelist, fldtab);
> +       if (mode == 'c' || mode == 'C') {
> +               order(&filelist, fldtab, mode == 'C');
>                 /* NOT REACHED */
>         }
>
> @@ -348,7 +349,7 @@
>                         err(2, "output file %s", outfile);
>         }
>
> -       if (mflag)
> +       if (mode == 'm')
>                 fmerge(&filelist, num_input_files, outfp, fldtab);
>         else
>                 fsort(&filelist, num_input_files, outfp, fldtab);
> @@ -393,13 +394,20 @@
>  static void
>  usage(const char *msg)
>  {
> +       const char *pn = getprogname();
> +
>         if (msg != NULL)
>                 (void)fprintf(stderr, "%s: %s\n", getprogname(), msg);
>         (void)fprintf(stderr,
> -           "usage: %s [-bcdfHilmnrSsu] [-k field1[,field2]] [-o output]"
> -           " [-R char] [-T dir]", getprogname());
> +           "usage: %s [-bdfHilmnrSsu] [-k kstart[,kend]] [-o output]"
> +           " [-R char] [-T dir]\n", pn);
>         (void)fprintf(stderr,
>             "             [-t char] [file ...]\n");
> +       (void)fprintf(stderr,
> +           "   or: %s -[cC] [-bdfilnru] [-k kstart[,kend]] [-o output]"
> +           " [-R char]\n", pn);
> +       (void)fprintf(stderr,
> +           "             [-t char] [file]\n");
>         exit(2);
>  }
>
> Index: sort.h
> ===================================================================
> RCS file: /cvsroot/src/usr.bin/sort/sort.h,v
> retrieving revision 1.35
> diff -u -r1.35 sort.h
> --- sort.h      5 Aug 2015 07:10:03 -0000       1.35
> +++ sort.h      29 May 2016 05:00:34 -0000
> @@ -191,7 +191,7 @@
>  int     makeline(FILE *, RECHEADER *, u_char *, struct field *);
>  void    makeline_copydown(RECHEADER *);
>  int     optval(int, int);
> -__dead void     order(struct filelist *, struct field *);
> +__dead void     order(struct filelist *, struct field *, int);
>  void    putline(const RECHEADER *, FILE *);
>  void    putrec(const RECHEADER *, FILE *);
>  void    putkeydump(const RECHEADER *, FILE *);
>
>
>


Home | Main Index | Thread Index | Old Index