tech-userlevel archive
Re: Minor updates to sort ?
Living on the edge and top-posting aggressively: these changes would be good
to have.
Thanks,
Alistair
On Saturday, May 28, 2016, Robert Elz <kre%munnari.oz.au@localhost> wrote:
> Inspired by Paul Goyette's question (on netbsd-users) I took a look at sort,
> and I'd like to commit the following updates if no-one objects.
>
> The only changes that should affect anything are the addition of the POSIX
> -C option, which is identical to -c but doesn't write messages to stderr
> if the input file is not sorted, and fixes for bugs in the processing of -R.
> With those fixes, if -R is used with anything other than \n (setting it to
> \n, e.g. with -R 10, is also fixed), \n no longer becomes a field separator
> regardless of what might be set later with -t. (If -t preceded -R it
> would have worked correctly, but not the other way around.)
>
> Aside from that the changes are more or less cosmetic. They enforce using
> only one of -c, -C and -m (which make no sense used together), make
> usage() reflect reality (including formatting it to stop assuming it
> is outputting to an 80 column display, and reflecting the man page changes
> mentioned next), fix a minor bug in a comment, remove the unused 'x'
> option (what was that?) from SORT_OPTS (no effect, it generates usage()
> either way) and sort the option processing (R comes before S...).
>
> In the man page, -C is documented, the synopsis is split to show the
> different usage (only one file allowed) for -C or -c, -R is fully
> documented, and, perhaps most importantly, the names "field1" and "field2"
> are changed to "kstart" and "kend" to make it (a little more) clear that
> the -k argument does not specify or use a field as such, but designates
> the start and end of the sort key (with the designators using fields as
> an addressing object, which is all fields are used for in sort, unlike
> awk, cut, etc.).
>
> There are no changes (at all) to anything actually related to sorting...
>
> Anyone object to these changes? (patch appended)
>
> kre
>
> Index: msort.c
> ===================================================================
> RCS file: /cvsroot/src/usr.bin/sort/msort.c,v
> retrieving revision 1.30
> diff -u -r1.30 msort.c
> --- msort.c 5 Feb 2010 21:58:42 -0000 1.30
> +++ msort.c 29 May 2016 05:00:34 -0000
> @@ -365,7 +365,7 @@
> * check order on one file
> */
> void
> -order(struct filelist *filelist, struct field *ftbl)
> +order(struct filelist *filelist, struct field *ftbl, int quiet)
> {
> get_func_t get = SINGL_FLD ? makeline : makekey;
> RECHEADER *crec, *prec, *trec;
> @@ -387,10 +387,14 @@
> exit(0);
> while (get(fp, crec, crec_end, ftbl) == 0) {
> if (0 < (c = cmp(prec, crec))) {
> + if (quiet)
> + exit(1);
> crec->data[crec->length-1] = 0;
> errx(1, "found disorder: %s",
> crec->data+crec->offset);
> }
> if (UNIQUE && !c) {
> + if (quiet)
> + exit(1);
> crec->data[crec->length-1] = 0;
> errx(1, "found non-uniqueness: %s",
> crec->data+crec->offset);
> Index: sort.1
> ===================================================================
> RCS file: /cvsroot/src/usr.bin/sort/sort.1,v
> retrieving revision 1.34
> diff -u -r1.34 sort.1
> --- sort.1 29 May 2013 15:00:35 -0000 1.34
> +++ sort.1 29 May 2016 05:00:34 -0000
> @@ -67,16 +67,26 @@
> .Nd sort or merge text files
> .Sh SYNOPSIS
> .Nm
> -.Op Fl bcdfHilmnrSsu
> +.Op Fl bdfHilmnrSsu
> .Oo
> .Fl k
> -.Ar field1 Ns Op Li \&, Ns Ar field2
> +.Ar kstart Ns Op Li \&, Ns Ar kend
> .Oc
> .Op Fl o Ar output
> .Op Fl R Ar char
> .Op Fl T Ar dir
> .Op Fl t Ar char
> .Op Ar
> +.Nm
> +.Fl c|C
> +.Op Fl bdfilnru
> +.Oo
> +.Fl k
> +.Ar kstart Ns Op Li \&, Ns Ar kend
> +.Op Fl t Ar char
> +.Oc
> +.Op Fl R Ar char
> +.Op Ar file
> .Sh DESCRIPTION
> The
> .Nm
> @@ -101,6 +111,10 @@
> produces no output.
> See also
> .Fl u .
> +.It Fl C
> +Identical to
> +.Fl c
> +without the error messages in the case of unsorted input.
> .It Fl H
> Ignored for compatibility with earlier versions of
> .Nm .
> @@ -137,9 +151,13 @@
> option, check that there are no lines with duplicate keys.
> .El
> .Pp
> -The following options override the default ordering rules.
> -When ordering options appear independent of key field
> -specifications, the requested field ordering rules are
> +The following options,
> +which should be given before any
> +.Fl k
> +options, override the default ordering rules.
> +When ordering options appear independent of,
> +and before, key field specifications,
> +the requested field ordering rules are
> applied globally to all sort keys.
> When attached to a specific key (see
> .Fl k ) ,
> @@ -224,12 +242,21 @@
> This should be used with discretion;
> .Fl R Aq Ar alphanumeric
> usually produces undesirable results.
> +If char is not a single character, then it
> +specifies the value of the desired record
> +separator as an integer specified in any
> +of the normal NNN, 0ooo, or 0xXXX ways,
> +or as an octal value preceded by \e.
> +Caution: do not attempt to specify Ctl-A
> +as
> +.Dq -R 1
> +which will not do what was intended at all!
> The default record separator is newline.
> -.It Fl k Ar field1 Ns Op Li \&, Ns Ar field2
> +.It Fl k Ar kstart Ns Op Li \&, Ns Ar kend
> Designates the starting position,
> -.Ar field1 ,
> +.Ar kstart ,
> and optional ending position,
> -.Ar field2 ,
> +.Ar kend ,
> of a key field.
> The
> .Fl k
> @@ -265,16 +292,16 @@
> Fields are specified
> by the
> .Fl k
> -.Ar field1 Ns Op \&, Ns Ar field2
> +.Ar kstart Ns Op \&, Ns Ar kend
> argument.
> A missing
> -.Ar field2
> +.Ar kend
> argument defaults to the end of a line.
> .Pp
> The arguments
> -.Ar field1
> +.Ar kstart
> and
> -.Ar field2
> +.Ar kend
> have the form
> .Ar m Ns Li \&. Ns Ar n
> and can be followed by one or more of the letters
> @@ -284,7 +311,7 @@
> .Cm r ,
> which correspond to the options discussed above.
> A
> -.Ar field1
> +.Ar kstart
> position specified by
> .Ar m Ns Li \&. Ns Ar n
> .Pq Ar m , n No \*[Gt] 0
> @@ -296,7 +323,7 @@
> A missing
> .Li \&. Ns Ar n
> in
> -.Ar field1
> +.Ar kstart
> means
> .Ql \&.1 ,
> indicating the first character of the
> @@ -314,7 +341,7 @@
> field.
> .Pp
> A
> -.Ar field2
> +.Ar kend
> position specified by
> .Ar m Ns Li \&. Ns Ar n
> is interpreted as
> @@ -451,7 +478,7 @@
> Thus performance depends highly on efficient choice of sort keys, and the
> .Fl b
> option and the
> -.Ar field2
> +.Ar kend
> argument of the
> .Fl k
> option should be used whenever possible.
> Index: sort.c
> ===================================================================
> RCS file: /cvsroot/src/usr.bin/sort/sort.c,v
> retrieving revision 1.61
> diff -u -r1.61 sort.c
> --- sort.c 16 Sep 2011 15:39:29 -0000 1.61
> +++ sort.c 29 May 2016 05:00:34 -0000
> @@ -117,7 +117,7 @@
> main(int argc, char *argv[])
> {
> int ch, i, stdinflag = 0;
> - char cflag = 0, mflag = 0;
> + char mode = 0;
> char *outfile, *outpath = 0;
> struct field *fldtab;
> size_t fldtab_sz, fld_cnt;
> @@ -145,9 +145,9 @@
> fldtab = emalloc(fldtab_sz * sizeof(*fldtab));
> memset(fldtab, 0, fldtab_sz * sizeof(*fldtab));
>
> -#define SORT_OPTS "bcdD:fHik:lmno:rR:sSt:T:ux"
> +#define SORT_OPTS "bcCdD:fHik:lmno:rR:sSt:T:u"
>
> - /* Convert "+field" args to -f format */
> + /* Convert "+field" args to -k format */
> fixit(&argc, argv, SORT_OPTS);
>
> if (!(tmpdir = getenv("TMPDIR")))
> @@ -158,8 +158,10 @@
> case 'b':
> fldtab[0].flags |= BI | BT;
> break;
> - case 'c':
> - cflag = 1;
> + case 'c': case 'C': case 'm':
> + if (mode)
> + usage("Incompatible operation modes");
> + mode = ch;
> break;
> case 'D': /* Debug flags */
> for (i = 0; optarg[i]; i++)
> @@ -179,15 +181,33 @@
>
> setfield(optarg, &fldtab[++fld_cnt],
> fldtab[0].flags);
> break;
> - case 'm':
> - mflag = 1;
> - break;
> case 'o':
> outpath = optarg;
> break;
> case 'r':
> REVERSE = 1;
> break;
> + case 'R':
> + if (REC_D != '\n')
> + usage("multiple record delimiters");
> + REC_D = *optarg;
> + if (optarg[1] != '\0') {
> + char *ep;
> + int t = 0;
> +
> + if (optarg[0] == '\\')
> + optarg++, t = 8;
> + REC_D = (int)strtol(optarg, &ep, t);
> + if (*ep != '\0' || REC_D < 0 ||
> + REC_D >= (int)__arraycount(d_mask))
> + errx(2, "invalid record delimiter %s",
> + optarg);
> + }
> + if (REC_D == '\n')
> + break;
> + d_mask['\n'] = d_mask[' '];
> + d_mask[REC_D] = REC_D_F;
> + break;
> case 's':
> /*
> * Nominally 'stable sort', keep lines with equal keys
> @@ -213,30 +233,11 @@
> SEP_FLAG = 1;
> d_mask[' '] &= ~FLD_D;
> d_mask['\t'] &= ~FLD_D;
> + d_mask['\n'] &= ~FLD_D;
> d_mask[(u_char)*optarg] |= FLD_D;
> if (d_mask[(u_char)*optarg] & REC_D_F)
> errx(2, "record/field delimiter clash");
> break;
> - case 'R':
> - if (REC_D != '\n')
> - usage("multiple record delimiters");
> - REC_D = *optarg;
> - if (REC_D == '\n')
> - break;
> - if (optarg[1] != '\0') {
> - char *ep;
> - int t = 0;
> - if (optarg[0] == '\\')
> - optarg++, t = 8;
> - REC_D = (int)strtol(optarg, &ep, t);
> - if (*ep != '\0' || REC_D < 0 ||
> - REC_D >= (int)__arraycount(d_mask))
> - errx(2, "invalid record delimiter %s",
> - optarg);
> - }
> - d_mask['\n'] = d_mask[' '];
> - d_mask[REC_D] = REC_D_F;
> - break;
> case 'T':
> /* -T tmpdir */
> tmpdir = optarg;
> @@ -254,13 +255,13 @@
> /* Don't sort on raw record if keys match */
> posix_sort = 0;
>
> - if (cflag && argc > optind+1)
> + if ((mode == 'c' || mode == 'C') && argc > optind+1)
> errx(2, "too many input files for -c option");
> if (argc - 2 > optind && !strcmp(argv[argc-2], "-o")) {
> outpath = argv[argc-1];
> argc -= 2;
> }
> - if (mflag && argc - optind > (MAXFCT - (16+1))*16)
> + if (mode == 'm' && argc - optind > (MAXFCT - (16+1))*16)
> errx(2, "too many input files for -m option");
>
> for (i = optind; i < argc; i++) {
> @@ -309,8 +310,8 @@
> num_input_files = argc - optind;
> }
>
> - if (cflag) {
> - order(&filelist, fldtab);
> + if (mode == 'c' || mode == 'C') {
> + order(&filelist, fldtab, mode == 'C');
> /* NOT REACHED */
> }
>
> @@ -348,7 +349,7 @@
> err(2, "output file %s", outfile);
> }
>
> - if (mflag)
> + if (mode == 'm')
> fmerge(&filelist, num_input_files, outfp, fldtab);
> else
> fsort(&filelist, num_input_files, outfp, fldtab);
> @@ -393,13 +394,20 @@
> static void
> usage(const char *msg)
> {
> + const char *pn = getprogname();
> +
> if (msg != NULL)
> (void)fprintf(stderr, "%s: %s\n", getprogname(), msg);
> (void)fprintf(stderr,
> - "usage: %s [-bcdfHilmnrSsu] [-k field1[,field2]] [-o output]"
> - " [-R char] [-T dir]", getprogname());
> + "usage: %s [-bdfHilmnrSsu] [-k kstart[,kend]] [-o output]"
> + " [-R char] [-T dir]\n", pn);
> (void)fprintf(stderr,
> " [-t char] [file ...]\n");
> + (void)fprintf(stderr,
> + " or: %s -[cC] [-bdfilnru] [-k kstart[,kend]] [-o output]"
> + " [-R char]\n", pn);
> + (void)fprintf(stderr,
> + " [-t char] [file]\n");
> exit(2);
> }
>
> Index: sort.h
> ===================================================================
> RCS file: /cvsroot/src/usr.bin/sort/sort.h,v
> retrieving revision 1.35
> diff -u -r1.35 sort.h
> --- sort.h 5 Aug 2015 07:10:03 -0000 1.35
> +++ sort.h 29 May 2016 05:00:34 -0000
> @@ -191,7 +191,7 @@
> int makeline(FILE *, RECHEADER *, u_char *, struct field *);
> void makeline_copydown(RECHEADER *);
> int optval(int, int);
> -__dead void order(struct filelist *, struct field *);
> +__dead void order(struct filelist *, struct field *, int);
> void putline(const RECHEADER *, FILE *);
> void putrec(const RECHEADER *, FILE *);
> void putkeydump(const RECHEADER *, FILE *);
>
>
>