source-changes: Re: CVS commit: src/usr.bin/split

Subject: Re: CVS commit: src/usr.bin/split
To: None <source-changes@NetBSD.org>
From: Alan Barrett <apb@cequrux.com>
List: source-changes
Date: 06/01/2007 12:02:41

On Thu, 31 May 2007, Jan Schaumann wrote:
> > If you change this line in split3() from
> >         split1(sb.st_size/chunks, chunks);
> > to
> >         split1((sb.st_size + chunks - 1)/chunks, chunks);
> > 
> > then the last file will never be larger than the others, there won't
> > be any "spillover", and you can remove all the new special cases in
> > split1().
> 
> If by "all the new special cases", you mean the counting of the files
> created,

Yes...

> then I'm not sure if using your approach does the right thing.
> 
> Consider a file of 100 bytes size that you want to split into 11 files:

It's just impossible to do that without violating either (A) the
principle that all files should be the same size except that the last
file may be smaller, or (B) the principle that exactly the requested
number of files should be created and none should be empty.

If you choose 9 bytes per file, you have one leftover byte that doesn't
fit in the last file (but your special-case code deals with that by
making the last file larger than the others, violating principle A
above).

If you choose 10 bytes per file, then the first 10 files use up all the
input data, so either the 11th file won't exist at all, or it will be
empty, violating principle B above.
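
The arithmetic behind both choices can be sketched in C. This is only an illustration: the helper names chunk_floor, chunk_ceil and last_file_bytes are hypothetical and do not appear in split.c; only the two division expressions come from the discussion above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not taken from split.c. */

/* Per-file size with truncating division: sb.st_size/chunks. */
int64_t
chunk_floor(int64_t size, int64_t chunks)
{
	assert(size >= 0 && chunks > 0);
	return size / chunks;
}

/* Per-file size with round-up division: (sb.st_size + chunks - 1)/chunks. */
int64_t
chunk_ceil(int64_t size, int64_t chunks)
{
	assert(size >= 0 && chunks > 0);
	return (size + chunks - 1) / chunks;
}

/*
 * Bytes remaining for the last file once the first (chunks - 1) files
 * have each taken per_file bytes.  Zero or less means the last file
 * would be empty or never created.
 */
int64_t
last_file_bytes(int64_t size, int64_t chunks, int64_t per_file)
{
	assert(chunks > 0 && per_file >= 0);
	return size - (chunks - 1) * per_file;
}
```

For 100 bytes and 11 files, chunk_floor gives 9 and leaves 10 bytes for the last file (one more than the others), while chunk_ceil gives 10 and leaves 0 bytes for the last file.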

> My approach says "split bytewise into files with 100/11 = 9 bytes, no
> more than 11 files" (ie 10 files with 9 bytes each, one file is 10
> bytes, total # of files is 11).
> 
> Your approach says "split bytewise into files with (100 + 11 - 1)/11 =
> 10 bytes" (ie 10 files with 10 bytes each).

Yes, that's right.  In this case (and in some other cases, all of them
with an input file size less than (N-1)*(N-1)+1, where N is the number
of files), my way results in the last file being empty or nonexistent.
Your way always results in the last file being larger than the others
(unless the input size is an exact multiple of the number of files).
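
The size bound can be checked by brute force. The sketch below assumes the round-up formula quoted earlier; last_file_empty and largest_empty_case are hypothetical names used only for this check. It shows that the last file can come out empty only for sizes up to (N-1)*(N-1), although not every size below that bound triggers it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: with round-up chunk sizing, do the first n-1
 * files already consume the whole input?
 */
bool
last_file_empty(int64_t size, int64_t n)
{
	assert(size > 0 && n > 1);
	int64_t per_file = (size + n - 1) / n;
	return (n - 1) * per_file >= size;
}

/*
 * Largest size in 1..limit for which the last of n files is empty,
 * or 0 if there is no such size.
 */
int64_t
largest_empty_case(int64_t n, int64_t limit)
{
	int64_t worst = 0;
	for (int64_t size = 1; size <= limit; size++)
		if (last_file_empty(size, n))
			worst = size;
	return worst;
}
```

With N = 11, a 100-byte input leaves the last file empty, a 91-byte input does not (ten files of 9 bytes, then 1 byte), and no input larger than 100 bytes leaves it empty.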

Another pathological case is where the input size is smaller than the
requested number of files.  My way will put 1 byte in each of the first
few files, and will then stop.  Your way will create empty files for all
except the last, and then place all the input data in the last file.
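
A small model of this case, as a sketch only: model_split is a hypothetical function, not the split1() code, and the spill_to_last flag stands in for the special-case behaviour described above on the assumption that it simply appends the remainder to the last file.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model, not split.c itself: fill out[0..n-1] with the
 * byte count each output file would receive.  If spill_to_last is
 * nonzero, any remainder is appended to the last file; otherwise
 * writing simply stops when the input runs out.
 */
void
model_split(long size, long per_file, size_t n, int spill_to_last, long *out)
{
	assert(size >= 0 && per_file >= 0 && n > 0);
	long left = size;
	for (size_t i = 0; i < n; i++) {
		long take = left < per_file ? left : per_file;
		out[i] = take;
		left -= take;
	}
	if (spill_to_last)
		out[n - 1] += left;
}
```

For a 3-byte input and 5 requested files, the round-up size is 1 and the model gives 1, 1, 1, 0, 0. The truncated size is 0, and with spillover the model gives 0, 0, 0, 0, 3.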

I think that my way is better in all these cases, but of course others
may disagree.

--apb (Alan Barrett)