Subject: Re: CVS commit: src/usr.bin/split
To: None <jschauma@netmeister.org>
From: John Darrow <John.P.Darrow@wheaton.edu>
List: source-changes
Date: 05/31/2007 18:20:05
On 31 May 2007 12:00:13 -0500, Jan Schaumann <jschauma@netmeister.org> wrote:
>Alan Barrett <apb@cequrux.com> wrote:
>> > Add a new command-line option "-n chunk_count", that splits the input
>> > file into chunk_count smaller files.  Each file will be size/chunk_count
>> > bytes large, with whatever spillover there is ending up in the chunk_countth
>> > file.
>>
>> If you change this line in split3() from
>>
>>         split1(sb.st_size/chunks, chunks);
>>
>> to
>>
>>         split1((sb.st_size + chunks - 1)/chunks, chunks);
>>
>> then the last file will never be larger than the others, there won't
>> be any "spillover", and you can remove all the new special cases in
>> split1().
>
>If by "all the new special cases", you mean the counting of the files
>created, then I'm not sure if using your approach does the right thing.
>
>Consider a file of 100 bytes size that you want to split into 11 files:
>
>My approach says "split bytewise into files with 100/11 = 9 bytes, no
>more than 11 files" (ie 10 files with 9 bytes each, one file is 10
>bytes, total # of files is 11).
>
>Your approach says "split bytewise into files with (100 + 11 - 1)/11 =
>10 bytes" (ie 10 files with 10 bytes each).

This "miscounting" is a rare special case. It can only occur when
(st_size / chunks) <= chunks, and even then only for certain
degenerate sizes (e.g. replacing 100 by 99 or 101 in your example
results in Alan's code still producing 11 files).
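
To make the arithmetic concrete, here is a rough sketch that computes the
chunk size and resulting file count for both roundings. The helper names
(chunk_size_floor, chunk_size_ceil, files_produced, report) are made up for
illustration and are not the functions in split.c; the cap on the file
count only models the "no more than chunk_count files" behaviour described
above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: chunk size as in the original commit.
 * Rounds down, so any spillover lands in the last file. */
long long
chunk_size_floor(long long size, long long chunks)
{
	assert(size >= 0 && chunks > 0);
	return size / chunks;
}

/* Hypothetical helper: chunk size as Alan suggests.
 * Rounds up, so the last file is never larger than the others. */
long long
chunk_size_ceil(long long size, long long chunks)
{
	assert(size >= 0 && chunks > 0);
	return (size + chunks - 1) / chunks;
}

/* Hypothetical helper: number of files written when `size` bytes are cut
 * into pieces of `chunk_size` bytes, never exceeding `max_files` (the last
 * file absorbs whatever remains once the cap is reached). */
long long
files_produced(long long size, long long chunk_size, long long max_files)
{
	long long n;

	assert(size >= 0 && chunk_size > 0 && max_files > 0);
	n = (size + chunk_size - 1) / chunk_size;
	return n > max_files ? max_files : n;
}

/* Print what the round-up variant does for one input. */
void
report(long long size, long long chunks)
{
	long long c = chunk_size_ceil(size, chunks);

	printf("%lld bytes, %lld chunks requested: %lld bytes each, %lld files\n",
	    size, chunks, c, files_produced(size, c, chunks));
}
```

Calling report(100, 11) gives 10 bytes each and 10 files, while
report(99, 11) and report(101, 11) both give 11 files. With the round-down
size, 100 bytes in 11 chunks gives 9 bytes each and 11 files, the last one
holding 10 bytes.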

Normally, the number of bytes per chunk is much larger than the number
of chunks, in which case Alan's version still yields the proper number
of chunks. It also greatly simplifies the rest of the code, so I think
we should go with it (possibly including a comment indicating awareness
of the degenerate case).

jdarrow