Subject: Re: bin/10625: /usr/bin/cmp is unable to compare rather large files
To: Jaromír Doleček <dolecek@ibis.cz>
From: Greg A. Woods <woods@weird.com>
List: tech-userlevel
Date: 07/27/2000 17:23:27
[ On Thursday, July 27, 2000 at 20:43:35 (+0200), Jaromír Doleček wrote: ]
> Subject: Re: bin/10625: /usr/bin/cmp is unable to compare rather large files
>
> > SIZE_T_MAX / 2 is still too large, because it does not account for
> > the kernel address space and the memory that is used by cmp for
> > other purposes.
> 
> Actually, it's not needed to take other memory users into
> considerations.

Of course it is!!!!  It *must* be.  There's only one flat virtual
address space that's bounded by at most 2^32 bytes on a 32-bit
byte-addressed machine running *BSD.

> Since cmp(1) uses mmap(..MAP_SHARED|MAP_FILE) and
> madvise(2)s kernel to MADV_SEQUENTIAL, it's using just one real
> memory page for each compared file at any time (the no longer
> needed memory pages are continuously freed). MADV_SEQUENTIAL is cool :)

That's not cool enough!  off_t is 64 bits wide, twice the width of the
32-bit size_t, so OFF_T_MAX dwarfs SIZE_T_MAX on any 32-bit machine
running any 4.4BSD derivative!

I.e. you can ``easily'' have a file bigger than can possibly ever be
walked over by a normal 32-bit pointer....
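
To make the size mismatch concrete, here is a minimal sketch of the
check a program would need before it even considers mapping a whole
file in one call.  The helper names fits_in_limit() and
fits_in_size_t() are made up for illustration, and SIZE_MAX from
<stdint.h> is assumed as the portable spelling of SIZE_T_MAX:

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not from cmp(1): returns 1 when a file of `len`
 * bytes is no longer than `limit` bytes, 0 otherwise.
 */
static int
fits_in_limit(off_t len, uintmax_t limit)
{
    assert(len >= 0);
    return (uintmax_t)len <= limit;
}

/* Can a single size_t length on this machine describe the whole file? */
static int
fits_in_size_t(off_t len)
{
    return fits_in_limit(len, (uintmax_t)SIZE_MAX);
}
```

Passing this check is necessary but nowhere near sufficient: the
length only has to fit in a size_t here, while a real mapping also has
to fit in whatever address space is still free.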

You also can't ever have two pointers in the same process that range
from 0 through SIZE_T_MAX and expect them to access two different memory
regions at the same time.  At the VERY best you can do 0 .. (SIZE_T_MAX/2).

Then there's the sparc issue Todd points out....

Making the assumption that you can mmap() an entire file at once for
this purpose is just plainly wrong, let alone two files!
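
One way around mapping a whole file at once is to walk both files
through a bounded window.  What follows is only a rough sketch under
stated assumptions, not what cmp(1) does: the function name
cmp_windowed() and its `window' parameter are invented here, it handles
regular files only, and it reports just equal/different rather than the
offset of the first difference:

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Compare two regular files through mappings of at most `window` bytes.
 * `window` must be a non-zero multiple of the page size so that every
 * mmap() offset stays page-aligned.
 * Returns 0 if identical, 1 if different, -1 on error.
 */
static int
cmp_windowed(int fd1, int fd2, size_t window)
{
    struct stat s1, s2;
    long pagesize = sysconf(_SC_PAGESIZE);
    off_t off;

    assert(pagesize > 0 && window > 0 && window % (size_t)pagesize == 0);

    if (fstat(fd1, &s1) == -1 || fstat(fd2, &s2) == -1)
        return -1;
    if (s1.st_size != s2.st_size)
        return 1;               /* different lengths: cannot be equal */

    for (off = 0; off < s1.st_size; ) {
        off_t left = s1.st_size - off;
        size_t n = left < (off_t)window ? (size_t)left : window;
        void *p1, *p2;
        int differ;

        p1 = mmap(NULL, n, PROT_READ, MAP_SHARED, fd1, off);
        if (p1 == MAP_FAILED)
            return -1;
        p2 = mmap(NULL, n, PROT_READ, MAP_SHARED, fd2, off);
        if (p2 == MAP_FAILED) {
            munmap(p1, n);
            return -1;
        }

        differ = memcmp(p1, p2, n) != 0;

        /* Release both windows before moving on to the next one. */
        munmap(p1, n);
        munmap(p2, n);
        if (differ)
            return 1;
        off += (off_t)n;
    }
    return 0;
}
```

With a window of, say, 1MB the process never holds more than 2MB of
file mappings at a time, no matter how large off_t says the files are.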

I.e. mmap(), if ever used _AT_ALL_ should probably always be hidden in
the I/O library where the complexity of these issues can all be dealt
with once, and hopefully correctly.  In my short experience I've seen
mmap() used incorrectly like this almost as many times as Ousterhout
says average programmers mis-use threads!  :-)

> Files up to about 3.6GB should be ok even on 32bit machines (or
> two files which have together this size). If not, that would be a bug.

Maybe comparing two files up to (3.6GB/2) each would be possible, but
not two files up to 3.6GB each -- that's literally impossible.  2^(32-1)
bytes (2GB) is the maximum file size you could ever hope to map twice
into the same process, and even that leaves no room for anything else.
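
The arithmetic can be written out as a small sketch.  The function
name max_bytes_per_mapping() and the `reserved' parameter (address
space already taken by the kernel, the program text, the stack and so
on) are hypothetical; the real reserved amount varies by port and is
not something this example tries to guess:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound on the size of each of `nmaps` equally sized mappings in
 * a flat address space of 2^addr_bits bytes, of which `reserved` bytes
 * are already spoken for.  Ignores fragmentation, so the real limit can
 * only be lower.
 */
static uint64_t
max_bytes_per_mapping(unsigned addr_bits, uint64_t reserved, unsigned nmaps)
{
    uint64_t space;

    assert(addr_bits > 0 && addr_bits < 64 && nmaps > 0);
    space = (uint64_t)1 << addr_bits;
    if (reserved >= space)
        return 0;
    return (space - reserved) / nmaps;
}
```

For 32 address bits, nothing reserved, and two mappings this gives
2^31 bytes each; reserving an illustrative 1GB drops it to 1.5GB each.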

According to the mmap(2) manual page the behaviour of trying to map a
file bigger than will fit in available remaining address space is
undefined....

-- 
							Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods@acm.org>      <robohack!woods>
Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>