Subject: Re: bin/10625: /usr/bin/cmp is unable to compare rather large files
To: R. C. Dowdeswell <elric@imrryr.org>
From: Jaromír Doleček <dolecek@ibis.cz>
List: tech-userlevel
Date: 07/27/2000 20:43:35
> SIZE_T_MAX / 2 is still too large, because it does not account for
> the kernel address space and the memory that is used by cmp for
> other purposes.

Actually, there's no need to take other memory users into
consideration. Since cmp(1) uses mmap(..MAP_SHARED|MAP_FILE) and
madvise(2)s the kernel with MADV_SEQUENTIAL, it's using just one real
memory page for each compared file at any time (the no longer
needed memory pages are continuously freed). MADV_SEQUENTIAL is cool :)
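
To make the mapping setup described above concrete, here is a minimal
sketch of it. This is not the actual cmp(1) source: the helper name
map_sequential() is made up for illustration, and it assumes a regular,
non-empty file that fits into the address space.

```c
#define _DEFAULT_SOURCE         /* exposes madvise() and MAP_FILE on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef MAP_FILE
#define MAP_FILE 0              /* some systems do not define it; it is the default */
#endif

/*
 * Hypothetical helper: map a whole file read-only with MAP_SHARED and
 * hint the kernel that it will be read sequentially.
 * Returns the mapping (or MAP_FAILED) and stores its length in *len.
 */
void *
map_sequential(const char *path, size_t *len)
{
    struct stat st;
    void *p;
    int fd;

    assert(path != NULL && len != NULL);

    if ((fd = open(path, O_RDONLY)) == -1)
        return MAP_FAILED;
    if (fstat(fd, &st) == -1 || st.st_size <= 0) {
        close(fd);
        return MAP_FAILED;
    }
    *len = (size_t)st.st_size;

    p = mmap(NULL, *len, PROT_READ, MAP_FILE | MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */

    /* The hint is advisory only, so a failure here is ignored. */
    if (p != MAP_FAILED)
        (void)madvise(p, *len, MADV_SEQUENTIAL);
    return p;
}
```

The caller is expected to munmap() the returned region once the
comparison is done.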

> Probably the best approach would be to use mmap(2)
> in a manner more consistent with how one would use read(2), i.e.:
> mmap(2) relatively small chunks of the file in a loop.

"why bother with mmap() then?" :)
Yes, this should probably be done. But it's too hairy to find
out what the maximum usable "relatively small chunk" is on a given
machine/OS, so it's probably just easier to mmap() the whole file
and, if it's too big, just use read().
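
A rough sketch of that "mmap() the whole file, else read()" approach
could look like the following. It is an illustration only, not the cmp(1)
code: files_equal(), compare_by_read() and read_full() are made-up names,
the 64KB buffer size is arbitrary, regular files are assumed, and it only
answers "same or different" instead of reporting the first differing byte.

```c
#define _DEFAULT_SOURCE         /* exposes madvise() and MAP_FILE on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef MAP_FILE
#define MAP_FILE 0
#endif

/* Read n bytes unless EOF or an error comes first; returns count or -1. */
ssize_t
read_full(int fd, unsigned char *buf, size_t n)
{
    size_t got = 0;

    while (got < n) {
        ssize_t r = read(fd, buf + got, n - got);
        if (r == -1)
            return -1;
        if (r == 0)
            break;
        got += (size_t)r;
    }
    return (ssize_t)got;
}

/* Fallback path: compare with plain read(2). 1 = same, 0 = differ, -1 = error. */
int
compare_by_read(int fd1, int fd2)
{
    unsigned char b1[65536], b2[65536];

    for (;;) {
        ssize_t n1 = read_full(fd1, b1, sizeof(b1));
        ssize_t n2 = read_full(fd2, b2, sizeof(b2));
        if (n1 == -1 || n2 == -1)
            return -1;
        if (n1 != n2 || memcmp(b1, b2, (size_t)n1) != 0)
            return 0;
        if (n1 == 0)
            return 1;           /* both files hit EOF together */
    }
}

/* Try mmap() on the whole files first; use read() if that is not possible. */
int
files_equal(const char *path1, const char *path2)
{
    struct stat s1, s2;
    int fd1, fd2, result = -1;

    assert(path1 != NULL && path2 != NULL);
    fd1 = open(path1, O_RDONLY);
    fd2 = open(path2, O_RDONLY);
    if (fd1 == -1 || fd2 == -1 ||
        fstat(fd1, &s1) == -1 || fstat(fd2, &s2) == -1)
        goto out;
    if (s1.st_size != s2.st_size) {
        result = 0;
        goto out;
    }
    if (s1.st_size == 0) {
        result = 1;
        goto out;
    }

    if ((uintmax_t)s1.st_size <= SIZE_MAX) {
        size_t len = (size_t)s1.st_size;
        void *p1 = mmap(NULL, len, PROT_READ, MAP_FILE | MAP_SHARED, fd1, 0);
        void *p2 = (p1 == MAP_FAILED) ? MAP_FAILED :
            mmap(NULL, len, PROT_READ, MAP_FILE | MAP_SHARED, fd2, 0);

        if (p2 != MAP_FAILED) {
            (void)madvise(p1, len, MADV_SEQUENTIAL);
            (void)madvise(p2, len, MADV_SEQUENTIAL);
            result = (memcmp(p1, p2, len) == 0);
        }
        if (p1 != MAP_FAILED)
            munmap(p1, len);
        if (p2 != MAP_FAILED)
            munmap(p2, len);
    }
    if (result == -1)           /* too big or mmap() failed: use read() */
        result = compare_by_read(fd1, fd2);
out:
    if (fd1 != -1)
        close(fd1);
    if (fd2 != -1)
        close(fd2);
    return result;
}
```

The point of this shape is that nobody has to guess a safe chunk size:
the kernel either hands back a mapping or it doesn't, and the read()
path works for any file size.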

> It really has to fail, since there is no way to give you back a
> char * that can access data beyond the end of your address space.

Right. A single program can't have more than a 4GB address space on
32bit machines (though the machine itself can support a bigger
physical address space - newer Intel processors support 36bit physical
addressing with some special hacks; it's even supported on Windows NT
(via a special API) and Solaris (where it "just works")).

> And in fact it is more hairy than that because it must find a free
> contiguous region in your address space to hand back to you.  This
> may be the reason that 2.6GB of files couldn't be mapped.

Files up to about 3.6GB should be ok even on 32bit machines (or
two files whose sizes add up to that). If not, that would be a bug.
Note that the original mmap(2) call in cmp(1) failed for the files because it
used MAP_PRIVATE - once that has been removed or substituted
by MAP_SHARED, the mmap(2) succeeded.

3.6GB is the maximum address space available for userland programs
on i386 ATM IIRC.

Jaromir
-- 
Jaromir Dolecek <jdolecek@NetBSD.org>      http://www.ics.muni.cz/~dolecek/
@@@@  Wanna a real operating system ? Go and get NetBSD, damn!  @@@@