Subject: Re: mmap
To: Jonathan Stone <jonathan@dsg.stanford.edu>
From: Chuck Cranor <chuck@maria.wustl.edu>
List: current-users
Date: 11/14/1998 22:36:03
hi-

>The first failure ("shared map does not match within page") is:
>    open file, 
>    mmap file as shared,
>    write(2) to mapped backing file,
>    do msync(MS_INVALIDATE),
>    read addresses in mapped file corresponding to write(2)

>The error is that after the msync(), the contents of the MAP_SHARED
>region are not the same as the mapped (backing) file. the msync()
>should guarantee that they are. 

>I think that's a bug in UVM.  Chuck?


yes.   its due to me trying to copy the semantics of the BSD VM system
calls.   it turns out the BSD VM system's msync has a bug too (but it
comes out in a different way).   here's the story:

    the msync system call [sys___msync13() in uvm_mmap.c] calls 
    uvm_map_clean() [from uvm_map.c] to clean out the map.  if 
    you look at uvm_map_clean in rev 1.32 of uvm_map.c, note the
    following code:


                /*
                 * flush pages if writing is allowed.   note that object is
                 * locked.
                 * XXX should we continue on an error?
                 */

                if (object && object->pgops &&
                    (current->protection & VM_PROT_WRITE) != 0) {
                        if (!object->pgops->pgo_flush(object, offset,
                            offset+size, flags)) {


    what that says is if the mapping has a valid backing object
    and the user process has write access to the mapping then call
    the pgo_flush pager operation to flush out the old data.

    the problem here is that in the test program the memory mapping
    is establish by mmap(2) with PROT_READ (read-only), thus 
    (current->protection & VM_PROT_WRITE) == 0.

    after thinking about it, it seems to me that that permission check
    doesn't really belong there, so i removed it and that fixed the 
    problem.


    this bug got me wondering why i put the permission check in there
    in the first place.   i started looking at BSD VM, and discovered
    this in vm_map_clean() in vm_map.c:

                /*
                 * Flush pages if writing is allowed.
                 * XXX should we continue on an error?
                 */
                if ((current->protection & VM_PROT_WRITE) &&
                    !vm_object_page_clean(object, offset, offset+size,
                                          syncio, FALSE)) {
                        vm_object_unlock(object);
                        vm_map_unlock_read(map);
                        return(KERN_FAILURE);
                }
                if (invalidate)
                        vm_object_page_remove(object, offset, offset+size);

    [note that vm_object_page_clean does what uvm's pgo_flush does]


    that explains where i got the check from (i was attempting to
    make sure the UVM system calls had the same semantics as BSD VM).

    however, if you look at the code quoted above, you will note that
    it has a major bug in it!   the invalidate check is outside of the 
    protection check.   this allows BSD VM to throw away valid modified
    data without saving it to backing store... thus you can easily get
    [or even trigger] data loss.


    consider the following case:

    you've got two users: user1 and user2.   suppose the following
    sequence of events happens:

    user1				user2
    - create a file, mode 644
    - write random data to the file
    - mmap the file in, shared
    - modify shared mapping
	(thus overwriting the 
	 previous data)

					- open user1's file O_RDONLY (to 
						avoid EPERM)
					- mmap the file PROT_READ, SHARED
					- call msync(2) with MS_INVALIDATE
						set


    now at the point where user2 calls msync(2), the file has the
    orig data in the buffer cache, but it has the modified data in
    the VM system's vm_object (recall that we have separate VM and 
    buffer cache).   when the quoted section of BSD VM's vm_map_clean
    runs, the vm_object_page_clean() call will _not_ happen because
    user2 does not have write access to the mapping.  however, the 
    vm_object_page_remove() _will_ happen and this will caused the
    valid data in the VM system's vm_object to get discarded 
    incorrectly.

    i verified this problem by running a couple of test programs under
    BSD VM.



so for UVM, what i'm going to do is remove the unnecessary permission
check to fix the bug.   

i suppose the thing to do for BSD VM is to remove the permission check
there also?   

chuck