Subject: Re: UVM optimalisations / remarks
To: None <eeh@netbsd.org>
From: Reinoud Zandijk <reinoud@netbsd.org>
List: tech-kern
Date: 03/17/2002 20:56:40
On Sun, Mar 17, 2002 at 07:23:31PM -0000, eeh@netbsd.org wrote:
> So instead of allocating and zeroing a new page on a read fault you do it on a
> write fault?  I don't really see this as such a huge optimization.  
> 
> Let's examine:
> 
> With the current scheme, when you access a ZFOD page either for reading or
> writing you take a fault, a page is allocated and zeroed and mapped in as
> modified.  You take no more faults on the page until some other operation 
> changes its attributes.

true indeed.... 

> If we have a single zero page and do COW you will usually take two faults:
> first you take one to map in the zero page on the first read, then when the
> first write occurs you take a more expensive fault to remove the old mapping
> and allocate a new ZFOD page.  The COW path needs to be made more complicated
> to recognize the special COW of page zero so you can allocate from the zero'ed
> page list or use pmap_clear_page() instead of pmap_copy_page() (which is twice
> as expensive).

Euhm.... not quite ... if it is really the case that a write normally
comes first, then the scheme is the same, i.e. just one page fault. The
COW path only needs to know whether it is dealing with the zero-filled
page (just a flag?) so it can take one from the zeroed page list instead
of using pmap_copy_page().
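
In pseudo-C that fast path could look roughly like this (a sketch only:
is_the_zero_page(), take_prezeroed_page() and take_free_page() are
made-up helper names, not existing UVM functions; pmap_copy_page() and
VM_PAGE_TO_PHYS() are meant to be the real ones):

/*
 * Sketch of the COW resolution step with the zero-page flag.
 */
struct vm_page *
cow_break(struct vm_page *old)
{
        struct vm_page *new;

        if (is_the_zero_page(old)) {
                /* the old mapping was the shared zero page:
                 * nothing to copy, grab a pre-zeroed page */
                new = take_prezeroed_page();
        } else {
                new = take_free_page();
                pmap_copy_page(VM_PAGE_TO_PHYS(old),
                    VM_PAGE_TO_PHYS(new));
        }
        return new;
}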

If we read the page first, we take one page fault and the address gets
mapped with _the_ zero page, so all successive reads get the same page
over and over. I agree that this is mostly interesting for physically
mapped caches and has no effect (pity...) on virtually mapped caches.
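
In sketch form, the read-fault side might look like this (again an
assumption of mine, not existing UVM code: zero_page_pa and the exact
pmap_enter() arguments are illustrative):

extern paddr_t zero_page_pa;    /* hypothetical: the single zero page */

/*
 * Read fault on a ZFOD page: map the shared zero page read-only,
 * so a later write faults again and can be given a private page.
 */
int
zfod_read_fault(pmap_t pm, vaddr_t va)
{
        return pmap_enter(pm, va, zero_page_pa, VM_PROT_READ,
            PMAP_CANFAIL | VM_PROT_READ);
}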

If the page is then written to, it does have to take a second page
fault, true, but that fault follows the normal path: take a free zeroed
page and map it at that place.
 
The virtually mapped cache doesn't even need to be flushed, since the
page at that address was all zeroes anyway: even though the physical
page behind the mapping changed, from the cache's point of view nothing
did. For physically addressed caches nothing changed either.

> What do we gain from making this change?  We don't need to allocate swap
> immediately when we first read from a ZFOD page.  However, we must still
> reserve swap for it in case we do write to the page.  How often do you read 
> from a ZFOD page without writing to it immediately afterwards?

That's a good question .... the cases I am thinking of are large areas
allocated with calloc(), as some GNU programs tend to do... if we can
make better use of our knowledge that calloc()'d pages are zero, instead
of the current calloc() implementation in src/lib/libc/stdlib/calloc.c,
then I think that would sure be a benefit:

#include <stdlib.h>
#include <string.h>

void *
calloc(num, size)
        size_t num;
        size_t size;
{
        void *p;

        size *= num;
        if ((p = malloc(size)) != NULL)
                /* zeroes by hand, even when the pages are fresh
                 * ZFOD pages that are already zero */
                memset(p, '\0', size);
        return(p);
}

Though a system call might be a bit expensive, malloc() already ends up
making one here anyway (sbrk() or mmap() under the hood) ... and a
calloc() system call alone would be faster since the memset() wouldn't
be necessary.
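
Short of a new system call, roughly the same effect can already be had
from userland with anonymous mmap(2), whose pages the kernel hands out
zero-filled. A minimal sketch of my own (the page-size threshold is
arbitrary, and a real version would need a matching free() path that
munmap()s large blocks):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

void *
calloc_sketch(size_t num, size_t size)
{
        void *p;

        if (num != 0 && size > SIZE_MAX / num)
                return NULL;                    /* overflow */
        size *= num;
        if (size >= (size_t)getpagesize()) {
                /* MAP_ANON memory is already zero-filled, so the
                 * memset() disappears for large allocations */
                p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                    MAP_ANON | MAP_PRIVATE, -1, 0);
                return (p == MAP_FAILED) ? NULL : p;
        }
        if ((p = malloc(size)) != NULL)
                memset(p, '\0', size);
        return p;
}

Whether skipping the memset() actually wins over the extra mmap() cost
would of course need measuring.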

> I expect that change is probably not an overall optimization.  However,
> you can try to gather usage statistics and prove me wrong.

I don't know if I have enough understanding of the UVM code yet to do
this, but I'll keep it in mind ....

Cheers and thanks for the feedback,
Reinoud