tech-kern: Re: Page daemon behavior part N

Subject: Re: Page daemon behavior part N
To: None <smd@ebone.net, thorpej@zembu.com>
From: None <eeh@netbsd.org>
List: tech-kern
Date: 01/25/2001 20:37:14
	jthorpe writes

	|  > We should be able to fix that by putting UBC pages directly on to the
	|  > inactive list as soon as the current UBC operation is complete.  As far
	|  > as I'm concerned there's no reason UBC pages should ever be `active'
	|  > unless they are mmap()ed into some process' address space.
	|
	| I agree completely.

	Uh, just checking: are we going to give better performance to
	processes which mmap in a file and then sparsely look at / touch
	pages, compared to a process that uses open/lseek/read/write to
	look at / modify the same file in a sparse manner?

Actually, by putting UBC-only pages immediately on the
inactive list you give read/write an advantage over
mmap().

In general, sequential I/O tends to be large and not
re-use data, while random I/O tends to be small, 
scattered, and benefits from caching.  If you do
large, sequential I/O you want to recycle the buffer
cache pages as soon as possible so as not to displace
more important data.  If you do random I/O you want
the data to stay cached 'cause you may get back to it.
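
To make the displacement argument concrete, here is a toy model.  It is
not UVM code: the cache is a flat array of 8 slots with timestamp LRU,
and every name in it (hot_hits, touch, CACHE_SLOTS, ...) is made up for
illustration.  The workload alternates one access to a small hot set
(random-style reuse) with one never-reused streaming page (large
sequential I/O).  The only knob is whether streaming pages are stamped
as "oldest" on insertion, which stands in for putting them straight on
the inactive list.

```c
#include <assert.h>

enum { CACHE_SLOTS = 8, HOT_PAGES = 6, FIRST_STREAM_PAGE = 1000 };

struct slot {
    long page;                /* page id, or -1 if the slot is empty */
    unsigned long last_used;  /* larger means more recently used */
};

/* Reference `page`, stamping it with `stamp`.  Returns 1 on a hit and
 * 0 on a miss.  On a miss an empty slot is used if there is one,
 * otherwise the slot with the smallest stamp is evicted. */
static int touch(struct slot *cache, long page, unsigned long stamp)
{
    int victim = 0;

    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].page == page) {
            cache[i].last_used = stamp;
            return 1;
        }
    }
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].page < 0) {
            victim = i;
            break;
        }
        if (cache[i].last_used < cache[victim].last_used)
            victim = i;
    }
    cache[victim].page = page;
    cache[victim].last_used = stamp;
    return 0;
}

/* Run `rounds` iterations of "one hot page, then one streaming page"
 * and return how many of the hot accesses hit the cache.  If
 * stream_to_inactive is nonzero, streaming pages get stamp 0 so they
 * are always the first candidates for eviction. */
unsigned hot_hits(int stream_to_inactive, unsigned rounds)
{
    struct slot cache[CACHE_SLOTS];
    unsigned long tick = 0;
    unsigned hits = 0;

    assert(HOT_PAGES < CACHE_SLOTS);  /* hot set must fit in the cache */
    for (int i = 0; i < CACHE_SLOTS; i++) {
        cache[i].page = -1;
        cache[i].last_used = 0;
    }
    for (unsigned r = 0; r < rounds; r++) {
        hits += (unsigned)touch(cache, (long)(r % HOT_PAGES), ++tick);
        ++tick;
        touch(cache, FIRST_STREAM_PAGE + (long)r,
              stream_to_inactive ? 0UL : tick);
    }
    return hits;
}
```

With these particular numbers, hot_hits(0, 600) returns 0: under plain
LRU the stream pushes every hot page out before it is touched again.
hot_hits(1, 600) returns 594, i.e. everything but the six compulsory
misses.  The numbers are an artifact of the toy workload; only the
direction of the effect is the point.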

Read and write are sequential access methods, and
can only be converted to random through lseek().
As such you want to recycle their pages ASAP.  OTOH
mmap() is a random access method.  If you use
mmap() for large, sequential I/O you should also
be using madvise() to tell the kernel when you 
are finished with the data.
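
A minimal userland sketch of that last point, assuming a read-only
pass over a whole file.  The function name sum_file_mmap and the
`window' parameter are invented for the example; mmap(), madvise(),
MADV_SEQUENTIAL and MADV_DONTNEED are the standard interfaces, but how
strongly a given kernel acts on the hints is system dependent.

```c
#define _DEFAULT_SOURCE  /* lets glibc expose madvise() in strict modes */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sum every byte of `path' through a shared mapping, walking it in
 * `window'-sized pieces and telling the kernel after each piece that
 * the pages are no longer needed.  `window' must be a non-zero
 * multiple of the page size so each madvise() address is page aligned.
 * Returns 0 on success, -1 on error. */
int sum_file_mmap(const char *path, size_t window, uint64_t *sum_out)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    struct stat st;
    uint64_t sum = 0;
    int fd;

    assert(sum_out != NULL);
    if (pagesize <= 0 || window == 0 || window % (size_t)pagesize != 0)
        return -1;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size < 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    if (len > 0) {
        unsigned char *base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
        if (base == MAP_FAILED) {
            close(fd);
            return -1;
        }
        /* Both calls are hints; a failure does not affect correctness. */
        (void)madvise(base, len, MADV_SEQUENTIAL);
        for (size_t off = 0; off < len; off += window) {
            size_t n = len - off < window ? len - off : window;
            for (size_t i = 0; i < n; i++)
                sum += base[off + i];
            /* Finished with this window: let the kernel reclaim it. */
            (void)madvise(base + off, n, MADV_DONTNEED);
        }
        munmap(base, len);
    }
    close(fd);
    *sum_out = sum;
    return 0;
}
```

Without the MADV_DONTNEED calls the mapping still works; the pages just
stay eligible to compete with everything else in memory until the
pagedaemon gets around to them.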

	| However, it is still the case that pages might not be cleaned quickly enough.
	| Maybe we need to have more aggressive cleaning of pages recently involved
	| in a UBC write operation?

	This is probably smart for sequential bulk file writing, probably
	not so great for directory blocks that were written out, or any
	kind of "too-small" block-size... imagine "process | dd of=foo bs=512"
	under heavy loads leading to the same block being zorched from the
	cache multiple times, while something LRU is staying in core.

If you're doing `dd of=foo bs=512' you want to purge your output
from the buffer cache ASAP so you have room to allocate a page
for the next block.

	I'm kinda leery of a one-size-fits-all pageout policy: NetBSD
	wants to run on vastly different systems in terms of amount of memory
	and secondary storage cost, and NetBSD users are a pretty diverse
	bunch in terms of system utilization.

So you prefer to compile different algorithms for different types
of machines?  What about machines that have multiple roles: a workstation
that also NFS exports disks?  Do you need to switch kernels depending
on whether you're at the console running X11?

Eduardo