Subject: Re: UBC status
To: None <tech-kern@netbsd.org>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 09/29/1999 12:49:18
ok, I think all of this can be summarized as:

(1)  trickle-sync is a good thing.
(2)  it'd be nice to have more differentiation between kinds of pages,
     to allow different policies on flushing and reuse.

the implementation of (1) will be somewhat different for page-cache data
than it is for buffer-cache data (i.e. in the softupdate code), but this
may be possible to do for the first release with UBC.  y'all can chew on
this one.  the main rule to keep in mind is that you're not allowed to
add anything to struct vm_page.
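
to make that a little more concrete, here's the rough shape I have in mind
for the page-cache side.  everything below (dirty_vnode, vnode_flush_some(),
the list) is a made-up name for illustration, not an existing interface --
the point is just that the bookkeeping lives per-vnode, not per-page:

/*
 * rough sketch of a trickle-sync pass for page-cache data.
 * vnode_flush_some() stands in for "push up to n dirty pages of
 * this vnode and return how many were pushed".
 */
struct dirty_vnode {
        struct dirty_vnode *dv_next;
        void *dv_vp;                    /* vnode with dirty pages */
};

extern struct dirty_vnode *dirty_list;          /* hypothetical */
extern int vnode_flush_some(void *vp, int n);   /* hypothetical */

void
trickle_sync(int budget)
{
        struct dirty_vnode *dv;

        /* push at most `budget' pages per wakeup, then stop */
        for (dv = dirty_list; dv != NULL && budget > 0; dv = dv->dv_next)
                budget -= vnode_flush_some(dv->dv_vp, budget);
}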

for (2), soda pointed me to this URL:
	http://www.sun.com/sun-on-net/performance/priority_paging.html
which describes something Sun is working on.  I've been thinking about
similar stuff for a while, and I'll post my ideas once they've solidified
a bit more.  or if someone else has some firm ideas already, post away.


as for B_ORDERED and other device-drivery things, these sound like fine
ideas that should be followed up on, but they're beyond the scope of
what I'm trying to do right now.  I'd encourage people to work on these
independently.

-Chuck



On Sat, Sep 25, 1999 at 02:10:21PM -0700, Eduardo E. Horvath wrote:
> On Sat, 25 Sep 1999, Neil A. Carson wrote:
> 
> > Chuck Silvers wrote:
> > 
> > > yea, I'm not very excited about a limit on cached file data either,
> > > but many people have talked about such a thing so I listed it tentatively.
> > > I was including limiting dirty pages under "pagedaemon optimizations"...
> > > could you elaborate on the extremely clever ways this could be avoided?
> 
> [Description of pageout issues deleted]
> 
> There are two different issues here: handling dirty pages and allocating
> clean pages for new buffers.  
> 
> There are many solutions for problems caused by dirty buffers and most are
> not that complex.
> 
> > FreeBSD works around this by having a small limit on the amount of dirty
> > data despite allowing the cache to grow. This works very well in
> > practice, although I don't really believe this to be the solution
> > either, since all the buffer cache junk in there still has the 'blow out
> > in one go' problem (although by default you don't notice it).
> 
> Does FreeBSD have a limit on page allocations?
> 
> > I think the real rules you need to play by would be something like:
> > 	- Always keep the IO subsystem active as regards spooling
> > 	  dirty data.
> 
> I always thought that trying to run the pagescanner at a very low rate in
> the idle loop would be a good idea.  Since the system is idle you're not
> stealing CPU cycles from something more important.  However, the CPU may
> be idle because it's waiting for I/O, and the last thing you want to do to
> a system that's already thrashing under a high I/O load is to add some
> more.
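> 
> As a sketch (io_queue_busy() and pagescanner_scan() are made-up names, not
> existing interfaces):
> 
> /*
>  * hypothetical idle hook: scan a few pages per call, but back off
>  * when the disks are already busy, so an I/O-bound "idle" machine
>  * doesn't get more writeback piled on top of it.
>  */
> extern int io_queue_busy(void);
> extern void pagescanner_scan(int npages);
> 
> void
> idle_scan(void)
> {
>         if (io_queue_busy())    /* idle because we're waiting on I/O */
>                 return;
>         pagescanner_scan(16);   /* a handful of pages per idle tick */
> }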
> 
> > 	- Implement an IO prioritisation scheme (with some
> > 	  heuristics based on drive head location etc) which places
> > 	  interactive operations over trickle page-outs
> 
> Interesting, but rather complicated since it requires extremely good
> sharing of information between the disk and HBA drivers and the
> pagedaemon.
> 
> > 	- If the amount of dirty data starts to accumulate too
> > 	  much (ie the IO subsystems are continually saturated)
> > 	  then stop it growing further.
> 
> Solaris has a nice solution to this problem.  Traditionally, the update
> daemon would run every 30 seconds to flush all dirty buffers to disk.
> When machines started to have 64MB, 256MB, or more of RAM, that meant
> dumping possibly hundreds of MB of data to the disk all at the same time.
> The solution was to run it every 10s over 1/5 of the RAM on the system,
> leading to a much more even I/O load.
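> 
> Not Solaris's actual code, but in pseudo-C the idea is something like this
> (total_pages() and flush_page_if_dirty() are made-up names):
> 
> #define SYNC_WAKEUP     10      /* seconds between passes */
> #define SYNC_FRACTION   5       /* cover 1/5 of RAM per pass */
> 
> extern unsigned long total_pages(void);
> extern void flush_page_if_dirty(unsigned long pn);
> 
> void
> spread_sync(void)               /* called every SYNC_WAKEUP seconds */
> {
>         static unsigned long next;
>         unsigned long n = total_pages();
>         unsigned long i, slice = n / SYNC_FRACTION;
> 
>         /* flush a fixed slice of memory per wakeup instead of all of it */
>         for (i = 0; i < slice; i++) {
>                 flush_page_if_dirty(next);
>                 next = (next + 1) % n;
>         }
> }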
> 
> The problem with this is that the buffers were tracked by inode and sync
> used to operate over inodes; to flush a fixed fraction of RAM each pass it
> had to be rewritten to operate on pages instead, which increased CPU usage
> during scanning.
> 
> A similar solution could be designed that scans through some fraction of
> active inodes in the system.  Or, when a dirty page is created, the
> associated inode could be timestamped and then flushed once it has been
> dirty for 30 seconds.
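> 
> In pseudo-C (made-up names again), roughly:
> 
> /*
>  * sketch of the timestamp variant: an inode remembers when it first
>  * picked up a dirty page, and the syncer pushes anything that has
>  * been dirty for 30 seconds or more.
>  */
> struct dirty_inode {
>         struct dirty_inode *di_next;
>         void *di_ip;            /* the inode (or vnode) */
>         long di_dirtied;        /* time it first went dirty */
> };
> 
> extern void flush_inode(void *ip);
> 
> void
> inode_trickle_sync(struct dirty_inode *list, long now)
> {
>         struct dirty_inode *di;
> 
>         for (di = list; di != NULL; di = di->di_next)
>                 if (now - di->di_dirtied >= 30)
>                         flush_inode(di->di_ip);
> }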
> 
> > 
> > In this way, I guess, you effectively have an 'adaptive limit' on the
> > amount of dirty data.
> > 
> > Does this make sense?
> 
> Yes.
> 
> Now on the more interesting side, what to do about page allocation.  
> 
> The high water mark solution is well tested but seems rather arbitrary.
> You would want different settings depending on system RAM, current load,
> the types of jobs running, etc.
> 
> Wiring down pages just because they are shared by a lot of processes does
> not seem like such a good idea if those pages are used very seldom.
> 
> I have long been speculating about whether we could make use of a separate
> set of active and inactive memory lists, used only for buffer cache pages,
> that would be scanned at a faster rate than the current ones.  That would
> allow faster re-use of buffer pages without requiring a hard high-water
> mark.
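> 
> As a rough sketch (none of this is meant to match the existing pagedaemon
> structures; the names and numbers are invented):
> 
> /*
>  * give file (buffer cache) pages their own queue with a higher scan
>  * rate, so they can be reclaimed sooner without a hard limit on how
>  * many of them there can be.
>  */
> struct pagequeue {
>         void    *pq_pages;      /* head of this queue's page list */
>         int      pq_scanrate;   /* pages to scan per pagedaemon pass */
> };
> 
> struct pagequeue anon_queue = { 0, 64 };        /* process/anonymous pages */
> struct pagequeue file_queue = { 0, 256 };       /* file pages, scanned harder */
> 
> extern void scan_queue(struct pagequeue *pq, int npages);
> 
> void
> pagedaemon_pass(void)
> {
>         scan_queue(&anon_queue, anon_queue.pq_scanrate);
>         scan_queue(&file_queue, file_queue.pq_scanrate);
> }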
> 
> =========================================================================
> Eduardo Horvath				eeh@one-o.com
> 	"I need to find a pithy new quote." -- me