Subject: diffs for UVM/UBC improvements available
To: None <tech-kern@netbsd.org>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 05/22/2001 02:38:33
hi folks,

I've been working on some improvements for UVM/UBC for the past couple months,
and I'm at the point now where I'd like to get some other people to try them
prior to integrating them with -current.  it started off as performance
improvements for sequential writes (since it was pointed out that write
performance was much worse on slower machines like a decstation 5000/33),
but it's turned into a big mish-mash of changes.  the list includes:

 - improved write performance on my decstation 5000/200, within 1% of
   the speed of NetBSD 1.5 (but now it's worse again due to the disk driver
   issue that I posted about here yesterday).
 - move the vnode lock from struct vnode back into the fs-specific data.
   the layered filesystems don't have a separate lock for their vnodes,
   so the lock isn't really a vnode property.
 - implement pmap_k{enter_pa,remove}() for real on additional platforms.
   note that this is now REQUIRED, ie. it is no longer legal to implement
   pmap_kenter_pa() with pmap_enter().
 - remove special treatment of pager_map mappings in pmaps.  this is also
   REQUIRED now, since I've removed the globals that expose the address range.
 - remove kmem_object and mb_object since they were useless.
 - in the mips pmap, implement pools for pv entries, use __HAVE_VM_PAGE_MD,
   and remove a bunch of unnecessary spl*() calls.
 - eliminate struct uvm_vnode by moving its fields into struct vnode.
 - clean up struct vnode by removing all the fields that we don't use anymore.
   (I also got rid of v_lease on the principle that it's an optimization for
   an almost-unused feature.)
 - change vmapbuf() to use pmap_kenter_pa() instead of pmap_enter().
 - rewrite the pageout path.  the pager is now responsible for handling the
   high-level requests instead of only getting control after a bunch of work
   has already been done on its behalf.  this will allow us to UBCify LFS,
   which needs tighter control over its pages than other filesystems do.
   writing a page to disk no longer requires making it read-only, which
   allows us to write wired pages without causing all kinds of havoc.
 - we now use a new PG_PAGEOUT flag to indicate that a page should be freed
   on behalf of the pagedaemon when it's unlocked.  this flag is very similar
   to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
   pageout fails due to eg. an indirect-block buffer being locked.
   this allows us to remove the "version" field from struct vm_page,
   and together with shrinking "loan_count" from 32 bits to 16,
   struct vm_page is now 4 bytes smaller.
 - no longer use PG_RELEASED for swap-backed pages.  if the page is busy
   because it's being paged out, we can't release the swap slot to be
   reallocated until that write is complete, but unlike with vnodes we
   don't keep a count of in-progress writes so there's no good way to
   know when the write is done.  instead, when we need to free a busy
   swap-backed page, just sleep until we can get it busy ourselves.
 - implement a fast-path for extending writes which allows us to avoid
   zeroing new pages.  this substantially reduces cpu usage.
 - encapsulate the data used by the genfs code in a struct genfs_node,
   which must be the first element of the filesystem-specific vnode data
   for filesystems which use genfs_{get,put}pages().
 - eliminate many of the UVM pagerops, since they aren't needed anymore
   now that the pager "put" operation is a higher-level operation.
 - eliminate VOP_ISLOCKED() because it's not useful in an MP world.
 - enhance the genfs code to allow NFS to use genfs_{get,put}pages()
   instead of a modified copy.
 - eliminate "vm_page_t" in favor of "struct vm_page *".  typedefs to
   pointer types just bug me.  I'll be getting rid of some more of these
   from UVM before this goes in.
 - lots of other random cleanup.


this diff also includes the changes I posted to port-mips a while back
to inline the spl* calls, but I'll remove those before this goes in.
(it's just so I can get more useful profiling numbers.)
I also made a bunch of functions non-static so I could tell what was
going on when profiling; I'll make those static again too.

as I was working on this stuff, I didn't really expect that it would have
much effect on a more complex task like building a NetBSD snapshot.
to my surprise, it does!  on my 400MHz pc, I see this improvement in
"make build" performance:

-current:
4852.751u 779.388s 2:50:07.17 55.1%     0+0k 153979+322668io 42565pf+0w

with these diffs:
4855.911u 883.693s 2:29:39.24 63.9%     0+0k 16927+58502io 15800pf+0w


another benefit of these changes is that we're much more robust in
low-memory situations.  runaway processes which allocate anonymous memory
in a loop will now be killed much more reliably when swap space fills up.
there's still more work to be done in this area, but this is a big
improvement over what we have now.


I've tested this new code on these platforms:

	alpha
	i386
	mips - pmax
	m68k - sun3 sun3x
	sparc - sun4m sun4c
	sparc64
	powerpc - macppc


vax and arm32 (dnard) don't work for me in -current right now,
so I couldn't test those, but I've made the changes that I think are needed.
I haven't touched most of the m68k platforms, so port maintainers for those
platforms will have to make the pmap changes listed above themselves.


so at this point I'd like those of you who like living on the bleeding edge
to try out this new code and let me know your experience.  the code is at

	ftp://ftp.netbsd.org/pub/NetBSD/misc/chs/ubc-perf/

there's one new file (genfs_node.h, which goes in sys/miscfs/genfs),
and a dated diff.  I'll update the diff pretty frequently for the
hopefully short period before this new code is integrated into -current.
the best way to upgrade to a new version of the diff is

	patch -R -p0 < diff.old
	cvs update
	patch -p0 < diff.new


as usual, please post here or mail me with any questions or comments.

-Chuck