Subject: Re: Why the partitioning should stay the same
To: None <louie@TransSys.COM, tech-kern@NetBSD.ORG>
From: Mike Hibler <mike@cs.utah.edu>
List: tech-kern
Date: 01/31/1995 11:27:51
Here are some random thoughts/observations (based on 4.4-lite so Chris or
someone can correct me if they no longer apply to NetBSD).

The bottleneck in swapping/paging currently is not going to be the path
through the raw-partition/vnode code.  I am pretty sure it is the fact that
paging happens in units of 1 page that will kill you.  Before 4.4-lite I
added code to the paging interface to do clustered requests and fixed up the
pageout path to use it, but it still didn't "feel" any faster.  I suspect
that clustered pagein (e.g. read-ahead) is more important and that pageout
is plagued by bad/poorly-tuned algorithms.
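
To make "clustered requests" concrete, here is a minimal user-space sketch
of the idea (not the actual 4.4-lite pager code): take a sorted list of page
numbers and coalesce neighbours into runs so each run can go out as one I/O.
The names page_run, cluster_pages and max_cluster are made up for the
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One clustered request: `count` consecutive pages starting at `first`. */
struct page_run {
    unsigned long first;
    size_t count;
};

/*
 * Coalesce a sorted array of page numbers into runs of consecutive pages,
 * capping each run at max_cluster pages.  Returns the number of runs
 * written to `out`, which must have room for npages entries.
 * (Hypothetical helper, not a kernel interface.)
 */
size_t cluster_pages(const unsigned long *pages, size_t npages,
                     size_t max_cluster, struct page_run *out)
{
    size_t nruns = 0;

    assert(max_cluster > 0);
    for (size_t i = 0; i < npages; i++) {
        if (nruns > 0 &&
            out[nruns - 1].count < max_cluster &&
            pages[i] == out[nruns - 1].first + out[nruns - 1].count) {
            /* Page directly follows the current run: extend it. */
            out[nruns - 1].count++;
        } else {
            /* Gap, or run already at the cap: start a new run. */
            out[nruns].first = pages[i];
            out[nruns].count = 1;
            nruns++;
        }
    }
    return nruns;
}
```

With 1-page units, seven dirty pages cost seven requests; with the pages
4,5,6,7,9,10,20 and a cap of 4 the sketch issues three.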

Re: raw partition vs. paging to the FS.  There are two ways to do this.  The
easy way is to use the "vn" disk as has been suggested (that is why it was
written in the first place).  I deliberately decided to not go through the
buffer cache because I was worried about the effect of large swap operations
on relatively small buffer caches.  This was back in 4.3bsd which did
clustered swap operations and on an hp300 with a small buffer cache.  In 4.4
this also avoids double-caching of data due to the non-unified vm/fs caches.
The 4.4-lite version of vn.c is further modified to take advantage of the
"read-ahead" information returned by the BMAP call so it can now do clustered
read/write operations much like the filesystem code.  So in theory, if your
swap file is laid out optimally (w.r.t. rotdelay, maxcontig) you should be
able to read/write a vn disk at "raw-disk" speeds (by the same theory that
allows you to do sequential file IO at raw-disk speeds :-).  However, I tried
this last night (using dd) and only got 5MB/sec out of a striped disk capable
of 8MB/sec.  So there are still non-trivial overheads.  Note that the vn driver
doesn't handle expansion (or contraction) of the underlying files.  The swap
pager doesn't really know how to deal with this either.  I wasn't too
concerned about expanding swap files because you could always just configure
a second file and "swapon" to enable it.
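
The BMAP-based clustering mentioned above can be pictured with a toy model.
This is only a sketch, not the vn.c code: toy_bmap is a hypothetical
stand-in for a bmap-style lookup that returns the physical block plus the
number of blocks that follow it contiguously, and the extent table, struct
names and build_transfers are all invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Toy extent: logical blocks [lstart, lstart+len) live at pstart onward. */
struct extent { long lstart; long pstart; long len; };

/* One physical transfer of nblks blocks starting at pblk. */
struct xfer { long pblk; long nblks; };

/*
 * Hypothetical bmap-style lookup: translate logical block lblk to a
 * physical block and report in *run how many further blocks are
 * physically contiguous after it.  Returns 0 on success, -1 if unmapped.
 */
int toy_bmap(const struct extent *map, size_t nextents, long lblk,
             long *pblk, long *run)
{
    for (size_t i = 0; i < nextents; i++) {
        if (lblk >= map[i].lstart && lblk < map[i].lstart + map[i].len) {
            long off = lblk - map[i].lstart;
            *pblk = map[i].pstart + off;
            *run = map[i].len - off - 1;
            return 0;
        }
    }
    return -1;
}

/*
 * Split a request for nblks logical blocks starting at lblk into
 * transfers, each covering as much of the contiguous run as the lookup
 * reports.  Returns the number of transfers, or -1 on an unmapped block
 * or if `out` is too small.
 */
int build_transfers(const struct extent *map, size_t nextents,
                    long lblk, long nblks, struct xfer *out, size_t maxout)
{
    size_t n = 0;

    assert(nblks >= 0);
    while (nblks > 0) {
        long pblk, run, len;

        if (n == maxout || toy_bmap(map, nextents, lblk, &pblk, &run) != 0)
            return -1;
        len = run + 1;              /* this block plus its read-ahead run */
        if (len > nblks)
            len = nblks;            /* never transfer more than requested */
        out[n].pblk = pblk;
        out[n].nblks = len;
        n++;
        lblk += len;
        nblks -= len;
    }
    return (int)n;
}
```

For a file laid out as two extents (8 blocks at 100, 4 blocks at 500), an
8-block request starting at logical block 2 becomes two transfers instead
of eight single-block ones, which is why layout of the swap file matters.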

Anyway, the second method would be to make the vnode_pager be the "default"
pager.  This is much more work since you have to decide how to manage swap
files and then you would have the double-caching problems.

Finally note that you don't even need a swap partition/device since the
system does lazy allocation.  I ran for a long time on my machine at home
before I discovered that my swap partition wasn't being enabled.  Not a
long-term solution unless you have lots of memory though...