Subject: Re: Swapping problems
To: None <current-users@NetBSD.ORG>
From: None <mika@cs.caltech.edu>
List: current-users
Date: 06/27/1996 13:40:51
Laine Stump writes:
>George Michaelson writes:
>> two different things might help:
>> 
>> 	merged vm/iobuf cache. would remove some dup usage and copying.
>> 	probably would see perf gains to match sunos on that platform.
>> 
>> 	improve minfree behaviour. what to swap when you are close to
>> 	thrashing is a good question. surely just tuning the minfree
>> 	requirement down to ensure some "core" code (update/init whatever)
>> 	is around would help?
>
>Can you expand on this a bit? Is any of this doable by just tuning
>things in the kernel configuration, or does it take digging into
>/usr/src/sys/vm/*? I'd really like to see this improved, but my last
>exposure to vm stuff (other than diddling with page registers on a 386,
>which is really a totally different topic) was about 13 years ago in an
>undergrad OS course.

I assume that this is about the i386 molasses-like swapping... I am 
almost certain this is a bug somewhere. Inefficient implementations
may make it worse, but it is so HORRIBLE that that just can't be the
whole story.

I spent some time in gdb changing minfree and, umm.. whatever that other
parameter is called that triggers the clock algorithm. I couldn't make
it do anything discernibly different. You run out of pages and then
grind, grind, grind.. 
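
For anyone who hasn't looked at this since an OS course, here is a
minimal sketch of the general idea: a clock (second-chance) sweep that
is kicked off once free pages fall below minfree. It is a toy model and
not the code in /usr/src/sys/vm. The function names, the plain list of
reference bits, and the rule "reclaim minfree minus free pages" are all
assumptions made for illustration.

```python
def clock_sweep(ref_bits, hand, want):
    """Pick up to `want` victim pages with a second-chance sweep.

    ref_bits: list of bools, one reference bit per resident page.
    hand:     index where the sweep starts.
    Returns (victims, new_hand). Reference bits are cleared as the
    hand passes over them, so a page survives one pass at most.
    """
    victims = []
    n = len(ref_bits)
    steps = 0
    # Two full revolutions are enough to visit every page unreferenced.
    while len(victims) < want and steps < 2 * n:
        if ref_bits[hand]:
            ref_bits[hand] = False        # recently used: second chance
        elif hand not in victims:
            victims.append(hand)          # not referenced: reclaim it
        hand = (hand + 1) % n
        steps += 1
    return victims, hand


def pageout_if_needed(free_pages, minfree, ref_bits, hand):
    """Hypothetical trigger: only sweep when free pages drop below minfree."""
    if free_pages >= minfree:
        return [], hand
    return clock_sweep(ref_bits, hand, minfree - free_pages)
```

In this model, raising minfree makes the sweep start earlier and reclaim
more per run, which is the behaviour I was expecting to see change.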

>
>> increasing swap markedly might also have helped
>
>We have it configured for 512MB of swap (256MB on each of the two
>disks). I can't imagine it ever having a use for more.

Same here.. on our server, we're swapping onto two fast/wide Barracudas..
64 MB real memory and 590 MB of swap, with the swap partitions 
configured identically on the two (identical) drives. Very fast machine
until you hit swap and then... grind (one disk light flashes for an 
instant)..nothing happens for a second... grind (the other disk light
flashes for an instant).. nothing happens for a second.. grind (the first
disk light flashes for an instant).. you get the idea. When the machine
is out of real memory, I measured less than 200 kilobytes per second
writing to MFS.. which is about 1/40th the speed of writing to the 
disk directly (not theoretical, actual speed.. the drives, even with
the supposedly flaky old AIC7870 driver, do about 4 MB/s each...)
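
The 1/40th figure can be checked with a few lines of arithmetic. This
assumes decimal units (1 MB = 1000 KB) and that the comparison is
against both drives combined; measured against a single drive the
factor would be 20 instead.

```python
KB = 1000                  # decimal units, an assumption
MB = 1000 * KB

per_drive = 4 * MB         # bytes/s each drive manages, from the post
drives = 2
mfs_rate = 200 * KB        # upper bound seen when out of real memory

aggregate = per_drive * drives
ratio = aggregate / mfs_rate
print(f"direct: {aggregate / MB:.0f} MB/s, slowdown: about {ratio:.0f}x")
# prints "direct: 8 MB/s, slowdown: about 40x"
```

Since the measured rate was less than 200 KB/s, 40x is a lower bound on
the slowdown.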

>
>> I thought the swap/vm
>> thing was kinda related for long-lived processes. your make doesn't
>> sound quite the same as people finding a 60mn named has locked up...
>
>It's a different problem. I haven't really been bothered by the swap
>leak problem; I guess the long lived processes I have running (no named,
>just nfsd, nfsiod, ypbind, gated, sendmail, and the "standard" things
>like init et al) don't do as much allocation as named.

Same here.. we've got so much swap on our machines (what else do you
do with 1GB drives that have 13 MB taken up by /sbin and /bin??) that
the leak doesn't matter. 

I can see at least two problems for *certain* with the paging--I don't
know how many others there are.

1. The algorithm seems to start way too late.. it only starts marking pages
for consideration when you are already running low on memory. As a result,
when you run out, it pages out the wrong pages.. pages that have to get
paged in again immediately.

2. It allows the amount of available memory to drop to zero.. this causes
the system to "almost deadlock" because it already paged out the wrong
pages and it needs to page something else out to page in the page it just
paged out.. duh. (Take a look at what SunOS 4.1 does if you start a process
that eats a lot of memory--I don't know exactly how it does it, but it
pulls some tricks to ensure that this doesn't happen :))

3? The one-second delays in activity? An interrupt problem? (I've heard
this is not so bad on ports other than i386..??) 
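
To make points 1 and 2 concrete, here is a toy simulation comparing a
pager that only acts once memory is completely gone with one that keeps
a small reserve of free frames. It is only a sketch: the LRU ordering,
the low_water/high_water names, and the idea of running the "daemon"
after every reference are assumptions for illustration, not what NetBSD
or SunOS actually does.

```python
from collections import OrderedDict


def simulate(refs, frames, low_water, high_water):
    """Return (stalls, faults) for a page reference string.

    A "stall" is a fault that finds no free frame, so something must
    be evicted synchronously before the page can come in. After each
    reference a toy pageout daemon runs: if free frames fall below
    low_water, it evicts least recently used pages until high_water
    frames are free.
    """
    resident = OrderedDict()          # kept in LRU order, oldest first
    stalls = faults = 0
    for page in refs:
        if page in resident:
            resident.move_to_end(page)        # mark as recently used
        else:
            faults += 1
            if len(resident) == frames:       # nothing free: evict now
                resident.popitem(last=False)
                stalls += 1
            resident[page] = True
        # Background pageout, started before memory hits zero.
        if frames - len(resident) < low_water:
            while resident and frames - len(resident) < high_water:
                resident.popitem(last=False)
    return stalls, faults


refs = list(range(20))                 # 20 distinct pages, 8 frames
print(simulate(refs, 8, 0, 0))         # no reserve: (12, 20)
print(simulate(refs, 8, 2, 4))         # small reserve: (0, 20)
```

The reserve costs some usable memory, so with a workload that reuses
pages it can add faults; what it buys is that a fault never has to wait
for an eviction first.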

Mika

P.S. Our kernel sources are dated 5/11/96.