Subject: NFS swap woes.
To: NetBSD/VAX Mailing List <port-vax@NetBSD.ORG>
From: Brian D Chase <brianc@carpediem.com>
List: port-vax
Date: 06/30/1997 05:32:06

Over the weekend, while recompiling the 1.2G sources, I once again ran
into serious difficulties compiling the kernel.  This time it wasn't a
matter of incompatible utilities, though: my VS3100/38 would frequently
lock up tight in the middle of compiles.

Upon further examination, the lockups seemed to occur right around the
time when there was a burst of paging activity.  The console also
displayed "/netbsd: le0: device timeout", usually followed quickly by a
"message repeated N times" message.  The device timeout messages occur
fairly frequently throughout any I/O intensive operation, but these
particular timeouts were occurring right around the time the paging was
going on.
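
To put a number on the timeouts during a compile run, a minimal Python
sketch of a tally follows.  It assumes the log holds lines like the two
console messages quoted above; the function name count_timeouts and the
exact patterns are hypothetical, and real syslog prefixes and wording may
differ.

```python
import re

# Assumed patterns, based only on the two console messages quoted above.
# Real log lines may carry timestamps or host prefixes; search() ignores them.
TIMEOUT_RE = re.compile(r"le0: device timeout")
REPEAT_RE = re.compile(r"message repeated (\d+) times?")


def count_timeouts(lines):
    """Count le0 device timeouts, expanding 'message repeated N times'
    lines that directly follow a timeout message."""
    total = 0
    last_was_timeout = False
    for line in lines:
        if TIMEOUT_RE.search(line):
            total += 1
            last_was_timeout = True
            continue
        match = REPEAT_RE.search(line)
        if match and last_was_timeout:
            # The repeat count refers to the timeout line just before it.
            total += int(match.group(1))
        else:
            # Some other message (or a repeat of one) breaks the run.
            last_was_timeout = False
    return total
```

Running this over the console log from one compile against each NFS server
would give a single total per server to compare.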

I'd speculate, in at least the case of pulling a page back into physical
memory, that it would be a very bad thing for access to the NFS mounted
swap to be delayed even for a few seconds.  I don't know that there's any
technical way around that, or even if it is a problem.  It just seems like
it could be a problem.

I tried running several different kernels:  Bertram's, my own spin on a
VS-only kernel, and then two GENERIC 1.2G kernels compiled from sources
separated by about a week of changes.  All of the kernels exhibited the
same behavior, on both a VS3100/38 and a VS3100/30.

---

I've had some bad experiences with different Unix vendors'
implementations of NFS not talking to one another as well as they
should.  (Namely IRIX and Solaris -- specifically, IRIX writing to
filesystems NFS mounted from Solaris/SPARC workstations.  Write
performance in that direction is about 1/4 to 1/6 of what it is from IRIX
to IRIX or from Solaris to Solaris.)  With that experience in mind, I
thought I'd do some experimenting:  I moved the NFS-served VAX files from
my Linux Pentium box to my OpenBSD 386 box.

So far, the compile seems to be working flawlessly.  I still get loads of
timeout messages, but the compile has yet to hang on me.  It has already
gotten much further in the past 8 hours than it did over the previous two
days (when I continually had to reboot the VAXstation).

---

Anyway, I'm still left with a lot of questions on this one -- and not many
conclusions, apart from the fact that serving the files and swap space
off of the OpenBSD machine is working better than serving them off of the
Linux machine.  I don't know if that's because of something wrong with
Linux's low-level network driver, its networking code, or perhaps its
implementation of NFS.  Then again, it could be a problem with the cheap
NE2000 clone card in the Linux machine.  Or maybe, like IRIX and Solaris,
NetBSD and Linux just don't speak NFS on equal terms.

If you experience lockups on your VAXen while swapping over NFS to a
Linux fileserver, you might try serving at least the swap from one of the
BSD operating systems.  Also, if you do have lockup problems while
swapping to a Linux fileserver, I'd like to hear about it.

[Needless to say, my compile of a 1.2G world is quite a bit delayed.]

-brian.
---------------------------------------------------------------------------
Brian D. Chase         Systems Coordinator        brian.chase@carpediem.com
-- Compression, Inc. - 13765 Alton Pkwy, Suite B - Irvine, CA 92618, USA --