Subject: RFC: client-side NFS locking
To: None <tech-kern@NetBSD.org>
From: Edgar Fuß <Edgar.Fuss@bn2.maus.net>
List: tech-kern
Date: 07/26/2006 21:45:00
Hello.

This is probably a bit strange for a first post on the list, but
mathematicians are known to be strange people.

I desperately need NFS client-side locking on our (math.uni-bonn.de)
mail servers. I was so happy they're running NetBSD now, but still no
NFS locking in 3.0 or current. After a plea to ws@ to port what's in
FreeBSD failed with ENOTIME, I spent most of a weekend reading a thick
book with a red cover featuring a funny creature carrying a fork, phoned
ws umpteen times the following weeks and wrote something you can find at
http://www.math.uni-bonn.de/people/ef/nfslock.tar.gz. The tarball contains
a file full of patches and several new files. I effectively split
lockd_lock.c into lock_server.c and lock_common.c, added some lines to them
and a new lock_client.c---plus a kernel part of course. It's all quite
different from what BSDI/FreeBSD does.

My client can operate either synchronously (NLM_LOCK etc.) or
asynchronously (NLM_LOCK_MSG etc.), with or without threading.
The async mode doesn't work yet, possibly due to problems with the
get_client() routine in lock_proc.c. Async without threading will
surely deadlock. I'd prefer to do it event-driven, but how do you call
RPCs event-driven?

There also is a small test program called locktest. If you don't
understand what it wants you to type, type '?' or read the source.
If you don't understand how one can write a program that wants you
to type ..., well.

Since I've never written anything like Unix kernel code, never used RPC
or POSIX threads before, and C is not exactly my favourite programming
language, this may all be sub-optimal.
I expect lots of comments (thus the subject).

Some sundry things in pseudo-random order I stumbled over in the process
of writing this:

In vnode.h, why doesn't VOPARG_OFFSET use offsetof()?
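
For illustration, here is a minimal sketch of what an offsetof()-based
variant could look like. The struct and the DEMO_/demo_ names are made up
for this example and are not taken from vnode.h; note that offsetof()
takes a struct type, so a macro that is handed a pointer type would need
its callers adjusted.

```c
#include <stddef.h>

/* Hypothetical argument structure, standing in for a vop_*_args struct. */
struct demo_args {
	int   a_flags;
	void *a_vp;
	long  a_offset;
};

/* Field offset computed with the standard macro instead of by hand. */
#define DEMO_OFFSETOF(s_type, field) ((int)offsetof(s_type, field))

/* Offset of the a_vp member inside struct demo_args. */
int
demo_vp_offset(void)
{
	return DEMO_OFFSETOF(struct demo_args, a_vp);
}

/* Turn a base pointer plus a byte offset back into a field address. */
void *
demo_field_at(void *args, int offset)
{
	return (char *)args + offset;
}
```

With these, demo_field_at(&a, demo_vp_offset()) yields the address of
a.a_vp for any struct demo_args a.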

In vnodeops(9), under VOP_ADVLOCK, I don't understand the wording after
"The argument".

In kern_descrip.c, the code silently adds the current file position
without changing SEEK_CUR to SEEK_SET. Moreover, this behaviour seems
to be undocumented.
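
To make clear what I would have expected, here is a userland sketch (not
the kern_descrip.c code) that turns a SEEK_CUR-relative struct flock into
an absolute one and updates l_whence along with l_start. The function name
flock_make_absolute() is invented for this example.

```c
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * If fl is relative to the current file position, add that position to
 * l_start AND switch l_whence to SEEK_SET, so the two fields stay
 * consistent. Returns 0 on success, -1 if lseek() fails.
 */
int
flock_make_absolute(int fd, struct flock *fl)
{
	off_t pos;

	if (fl->l_whence != SEEK_CUR)
		return 0;		/* nothing to convert */

	pos = lseek(fd, 0, SEEK_CUR);	/* query the current position */
	if (pos == (off_t)-1)
		return -1;

	fl->l_start += pos;
	fl->l_whence = SEEK_SET;
	return 0;
}
```

A struct flock that has only had the offset added still claims to be
relative to the current position, which is the inconsistency described
above.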

In the same file, in sys_flock(), lf.l_pid seems to be uninitialized.
No clue whether that is problematic.

In rpc.lockd/lockd_lock.c, a host was never unmonitored.

In rpc.lockd/lock_proc.c, get_client(), the comment talks, err, writes
about -udp- where the code uses -tp-.

I've read different opinions on whether fcntl(..., F_SETLK, ...) should
return EACCES or EAGAIN when it can't get the lock.
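
POSIX permits either EACCES or EAGAIN here, so a portable caller has to
accept both. A minimal sketch of such a caller follows; try_write_lock()
and its return convention are made up for this example.

```c
#include <sys/types.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Try to take a non-blocking write lock on [start, start+len) of fd
 * (len == 0 means "to end of file").
 * Returns 0 if the lock was acquired, 1 if somebody else holds a
 * conflicting lock, and -1 on any other error (errno is left set).
 */
int
try_write_lock(int fd, off_t start, off_t len)
{
	struct flock fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_type = F_WRLCK;
	fl.l_whence = SEEK_SET;
	fl.l_start = start;
	fl.l_len = len;

	if (fcntl(fd, F_SETLK, &fl) == 0)
		return 0;

	/* Both values mean "lock is held elsewhere", depending on the system. */
	if (errno == EACCES || errno == EAGAIN)
		return 1;

	return -1;
}
```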

As I wrote above, I never used RPCs before, but I thought the whole point
about the async versions like LOCK_MSG was that they returned immediately,
i.e. as soon as the arguments have arrived. However, lock_proc.c calls the
handling routine, sends the reply RPC (LOCK_RES or the like) and only then
returns. I would argue that as long as my call to LOCK_MSG hasn't returned,
I haven't made the call so there can be no LOCK_RES referring to the call
I didn't yet issue. Moreover, I had to thread the client because of this.
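
The NLM messages carry an opaque cookie that the _RES message echoes, so
one way for a client to cope with a reply that arrives "early" is to
record the cookie before sending the _MSG. The following is only a sketch
of that bookkeeping under this assumption; it is not the code from the
tarball, it does no locking of its own, and all names in it (pending_*,
MAX_PENDING, MAX_COOKIE) are invented.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PENDING 16
#define MAX_COOKIE  32

enum pending_state { SLOT_FREE = 0, SLOT_WAITING, SLOT_ANSWERED };

struct pending_call {
	enum pending_state state;
	size_t cookie_len;
	unsigned char cookie[MAX_COOKIE];
	int status;			/* status copied from the _RES */
};

static struct pending_call pending[MAX_PENDING];

/* Call BEFORE sending the _MSG. Returns a slot index, or -1 if full. */
int
pending_register(const void *cookie, size_t len)
{
	int i;

	if (len == 0 || len > MAX_COOKIE)
		return -1;
	for (i = 0; i < MAX_PENDING; i++) {
		if (pending[i].state != SLOT_FREE)
			continue;
		pending[i].state = SLOT_WAITING;
		pending[i].cookie_len = len;
		memcpy(pending[i].cookie, cookie, len);
		pending[i].status = -1;
		return i;
	}
	return -1;
}

/* Call from the _RES handler. Returns the matching slot, or -1 if none. */
int
pending_complete(const void *cookie, size_t len, int status)
{
	int i;

	for (i = 0; i < MAX_PENDING; i++) {
		if (pending[i].state == SLOT_WAITING &&
		    pending[i].cookie_len == len &&
		    memcmp(pending[i].cookie, cookie, len) == 0) {
			pending[i].status = status;
			pending[i].state = SLOT_ANSWERED;
			return i;
		}
	}
	return -1;
}

/* Returns 1 and frees the slot if answered, 0 if still waiting, -1 if bad. */
int
pending_collect(int slot, int *status)
{
	if (slot < 0 || slot >= MAX_PENDING || pending[slot].state == SLOT_FREE)
		return -1;
	if (pending[slot].state != SLOT_ANSWERED)
		return 0;
	*status = pending[slot].status;
	pending[slot].state = SLOT_FREE;
	return 1;
}
```

This only covers matching a reply to its request; something still has to
service the incoming _RES while the outgoing call is blocked, which is why
I ended up with threads.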

The NLM server always seems to fhopen() RDWR, meaning one can't lock
files root can't write to.

Moreover, the server seems to be only able to handle one single lock per
file, probably because keeping track of different processes locking
different parts of the file isn't much fun. I'm not sure I want to
rewrite this. Maybe it would be easier if the kernel exposed an interface
at the lf_xxx() level including some sort of callback if a lock becomes
available?

There is a typo in rpc(3) reading "rpc_reg structure" instead of
"rpc_req" (that's in bold so / doesn't find it).

In rpc/rpc.h, why does clnt_call() cast the pointer to char * and
clnt_freeargs() doesn't?

I hope I didn't forget anything. I've accumulated awful amounts of
little papers around here while writing this code.

By the way, I must say that build.sh is really cool. As the SPARCstation
10/20s I used to test this at home are rather slow, I wrote and compiled
the whole stuff on my PowerBook running Mac OS X. When I timed out on
build.sh build, I went to work, let one of the two-Opteron-irons running
NetBSD do the build (32min), tar'ed the whole 200M destdir.sparc (<2sec)
and took the tarball home.

So, please go ahead and comment. I haven't tested much yet, though it
basically pretends to work. I'm going to find stupid errors as soon as
I have posted this.