Subject: Re: RFC: client-side NFS locking
To: None <>
From: YAMAMOTO Takashi <>
List: tech-kern
Date: 09/16/2006 20:30:55

> I desperately need NFS client-side locking on our
> mail servers. I was so happy they're running NetBSD now, but still no
> NFS locking in 3.0 or current. After a plea to ws@ to port what's in
> FreeBSD failed with ENOTIME, I spent most of a weekend reading a thick
> book with a red cover featuring a funny creature carrying a fork, phoned
> ws umpteen times the following weeks and wrote something you can find at
> The tarball contains
> a file full of patches and several new files. I effectively split
> lockd_lock.c into lock_server.c and lock_common.c, added some lines to them
> and a new lock_client.c---plus a kernel part of course. It's all quite
> different from what BSDI/FreeBSD does.

thanks for taking a look at this, and sorry for the very late reply.

> I'd prefer to do it event-driven, but how do you call
> RPCs event-driven?

no way with our rpc library, afaik.

> In vnode.h, why doesn't VOPARG_OFFSET use offsetof()?

i don't think there is a fundamental reason.  i guess it's merely historical.
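
for illustration, a minimal sketch of the two styles.  it assumes the
macro is built on the classic null-pointer cast idiom; struct
example_args and both macro names are made up for this example and are
not the actual vnode.h definitions.  the old idiom is not strictly
guaranteed by ISO C, although it works on the usual compilers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical argument structure, standing in for a vop_*_args struct. */
struct example_args {
	const void *a_desc;
	void *a_vp;
	int a_flags;
};

/* Historical idiom: address of a member of a struct "located" at 0. */
#define OLD_STYLE_OFFSET(type, member) \
	((size_t)((char *)&((type *)0)->member - (char *)0))

/* The same value via the standard macro from <stddef.h>. */
#define NEW_STYLE_OFFSET(type, member) offsetof(type, member)

/* Returns 1 if both macros agree for every member of example_args. */
static int
offsets_agree(void)
{
	return OLD_STYLE_OFFSET(struct example_args, a_desc) ==
	        NEW_STYLE_OFFSET(struct example_args, a_desc) &&
	    OLD_STYLE_OFFSET(struct example_args, a_vp) ==
	        NEW_STYLE_OFFSET(struct example_args, a_vp) &&
	    OLD_STYLE_OFFSET(struct example_args, a_flags) ==
	        NEW_STYLE_OFFSET(struct example_args, a_flags);
}

/* Convenience check for callers that want a hard failure on mismatch. */
static void
check_offsets(void)
{
	assert(offsets_agree());
}
```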

> In vnodeops(9), under VOP_ADVLOCK, I don't understand the wording after
> "The argument".


> In kern_descrip.c, the code silently adds the current file position
> without changing SEEK_CUR to SEEK_SET. Moreover, this behaviour seems
> to be undocumented.


> In the same file, in sys_flock(), seems to be uninitialized.
> No clue whether that is problematic.

it shouldn't be a problem because l_pid member is output only.
ie. it's only used for F_GETLK.
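
a small userland sketch of that, assuming POSIX fcntl() semantics.
lock_holder() and demo_getlk() are hypothetical helper names, and the
file path is whatever scratch file the caller passes in.  the caller
puts a bogus value into l_pid and the kernel overwrites it on F_GETLK.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Ask who holds a conflicting write lock on the whole file.
 * Returns the holder's pid, 0 if nobody conflicts, -1 on error.
 */
static pid_t
lock_holder(int fd)
{
	struct flock fl;

	assert(fd >= 0);
	memset(&fl, 0, sizeof(fl));
	fl.l_type = F_WRLCK;
	fl.l_whence = SEEK_SET;	/* l_start = 0, l_len = 0: whole file */
	fl.l_pid = -12345;	/* bogus on purpose; only used as output */
	if (fcntl(fd, F_GETLK, &fl) == -1)
		return -1;
	return fl.l_type == F_UNLCK ? 0 : fl.l_pid;
}

/*
 * Demo: a child takes a write lock, then the parent queries it.
 * Returns 1 if F_GETLK reported the child's pid, 0 if not, -1 if the
 * setup failed.
 */
static int
demo_getlk(const char *path)
{
	struct flock fl;
	int ready[2], done[2], fd, result = -1;
	char c = 0;
	pid_t child;

	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd == -1 || pipe(ready) == -1 || pipe(done) == -1)
		return -1;
	child = fork();
	if (child == -1)
		return -1;
	if (child == 0) {
		close(ready[0]);
		close(done[1]);
		memset(&fl, 0, sizeof(fl));
		fl.l_type = F_WRLCK;
		fl.l_whence = SEEK_SET;
		if (fcntl(fd, F_SETLK, &fl) == -1 ||
		    write(ready[1], &c, 1) != 1)
			_exit(1);
		/* Hold the lock until the parent closes its end of "done". */
		if (read(done[0], &c, 1) < 0)
			_exit(1);
		_exit(0);
	}
	close(ready[1]);
	close(done[0]);
	if (read(ready[0], &c, 1) == 1)
		result = (lock_holder(fd) == child);
	close(done[1]);		/* lets the child exit and drop its lock */
	(void)waitpid(child, NULL, 0);
	close(ready[0]);
	close(fd);
	(void)unlink(path);
	return result;
}
```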

> In rpc.lockd/lockd_lock.c, a host was never unmonitored.

you are right.  it lacks unmonitoring.

> In rpc.lockd/lock_proc.c, getclient(), the comment talks, err, writes
> about -udp- where the code uses -tp-.

i'm not sure what you mean here.

> I've read different opinions whether fcntl(..., F_SETLK, ...) should
> return EACCES or EAGAIN when it can't get the lock.

our local filesystems return EAGAIN.  SUSv3 seems to allow both.
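
since both are allowed, a portable caller probably wants to treat them
the same.  a minimal sketch, assuming POSIX fcntl(); try_write_lock()
and demo_contended_lock() are hypothetical names, not anything in the
tree.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Try to take a write lock on the whole file without blocking.
 * Returns 1 if acquired, 0 if another process holds a conflicting
 * lock (either errno value), -1 on any other error.
 */
static int
try_write_lock(int fd)
{
	struct flock fl;

	assert(fd >= 0);
	memset(&fl, 0, sizeof(fl));
	fl.l_type = F_WRLCK;
	fl.l_whence = SEEK_SET;	/* l_start = 0, l_len = 0: whole file */
	if (fcntl(fd, F_SETLK, &fl) == 0)
		return 1;
	if (errno == EAGAIN || errno == EACCES)
		return 0;
	return -1;
}

/*
 * Demo: a child holds the lock while the parent tries to take it.
 * Returns the parent's try_write_lock() result, or -1 if the setup
 * failed.
 */
static int
demo_contended_lock(const char *path)
{
	int ready[2], done[2], fd, result = -1;
	char c = 0;
	pid_t child;

	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd == -1 || pipe(ready) == -1 || pipe(done) == -1)
		return -1;
	child = fork();
	if (child == -1)
		return -1;
	if (child == 0) {
		close(ready[0]);
		close(done[1]);
		if (try_write_lock(fd) != 1 || write(ready[1], &c, 1) != 1)
			_exit(1);
		/* Hold the lock until the parent closes its end of "done". */
		if (read(done[0], &c, 1) < 0)
			_exit(1);
		_exit(0);
	}
	close(ready[1]);
	close(done[0]);
	if (read(ready[0], &c, 1) == 1)
		result = try_write_lock(fd);
	close(done[1]);		/* lets the child exit and drop its lock */
	(void)waitpid(child, NULL, 0);
	close(ready[0]);
	close(fd);
	(void)unlink(path);
	return result;
}
```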

> As I wrote above, I never used RPCs before, but I thought the whole point
> about the async versions like LOCK_MSG was that they returned immediately,
> i.e. as soon as the arguments have arrived. However, lock_proc.c calls the
> handling routine, sends the reply RPC (LOCK_MSG or alike) and only then

LOCK_RES, you mean?

> returns. I would argue that as long as my call to LOCK_MSG hasn't returned,
> I haven't made the call so there can be no LOCK_RES referring to the call
> I didn't yet issue. Moreover, I had to thread the client because of this.

do you mean that the current behaviour, which is like the following:

		LOCK_MSG request ->
			<- LOCK_RES request
		LOCK_RES reply ->
			<- LOCK_MSG reply

should be:

		LOCK_MSG request ->
			<- LOCK_MSG reply
			<- LOCK_RES request
		LOCK_RES reply ->

i tend to agree.  but it's better to deal with such servers anyway.
with our rpc library interface, i think it requires some kind of
threading.  (well, instability of our pthread and thread-safeness of
our libraries might be problems, tho...)

> The NLM server always seems to fhopen() RDWR, meaning one can't lock
> files root can't write to.

a good point.  i'm not sure how often it can be a problem actually, tho.

> Moreover, the server seems to be only able to handle one single lock per
> file, probably because keeping track of different processes locking
> different parts of the file isn't much fun. I'm not sure I want to
> rewrite this.

yes, it's a big problem.

> Maybe it would be easier if the kernel exposed an interface
> at the lf_xxx() level including some sort of callback if a lock becomes
> available?

it sounds reasonable.
to implement an nlm server properly, we need the ability to specify
a remote lock owner.
(an alternative would be moving lockd into kernel. :)

> There is a typo in rpc(3) reading "rpc_reg structure" instead of
> "rpc_req" (that's in bold so / doesn't find it).

fixed.  (it was svc_reg/req, not rpc_.)

> In rpc/rpc.h, why does clnt_call() cast the pointer to char * and
> clnt_freeargs() doesn't?

do you mean clnt_freeres?  i'm not sure.
i always feel these prototypes should have been void *.