Subject: Re: vnode uniqueness question
To: None <tech-kern@netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 08/09/2002 15:32:24
> However for kernfs does it matter if each lookup actually references
> a 'clone' of the entity.

Yes, I think it does matter.

> This would be not dissimilar to having a directory full of lots of
> copies of (say) boottime, and each caller being given a separate one.

For boottime I think this would work, because (a) boottime is read-only
and (b) it's a single int, small enough that a read of it is atomic
even with respect to other processors in an SMP system.  But for
read/write entities, I think it can actually break for an SMP kernel.
Consider:

- Process A looks up /kern/foo and gets vnode V1.
- Process B looks up /kern/foo and gets vnode V2.
- Process A tries to write to /kern/foo.  This locks V1 and drops into
  the kernfs write code for foo.

Now, in a single-processor system, since the kernel is single-threaded
and none of the kernfs write operations can block, process A's write is
atomic.  But on a multiprocessor machine, process B could be on another
processor and try to write /kern/foo at the same time; since V1 != V2
and kernfs uses the genfs locking, which locks just the vnode, B's lock
(on V2) won't conflict with A's lock (on V1), so B will drop into the
same kernfs write routine as A.  If the write routine is not totally
atomic, the two writes could be interleaved unpleasantly.  For example,
writing /kern/tickadj is probably atomic, as it's just a store to an
int, but writing /kern/hostname probably isn't.  (Yes, I know
/kern/tickadj is a mouseism.)
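
A rough user-space sketch of the locking point above.  It is only a
model: toy_vnode, toy_trylock, toy_unlock and writers_can_overlap are
made-up names, not kernel or genfs interfaces, and the "lock" is a plain
flag rather than a real vnode lock.  It shows only that two per-vnode
locks on distinct vnodes never conflict, while a single shared vnode
would exclude the second writer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a vnode carrying its own per-vnode lock. */
struct toy_vnode {
    bool locked;
    const char *entry;      /* which kernfs entry this vnode names */
};

/* Try to take the vnode's lock; returns true if we got it. */
static bool toy_trylock(struct toy_vnode *vp)
{
    if (vp->locked)
        return false;
    vp->locked = true;
    return true;
}

static void toy_unlock(struct toy_vnode *vp)
{
    assert(vp->locked);
    vp->locked = false;
}

/*
 * Writer A locks vnode a; writer B then tries to lock vnode b while A
 * still holds its lock.  Returns true if both end up inside the write
 * routine at once, i.e. the per-vnode lock did not serialize them.
 */
static bool writers_can_overlap(struct toy_vnode *a, struct toy_vnode *b)
{
    bool a_in = toy_trylock(a);
    bool b_in = toy_trylock(b);
    bool overlap = a_in && b_in;

    if (b_in)
        toy_unlock(b);
    if (a_in)
        toy_unlock(a);
    return overlap;
}
```

With two separate toy_vnode objects for the same entry (the V1/V2
case), writers_can_overlap() returns true; passing the same object
twice returns false, which is the behaviour one wants for a writable
entry.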

In practice, this is unlikely to be a problem.  /kern/hostname, the
only writable entry in stock kernfs, is not something one writes
routinely, and it is _very_ unlikely to be written simultaneously from
two different processors, with different strings.  But the principle
remains, especially as the simple miscfs/ filesystems are likely to be
used as samples by people writing more complex filesystems (like, oh,
to pick a totally random example, me :-).

Also, if kernfs ever acquires writable fields whose write operations
can sleep for any reason, this will matter even on uniprocessor
machines.

> OTOH kernfs could just sit on a vnode for each entry!

Yes, it could; that might even be the best way of addressing the issue
for kernfs: have the kernfs mount structure contain an array of vnode
pointers, either created at mount time or lazily created as needed....
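
A minimal sketch of the lazily-created variant, again as a user-space
model rather than real kernfs code.  The names kfs_vnode, kfs_mount,
kfs_lookup and KFS_NENTRIES are invented for illustration; a real
implementation would also need reference counting, reclaim handling,
and a lock around the creation step itself, all omitted here.

```c
#include <assert.h>
#include <stdlib.h>

#define KFS_NENTRIES 4          /* hypothetical number of kernfs entries */

/* Hypothetical stand-in for a vnode; records which entry it names. */
struct kfs_vnode {
    int entry_index;
};

/* Hypothetical per-mount structure: one vnode slot per entry. */
struct kfs_mount {
    struct kfs_vnode *vnodes[KFS_NENTRIES];   /* NULL until first lookup */
};

/*
 * Return the single vnode for entry idx, creating it on first use.
 * Every later lookup of the same entry gets the same pointer back, so
 * a per-vnode lock then serializes all users of that entry.
 */
static struct kfs_vnode *kfs_lookup(struct kfs_mount *mp, int idx)
{
    assert(mp != NULL);
    if (idx < 0 || idx >= KFS_NENTRIES)
        return NULL;

    if (mp->vnodes[idx] == NULL) {
        struct kfs_vnode *vp = malloc(sizeof(*vp));
        if (vp == NULL)
            return NULL;
        vp->entry_index = idx;
        mp->vnodes[idx] = vp;
    }
    return mp->vnodes[idx];
}
```

Creating all the vnodes at mount time instead would amount to filling
the array in a loop up front and dropping the NULL check from the
lookup path.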

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B