Subject: Re: representation of persistent device status, was Re: devfs, was Re: ptyfs...
To: Daniel Carosone <>
From: Daniel Carosone <>
List: tech-kern
Date: 11/20/2004 12:15:36

On Fri, Nov 19, 2004 at 01:03:03PM -0800, Bill Studenmund wrote:
> On Fri, Nov 19, 2004 at 01:25:50PM +1100, Daniel Carosone wrote:
> > There was a suggestion recently in a thread about cloning devices that
> > illustrates this, re making specfs change the major, minor numbers in
> > the returned vnode.  The problem with that, at least for the ptmx
> > case, is that having the cloned nodes in that namespace at all
> > potentially exposes various aliasing-type problems where another
> > normal dev node is created with the same numbers elsewhere.
> Cloning doesn't do that, we have that aliasing issue now. :-|

Yes, and I want to get rid of it. Pty masters do it via ptmx,
preventing squatting on master nodes that can be opened before they

> > It's almost certainly my own lack of understanding, but I've also
> > never been entirely clear and confident that various kinds of locking
> > done in the vnode vs driver layer for some devices is robust against
> > such aliasing.
> Given that we have a single-threaded kernel, the locking's fine. You are
> right that when we move to a file-grained approach we will probably want
> to do something different. Though since the driver will need device
> locking at that point, we will still be ok; the device lock and not the
> vnode lock is the key one.

I wasn't clear; I mostly meant locking between userland applications.
I used to have a really good example I tripped over years ago, but I
can't remember it now. To make up a poor example on the spot: N
cooperating database servers share access to a raw device file
containing a transaction log, and use vnode locking to keep their
writes from interfering with each other.  Another instance, running in
chroot, has a /dev node with the same major/minor - but it's a
different vnode (I think).
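
The situation in that example can at least be detected from userland. A
minimal sketch, assuming a POSIX-like stat(2): two paths are "aliases"
when they name the same device (same type and st_rdev) but are distinct
filesystem nodes (different st_dev/st_ino). The names NodeIdentity,
node_identity and is_alias are made up for illustration, and the sketch
says nothing about whether a lock taken through one node is honoured
through the other; that is exactly the open question.

```python
import os
import stat
from typing import NamedTuple


class NodeIdentity(NamedTuple):
    """Hypothetical record of what stat(2) reports for a device node."""
    fs_dev: int  # st_dev: filesystem that holds the node
    inode: int   # st_ino: inode of the node itself
    kind: str    # "c" for character device, "b" for block device
    rdev: int    # st_rdev: encoded major/minor of the device it names


def node_identity(path: str) -> NodeIdentity:
    """Collect the identifying fields for a device node at `path`."""
    st = os.stat(path)
    if stat.S_ISCHR(st.st_mode):
        kind = "c"
    elif stat.S_ISBLK(st.st_mode):
        kind = "b"
    else:
        raise ValueError(f"{path} is not a device node")
    return NodeIdentity(st.st_dev, st.st_ino, kind, st.st_rdev)


def is_alias(a: NodeIdentity, b: NodeIdentity) -> bool:
    """True if a and b name the same device through different nodes."""
    same_device = (a.kind, a.rdev) == (b.kind, b.rdev)
    same_node = (a.fs_dev, a.inode) == (b.fs_dev, b.inode)
    return same_device and not same_node
```

Feeding it node_identity() of the log device as seen from outside and
from inside the chroot would report the aliasing described above.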

> The problem I see with that is that we have a number of interfaces
> that need to do i/o that don't have an "open handle". They are all
> kernel-internal, but they are significant. Like how a disk file
> system talks to its backing device.

Darn, yes indeed.  Still, something does open ref counting for the
disk device, and that includes the open mount.

> (I assume you are referring to an open file descriptor?).

I was using deliberately generic terminology, because there are other
kinds of "objects" that we might have handles on.  The mount example
above is perhaps one such, though not what was in my mind at the time.

(As a tangent, one thing I find easier about reverse engineering apps
under Windows is there's a rich mostly-unified name space of handles
for all sorts of resources: ipc and mutex mechanisms, shared
libraries, events, network endpoints, files, reg keys, etc etc. You
can learn quite a bit about what a program is doing just watching the
handles it has open for these things - and because they all have
permissions ACLs, you can do far too much work locking them all down.)

> 1) Device access is the kernel's job, yet in this example userland is
> responsible for controlling it. Yes, you've closed a race hole (and I don't
> see any others yet), but userland still has to do it. And we have to trust
> userland to do it. One of the things I loved about the flags system (a la
> chflags) is that we can not trust root. Yet here we have to trust a root
> daemon to set things up right (and most importantly not to permit more
> access than desired).

Very valid point, thank you.

> I don't really see what the advantage of a userland daemon is. With it
> around, we end up with a table of data in the kernel, in the daemon, and
> in the file on disk. I don't really see what the daemon buys us. A kernel
> thread that has direct access to the kernel table will do whatever we want
> the daemon to do.

The point in general was to remove heavier mechanism from the kernel;
it's not all the same data in all places.  I find the idea of the
kernel writing (and parsing!) files directly slightly funny (in the
sense of uncomfortable).  Especially when it's trying to do what
we already have filesystems to do.  I realise that the difference
between it looking for this data in a file, or on a block device in
the same way ffs or swap does, is minimal, but it still just seems

One of the things I want a devfs to be is as comfortable and sane for
us old Unix traditionalists as it is for people with... fresher
perspectives.  Because unless and until it satisfies everyone, and all
the various legitimate niggles and established practices as well as
new requirements, it won't be Right.
