Subject: Re: devfs, was Re: ptyfs fully working now...
To: Bill Studenmund <wrstuden@netbsd.org>
From: Eric Haszlakiewicz <erh@nimenees.com>
List: tech-kern
Date: 11/18/2004 20:26:27
On Thu, Nov 18, 2004 at 10:29:38AM -0800, Bill Studenmund wrote:
> On Sat, Nov 13, 2004 at 09:48:23PM -0600, Eric Haszlakiewicz wrote:
> > On Fri, Nov 12, 2004 at 10:42:25PM -0500, der Mouse wrote:
> > 
> > 	I was thinking that a better way to do this would be a devfs as a
> > slightly twisted unionfs style layer. The changes in permissions (if any)
> > would be stored in the filesystem as permissions on a symlink in the underlying
> > filestore.  e.g.:
> > 	Filestore         device       Visible
> >     ----------        ------       -------
> > foo -> .../dev/audio  audio        foo as an actual device node
> > (nothing)             mixerctl     mixerctl as dev node
> > fd0a -> .../deleted   fd0a         nothing visible
> >   (or perhaps a whiteout)
> > bar -> blah           (anything)   normal symlink to blah
> > xxx -> .../dev/aaa    (no aaa)     normal symlink
> 
> Why is that better? That would be very messy to code, and we have some 
> very messy naming issues. We seem to have two name spaces, that of nodes 
> on the disk and that of devices. Sometimes the namespace for devices shows 
> up, sometimes it doesn't.

	I don't think it would be any messier than writing code to store
the information in a file instead of a filesystem.  With the file
as the backing for persistent state you have two name spaces: that
described by the file and that described by the devices the kernel
has at a particular moment.  Regardless of what the backing store is
those two need to be merged.

> Also, what happens when we rename a node? That name in the "node" space is 
> now open, so we could move another device's node there. Now say there's a 
> symlink to that name. Does it point to the original node, or the 
> newly-named node (i.e. does it point to the name at the node or at the 
> device level)?
>
> I used to think some idea using on-disk device nodes would work, but I 
> don't anymore. Why shouldn't we just synthesize the whole thing from 
> scratch?
	I think you're missing my point.  There _aren't_ any on-disk device
nodes.  The only time a device node appears is when it is synthesized
by the devfs.  devfs uses the on-disk state to modify how the in-kernel
devices appear to userland.  Let's take the example of renaming a device node.
From the userland perspective:
	devfs is mounted on /dev
	backing store is the underlying /dev directory. (using ffs)
		(it could just as easily be some other directory)
	There exists a /dev/foo corresponding to kernel device "foo"
	There does not exist a /dev/bar
	There exists a /somewhere/else symlink pointing to /dev/foo
I execute this command:
	mv /dev/foo /dev/bar
The expected thing happens:
	There exists a /dev/bar
	There does not exist a /dev/foo
	The /somewhere/else symlink points to a nonexistent target (/dev/foo).

From the kernel point of view what happened was:
(/dev here is the plain ffs /dev directory)
	a symlink /dev/bar was created pointing at .../foo
	a whiteout /dev/foo was created.
		(I decided a whiteout makes more sense than .../deleted)

Another example:
	userland does:
		"chown joe /dev/xyz"
	kernel does:
		create symlink /dev/xyz pointing at .../xyz
		make the symlink owned by joe
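
To make the two examples above concrete, here is a rough sketch in Python
that models the backing /dev directory as a dictionary.  It is only an
illustration, not kernel code: the names devfs_rename and devfs_chown, the
dictionary layout, and the choice to always leave a whiteout behind on a
rename are assumptions made for the example.

```python
DEVFS_PREFIX = ".../"  # marks symlinks that devfs interprets itself


def devfs_rename(store, old, new):
    """Model 'mv /dev/old /dev/new' as changes to the backing store.

    Hypothetical helper: it does not check that a kernel device named
    `old` actually exists.
    """
    entry = store.get(old)
    if entry is not None and entry["kind"] == "whiteout":
        # A whited-out name is not visible, so there is nothing to move.
        raise FileNotFoundError(old)
    if entry is None:
        # No on-disk state yet: the visible node was kernel device `old`,
        # so record that with a ".../" symlink.
        entry = {"kind": "symlink", "target": DEVFS_PREFIX + old}
    store[new] = entry
    # Hide the old name (assumption: always leave a whiteout behind).
    store[old] = {"kind": "whiteout"}


def devfs_chown(store, name, owner):
    """Model 'chown owner /dev/name' as changes to the backing store."""
    entry = store.get(name)
    if entry is not None and entry["kind"] == "whiteout":
        raise FileNotFoundError(name)
    if entry is None:
        # First change to this device: create the ".../" symlink that
        # will carry the ownership.
        entry = {"kind": "symlink", "target": DEVFS_PREFIX + name}
        store[name] = entry
    entry["owner"] = owner


backing = {}
devfs_rename(backing, "foo", "bar")   # bar -> .../foo, foo becomes a whiteout
devfs_chown(backing, "xyz", "joe")    # xyz -> .../xyz, owned by joe
```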

When a userland process attempts to access something in /dev
the devfs lookup algorithm goes something like this:
	if the pathname does not exist
		interpret it as a reference to a kernel device
	else if it is a symlink that starts with ".../"
		interpret it as a reference to a kernel device with the name
		being the portion of the symlink target after the ".../"
		use the permissions of the symlink as the permissions of
		the synthesized device node.
	else if it's a whiteout
		it doesn't exist
	else if it's a file
		pass through to underlying fs.
	else if it's an on-disk device node
		ignore it. (could have a compat mode here)
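
The steps above can be sketched roughly as follows, again in Python with the
backing directory modelled as a dictionary.  The function name, the entry
layout and the returned tuples are made up for illustration.  Two cases are
not spelled out above, and the sketch guesses at them: a ".../" symlink
whose device is absent is treated as a normal symlink (following the table
earlier in the thread), and an on-disk device node is merely reported as
ignored.

```python
DEVFS_PREFIX = ".../"  # marks symlinks that devfs interprets itself


def devfs_lookup(name, store, kernel_devices):
    """Resolve `name` in a mounted devfs.

    Returns (result, name, owner) where result is one of "device",
    "enoent", "passthrough" or "ignored" (hypothetical labels).
    """
    entry = store.get(name)
    if entry is None:
        # Nothing on disk: interpret the name as a kernel device.
        if name in kernel_devices:
            return ("device", name, None)
        return ("enoent", name, None)
    kind = entry["kind"]
    if kind == "symlink" and entry["target"].startswith(DEVFS_PREFIX):
        # The device name is whatever follows the ".../" prefix; the
        # symlink's ownership applies to the synthesized node.
        dev = entry["target"][len(DEVFS_PREFIX):]
        if dev in kernel_devices:
            return ("device", dev, entry.get("owner"))
        # Assumption: no such device, so show it as a normal symlink.
        return ("passthrough", name, None)
    if kind == "whiteout":
        return ("enoent", name, None)
    if kind == "devnode":
        # On-disk device nodes are ignored (a compat mode could go here).
        return ("ignored", name, None)
    # Regular files and ordinary symlinks go to the underlying fs.
    return ("passthrough", name, None)


backing = {
    "bar": {"kind": "symlink", "target": ".../foo", "owner": "joe"},
    "foo": {"kind": "whiteout"},
}
result = devfs_lookup("bar", backing, {"foo", "mixerctl"})
```

Here looking up "bar" resolves to kernel device "foo" with joe as owner,
while looking up "foo" itself reports that nothing exists.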
	
"..." is the string that distinguishes things that devfs should interpret
from normal symlinks.  That implies that creating those types of symlinks
in a mounted devfs using symlink() should be prohibited.
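
A minimal sketch of that restriction, in Python with the backing directory
modelled as a dictionary.  The function name is hypothetical, and EPERM is
only a guess at a suitable error; nothing above settles which errno a real
devfs would return.

```python
import errno

DEVFS_PREFIX = ".../"  # marks symlinks that devfs interprets itself


def devfs_symlink(store, name, target):
    """Model symlink(target, "/dev/" + name) on a mounted devfs."""
    if target.startswith(DEVFS_PREFIX):
        # Reserved for devfs's own bookkeeping; refuse it from userland.
        # EPERM is an assumed choice of error.
        raise PermissionError(errno.EPERM, "reserved devfs symlink target", target)
    # Ordinary symlinks are simply stored in the backing directory.
    store[name] = {"kind": "symlink", "target": target}
```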

However, if the devfs isn't mounted you can go and fiddle with
the underlying /dev directory to set up how things will appear once it
is mounted, and you can do so using chmod, ln, chown, etc.  Running ls in
the unmounted /dev won't show device nodes, but it will provide a fairly
intuitive view of how the devfs will appear when mounted.

eric