Subject: Re: devfs, was Re: ptyfs fully working now...
To: Christos Zoulas <>
From: Bill Studenmund <>
List: tech-kern
Date: 11/13/2004 00:10:56

On Fri, Nov 12, 2004 at 09:08:24PM -0500, Christos Zoulas wrote:
> On Nov 12,  4:13pm, (Bill Studenmund) wrote:
> -- Subject: devfs, was Re: ptyfs fully working now...
> Here's what I've been thinking. At boot time, you pass the mount struct
> of devfs a filename which contains a list of commands to be applied to
> it before it gets mounted. These have plain unix syntax and they can be
> chown id:id name	# change ownership to a configured node
> chmod mode name		# change permissions to a configured node
> rm name			# whiteout a configured node
> ln -s name		# make a symlink to a configured node if it exists
> mkdir name		# create a directory
> mknod name [b|c M M] [p]# create a node, for sockets we just create them.

I like the idea of a file that contains info about modes and owners, and I
hadn't thought about whiteouts - good idea. However I think a better way
to do this is a binary database. I think the keys should be locators;
where the device is in the config hierarchy. For each entry, we keep most
of the info you have below - name, uid, gid, mode (or ACL), mtime, atime,
ctime (I don't think birthtime matters as it won't show up in stat), and

I am not sure about the idea of making directories or symlinks. It may be
a good one. Same with pipes in /dev.

The thing I don't like is that you're using dev_t in what seems like a
canonical manner. My understanding of the whole idea of devfs, though, is
that dev_t really is just a number that gets thrown around; the kernel
returns it in stat, and userland can use it for comparison. While major
and minor numbers still make sense, the whole thing I wanted was for them
to not matter at all from boot to boot.

What I was thinking was that as we boot, devices register their nodes
during configuration. Drivers add default info (like owner, mode, and most
importantly default name & locator) while registering. Then, like you
said, we read a file on boot. However my thought is that we merge the two
databases, based on locator. That way devices that are here now and were
here before have the exact settings as last time. Nodes that were here
last boot but aren't now show up with a NULL device pointer. Nodes that
are new show up with default settings.
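
A minimal sketch of the merge described above, assuming both the saved
database and the boot-time registrations can be viewed as flat arrays of
records keyed by a locator string. The record layout, the function names
and the locator string format are hypothetical; nothing here is an
existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical record; field names are illustrative only. */
struct devfs_rec {
	const char *locator;	/* place in the config hierarchy (the key) */
	const char *name;	/* node name under /dev */
	uid_t uid;
	gid_t gid;
	mode_t mode;
	const void *device;	/* stand-in for struct device *; NULL if absent */
};

static const struct devfs_rec *
find_by_locator(const struct devfs_rec *tab, size_t n, const char *locator)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(tab[i].locator, locator) == 0)
			return &tab[i];
	return NULL;
}

/*
 * Merge the saved database with what the drivers registered this boot.
 * Writes at most outcap records to out and returns the count needed.
 */
size_t
devfs_merge(const struct devfs_rec *saved, size_t nsaved,
    const struct devfs_rec *probed, size_t nprobed,
    struct devfs_rec *out, size_t outcap)
{
	size_t n = 0;

	assert(out != NULL || outcap == 0);

	/* Saved nodes keep their settings; attach the device if it is back. */
	for (size_t i = 0; i < nsaved; i++) {
		const struct devfs_rec *p =
		    find_by_locator(probed, nprobed, saved[i].locator);
		if (n < outcap) {
			out[n] = saved[i];
			out[n].device = (p != NULL) ? p->device : NULL;
		}
		n++;
	}
	/* Nodes seen for the first time come up with the driver defaults. */
	for (size_t i = 0; i < nprobed; i++) {
		if (find_by_locator(saved, nsaved, probed[i].locator) != NULL)
			continue;
		if (n < outcap)
			out[n] = probed[i];
		n++;
	}
	return n;
}
```

The linear search only keeps the sketch short; a real table would
presumably be hashed on the locator.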

> This file gets loaded at mount time by the kernel into an internal hash
> table that contains:
>         LIST_ENTRY(devfsnode) hash;     /* hash chain */
> 	struct vnode    *vnode;   	/* vnode associated with this entry */
	struct device	*device;	/* our device, NULL if not
					 * configured */
> 	devfstype        type;     	/* type of devfs node */
> 	u_long          ptyfs_fileno;   /* unique file id */
> 	char		name[16];
> 	uid_t		uid;
> 	gid_t		gid;
> 	mode_t		mode;
> 	int		flags;		/* immutable etc */
> 	dev_t		dev;		/* if device, device info */
> 	/* the timestamps for the node */
> 	struct timespec	mtime;
> 	struct timespec	ctime;
> 	struct timespec	atime;
> 	struct timespec	birthtime;
> 	int		flag;		/* below */
> #define DEVFS_OVERRIDE_MODE	0x01
> #define DEVFS_OVERRIDE_UID	0x02
> #define DEVFS_OVERRIDE_GID	0x04
> #define DEVFS_WHITEOUT		0x08
> #define DEVFS_MKDIR		0x10
> #define DEVFS_SYMLINK		0x20	/* target to be looked up in a different
> 					 * table */
> #define	DEVFS_MKNOD		0x40
> #define	DEVFS_ACCESSED		0x1000	/* Node was accessed */
> #define	DEVFS_MODIFIED		0x2000	/* Node was written */
> #define	DEVFS_CHANGED		0x4000	/* Node was changed perm/ownership */
> #define	DEVFS_DIRTY		0x8000	/* Changes not reflected to the file */
> This is the same struct used internally for book-keeping. When an mkdir,
> chmod, chown, rm, ln -s operation is done on devfs, the change is reflected
> on the internal memory table, and the DIRTY flag is set. Occasionally [once
> a minute] if flag is DIRTY, the file we loaded gets written with the updated
> permissions. Or if it is DIRTY it is written on unmount. The file can live
> under the mount if we don't want it accessible. We also provide a simple
> character device that when we cat it, it provides a textual description of
> the current set of commands.

I do like your ideas about db updating; chown, chmod, mv, and rm should
update the db. And a tool to turn the db file into a text representation
may be good. But as before, the whole idea is to make device probe order
not matter; partition "HR files" always has the same permissions
regardless of whether it's sd0 or sd19. If we use dev_t the way I think
you described, we're still sensitive to probe order.
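
As a rough illustration of the update path discussed above (an operation
changes the in-memory entry and marks it dirty, and a later pass writes
the change back), here is a sketch. The two flag values are taken from
the quoted proposal; the trimmed-down struct, the function names, and the
choice to emit "chmod mode name" text lines into a caller-supplied buffer
are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define DEVFS_CHANGED	0x4000	/* node was changed perm/ownership */
#define DEVFS_DIRTY	0x8000	/* changes not reflected to the file */

/* Cut-down entry; the quoted struct carries many more fields. */
struct devfs_entry {
	char name[16];
	mode_t mode;
	int flag;
};

void
devfs_entry_init(struct devfs_entry *e, const char *name, mode_t mode)
{
	assert(e != NULL && name != NULL);
	memset(e, 0, sizeof(*e));
	snprintf(e->name, sizeof(e->name), "%s", name);
	e->mode = mode;
}

/* chmod on a node: update the in-memory table and mark the entry dirty. */
void
devfs_entry_chmod(struct devfs_entry *e, mode_t mode)
{
	assert(e != NULL);
	if (e->mode == mode)
		return;
	e->mode = mode;
	e->flag |= DEVFS_CHANGED | DEVFS_DIRTY;
}

/*
 * Periodic (or unmount-time) pass: emit one line per dirty entry into buf
 * and clear DIRTY. Entries that do not fit stay dirty for the next pass.
 * Returns the number of entries written.
 */
size_t
devfs_flush(struct devfs_entry *tab, size_t n, char *buf, size_t buflen)
{
	size_t used = 0, written = 0;

	assert(buf != NULL && buflen > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		if ((tab[i].flag & DEVFS_DIRTY) == 0)
			continue;
		int len = snprintf(buf + used, buflen - used, "chmod %o %s\n",
		    (unsigned)tab[i].mode, tab[i].name);
		if (len < 0 || (size_t)len >= buflen - used) {
			buf[used] = '\0';	/* drop the partial line */
			break;
		}
		used += (size_t)len;
		tab[i].flag &= ~DEVFS_DIRTY;
		written++;
	}
	return written;
}
```

Note that the entries here are still identified by name only; keying them
by locator is a separate question.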

One issue, at least with my idea of how devfs would work, is that
locators are really important, and may need maintaining. For example, we
may want to tie device node locators to a device ID, like a SCSI disk's
WWN. So the partition "HR files" on the disk with WWN FOO could be
distinguished from a partition "HR files" on a zip drive someone hooked up
to the computer. My ideas here are still rough, and would need work with
how we handle wedges. But the main thought is to make sure that hooking up
a disk whose partition has the same name as another partition won't cause
the permissions of one to slip over to the other (I understand that
Jason's thoughts on wedges would permit only one of the two
identically-named partitions to be accessible at the same time; this idea
is to make sure we can keep track of both of their permissions and permit
only the right one to be active at once).
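
One way to picture the distinction above is a lookup key that combines the
disk's identity with the partition name, so that two partitions called
"HR files" never share an entry. This is only a sketch: the struct and
function names are hypothetical, and representing a WWN as a string is an
assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical key: which disk (by WWN) plus the partition name. */
struct wedge_key {
	const char *wwn;	/* device identity; string form is assumed */
	const char *partname;	/* e.g. "HR files" */
};

struct wedge_perm {
	struct wedge_key key;
	uid_t uid;
	gid_t gid;
	mode_t mode;
};

static int
wedge_key_equal(const struct wedge_key *a, const struct wedge_key *b)
{
	return strcmp(a->wwn, b->wwn) == 0 &&
	    strcmp(a->partname, b->partname) == 0;
}

/* Return the saved permissions for exactly this disk+partition, or NULL. */
const struct wedge_perm *
wedge_perm_lookup(const struct wedge_perm *tab, size_t n,
    const struct wedge_key *key)
{
	assert(key != NULL);
	for (size_t i = 0; i < n; i++)
		if (wedge_key_equal(&tab[i].key, key))
			return &tab[i];
	return NULL;
}
```

With a key like this, a newly attached zip drive carrying its own
"HR files" partition simply misses in the table and would come up with
default settings instead of inheriting the other disk's permissions.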

Also, we would probably want a way to change the bind point for locators.
For instance, when someone first updates to a devfs system, all their
locators will be config-based, like "sd0a" or "cd1d"; i.e. the devfs node
really is tied to whatever shows up in that probe position. We will want a
way to, say, tie a SCSI disk to a WWN. I'm sure there are other bindings
that make sense, and we will want them where appropriate.
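
Changing the bind point could be sketched as a node that records what kind
of thing it is bound to, plus an operation that swaps the binding while
leaving ownership and mode alone. The enum, struct and function names
below are hypothetical, and only the two binding kinds mentioned above are
modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical binding kinds. */
enum devfs_bind_kind {
	DEVFS_BIND_CONFIG,	/* tied to a probe position, e.g. "sd0a" */
	DEVFS_BIND_WWN		/* tied to a device identity */
};

struct devfs_node {
	enum devfs_bind_kind kind;
	const char *bindval;	/* probe name or WWN, depending on kind */
	uid_t uid;
	gid_t gid;
	mode_t mode;
};

/* Re-point a node at a different identity; permissions are untouched. */
void
devfs_rebind(struct devfs_node *node, enum devfs_bind_kind kind,
    const char *bindval)
{
	assert(node != NULL && bindval != NULL);
	node->kind = kind;
	node->bindval = bindval;
}

/* Does this node belong to a device that attached as probe_name/wwn? */
int
devfs_node_matches(const struct devfs_node *node, const char *probe_name,
    const char *wwn)
{
	switch (node->kind) {
	case DEVFS_BIND_CONFIG:
		return strcmp(node->bindval, probe_name) == 0;
	case DEVFS_BIND_WWN:
		return wwn != NULL && strcmp(node->bindval, wwn) == 0;
	}
	return 0;
}
```

A node that starts out bound to "sd0a" follows the probe position; after
devfs_rebind() to a WWN it follows the disk, wherever that disk attaches.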

The one issue I haven't thought through fully is what happens when you
have device nodes with the same name that refer to distinctly different
devices. Like you had a wedge "sd0a" and bound it to a given WWN. Now the
disk with that WWN has attached as sd3, and a different disk is at sd0.
I'm not sure how to handle the confusion in that case; maybe the thing to
do is have the current "sd0" get turned into "sdX" and "sd3" get turned
into "sd0". I'm not sure.

While I talk a fair bit about wedges above, these thoughts apply to all
device nodes. It's just that wedges and disks are the things that move
around a lot, yet we really, really want permissions to not change. Things
like serial ports don't move around much.


Take care,

