Subject: Re: devfs, was Re: ptyfs fully working now...
To: Bill Studenmund <wrstuden@netbsd.org>
From: Christos Zoulas <christos@zoulas.com>
List: tech-kern
Date: 11/13/2004 03:30:43
On Nov 13, 12:10am, wrstuden@netbsd.org (Bill Studenmund) wrote:
-- Subject: Re: devfs, was Re: ptyfs fully working now...

| On Fri, Nov 12, 2004 at 09:08:24PM -0500, Christos Zoulas wrote:
| > On Nov 12,  4:13pm, wrstuden@netbsd.org (Bill Studenmund) wrote:
| > -- Subject: devfs, was Re: ptyfs fully working now...
| >
| > Here's what I've been thinking. At boot time, you pass the mount struct
| > of devfs a filename which contains a list of commands to be applied to
| > it before it gets mounted. These have plain unix syntax and they can be
| >
| > chown id:id name	# change ownership of a configured node
| > chmod mode name		# change permissions of a configured node
| > rm name			# whiteout a configured node
| > ln -s name		# make a symlink to a configured node if it exists
| > mkdir name		# create a directory
| > mknod name [b|c M M] [p]# create a node, for sockets we just create them.
| 
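
A minimal user-space sketch, in C, of how one line of such a command file
might be parsed. The names devfs_op, devfs_cmd and devfs_parse_line are
made up for illustration and are not part of the proposal. Only chown,
chmod, rm and mkdir are handled; ln -s and mknod are left out, and trailing
junk after the name is not rejected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum devfs_op { OP_CHOWN, OP_CHMOD, OP_RM, OP_MKDIR, OP_INVALID };

/* Hypothetical parsed form of one command line. */
struct devfs_cmd {
	enum devfs_op	op;
	unsigned	uid, gid;	/* OP_CHOWN only */
	unsigned	mode;		/* OP_CHMOD only, octal in the file */
	char		name[16];	/* same size as name[] in the node struct */
};

/*
 * Parse one line; anything after '#' is treated as a comment.
 * Returns 0 on success, -1 on a blank, unknown or malformed line.
 */
int
devfs_parse_line(const char *line, struct devfs_cmd *cmd)
{
	char buf[128], op[8];
	char *hash;
	const char *args;
	int used = 0;

	assert(line != NULL && cmd != NULL);
	memset(cmd, 0, sizeof(*cmd));
	cmd->op = OP_INVALID;

	/* Work on a copy so the comment can be cut off. */
	snprintf(buf, sizeof(buf), "%s", line);
	if ((hash = strchr(buf, '#')) != NULL)
		*hash = '\0';

	/* First word is the command; %n records where the arguments start. */
	if (sscanf(buf, "%7s %n", op, &used) != 1)
		return -1;
	args = buf + used;

	if (strcmp(op, "chown") == 0) {
		if (sscanf(args, "%u:%u %15s", &cmd->uid, &cmd->gid, cmd->name) != 3)
			return -1;
		cmd->op = OP_CHOWN;
	} else if (strcmp(op, "chmod") == 0) {
		if (sscanf(args, "%o %15s", &cmd->mode, cmd->name) != 2)
			return -1;
		cmd->op = OP_CHMOD;
	} else if (strcmp(op, "rm") == 0 || strcmp(op, "mkdir") == 0) {
		if (sscanf(args, "%15s", cmd->name) != 1)
			return -1;
		cmd->op = (op[0] == 'r') ? OP_RM : OP_MKDIR;
	} else {
		return -1;
	}
	return 0;
}
```

A caller would read the file line by line, skip lines for which this
returns -1 (or report them), and apply each parsed command to the
in-memory table.
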
| I like the idea of a file that contains info about modes and owners, and I
| hadn't thought about whiteouts - good idea. However I think a better way
| to do this is a binary database. I think the keys should be locators;
| where the device is in the config hierarchy. For each entry, we keep most
| of the info you have below - name, uid, gid, mode (or ACL), mtime, atime,
| ctime (I don't think birthtime matters as it won't show up in stat), and
| type.

Fine, I agree that the device mapping should be using locators. I think
that birthtime should go in; it would have been nice for stat to be able
to access it, but that is not the case yet...

| I am not sure about the idea of making directories or symlinks. It may be
| a good one. Same with pipes in /dev.
| 
| The thing I don't like is that you're using dev_t in what seems like a
| canonical manner. My understanding of the whole idea of devfs, though, is

I just wanted to stash dev_t somewhere so that it can be returned to
userland for stat. It is not meant to be used for anything else.

| that dev_t really is just a number that gets thrown around; the kernel
| returns it in stat, and userland can use it for comparison. While major
| and minor numbers still make sense, the whole thing I wanted was for them
| to not at all matter from boot to boot.
| 
| What I was thinking was that as we boot, devices register their nodes
| during configuration. Drivers add default info (like owner, mode, and most
| importantly default name & locator) while registering. Then, like you
| said, we read a file on boot. However my thought is that we merge the two
| databases, based on locator. That way devices that are here now and were
| here before have the exact settings as last time. Nodes that were here
| last boot but aren't now show up with a NULL device pointer. Nodes that
| are new show up with default settings.

It does not have to be at boot, just at mount time, unless we want to mount
devfs after kernel autoconfiguration, which I think is a bit radical. I prefer
to have it mounted by userland. Then people who don't like devfs don't need
to use it, and regular devices can still be used in the transition period.
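
To make the merge-by-locator idea quoted above concrete, here is a rough
user-space sketch in C. Everything in it is an assumption for illustration:
devfs_rec, devfs_merge and find_by_locator are invented names, struct
device is a stand-in for the kernel's, and locators are treated as opaque
strings since their format has not been decided.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Stand-in for the kernel's struct device; only its address matters here. */
struct device { int unit; };

/* Hypothetical record, keyed by locator. */
struct devfs_rec {
	const char	*locator;	/* merge key (format is an assumption) */
	uid_t		uid;
	gid_t		gid;
	mode_t		mode;
	struct device	*device;	/* NULL if not configured this boot */
};

static const struct devfs_rec *
find_by_locator(const struct devfs_rec *v, size_t n, const char *locator)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(v[i].locator, locator) == 0)
			return &v[i];
	return NULL;
}

/*
 * Merge the saved database with the nodes registered by drivers.
 * Returns the number of records written to out, or (size_t)-1 if
 * out is too small.
 */
size_t
devfs_merge(const struct devfs_rec *saved, size_t nsaved,
    const struct devfs_rec *live, size_t nlive,
    struct devfs_rec *out, size_t outcap)
{
	size_t n = 0;

	assert(out != NULL);

	/* Saved entries keep their settings; attach the device if present. */
	for (size_t i = 0; i < nsaved; i++) {
		const struct devfs_rec *l =
		    find_by_locator(live, nlive, saved[i].locator);
		if (n == outcap)
			return (size_t)-1;
		out[n] = saved[i];
		out[n].device = (l != NULL) ? l->device : NULL;
		n++;
	}
	/* Devices that were never seen before get the driver defaults. */
	for (size_t i = 0; i < nlive; i++) {
		if (find_by_locator(saved, nsaved, live[i].locator) != NULL)
			continue;
		if (n == outcap)
			return (size_t)-1;
		out[n++] = live[i];
	}
	return n;
}
```

The linear search is only there to keep the sketch short; a real table
would presumably be hashed on the locator.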

| > This file gets loaded at mount time by the kernel into an internal hash
| > table that contains:
| >         LIST_ENTRY(devfsnode) hash;     /* hash chain */
| > 	struct vnode    *vnode;   	/* vnode associated with this entry */
| 	struct device	*device;	/* our device, NULL if not
| 					 * configured */
| > 	devfstype        type;     	/* type of devfs node */
| > 	u_long          ptyfs_fileno;   /* unique file id */
| > 	char		name[16];
| > 	uid_t		uid;
| > 	gid_t		gid;
| > 	mode_t		mode;
| > 	int		flags;		/* immutable etc */
| > 	dev_t		dev;		/* if device, device info */
| > 	/* the timestamps for the node */
| > 	struct timespec	mtime;
| > 	struct timespec	ctime;
| > 	struct timespec	atime;
| > 	struct timespec	birthtime;
| >
| > 	int		flag;		/* below */
| > #define DEVFS_OVERRIDE_MODE		0x01
| > #define DEVFS_OVERRIDE_UID		0x02
| > #define DEVFS_OVERRIDE_GID		0x04
| > #define DEVFS_OVERRIDE_FLAGS	0x08
| > #define DEVFS_WHITEOUT		0x10
| > #define DEVFS_MKDIR			0x20
| > #define DEVFS_SYMLINK		0x40	/* target to be looked up in a different
| > 					 * table */
| > #define	DEVFS_MKNOD		0x80
| >
| > #define	DEVFS_ACCESSED		0x1000	/* Node was accessed */
| > #define	DEVFS_MODIFIED		0x2000	/* Node was written */
| > #define	DEVFS_CHANGED		0x4000	/* Node was changed perm/ownership */
| > #define	DEVFS_DIRTY		0x8000	/* Changes not reflected to the file */
| >
| > This is the same struct used internally for book-keeping. When a mkdir,
| > chmod, chown, rm, or ln -s operation is done on devfs, the change is
| > reflected in the in-memory table, and the DIRTY flag is set. Occasionally
| > (once a minute), if the flag is DIRTY, the file we loaded gets written
| > with the updated permissions. It is also written on unmount if it is
| > DIRTY. The file can live under the mount if we don't want it accessible.
| > We also provide a simple character device that, when we cat it, provides
| > a textual description of the current set of commands.
| 
| I do like your ideas about db updating; chown, chmod, mv, and rm should
| update the db. And a tool to turn the db file into a text representation
| may be good. But as before, the whole idea is to make device probe order
| not matter; partition "HR files" always has the same permissions
| regardless of if it's sd0 or sd19. If we use dev_t the way I think you
| described, we're still sensitive to probe order.

No, I meant the hashtable to be keyed by name... But I guess locators are
more stable.
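
For the write-back scheme quoted above, a minimal user-space sketch in C
of the DIRTY handling. The flag values are copied from the proposed
struct; devfs_node is a cut-down version of it, and devfs_chmod and
devfs_writeback are hypothetical names. Only mode overrides are written
out, and the timer or unmount hook that would call the write-back is
assumed.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Flag values as given in the proposed node struct. */
#define DEVFS_OVERRIDE_MODE	0x01
#define DEVFS_CHANGED		0x4000
#define DEVFS_DIRTY		0x8000

/* Cut-down node: just the fields this sketch needs. */
struct devfs_node {
	char	name[16];
	mode_t	mode;
	int	flag;
};

/* chmod on a devfs node: update the in-memory table and mark it dirty. */
void
devfs_chmod(struct devfs_node *n, mode_t mode)
{
	assert(n != NULL);
	n->mode = mode;
	n->flag |= DEVFS_OVERRIDE_MODE | DEVFS_CHANGED | DEVFS_DIRTY;
}

/*
 * Write-back pass, meant to run from a periodic timer or at unmount.
 * If any node is dirty, emit one "chmod" line per overridden mode and
 * clear DIRTY everywhere. Returns the number of lines written, or 0
 * if nothing was dirty.
 */
int
devfs_writeback(struct devfs_node *nodes, size_t count, FILE *fp)
{
	int dirty = 0, lines = 0;

	assert(fp != NULL);
	for (size_t i = 0; i < count; i++)
		if (nodes[i].flag & DEVFS_DIRTY)
			dirty = 1;
	if (!dirty)
		return 0;

	for (size_t i = 0; i < count; i++) {
		if (nodes[i].flag & DEVFS_OVERRIDE_MODE) {
			fprintf(fp, "chmod %o %s\n",
			    (unsigned)nodes[i].mode, nodes[i].name);
			lines++;
		}
		nodes[i].flag &= ~DEVFS_DIRTY;
	}
	return lines;
}
```

The same pass could emit chown and rm lines for the other override and
whiteout flags.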

| One issue that at least my idea of devfs would have is that
| locators are really important, and may need maintaining. Like we may want
| to make device node locators be tied to device ID, like a SCSI disk's WWN.
| 
| So the partition "HR files" on the disk with WWN FOO could be
| distinguished from a partition "HR files" on a zip drive someone hooked up
| to the computer. My ideas here are still rough, and would need work with
| how we handle wedges. But the main thought is to make it so that somehow
| hooking up a disk with a partition with a duplicate name of another
| partition won't cause the permissions of one to slip over to the other (I
| understand that Jason's thoughts on wedges would permit only one of the
| two identically-named partitions to be accessible at the same time; this
| idea is to make sure we can keep track of both of their permissions and
| permit only the right one to be active at once).
| 
| Also, we would probably want a way to change the bind point for locators.
| For instance, when someone first updates to a devfs system, all their
| locators will be config-based. Like "sd0a" or "cd1d"; i.e. the devfs node
| really is tied to whatever shows up in that probe position. We will want a
| way to, say, tie a SCSI disk to a WWN. I'm sure there are other bindings
| that make sense, and we will want them where appropriate.
| 
| The one issue I haven't thought through fully is what happens when you
| have device nodes with the same name that refer to distinctly different
| devices. Like you had a wedge "sd0a" and bound it to a given WWN. Now the
| disk with that WWN has attached as sd3, and a different disk is at sd0. 
| I'm not sure how to handle the confusion in that case; maybe the thing to
| do is have the current "sd0" get turned into "sdX" and "sd3" get turned
| into "sd0". I'm not sure.
| 
| While I talk a fair bit about wedges above, these thoughts apply to all
| device nodes. It's just that wedges and disks are the things that move
| around a lot yet we really, really want permissions to not change. Things
| like serial ports don't move around much.
 
I agree that wedges and disks need special consideration. I just have not
sat down and analyzed the requirements for binding partitions to device
nodes yet.

christos