tech-kern: Re: devfs, was Re: ptyfs fully working now...

Subject: Re: devfs, was Re: ptyfs fully working now...
To: Eric Haszlakiewicz <erh@nimenees.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 11/20/2004 17:52:00

On Fri, Nov 19, 2004 at 08:18:18PM -0600, Eric Haszlakiewicz wrote:
> On Fri, Nov 19, 2004 at 11:51:55AM -0800, Bill Studenmund wrote:
> > On Thu, Nov 18, 2004 at 08:26:27PM -0600, Eric Haszlakiewicz wrote:
> > > 	I don't think it would be any messier than writing code to store
> > > the information in a file instead of a filesystem.  With the file
> >
> > I think it would be messier in two ways. 1) you have two file systems
> > involved. So you have up to two vnodes involved in everything. And you
> > have three classes of vnodes: devfs-only vnodes with no underlying node,
> > devfs device nodes with an underlying node, and devfs other-nodes with an
> > underlying vnode. While an overlay file system can do that, it's messy. 2)
>
> 	Well, it's a little simpler with the data file backed devfs, but you
> still have the data file vnode that you have to deal with.  And you still
> have the three classes of vnodes, except the modified devfs vnode is a
> vnode+extra data instead of vnode+vnode.

I don't think the data file vnode will be an issue. At least what I have
in mind is that it (well really they) will be the "mount point" for devfs.
There is no reason they will show up in /dev. We could add a convenience
device that will create a copy of the devfsdb (so "cat /dev/devfsdb >
backupfile" works right), but that can be managed as a device that
attaches itself.

> > You're still relying on "traditional" node names to identify the device.
> > One of the big advantages I see of a devfs is that we can now use locators
> > to identify devices. I mainly see it in terms of disks on a SAN, but using
> > locators can make a lot of stuff work quite nicely. Like you can have an
> > office/work group machine that different folks hook USB sticks into. We
> > remember device permissions and tie them to the S/N of the stick (I admit
>
> 	hmm.. so what does a locator look like?  If it's something that isn't
> easy to represent as a single line of text I think I can see how having
> a central devfs persistence info file would be useful.

To an extent we haven't figured that all out. We have locators in the
kernel, and they'd serve as a basis. However I expect they can end up
looking like much more complicated strings. In order to simplify any
processing in the kernel, we may also have them be compiled queries, so
that the value the kernel'd have isn't necessarily human-readable.

The simplest ones (and the first ones) will be something like "CDEV3,3",
which would be char dev major 3, minor 3 (rwd0d on i386). "BDEV0,3" would be
block dev major 0, minor 3 (wd0d on i386). Note, I'm using a very cryptic
and compact syntax. We can of course use something else.
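
A minimal sketch of how the compact form above could be parsed, assuming
the grammar is exactly "CDEV" or "BDEV" followed by a decimal major, a
comma, and a decimal minor. The names SimpleLocator and
parse_simple_locator are hypothetical and only for illustration; the
syntax itself is still open, as noted above.

```python
import re
from typing import NamedTuple


class SimpleLocator(NamedTuple):
    kind: str   # "CDEV" (character device) or "BDEV" (block device)
    major: int
    minor: int


# Assumed grammar: CDEV|BDEV, decimal major, comma, decimal minor.
_SIMPLE = re.compile(r"(CDEV|BDEV)([0-9]+),([0-9]+)")


def parse_simple_locator(text: str) -> SimpleLocator:
    """Parse a string such as "CDEV3,3" into its three components."""
    m = _SIMPLE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a simple locator: {text!r}")
    return SimpleLocator(m.group(1), int(m.group(2)), int(m.group(3)))


if __name__ == "__main__":
    print(parse_simple_locator("CDEV3,3"))
    print(parse_simple_locator("BDEV0,3"))
```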

As we start moving towards wedges, we will probably want something like
"BDEV0,UNIT0" (still whole disk) or "BDEV0,UNIT0,MBR2" which'd be MBR
partition 2.

Another option would be "BDEV4,WWN=403DF32AB5" which would be a SCSI disk
identified by WWN (I totally made the WWN up, not sure if it's even
valid). This way we can identify a disk regardless of which target it
shows up as. We also can ID it regardless of which fabric attaches it (FC,
an alternate FC net, or iSCSI, etc.).
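
To make the richer forms concrete, here is a sketch of a parser that
accepts all of the example strings in this message. It assumes each
comma-separated field after the "CDEV"/"BDEV" head is one of: a bare
decimal minor, a tag with a number ("UNIT0", "MBR2"), or a KEY=VALUE
pair ("WWN=..."). The function name parse_locator and the dictionary
layout are hypothetical; nothing here is an agreed format.

```python
import re

# Assumed field shapes; the real syntax is explicitly undecided.
_HEAD = re.compile(r"(CDEV|BDEV)([0-9]+)")
_NUMBER = re.compile(r"[0-9]+")
_TAGGED = re.compile(r"([A-Z]+)([0-9]+)")


def parse_locator(text):
    """Split a locator string into a dict of its fields."""
    head, *fields = text.split(",")
    m = _HEAD.fullmatch(head)
    if m is None:
        raise ValueError(f"bad locator head: {head!r}")
    result = {"kind": m.group(1), "major": int(m.group(2))}
    for field in fields:
        if "=" in field:
            # KEY=VALUE, e.g. WWN=403DF32AB5; the value stays a string.
            key, value = field.split("=", 1)
            if not key or not value:
                raise ValueError(f"bad key=value field: {field!r}")
        elif _NUMBER.fullmatch(field):
            # Bare number: the traditional minor.
            key, value = "minor", int(field)
        else:
            # Tag plus number, e.g. UNIT0 or MBR2.
            t = _TAGGED.fullmatch(field)
            if t is None:
                raise ValueError(f"bad field: {field!r}")
            key, value = t.group(1), int(t.group(2))
        if key in result:
            raise ValueError(f"duplicate field: {key!r}")
        result[key] = value
    return result


if __name__ == "__main__":
    for s in ("BDEV0,3", "BDEV0,UNIT0,MBR2", "BDEV4,WWN=403DF32AB5"):
        print(s, "->", parse_locator(s))
```

Keeping the WWN as an opaque string reflects that the identifier, not the
target or fabric it shows up on, is what names the disk.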

Exactly what we want in an identifier is still a bit up in the air, but we
know we want to add flex to our system to support them.

> > No, I got that point. I had that point, and I argued it for years. I now
> > think it isn't a good one. I think it ends up being quite cumbersome.
>
> > Read my reference to "on-disk device node" to mean a node on the disk,
> > either a file or symlink, that contains info for a device node. I did not
> > specifically mean that it has to be a char or block dev node in the
> > underlying fs.
>
> 	I'm confused.  I read this as "it's bad to not have on disk device nodes".
> Which seems to conflict with the earlier impression I got that you'd
> prefer a single database file to hold persistent devfs info.
> Are you saying that there should be on-disk "device nodes" that control the
> appearance of the actual device node or not?

Ok, I think we lost too much context. Let me try again.

I see three options that have been discussed:

1) Real, on-disk device nodes in /dev. This is what we have now, and what
we've been talking about moving away from.

2) There is a real, on-disk node (either a regular file or a symlink) that
acts as a placeholder for info for a device. An overlay devfs turns this
node into a block or character device for use by the OS. There is a 1:1
correspondence between a device in the system and an inode in /dev (a file
or symlink in the root file system). This is the "magic symlink" idea.

3) /dev is generated totally by the kernel. Whatever was under /dev in the
root file system doesn't matter, just as what is in "usr" in the root fs
when you mount /usr doesn't matter. The info to control what shows up in
/dev is stored in a db file. There is no inode in the root fs (no on-disk
node) that corresponds to a given device. This idea is the one I've been
championing.
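
A rough sketch of option 3, purely for illustration: the contents of /dev
are computed from the set of attached devices plus a db keyed by locator,
with no per-device node in the root fs. Everything here is an assumption:
the NodeInfo record, the default mode and owner, and the idea that a
device missing from the db simply gets defaults.

```python
from typing import NamedTuple


class NodeInfo(NamedTuple):
    name: str    # node name to show under /dev
    mode: int    # permission bits
    owner: str


# Assumed defaults for devices that have no db entry.
DEFAULT_MODE = 0o600
DEFAULT_OWNER = "root"


def build_dev_listing(attached, db):
    """Compute the /dev listing from attached devices and the db.

    attached: dict mapping locator -> default node name, one entry per
              device that has attached itself.
    db:       dict mapping locator -> NodeInfo (the persistent info).
    Returns a dict mapping node name -> (locator, mode, owner).
    """
    listing = {}
    for locator, default_name in attached.items():
        info = db.get(locator)
        if info is None:
            info = NodeInfo(default_name, DEFAULT_MODE, DEFAULT_OWNER)
        listing[info.name] = (locator, info.mode, info.owner)
    return listing


if __name__ == "__main__":
    attached = {"BDEV0,3": "wd0d", "CDEV3,3": "rwd0d"}
    db = {"BDEV0,3": NodeInfo("wd0d", 0o640, "root")}
    for name, entry in sorted(build_dev_listing(attached, db).items()):
        print(name, entry)
```

Db entries for devices that are not attached produce nothing in the
listing, which is the point of having the kernel generate /dev.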

Options 2 and 3 have variants where things other than device nodes can
show up in /dev. For option 2 it's just a matter of permitting other nodes
to show up. For option 3, it'd take a bit of unionfs integration (how the
directory listings of the upper and lower dirs are merged).
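
One way the option 3 merge could behave, sketched under the assumption
that the devfs layer is the upper one and shadows any same-named entry in
the lower directory, while other lower entries show through. The function
name and the "devfs"/"lower" labels are hypothetical.

```python
def merge_dev_listing(devfs_entries, lower_entries):
    """Merge two directory listings; devfs names shadow lower names.

    Returns a dict mapping each visible name to the layer providing it.
    """
    merged = {name: "lower" for name in lower_entries}
    for name in devfs_entries:
        merged[name] = "devfs"   # upper layer wins on a name clash
    return merged


if __name__ == "__main__":
    print(merge_dev_listing(["wd0d", "rwd0d"], ["MAKEDEV", "wd0d"]))
```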

So to answer your question, no, I do not think it's a good idea to have
individual nodes in the file system under devfs control what does &
doesn't show up. I used to. I remember championing such a thing four or
five years ago. I however now do not like the idea.

Take care,

Bill
