tech-kern: Re: devfs, was Re: FreeBSD 5/6/7 kernel emulator for NetBSD 2.x

Subject: Re: devfs, was Re: FreeBSD 5/6/7 kernel emulator for NetBSD 2.x
To: Bill Studenmund <wrstuden@NetBSD.org>
From: Matthew Orgass <darkstar@city-net.com>
List: tech-kern
Date: 10/28/2005 00:35:22
[apologies if this is sent twice, pine crashed while sending...]

On 2005-10-27 wrstuden@NetBSD.org wrote:
> Once we have devfs, why bother with traditional device nodes, other than
> to support NFS exporting /dev to older systems?

  I was thinking of very small, special purpose systems with a small
number of devices where putting them on disk and specially configuring the
numbers might be easier than mounting devfs.  Most importantly, it seems
to me that any design where providing the ability to configure numbered
devices is a significant burden is broken.  If it eventually turns out
that nobody uses it, it could go away (and could stop being included by
default even sooner).

> >   My basic problem with your way of looking at devfs is that you put
> > significant responsibility for device organization into devfs itself
> > rather than exposing kernel organization.  IMO, user organization is
> > likely to have kernel meaning as well, so the focus should be how to
> > organize devices in kernel such that a "categorized probe order" has
> > meaning to userland and can be directly exported as canonical device
> > reference.
>
> To be honest, I have the feeling you are discussing details that are finer
> than the hand-wavyness we currently are at. So it may be premature to say
> that we disagree. :-) I think we're at the point where we need a white
> board and a few beers to map stuff out.

  I think we still have some basic disagreements, but it sounds like you
may be coming around on the canonical name issue :).  IMO, canonical
names are both necessary and sufficient (any other desired forms of
reference should be simple aliases).

> How is "organize devices in kernel ..." different from wiring devices
> down? We can do that now, and it kinda works and kinda doesn't. A number
> of us don't like it (well see warts in it), and that's why we get ideas
> about devfs and other things. :-)

  IMO the main problem with the current system is the limited way wiring
and device organization are done, not the basic "probe order modified by
configuration" system.  I don't think device organization
issues should be solved at a separate export level (devfs), but rather
that devfs should be a simple export of a somewhat more complex internal
organization.

  There should be a general way to categorize device instances such that
sequential numbering or naming is done only within the limited category
(such as wedges for a particular disk being numbered separately from other
disks) and a general way to reserve ranges of numbers for particular
purposes (such as "first MSDOS partition is at sd0e").  IMO, it is not
unreasonable to go to some trouble to try to allow "sequential" numbering
to make sense; having short canonical names is a good thing and should
not be avoided just because it is possible in some cases to have complex
device organization.  I prefer to refer to a partition by position in most
cases, because it is short.  A combination of position and type would be
ideal, IMO: partitions of each type would be numbered sequentially within
their own category.
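
  A minimal sketch of the categorized numbering described above, in C.
Every name here (dev_category, cat_alloc_sequential, cat_alloc_reserved)
is hypothetical and not an existing NetBSD interface.  It assumes each
category (for example, the partitions of one disk) owns its own unit
space and may set aside one contiguous range of units for a particular
purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CAT_MAX_UNITS 16

/* Hypothetical per-category unit space; not a NetBSD API. */
struct dev_category {
	const char *name;		/* e.g. "partitions of sd0" */
	bool used[CAT_MAX_UNITS];	/* units currently taken */
	int reserved_lo;		/* first reserved unit, -1 if none */
	int reserved_hi;		/* last reserved unit */
};

/* Take the lowest free unit outside the reserved range; -1 if full. */
static int
cat_alloc_sequential(struct dev_category *c)
{
	for (int u = 0; u < CAT_MAX_UNITS; u++) {
		bool reserved = c->reserved_lo >= 0 &&
		    u >= c->reserved_lo && u <= c->reserved_hi;
		if (!c->used[u] && !reserved) {
			c->used[u] = true;
			return u;
		}
	}
	return -1;
}

/* Take the lowest free unit inside the reserved range; -1 if none. */
static int
cat_alloc_reserved(struct dev_category *c)
{
	/* Caller must only use this on a category with a reserved range. */
	assert(c->reserved_lo >= 0 && c->reserved_hi < CAT_MAX_UNITS);
	for (int u = c->reserved_lo; u <= c->reserved_hi; u++) {
		if (!c->used[u]) {
			c->used[u] = true;
			return u;
		}
	}
	return -1;
}
```

  With units 4-7 reserved in the sd0 category and units mapped to letters
as 'a' + unit (an assumption of this sketch), the first reserved
allocation lands on unit 4, i.e. "sd0e", while ordinary allocations fill
0-3 and then skip to 8.  A second disk gets its own dev_category, so its
numbering starts again at 0.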

> >   It should then also be possible to import (at least some kinds of)
> > kernel configuration changes to a running kernel, to avoid the need to
> > reboot to wire down a device.
>
> As above, we may be not in disagreement. One idea I had was that if the
> userland config file said SCSI disk wwn 52004567BA64678D was sd3, devfs
> setup would make whatever disk had that wwn be sd3. I was even kicking
> around the idea that kernel log messages would say "sd3" in them. This
> idea is still vague, and for instance I haven't figured out what exactly
> to do if we already had an sd3. :-)

  I think the disagreement is what part of the kernel is doing the
configuration.  IMO, the wiring should determine canonical device naming,
and would therefore need to be at the config level, not in devfs.  Having a
canonical name makes printing it easy, but raises the additional issue of
what to do if a device is wired to multiple places.  IMO, there should be
a way to specify "primary" and "secondary" wiring and fail to attach only
if multiple "primary" wirings match.
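
  One possible reading of the "primary"/"secondary" rule, sketched in C.
The types and the function resolve_wiring are hypothetical, and treating
a secondary wiring as an alias of the canonical name is an assumption of
this sketch rather than something settled in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ALIASES 8

enum wire_kind { WIRE_PRIMARY, WIRE_SECONDARY };

/* Hypothetical wiring entry: "device with this id gets this name". */
struct wiring {
	const char *match_id;	/* identifier to match, e.g. a WWN */
	const char *name;	/* name to assign, e.g. "sd3" */
	enum wire_kind kind;
};

struct wire_result {
	const char *canonical;	/* NULL if no primary wiring matched */
	const char *aliases[MAX_ALIASES];
	size_t naliases;
};

/*
 * Collect the names wired to device `id`.  Returns 0 on success and
 * -1 (refuse to attach) only if more than one primary wiring matches.
 */
static int
resolve_wiring(const struct wiring *w, size_t n, const char *id,
    struct wire_result *out)
{
	assert(w != NULL && id != NULL && out != NULL);
	out->canonical = NULL;
	out->naliases = 0;
	for (size_t i = 0; i < n; i++) {
		if (strcmp(w[i].match_id, id) != 0)
			continue;
		if (w[i].kind == WIRE_PRIMARY) {
			if (out->canonical != NULL)
				return -1;	/* conflicting primaries */
			out->canonical = w[i].name;
		} else if (out->naliases < MAX_ALIASES) {
			out->aliases[out->naliases++] = w[i].name;
		}
	}
	return 0;
}
```

  Under this sketch, wiring wwn 52004567BA64678D to "sd3" as primary and
to some second name as secondary yields "sd3" as the canonical name plus
one alias; two primary entries for the same wwn make the attach fail.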

> >   It seems to me that the basic step necessary for any type of device
> > existence aware devfs is to separate lookup from open and provide event
> > notification for use by the file systems (which would keep a struct device
> > reference instead of device number).  This would still be usable with
> > number to name mappings, but with the mapping done in a different place
> > (outside individual drivers).
>
> Maybe I know too much about how file systems work, but I don't understand,
> "separate lookup from open." They are separate now...

  I worded that badly; I mean that devices currently get handed a dev_t
for open (and other interaction), but should IMO be given a struct device
pointer to the particular device instance, which would need a separate
device lookup method.  Using struct device references would also allow
renaming/renumbering while preserving open access (and potentially
preserving the previous names also and/or using a new base name for the
renumbering).
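
  A rough userland sketch of the lookup/open split, in C.  The struct
device_inst type stands in for the kernel's struct device, and
dev_attach, dev_lookup, dev_open and dev_rename are hypothetical names
invented for this illustration.  The point it tries to show is that once
open works on an instance pointer instead of a number, a rename does not
disturb existing opens.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NDEV 4

/* Stand-in for struct device; fields are assumptions of this sketch. */
struct device_inst {
	char name[16];
	int refcnt;	/* number of opens holding this instance */
	int present;
};

static struct device_inst devtab[NDEV];

/* Register an instance under `name`; NULL if the table is full. */
static struct device_inst *
dev_attach(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < NDEV; i++) {
		if (!devtab[i].present) {
			strncpy(devtab[i].name, name,
			    sizeof(devtab[i].name) - 1);
			devtab[i].name[sizeof(devtab[i].name) - 1] = '\0';
			devtab[i].refcnt = 0;
			devtab[i].present = 1;
			return &devtab[i];
		}
	}
	return NULL;
}

/* Lookup step: name -> instance, done outside the driver. */
static struct device_inst *
dev_lookup(const char *name)
{
	for (size_t i = 0; i < NDEV; i++)
		if (devtab[i].present && strcmp(devtab[i].name, name) == 0)
			return &devtab[i];
	return NULL;
}

/* Open step: receives the instance itself, not a dev_t. */
static struct device_inst *
dev_open(struct device_inst *d)
{
	d->refcnt++;
	return d;
}

/* Rename in place; holders of the pointer are unaffected. */
static void
dev_rename(struct device_inst *d, const char *newname)
{
	strncpy(d->name, newname, sizeof(d->name) - 1);
	d->name[sizeof(d->name) - 1] = '\0';
}
```

  Here a handle obtained through dev_open(dev_lookup("sd0")) stays valid
after dev_rename(h, "sd3"); only later lookups by name see the change.
Keeping the old name as an alias would be an extension of the same table.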

> What events would file systems be interested in and how would they use
> them? My understanding is that the file systems won't care.

  Device attach/detach mostly, to maintain a struct device pointer and
potentially appear/disappear based on presence.
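
  For illustration only, attach/detach notification could look roughly
like the C sketch below.  The event enum, dev_event_subscribe,
dev_event_post and the fs_devnode structure are all hypothetical; the
sketch assumes a file system node matches a device by name and keeps a
pointer to the instance only while it is attached.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum dev_event { DEV_EVENT_ATTACH, DEV_EVENT_DETACH };

/* Minimal stand-in for a device instance. */
struct evt_device {
	const char *name;
};

typedef void (*dev_listener_fn)(void *arg, enum dev_event ev,
    struct evt_device *dev);

#define MAX_LISTENERS 4

static struct {
	dev_listener_fn fn;
	void *arg;
} listeners[MAX_LISTENERS];
static size_t nlisteners;

/* Register a callback; returns -1 when the table is full. */
static int
dev_event_subscribe(dev_listener_fn fn, void *arg)
{
	assert(fn != NULL);
	if (nlisteners == MAX_LISTENERS)
		return -1;
	listeners[nlisteners].fn = fn;
	listeners[nlisteners].arg = arg;
	nlisteners++;
	return 0;
}

/* Deliver one event to every subscriber. */
static void
dev_event_post(enum dev_event ev, struct evt_device *dev)
{
	for (size_t i = 0; i < nlisteners; i++)
		listeners[i].fn(listeners[i].arg, ev, dev);
}

/* A file system node that appears only while its device is present. */
struct fs_devnode {
	const char *want;		/* device name this node represents */
	struct evt_device *dev;		/* NULL while the device is absent */
	int visible;
};

static void
fs_devnode_event(void *arg, enum dev_event ev, struct evt_device *dev)
{
	struct fs_devnode *n = arg;

	if (ev == DEV_EVENT_ATTACH && strcmp(dev->name, n->want) == 0) {
		n->dev = dev;
		n->visible = 1;
	} else if (ev == DEV_EVENT_DETACH && n->dev == dev) {
		n->dev = NULL;
		n->visible = 0;
	}
}
```

  A node subscribed with fs_devnode_event becomes visible when a matching
device is posted as attached and disappears again on detach, ignoring
events for other devices.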


Matthew Orgass
darkstar@city-net.com