Re: (Semi-random) thoughts on device tree structure and devfs

To: Masao Uebayashi <uebayasi%tombi.co.jp@localhost>
Subject: Re: (Semi-random) thoughts on device tree structure and devfs
From: David Holland <dholland-tech%netbsd.org@localhost>
Date: Mon, 8 Mar 2010 08:16:56 +0000

At the risk of being a wet blanket:

On Sun, Mar 07, 2010 at 06:43:49PM +0900, Masao Uebayashi wrote:
 > I've been spending LOTS of time to investigate various devices' sources, to
 > understand some questions I've had, like:
 > 
 > - Why NetBSD/arm has no bus_space_mmap(4)?

hellifIknow;

 > - Why tty locking is messy?

because ttys are messy, which is because they haven't had a big
rototill in some twenty years or more;

 > - Why sys/dev/wscons has so many #ifdef's?  (Modular unfriendly!)

dunno;

 > - How dk(4) is enumerated?

in the order found;

 > a) Device enumeration is unstable / unpredictable
 > 
 > dk(4) is a pseudo device, and its instances are numbered in the order it's
 > created.  This is fine when you manually / explicitly add wedges(4) by using
 > "dkctl addwedge".  This is not fine, if I have a gpt(4) disk label which has
 > ordered partitions.  I expect disks to be created in the order I write in
 > the gpt(4) disk label.  It's annoying the numbering changes when I add a new
 > disk.  Same for raidframe(4).

Why doesn't gpt(4) create the wedges in that order? If it did that
they'd come out numbered the way you'd expect.

Having the numbering change when you add a new disk is unavoidable.
See further notes below.
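
To make the "in the order found" behaviour concrete, here is a toy model
in Python. It is not NetBSD code: DkAllocator, add_wedge and scan are
made-up names, and the disks and partitions are invented. It only shows
that creating wedges in label order gives the expected numbering within
one disk, while a disk that happens to be found earlier still shifts
every later number.

```python
# Toy model only: units are handed out in creation order, as with dk(4).
# DkAllocator, add_wedge and scan are hypothetical names, not kernel
# interfaces.

class DkAllocator:
    def __init__(self):
        self.units = []          # index in this list == dk unit number

    def add_wedge(self, wedge_name):
        """Assign the next free unit number to wedge_name and return it."""
        self.units.append(wedge_name)
        return len(self.units) - 1


def scan(disks):
    """Create wedges disk by disk, in the order the disks are found.

    disks: list of (disk name, partitions in label order) tuples.
    Returns a mapping from "disk:partition" to "dkN".
    """
    alloc = DkAllocator()
    names = {}
    for disk, partitions in disks:
        for part in partitions:          # label order within one disk
            names[f"{disk}:{part}"] = f"dk{alloc.add_wedge(part)}"
    return names


before = scan([("wd0", ["root", "swap"])])
# Same machine after adding a disk that is found before wd0.
after = scan([("sd0", ["backup"]), ("wd0", ["root", "swap"])])

print(before["wd0:root"], before["wd0:swap"])   # dk0 dk1
print(after["wd0:root"], after["wd0:swap"])     # dk1 dk2
```

Within wd0 the partitions keep their label order in both runs; only the
starting number moves.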

 > b) Consistent device topology management is missing
 > 
 > The reason why NetBSD/arm has no bus_space_mmap(9) has turned out to be the
 > fact that we have no consistent (MI) way to manage physical address space of
 > devices.  NetBSD/mips has a working bus_space_mmap(9) in
 > sys/arch/mips/mips/bus_space_alignstride_chipdep.c.  It defines address
 > windows and manage it by itself.
 > 
 > Who wants to reimplement it on all cpus/ports/platforms?
 > Considering physical address space is a pretty much simple concept
 > - a single linear address space.

Except when it isn't really; consider for example NUMA systems. I
think there have also been systems where different CPUs see a
different physical address space view. Whether any of these are
systems we care about, I dunno.

 > And we already manage (kind of) tree of devices in autoconf(9).  Do
 > we want to manage such a topology in many places?  No.

That said, it's still useful to have MI code for common constructs
even if it doesn't work with every platform.

 > c) Control / data flow is unclear
 > 
 > I've never remembered what wscons command/device to configure wscons to add
 > screen, load font, change encoding.  It's a total mess.  I don't know how
 > the ioctl I send via wscons command is delivered to device.  Same for data.
 > Even by looking at sys/dev/wscons.  Why is it so complicated?

Dunno. Is it? ioctl routing is gross (almost inevitably/inherently)
but this has never seemed particularly worse than anything else.

 > Our tty locking code has so many hacks.  See grep XXX sys/kern/tty*.  And we
 > have to fix all serial devices.  How should serial devices deal with tty
 > lock?  How ioctl works?  How its callback is called and when?  How to avoid
 > deadlock?  This is almost hopeless.

Yes, there's a reason "fix ttys" is fairly high on my cleanup list.

 > Same for network devices' ioctl handling.
 >
 > d) Abstraction of combined/aggregated device is inconsistent
 > 
 > We have some *special* devices that combine/aggregate multiple
 > devices and make it look like a single device.  For example
 > wsmux(4), ccd(4), raidframe(4), lvm(4), bridge(4), agr(4), ...  Now
 > these do almost random way to manage its components, and its
 > behaviour is hard to guess.  You have to learn how to add/delete
 > components to some combining device, its limitation, etc.

Unifying the access methods for these (or at least some of these)
seems like a good idea, yes.

 > The enumeration of these is also hard to predict.
 > 
 > 
 > e) Random way of abstraction
 > 
 > We have many non-real devices used to abstract real devices.  For
 > example audio, tty, wsdisplay, network interfaces, wedges, scsipi,
 > com and friends, usb, pseudo devices, ...  We have to learn how to
 > use them and their behavior respectively.

Yes, and?

 > Developers have to decide how your device is represented to user.
 > If you write a serial device, you have to implement all the syscall
 > knobs, buffer management, tty interaction.  You'll surely end up
 > having a big modified copy of com.c, which is almost impossible to
 > maintain.

See "ttys are a mess".

 > I want to fix all of these.

All at once?

If you have a single overarching framework in mind that's going to
address all the preceding stuff, I'd really suggest hacking it up as a
research project first. (That is, do it in a context where you aren't
committed to maintaining expensive production system guarantees, like
backwards compatibility or the ability to run perl, and where you can
remove or disable things that get in the way instead of having to stop
and deal with them.)

This will give you at least three things; first, a proof of concept
that your architecture is viable, which many people are not going to
buy otherwise; second, real experience with it and with the things it
has to interact with; and third, the opportunity to find and disown
the parts you got wrong *before* they get merged into production code
and thereby carved into stone.

The last of these is arguably the most important...

On the other hand, if you don't have one single overarching framework
in mind to solve all these problems at once, I'd suggest starting on a
subset of them.

 > - Intuitivity
 > 
 > Behavior should be simple enough for users to guess without looking
 > into code.

That's a fine principle, but a bit on the general side. Which users do
you mean? There's a wide range.

 > - Predictability / stability
 > 
 > Device numbers don't change surprisingly.  When you plug device A
 > and B in slot 1 and 2, they should be shown in that order.  When
 > you add disk B @ slot 2, the number of disk A @ slot 1 must not
 > change.

That doesn't work. Suppose I plug disk B into slot 2, then disk A into
slot 1, but tomorrow I plug disk A in first? Or, suppose when I reset
the machine the vagaries of power distribution or who knows what cause
either one disk or the other to spin up and come ready first on
different days? Also, sometimes you can't tell which is slot 1 and
which is slot 2...

Everyone who has attempted to solve this problem comprehensively has
either given up or ended up using device paths instead of numbering.
This is where the /dev/dsk/c0t0s0v0w0q0p0z0f0r0 nonsense in vendor
System V came from; it's where Solaris's device path system came from;
and so on.

The other viable comprehensive approach is to write a name onto each
object and forget about or ignore the numberings. (But this only works
with objects that have identity.)

Or we can stick to what we have, which is that using numbers mostly
works and if you care enough you can wire devices down in the kernel
config. It has long seemed to me that this is the best approach.
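
As a toy illustration of the difference between plain attach-order
numbering and wiring units down, here is a small Python sketch. It is a
model under stated assumptions, not how autoconf is implemented: WIRED
and assign_units are hypothetical names, and "slot1"/"slot2" stand in
for whatever locator the bus really provides.

```python
# Toy model of "wiring down": locations listed in the wired table always
# get the same unit; everything else takes the lowest free number in
# attach order. WIRED and assign_units are hypothetical names.

WIRED = {"slot1": 0, "slot2": 1}     # location -> fixed unit number


def assign_units(attach_order, wired=None):
    """Return {location: unit} for devices attaching in the given order."""
    wired = wired or {}
    reserved = set(wired.values())
    units = {}
    next_free = 0
    for loc in attach_order:
        if loc in wired:
            units[loc] = wired[loc]
            continue
        while next_free in reserved:     # skip numbers held for wired slots
            next_free += 1
        units[loc] = next_free
        next_free += 1
    return units


# Without wiring, the numbers follow whichever disk comes ready first.
print(assign_units(["slot1", "slot2"]))          # {'slot1': 0, 'slot2': 1}
print(assign_units(["slot2", "slot1"]))          # {'slot2': 0, 'slot1': 1}

# With wiring, attach order no longer matters for the wired slots.
print(assign_units(["slot2", "slot1"], WIRED))   # {'slot2': 1, 'slot1': 0}
```

The wired case only helps when the location can actually be identified,
which is the caveat about not always being able to tell slot 1 from
slot 2.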

 > - Simplicity, clarity, consistency
 > 
 > Common code is concentrated in single place.  Each driver
 > implements only its hardware accessors.  No scattered ioctl
 > handlings.

This is pretty general again.

 > A possible solution I'm thinking of is:
 > 
 > 1) Introduce devfs
 > 
 > 2) Natural device numbering
 > 
 > 3) "Functional" device instances abstraction
 > 
 > 4) "Real" and "pseudo" device trees
 >
 > *
 > 
 > 1) Introducing devfs
 > 
 > devfs is a pseudo filesystem which shows device topology in a mount point.
 > There's (unfinished) branch mjf-devfs.  devfs helps to identify devices
 > uniquely.  wd0 on my DELL OptiPlex 745 looks like:
 > 
 >      /dev/mainbus/pci0/ppb0/piixide0/atabus0/wd0

The topology information is already readily available, at least to the
kernel. (And to config.) What problem is this supposed to solve?

 > 2) Natural device numbering
 > 
 > Device number in devfs is enumerated locally in the attachment.
 > Numbers are *naturally* assigned; should match physical bus/slot
 > numbers so that users can make sure which is which.  This is
 > important especially for block devices.  Think when you plug a USB
 > floppy and newfs it.
 > 
 > I *believe* *all* *real* devices can be represented by this scheme.

What makes it work better than the scheme we already have? Or are you
proposing device paths and making the numbers independent in each
context?

In that case, how is the user supposed to look at "pci0/ppb0/piixide0"
and "pci0/ppb1/ppb1/piixide0" and tell which is which? It is no better
than "piixide0" and "piixide1", especially since I can read the
attachment topology straight out of dmesg.
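
For what it's worth, recovering a path from flat names plus the
attachment information is mechanical. The following Python sketch
assumes attachment lines of the form "child at parent ..." as printed at
boot; the sample lines are invented to match the wd0 example quoted
above, and parse_parents and device_path are hypothetical helpers.

```python
# Sketch: rebuild device paths from "child at parent" attachment pairs,
# the kind of information dmesg prints. The sample lines are made up;
# parse_parents and device_path are hypothetical helpers.

ATTACH_LINES = """\
pci0 at mainbus0
ppb0 at pci0
piixide0 at ppb0
atabus0 at piixide0
wd0 at atabus0
"""


def parse_parents(text):
    """Map each device name to its parent from 'child at parent' lines."""
    parents = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == "at":
            parents[fields[0]] = fields[2]
    return parents


def device_path(name, parents):
    """Walk parent links up to the root and return a slash-joined path."""
    parts = [name]
    while parts[-1] in parents:
        parts.append(parents[parts[-1]])
    return "/".join(reversed(parts))


parents = parse_parents(ATTACH_LINES)
print(device_path("wd0", parents))
# mainbus0/pci0/ppb0/piixide0/atabus0/wd0
```

The path carries no information that the globally numbered name plus
the attachment lines did not already carry.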

 > 3) "Functional" device instances abstraction
 > 
 > This is the way audio(4) and video(4) are doing.  These are non-real devices,
 > but "functional" in that it provides a predefined function.  Topology like:
 > 
 >      /dev/mainbus0/.../pci0/azalia0/audio0
 > 
 > is intuitive by viewing this as
 > 
 >      - azalia(4) implements a function audio(4)
 >      - audio(4) is an "abstracted" function represented to users
 > 
 > This also helps users to understand how its internal works.  Users basically
 > accesses the device via audio(4) interface.  audio(4) is responsible to
 > interact with real hardware drivers.  Both control and data are transferred
 > via audio(4).  It's also easily guessed if user forcibly control real device
 > (azalia(4)), audio(4) would be *surprised*, and some inconsistency will be
 > expected to happen.

Sure, and buses are this kind of abstraction too...

 > wscons(4)/tty(4) could be abstracted like:
 > 
 >      /dev/mainbus0/.../pci0/vga0/display0/screen0/vt100emul0/termios0/tty0
 > 
 > This might look redundant, but each device instance's
 > *responsibility* is very clear.  tty(4) is *the* device you
 > interact when you use it as a tty.  Pretty much straightforward.
 > When you send a tty ioctl, it goes to tty(4), which may be
 > delivered to upper layers.  To add a new screen, it's obvious that
 > the device we should ask to is display(4).  We can guess how
 > control/data is delivered.  We can also guess forcibly deleting a
 > screen causes its child devices problems, because topology is
 > visible.

This is not that much different from what already exists, except for
the "ttys are a mess" issue. You've split wscons into two or maybe
three chunks, and you've made the line discipline and tty structure
first-class devices.

I would argue though that the plumbing should be wscons -> tty ->
termios, not wscons -> termios -> tty. It's the termios code that the
userlevel processes interact directly with.

 > 4) "Real" and "pseudo" device trees
 > [...]
 > This should be addressed by representing pseudo device tree, like:
 > 
 >      /dev/pseudobus0/raid0/disk0/diskpart0
 >                           /component0
 >                           /component1
 >                           :
 > 
 > Where component devices are symlinks to the real instances
 > 
 >      /dev/pseudobus0/raid0/component0
 >      -> /dev/mainbus/pci0/ppb0/piixide0/atabus0/wd1/disk0/diskpart0
 > 
 > and the reverse
 > 
 >      /dev/mainbus/pci0/ppb0/piixide0/atabus0/wd1/disk0/diskpart0/raid0
 >      -> /dev/pseudobus0/raid0

This has the same numbering problem as wedges. In fact, it *is* the
same problem as wedges.

And it doesn't mean anything for pseudo-devices that aren't alternate
names for something else, e.g. swcrypto.

 >      /dev/pseudobus0/ppp0/pppldisc0/tty0

that doesn't make any sense...

 > The last one might need more thoughts, but the point is most things can be
 > represented with 2 trees.  I don't think we need netgraph(4) if we once get
 > device including network interfaces topology visible and make their hooks
 > *a little* more flexible.  Tree is the best structure, because everyone is
 > familiar with it.

I think an explicit graph (rather than trees with symlinks) is a
better approach in the long term.
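
A minimal Python sketch of what "explicit graph" could mean here,
purely as an illustration: DeviceGraph and its methods are hypothetical,
and the device names are borrowed loosely from the raid example above.
Each relationship is recorded by one connect() call and can be queried
from either end, so there is no pair of forward and reverse symlinks to
keep in sync.

```python
# Sketch of an explicit device graph. Each edge is recorded by a single
# connect() call and indexed in both directions. DeviceGraph and its
# methods are hypothetical, not an existing interface.

class DeviceGraph:
    def __init__(self):
        self._uses = {}       # consumer -> set of providers
        self._used_by = {}    # provider -> set of consumers

    def connect(self, consumer, provider):
        """Record that consumer is built on top of provider."""
        self._uses.setdefault(consumer, set()).add(provider)
        self._used_by.setdefault(provider, set()).add(consumer)

    def providers(self, dev):
        """Devices that dev sits on top of (sorted for stable output)."""
        return sorted(self._uses.get(dev, ()))

    def consumers(self, dev):
        """Devices that sit on top of dev (sorted for stable output)."""
        return sorted(self._used_by.get(dev, ()))


g = DeviceGraph()
g.connect("wd1", "atabus0")
g.connect("wd2", "atabus1")
g.connect("raid0", "wd1")      # raid0 has two providers, so this
g.connect("raid0", "wd2")      # is not a tree

print(g.providers("raid0"))    # ['wd1', 'wd2']
print(g.consumers("wd1"))      # ['raid0']
```

A node with several providers, like raid0 here, is exactly the case
that a tree can only express through symlinks.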

-- 
David A. Holland
dholland%netbsd.org@localhost