(Semi-random) thoughts on device tree structure and devfs

To: tech-kern%NetBSD.org@localhost
Subject: (Semi-random) thoughts on device tree structure and devfs
From: Masao Uebayashi <uebayasi%tombi.co.jp@localhost>
Date: Sun, 7 Mar 2010 18:43:49 +0900

I've been spending LOTS of time investigating various device sources, to
answer some questions I've had, like:

- Why does NetBSD/arm have no bus_space_mmap(9)?
- Why is tty locking messy?
- Why does sys/dev/wscons have so many #ifdef's?  (Module unfriendly!)
- How is dk(4) enumerated?
:

After immersing myself in this for 3 days now, I think I've figured out almost
all of the problems I've had and how I can fix them.  Before going directly to
the answer, let me summarize the problems I've found:

*

a) Device enumeration is unstable / unpredictable

dk(4) is a pseudo device, and its instances are numbered in the order they're
created.  This is fine when you manually / explicitly add wedges(4) by using
"dkctl addwedge".  This is not fine if I have a gpt(4) disk label which has
ordered partitions.  I expect disks to be created in the order I wrote them in
the gpt(4) disk label.  It's annoying that the numbering changes when I add a
new disk.  Same for raidframe(4).


b) Consistent device topology management is missing

The reason why NetBSD/arm has no bus_space_mmap(9) has turned out to be that
we have no consistent (MI) way to manage the physical address space of
devices.  NetBSD/mips has a working bus_space_mmap(9) in
sys/arch/mips/mips/bus_space_alignstride_chipdep.c.  It defines address
windows and manages them by itself.

Who wants to reimplement it on all cpus/ports/platforms?  Physical address
space is a pretty simple concept: a single linear address space.  And we
already manage a (kind of) tree of devices in autoconf(9).  Do we want to
manage such a topology in many places?  No.
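
To make the "single linear address space" point concrete, here is a minimal
sketch of what one shared (MI) window table could look like.  It is only an
illustration: struct addr_window and addr_window_lookup() are made-up names,
and this is not how the mips code mentioned above is written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MI description of one physical address window. */
struct addr_window {
	const char *owner;	/* device that owns the window, e.g. "pci0" */
	uint64_t base;		/* first physical address of the window */
	uint64_t size;		/* window length in bytes */
};

/*
 * Return the window that fully contains [addr, addr + len), or NULL.
 * The comparisons are arranged so that addr + len is never computed,
 * which avoids overflow near the top of the address space.
 */
static const struct addr_window *
addr_window_lookup(const struct addr_window *tbl, size_t n,
    uint64_t addr, uint64_t len)
{
	size_t i;

	assert(tbl != NULL || n == 0);
	for (i = 0; i < n; i++) {
		const struct addr_window *w = &tbl[i];

		if (len == 0 || len > w->size)
			continue;
		if (addr < w->base)
			continue;
		if (addr - w->base > w->size - len)
			continue;
		return w;
	}
	return NULL;
}
```

With one table like this, a generic mmap path would only need to ask "is this
range inside a window I know about?" instead of each port answering that
question in its own way.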


c) Control / data flow is unclear

I can never remember which wscons command/device to use to add a screen, load
a font, or change the encoding.  It's a total mess.  I don't know how the
ioctl I send via a wscons command is delivered to the device.  Same for data.
Even after looking at sys/dev/wscons.  Why is it so complicated?

Our tty locking code has so many hacks.  See grep XXX sys/kern/tty*.  And we
have to fix all serial devices.  How should serial devices deal with the tty
lock?  How does ioctl work?  How is its callback called, and when?  How to
avoid deadlock?  This is almost hopeless.  Same for network devices' ioctl
handling.


d) Abstraction of combined/aggregated devices is inconsistent

We have some *special* devices that combine/aggregate multiple devices and
make them look like a single device.  For example wsmux(4), ccd(4),
raidframe(4), lvm(4), bridge(4), agr(4), ...  These manage their components in
almost random ways, and their behaviour is hard to guess.  For each combining
device you have to learn how to add/delete components, its limitations, etc.

The enumeration of these is also hard to predict.


e) Random way of abstraction

We have many non-real devices used to abstract real devices.  For example
audio, tty, wsdisplay, network interfaces, wedges, scsipi, com and friends,
usb, pseudo devices, ...  We have to learn how to use each of them and how
each one behaves.

Developers have to decide how their device is represented to users.  If you
write a serial device, you have to implement all the syscall knobs, buffer
management, and tty interaction.  You'll surely end up with a big modified
copy of com.c, which is almost impossible to maintain.

*

I want to fix all of these.  Goals:

- Intuitiveness

Behavior should be simple enough for users to guess without looking into code.

- Predictability / stability

Device numbers don't change surprisingly.  When you plug devices A and B into
slots 1 and 2, they should be shown in that order.  When you add disk B @ slot
2, the number of disk A @ slot 1 must not change.

- Simplicity, clarity, consistency

Common code is concentrated in a single place.  Each driver implements only
its hardware accessors.  No scattered ioctl handling.

*

A possible solution I'm thinking of is:

1) Introduce devfs

2) Natural device numbering

3) "Functional" device instances abstraction

4) "Real" and "pseudo" device trees

*

1) Introducing devfs

devfs is a pseudo filesystem which shows the device topology under a mount
point.  There's an (unfinished) branch, mjf-devfs.  devfs helps to identify
devices uniquely.  wd0 on my DELL OptiPlex 745 looks like:

        /dev/mainbus/pci0/ppb0/piixide0/atabus0/wd0
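
A path like this falls out of the device tree almost for free.  As a rough
sketch (struct devnode and devnode_path() are hypothetical names, not the real
autoconf(9) structures and not code from the mjf-devfs branch), walking the
parent pointers is enough:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical device node; NOT the real autoconf(9) device structure. */
struct devnode {
	const char *name;		/* e.g. "wd0" */
	const struct devnode *parent;	/* NULL at the root of the tree */
};

/*
 * Write "/dev/<root>/.../<dev>" into buf.  Returns the path length,
 * or -1 if the tree is too deep or buf is too small.
 */
static int
devnode_path(const struct devnode *dev, char *buf, size_t buflen)
{
	const struct devnode *chain[32];
	size_t depth = 0, off;
	int n;

	assert(dev != NULL && buf != NULL);

	/* Collect the chain leaf-first, then print it root-first. */
	for (; dev != NULL; dev = dev->parent) {
		if (depth == sizeof(chain) / sizeof(chain[0]))
			return -1;
		chain[depth++] = dev;
	}

	n = snprintf(buf, buflen, "/dev");
	if (n < 0 || (size_t)n >= buflen)
		return -1;
	off = (size_t)n;

	while (depth > 0) {
		n = snprintf(buf + off, buflen - off, "/%s",
		    chain[--depth]->name);
		if (n < 0 || (size_t)n >= buflen - off)
			return -1;
		off += (size_t)n;
	}
	return (int)off;
}
```

The node names in the example path are just whatever each attachment already
calls itself; nothing here needs a global numbering.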


2) Natural device numbering

Device numbers in devfs are enumerated locally at each attachment.  Numbers
are *naturally* assigned; they should match physical bus/slot numbers so that
users can tell which is which.  This is important especially for block
devices.  Think of when you plug in a USB floppy and newfs it.
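
The difference between the two numbering styles can be sketched in a few
lines.  All names here (struct parent_bus, unit_attach_order(),
unit_natural()) are hypothetical and only illustrate the idea:

```c
#include <assert.h>

/* Hypothetical per-parent state; illustrative only. */
struct parent_bus {
	int nslots;	/* number of physical slots on this parent */
	int next_unit;	/* used only by attach-order numbering */
};

/* Attach-order numbering: the unit depends on what attached earlier. */
static int
unit_attach_order(struct parent_bus *bus)
{
	return bus->next_unit++;
}

/* Natural numbering: the unit is the slot, local to the parent. */
static int
unit_natural(const struct parent_bus *bus, int slot)
{
	assert(slot >= 0 && slot < bus->nslots);
	return slot;
}
```

With attach-order numbering, a disk in slot 1 is unit 0 when it is alone and
becomes unit 1 once another disk shows up in slot 0.  With natural numbering
it is unit 1 in both cases.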

I *believe* *all* *real* devices can be represented by this scheme.


3) "Functional" device instances abstraction

This is the way audio(4) and video(4) do it.  These are non-real devices,
but "functional" in that they provide a predefined function.  A topology like:

        /dev/mainbus0/.../pci0/azalia0/audio0

is intuitive by viewing this as

        - azalia(4) implements a function audio(4)
        - audio(4) is an "abstracted" function represented to users

This also helps users to understand how the internals work.  Users basically
access the device via the audio(4) interface.  audio(4) is responsible for
interacting with the real hardware driver.  Both control and data are
transferred via audio(4).  It's also easy to guess that if a user forcibly
controls the real device (azalia(4)), audio(4) would be *surprised*, and some
inconsistency is to be expected.
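
Here is a small sketch of that split.  It assumes a made-up ops vector
(struct audio_hw_ops) and a fake hardware driver (struct fake_codec); it is
not the real audio(9) hardware interface, only an illustration of who owns
which state:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accessors a hardware driver would export. */
struct audio_hw_ops {
	int (*set_rate)(void *hw, unsigned rate);
};

/* The "functional" device: what users actually talk to. */
struct audio_func {
	const struct audio_hw_ops *ops;
	void *hw;
	unsigned rate;	/* what the audio layer believes the hardware uses */
};

/* Fake hardware driver, standing in for something like azalia(4). */
struct fake_codec {
	unsigned hw_rate;
};

static int
fake_codec_set_rate(void *hw, unsigned rate)
{
	struct fake_codec *c = hw;

	if (rate == 0)
		return -1;
	c->hw_rate = rate;
	return 0;
}

static const struct audio_hw_ops fake_codec_ops = { fake_codec_set_rate };

static void
audio_attach(struct audio_func *af, const struct audio_hw_ops *ops, void *hw)
{
	assert(af != NULL && ops != NULL && ops->set_rate != NULL);
	af->ops = ops;
	af->hw = hw;
	af->rate = 0;
}

/* All control goes through the functional layer, which tracks state. */
static int
audio_set_rate(struct audio_func *af, unsigned rate)
{
	int error = af->ops->set_rate(af->hw, rate);

	if (error == 0)
		af->rate = rate;
	return error;
}
```

If something calls fake_codec_set_rate() directly, the rate cached in struct
audio_func no longer matches the hardware.  That is exactly the "surprise"
described above, and the topology makes it easy to predict.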

wscons(4)/tty(4) could be abstracted like:

        /dev/mainbus0/.../pci0/vga0/display0/screen0/vt100emul0/termios0/tty0

This might look redundant, but each device instance's *responsibility* is
very clear.  tty(4) is *the* device you interact with when you use it as a
tty.  Pretty straightforward.  When you send a tty ioctl, it goes to tty(4),
which may deliver it to upper layers.  To add a new screen, it's obvious
that the device we should ask is display(4).  We can guess how control/data
is delivered.  We can also guess that forcibly deleting a screen causes
problems for its child devices, because the topology is visible.
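
One way to picture the delivery: each instance either handles a request or
hands it to its parent.  The sketch below is only an assumption about how such
a chain might be walked; struct layer, the CMD_* values and the handlers are
all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request codes and error value. */
enum { CMD_SET_BAUD = 1, CMD_ADD_SCREEN = 2 };
#define ERR_UNHANDLED	(-1)

/* Hypothetical device instance in the visible tree. */
struct layer {
	const char *name;
	struct layer *parent;			/* next instance toward the hardware */
	int (*ioctl)(struct layer *, int cmd);	/* NULL if nothing is handled here */
	int handled;				/* number of requests handled here */
};

/* Offer the request to each instance, starting at the one it was sent to. */
static int
layer_ioctl(struct layer *l, int cmd)
{
	for (; l != NULL; l = l->parent) {
		if (l->ioctl != NULL && l->ioctl(l, cmd) == 0)
			return 0;
	}
	return ERR_UNHANDLED;
}

/* termios-like layer: only knows about line settings. */
static int
termios_ioctl(struct layer *l, int cmd)
{
	if (cmd != CMD_SET_BAUD)
		return ERR_UNHANDLED;
	l->handled++;
	return 0;
}

/* display-like layer: only knows about screens. */
static int
display_ioctl(struct layer *l, int cmd)
{
	if (cmd != CMD_ADD_SCREEN)
		return ERR_UNHANDLED;
	l->handled++;
	return 0;
}
```

A request sent to tty0 walks tty0, termios0, screen0, display0 in that order,
so the path a given ioctl takes can be read directly off the device path.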

wscons(4) without emulation would look like:

        /dev/mainbus0/.../pci0/vga0/display0/screen1

We don't need a detailed manual page on how screen0 / screen1 are interfaced,
because it's obvious.

Other possible examples:

        /dev/mainbus0/.../wd0/disk0/diskpart0
        /dev/mainbus0/.../fxp0/ether0/net0
        /dev/mainbus0/.../com0/serial0/termios0/tty0


4) "Real" and "pseudo" device trees

Real devices stem from mainbus0, and from one of the real bus roots there,
like /mainbus0/pci0 or /mainbus0/obio0 or /mainbus0/acpi0.  Non-real,
"functional" devices described above stem from one of the leaf "real" devices,
like vga0 or azalia0.  These can co-exist in one tree because functional
devices don't break the tree.

Pseudo devices don't have a parent, because their creation is arbitrary.
They're created when you want.  The device number doesn't matter much.  You
don't usually need to care which pty(4) device you're using.

There are cases where device numbers of pseudo devices matter: disks and
aggregated devices.  You don't want raidframe(4)'s partitions to change after
a reboot.  Same for a bridge/tap configuration used for Xen after you add a
new NIC to your machine.

This should be addressed by representing a pseudo device tree, like:

        /dev/pseudobus0/raid0/disk0/diskpart0
                             /component0
                             /component1
                             :

Where component devices are symlinks to the real instances

        /dev/pseudobus0/raid0/component0
        -> /dev/mainbus/pci0/ppb0/piixide0/atabus0/wd1/disk0/diskpart0

and the reverse

        /dev/mainbus/pci0/ppb0/piixide0/atabus0/wd1/disk0/diskpart0/raid0
        -> /dev/pseudobus0/raid0

So that you can uniquely identify the pseudo device by looking it up from the
real device:

        /dev/mainbus/pci0/ppb0/piixide0/atabus0/wd1/disk0/diskpart0/raid0/disk0/diskpart0
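
The lookup through such a link can be sketched as a simple prefix rewrite.
struct devlink and devlink_resolve() are hypothetical; a real devfs would do
this in its name lookup code, which I have not looked at.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical cross-tree link: "from" is a symlink pointing at "to". */
struct devlink {
	const char *from;
	const char *to;
};

/*
 * If path starts with a link's "from" as a whole path component,
 * replace that prefix with "to".  Otherwise copy path unchanged.
 * Returns 0 on success, -1 if out is too small.
 */
static int
devlink_resolve(const struct devlink *links, size_t nlinks,
    const char *path, char *out, size_t outlen)
{
	size_t i;
	int n;

	assert(path != NULL && out != NULL);
	for (i = 0; i < nlinks; i++) {
		size_t flen = strlen(links[i].from);

		if (strncmp(path, links[i].from, flen) != 0)
			continue;
		/* Match only at a component boundary. */
		if (path[flen] != '\0' && path[flen] != '/')
			continue;
		n = snprintf(out, outlen, "%s%s", links[i].to, path + flen);
		return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
	}
	n = snprintf(out, outlen, "%s", path);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Given the reverse link shown above, the long path under wd1 resolves to
/dev/pseudobus0/raid0/disk0/diskpart0, so both names reach the same instance.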


Other examples:

        /dev/pseudobus0/bridged0/net0
        /dev/pseudobus0/ppp0/pppldisc0/tty0


*

The last one might need more thought, but the point is that most things can be
represented with 2 trees.  I don't think we need netgraph(4) once we make the
topology of devices, including network interfaces, visible and make their
hooks *a little* more flexible.  A tree is the best structure, because
everyone is familiar with it.

Masao

P.S.  I've read 0 lines of devfs code so far.

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635
