Subject: Re: Call for review: The New Block/Character Device Switch
To: MAEKAWA Masahide <maekawa@math.sci.kobe-u.ac.jp>
From: Matt Thomas <matt@3am-software.com>
List: tech-kern
Date: 06/27/2001 22:30:47
At 01:25 PM 6/28/2001 +0900, MAEKAWA Masahide wrote:
>Here is a proposal for a framework to support the auto-generation of
>bdevsw/cdevsw table by config(8) and the dynamic assignment of major number.
>
>
>
>        The New Block/Character Device Switch Configuration Framework
>
>
>1.      Background
>
>We don't have the complete dynamic configuration framework yet.
>In order to support it, we must replace many places in the kernel
>from 'not-configurable' to 'configurable' in runtime.
>
>As a first step, try to make bdevsw/cdevsw tables 'configurable'.

That is goodness.

>2.      Current Implementation
>
>We have bdevsw/cdevsw tables in sys/arch/<arch>/<arch>/conf.c
>as an array defined statically and maintain by our own hands.
>Each entries are filled out by interfaces (i.e. open, close, etc...).
>Whether they are active or not is determined in compile time by NXXX.
>And many macros are defined for convenience in sys/conf.h
>and machine/conf.h and many functions which just return an error are
>in many device drivers.
>
>- The bdevsw/cdevsw tables MUST be configurable dynamically to implement
>   the dynamic configuration framework.

True

>- Are interfaces to each devices necessary to be global? What do you think
>   that exists kludge hacks to avoid namespace conflicts?

The only globals a driver should need is the cfattach and devsw.

>- What do you think that many macros for each devices are defined in 
>sys/conf.h
>   or machine/conf.h? In addition, in order to use that macros compels us to
>   define many new functions same as nullop/enodev and kludge aliases.

Shouldn't be needed; but they do make prototypes easier.

>- In order to maintain sys/arch/<arch>/<arch>/conf.c by hands
>   to use that kludge hacks/macros is a Bad Thing, right?

Yes.

>- The current implementation have a possibility to conflict major number
>   between 3rd party drivers and in-kernel drivers.

Yes.

>3.      Summary
>
>3.1.    Ideas
>
>The bdevsw/cdevsw tables are dealed with in the abstract to support
>to attach/detach the bdevsw/cdevsw data dynamically. Each drivers should
>use devsw_get(9) and devsw_lookup(9) to get the device interface instead of
>accessing them directly.

decsw_get I like.  Not sure about devsw_lookup.  It seems hackish.

>The initial bdevsw/cdevsw tables are generated automatically by config(8)
>Each bdevsw/cdevsw entry is a set of device interface functions and
>is put as a structure data in each drivers source.
>The structure data only refers to bdevsw/cdevsw interfaces. Now each
>interfaces are not global functions but local functions in their source.
>I.e. all interfaces are declared as a static function. If necessary to access
>the interface, we must access it via its structure data. Again, we must not
>access the interfaces directly now.
>
>In order to support this feature, add the new grammer 'device-switch'
>to 'files'. In my current implementation, all 'device-switch' is in
>the new file 'devsw.<arch>' under sys/arch/<arch>/conf.

Sigh.  Another file.

>devsw_attach(9) and devsw_detach(9) is used to attach/detach the
>bdevsw/cdevsw data which is a set of device interface dynamically.

That's good.

>The framework of dynamic assignment of major number is very simple.
>         - Prepare two ranges (i.e. four variables) in the kernel.
>           One is the range from low of block major number to high.
>           Another is the range from low of character major number to high.
>         - if need to assign the number dynamically, search an empty point
>           from low to high.
>         - return the major number of that empty point on success
>           or -1 as an error.
>These ranges should be configurable via something like sysctl(8) and
>define the initial minimum and maximum of each variables.

And are placed in ioconf.c?  or someplace else?

>These provide greater flexibility and remove many macros and functions
>which just return an error.
>
>3.2.    Examples
>
>3.2.1.  Kernel
>
>- sys/uvm/uvm_device.c
>
>         struct uvm_object *
>         udv_attach(arg, accessprot, off, size)
>                 ...
>         {
>                 ...
>
>                 mapfn = cdevsw[major(device)].d_mmap;
>                 if (mapfn == NULL ||
>                     mapfn == (paddr_t (*) __P((dev_t, off_t, int))) enodev ||
>                     mapfn == (paddr_t (*) __P((dev_t, off_t, int))) nullop)
>                         return (NULL);
>
>                 ...
>         }
>
>                                 ||
>                                 \/
>
>         struct uvm_object *
>         udv_attach(arg, accessprot, off, size)
>                 ...
>         {
>                 struct cdevsw *cdev;

This should be const.

>                 ...
>
>                 cdev = devsw_get(major(device), DEVCHR);
>                 if (cdev == NULL)
>                         return (NULL);
>                 mapfn = cdev->d_mmap;
>                 if (mapfn == NULL || mapfn == nommap || mapfn == nullmmap)
>                         return (NULL);
>
>                 ...
>         }
>
>         (nommap and nullmmap are aliases and are defined in sys/conf.h.)
>
>- sys/miscfs/fdesc/fdesc_vnops.c
>
>         void
>         fdesc_init()
>         {
>                 int cttymajor;
>
>                 ...
>
>                 /* locate the major number */
>                 for (cttymajor = 0; cttymajor < nchrdev; cttymajor++)
>                         if (cdevsw[cttymajor].d_open == cttyopen)
>                                 break;
>                 devctty = makedev(cttymajor, 0);
>
>                 ...
>         }
>
>                                 ||
>                                 \/
>
>         struct cdevsw *devswctty;
>
>         void
>         fdesc_init()
>         {
>                 ...
>
>                 devswctty = devsw_lookup("ctty", DEVCHR);
>                 if (devswctty == NULL)
>                         panic("not found cdevsw of ctty");
>                 devctty = makedev(devswctty->d_major, 0);
>
>                 ...
>         }

This seems wrong with me.  But I don't have a better alternative
yet.,

>3.2.2.  Userland - config(8)
>
>The 'fd' driver have both bdevsw/cdevsw interfaces.
>If 'fd' is defined in your kernel configuration file,
>config(8) generates below:
>
>- ioconf.c
>         extern bdevsw fd_bdevsw;
>         extern cdevsw fd_cdevsw;

Should be const.

>         struct bdevsw *bdevsw0[] = {

ditto

>                 ...
>                 &fd_bdevsw,
>                 ...
>         };
>
>         struct cdevsw *cdevsw0[] = {

ditto.

>                 ...
>                 &fd_cdevsw,
>                 ...
>         };
>
>         struct bdevsw **bdevsw = bdevsw0;
>         struct cdevsw **cdevsw = cdevsw0;
>
>If not, each entries are filled out by NULL.
>
>(The current 'fd' driver don't have fd_bdevsw and fd_cdevsw,
>  must be defined.)

Somehow I think this needs to connected to struct cfdriver but
the don't exist for pseudo-devices (maybe they should).

>4.      Synopsis
>
>4.1.    Kernel
>
>4.1.1.  Device Switch Types
>
>enum devswtype { DEVBLK, DEVCHR };
>
>DEVBLK          - Block device switch
>DEVCHR          - Character device switch
>
>4.1.2.  New entries of bdevsw/cdevsw
>
>d_major         - An alias to the major number to avoid linear searching 
>tables
>                   (should be -1 in the source. In initialization phase or
>                    dynamically attach phase, filled out by proper value.)

Does really need to be in the devsw?  Considering how little it will
be used, it could be calculated and would allow the devsw to read-only.

>d_name          - An alias of bdevsw/cdevsw to search by name
>                   (should be same as a devbase and be null-teminated.)
>
>4.1.3.  Variables
>
>bdevsw_dynlow   - Low of block major number
>bdevsw_dynhigh  - High of block major number
>cdevsw_dynlow   - Low of character major number
>cdevsw_dynhigh  - High of character major number
>
>4.2.    Userland - config(8)
>
>4.2.1.  Grammer
>
>device-switch <name> chr <num> [blk <num>] [<options>]
>
>name            - The prefix of bdevsw/cdevsw entry (required)
>chr             - A character major number (required)
>blk             - A block major number (optional)
>options         - Conditions to determine whether should be attached or not
>                   (optional)

options are like pty or fd or scsibus?
If you have multi device-switch for the same name, will the latter
take priority?

>4.2.2.  Structures and Variables
>
>struct devswi {
>         struct devswi   *di_next;       /* linked list */
>         const char      *di_srcfile;    /* the name of the "devsw" file */
>         u_short         di_srcline;     /* the line number */
>         const char      *di_name;       /* [bc]devsw name */
>         int             di_cmajor;      /* character major */
>         int             di_bmajor;      /* block major */
>         struct nvlist   *di_opts;       /* options */
>};

Inside config?  ok.

>struct devswi *alldevswi; /* list of all device switch */
>struct devswi **nextdevswi; /* to construct a linked list */
>
>struct hashtab *cdevswitab; /* character devswi lookup */
>struct hashtab *bdevswitab; /* block devswi lookup */
>
>int maxcdevswi; /* max number of character major */
>int maxbdevswi; /* max number of block major */
>
>4.3.    Functions
>
>4.3.1.  Kernel
>
>         void devsw_init(void);
>         void *devsw_get(int maj, enum devswtype type);
>         void *devsw_lookup(const char *name, enum devswtype type);
>         dev_t devsw_chr2blk_dev(dev_t chrdev);
>         dev_t devsw_blk2chr_dev(dev_t blkdev);
>
>         int devsw_attach(void *devsw, int maj, enum devswtype type);
>         void devsw_detach(void *devsw, enum devswtype type);
>
>4.3.2.  Userland - config(8)
>
>         int adddevsw(const char *name, int cmaj, int bmaj, struct nvlist 
> *opts);
>         int emitdevsw(FILE *fp);
>         int fixdevsw(void);
>
>
>5.      Description
>
>5.1.    New Functionality
>
>5.1.1.  Kernel
>
>void devsw_init(void);
>
>Initialize major number (d_major). In order to use major number,
>this function should be called in main(). All functions related with
>device switch must be called after calling this.
>
>
>void *devsw_get(int maj, enum devswtype type);
>
>Get a device switch associated with the major number 'maj' and the device
>switch type 'type'. The 'type' is used to identify what the device switch is.
>Return the device switch on success. Otherwise, return NULL.
>
>
>void *devsw_lookup(const char *name, enum devswtype type);
>
>Get a device switch associated with the name 'name' and the device switch type
>'type'. The 'type' is used to identify what the device switch is.
>Return the device switch on success. Otherwise, return NULL.
>
>
>dev_t devsw_chr2blk_dev(dev_t chrdev);
>
>Convert from character dev_t to block dev_t.
>Return the valid dev_t (!= NODEV) on success. Otherwise return NODEV.
>
>
>dev_t devsw_blk2chr_dev(dev_t blkdev);
>
>Convert from block dev_t to character dev_t.
>Return the valid dev_t (!= NODEV) on success. Otherwise return NODEV.
>
>
>int devsw_attach(void *devsw, int maj, enum devswtype type);
>
>Attach a device switch 'devsw' associated with the major number 'maj'
>and the device switch type 'type'. If 'maj' is -1, allocate a major number
>dynamically and fill out the d_major field. Return 0 on success or
>an error value.
>
>
>void devsw_detach(void *devsw, enum devswtype type);
>
>Detach a device switch 'devsw' associated with the device switch type 'type'.
>Return 0 on success or an error value.
>
>5.1.2.  Userland - config(8)
>
>int adddevsw(const char *name, int cmaj, int bmaj, struct nvlist *opts);
>
>Make a node which is associated with the name 'name' and character major
>number 'cmaj' and block major number 'bmaj' and the options 'opts'.
>The options are used to determine whether this device switch should be
>attached or not in fixdevsw().
>
>
>int emitdevsw(FILE *fp);
>
>Generate initial bdevsw/cdevsw tables, nbdevsw, ncdevsw, swapdev and memdev.
>
>nbdevsw - # of bdevsw (i.e. initial bdevsw table size)
>ncdevsw - # of cdevsw (i.e. initial cdevsw table size)
>swapdev - a fake swap device
>memdev  - a fake memory device
>
>
>int fixdevsw(void);
>
>Determine which device switch should be attached.
>
>
>6.      Implementation
>
>6.1.    Patch Kit
>
>All features have already been implemented.
>The current patch kit is available at:
>
>http://www.kbug.gr.jp/~maekawa/patches/cdevsw-20010626a.diff.gz
>
>SHA1 (cdevsw-20010626a.diff.gz) = 6e1065040a783d9155e29d416ab1d55521ef1e9c
>
>This patch kit is based on the source "2001/06/21 00:00:00 JST".
>Now trying to catch up with "2001/06/26 00:00:00 JST".
>
>6.2.    Supported Ports
>
>All Ports!
>
>(algor, alpha, amiga, amigappc, arc, arm26, arm32, atari, bebox, cats, cesfic,
>cobalt, dnard, dreamcast, evbsh3, hp300, hpcarm, hpcmips, hpcsh, i386,
>luna68k, mac68k, macppc, mipsco, mmeye, mvme68k, netwinder, news68k, newsmips,
>next68k, ofppc, pc532, pmax, prep, sandpoint, sgimips, sparc, sparc64, sun2,
>sun3, vax, walnut, x68k and x86_64)
>
>alpha, amigappc, cats, dnard, netwinder, pc532, sparc64, sun2, vax, walnut and
>x86_64 are not compile tested. Maybe these have some typos and errors.
>
>i386 and hpcmips work fine.

This is real cool!
--
Matt Thomas               Internet:   matt@3am-software.com
3am Software Foundry      WWW URL:    http://www.3am-software.com/bio/matt/
Cupertino, CA             Disclaimer: I avow all knowledge of this message