Subject: Call for review: The New Block/Character Device Switch
To: None <tech-kern@NetBSD.org>
From: MAEKAWA Masahide <maekawa@math.sci.kobe-u.ac.jp>
List: tech-kern
Date: 06/28/2001 13:25:36
Here is a proposal for a framework to support the auto-generation of
bdevsw/cdevsw table by config(8) and the dynamic assignment of major number.



       The New Block/Character Device Switch Configuration Framework


1.	Background

We don't have the complete dynamic configuration framework yet.
In order to support it, we must replace many places in the kernel
from 'not-configurable' to 'configurable' in runtime.

As a first step, try to make bdevsw/cdevsw tables 'configurable'.


2.	Current Implementation

We have bdevsw/cdevsw tables in sys/arch/<arch>/<arch>/conf.c
as an array defined statically and maintain by our own hands.
Each entries are filled out by interfaces (i.e. open, close, etc...).
Whether they are active or not is determined in compile time by NXXX.
And many macros are defined for convenience in sys/conf.h
and machine/conf.h and many functions which just return an error are
in many device drivers.

- The bdevsw/cdevsw tables MUST be configurable dynamically to implement
  the dynamic configuration framework.

- Are interfaces to each devices necessary to be global? What do you think
  that exists kludge hacks to avoid namespace conflicts?

- What do you think that many macros for each devices are defined in sys/conf.h
  or machine/conf.h? In addition, in order to use that macros compels us to
  define many new functions same as nullop/enodev and kludge aliases.

- In order to maintain sys/arch/<arch>/<arch>/conf.c by hands
  to use that kludge hacks/macros is a Bad Thing, right?

- The current implementation have a possibility to conflict major number
  between 3rd party drivers and in-kernel drivers.


3.	Summary

3.1.	Ideas

The bdevsw/cdevsw tables are dealed with in the abstract to support
to attach/detach the bdevsw/cdevsw data dynamically. Each drivers should
use devsw_get(9) and devsw_lookup(9) to get the device interface instead of
accessing them directly.

The initial bdevsw/cdevsw tables are generated automatically by config(8)
Each bdevsw/cdevsw entry is a set of device interface functions and
is put as a structure data in each drivers source.
The structure data only refers to bdevsw/cdevsw interfaces. Now each
interfaces are not global functions but local functions in their source.
I.e. all interfaces are declared as a static function. If necessary to access
the interface, we must access it via its structure data. Again, we must not
access the interfaces directly now.

In order to support this feature, add the new grammer 'device-switch'
to 'files'. In my current implementation, all 'device-switch' is in
the new file 'devsw.<arch>' under sys/arch/<arch>/conf.

devsw_attach(9) and devsw_detach(9) is used to attach/detach the
bdevsw/cdevsw data which is a set of device interface dynamically.

The framework of dynamic assignment of major number is very simple.
	- Prepare two ranges (i.e. four variables) in the kernel.
	  One is the range from low of block major number to high.
	  Another is the range from low of character major number to high.
	- if need to assign the number dynamically, search an empty point
	  from low to high.
	- return the major number of that empty point on success
	  or -1 as an error.
These ranges should be configurable via something like sysctl(8) and
define the initial minimum and maximum of each variables.

These provide greater flexibility and remove many macros and functions
which just return an error.

3.2.	Examples

3.2.1.	Kernel

- sys/uvm/uvm_device.c

	struct uvm_object *
	udv_attach(arg, accessprot, off, size)
		...
	{
		...

		mapfn = cdevsw[major(device)].d_mmap;
		if (mapfn == NULL ||
		    mapfn == (paddr_t (*) __P((dev_t, off_t, int))) enodev ||
		    mapfn == (paddr_t (*) __P((dev_t, off_t, int))) nullop)
			return (NULL);

		...
	}

				||
				\/

	struct uvm_object *
	udv_attach(arg, accessprot, off, size)
		...
	{
		struct cdevsw *cdev;
		...

		cdev = devsw_get(major(device), DEVCHR);
		if (cdev == NULL)
			return (NULL);
		mapfn = cdev->d_mmap;
		if (mapfn == NULL || mapfn == nommap || mapfn == nullmmap)
			return (NULL);

		...
	}

	(nommap and nullmmap are aliases and are defined in sys/conf.h.)

- sys/miscfs/fdesc/fdesc_vnops.c

	void
	fdesc_init()
	{
		int cttymajor;

		...

		/* locate the major number */
		for (cttymajor = 0; cttymajor < nchrdev; cttymajor++)
			if (cdevsw[cttymajor].d_open == cttyopen)
				break;
		devctty = makedev(cttymajor, 0);

		...
	}

				||
				\/

	struct cdevsw *devswctty;

	void
	fdesc_init()
	{
		...

		devswctty = devsw_lookup("ctty", DEVCHR);
		if (devswctty == NULL)
			panic("not found cdevsw of ctty");
		devctty = makedev(devswctty->d_major, 0);

		...
	}

3.2.2.	Userland - config(8)

The 'fd' driver have both bdevsw/cdevsw interfaces.
If 'fd' is defined in your kernel configuration file,
config(8) generates below:

- ioconf.c
	extern bdevsw fd_bdevsw;
	extern cdevsw fd_cdevsw;

	struct bdevsw *bdevsw0[] = {
		...
		&fd_bdevsw,
		...
	};

	struct cdevsw *cdevsw0[] = {
		...
		&fd_cdevsw,
		...
	};

	struct bdevsw **bdevsw = bdevsw0;
	struct cdevsw **cdevsw = cdevsw0;

If not, each entries are filled out by NULL.

(The current 'fd' driver don't have fd_bdevsw and fd_cdevsw,
 must be defined.)


4.	Synopsis

4.1.	Kernel

4.1.1.	Device Switch Types

enum devswtype { DEVBLK, DEVCHR };

DEVBLK		- Block device switch
DEVCHR		- Character device switch

4.1.2.	New entries of bdevsw/cdevsw

d_major		- An alias to the major number to avoid linear searching tables
		  (should be -1 in the source. In initialization phase or
		   dynamically attach phase, filled out by proper value.)
d_name		- An alias of bdevsw/cdevsw to search by name
		  (should be same as a devbase and be null-teminated.)

4.1.3.	Variables

bdevsw_dynlow	- Low of block major number
bdevsw_dynhigh	- High of block major number
cdevsw_dynlow	- Low of character major number
cdevsw_dynhigh	- High of character major number

4.2.	Userland - config(8)

4.2.1.	Grammer

device-switch <name> chr <num> [blk <num>] [<options>]

name		- The prefix of bdevsw/cdevsw entry (required)
chr		- A character major number (required)
blk		- A block major number (optional)
options		- Conditions to determine whether should be attached or not
		  (optional)

4.2.2.	Structures and Variables

struct devswi {
	struct devswi	*di_next;	/* linked list */
	const char	*di_srcfile;	/* the name of the "devsw" file */
	u_short		di_srcline;	/* the line number */
	const char	*di_name;	/* [bc]devsw name */
	int		di_cmajor;	/* character major */
	int		di_bmajor;	/* block major */
	struct nvlist	*di_opts;	/* options */
};

struct devswi *alldevswi; /* list of all device switch */
struct devswi **nextdevswi; /* to construct a linked list */

struct hashtab *cdevswitab; /* character devswi lookup */
struct hashtab *bdevswitab; /* block devswi lookup */

int maxcdevswi; /* max number of character major */
int maxbdevswi; /* max number of block major */

4.3.	Functions

4.3.1.	Kernel

	void devsw_init(void);
	void *devsw_get(int maj, enum devswtype type);
	void *devsw_lookup(const char *name, enum devswtype type);
	dev_t devsw_chr2blk_dev(dev_t chrdev);
	dev_t devsw_blk2chr_dev(dev_t blkdev);

	int devsw_attach(void *devsw, int maj, enum devswtype type);
	void devsw_detach(void *devsw, enum devswtype type);

4.3.2.	Userland - config(8)

	int adddevsw(const char *name, int cmaj, int bmaj, struct nvlist *opts);
	int emitdevsw(FILE *fp);
	int fixdevsw(void);


5.	Description

5.1.	New Functionality

5.1.1.	Kernel

void devsw_init(void);

Initialize major number (d_major). In order to use major number,
this function should be called in main(). All functions related with
device switch must be called after calling this.


void *devsw_get(int maj, enum devswtype type);

Get a device switch associated with the major number 'maj' and the device
switch type 'type'. The 'type' is used to identify what the device switch is.
Return the device switch on success. Otherwise, return NULL.


void *devsw_lookup(const char *name, enum devswtype type);

Get a device switch associated with the name 'name' and the device switch type
'type'. The 'type' is used to identify what the device switch is.
Return the device switch on success. Otherwise, return NULL.


dev_t devsw_chr2blk_dev(dev_t chrdev);

Convert from character dev_t to block dev_t.
Return the valid dev_t (!= NODEV) on success. Otherwise return NODEV.


dev_t devsw_blk2chr_dev(dev_t blkdev);

Convert from block dev_t to character dev_t.
Return the valid dev_t (!= NODEV) on success. Otherwise return NODEV.


int devsw_attach(void *devsw, int maj, enum devswtype type);

Attach a device switch 'devsw' associated with the major number 'maj'
and the device switch type 'type'. If 'maj' is -1, allocate a major number
dynamically and fill out the d_major field. Return 0 on success or
an error value.


void devsw_detach(void *devsw, enum devswtype type);

Detach a device switch 'devsw' associated with the device switch type 'type'.
Return 0 on success or an error value.

5.1.2.	Userland - config(8)

int adddevsw(const char *name, int cmaj, int bmaj, struct nvlist *opts);

Make a node which is associated with the name 'name' and character major
number 'cmaj' and block major number 'bmaj' and the options 'opts'.
The options are used to determine whether this device switch should be
attached or not in fixdevsw().


int emitdevsw(FILE *fp);

Generate initial bdevsw/cdevsw tables, nbdevsw, ncdevsw, swapdev and memdev.

nbdevsw	- # of bdevsw (i.e. initial bdevsw table size)
ncdevsw	- # of cdevsw (i.e. initial cdevsw table size)
swapdev	- a fake swap device
memdev	- a fake memory device


int fixdevsw(void);

Determine which device switch should be attached.


6.	Implementation

6.1.	Patch Kit

All features have already been implemented.
The current patch kit is available at:

http://www.kbug.gr.jp/~maekawa/patches/cdevsw-20010626a.diff.gz

SHA1 (cdevsw-20010626a.diff.gz) = 6e1065040a783d9155e29d416ab1d55521ef1e9c

This patch kit is based on the source "2001/06/21 00:00:00 JST".
Now trying to catch up with "2001/06/26 00:00:00 JST".

6.2.	Supported Ports

All Ports!

(algor, alpha, amiga, amigappc, arc, arm26, arm32, atari, bebox, cats, cesfic,
cobalt, dnard, dreamcast, evbsh3, hp300, hpcarm, hpcmips, hpcsh, i386,
luna68k, mac68k, macppc, mipsco, mmeye, mvme68k, netwinder, news68k, newsmips,
next68k, ofppc, pc532, pmax, prep, sandpoint, sgimips, sparc, sparc64, sun2,
sun3, vax, walnut, x68k and x86_64)

alpha, amigappc, cats, dnard, netwinder, pc532, sparc64, sun2, vax, walnut and
x86_64 are not compile tested. Maybe these have some typos and errors.

i386 and hpcmips work fine.

--- MAEKAWA Masahide (gehenna@NetBSD.org)