Subject: Re: CFR: The Auto-Generation Block/Character Device Switch Tables
To: MAEKAWA Masahide <gehenna@NetBSD.org>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 05/09/2002 10:21:36
On Thu, 9 May 2002, MAEKAWA Masahide wrote:

> Here is a proposal for a framework to support the auto-generation of
> block/character device switch tables by config(8).

Overall, looks good. Comments where I think they're needed. :-)

>    The Auto-Generation Block/Character Device Switch Tables by config(8)
>
>
> 1.	Background
>
> It is too painful to maintain port-dependent conf.c, conf.h and sys/conf.h.

Well, not "too" painful because we do it now. But especially for all of
our ports, it is a real pain.

> 2.	Current Implementation
>
> Now we have block/character device switch (bdevsw, cdevsw) tables in
> sys/arch/<ARCH>/<ARCH>/conf.c as an array defined statically and
> maintain by our own hands. Each entries are filled out by device interfaces
> (i.e. open, close, etc...) using macros. That many macros are defined for
> convenience in sys/conf.h and machine/conf.h.
>
> Whether their entries are active is determined in compile time by NXXX
> which is generated by config(8). In addition, many functions which just
> return EXXX value defined in errno.h as an error are in many device drivers.
>
> I have one basic questions.
>
> What do you think that many macros for each devices are defined in sys/conf.h
> or machine/conf.h? In addition, in order to use that macros compels us to
> define many new functions same as nullop/enodev and kludge aliases by #define.
> In order to just maintain sys/arch/<ARCH>/<ARCH>/conf.c by hands, what that
> kludge hacks/macros are used is a Bad Thing, right?

I'm sorry, that didn't parse.

> 3.	Summary
>
> 3.1.	Ideas
>
> The framework needs to support both static and dynamic assignment of the
> device majors. To realize the latter, the initial bdevsw/cdevsw tables are
> generated automatically by config(8).
>
> For kernel:
> In the current implementation, each bdevsw/cdevsw entry (a set of device
> interface functions such as open, close, etc.) is embedded in the device
> switch definition (conf.c). Instead, it is modified to be distributed in
> the corresponding device driver (backend) source as a constant data.
> Because of this, when you write a new device driver, all you have to do
> to the machine-dependent part is to add one line to the 'majors' file of
> each port (see below) and config files.
>
> The interface functions are not global but local in their source. They
> should be always called via the device switch. It's a bad idea to call
> them directly from outside of the driver. To get the device switch entry
> corresponding to a specific device, devsw_lookup(9) function is introduced.
> Similarly, newly added devsw_lookup_major(9) can be used to get the major
> number of a specific device.

Why? We've always let the routines be global before. Why should we stop
that now? Also, with the above change, you're doing more than just change
how we handle b/cdevsw, you're forcing anything which calls routines
directly to now have to know about major numbers. Also, you're forcing
major numbers (for these rare cases) to have to be made global. :-)

That strikes me as a bad thing.

> For config(8):
> In order to support this feature, a new grammer 'device-major' is added to
> 'files'.  All 'device-major' is put within the new machine-dependent file
> 'majors.<ARCH>' under sys/arch/<ARCH>/conf, which is included from
> 'files.<ARCH>'. This is the only file which contains the device number
> definitions.
>
> To support the dynamic assignment of the device major, devsw_attach(9) and
> devsw_detach(9) are added; these can be used to attach/detach the device
> switch data dynamically instead of memcpy. These functions are useful for
> LKM framework.

Just to make sure I understand, compiled-in drivers won't need
devsw_attach(), correct?

> These features provide greater flexibility and make less pain to maintain
> device majors and remove many macros and many functions which just return
> an error, cdev_*_init, bdev_*_init and so on. We can get simple port-dependent
> conf.c and sys/conf.h and machine/conf.h.
>
> IMPORTANT:
> 	I DON'T merge bdevsw/cdevsw into a single structure.
> 	I have tried to merge them at first, but that have big
> 	impacts/afftects seriously. So at this time, I decided
> 	NOT to merge them. If necessary, we should discuss about
> 	this in OTHER thread. Even if conclude to merge them into a
> 	single structure in that discussion, my proposal affects nothing.
>
> 3.2.	Examples
>
> 3.2.1.	Kernel
>
> Before:
> 	(foo.c)
> 		foo_open(...) {
> 			...
> 		}
> 	(bar.c)
> 		extern foo_open();
>
> 		bar() {
> 			...
> 			foo_open(...)
> 			...
> 		}

I don't think that's the current case. For places where we are given the
major number, like in spec_open(), we do:  (*cdevsw[maj].d_open)(dev, ...)
Especially in terms of call usage, I expect the vast majority of calls to
go through the switch tables.

I would caution you that in places where drivers are making explicit calls
to open/close/ioctl routines (i.e. where it's specifically foo_open, not
jumping though the devsw), because chances are it REALLY wants that
routine. I don't see what we gain by making such code have to now know
what major number to use to find said driver.

> After:
> 	(foo.c)
> 		const struct cdevsw foo_cdevsw = {
> 			foo_open, ...
> 		};
>
> 		foo_open(...) {
> 			...
> 		}
> 	(bar.c)
> 		bar() {
> 			const struct cdevsw *cdev;
>
> 			...
> 			cdev = devsw_lookup(dev_t, DEVCHR);
> 			(*cdev->d_open)(...)
> 			...
> 		}
>
> 	If not available major numbers,
>
> 		extern const struct cdevsw foo_cdevsw;
>
> 		bar() {
> 			...
> 			(*foo_cdevsw.d_open)(...)
> 			...
> 		}

Well, that is one way to do it. But is exporting struct b/cdevsw really
that different from exporting the routines? Hmmm... Maybe it is.

> 3.2.2.	Userland - config(8)
>
> The 'fd' driver have device interfaces for block/character devices.
> If 'fd' is defined in your kernel configuration file,
> config(8) generates below:
>
> - devsw.c
> 	extern const struct bdevsw fd_bdevsw;
> 	extern const struct cdevsw fd_cdevsw;
>
> 	const struct bdevsw *bdevsw0[] = {
> 		...
> 		&fd_bdevsw,
> 		...
> 	};
>
> 	const struct cdevsw *cdevsw0[] = {
> 		...
> 		&fd_cdevsw,
> 		...
> 	};
>
> 	const struct bdevsw **bdevsw = bdevsw0;
> 	const struct cdevsw **cdevsw = cdevsw0;
>
> If not, each entries are filled out by NULL.
>
> Here, fd_cdevsw/fd_bdevsw must be provided by the fd driver. So, we need to
> add the definision of these data to "fd" driver source. Similarly, any other
> devices have to provide their own device switches.
>
>
> 4.	Synopsis

[snip]

> 5.	Description
>
> 5.1.	New Functionality
>
> 5.1.1.	Kernel
>
>
> const void *devsw_lookup(dev_t dev, enum devswtype type);
>
> Get a device switch associated with the dev_t 'dev' and the device switch
> type 'type'. The 'type' determines which device switches to be looked up.
> In the internal of this function, get the major number from 'dev' by using
> major() macro. Return the device switch on success. Otherwise, return NULL.
>
>
> int devsw_lookup_major(const void *devsw, enum devswtype type);
>
> Get a device major number associated with the device switch 'devsw' and
> the device switch type 'type'. The 'type' determines which device switches
> to be looked up. Return the device switch on success. Otherwise, return NULL.

Uhm, you can't return NULL from a function that returns int. ;-)

> dev_t devsw_chr2blk_dev(dev_t chrdev);
>
> Convert from character dev_t to block dev_t.
> Return the valid dev_t (!= NODEV) on success. Otherwise return NODEV.
>
>
> dev_t devsw_blk2chr_dev(dev_t blkdev);
>
> Convert from block dev_t to character dev_t.
> Return the valid dev_t (!= NODEV) on success. Otherwise return NODEV.
>
>
> int devsw_attach(const void *devsw, int *maj, enum devswtype type);
>
> Attach a device switch 'devsw' associated with the major number '*maj'
> and the device switch type 'type'. If '*maj' is -1, allocate a major number
> dynamically and stored allocated number in '*maj'. Return 0 on success or
> an error value.

This interface I don't like. You should pass in both the character & block
devsws and maj int pointers at once; once call does it all. If you do
devsw_attach as two steps (one for character, one for block) how do you
know you should assosciate the character & block devices together?

I think a better signature would be:

int devsw_attach(const cdevsw *cdevsw, int *cret, const bdevsw *bdevsw,
	int *bret)

> void devsw_detach(const void *devsw, enum devswtype type);
>
> Detach a device switch 'devsw' associated with the device switch type 'type'.
>
> 5.1.2.	Userland - config(8)

[snip]

> 6. Compatibility
>
> These changes break a compatibility of LM_DT_BLOCK/LM_DT_CHAR in LKM framework.
> (This feature is used to attach the device switch in run-time.)

Don't worry about that. Break them. LKMs are only good for the specific
kernel version they were built against. Far less invasive changes have
broken lkm compatability in the past. :-)

I think dynamic major number support is well worth the code rototillage
this would entail.

> In the current implementation, each LKM drivers search the reserved entry for
> LKM directly and change device switch tables by using memcpy().
> The device switches are defined in each kernel module sources.
>
> In this proposal, all device switches must be defined in original driver
> source for static-linked kernel. So we don't need to define a device switch
> in kernel module source. And we don't need to lookup device switch tables
> directly to attach the device switch. Just use devsw_attach(9).

One other aspect of lkms now is that they will only load in lkm device
slots, bdev_lkm_dummy or cdev_lkm_dummy. It would be nice to retain that
behavior. Or at least on a port with N lkm slots, it would be nice for the
first N lkms to use them. Otherwise we have problems with adding devices
at major numbers for which we don't have device nodes.

> If try to load the some old LKM drivers withour any hacks for LKM,
> we can see the terrible disaster. To protect the kernel from the disaster,
> bump the LKM_VERSION in sys/lkm.h and MUST re-make the LKM drivers.
>
> Fortunately, this LKM feature (LM_DT_BLOCK/LM_DT_CHAR) is used by only
> ipfilter(4) in current NetBSD tree. It has already been rewritten,
> but not tested yet.
>
> The iwm_fd driver on NetBSD/mac68k also changes device switch tables.
> But this driver chages tables directly by using memcpy() with hard-coded
> device majors, not using this LKM feature. There is no good solutions for this.
> If want to attach the device switches to the kernel, we should use the
> LM_DT_BLOCK/LM_DT_CHAR feature and free from hard-coded device majors.

1) Ask Scott if we still need this lkm. I believe it predates our having
an iwm driver in-tree. 2) change it to use dynamic numbers. :-)

Take care,

Bill