Subject: Re: Understanding foo_open, foo_read, etc.
To: None <tech-kern@NetBSD.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 08/29/2006 17:27:39
> In general, looking through the system, I find a number of drivers
> with functions with names like foo_open, foo_read, and foo_close.  In
> some cases, foo_read takes a 'struct file *' as its first argument;
> in some, it takes the device's softc.  I don't know why!

I think part of the problem here is the distinction between a special
device file and a non-vnode file descriptor.

A file descriptor is an index into the per-process open file table.
The entries in that table are the real objects behind file descriptors,
and are why, for example, there is a difference between opening a file
twice and opening it once and dup()ing: one way you get two open files
(even if they point to the same underlying file); the other way you get
two descriptors that refer to the same open file.
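
The open-twice versus dup() difference is visible from userland, because
the seek offset lives in the open file, not in the descriptor.  Here is
a minimal sketch using only POSIX calls; the helper name
second_fd_offset() and the temp file path are made up for illustration.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Read 4 bytes through one descriptor, then report the offset seen
 * through a second descriptor for the same file.  If use_dup is
 * nonzero the second descriptor comes from dup(); otherwise it comes
 * from a second open().  Returns -1 on error.
 */
long second_fd_offset(int use_dup)
{
	char path[] = "/tmp/fdemoXXXXXX";	/* hypothetical scratch file */
	char buf[4];
	long off = -1;
	int fd1, fd2;

	fd1 = mkstemp(path);
	if (fd1 < 0)
		return -1;
	if (write(fd1, "0123456789", 10) != 10 ||
	    lseek(fd1, 0, SEEK_SET) != 0)
		goto out;

	/* dup(): same open file.  open(): a second, independent one. */
	fd2 = use_dup ? dup(fd1) : open(path, O_RDONLY);
	if (fd2 < 0)
		goto out;

	if (read(fd1, buf, sizeof(buf)) == (ssize_t)sizeof(buf))
		off = (long)lseek(fd2, 0, SEEK_CUR);
	close(fd2);
out:
	close(fd1);
	unlink(path);
	return off;
}
```

With dup() the second descriptor sees the offset moved by the read on
the first; with a second open() it still sits at the start of the file.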

Now, file descriptor objects ("file table entries") have types.  The
commonest type is probably "vnode", which refers to a filesystem entity
in some filesystem.  Vnodes themselves have types, but let's hold that
in abeyance for the moment.

But there are other types of open file table objects.  "Socket" is
perhaps the next easiest to explain; in this case, the file table entry
points to a structure describing a socket, not a vnode, and the
routines called to do things with the file descriptor (read, write,
seek, ioctl, etc) are different.  On recent NetBSD "pipe" is a third
kind of open file table entry.

Now, let's go back to vnodes.  Vnodes have types.  The commonest type
may be "plain file", but there are also "directory" and, most
importantly at the moment, "device special file".  Device files are,
loosely speaking, what one finds in /dev, and they are what are usually
used to interface to hardware (/dev/ttyE0, /dev/wd0g, /dev/wsmouse).

Traditional drivers are the support code behind device special vnodes.
When you find

> For instance, rnd.c's rndread is:

> int rndread(dev_t dev, struct uio *uio, int ioflag);

this is the "read" routine for the driver backing /dev/random and
/dev/urandom.

The "drivers" that use struct file * arguments, such as you see for
soo_read() (sys/kern/sys_socket.c), are not what are traditionally
called drivers.  These are support not for device special vnodes but
rather for non-vnode open file table entries.

In the zaptel code you quote

> 	dev_type_open(ztopen);
> 	const struct cdevsw zaptel_cdevsw = {
> 		ztopen, noclose, noread, nowrite, noioctl, nostop, notty,
> 		nopoll, nommap, nokqfilter, 0
> 	};

> Then, moments later:

> 	static struct fileops zt_fileops = {
> 		.fo_close = zt_close,
> 		.fo_ioctl = zt_ioctl,
> 	#ifdef __NetBSD__
> 		.fo_fcntl = zt_fcntl,
> 	#endif
> 		.fo_read = zt_read,
> 		.fo_write = zt_write,
> 		.fo_poll =  zt_poll,
> 		.fo_stat = zt_stat,
> 	#ifndef __NetBSD__
> 		.fo_kqfilter = zt_kqfilter
> 	#endif
> 	};

you have both.  The struct cdevsw describes the "device special file"
interface; the struct fileops describes the "open file table entry"
interface.
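
To make the two layers concrete, here is a toy userland model.  It is
only a sketch: every toy_* name is invented, and the signatures are far
simpler than the kernel's real struct fileops and struct cdevsw entries.
The point is just who gets called with what: a fileops routine receives
the open file table entry, while a cdevsw routine receives a device
number and is reached only through the generic vnode code.

```c
#include <stddef.h>
#include <string.h>

/* All names here are hypothetical; they only mimic the kernel's shape. */
struct toy_file;

/* Per-open-file operations: the analogue of struct fileops. */
struct toy_fileops {
	int (*fo_read)(struct toy_file *fp, char *buf, size_t len);
};

/* Per-device operations: the analogue of struct cdevsw. */
struct toy_cdevsw {
	int (*d_read)(int dev, char *buf, size_t len);
};

/* An open file table entry: an ops vector plus type-specific data. */
struct toy_file {
	const struct toy_fileops *f_ops;
	void *f_data;
};

/* A device special vnode only has to remember which device it names. */
struct toy_vnode {
	int v_rdev;		/* index into toy_cdevsw_table */
};

/* A "driver" read routine in the style of rndread(). */
static int toy_rnd_read(int dev, char *buf, size_t len)
{
	(void)dev;
	memset(buf, 'r', len);
	return (int)len;
}

static const struct toy_cdevsw toy_rnd_cdevsw = { toy_rnd_read };
static const struct toy_cdevsw *toy_cdevsw_table[] = { &toy_rnd_cdevsw };

/* fileops for vnode-type entries: generic code forwarding to the driver. */
static int toy_vn_read(struct toy_file *fp, char *buf, size_t len)
{
	struct toy_vnode *vp = fp->f_data;

	return toy_cdevsw_table[vp->v_rdev]->d_read(vp->v_rdev, buf, len);
}

/* fileops for a non-vnode entry: no vnode and no cdevsw involved. */
static int toy_priv_read(struct toy_file *fp, char *buf, size_t len)
{
	const char *msg = fp->f_data;
	size_t n = strlen(msg) < len ? strlen(msg) : len;

	memcpy(buf, msg, n);
	return (int)n;
}

const struct toy_fileops toy_vnops = { toy_vn_read };
const struct toy_fileops toy_privops = { toy_priv_read };

/* What read(2) does once it has looked the descriptor up in the table. */
int toy_sys_read(struct toy_file *fp, char *buf, size_t len)
{
	return fp->f_ops->fo_read(fp, buf, len);
}

/* Read one byte through a vnode-type entry naming device 0. */
int toy_demo_vnode_read(void)
{
	struct toy_vnode vn = { 0 };
	struct toy_file f = { &toy_vnops, &vn };
	char c = 0;

	return toy_sys_read(&f, &c, 1) == 1 ? c : -1;
}

/* Read one byte through a non-vnode entry whose private data is "zt". */
int toy_demo_private_read(void)
{
	struct toy_file f = { &toy_privops, (void *)"zt" };
	char c = 0;

	return toy_sys_read(&f, &c, 1) == 1 ? c : -1;
}
```

Both demo helpers go through the same toy_sys_read(); only the ops
vector in the file table entry decides whether a cdevsw is ever
consulted.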

> I don't think I understand this.  What are the circumstances under
> which one would fill in 'file *' objects with fileops structures,
> instead of a cdevsw?

This question almost doesn't make sense; it's a conceptual type clash,
a bit like asking under what circumstances would one speak "without"
instead of "English".  You have to use a fileops when filling in a
struct file *; you have to use a cdevsw when setting up a device
special vnode.  (I'm ignoring, for purposes of this email, the
distinction between a block device (bdevsw) and a character device
(cdevsw); it's largely a historical artifact and would only confuse an
already confusing thing further.  I can go into it if you want.)

Without knowing more about the zaptel code, I can't be sure, but my
guess is that there is a special device interface, and under some
circumstances, this provokes the creation of new open file table
entries, which are their own kind of open file table entry, using
zt_fileops, rather than being vnodes or sockets or whatever.

As for why they do this, I can only speculate.  My guess is that
it's a way to do something like cloning devices, but in a less
OS-dependent way than actually using the OS's cloning device support.

> My vague theory is that, if functions matching the cdevsw prototypes
> were created and put into the cdevsw structure, it would not be
> necessary to allocate a file and fill it with file ops, because the
> device would somehow magically know to use its cdevsw operations.

Something like this.  More precisely, when a device special file is
opened, the generic support calls the vnode open routine, which creates
a device special vnode corresponding to the device in the filesystem.
It then calls the driver for that device (through the cdevsw struct) to
do any device-specific stuff.  Then, if that succeeds, it constructs an
open file table entry of type vnode, pointing to the vnode, and that's
what goes in the table slot for the returned file descriptor.  You
see only the device-specific part of this; the rest is generic code,
some of it in filesystem-independent high kernel code, some of it in the
per-filesystem code for the filesystem the device lives in.

Creating open file table entries of other types is not done by opening
things with open() or its relatives (again, I'm simplifying - cloning
device and /dev/fd support complicate this more than I want to get into
here).  Traditionally, each type of open file object has its own
syscalls, such as socket() for sockets and pipe() for pipes.  In the
case of the zaptel code, my guess (and it is just a guess) would be
that something in the device special driver - an ioctl, maybe -
provokes creation of a new open file table entry using zt_fileops,
which is then returned (in the form of a file descriptor) to userland
for future use.
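
Purely to illustrate that guess (this is not zaptel's code, and not
real kernel code either), here is a toy userland model of a device
ioctl that allocates a fresh file table entry with its own ops vector
and hands the new descriptor back.  Every mock_* name, the table size,
and the MOCK_IOC_NEWCHAN command are invented for the sketch.

```c
#include <stddef.h>

#define MOCK_NOFILE	8		/* hypothetical table size */
#define MOCK_IOC_NEWCHAN 1UL		/* hypothetical ioctl command */

struct mock_file;

struct mock_fileops {
	int (*fo_ioctl)(struct mock_file *fp, unsigned long cmd);
};

struct mock_file {
	const struct mock_fileops *f_ops;
	int f_chan;			/* private per-open-file data */
};

/* One process's descriptor table, plus backing storage for entries. */
static struct mock_file mock_files[MOCK_NOFILE];
static struct mock_file *mock_fdtable[MOCK_NOFILE];

/* Stand-in for the kernel's descriptor allocator: lowest free slot. */
static int mock_falloc(struct mock_file **fpp)
{
	for (int fd = 0; fd < MOCK_NOFILE; fd++) {
		if (mock_fdtable[fd] == NULL) {
			mock_fdtable[fd] = &mock_files[fd];
			*fpp = mock_fdtable[fd];
			return fd;
		}
	}
	return -1;
}

/* ioctl routine for the new kind of entry; it just reports its channel. */
static int mock_chan_ioctl(struct mock_file *fp, unsigned long cmd)
{
	(void)cmd;
	return fp->f_chan;
}

static const struct mock_fileops mock_chan_fileops = { mock_chan_ioctl };

/*
 * Device-side ioctl (in a real kernel, reached via the cdevsw).  It
 * creates a new open file table entry of its own kind and returns the
 * descriptor for it, or -1 on failure.
 */
int mock_dev_ioctl(unsigned long cmd, int chan)
{
	struct mock_file *fp;
	int fd;

	if (cmd != MOCK_IOC_NEWCHAN)
		return -1;
	fd = mock_falloc(&fp);
	if (fd < 0)
		return -1;
	fp->f_ops = &mock_chan_fileops;
	fp->f_chan = chan;
	return fd;
}

/* Later ioctls on the returned descriptor dispatch through its fileops. */
int mock_fd_ioctl(int fd, unsigned long cmd)
{
	if (fd < 0 || fd >= MOCK_NOFILE || mock_fdtable[fd] == NULL)
		return -1;
	return mock_fdtable[fd]->f_ops->fo_ioctl(mock_fdtable[fd], cmd);
}
```

Once the descriptor exists, nothing done through it touches the device
special file's cdevsw again; everything goes via the entry's own ops.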

I hope this has been more illuminating than obscuring.  It's a bit
complicated, but then, it's a complicated subject.  I'll be happy to go
into more detail if you want, or take correction if I've botched
something in this description.

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B