tech-kern archive


raw disk device interface abstraction



We're all dancing around a very fundamental question here: what interface 
abstraction should the "raw" interface to a disk controller (and attached 
disks) present?

We're not going to allow userland to write device registers directly as a 
general practice, and userland code does not handle device interrupts; that's 
the kernel's job. (X11 is the glaring & horrible exception to UNIX rules: 
we've been unwilling to put a full graphics abstraction subsystem, with an 
appropriate userland API, into the kernel (too big! too ugly! no API 
agreement!), as we have with disks (filesystems), network interfaces (protocol 
stacks), and serial devices (tty line disciplines).)

We do generally allow userland to initiate DMA (through system calls) directly 
from userland memory - that's why the raw interface is generally faster than 
the block interface: no byte copying. Oh, and you get to do I/O in chunks 
larger than the block interface is designed for, provided that the device (and 
driver) permits it.

Then there's the whole addressing question. Disk blocks used to be addressed by 
cylinder/head/sector numbers, and the driver translated between block numbers 
and c/h/s; now, modern disks do that translation for us, and when asked about 
c/h/s they even lie to us to hide their guts (or to follow very old 
abstractions). And we're talking lately about disks with 4K native blocks 
rather than the traditional 512-byte blocks, though you've been able to format 
properly compliant SCSI disks to block sizes other than 512 bytes for 
decades.

However, even "blocks" are an abstraction - UNIX wants to address everything in 
bytes; just look at read(2), write(2), and lseek(2). No mention of "blocks" - 
bytes are the fundamental (atomic) data & address unit of the system. We 
translate that to everything else as required.
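
To make that translation concrete, here is a minimal sketch in C of turning a 
byte offset into a block address. The names (struct blk_addr, byte_to_block) 
and the idea of passing the sector size as a parameter are mine, purely for 
illustration; they are not taken from any existing driver.

```c
#include <stdint.h>

/* Hypothetical result type: a block number plus the byte offset inside it. */
struct blk_addr {
	uint64_t blkno;		/* which device block the byte lives in */
	uint32_t blkoff;	/* offset of the byte within that block */
};

/*
 * Hypothetical helper: translate a byte offset, as userland would pass
 * to lseek(2), into whatever the disk is addressed in. secsize is the
 * device's native block size (512, 4096, ...) and must be nonzero.
 */
static struct blk_addr
byte_to_block(uint64_t byte_off, uint32_t secsize)
{
	struct blk_addr a;

	a.blkno = byte_off / secsize;
	a.blkoff = (uint32_t)(byte_off % secsize);
	return a;
}
```

With this, byte offset 4608 is the start of block 9 on a 512-byte disk, but 
lands 512 bytes into block 1 on a 4K disk: same byte address, different device 
address.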

So, what should the abstraction presented by the raw interface to a "disk" be? 
It's going to have a translation from bytes to whatever the disk is addressed in. 
The driver will handle manipulation of the device registers and handle 
interrupts. Our memory allocators already tend to be conservative about 
alignment, but it would not be unreasonable for a device driver that knows de 
facto that a device requires aligned DMA addresses to check what's requested in 
read(2)/write(2) and return EINVAL as necessary (naturally, the device man page 
should document all the reasons a driver will return an error). However, some 
warts are just easier to handle in the device driver, rather than leave for 
(less capable) userland code to deal with.
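
The sort of check I have in mind might look roughly like the sketch below. 
Everything here is hypothetical: rawio_check and its parameters are invented 
for illustration, and the additional test that the offset and length are 
whole multiples of the sector size is my assumption about what a raw 
interface would plausibly insist on, not a description of any existing driver.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical validation a raw-device read/write path could perform
 * before starting DMA. Returns 0 if the request is acceptable, or
 * EINVAL if the buffer address, offset, or length is unsuitable.
 *
 *   buf       - userland buffer address the DMA would target
 *   len       - transfer length in bytes
 *   off       - byte offset on the device
 *   secsize   - device native block size in bytes (assumed parameter)
 *   dma_align - alignment the device requires of DMA addresses
 */
static int
rawio_check(const void *buf, size_t len, uint64_t off,
    uint32_t secsize, size_t dma_align)
{
	if (secsize == 0 || dma_align == 0)
		return EINVAL;		/* nonsensical device parameters */
	if ((uintptr_t)buf % dma_align != 0)
		return EINVAL;		/* buffer not aligned for DMA */
	if (off % secsize != 0 || len % secsize != 0)
		return EINVAL;		/* not a whole number of blocks */
	return 0;
}
```

Whether the driver should reject a partial-block request like this, or quietly 
bounce it through a kernel buffer, is exactly the sort of tradeoff in question.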

Another way to put the question: what is a disk? What are its fundamental 
properties, and how can we design a reasonable abstraction (which in most cases 
is probably not all that abstract) for userland code to reasonably deal with?

As with all things, we have tradeoffs to make; UNIX is pragmatic: a good 
solution today to today's problems is better than a perfect solution (which 
we've got to find some poor sod to implement!) tomorrow.

        Erik <fair%netbsd.org@localhost>


