Subject: Re: RFC: Device power management
To: Steven M. Bellovin <smb@cs.columbia.edu>
From: Daniel Carosone <dan@geek.com.au>
List: tech-kern
Date: 07/17/2007 10:00:43
--umrsQkkrw7viUWFs
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Jul 16, 2007 at 10:45:33AM -0400, Steven M. Bellovin wrote:
> Disk drives can require a few seconds to spin up before they're
> ready; some component (I'm not sure which) is going to need to be
> cognizant of that delay.

This already pretty much works with the disk drivers we have, mostly
by having long-enough timeouts for disk requests, though you're right
that the pm framework will also need to tolerate these delays.

There's another, more significant wrinkle with disks, looking a little
further ahead than the present work. While a disk is spun down, there
is a class of requests we would rather leave deferred in the queue and
not issue to the device until some higher-priority request requires
the disk to be spun up. Once the device is spun up, we want to clear
that backlog from the queue. Finally, and this is the more intricate
bit, we need to decide when to start holding those requests back
again even though the disk is still spinning, because we want to give
it a chance to go idle and spin down (even if such requests keep
arriving constantly).
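
As a rough illustration of that policy, here is a toy user-space model
in Python, not kernel code. Everything in it (the DeferringQueue class,
the drain_window and idle_timeout parameters, the tick() method) is
invented for the sketch, and the fixed time window is only one possible
answer to the "when do we start holding back again" question.

```python
from collections import deque


class DeferringQueue:
    """Toy model: defer low-priority requests around disk spin-down."""

    def __init__(self, drain_window=2.0, idle_timeout=5.0):
        self.drain_window = drain_window  # seconds after spin-up to drain backlog
        self.idle_timeout = idle_timeout  # seconds of no I/O before spin-down
        self.spun_up = False
        self.drain_until = 0.0            # deferred work may be issued until then
        self.last_issue = 0.0             # time of the last request sent to disk
        self.deferred = deque()           # low-priority requests held back
        self.issued = []                  # requests sent to the device, in order

    def _issue(self, req, now):
        self.issued.append(req)
        self.last_issue = now

    def submit(self, req, urgent, now):
        if urgent:
            if not self.spun_up:
                # An urgent request forces a spin-up and opens the window.
                self.spun_up = True
                self.drain_until = now + self.drain_window
            self._issue(req, now)
            if now <= self.drain_until:
                # Disk is freshly spun up: clear the backlog.
                while self.deferred:
                    self._issue(self.deferred.popleft(), now)
        elif self.spun_up and now <= self.drain_until:
            # Still inside the window: low-priority work goes straight out.
            self._issue(req, now)
        else:
            # Spun down, or spinning but past the window: hold it back so
            # the disk gets a chance to go idle.
            self.deferred.append(req)

    def tick(self, now):
        # Spin down once nothing has been issued for idle_timeout seconds.
        if self.spun_up and now - self.last_issue >= self.idle_timeout:
            self.spun_up = False
```

In this model a deferred atime update goes out right behind the read
that forced the spin-up, while one arriving after the window closes
waits for the next spin-up, even though the disk is still turning.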

The classic example is atime updates, which we currently hold back
indefinitely in the filesystem, but there are others.  I can imagine
the case for a new flag to open(2), kind of the reverse of O_DSYNC,
which causes data updates to be flushed at that lower priority, and/or
held back in the filesystem/uvm until the disk is spinning (to keep
the queue shorter and avoid long-term locks).

We have the beginnings of the infrastructure for this in the priocscan
work, but we need to identify which other requests (mostly writes)
fall into the new priority class.  There should also be callbacks into
the filesystem and syncer to say "now is your opportunity to issue
background writes", like clearing atimes/dirty vnodes and lazy-sync
data, before the disk spins down.
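
The callback idea could look something like the following, again as a
toy Python model and not a proposed kernel interface. The
SpindownNotifier class, its register() and before_spindown() methods
and the make_source() helper are all hypothetical names made up for
this sketch.

```python
class SpindownNotifier:
    """Toy model: tell subscribers the disk is about to spin down."""

    def __init__(self):
        self._callbacks = []

    def register(self, callback):
        # callback() returns a list of background writes it wants issued now.
        self._callbacks.append(callback)

    def before_spindown(self, issue):
        """Offer every subscriber one chance to flush; return writes issued."""
        count = 0
        for callback in self._callbacks:
            for req in callback():
                issue(req)
                count += 1
        return count


def make_source(pending):
    """Stand-in for a filesystem or syncer holding some pending writes."""
    def flush():
        batch = list(pending)
        pending.clear()
        return batch
    return flush
```

A caller would presumably postpone the spin-down whenever
before_spindown() returns non-zero, and go ahead once a pass issues
nothing.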

Somewhat in parallel with this, I'd love to see the ability to set the
I/O priority level on a per-process basis, something like Linux's
"ionice", shifting the priority slot where I/Os issued by certain
processes land.
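
To make "shifting the priority slot" concrete, a minimal sketch in
Python, assuming three priority slots numbered 0 (lowest) to 2 and a
signed per-process shift. The slot numbering, the constants and the
effective_priority() function are assumptions for illustration only,
not an existing interface.

```python
PRIO_MIN = 0  # lowest slot, e.g. lazy background writes (assumed numbering)
PRIO_MAX = 2  # highest slot, e.g. time-critical I/O (assumed numbering)


def effective_priority(request_prio, process_shift):
    """Shift a request's slot by the per-process value, clamped to range."""
    shifted = request_prio + process_shift
    return max(PRIO_MIN, min(PRIO_MAX, shifted))
```

With a shift of -1, a process's ordinary requests land one slot lower
than they otherwise would, and requests already in the lowest slot
stay there.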

--
Dan.


--umrsQkkrw7viUWFs
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (NetBSD)

iD8DBQFGnAaqEAVxvV4N66cRAvJpAJ9YPnYD7PM5/SfBTWCbGtN6/MoMcACfY/2H
o7NMqyMYIG8VLR3jSgZYqfM=
=4r3H
-----END PGP SIGNATURE-----

--umrsQkkrw7viUWFs--