Subject: Re: RFC: Device power management
To: Daniel Carosone <dan@geek.com.au>
From: Steven M. Bellovin <smb@cs.columbia.edu>
List: tech-kern
Date: 07/16/2007 22:17:23
On Tue, 17 Jul 2007 10:00:43 +1000
Daniel Carosone <dan@geek.com.au> wrote:

> On Mon, Jul 16, 2007 at 10:45:33AM -0400, Steven M. Bellovin wrote:
> > Disk drives can require a few seconds to spin up before they're
> > ready; some component (I'm not sure which) is going to need to be
> > cognizant of that delay.
> 
> This already pretty much works with the disk drivers we have, mostly
> by having long-enough timeouts for disk requests, though you're right
> that the pm framework will also need to tolerate these delays. 
> 
> There's another more significant wrinkle with disks, looking a little
> further ahead than the present work. Knowing when a disk is spun down,
> there would be a class of requests which we leave deferred in the
> queue and not issue to the device, until some higher-priority request
> requires the disk to be spun up. Then, knowing that the device is spun
> up, we want to clear that backlog from the queue. Finally, and this is
> the more intricate bit, we need to determine when to start holding
> back those requests again even though the disk is spun up, because we
> want to give it a chance to go idle and spin down (even though such
> requests might be coming constantly).
> 
> The classic example is atime updates, which we currently hold back
> indefinitely in the filesystem, but there are others.  I can imagine
> the case for a new flag to open(2), kind of the reverse of O_DSYNC,
> which causes data updates to be flushed at that lower priority, and/or
> held back in the filesystem/uvm until the disk is spinning (to keep
> the queue shorter and avoid long-term locks). 
> 
> We have the beginnings of the infrastructure for this in the priocscan
> work, but we need to identify which other requests (mostly writes)
> fall into the new priority class.  There should also be callbacks into
> the filesystem and syncer to say "now is your opportunity to issue
> background writes", like clearing atimes/dirty vnodes and lazy-sync
> data, before the disk spins down.
> 
> Somewhat in parallel with this, I'd love to see the ability to set the
> I/O priority level on a per-process basis, something like Linux's
> "ionice", shifting the priority slot where I/Os issued for certain
> processes land.
> 
Right -- Linux does something like this.  When in battery mode, certain
disk writes are just queued.  However, once the disk is spun up,
everything is flushed.
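
To make that concrete, here is a rough sketch in C of the simple
"queue while spun down, flush everything on spin-up" policy. It is only
an illustration under assumptions: every name in it (pm_disk,
pm_disk_submit, IO_PRIO_LAZY, LAZY_MAX and so on) is made up for this
mail and is not an existing NetBSD or Linux interface, and a counter
stands in for the real device. It does not attempt the harder part you
describe, deciding when to start holding requests back again while the
disk is still spinning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; not an existing kernel interface. */
enum io_prio { IO_PRIO_NORMAL, IO_PRIO_LAZY };

#define LAZY_MAX 64             /* assumed bound on the held-back backlog */

struct pm_disk {
	bool   spinning;        /* true while the disk is spun up */
	int    lazy[LAZY_MAX];  /* ids of lazy requests being held back */
	size_t nlazy;           /* number of held-back requests */
	size_t issued;          /* requests actually sent to the device */
	size_t spinups;         /* how many times we had to spin up */
};

/* Stand-in for handing one request to the hardware. */
static void pm_disk_issue(struct pm_disk *d, int id)
{
	(void)id;
	d->issued++;
}

/* Send the whole backlog to the device, oldest first. */
static void pm_disk_flush(struct pm_disk *d)
{
	assert(d->nlazy <= LAZY_MAX);
	for (size_t i = 0; i < d->nlazy; i++)
		pm_disk_issue(d, d->lazy[i]);
	d->nlazy = 0;
}

/*
 * Submit one request. A lazy request is held back while the disk is
 * spun down and there is room for it. Anything else forces a spin-up,
 * at which point the backlog is flushed before the new request.
 */
static void pm_disk_submit(struct pm_disk *d, int id, enum io_prio prio)
{
	if (!d->spinning && prio == IO_PRIO_LAZY && d->nlazy < LAZY_MAX) {
		d->lazy[d->nlazy++] = id;
		return;
	}
	if (!d->spinning) {
		d->spinning = true;
		d->spinups++;
		pm_disk_flush(d);
	}
	pm_disk_issue(d, id);
}

/* Called by whatever idle policy decides to spin the disk down. */
static void pm_disk_spin_down(struct pm_disk *d)
{
	d->spinning = false;
}
```

With this policy a lazy write submitted while the disk is spinning goes
straight through, which is exactly why a busy stream of them would keep
the disk from ever going idle.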


		--Steve Bellovin, http://www.cs.columbia.edu/~smb