Subject: Re: power management
To: Jachym Holecek <>
From: Jason Thorpe <>
List: tech-kern
Date: 06/27/2006 10:58:27
On Jun 22, 2006, at 10:32 AM, Jachym Holecek wrote:

> Hello,
> there's recently been fair amount of work going on towards proper
> ACPI support. This seems like a good opportunity to have a look
> at power management support in NetBSD.

First, I haven't had a lot of time to review this, but I wanted to  
make at least a few comments now.

>   * Powerhooks are not explicitly tied to devices, it's impossible
>     to selectively power down unused devices for instance. This also
>     means it's impossible to expose per-device PM to userland, we can
>     only run all powerhooks, or none.

It's not just that -- we also have a fairly ad-hoc way to
power-up/down devices when they're opened/closed, ifconfig'd up/down,
etc.  It would be nice to have a more general mechanism that was a bit
more unified and perhaps driven from higher levels.

>   * On some platforms, model-specific ASICs can be used to power-{up,
>     down} otherwise MI devices. This is the case for prep and sparcbook
>     (thanks garbled@ for the examples). Powerhooks don't provide a clean
>     way to handle this (ie run MD handler and driver's MI powerhook).
>
> Talking about PM, it seems reasonable to distinguish the following:
>   1. System-wide management, affecting the whole machine. This is what
>      APM and ACPI sleep states do.
>   2. Per-device power management. It should be possible to allow power
>      saving operation (there can be more modes) for devices that support
>      it (802.11 wireless etc). Furthermore, the user should be allowed to
>      power off unused components (no point in running ethernet interfaces
>      when you're sitting on a plane and the battery is running out). See
>      next point.

In a perfect world, this is automatic.  "ifconfig wm0 down" or  
"ifconfig ath0 down" removes power.  We have this in an ad-hoc  
fashion now.  For this, we need some better way of saying "hey, I'm  
not on the network", but that's really a UI problem.

>   3. The system should be able to monitor device activity and take
>      appropriate steps automatically, at least for the most obvious
>      scenarios (acpiacad(4) disconnected --> put devices to power
>      saving operation). There also needs to be a way for userland
>      to monitor this so that more sophisticated PM policies are
>      doable.

powerd(8) partially does this.  The idea is that power events like
"AC adapter unplugged" send notifications down to powerd(8), which
then makes adjustments to various settings based on policy encoded
in scripts.  I have always assumed that someone would go and improve
powerd(8) :-)  Now that we have proplib, we can express a much richer
set of events down to powerd(8), as well.
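
A rough sketch of that event-to-script step, in plain C.  The event
record and pm_event_command() are made-up names for illustration, not
the existing powerd(8) code, and the "<dir>/<type> <device> <what>"
layout of the command line is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical event record; not the layout powerd(8) actually uses. */
struct pm_event {
	const char *type;	/* kind of source, e.g. "acadapter" */
	const char *device;	/* device that raised it, e.g. "acpiacad0" */
	const char *what;	/* what happened, e.g. "released" */
};

/*
 * Build the command line for the policy script that handles ev:
 * "<dir>/<type> <device> <what>".  Returns 0 on success, -1 if the
 * result does not fit in buf or the type could escape dir.
 */
static int
pm_event_command(const struct pm_event *ev, const char *dir,
    char *buf, size_t len)
{
	int n;

	assert(ev != NULL && dir != NULL && buf != NULL);
	if (strchr(ev->type, '/') != NULL)
		return -1;
	n = snprintf(buf, len, "%s/%s %s %s", dir, ev->type,
	    ev->device, ev->what);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The policy stays in the scripts; the daemon only has to map an event
to a script name and pass the details along as arguments.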

> I'd like to handle (2) fairly soon -- it should provide reasonable basis
> for further work, in particular it's prerequisite for (1). It's also good
> to do this before more powerhooks get written (I could convert the
> existing ones, there's only a handful).
>
> A couple of thoughts on (3) are pasted below [@] to get the discussion
> started. Also check the attachments for annotated comments from people
> I've discussed PM with (thanks for the input!), there's a number of ideas
> for future work -- mostly related to (3), some also touch (2) and (1).
> Messages not included here are either (hopefully) covered by the proposal,
> or mentioned in one of the attachments (I picked the longest replies ;).
>
> Now to the point -- for (2) I'd propose:
>   * Get rid of powerhooks as we know them
>   * Distinguish the following "power levels", it makes sense to define
>     this in terms of performance and functionality:
>     ON        - Device is fully powered up. This is the initial state
>                 after autoconf. This is "high" power level.
>     LOWPOWER1 - Moderate power saving. May impact performance, but the
>                 device needs to stay operational. Imagine atactl's
>                 "idle"/"standby" states.
>     LOWPOWER2 - Aggressive power saving. Sacrifice all performance and
>                 feel free to make the device not-operational in some
>                 non-vital way.
>     OFF       - Device is stone dead. This is "low" power level.
>     I'm not particularly fixed on the set of levels/number of levels
>     (the above is taken from AIX, IIRC), so feel free to suggest a better
>     one, as long as it's strictly ordered and has clear "operational"
>     and "performance" semantics.
>   * Use ca_activate as per-device PM entry point. The calling convention
>     may need to change slightly, it seems good to pass a request structure
>     pointer instead of an enum (the same request struct could be reused
>     for (1)). In any case, the desired power level is the primary
>     argument.
>   * Have a kernel process handle all device-PM operations so that the
>     devices can sleep on power level transition (may need to wait for
>     DMA or just have a *long* OFF->ON period).
>   * Handle devices with hierarchy in mind -- high->low transitions
>     should hit children (recursively) before the parent, low->high
>     transitions should hit the parent before any of its (recursive)
>     children.
>   * Provide a hook for MD code (see prep and sparcbook case above).
>     When the hook is present, it will be used to wrap ca_activate
>     calls for _any_ device in the system. This way MD code can even
>     disable some PM operations for known-broken devices, or handle
>     the need to access dedicated ASIC at appropriate point.
>   * Userland interface will go via character-device (/dev/power would
>     be good, except it's optional component) as (probably) a
>     dictionary-passing ioctl.
>   * A powerctl tool should exist, "powerctl <dev> <level> <...>"
>     would push <dev> to <level> the way described above. When
>     <dev> is the root device, all devices are affected (obviously).
>   * Busses need a rescan after (some) low->high transitions.
>   * Don't care about "not-configured" devices *for now*. They
>     definitely should be handled (by parent bus), but it would
>     be too intrusive/messy with current state of autoconf.
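
To make the level ordering, the tree walk and the MD hook concrete,
here is a rough sketch in plain C.  All names (pm_level, pm_request,
pm_dev, pm_set_level, pm_md_hook) are made up for illustration; the
activate member only stands in for ca_activate, and a real version
would run from the proposed kernel process instead of recursing
synchronously like this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is existing NetBSD API. */
enum pm_level { PM_OFF = 0, PM_LOWPOWER2, PM_LOWPOWER1, PM_ON };

struct pm_request {
	enum pm_level level;	/* desired power level, the primary argument */
};

struct pm_dev {
	const char *name;
	enum pm_level level;	/* current power level */
	struct pm_dev *child;	/* first child, or NULL */
	struct pm_dev *sibling;	/* next sibling, or NULL */
	/* Stand-in for ca_activate; returns 0 on success. */
	int (*activate)(struct pm_dev *, const struct pm_request *);
};

/* Optional MD hook.  When set, it is called instead of the driver's
 * entry point and decides itself whether to call dev->activate. */
typedef int (*pm_md_hook_t)(struct pm_dev *, const struct pm_request *);
static pm_md_hook_t pm_md_hook = NULL;

static int
pm_call(struct pm_dev *dev, const struct pm_request *req)
{
	int error;

	assert(dev->activate != NULL);
	if (pm_md_hook != NULL)
		error = pm_md_hook(dev, req);
	else
		error = dev->activate(dev, req);
	if (error == 0)
		dev->level = req->level;
	return error;
}

/*
 * Move dev and its whole subtree to req->level.  Lowering power visits
 * the children first; raising power visits the parent first.
 */
static int
pm_set_level(struct pm_dev *dev, const struct pm_request *req)
{
	int lowering = req->level < dev->level;
	int error;
	struct pm_dev *c;

	if (!lowering && (error = pm_call(dev, req)) != 0)
		return error;
	for (c = dev->child; c != NULL; c = c->sibling)
		if ((error = pm_set_level(c, req)) != 0)
			return error;
	if (lowering && (error = pm_call(dev, req)) != 0)
		return error;
	return 0;
}

/* Demo driver entry point: records the order in which devices are hit. */
static const char *pm_trace[16];
static size_t pm_ntrace;

static int
demo_activate(struct pm_dev *dev, const struct pm_request *req)
{
	(void)req;
	if (pm_ntrace < 16)
		pm_trace[pm_ntrace++] = dev->name;
	return 0;
}
```

With a pci0 parent and a wm0 child both using demo_activate, a request
for PM_OFF records wm0 before pci0, and a later request for PM_ON
records pci0 before wm0.  Because the enum is strictly ordered, the
direction of a transition is just a comparison.
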
> I hope I didn't forget something. The actual diff would probably
> be shorter than this mail...
>
> 	-- Jachym
>
> [@] Activity monitoring:
>   * The devices themselves are mostly _not_ competent to decide on their
>     own activity. Instead, upper layers should indicate this. The network
>     stack can best tell if (or "how much"?) a given interface is active.
>     Wscons best knows when a display is active -- it can watch mouse
>     and keyboard, and indicate inactivity after they don't send in any
>     events for a while.
>   * Higher levels need to be able to query "power state" of devices,
>     no point in sending data to network card if it's off. There also
>     needs to be a way to force a device into operational mode if
>     it's powered off. This could happen just by marking the device
>     active and waiting for the event to be propagated and handled
>     eventually. Not sure about this.
>   * It seems reasonable to send activity events to userland so that it
>     has enough information for PM policy. If no daemon (?) is listening,
>     the kernel itself should come up with reasonable default action.
>   * Transitions between active/inactive should be filtered by
>     configurable timeout. This would avoid spurious actions and event
>     floods. "Just let me know when the disk is inactive for N seconds".
> <garrett-damore>
> <gavan-fantom>
> <jesse-off>
> <steven-bellovin>
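
The timeout filtering in the last activity-monitoring point could look
roughly like the sketch below.  The names (pm_activity,
pm_activity_mark, pm_activity_tick) are invented for illustration, and
time is just a seconds counter passed in by the caller; in the kernel
this would presumably hang off a callout instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device activity filter; not an existing interface. */
struct pm_activity {
	unsigned timeout;	/* seconds of silence before "inactive" */
	unsigned last_seen;	/* time of the most recent activity report */
	bool active;		/* state last reported to listeners */
};

static void
pm_activity_init(struct pm_activity *a, unsigned timeout, unsigned now)
{
	assert(timeout > 0);
	a->timeout = timeout;
	a->last_seen = now;
	a->active = true;
}

/* Upper layer saw activity.  Returns true on an inactive->active edge. */
static bool
pm_activity_mark(struct pm_activity *a, unsigned now)
{
	bool edge = !a->active;

	a->last_seen = now;
	a->active = true;
	return edge;
}

/* Periodic check.  Returns true on an active->inactive edge. */
static bool
pm_activity_tick(struct pm_activity *a, unsigned now)
{
	if (a->active && now - a->last_seen >= a->timeout) {
		a->active = false;
		return true;
	}
	return false;
}
```

Only the two edges would generate events, so a busy disk produces
nothing at all, and an idle one produces a single "inactive" event
once it has been quiet for the configured number of seconds.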

-- thorpej