tech-kern: Re: timecounter (was uptime(1)) behaviour & sleep mode

Subject: Re: timecounter (was uptime(1)) behaviour & sleep mode
To: None <tech-kern@netbsd.org>
From: Frank Kardel <kardel@netbsd.org>
List: tech-kern
Date: 07/13/2006 12:53:09
Daniel Carosone wrote:

>tech-kern readers:
>
>The following message was posted recently to tech-userlevel:
>
>On Wed, Jul 12, 2006 at 11:50:46PM +0200, Arnaud Lacombe wrote:
>  
>
>>With recent changes in ACPI and in time handling, I was wondering what
>>should be the behaviour of uptime(1) when we start to deal with sleep (with ACPI
>>S3 or APM). uptime(1) defines it as "the length of time the system has
>>been up", but when the laptop goes into sleep mode, is it still considered
>>to be up or not?
>>    
>>
there seems to be room for interpretation :-)

>
>This made me wonder about a parallel question for kernel internals,
>regarding timecounters under similar circumstances.
>
>I think they're mostly separate discussions, so I've set replies for
>this message to tech-kern; I'll reply in a moment for the userlevel
>context separately.
>
>Timecounters rely on the assumption that the underlying counter is
>monotonically increasing, wrapping predictably at a known boundary.  I
>wonder what impact various kinds of suspend/resume might have on this
>assumption, and what can be done to protect against bad effects.
>
>The answer almost certainly varies according to the specific counter,
>and the specific type of sleep/suspend done.  For example, I imagine
>that in S1 sleep, the TSC and i8254 timecounters are probably
>preserved, and maybe also in S3. The same may also be true for APM
>sleeps; in fact it may be the case that in most of the permutations we
>can presently reach, it's not actually an issue, if mostly by good
>luck.
>
>I can imagine at least one case where the counter values may get
>lost/reset: BIOS-mediated APM suspend-to-disk, where the hardware is
>fully powered off and thus loses its present state. I don't believe
>there's any way to 'restore' the TSC cycle-counter, nor that an APM
>BIOS would try to even if there was.
>  
>
All likely to be true. The hiccups will occur at the first clock interrupt
after wakeup, where they can mess up the time scale if the counter lost
its state. The longer the wrapping period, the larger the possible shift
in the (up-)time scale.
The impact is usually *not* as bad as feared, as most counters wrap
relatively fast compared to the interrupt rate (the i8254 wraps once per
tick). Also, these effects do not account for the behaviour reported.
The problem with lost counter state is also not new: it was already
there in the kern_microtime.c code for interpolating with the help of
cycle counters.
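
To illustrate why a lost counter state shifts the time scale by at most
one wrap period, here is a minimal sketch of a masked delta computation.
It is not the kernel code; struct sketch_counter and sketch_delta() are
hypothetical names, and the 16-bit mask is only an assumed example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified counter model; not the kernel's struct timecounter. */
struct sketch_counter {
	uint32_t mask;	/* implemented bits, e.g. 0xffff for a 16-bit counter */
	uint32_t last;	/* raw value sampled at the previous clock interrupt */
};

/*
 * Ticks elapsed since the previous sample. The result is correct as long
 * as less than one full wrap period has elapsed, and it never exceeds mask.
 */
static uint32_t
sketch_delta(const struct sketch_counter *c, uint32_t now)
{
	/* The mask must be a run of low-order one bits (2^n - 1). */
	assert((c->mask & (c->mask + 1)) == 0);
	return (now - c->last) & c->mask;
}
```

With last = 0xfff0 and a reading of 0x0010, an ordinary wrap yields a
delta of 0x20. If the counter was reset during sleep (say last = 0x8000
and the first reading after wakeup is 5), the delta is bogus but still
bounded by the mask, so a short-period counter can only cause a small
shift.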

Fact is that the system freezes with respect to time during sleeps.
Thus the time stands still. This is what actually happens for the
{micro,nano,bin}*UP*time() calls (uptime time scale).
They do not experience any time warp in the current (FreeBSD
derived) code because of a sleep, except for the counter hiccups
described above. The real/wall clock time is adjusted *at wakeup* by
tc_setclock(), called via acpi_wakeup()/inittodr(). So wall clock time
is readily recovered at wakeup.

Now, the reason for the original report (uptime only covers the
running time, and the boot time changes with each sleep) is the
following:
    The FreeBSD code keeps the offset between the uptime
    timescale (that freezes during sleep) and the current time
    in boottimebin (mirrored into boottime). Thus each time
    the time is set (tc_setclock()) the boot time changes (see
    acpi_wakeup.c).
    IIRC previously our boot time stayed fixed. boottime currently is
    current_time - running_time in the timecounter case.
    This is confusing. Also many seem to expect the uptime to be the
    time difference between current time and boot time, and thus to
    include the sleep period, at user level.
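
As a rough illustration of the effect described above, the following
sketch models the clock in whole seconds. It is an assumption-laden toy,
not the kernel implementation: struct sketch_clock, sketch_setclock()
and sketch_walltime() are hypothetical names, and the real code works on
struct bintime values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model in whole seconds; the kernel uses struct bintime. */
struct sketch_clock {
	int64_t uptime;		/* running time; frozen while asleep */
	int64_t boottime;	/* doubles as offset: wall time minus uptime */
};

/* Model of setting the wall clock: only the offset is recomputed. */
static void
sketch_setclock(struct sketch_clock *c, int64_t wall_now)
{
	/* Sanity check: wall time cannot be earlier than the running time. */
	assert(wall_now >= c->uptime);
	c->boottime = wall_now - c->uptime;
}

/* Wall clock time is always reconstructed as offset plus running time. */
static int64_t
sketch_walltime(const struct sketch_clock *c)
{
	return c->boottime + c->uptime;
}
```

Boot at wall time 1000, run for 100 seconds, sleep for 50 seconds, then
set the clock to 1150 at wakeup: the wall time comes out right, but
boottime has moved from 1000 to 1050, which is what the original report
observed.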

For a fix I will separate the time offset between uptime (actually
running time) and the current time from the true boot time. Once
boottime is again initialized only at startup, the user level code in
w.c and other uptime calculations that use kern.boottime will be back
to normal.
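
A minimal sketch of what such a separation could look like, again in
whole seconds and with hypothetical names (struct sketch_clock2 and its
helpers are illustration only, not the eventual kernel change):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the offset and the true boot time are kept apart. */
struct sketch_clock2 {
	int64_t uptime;		/* running time; frozen while asleep */
	int64_t offset;		/* wall time minus uptime; changes on set */
	int64_t boottime;	/* set once at startup, never touched again */
};

static void
sketch_init(struct sketch_clock2 *c, int64_t wall_at_boot)
{
	c->uptime = 0;
	c->offset = wall_at_boot;
	c->boottime = wall_at_boot;
}

/* Setting the clock adjusts only the offset, not the boot time. */
static void
sketch_setclock2(struct sketch_clock2 *c, int64_t wall_now)
{
	/* Sanity check: wall time cannot be earlier than the running time. */
	assert(wall_now >= c->uptime);
	c->offset = wall_now - c->uptime;
}

/* What user level code computes: current time minus boot time. */
static int64_t
sketch_user_uptime(const struct sketch_clock2 *c)
{
	return (c->offset + c->uptime) - c->boottime;
}
```

With the same scenario (boot at 1000, 100 seconds running, 50 seconds
asleep, clock set to 1150), boottime stays at 1000 and the user level
difference is 150 seconds, so it includes the sleep period.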

There are more issues buried in the kernel with respect to sleeping - like
callouts - but that would be another interesting thread.

>I also expect we'll come across other cases as we expand the range of
>sleep permutations we can reach.  Is the behaviour of the ACPI counter
>well-defined across each of the various state transitions?
>  
>
I think: time will tell :-) (couldn't resist)

>So: what can/does/should the timecounter code do to protect itself
>against this issue?
>
loss of counter state: nothing - that can upset the uptime timescale.
We could add powerhooks to counters - but we had that
problem already before with the kern_microtime.c code. On second thought
we are just starting to be able to sleep... see below

>  I expect it's not a big deal to reset or reselect
>each of the counters on a wakeup event, provided the event is
>delivered, and the specific counter implementations have enough
>information to know whether it's necessary for this specific
>event. Should we do something in preparation on suspend (perhaps
>switching to a lower quality but more suspend-safe counter, or setting
>flags to warn waking-up code that the counter may be invalid)?
>  
>
I consider powerhooks still a massive overkill, as that would
add the addition of an offset value to even the simplest counters, plus
all the management code. All that effort for supporting mobile sleep
capable systems that already have a hard time keeping decent time due
to temperature effects and low quality oscillators (=thermometers)?
The wall clock time is already corrected with tc_setclock() in
acpi_wakeup.c, thus introducing an error of +- 0.5 sec at minimum.
The stable low quality counter is called RTC here: a 32768 Hz
frequency, 1 second resolution, low power wall clock chip (in NTP we
call that a TOY chip - time of year :-)).

>I know there's some handling of this already, but was there a general
>analysis or just specific workarounds for particular cases as they
>were discovered (like not using the TSC if X or Y)?
>  
>
We may find more things that don't work too well with sleeping.
TSC has been the most prominent thing. But I definitely think we should
refrain from attempting to achieve nanosecond time scale stability
across sleep periods on platforms that are not even remotely suitable
as stable precision time keepers (especially in conjunction with
powersaving/sleep).

>If the underlying timecounters, and thus the bintime variables that
>ultimately get shown by uptime(1), have gotten screwed up,
>
The timecounters were correct, but boottime had (and currently has)
changed semantics.
I'll remove that nit so boottime stays at the initially set time stamp.
This restores the old userland expectations.

> the
>userland discussion about which and how these variables should be
>presented by uptime(1) is irrelevant.
>  
>
The counters were not screwed up (at least not to any relevant extent).
We shouldn't go overboard here - boottime semantics will be restored to
normal.

>--
>Dan.
>  
>
Frank