Subject: Xen time issues (Re: HEADS UP: timecounters (branch
To: None <tech-kern@netbsd.org>
From: Jed Davis <jdev@panix.com>
List: tech-kern
Date: 07/04/2006 01:36:46
Frank Kardel <kardel@netbsd.org> writes:

> Jed Davis wrote:
>>Frank Kardel <kardel@netbsd.org> writes:
[]
>>>My observations:
>>>    - my builds were gcc4.
>>>    - it is running on an Athlon 64 X2
>>>    - I was seeing processed_system_time getting ahead of
>>>     shadow_system_time.
[]
> [Keep in mind a specific environment: AMD 64 X2, gcc4, -march=athlon.
> Intel seems to do fine given the reports in tech-kern@.]

Somewhat fine; I've had problems with TSC stability with
hyperthreading enabled -- independent of Xen, but a domU has nothing
else to use for timing (and I believe Xen itself makes use of TSCs).

> That confuses me. xen/xen/clock.c:xen_timer_handler does assume that
> processed_system_time is always less than shadow_system_time. If that
> is violated, delta becomes negative.

It's (shadow_system_time + get_tsc_offset_ns()), which is -- assuming
a reliable TSC -- the system_time as of right now, which is expected
to come after processed_system_time.  More to the point, it's expected
to be monotonically increasing, and thus greater now than when
processed_system_time was last set to something <= it.

Clearly that's not always the case, though I'm not sure how.
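
To make the expectation concrete, here is a minimal sketch of the
arithmetic.  It is not the code in xen/xen/clock.c: the struct, the
function name pending_ns and the unsigned 64-bit types are assumptions
made for illustration, and tsc_offset_ns merely stands in for the
result of get_tsc_offset_ns().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the two values discussed above (nanoseconds). */
struct clock_state {
	uint64_t shadow_system_time;    /* last system time copied from Xen */
	uint64_t processed_system_time; /* time already accounted for */
};

/*
 * Return "now" minus processed_system_time as a signed value.
 * tsc_offset_ns stands in for get_tsc_offset_ns(); its type is assumed.
 */
static int64_t
pending_ns(const struct clock_state *cs, uint64_t tsc_offset_ns)
{
	uint64_t now;

	assert(cs != NULL);
	now = cs->shadow_system_time + tsc_offset_ns;

	/*
	 * The unsigned subtraction wraps if "now" is behind.  The cast
	 * recovers the signed difference, assuming the usual two's
	 * complement conversion and a difference smaller than 2^63.
	 */
	return (int64_t)(now - cs->processed_system_time);
}
```

If "now" ever lands behind processed_system_time, the result is
negative, which is exactly the case the handler does not expect.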

> In that situation I have seen the hardclock(9) catch-up loop
> repeatedly calling hardclock for a long time in a tight loop on the
> AMD. This is the hang Thor and I see.
>
> Could it be that "delta >= NS_PER_TICK" is effectively doing an
> unsigned comparison here (NS_PER_TICK is unsigned)?

Yes, it would be an unsigned comparison -- that's standard C (the usual
arithmetic conversions), and I don't know how I didn't notice it sooner.
That's clearly wrong; the loop shouldn't run at all in that case.
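
A small sketch of the effect, under the assumption that delta is a
signed 64-bit value and NS_PER_TICK has an unsigned 64-bit type.  (If
the unsigned operand were narrower than the signed one, say a 32-bit
unsigned int against an int64_t, the comparison would be signed and the
problem would not appear.)  The function names here are made up for the
example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mixed comparison, as in "delta >= NS_PER_TICK" with an unsigned
 * 64-bit NS_PER_TICK (assumed type).  delta is converted to uint64_t
 * first, so a negative delta looks like a huge positive one.
 */
static int
tick_due_mixed(int64_t delta, uint64_t ns_per_tick)
{
	assert(ns_per_tick != 0);
	return delta >= ns_per_tick;
}

/* Signed comparison: a negative delta never asks for a tick. */
static int
tick_due_signed(int64_t delta, uint64_t ns_per_tick)
{
	assert(ns_per_tick != 0 && ns_per_tick <= INT64_MAX);
	return delta >= (int64_t)ns_per_tick;
}
```

With delta = -1 and a 10 ms tick (10000000 ns), tick_due_mixed() says a
tick is due, because -1 converts to 2^64 - 1; tick_due_signed() says it
is not.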

> Furthermore, the negative delta will be assigned to ci->ci_cc.cc_denom.
> Is this wanted?

I'm not sure what else can be done there -- the underlying time source
has gone backwards.  What does the timecounter framework do when a
timecounter steps backwards or otherwise gives obviously wrong output?
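
For the sake of discussion, one conceivable way to cope is to refuse to
hand out a reading older than the last one and to count how often that
happens.  This is only a sketch of that idea; it is not what the
timecounter framework or the Xen clock code does, and the struct and
function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical filter state; not part of NetBSD or Xen. */
struct mono_filter {
	uint64_t last_ns;     /* largest reading handed out so far */
	uint64_t regressions; /* number of backward steps observed */
};

/*
 * Return raw_ns if it does not go backwards; otherwise repeat the
 * previous reading and record that the source regressed.
 */
static uint64_t
mono_filter_read(struct mono_filter *mf, uint64_t raw_ns)
{
	assert(mf != NULL);
	if (raw_ns < mf->last_ns) {
		mf->regressions++;
		return mf->last_ns;
	}
	mf->last_ns = raw_ns;
	return raw_ns;
}
```

The cost is that time appears to stand still until the source catches
up, and a source that has jumped far ahead once will pin the output
there; the counter at least makes such events visible.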


-- 
(let ((C call-with-current-continuation)) (apply (lambda (x y) (x y)) (map
((lambda (r) ((C C) (lambda (s) (r (lambda l (apply (s s) l))))))  (lambda
(f) (lambda (l) (if (null? l) C (lambda (k) (display (car l)) ((f (cdr l))
(C k)))))))    '((#\J #\d #\D #\v #\s) (#\e #\space #\a #\i #\newline)))))