tech-kern archive

Re: PSA: Clock drift and pkgin



>>>        } else if (sec <= (LONG_MAX / 1000000))
>>>                ticks = (((sec * 1000000) + (unsigned long)usec + (tick - 1))
>>>                    / tick) + 1;
>> The delay is always rounded up to the resolution of the clock, so
>> waiting for 1 microsecond waits at least 10ms.

But it is increased by 1 tick when it is an exact multiple of the clock
resolution, too.  For sleeps, that makes some sense.  For timer
reloads, it doesn't.
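
To make the arithmetic concrete, here is a minimal userland sketch of the quoted expression. It is not the kernel's tvtohz; TICK_USEC is a hypothetical stand-in for the kernel's `tick` variable, assuming HZ=100, and ticks_roundup is a hypothetical variant shown only for comparison.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: 10000 microseconds per tick, i.e. HZ=100. */
#define TICK_USEC 10000L

/*
 * Same arithmetic as the quoted branch: round up to a whole number of
 * ticks, then add one more tick unconditionally.
 */
static long
ticks_quoted(long sec, long usec)
{
	/* Mirrors the guard on the quoted branch. */
	assert(sec >= 0 && sec <= LONG_MAX / 1000000);
	return (((sec * 1000000L) + usec + (TICK_USEC - 1)) / TICK_USEC) + 1;
}

/*
 * Hypothetical comparison: round up to a whole number of ticks, with no
 * extra tick added.
 */
static long
ticks_roundup(long sec, long usec)
{
	assert(sec >= 0 && sec <= LONG_MAX / 1000000);
	return ((sec * 1000000L) + usec + (TICK_USEC - 1)) / TICK_USEC;
}
```

Under these assumptions, ticks_quoted(0, 1) and ticks_quoted(0, 10000) both give 2, while ticks_roundup(0, 10000) gives 1.  That is, an interval of exactly one tick comes out of the quoted expression as two ticks.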

I could of course be wrong about that code being responsible, but
reading realtimerexpire() makes me think I am not; it uses tshzto, which
calls tstohz, which calls tvtohz, which is where the code quoted above
comes from.  Maybe realtimerexpire should be using other code?

> Look at the wording of sleep(3), nanosleep(2), etc.  They all use
> wording like "... the number of time units have elapsed ..."

True.

And, if the misbehaviour involved sleep, nanosleep, etc, that would be
relevant.  The symptom I'm seeing has nothing to do with them (except
that both are related to time); what I'm talking about is the timing of
SIGALRMs generated by setitimer(ITIMER_REAL,...) when it_interval is
set to 1/HZ (which in my test cases is exact).  setitimer(2) does say
that "[t]ime values smaller than the resolution of the system clock are
rounded up to this resolution (typically 10 milliseconds)", but it does
_not_ have language similar to what you quote for sleep() and
relatives.  Nor, IMO, should it.  The signals should be delivered on
schedule, though of course process scheduling means the target process
may not run the handler on schedule.  Under interrupt load sufficient
that softclock isn't running when it should, I'd consider this
excusable.  That does not describe my test systems.

1.4T does not have this bug.  As I mentioned, it works fine on sparc.
Even on i386, I see:

$ date; test-alrm > test-alrm.out; date
Sat Dec 23 07:57:45 EST 2023
Sat Dec 23 07:58:46 EST 2023
$ sed -n -e 1p -e \$p < test-alrm.out
1703336265.921251
1703336325.916413
$ 

Linux, at least on x86_64, gets this right too.  On a work machine:

$ date; ./test-alrm > test-alrm.out; date
Sat Dec 23 08:18:15 EST 2023
Sat Dec 23 08:19:15 EST 2023
$ sed -n -e 1p -e \$p < test-alrm.out
1703337495.219734
1703337555.209737
$ uname -a
Linux mouchine 5.15.0-86-generic #96-Ubuntu SMP Wed Sep 20 08:23:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ 

> Two options are to increase HZ on the host as suggested, or halve HZ
> on the guest.

I suppose actually fixing the bug isn't an option?

I don't know whether that would mean using different code for timer
reloads and sleeps or what.  But 1.4T is demonstration-by-example that
it is entirely possible to get this right, even in a tickful system.
(I don't know whether 1.4T sleeps may be slightly too short; I haven't
tested that.  But, even if so, fixing that should not involve breaking
timer reloads.)

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse%rodents-montreal.org@localhost
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

