NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: kern/58733: clock_nanosleep() (and I suspect more) is badly broken



The following reply was made to PR kern/58733; it has been noted by GNATS.

From: Robert Elz <kre%munnari.OZ.AU@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: kern-bug-people%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
Subject: Re: kern/58733: clock_nanosleep() (and I suspect more) is badly broken
Date: Thu, 10 Oct 2024 18:09:49 +0700

 I have been looking more into this.   I am running the patch I suggested
 in an earlier message, and all seems well, I plan to commit that soon.
 
 This is what happens now (using some diagnostics requested by setting
 a magic bit in the flags word - and no, neither the diagnostics nor the
 modified version of sleep used to request them are going to be committed).
 
 [  3098.057195] nanosleep1(): ts2timo() OK [4098.057887029] @[3098.057887687] -> 1000001
 [  3105.459332] nanosleep1(): kpause(1000001)->0
 [  3105.459332] nanosleep1(): now=[3105.459390099] err=0
 [  3105.459332] nanosleep1(): abstime end-now=[992.598496930]
 [  3105.459332] nanosleep1(): for [992.598496930] need 992600
 [  3166.676468] nanosleep1(): kpause(992600)->0
 [  3166.676468] nanosleep1(): now=[3166.676518727] err=0
 [  3166.676468] nanosleep1(): abstime end-now=[931.381368302]
 [  3166.676468] nanosleep1(): for [931.381368302] need 931383
 [  3323.383389] nanosleep1(): kpause(931383)->0
 [  3323.383389] nanosleep1(): now=[3323.383446755] err=0
 [  3323.383389] nanosleep1(): abstime end-now=[774.674440274]
 [  3323.383389] nanosleep1(): for [774.674440274] need 774676
 [  3612.484853] nanosleep1(): kpause(774676)->0
 [  3612.484853] nanosleep1(): now=[3612.485074839] err=0
 [  3612.484853] nanosleep1(): abstime end-now=[485.572812190]
 [  3612.484853] nanosleep1(): for [485.572812190] need 485574
 [  4098.067055] nanosleep1(): kpause(485574)->35
 [  4098.067055] nanosleep1(): now=[4098.067775389] err=0
 [  4098.067055] nanosleep1(): abstime end-now=[-1.990111640]
 [  4098.067055] nanosleep1(): for [0.000000000] need 0
 [  4098.067055] nanosleep1()->0
 
 The first line of that is where the change is most clear, before
 the patch it would have been something more like
 
 [  2794.557267] nanosleep1(): ts2timo() OK [999.999999361] @[2794.558139625] -> 1000001
 
 In both cases the request was for a TIMER_ABSTIME sleep until
 the current time (that is in the [] after '@' in those lines) + 1000
 seconds.   This diagnostic shows what is in the request timespec
 after ts2timo() is called (and returns 0, the "OK" in the line).
 
 In the patched version that value is unchanged from the request,
 in the previous version it has been set to the interval requested
 (1000 secs, more or less - slightly less as some time passed between
 when sleep obtained the current time, added 10000, and requested that
 end time, and when the kernel subtracted the slightly later (larger)
 current time from that, so leaving a little less than 1000).
 
 The rest of the diagnostics show the working of nanosleep1() - it
 calls kpause with the timeout (nb: on my system, HZ=1000 so for
 1000 seconds we have 1000000 ticks, plus the free one the kernel
 throws in as I'm such a good customer...)
 
 In the successful trace above, kpause returned after about 7.4
 seconds (that was as a result of a ^Z fg sequence) leaving 992.6
 seconds left to sleep, 992600 ticks to go, go back and repeat the
 kpause() all over again.   That repeats several times (me stopping
 and resuming the sleep) until the end when kpause() returns 35 (EWOULDBLOCK)
 indicating that kpause() is returning because the time ran out,
 When we calculate the end time then, we see we have overrun by about
 2 seconds (that's acceptable) and nanosleep1() and then clock_nanosleep()
 return 0.   All good.
 
 With the old ts2timo() the sequence was more like this (from yesterday's
 debugging)...
 
 [  2794.557267] nanosleep1(): ts2timo() OK [999.999999361] @[2794.558139625] -> 1000001
 [  2797.301318] nanosleep1(): kpause(1000001)->0
 [  2797.301318] nanosleep1(): now=[2797.301376196] err=0
 [  2797.301318] nanosleep1(): abstime end-now=[-1798.698623165]
 [  2797.301318] nanosleep1(): for [0.000000000] need 0
 [  2797.301318] nanosleep1()->0
 
 The initial kpause() is the same, that time I stopped, and resumed,
 it after about 3 seconds (time went from 2794 to 2797).   The code
 then calculates how much time is left to run, and discovers that it
 has already overrun by almost 1800 seconds!   Better quit quickly,
 so it does.    All because 999-2797 == -1798.
 
 (How long the overrun appears to be depends upon the current time, the
 uptime for CLOCK_MONOTONIC, if the uptime had been < 1000, when the
 sleep process was resumed, then there would have been a positive
 result from the subtraction, and it would have gone back to sleep
 again, for a while.   That is, the -1800 value is irrelevant, just
 that it is something <= 0 - there was no actual overrun, just faulty
 calculations).
 
 I have done more examination of the kernel sources, I am even more
 convinced now that this change is the right one to make.   The analysis
 of the callers of ts2timo() stands.   I looked more into lwp_park(),
 that (as I suspected) doesn't care at all what happens to the timespec
 structure that is passed in, it just converts it to ticks, and waits
 for that long (or less, or perhaps a little more, it really doesn't care).
 In the TIMER_ABSTIME case that's it.   This change will make no difference
 to lwp_park() (the only caller I was slightly concerned about) at all.
 
 However, I have also been considering the TIMER_RELTIME case.   For
 that the timespec is already the interval, so ts2timo() doesn't need
 to do any computation, or adjustment, to get that.
 
 But if the caller specifies a very small interval, < 1 tick, ts2timo()
 adjusts the timespec passed in to be 1 tick (itimerfix()).
 
 That also seems wrong to me - this one is an adjustment to the input
 value in the TIMER_RELTIME case only (at least after the change to be
 made for the TIMER_ABSTIME case, when that is made itimerfix() in that
 case will be adjusting a local temporary variable no-one is ever going
 to see again, just wasting everyone's time).
 
 That can be investigated at some later time, but I suspect that what
 is happening in that case is (at best) sub-optimal as well (even if
 just needless computations doing nothing perhaps, in some of the
 cases.)
 
 It appears to be as if netbsd-10 has the exact same code and hence
 the same issue, and will need a pullup.   But that should be trivial
 as the change is very small and will apply cleanly to the -10 sources.
 
 For netbsd-9 the code isn't identical, but that appears to have the
 same problem - simply applying the patch won't work, it will need a
 separate patch (not difficult).   There's one minor additional complication
 though, in -9 ts2timo() is used for the linux compat futex calls (or one
 of them).   It seems from a not very thorough examination admittedly, that
 what ts2timo() does to the timespec is irrelevant to it - in fact from
 the comments, it seems that the author of that code was aware of the
 issue, and didn't much like it happening.   So I think that won't be
 a problem either.
 
 I do not plan on requesting pullups of either the change to ts2timo(9)
 or the added test case - the older versions don't need either of those.
 The man page, as anyone doing kernel development should be doing it
 using HEAD, not -10 (or -9) so the section-9 man pages there don't need
 to be perfect.   The AFT test working in HEAD (which it does for me)
 should be enough to verify that the identical code in -10 works as well,
 no need to run extra tests there for no reason.   Of course if anyone
 else disagrees and wants to requests pullups of those, go ahead.
 
 Commit coming very soon.
 
 kre
 


Home | Main Index | Thread Index | Old Index