Subject: Fix for PR#20390 - callwheel corruption related to TCP
To: None <tech-kern@netbsd.org>
From: Havard Eidnes <he@netbsd.org>
List: tech-kern
Date: 07/18/2003 16:20:46
Hi,

I'd like to direct your attention to PR#20390.  This is a race
condition (also on uniprocessor systems) related to TCP's use of the
callout facility, and the root cause is that in the following snippet
of code from sys/kern_timeout.c

 	CALLOUT_UNLOCK(s);
 	(*func)(arg);
 	CALLOUT_LOCK(s);

there is a window between CALLOUT_UNLOCK() (which lowers priority) and
the callout handler being called and getting to doing it's own spl()-
based exclusion which will allow other code to run.  The notes added
to the PR contains further explanation of what might happen.

Short-term I am leaning towards getting the patch in the PR committed
and pulled up to the netbsd-1-6 branch (yes, the problem is present
there as well...), so that we can get mitigation of the problem in
place while the right solution to this problem can be found and
implemented.

In the private review I've had so far, two suggestions have been made
for how to properly deal with this problem.  They are:

1) add an argument to the functions which establish a callout,
   indicating which IPL they would like to be invoked at, and instead
   of dropping all interrupt priority, lower priority to the
   registered IPL instead, a'la:

    s =3D splclock();
    /* get callout structure */
    spllower(co->co_ipl);
    (*co->co_func)(...)
    splx(s);

   Now, spl(9) does not document spllower(), and even though the i386
   port has that function available, other ports are missing this
   operation.

   As a sanity check, the callout establish function should possibly
   also make sure that noone try to register a callout handler with an
   IPL higher than what the callout code runs on, but presently there
   is no suitable comparison primitive for IPL levels (and it's not
   given that there can be (?)).

   I don't know whether it is possible to implement the spllower()
   primitive on all ports, and could use some guidance (and if it's
   possible, some assistance) here.

   This seems to me to be a pretty self-contained fix, which does not
   require extensive changes to the places where the callout code is
   being used.

2) make callout_stop() return an int (it's a void function at present)
   indicating whether the function could not stop a callout because
   it's in the process of being invoked, and add code the places where
   callout_stop() is used to handle this situation.

   This looks in some sense similar to the patch in the PR, in that it
   would require the cooperation of the callout handlers in doing
   "callout_ack()" once they have entered their spl()-protected
   region, and it would also require code to deal with the failure of
   callout_stop().

As might be evident from the above, I'm leaning in the direction of
the first solution if that is in fact possible to do.

As for the netbsd-1-6 branch, I suspect that both of these more
comprehensive solutions would require too many changes to make it a
good fit for pull-up, so I think we'll have to live with the
workaround patch for the TCP code on that branch for the moment.

Comments?

Best regards,

- H=E5vard