Re: if_txtimer API to replace (*if_watchdog)()

To: Taylor R Campbell <campbell+netbsd-tech-net%mumble.net@localhost>
Subject: Re: if_txtimer API to replace (*if_watchdog)()
From: Jason Thorpe <thorpej%me.com@localhost>
Date: Fri, 24 Jan 2020 06:57:03 -0800
> On Jan 24, 2020, at 12:27 AM, Taylor R Campbell <campbell+netbsd-tech-net%mumble.net@localhost> wrote:
> 
>> 2- if_timer does not have a way to model multiple queues.
> 
> Does if_txtimer have a way to model multiple queues?  Is it not just
> driver-specific logic to handle sc->sc_txq[i].txq_lastsent for each
> queue independently like wm(4)?

if_txtimer is logically associated with a transmit queue, so you would have one if_txtimer per transmit queue.  This is what wm(4) does.  (I think I mentioned originally that if_txtimer is essentially the wm(4) driver's transmit timer logic wrapped up in a small API that makes it easy to adopt in other drivers with minimal changes.)

>> 3- There should be one basic way to modeling the transmit watchdog
>> timer, rather than N.  (In the examples you cited above, N is
>> definitely > 1.)
> 
> If we just replaced if_slowtimo by an explicit callout in each driver
> just like wm_tick, would you still consider that to be >1 model?

If each driver used a callout per transmit queue, then it would be a single model.  I noted in my last message that callout would address all of the same issues as if_txtimer, but that I had other reasons for not using it.

>> Callouts are a bit more expensive... to arm them or introspect them
>> requires taking an additional lock **shared by other callouts** that
>> could be a source of lock contention;
> 
> The lock is normally the per-CPU lock of the CPU where it was last
> scheduled, so that should be pretty cheap unless something is going
> seriously wrong with callouts and/or the driver.

With these timers, in callout's terminology, it will be extremely common (like maybe 99.999%-100% of the time) for them to be "pending", and extremely rare for them to be "invoking" or "expired".  In a high-traffic scenario, this means that they will be almost always assigned to a CPU.  My concern is not necessarily other callouts within the driver causing contention, but other callouts in the system that might happen to also be assigned to that CPU.

Perhaps I'm engaged in premature optimization, but I wanted to avoid having to use *any* additional locks other than what the driver would already be using internally.

>> This could make a difference when you're trying to blast out packets
>> at 40Gb/s.
> 
> Which drivers (a) are for hardware devices that can handle >>1GB/sec,
> and (b) use if_watchdog -- i.e., which high-throughput drivers would
> take advantage of the callout-based mechanism inside your proposed
> if_txtimer?

dge(4).

>> Callouts also have a larger memory footprint ... not hugely so, mind
>> you, but each one is 10 pointers large; my proposal uses a bit less
>> memory than callouts... nearly a wash if you consider the fields
>> that can be removed from struct ifnet when the conversion of every
>> driver is complete.
> 
> If we break the drivers into two classes:
> 
> 1. Those that currently use if_watchdog.
>   - Status quo: There already is exactly 1 callout per device.
>   - With callouts: There would still be exactly 1 callout per device,
>     but in struct foo_softc instead of struct ifnet.
>   - With if_txtimer: There would still be exactly 1 callout per
>     device, but in struct if_txtimer instead of struct ifnet.
> 
> 2. Those that currently do not use if_watchdog and do their own thing.
>   - Status quo: There already are (1 + n) callouts per device, where
>     n is however many the driver allocates to do its own thing.
>   - With callouts: There would be n callouts per device because we
>     get rid of if_slowtimo.
>   - With if_txtimer: There would be n callouts per device because we
>     get rid of if_slowtimo.
> 
> I don't see a difference.  Am I missing something?

This is not actually the real breakdown.  The current situation is:

- If a driver has a non-NULL (*if_slowtimo)() (i.e. (*if_watchdog)()) function pointer, that ifnet structure will have a callout that fires 1x/sec to implement if_slowtimo (note the use of a callout for this purpose is different from the original BSD implementation).  The driver does not have access to this 1x/sec callout; it is only called back via (*if_slowtimo)() iff when if_timer transitions from 1 -> 0.

- Many network interface drivers have their own independent 1x/sec "tick" callout to implement link maintenance (this was introduced when mii(4) was added), stats gathering, etc.

So, the truth is that several drivers today actually have 2 -- if_slowtimo and "tick".  The wm(4) driver has 1 (it has a NULL (*if_slowtimo)(), and provides a "tick" callout for its own use).

My proposal keeps wm(4) at 1 (its own "tick" callout) (because it simply encapsulates the wm(4) approach into an API).  Using bare callouts to implement the tx watchdog timer would mean wm(4) could have up to 17 ("tick" + one callout per hardware tx queue).

The reason that wm(4) only has a single callout despite having up to 16 hardware tx queues is because in its "tick" callout, it checks all tx queue timers for expiration in a batch.  My intent with if_txtimer is that drivers that have multiple hardware tx queues would use the wm(4) driver's pattern.

My addition of if_txtimer_callout is meant to simplify the "hardware only has 1 tx queue, and driver does not already have a tick callout" case.  I.e. it allows this class of simpler driver (perhaps those used for legacy chips, e.g. if_sq.c) to use the new NET_MPSAFE mechanism without having to add their own "tick" callout directly.

>> Tx transmit timers have no need for precision ... 1 second
>> granularity is totally sufficient, and many drivers are already
>> using a 1-second tick timer for link maintenance / stats updating;
>> my proposal piggy-backs on that.  The cost is that you do some work
>> once a second.
> 
> I don't understand how this makes a difference between just using
> callouts directly, and hiding callouts in an API that uses them on
> request.  Can you elaborate?

The trade-off I am making here is "arming the timer in the driver transmit hot path is super cheap at the cost of checking it once a second" vs. "arming the timer in the driver transmit hot path a little more expensive, but you don't have to check it periodically once a second".

Maybe you're missing that if_txtimer_callout fires once a second, not when the transmit timer actually expires.

>> The intent of if_txtimer_callout is to essentially provide a the
>> callout-backed "tick" facility to drive if_txtimer to drivers that
>> wouldn't otherwise need that facility for their own purposes.  It's
>> a set of convenience functions to reduce code duplication.
> 
> Can you show what code is deduplicated this way?
> 
> The diff has +102 -82, even if I ignore the additions in net/if.h and
> net/if.c.  Some + is to be expected by pushing stuff from struct ifnet
> optionally into struct foo_softc, but I read through the patch and I
> don't see anything that looks plausibly like deduplication, other than
> perhaps the expression in if_txtimer_expired_explicit in wm(4).  The
> comment and whitespace density seems to be about the same before and
> after, so that seems like it's increasing the amount of code _and_ the
> API complexity.

My comment there was directed at if_txtimer_callout ... again, that is a convenience flavor of the API meant to prevent a driver that does not already provide a "tick" callout from having to add one.  Right now, there is a shared "slowtimo" facility that all drivers can leverage (it's just that it happens to be a bad mechanism).  Eventually, that will be removed.

Increasing API complexity is inevitable because MP-safe APIs are more complex than naive ones.  I am, however, attempting to minimize the complexity and still make the MP-safe mechanism easy to adopt universally (even for platforms that don't actually need it).

>>> For example, will any driver call if_txtimer_start
>>> without actually knowing up front whether it needs to use the callout?
>>> Will any driver call if_txtimer_stop in a code path that would not be
>>> adequately served either by if_txtimer_disarm, or by an unconditional
>>> if_txtimer_disarm & callout_stop?
>> 
>> The rules are pretty simple:
> 
> My questions weren't about the rules for use.  My questions were:
> 
> 1. Will any driver call if_txtimer_start without actually knowing up
>   front whether it needs to use the callout?

No.  if_txtimer_start() is meant only for drivers that don't provide their own "tick" ... i.e. it starts the "tick" that's hidden inside the API.

> 2. Will any driver call if_txtimer_stop in a code path that would not
>   be adequately served either by if_txtimer_disarm, or by an
>   unconditional if_txtimer_disarm & callout_stop?

if_txtimer_stop() stops the "tick" that's hidden inside the API.  Drivers that provide their own "tick" won't use it.  It's meant to be used inside the driver routine that brings the interface down.  It is a convenience wrapper around callout_stop() for drivers that don't already have their own callout_stop() call in that location.

In if_txtimer_disarm() is a lightweight function that can be used inside there driver's tx interrupt routine (perhaps a hot path) that means the equivalent of "ifp->if_timer = 0;".

> That is, will there be any code paths which meaningfully take
> advantage of having a run-time conditional?  If yes, can you give an
> example?  If no, why do we need a replica of the callout API spelled
> differently in if_txtimer?

It's not a replica.  It's a wrapper, and it's meant to be used by drivers that themselves don't already use a periodic callout directly.

> In the diff you showed, the answers to both questions are no: wm(4)
> and pcn(4) _know_ they don't use the if_txtimer callout, so they don't
> need if_txtimer_start/stop at all; sq(4) _knows_ it uses the
> if_txtimer callout, so it could just as well use callout_schedule and
> callout_stop instead of if_txtimer_start/stop.
> 
> This suggests to me that the if_txtimer API (at least, the version
> with optional callouts) is confounding two different purposes and
> would make it harder, not easier, to understand the drivers -- it uses
> one name, but there are really two disparate usage models.

No, they're not disparate usage models at all, because you'll notice that the way those drivers use if_txtimer_arm() and if_txtimer_disarm() are equivalent.

The difference is really one just of setup.

-- thorpej
Follow-Ups:
- Re: if_txtimer API to replace (*if_watchdog)()
  - From: Taylor R Campbell
- Re: if_txtimer API to replace (*if_watchdog)()
  - From: Jason Thorpe
References:
- Re: if_txtimer API to replace (*if_watchdog)()
  - From: Taylor R Campbell
Prev by Date: Re: if_txtimer API to replace (*if_watchdog)()
Next by Date: Re: if_txtimer API to replace (*if_watchdog)()
Previous by Thread: Re: if_txtimer API to replace (*if_watchdog)()
Next by Thread: Re: if_txtimer API to replace (*if_watchdog)()
Indexes:
Home | Main Index | Thread Index | Old Index