tech-kern archive


Re: packet timestamps (was Re: Changes to make /dev/*random better sooner)

On Wed, Apr 09, 2014 at 04:36:26PM -0700, Dennis Ferguson wrote:
> What I would like to do is to make per-packet timestamps (which you
> are doing already) more widely useful by moving where the timestamp is
> taken closer to the input interrupt which signals a packet delivery
> and carrying it in each packet's mbuf metadata.  This has a nice effect
> on the rare occasion that the packet is delivered to a socket with a
> consumer which cares about packet arrival times (like an NTP or IEEE 1588
> implementation), but it can also provide a new measure of the performance
> of the network code when making changes to it (how long did it take packets
> to get from the input interface to the socket, or from the input interface
> to the output interface?) which doesn't exist now.  In fact it would be
> nice to have it right now so the effect of the changes you are making
> could be measured instead of just speculating.  I was also imagining that
> the random number machinery would harvest timestamps from the mbufs,
> but maybe only after it is determined if the timestamp is one a user
> is interested in so it didn't use those.

FWIW, based on a suggestion by Dennis, in August I added a timestamp
capability to MBUFTRACE for my use at $DAYJOB.  MBUFTRACE thus enhanced
shows me how long it takes (cycles maximum, cycles average) for packets
to make ownership transitions: e.g., from IP queue to wm0 transmission
queue.  This has been useful for finding out where to concentrate my
optimization efforts, and it has helped to rule in or rule out hypotheses
about networking bugs.  Thanks, Dennis.
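
A rough sketch of the kind of per-transition bookkeeping described above
(count, total and maximum cycles for one "previous owner -> new owner"
pair) follows.  The struct and function names are hypothetical and are
not taken from MBUFTRACE or MBUFTRACE3; the only assumptions are that
each mbuf carries the cycle count at which its current owner took it
and that the cycle counter does not go backwards.

```c
#include <stdint.h>

/* Hypothetical counters for one "previous owner -> new owner" pair. */
struct transition_stats {
	uint64_t count;		/* number of transitions observed */
	uint64_t total_cycles;	/* sum of cycles spent with the previous owner */
	uint64_t max_cycles;	/* longest single stay with the previous owner */
};

/*
 * Record one ownership transition.  t_acquired is the cycle count
 * stamped on the mbuf when the previous owner took it; t_now is the
 * cycle count at the moment of the hand-off.
 */
static void
transition_record(struct transition_stats *ts, uint64_t t_acquired,
    uint64_t t_now)
{
	uint64_t elapsed = t_now - t_acquired;

	ts->count++;
	ts->total_cycles += elapsed;
	if (elapsed > ts->max_cycles)
		ts->max_cycles = elapsed;
}

/* Average cycles per transition, or 0 if none have been recorded. */
static uint64_t
transition_average(const struct transition_stats *ts)
{
	return ts->count != 0 ? ts->total_cycles / ts->count : 0;
}
```

Keeping only a count, a sum and a maximum per pair keeps each record
small, which matters if the counters are meant to stay cache-friendly.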

Here and there I have also fixed an MBUFTRACE bug, and I have made some
changes designed to reduce counters' cache footprint.  I call my variant
MBUFTRACE3, and I hope to feed it back one of these days.

Here is a sample of 'netstat -ssm' output when MBUFTRACE3 is operating
on a box with "background" levels of traffic---there are two tables:
the first you will recognize, and the second is new:

                                             small        ext    cluster
            unix               inuse             3          1          1
             arp hold          inuse             8          0          0
             wm8 rx ring       inuse          1024       1024       1024
             wm7 rx ring       inuse          1024       1024       1024
             wm6 rx ring       inuse          1024       1024       1024
             wm5 rx ring       inuse          1024       1024       1024
             wm4 rx ring       inuse          1024       1024       1024
             wm3 rx ring       inuse          1024       1024       1024
             wm2 rx ring       inuse          1024       1024       1024
             wm1 rx ring       inuse          1024       1024       1024
             wm0 rx ring       inuse          1024       1024       1024
         unknown data          inuse         34802      34802          0
         revoked               inuse       1273302      63033      22922
  avg usec/tr   max usec/tr # transitions   previous owner -> new owner
            6            17         2,068     wm8 cpu5 rxq -> wm8 rx          
            8            27           199           udp rx -> revoked         
            9        54,719        10,772 defer arp_deferral -> revoked         
           13            26            82           route  -> revoked         
           70     1,627,735       835,683            unix  -> revoked         
          241       487,685         3,977          ixg1 rx -> arp             
          260           260             1         arp hold -> ixg0 tx         
        1,410       456,389           772          ixg0 rx -> arp             
       22,296     6,712,491         2,082          ixg1 tx -> revoked         
      315,846     6,293,761           136          ixg0 tx -> revoked         
      516,585       709,193            13      wm4 tx ring -> revoked         

The table shows microseconds, but the kernel counts cycles: netstat reads
the CPU frequency and the average and maximum cycles/transition from the
kernel and does the arithmetic.  I'm using the CPU cycle counter to make
all of the timestamps.  I'm not compensating for clock drift or anything.
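
For illustration, the cycles-to-microseconds arithmetic might look like
the sketch below.  cycles_to_usec() is a hypothetical helper, not
netstat's actual code, and it assumes a single fixed CPU frequency in
Hz with no drift compensation, as described above.

```c
#include <stdint.h>

/*
 * Convert a cycle count to whole microseconds (rounding down) given
 * the CPU frequency in Hz.  Splitting the count into whole seconds
 * and a remainder keeps the multiplication by 1000000 from
 * overflowing 64 bits for large cycle counts.
 */
static uint64_t
cycles_to_usec(uint64_t cycles, uint64_t cpu_hz)
{
	uint64_t secs, rem;

	if (cpu_hz == 0)
		return 0;	/* frequency unknown: nothing sensible to report */

	secs = cycles / cpu_hz;
	rem = cycles % cpu_hz;

	return secs * 1000000 + (rem * 1000000) / cpu_hz;
}
```

On a 3 GHz CPU, for example, 3000 cycles comes out as 1 microsecond.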

A more suitable display for this information than a table is *probably*
a directed graph with a vertex corresponding to each mbuf owner, and
an edge corresponding to each owner->owner transition.  Set the area
of each vertex in proportion to the number of mbufs in use by the
corresponding owner, and set the width of each edge in proportion to the
rate of transitions.  Label each vertex with an mbuf-owner name.  Graphs for
normal/high-performance and abnormal/low-performance machines will have
distinct patterns, and the graph will help to illuminate bottlenecks.
If anyone is interested in programming this, let me know, and I will
describe in more detail what I have in mind.
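
One possible starting point, offered only as a sketch, is to emit
Graphviz DOT text and let dot(1) do the layout.  Everything below is an
assumption rather than part of MBUFTRACE3: the struct layouts, the
scaling constants, and the choice of fixed-height boxes whose width
scales with the in-use count (so that area stays proportional to it).
The rate field is assumed to be a transition count divided by the
sampling interval.

```c
#include <stdio.h>

/* Hypothetical inputs; field names are illustrative only. */
struct owner {
	const char *name;	/* mbuf-owner name, assumed to contain no '"' */
	unsigned long inuse;	/* mbufs currently held by this owner */
};

struct transition {
	size_t from, to;	/* indices into the owner array */
	double rate;		/* transitions per second */
};

#define BOX_HEIGHT_IN	0.5	/* fixed height, so area scales with width */
#define MAX_WIDTH_IN	4.0
#define MIN_WIDTH_IN	0.1
#define MAX_PENWIDTH	8.0
#define MIN_PENWIDTH	0.5

/* Scale value linearly so that max_value maps to hi; clamp below at lo. */
static double
scale(double value, double max_value, double lo, double hi)
{
	double v;

	if (max_value <= 0.0)
		return lo;
	v = hi * value / max_value;
	return v < lo ? lo : v;
}

/* Write one digraph: a box per owner, an edge per transition. */
static void
emit_dot(FILE *out, const struct owner *owners, size_t nowners,
    const struct transition *trans, size_t ntrans)
{
	unsigned long max_inuse = 0;
	double max_rate = 0.0;
	size_t i;

	for (i = 0; i < nowners; i++)
		if (owners[i].inuse > max_inuse)
			max_inuse = owners[i].inuse;
	for (i = 0; i < ntrans; i++)
		if (trans[i].rate > max_rate)
			max_rate = trans[i].rate;

	fprintf(out, "digraph mbuf_owners {\n");
	fprintf(out, "\tnode [shape=box, fixedsize=true, height=%.2f];\n",
	    BOX_HEIGHT_IN);
	for (i = 0; i < nowners; i++)
		fprintf(out, "\tn%zu [label=\"%s\", width=%.2f];\n", i,
		    owners[i].name, scale((double)owners[i].inuse,
		    (double)max_inuse, MIN_WIDTH_IN, MAX_WIDTH_IN));
	for (i = 0; i < ntrans; i++)
		fprintf(out, "\tn%zu -> n%zu [penwidth=%.2f];\n",
		    trans[i].from, trans[i].to, scale(trans[i].rate,
		    max_rate, MIN_PENWIDTH, MAX_PENWIDTH));
	fprintf(out, "}\n");
}
```

Calling emit_dot(stdout, ...) and piping the result through
"dot -Tsvg" should give a first picture.  Linear scaling will squash
small owners when one owner dominates, and narrow boxes will not fit
their labels, so a real tool would likely want something smarter.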


David Young    Urbana, IL    (217) 721-9981
