Re: packet timestamps (was Re: Changes to make /dev/*random better sooner)
On Wed, Apr 09, 2014 at 04:36:26PM -0700, Dennis Ferguson wrote:
> What I would like to do is to make per-packet timestamps (which you
> are doing already) more widely useful by moving where the timestamp is
> taken closer to the input interrupt which signals a packet delivery
> and carrying it in each packet's mbuf metadata. This has a nice effect
> on the rare occasion that the packet is delivered to a socket with a
> consumer which cares about packet arrival times (like an NTP or IEEE 1588
> implementation), but it can also provide a new measure of the performance
> of the network code when making changes to it (how long did it take packets
> to get from the input interface to the socket, or from the input interface
> to the output interface?) which doesn't exist now. In fact it would be
> nice to have it right now so the effect of the changes you are making
> could be measured rather than just speculated about. I was also imagining that
> the random number machinery would harvest timestamps from the mbufs,
> but maybe only after it is determined whether the timestamp is one a user
> is interested in, so that it doesn't use those.
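For concreteness, the shape of what Dennis describes above is roughly the
following. This is only a sketch: the ph_rxstamp pkthdr field and
read_cycle_counter() are names made up for illustration, not existing
NetBSD interfaces.

/*
 * Stamp each received packet as close to the interrupt as possible and
 * let interested consumers (NTP, IEEE 1588, the entropy pool) read the
 * stamp back later.  ph_rxstamp and read_cycle_counter() are
 * hypothetical.
 */
#include <sys/types.h>
#include <sys/mbuf.h>

uint64_t read_cycle_counter(void);	/* stand-in for a per-port counter read */

struct pkt_rxstamp {
	uint64_t prs_cycles;		/* raw CPU cycle count at receive */
};

/* Driver receive-interrupt path, before the mbuf is enqueued: */
static inline void
pkt_stamp_rx(struct mbuf *m)
{
	m->m_pkthdr.ph_rxstamp.prs_cycles = read_cycle_counter();
}

/* Socket layer (or the entropy harvester), when the stamp is wanted: */
static inline uint64_t
pkt_rxstamp_cycles(const struct mbuf *m)
{
	return m->m_pkthdr.ph_rxstamp.prs_cycles;
}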
FWIW, based on a suggestion by Dennis, in August I added a timestamp
capability to MBUFTRACE for my use at $DAYJOB. MBUFTRACE thus enhanced
shows me how long it takes (cycles maximum, cycles average) for packets
to make ownership transitions: e.g., from IP queue to wm0 transmission
queue. This has been useful for finding out where to concentrate my
optimization efforts, and it has helped to rule hypotheses about
networking bugs in or out. Thanks, Dennis.
Here and there I have also fixed an MBUFTRACE bug, and I have made some
changes designed to reduce counters' cache footprint. I call my variant
of MBUFTRACE, MBUFTRACE3. I hope to feed MBUFTRACE3 back one of these
days.
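Very roughly, the bookkeeping amounts to this: every claim of an mbuf is
stamped with the cycle counter, and at the next claim the difference is
folded into a per-(previous owner, new owner) record. A sketch only; the
names below are made up, not the actual MBUFTRACE3 structures:

#include <sys/types.h>

uint64_t read_cycle_counter(void);	/* stand-in for the cycle counter */

/* Per-(previous owner, new owner) transition statistics. */
struct mowner_transition {
	uint64_t mt_count;		/* number of transitions observed */
	uint64_t mt_cycles;		/* total cycles across all transitions */
	uint64_t mt_maxcycles;		/* worst single transition */
};

/*
 * Called when a new owner claims an mbuf; last_claim is the cycle count
 * recorded when the previous owner claimed it.
 */
static void
mowner_transition_record(struct mowner_transition *mt, uint64_t last_claim)
{
	uint64_t now = read_cycle_counter();
	uint64_t delta = now - last_claim;

	mt->mt_count++;
	mt->mt_cycles += delta;
	if (delta > mt->mt_maxcycles)
		mt->mt_maxcycles = delta;
}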
Here is a sample of 'netstat -ssm' output with MBUFTRACE3 operating on a
box carrying only "background" levels of traffic. There are two tables:
the first you will recognize; the second is new:
                         small      ext  cluster
unix inuse                   3        1        1
arp hold inuse               8        0        0
wm8 rx ring inuse         1024     1024     1024
wm7 rx ring inuse         1024     1024     1024
wm6 rx ring inuse         1024     1024     1024
wm5 rx ring inuse         1024     1024     1024
wm4 rx ring inuse         1024     1024     1024
wm3 rx ring inuse         1024     1024     1024
wm2 rx ring inuse         1024     1024     1024
wm1 rx ring inuse         1024     1024     1024
wm0 rx ring inuse         1024     1024     1024
unknown data inuse       34802    34802        0
revoked inuse          1273302    63033    22922
microsecs/tr        max  # transitions  previous owner -> new owner
           6         17          2,068  wm8 cpu5 rxq -> wm8 rx
           8         27            199  udp rx -> revoked
           9     54,719         10,772  defer arp_deferral -> revoked
          13         26             82  route -> revoked
          70  1,627,735        835,683  unix -> revoked
         241    487,685          3,977  ixg1 rx -> arp
         260        260              1  arp hold -> ixg0 tx
       1,410    456,389            772  ixg0 rx -> arp
      22,296  6,712,491          2,082  ixg1 tx -> revoked
     315,846  6,293,761            136  ixg0 tx -> revoked
     516,585    709,193             13  wm4 tx ring -> revoked
The table shows microseconds, but the kernel only records cycles: netstat
reads the CPU frequency and the average and maximum cycles per transition
from the kernel and does the arithmetic. I'm using the CPU cycle counter
to make all of the timestamps, and I'm not compensating for clock drift
or anything of that sort.
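The arithmetic is trivial; something like the following, where cpu_hz is
the frequency netstat reads from the kernel (a sketch, not netstat's
actual code):

#include <stdint.h>

static double
cycles_to_usec(uint64_t cycles, uint64_t cpu_hz)
{
	/* e.g. 2,800,000 cycles on a 2.8 GHz CPU is 1,000 microseconds */
	return (double)cycles * 1e6 / (double)cpu_hz;
}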
A more suitable display for this information than a table is *probably*
a directed graph with a vertex corresponding to each mbuf owner, and
an edge corresponding to each owner->owner transition. Set the area
of each vertex in proportion to the number of mbufs in use by the
corresponding
owner, and set the width of each edge in proportion to the rate of
transitions. Label each vertex with an mbuf-owner name. Graphs for
normal/high-performance and abnormal/low-performance machines will have
distinct patterns, and the graph will help to illuminate bottlenecks.
If anyone is interested in programming this, let me know, and I will
describe in more detail what I have in mind.
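To give a sense of what I have in mind, here is a sketch of one way to
render it: emit Graphviz "dot" from the per-owner and per-transition
records and run the output through dot(1). The struct layouts below are
placeholders for whatever netstat would read from the kernel; none of
this exists today.

#include <math.h>
#include <stdint.h>
#include <stdio.h>

struct owner { const char *name; uint64_t inuse; };
struct transition { size_t from, to; uint64_t rate; };

static void
emit_dot(const struct owner *o, size_t no,
    const struct transition *t, size_t nt)
{
	printf("digraph mbufs {\n");
	/* vertex area ~ mbufs in use, so vertex width/height ~ sqrt(inuse) */
	for (size_t i = 0; i < no; i++) {
		double side = sqrt((double)o[i].inuse + 1) / 8.0;
		printf("\t\"%s\" [fixedsize=true, width=%.2f, height=%.2f];\n",
		    o[i].name, side, side);
	}
	/* edge width ~ transition rate; the scale factor is arbitrary */
	for (size_t i = 0; i < nt; i++)
		printf("\t\"%s\" -> \"%s\" [penwidth=%.2f];\n",
		    o[t[i].from].name, o[t[i].to].name,
		    1.0 + (double)t[i].rate / 1000.0);
	printf("}\n");
}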
Dave
--
David Young
dyoung%pobox.com@localhost Urbana, IL (217) 721-9981