tech-net archive


hardware timestamping of packets



Dear all,

This summer I am implementing hardware timestamping of packets within
the NetBSD kernel as part of my Google Summer of Code project.
This is a feature that other operating systems (notably Linux) already
have and we are emulating their implementation to some extent.
However, there are a few points where the internal architecture of BSD
calls for different implementation decisions, and I would like to
brainstorm a bit with the readers of this list to see whether we
can find the most appropriate way to solve these issues for BSD.

First of all, most hardware cards that claim to do timestamping can
only timestamp a limited number of packets and use filters in order to
determine which packets to stamp.
I looked over the documentation of the Intel 82576 (the card I have in
the Dell servers here, which has timestamping support):
ftp://ftp.pku.edu.cn/open/net/Intel-NIC/doc/OpenSDM_82576-DS-2p0.pdf
It appears that the support that Intel developed for hardware
timestamping is concerned mostly with PTPv2 packets, PTPv2 being the
protocol for synchronizing clocks over the network. They are trying to
provide a synchronized clock in the hardware card which can be read by
the operating system. Looking at section 7.10 (the one that describes
time synchronization support and discusses timestamping) and section
8.17 (which describes the associated registers), it turns out that the
card does not return timestamps in the packet descriptors but in two
registers that have to be explicitly read. This is fine when dealing
with single packets, since we can extend the wm_rxreap receive
function to read those registers as well, but it does not help when
multiple packets arrive in the same interrupt. Intel is
mostly concerned with timestamping PTPv2 packets, which arrive quite
rarely, so they made sure that you can set a filter which restricts
timestamping to PTPv2 packets and makes the registers hold the
timestamp until it is read (it cannot be
overwritten). When the filter is disabled and all packets are
timestamped, the registers are no longer locked and the timestamp
always corresponds to the latest received packet. Therefore, if you
have multiple packets in one interrupt, you only get the hardware
timestamp of the last one. The only way to get around this (at least
for the Intel card) is to configure it so that most packet arrivals
raise an interrupt, but then capturing at line speed would probably
not be feasible.
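
To make the two register behaviours concrete, here is a minimal
software model of the latch as I understand it from the datasheet. It
is only a sketch and not driver code: the struct and function names
(rx_stamp_latch, latch_on_packet, latch_read) are hypothetical, and the
two 32-bit registers are folded into one 64-bit value for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the RX timestamp registers; not real driver code. */
struct rx_stamp_latch {
    uint64_t stamp;     /* the two 32-bit registers, combined */
    bool     valid;     /* a timestamp is waiting to be read */
    bool     filter_on; /* true: only matching packets are stamped and
                           the value is locked until it is read */
};

/* Called by the model once for every received packet. */
void latch_on_packet(struct rx_stamp_latch *l, uint64_t now,
                     bool matches_filter)
{
    assert(l != NULL);
    if (l->filter_on) {
        /* Locked mode: skip non-matching packets and never
           overwrite a timestamp that has not been read yet. */
        if (matches_filter && !l->valid) {
            l->stamp = now;
            l->valid = true;
        }
    } else {
        /* Unlocked mode: every packet overwrites the last stamp. */
        l->stamp = now;
        l->valid = true;
    }
}

/* Reading releases the latch. Returns false if nothing was latched. */
bool latch_read(struct rx_stamp_latch *l, uint64_t *out)
{
    assert(l != NULL && out != NULL);
    if (!l->valid)
        return false;
    *out = l->stamp;
    l->valid = false;
    return true;
}
```

With filter_on cleared, calling latch_on_packet three times before a
single latch_read models three packets arriving in one interrupt: only
the third timestamp survives.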

Another issue has to do with the interface when timestamping TX
packets. Timestamps are returned over the socket as ancillary data,
which means that they come with the associated packet. Of course, on
the transmit side there is no incoming packet, so what Linux does is
to copy the outgoing packet into a third queue associated with each
socket (the error queue, which seems to be Linux-specific) and attach the
timestamp as ancillary data to that. This extension is not impossible
to add to NetBSD; however, I wanted to check with you before adding a
third queue to the sockets. On Linux, the application passes a specific
flag (MSG_ERRQUEUE) to recvmsg to indicate that it wants a packet from
the error queue.
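
For reference, this is roughly what the Linux user-space side looks
like once recvmsg(fd, &msg, MSG_ERRQUEUE) has returned. The sketch
below only walks the ancillary data and copies out the timestamps; the
helper name find_tx_timestamps is my own, and I am assuming the Linux
layout of three struct timespec values per SCM_TIMESTAMPING message
(the last one carrying the raw hardware timestamp).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

/* On Linux the control message type equals the socket option value. */
#ifndef SCM_TIMESTAMPING
#define SCM_TIMESTAMPING SO_TIMESTAMPING
#endif

/*
 * Scan the ancillary data of a message that recvmsg() has already
 * filled in and copy out the three timestamps attached for
 * SO_TIMESTAMPING. Returns 1 if they were found, 0 otherwise.
 * (Hypothetical helper, not part of any library.)
 */
int find_tx_timestamps(struct msghdr *msg, struct timespec ts[3])
{
    struct cmsghdr *cm;

    assert(msg != NULL && ts != NULL);
    for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
        if (cm->cmsg_level == SOL_SOCKET &&
            cm->cmsg_type == SCM_TIMESTAMPING &&
            cm->cmsg_len >= CMSG_LEN(3 * sizeof(struct timespec))) {
            /* memcpy avoids alignment assumptions about CMSG_DATA. */
            memcpy(ts, CMSG_DATA(cm), 3 * sizeof(struct timespec));
            return 1;
        }
    }
    return 0;
}
```

The point for NetBSD is that the application needs some message to
carry this ancillary data on the transmit side, which is why Linux
loops a copy of the outgoing packet back through the error queue.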

The third issue has to do with passing the timestamps associated with
packets from the driver level up to the socket level, where they can be
turned into ancillary data to be returned to the user.
Linux stores all timestamps in the skbuff structure, which is quite a
heavy structure. By contrast, mbufs are considerably lighter and we
do not really wish to alter their structure, since they are
constructed to fit in certain memory page sizes along with their
payloads. Therefore the question arises as to what would be the best
way to associate timestamp information with received packets in the
context of BSD. The closest thing that comes to mind is the
timestamping of bpf-captured packets, but bpf does not return packets
over a socket and does not really preserve the mbufs after this step;
it uses a buffer of its own to pass the packets on.
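
One possible direction, sketched here purely as a userland model, is
to hang the timestamp off the packet as a separately allocated tag, so
that the fixed-size header stays untouched. All names below (pkt,
pkt_tag, pkt_set_hwstamp and so on) are hypothetical and none of them
are kernel APIs; NetBSD's mbuf tags (m_tag(9)) work along similar
lines, but whether they are cheap enough on the receive path is
exactly the kind of thing I would like opinions on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userland model; none of these names are kernel APIs. */
#define PKT_TAG_HWSTAMP 1

struct pkt_tag {
    struct pkt_tag *next;
    int             type;
    uint64_t        hwstamp_ns; /* hardware timestamp in nanoseconds */
};

struct pkt {
    struct pkt_tag *tags; /* single pointer; tags live outside the
                             fixed-size packet header */
};

/* Driver side: attach a timestamp. Returns 0 on success, -1 if the
   allocation fails. */
int pkt_set_hwstamp(struct pkt *p, uint64_t ns)
{
    struct pkt_tag *t;

    assert(p != NULL);
    t = malloc(sizeof(*t));
    if (t == NULL)
        return -1;
    t->type = PKT_TAG_HWSTAMP;
    t->hwstamp_ns = ns;
    t->next = p->tags;
    p->tags = t;
    return 0;
}

/* Socket side: look the timestamp up. Returns 1 if one is attached. */
int pkt_get_hwstamp(const struct pkt *p, uint64_t *ns)
{
    const struct pkt_tag *t;

    assert(p != NULL && ns != NULL);
    for (t = p->tags; t != NULL; t = t->next) {
        if (t->type == PKT_TAG_HWSTAMP) {
            *ns = t->hwstamp_ns;
            return 1;
        }
    }
    return 0;
}

/* Release every tag when the packet is freed. */
void pkt_free_tags(struct pkt *p)
{
    assert(p != NULL);
    while (p->tags != NULL) {
        struct pkt_tag *t = p->tags;
        p->tags = t->next;
        free(t);
    }
}
```

The trade-off in this model is one extra allocation per timestamped
packet in exchange for leaving the packet header layout alone.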

Regards,
Vlad

