NetBSD-Bugs archive


Re: kern/53562: bridge(4) breaks segmentation / TX checksum offloading

Hi, sorry for the delay in the response.

On 2018/09/10 16:34, Masanobu SAITOH wrote:
On 2018/09/08 8:50, Rin Okuyama wrote:
  The following would be a simplest
  example where your patch does not work:
  - NetBSD/amd64-current with wm0 connected to LAN
  - qemu-3.0.0nb2 installed from pkgsrc
  - setup tap and bridge as follows:
       # ifconfig tap0 create up
       # ifconfig bridge0 create
       # brconfig bridge0 add wm0 add tap0 up
  - run NetBSD/amd64 8.0 on QEMU with tap enabled:
       # qemu-system-x86_64 -m 128m -cdrom NetBSD-8.0-amd64.iso -boot d \
         -display curses -net tap,fd=3 -net nic 3<>/dev/tap0
  Then, the virtual host can send and receive small packets, e.g.,
  ping works well regardless of whether TSO is enabled or not for wm0.
[quotation added by RO from the previous message of mine]
  However, it cannot receive larger packets when TSO is enabled, e.g.,
  file cannot be retrieved through ftp. If TSO is turned off, everything
works fine.
[quotation done]

  Is this "wm0" on the guest or the host?
  If my understanding is correct, writing packet via tap on host is not related
  to TSO...

Sorry for bothering you so many times. This "wm0" is on the host. Also,
in this example, the guest fails to receive data from the ftp server
running on the host. I had forgotten to test servers other than the host.
Those work fine; only connecting from the guest to the host fails.

  If you're using wm(4) on the guest, please try vioif(4) and see if the problem
occurs or not.

I tried vioif(4) on the guest but the situation does not change.

  For checksum offloading, the following kernel options will help for debugging:

  These options add some event counters:

  wm(4) has a lot of event counters, some of them are useful for debugging.
  To use them add "options WM_EVENT_COUNTERS" to your kernel config file.

Thanks! I enabled these options.

  For TSO, current wm(4) is not perfect because it doesn't use m_defrag().
  I have a not-yet-committed patch:

% vmstat -ev | grep toomany
wm0 txq00txtoomanyseg                                      0    0 misc

  If "wmX txqYYtxtoomanyseg" is increasing, the above diff would fix the

I tried if_wm.c r1.587, which already contains this patch. The counter
"wmX txqYYtxtoomanyseg" did not increase.

I examined how packets flow in the example above, from the ftp server
on host (wm0 on host) to the ftp client on guest (tap0 on host ->
vioif0 on guest):

-> ip_output()
-> ip_if_output()
-> if_output_lock()
-> ifp->if_output = ether_output()
-> bridge_output()
-> bridge_enqueue()

When TSO4 is enabled for wm0 on the host, tcp_output() attempts to send
packets larger than the MTU. If the destination interface is wm0, i.e.,
dst_ifp == ifp in bridge_output(), there's no problem with your patch
for if_bridge.c; packets are segmented automatically by the HW. However,
in this case, the destination is tap0, and bridge_enqueue() sends packets
larger than the MTU to tap0. This is, of course, illegal, since tap0 has
no HW to segment the packets. The guest on QEMU, therefore, cannot
receive them. This is how the failure occurs in my QEMU example above.
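
To make the size mismatch concrete, here is a minimal sketch in plain C. It
is not kernel code; the helper names are hypothetical, and the MSS and MTU
values in the note below are only typical Ethernet numbers that I assume
for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of TCP segments needed to carry
 * payload_len bytes when each segment holds at most mss bytes.
 */
static size_t
segments_needed(size_t payload_len, size_t mss)
{
	assert(mss > 0);
	return (payload_len + mss - 1) / mss;	/* round up */
}

/*
 * Hypothetical helper: nonzero if an IP packet of ip_len bytes can be
 * sent on an interface with the given MTU without being segmented.
 */
static int
fits_mtu(size_t ip_len, size_t mtu)
{
	return ip_len <= mtu;
}
```

With an assumed MSS of 1460 and MTU of 1500, a 32768-byte payload handed
down as one TSO packet needs segments_needed(32768, 1460) == 23 segments,
and fits_mtu(32768 + 40, 1500) is 0, so someone has to split it before it
reaches tap0.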

Similar problems occur when any of the other TX offload options, TSO6,
IP4CSUM-TX, TCPnCSUM-TX, or UDPnCSUM-TX, is enabled. For example,
consider the case of IP4CSUM-TX. When this option is enabled for wm0
on the host, checksumming is omitted in ip_output(). Then, when the
destination is tap0, packets with a wrong checksum are sent.
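
For reference, the work that is skipped in the IP4CSUM-TX case is the
standard Internet checksum (RFC 1071) over the IPv4 header. The following
is a standalone sketch, not the kernel's implementation; the function name
inet_cksum is my own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Internet checksum (RFC 1071) over len bytes. The buffer is read as
 * big-endian 16-bit words; the result is the value to store in the
 * checksum field, high byte first. Intended for header-sized inputs.
 */
static uint16_t
inet_cksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)				/* odd trailing byte */
		sum += (uint32_t)buf[i] << 8;
	while (sum >> 16)			/* fold carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Computed over a header whose checksum field is zero, this gives the value
to store; computed over a header with a correct checksum already in place,
it gives 0. A header that left ip_output() with the field unfilled fails
that second check at the receiver.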

This patch disables TX offload when the interface is added to a bridge:

With this patch, the guest on QEMU can retrieve files from the host
without problem, even if TX offload options are enabled, which supports
the discussion above. No need to modify if_bridge.c from the original
(r1.156) in this case.
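
The idea, though not the patch itself, can be sketched as a mask over the
enabled capabilities. The bit names below follow the option names used in
this message, but the values are placeholders I made up for the sketch;
they are not the IFCAP_* definitions from <net/if.h>, and the function
name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder capability bits for illustration only. */
#define CAP_IP4CSUM_RX	(1u << 0)
#define CAP_IP4CSUM_TX	(1u << 1)
#define CAP_TCP4CSUM_TX	(1u << 2)
#define CAP_UDP4CSUM_TX	(1u << 3)
#define CAP_TCP6CSUM_TX	(1u << 4)
#define CAP_UDP6CSUM_TX	(1u << 5)
#define CAP_TSO4	(1u << 6)
#define CAP_TSO6	(1u << 7)

#define CAP_TX_OFFLOAD	(CAP_IP4CSUM_TX | CAP_TCP4CSUM_TX | \
			 CAP_UDP4CSUM_TX | CAP_TCP6CSUM_TX | \
			 CAP_UDP6CSUM_TX | CAP_TSO4 | CAP_TSO6)

/*
 * Hypothetical helper: capabilities to leave enabled once an interface
 * becomes a bridge member. All TX offload bits are cleared; everything
 * else (e.g. RX checksum offload) is kept as it was.
 */
static uint32_t
caps_for_bridge_member(uint32_t capenable)
{
	uint32_t out = capenable & ~CAP_TX_OFFLOAD;

	assert((out & CAP_TX_OFFLOAD) == 0);
	return out;
}
```

The cost of this approach is that traffic leaving through the physical
interface also loses the offload, even though the HW could have handled it.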

I don't know whether this patch is preferable; there should be better
ways to handle the problem. Packets need to be segmented or checksummed
in software only if the destination in bridge_output() is different from
the original interface. I will examine this further if needed.
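
A rough sketch of that per-packet rule follows. It is only an illustration
of the condition stated above, not a proposed change to if_bridge.c; the
flag names, the use of interface indexes, and the function name are all
assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-packet flags: work the stack deferred to the HW. */
#define PKT_WANTS_TSO	(1u << 0)
#define PKT_WANTS_CSUM	(1u << 1)

/*
 * Returns the set of deferred operations that must be completed in
 * software before the packet is enqueued on the destination interface.
 * orig_ifindex is the interface the packet was built for; dst_ifindex
 * is the bridge member it is actually sent on.
 */
static uint32_t
sw_work_needed(int orig_ifindex, int dst_ifindex, uint32_t deferred)
{
	assert(orig_ifindex > 0 && dst_ifindex > 0);

	if (dst_ifindex == orig_ifindex)
		return 0;	/* the original HW finishes the job */
	return deferred;	/* e.g. tap0: nothing will do it later */
}
```

In the QEMU example, a packet built for wm0 but sent on tap0 would come
back with its deferred flags still set, while the same packet sent on wm0
would need no software work.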
