NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: kern/53562: bridge(4) breaks segmentation / TX checksum offloading

The following reply was made to PR kern/53562; it has been noted by GNATS.

From: Masanobu SAITOH <>
To: Rin Okuyama <>,,,,
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
Date: Fri, 14 Sep 2018 20:26:20 +0900

 On 2018/09/14 20:17, Masanobu SAITOH wrote:
 > On 2018/09/12 21:33, Rin Okuyama wrote:
 >> Hi, sorry for the delay in the response.
 >> On 2018/09/10 16:34, Masanobu SAITOH wrote:
 >> ...
 >>> On 2018/09/08 8:50, Rin Okuyama wrote:
 >> ...
 >>>>   The following would be a simplest
 >>>>   example where your patch does not work:
 >>>>   - NetBSD/amd64-currnet with wm0 connected to LAN
 >>>>   - qemu-3.0.0nb2 installed from pkgsrc
 >>>>   - setup tap and bridge as follows:
 >>>>        # ifconfig tap0 create up
 >>>>        # ifconfig bridge0 create
 >>>>        # brconfig bridge0 add wm0 add tap0 up
 >>>>   - run NetBSD/amd64 8.0 on QEMU with tap enabled:
 >>>>        # qemu-system-x86_64 -m 128m -cdrom NetBSD-8.0-amd64.iso -boot d \
 >>>>          -display curses -net tap,fd=3 -net nic 3<>/dev/tap0
 >>>>   Then, the virtual host can send and receive small packets, e.g.,
 >>>>   ping works well regardless of whether TSO is enabled or not for wm0.
 >> [quotation added by RO from the previous message of mine]
 >>>>   However, it cannot receive larger packets when TSO is enabled, e.g.,
 >>>>   file cannot be retrieved through ftp. If TSO is turned off, everything
 >>>>   works fine.
 >> [quotation done]
 >>>   0)
 >>>   Is this "wm0" on the guest or the host?
 >>>   If my understanding is correct, writing packet via tap on host is not related
 >>>   to TSO...
 > I realized that this problem is not only for TSO but for all outgoing offload flags.
 >> Sorry for bothering you many times. This "wm0" is for the host. Also,
 >> in this example, the guest fails to receive data from the ftp server
 >> working on the host. I forgot to test servers other than the host. It
 >> worked fine; only connecting from the guest to host fails.
 > I see. I could reproduce the same problem with Xen dom0 & domU by adding only ip4csum
 > flag to wm0.
 >>>   1)
 >>>   If you're using wm(4) on the guest, please try vioif(4) and see if the problem
 >>> occurs or not.
 >> I tried vioif(4) on the guest but the situation does not change.
 >>>   2)
 >>>   For checksum oofloading, the following kernel options will help for debugging:
 >>>   These options add some event counters:
 >> ...
 >>>   3)
 >>>   wm(4) has a lot of event counters, some of them are useful for debugging.
 >>>   To use them add "options WM_EVENT_COUNTERS" to your kernel config file.
 >> Thanks! I enabled these options.
 >>>   4)
 >>>   For TSO, current wm(4) is not perfect because it doesn't use m_defrag().
 >>>   I have not-yet-committed patch:
 >>>> % vmstat -ev | grep toomany
 >>>> wm0 txq00txtoomanyseg                                      0    0 misc
 >>>   If "wmX txqYYtxtoomanyseg" is increasing, the above diff would fix the
 >>> problem.
 >> I tried if_wm.c r1.587, which already contains this patch. Counter
 >> "wmX txqYYtxtoomanyseg" was not increased.
 >> I examined how packets flow in the example above, from the ftp server
 >> on host (wm0 on host) to the ftp client on guest (tap0 on host ->
 >> vioif0 on guest):
 >> tcp_output()
 >> -> ip_output()
 >> -> ip_if_output()
 >> -> if_output_lock()
 >> -> ifp->if_output = ether_output()
 >> -> bridge_output()
 >> -> bridge_enqueue()
 >> When TSO4 is enabled for wm0 on the host, tcp_output() attempts to send
 >> packets larger than MTU. If the destination interface is wm0, i.e.,
 >> dst_ifp == ifp in bridge_output(), there's no problem with your patch
 >> for if_bridge.c; packets are segmented automatically by the HW. However,
 >> in this case, the destination is tap0, and bridge_enqueue() send packets
 >> larger than MTU for tap0. This is, of course, illegal, since there is no
 >> HW by which packets are segmented. The guest on QEMU, therefore, cannot
 >> receive packets. This is the scenario how the failure occurs in my
 >> example of QEMU above.
 >> The similar problems occur when one of other TX offload options, TSO6,
 >> IP4CSUM-TX, TCPnCSUM-TX, or UDPnCSUM-TX, are enabled. For example,
 >> consider the case of IP4CSUM-TX. When this option is enabled for wm0
 >> on the host, checksumming is omitted in ip_output(). Then, when the
 >> destination is tap0, packets with wrong checksum are sent.
 >> This patch disable TX offload when the interface is added to bridge:
 > I don't know if it's good to doing in L3...
 > It seems that both FreeBSD and DragonFly clear all bridge member's outgoing
 > offload flags when adding a member to a bridge. I also don't know if
 > this is good solution or not...
 >   My diff have been committed as if_bridge.c rev. 1.156 now.
 >> Module Name:    src
 >> Committed By:    msaitoh
 >> Date:        Fri Sep 14 11:05:09 UTC 2018
 >> Modified Files:
 >>     src/sys/net: if_bridge.c
 >> Log Message:
 >>  Fix a bug that bridge_enqueue() incorrectly cleared outgoing packet's offload
 >> flags. bridge_enqueue() is called from bridge_output() when a packet is
 >> spontaneous. Clear csum_flags before calling brige_enqueue() in
 >> bridge_forward() or bridge_broadcast() instead of in the beginning of
 >> bridge_enqueue().
 >> Note that this change doesn't fix a problem on the following configuration:
 >>     A bridge has two or more interfaces.
 >>     An address is assigned to an bridge member interface and
 >>     some offload flags are set.
 >>     Another interface has no address and has no any offload flag.
 >> XXX pullup-[78]
 >> To generate a diff of this commit:
 >> cvs rdiff -u -r1.156 -r1.157 src/sys/net/if_bridge.c
 >> With this patch, the guest on QEMU can retrieve files from the host
 >> without problem, even if TX offload options are enabled, which supports
 >> the discussion above. No need to modify if_bridge.c from the original
 >> (r1.156) in this case.
 >> I don't know this patch is preferable; there should be better ways to
 >> handle the problem. Packets need to be segmented or checksummed only
 >> if the destination in bridge_output() is different from the original
 >> interface. I will examine further if needed.
 >> Thanks,
 >> rin
                  SAITOH Masanobu (

Home | Main Index | Thread Index | Old Index