NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: kern/53562: bridge(4) breaks segmentation / TX checksum offloading



The following reply was made to PR kern/53562; it has been noted by GNATS.

From: Masanobu SAITOH <msaitoh%execsw.org@localhost>
To: Rin Okuyama <rokuyama%rk.phys.keio.ac.jp@localhost>, gnats-bugs%NetBSD.org@localhost,
 kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
Cc: msaitoh%execsw.org@localhost
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
 offloading
Date: Fri, 14 Sep 2018 20:17:15 +0900

 On 2018/09/12 21:33, Rin Okuyama wrote:
 > Hi, sorry for the delay in the response.
 > 
 > On 2018/09/10 16:34, Masanobu SAITOH wrote:
 > ...
 >> On 2018/09/08 8:50, Rin Okuyama wrote:
 > ...
 >>>   The following would be a simplest
 >>>   example where your patch does not work:
 >>>   - NetBSD/amd64-currnet with wm0 connected to LAN
 >>>   - qemu-3.0.0nb2 installed from pkgsrc
 >>>   - setup tap and bridge as follows:
 >>>        # ifconfig tap0 create up
 >>>        # ifconfig bridge0 create
 >>>        # brconfig bridge0 add wm0 add tap0 up
 >>>   - run NetBSD/amd64 8.0 on QEMU with tap enabled:
 >>>        # qemu-system-x86_64 -m 128m -cdrom NetBSD-8.0-amd64.iso -boot d \
 >>>          -display curses -net tap,fd=3 -net nic 3<>/dev/tap0
 >>>   Then, the virtual host can send and receive small packets, e.g.,
 >>>   ping works well regardless of whether TSO is enabled or not for wm0.
 > [quotation added by RO from the previous message of mine]
 >>>   However, it cannot receive larger packets when TSO is enabled, e.g.,
 >>>   file cannot be retrieved through ftp. If TSO is turned off, everything
 >>>   works fine.
 > [quotation done]
 >>
 >>   0)
 >>   Is this "wm0" on the guest or the host?
 >>   If my understanding is correct, writing packet via tap on host is not related
 >>   to TSO...
 > 
 
 I realized that this problem is not only for TSO but for all outgoing offload flags.
 
 
 > Sorry for bothering you many times. This "wm0" is for the host. Also,
 > in this example, the guest fails to receive data from the ftp server
 > working on the host. I forgot to test servers other than the host. It
 > worked fine; only connecting from the guest to host fails.
 
 I see. I could reproduce the same problem with Xen dom0 & domU by adding only ip4csum
 flag to wm0.
 
 >>   1)
 >>   If you're using wm(4) on the guest, please try vioif(4) and see if the problem
 >> occurs or not.
 > 
 > I tried vioif(4) on the guest but the situation does not change.
 > 
 >>   2)
 >>   For checksum oofloading, the following kernel options will help for debugging:
 >>      INET_CSUM_COUNTERS
 >>      TCP_CSUM_COUNTERS
 >>      UDP_CSUM_COUNTERS
 >>
 >>   These options add some event counters:
 > ...
 >>
 >>   3)
 >>   wm(4) has a lot of event counters, some of them are useful for debugging.
 >>   To use them add "options WM_EVENT_COUNTERS" to your kernel config file.
 > 
 > Thanks! I enabled these options.
 > 
 >>   4)
 >>   For TSO, current wm(4) is not perfect because it doesn't use m_defrag().
 >>   I have not-yet-committed patch:
 >>
 >>      http://www.netbsd.org/~msaitoh/wm-defrag-20180906-0.dif
 >>
 >>> % vmstat -ev | grep toomany
 >>> wm0 txq00txtoomanyseg                                      0    0 misc
 >>
 >>   If "wmX txqYYtxtoomanyseg" is increasing, the above diff would fix the
 >> problem.
 > 
 > I tried if_wm.c r1.587, which already contains this patch. Counter
 > "wmX txqYYtxtoomanyseg" was not increased.
 > 
 > I examined how packets flow in the example above, from the ftp server
 > on host (wm0 on host) to the ftp client on guest (tap0 on host ->
 > vioif0 on guest):
 > 
 > tcp_output()
 > -> ip_output()
 > -> ip_if_output()
 > -> if_output_lock()
 > -> ifp->if_output = ether_output()
 > -> bridge_output()
 > -> bridge_enqueue()
 > 
 > When TSO4 is enabled for wm0 on the host, tcp_output() attempts to send
 > packets larger than MTU. If the destination interface is wm0, i.e.,
 > dst_ifp == ifp in bridge_output(), there's no problem with your patch
 > for if_bridge.c; packets are segmented automatically by the HW. However,
 > in this case, the destination is tap0, and bridge_enqueue() send packets
 > larger than MTU for tap0. This is, of course, illegal, since there is no
 > HW by which packets are segmented. The guest on QEMU, therefore, cannot
 > receive packets. This is the scenario how the failure occurs in my
 > example of QEMU above.
 > 
 > The similar problems occur when one of other TX offload options, TSO6,
 > IP4CSUM-TX, TCPnCSUM-TX, or UDPnCSUM-TX, are enabled. For example,
 > consider the case of IP4CSUM-TX. When this option is enabled for wm0
 > on the host, checksumming is omitted in ip_output(). Then, when the
 > destination is tap0, packets with wrong checksum are sent.
 > 
 > This patch disable TX offload when the interface is added to bridge:
 > 
 > http://www.netbsd.org/~rin/disable_tx_offload_when_bridged_20180912.patch
 
 I don't know if it's good to doing in L3...
 
 It seems that both FreeBSD and DragonFly clear all bridge member's outgoing
 offload flags when adding a member to a bridge. I also don't know if
 this is good solution or not...
 
   My diff have been committed as if_bridge.c rev. 1.156 now.
 
 > Module Name:	src
 > Committed By:	msaitoh
 > Date:		Fri Sep 14 11:05:09 UTC 2018
 > 
 > Modified Files:
 > 	src/sys/net: if_bridge.c
 > 
 > Log Message:
 >  Fix a bug that bridge_enqueue() incorrectly cleared outgoing packet's offload
 > flags. bridge_enqueue() is called from bridge_output() when a packet is
 > spontaneous. Clear csum_flags before calling brige_enqueue() in
 > bridge_forward() or bridge_broadcast() instead of in the beginning of
 > bridge_enqueue().
 > 
 > Note that this change doesn't fix a problem on the following configuration:
 > 
 > 	A bridge has two or more interfaces.
 > 
 > 	An address is assigned to an bridge member interface and
 > 	some offload flags are set.
 > 
 > 	Another interface has no address and has no any offload flag.
 > 
 > XXX pullup-[78]
 > 
 > 
 > To generate a diff of this commit:
 > cvs rdiff -u -r1.156 -r1.157 src/sys/net/if_bridge.c
 
 
 
 > With this patch, the guest on QEMU can retrieve files from the host
 > without problem, even if TX offload options are enabled, which supports
 > the discussion above. No need to modify if_bridge.c from the original
 > (r1.156) in this case.
 > 
 > I don't know this patch is preferable; there should be better ways to
 > handle the problem. Packets need to be segmented or checksummed only
 > if the destination in bridge_output() is different from the original
 > interface. I will examine further if needed.
 > 
 > Thanks,
 > rin
 
 
 -- 
 -----------------------------------------------
                  SAITOH Masanobu (msaitoh%execsw.org@localhost
                                   msaitoh%netbsd.org@localhost)
 


Home | Main Index | Thread Index | Old Index