NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: kern/53562: bridge(4) breaks segmentation / TX checksum offloading
The following reply was made to PR kern/53562; it has been noted by GNATS.
From: Masanobu SAITOH <msaitoh%execsw.org@localhost>
To: Rin Okuyama <rokuyama%rk.phys.keio.ac.jp@localhost>, gnats-bugs%NetBSD.org@localhost,
kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
Cc: msaitoh%execsw.org@localhost
Subject: Re: kern/53562: bridge(4) breaks segmentation / TX checksum
offloading
Date: Fri, 14 Sep 2018 20:17:15 +0900
On 2018/09/12 21:33, Rin Okuyama wrote:
> Hi, sorry for the delay in the response.
>
> On 2018/09/10 16:34, Masanobu SAITOH wrote:
> ...
>> On 2018/09/08 8:50, Rin Okuyama wrote:
> ...
>>> Â The following would be a simplest
>>> Â example where your patch does not work:
>>> Â - NetBSD/amd64-currnet with wm0 connected to LAN
>>> Â - qemu-3.0.0nb2 installed from pkgsrc
>>> Â - setup tap and bridge as follows:
>>> Â Â Â Â Â Â # ifconfig tap0 create up
>>> Â Â Â Â Â Â # ifconfig bridge0 create
>>> Â Â Â Â Â Â # brconfig bridge0 add wm0 add tap0 up
>>> Â - run NetBSD/amd64 8.0 on QEMU with tap enabled:
>>> Â Â Â Â Â Â # qemu-system-x86_64 -m 128m -cdrom NetBSD-8.0-amd64.iso -boot d \
>>> Â Â Â Â Â Â Â Â -display curses -net tap,fd=3 -net nic 3<>/dev/tap0
>>> Â Then, the virtual host can send and receive small packets, e.g.,
>>> Â ping works well regardless of whether TSO is enabled or not for wm0.
> [quotation added by RO from the previous message of mine]
>>> Â However, it cannot receive larger packets when TSO is enabled, e.g.,
>>> Â file cannot be retrieved through ftp. If TSO is turned off, everything
>>> Â works fine.
> [quotation done]
>>
>> Â 0)
>> Â Is this "wm0" on the guest or the host?
>> Â If my understanding is correct, writing packet via tap on host is not related
>> Â to TSO...
>
I realized that this problem is not only for TSO but for all outgoing offload flags.
> Sorry for bothering you many times. This "wm0" is for the host. Also,
> in this example, the guest fails to receive data from the ftp server
> working on the host. I forgot to test servers other than the host. It
> worked fine; only connecting from the guest to host fails.
I see. I could reproduce the same problem with Xen dom0 & domU by adding only ip4csum
flag to wm0.
>> Â 1)
>> Â If you're using wm(4) on the guest, please try vioif(4) and see if the problem
>> occurs or not.
>
> I tried vioif(4) on the guest but the situation does not change.
>
>> Â 2)
>> Â For checksum oofloading, the following kernel options will help for debugging:
>> Â Â Â Â INET_CSUM_COUNTERS
>> Â Â Â Â TCP_CSUM_COUNTERS
>> Â Â Â Â UDP_CSUM_COUNTERS
>>
>> Â These options add some event counters:
> ...
>>
>> Â 3)
>> Â wm(4) has a lot of event counters, some of them are useful for debugging.
>> Â To use them add "options WM_EVENT_COUNTERS" to your kernel config file.
>
> Thanks! I enabled these options.
>
>> Â 4)
>> Â For TSO, current wm(4) is not perfect because it doesn't use m_defrag().
>> Â I have not-yet-committed patch:
>>
>> Â Â Â Â http://www.netbsd.org/~msaitoh/wm-defrag-20180906-0.dif
>>
>>> % vmstat -ev | grep toomany
>>> wm0 txq00txtoomanyseg                                     0   0 misc
>>
>> Â If "wmX txqYYtxtoomanyseg" is increasing, the above diff would fix the
>> problem.
>
> I tried if_wm.c r1.587, which already contains this patch. Counter
> "wmX txqYYtxtoomanyseg" was not increased.
>
> I examined how packets flow in the example above, from the ftp server
> on host (wm0 on host) to the ftp client on guest (tap0 on host ->
> vioif0 on guest):
>
> tcp_output()
> -> ip_output()
> -> ip_if_output()
> -> if_output_lock()
> -> ifp->if_output = ether_output()
> -> bridge_output()
> -> bridge_enqueue()
>
> When TSO4 is enabled for wm0 on the host, tcp_output() attempts to send
> packets larger than MTU. If the destination interface is wm0, i.e.,
> dst_ifp == ifp in bridge_output(), there's no problem with your patch
> for if_bridge.c; packets are segmented automatically by the HW. However,
> in this case, the destination is tap0, and bridge_enqueue() send packets
> larger than MTU for tap0. This is, of course, illegal, since there is no
> HW by which packets are segmented. The guest on QEMU, therefore, cannot
> receive packets. This is the scenario how the failure occurs in my
> example of QEMU above.
>
> The similar problems occur when one of other TX offload options, TSO6,
> IP4CSUM-TX, TCPnCSUM-TX, or UDPnCSUM-TX, are enabled. For example,
> consider the case of IP4CSUM-TX. When this option is enabled for wm0
> on the host, checksumming is omitted in ip_output(). Then, when the
> destination is tap0, packets with wrong checksum are sent.
>
> This patch disable TX offload when the interface is added to bridge:
>
> http://www.netbsd.org/~rin/disable_tx_offload_when_bridged_20180912.patch
I don't know if it's good to doing in L3...
It seems that both FreeBSD and DragonFly clear all bridge member's outgoing
offload flags when adding a member to a bridge. I also don't know if
this is good solution or not...
My diff have been committed as if_bridge.c rev. 1.156 now.
> Module Name: src
> Committed By: msaitoh
> Date: Fri Sep 14 11:05:09 UTC 2018
>
> Modified Files:
> src/sys/net: if_bridge.c
>
> Log Message:
> Fix a bug that bridge_enqueue() incorrectly cleared outgoing packet's offload
> flags. bridge_enqueue() is called from bridge_output() when a packet is
> spontaneous. Clear csum_flags before calling brige_enqueue() in
> bridge_forward() or bridge_broadcast() instead of in the beginning of
> bridge_enqueue().
>
> Note that this change doesn't fix a problem on the following configuration:
>
> A bridge has two or more interfaces.
>
> An address is assigned to an bridge member interface and
> some offload flags are set.
>
> Another interface has no address and has no any offload flag.
>
> XXX pullup-[78]
>
>
> To generate a diff of this commit:
> cvs rdiff -u -r1.156 -r1.157 src/sys/net/if_bridge.c
> With this patch, the guest on QEMU can retrieve files from the host
> without problem, even if TX offload options are enabled, which supports
> the discussion above. No need to modify if_bridge.c from the original
> (r1.156) in this case.
>
> I don't know this patch is preferable; there should be better ways to
> handle the problem. Packets need to be segmented or checksummed only
> if the destination in bridge_output() is different from the original
> interface. I will examine further if needed.
>
> Thanks,
> rin
--
-----------------------------------------------
SAITOH Masanobu (msaitoh%execsw.org@localhost
msaitoh%netbsd.org@localhost)
Home |
Main Index |
Thread Index |
Old Index