Subject: Re: bridge(4) and silent data corruption :-(
To: None <jonathan@dsg.stanford.edu>
From: Sean Doran <smd@ab.use.net>
List: tech-net
Date: 05/02/2002 12:15:46
| Agreed that it's interesting.  I would still like to see the output of
| ifconfig (check to see if outboard TCP/IP acceleration is enabled)
| and
| 	netstat -s -p tcp
| 	netstat -s -p ip
|
| on the machines involved.  ssh sessions to or from the bridge itself
| would also be interesting.

A proper answer will have to wait until Sunday evening European time,
when I can move wires around to put the bridge in front of a machine,
but basically, IIRC there was not much of interest in the protocol
summary outputs (I was checking this myself).  There were no problems
whatsoever doing large ssh2 transfers in or out of the bridge,
either to/from the local LAN (either side of the bridge), or to/from
the "world".   Again, this was one reason I was trying to walk through
what bridge(4) is really doing, since it's weird that the only visible
symptom is corruption experienced by machines on the far side of the
router, when those machines are transferring data to/from the world.
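One way to pin down where a transfer gets damaged is to compare the file
as sent with the file as received.  The following is a minimal sketch,
not anything used in this thread: the function names sha256_of and
first_mismatch are made up for illustration, and it assumes both copies
of the file are available on one machine (for example, the received copy
carried back over a path that is known to be good).

```python
import hashlib
from typing import Optional


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def first_mismatch(path_a: str, path_b: str,
                   chunk_size: int = 65536) -> Optional[int]:
    """Return the byte offset of the first difference, or None if identical."""
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            ca, cb = a.read(chunk_size), b.read(chunk_size)
            if ca != cb:
                for i, (x, y) in enumerate(zip(ca, cb)):
                    if x != y:
                        return offset + i
                # No differing byte in the overlap: one file is shorter.
                return offset + min(len(ca), len(cb))
            if not ca:
                return None  # both files ended without a difference
            offset += len(ca)
```

Differing digests confirm corruption; the offset of the first mismatch
(and whether it falls on a recognisable boundary) may hint at which
layer is mangling the data.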

Do you have any particular requests for things I should provide
to help diagnose this?  (Someone might suggest "send-pr" :-) )

FWIW, the Apples and NetBSD (February i386 SMP kernel) boxes
that are station-X are all running with hardware checksumming
enabled on their interfaces.   I turned the bridge's hw checksumming
on and off as you suggested in an earlier email, and it made
no difference.  (I didn't think about turning off the stations'
hw checksumming.)
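For reference, the checksum that the IP4CSUM/TCP4CSUM/UDP4CSUM
capabilities offload is the 16-bit one's-complement Internet checksum.
Below is a minimal software sketch in the style of RFC 1071; the
function name internet_checksum is chosen here for illustration, and a
real TCP or UDP checksum also covers a pseudo-header that this sketch
leaves out.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"              # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:               # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


payload = bytes.fromhex("0001f203f4f5f6f7")    # example words from RFC 1071
corrupted = bytes.fromhex("0002f203f4f5f6f7")  # one byte changed
swapped = bytes.fromhex("f2030001f4f5f6f7")    # first two words exchanged

print(hex(internet_checksum(payload)))
print(internet_checksum(payload) != internet_checksum(corrupted))
print(internet_checksum(payload) == internet_checksum(swapped))
```

Because the sum does not depend on word order, a payload whose 16-bit
words have been reordered still verifies, so a clean checksum counter
does not by itself rule out every kind of corruption.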

	Sean.

ps - the bridge is up and running, but there's nothing on the far side
     of the bridge (my laptoy is with me on the road, and i didn't want
     my other boxes suffering from network data corruption)

     ex0 is the side of the bridge closest to the router

     putting something on ex1 results in data corruption

     swapping ex0 and ex1, and putting something on ex0 results
     in data corruption

ex0: flags=8b63<UP,BROADCAST,NOTRAILERS,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        capabilities=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
        enabled=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
        address: 00:04:76:de:ba:da
        media: Ethernet 10baseT
        status: active
        inet6 ex0 prefixlen 64 scopeid 0x1
ex1: flags=8b63<UP,BROADCAST,NOTRAILERS,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        capabilities=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
        enabled=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
        address: 00:04:75:80:72:a2
        media: Ethernet 10baseT
        status: no carrier
        inet xxx.xxx.xxx.xxx netmask 0xfffffff8 broadcast xxx.xxx.xxx.xxx
        inet6 ex1 prefixlen 64 scopeid 0x2

These netstats are from the bridge.
I don't think netstats from the clients would be meaningful at
this point, since they haven't been behind the bridge for a while.

ip:
        11514641 total packets received
        0 bad header checksums
        0 with size smaller than minimum
        0 with data size < data length
        0 with length > max ip packet size
        0 with header length < data size
        0 with data length < header length
        0 with bad options
        0 with incorrect version number
        0 fragments received
        0 fragments dropped (dup or out of space)
        0 malformed fragments dropped
        0 fragments dropped after timeout
        0 packets reassembled ok
        17468 packets for this host
        0 packets for unknown/unsupported protocol
        11481771 packets forwarded (0 packets fast forwarded)
        2225 packets not forwardable
        0 redirects sent
        21234 packets sent from this host
        4 packets sent with fabricated ip header
        0 output packets dropped due to no bufs, etc.
        99 output packets discarded due to no route
        0 output datagrams fragmented
        0 fragments created
        0 datagrams that can't be fragmented
        47 datagrams with bad address in header

tcp:
        1714 packets sent
                1610 data packets (126274 bytes)
                13 data packets (15618 bytes) retransmitted
                76 ack-only packets (1130 delayed)
                0 URG only packets
                0 window probe packets
                11 window update packets
                4 control packets
                0 send attempts resulted in self-quench
        2144 packets received
                1108 acks (for 125920 bytes)
                48 duplicate acks
                0 acks for unsent data
                1134 packets (66356 bytes) received in-sequence
                24 completely duplicate packets (416 bytes)
                0 old duplicate packets
                0 packets with some dup. data (0 bytes duped)
                0 out-of-order packets (0 bytes)
                0 packets (0 bytes) of data after window
                0 window probes
                0 window update packets
                0 packets received after close
        0 connection requests
        11 connection accepts
        11 connections established (including accepts)
        9 connections closed (including 6 drops)
        0 embryonic connections dropped
        1097 segments updated rtt (of 1087 attempts)
        4 retransmit timeouts
                0 connections dropped by rexmit timeout
        0 persist timeouts (resulting in 0 dropped connections)
        9 keepalive timeouts
                8 keepalive probes sent
                1 connection dropped by keepalive
        0 correct ACK header predictions
        676 correct data packet header predictions
        586 PCB hash misses
        282 dropped due to no socket
        0 connections drained due to memory shortage
        0 bad connection attempts
        11 SYN cache entries added
                0 hash collisions
                11 completed
                0 aborted (no space to build PCB)
                0 timed out
                0 dropped due to overflow
                0 dropped due to bucket overflow
                0 dropped due to RST
                0 dropped due to ICMP unreachable
        0 SYN,ACKs retransmitted
        0 duplicate SYNs received for entries already in the cache
        0 SYNs dropped (no route or no space)

the only interesting netstat output is from the laptoy, which
has also been travelling around to other places in the network
since the last boot, so i don't know how much is related to
the bridge and how much is related to weird connectivity on the road.
better figures sunday...

ip:
        299731 total packets received
        0 bad header checksums
        0 with size smaller than minimum
        0 with data size < data length
        0 with header length < data size
        0 with data length < header length
        0 with bad options
        0 with incorrect version number
        56 fragments received
        0 fragments dropped (dup or out of space)
        0 fragments dropped after timeout
        28 packets reassembled ok
        282751 packets for this host
        31 packets for unknown/unsupported protocol
        0 packets forwarded (0 packets fast forwarded)
        16919 packets not forwardable
        2 packets received for unknown multicast group
        0 redirects sent
        212085 packets sent from this host
        0 packets sent with fabricated ip header
        0 output packets dropped due to no bufs, etc.
        15 output packets discarded due to no route
        188 output datagrams fragmented
        376 fragments created
        0 datagrams that can't be fragmented
tcp:
        177489 packets sent
                36984 data packets (21810409 bytes)
                499 data packets (461816 bytes) retransmitted
                0 resends initiated by MTU discovery
                48179 ack-only packets (20549 delayed)
                0 URG only packets
                0 window probe packets
                89283 window update packets
                2555 control packets
        251183 packets received
                31985 acks (for 21393360 bytes)
                2464 duplicate acks
                0 acks for unsent data
                211208 packets (274736280 bytes) received in-sequence
                644 completely duplicate packets (888856 bytes)
                3 old duplicate packets
                12 packets with some dup. data (9648 bytes duped)
                24501 out-of-order packets (34816254 bytes)
                2 packets (2 bytes) of data after window
                2 window probes
                150 window update packets
                29 packets received after close
                342 discarded for bad checksums
                0 discarded for bad header offset fields
                0 discarded because packet too short
        986 connection requests
        910 connection accepts
        0 bad connection attempts
        0 listen queue overflows
        1883 connections established (including accepts)
        2806 connections closed (including 1450 drops)
                44 connections updated cached RTT on close
                44 connections updated cached RTT variance on close
                22 connections updated cached ssthresh on close
        7 embryonic connections dropped
        31985 segments updated rtt (of 29266 attempts)
        281 retransmit timeouts
                10 connections dropped by rexmit timeout
        0 persist timeouts
                0 connections dropped by persist timeout
        11 keepalive timeouts
                0 keepalive probes sent
                3 connections dropped by keepalive
        2163 correct ACK header predictions
        188277 correct data packet header predictions
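
To make counters such as "discarded for bad checksums" easier to compare
between machines, the "netstat -s" text can be turned into a dictionary.
This is a rough sketch only: parse_counters and SAMPLE are illustrative
names, the parser assumes the "count label" line layout shown above, and
labels that embed a second number (for example "data packets (126274
bytes)") are kept verbatim rather than split up.

```python
import re

COUNTER_RE = re.compile(r"^\s*(\d+)\s+(.*\S)\s*$")


def parse_counters(text: str) -> dict:
    """Map (protocol, label) -> leading integer for each counter line."""
    counters = {}
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        # A line such as "tcp:" starts a new protocol section.
        if stripped.endswith(":") and not stripped[0].isdigit():
            section = stripped[:-1]
            continue
        m = COUNTER_RE.match(line)
        if m:
            counters[(section, m.group(2))] = int(m.group(1))
    return counters


# A few lines copied from the laptop output above.
SAMPLE = """\
ip:
        299731 total packets received
        0 bad header checksums
tcp:
        251183 packets received
                342 discarded for bad checksums
"""

stats = parse_counters(SAMPLE)
bad = stats[("tcp", "discarded for bad checksums")]
received = stats[("tcp", "packets received")]
print(f"{bad} of {received} TCP packets failed the checksum "
      f"({100.0 * bad / received:.3f}%)")
```

Running the same parser over output captured before and after a test
transfer, and subtracting the two dictionaries, would isolate the
counters that moved during that transfer.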