Subject: Connections stall in vlan(4) setup
To: None <tech-net@netbsd.org>
From: Nino Dehne <ndehne@gmail.com>
List: tech-net
Date: 02/21/2006 19:27:19
Hi,

After kern/32874, I'm pulling my hair out again over another issue.

Setup is as follows:

dsl--[sip2]gw1[sip1]---[sip2]gw2[sip1][vlan1]---lan1
                                         .        .
                                         .        .

sip0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
        address: [...]
        media: Ethernet autoselect (none)
        status: no carrier
sip1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        address: [...]
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        [...]
sip2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        address: [...]
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet 192.168.0.253 netmask 0xfffffffc broadcast 192.168.0.255
        [...]
[...]
vlan1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        vlan: 1 parent: sip1
        address: [...]
        inet 192.168.1.254 netmask 0xffffff00 broadcast 192.168.1.255
        [...]
vlan2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        vlan: 2 parent: sip1
        address: [...]
        inet 192.168.2.254 netmask 0xffffff00 broadcast 192.168.2.255
        [...]

gw1 is the router described in PR 32874. gw2 is another router running
3.99.15 and pf(4).

The problem I'm having is that an HTTP transfer from a server in lan1
to gw2 is stalling, i.e. ftp(1) is stuck at

Requesting [...]
  0% |                                     |     0       0.00 KB/s    --:-- ETA

The same transfer from gw1, however, works just fine. Traffic from the
LANs to the outside and back works fine as well. All block rules in pf(4)
log, and nothing shows up in the log while attempting the transfer.

Things I have tried:

1) Cut connection between gw1 and gw2 and flush pf(4)'s ruleset. Doesn't
   help.
2) Cut connection between gw1 and gw2, flush the ruleset, delete vlan1 and
   configure vlan1's former address on the unused sip0 interface. Now the
   transfer works!
3) Telnet to the server manually and issue an illegal request. Since the
   reply is probably small enough to fit into one packet, this also works
   just fine, even with vlan(4).

So this must somehow have something to do with packet size, although
I don't see why it only affects connections originating from gw2 and
not routed ones.
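
One thing that would fit the packet-size theory is the extra 4 bytes of
the 802.1Q tag. A minimal sketch of the arithmetic follows (Python,
standard header sizes, IPv4 without options). The MSS value is taken from
the dump; that the parent interface or the switch drops the larger tagged
frame is only an assumption and not something the dump shows.

```python
# Frame-size arithmetic for a full-size TCP segment, with and without an
# 802.1Q tag. Header sizes are the standard ones; that anything on the
# path drops the larger frame is an assumption, not a confirmed cause.
MSS = 1460          # negotiated in the SYN / SYN-ACK of the dump
IP_HEADER = 20      # IPv4 header without options
TCP_HEADER = 20     # fixed TCP header; options eat into the MSS bytes
ETHER_HEADER = 14   # destination MAC + source MAC + EtherType
ETHER_FCS = 4       # frame check sequence
VLAN_TAG = 4        # 802.1Q tag inserted after the source MAC

ip_packet = MSS + IP_HEADER + TCP_HEADER
untagged_frame = ip_packet + ETHER_HEADER + ETHER_FCS
tagged_frame = untagged_frame + VLAN_TAG

print("IP packet:      ", ip_packet)
print("untagged frame: ", untagged_frame)
print("tagged frame:   ", tagged_frame)
```

A full-size segment therefore fills the 1500-byte MTU exactly, and the
tagged frame is 4 bytes longer than the largest untagged one. A small
reply, as in test 3), stays far below either limit.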

A session dump taken at the switch port that sip1 is connected to (sorry
for the overly long lines):

No.Time            Source        Destination   Protocol Info
 1 18:26:14.943333 192.168.1.254 192.168.1.1   TCP      65532 > 80 [SYN] Seq=0 Ack=0 Win=32768 Len=0 MSS=1460 WS=0 TSV=0 TSER=0
 2 18:26:14.943498 192.168.1.1   192.168.1.254 TCP      80 > 65532 [SYN, ACK] Seq=0 Ack=1 Win=32768 Len=0 MSS=1460 WS=0 TSV=0 TSER=0
 3 18:26:14.943903 192.168.1.254 192.168.1.1   TCP      65532 > 80 [ACK] Seq=1 Ack=1 Win=33580 Len=0 TSV=0 TSER=0
 4 18:26:14.944684 192.168.1.254 192.168.1.1   HTTP     GET /HEAD.gz HTTP/1.1
 5 18:26:14.946936 192.168.1.1   192.168.1.254 TCP      [TCP segment of a reassembled PDU]
 6 18:26:14.947152 192.168.1.1   192.168.1.254 TCP      [TCP segment of a reassembled PDU]
 7 18:26:14.947278 192.168.1.1   192.168.1.254 TCP      [TCP segment of a reassembled PDU]
 8 18:26:14.947397 192.168.1.1   192.168.1.254 TCP      [TCP segment of a reassembled PDU]
 9 18:26:14.947521 192.168.1.1   192.168.1.254 TCP      [TCP segment of a reassembled PDU]
10 18:26:15.139716 192.168.1.254 192.168.1.1   TCP      65532 > 80 [ACK] Seq=102 Ack=246 Win=33580 Len=0 TSV=1 TSER=0
11 18:26:15.140030 192.168.1.1   192.168.1.254 TCP      [TCP segment of a reassembled PDU]
12 18:26:15.140152 192.168.1.1   192.168.1.254 TCP      [TCP segment of a reassembled PDU]
13 18:26:16.132811 192.168.1.1   192.168.1.254 TCP      [TCP Retransmission] [TCP segment of a reassembled PDU]
14 18:26:18.132539 192.168.1.1   192.168.1.254 TCP      [TCP Retransmission] [TCP segment of a reassembled PDU]
15 18:26:22.131982 192.168.1.1   192.168.1.254 TCP      [TCP Retransmission] [TCP segment of a reassembled PDU]
16 18:26:30.130916 192.168.1.1   192.168.1.254 TCP      [TCP Retransmission] [TCP segment of a reassembled PDU]
17 18:26:45.909787 192.168.1.1   192.168.1.254 HTTP     Continuation or non-HTTP traffic
18 18:26:46.129413 192.168.1.1   192.168.1.254 TCP      [TCP Retransmission] [TCP segment of a reassembled PDU]
19 18:27:18.124442 192.168.1.1   192.168.1.254 TCP      [TCP Retransmission] [TCP segment of a reassembled PDU]
26 18:27:49.900170 192.168.1.1   192.168.1.254 HTTP     [TCP Retransmission] Continuation or non-HTTP traffic
27 18:28:22.115810 192.168.1.1   192.168.1.254 TCP      [TCP Retransmission] [TCP segment of a reassembled PDU]
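
For what it's worth, the spacing of the retransmissions can be checked
with a short sketch (Python). The timestamps are copied from frames 13,
14, 15, 16, 18, 19 and 27 above; treating them as retransmissions of one
and the same segment is an assumption, since the dump does not show
sequence numbers for these frames.

```python
from datetime import datetime

# Timestamps of frames 13, 14, 15, 16, 18, 19 and 27 from the dump.
# Assumption: all of them retransmit the same segment.
stamps = [
    "18:26:16.132811",
    "18:26:18.132539",
    "18:26:22.131982",
    "18:26:30.130916",
    "18:26:46.129413",
    "18:27:18.124442",
    "18:28:22.115810",
]

times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]

# Gap between consecutive retransmissions, rounded to whole seconds.
intervals = [round((b - a).total_seconds()) for a, b in zip(times, times[1:])]
print(intervals)
```

The gaps double each time, which is what a TCP retransmission timer
backing off looks like: the server keeps resending and never sees an
ACK for that segment.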

What is going on here? What else could I try?

A possibly related problem is that rules on gw2 cannot use the "synproxy state"
mechanism. I have to use "modulate state" or "keep state"; otherwise the expected
TCP handshake from gw2 to lan1 is not even attempted, according to tcpdump on the
server in lan1. This applies to all connections, even those from gw1 or outside.

TIA

Best regards,

ND