Subject: packet loss? w/ 1.6[A-D] & IPSEC policy
To: None <>
From: Arto Selonen <>
List: current-users
Date: 07/20/2002 14:30:19

I have had IPSEC policies in place since late May 2001, and things
have worked as expected up to 1.5ZC. After upgrading to 1.6A,
and continuing to the current 1.6B/1.6D client/server pair, I have had
problems. SSH seems to work without noticeable effects, but
e.g. web surfing from client to server breaks, with connections
eventually timing out.

If my memory serves me right, this started happening as soon as I
upgraded the client from 1.5ZC to 1.6A, even though the
problems *seem* to be at the server end (which first stayed at 1.5ZC
and was then upgraded to 1.6A, 1.6B and 1.6D, none of which helped).

It would seem that as soon as I turn the IPSEC policy on for the
client/server pair, I start losing packets (from the server end).
Why did it surface after the (client?) upgrade to 1.6A (and beyond)? Have I
overlooked a required change somewhere along the way?

What can (should) I do to get this working again? send-pr? Any help in
debugging this would be appreciated. Naturally, I am happy to
provide any additional details that might be relevant to this issue.

The client runs 1.6B and the server runs 1.6D.
There is the Big Bad Internet between the hosts.

Here is what I'm seeing without the policies on:

user@ telnet 80
Connected to
Escape character is '^]'.
GET HTTP/1.0<enter>
HTTP/1.1 200 OK
[rest of headers + page content follows]
Connection closed by foreign host.

After I add the policies the dialog becomes:

user@ telnet 80
Connected to
Escape character is '^]'.
GET HTTP/1.0<enter>
Connection closed by foreign host.

In other words, there is no output, and when I press Enter for the third
time the connection is closed. This can be repeated at will, and similar
effects happen when using web browsers. Sometimes I might even get a part
of the page before the connection closes (using lynx). Unsuccessful attempts
do not show up in the web server logs (Apache 1.3.26).

The amount of data that the HTTP reply should contain is a bit over 2KB.

Here are some details:

/etc/ipsec.conf @ (modified IP/spi/keys/whitespace):
add esp 0 -E rijndael-cbc 0x0000000000000000000000000000000000000000000000000000000000000000;
add ah  1 -A hmac-sha1    0x0000000000000000000000000000000000000000;
add esp 2 -E rijndael-cbc 0x0000000000000000000000000000000000000000000000000000000000000000;
add ah  3 -A hmac-sha1    0x0000000000000000000000000000000000000000;
spdadd any -P out ipsec esp/transport//require ah/transport//require;
spdadd any -P in  ipsec esp/transport//require ah/transport//require;

The "same" policy is used at the other host. Neither racoon nor any other
daemon is used for key management (so that should not be an issue here).

Here are the tcpdump outputs at (web server) for the above trials:

tcpdump -n -i ep0 host (no policy, modified timestamp,IP,port,seq#)
27.66736 > S 123:123(0) win 16384 <mss 1460,nop,wscale 0,nop,nop,timestamp 0 0> (DF)
27.66780 > S 987:987(0) ack 124 win 16384 <mss 1460,nop,wscale 0,nop,nop,timestamp 0 0> (DF)
27.67405 > . ack 1 win 17520 <nop,nop,timestamp 0 0> (DF)
37.48140 > P 1:39(38) ack 1 win 17520 <nop,nop,timestamp 19 0> (DF)
37.67593 > . ack 39 win 17520 <nop,nop,timestamp 20 19> (DF)
38.76712 > P 39:41(2) ack 1 win 17520 <nop,nop,timestamp 22 0> (DF)
38.76925 > . 1:1449(1448) ack 41 win 17520 <nop,nop,timestamp 22 22> (DF)
38.77014 > P 1449:2246(797) ack 41 win 17520 <nop,nop,timestamp 22 22> (DF)
38.77086 > F 2246:2246(0) ack 41 win 17520 <nop,nop,timestamp 22 22> (DF)
38.78816 > . ack 2246 win 16723 <nop,nop,timestamp 22 22> (DF)
38.78911 > . ack 2247 win 16723 <nop,nop,timestamp 22 22> (DF)
38.79304 > F 41:41(0) ack 2247 win 17520 <nop,nop,timestamp 22 22> (DF)
38.79321 > . ack 42 win 17520 <nop,nop,timestamp 22 22> (DF)

tcpdump -n -i ep0 host (with IPSEC, modified timestamp,IP,spi)
03.13645 > AH(spi=0x1,seq=0x1): ESP(spi=0x0,seq=0x1) (DF)
03.13767 > AH(spi=0x3,seq=0x1): ESP(spi=0x2,seq=0x1) (DF)
03.14633 > AH(spi=0x1,seq=0x2): ESP(spi=0x0,seq=0x2) (DF)
23.09855 > AH(spi=0x1,seq=0x3): ESP(spi=0x0,seq=0x3) (DF)
23.28964 > AH(spi=0x3,seq=0x2): ESP(spi=0x2,seq=0x2) (DF)
24.43422 > AH(spi=0x1,seq=0x4): ESP(spi=0x0,seq=0x4) (DF)
24.43878 > AH(spi=0x3,seq=0x4): ESP(spi=0x2,seq=0x4) (DF)
24.45133 > AH(spi=0x1,seq=0x5): ESP(spi=0x0,seq=0x5) (DF)
47.05617 > AH(spi=0x1,seq=0x6): ESP(spi=0x0,seq=0x6) (DF)
47.05724 > AH(spi=0x3,seq=0x7): ESP(spi=0x2,seq=0x7)

Assuming that the packet exchange should be very similar to the clear text
case, I'm guessing there is the same three-way handshake, then my
intentionally slow 'GET' followed by a one second pause for the second
Enter to complete the HTTP request. After that the server should be
sending the reply, but there is only one packet and no output at the
client end, followed by my 20 second wait, and then the connection closes.

I've verified that both hosts give the "same" output when
running tcpdump during a failed telnet session. I have no idea why
the server skips several packets (in this case seq 0x3, 0x5 and 0x6
on spi 0x3). That certainly would explain why the client doesn't get a
proper reply to the HTTP request.
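
The skipped sequence numbers can also be picked out mechanically. What
follows is only a minimal sketch, not part of the original setup: it
assumes the tcpdump lines have the AH(spi=...,seq=...) form quoted above,
and the names find_seq_gaps and TRACE are made up for illustration. A gap
only shows that a packet was not seen at the capture point. It does not
say where or why the packet went missing.

```python
import re
from collections import defaultdict

# Matches the AH header as printed in the tcpdump lines quoted above,
# e.g. "AH(spi=0x3,seq=0x2)". Both values are hexadecimal.
AH_RE = re.compile(r"AH\(spi=0x([0-9a-fA-F]+),seq=0x([0-9a-fA-F]+)\)")


def find_seq_gaps(lines):
    """Return {spi: [missing AH sequence numbers]} for SPIs that have gaps.

    Only gaps between the lowest and highest sequence number seen for
    each SPI are reported.
    """
    seen = defaultdict(set)
    for line in lines:
        match = AH_RE.search(line)
        if match:
            spi = int(match.group(1), 16)
            seq = int(match.group(2), 16)
            seen[spi].add(seq)

    gaps = {}
    for spi, seqs in seen.items():
        expected = set(range(min(seqs), max(seqs) + 1))
        missing = sorted(expected - seqs)
        if missing:
            gaps[spi] = missing
    return gaps


# The IPSEC trace from this message (addresses already removed).
TRACE = """\
03.13645 > AH(spi=0x1,seq=0x1): ESP(spi=0x0,seq=0x1) (DF)
03.13767 > AH(spi=0x3,seq=0x1): ESP(spi=0x2,seq=0x1) (DF)
03.14633 > AH(spi=0x1,seq=0x2): ESP(spi=0x0,seq=0x2) (DF)
23.09855 > AH(spi=0x1,seq=0x3): ESP(spi=0x0,seq=0x3) (DF)
23.28964 > AH(spi=0x3,seq=0x2): ESP(spi=0x2,seq=0x2) (DF)
24.43422 > AH(spi=0x1,seq=0x4): ESP(spi=0x0,seq=0x4) (DF)
24.43878 > AH(spi=0x3,seq=0x4): ESP(spi=0x2,seq=0x4) (DF)
24.45133 > AH(spi=0x1,seq=0x5): ESP(spi=0x0,seq=0x5) (DF)
47.05617 > AH(spi=0x1,seq=0x6): ESP(spi=0x0,seq=0x6) (DF)
47.05724 > AH(spi=0x3,seq=0x7): ESP(spi=0x2,seq=0x7)
"""

print(find_seq_gaps(TRACE.splitlines()))  # {3: [3, 5, 6]}
```

Run against the trace above, only spi 0x3 (server to client) shows gaps,
while spi 0x1 (client to server) is contiguous from 0x1 to 0x6.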

Running ping from client to server with the policy enabled looks ok:
120 packets transmitted, 120 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 9.511/17.585/20.115/2.529 ms

This is what the ep0 interface looks like on server (modified MAC,inet,inet6):
        address: 00:00:00:00:00:00
        media: Ethernet 10baseT
        inet netmask 0xfffffff8 broadcast
        inet alias netmask 0xfffffff8 broadcast
        inet alias netmask 0xfffffff8 broadcast
        inet6 fe80::200:00ff:fe00:0000%ep0 prefixlen 64 scopeid 0x2

Anything else that might be useful to check? I don't (yet) have
DEBUG/DIAGNOSTIC options in the kernel(s).

	Arto Selonen

#######======------  --------========########
Everstinkuja 5 B 35                               Don't mind doing it.
FIN-02600 Espoo         Don't mind not doing it.
Finland              tel +358 50 560 4826     Don't know anything about it.