tech-net archive


Re: TCP connections clogging up accf_http(9)



    Date:        Wed, 1 Jun 2016 12:01:21 +0200
    From:        Edgar Fuß <ef%math.uni-bonn.de@localhost>
    Message-ID:  <20160601100121.GB1389%gumme.math.uni-bonn.de@localhost>

  | But that would mean that going-to-be-dead connections arrive at a similar 
  | rate as ordinary ones.

They are not necessarily "going-to-be-dead" - all we know is that the
queue is full.   Wait a few seconds, and all of the pending connections
may have satisfied the filter and be delivered.

  | I would say that, if that is the case, the application 
  | is in serious trouble no matter what you do.

The system is overloaded.   Helping in exactly that situation is the
primary function of the filters.  When there's low load they're not needed
(the application can handle things just fine.)

  | Isn't relative time (i.e. queue position) enough?

No.   A queue position only relates to the real world through the
rate at which connections are arriving and the RTT to the source,
neither of which is meaningful without considering real time.

  | Can you make up a scenario where the tls/Timo solution doesn't work well 
  | but there still is a way to deal with it better that doesn't require
  | looking into the future?

Sure.  If connections are arriving too fast for the queue to process, the
best solution is to simply drop (ignore) incoming connections, as happened
previously.   The problem only occurs because the queue became clogged
with stuff that was never going away; it is only those connections
(connection attempts) that need attention.  Anything that is in the
queue simply because it hasn't yet had time to satisfy the filter should
be left alone: those connections have already been SYN/ACK'd, while later
incoming ones attempting to get on the queue haven't, so if those are dropped
the source sees (effectively) just a lost packet and will retry soon enough.

When the problem is just a temporary sudden burst of requests, rather than
a sustained overload, that approach allows everyone to be handled without
adding excess load to the system.

But it does require knowing (which I suspect we already do) how long
the request has been in the queue - anything that's been there less than
about 5 seconds should always be left alone; anything that's been there
more than about 5 minutes should simply be discarded (aborted).

An additional problem with giving the old crud to the application is that
it gets no notification at all as to how long the request has been there.
All it can do when it receives the result from accept() is to start a new
timer and wait (even longer) for the full request to arrive - all totally
pointless.   The only possible use of sending a request that has failed to
satisfy a filter to the application is for logging (and similar measures).
Using accept() for that isn't really the best way; there ought to be a
way to notify something that there are problems - not necessarily the
application in question, which usually doesn't care; it is the system
admin who wants to know - probably via some new (not currently invented)
mechanism.

That mechanism can log the events if desired, and if it sees enough from
one source that it starts to look like an attack, take countermeasures.

Please do remember here that this is one single mechanism that has to cope
with lots of different situations: normal connection requests,
overload bursts, sustained overload, DoS attacks, broken clients that
don't send what they should, and broken servers that don't accept connections.
The kernel part of the solution has to work reasonably for all of those,
while at the same time also acting reasonably as seen by the clients
(other than attackers.)

kre


