netbsd-users: Re: FTPD: disallowing concurrent connections from same IP

Subject: Re: FTPD: disallowing concurrent connections from same IP
To: Dave Huang <khym@azeotrope.org>
From: Greg A. Woods <woods@weird.com>
List: netbsd-users
Date: 02/19/2003 20:39:54
[ On Wednesday, February 19, 2003 at 17:38:58 (-0601), Dave Huang wrote: ]
> Subject: Re: FTPD: disallowing concurrent connections from same IP
>
> True, it's a very limited test... but I think it reflects a pretty
> common scenario. Anyways, isn't 6 seconds out of 107 or so a 5% to 6%
> improvement? I'm not denying that you do get better speeds with
> multiple connections. I'm not even saying that you won't get hugely
> better speeds. What I'm saying is that in the case where you do get
> hugely better speeds, it's because you're using an unfair portion of
> the pipe.

I don't know how big the files you are serving are, but as a
hypothetical example let's consider big ones, such as maybe a full 650MB
ISO image.  If a download is proceeding at only about 20KB/s per
connection then a 5% improvement in the total download time is highly
significant.  I.e. what probably matters most to a user is the overall
length of a whole session (as compared to what they perceive it would be
if they got ideal usage of their own local connectivity), not the
instantaneous speed or even the number of concurrent connections which
they are permitted.
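
To put rough numbers on that, here is a minimal Python sketch of the
arithmetic.  It assumes 1KB = 1024 bytes and 1MB = 1024KB, and it reads
"5% improvement" as a 5% cut in total download time; the function and
variable names are only illustrative.

```python
def download_seconds(size_bytes, rate_bytes_per_sec):
    """Time to move size_bytes at a steady rate, ignoring protocol overhead."""
    return size_bytes / rate_bytes_per_sec

ISO_BYTES = 650 * 1024 * 1024   # the hypothetical 650MB ISO image
RATE = 20 * 1024                # 20KB/s on a single connection

total = download_seconds(ISO_BYTES, RATE)   # seconds for the whole image
saved = total * 0.05                        # seconds saved by a 5% improvement

print(f"total: {total / 3600:.2f} hours, saved: {saved / 60:.1f} minutes")
```

At 20KB/s the image takes a little over nine hours, so 5% is close to
half an hour of the user's time.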

For example if I have a 100Mb/s FDX connection all the way through to
your server but because of various TCP/IP tuning issues on either end
(and perhaps also between) I can only keep it even close to full by
pulling down that ISO as five chunks simultaneously, then that's what
I'll want to do.

Assuming all other things are as you said at first, you really do want
me to do it that way too, because the sooner I can get done what
I want to do then the sooner the next guy can come along and do the same
without hindrance.  :-)

It's like the issues with fair-share schedulers on timesharing systems.
You don't normally really want to limit the sole user of a system to 25%
of the CPU, especially if doing so will mean he'll still be using that
25% when another four users suddenly come along in the same class.  If
that happens then you'll only be able to give them each 20% even though
you normally promise to try to give them each at least 25% (including
the first guy who was getting 25% up until now).  Ideally you give the
first guy 100% so that he's done and gone before the next four come along.
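
The same argument as a small Python sketch.  The 60 CPU-second job is an
invented figure, and the "scheduler" is assumed to simply split the CPU
equally among whoever is active; neither is taken from any real system.

```python
def wall_clock_seconds(cpu_seconds, share):
    """Elapsed time for a job that needs cpu_seconds when given `share` of the CPU."""
    return cpu_seconds / share

def equal_share(active_users):
    """Fraction of the CPU each user gets when it is split evenly."""
    return 1.0 / active_users

JOB = 60.0                                # hypothetical job: 60 CPU-seconds

capped = wall_clock_seconds(JOB, 0.25)    # sole user held to 25%
uncapped = wall_clock_seconds(JOB, 1.0)   # sole user given the whole CPU

# If the capped job is still running when four more users arrive,
# five users are active and nobody can get the promised 25%.
share_with_five = equal_share(5)
share_with_four = equal_share(4)

print(capped, uncapped, share_with_five, share_with_four)
```

The capped job occupies the system four times as long, which is exactly
what makes it more likely to still be there when the others show up.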

You always want to make the best available use of your resources given
the current demand for them, and to give each _class_ of users at
_least_ the minimum allocation you promise them.  However since in the
general case of the global public Internet population accessing an
anonymous FTP server you've only got one class of truly anonymous users
and that means you can't safely make any assumptions about the fairness
of use of any given user with only what you know from the server side.

(i.e. there is no "fairness" for anonymous users -- they get what they
can take of what you give them to take!)

> Well, I'm no expert either... but I see people get 10K/sec while some
> other guy with 8 connections going gets 80K/s or whatever, and when
> I kill off the extra connections, things more or less even out.

Yes, but does that actually help anyone in the long run?  I seriously
doubt it.  Fair share of a download resource has almost nothing to do
with the actual transfer rate per connection and bandwidth consumed at a
given instant.  You're looking at a snapshot in time but what really
matters to the users is the overall "fairness" of their total download
time.

Of course killing off a connection is the worst possible thing to do.
You could be causing one or more people to have to retransmit
significant amounts of data thus making everything less fair for
everyone.  :-)

Also keep in mind how the resources available to a remote client IP#
might reflect the number of people using that IP#.  In the general case
you'll likely have individuals with unique addresses having more limited
bandwidth at their end than groups of individuals sharing a single IP#.
I.e. the bandwidth available at the client end will help keep usage fair
without your having to try to keep track of or even be aware of shared
IP# usage.  (Obviously one guy with a workstation right on an OC3 pipe
will potentially skew things, but only for a very short time!  ;-)

> Exactly what does NetBSD's ftpd's "rateget" mean? Doesn't "rateget guest
> 56k" limit the aggregate rate of all anonymous connections to
> 56kbytes/second?

No, I don't think so.  How can a stand-alone process know what all its
other sibling processes are doing without having some central control
process to mediate all the connections simultaneously?  Indeed if you
look at the code you'll see that "rateget" is per download transfer.
It's done right there in the middle of the read()/write() loop.
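
To illustrate the idea (this is NOT the NetBSD ftpd source, just a
minimal Python sketch of a per-transfer throttle), a copy loop can pace
itself using nothing but its own byte count and start time.  The name
throttled_copy and all of its parameters are hypothetical.

```python
import io
import time

def throttled_copy(src, dst, max_bytes_per_sec, bufsize=4096,
                   clock=time.monotonic, sleep=time.sleep):
    """Copy src to dst, pausing so this one transfer stays at or below the cap."""
    start = clock()
    sent = 0
    while True:
        chunk = src.read(bufsize)
        if not chunk:
            break
        dst.write(chunk)
        sent += len(chunk)
        # Earliest moment at which `sent` bytes are allowed to have gone out.
        earliest = start + sent / max_bytes_per_sec
        delay = earliest - clock()
        if delay > 0:
            sleep(delay)
    return sent

if __name__ == "__main__":
    data = io.BytesIO(b"x" * 8192)
    out = io.BytesIO()
    # 8192 bytes at a 4096 bytes/s cap: takes about two seconds.
    print(throttled_copy(data, out, max_bytes_per_sec=4096))
```

Every variable the loop consults is local to the one transfer, so a
second process running the same loop for the same client has no way to
notice the first.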

> If the "rateget" means that _each_ anonymous
> connection gets a max of 56kBps, that doesn't help either. Someone with
> one connection will get 56kBps, but someone with two will get 102kBps.

(actually the guy with two connections might get more like 114KB/s :-)
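
A rough check of those figures in Python.  This assumes the "k" suffix
means 1024 bytes, which is an assumption here and not something
confirmed in this thread.

```python
RATE_PER_CONNECTION = 56 * 1024   # bytes/s, assuming "56k" means 56 * 1024
CONNECTIONS = 2

aggregate = CONNECTIONS * RATE_PER_CONNECTION   # bytes/s across both transfers
aggregate_kb_1000 = aggregate / 1000            # in units of 1000 bytes
aggregate_kb_1024 = aggregate / 1024            # in units of 1024 bytes

print(aggregate, aggregate_kb_1000, aggregate_kb_1024)
```

Counted in units of 1024 bytes the pair gets 112, and in units of 1000
bytes a bit over 114; either way it is not 102.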

I think you've got to get rid of, or at least re-think, this idea of
"fair share" having anything to do with the instantaneous bandwidth use
of any given client (be it a person, IP#, or whatever).

Both those people might be downloading the same thing at the same time.
However overall there's potentially a savings in the total time spent
downloading between the two of them because that one guy is getting 5%
better throughput.  So then it is actually possible for both people to win
vs. the case where they both start at the same time with one connection
each.

There are many many factors to consider but instantaneous transfer speed
of a given connection is probably the least important factor.

In the end users will adjust their behaviour to match the resources they
perceive to be available (including those at their end), especially if
you give them some idea of what to expect from your service.  (i.e. show
them what your limits are and give them some hint as to what the current
utilization might be, and if you're really nice then also tell them when
they might expect the least utilization)

> Hmm, I don't think I've changed anything...

Just to clear this up I perceived that you drastically changed the
population of users we were considering for this problem.  Initially you
didn't say anything about its profile leading me to believe that you
were talking about the generic global Public Internet population -- a
population where it's increasingly _less_ likely that each additional
anonymous connection from a given IP# is from the same person.

-- 
								Greg A. Woods

+1 416 218-0098;            <g.a.woods@ieee.org>;           <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>