Subject: Re: Throttling IO Requests in NetBSD via Congestion Control
To: Thor Lancelot Simon <tls@rek.tjls.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 08/21/2006 15:54:10

On Mon, Aug 21, 2006 at 04:46:50PM -0400, Thor Lancelot Simon wrote:
> On Mon, Aug 21, 2006 at 01:35:03PM -0700, Bill Studenmund wrote:
> >
> > The current scheme just stops a process, so we basically pump the brakes
> > on a writer. If a process stops writing (say it goes back to processing
> > data to write to another file), we stop hitting the brakes. With tweaking
> > the scheduling, we would be applying a more-gradual braking. I'm not 100%
> > sure how to do this as I _think_ I want whatever we do to decay; if a
> > program shifts away from writing, I'd like it to move back to being
> > scheduled as if it had never written. I know the scheduler does this, I'm
> > just not sure how to map the dynamics from disk usage to those for CPU
> > usage.
>
> Here is what bothers me -- and it's clear to me now that I did not adequately
> understand one key design decision early in this process.  I do not believe
> that it is _ever_ appropriate to throttle a writer simply because the
> number of writes it is issuing exceeds _X_, for any _X_, without some metric
> of whether a congestion condition has actually occurred.
>
> I cannot imagine how, in general, doing that could actually have any other
> than a negative performance impact.

Depends.

If our congestion prediction model is accurate, then we can predict
congestion before we encounter it. Thus for a correct model, I believe
that there is _a_ value for _X_ that will work well.

I agree that we do not at present have a means for determining what _X_
is. My hope has been that we will develop tuning methods (either manual or
automatic) once we have the congestion infrastructure in place.
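
To make the role of _X_ concrete, here is a minimal sketch of a
threshold check combined with the decay discussed earlier in the thread.
It is an illustration only, not the code in the tree: struct wr_score,
wr_decay() and wr_charge() are hypothetical names, the halve-per-tick
decay is an assumed policy, and x stands in for the tunable _X_.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-process write accounting; not an existing NetBSD struct. */
struct wr_score {
	uint32_t score;     /* decayed count of recent writes */
	uint64_t last_tick; /* tick at which score was last decayed */
};

/*
 * Assumed decay policy: halve the score for every elapsed tick, so a
 * process that stops writing drifts back to a score of zero.
 */
static void
wr_decay(struct wr_score *ws, uint64_t now)
{
	uint64_t elapsed = now - ws->last_tick;

	if (elapsed >= 32)
		ws->score = 0;          /* shifting by >= width is undefined */
	else
		ws->score >>= elapsed;
	ws->last_tick = now;
}

/* Charge one write; return true if the writer is now above threshold x. */
static bool
wr_charge(struct wr_score *ws, uint64_t now, uint32_t x)
{
	wr_decay(ws, now);
	if (ws->score < UINT32_MAX)     /* saturate rather than wrap */
		ws->score++;
	return ws->score > x;
}
```

With x = 2, three writes in the same tick trip the check on the third
write, and a writer that then stays idle for two ticks starts again from
a score of 1. Picking x well is exactly the open tuning question.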

> If we were actually measuring some value that could indicate congestion,
> such as the change in request latency, then I do not think the current
> approach would cause problems, and I believe that sleeping writers for
> something like the measured minimum request latency would in fact be the
> correct way to pace their writes -- it's almost exactly what network
> protocols do.  But without that, simply putting processes to sleep
> because we've decided they issued a "write flood" is just going to have
> a negative effect on performance that will cascade through the system.  I
> don't know how to explain the negative effect we're seeing right now,
> but I certainly wouldn't expect a positive one.
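
For reference, the latency-driven pacing described in the quoted
paragraph could be sketched roughly as below. Everything here is an
assumption made for illustration: struct lat_track, lat_sample() and
lat_pace_us() are hypothetical names, the 1/8 smoothing gain is borrowed
from the usual network RTT estimators, and "smoothed latency above twice
the minimum" is an arbitrary stand-in for a real congestion test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device latency tracker; all names are illustrative. */
struct lat_track {
	uint32_t min_us; /* smallest completion latency seen, microseconds */
	uint32_t avg_us; /* smoothed latency (EWMA with gain 1/8) */
	bool     primed; /* false until the first sample arrives */
};

/* Feed one measured request completion latency into the tracker. */
static void
lat_sample(struct lat_track *lt, uint32_t sample_us)
{
	if (!lt->primed) {
		lt->min_us = lt->avg_us = sample_us;
		lt->primed = true;
		return;
	}
	if (sample_us < lt->min_us)
		lt->min_us = sample_us;
	/* avg += (sample - avg) / 8, computed in signed arithmetic */
	lt->avg_us = (uint32_t)((int64_t)lt->avg_us +
	    ((int64_t)sample_us - (int64_t)lt->avg_us) / 8);
}

/*
 * Return how long a writer should sleep, in microseconds.  Zero means no
 * congestion has been observed, so the writer is not slowed at all.
 * Assumed test: smoothed latency has risen above twice the minimum.
 */
static uint32_t
lat_pace_us(const struct lat_track *lt)
{
	if (!lt->primed || lt->avg_us <= 2 * (uint64_t)lt->min_us)
		return 0;
	return lt->min_us; /* pace by the measured minimum latency */
}
```

The point of this shape is that a writer is only ever put to sleep after
a measured rise in latency, never merely for exceeding a write count.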

Take care,

Bill
