Subject: Re: increasing FD_SETSIZE to 1024 or 2048?
To: None <greywolf@starwolf.com>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: current-users
Date: 07/04/2000 13:06:44
>What are you doing with your named that is so special that it requires
>a modification that no other shop of consequence of which knowledge has
>appeared to me has found necessary to implement?  
>

I am doing lookups of anywhere from 100,000 to 1 million unique
hostnames, or unique IP addresses. For that application, caching is
largely useless (since the input queries are unique).  I do lots of
lookups in parallel.

That workload has uncovered several design flaws in BIND 8's named:
named's heap (data seg) grows faster than it can be easily reclaimed
by timing out records when their TTL goes to zero.  It's also shown
that the maximum request rate of _non_cached requests is, in the long
term, limited to about 30-35 lookups/sec, even with 190 requestors in
parallel.  Most of the requestor threads (and most of the associated
named resources) are spent waiting for lame delegations, for requests
that will eventually time out (I retry up to 3 times).  That's a
long-term rate, over hours, of about 30 requests/sec, with CPU load at
under 5%, on an otherwise-idle machine (pII/500, 256megs).
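
As a back-of-envelope check on those figures (an illustration, not
something measured here): if every requestor always has one lookup
outstanding, Little's law gives the mean time per lookup as the number
of requestors divided by the completion rate. The function names below
are made up for this sketch.

```c
#include <assert.h>

/* Little's law: mean time a request spends in the system equals
 * requests in flight divided by the sustained completion rate.
 * Assumes every requestor always has exactly one lookup pending. */
double mean_latency_sec(double in_flight, double rate_per_sec)
{
    assert(rate_per_sec > 0.0);
    return in_flight / rate_per_sec;
}

/* Wall-clock hours needed to finish a batch at a sustained rate. */
double batch_hours(double n_lookups, double rate_per_sec)
{
    assert(rate_per_sec > 0.0);
    return n_lookups / rate_per_sec / 3600.0;
}
```

With 190 requestors at 30 to 35 lookups/sec, mean_latency_sec() comes
out at roughly 5.4 to 6.3 seconds per lookup, which fits most requests
sitting in timeouts.  batch_hours(1e6, 30) is a little over 9 hours; at
1000/sec the same batch would take about 17 minutes.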

Bumping named's FD_SETSIZE to 2k, doing "files 1024" in
/etc/named.conf, and using some 800 requestors gets me about a 3-fold
increase.  (named eats more CPU; I am trying to profile it, but the
ipv6 changes broke compiling named with cc -pg, sigh).
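
For anyone wondering why FD_SETSIZE is the knob that matters: a
select()-based event loop (which I am assuming is what named uses, given
the effect of the change) tracks descriptors in an fd_set, a fixed-size
bitmap with FD_SETSIZE bits.  A descriptor numbered at or above that
limit cannot be watched, however many files the process may open.  The
sketch below only illustrates that limit; the helper names are
hypothetical and it is not code from named.  On the BSDs the headers
honor a FD_SETSIZE defined before they are included; glibc does not, so
the sketch uses whatever value the build provides.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/* Number of descriptors a single fd_set can track in this build. */
int fd_set_capacity(void)
{
    return FD_SETSIZE;
}

/* FD_SET() on a descriptor outside [0, FD_SETSIZE) is undefined
 * behavior (it indexes past the bitmap), so check the range first. */
int fd_fits(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Add fd to the set if it fits.
 * Returns 0 on success, -1 if select() cannot track this descriptor. */
int safe_fd_set(int fd, fd_set *set)
{
    assert(set != NULL);
    if (!fd_fits(fd))
        return -1;
    FD_SET(fd, set);
    return 0;
}
```

So raising "files" alone buys nothing once descriptor numbers pass
FD_SETSIZE; both limits have to go up together.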

It may be there's another way to achieve the same tuning without
changing FD_SETSIZE, but I haven't found it, and I have tried tweaking
a fair few of named's eventloop parameters.

>Now either these sites are all lucky as hell, or they planned what they
>were doing.  Judging by that which you posit, you fall into neither
>category (and I'm not attempting to be insulting, here -- not my preferred
>mode of operation), 

Well, you succeeded. Burning 10 more machines just to do DNS lookups,
at very low CPU and network utilization, is not what I consider good
planning.  I spose if I had the machines sitting round doing nothing
it might be a different story.

Oh yes -- I did "maxusers 512" in the kernel config file.

>which begs the original question of What Are You Doing?

Batched, parallel, DNS requests of huge numbers of unique hosts or
huge numbers of unique IP addresses. I'd like to scale up to 1000/sec,
but I'll probably run out of time to do tuning (have day job, plus
thesis to write).

The very few people I know (via personal contacts) doing similar
things are using Linux. Their libc already has a FD_SETSIZE of 1024,
so they just bump "files 1000" in named.conf, and bingo.  Which gets
us back to square 1...