Subject: pkgsrc SpamAssassin Performance miserable under load
To: None <tech-pkg@NetBSD.org>
From: Chuck Yerkes <chuck+nbsd@2004.snew.com>
List: tech-pkg
Date: 01/13/2004 19:51:16
Okay, this has many (too many) moving parts in it, so it
could be lots of things.  I'm putting it out there to get a sense
of whether others are seeing this.

Old env:
Sun E420s (4 CPUs/4GB RAM) running a perl I built myself (generic) with
the right modules to run SpamAssassin 2.54.  No Razor modules or
any of that.  It talks to sendmail via a milter (written in C).

The setup handles around 500,000 inbound messages/day, with about
70% of that on one machine.
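
For scale, here is a rough back-of-envelope sketch of what those figures mean per second. It assumes mail arrives evenly over the day, which it certainly doesn't, so real peaks will be higher. The CPU budget line is only the average CPU time each message could use before the 4 CPUs on the main machine are saturated.

```python
# Back-of-envelope rates from the figures quoted above.
# Assumption: arrivals are spread evenly over 24 hours (peaks are ignored).
MESSAGES_PER_DAY = 500_000      # total inbound, from the post
BUSIEST_SHARE = 0.70            # share handled by the main machine
CPUS = 4                        # CPUs per machine, from the post
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(messages_per_day: float) -> float:
    """Convert a daily message count to an average per-second rate."""
    return messages_per_day / SECONDS_PER_DAY


total_rate = per_second(MESSAGES_PER_DAY)
main_rate = per_second(MESSAGES_PER_DAY * BUSIEST_SHARE)

# Average CPU-seconds available per message on the main machine
# before all of its CPUs are busy.
cpu_budget = CPUS / main_rate

print(f"all machines: {total_rate:.2f} msg/s average")
print(f"main machine: {main_rate:.2f} msg/s average")
print(f"CPU budget on main machine: {cpu_budget:.2f} CPU-s per message")
```

That works out to roughly 4 messages a second on the main machine, so a scan that costs much more than about one CPU-second per message leaves no headroom there.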

Load is typically 4-5 on the main machine, less on the others.


Now, moving to pkgsrc, I built the pkgsrc perl (5.8.2) and
the spamassassin 2.60 modules - with all the dependent p5-
packages.

Moved the packages over, installed them on the mail machines.


Easy enough to have both installed.  Turn off sendmail, turn off old spamd
(/usr/local/bin), turn on new one (/usr/pkg/bin), start sendmail.

Runs ok for a while, then the CPUs peg.  LA of 35 (at which point
sendmail refuses mail).

Not ideal - 3 machines pegged, swapping, and not taking mail.
Generally I see it cascade into high load quickly - and this matches
lots of experience - slow down and you have more concurrent processes,
which makes it slower, which builds up more processes, which....
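
The feedback loop described above can be sketched as a toy model. Everything here is invented for illustration: the arrival rate, the per-message CPU cost, and especially the `penalty` factor standing in for context switching and swapping. It is not a measurement of spamd, just a way to see why a modest per-message slowdown can tip a machine from "fine" to "never catches up".

```python
# Toy model of load cascading: once in-flight work exceeds the CPUs,
# each extra process makes the machine a little less efficient.
# All numbers are illustrative assumptions, not measurements.

def simulate(work_per_msg, arrivals=4.0, cpus=4, penalty=0.05, ticks=60):
    """Return the in-flight message count after each one-second tick.

    work_per_msg: CPU-seconds needed to scan one message
    arrivals:     messages arriving per tick
    penalty:      fraction of efficiency lost per process beyond `cpus`
    """
    in_flight = 0.0
    history = []
    for _ in range(ticks):
        in_flight += arrivals
        # Processes beyond the CPU count only add contention.
        excess = max(0.0, in_flight - cpus)
        usable_cpu = cpus / (1.0 + penalty * excess)
        # Cannot finish more messages than are actually in flight.
        finished = min(in_flight, usable_cpu / work_per_msg)
        in_flight -= finished
        history.append(in_flight)
    return history


fast = simulate(work_per_msg=0.8)   # scanner keeps up with arrivals
slow = simulate(work_per_msg=1.1)   # scanner is slightly too slow

print(f"fast scanner, in flight after 60s: {fast[-1]:.1f}")
print(f"slow scanner, in flight after 60s: {slow[-1]:.1f}")
```

In this model the faster case stays flat while the slower one keeps climbing, and the climb accelerates because every extra process reduces the CPU that is actually usable.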

I turned off the bayesian filtering, but that still leaves a lot
that changed:
  - NetBSD pkg management (unlikely culprit)
  - new perl (5.8.2 versus 5.8.0)
  - new SleepyCat DB2.x  (old used dbm for bayes)
  - minor new modules
  - new spamassassin  (lots of new rules - but no bayesian filtering now)

I'm looking into the details of how spamassassin is built (I've
told it not to use ssl for connections already) and how perl is
built.
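
One way to compare how the two perls were built is to ask each one for a few of its build settings with `perl -V:name`, which prints lines like `useithreads='define';`. The sketch below does that from Python. The two perl paths are assumptions (only the spamd directories are known for certain), and the list of settings is just a guess at the ones most likely to matter for speed.

```python
import re
import subprocess

# Assumed locations of the old and new perl binaries; adjust as needed.
OLD_PERL = "/usr/local/bin/perl"
NEW_PERL = "/usr/pkg/bin/perl"

# Build settings that plausibly affect performance (a guess, not a full list).
SETTINGS = ["useithreads", "usemultiplicity", "usemymalloc", "optimize"]

_LINE = re.compile(r"^(\w+)='(.*)';?\s*$")


def parse_config_line(line):
    """Parse one line of `perl -V:name` output into (name, value)."""
    match = _LINE.match(line.strip())
    return (match.group(1), match.group(2)) if match else None


def perl_config(perl_path, names=SETTINGS):
    """Ask one perl binary for the given build settings."""
    config = {}
    for name in names:
        result = subprocess.run([perl_path, f"-V:{name}"],
                                capture_output=True, text=True)
        parsed = parse_config_line(result.stdout)
        if parsed:
            config[parsed[0]] = parsed[1]
    return config


def differing(old, new):
    """Return {setting: (old_value, new_value)} for settings that differ."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k))
            for k in keys if old.get(k) != new.get(k)}


if __name__ == "__main__":
    try:
        changes = differing(perl_config(OLD_PERL), perl_config(NEW_PERL))
    except OSError as exc:
        print(f"could not run perl: {exc}")
    else:
        for name, (old, new) in changes.items():
            print(f"{name}: old={old!r} new={new!r}")
```

Anything that shows up as different (a threaded build on one side only, say, or different optimizer flags) is a candidate worth looking at before blaming SpamAssassin itself.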

I'm wondering if anyone has run into performance issues with SA
2.60 and if, on the off chance, there's some "configure --with-goslow"
thing that's in there (in the perl build, in SA).

These aren't great machines - 4x500MHz and mediocre disks.  I can
smoke them with a DL360 (2x3GHz Intel with RAID).  I know that for
pure email, a 2-way 2GHz Intel box can do 75% of what a 12-way E4500
can do (with some extra bits on the Sun side).  And I'm looking to
get a better machine to do the scanning.

But spamassassin 2.60 shouldn't be THIS much slower than 2.54.
So I'm also checking out the new environment that pkgsrc set up.

If anyone has thoughts or pointers or even "me toos", I'll take them
into consideration.