Subject: Correct way to block web bots and other unwanted traffic?
To: NetBSD, Help <netbsd-help@netbsd.org>
From: Andy Ruhl <acruhl@gmail.com>
List: netbsd-help
Date: 12/05/2006 07:16:20

I'm not sure if this is the right place to ask this, but I'll try anyway.

There is more information on this topic than I can wade through, so
I'm hoping someone can summarize it for me.

I have a small web server running on NetBSD/i386 3.1, mostly for my
own use and that of a few friends.

I'm noticing more and more bots probing my server, even though I run
it on an alternate port. I went back through the Apache access_log to
find out who they were.
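
As a rough illustration of that kind of log review, here is a minimal Python sketch that tallies hits per client IP and per User-Agent. It assumes Apache's "combined" log format (plain "common" format carries no User-Agent field). The sample lines, addresses, and the "ExampleBot" agent are made up for illustration and are not taken from the real log.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format. Adjust the pattern otherwise.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Made-up sample lines (documentation address ranges, hypothetical bot name).
SAMPLE = [
    '192.0.2.10 - - [05/Dec/2006:07:00:00 -0700] "GET / HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [05/Dec/2006:07:00:05 -0700] "GET /robots.txt HTTP/1.1" 404 208 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [05/Dec/2006:07:01:00 -0700] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]


def tally(lines):
    """Return (hits per IP, hits per User-Agent) for the given log lines."""
    by_ip, by_agent = Counter(), Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that do not look like combined format
        by_ip[m.group("ip")] += 1
        by_agent[m.group("agent")] += 1
    return by_ip, by_agent


def report(lines, top=10):
    """Print the busiest User-Agents and IPs, most frequent first."""
    by_ip, by_agent = tally(lines)
    for agent, count in by_agent.most_common(top):
        print(f"{count:6d}  {agent}")
    for ip, count in by_ip.most_common(top):
        print(f"{count:6d}  {ip}")
```

To run it against a real log, something like `report(open(path, errors="replace"))` should work, where `path` is wherever the server writes its access_log.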

It's all the usual suspects: Google, Yahoo, Microsoft, etc., plus some
others that I'm not sure are savory.

So my question is, what's the right way to block these guys?

I don't currently have a robots.txt file, and I guess there's no harm
in adding one that disallows the / directory for all user agents. My
concern is that this would only be an invitation to unsavory traffic.
Is this too paranoid?
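
For reference, the wildcard deny described above is a two-line robots.txt. The sketch below embeds that file as a string and uses Python's standard `urllib.robotparser` to show how a well-behaved crawler would interpret it. The agent name and URL are placeholders. Note that robots.txt is purely advisory: this only models crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# The whole robots.txt: every user agent, everything under / disallowed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if a compliant crawler named `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# "ExampleBot" and example.org are placeholders, not real visitors.
print(allowed("ExampleBot", "http://example.org/index.html"))
```

A crawler that ignores robots.txt is unaffected by any of this, which is why the file cannot replace filtering at the server or firewall.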

I'm also running pf, and I'm wondering whether I should block them
there instead. If so, should I block the individual IPs that show up
in the access_log, or block by domain name? I'm not sure which
approach is best.
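
One point that bears on the IP-versus-domain question: pf filters on addresses, and a hostname written in pf.conf is resolved to addresses when the ruleset is loaded, so in practice the choice is between single addresses and address ranges. A minimal sketch of the address route follows, assuming a pf table fed from a file. The table name `badbots` and the path `/etc/pf.badbots` are hypothetical, and the addresses are documentation examples.

```python
import ipaddress


def pf_table_lines(addresses):
    """Validate and de-duplicate addresses, one CIDR entry per line,
    suitable for a file loaded into a pf table."""
    nets = set()
    for text in addresses:
        # strict=False lets "192.0.2.10/24" normalize to 192.0.2.0/24
        nets.add(ipaddress.ip_network(text.strip(), strict=False))
    # Sort IPv4 before IPv6; the two families are not directly comparable.
    ordered = sorted(nets, key=lambda n: (n.version, n))
    return [str(n) for n in ordered]


# Matching pf.conf lines (sketch; table name and file path are hypothetical):
#   table <badbots> persist file "/etc/pf.badbots"
#   block in quick from <badbots> to any

if __name__ == "__main__":
    # Example addresses only (documentation ranges).
    for line in pf_table_lines(["192.0.2.10", "198.51.100.0/24"]):
        print(line)
```

Keeping the list in a table file means addresses can be added or removed without editing the rules themselves.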

Maybe this is too big a question. I hope someone can give me a little insight.

Thanks.

Andy