www/anubis trip report: blocking AI bots, basically works, hints
I recently became aware of anubis, in pkgsrc/www/anubis (thanks biegert@
and ryoon@).
This is likely of interest to those looking to prevent their sites from
being DoSed by AI scrapers, or those who philosophically object to LLMs
being trained on stolen content. The point is to require proof of work
in the browser (as in hashcash for email from the old days).
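The challenge is conceptually the same loop hashcash used: keep hashing
until you find a nonce whose digest starts with enough zeros. A toy
shell version for flavor (anubis actually does this in client-side
JavaScript; the fixed 0000 prefix and the openssl invocation are just
illustration, not anubis's real parameters):

    challenge="example-challenge"       # the real one comes from the server
    n=0
    while true; do
        h=$(printf '%s%s' "$challenge" "$n" |
            openssl dgst -sha256 -hex | awk '{print $NF}')
        case $h in
        0000*) break ;;                 # enough leading zeros: proof found
        esac
        n=$((n+1))
    done
    echo "nonce=$n hash=$h"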
I was able to get it going, and the big hint is that while there is a
default.env file in /usr/pkg/etc/anubis that looks like a config file,
it isn't one: anubis doesn't read it, so you need to export those
variables before starting, or turn them into command-line arguments.
It's not clear to me whether that file is meant as real config or just
docs, but the default values are ok.
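Concretely, something like this before starting the daemon (a sketch;
BIND/TARGET/DIFFICULTY are variable names from my copy of default.env,
which is plain KEY=VALUE lines, so check yours):

    set -a                              # auto-export everything we source
    . /usr/pkg/etc/anubis/default.env   # BIND, TARGET, DIFFICULTY, ...
    set +a
    /usr/pkg/bin/anubis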
My todo list for the package, which I'll fold in if nobody tells me I'm
confused:
- there's no rc.d script (rough sketch after this list)
- there's no mechanism to run as non-root
- the env file should be used, or not be in etc
- config should be handled somehow
- defaults include listening on * instead of 127.0.0.1 and ::1 only,
  and that should probably be fixed (see the BIND note below)
- upstream recommends /var/run/nginx/foo sockets instead of IP for
  communication from nginx to anubis and anubis to nginx. We don't
  have that subdir, and anubis should run as an anubis user, so this
  is not trivial (socket sketch below).
- there is some concept of alternative sidecar config, and maybe we
  should accommodate it
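For the rc.d item, the sort of skeleton I have in mind, untested (the
env sourcing matches the hint above; the pidfile handling is the crude
backgrounding kind, since anubis stays in the foreground):

    #!/bin/sh
    #
    # PROVIDE: anubis
    # REQUIRE: DAEMON

    . /etc/rc.subr

    name="anubis"
    rcvar=$name
    pidfile="/var/run/${name}.pid"
    command="/usr/pkg/bin/anubis"
    start_cmd="anubis_start"

    anubis_start()
    {
        set -a
        . /usr/pkg/etc/anubis/default.env
        set +a
        ${command} &
        echo $! > ${pidfile}
    }

    load_rc_config $name
    run_rc_command "$1"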
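For the listen-on-* default, if BIND takes a Go-style listen address
(which is how it reads), restricting it is a one-line change, though
covering both loopbacks takes a v6 address or two listeners:

    BIND=127.0.0.1:8923                 # v4 loopback only; [::1]:8923 for v6
    # the shipped default is :8923, i.e. all interfaces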
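For the socket item, what upstream seems to want is roughly this;
BIND_NETWORK and SOCKET_MODE are the knobs as I understand the upstream
docs, and /var/run/nginx is their example path, which as noted we don't
have:

    # anubis side: listen on a unix socket instead of a TCP port
    BIND_NETWORK=unix
    BIND=/var/run/nginx/anubis.sock
    SOCKET_MODE=0660
    TARGET=http://127.0.0.1:80          # the real site anubis fronts

    # nginx side, in the server block for the public vhost:
    #   location / {
    #       proxy_pass http://unix:/var/run/nginx/anubis.sock;
    #   }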
Since starting an hour or so ago:
- 2 legit clients passed the challenge: one me, one nerd friend.
  (Telling people you set up anubis is a good way to get page views :-)
- 2 blocked OpenAI crawlers, with a user-agent of
  Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)
  which would otherwise have been challenged.
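The deny-vs-challenge split comes from the bot policy file; the shipped
default denies known AI crawlers by user-agent regex and challenges the
rest. An excerpt of the shape (from memory of botPolicies.json, so
treat the key names as approximate):

    {
      "bots": [
        {
          "name": "gptbot",
          "user_agent_regex": "GPTBot",
          "action": "DENY"
        }
      ]
    }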