www/anubis trip report: blocking AI bots, basically works, hints



I recently became aware of anubis, in pkgsrc/www/anubis (thanks biegert@
and ryoon@).

This is likely of interest to those looking to keep AI scrapers from
DoSing their sites, or to those who philosophically object to LLMs
being trained on stolen content.  The point is to require proof of work
in the browser before serving content (much like Hashcash for email in
the old days).

I was able to get it going, and the big hint is that while there is a
default.env file in /usr/pkg/etc/anubis that looks like a config file,
anubis does not read it: you need to export those variables into the
environment before starting it, or turn them into command-line
arguments.  It's not clear to me whether the shipped file is intended
as real configuration or just documentation, but the default values it
contains are ok.
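
One way to do that by hand, assuming default.env really is plain
KEY=value assignments (the binary path is my guess at where pkgsrc
puts it; check your install):

    # Export everything in default.env, then start anubis.
    # "set -a" marks every variable set after it for export.
    set -a
    . /usr/pkg/etc/anubis/default.env
    set +a
    /usr/pkg/bin/anubis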

My todo list for the package, which I'll fold in if nobody tells me I'm
confused:

  - there's no rc.d script (see the sketch after this list)
  - there's no mechanism to run as non-root
  - the env file should either actually be read, or not be installed
    in etc
  - config should be handled somehow
  - the default is to listen on * rather than on 127.0.0.1 and ::1
    only, and that should probably be fixed
  - upstream recommends /var/run/nginx/foo sockets instead of IP for
    communication from nginx to anubis and from anubis to nginx.  We
    don't have such a subdirectory, and anubis should run as an anubis
    user, so this is not trivial (see the nginx sketch after this
    list).
  - there is some concept of alternative sidecar configuration, and
    maybe we should accommodate it
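
To make the rc.d item concrete, here is an untested sketch of what
such a script might look like.  The paths, the pidfile handling, and
sourcing default.env are all my assumptions, not what the package does
today:

    #!/bin/sh
    #
    # PROVIDE: anubis
    # REQUIRE: DAEMON

    . /etc/rc.subr

    name="anubis"
    rcvar=$name
    command="/usr/pkg/bin/anubis"
    pidfile="/var/run/${name}.pid"
    start_cmd="anubis_start"

    anubis_start()
    {
        echo "Starting ${name}."
        # anubis takes its settings from the environment, so load
        # the env file (assumed to be plain KEY=value lines) first.
        set -a
        . /usr/pkg/etc/anubis/default.env
        set +a
        ${command} >/dev/null 2>&1 &
        echo $! > ${pidfile}
    }

    load_rc_config $name
    run_rc_command "$1"

This still runs as root, of course; a real script would drop to an
anubis user once one exists.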
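
And for the socket item: my reading of the upstream docs is that
anubis can listen on a unix socket (BIND_NETWORK=unix, with BIND set
to the socket path) and forward to the backend via TARGET.  The socket
path, vhost, and backend address below are made up for illustration:

    # front-end vhost: hand every request to anubis over a unix socket
    server {
        listen 443 ssl;
        server_name example.org;
        location / {
            proxy_pass http://unix:/var/run/nginx/anubis.sock;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }

    # and on the anubis side, roughly:
    #   BIND_NETWORK=unix
    #   BIND=/var/run/nginx/anubis.sock
    #   TARGET=http://127.0.0.1:8080    # the real backend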


Since starting an hour or so ago:
  - 2 legit clients passed the challenge: one was me, and one a nerd
    friend.  (Telling people you set up anubis is a good way to get
    page views :-)
  - 2 blocked OpenAI crawlers, with a user-agent of
      Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)
    which would have been challenged in any case, since it claims to
    be Mozilla.
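
For the curious, which user-agents are denied outright vs. challenged
comes from the bot policy file.  From my reading of upstream (format
from memory; the shipped botPolicies.json may differ), entries look
roughly like:

    {
      "bots": [
        { "name": "gptbot",
          "user_agent_regex": "GPTBot",
          "action": "DENY" },
        { "name": "generic-browser",
          "user_agent_regex": "Mozilla",
          "action": "CHALLENGE" }
      ]
    }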

