* feat(default-config): block tencent cloud by default
This is what happens when you don't have an abuse contact.
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: update spelling
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(default-config): block Huawei Cloud
Closes#978
Huawei Cloud has been egregious about its scraping. All attempts to
contact their abuse team have failed. If you work for Huawei Cloud,
please raise this issue internally and get the scraping to just stop.
* chore: spelling
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
This may seem strange, but allowlisting common crawl means that scrapers
have less incentive to scrape because they can just grab the data from
common crawl instead of scraping it again.
* Split up AI filtering files
Create aggressive/moderate/permissive policies to allow administrators to choose their AI/LLM stance.
Aggressive policy matches existing default in Anubis.
Removes `Google-Extended` flag from `ai-robots-txt.yaml` as it doesn't exist in requests.
Rename `ai-robots-txt.yaml` to `ai-catchall.yaml` as the file is no longer a copy of the source repo/file.
* chore: spelling
* chore: fix embeds
* chore: fix data includes
* chore: fix file name typo
* chore: Ignore READMEs in configs
* chore(lib/policy/config): go tool goimports -w
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
* Define OpenAI bot ALLOW policies
Allows OpenAI bots to be allowlisted at the choice of the Anubis administrator. None are enabled by default.
* Define MistralAI bot ALLOW policy
* chore: spelling
* Add Applebot definition
Adds Apple's search indexing bot, and allowlists it by default.
Allowlisted by default because it is equivalent to Googlebot/Bingbot. Remove Applebot from `ai-robots-txt.yaml` for the same reasons.
Remove `Applebot-Extended` from `ai-robots-txt.yaml` as it has no effect.
* chore: spelling
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
* feat(config): support importing bot policy snippets
This changes the grammar of the Anubis bot policy config to allow
importing from internal shared rules or external rules on the
filesystem.
This lets you create a file at `/data/policies/block-evilbot.yaml` and
then import it with:
```yaml
bots:
- import: /data/policies/block-evilbot.yaml
```
This also explodes the default policy file into a bunch of composable
snippets.
Thank you @Aibrew for your example gitea Atom / RSS feed rules!
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(data): update botPolicies.json to use imports
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(cmd/anubis): extract bot policies with --extract-resources
This allows a user that doesn't have anything but the Anubis binary to
figure out what the default configuration does.
* docs(data/botPolices.yaml): document import syntax in-line
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(lib/policy): better test importing from JSON snippets
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs(admin): Add import syntax documentation
This documents the import syntax and is based on the block comment at
the top of the default bot policy file.
* docs(changelog): add note about importing snippets
Signed-off-by: Xe Iaso <me@xeiaso.net>
* style(lib/policy/config): use an error value instead of an inline error
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>