* Implement FCrDNS and other DNS features
* Redesign DNS cache and methods
* Fix DNS cache
* Rename regexSafe arg
* Alter verifyFCrDNS(addr) behaviour
* Remove unused dnsCache field from Server struct
* Upd expressions docs
* Update docs/docs/CHANGELOG.md
Signed-off-by: Xe Iaso <me@xeiaso.net>
* refactor(dns): simplify FCrDNS logging
* docs: clarify verifyFCrDNS behavior
Add a note to the documentation for `verifyFCrDNS` to clarify that it returns true when no PTR records are found for the given IP address.
* fix(dns): Improve FCrDNS error handling and tests
The `VerifyFCrDNS` function previously ignored errors returned from reverse DNS lookups. This could lead to incorrect passes when a DNS failure (other than a simple 'not found') occurred. This change ensures that any error from a reverse lookup will cause the FCrDNS check to fail.
The test suite for FCrDNS has been updated to reflect this change. The mock DNS lookups now simulate both 'not found' errors and other generic DNS errors. The test cases have been updated to ensure that the function behaves correctly in both scenarios, resolving a situation where two test cases were effectively duplicates.
* docs: Update FCrDNS documentation and spelling
Corrected a typo in the `verifyFCrDNS` function documentation.
Additionally, updated the spelling exception list to include new terms and remove redundant entries.
* chore: update spelling
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
* feat(default-config): block tencent cloud by default
This is what happens when you don't have an abuse contact.
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: update spelling
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(default-config): block Huawei Cloud
Closes#978
Huawei Cloud has been egregious about its scraping. All attempts to
contact their abuse team have failed. If you work for Huawei Cloud,
please raise this issue internally and get the scraping to just stop.
* chore: spelling
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
This may seem strange, but allowlisting common crawl means that scrapers
have less incentive to scrape because they can just grab the data from
common crawl instead of scraping it again.
* Split up AI filtering files
Create aggressive/moderate/permissive policies to allow administrators to choose their AI/LLM stance.
Aggressive policy matches existing default in Anubis.
Removes `Google-Extended` flag from `ai-robots-txt.yaml` as it doesn't exist in requests.
Rename `ai-robots-txt.yaml` to `ai-catchall.yaml` as the file is no longer a copy of the source repo/file.
* chore: spelling
* chore: fix embeds
* chore: fix data includes
* chore: fix file name typo
* chore: Ignore READMEs in configs
* chore(lib/policy/config): go tool goimports -w
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
* Define OpenAI bot ALLOW policies
Allows OpenAI bots to be allowlisted at the choice of the Anubis administrator. None are enabled by default.
* Define MistralAI bot ALLOW policy
* chore: spelling
* Add Applebot definition
Adds Apple's search indexing bot, and allowlists it by default.
Allowlisted by default because it is equivalent to Googlebot/Bingbot. Remove Applebot from `ai-robots-txt.yaml` for the same reasons.
Remove `Applebot-Extended` from `ai-robots-txt.yaml` as it has no effect.
* chore: spelling
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
* feat(config): support importing bot policy snippets
This changes the grammar of the Anubis bot policy config to allow
importing from internal shared rules or external rules on the
filesystem.
This lets you create a file at `/data/policies/block-evilbot.yaml` and
then import it with:
```yaml
bots:
- import: /data/policies/block-evilbot.yaml
```
This also explodes the default policy file into a bunch of composable
snippets.
Thank you @Aibrew for your example gitea Atom / RSS feed rules!
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(data): update botPolicies.json to use imports
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(cmd/anubis): extract bot policies with --extract-resources
This allows a user that doesn't have anything but the Anubis binary to
figure out what the default configuration does.
* docs(data/botPolices.yaml): document import syntax in-line
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(lib/policy): better test importing from JSON snippets
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs(admin): Add import syntax documentation
This documents the import syntax and is based on the block comment at
the top of the default bot policy file.
* docs(changelog): add note about importing snippets
Signed-off-by: Xe Iaso <me@xeiaso.net>
* style(lib/policy/config): use an error value instead of an inline error
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>