Split up AI filtering files (#592)

* Split up AI filtering files

  Create aggressive/moderate/permissive policies to allow administrators to choose their AI/LLM stance.

  Aggressive policy matches existing default in Anubis.

  Removes `Google-Extended` flag from `ai-robots-txt.yaml` as it doesn't exist in requests.

  Rename `ai-robots-txt.yaml` to `ai-catchall.yaml` as the file is no longer a copy of the source repo/file.

* chore: spelling
* chore: fix embeds
* chore: fix data includes
* chore: fix file name typo
* chore: Ignore READMEs in configs
* chore(lib/policy/config): go tool goimports -w

  Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
2025-06-01 13:21:18 -07:00
commit de7dbfe6d6
parent 77e0bbbce9
19 changed files with 107 additions and 18 deletions
--- a/data/botPolicies.yaml
+++ b/data/botPolicies.yaml
@@ -17,8 +17,12 @@ bots:
    import: (data)/bots/_deny-pathological.yaml
  - import: (data)/bots/aggressive-brazilian-scrapers.yaml

-  # Enforce https://github.com/ai-robots-txt/ai.robots.txt
-  - import: (data)/bots/ai-robots-txt.yaml
+  # Aggressively block AI/LLM related bots/agents by default
+  - import: (data)/meta/ai-block-aggressive.yaml
+
+  # Consider replacing the aggressive AI policy with more selective policies:
+  # - import: (data)/meta/ai-block-moderate.yaml
+  # - import: (data)/meta/ai-block-permissive.yaml

  # Search engine crawlers to allow, defaults to:
  #   - Google (so they don't try to bypass Anubis)