Split up AI filtering files (#592)

* Split up AI filtering files

Create aggressive/moderate/permissive policies to allow administrators to choose their AI/LLM stance.

The aggressive policy matches the existing default in Anubis.

Remove the `Google-Extended` flag from `ai-robots-txt.yaml`, as it never appears in requests.

Rename `ai-robots-txt.yaml` to `ai-catchall.yaml` as the file is no longer a copy of the source repo/file.

* chore: spelling

* chore: fix embeds

* chore: fix data includes

* chore: fix file name typo

* chore: Ignore READMEs in configs

* chore(lib/policy/config): go tool goimports -w

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
Corry Haines 2025-06-01 13:21:18 -07:00 committed by GitHub
parent 77e0bbbce9
commit de7dbfe6d6
19 changed files with 107 additions and 18 deletions

@@ -18,7 +18,9 @@ blueskybot
boi
botnet
BPort
Brightbot
broked
Bytespider
cachebuster
Caddyfile
caninetools
@@ -41,6 +43,7 @@ cloudflare
confd
containerbuild
coreutils
Cotoyogi
CRDs
crt
daemonizing
@@ -49,6 +52,7 @@ Debian
debrpm
decaymap
decompiling
Diffbot
discordapp
discordbot
distros
@@ -66,11 +70,15 @@ everyones
evilbot
evilsite
expressionorlist
externalagent
externalfetcher
extldflags
facebookgo
Factset
fastcgi
fediverse
finfos
Firecrawl
flagenv
Fordola
forgejo
@@ -86,6 +94,7 @@ googlebot
govulncheck
GPG
GPT
gptbot
grw
Hashcash
hashrate
@@ -97,8 +106,11 @@ hostable
htmx
httpdebug
hypertext
iaskspider
iat
ifm
Imagesift
imgproxy
inp
iss
isset
@@ -146,11 +158,15 @@ nginx
nobots
NONINFRINGEMENT
nosleep
OCOB
ogtags
omgili
omgilibot
onionservice
openai
openrc
pag
Pangu
parseable
passthrough
Patreon
@@ -185,18 +201,22 @@ RUnlock
sas
sasl
Scumm
searchbot
searx
sebest
secretplans
selfsigned
Semrush
setsebool
shellcheck
Sidetrade
sitemap
sls
sni
Sourceware
Spambot
sparkline
spyderbot
srv
stackoverflow
startprecmd
@@ -212,12 +232,15 @@ techarohq
templ
templruntime
testarea
Tik
Timpibot
torproject
traefik
unixhttpd
unmarshal
uvx
Varis
Velen
vendored
vhosts
videotest
@@ -227,9 +250,11 @@ webmaster
webpage
websecure
websites
Webzio
wordpress
Workaround
workdir
wpbot
xcaddy
Xeact
xeiaso