nuke/.github/actions/spelling/expect.txt
Jason Cameron e0781e4560
feat: add robots2policy CLI to convert robots.txt to Anubis CEL (#657)
* feat: add robots2policy CLI utility to convert robots.txt to Anubis challenge policies

* feat: add documentation for robots2policy CLI tool

* feat: implement crawl delay handling as weight adjustment in Anubis rules

* feat: add various robots.txt and YAML configurations for user agent handling and crawl delays

* test: add comprehensive tests for robots2policy conversion and parsing

* fix: update example URL in usage instructions for robots2policy CLI

* Update metadata

check-spelling run (pull_request) for json/robots2policycli

Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>

* docs: add crawl delay weight adjustment and deny user agents option to robots2policy CLI

* Update cmd/robots2policy/main.go

Co-authored-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>

* Update cmd/robots2policy/main.go

Co-authored-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>

* fix(robots2policy): use sigs.k8s.io/yaml

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(config): properly marshal bot policy rules

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(yeetfile): expose robots2policy in libexec

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(yeetfile): put robots2policy in $PATH

Signed-off-by: Xe Iaso <me@xeiaso.net>

* Update metadata

check-spelling run (pull_request) for json/robots2policycli

Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>

* style: reorder imports

* refactor: use preexisting structs in config

* fix: correct flag check in main function

* fix: reorder fields in AnubisRule struct for better alignment

* style: improve alignment of struct fields in AnubisRule and OGTagCache

* Update metadata

check-spelling run (pull_request) for json/robots2policycli

Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>

* fix: add validation for generated Anubis rules from robots.txt

* feat: add batch processing for robots.txt files to generate Anubis CEL policies

* fix: improve usage message and error handling for input file requirement

* refactor: update AnubisRule structure to use ExpressionOrList for improved expression handling

* refactor: reorganize policy definitions in YAML files for consistency and clarity

* fix: correct indentation in blacklist and complex YAML files for consistency

* test: enhance output comparison in robots2policy tests for YAML and JSON formats

* Revert "fix: improve usage message and error handling for input file requirement"

This reverts commit ddcde1f2a326545d3ef2ec32e5e03f55f4f931a8.

* fix: improve usage message and error handling in robots2policy

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

---------

Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>
Signed-off-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
Co-authored-by: Xe Iaso <me@xeiaso.net>
2025-06-14 23:41:00 -04:00

295 lines
2.3 KiB
Text

acs
aeacus
Aibrew
alrest
amazonbot
anthro
anubis
anubistest
apk
Applebot
archlinux
badregexes
bdba
berr
betteralign
bingbot
bitcoin
blogging
Bluesky
blueskybot
boi
botnet
BPort
Brightbot
broked
Bytespider
cachebuster
Caddyfile
caninetools
Cardyb
celchecker
CELPHASE
cerr
certresolver
CGNAT
cgr
chainguard
chall
challengemozilla
checkpath
checkresult
chen
chibi
cidranger
ckie
cloudflare
confd
containerbuild
coreutils
Cotoyogi
CRDs
crt
Cscript
daemonizing
DDOS
Debian
debrpm
decaymap
decompiling
Diffbot
discordapp
discordbot
distros
dnf
dnsbl
dnserr
dracula
dronebl
droneblresponse
duckduckbot
eerror
ellenjoe
enbyware
everyones
evilbot
evilsite
expressionorlist
externalagent
externalfetcher
extldflags
facebookgo
Factset
fastcgi
fediverse
finfos
Firecrawl
flagenv
Fordola
forgejo
fsys
fullchain
Galvus
gha
gitea
goland
gomod
goodbot
googlebot
govulncheck
goyaml
GPG
GPT
gptbot
grw
Hashcash
hashrate
headermap
healthcheck
hebis
hec
hmc
hostable
htmlc
htmx
httpdebug
hypertext
iaskspider
iat
ifm
Imagesift
imgproxy
inp
iss
isset
ivh
Jenomis
JGit
journalctl
jshelter
JWTs
kagi
kagibot
keikaku
keypair
KHTML
kinda
KUBECONFIG
lcj
ldflags
letsencrypt
Lexentale
lgbt
licend
licstart
lightpanda
LIMSA
Linting
linuxbrew
LLU
loadbalancer
lol
LOMINSA
maintainership
malware
mcr
memes
metarefresh
metrix
mimi
minica
mistralai
Mojeek
mojeekbot
mozilla
nbf
netsurf
NFlag
nginx
nobots
NONINFRINGEMENT
nosleep
OCOB
ogtags
omgili
omgilibot
onionservice
openai
openrc
pag
palemoon
Pangu
parseable
passthrough
Patreon
pgrep
phrik
pidfile
pids
pipefail
pki
podkova
podman
prebaked
privkey
promauto
promhttp
proofofwork
pwcmd
pwuser
qualys
qwant
qwantbot
rac
rcvar
redir
redirectscheme
relayd
reputational
reqmeta
risc
ruleset
runlevels
RUnlock
sas
sasl
Scumm
searchbot
searx
sebest
secretplans
selfsigned
Semrush
Seo
setsebool
shellcheck
Sidetrade
sitemap
sls
sni
Sourceware
Spambot
sparkline
spyderbot
srv
stackoverflow
startprecmd
stoppostcmd
subgrid
subr
subrequest
SVCNAME
tagline
tarballs
techaro
techarohq
templ
templruntime
testarea
Tik
Timpibot
torproject
traefik
uberspace
unixhttpd
unmarshal
unparseable
uuidgen
uvx
UXP
Varis
Velen
vendored
vhosts
videotest
waitloop
weblate
webmaster
webpage
websecure
websites
Webzio
wildbase
wordpress
Workaround
workdir
wpbot
xcaddy
Xeact
xeiaso
xeserv
xesite
xess
xff
XForwarded
XNG
XReal
yae
YAMLTo
yeet
yeetfile
yourdomain
yoursite
Zenos
zizmor
zos