* feat: first implementation of honeypot logic

This is a bit of an experiment, stick with me.

The core idea here is that badly written crawlers are that: badly written. They look for anything that contains `<a href="whatever" />` tags and will blindly use those values to recurse. This takes advantage of that by hiding a link in a `<script>` tag like this:

```html
<script type="ignore"><a href="/bots-only">Don't click</a></script>
```

Browsers will ignore it because they have no handler for the "ignore" script type.

This current draft is very unoptimized (it takes like 7 seconds to generate a page on my tower), however switching spintax libraries will make this much faster.

The hope is to make this pluggable with WebAssembly such that we force administrators to choose a storage method. First we crawl before we walk.

The AI involvement in this commit is limited to the spintax in affirmations.txt, spintext.txt, and titles.txt. This generates a bunch of "pseudoprofound bullshit" like the following:

> This Restoration to Balance & Alignment
>
> There's a moment when creators are being called to realize that the work
> can't be reduced to results, but about energy. We don't innovate products
> by pushing harder, we do it by holding the vision. Because momentum can't
> be forced, it unfolds over time when culture are moving in the same
> direction. We're being invited into a paradigm shift in how we think
> about innovation. [...]

This is intended to "look" like normal article text. As this is a first draft, this sucks and will be improved upon.

Assisted-by: GLM 4.6, ChatGPT, GPT-OSS 120b
Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(honeypot/naive): optimize hilariously

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(honeypot/naive): attempt to automatically filter out based on crawling

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(lib): use mazeGen instead of bsGen

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: add honeypot docs

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(test): go mod tidy

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: fix spelling metadata

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
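To make the trick in the first commit concrete, here is a minimal Go sketch of both sides: a naive link extractor that scans raw markup the way a badly written crawler might, and a handler that records whoever requests the hidden path. This is an illustration under stated assumptions, not the code from this change. The names `honeypotPage`, `extractLinks`, `trapped`, and `newMux` are hypothetical, the regular expression only stands in for whatever a given crawler does, and the real implementation generates page text with spintax and may track clients differently.

```go
package main

import (
	"net/http"
	"regexp"
	"sync"
)

// honeypotPage is a hypothetical page body. The link sits inside a script
// element whose type browsers do not recognize, so they neither run it
// nor render its contents.
const honeypotPage = `<!doctype html>
<html><body>
<p>Regular content.</p>
<script type="ignore"><a href="/bots-only">Don't click</a></script>
</body></html>`

// naiveHref mimics a badly written crawler: it matches href attributes in
// the raw text without parsing the HTML, so the script wrapper is invisible
// to it. This pattern is an assumption made for illustration.
var naiveHref = regexp.MustCompile(`<a\s+href="([^"]*)"`)

// extractLinks returns every href value the naive pattern finds in body.
func extractLinks(body string) []string {
	var links []string
	for _, m := range naiveHref.FindAllStringSubmatch(body, -1) {
		links = append(links, m[1])
	}
	return links
}

// trapped counts honeypot requests per remote address (hypothetical type).
type trapped struct {
	mu   sync.Mutex
	seen map[string]int
}

func newTrapped() *trapped { return &trapped{seen: map[string]int{}} }

// hit records one request from addr and returns the running total for it.
func (t *trapped) hit(addr string) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.seen[addr]++
	return t.seen[addr]
}

// newMux serves the bait page everywhere and counts visits to /bots-only.
func newMux(t *trapped) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/bots-only", func(w http.ResponseWriter, r *http.Request) {
		// r.RemoteAddr includes the port; a real deployment would likely
		// normalize it or use a trusted forwarding header instead.
		t.hit(r.RemoteAddr)
		w.Header().Set("Content-Type", "text/html; charset=utf-8")
		w.Write([]byte(honeypotPage))
	})
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/html; charset=utf-8")
		w.Write([]byte(honeypotPage))
	})
	return mux
}
```

To try it, one could pass `newMux(newTrapped())` to `http.ListenAndServe` from a `main` function. Any client that then shows up under `/bots-only` got there by following a link that a browser never displays, which is the signal the later "automatically filter out based on crawling" commit appears to build on.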
409 lines
3.6 KiB
Text
acs
Actorified
actorifiedstore
actorify
Aibrew
alibaba
alrest
amazonbot
anthro
anubis
anubistest
apnic
APNICRANDNETAU
Applebot
archlinux
arpa
asnc
asnchecker
asns
aspirational
atuin
azuretools
badregexes
bbolt
bdba
berr
bezier
bingbot
Bitcoin
bitrate
Bluesky
blueskybot
boi
Bokm
botnet
botstopper
BPort
Brightbot
broked
buildah
byteslice
Bytespider
cachebuster
cachediptoasn
Caddyfile
caninetools
Cardyb
celchecker
celphase
cerr
certresolver
cespare
CGNAT
cgr
chainguard
chall
challengemozilla
challengetest
checkpath
checkresult
chibi
cidranger
ckie
cloudflare
Codespaces
confd
connnection
containerbuild
containerregistry
coreutils
Cotoyogi
Cromite
crt
Cscript
daemonizing
dayjob
DDOS
Debian
debrpm
decaymap
devcontainers
Diffbot
discordapp
discordbot
distros
dnf
dnsbl
dnserr
DNSTTL
domainhere
dracula
dronebl
droneblresponse
dropin
dsilence
duckduckbot
eerror
ellenjoe
emacs
enbyware
etld
everyones
evilbot
evilsite
expressionorlist
externalagent
externalfetcher
extldflags
facebookgo
Factset
fahedouch
fastcgi
FCr
fcrdns
fediverse
ffprobe
financials
finfos
Firecrawl
flagenv
Fordola
forgejo
forwardauth
fsys
fullchain
gaissmai
Galvus
geoip
geoipchecker
gha
GHSA
Ghz
gipc
gitea
godotenv
goland
gomod
goodbot
googlebot
gopsutil
govulncheck
goyaml
GPG
GPT
gptbot
Graphene
grpcprom
grw
gzw
Hashcash
hashrate
headermap
healthcheck
healthz
hec
helpdesk
Hetzner
hmc
homelab
hostable
htmlc
htmx
httpdebug
huawei
hypertext
iaskspider
iaso
iat
ifm
Imagesift
imgproxy
impressum
inbox
ingressed
inp
internets
IPTo
iptoasn
isp
iss
isset
ivh
Jenomis
JGit
jhjj
joho
journalctl
jshelter
JWTs
kagi
kagibot
Keyfunc
keypair
KHTML
kinda
KUBECONFIG
lcj
ldflags
letsencrypt
Lexentale
lfc
lgbt
licend
licstart
lightpanda
limsa
Linting
listor
LLU
loadbalancer
lol
lominsa
maintainership
malware
mcr
memes
metarefresh
metrix
mimi
Minfilia
mistralai
mnt
Mojeek
mojeekbot
mozilla
myclient
mymaster
mypass
myuser
nbf
nepeat
netsurf
nginx
nicksnyder
nobots
NONINFRINGEMENT
nosleep
nullglob
oci
OCOB
ogtag
oklch
omgili
omgilibot
openai
opendns
opengraph
openrc
oswald
pag
palemoon
Pangu
parseable
passthrough
Patreon
pgrep
phrik
pidfile
pids
pipefail
pki
podkova
podman
Postgre
poststart
prebaked
privkey
promauto
promhttp
proofofwork
publicsuffix
purejs
pwcmd
pwuser
qualys
qwant
qwantbot
rac
rawler
rcvar
redhat
redir
redirectscheme
refactors
remoteip
reputational
risc
ruleset
runlevels
RUnlock
runtimedir
runtimedirectory
Ryzen
sas
sasl
screenshots
searchbot
searx
sebest
secretplans
Semrush
Seo
setsebool
shellcheck
shirou
shopt
Sidetrade
simprint
sitemap
sls
sni
snipster
Spambot
sparkline
spyderbot
srv
stackoverflow
startprecmd
stoppostcmd
storetest
subgrid
subr
subrequest
SVCNAME
tagline
tarballs
tarrif
taviso
tbn
tbr
techaro
techarohq
telegrambot
templ
templruntime
testarea
Thancred
thoth
thothmock
Tik
Timpibot
TLog
traefik
trunc
uberspace
Unbreak
unbreakdocker
unifiedjs
unmarshal
unparseable
uvx
UXP
valkey
Varis
Velen
vendored
verify
vhosts
vkbot
VKE
vnd
VPS
Vultr
weblate
webmaster
webpage
websecure
websites
Webzio
whois
wildbase
withthothmock
wolfbeast
wordpress
workaround
workdir
wpbot
XCircle
xeiaso
xeserv
xesite
xess
xff
XForwarded
XNG
XOB
XOriginal
XReal
yae
YAMLTo
Yda
yeet
yeetfile
yourdomain
yyz
Zenos
zizmor
zombocom
zos
GLM
iocaine
nikandfor
pagegen
pseudoprofound
reimagining
Rhul
shoneypot
spammer
Y'shtola