TIL docker clients don't include the Accept header all the time. I would
have thought they did that. Oops.
Closes: #1346
Signed-off-by: Xe Iaso <me@xeiaso.net>
* Implement FCrDNS and other DNS features
* Redesign DNS cache and methods
* Fix DNS cache
* Rename regexSafe arg
* Alter verifyFCrDNS(addr) behaviour
* Remove unused dnsCache field from Server struct
* Upd expressions docs
* Update docs/docs/CHANGELOG.md
Signed-off-by: Xe Iaso <me@xeiaso.net>
* refactor(dns): simplify FCrDNS logging
* docs: clarify verifyFCrDNS behavior
Add a note to the documentation for `verifyFCrDNS` to clarify that it returns true when no PTR records are found for the given IP address.
* fix(dns): Improve FCrDNS error handling and tests
The `VerifyFCrDNS` function previously ignored errors returned from reverse DNS lookups. This could lead to incorrect passes when a DNS failure (other than a simple 'not found') occurred. This change ensures that any error from a reverse lookup will cause the FCrDNS check to fail.
The test suite for FCrDNS has been updated to reflect this change. The mock DNS lookups now simulate both 'not found' errors and other generic DNS errors. The test cases have been updated to ensure that the function behaves correctly in both scenarios, resolving a situation where two test cases were effectively duplicates.
* docs: Update FCrDNS documentation and spelling
Corrected a typo in the `verifyFCrDNS` function documentation.
Additionally, updated the spelling exception list to include new terms and remove redundant entries.
* chore: update spelling
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
* fix(config): deprecate the report_as field for challenges
This was a bad idea when it was added and it is irresponsible to
continue to have it. It causes more UX problems than it fixes with
slight of hand.
Closes: #1310Closes: #1307
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(policy): use the new logger for config validation messages
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs(admin/thresholds): remove this report_as setting
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(data): add services folder to embedded filesystem
Also includes a regression test to ensure this does not happen again.
Assisted-By: GLM 4.6 via Claude Code
* docs: update CHANGELOG
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(data): add ruleset to explicitly allow Docker / OCI clients
Fixes#1252
This is technically a regression as these clients used to work in Anubis
v1.22.0, however it is allowable to make this opt-in as most websites do not
expect to be serving Docker / OCI registry client traffic.
Signed-off-by: Xe Iaso <me@xeiaso.net>
* Update metadata
check-spelling run (pull_request) for Xe/gh-1252/docker-registry-client-fix
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>
* test(docker-registry): export the right envvars
Signed-off-by: Xe Iaso <me@xeiaso.net>
* ci: add simdjson dependency for homebrew node
Signed-off-by: Xe Iaso <me@xeiaso.net>
* ci: install go/node without homebrew
Signed-off-by: Xe Iaso <me@xeiaso.net>
* test: use right github commit variable
Signed-off-by: Xe Iaso <me@xeiaso.net>
* ci: remove simdjson dependency
Signed-off-by: Xe Iaso <me@xeiaso.net>
* ci: install ko with an action
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs: add OCI registry caveat docs
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
Tencent Cloud's abuse team reached out to me recently and asked for this
rule to be removed. Prior attempts to reach out to them to report
abusive traffic have failed, thus leading to this IP space block as a
last resort to try and maintain uptime for systems administrators.
Unfortunately, it's difficult for Tencent's abuse team to take action if
there is a blanket block like this. Let's see if this doesn't cause too
much grief.
* feat(default-config): block tencent cloud by default
This is what happens when you don't have an abuse contact.
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: update spelling
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
* feat(data): add default-config macro
Closes#1152
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs: update CHANGELOG
Signed-off-by: Xe Iaso <me@xeiaso.net>
* test: add default-config-macro smoke test
This uses an AI generated python script to diff the contents of the bots
field of the default configuration file and the
data/meta/default-config.yaml file. It emits a patch showing what needs
to be changed.
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
* test: add httpdebug tool
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(data/clients/git): more strictly match the git client
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(default-config): make the default config far less paranoid
This uses a variety of heuristics to make sure that clients that claim
to be browsers are more likely to behave like browsers. Most of these
are based on the results of a lot of reverse engineering and data
collection from honeypot servers.
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs: update CHANGELOG
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Xe Iaso <xe.iaso@techaro.lol>
* fix(default-config): block Huawei Cloud
Closes#978
Huawei Cloud has been egregious about its scraping. All attempts to
contact their abuse team have failed. If you work for Huawei Cloud,
please raise this issue internally and get the scraping to just stop.
* chore: spelling
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
Uptime Robot is a commonly used service for tracking service
interruptions. Additional policy definitions may be beneficial for
services that do publish their IP addresses in use. The list is
additionally aggregated to slightly shorten it.
Signed-off-by: Marcel Bischoff <marcel@herrbischoff.com>
This was causing issues with git clone against highly loaded servers. I
thought that this would be pretty innocuous, but I guess I was wrong.
Oops!
Signed-off-by: Xe Iaso <me@xeiaso.net>
* feat(lib/policy/expressions): add system load average to bot expression inputs
This lets Anubis dynamically react to system load in order to
increase and decrease the required level of scrutiny. High load? More
scrutiny required. Low load? Less scrutiny required.
* docs: spell system correctly
Signed-off-by: Xe Iaso <me@xeiaso.net>
* Update metadata
check-spelling run (pull_request) for Xe/load-average
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>
* fix(default-config): don't enable low load average feature by default
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
Signed-off-by: Xe Iaso <xe.iaso@techaro.lol>
* feat(decaymap): add Delete method
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore(lib/challenge): refactor Validate to take ValidateInput
Signed-off-by: Xe Iaso <me@xeiaso.net>
* feat(lib): implement store interface
Signed-off-by: Xe Iaso <me@xeiaso.net>
* feat(lib/store): all metapackage to import all store implementations
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore(policy): import all store backends
Signed-off-by: Xe Iaso <me@xeiaso.net>
* feat(lib): use new challenge creation flow
Previously Anubis constructed challenge strings from request metadata.
This was a good idea in spirit, but has turned out to be a very bad idea
in practice. This new flow reuses the Store facility to dynamically
create challenge values with completely random data.
This is a fairly big rewrite of how Anubis processes challenges. Right
now it defaults to using the in-memory storage backend, but on-disk
(boltdb) and valkey-based adaptors will come soon.
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore(decaymap): fix documentation typo
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore(lib): fix SA4004
Signed-off-by: Xe Iaso <me@xeiaso.net>
* test(lib/store): make generic storage interface test adaptor
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: spelling
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(decaymap): invert locking process for Delete
Signed-off-by: Xe Iaso <me@xeiaso.net>
* feat(lib/store): add bbolt store implementation
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: spelling
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: go mod tidy
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore(devcontainer): adapt to docker compose, add valkey service
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(lib): make challenges live for 30 minutes by default
Signed-off-by: Xe Iaso <me@xeiaso.net>
* feat(lib/store): implement valkey backend
Signed-off-by: Xe Iaso <me@xeiaso.net>
* test(lib/store/valkey): disable tests if not using docker
Signed-off-by: Xe Iaso <me@xeiaso.net>
* test(lib/policy/config): ensure valkey stores can be loaded
Signed-off-by: Xe Iaso <me@xeiaso.net>
* Update metadata
check-spelling run (pull_request) for Xe/store-interface
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>
* chore(devcontainer): remove port forwards because vs code handles that for you
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs(default-config): add a nudge to the storage backends section of the docs
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore(docs): listen on 0.0.0.0 for dev container support
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs(policy): document storage backends
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs: update CHANGELOG and internal links
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs(admin/policies): don't start a sentence with as
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: fixes found in review
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
This may seem strange, but allowlisting common crawl means that scrapers
have less incentive to scrape because they can just grab the data from
common crawl instead of scraping it again.
* Fix cookieDynamicDomain option not being set in Options struct
* Fix using wrong cookie name when using dynamic cookie domains
* Adjust testcases for new cookie option structs
* Add known words to expect.txt and change typo in Zombocom
* Cleanup expect.txt
* Add changes to changelog
* Bump versions of grpc and apimachinery
* Fix testcases and add additional condition for dynamic cookie domain
* feat(config): opengraph passthrough configuration
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore(ogtags): use config.OpenGraph for configuration
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: wire up ogtags config in most of the app
Signed-off-by: Xe Iaso <me@xeiaso.net>
* feat(ogtags): return default tags if they are supplied
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: make OpenGraph legal so we have some sanity in reviewing
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: spelling
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(lib): use OpenGraph.Enabled
Signed-off-by: Xe Iaso <me@xeiaso.net>
* test(lib): load default config file if one is not specified in spawnAnubis
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore(config): fix ST1005
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs: document open graph defaults and its new home in the policy file
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs(installation): point to weight threshold new home
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: rename default to override
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore(default-config): add off-by-default opengraph settings to bot policy file
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(anubis): make build
Signed-off-by: Xe Iaso <me@xeiaso.net>
* test(lib): fix build
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
* feat(lib): implement request weight
Replaces #608
This is a big one and will be what makes Anubis a generic web
application firewall. This introduces the WEIGH option, allowing
administrators to have facets of request metadata add or remove
"weight", or the level of suspicion. This really makes Anubis weigh
the soul of requests.
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(lib): maintain legacy challenge behavior
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(lib): make weight have dedicated checkers for the hashes
Signed-off-by: Xe Iaso <me@xeiaso.net>
* feat(data): convert some rules over to weight points
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs: document request weight
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(CHANGELOG): spelling error
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: spelling
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs: fix links to challenge information
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs(policies): fix formatting
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(config): make default weight adjustment 5
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
* Split up AI filtering files
Create aggressive/moderate/permissive policies to allow administrators to choose their AI/LLM stance.
Aggressive policy matches existing default in Anubis.
Removes `Google-Extended` flag from `ai-robots-txt.yaml` as it doesn't exist in requests.
Rename `ai-robots-txt.yaml` to `ai-catchall.yaml` as the file is no longer a copy of the source repo/file.
* chore: spelling
* chore: fix embeds
* chore: fix data includes
* chore: fix file name typo
* chore: Ignore READMEs in configs
* chore(lib/policy/config): go tool goimports -w
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
* Define OpenAI bot ALLOW policies
Allows OpenAI bots to be allowlisted at the choice of the Anubis administrator. None are enabled by default.
* Define MistralAI bot ALLOW policy
* chore: spelling
* Add Applebot definition
Adds Apple's search indexing bot, and allowlists it by default.
Allowlisted by default because it is equivalent to Googlebot/Bingbot. Remove Applebot from `ai-robots-txt.yaml` for the same reasons.
Remove `Applebot-Extended` from `ai-robots-txt.yaml` as it has no effect.
* chore: spelling
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
This seems counter-intuitive at first glance, but let me cook.
One of the problems with Anubis is that the rule matching is super
deterministic. This means that attackers can figure out what patterns
they are hitting and change things to bypass them.
The randInt function lets you have rulesets behave nondeterministically.
This is a very easy way to hang yourself, but can be great to
psychologically mess with scraper operators. Consider this rule:
```yaml
- name: deny-lightpanda-sometimes
action: DENY
expression:
all:
- userAgent.matches("LightPanda")
- randInt(16) >= 4
```
It would match about 75% of the time.
Signed-off-by: Xe Iaso <me@xeiaso.net>