feat: enable loading config fragments (#321)

* feat(config): support importing bot policy snippets

This changes the grammar of the Anubis bot policy config to allow
importing from internal shared rules or external rules on the
filesystem.

This lets you create a file at `/data/policies/block-evilbot.yaml` and
then import it with:

```yaml
bots:
- import: /data/policies/block-evilbot.yaml
```

This also explodes the default policy file into a bunch of composable
snippets.

Thank you @Aibrew for your example gitea Atom / RSS feed rules!

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(data): update botPolicies.json to use imports

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(cmd/anubis): extract bot policies with --extract-resources

This allows a user that doesn't have anything but the Anubis binary to
figure out what the default configuration does.

* docs(data/botPolicies.yaml): document import syntax in-line

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(lib/policy): better test importing from JSON snippets

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs(admin): Add import syntax documentation

This documents the import syntax and is based on the block comment at
the top of the default bot policy file.

* docs(changelog): add note about importing snippets

Signed-off-by: Xe Iaso <me@xeiaso.net>

* style(lib/policy/config): use an error value instead of an inline error

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
Xe Iaso 2025-04-23 07:01:28 -04:00 committed by GitHub
parent 4e2c9de708
commit 74e11505c6
37 changed files with 1210 additions and 1305 deletions


@ -0,0 +1,147 @@
# Importing configuration rules
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
Anubis lets you import snippets of configuration into the main configuration file. This allows you to break up your config into smaller files that Anubis logically assembles into one configuration.
For example:
<Tabs>
<TabItem value="json" label="JSON">
```json
{
  "bots": [
    {
      "import": "(data)/bots/ai-robots-txt.yaml"
    },
    {
      "import": "(data)/bots/cloudflare-workers.yaml"
    }
  ]
}
```
</TabItem>
<TabItem value="yaml" label="YAML" default>
```yaml
bots:
  # Pathological bots to deny
  - # This correlates to data/bots/ai-robots-txt.yaml in the source tree
    import: (data)/bots/ai-robots-txt.yaml
  - import: (data)/bots/cloudflare-workers.yaml
```
</TabItem>
</Tabs>
Note that a bot rule can either contain inline bot configuration or import a bot config snippet. You cannot do both in a single bot rule. For example, this rule is invalid:
<Tabs>
<TabItem value="json" label="JSON">
```json
{
  "bots": [
    {
      "import": "(data)/bots/ai-robots-txt.yaml",
      "name": "generic-browser",
      "user_agent_regex": "Mozilla|Opera\n",
      "action": "CHALLENGE"
    }
  ]
}
```
</TabItem>
<TabItem value="yaml" label="YAML" default>
```yaml
bots:
  - import: (data)/bots/ai-robots-txt.yaml
    name: generic-browser
    user_agent_regex: >
      Mozilla|Opera
    action: CHALLENGE
```
</TabItem>
</Tabs>
Anubis rejects a rule like this with an error such as:
```text
config is not valid:
config.BotOrImport: rule definition is invalid, you must set either bot rules or an import statement, not both
```
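One way to resolve this error, sketched here using only the values from the invalid example above, is to split the rule into two list entries so that each entry is either an import or an inline definition:

```yaml
bots:
  # Entry 1: import only, no inline fields
  - import: (data)/bots/ai-robots-txt.yaml
  # Entry 2: inline definition only, no import
  - name: generic-browser
    user_agent_regex: >
      Mozilla|Opera
    action: CHALLENGE
```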
Import paths come in two forms. A path prefixed with `(data)` imports from [the data folder in the Anubis source tree](https://github.com/TecharoHQ/anubis/tree/main/data); any other path is read from the filesystem. If you don't have access to the Anubis source tree, check `/usr/share/docs/anubis/data` or the tarball you extracted Anubis from.
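As a small sketch of the two path forms side by side, the config below imports one snippet through the `(data)` prefix and one from the filesystem. The path `/data/policies/block-evilbot.yaml` is a hypothetical location used only for illustration; substitute wherever you keep your own snippets.

```yaml
bots:
  # Resolved from the Anubis data folder via the (data) prefix
  - import: (data)/bots/ai-robots-txt.yaml
  # Resolved from the filesystem (hypothetical path, adjust to your setup)
  - import: /data/policies/block-evilbot.yaml
```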
## Writing snippets
Snippets can be written in either JSON or YAML, with a preference for YAML. When writing a snippet, write the bot rules you want directly at the top level of the file in a list.
Here is an example snippet that allows [IPv6 Unique Local Addresses](https://en.wikipedia.org/wiki/Unique_local_address) through Anubis:
<Tabs>
<TabItem value="json" label="JSON">
```json
[
  {
    "name": "ipv6-ula",
    "action": "ALLOW",
    "remote_addresses": ["fc00::/7"]
  }
]
```
</TabItem>
<TabItem value="yaml" label="YAML" default>
```yaml
- name: ipv6-ula
  action: ALLOW
  remote_addresses:
    - fc00::/7
```
</TabItem>
</Tabs>
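A snippet can hold more than one rule, since the top level of the file is a list. The sketch below is a hypothetical snippet that denies a made-up crawler by user agent and allows the IPv6 range from the example above. The rule name `block-evilbot`, the `EvilBot` pattern, and the file location in the comment are illustrative assumptions, not files shipped with Anubis.

```yaml
# Hypothetical snippet file, for example /data/policies/block-evilbot.yaml
- name: block-evilbot        # illustrative rule name
  user_agent_regex: EvilBot  # illustrative pattern for a made-up crawler
  action: DENY
- name: ipv6-ula
  action: ALLOW
  remote_addresses:
    - fc00::/7
```

A main policy file would then reference this snippet by its filesystem path in an `import` entry.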
## Extracting Anubis' embedded filesystem
You can always extract the list of rules embedded into the Anubis binary with this command:
```text
anubis --extract-resources=static
```
This will dump the contents of Anubis' embedded data to a new folder named `static`:
```text
static
├── apps
│   └── gitea-rss-feeds.yaml
├── botPolicies.json
├── botPolicies.yaml
├── bots
│   ├── ai-robots-txt.yaml
│   ├── cloudflare-workers.yaml
│   ├── headless-browsers.yaml
│   └── us-ai-scraper.yaml
├── common
│   ├── allow-private-addresses.yaml
│   └── keep-internet-working.yaml
└── crawlers
    ├── bingbot.yaml
    ├── duckduckbot.yaml
    ├── googlebot.yaml
    ├── internet-archive.yaml
    ├── kagibot.yaml
    ├── marginalia.yaml
    ├── mojeekbot.yaml
    └── qwantbot.yaml
```
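As a rough guide to reading this listing, the sketch below assumes that each extracted file corresponds to the same relative path under the `(data)` prefix, so that `static/crawlers/googlebot.yaml` would be imported as `(data)/crawlers/googlebot.yaml`. That mapping is inferred from the import examples earlier on this page, so verify it against your Anubis version.

```yaml
bots:
  # Assumed mapping: static/crawlers/googlebot.yaml
  - import: (data)/crawlers/googlebot.yaml
  # Assumed mapping: static/common/keep-internet-working.yaml
  - import: (data)/common/keep-internet-working.yaml
```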


@ -12,6 +12,7 @@ Bot policies let you customize the rules that Anubis uses to allow, deny, or cha
- Request path
- User agent string
- HTTP request header values
- [Importing other configuration snippets](./configuration/import.mdx)
As of v1.17.0, configuration can be written in either JSON or YAML.