feat: Add Open Graph tag support (#195)

* feat: Add Open Graph tag support (og-tags)

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* Fix: Prevent nil pointer dereference in test (og-tags)

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* feat!: Implement Open Graph tag caching and passthrough functionality (WIP)

I'm going to sleep. currently tags are passed to renderIndex.

see https://github.com/TecharoHQ/anubis/issues/131

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* feat: Add configuration for air tool with build and logger settings

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* feat: Move OG tags to base template (og-tags)

Moves the Open Graph (OG) tags from the index template to
the base template. This allows OG tags to be set on any
page, not just the index.  Also adds a
BaseWithOGTags function to the web package to allow
passing OG tags to the base template.  Removes the
ogTags parameter from the Index function and template.

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* Delete CHANGELOG.md

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* feat: Add language attribute to HTML tag in template

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* fix(tests):  Fix nil pointer ref

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* feat(og-tags): Add timeout to http client (og-tags)

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* style: fix line endings & indentation

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* style: add inspection comment for GoBoolExpressions in UnchangingCache

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* feat(og-tags): Implement Open Graph tag fetching and caching

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* fix(og-tags): Simplify Open Graph tag extraction logic

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* fix(og-tags): Add nil check in isOGMetaTag and enhance test cases

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* feat(og-tags): Add approved tags and prefixes for Open Graph extraction

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* test(og-tags): Update tests with approved tags and improve clarity

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* chore: Add changelog notes

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* fix: Improve stability of the target fetcher?

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* fix: Update template error handling and improve Open Graph tag integration

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* style: format files and remove deubg logs

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* feat: Credit CELPHASE for mascot design (og-tags)

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* feat: Credit CELPHASE for mascot design (og-tags)

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* feat: Allow twitter prefixed OG tags by default

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* chore: replace /tmp with /var

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* Update docs/docs/CHANGELOG.md

Co-authored-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>

* Update docs/docs/admin/configuration/open-graph.mdx

Co-authored-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>

* chore: add fediverse to default prefixes (#og-tags)

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* feat(og-tags): Remove og-query-distinct flag

This commit removes the `og-query-distinct` flag and
associated logic.  URLs with different query parameters
will now always be treated as the same cache key for Open
Graph tags.  This simplifies the caching logic and
improves performance.

Additionally, the http client used for fetching OG tags
is now a member of the OGTagCache struct, rather than a
global variable. This improves testability and allows
for more flexible configuration in the future.

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* Update docs/docs/admin/configuration/open-graph.mdx

Co-authored-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>

* docs: remove og tags references

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* refactor: rename url > u to not overlap package name

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* Update internal/ogtags/cache.go

Co-authored-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>

* Update internal/ogtags/cache.go

Co-authored-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>

* fix(tests): Don't use network when network access is disabled

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* Fix: Handle nil URL in GetOGTags (og-tags)

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* chore: sort installation docs alphabetically

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* fix(tests): validate that no duplicate requests are made

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* style(tests): remove unused ok var

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* docs: convert to table fmt

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* feat(og-tags): Enhance OG tag fetching and caching

Adds additional approved OG tags (`keywords`, `author`), improves

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* chore: update generated templ's after format

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* fix(tests): update integration_test.go to reflect the new behavior of fetchHTMLDocument

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* Revert "data/botPolicies: allow iMessage scraper by default (#178)"

This reverts commit 21a9d777

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* Fix: Simplify ogTags access in cache test.

Didn't know this was possible! wow!

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* Fix: Handle request timeouts when fetching OG tags (#og-tags)

Cache a nil result for half the TTL to avoid repeatedly
requesting a timed-out URL.

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* Fix: make OG tags passthrough option function.

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* Fix: Handle timeouts and non-200 responses when fetching OG tags (og-tags)

- Cache empty results for timeouts and non-200 status codes
  to avoid spamming the server.
- Use a non-nil empty map to represent empty results in the
  cache, as nil would be a cache miss.

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* feat(og-tags): switch to http.MaxBytesReader

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

* chore(og-tags): add noindex, nofollow meta tag and update error line numbers

Signed-off-by: Jason Cameron <git@jasoncameron.dev>

---------

Signed-off-by: Jason Cameron <git@jasoncameron.dev>
Signed-off-by: Jason Cameron <jasoncameron.all@gmail.com>
Co-authored-by: Xe Iaso <me@xeiaso.net>
This commit is contained in:
Jason Cameron 2025-04-06 20:02:12 -04:00 committed by GitHub
parent 8adf1a06eb
commit 77436207e6
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
24 changed files with 1482 additions and 269 deletions

12
.air.toml Normal file
View file

@ -0,0 +1,12 @@
root = "."
tmp_dir = "var"
[build]
cmd = "go build -o ./var/main ./cmd/anubis"
bin = "./var/main"
args = ["--use-remote-address"]
exclude_dir = ["var", "vendor", "docs", "node_modules"]
[logger]
time = true
# to change flags at runtime, prepend with -- e.g. $ air -- --target http://localhost:3000 --difficulty 20 --use-remote-address

7
.gitignore vendored
View file

@ -2,6 +2,10 @@
*.deb
*.rpm
# Additional package locks
pnpm-lock.yaml
yarn.lock
# Go binaries and test artifacts
main
*.test
@ -10,3 +14,6 @@ node_modules
# MacOS
.DS_store
# Intellij
.idea

View file

@ -7,6 +7,7 @@ import (
"crypto/rand"
"embed"
"encoding/hex"
"errors"
"flag"
"fmt"
"io/fs"
@ -54,7 +55,8 @@ var (
healthcheck = flag.Bool("healthcheck", false, "run a health check against Anubis")
useRemoteAddress = flag.Bool("use-remote-address", false, "read the client's IP address from the network request, useful for debugging and running Anubis on bare metal")
debugBenchmarkJS = flag.Bool("debug-benchmark-js", false, "respond to every request with a challenge for benchmarking hashrate")
ogPassthrough = flag.Bool("og-passthrough", false, "enable Open Graph tag passthrough")
ogTimeToLive = flag.Duration("og-expiry-time", 24*time.Hour, "Open Graph tag cache expiration time")
extractResources = flag.String("extract-resources", "", "if set, extract the static resources to the specified folder")
)
@ -124,7 +126,7 @@ func setupListener(network string, address string) (net.Listener, string) {
}
func makeReverseProxy(target string) (http.Handler, error) {
u, err := url.Parse(target)
targetUri, err := url.Parse(target)
if err != nil {
return nil, fmt.Errorf("failed to parse target URL: %w", err)
}
@ -132,10 +134,10 @@ func makeReverseProxy(target string) (http.Handler, error) {
transport := http.DefaultTransport.(*http.Transport).Clone()
// https://github.com/oauth2-proxy/oauth2-proxy/blob/4e2100a2879ef06aea1411790327019c1a09217c/pkg/upstream/http.go#L124
if u.Scheme == "unix" {
if targetUri.Scheme == "unix" {
// clean path up so we don't use the socket path in proxied requests
addr := u.Path
u.Path = ""
addr := targetUri.Path
targetUri.Path = ""
// tell transport how to dial unix sockets
transport.DialContext = func(ctx context.Context, _, _ string) (net.Conn, error) {
dialer := net.Dialer{}
@ -145,7 +147,7 @@ func makeReverseProxy(target string) (http.Handler, error) {
transport.RegisterProtocol("unix", libanubis.UnixRoundTripper{Transport: transport})
}
rp := httputil.NewSingleHostReverseProxy(u)
rp := httputil.NewSingleHostReverseProxy(targetUri)
rp.Transport = transport
return rp, nil
@ -255,6 +257,9 @@ func main() {
PrivateKey: priv,
CookieDomain: *cookieDomain,
CookiePartitioned: *cookiePartitioned,
OGPassthrough: *ogPassthrough,
OGTimeToLive: *ogTimeToLive,
Target: *target,
})
if err != nil {
log.Fatalf("can't construct libanubis.Server: %v", err)
@ -288,6 +293,8 @@ func main() {
"version", anubis.Version,
"use-remote-address", *useRemoteAddress,
"debug-benchmark-js", *debugBenchmarkJS,
"og-passthrough", *ogPassthrough,
"og-expiry-time", *ogTimeToLive,
)
go func() {
@ -299,7 +306,7 @@ func main() {
}
}()
if err := srv.Serve(listener); err != http.ErrServerClosed {
if err := srv.Serve(listener); !errors.Is(err, http.ErrServerClosed) {
log.Fatal(err)
}
wg.Wait()
@ -312,8 +319,8 @@ func metricsServer(ctx context.Context, done func()) {
mux.Handle("/metrics", promhttp.Handler())
srv := http.Server{Handler: mux}
listener, url := setupListener(*metricsBindNetwork, *metricsBind)
slog.Debug("listening for metrics", "url", url)
listener, metricsUrl := setupListener(*metricsBindNetwork, *metricsBind)
slog.Debug("listening for metrics", "url", metricsUrl)
go func() {
<-ctx.Done()
@ -324,7 +331,7 @@ func metricsServer(ctx context.Context, done func()) {
}
}()
if err := srv.Serve(listener); err != http.ErrServerClosed {
if err := srv.Serve(listener); !errors.Is(err, http.ErrServerClosed) {
log.Fatal(err)
}
}

View file

@ -343,12 +343,6 @@
"5.102.173.71/32"
]
},
{
"_comment": "This has been reverse-engineered through making iMessage's preview function hit a URL that prints the user-agent in the server logs.",
"name": "iMessage preview",
"user_agent_regex": ".*facebookexternalhit/1\\.1 Facebot Twitterbot/1\\.0$",
"action": "ALLOW"
},
{
"name": "us-artificial-intelligence-scraper",
"user_agent_regex": "\\+https\\://github\\.com/US-Artificial-Intelligence/scraper",

View file

@ -14,7 +14,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Added support for native Debian, Red Hat, and tarball packaging strategies including installation and use directions.
- A prebaked tarball has been added, allowing distros to build Anubis like they could in v1.15.x.
- The placeholder Anubis mascot has been replaced with a design by [CELPHASE](https://bsky.app/profile/celphase.bsky.social).
- Allow iMessage's link preview fetcher through Anubis by default.
- Added a periodic cleanup routine for the decaymap that removes expired entries, ensuring stale data is properly pruned.
- Added a no-store Cache-Control header to the challenge page
- Hide the directory listings for Anubis' internal static content
@ -38,6 +37,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Added `zizmor` for GitHub Actions static analysis
- Fixed most `zizmor` findings
- Enabled Dependabot
- Added an air config for autoreload support in development ([#195](https://github.com/TecharoHQ/anubis/pull/195))
- Added support for [OpenGraph tags](https://ogp.me/) when rendering the challenge page. This allows for social previews to be generated when sharing the challenge page on social media platforms ([#195](https://github.com/TecharoHQ/anubis/pull/195))
- Added an `--extract-resources` flag to extract static resources to a local folder.
- Add noindex flag to all Anubis pages ([#227](https://github.com/TecharoHQ/anubis/issues/227)).

View file

@ -0,0 +1,47 @@
---
id: open-graph
title: Open Graph Configuration
---
# Open Graph Configuration
This page provides detailed information on how to configure [OpenGraph tag](https://ogp.me/) passthrough in Anubis. This enables social previews of resources protected by Anubis without having to exempt each scraper individually.
## Configuration Options
| Name | Description | Type | Default | Example |
|------------------|-----------------------------------------------------------|----------|---------|-------------------------|
| `OG_PASSTHROUGH` | Enables or disables the Open Graph tag passthrough system | Boolean | `false` | `OG_PASSTHROUGH=true` |
| `OG_EXPIRY_TIME` | Configurable cache expiration time for Open Graph tags | Duration | `24h` | `OG_EXPIRY_TIME=1h` |
## Usage
To configure Open Graph tags, you can set the following environment variables, environment file or as flags in your Anubis configuration:
```sh
export OG_PASSTHROUGH=true
export OG_EXPIRY_TIME=1h
```
## Implementation Details
When `OG_PASSTHROUGH` is enabled, Anubis will:
1. Check a local cache for the requested URL's Open Graph tags.
2. If a cached entry exists and is still valid, return the cached tags.
3. If the cached entry is stale or not found, fetch the URL, parse the Open Graph tags, update the cache, and return the new tags.
The cache expiration time is controlled by `OG_EXPIRY_TIME`.
## Example
Here is an example of how to configure Open Graph tags in your Anubis setup:
```sh
export OG_PASSTHROUGH=true
export OG_EXPIRY_TIME=1h
```
With these settings, Anubis will cache Open Graph tags for 1 hour and pass them through to the challenge page.
For more information, refer to the [installation guide](../installation).

View file

@ -53,12 +53,16 @@ Anubis uses these environment variables for configuration:
| `ED25519_PRIVATE_KEY_HEX_FILE` | unset | Path to a file containing the hex-encoded ed25519 private key. Only one of this or its sister option may be set. |
| `METRICS_BIND` | `:9090` | The network address that Anubis serves Prometheus metrics on. See `BIND` for more information. |
| `METRICS_BIND_NETWORK` | `tcp` | The address family that the Anubis metrics server listens on. See `BIND_NETWORK` for more information. |
| `SOCKET_MODE` | `0770` | _Only used when at least one of the `*_BIND_NETWORK` variables are set to `unix`._ The socket mode (permissions) for Unix domain sockets. |
| `OG_EXPIRY_TIME` | `24h` | The expiration time for the Open Graph tag cache. |
| `OG_PASSTHROUGH` | `false` | If set to `true`, Anubis will enable Open Graph tag passthrough. |
| `POLICY_FNAME` | unset | The file containing [bot policy configuration](./policies.md). See the bot policy documentation for more details. If unset, the default bot policy configuration is used. |
| `SERVE_ROBOTS_TXT` | `false` | If set `true`, Anubis will serve a default `robots.txt` file that disallows all known AI scrapers by name and then additionally disallows every scraper. This is useful if facts and circumstances make it difficult to change the underlying service to serve such a `robots.txt` file. |
| `SOCKET_MODE` | `0770` | _Only used when at least one of the `*_BIND_NETWORK` variables are set to `unix`._ The socket mode (permissions) for Unix domain sockets. |
| `TARGET` | `http://localhost:3923` | The URL of the service that Anubis should forward valid requests to. Supports Unix domain sockets, set this to a URI like so: `unix:///path/to/socket.sock`. |
| `USE_REMOTE_ADDRESS` | unset | If set to `true`, Anubis will take the client's IP from the network socket. For production deployments, it is expected that a reverse proxy is used in front of Anubis, which pass the IP using headers, instead. |
For more detailed information on configuring Open Graph tags, please refer to the [Open Graph Configuration](./configuration/open-graph.mdx) page.
### Key generation
To generate an ed25519 private key, you can use this command:
@ -86,6 +90,8 @@ services:
SERVE_ROBOTS_TXT: "true"
TARGET: "http://nginx"
POLICY_FNAME: "/data/cfg/botPolicy.json"
OG_PASSTHROUGH: "true"
OG_EXPIRY_TIME: "24h"
ports:
- 8080:8080
volumes:
@ -122,6 +128,10 @@ containers:
value: "true"
- name: "TARGET"
value: "http://localhost:5000"
- name: "OG_PASSTHROUGH"
value: "true"
- name: "OG_EXPIRY_TIME"
value: "24h"
resources:
limits:
cpu: 500m

2
go.mod
View file

@ -10,6 +10,7 @@ require (
github.com/prometheus/client_golang v1.21.1
github.com/sebest/xff v0.0.0-20210106013422-671bd2870b3a
github.com/yl2chen/cidranger v1.0.2
golang.org/x/net v0.37.0
)
require (
@ -42,7 +43,6 @@ require (
github.com/prometheus/procfs v0.15.1 // indirect
golang.org/x/exp/typeparams v0.0.0-20231108232855-2478ac86f678 // indirect
golang.org/x/mod v0.24.0 // indirect
golang.org/x/net v0.37.0 // indirect
golang.org/x/sync v0.12.0 // indirect
golang.org/x/sys v0.31.0 // indirect
golang.org/x/tools v0.31.0 // indirect

View file

@ -13,6 +13,7 @@ import (
// UnchangingCache sets the Cache-Control header to cache a response for 1 year if
// and only if the application is compiled in "release" mode by Docker.
func UnchangingCache(next http.Handler) http.Handler {
//goland:noinspection GoBoolExpressions
if anubis.Version == "devel" {
return next
}
@ -72,7 +73,6 @@ func NoStoreCache(next http.Handler) http.Handler {
})
}
// Do not allow browsing directory listings in paths that end with /
func NoBrowsing(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {

51
internal/ogtags/cache.go Normal file
View file

@ -0,0 +1,51 @@
package ogtags
import (
"errors"
"log/slog"
"net/url"
"syscall"
)
// GetOGTags is the main function that retrieves Open Graph tags for a URL
func (c *OGTagCache) GetOGTags(url *url.URL) (map[string]string, error) {
if url == nil {
return nil, errors.New("nil URL provided, cannot fetch OG tags")
}
urlStr := c.getTarget(url)
// Check cache first
if cachedTags := c.checkCache(urlStr); cachedTags != nil {
return cachedTags, nil
}
// Fetch HTML content
doc, err := c.fetchHTMLDocument(urlStr)
if errors.Is(err, syscall.ECONNREFUSED) {
slog.Debug("Connection refused, returning empty tags")
return nil, nil
} else if errors.Is(err, ErrNotFound) {
// not even worth a debug log...
return nil, nil
}
if err != nil {
return nil, err
}
// Extract OG tags
ogTags := c.extractOGTags(doc)
// Store in cache
c.cache.Set(urlStr, ogTags, c.ogTimeToLive)
return ogTags, nil
}
// checkCache checks if we have the tags cached and returns them if so
func (c *OGTagCache) checkCache(urlStr string) map[string]string {
if cachedTags, ok := c.cache.Get(urlStr); ok {
slog.Debug("cache hit", "tags", cachedTags)
return cachedTags
}
slog.Debug("cache miss", "url", urlStr)
return nil
}

View file

@ -0,0 +1,122 @@
package ogtags
import (
"net/http"
"net/http/httptest"
"net/url"
"testing"
"time"
)
func TestCheckCache(t *testing.T) {
cache := NewOGTagCache("http://example.com", true, time.Minute)
// Set up test data
urlStr := "http://example.com/page"
expectedTags := map[string]string{
"og:title": "Test Title",
"og:description": "Test Description",
}
// Test cache miss
tags := cache.checkCache(urlStr)
if tags != nil {
t.Errorf("expected nil tags on cache miss, got %v", tags)
}
// Manually add to cache
cache.cache.Set(urlStr, expectedTags, time.Minute)
// Test cache hit
tags = cache.checkCache(urlStr)
if tags == nil {
t.Fatal("expected non-nil tags on cache hit, got nil")
}
for key, expectedValue := range expectedTags {
if value, ok := tags[key]; !ok || value != expectedValue {
t.Errorf("expected %s: %s, got: %s", key, expectedValue, value)
}
}
}
func TestGetOGTags(t *testing.T) {
var loadCount int // Counter to track how many times the test route is loaded
// Create a test server to serve a sample HTML page with OG tags
ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
loadCount++
if loadCount > 1 {
t.Fatalf("Test route loaded more than once, cache failed")
}
w.Header().Set("Content-Type", "text/html")
w.Write([]byte(`
<!DOCTYPE html>
<html>
<head>
<meta property="og:title" content="Test Title" />
<meta property="og:description" content="Test Description" />
<meta property="og:image" content="http://example.com/image.jpg" />
</head>
<body>
<p>Hello, world!</p>
</body>
</html>
`))
}))
defer ts.Close()
// Create an instance of OGTagCache with a short TTL for testing
cache := NewOGTagCache(ts.URL, true, 1*time.Minute)
// Parse the test server URL
parsedURL, err := url.Parse(ts.URL)
if err != nil {
t.Fatalf("failed to parse test server URL: %v", err)
}
// Test fetching OG tags from the test server
ogTags, err := cache.GetOGTags(parsedURL)
if err != nil {
t.Fatalf("failed to get OG tags: %v", err)
}
// Verify the fetched OG tags
expectedTags := map[string]string{
"og:title": "Test Title",
"og:description": "Test Description",
"og:image": "http://example.com/image.jpg",
}
for key, expectedValue := range expectedTags {
if value, ok := ogTags[key]; !ok || value != expectedValue {
t.Errorf("expected %s: %s, got: %s", key, expectedValue, value)
}
}
// Test fetching OG tags from the cache
ogTags, err = cache.GetOGTags(parsedURL)
if err != nil {
t.Fatalf("failed to get OG tags from cache: %v", err)
}
// Test fetching OG tags from the cache (3rd time)
newOgTags, err := cache.GetOGTags(parsedURL)
if err != nil {
t.Fatalf("failed to get OG tags from cache: %v", err)
}
// Verify the cached OG tags
for key, expectedValue := range expectedTags {
if value, ok := ogTags[key]; !ok || value != expectedValue {
t.Errorf("expected %s: %s, got: %s", key, expectedValue, value)
}
initialValue := ogTags[key]
cachedValue, ok := newOgTags[key]
if !ok || initialValue != cachedValue {
t.Errorf("Cache does not line up: expected %s: %s, got: %s", key, initialValue, cachedValue)
}
}
}

69
internal/ogtags/fetch.go Normal file
View file

@ -0,0 +1,69 @@
package ogtags
import (
"errors"
"fmt"
"golang.org/x/net/html"
"log/slog"
"mime"
"net"
"net/http"
)
var (
ErrNotFound = errors.New("page not found") /*todo: refactor into common errors lib? */
emptyMap = map[string]string{} // used to indicate an empty result in the cache. Can't use nil as it would be a cache miss.
)
func (c *OGTagCache) fetchHTMLDocument(urlStr string) (*html.Node, error) {
resp, err := c.client.Get(urlStr)
if err != nil {
var netErr net.Error
if errors.As(err, &netErr) && netErr.Timeout() {
slog.Debug("og: request timed out", "url", urlStr)
c.cache.Set(urlStr, emptyMap, c.ogTimeToLive/2) // Cache empty result for half the TTL to not spam the server
}
return nil, fmt.Errorf("http get failed: %w", err)
}
// this defer will call MaxBytesReader's Close, which closes the original body.
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
slog.Debug("og: received non-OK status code", "url", urlStr, "status", resp.StatusCode)
c.cache.Set(urlStr, emptyMap, c.ogTimeToLive) // Cache empty result for non-successful status codes
return nil, ErrNotFound
}
// Check content type
ct := resp.Header.Get("Content-Type")
if ct == "" {
// assume non html body
return nil, fmt.Errorf("missing Content-Type header")
} else {
mediaType, _, err := mime.ParseMediaType(ct)
if err != nil {
// Malformed Content-Type header
return nil, fmt.Errorf("invalid Content-Type '%s': %w", ct, err)
}
if mediaType != "text/html" && mediaType != "application/xhtml+xml" {
return nil, fmt.Errorf("unsupported Content-Type: %s", mediaType)
}
}
resp.Body = http.MaxBytesReader(nil, resp.Body, c.maxContentLength)
doc, err := html.Parse(resp.Body)
if err != nil {
// Check if the error is specifically because the limit was exceeded
var maxBytesErr *http.MaxBytesError
if errors.As(err, &maxBytesErr) {
slog.Debug("og: content exceeded max length", "url", urlStr, "limit", c.maxContentLength)
return nil, fmt.Errorf("content too large: exceeded %d bytes", c.maxContentLength)
}
// parsing error (e.g., malformed HTML)
return nil, fmt.Errorf("failed to parse HTML: %w", err)
}
return doc, nil
}

View file

@ -0,0 +1,119 @@
package ogtags
import (
"fmt"
"io"
"net/http"
"net/http/httptest"
"os"
"strings"
"testing"
"time"
)
func TestFetchHTMLDocument(t *testing.T) {
tests := []struct {
name string
htmlContent string
contentType string
statusCode int
contentLength int64
expectError bool
}{
{
name: "Valid HTML",
htmlContent: `<!DOCTYPE html>
<html>
<head><title>Test</title></head>
<body><p>Test content</p></body>
</html>`,
contentType: "text/html",
statusCode: http.StatusOK,
expectError: false,
},
{
name: "Empty HTML",
htmlContent: "",
contentType: "text/html",
statusCode: http.StatusOK,
expectError: false,
},
{
name: "Not found error",
htmlContent: "",
contentType: "text/html",
statusCode: http.StatusNotFound,
expectError: true,
},
{
name: "Unsupported Content-Type",
htmlContent: "*Insert rick roll here*",
contentType: "video/mp4",
statusCode: http.StatusOK,
expectError: true,
},
{
name: "Too large content",
contentType: "text/html",
statusCode: http.StatusOK,
expectError: true,
contentLength: 5 * 1024 * 1024, // 5MB (over 2MB limit)
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if tt.contentType != "" {
w.Header().Set("Content-Type", tt.contentType)
}
if tt.contentLength > 0 {
// Simulate content length but avoid sending too much actual data
w.Header().Set("Content-Length", fmt.Sprintf("%d", tt.contentLength))
io.CopyN(w, strings.NewReader("X"), tt.contentLength)
} else {
w.WriteHeader(tt.statusCode)
w.Write([]byte(tt.htmlContent))
}
}))
defer ts.Close()
cache := NewOGTagCache("", true, time.Minute)
doc, err := cache.fetchHTMLDocument(ts.URL)
if tt.expectError {
if err == nil {
t.Error("expected error, got nil")
}
if doc != nil {
t.Error("expected nil document on error, got non-nil")
}
} else {
if err != nil {
t.Errorf("unexpected error: %v", err)
}
if doc == nil {
t.Error("expected non-nil document, got nil")
}
}
})
}
}
func TestFetchHTMLDocumentInvalidURL(t *testing.T) {
if os.Getenv("DONT_USE_NETWORK") != "" {
t.Skip("test requires theoretical network egress")
}
cache := NewOGTagCache("", true, time.Minute)
doc, err := cache.fetchHTMLDocument("http://invalid.url.that.doesnt.exist.example")
if err == nil {
t.Error("expected error for invalid URL, got nil")
}
if doc != nil {
t.Error("expected nil document for invalid URL, got non-nil")
}
}

View file

@ -0,0 +1,155 @@
package ogtags
import (
"net/http"
"net/http/httptest"
"net/url"
"testing"
"time"
)
func TestIntegrationGetOGTags(t *testing.T) {
ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "text/html")
switch r.URL.Path {
case "/simple":
w.Write([]byte(`
<!DOCTYPE html>
<html>
<head>
<meta property="og:title" content="Simple Page" />
<meta property="og:type" content="website" />
</head>
<body><p>Simple page content</p></body>
</html>
`))
case "/complete":
w.Write([]byte(`
<!DOCTYPE html>
<html>
<head>
<meta property="og:title" content="Complete Page" />
<meta property="og:description" content="A page with many OG tags" />
<meta property="og:image" content="http://example.com/image.jpg" />
<meta property="og:url" content="http://example.com/complete" />
<meta property="og:type" content="article" />
</head>
<body><p>Complete page content</p></body>
</html>
`))
case "/no-og":
w.Write([]byte(`
<!DOCTYPE html>
<html>
<head>
<title>No OG Tags</title>
</head>
<body><p>No OG tags here</p></body>
</html>
`))
default:
w.WriteHeader(http.StatusNotFound)
}
}))
defer ts.Close()
// Test with different configurations
testCases := []struct {
name string
path string
query string
expectedTags map[string]string
expectError bool
}{
{
name: "Simple page",
path: "/simple",
query: "",
expectedTags: map[string]string{
"og:title": "Simple Page",
"og:type": "website",
},
expectError: false,
},
{
name: "Complete page",
path: "/complete",
query: "ref=test",
expectedTags: map[string]string{
"og:title": "Complete Page",
"og:description": "A page with many OG tags",
"og:image": "http://example.com/image.jpg",
"og:url": "http://example.com/complete",
"og:type": "article",
},
expectError: false,
},
{
name: "Page with no OG tags",
path: "/no-og",
query: "",
expectedTags: map[string]string{},
expectError: false,
},
{
name: "Non-existent page",
path: "/not-found",
query: "",
expectedTags: nil,
expectError: false,
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
// Create cache instance
cache := NewOGTagCache(ts.URL, true, 1*time.Minute)
// Create URL for test
testURL, _ := url.Parse(ts.URL)
testURL.Path = tc.path
testURL.RawQuery = tc.query
// Get OG tags
ogTags, err := cache.GetOGTags(testURL)
// Check error expectation
if tc.expectError {
if err == nil {
t.Error("expected error, got nil")
}
return
}
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
// Verify all expected tags are present
for key, expectedValue := range tc.expectedTags {
if value, ok := ogTags[key]; !ok || value != expectedValue {
t.Errorf("expected %s: %s, got: %s", key, expectedValue, value)
}
}
// Verify no extra tags are present
if len(ogTags) != len(tc.expectedTags) {
t.Errorf("expected %d tags, got %d", len(tc.expectedTags), len(ogTags))
}
// Test cache retrieval
cachedOGTags, err := cache.GetOGTags(testURL)
if err != nil {
t.Fatalf("failed to get OG tags from cache: %v", err)
}
// Verify cached tags match
for key, expectedValue := range tc.expectedTags {
if value, ok := cachedOGTags[key]; !ok || value != expectedValue {
t.Errorf("cached value - expected %s: %s, got: %s", key, expectedValue, value)
}
}
})
}
}

51
internal/ogtags/ogtags.go Normal file
View file

@ -0,0 +1,51 @@
package ogtags
import (
"net/http"
"net/url"
"time"
"github.com/TecharoHQ/anubis/decaymap"
)
type OGTagCache struct {
cache *decaymap.Impl[string, map[string]string]
target string
ogPassthrough bool
ogTimeToLive time.Duration
approvedTags []string
approvedPrefixes []string
client *http.Client
maxContentLength int64
}
func NewOGTagCache(target string, ogPassthrough bool, ogTimeToLive time.Duration) *OGTagCache {
// Predefined approved tags and prefixes
// In the future, these could come from configuration
defaultApprovedTags := []string{"description", "keywords", "author"}
defaultApprovedPrefixes := []string{"og:", "twitter:", "fediverse:"}
client := &http.Client{
Timeout: 5 * time.Second, /*make this configurable?*/
}
const maxContentLength = 16 << 20 // 16 MiB in bytes
return &OGTagCache{
cache: decaymap.New[string, map[string]string](),
target: target,
ogPassthrough: ogPassthrough,
ogTimeToLive: ogTimeToLive,
approvedTags: defaultApprovedTags,
approvedPrefixes: defaultApprovedPrefixes,
client: client,
maxContentLength: maxContentLength,
}
}
func (c *OGTagCache) getTarget(u *url.URL) string {
return c.target + u.Path
}
func (c *OGTagCache) Cleanup() {
c.cache.Cleanup()
}

View file

@ -0,0 +1,100 @@
package ogtags
import (
"net/url"
"testing"
"time"
)
func TestNewOGTagCache(t *testing.T) {
tests := []struct {
name string
target string
ogPassthrough bool
ogTimeToLive time.Duration
}{
{
name: "Basic initialization",
target: "http://example.com",
ogPassthrough: true,
ogTimeToLive: 5 * time.Minute,
},
{
name: "Empty target",
target: "",
ogPassthrough: false,
ogTimeToLive: 10 * time.Minute,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
cache := NewOGTagCache(tt.target, tt.ogPassthrough, tt.ogTimeToLive)
if cache == nil {
t.Fatal("expected non-nil cache, got nil")
}
if cache.target != tt.target {
t.Errorf("expected target %s, got %s", tt.target, cache.target)
}
if cache.ogPassthrough != tt.ogPassthrough {
t.Errorf("expected ogPassthrough %v, got %v", tt.ogPassthrough, cache.ogPassthrough)
}
if cache.ogTimeToLive != tt.ogTimeToLive {
t.Errorf("expected ogTimeToLive %v, got %v", tt.ogTimeToLive, cache.ogTimeToLive)
}
})
}
}
func TestGetTarget(t *testing.T) {
tests := []struct {
name string
target string
path string
query string
expected string
}{
{
name: "No path or query",
target: "http://example.com",
path: "",
query: "",
expected: "http://example.com",
},
{
name: "With complex path",
target: "http://example.com",
path: "/pag(#*((#@)ΓΓΓΓe/Γ",
query: "id=123",
expected: "http://example.com/pag(#*((#@)ΓΓΓΓe/Γ",
},
{
name: "With query and path",
target: "http://example.com",
path: "/page",
query: "id=123",
expected: "http://example.com/page",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
cache := NewOGTagCache(tt.target, false, time.Minute)
u := &url.URL{
Path: tt.path,
RawQuery: tt.query,
}
result := cache.getTarget(u)
if result != tt.expected {
t.Errorf("expected %s, got %s", tt.expected, result)
}
})
}
}

81
internal/ogtags/parse.go Normal file
View file

@ -0,0 +1,81 @@
package ogtags
import (
"strings"
"golang.org/x/net/html"
)
// extractOGTags traverses the HTML document and extracts approved Open Graph tags
func (c *OGTagCache) extractOGTags(doc *html.Node) map[string]string {
ogTags := make(map[string]string)
var traverseNodes func(*html.Node)
traverseNodes = func(n *html.Node) {
// isOGMetaTag only checks if it's a <meta> tag.
// The actual filtering happens in extractMetaTagInfo now.
if isOGMetaTag(n) {
property, content := c.extractMetaTagInfo(n)
if property != "" {
ogTags[property] = content
}
}
for child := n.FirstChild; child != nil; child = child.NextSibling {
traverseNodes(child)
}
}
traverseNodes(doc)
return ogTags
}
// isOGMetaTag checks if a node is *any* meta tag
func isOGMetaTag(n *html.Node) bool {
if n == nil {
return false
}
return n.Type == html.ElementNode && n.Data == "meta"
}
// extractMetaTagInfo extracts property and content from a meta tag
// *and* checks if the property is approved.
// Returns empty property string if the tag is not approved.
func (c *OGTagCache) extractMetaTagInfo(n *html.Node) (property, content string) {
var rawProperty string // Store the property found before approval check
for _, attr := range n.Attr {
if attr.Key == "property" || attr.Key == "name" {
rawProperty = attr.Val
}
if attr.Key == "content" {
content = attr.Val
}
}
// Check if the rawProperty is approved
isApproved := false
for _, prefix := range c.approvedPrefixes {
if strings.HasPrefix(rawProperty, prefix) {
isApproved = true
break
}
}
// Check exact approved tags if not already approved by prefix
if !isApproved {
for _, tag := range c.approvedTags {
if rawProperty == tag {
isApproved = true
break
}
}
}
// Only return the property if it's approved
if isApproved {
property = rawProperty
}
// Content is returned regardless, but property will be "" if not approved
return property, content
}

View file

@ -0,0 +1,295 @@
package ogtags
import (
"reflect"
"strings"
"testing"
"time"
"golang.org/x/net/html"
)
// TestExtractOGTags updated with correct expectations based on filtering logic
func TestExtractOGTags(t *testing.T) {
// Use a cache instance that reflects the default approved lists
testCache := NewOGTagCache("", false, time.Minute)
// Manually set approved tags/prefixes based on the user request for clarity
testCache.approvedTags = []string{"description"}
testCache.approvedPrefixes = []string{"og:"}
tests := []struct {
name string
htmlStr string
expected map[string]string
}{
{
name: "Basic OG tags", // Includes standard 'description' meta tag
htmlStr: `<!DOCTYPE html>
<html>
<head>
<meta property="og:title" content="Test Title" />
<meta property="og:description" content="Test Description" />
<meta name="description" content="Regular Description" />
<meta name="keywords" content="test, keyword" />
</head>
<body></body>
</html>`,
expected: map[string]string{
"og:title": "Test Title",
"og:description": "Test Description",
"description": "Regular Description",
},
},
{
name: "OG tags with name attribute",
htmlStr: `<!DOCTYPE html>
<html>
<head>
<meta name="og:title" content="Test Title" />
<meta property="og:description" content="Test Description" />
<meta name="twitter:card" content="summary" />
</head>
<body></body>
</html>`,
expected: map[string]string{
"og:title": "Test Title",
"og:description": "Test Description",
// twitter:card is still not approved
},
},
{
name: "No approved OG tags", // Contains only standard 'description'
htmlStr: `<!DOCTYPE html>
<html>
<head>
<meta name="description" content="Test Description" />
<meta name="keywords" content="Test" />
</head>
<body></body>
</html>`,
expected: map[string]string{
"description": "Test Description",
},
},
{
name: "Empty content",
htmlStr: `<!DOCTYPE html>
<html>
<head>
<meta property="og:title" content="" />
<meta property="og:description" content="Test Description" />
</head>
<body></body>
</html>`,
expected: map[string]string{
"og:title": "",
"og:description": "Test Description",
},
},
{
name: "Explicitly approved tag",
htmlStr: `<!DOCTYPE html>
<html>
<head>
<meta property="description" content="Approved Description Tag" />
</head>
<body></body>
</html>`,
expected: map[string]string{
// This is approved because "description" is in cache.approvedTags
"description": "Approved Description Tag",
},
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
doc, err := html.Parse(strings.NewReader(tt.htmlStr))
if err != nil {
t.Fatalf("failed to parse HTML: %v", err)
}
ogTags := testCache.extractOGTags(doc)
if !reflect.DeepEqual(ogTags, tt.expected) {
t.Errorf("expected %v, got %v", tt.expected, ogTags)
}
})
}
}
func TestIsOGMetaTag(t *testing.T) {
tests := []struct {
name string
nodeHTML string
targetNode string // Helper to find the right node in parsed fragment
expected bool
}{
{
name: "Meta OG tag",
nodeHTML: `<meta property="og:title" content="Test">`,
targetNode: "meta",
expected: true,
},
{
name: "Regular meta tag",
nodeHTML: `<meta name="description" content="Test">`,
targetNode: "meta",
expected: true,
},
{
name: "Not a meta tag",
nodeHTML: `<div>Test</div>`,
targetNode: "div",
expected: false,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// Wrap the partial HTML in basic structure for parsing
fullHTML := "<html><head>" + tt.nodeHTML + "</head><body></body></html>"
doc, err := html.Parse(strings.NewReader(fullHTML))
if err != nil {
t.Fatalf("failed to parse HTML: %v", err)
}
// Find the target element node (meta or div based on targetNode)
var node *html.Node
var findNode func(*html.Node)
findNode = func(n *html.Node) {
// Skip finding if already found
if node != nil {
return
}
// Check if current node matches type and tag data
if n.Type == html.ElementNode && n.Data == tt.targetNode {
node = n
return
}
// Recursively check children
for c := n.FirstChild; c != nil; c = c.NextSibling {
findNode(c)
}
}
findNode(doc) // Start search from root
if node == nil {
t.Fatalf("Could not find target node '%s' in test HTML", tt.targetNode)
}
// Call the function under test
result := isOGMetaTag(node)
if result != tt.expected {
t.Errorf("expected %v, got %v", tt.expected, result)
}
})
}
}
func TestExtractMetaTagInfo(t *testing.T) {
// Use a cache instance that reflects the default approved lists
testCache := NewOGTagCache("", false, time.Minute)
testCache.approvedTags = []string{"description"}
testCache.approvedPrefixes = []string{"og:"}
tests := []struct {
name string
nodeHTML string
expectedProperty string
expectedContent string
}{
{
name: "OG title with property (approved by prefix)",
nodeHTML: `<meta property="og:title" content="Test Title">`,
expectedProperty: "og:title",
expectedContent: "Test Title",
},
{
name: "OG description with name (approved by prefix)",
nodeHTML: `<meta name="og:description" content="Test Description">`,
expectedProperty: "og:description",
expectedContent: "Test Description",
},
{
name: "Regular meta tag (name=description, approved by exact match)", // Updated name for clarity
nodeHTML: `<meta name="description" content="Test Description">`,
expectedProperty: "description",
expectedContent: "Test Description",
},
{
name: "Regular meta tag (name=keywords, not approved)",
nodeHTML: `<meta name="keywords" content="Test Keywords">`,
expectedProperty: "",
expectedContent: "Test Keywords",
},
{
name: "Twitter tag (not approved by default)",
nodeHTML: `<meta name="twitter:card" content="summary">`,
expectedProperty: "",
expectedContent: "summary",
},
{
name: "No content (but approved property)",
nodeHTML: `<meta property="og:title">`,
expectedProperty: "og:title",
expectedContent: "",
},
{
name: "No property/name attribute",
nodeHTML: `<meta content="No property">`,
expectedProperty: "",
expectedContent: "No property",
},
{
name: "Explicitly approved tag with property attribute",
nodeHTML: `<meta property="description" content="Approved Description Tag">`,
expectedProperty: "description", // Approved by exact match in approvedTags
expectedContent: "Approved Description Tag",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
fullHTML := "<html><head>" + tt.nodeHTML + "</head><body></body></html>"
doc, err := html.Parse(strings.NewReader(fullHTML))
if err != nil {
t.Fatalf("failed to parse HTML: %v", err)
}
var node *html.Node
var findMetaNode func(*html.Node)
findMetaNode = func(n *html.Node) {
if node != nil { // Stop searching once found
return
}
if n.Type == html.ElementNode && n.Data == "meta" {
node = n
return
}
for c := n.FirstChild; c != nil; c = c.NextSibling {
findMetaNode(c)
}
}
findMetaNode(doc) // Start search from root
if node == nil {
// Handle cases where the input might not actually contain a meta tag, though all test cases do.
// If the test case is *designed* not to have a meta tag, this check should be different.
// But for these tests, failure to find implies an issue with the test setup or parser.
t.Fatalf("Could not find meta node in test HTML: %s", tt.nodeHTML)
}
// Call extractMetaTagInfo using the test cache instance
property, content := testCache.extractMetaTagInfo(node)
if property != tt.expectedProperty {
t.Errorf("expected property '%s', got '%s'", tt.expectedProperty, property)
}
if content != tt.expectedContent {
t.Errorf("expected content '%s', got '%s'", tt.expectedContent, content)
}
})
}
}

View file

@ -18,11 +18,13 @@ package test
import (
"flag"
"fmt"
"net"
"net/http"
"net/http/httptest"
"net/url"
"os"
"os/exec"
"strconv"
"testing"
"time"
@ -63,12 +65,6 @@ var (
realIP: "216.18.205.234",
userAgent: "Mozilla/5.0 (compatible; Kagibot/1.0; +https://kagi.com/bot)",
},
{
name: "iMessageScraper",
action: actionAllow,
realIP: placeholderIP,
userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/601.2.4 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.4 facebookexternalhit/1.1 Facebot Twitterbot/1.0",
},
{
name: "unknownAgent",
action: actionAllow,
@ -426,16 +422,30 @@ func spawnAnubis(t *testing.T) string {
t.Fatal(err)
}
listener, err := net.Listen("tcp", ":0")
if err != nil {
t.Fatalf("can't listen on random port: %v", err)
}
addr := listener.Addr().(*net.TCPAddr)
host := "localhost"
port := strconv.Itoa(addr.Port)
s, err := libanubis.New(libanubis.Options{
Next: h,
Policy: policy,
ServeRobotsTXT: true,
Target: "http://" + host + ":" + port,
})
if err != nil {
t.Fatalf("can't construct libanubis.Server: %v", err)
}
ts := httptest.NewServer(s)
ts := &httptest.Server{
Listener: listener,
Config: &http.Server{Handler: s},
}
ts.Start()
t.Log(ts.URL)
t.Cleanup(func() {

View file

@ -28,6 +28,7 @@ import (
"github.com/TecharoHQ/anubis/decaymap"
"github.com/TecharoHQ/anubis/internal"
"github.com/TecharoHQ/anubis/internal/dnsbl"
"github.com/TecharoHQ/anubis/internal/ogtags"
"github.com/TecharoHQ/anubis/lib/policy"
"github.com/TecharoHQ/anubis/lib/policy/config"
"github.com/TecharoHQ/anubis/web"
@ -71,6 +72,10 @@ type Options struct {
CookieDomain string
CookieName string
CookiePartitioned bool
OGPassthrough bool
OGTimeToLive time.Duration
Target string
}
func LoadPoliciesOrDefault(fname string, defaultDifficulty int) (*policy.ParsedConfig, error) {
@ -92,9 +97,9 @@ func LoadPoliciesOrDefault(fname string, defaultDifficulty int) (*policy.ParsedC
defer fin.Close()
policy, err := policy.ParseConfig(fin, fname, defaultDifficulty)
anubisPolicy, err := policy.ParseConfig(fin, fname, defaultDifficulty)
return policy, err
return anubisPolicy, err
}
func New(opts Options) (*Server, error) {
@ -114,6 +119,7 @@ func New(opts Options) (*Server, error) {
policy: opts.Policy,
opts: opts,
DNSBLCache: decaymap.New[string, dnsbl.DroneBLResponse](),
OGTags: ogtags.NewOGTagCache(opts.Target, opts.OGPassthrough, opts.OGTimeToLive),
}
mux := http.NewServeMux()
@ -152,6 +158,7 @@ type Server struct {
policy *policy.ParsedConfig
opts Options
DNSBLCache *decaymap.Impl[string, dnsbl.DroneBLResponse]
OGTags *ogtags.OGTagCache
}
func (s *Server) ServeHTTP(w http.ResponseWriter, r *http.Request) {
@ -329,9 +336,18 @@ func (s *Server) MaybeReverseProxy(w http.ResponseWriter, r *http.Request) {
}
func (s *Server) RenderIndex(w http.ResponseWriter, r *http.Request) {
var ogTags map[string]string = nil
if s.opts.OGPassthrough {
var err error
ogTags, err = s.OGTags.GetOGTags(r.URL)
if err != nil {
slog.Error("failed to get OG tags", "err", err)
ogTags = nil
}
}
handler := internal.NoStoreCache(
templ.Handler(
web.Base("Making sure you're not a bot!", web.Index()),
web.BaseWithOGTags("Making sure you're not a bot!", web.Index(), ogTags),
),
)
handler.ServeHTTP(w, r)
@ -541,4 +557,5 @@ func (s *Server) checkRemoteAddress(b policy.Bot, addr net.IP) bool {
func (s *Server) CleanupDecayMap() {
s.DNSBLCache.Cleanup()
s.OGTags.Cleanup()
}

View file

@ -178,6 +178,10 @@ func TestCookieSettings(t *testing.T) {
break
}
}
if ckie == nil {
t.Errorf("Cookie %q not found", anubis.CookieName)
return
}
if ckie.Domain != "local.cetacean.club" {
t.Errorf("cookie domain is wrong, wanted local.cetacean.club, got: %s", ckie.Domain)
@ -186,10 +190,6 @@ func TestCookieSettings(t *testing.T) {
if ckie.Partitioned != srv.opts.CookiePartitioned {
t.Errorf("wanted partitioned flag %v, got: %v", srv.opts.CookiePartitioned, ckie.Partitioned)
}
if ckie == nil {
t.Errorf("Cookie %q not found", anubis.CookieName)
}
}
func TestCheckDefaultDifficultyMatchesPolicy(t *testing.T) {
@ -199,14 +199,14 @@ func TestCheckDefaultDifficultyMatchesPolicy(t *testing.T) {
for i := 1; i < 10; i++ {
t.Run(fmt.Sprint(i), func(t *testing.T) {
policy, err := LoadPoliciesOrDefault("", i)
anubisPolicy, err := LoadPoliciesOrDefault("", i)
if err != nil {
t.Fatal(err)
}
s, err := New(Options{
Next: h,
Policy: policy,
Policy: anubisPolicy,
ServeRobotsTXT: true,
})
if err != nil {

View file

@ -1,9 +1,15 @@
package web
import "github.com/a-h/templ"
import (
"github.com/a-h/templ"
)
func Base(title string, body templ.Component) templ.Component {
return base(title, body)
return base(title, body, nil)
}
func BaseWithOGTags(title string, body templ.Component, ogTags map[string]string) templ.Component {
return base(title, body, ogTags)
}
func Index() templ.Component {

View file

@ -1,18 +1,21 @@
package web
import (
"github.com/TecharoHQ/anubis"
"github.com/TecharoHQ/anubis/xess"
"github.com/TecharoHQ/anubis"
"github.com/TecharoHQ/anubis/xess"
)
templ base(title string, body templ.Component) {
<!DOCTYPE html>
<html>
<head>
templ base(title string, body templ.Component, ogTags map[string]string) {
<!DOCTYPE html>
<html lang="en">
<head>
<title>{ title }</title>
<link rel="stylesheet" href={ xess.URL }/>
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
<meta name="robots" content="noindex,nofollow"/>
for key, value := range ogTags {
<meta property={ key } content={ value }/>
}
<style>
body,
html {
@ -53,9 +56,10 @@ templ base(title string, body templ.Component) {
}
</style>
@templ.JSONScript("anubis_version", anubis.Version)
</head>
<body id="top">
<main>
</head>
<body id="top">
<main>
<center>
<h1 id="title" class=".centered-div">{ title }</h1>
</center>
@ -65,18 +69,18 @@ templ base(title string, body templ.Component) {
<p>
Protected by <a href="https://github.com/TecharoHQ/anubis">Anubis</a> from <a
href="https://techaro.lol"
>Techaro</a>. Made with ❤️ in 🇨🇦.
>Techaro</a>. Made with ❤️ in 🇨🇦.
</p>
<p>Mascot design by <a href="https://bsky.app/profile/celphase.bsky.social">CELPHASE</a>.</p>
</center>
</footer>
</main>
</body>
</html>
</main>
</body>
</html>
}
templ index() {
<div class="centered-div">
<div class="centered-div">
<img
id="image"
style="width:100%;max-width:256px;"
@ -90,42 +94,57 @@ templ index() {
anubis.Version }
/>
<p id="status">Loading...</p>
<script async type="module" src={ "/.within.website/x/cmd/anubis/static/js/main.mjs?cacheBuster=" + anubis.Version }></script>
<script async type="module" src={
"/.within.website/x/cmd/anubis/static/js/main.mjs?cacheBuster=" + anubis.Version }></script>
<div id="progress" role="progressbar" aria-labelledby="status">
<div class="bar-inner"></div>
</div>
<details>
<summary>Why am I seeing this?</summary>
<p>You are seeing this because the administrator of this website has set up <a href="https://github.com/TecharoHQ/anubis">Anubis</a> to protect the server against the scourge of <a href="https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/">AI companies aggressively scraping websites</a>. This can and does cause downtime for the websites, which makes their resources inaccessible for everyone.</p>
<p>Anubis is a compromise. Anubis uses a <a href="https://anubis.techaro.lol/docs/design/why-proof-of-work">Proof-of-Work</a> scheme in the vein of <a href="https://en.wikipedia.org/wiki/Hashcash">Hashcash</a>, a proposed proof-of-work scheme for reducing email spam. The idea is that at individual scales the additional load is ignorable, but at mass scraper levels it adds up and makes scraping much more expensive.</p>
<p>Ultimately, this is a hack whose real purpose is to give a "good enough" placeholder solution so that more time can be spent on fingerprinting and identifying headless browsers (EG: via how they do font rendering) so that the challenge proof of work page doesn't need to be presented to users that are much more likely to be legitimate.</p>
<p>Please note that Anubis requires the use of modern JavaScript features that plugins like <a href="https://jshelter.org/">JShelter</a> will disable. Please disable JShelter or other such plugins for this domain.</p>
<p>You are seeing this because the administrator of this website has set up <a
href="https://github.com/TecharoHQ/anubis">Anubis</a> to protect the server against the scourge of
<a href="https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/">AI companies
aggressively scraping websites</a>. This can and does cause downtime for the websites, which makes their
resources inaccessible for everyone.</p>
<p>Anubis is a compromise. Anubis uses a <a href="https://anubis.techaro.lol/docs/design/why-proof-of-work">Proof-of-Work</a>
scheme in the vein of <a href="https://en.wikipedia.org/wiki/Hashcash">Hashcash</a>, a proposed
proof-of-work scheme for reducing email spam. The idea is that at individual scales the additional load is
ignorable, but at mass scraper levels it adds up and makes scraping much more expensive.</p>
<p>Ultimately, this is a hack whose real purpose is to give a "good enough" placeholder solution so that more
time can be spent on fingerprinting and identifying headless browsers (EG: via how they do font rendering)
so that the challenge proof of work page doesn't need to be presented to users that are much more likely to
be legitimate.</p>
<p>Please note that Anubis requires the use of modern JavaScript features that plugins like <a
href="https://jshelter.org/">JShelter</a> will disable. Please disable JShelter or other such
plugins for this domain.</p>
</details>
<noscript>
<p>
Sadly, you must enable JavaScript to get past this challenge. This is required because AI companies have changed
Sadly, you must enable JavaScript to get past this challenge. This is required because AI companies have
changed
the social contract around how website hosting works. A no-JS solution is a work-in-progress.
</p>
</noscript>
<div id="testarea"></div>
</div>
</div>
}
templ errorPage(message string) {
<div class="centered-div">
<div class="centered-div">
<img
id="image"
alt="Sad Anubis"
style="width:100%;max-width:256px;"
src={ "/.within.website/x/cmd/anubis/static/img/reject.webp?cacheBuster=" + anubis.Version }
/>
<p>{ message }.</p>
<button onClick="window.location.reload();">Try again</button>
<p><a href="/">Go home</a></p>
</div>
</div>
}
templ bench() {
<div style="height:20rem;display:flex">
<div style="height:20rem;display:flex">
<table style="margin-top:1rem;display:grid;grid-template:auto 1fr/auto auto;gap:0 0.5rem">
<thead style="border-bottom:1px solid black;padding:0.25rem 0;display:grid;grid-template:1fr/subgrid;grid-column:1/-1">
<tr id="table-header" style="display:contents">
@ -139,7 +158,8 @@ templ bench() {
<th style="width:4rem">Iters B</th>
</tr>
</thead>
<tbody id="results" style="padding-top:0.25rem;display:grid;grid-template-columns:subgrid;grid-auto-rows:min-content;grid-column:1/-1;row-gap:0.25rem;overflow-y:auto;font-variant-numeric:tabular-nums"></tbody>
<tbody id="results"
style="padding-top:0.25rem;display:grid;grid-template-columns:subgrid;grid-auto-rows:min-content;grid-column:1/-1;row-gap:0.25rem;overflow-y:auto;font-variant-numeric:tabular-nums"></tbody>
</table>
<div class="centered-div">
<img
@ -149,14 +169,15 @@ templ bench() {
anubis.Version }
/>
<p id="status" style="max-width:256px">Loading...</p>
<script async type="module" src={ "/.within.website/x/cmd/anubis/static/js/bench.mjs?cacheBuster=" + anubis.Version }></script>
<script async type="module" src={
"/.within.website/x/cmd/anubis/static/js/bench.mjs?cacheBuster=" + anubis.Version }></script>
<div id="sparkline"></div>
<noscript>
<p>Running the benchmark tool requires JavaScript to be enabled.</p>
</noscript>
</div>
</div>
<form id="controls" style="position:fixed;top:0.5rem;right:0.5rem">
</div>
<form id="controls" style="position:fixed;top:0.5rem;right:0.5rem">
<div style="display:flex;justify-content:end">
<label for="difficulty-input" style="margin-right:0.5rem">Difficulty:</label>
<input id="difficulty-input" type="number" name="difficulty" style="width:3rem"/>
@ -171,5 +192,5 @@ templ bench() {
<option value="NONE">-</option>
</select>
</div>
</form>
</form>
}

188
web/index_templ.go generated
View file

@ -13,7 +13,7 @@ import (
"github.com/TecharoHQ/anubis/xess"
)
func base(title string, body templ.Component) templ.Component {
func base(title string, body templ.Component, ogTags map[string]string) templ.Component {
return templruntime.GeneratedTemplate(func(templ_7745c5c3_Input templruntime.GeneratedComponentInput) (templ_7745c5c3_Err error) {
templ_7745c5c3_W, ctx := templ_7745c5c3_Input.Writer, templ_7745c5c3_Input.Context
if templ_7745c5c3_CtxErr := ctx.Err(); templ_7745c5c3_CtxErr != nil {
@ -34,14 +34,14 @@ func base(title string, body templ.Component) templ.Component {
templ_7745c5c3_Var1 = templ.NopComponent
}
ctx = templ.ClearChildren(ctx)
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 1, "<!doctype html><html><head><title>")
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 1, "<!doctype html><html lang=\"en\"><head><title>")
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
var templ_7745c5c3_Var2 string
templ_7745c5c3_Var2, templ_7745c5c3_Err = templ.JoinStringErrs(title)
if templ_7745c5c3_Err != nil {
return templ.Error{Err: templ_7745c5c3_Err, FileName: `index.templ`, Line: 12, Col: 17}
return templ.Error{Err: templ_7745c5c3_Err, FileName: `index.templ`, Line: 12, Col: 18}
}
_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var2))
if templ_7745c5c3_Err != nil {
@ -54,13 +54,49 @@ func base(title string, body templ.Component) templ.Component {
var templ_7745c5c3_Var3 string
templ_7745c5c3_Var3, templ_7745c5c3_Err = templ.JoinStringErrs(xess.URL)
if templ_7745c5c3_Err != nil {
return templ.Error{Err: templ_7745c5c3_Err, FileName: `index.templ`, Line: 13, Col: 41}
return templ.Error{Err: templ_7745c5c3_Err, FileName: `index.templ`, Line: 13, Col: 42}
}
_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var3))
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 3, "\"><meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\"><meta name=\"robots\" content=\"noindex,nofollow\"><style>\n body,\n html {\n height: 100%;\n display: flex;\n justify-content: center;\n align-items: center;\n margin-left: auto;\n margin-right: auto;\n }\n\n .centered-div {\n text-align: center;\n }\n\n #status {\n font-variant-numeric: tabular-nums;\n }\n\n #progress {\n display: none;\n width: min(20rem, 90%);\n height: 2rem;\n border-radius: 1rem;\n overflow: hidden;\n margin: 1rem 0 2rem;\n outline-color: #b16286;\n outline-offset: 2px;\n outline-style: solid;\n outline-width: 4px;\n }\n\n .bar-inner {\n background-color: #b16286;\n height: 100%;\n width: 0;\n transition: width 0.25s ease-in;\n }\n </style>")
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 3, "\"><meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\"><meta name=\"robots\" content=\"noindex,nofollow\">")
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
for key, value := range ogTags {
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 4, "<meta property=\"")
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
var templ_7745c5c3_Var4 string
templ_7745c5c3_Var4, templ_7745c5c3_Err = templ.JoinStringErrs(key)
if templ_7745c5c3_Err != nil {
return templ.Error{Err: templ_7745c5c3_Err, FileName: `index.templ`, Line: 17, Col: 24}
}
_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var4))
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 5, "\" content=\"")
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
var templ_7745c5c3_Var5 string
templ_7745c5c3_Var5, templ_7745c5c3_Err = templ.JoinStringErrs(value)
if templ_7745c5c3_Err != nil {
return templ.Error{Err: templ_7745c5c3_Err, FileName: `index.templ`, Line: 17, Col: 42}
}
_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var5))
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 6, "\">")
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
}
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 7, "<style>\n body,\n html {\n height: 100%;\n display: flex;\n justify-content: center;\n align-items: center;\n margin-left: auto;\n margin-right: auto;\n }\n\n .centered-div {\n text-align: center;\n }\n\n #status {\n font-variant-numeric: tabular-nums;\n }\n\n #progress {\n display: none;\n width: min(20rem, 90%);\n height: 2rem;\n border-radius: 1rem;\n overflow: hidden;\n margin: 1rem 0 2rem;\n outline-color: #b16286;\n outline-offset: 2px;\n outline-style: solid;\n outline-width: 4px;\n }\n\n .bar-inner {\n background-color: #b16286;\n height: 100%;\n width: 0;\n transition: width 0.25s ease-in;\n }\n </style>")
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
@ -68,20 +104,20 @@ func base(title string, body templ.Component) templ.Component {
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 4, "</head><body id=\"top\"><main><center><h1 id=\"title\" class=\".centered-div\">")
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 8, "</head><body id=\"top\"><main><center><h1 id=\"title\" class=\".centered-div\">")
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
var templ_7745c5c3_Var4 string
templ_7745c5c3_Var4, templ_7745c5c3_Err = templ.JoinStringErrs(title)
var templ_7745c5c3_Var6 string
templ_7745c5c3_Var6, templ_7745c5c3_Err = templ.JoinStringErrs(title)
if templ_7745c5c3_Err != nil {
return templ.Error{Err: templ_7745c5c3_Err, FileName: `index.templ`, Line: 60, Col: 49}
return templ.Error{Err: templ_7745c5c3_Err, FileName: `index.templ`, Line: 64, Col: 52}
}
_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var4))
_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var6))
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 5, "</h1></center>")
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 9, "</h1></center>")
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
@ -89,7 +125,7 @@ func base(title string, body templ.Component) templ.Component {
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 6, "<footer><center><p>Protected by <a href=\"https://github.com/TecharoHQ/anubis\">Anubis</a> from <a href=\"https://techaro.lol\">Techaro</a>. Made with ❤️ in 🇨🇦.</p><p>Mascot design by <a href=\"https://bsky.app/profile/celphase.bsky.social\">CELPHASE</a>.</p></center></footer></main></body></html>")
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 10, "<footer><center><p>Protected by <a href=\"https://github.com/TecharoHQ/anubis\">Anubis</a> from <a href=\"https://techaro.lol\">Techaro</a>. Made with ❤️ in 🇨🇦.</p><p>Mascot design by <a href=\"https://bsky.app/profile/celphase.bsky.social\">CELPHASE</a>.</p></center></footer></main></body></html>")
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
@ -113,53 +149,54 @@ func index() templ.Component {
}()
}
ctx = templ.InitializeContext(ctx)
templ_7745c5c3_Var5 := templ.GetChildren(ctx)
if templ_7745c5c3_Var5 == nil {
templ_7745c5c3_Var5 = templ.NopComponent
templ_7745c5c3_Var7 := templ.GetChildren(ctx)
if templ_7745c5c3_Var7 == nil {
templ_7745c5c3_Var7 = templ.NopComponent
}
ctx = templ.ClearChildren(ctx)
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 7, "<div class=\"centered-div\"><img id=\"image\" style=\"width:100%;max-width:256px;\" src=\"")
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
var templ_7745c5c3_Var6 string
templ_7745c5c3_Var6, templ_7745c5c3_Err = templ.JoinStringErrs("/.within.website/x/cmd/anubis/static/img/pensive.webp?cacheBuster=" +
anubis.Version)
if templ_7745c5c3_Err != nil {
return templ.Error{Err: templ_7745c5c3_Err, FileName: `index.templ`, Line: 84, Col: 18}
}
_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var6))
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 8, "\"> <img style=\"display:none;\" style=\"width:100%;max-width:256px;\" src=\"")
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
var templ_7745c5c3_Var7 string
templ_7745c5c3_Var7, templ_7745c5c3_Err = templ.JoinStringErrs("/.within.website/x/cmd/anubis/static/img/happy.webp?cacheBuster=" +
anubis.Version)
if templ_7745c5c3_Err != nil {
return templ.Error{Err: templ_7745c5c3_Err, FileName: `index.templ`, Line: 90, Col: 18}
}
_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var7))
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 9, "\"><p id=\"status\">Loading...</p><script async type=\"module\" src=\"")
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 11, "<div class=\"centered-div\"><img id=\"image\" style=\"width:100%;max-width:256px;\" src=\"")
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
var templ_7745c5c3_Var8 string
templ_7745c5c3_Var8, templ_7745c5c3_Err = templ.JoinStringErrs("/.within.website/x/cmd/anubis/static/js/main.mjs?cacheBuster=" + anubis.Version)
templ_7745c5c3_Var8, templ_7745c5c3_Err = templ.JoinStringErrs("/.within.website/x/cmd/anubis/static/img/pensive.webp?cacheBuster=" +
anubis.Version)
if templ_7745c5c3_Err != nil {
return templ.Error{Err: templ_7745c5c3_Err, FileName: `index.templ`, Line: 93, Col: 116}
return templ.Error{Err: templ_7745c5c3_Err, FileName: `index.templ`, Line: 88, Col: 18}
}
_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var8))
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 10, "\"></script><div id=\"progress\" role=\"progressbar\" aria-labelledby=\"status\"><div class=\"bar-inner\"></div></div><details><summary>Why am I seeing this?</summary><p>You are seeing this because the administrator of this website has set up <a href=\"https://github.com/TecharoHQ/anubis\">Anubis</a> to protect the server against the scourge of <a href=\"https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/\">AI companies aggressively scraping websites</a>. This can and does cause downtime for the websites, which makes their resources inaccessible for everyone.</p><p>Anubis is a compromise. Anubis uses a <a href=\"https://anubis.techaro.lol/docs/design/why-proof-of-work\">Proof-of-Work</a> scheme in the vein of <a href=\"https://en.wikipedia.org/wiki/Hashcash\">Hashcash</a>, a proposed proof-of-work scheme for reducing email spam. The idea is that at individual scales the additional load is ignorable, but at mass scraper levels it adds up and makes scraping much more expensive.</p><p>Ultimately, this is a hack whose real purpose is to give a \"good enough\" placeholder solution so that more time can be spent on fingerprinting and identifying headless browsers (EG: via how they do font rendering) so that the challenge proof of work page doesn't need to be presented to users that are much more likely to be legitimate.</p><p>Please note that Anubis requires the use of modern JavaScript features that plugins like <a href=\"https://jshelter.org/\">JShelter</a> will disable. Please disable JShelter or other such plugins for this domain.</p></details><noscript><p>Sadly, you must enable JavaScript to get past this challenge. This is required because AI companies have changed the social contract around how website hosting works. A no-JS solution is a work-in-progress.</p></noscript><div id=\"testarea\"></div></div>")
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 12, "\"> <img style=\"display:none;\" style=\"width:100%;max-width:256px;\" src=\"")
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
var templ_7745c5c3_Var9 string
templ_7745c5c3_Var9, templ_7745c5c3_Err = templ.JoinStringErrs("/.within.website/x/cmd/anubis/static/img/happy.webp?cacheBuster=" +
anubis.Version)
if templ_7745c5c3_Err != nil {
return templ.Error{Err: templ_7745c5c3_Err, FileName: `index.templ`, Line: 94, Col: 18}
}
_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var9))
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 13, "\"><p id=\"status\">Loading...</p><script async type=\"module\" src=\"")
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
var templ_7745c5c3_Var10 string
templ_7745c5c3_Var10, templ_7745c5c3_Err = templ.JoinStringErrs(
"/.within.website/x/cmd/anubis/static/js/main.mjs?cacheBuster=" + anubis.Version)
if templ_7745c5c3_Err != nil {
return templ.Error{Err: templ_7745c5c3_Err, FileName: `index.templ`, Line: 98, Col: 84}
}
_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var10))
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 14, "\"></script><div id=\"progress\" role=\"progressbar\" aria-labelledby=\"status\"><div class=\"bar-inner\"></div></div><details><summary>Why am I seeing this?</summary><p>You are seeing this because the administrator of this website has set up <a href=\"https://github.com/TecharoHQ/anubis\">Anubis</a> to protect the server against the scourge of <a href=\"https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/\">AI companies aggressively scraping websites</a>. This can and does cause downtime for the websites, which makes their resources inaccessible for everyone.</p><p>Anubis is a compromise. Anubis uses a <a href=\"https://anubis.techaro.lol/docs/design/why-proof-of-work\">Proof-of-Work</a> scheme in the vein of <a href=\"https://en.wikipedia.org/wiki/Hashcash\">Hashcash</a>, a proposed proof-of-work scheme for reducing email spam. The idea is that at individual scales the additional load is ignorable, but at mass scraper levels it adds up and makes scraping much more expensive.</p><p>Ultimately, this is a hack whose real purpose is to give a \"good enough\" placeholder solution so that more time can be spent on fingerprinting and identifying headless browsers (EG: via how they do font rendering) so that the challenge proof of work page doesn't need to be presented to users that are much more likely to be legitimate.</p><p>Please note that Anubis requires the use of modern JavaScript features that plugins like <a href=\"https://jshelter.org/\">JShelter</a> will disable. Please disable JShelter or other such plugins for this domain.</p></details><noscript><p>Sadly, you must enable JavaScript to get past this challenge. This is required because AI companies have changed the social contract around how website hosting works. A no-JS solution is a work-in-progress.</p></noscript><div id=\"testarea\"></div></div>")
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
@ -183,38 +220,38 @@ func errorPage(message string) templ.Component {
}()
}
ctx = templ.InitializeContext(ctx)
templ_7745c5c3_Var9 := templ.GetChildren(ctx)
if templ_7745c5c3_Var9 == nil {
templ_7745c5c3_Var9 = templ.NopComponent
templ_7745c5c3_Var11 := templ.GetChildren(ctx)
if templ_7745c5c3_Var11 == nil {
templ_7745c5c3_Var11 = templ.NopComponent
}
ctx = templ.ClearChildren(ctx)
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 11, "<div class=\"centered-div\"><img id=\"image\" style=\"width:100%;max-width:256px;\" src=\"")
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 15, "<div class=\"centered-div\"><img id=\"image\" alt=\"Sad Anubis\" style=\"width:100%;max-width:256px;\" src=\"")
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
var templ_7745c5c3_Var10 string
templ_7745c5c3_Var10, templ_7745c5c3_Err = templ.JoinStringErrs("/.within.website/x/cmd/anubis/static/img/reject.webp?cacheBuster=" + anubis.Version)
var templ_7745c5c3_Var12 string
templ_7745c5c3_Var12, templ_7745c5c3_Err = templ.JoinStringErrs("/.within.website/x/cmd/anubis/static/img/reject.webp?cacheBuster=" + anubis.Version)
if templ_7745c5c3_Err != nil {
return templ.Error{Err: templ_7745c5c3_Err, FileName: `index.templ`, Line: 119, Col: 93}
return templ.Error{Err: templ_7745c5c3_Err, FileName: `index.templ`, Line: 138, Col: 102}
}
_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var10))
_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var12))
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 12, "\"><p>")
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 16, "\"><p>")
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
var templ_7745c5c3_Var11 string
templ_7745c5c3_Var11, templ_7745c5c3_Err = templ.JoinStringErrs(message)
var templ_7745c5c3_Var13 string
templ_7745c5c3_Var13, templ_7745c5c3_Err = templ.JoinStringErrs(message)
if templ_7745c5c3_Err != nil {
return templ.Error{Err: templ_7745c5c3_Err, FileName: `index.templ`, Line: 121, Col: 14}
return templ.Error{Err: templ_7745c5c3_Err, FileName: `index.templ`, Line: 140, Col: 16}
}
_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var11))
_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var13))
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 13, ".</p><button onClick=\"window.location.reload();\">Try again</button><p><a href=\"/\">Go home</a></p></div>")
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 17, ".</p><button onClick=\"window.location.reload();\">Try again</button><p><a href=\"/\">Go home</a></p></div>")
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
@ -238,39 +275,40 @@ func bench() templ.Component {
}()
}
ctx = templ.InitializeContext(ctx)
templ_7745c5c3_Var12 := templ.GetChildren(ctx)
if templ_7745c5c3_Var12 == nil {
templ_7745c5c3_Var12 = templ.NopComponent
templ_7745c5c3_Var14 := templ.GetChildren(ctx)
if templ_7745c5c3_Var14 == nil {
templ_7745c5c3_Var14 = templ.NopComponent
}
ctx = templ.ClearChildren(ctx)
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 14, "<div style=\"height:20rem;display:flex\"><table style=\"margin-top:1rem;display:grid;grid-template:auto 1fr/auto auto;gap:0 0.5rem\"><thead style=\"border-bottom:1px solid black;padding:0.25rem 0;display:grid;grid-template:1fr/subgrid;grid-column:1/-1\"><tr id=\"table-header\" style=\"display:contents\"><th style=\"width:4.5rem\">Time</th><th style=\"width:4rem\">Iters</th></tr><tr id=\"table-header-compare\" style=\"display:none\"><th style=\"width:4.5rem\">Time A</th><th style=\"width:4rem\">Iters A</th><th style=\"width:4.5rem\">Time B</th><th style=\"width:4rem\">Iters B</th></tr></thead> <tbody id=\"results\" style=\"padding-top:0.25rem;display:grid;grid-template-columns:subgrid;grid-auto-rows:min-content;grid-column:1/-1;row-gap:0.25rem;overflow-y:auto;font-variant-numeric:tabular-nums\"></tbody></table><div class=\"centered-div\"><img id=\"image\" style=\"width:100%;max-width:256px;\" src=\"")
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 18, "<div style=\"height:20rem;display:flex\"><table style=\"margin-top:1rem;display:grid;grid-template:auto 1fr/auto auto;gap:0 0.5rem\"><thead style=\"border-bottom:1px solid black;padding:0.25rem 0;display:grid;grid-template:1fr/subgrid;grid-column:1/-1\"><tr id=\"table-header\" style=\"display:contents\"><th style=\"width:4.5rem\">Time</th><th style=\"width:4rem\">Iters</th></tr><tr id=\"table-header-compare\" style=\"display:none\"><th style=\"width:4.5rem\">Time A</th><th style=\"width:4rem\">Iters A</th><th style=\"width:4.5rem\">Time B</th><th style=\"width:4rem\">Iters B</th></tr></thead> <tbody id=\"results\" style=\"padding-top:0.25rem;display:grid;grid-template-columns:subgrid;grid-auto-rows:min-content;grid-column:1/-1;row-gap:0.25rem;overflow-y:auto;font-variant-numeric:tabular-nums\"></tbody></table><div class=\"centered-div\"><img id=\"image\" style=\"width:100%;max-width:256px;\" src=\"")
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
var templ_7745c5c3_Var13 string
templ_7745c5c3_Var13, templ_7745c5c3_Err = templ.JoinStringErrs("/.within.website/x/cmd/anubis/static/img/pensive.webp?cacheBuster=" +
var templ_7745c5c3_Var15 string
templ_7745c5c3_Var15, templ_7745c5c3_Err = templ.JoinStringErrs("/.within.website/x/cmd/anubis/static/img/pensive.webp?cacheBuster=" +
anubis.Version)
if templ_7745c5c3_Err != nil {
return templ.Error{Err: templ_7745c5c3_Err, FileName: `index.templ`, Line: 149, Col: 19}
return templ.Error{Err: templ_7745c5c3_Err, FileName: `index.templ`, Line: 169, Col: 22}
}
_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var13))
_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var15))
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 15, "\"><p id=\"status\" style=\"max-width:256px\">Loading...</p><script async type=\"module\" src=\"")
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 19, "\"><p id=\"status\" style=\"max-width:256px\">Loading...</p><script async type=\"module\" src=\"")
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
var templ_7745c5c3_Var14 string
templ_7745c5c3_Var14, templ_7745c5c3_Err = templ.JoinStringErrs("/.within.website/x/cmd/anubis/static/js/bench.mjs?cacheBuster=" + anubis.Version)
var templ_7745c5c3_Var16 string
templ_7745c5c3_Var16, templ_7745c5c3_Err = templ.JoinStringErrs(
"/.within.website/x/cmd/anubis/static/js/bench.mjs?cacheBuster=" + anubis.Version)
if templ_7745c5c3_Err != nil {
return templ.Error{Err: templ_7745c5c3_Err, FileName: `index.templ`, Line: 152, Col: 118}
return templ.Error{Err: templ_7745c5c3_Err, FileName: `index.templ`, Line: 173, Col: 89}
}
_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var14))
_, templ_7745c5c3_Err = templ_7745c5c3_Buffer.WriteString(templ.EscapeString(templ_7745c5c3_Var16))
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 16, "\"></script><div id=\"sparkline\"></div><noscript><p>Running the benchmark tool requires JavaScript to be enabled.</p></noscript></div></div><form id=\"controls\" style=\"position:fixed;top:0.5rem;right:0.5rem\"><div style=\"display:flex;justify-content:end\"><label for=\"difficulty-input\" style=\"margin-right:0.5rem\">Difficulty:</label> <input id=\"difficulty-input\" type=\"number\" name=\"difficulty\" style=\"width:3rem\"></div><div style=\"margin-top:0.25rem;display:flex;justify-content:end\"><label for=\"algorithm-select\" style=\"margin-right:0.5rem\">Algorithm:</label> <select id=\"algorithm-select\" name=\"algorithm\"></select></div><div style=\"margin-top:0.25rem;display:flex;justify-content:end\"><label for=\"compare-select\" style=\"margin-right:0.5rem\">Compare:</label> <select id=\"compare-select\" name=\"compare\"><option value=\"NONE\">-</option></select></div></form>")
templ_7745c5c3_Err = templruntime.WriteString(templ_7745c5c3_Buffer, 20, "\"></script><div id=\"sparkline\"></div><noscript><p>Running the benchmark tool requires JavaScript to be enabled.</p></noscript></div></div><form id=\"controls\" style=\"position:fixed;top:0.5rem;right:0.5rem\"><div style=\"display:flex;justify-content:end\"><label for=\"difficulty-input\" style=\"margin-right:0.5rem\">Difficulty:</label> <input id=\"difficulty-input\" type=\"number\" name=\"difficulty\" style=\"width:3rem\"></div><div style=\"margin-top:0.25rem;display:flex;justify-content:end\"><label for=\"algorithm-select\" style=\"margin-right:0.5rem\">Algorithm:</label> <select id=\"algorithm-select\" name=\"algorithm\"></select></div><div style=\"margin-top:0.25rem;display:flex;justify-content:end\"><label for=\"compare-select\" style=\"margin-right:0.5rem\">Compare:</label> <select id=\"compare-select\" name=\"compare\"><option value=\"NONE\">-</option></select></div></form>")
if templ_7745c5c3_Err != nil {
return templ_7745c5c3_Err
}