Perplexity has some proper documentation available for their crawlers,
with published IP addresses: https://docs.perplexity.ai/guides/bots.
Signed-off-by: Timon de Groot <timon.degroot@team.blue>
This makes it clear that when generating a kubernetes secret to pull the bot stopper image that:
- no email is required
- a user is required but the actual value of the username is not checked
- the GH token needs to be pasted in
Signed-off-by: Thomas Arrow <tarrow@users.noreply.github.com>
* fix(web): include base prefix in generated URLs
Forgot to add the base prefix to these URLs. Committed a fix for this
and added a test to ensure this does not repeat. Oops!
Closes: #1402
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs: update CHANGELOG
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
The Accept-Language header parsing was not correctly handling quality
factors. When a browser sends "en-GB,de-DE;q=0.5", the expected behavior
is to prefer English (q=1.0 by default) over German (q=0.5).
The fix uses golang.org/x/text/language.ParseAcceptLanguage to properly
parse and sort language preferences by quality factor. It also adds base
language fallbacks (e.g., "en" for "en-GB") to ensure regional variants
match their parent languages when no exact match exists.
Fixes#1022
Signed-off-by: majiayu000 <1835304752@qq.com>
* test(nginx): fix tests to work in GHA
Closes: #1371
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(test): does this work lol
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(test): does this other thing work lol
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(test): pki folder location
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
Co-authored-by: Jason Cameron <git@jasoncameron.dev>
* docs: split nginx configuration files to their own directory
Signed-off-by: Xe Iaso <me@xeiaso.net>
* test: add nginx config smoke test based on the config in the docs
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>
TIL docker clients don't include the Accept header all the time. I would
have thought they did that. Oops.
Closes: #1346
Signed-off-by: Xe Iaso <me@xeiaso.net>
* feat: first implementation of honeypot logic
This is a bit of an experiment, stick with me.
The core idea here is that badly written crawlers are that: badly
written. They look for anything that contains `<a href="whatever" />`
tags and will blindly use those values to recurse. This takes advantage
of that by hiding a link in a `<script>` tag like this:
```html
<script type="ignore"><a href="/bots-only">Don't click</a></script>
```
Browsers will ignore it because they have no handler for the "ignore"
script type.
This current draft is very unoptimized (it takes like 7 seconds to
generate a page on my tower), however switching spintax libraries will
make this much faster.
The hope is to make this pluggable with WebAssembly such that we force
administrators to choose a storage method. First we crawl before we
walk.
The AI involvement in this commit is limited to the spintax in
affirmations.txt, spintext.txt, and titles.txt. This generates a bunch
of "pseudoprofound bullshit" like the following:
> This Restoration to Balance & Alignment
>
> There's a moment when creators are being called to realize that the work
> can't be reduced to results, but about energy. We don't innovate products
> by pushing harder, we do it by holding the vision. Because momentum can't
> be forced, it unfolds over time when culture are moving in the same
> direction. We're being invited into a paradigm shift in how we think
> about innovation. [...]
This is intended to "look" like normal article text. As this is a first
draft, this sucks and will be improved upon.
Assisted-by: GLM 4.6, ChatGPT, GPT-OSS 120b
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(honeypot/naive): optimize hilariously
Signed-off-by: Xe Iaso <me@xeiaso.net>
* feat(honeypot/naive): attempt to automatically filter out based on crawling
Signed-off-by: Xe Iaso <me@xeiaso.net>
* fix(lib): use mazeGen instead of bsGen
Signed-off-by: Xe Iaso <me@xeiaso.net>
* docs: add honeypot docs
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore(test): go mod tidy
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: fix spelling metadata
Signed-off-by: Xe Iaso <me@xeiaso.net>
* chore: spelling
Signed-off-by: Xe Iaso <me@xeiaso.net>
---------
Signed-off-by: Xe Iaso <me@xeiaso.net>