* feat: first implementation of honeypot logic

This is a bit of an experiment, stick with me.

The core idea here is that badly written crawlers are just that: badly written. They look for anything that contains `<a href="whatever" />` tags and will blindly use those values to recurse. This takes advantage of that by hiding a link in a `<script>` tag like this:

```html
<script type="ignore"><a href="/bots-only">Don't click</a></script>
```

Browsers will ignore it because they have no handler for the "ignore" script type.

This current draft is very unoptimized (it takes roughly 7 seconds to generate a page on my tower); however, switching spintax libraries will make this much faster.

The hope is to make this pluggable with WebAssembly such that we force administrators to choose a storage method. First we crawl, then we walk.

The AI involvement in this commit is limited to the spintax in affirmations.txt, spintext.txt, and titles.txt. This generates a bunch of "pseudoprofound bullshit" like the following:

> This Restoration to Balance & Alignment
>
> There's a moment when creators are being called to realize that the work
> can't be reduced to results, but about energy. We don't innovate products
> by pushing harder, we do it by holding the vision. Because momentum can't
> be forced, it unfolds over time when culture are moving in the same
> direction. We're being invited into a paradigm shift in how we think
> about innovation. [...]

This is intended to "look" like normal article text. As this is a first draft, it sucks and will be improved upon.
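To make the trap concrete, here is a minimal Go sketch (not the honeypot's actual code) contrasting a naive regex scraper, which picks up the link hidden inside `<script type="ignore">`, with a scraper that first discards `<script>` elements, mirroring how browsers parse a script body as raw text rather than markup. The names `naiveLinks` and `scriptAwareLinks` are hypothetical, and the regexes are for illustration only; a real crawler should use an HTML parser.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical illustration only; not the honeypot's real code.

// hrefRe is deliberately naive: it finds href="..." on any <a> tag in the raw
// text, with no idea whether that text is markup or the body of a <script>.
var hrefRe = regexp.MustCompile(`<a\s+[^>]*href="([^"]*)"`)

// scriptRe matches a whole <script>...</script> element, body included.
// (?is) makes it case-insensitive and lets . span newlines.
var scriptRe = regexp.MustCompile(`(?is)<script\b[^>]*>.*?</script>`)

// naiveLinks mimics a badly written crawler that scrapes hrefs from raw HTML.
func naiveLinks(page string) []string {
	var links []string
	for _, m := range hrefRe.FindAllStringSubmatch(page, -1) {
		links = append(links, m[1])
	}
	return links
}

// scriptAwareLinks drops <script> elements first, since browsers parse their
// bodies as raw text rather than markup, then scrapes whatever remains.
func scriptAwareLinks(page string) []string {
	return naiveLinks(scriptRe.ReplaceAllString(page, ""))
}

func main() {
	page := `<p><a href="/about">About</a></p>
<script type="ignore"><a href="/bots-only">Don't click</a></script>`

	fmt.Println(naiveLinks(page))       // [/about /bots-only]
	fmt.Println(scriptAwareLinks(page)) // [/about]
}
```

Only the naive scraper ever learns about `/bots-only`, so any client that requests it has revealed itself as one.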
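The commit doesn't say which spintax syntax the word lists use. Assuming the common `{option|option}` convention with nesting, a minimal expander might look like the sketch below. `expand`, its `pick` callback, and the template string are hypothetical and made up for illustration; this is not the library the honeypot uses.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// Hypothetical spintax expander assuming "{a|b|c}" syntax with nesting;
// not the library the honeypot actually uses.

// expand picks one alternative from every {...|...} group in s, recursing
// into the chosen alternative so nested groups are resolved too.
// pick(n) must return an index in [0, n). Unbalanced braces are kept as-is.
func expand(s string, pick func(n int) int) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] != '{' {
			b.WriteByte(s[i])
			continue
		}
		// Scan to the matching '}', splitting on '|' only at the top level.
		depth, start, j := 0, i+1, i
		var options []string
		for ; j < len(s); j++ {
			switch s[j] {
			case '{':
				depth++
			case '}':
				depth--
			case '|':
				if depth == 1 {
					options = append(options, s[start:j])
					start = j + 1
				}
			}
			if depth == 0 {
				break
			}
		}
		if j == len(s) { // no matching '}': emit the rest verbatim
			b.WriteString(s[i:])
			break
		}
		options = append(options, s[start:j])
		b.WriteString(expand(options[pick(len(options))], pick))
		i = j // the loop's i++ then steps past the closing '}'
	}
	return b.String()
}

func main() {
	tmpl := "{This|The} {Restoration|Return} to {Balance|{Harmony|Alignment}}"

	r := rand.New(rand.NewSource(1))
	fmt.Println(expand(tmpl, r.Intn)) // a random variant

	first := func(int) int { return 0 }
	fmt.Println(expand(tmpl, first)) // This Restoration to Balance
}
```

Passing the chooser in as a function keeps the expander deterministic under test while still allowing a seeded random source in real use.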
Assisted-by: GLM 4.6, ChatGPT, GPT-OSS 120b
Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(honeypot/naive): optimize hilariously

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(honeypot/naive): attempt to automatically filter out based on crawling

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(lib): use mazeGen instead of bsGen

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: add honeypot docs

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(test): go mod tidy

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: fix spelling metadata

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore: spelling

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
check-spelling/check-spelling configuration
| File | Purpose | Format | Info |
|---|---|---|---|
| dictionary.txt | Replacement dictionary (creating this file will override the default dictionary) | one word per line | dictionary |
| allow.txt | Add words to the dictionary | one word per line (only letters and `'s` allowed) | allow |
| reject.txt | Remove words from the dictionary (after allow) | grep pattern matching whole dictionary words | reject |
| excludes.txt | Files to ignore entirely | perl regular expression | excludes |
| only.txt | Only check matching files (applied after excludes) | perl regular expression | only |
| patterns.txt | Patterns to ignore from checked lines | perl regular expression (order matters, first match wins) | patterns |
| candidate.patterns | Patterns that might be worth adding to patterns.txt | perl regular expression with optional comment block introductions (all matches will be suggested) | candidates |
| line_forbidden.patterns | Patterns to flag in checked lines | perl regular expression (order matters, first match wins) | patterns |
| expect.txt | Expected words that aren't in the dictionary | one word per line (sorted alphabetically) | expect |
| advice.md | Supplement for GitHub comment when unrecognized words are found | GitHub Markdown | advice |
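As a rough model of how the word lists in the table interact, the Go sketch below builds an effective dictionary (the default or `dictionary.txt` words, plus `allow.txt`, minus whole-word `reject.txt` matches) and then reports words that are neither in it nor in `expect.txt`. It ignores case handling, tokenization, and the other details the real check-spelling tool applies; Go's RE2 syntax stands in for grep patterns, and all function names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical model of the word lists; the real tool does much more
// (case handling, tokenization, etc.).

// effectiveDictionary starts from the base list (the default dictionary, or
// dictionary.txt if present), adds allow.txt words, then removes every word
// that a reject.txt pattern matches in full.
func effectiveDictionary(base, allow, reject []string) map[string]bool {
	dict := make(map[string]bool)
	for _, w := range base {
		dict[w] = true
	}
	for _, w := range allow {
		dict[w] = true
	}
	for _, p := range reject {
		re := regexp.MustCompile("^(?:" + p + ")$") // whole-word match only
		for w := range dict {
			if re.MatchString(w) {
				delete(dict, w) // deleting during range is safe in Go
			}
		}
	}
	return dict
}

// unrecognized returns the words that are neither in the dictionary nor
// listed in expect.txt, preserving input order.
func unrecognized(words []string, dict map[string]bool, expect []string) []string {
	expected := make(map[string]bool, len(expect))
	for _, w := range expect {
		expected[w] = true
	}
	var out []string
	for _, w := range words {
		if !dict[w] && !expected[w] {
			out = append(out, w)
		}
	}
	return out
}

func main() {
	dict := effectiveDictionary(
		[]string{"crawler", "honeypot", "spelling"}, // base dictionary
		[]string{"spintax"},                         // allow.txt
		[]string{"honey.*"},                         // reject.txt
	)
	words := []string{"crawler", "spintax", "honeypot", "xeiaso", "mazegen"}
	fmt.Println(unrecognized(words, dict, []string{"xeiaso"})) // [honeypot mazegen]
}
```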
Note: you can replace any of these files with a directory of the same name (minus the suffix) containing multiple files with that suffix; their contents are merged together.