nuke/data/crawlers
Xe Iaso 7c0996448a
chore(default-config): allowlist common crawl (#753)
This may seem strange, but allowlisting Common Crawl means that scrapers
have less incentive to scrape because they can just grab the data from
Common Crawl instead of scraping it again.
2025-07-04 00:10:45 +00:00
_allow-good.yaml       chore(default-config): allowlist common crawl (#753)  2025-07-04 00:10:45 +00:00
ai-search.yaml         Split up AI filtering files (#592)                    2025-06-01 20:21:18 +00:00
ai-training.yaml       Split up AI filtering files (#592)                    2025-06-01 20:21:18 +00:00
applebot.yaml          Add Applebot definition (#589)                        2025-05-31 10:18:32 -04:00
bingbot.yaml           feat: enable loading config fragments (#321)          2025-04-23 07:01:28 -04:00
commoncrawl.yaml       chore(default-config): allowlist common crawl (#753)  2025-07-04 00:10:45 +00:00
duckduckbot.yaml       feat: enable loading config fragments (#321)          2025-04-23 07:01:28 -04:00
googlebot.yaml         feat: enable loading config fragments (#321)          2025-04-23 07:01:28 -04:00
internet-archive.yaml  feat: enable loading config fragments (#321)          2025-04-23 07:01:28 -04:00
kagibot.yaml           feat: enable loading config fragments (#321)          2025-04-23 07:01:28 -04:00
marginalia.yaml        feat: enable loading config fragments (#321)          2025-04-23 07:01:28 -04:00
mojeekbot.yaml         Fix: mojeekbot regex (#351)                           2025-04-24 14:24:41 +00:00
openai-gptbot.yaml     Opt-in policies for OpenAI and MistralAI bots (#590)  2025-05-31 16:48:57 -04:00
openai-searchbot.yaml  Opt-in policies for OpenAI and MistralAI bots (#590)  2025-05-31 16:48:57 -04:00
qwantbot.yaml          feat: enable loading config fragments (#321)          2025-04-23 07:01:28 -04:00