This is an automated archive made by the Lemmit Bot.
The original was posted on /r/selfhosted by /u/biolds on 2025-06-11 12:03:19+00:00.
Hey everyone! We're excited to announce the release of Sosse 1.13, the newest version of our open-source search engine, web archiving, and crawling platform.
For those unfamiliar, Sosse (Selenium Open Source Search Engine) lets you:
- Search the full content of web pages, including JavaScript-rendered content
- Crawl sites on a schedule and detect content changes
- Download files in bulk from web pages
- Archive web pages (with assets) for full offline access
- Monitor websites and generate Atom feeds for updates
- Authenticate to access protected or private content
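
To give a feel for consuming the Atom feeds mentioned in the list above, here is a minimal sketch using only the Python standard library. It is not taken from the Sosse docs: the `list_entries` helper and the `SAMPLE_FEED` content are made up for illustration, and the only assumption is that the feed is a standard Atom document.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def list_entries(feed_xml: str) -> list[tuple[str, str, str]]:
    """Return (title, link, updated) for each entry in an Atom document."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM_NS)
        link_el = entry.find("atom:link", ATOM_NS)
        link = link_el.get("href", "") if link_el is not None else ""
        entries.append((title, link, updated))
    return entries


# Hypothetical feed content, only here so the sketch can run offline.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <title>Page changed</title>
    <link href="https://example.com/page"/>
    <updated>2025-06-11T12:00:00Z</updated>
  </entry>
</feed>"""

for title, link, updated in list_entries(SAMPLE_FEED):
    print(updated, title, link)
```

In practice you would fetch the feed text from your own instance (for example with `urllib.request.urlopen`) and pass it to `list_entries`. The feed URL depends on your setup, so check the docs for it.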
What's new in 1.13?
This release includes powerful new features and improvements to make Sosse more useful and easier to integrate:
- Support for Document Tagging: Categorize and filter your indexed data
- Webhook Triggers During Crawling: Integrate crawling into workflows (AI, automation, notifications, and more)
- CSV Export: Export crawl results in a standard format
- Simplified Setup with Docker Compose: Get started faster with pre-configured services
- Metadata Extraction with Scripting: Use JavaScript or webhooks to scrape and index custom metadata
Sosse 1.13 is more powerful, more flexible, and easier to integrate into your data pipelines and research workflows.
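
As a rough idea of what the receiving end of a crawl webhook could look like, here is a minimal sketch of a JSON receiver built on Python's standard `http.server`. Everything about the payload is an assumption: the `url` and `title` keys, the `summarize_event` helper, and the port are hypothetical, so check the Sosse docs for what your webhook actually sends.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_event(payload: dict) -> str:
    """Build a one-line summary from a webhook payload.

    The "url" and "title" keys are hypothetical placeholders; adjust them
    to match the body your webhook is configured to send.
    """
    url = payload.get("url", "<unknown url>")
    title = payload.get("title", "<untitled>")
    return f"crawled {url}: {title}"


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the client.
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            payload = json.loads(raw or b"{}")
        except json.JSONDecodeError:
            payload = None
        if not isinstance(payload, dict):
            # Reject bodies that are not a JSON object.
            self.send_response(400)
            self.end_headers()
            return
        print(summarize_event(payload))
        self.send_response(204)  # Accepted, nothing to return.
        self.end_headers()


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Run the receiver until interrupted (port 8000 is an arbitrary choice)."""
    HTTPServer((host, port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener. You would then point the webhook at `http://127.0.0.1:8000/` and replace the `print` with whatever your workflow needs (a notification, a queue push, a call to another tool).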
- Website: https://sosse.io/
- Docs: https://sosse.readthedocs.io/
- GitHub: https://github.com/biolds/sosse
- Screenshots: https://sosse.readthedocs.io/en/stable/screenshots.html
- Guides with Real-World Use Cases: https://sosse.readthedocs.io/en/stable/guides.html
- Full Changelog: https://sosse.readthedocs.io/en/stable/CHANGELOG.html
Thank You!
Huge thanks to everyone who provided feedback and suggestions after the 1.12 release. Your input directly shaped the improvements in this version.
We're looking forward to hearing what you think about 1.13!