A Go package and CLI tool that saves web pages as single HTML files with all assets embedded.
Obelisk is a Go package and command-line tool that archives web pages by downloading all their assets and embedding them into a single HTML file. It solves the problem of preserving web content in a portable, offline-friendly format that is easy to share and store. The tool is designed for speed, using concurrent downloads, and supports archiving pages that require authentication via cookies.
Obelisk is aimed at developers and archivists who need to save web pages for offline use, documentation, or content preservation, especially those working in the Go ecosystem or needing programmatic access to web archival.
Obelisk offers a fast, self-contained archival solution with concurrent downloads and cookie support, producing cleaner HTML output than some alternatives by inlining scripts and styles instead of relying solely on base64 encoding.
Go package and CLI tool for saving a web page as a single HTML file
Downloads assets concurrently with a configurable limit, which, according to the README, makes archiving complex pages considerably faster than downloading assets one at a time.
Embeds all external resources like CSS, images, and JavaScript into a single HTML5 file using base64 data URLs or inline tags, ensuring archives are portable and offline-viewable without dependencies.
Accepts Netscape cookie files via the --load-cookies flag, enabling archiving of pages behind logins or paywalls, a key feature for accessing restricted content.
Disables external requests via Content Security Policy by default, enhancing security and ensuring archives are truly self-contained without needing manual configuration.
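The concurrent download and data-URL embedding described above can be illustrated with a minimal Go sketch. It is not Obelisk's actual implementation: `fetchFunc`, `fakeFetch`, `dataURL`, and `embedAll` are hypothetical names chosen for this example, and the network is replaced by a stub so the program runs offline. A buffered channel serves as a semaphore that caps the number of downloads in flight.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"sync"
)

// fetchFunc stands in for a real HTTP download (hypothetical, for illustration).
type fetchFunc func(url string) (mime string, body []byte, err error)

// fakeFetch is a stub that returns a tiny stylesheet, or an error for one URL.
func fakeFetch(url string) (string, []byte, error) {
	if url == "https://example.com/missing.png" {
		return "", nil, fmt.Errorf("not found: %s", url)
	}
	return "text/css", []byte("body{}"), nil
}

// dataURL encodes a resource as a base64 data URL.
func dataURL(mime string, body []byte) string {
	return "data:" + mime + ";base64," + base64.StdEncoding.EncodeToString(body)
}

// embedAll fetches every URL with at most limit downloads in flight and
// returns a map from URL to data URL. Failed downloads are left out.
func embedAll(urls []string, limit int, fetch fetchFunc) map[string]string {
	if limit < 1 {
		limit = 1
	}
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out = make(map[string]string, len(urls))
		sem = make(chan struct{}, limit) // semaphore: one slot per allowed download
	)
	for _, u := range urls {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit downloads are already running
		go func(u string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			mime, body, err := fetch(u)
			if err != nil {
				return
			}
			mu.Lock()
			out[u] = dataURL(mime, body)
			mu.Unlock()
		}(u)
	}
	wg.Wait()
	return out
}

func main() {
	urls := []string{"https://example.com/a.css", "https://example.com/missing.png"}
	res := embedAll(urls, 2, fakeFetch)
	fmt.Println(len(res), res["https://example.com/a.css"])
}
```

In a real archiver the stub would be replaced by an HTTP client call, and the resulting data URLs would be substituted into the page's `src` and `href` attributes.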
By default, JavaScript is disabled and resources are embedded statically, which may break interactive elements or fail to capture content that loads dynamically via client-side rendering.
Embedding large images or videos as base64 data URLs can produce very large HTML files, which may make storage and sharing impractical for media-heavy sites. Inlining avoids this overhead only for scripts and styles.
Primarily a Go package, so integration into non-Go environments relies on the CLI tool, which may lack the flexibility or advanced features of native solutions in other programming languages.
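The file-size concern with base64 embedding can be quantified: standard base64 emits 4 output bytes for every 3 input bytes, so an embedded binary resource grows by about one third. The short Go sketch below uses the standard library's `base64.StdEncoding.EncodedLen` to show this; `embeddedSize` and the 3,000,000-byte image are hypothetical values chosen for illustration.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// embeddedSize returns how many bytes a resource of rawBytes occupies
// once encoded with standard, padded base64.
func embeddedSize(rawBytes int) int {
	return base64.StdEncoding.EncodedLen(rawBytes)
}

func main() {
	raw := 3000000 // hypothetical image size in bytes
	enc := embeddedSize(raw)
	overhead := 100 * float64(enc-raw) / float64(raw)
	fmt.Printf("raw=%d encoded=%d overhead=%.1f%%\n", raw, enc, overhead)
	// prints: raw=3000000 encoded=4000000 overhead=33.3%
}
```

The data URL prefix (for example `data:image/png;base64,`) adds a few more bytes per resource on top of this.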
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
⬛️ CLI tool and library for saving complete web pages as a single HTML file
💾 dn - offline full-text search and archiving for your Chromium-based browser.
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.