A robust HTML to Markdown converter with plugin support, usable as a Go library, CLI tool, or via hosted API.
html-to-markdown is a versatile converter that transforms HTML content into clean Markdown format. It solves the problem of converting web content, documents, or HTML snippets into a readable and editable Markdown representation, supporting everything from simple tags to entire websites. The tool is designed to be robust, handling complex formatting and offering extensive customization through plugins.
It is aimed at developers and content creators who need to convert HTML to Markdown for documentation, web scraping, content migration, or Markdown-based workflows. It's especially useful for Go developers integrating conversion into applications and for CLI users processing HTML files.
Developers choose html-to-markdown for its reliability, extensibility via plugins, and multiple interfaces (library, CLI, API). It stands out by handling entire websites, offering fine-grained control over output, and following the CommonMark specification, which makes it a comprehensive solution for HTML-to-Markdown conversion.
⚙️ Convert HTML to Markdown. Even works with entire websites and can be extended through rules.
Handles complex HTML elements like nested lists, blockquotes, and code blocks with smart escaping, ensuring clean, readable Markdown output as shown in the README's examples.
Offers a plugin system for extending functionality, such as adding strikethrough and table support, and lets you write custom plugins as described in WRITING_PLUGINS.md.
Provides a Go library, CLI tool, online demo, and REST API, catering to diverse workflows from development integration to quick testing without installation.
Converts relative links to absolute URLs based on a configured domain, which is useful when processing web content and keeps links in scraped Markdown intact.
Several v1 plugins, such as GitHubFlavored and task lists, have not yet been ported to v2 and are only listed as planned, which limits out-of-the-box features and requires manual workarounds.
Primarily a Go library, so non-Go projects must rely on the CLI or API, which, according to the README, do not yet support all the customization options available in the library.
Does not sanitize untrusted HTML by default, so an additional sanitization step with an external library such as bluemonday is needed to guard against vulnerabilities such as XSS.