A Python library and CLI tool that converts HTML into clean, readable Markdown-formatted plain text.
html2text is a Python library and command-line utility that converts HTML into clean, readable plain text formatted as Markdown. It solves the problem of extracting meaningful text from web pages or HTML snippets while preserving formatting like bold, italics, and links in a Markdown-compatible way. This makes it useful for developers, writers, and anyone needing to transform HTML content into a more portable text format.
Python developers, technical writers, and content creators who need to programmatically convert HTML to Markdown for documentation, data processing, or content migration tasks.
Developers choose html2text for its simplicity, configurability, and dual CLI/library interface, offering a straightforward, dependency-light alternative to more complex HTML parsing tools.
Convert HTML to Markdown-formatted text.
Converts HTML tags directly to their Markdown equivalents, such as <strong> to **bold** and <a> to [link](url); the README's Python example turns <strong>Zed's</strong> into **Zed's**.
Offers command-line flags such as --ignore-links and --escape-all, and Python settings like ignore_links for customizable formatting, detailed in the usage section.
Can be used as a CLI tool for quick conversions or imported as a Python library for programmatic integration, demonstrated in the README with both modes.
Available on PyPI through a simple pip install, with minimal dependencies and no heavy external libraries.
The --escape-all option exists to avoid corner-case formatting issues, which suggests that default escaping can misformat some complex or malformed HTML; enabling it trades output readability for safer escaping.
Focuses on core Markdown elements; handling of complex tables, deeply nested lists, and modern Markdown extensions is limited, which may constrain comprehensive conversions.
Documentation is linked externally and might be incomplete, as the README only provides basic examples, leaving users to explore usage details on their own.