A .NET framework for extracting and exporting text and data from a wide variety of document formats.
Toxy is a .NET framework for extracting text and structured data from a wide range of document formats, including DOCX, XLSX, PDF, EPUB, and HTML. It addresses platform-dependent text extraction by providing a single cross-platform API that detects the file format automatically and returns content in consistent data structures, so callers do not have to deal with the parsing details of each file type.
Toxy is aimed at .NET developers who need to extract text or data programmatically from various document formats in cross-platform applications, particularly those moving away from Windows-specific solutions like IFilter.
Developers choose Toxy for its ease of use, automatic format detection, and platform independence. It offers a .NET-native alternative to Java-based tools like Apache Tika and targets .NET Standard 2.0 and 2.1.
.NET text extraction & export framework
Supports a wide range of popular formats including DOCX, XLSX, PDF, EPUB, and HTML, as listed in the README, reducing the need for multiple libraries.
Identifies file formats automatically, so developers do not have to specify the format by hand, which simplifies the extraction process.
Returns extracted content in standardized objects like ToxyDocument and ToxySpreadsheet, ensuring consistency and ease of use across different file types.
Built on .NET Standard 2.0 and 2.1, enabling deployment on both Windows and Linux, addressing the limitations of Windows-specific solutions like IFilter.
The README does not mention support for OCR, encrypted files, or specialized parsing, which could be a significant gap for complex document extraction scenarios.
Tied exclusively to the .NET ecosystem, making it unsuitable for projects using other programming languages or requiring language-agnostic solutions.
As a unified framework abstracting multiple formats, it might introduce performance overhead compared to optimized, format-specific libraries, especially with large or complex documents.
Toxy is an open-source alternative to the following products: