Apache Tika is a content analysis toolkit that detects and extracts metadata and structured text content from various document types using existing parser libraries. There is currently 1 open-source alternative to Apache Tika, with 451 GitHub stars in total; it is written primarily in C#.