A free, state-of-the-art library and toolkit for named entity extraction and binary relation detection from text.
MITIE is an open-source library and toolkit for information extraction, focused on named entity recognition and binary relation detection. It uses state-of-the-art machine learning models to extract structured information, such as people, organizations, and the relationships between them, from unstructured text.
MITIE is aimed at developers, researchers, and data scientists working on natural language processing projects who need robust, customizable tools for entity and relation extraction without commercial licensing restrictions.
Developers choose MITIE because it is free and commercially usable, delivers state-of-the-art performance, supports multiple languages out of the box, and provides extensive APIs and tools for training custom models tailored to specific domains.
MITIE: library and tools for information extraction
Built on dlib, using structural SVMs and distributional word embeddings, it offers high accuracy in named entity and relation extraction; results are reported on the project's evaluation wiki.
Includes pre-trained models for English, Spanish, and German trained on diverse resources like CoNLL and Wikipedia, reducing initial setup for these languages.
Provides a native C++ core with APIs for Python, R, Java, C, and MATLAB, plus community bindings for OCaml, .NET, PHP, and Ruby, giving broad integration options.
Released under the Boost Software License, which permits use in both open-source and commercial projects without licensing fees.
Setup requires downloading separate model files and compiling shared libraries with BLAS dependencies using CMake or make, as detailed in the setup sections; the process is error-prone and time-consuming.
Ships pre-trained models for only three languages (English, Spanish, German); other languages require custom training, which is resource-intensive and minimally documented.
The README emphasizes legacy Python 2.7 binaries and has outdated release info (v0.4), indicating limited recent updates and potential compatibility issues with modern tools.
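For readers weighing the Python API against the setup cost, here is a minimal sketch of named entity extraction. It assumes the `mitie` Python module has been built and is importable, that a model file has been downloaded separately (the path passed to `extract` is the caller's choice), and that `extract_entities` returns `(token_range, tag, score)` tuples, as in the project's Python example. The helper names `describe_entities` and `extract` are hypothetical, introduced here for illustration only.

```python
def describe_entities(tokens, entities):
    """Turn (token_range, tag, score) tuples into (text, tag, score) tuples.

    Tokens may be bytes or str, so bytes are decoded before joining.
    """
    results = []
    for token_range, tag, score in entities:
        words = [
            tokens[i].decode("utf-8") if isinstance(tokens[i], bytes) else tokens[i]
            for i in token_range
        ]
        results.append((" ".join(words), tag, score))
    return results


def extract(model_path, text):
    """Load a MITIE NER model and return the entities found in text."""
    # Imported lazily so describe_entities works without MITIE installed.
    from mitie import named_entity_extractor, tokenize

    ner = named_entity_extractor(model_path)  # model_path: a downloaded model file
    tokens = tokenize(text)
    return describe_entities(tokens, ner.extract_entities(tokens))
```

Calling `extract("MITIE-models/english/ner_model.dat", "some text")` would load the English model if it has been unpacked at that location. The formatting helper is kept separate from model loading so it can be exercised without a compiled library or model files.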