A Java framework for developing statistical natural language processing (NLP) components on Apache UIMA.
ClearTK is a Java framework for developing statistical natural language processing (NLP) components, built on top of Apache UIMA. It provides modular tools for machine learning in NLP, enabling the creation of structured NLP pipelines and annotations. It is developed by the Center for Computational Language and Education Research (CLEAR) at the University of Colorado at Boulder.
ClearTK is aimed at Java developers and researchers building statistical NLP applications within the UIMA ecosystem, particularly those who need modular components with integrated machine learning for tasks such as syntax parsing or classification.
Developers choose ClearTK for its direct integration with Apache UIMA for structured NLP pipelines, its modular design that lets a project depend on only the sub-projects it needs via Maven, and its wrappers for machine learning libraries such as SVMlight and Mallet for model training and classification.
Machine learning components for Apache UIMA
Builds directly on Apache UIMA for structured NLP pipelines and annotations, supporting modular and scalable workflows as outlined in the README.
Organized into sub-projects like syntax parsing and ML wrappers, allowing flexible dependency management via Maven for targeted use.
Includes wrappers for classifiers like SVMlight and Mallet, facilitating model training and classification within the UIMA framework, though with licensing caveats.
Primarily BSD-licensed with explicit notes on GPL/LGPL dependencies for specific sub-projects, aiding compliance decisions as detailed in the Dependencies section.
Relies on older machine learning libraries like SVMlight and Mallet, lacking integration with modern deep learning frameworks such as TensorFlow or PyTorch.
Some sub-projects have GPL or LGPL dependencies, and SVMlight requires a separate license for commercial use, which complicates adoption in commercial settings, as the README warns.
Built on Apache UIMA, necessitating familiarity with UIMA's concepts and architecture, which adds to the initial learning curve.
The copyright notice suggests the project's last major update was around 2014, which points to limited recent development and community support.