A Java library for converting IUPAC chemical names to molecular structures (SMILES, CML, InChI) with high accuracy.
OPSIN is an open-source Java library that converts systematic IUPAC chemical names into molecular structures. It solves the problem of interpreting complex chemical nomenclature programmatically, enabling automated extraction of structural information from textual data for cheminformatics and database applications.
OPSIN is aimed at cheminformaticians, computational chemists, and developers working on chemical database systems, text-mining tools, or applications that need automated chemical name interpretation.
Developers choose OPSIN for its high accuracy, extensive support of IUPAC nomenclature, and open-source availability, which together make it a reliable alternative to commercial chemical name parsers.
Open Parser for Systematic IUPAC Nomenclature. Chemical name to structure conversion
Achieves high recall and precision for systematic IUPAC nomenclature, as validated in peer-reviewed research and maintained with extensive test coverage.
Generates SMILES, CML, and InChI/InChIKey representations, providing flexibility for diverse cheminformatics pipelines and tools.
Supports a wide range of organic chemical types including stereochemistry, polymers, and isotopic labelling, detailed in the README's comprehensive list.
Offers advanced settings like radical handling and detailed failure analysis, allowing customization for specific use cases without modifying core code.
Does not support most alkaloids, terpenoids, or less common stereochemical terms, which limits its utility for natural products and specialized chemistry domains.
Requires Java 8 or higher and Maven for integration, which adds complexity for projects written in other languages and may increase deployment overhead.
Cannot generate (Std)InChI for polymers or radicals when using the wildcardRadicals option, as noted in the README, restricting output consistency in some cases.
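To illustrate how name-to-structure conversion might slot into a text-mining or database pipeline, here is a minimal Java sketch of a batch step that separates interpretable names from failures. The class `BatchNameConverter`, its `convertAll` method, and the `BatchResult` type are hypothetical names introduced for this example and are not part of OPSIN. The converter is passed in as a plain function so the sketch runs without OPSIN on the classpath. With OPSIN available, the function would presumably be `NameToStructure.getInstance()::parseToSmiles`, assuming (as we understand it) that this method returns null for names it cannot interpret.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of OPSIN: runs a name-to-SMILES function
// over a list of names and records which ones could not be converted.
class BatchNameConverter {

    // Holds successfully converted names (in input order) and failed names.
    static final class BatchResult {
        final Map<String, String> converted = new LinkedHashMap<>();
        final List<String> failed = new ArrayList<>();
    }

    // nameToSmiles is assumed to return null when a name cannot be interpreted.
    static BatchResult convertAll(List<String> names,
                                  Function<String, String> nameToSmiles) {
        BatchResult result = new BatchResult();
        for (String name : names) {
            String smiles = nameToSmiles.apply(name);
            if (smiles == null) {
                result.failed.add(name);
            } else {
                result.converted.put(name, smiles);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in converter backed by a lookup table, used here only so the
        // example is self-contained. It is not a real name parser.
        Map<String, String> lookup = new HashMap<>();
        lookup.put("ethanol", "CCO");

        BatchResult result = convertAll(
                Arrays.asList("ethanol", "not a chemical name"), lookup::get);

        System.out.println("Converted: " + result.converted);
        System.out.println("Failed: " + result.failed);
    }
}
```

Keeping the converter behind a `Function<String, String>` also makes it straightforward to unit-test the pipeline with a stub, or to swap in a different output format later.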