A curated collection of papers, datasets, tools, and resources for applying machine learning to small-molecule drug discovery.
Awesome Small Molecule Machine Learning is a curated, community-maintained list of resources focused on applying machine learning techniques to the discovery and development of small-molecule drugs. It aggregates academic papers, datasets, software frameworks, and expert blogs to provide a comprehensive starting point for researchers in computational chemistry and AI-driven drug design. The project aims to organize the rapidly growing body of knowledge in this interdisciplinary field.
The list is aimed at computational chemists, medicinal chemists, bioinformaticians, and machine learning researchers working in or entering the field of AI for drug discovery. It is particularly valuable for graduate students, postdocs, and industry professionals seeking a structured overview of state-of-the-art tools and literature.
Developers and researchers choose this resource because it saves significant time in literature review and tool discovery by providing a well-organized, continuously updated aggregation. Because it is community-driven, it tends to reflect practical, impactful work, and it links directly to code and data, which static academic reading lists often do not.
Aggregates high-quality papers, datasets like ChEMBL and MoleculeNet, and tools such as Chemprop and DeepChem in one place, saving researchers extensive literature review time.
Organizes content into clear categories like generative algorithms and ADME prediction, making it easy to navigate specific subfields without sifting through unrelated materials.
Encourages contributions to keep the list current with fast-evolving research, as seen in the inclusion of recent 2023 papers on MS/MS prediction.
Provides direct links to GitHub repositories (e.g., for EquiBind or GROVER) and datasets, facilitating immediate implementation and experimentation.
Because the list is community-driven, there is no formal vetting process, so some entries may be outdated, low-quality, or unmaintained, and users need to verify them independently.
Focuses on listing resources without tutorials or learning paths, which can overwhelm newcomers who need foundational knowledge in both chemistry and ML.
Exclusively targets small molecules, omitting resources for peptides, proteins, or other drug modalities, limiting its utility for broader drug discovery efforts.