A curated list of Python software for data science, covering machine learning, deep learning, visualization, and data manipulation.
Awesome Python Data Science is a curated list of Python libraries, frameworks, and tools for data science and machine learning. It organizes hundreds of resources into categories like machine learning, deep learning, data manipulation, and visualization to help practitioners discover the right tools efficiently. The project addresses information overload by providing a vetted, structured directory instead of an unorganized collection.
The list is aimed at data scientists, machine learning engineers, researchers, and Python developers looking for reliable libraries to incorporate into their workflows. It's especially useful for those new to the ecosystem or exploring new subfields.
Developers choose this list because it saves time searching through PyPI or GitHub by offering a pre-filtered, categorized collection maintained by the community. Its focus on quality and relevance reduces noise and highlights proven tools.
Probably the best curated list of data science software in Python.
Organizes libraries into logical sections like Machine Learning, Deep Learning, and Visualization, as shown in the detailed table of contents with over 30 categories, making it easy to navigate specific subfields.
Prioritizes well-maintained and popular libraries and marks entries with icons for frameworks like scikit-learn and PyTorch, helping users discover practical and reliable tools.
Includes tools for major frameworks such as PyTorch, TensorFlow, JAX, and scikit-learn, with dedicated subsections that highlight compatibility, as seen in the deep learning and machine learning sections.
Regularly updated to reflect the evolving ecosystem, supported by a contribution guide in the README and a large number of entries spanning niche areas like quantum computing and synthetic data.
Listings are brief with only names, links, and occasional icons, lacking in-depth explanations, usage examples, performance benchmarks, or guidance on when to choose one tool over another.
Does not indicate library versions, update frequencies, or compatibility information, which can lead to issues when integrating tools in fast-moving Python environments where APIs change rapidly.
Accuracy depends on volunteer contributions, so entries may become outdated or incomplete without consistent maintenance, and there's no formal verification process for new additions.
The sheer volume of tools (hundreds across many categories), presented without prioritization or ratings, can overwhelm users, especially novices who need help selecting the best option for their specific use case.