A Python framework for mining and analyzing Git repositories, extracting commits, developers, files, diffs, and source code.
PyDriller is a Python framework for mining and analyzing Git repositories. It enables developers and researchers to programmatically extract detailed information about commits, developers, modified files, diffs, and source code from version control systems. It simplifies the process of studying software evolution, contributor activity, and code changes over time.
PyDriller is aimed at software engineers, researchers, and data scientists who need to analyze Git repositories for insights into development processes, code quality, or team dynamics. It is particularly useful for those conducting mining software repositories (MSR) research or building tools around version control data.
PyDriller offers a high-level, Pythonic API that abstracts the complexities of Git, making repository analysis accessible without deep knowledge of Git internals. It provides comprehensive data extraction out of the box, is well documented, and is designed for both simplicity and performance in mining tasks.
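As a rough illustration of that API, the sketch below counts commits per author and changes per file. It assumes PyDriller 2.x, where `Repository(...).traverse_commits()` yields commits exposing `author.name` and `modified_files`; the helper names `summarize_commits` and `summarize_repository` are hypothetical and not part of PyDriller.

```python
from collections import Counter
from typing import Iterable


def summarize_commits(commits: Iterable) -> dict:
    """Count commits per author and changes per file name.

    `commits` is any iterable of objects shaped like PyDriller commits:
    each has `.author.name` and `.modified_files`, whose items have `.filename`.
    """
    commits_per_author = Counter()
    changes_per_file = Counter()
    for commit in commits:
        commits_per_author[commit.author.name] += 1
        for modified in commit.modified_files:
            changes_per_file[modified.filename] += 1
    return {"authors": commits_per_author, "files": changes_per_file}


def summarize_repository(path_or_url: str) -> dict:
    """Run the summary over a local clone or a remote repository URL."""
    # Imported lazily so summarize_commits stays usable without PyDriller.
    from pydriller import Repository

    return summarize_commits(Repository(path_or_url).traverse_commits())
```

Calling `summarize_repository("path/to/repo")` (a placeholder path) and then `most_common(5)` on either counter would list the most active authors or the most frequently changed files.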
Python Framework to analyse Git repositories
Provides a Pythonic API to traverse commits and access file changes with minimal code, as shown in the quick usage example iterating over Repository commits.
The README links to detailed documentation on Read the Docs and includes a YouTube tutorial, ensuring users have multiple learning resources.
Build status badges show that the project runs continuous integration, and monthly download statistics point to steady use and community trust.
Cited in an ACM paper, making it a credible tool for mining software repositories studies, as highlighted in the citation section.
Setting up tests requires unzipping a repository archive and managing multiple requirement files, which can be cumbersome for new contributors.
Designed exclusively for Git repositories, so it cannot analyze other version control systems without conversion or additional tools.
Mining large repositories with extensive commit histories may lead to slow execution or high memory usage, common in data-intensive frameworks.
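One way to soften the cost on large histories, when the analysis allows it, is to narrow what gets traversed. The sketch below assumes PyDriller 2.x and its `since` and `only_modifications_with_file_types` arguments to `Repository`; the helpers `traversal_filters` and `iter_recent_commits` are hypothetical names introduced here for illustration.

```python
from datetime import datetime
from typing import Iterator, Optional, Sequence


def traversal_filters(since: datetime,
                      file_types: Optional[Sequence[str]] = None) -> dict:
    """Build keyword arguments that restrict which commits are visited."""
    filters = {"since": since}
    if file_types:
        # Keep only commits that touch at least one file with these extensions.
        filters["only_modifications_with_file_types"] = list(file_types)
    return filters


def iter_recent_commits(path_or_url: str, since: datetime,
                        file_types: Optional[Sequence[str]] = None) -> Iterator:
    """Yield commits one at a time instead of collecting them in a list."""
    # Imported lazily so traversal_filters stays usable without PyDriller.
    from pydriller import Repository

    filters = traversal_filters(since, file_types)
    yield from Repository(path_or_url, **filters).traverse_commits()
```

Consuming the generator in a loop and keeping only aggregated values, rather than storing every commit object, may also help keep memory use down.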
RefactoringMiner is a Java library and API designed to automatically identify refactoring operations within code changes across multiple programming languages. It analyzes commits, pull requests, and commit ranges to detect over 100 refactoring types, from simple renames to complex structural changes. The tool also generates detailed Abstract Syntax Tree (AST) diffs, providing a deeper understanding of code evolution beyond traditional line-based diffs.

## Key Features

- **Refactoring Detection**: Identifies 40+ classic refactorings from Fowler's catalog, 52 API-level changes, 8 migration patterns, and 5 test-specific refactorings.
- **Multi-Language Support**: Works with Java, Python, and Kotlin codebases, with TypeScript support planned.
- **AST Diff Generation**: Produces syntax-aware diffs at commit, pull request, and commit range levels.
- **Visualization Tools**: Includes a Chrome extension for refactoring-aware commit reviews and interactive diff visualization in browsers.
- **Advanced Diff Features**: Supports refactoring-aware tooltips, single-page views, embedded GitHub comments, and handling of code moved between files.

## Philosophy

RefactoringMiner aims to make code evolution transparent and understandable by precisely tracking structural changes, helping developers and researchers analyze refactoring practices and improve code review processes.
Send Sir Perceval on a quest to retrieve and gather data from software repositories.
A library for mining of path-based representations of code (and more)
Detects smells and computes metrics of Java code