Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.

PyDriller

Apache-2.0 · Python · 2.9

A Python framework for mining and analyzing Git repositories, extracting commits, developers, files, diffs, and source code.

955 stars · 155 forks · 0 contributors

What is PyDriller?

PyDriller is a Python framework for mining and analyzing Git repositories. It lets developers and researchers programmatically extract detailed information about commits, developers, modified files, diffs, and source code from a repository's history, which simplifies studying software evolution, contributor activity, and code changes over time.
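As a rough illustration of that kind of extraction, the sketch below turns each commit's modified files into plain records (change type, added and deleted line counts, diff, and resulting source). It assumes the PyDriller 2.x API, in which Repository(...).traverse_commits() yields Commit objects whose modified_files expose filename, change_type, added_lines, deleted_lines, diff, and source_code; the helper names file_changes and iter_changes are hypothetical.

```python
# Sketch assuming the PyDriller 2.x API; file_changes/iter_changes are hypothetical helpers.


def file_changes(commit):
    """Summarize every file touched by one PyDriller Commit as a list of plain dicts."""
    rows = []
    for mf in commit.modified_files:
        rows.append({
            "commit": commit.hash,
            "author": commit.author.name,
            "file": mf.filename,
            # change_type is a ModificationType enum member, e.g. ADD, MODIFY, DELETE, RENAME
            "change": mf.change_type.name,
            "added": mf.added_lines,
            "deleted": mf.deleted_lines,
            # Unified diff of this file in this commit
            "diff": mf.diff,
            # File contents after the change; may be None, e.g. when the file was deleted
            "source": mf.source_code,
        })
    return rows


def iter_changes(path_to_repo):
    """Lazily yield per-file change records for a local path or a remote URL."""
    # Imported here so that file_changes() can be reused without PyDriller installed
    from pydriller import Repository

    for commit in Repository(path_to_repo).traverse_commits():
        yield from file_changes(commit)
```

For example, iterating over iter_changes("path/to/repo") (a placeholder path) and printing row["file"], row["added"], and row["deleted"] would give one line per modified file. Making iter_changes a generator keeps only one commit's records in memory at a time.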

Target Audience

Software engineers, researchers, and data scientists who need to analyze Git repositories for insights into development processes, code quality, or team dynamics. It's particularly useful for those conducting mining software repositories (MSR) research or building tools around version control data.

Value Proposition

PyDriller offers a high-level, Pythonic API that hides the complexities of Git, so repository analysis does not require deep knowledge of Git internals. It provides comprehensive data extraction out of the box, is well documented, and aims to balance simplicity with performance in mining tasks.

Overview

Python Framework to analyse Git repositories

Use Cases

Best For

  • Analyzing commit histories to study software evolution patterns
  • Extracting developer contribution metrics from Git repositories
  • Research in mining software repositories (MSR) for academic studies
  • Building custom tools for code change analysis or audit trails
  • Tracking file modifications and diffs across project timelines
  • Automating repository data extraction for reporting or dashboards
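For the contribution-metrics use case, a minimal sketch might aggregate commits and changed lines per developer. It assumes the PyDriller 2.x Commit attributes author.name, author.email, insertions, deletions, and modified_files, plus ModifiedFile.new_path and old_path; the function names and the choice to key developers by e-mail address are illustrative assumptions, not PyDriller features.

```python
# Sketch assuming the PyDriller 2.x API; function names are hypothetical.
from collections import defaultdict


def contribution_metrics(commits):
    """Aggregate commit count, inserted/deleted lines, and touched files per author e-mail."""
    stats = defaultdict(
        lambda: {"name": "", "commits": 0, "insertions": 0, "deletions": 0, "files": set()}
    )
    for commit in commits:
        # Assumption: the e-mail address is a steadier identity than the display name
        s = stats[commit.author.email]
        s["name"] = commit.author.name
        s["commits"] += 1
        s["insertions"] += commit.insertions
        s["deletions"] += commit.deletions
        for mf in commit.modified_files:
            # new_path is None for deleted files, so fall back to old_path
            s["files"].add(mf.new_path or mf.old_path)
    # Replace file sets with counts so the result is easy to serialize
    return {email: {**s, "files": len(s["files"])} for email, s in stats.items()}


def mine_contributions(path_to_repo, **filters):
    """Run contribution_metrics over a repository; extra kwargs go straight to Repository."""
    from pydriller import Repository  # imported here so the helper above has no hard dependency

    return contribution_metrics(Repository(path_to_repo, **filters).traverse_commits())
```

Because extra keyword arguments are forwarded to Repository, the same helper could, for instance, be limited to a date range with since= and to=.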

Not Ideal For

  • Real-time Git monitoring or CI/CD pipelines requiring instant feedback
  • Simple Git tasks like basic cloning or viewing file history
  • Projects analyzing non-Git version control systems like SVN or Mercurial
  • Environments where a minimal dependency footprint is critical, since PyDriller brings in its own Python package requirements

Pros & Cons

Pros

Easy Git Data Extraction

Provides a Pythonic API for traversing commits and accessing file changes with minimal code, as the quick usage example in the README shows by iterating over Repository commits.
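A quick-usage loop in the spirit of the README example mentioned above might look like the following sketch. It is reconstructed from the PyDriller 2.x API (Repository, traverse_commits(), commit.hash, commit.msg, commit.author.name, file.filename) rather than copied from the README, and the helper names describe_commits and print_history are hypothetical.

```python
# Sketch assuming the PyDriller 2.x API; describe_commits/print_history are hypothetical.


def describe_commits(commits):
    """Yield one line per commit, followed by one line per file it modified."""
    for commit in commits:
        # Keep only the subject line of the commit message
        subject = commit.msg.splitlines()[0] if commit.msg else ""
        yield f"{commit.hash[:8]} {commit.author.name}: {subject}"
        for file in commit.modified_files:
            yield f"  {file.filename} has changed"


def print_history(path_to_repo):
    """Print a short log for a local path or a remote repository URL."""
    from pydriller import Repository  # imported here so describe_commits() has no hard dependency

    for line in describe_commits(Repository(path_to_repo).traverse_commits()):
        print(line)
```

Calling print_history("path/to/repo") with a placeholder path would print the abbreviated hash, author, and subject of each commit, followed by the files it touched.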

Comprehensive Documentation

The README links to detailed documentation on Read the Docs and includes a YouTube tutorial, giving users several learning resources.

Active Maintenance

Build status badges indicate that continuous integration is in place, and monthly download statistics suggest ongoing community use and trust.

Academic Research Ready

Backed by an ACM paper listed in the README's citation section, which makes it a credible tool for mining software repositories (MSR) studies.

Cons

Testing and Contribution Overhead

Setting up tests requires unzipping a repository archive and managing multiple requirement files, which can be cumbersome for new contributors.

Git-Only Limitation

Designed exclusively for Git repositories, so it cannot analyze other version control systems without conversion or additional tools.

Potential Performance Issues

Mining large repositories with long commit histories can be slow or consume a lot of memory, as is common for data-intensive tools.
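One common way to keep such runs manageable is to narrow the traversal and stream results instead of collecting them. The sketch below assumes the PyDriller 2.x Repository keyword arguments since, only_no_merge, and only_in_branch; the helper names and the 90-day default are arbitrary choices for illustration, and any actual speed-up depends on the repository.

```python
# Sketch assuming PyDriller 2.x Repository keyword arguments; helper names are hypothetical.
from datetime import datetime, timedelta, timezone


def scoped_repository_args(days_back=90, branch=None, now=None):
    """Build keyword arguments that narrow a PyDriller traversal."""
    now = now or datetime.now(timezone.utc)
    args = {
        # Only commits newer than this timezone-aware datetime
        "since": now - timedelta(days=days_back),
        # Skip merge commits
        "only_no_merge": True,
    }
    if branch is not None:
        # Only analyse commits that belong to this branch
        args["only_in_branch"] = branch
    return args


def count_recent_commits(path_to_repo, **scope):
    """Count matching commits without building a list of all Commit objects."""
    from pydriller import Repository  # imported here so the helper above has no hard dependency

    repo = Repository(path_to_repo, **scoped_repository_args(**scope))
    return sum(1 for _ in repo.traverse_commits())
```

Passing now explicitly makes the date window reproducible, which can help when the same mining run has to be repeated later.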

Quick Stats

Stars: 955
Forks: 155
Contributors: 0
Open issues: 13
Last commit: 4 months ago
Created: 2018

Tags

#research-tool #version-control #python3 #git #python #commit-analysis #git-analysis #mining-software-repositories #code-analysis #python-framework #software-engineering

Built With

Python

Included in

Empirical Software Engineering (475)
Auto-fetched 1 day ago

Related Projects

RefactoringMiner

RefactoringMiner is a Java library and API designed to automatically identify refactoring operations within code changes across multiple programming languages. It analyzes commits, pull requests, and commit ranges to detect over 100 refactoring types, from simple renames to complex structural changes. The tool also generates detailed Abstract Syntax Tree (AST) diffs, providing a deeper understanding of code evolution than traditional line-based diffs.

Key Features

  • Refactoring Detection: Identifies 40+ classic refactorings from Fowler's catalog, 52 API-level changes, 8 migration patterns, and 5 test-specific refactorings.
  • Multi-Language Support: Works with Java, Python, and Kotlin codebases, with TypeScript support planned.
  • AST Diff Generation: Produces syntax-aware diffs at commit, pull request, and commit range levels.
  • Visualization Tools: Includes a Chrome extension for refactoring-aware commit reviews and interactive diff visualization in browsers.
  • Advanced Diff Features: Supports refactoring-aware tooltips, single-page views, embedded GitHub comments, and handling of code moved between files.

Philosophy

RefactoringMiner aims to make code evolution transparent and understandable by precisely tracking structural changes, helping developers and researchers analyze refactoring practices and improve code review processes.

Stars: 487
Forks: 158
Last commit: 1 day ago
Perceval

Send Sir Perceval on a quest to retrieve and gather data from software repositories.

Stars: 319
Forks: 185
Last commit: 2 days ago
astminer

A library for mining of path-based representations of code (and more)

Stars: 300
Forks: 78
Last commit: 6 months ago
DesigniteJava

Detects smells and computes metrics of Java code

Stars: 192
Forks: 68
Last commit: 1 year ago