A GitHub scanning tool that identifies hardcoded credentials and filters false positives using machine learning models.
Credential Digger is an open-source tool that scans GitHub and GitLab repositories to identify hardcoded credentials such as passwords, API keys, and tokens. It uses machine learning models to filter out false positives, significantly reducing the manual effort required to review potential security leaks. The tool supports various scanning targets including pull requests, wiki pages, and local files.
It is aimed at developers, security engineers, and DevOps teams who need to detect and prevent secret leaks in their codebases, whether those are open-source or private repositories.
It stands out by combining regex-based scanning with machine learning to minimize false positives, offers multiple deployment options (CLI, Docker, library), and integrates into developer workflows via VS Code extensions and pre-commit hooks.
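The two-stage idea described above (a regex pass that collects candidates, then a filter that discards likely false positives) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rules in `RULES`, the `Discovery` class, and the placeholder and path heuristics are all hypothetical, and the heuristic filter merely stands in for trained models such as PathModel and PasswordModel. It is not Credential Digger's actual rule set or API.

```python
import re
from dataclasses import dataclass

# Hypothetical rules for illustration only (not Credential Digger's rule set).
RULES = {
    "password": re.compile(
        r"(?:password|passwd|pwd)\s*[:=]\s*['\"]([^'\"]+)['\"]", re.IGNORECASE
    ),
    "token": re.compile(
        r"(?:api_key|token)\s*[:=]\s*['\"]([^'\"]+)['\"]", re.IGNORECASE
    ),
}

# Stand-ins for trained models: a placeholder-value list and path hints.
PLACEHOLDERS = {"changeme", "password", "example", "xxx", "todo"}
TEST_PATH_HINTS = ("test", "example", "docs/")


@dataclass
class Discovery:
    path: str
    line_no: int
    rule: str
    value: str


def scan(path, text):
    """Stage 1: collect every regex match as a candidate discovery."""
    found = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in RULES.items():
            for match in pattern.finditer(line):
                found.append(Discovery(path, line_no, rule, match.group(1)))
    return found


def looks_like_false_positive(discovery):
    """Stage 2: heuristic filter standing in for the ML models."""
    if any(hint in discovery.path.lower() for hint in TEST_PATH_HINTS):
        return True  # rough analogue of a path-based model
    return discovery.value.lower() in PLACEHOLDERS  # rough analogue of a password model


def scan_and_filter(files):
    """Run both stages over a mapping of file path to file content."""
    results = []
    for path, text in files.items():
        results.extend(d for d in scan(path, text) if not looks_like_false_positive(d))
    return results
```

Calling `scan_and_filter` on a mapping of paths to file contents returns only the candidates that survive both heuristics. In the real tool, the second stage is performed by the ML models rather than fixed lists.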
Uses models like PathModel and PasswordModel to significantly reduce false positives, directly addressing the high false-positive rates common in secret scanning tools.
Offers multiple integration points including a Python library, CLI, Docker container with UI, VS Code extension, and pre-commit hooks, as documented in the README.
Supports scanning of public/private GitHub and GitLab repos, pull requests, wiki pages, local files, and folders, covering a wide range of source types.
Groups similar discoveries to streamline manual assessment and reduce duplicate reviews, enhancing efficiency in post-scan analysis.
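To illustrate why grouping reduces review effort, here is a minimal sketch that buckets discoveries whose snippets are identical after whitespace normalization, so a reviewer can assess each bucket once. The dictionary shape (`snippet`, `file`, `line`) and the function name are hypothetical, and exact matching is a simplifying assumption; the similarity measure Credential Digger actually uses is not described here.

```python
from collections import defaultdict


def group_discoveries(discoveries):
    """Group discoveries by their whitespace-normalized snippet.

    `discoveries` is an iterable of dicts with hypothetical keys
    'snippet', 'file', and 'line'.
    """
    groups = defaultdict(list)
    for discovery in discoveries:
        # Collapse runs of whitespace so formatting differences do not split a group.
        key = " ".join(discovery["snippet"].split())
        groups[key].append(discovery)
    return dict(groups)
```

A reviewer could then label one representative per group and apply that label to every member of the group.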
Native installation supports only Linux and macOS, so Windows users must rely on Docker, which adds overhead and complexity.
The Docker container requires at least 8 GB of free RAM, as noted in the README, making it unsuitable for lightweight or resource-constrained environments.
Requires initial setup of rules and a database (SQLite or PostgreSQL), and the ML models must be explicitly enabled for best results, which adds to deployment time.