A Python tool that generates YARA rules for malware detection by filtering out strings and opcodes that appear in goodware.
yarGen is a Python-based tool that generates YARA rules for detecting malware. It works by extracting strings and opcodes from malicious files, then filtering out those that also appear in a database of known goodware to produce high-fidelity detection signatures. The tool helps analysts quickly create rules that are less likely to trigger false positives.
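The extract-then-filter idea described above can be illustrated with a minimal sketch. This is not yarGen's actual implementation: the function names, the 6-character minimum length, and the tiny in-memory `goodware_db` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Printable ASCII runs of at least 6 characters (hypothetical threshold).
ASCII_RUN = re.compile(rb"[\x20-\x7e]{6,}")

def extract_strings(data: bytes) -> set:
    """Return the distinct printable ASCII strings found in a byte blob."""
    return {m.group().decode("ascii") for m in ASCII_RUN.finditer(data)}

def filter_goodware(candidates: set, goodware: Counter) -> list:
    """Keep only strings never seen in the goodware counter, longest first."""
    kept = [s for s in candidates if goodware[s] == 0]
    return sorted(kept, key=lambda s: (-len(s), s))

# Hypothetical goodware database: string -> number of benign files containing it.
goodware_db = Counter({"kernel32.dll": 5000, "GetProcAddress": 4200})

# Hypothetical sample bytes mixing common imports with one distinctive string.
sample = b"\x00kernel32.dll\x00GetProcAddress\x01evil_c2_beacon_v2\x00ab\x00"

rule_strings = filter_goodware(extract_strings(sample), goodware_db)
print(rule_strings)  # ['evil_c2_beacon_v2']
```

Only the string absent from the goodware counter survives, which is the property that keeps false positives low.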
yarGen is aimed at malware analysts, threat hunters, and incident responders who need to create custom YARA rules for detecting and tracking malware campaigns. It is also useful for security researchers building detection logic for threat intelligence platforms.
yarGen automates the most time-consuming part of YARA rule creation, filtering out benign strings, by checking candidates against extensive goodware databases. Its support for opcodes, super rules, and custom databases lets analysts tailor detection logic to specific environments.
yarGen is a generator for YARA rules.
Uses extensive databases to exclude common strings and opcodes from legitimate software, significantly reducing false positives in generated YARA rules.
Identifies similarities across multiple malware files to create combined rules that match broader threat families, improving detection coverage.
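One way to picture the super rule idea is to group samples by the strings they share. The sketch below is an assumption-laden illustration, not yarGen's algorithm: `find_super_rule_candidates`, the `min_shared` threshold, and the sample data are all hypothetical.

```python
from collections import defaultdict

def find_super_rule_candidates(sample_strings, min_shared=2):
    """Map each group of samples to the strings that all of them share."""
    # Invert the index: string -> set of samples containing it.
    owners = defaultdict(set)
    for sample, strings in sample_strings.items():
        for s in strings:
            owners[s].add(sample)
    # Group strings by the exact set of samples they appear in,
    # ignoring strings that occur in a single sample only.
    groups = defaultdict(set)
    for s, samples in owners.items():
        if len(samples) > 1:
            groups[frozenset(samples)].add(s)
    # Keep groups with enough shared strings to justify a combined rule.
    return {g: sorted(strs) for g, strs in groups.items() if len(strs) >= min_shared}

# Hypothetical goodware-filtered strings per sample.
samples = {
    "a.exe": {"mutex_X9", "c2.example.test", "only_in_a"},
    "b.exe": {"mutex_X9", "c2.example.test", "only_in_b"},
    "c.exe": {"unrelated"},
}
candidates = find_super_rule_candidates(samples)
```

Here `a.exe` and `b.exe` share two strings, so they form one candidate group, while `c.exe` would keep its own single-file rule.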
Extracts and analyzes code sections from PE files, adding instruction sequences to rules for enhanced specificity beyond just strings.
Allows creation and management of custom goodware databases for specific environments, such as Office software, tailoring rules to reduce context-specific false positives.
Monitors a folder for new samples, automatically generates rules, and cleans up files, enabling continuous and streamlined malware analysis workflows.
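The folder-monitoring workflow amounts to a process-then-delete pass over a directory. The sketch below shows one such pass under stated assumptions: `process_dropzone_once` and the `handle_sample` callback are hypothetical names, and the callback stands in for rule generation. A real monitor would repeat this pass on a timer.

```python
import os
import tempfile

def process_dropzone_once(folder, handle_sample):
    """Process every regular file in folder once, then delete it."""
    processed = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        with open(path, "rb") as fh:
            handle_sample(name, fh.read())
        os.remove(path)  # clean up so the sample is not processed twice
        processed.append(name)
    return processed

# Demonstration with a temporary folder and a stand-in handler.
with tempfile.TemporaryDirectory() as dropzone:
    with open(os.path.join(dropzone, "sample1.bin"), "wb") as fh:
        fh.write(b"MZ\x90\x00")
    seen = []
    done = process_dropzone_once(
        dropzone, lambda name, data: seen.append((name, len(data)))
    )
    remaining = os.listdir(dropzone)
```

Deleting each file right after it is handled keeps the pass idempotent, at the cost of losing the sample, so a copy should be kept elsewhere if it is still needed.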
Requires at least 4 GB of RAM (6 GB with opcodes) because it loads the entire goodware database into memory, which can be a bottleneck on resource-constrained systems.
Marked as 'Not Maintained' with the developer focusing on yarGen-Go, meaning bugs may not be fixed and new features are unlikely.
Generated rules often contain irrelevant strings and must be post-processed manually, as the README admits, which adds to analyst workload.
Involves downloading large databases (913 MB), installing Python dependencies, and managing multiple files, which can be cumbersome for quick deployment.