An open-source malware analysis pipeline system that automates sample collection, processing, and JSON-based artifact storage.
Aleph is an open-source malware analysis pipeline system that automates the collection, processing, and storage of malware samples. It uses collectors to gather samples from sources like filesystems and email, runs analysis plugins to extract artifacts, and stores results as JSON in Elasticsearch for structured querying. The system replaces manual grep/regex workflows with an objective, scalable pipeline.
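The collect, analyze, and store-as-JSON flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Aleph's actual code: `collect_files`, `hash_plugin`, `size_plugin`, `process_sample`, and `run_pipeline` are hypothetical names, and the resulting JSON strings are returned rather than sent to Elasticsearch.

```python
import hashlib
import json
from pathlib import Path


def collect_files(folder):
    """Hypothetical collector: yield every regular file under a folder."""
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            yield path


def hash_plugin(data):
    """Hypothetical plugin: hash artifacts computed from the raw sample bytes."""
    return {
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def size_plugin(data):
    """Hypothetical plugin: record the sample size in bytes."""
    return {"size": len(data)}


# Registry of plugins; each one contributes a named group of artifacts.
PLUGINS = {"hashes": hash_plugin, "size": size_plugin}


def process_sample(path):
    """Run every plugin on one sample and return a JSON-serializable dict."""
    data = Path(path).read_bytes()
    artifacts = {name: plugin(data) for name, plugin in PLUGINS.items()}
    return {"filename": Path(path).name, "artifacts": artifacts}


def run_pipeline(folder):
    """Collect samples, analyze them, and return one JSON document per sample."""
    return [
        json.dumps(process_sample(path), sort_keys=True)
        for path in collect_files(folder)
    ]
```

In a real deployment each JSON document would be indexed in Elasticsearch instead of being returned as a string.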
Aleph is aimed at security analysts, incident responders, and malware researchers who need to automate and scale malware sample analysis. It also suits organizations building internal threat intelligence platforms.
Developers choose Aleph for its modular, extensible design that supports custom collectors and plugins, its ability to process samples in parallel for efficiency, and its structured JSON output that enables powerful querying via Elasticsearch instead of manual text processing.
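As a rough illustration of why parallel processing helps with large sample volumes, the sketch below fans a batch of samples out to a pool of workers. It assumes a thread pool from Python's standard library and hypothetical `analyze` and `analyze_all` functions; it is not how Aleph's SampleManager is implemented.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def analyze(sample):
    """Hypothetical analysis step: summarize one (name, bytes) sample."""
    name, data = sample
    return {
        "name": name,
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
    }


def analyze_all(samples, workers=4):
    """Analyze samples concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze, samples))
```

The `workers` parameter stands in for the kind of configurable worker count the description refers to; the right value depends on the workload and the host.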
Supports extensible analysis with plugins like PEInfo and VirusTotal, enabling customizable artifact extraction as listed in the README.
Configurable SampleManager services process samples in parallel, improving throughput for large volumes as described in the 'How?' section.
Converts sample data to JSON and stores it in Elasticsearch, so results can be queried objectively instead of through manual grep workflows, which is a core philosophy of the project.
Collectors like FileCollector and MailCollector automate ingestion from sources such as filesystems and IMAP, reducing manual effort.
Requires manual setup of Elasticsearch, a JVM, a Python virtualenv, and working folders, and the README acknowledges that the code is incomplete and needs workaround steps.
Web interface ships with hard-coded default credentials (admin/changeme12!) that must be changed manually, which is a security risk if overlooked.
Tight coupling with Elasticsearch limits flexibility and adds overhead for environments not already using it, with no alternative storage options mentioned.
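To make the structured-querying benefit concrete, the sketch below builds an Elasticsearch query body that filters stored sample documents by hash and minimum size. The `bool`, `filter`, `term`, and `range` clauses are standard Elasticsearch query DSL, but the field names (`artifacts.hashes.sha256`, `artifacts.size.size`) and the `build_query` helper are assumptions for illustration, not Aleph's actual document schema.

```python
import json


def build_query(sha256=None, min_size=None):
    """Build a query body that matches samples on the given criteria."""
    filters = []
    if sha256 is not None:
        # Exact match on a hash field (hypothetical field name).
        filters.append({"term": {"artifacts.hashes.sha256": sha256}})
    if min_size is not None:
        # Numeric range on the sample size (hypothetical field name).
        filters.append({"range": {"artifacts.size.size": {"gte": min_size}}})
    return {"query": {"bool": {"filter": filters}}}


if __name__ == "__main__":
    # Print the body that would be sent to an index's search endpoint.
    print(json.dumps(build_query(min_size=1024), indent=2))
```

A query like this replaces a chain of grep and regex passes over raw text output with a single declarative filter.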