A suite of command-line tools for analyzing Apache Cassandra SSTables to optimize performance and troubleshoot data models.
Instaclustr SSTable Tools is a collection of command-line utilities for analyzing Apache Cassandra SSTables (Sorted String Tables). It helps database administrators and developers understand data distribution, identify performance bottlenecks like wide partitions, analyze tombstone impact, and optimize compaction strategies by providing detailed insights into the storage layer.
Built for Apache Cassandra database administrators, DevOps engineers, and backend developers who need to troubleshoot performance issues, optimize data models, and maintain large-scale Cassandra clusters.
These tools provide production-grade, deep inspection capabilities that are not available in Cassandra's standard tooling, enabling operators to identify problematic data patterns, tune compaction settings, and reclaim disk space efficiently.
Tools for working with sstables
The sstables command provides exhaustive metadata like timestamps, compaction levels, and partition stats, enabling precise tuning of compaction strategies beyond Cassandra's built-in tools.
The pstats command identifies the largest partitions and reports their distribution across SSTables with percentile statistics, which is crucial for pinpointing performance bottlenecks in production clusters.
The cfstats command offers cell-level details on rows, deletions, and tombstones, helping uncover anti-patterns such as wide partitions or excessive tombstones that degrade query performance.
The summary command displays repair percentages and last repaired timestamps per column family, aiding in consistency management for incremental repairs.
The purge command simulates compactions to estimate how much data could be reclaimed from tombstones, providing actionable insights for disk optimization without triggering actual compactions.
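As a rough illustration of the percentile statistics mentioned for pstats, the sketch below summarizes a list of partition sizes in Python. It does not call the tools: the sample sizes are invented, the function names are hypothetical, and the nearest-rank method is an assumption rather than a description of how pstats computes its figures.

```python
def nearest_rank_percentile(sorted_values, pct):
    """Return the nearest-rank percentile of a pre-sorted list.

    pct is an integer between 1 and 100. Integer arithmetic is used
    for the rank so there are no floating-point rounding surprises.
    """
    if not sorted_values:
        raise ValueError("no values to summarize")
    n = len(sorted_values)
    rank = max(1, (pct * n + 99) // 100)  # ceil(pct * n / 100)
    return sorted_values[rank - 1]


def summarize_partition_sizes(sizes_bytes, percentiles=(50, 75, 90, 95, 99)):
    """Build a dict of percentile labels plus the maximum partition size."""
    ordered = sorted(sizes_bytes)
    summary = {f"p{p}": nearest_rank_percentile(ordered, p) for p in percentiles}
    summary["max"] = ordered[-1]
    return summary


# Invented sample data: nine small partitions and one very wide one.
sample_sizes = [1_200, 3_400, 2_048, 950, 120_000_000,
                4_096, 8_192, 512, 65_536, 1_024]

summary = summarize_partition_sizes(sample_sizes)
for label, value in summary.items():
    print(f"{label:>4}: {value} bytes")
```

A large gap between the median and the upper percentiles, as in this sample, is the kind of skew that usually points to a few wide partitions rather than a uniformly heavy table.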
Installation requires manually copying JAR files into Cassandra's lib directories and placing scripts on the PATH, a process that is more error-prone than a package-manager install and assumes administrative access.
Tools like purge run simulated compactions that can be CPU- and memory-intensive, making them unsuitable for frequent use on loaded production nodes unless they are carefully scheduled.
The tools are command-line only, with no web interface or programmatic API, so users must parse the text output themselves and write custom scripts to feed results into monitoring systems.
The project uses separate Git branches for different Cassandra major versions (e.g., cassandra-4.1), requiring recompilation or different builds for version upgrades, adding maintenance overhead.
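The custom scripting mentioned above for parsing text output could look something like the following Python sketch. The sample text is a made-up stand-in, not actual output from any of the tools, and the column names and helper functions are hypothetical; a real integration would need to be adapted to whatever the installed version prints.

```python
def parse_table(text):
    """Parse a whitespace-delimited table whose first non-blank line is the header.

    Returns a list of dicts keyed by header name. Lines whose cell count
    does not match the header are skipped rather than guessed at.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        cells = line.split()
        if len(cells) != len(header):
            continue  # e.g. separators or footers that are not data rows
        rows.append(dict(zip(header, cells)))
    return rows


def oversized_partitions(rows, limit_bytes):
    """Return the keys of partitions larger than limit_bytes."""
    return [r["partition_key"] for r in rows if int(r["size_bytes"]) > limit_bytes]


# HYPOTHETICAL sample text, invented for this sketch.
SAMPLE_OUTPUT = """
partition_key  size_bytes  sstable_count
user:1001      120000000   4
user:2002      65536       1
"""

rows = parse_table(SAMPLE_OUTPUT)
alerts = oversized_partitions(rows, limit_bytes=100_000_000)
print(alerts)
```

Skipping malformed lines keeps the parser tolerant of banners or separators, at the cost of silently dropping rows if the format changes, so a production script would likely log what it skips.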