Interactive visualization tool for monitoring Hadoop HDFS cluster usage and file storage efficiency.
HDFS-DU is an open-source visualization tool for Hadoop HDFS clusters. It provides interactive web-based visualizations to monitor folder sizes, track usage trends, and identify inefficient file storage, such as directories with too many small files. The tool helps administrators optimize storage and understand data distribution across the cluster.
Hadoop administrators, data engineers, and DevOps teams managing HDFS clusters who need to visualize storage usage and improve efficiency.
Developers choose HDFS-DU for its intuitive, interactive visualizations that go beyond command-line tools, offering a clear view of storage patterns and inefficiencies from fsimage snapshots through a lightweight web UI.
Visualize your HDFS cluster usage
Offers tree-map and file-tree visualizations with click-to-drill navigation, enabling intuitive exploration of HDFS folder hierarchies.
Provides two layout toggles where node area represents total file size or descendant count, allowing analysis of both storage volume and file distribution patterns.
Colors nodes based on size per descendant to visually flag folders with many small files, helping administrators quickly spot inefficient storage.
Serves as a lightweight web UI accessible via browser after starting the local server, facilitating easy sharing and use across teams.
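The two layout metrics and the size-per-descendant coloring described above can be illustrated with a minimal Python sketch. This is not HDFS-DU's own code: the `Node` class, the function names, and the 16 MiB threshold are all hypothetical, chosen only to show how total size, descendant count, and their ratio relate for a folder tree.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: a file (size > 0, no children) or a folder."""
    name: str
    size: int = 0  # bytes; folders carry no size of their own in this sketch
    children: list[Node] = field(default_factory=list)


def summarize(node: Node) -> tuple[int, int]:
    """Return (total_bytes, descendant_count) for the subtree under node."""
    total, count = node.size, 0
    for child in node.children:
        child_total, child_count = summarize(child)
        total += child_total
        count += 1 + child_count  # the child itself plus its descendants
    return total, count


def size_per_descendant(node: Node) -> float:
    """Average bytes per descendant; a leaf simply reports its own size."""
    total, count = summarize(node)
    return total / count if count else float(total)


# Assumed cutoff for illustration only, not a value taken from HDFS-DU.
SMALL_FILE_THRESHOLD = 16 * 1024 * 1024


def is_small_file_hotspot(node: Node, threshold: int = SMALL_FILE_THRESHOLD) -> bool:
    """Flag folders whose descendants are, on average, smaller than threshold."""
    _, count = summarize(node)
    return count > 0 and size_per_descendant(node) < threshold


MIB = 1024 * 1024
logs = Node("logs", children=[Node(f"part-{i}", size=MIB) for i in range(4)])
warehouse = Node("warehouse", children=[Node("table.parquet", size=512 * MIB)])

for folder in (logs, warehouse):
    total, count = summarize(folder)
    print(folder.name, total, count, is_small_file_hotspot(folder))
```

In this sketch the `logs` folder, holding four 1 MiB files, is flagged, while `warehouse`, holding a single 512 MiB file, is not. A treemap would size each folder by either the first or the second value returned by `summarize`, and color it by the ratio.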
Generating custom data requires a multi-step process with Hadoop fsimage export, Pig processing, and Python post-processing, which the README admits is not streamlined.
Relies on JavaScript InfoVis Toolkit, an older library that may lack modern browser support or active updates compared to newer alternatives.
Visualizations are based on static fsimage snapshots, not live data, limiting real-time monitoring capabilities for dynamic clusters.