A Hadoop log aggregator and dashboard for visualizing cluster utilization across users.
White Elephant is a Hadoop log aggregator and dashboard that processes Hadoop job logs to visualize cluster utilization across users. It transforms raw log data into structured Avro format and provides interactive charts showing resource consumption patterns over time. The system helps administrators understand how Hadoop clusters are being used and identify optimization opportunities.
White Elephant is aimed at Hadoop administrators and data platform engineers who need to monitor and analyze cluster utilization across multiple users and jobs, and at organizations running Hadoop clusters that want visibility into resource consumption.
Unlike generic monitoring tools, White Elephant is built specifically for Hadoop log analysis, with aggregation and visualization built in. Its incremental processing keeps ongoing log analysis efficient, and its Avro-based data pipeline produces structured, queryable usage data.
The incremental job processes only new log data, as described in the job configuration files, which reduces overhead for ongoing analysis.
Supports monitoring multiple Hadoop clusters through configurable log paths in base.properties, enabling centralized management across environments.
Converts raw logs to Avro and builds a queryable data cube for analytics, which simplifies integration and querying, as outlined in the Hadoop jobs section.
Uses Rickshaw and D3.js to provide interactive charts for cluster usage insights, helping administrators visualize patterns over time with detailed graphs.
Supports only Hadoop 1.0.x. The README states that it does not work with Hadoop 2.0, which makes it unsuitable for modern Hadoop deployments.
Requires manual configuration of Hadoop JARs, keytabs for security, and WAR deployment to Tomcat, which can be cumbersome and error-prone for production environments.
Relies on an in-memory HyperSQL database, which may not scale to large datasets and lacks built-in persistence, potentially affecting performance and availability.
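To make the incremental-processing idea mentioned above more concrete, here is a minimal Python sketch of the general technique: remember which days have already been aggregated and skip them on later runs. This is not White Elephant's actual code or data model. The function name `process_new_logs`, the `(day, user, hours)` record layout, and the in-memory `processed_days` set are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def process_new_logs(log_records, processed_days, usage_by_user_day):
    """Aggregate only records from days that were not handled by an earlier run.

    log_records: iterable of (day, user, hours) tuples (hypothetical schema).
    processed_days: set of day strings already aggregated; updated in place.
    usage_by_user_day: dict mapping (user, day) -> total hours; updated in place.
    Returns the sorted list of days newly processed in this run.
    """
    new_days = set()
    for day, user, hours in log_records:
        if day in processed_days:
            continue  # this day was aggregated by a previous run, so skip it
        usage_by_user_day[(user, day)] += hours
        new_days.add(day)
    # Mark days as processed only after the whole batch has been read, so that
    # several records for the same new day are all counted within this run.
    processed_days.update(new_days)
    return sorted(new_days)


processed = set()
usage = defaultdict(float)

# Hypothetical sample data: the second batch repeats the first and adds a day.
first_run = [("2013-01-01", "alice", 2.0), ("2013-01-01", "bob", 1.5)]
second_run = first_run + [("2013-01-02", "alice", 3.0)]

days_first = process_new_logs(first_run, processed, usage)
days_second = process_new_logs(second_run, processed, usage)
```

In this sketch the second run touches only the new day, so totals for earlier days are not counted twice. A real pipeline would persist the processed-day marker between runs rather than keep it in memory.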