A native Go client library and command-line tool for HDFS that connects directly to the namenode via protocol buffers.
HDFS for Go is a native Go client library and command-line tool for interacting with the Hadoop Distributed File System (HDFS). It connects directly to the namenode using the protocol buffers API, providing a fast and idiomatic alternative to Java-based HDFS clients. The library implements interfaces from Go's standard `os` package, making it familiar to Go developers.
Go developers and data engineers who need to interact with HDFS in their applications or workflows, especially those looking for a performant, native alternative to the Hadoop Java client.
Developers choose HDFS for Go because it offers significantly faster performance than `hadoop fs` by avoiding JVM startup overhead, provides an idiomatic Go API that mirrors the standard library, and includes a feature-rich command-line tool with bash tab completion.
A native Go client for HDFS
Avoids JVM startup overhead; benchmarks in the README show the command-line tool is over 100x faster than `hadoop fs` for operations like directory listing.
Mirrors Go's stdlib `os` package by implementing interfaces like `os.FileInfo`, making it intuitive for developers familiar with standard file operations.
Includes a comprehensive command-line tool with Unix-like verbs (ls, rm, mv) and bash tab completion for HDFS paths, enhancing workflow efficiency.
Supports Kerberos authentication using `kinit` and ccache files, aligning with enterprise security setups without additional configuration hassle.
The README states the project is seeking new maintainers as the original author no longer uses it in production, raising risks for future updates and bug fixes.
Supports only version 9 of the HDFS protocol, which may lack features or optimizations from newer protocol versions, potentially limiting compatibility with the newest Hadoop distributions.
Requires setting environment variables like HADOOP_HOME and HADOOP_CONF_DIR for proper setup, which can be cumbersome in containerized or multi-cluster environments.
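In practice, this setup usually amounts to exporting the Hadoop configuration location before invoking the tool. The variable names below come from the point above; the paths are illustrative, not prescribed:

```shell
# Point the client at the cluster configuration (paths are illustrative).
export HADOOP_CONF_DIR=/etc/hadoop/conf   # directory containing core-site.xml / hdfs-site.xml
export HADOOP_HOME=/opt/hadoop            # fallback if HADOOP_CONF_DIR is unset
```

In containerized or multi-cluster environments, these variables typically have to be baked into the image or switched per invocation, which is the friction the point above describes.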