Define, run, and deploy big data applications on AWS, OpenStack, and local machines using Docker.
Ferry is a big data development environment that lets you define, run, and deploy big data applications using Docker. It simplifies the process of launching and managing clusters on platforms like AWS, OpenStack, and local machines, eliminating the need for manual infrastructure configuration.
Ferry is aimed at developers and data scientists who want to experiment with or develop big data applications without dealing with the intricacies of setting up and configuring technologies like Hadoop, Spark, or Cassandra.
Ferry provides a streamlined, Docker-based approach to big data cluster management, offering pre-built stacks, isolated environments, and easy sharing via Dockerfiles, which reduces setup time and complexity compared to manual configurations.
Uses YAML configuration files to define stacks, as shown in the README example with GlusterFS and YARN nodes, reducing manual infrastructure work.
Supports running clusters on AWS, OpenStack, and local machines, providing flexibility for development and testing across different environments.
Includes ready-to-use configurations for popular technologies like Hadoop, Spark, and Cassandra, saving time on installation and configuration.
Each cluster runs in its own Docker containers, so multiple clusters for different applications can run concurrently without interfering with one another.
Enables quick sharing and evaluation of applications using Dockerfiles, facilitating collaboration among developers and data scientists.
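To make the YAML stack definition mentioned above more concrete, here is a minimal Python sketch of what such a definition might look like once parsed into a dictionary. The key names (`backend`, `storage`, `compute`, `connectors`, `personality`, `instances`) and the `hadoop-client` connector are assumptions modelled on the GlusterFS and YARN example, not a verified Ferry schema, and `count_containers` is a hypothetical helper rather than part of Ferry.

```python
# Hypothetical stack definition, shaped like a parsed Ferry YAML file.
# All key names and personality values are assumptions for illustration.
stack = {
    "backend": [
        {
            "storage": {"personality": "gluster", "instances": 2},
            "compute": [{"personality": "yarn", "instances": 2}],
        }
    ],
    "connectors": [{"personality": "hadoop-client"}],
}


def count_containers(stack):
    """Return how many containers each personality would need.

    Entries without an explicit "instances" value count as one container.
    """
    counts = {}
    for backend in stack.get("backend", []):
        # A backend has at most one storage entry and a list of compute entries.
        nodes = [backend["storage"]] if "storage" in backend else []
        nodes += backend.get("compute", [])
        for node in nodes:
            name = node["personality"]
            counts[name] = counts.get(name, 0) + node.get("instances", 1)
    for connector in stack.get("connectors", []):
        name = connector["personality"]
        counts[name] = counts.get(name, 0) + connector.get("instances", 1)
    return counts


print(count_containers(stack))
```

A helper like this is only a way to reason about how large a stack will be before launching it; in practice the definition would live in a YAML file that Ferry reads directly.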
The README lists specific versions such as Hadoop 2.5.1 and Spark 1.1.0, which are dated releases and may lack features and security fixes found in newer versions.
Relies on Docker, which can introduce performance overhead and may not be suitable for environments where containerization is restricted or inefficient.
Only supports a fixed set of backends; adding new technologies requires creating Dockerfiles and configuration modules, which can be complex.
Requires Python and pip for installation, and the linked documentation may not cover every edge case, which can lead to setup issues.