A framework enabling spatial data analysis within Hadoop ecosystems using Hive and SparkSQL.
Spatial Framework for Hadoop is an open-source library that enables spatial data analysis within Hadoop ecosystems. It provides User-Defined Functions and serialization tools for Hive and SparkSQL, allowing users to process and query geographic data at scale. The framework solves the problem of integrating geospatial analytics into big data workflows without requiring specialized standalone systems.
The framework targets data engineers and data scientists working with large-scale spatial data in Hadoop environments, particularly those using Hive or SparkSQL for analytics. It is also relevant for organizations with existing Esri/ArcGIS infrastructure looking to extend spatial capabilities to big data platforms.
Developers choose this framework because it provides native spatial functions within familiar Hadoop tools, avoiding the need for separate geospatial processing systems. Its tight integration with Esri's geometry standards ensures compatibility with ArcGIS workflows while leveraging Hadoop's distributed processing power.
The Spatial Framework for Hadoop allows developers and data scientists to use the Hadoop data processing system for spatial data analysis.
The JSON utilities specifically handle JSON exported from ArcGIS, making it easy to incorporate Esri data into Hadoop workflows without reformatting.
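As an illustrative sketch of how that incorporation looks in practice (the table name, columns, and file location are assumptions, and the SerDe and input-format class names should be checked against the installed release), an Esri JSON export can be mapped onto a Hive table roughly like this:

```sql
-- Hypothetical Hive DDL mapping an ArcGIS (Esri JSON) export onto a table.
-- Class names follow the project's conventions but may vary by version.
CREATE EXTERNAL TABLE counties (
  name STRING,
  population BIGINT,
  shape BINARY      -- geometry column populated by the SerDe
)
ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.EsriJsonSerDe'
STORED AS INPUTFORMAT 'com.esri.json.hadoop.EnclosedEsriJsonInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/data/counties-esri-json';
```

Once the table is declared, the exported features can be queried like any other Hive data, with the geometry column available to the spatial UDFs.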
Provides User-Defined Functions and SerDes for spatial analysis directly in Hive and SparkSQL, enabling native geospatial queries like ST_Intersects or ST_Buffer on large datasets.
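For example, the UDFs are typically registered in a Hive session and then used like built-in SQL functions. A minimal sketch (jar paths, table, and column names are illustrative assumptions; registration details depend on the release):

```sql
-- Register spatial UDFs from the framework's jars (paths are illustrative).
ADD JAR /path/to/esri-geometry-api.jar;
ADD JAR /path/to/spatial-sdk-hive.jar;

CREATE TEMPORARY FUNCTION ST_Point AS 'com.esri.hadoop.hive.ST_Point';
CREATE TEMPORARY FUNCTION ST_Buffer AS 'com.esri.hadoop.hive.ST_Buffer';
CREATE TEMPORARY FUNCTION ST_Intersects AS 'com.esri.hadoop.hive.ST_Intersects';

-- Find rows whose point location falls within 0.1 degrees of a query point.
SELECT id
FROM sensors
WHERE ST_Intersects(ST_Buffer(ST_Point(-122.4, 37.8), 0.1),
                    ST_Point(lon, lat));
```

The same functions are usable from SparkSQL once the jars are on the classpath, which is what lets existing SQL-based analytics pick up geospatial predicates without a separate system.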
Built on the Esri Geometry API for Java, ensuring accurate spatial operations and calculations that adhere to enterprise standards, as seen in the ST_Centroid fix in v2.1.
Supports Hive v1+ and SparkSQL, with ongoing updates like Hive v4 compatibility, allowing use across various Hadoop distributions.
Pre-built releases may not be on Maven Central, requiring manual builds or dependency management, as noted in README issue #123.
Ant build files are available but marked as legacy and likely to be removed, forcing users to migrate to Maven for future updates.
Workflows requiring MapReduce jobs need custom job authoring and deployment, adding overhead compared to drop-in solutions.