An open specification for storing geospatial vector data (points, lines, polygons) in the Apache Parquet columnar storage format.
GeoParquet is a community-driven specification that defines how to store geospatial vector data (points, lines, polygons) within Apache Parquet files. It standardizes geospatial data representation in Parquet to enhance interoperability across tools and advance cloud-native geospatial workflows. The specification provides a stable foundation for efficient geospatial analytics within modern, columnar data ecosystems.
Data engineers, data scientists, and geospatial analysts working with cloud data warehouses (like BigQuery, Snowflake, Redshift) or columnar data processing frameworks who need to store and analyze geospatial vector data efficiently. It is also for developers building geospatial tools and libraries that require interoperable geospatial data storage.
Developers choose GeoParquet because it brings geospatial best practices to the widely adopted Parquet format, enabling high-performance, read-heavy analytic workflows with efficient compression and columnar storage. Its unique selling point is providing a standardized, interoperable specification that is supported by over 20 tools across 6 languages, fostering innovation in cloud-native and streaming vector workflows.
Specification for storing geospatial vector data (point, line, polygon) in Parquet
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.
Leverages Parquet's columnar design to achieve high compression ratios, significantly reducing disk space and network transfer costs as highlighted in the README.
Standardizes geospatial data exchange with support from over 20 tools across 6 languages, enhancing compatibility across cloud data warehouses and libraries.
Enables cheap column reads and efficient filtering via statistics, making it ideal for read-heavy scenarios in modern data ecosystems, per the README's feature list.
Supports both planar and spherical coordinate systems, aligning with major cloud platforms like BigQuery and Snowflake for seamless integration.
The specification explicitly notes that row-based formats are better for constant data updates, making GeoParquet unsuitable for write-heavy systems like transactional databases.
As a community-driven spec, actual features and performance depend on individual tool implementations, which can lead to inconsistencies or gaps in support.
Requires teams to integrate geospatial concepts with Parquet's columnar storage, adding a learning curve for those new to either domain.