A composable and fully extensible C++ execution engine library for building high-performance data management systems.
Velox provides reusable, high-performance data processing components for building systems that serve analytical workloads such as batch, interactive, stream processing, and AI/ML. It takes fully optimized query plans as input and performs the computation they describe; it does not include higher-level layers such as SQL parsers or optimizers.
Velox is aimed at developers and engineers who build or optimize compute engines and data management systems, particularly those integrating an execution engine into a larger data platform.
Velox offers a modular, high-performance foundation that eliminates the need to rebuild core execution logic from scratch. Its extensibility allows for custom components, and its vectorized, Arrow-compatible design ensures efficient data processing across diverse workloads.
The Vector module provides an Arrow-compatible columnar memory layout with encodings like Flat and Dictionary, enabling efficient data interchange and lazy materialization, as documented in the developer guides.
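To make the difference between the Flat and Dictionary encodings concrete, here is a minimal sketch in plain C++. It does not use Velox's vector classes: `FlatColumn`, `DictionaryColumn`, `encodeDictionary`, and `decodeDictionary` are hypothetical names chosen for illustration. The idea is that a dictionary column stores each distinct value once and keeps one small index per row, so a single row can be read without decoding the whole column.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative only: these types are NOT Velox classes.
// A flat column stores one value per row.
using FlatColumn = std::vector<std::string>;

// A dictionary-encoded column stores each distinct value once, plus one
// integer index per row that points into the dictionary.
struct DictionaryColumn {
  std::vector<std::string> dictionary;  // distinct values
  std::vector<int32_t> indices;         // one entry per row

  std::size_t size() const { return indices.size(); }

  // Read a single row without decoding the rest of the column.
  const std::string& valueAt(std::size_t row) const {
    assert(row < indices.size());
    return dictionary[static_cast<std::size_t>(indices[row])];
  }
};

// Build a dictionary column from a flat one, assigning ids in order of
// first appearance.
DictionaryColumn encodeDictionary(const FlatColumn& flat) {
  DictionaryColumn out;
  std::unordered_map<std::string, int32_t> seen;
  out.indices.reserve(flat.size());
  for (const auto& value : flat) {
    const int32_t nextId = static_cast<int32_t>(out.dictionary.size());
    auto result = seen.emplace(value, nextId);
    if (result.second) {
      out.dictionary.push_back(value);  // first time this value appears
    }
    out.indices.push_back(result.first->second);
  }
  return out;
}

// Expand a dictionary column back into one value per row.
FlatColumn decodeDictionary(const DictionaryColumn& dict) {
  FlatColumn flat;
  flat.reserve(dict.size());
  for (std::size_t row = 0; row < dict.size(); ++row) {
    flat.push_back(dict.valueAt(row));
  }
  return flat;
}
```

Calling `decodeDictionary` only when a consumer actually needs flat data is a simplified picture of deferring materialization; how Velox itself does this is described in its developer guides.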
With a fully vectorized expression evaluation engine, Velox leverages SIMD instructions and optimized encodings to execute expressions efficiently on large datasets, as highlighted in the expression evaluation documentation.
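The following sketch shows the general shape of vectorized evaluation, under the assumption that a column is a contiguous array of values. It is not Velox's expression evaluator; `evalMulAdd`, `DoubleDictColumn`, and `squareDictionary` are hypothetical names. The first function processes a whole batch in one tight loop, which compilers can often turn into SIMD instructions. The second shows one way an encoding can save work: a function is applied once per distinct dictionary value instead of once per row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only: not Velox's expression evaluation API.

// Evaluate a * b + c for every row of three equally sized columns.
// The loop body has no branches or calls, so it is a good candidate for
// compiler auto-vectorization.
std::vector<double> evalMulAdd(const std::vector<double>& a,
                               const std::vector<double>& b,
                               const std::vector<double>& c) {
  assert(a.size() == b.size() && b.size() == c.size());
  const std::size_t n = a.size();
  std::vector<double> out(n);
  const double* pa = a.data();
  const double* pb = b.data();
  const double* pc = c.data();
  double* po = out.data();
  for (std::size_t i = 0; i < n; ++i) {
    po[i] = pa[i] * pb[i] + pc[i];
  }
  return out;
}

// A dictionary-encoded column of doubles (hypothetical type).
struct DoubleDictColumn {
  std::vector<double> dictionary;  // distinct values
  std::vector<int32_t> indices;    // one entry per row
};

// Square every row by squaring each distinct value once and reusing the
// row indices unchanged.
DoubleDictColumn squareDictionary(const DoubleDictColumn& in) {
  DoubleDictColumn out;
  out.indices = in.indices;
  out.dictionary.reserve(in.dictionary.size());
  for (double v : in.dictionary) {
    out.dictionary.push_back(v * v);
  }
  return out;
}
```

With a million rows and only a handful of distinct values, `squareDictionary` performs a handful of multiplications rather than a million, which is the kind of saving an encoding-aware evaluator can exploit.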
Velox allows developers to define custom types, functions, operators, file formats, and more, enabling deep integration into specialized data systems without reinventing core logic.
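As a rough picture of what function extensibility can look like in an engine, the sketch below keeps a name-to-function registry that callers can extend at runtime. This is an assumption-laden illustration, not Velox's registration mechanism: `FunctionRegistry`, `registerFunction`, `call`, and the `Column` alias are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Illustrative only: a hypothetical registry, not Velox's API.
using Column = std::vector<int64_t>;
using ScalarFunction = std::function<Column(const Column&)>;

class FunctionRegistry {
 public:
  // Register a function under a name. Returns false, leaving the existing
  // entry in place, if the name is already taken.
  bool registerFunction(const std::string& name, ScalarFunction fn) {
    assert(fn != nullptr);
    return functions_.emplace(name, std::move(fn)).second;
  }

  // Look up a function by name and apply it to a whole column.
  Column call(const std::string& name, const Column& input) const {
    auto it = functions_.find(name);
    if (it == functions_.end()) {
      throw std::invalid_argument("unknown function: " + name);
    }
    return it->second(input);
  }

 private:
  std::unordered_map<std::string, ScalarFunction> functions_;
};
```

The engine core only needs to know the `ScalarFunction` signature, so new functions can be added without touching the evaluation loop that calls them.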
It includes sets of vectorized scalar, aggregate, and window functions following Presto and Spark semantics, reducing implementation overhead for common data operations.
Velox provides primitives for memory arenas, buffer management, spilling, and caching, which help control memory and I/O usage in complex data pipelines, as detailed in the resource management guides.
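For readers unfamiliar with the term, a memory arena hands out memory by bumping a pointer inside large blocks and frees everything at once when the arena is destroyed. The sketch below is a minimal, generic arena written for illustration. The `Arena` class and its methods are hypothetical and do not reflect Velox's memory management classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Illustrative only: a generic bump allocator, not a Velox class.
class Arena {
 public:
  explicit Arena(std::size_t blockSize = 4096) : blockSize_(blockSize) {}

  // Return `bytes` of storage aligned to `alignment` (a power of two).
  // Memory is never freed individually; it is released when the arena
  // is destroyed.
  void* allocate(std::size_t bytes,
                 std::size_t alignment = alignof(std::max_align_t)) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (!blocks_.empty()) {
      if (void* p = tryAllocate(bytes, alignment)) {
        return p;
      }
    }
    // Start a new block big enough for this request plus alignment padding.
    capacity_ = std::max(blockSize_, bytes + alignment);
    blocks_.emplace_back(new unsigned char[capacity_]);
    used_ = 0;
    void* p = tryAllocate(bytes, alignment);
    assert(p != nullptr);
    return p;
  }

  std::size_t blockCount() const { return blocks_.size(); }

 private:
  // Try to carve the request out of the current block; nullptr if it
  // does not fit.
  void* tryAllocate(std::size_t bytes, std::size_t alignment) {
    unsigned char* base = blocks_.back().get();
    const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(base) + used_;
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    const std::uintptr_t aligned = (start + mask) & ~mask;
    const std::size_t padding = static_cast<std::size_t>(aligned - start);
    if (used_ + padding + bytes > capacity_) {
      return nullptr;
    }
    used_ += padding + bytes;
    return base + (used_ - bytes);
  }

  std::size_t blockSize_;
  std::size_t capacity_ = 0;  // size of the current block
  std::size_t used_ = 0;      // bytes consumed in the current block
  std::vector<std::unique_ptr<unsigned char[]>> blocks_;
};
```

Because allocation is a few integer operations and deallocation is a single bulk release, arenas suit workloads such as query execution, where many short-lived objects share one lifetime.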
Velox lacks a SQL parser, dataframe layer, and query optimizer, requiring users to provide fully optimized query plans, which adds significant overhead for teams without existing planning infrastructure.
Setting up Velox involves specific compiler versions (e.g., GCC 11 or Clang 15) and managing numerous dependencies via platform-specific scripts, which can be error-prone and time-consuming.
As a low-level C++ library, integrating Velox requires handling network serialization, I/O connectors, and custom extensions, demanding substantial development expertise and effort.
Velox is primarily a C++ library, so teams using other languages like Python or Java must build custom bindings, adding complexity compared to native solutions in those ecosystems.