A Jupyter Notebook kernel for interactive data exploration and analysis using Apache Spark with Scala.
Apache Toree is a Jupyter Notebook kernel that enables interactive data exploration and analysis using Apache Spark with the Scala programming language. It provides an interface for clients to send code snippets that are executed on a Spark cluster, allowing for real-time data processing, job execution, and result collection directly within Jupyter notebooks.
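As a rough illustration of the workflow described above, the following is a minimal sketch of a cell that might be run in a Toree notebook. It assumes the kernel has already bound a SparkContext to the name `sc`, which is how Toree sessions are normally set up. The variable names `numbers`, `evens`, `evenCount`, and `firstFive` are hypothetical and chosen only for this example.

```scala
// Assumption: this runs as a cell in a Toree notebook, where the kernel
// exposes a ready-to-use SparkContext as `sc`. It is not a standalone program.

// Distribute a local range across the cluster as an RDD.
val numbers = sc.parallelize(1 to 100)

// Transformations are lazy: nothing is executed on the cluster yet.
val evens = numbers.filter(_ % 2 == 0)

// Actions trigger a Spark job and return results to the notebook (the driver).
val evenCount = evens.count()   // number of even values, as a Long
val firstFive = evens.take(5)   // a small sample collected to the driver

println(s"even values: $evenCount, sample: ${firstFive.mkString(", ")}")
```

The cell keeps the collected result small (`count` and `take` rather than a full `collect`), which is usually the safer choice when exploring data interactively, since everything returned by an action has to fit in the driver's memory.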
Data scientists, data engineers, and developers who need interactive, scalable data analysis and exploration using Apache Spark within Jupyter Notebook environments.
Developers choose Apache Toree because it connects Jupyter Notebooks to Apache Spark, so interactive, distributed data processing happens inside the familiar notebook interface. It implements Jupyter Protocol 5.0 and offers an optional Spark monitoring plugin, which makes it an extensible base for big data exploration.
Mirror of Apache Toree (Incubating)
Enables direct execution of Spark jobs and result collection from Jupyter notebooks, as outlined in the README for interactive data exploration and prototyping.
Built around Scala, the language Spark itself is written in, so notebook code calls Spark's APIs directly without a cross-language translation layer.
Implements Jupyter Protocol 5.0, which keeps the kernel compatible with Jupyter/IPython releases that use that protocol version.
Spark monitor plugin offers enhanced application monitoring, though it requires separate JAR configuration as detailed in the usage section.
Supports only Scala and excludes popular languages such as Python and R, which limits its use for teams with mixed data science toolchains.
Requires sbt, make, and Docker for building and packaging, as noted in the development section, adding setup overhead compared to simpler kernels.
Spark monitor plugin is not included by default and needs manual configuration via --magic-url, introducing deployment complexity and potential errors.
The README mentions ongoing documentation enhancements, indicating current guides might be sparse or outdated for advanced use cases.