Open-Awesome

spark-daria

MIT · Scala · v1.0.0

A Scala library providing essential Spark extensions, helper methods, and custom transformations to maximize developer productivity.

767 stars · 150 forks · 0 contributors

What is spark-daria?

spark-daria is a Scala library that extends Apache Spark with additional helper methods, extensions, and transformations to streamline data processing workflows. It solves common pain points in Spark development by providing more expressive APIs for column manipulation, DataFrame validation, and custom transformations. The library is designed to fill gaps in the native Spark API and promote cleaner, more maintainable code.

Target Audience

Scala developers and data engineers working with Apache Spark who want to write more idiomatic, readable, and efficient Spark code. It's particularly useful for teams building Spark applications that require reusable transformations and validation logic.

Value Proposition

Developers choose spark-daria for its curated set of extensions, which simplify common Spark operations, reduce boilerplate, and encourage consistent practices. The library emphasizes code readability and productivity, and its helpers are documented.

Overview

Essential Spark extensions and helper methods ✨😲

Use Cases

Best For

  • Adding idiomatic Scala extensions to Spark's Column and DataFrame APIs
  • Snake-casing column names in DataFrames for consistency
  • Validating DataFrame schemas with descriptive error messages
  • Removing whitespace or formatting strings in Spark columns
  • Calculating week or month boundaries in datetime columns
  • Converting DataFrame columns to Arrays or Maps for easier data access

Not Ideal For

  • Projects exclusively using PySpark without Scala integration
  • Teams prioritizing minimal dependencies and avoiding third-party Spark extensions
  • Applications requiring highly specialized, performance-critical UDFs beyond basic helpers
  • Environments with unsupported Spark or Scala versions not listed in the compatibility matrix

Pros & Cons

Pros

Idiomatic Scala Extensions

Adds methods such as `.isFalse` to Spark's Column class, enabling more expressive code that follows Scala conventions. The README contrasts this with the equivalent native Spark syntax.
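
The difference can be sketched as follows. This is a minimal sketch, assuming spark-daria is on the classpath and that its Column extensions come from `com.github.mrpowers.spark.daria.sql.ColumnExt`; the sample data and column names are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}
// Assumed import path for spark-daria's Column extensions.
import com.github.mrpowers.spark.daria.sql.ColumnExt._

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("isFalse-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data with no nulls, so both filters should agree.
val users = Seq(("alice", true), ("bob", false)).toDF("name", "is_active")

// Native Spark: compare the column against a literal.
val nativeInactive = users.where(col("is_active") === lit(false))

// With the extension: the same intent reads as a method on the column.
val dariaInactive = users.where(col("is_active").isFalse)
```

Null handling may differ between the two forms, so it is worth checking the library's documentation before relying on either for nullable columns.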

Useful Column Functions

Provides functions such as `removeAllWhitespace()` and datetime helpers (e.g., `beginningOfWeek`), simplifying common tasks without complex regex or manual calculations.
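
A short usage sketch for the whitespace helper, assuming it is exposed as `removeAllWhitespace` in `com.github.mrpowers.spark.daria.sql.functions` and accepts a Column. The DataFrame and the `compact_name` column are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
// Assumed import path; imported selectively to avoid clashing with Spark's own functions.
import com.github.mrpowers.spark.daria.sql.functions.removeAllWhitespace

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("removeAllWhitespace-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical input with leading, trailing, and inner whitespace.
val people = Seq("  John   Smith ").toDF("full_name")

// Strip every whitespace character without writing the regex by hand.
val cleaned = people.withColumn("compact_name", removeAllWhitespace(col("full_name")))
```

Without the helper, the same result would typically need a hand-written `regexp_replace` call.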

Reusable Custom Transformations

Includes transformations like `snakeCaseColumns()` that integrate with Spark's `DataFrame.transform()` method, promoting code reuse and maintainability in data pipelines.
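
The pattern behind such transformations can be sketched with plain Spark. The helper below, `withSnakeCasedColumns`, is a hypothetical stand-in written for illustration, not spark-daria's implementation, and its renaming rule (trim, lowercase, whitespace to underscores) is an assumption that may differ from the library's.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("transform-sketch")
  .getOrCreate()
import spark.implicits._

// A custom transformation: the second parameter list takes the DataFrame,
// so `withSnakeCasedColumns()` can be passed where DataFrame => DataFrame is expected.
def withSnakeCasedColumns()(df: DataFrame): DataFrame =
  df.columns.foldLeft(df) { (acc, name) =>
    acc.withColumnRenamed(name, name.trim.toLowerCase.replaceAll("\\s+", "_"))
  }

// Hypothetical input with inconsistent column names.
val raw = Seq(("Ada", "Lovelace")).toDF("First Name", "Last Name")

// Transformations written this way chain through DataFrame.transform.
val tidy = raw.transform(withSnakeCasedColumns())
```

Because each step is an ordinary function from DataFrame to DataFrame, steps can be unit-tested in isolation and chained with repeated `.transform(...)` calls.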

Descriptive DataFrame Validators

Validators throw clear error messages for missing columns or schema mismatches, improving debugging efficiency, as demonstrated with `validatePresenceOfColumns`.
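
A minimal sketch of column validation, assuming the validator is available as the `DataFrameValidator` trait in `com.github.mrpowers.spark.daria.sql` and that `validatePresenceOfColumns` takes a DataFrame and a sequence of required column names. `PipelineChecks` and the sample data are hypothetical.

```scala
import scala.util.Try
import org.apache.spark.sql.SparkSession
// Assumed import path for the validator trait.
import com.github.mrpowers.spark.daria.sql.DataFrameValidator

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("validator-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical holder object that mixes in the validation methods.
object PipelineChecks extends DataFrameValidator

val teams = Seq(("bulls", "chicago")).toDF("team", "city")

// All required columns are present, so this should succeed.
val present = Try(PipelineChecks.validatePresenceOfColumns(teams, Seq("team", "city")))

// "sport" is absent, so the validator is expected to throw.
val missing = Try(PipelineChecks.validatePresenceOfColumns(teams, Seq("team", "sport")))

// Print whatever message the library attaches to the failure.
missing.failed.foreach(e => println(e.getMessage))
```

Running the check at the top of a transformation makes a schema problem fail early, before any expensive Spark job starts.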

Cons

Version Fragmentation Headaches

Requires choosing a release that matches both your Spark version (2 or 3) and your Scala version (2.11, 2.12, or 2.13). Releases are listed separately for each combination, which complicates dependency management and upgrades.
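
In an sbt build, the version matching might look like the sketch below. The Scala and Spark versions are placeholders chosen for illustration, and the `com.github.mrpowers` coordinates are an assumption based on the library's package name; check the project's release listings for the combination that fits your cluster.

```scala
// build.sbt (sketch; versions below are assumptions, not recommendations)
scalaVersion := "2.12.18"

libraryDependencies ++= Seq(
  // Spark is usually supplied by the cluster, hence "provided".
  "org.apache.spark" %% "spark-sql" % "3.3.2" % "provided",
  // %% appends the Scala binary version, so the artifact must exist for it.
  "com.github.mrpowers" %% "spark-daria" % "1.0.0"
)
```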

Limited Advanced Features

Focuses on common gaps but may not cover edge cases or highly complex transformations, forcing users to supplement with custom code for specialized needs.

Publishing and Contribution Barriers

The publishing process involves GPG and Sonatype setup, which can be cumbersome for contributors, and the project relies on community support for maintenance.

Quick Stats

Stars: 767
Forks: 150
Contributors: 0
Open Issues: 15
Last commit: 8 months ago
Created: 2017

Tags

#apache-spark #spark #scala-library #dataframe #data-engineering #developer-productivity #spark-sql #big-data

Built With

  • sbt
  • Scala
  • Apache Spark

Included in

Apache Spark (1.9k)

Related Projects

quinn

pyspark methods to enhance developer productivity 📣 👯 🎉

Stars: 687
Forks: 95
Last commit: 1 year ago

Joblib Apache Spark Backend

Stars: 250
Forks: 24
Last commit: 2 months ago