Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.

tech.ml.dataset

EPL-1.0 · Clojure

A high-performance, functional tabular data processing library for Clojure, similar to Python's Pandas or R's data.table.

746 stars · 34 forks · 0 contributors

What is tech.ml.dataset?

tech.ml.dataset is a Clojure library for tabular data processing that provides high-performance, functional alternatives to tools like Python's Pandas or R's data.table. It solves data-intensive problems on the JVM with efficient columnar storage and immutable datasets. The library focuses on pragmatic data work with abstractions that simplify implementing real-world solutions.

Target Audience

Clojure developers and data scientists working with tabular data on the JVM who need efficient, functional alternatives to Python or R tools. It's also suitable for Java developers via its Java API.

Value Proposition

Developers choose tech.ml.dataset for its functional design, which makes data transformations easier to reason about, and its high performance through memory-efficient columnar storage. It provides a pragmatic, JVM-native alternative to popular data processing libraries.

Overview

A high-performance data processing system for Clojure.

Use Cases

Best For

  • Processing large tabular datasets efficiently on the JVM
  • Implementing functional data pipelines in Clojure
  • Replacing Python Pandas or R data.table in Clojure projects
  • Performing data analysis with immutable datasets
  • Integrating tabular data processing into Java applications via the Java API
  • Building memory-efficient data processing systems with columnar storage

Not Ideal For

  • Projects not running on the JVM, such as those in Python or Node.js environments
  • Teams needing integrated data visualization or GUI tools without additional libraries
  • Developers already proficient in Pandas or data.table who prefer to avoid learning Clojure
  • Applications requiring real-time streaming data processing instead of batch operations

Pros & Cons

Pros

Memory-Efficient Storage

Uses columnar storage with primitive arrays and packed datetime types to significantly reduce memory footprint, as highlighted in performance benchmarks linked in the README.
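
To make the column-oriented layout concrete, here is a minimal sketch. It assumes the library is on the classpath under the Clojars coordinate techascent/tech.ml.dataset and that the tech.v3.dataset and tech.v3.datatype namespaces of recent releases are available; neither namespace is named on this page, and the sample data is hypothetical.

```clojure
;; Assumption: techascent/tech.ml.dataset is on the classpath.
(require '[tech.v3.dataset :as ds]
         '[tech.v3.datatype :as dtype])

;; Hypothetical sample data, supplied column by column.
(def readings
  (ds/->dataset {:sensor-id [1 2 3]
                 :celsius   [20.5 21.0 19.75]}))

;; Looking up a column by name returns the whole column, not a row.
(def celsius (ds/column readings :celsius))

;; Each column reports one element datatype shared by all of its values.
(println (dtype/elemwise-datatype celsius))
(println (ds/column-names readings))
```

Because every value in a column shares one datatype, numeric columns can be stored compactly instead of as one boxed object per cell, which is the memory saving described above.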

Functional Immutability

Datasets are immutable, making data transformations predictable and easier to debug compared to mutable alternatives like Pandas, which aligns with the library's functional design philosophy.
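
A short sketch of what immutability looks like in practice, under the same assumptions as any example for this library: the tech.v3.dataset namespace and its ->dataset, filter-column, and row-count functions come from recent releases and are not named on this page, and the order data is made up for illustration.

```clojure
;; Assumption: techascent/tech.ml.dataset is on the classpath.
(require '[tech.v3.dataset :as ds])

;; Hypothetical sample data, supplied as a sequence of row maps.
(def orders
  (ds/->dataset [{:id 1 :amount 10.0}
                 {:id 2 :amount 25.5}
                 {:id 3 :amount 7.25}]))

;; Filtering returns a new dataset; `orders` itself is not modified.
(def large-orders
  (ds/filter-column orders :amount #(> % 9.0)))

(println (ds/row-count orders))        ; the original still has 3 rows
(println (ds/row-count large-orders))  ; the filtered copy has 2 rows
```

Since each transformation yields a new value, intermediate results can be inspected or reused without worrying that a later step changed them.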

High Performance

Optimized for speed with independent benchmarks showing it competes well against tools like data.table and Pandas, as referenced in the related projects section.

Java Interoperability

Includes a full Java API and sample program, allowing seamless integration into Java-based applications without requiring deep Clojure knowledge.

Cons

Limited Cutting-Edge Features

The README acknowledges that an alternative API, tablecloth, offers some important extra features, which suggests that tech.ml.dataset on its own may lack some advanced capabilities.

Ecosystem Size

Compared to Python's Pandas, the Clojure data science ecosystem is smaller, which can mean fewer tutorials, community support, and third-party integrations for complex workflows.

Clojure Dependency

Requires familiarity with Clojure and functional programming paradigms, posing a significant barrier for teams accustomed to imperative languages like Python or R, despite the Java API.

Open Source Alternative To

tech.ml.dataset is an open-source alternative to the following products:

Pandas

Pandas is a fast, powerful, and flexible open-source data analysis and manipulation library for Python, built on top of NumPy.

data.table

data.table is an R package that provides an enhanced version of data.frame with fast aggregation, large dataset handling, and concise syntax.

Quick Stats

Stars: 746
Forks: 34
Contributors: 0
Open issues: 32
Last commit: 6 days ago
Created: 2019

Tags

#etl-pipeline #functional-programming #high-performance #data-science #dataframe #java #columnar-storage #csv #clojure #tabular-data #jvm #data-processing #datascience #data-analysis #dataset #xlsx #machine-learning

Built With

Clojure
JVM

Included in

Machine Learning (72.2k)

Related Projects

PigPen

Map-Reduce for Clojure

Stars: 565
Forks: 51
Last commit: 3 years ago