Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.


Data Processing

81 projects

Showing 9 of 81 projects

samtools (C)

A suite of command-line tools for manipulating SAM, BAM, and CRAM files in next-generation sequencing data analysis.

#command-line-tools #sequencing-data #genomics
Stars: 1.9k · Forks: 609 · Last commit: 7 days ago

datatable (C++)

A high-performance Python package for fast, multi-threaded manipulation of large tabular datasets, inspired by R's data.table.

#data-science #multi-threading #dataframe
Stars: 1.9k · Forks: 167 · Last commit: 1 year ago

Apache Spark (Shell)

A curated list of awesome Apache Spark packages, libraries, and resources for data engineers and scientists.

#apache-spark #data-science #spark-ecosystem
Stars: 1.9k · Forks: 344 · Last commit: 1 month ago

Kiba (Ruby)

A Ruby framework for writing reliable, concise, and maintainable ETL (extract, transform, load) data processing jobs.

#rubydatascience #etl-ruby #ruby-gem
Stars: 1.8k · Forks: 90 · Last commit: 3 months ago

OpenXLSX (C++)

A C++ library for reading, writing, creating, and modifying Microsoft Excel .xlsx files.

#library #spreadsheet #cpp17
Stars: 1.7k · Forks: 384 · Last commit: 2 days ago

jql (Rust)

A fast, lightweight JSON Query Language CLI tool, written in Rust, for querying and transforming JSON data.

#json-query #shell-integration #tool
Stars: 1.7k · Forks: 32 · Last commit: 1 month ago

flow (Elixir)

An Elixir library for computational parallel flows, built on top of GenStage, for processing collections in parallel.

#stream-processing #functional-programming #parallel-computing
Stars: 1.6k · Forks: 89 · Last commit: 1 year ago

hashcat-utils (C)

A collection of small, chainable command-line utilities for advanced password cracking operations.

#penetration-testing #security-tools #password-analysis
Stars: 1.6k · Forks: 400 · Last commit: 5 months ago

Optimus (Python)

A Python library for agile data preparation workflows that works with pandas, Dask, cuDF, Dask-cuDF, Vaex, and PySpark.

#data-cleaning #cudf #spark
Stars: 1.5k · Forks: 232 · Last commit: 1 year ago

Page 3 of 3

Related Tags

#Big Data (16)
#Json (13)
#Python (12)
#Rust (11)
#Performance (11)
#Machine Learning (10)
#Csv (10)
#Functional Programming (9)
#Dataframe (8)
#Stream Processing (8)
#Data Science (8)
#Data Analysis (8)