A Python library with fast C implementations for computing Dynamic Time Warping and other time series distances.
DTAIDistance is a Python library for computing distances between time series, with a primary focus on Dynamic Time Warping (DTW). It solves the problem of measuring similarity between temporal sequences that may vary in speed or timing, which is crucial in fields like speech recognition, bioinformatics, and financial analysis. The library provides both a pure Python reference implementation and a highly optimized C version for production use.
Data scientists, researchers, and engineers working with time series data who need efficient similarity measurement and clustering capabilities, particularly in domains like signal processing, pattern recognition, and machine learning.
Developers choose DTAIDistance for its exceptional performance (30-300x faster than pure Python DTW), seamless NumPy/Pandas integration, and comprehensive feature set including multi-dimensional support, pruning optimizations, and built-in clustering methods—all while maintaining an accessible Python API.
Time series distances: Dynamic Time Warping (fast DTW implementation in C)
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.
The C implementation provides 30-300x speedups over pure Python DTW, with OpenMP parallelization for distance matrix calculations, as benchmarked in the library.
Implements PrunedDTW and Euclidean upper bounds (use_pruning=True) to dramatically reduce computation time, based on cited research like Silva and Batista's work.
Handles N-dimensional time series through the dtw_ndim package, allowing analysis of complex data like sensor fusion, added in v2.3.7.
Includes hierarchical clustering methods that work directly with DTW distances and plotting utilities for warping paths and dendrograms, facilitating end-to-end analysis.
The README notes compilation can fail if C dependencies or OpenMP are missing, requiring manual troubleshooting and making installation tricky on some platforms.
Focuses primarily on DTW and related methods; lacks other time series distance measures or machine learning models found in broader libraries like tslearn.
While numpy is optional, core functions expect numpy arrays for optimal speed, and the C code is tightly coupled, which may hinder use in minimal environments.