A Python library for Bayesian inference in Hidden Markov Models (HMMs) and Hidden semi-Markov Models (HSMMs) with nonparametric extensions.
pyhsmm is a Python library for Bayesian inference in Hidden Markov Models (HMMs) and Hidden semi-Markov Models (HSMMs). It supports unsupervised learning from time-series data by inferring hidden state sequences, transition dynamics, and model parameters, using Bayesian nonparametric methods such as the Hierarchical Dirichlet Process (HDP).
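The difference between the two model families is how long the hidden process stays in a state. In an HMM, persistence comes from a self-transition probability p, so state durations are geometric with mean 1/(1 - p). An explicit-duration HSMM instead draws each segment's length from a per-state duration distribution and only then jumps to a different state. The stdlib-only sketch below shows that generative process; the shifted-Poisson durations, unit-variance Gaussian observations, and all function names are illustrative assumptions, not pyhsmm's API.

```python
import math
import random


def sample_categorical(probs, rng):
    """Draw index i with probability probs[i]."""
    u = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_hsmm(trans, dur_rates, init, means, T, rng):
    """Sample (states, observations) from an explicit-duration HSMM.

    trans[i][i] is 0, so every jump lands in a different state and all
    persistence comes from the explicit duration draw (1 + Poisson(rate)).
    Observations are Gaussian with unit variance around means[state].
    """
    states = []
    state = sample_categorical(init, rng)
    while len(states) < T:
        duration = 1 + sample_poisson(dur_rates[state], rng)  # at least one step
        states.extend([state] * duration)
        state = sample_categorical(trans[state], rng)
    states = states[:T]  # the final segment is cut off at the sequence end
    obs = [rng.gauss(means[s], 1.0) for s in states]
    return states, obs
```

With duration rates of 10, 20, and 5, for example, the three states produce segments with expected lengths of 11, 21, and 6 steps, a pattern a single self-transition probability per state cannot shape beyond the geometric form.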
Researchers and data scientists working on time-series analysis, particularly those interested in Bayesian nonparametric methods, unsupervised learning, and flexible model selection for sequential data.
It provides a specialized implementation of HDP-HMM and HDP-HSMM with weak-limit approximations, offering automatic state count inference and extensible distributions, which is less common in general-purpose probabilistic programming libraries.
The inference it provides is approximate and unsupervised, its semi-Markov models are explicit-duration HSMMs, and its focus is on the Bayesian nonparametric HDP-HMM and HDP-HSMM, mostly handled with weak-limit approximations for scalable inference.
pyhsmm emphasizes Bayesian nonparametric approaches to model selection and uncertainty quantification, providing tools for flexible time-series modeling without pre-specifying the number of hidden states.
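Under the weak-limit approximation, the HDP is replaced by a finite model with L states (the role Nmax plays in pyhsmm's examples): global state weights beta ~ Dir(gamma/L, ..., gamma/L), and each transition row pi_j ~ Dir(alpha * beta). The stdlib-only sketch below draws from that prior; the function names and the "effective states" summary are illustrative assumptions. It shows why setting an upper bound does not fix the state count: the prior concentrates most of its mass on a few states, and the data decide how many get used.

```python
import random


def sample_dirichlet(alphas, rng):
    """Dirichlet draw via normalized Gamma(a, 1) variates."""
    # A tiny floor keeps gammavariate valid if a weight has underflowed to 0.
    draws = [rng.gammavariate(max(a, 1e-300), 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_weak_limit_hdp(L, gamma, alpha, rng):
    """Draw global weights and an L x L transition matrix from the weak-limit prior.

    beta ~ Dir(gamma/L, ..., gamma/L); row j is pi_j ~ Dir(alpha * beta).
    As L grows this approaches the HDP; a finite L keeps inference tractable.
    """
    beta = sample_dirichlet([gamma / L] * L, rng)
    pi = [sample_dirichlet([alpha * b for b in beta], rng) for _ in range(L)]
    return beta, pi


def effective_states(weights, mass=0.95):
    """Smallest number of states whose largest weights cover `mass` of the total."""
    running = 0.0
    for count, w in enumerate(sorted(weights, reverse=True), start=1):
        running += w
        if running >= mass:
            return count
    return len(weights)
```

For instance, `sample_weak_limit_hdp(50, 1.0, 6.0, random.Random(0))` typically puts 95% of the global weight on a handful of the 50 states; a larger gamma spreads mass over more states, and a larger alpha makes each transition row track beta more closely.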
Implements the HDP-HMM and HDP-HSMM, which infer the number of hidden states from the data instead of requiring it up front; in the basic example, Nmax only sets an upper bound, and the number of states actually used is learned.
Uses weak-limit approximations, which truncate the infinite HDP to a finite Dirichlet model with Nmax states, to make Bayesian nonparametric inference computationally feasible while still letting the data determine how many of those states are used.
Supports custom observation and duration distributions by implementing interfaces defined in basic/abstractions.py, allowing flexibility for various data types.
Allows learning from multiple observation sequences by adding each one to the model (e.g., with add_data), so that parameters are learned jointly across related sequences, such as repeated recordings of the same process.
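The last two points fit together. In a Gibbs-sampled model, an observation distribution roughly needs to score a data point, generate data, and resample its parameters from the observations currently assigned to it; when several sequences have been added, those assignments are pooled across all of them. The sketch below is a hypothetical, stdlib-only Poisson observation model with a conjugate Gamma prior. The class and the method names `log_likelihood`, `rvs`, and `resample` are assumptions chosen for illustration, so check basic/abstractions.py for the interface pyhsmm actually requires.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small rates."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class PoissonObservations:
    """Hypothetical Poisson observation distribution with a Gamma(shape, rate) prior.

    The method names mirror the general shape of pyhsmm-style distributions
    but are an assumption, not the library's actual interface.
    """

    def __init__(self, alpha_0=1.0, beta_0=1.0, rng=None):
        self.alpha_0 = alpha_0  # Gamma prior shape
        self.beta_0 = beta_0    # Gamma prior rate
        self.rng = rng or random.Random()
        # gammavariate takes a scale parameter, i.e. 1 / rate.
        self.lmbda = self.rng.gammavariate(alpha_0, 1.0 / beta_0)

    def log_likelihood(self, x):
        """log p(x | lambda) for a nonnegative integer count x."""
        return x * math.log(self.lmbda) - self.lmbda - math.lgamma(x + 1)

    def rvs(self, size):
        """Generate `size` counts from the current parameter."""
        return [sample_poisson(self.lmbda, self.rng) for _ in range(size)]

    def resample(self, data):
        """Gibbs step: draw lambda from its Gamma posterior.

        `data` is a list of sequences (e.g. the observations assigned to this
        state in every sequence added to the model). Their sufficient
        statistics are pooled, so all sequences inform one shared parameter.
        """
        n = sum(len(seq) for seq in data)
        total = sum(sum(seq) for seq in data)
        shape = self.alpha_0 + total
        rate = self.beta_0 + n
        self.lmbda = self.rng.gammavariate(shape, 1.0 / rate)
        return self
```

Pooling sufficient statistics (here just counts and sums) is what lets one parameter per state be shared across many sequences without concatenating them into one artificial series.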
The README warns that the package is no longer maintained, which means a risk of unfixed bugs, incompatibility with newer Python versions, and no further updates.
Requires Cython and a C++11 compiler such as gcc 4.7 or newer, as noted in the installation instructions, making setup non-trivial and error-prone on modern systems.
Advanced features, such as faster message passing methods for durations, are mentioned but not documented, hindering optimization and usability.
Relies on Gibbs sampling for inference, which can be computationally intensive and slow for large or high-dimensional datasets, limiting scalability.
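To make the cost concrete: a block Gibbs sampler for an HMM typically resamples each state sequence given the current parameters with forward filtering and backward sampling, at O(T N^2) per sequence of length T with N states, and then resamples the parameters given the states. The sketch below is a generic stdlib-only version for an HMM with precomputed likelihoods, not pyhsmm's implementation; explicit-duration HSMM messages also sum over possible durations, adding a term proportional to T * N * D, where D is the maximum duration considered.

```python
import math
import random


def sample_categorical(probs, rng):
    """Draw index i with probability probs[i]."""
    u = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def gaussian_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def ffbs(init, trans, likelihoods, rng):
    """Forward filtering, backward sampling for a discrete-state HMM.

    init[i]           : p(z_0 = i)
    trans[i][j]       : p(z_{t+1} = j | z_t = i)
    likelihoods[t][i] : p(x_t | z_t = i)
    Returns one draw of z_0..z_{T-1} from p(z | x).
    """
    T, N = len(likelihoods), len(init)
    # Forward pass: alphas[t][i] = p(z_t = i | x_0..x_t), normalized each step.
    alphas = []
    prev = None
    for t in range(T):
        if t == 0:
            pred = list(init)
        else:
            pred = [sum(prev[i] * trans[i][j] for i in range(N)) for j in range(N)]
        unnorm = [pred[j] * likelihoods[t][j] for j in range(N)]
        total = sum(unnorm)
        prev = [u / total for u in unnorm]  # normalizing avoids underflow
        alphas.append(prev)
    # Backward pass: sample the last state, then each z_t given z_{t+1}.
    states = [0] * T
    states[-1] = sample_categorical(alphas[-1], rng)
    for t in range(T - 2, -1, -1):
        nxt = states[t + 1]
        weights = [alphas[t][i] * trans[i][nxt] for i in range(N)]
        total = sum(weights)
        states[t] = sample_categorical([w / total for w in weights], rng)
    return states
```

A sampler repeats this pass for every sequence on every sweep and usually needs many sweeps before the chain mixes, which is where the scalability concern comes from.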