pyhsmm

MIT · Python

A Python library for Bayesian inference in Hidden Markov Models (HMMs) and Hidden semi-Markov Models (HSMMs) with nonparametric extensions.

578 stars · 178 forks · 0 contributors

What is pyhsmm?

pyhsmm is a Python library for Bayesian inference in Hidden Markov Models (HMMs) and Hidden semi-Markov Models (HSMMs). It supports unsupervised learning from time-series data by inferring hidden state sequences, transition dynamics, and model parameters, including Bayesian nonparametric variants based on the Hierarchical Dirichlet Process (HDP).

Target Audience

Researchers and data scientists working on time-series analysis, particularly those interested in Bayesian nonparametric methods, unsupervised learning, and flexible model selection for sequential data.

Value Proposition

It provides a specialized implementation of the HDP-HMM and HDP-HSMM using weak-limit approximations. The combination of automatic state-count inference and extensible distributions is less common in general-purpose probabilistic programming libraries.

Overview

pyhsmm is a Python library for approximate unsupervised inference in Bayesian Hidden Markov Models (HMMs) and explicit-duration Hidden semi-Markov Models (HSMMs). It focuses on Bayesian nonparametric extensions such as the HDP-HMM and HDP-HSMM, primarily using weak-limit approximations for scalable inference.

Key Features

  • Bayesian Nonparametric Models — Implements Hierarchical Dirichlet Process (HDP) priors for HMMs and HSMMs to infer the number of states automatically.
  • Weak-Limit Approximations — Uses computationally efficient approximations for inference in nonparametric models.
  • Gibbs Sampling — Performs approximate posterior inference via Gibbs sampling over latent state sequences, transition matrices, and parameters.
  • Extensible Distributions — Supports custom observation and duration distributions by implementing defined interfaces.
  • Multiple Data Sequences — Allows learning from multiple observation sequences by adding each to the model.
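To make the Gibbs sampling feature more concrete, the following is a minimal sketch of one block-Gibbs step for the hidden state sequence of a plain HMM with Gaussian emissions, using forward filtering and backward sampling. It is written against the Python standard library for illustration only; the function names (`sample_state_sequence`, `sample_index`, `gaussian_pdf`) and the toy parameters are assumptions and are not pyhsmm's API.

```python
import math
import random

def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def sample_index(weights, rng):
    """Draw an index with probability proportional to the given weights."""
    u = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if u < acc:
            return i
    return len(weights) - 1

def sample_state_sequence(obs, init, trans, means, stds, rng):
    """Sample a full state sequence given fixed HMM parameters (hypothetical helper)."""
    n_states = len(init)
    # Forward pass: filtered[t][k] = p(z_t = k | x_1..x_t), normalized at every step.
    filtered = []
    for t, x in enumerate(obs):
        if t == 0:
            pred = init
        else:
            prev = filtered[-1]
            pred = [sum(prev[j] * trans[j][k] for j in range(n_states))
                    for k in range(n_states)]
        alpha = [pred[k] * gaussian_pdf(x, means[k], stds[k]) for k in range(n_states)]
        norm = sum(alpha)
        filtered.append([a / norm for a in alpha])
    # Backward pass: sample the last state, then each earlier state given its successor.
    states = [0] * len(obs)
    states[-1] = sample_index(filtered[-1], rng)
    for t in range(len(obs) - 2, -1, -1):
        nxt = states[t + 1]
        weights = [filtered[t][j] * trans[j][nxt] for j in range(n_states)]
        states[t] = sample_index(weights, rng)
    return states

# Toy data: three points near 0 followed by three points near 5.
rng = random.Random(0)
obs = [0.1, -0.2, 0.0, 5.1, 4.9, 5.2]
states = sample_state_sequence(
    obs, init=[0.5, 0.5], trans=[[0.9, 0.1], [0.1, 0.9]],
    means=[0.0, 5.0], stds=[0.5, 0.5], rng=rng)
print(states)
```

In a full Gibbs sweep, a step like this would alternate with resampling the transition matrix and the emission parameters conditioned on the sampled states.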

Philosophy

pyhsmm emphasizes Bayesian nonparametric approaches to model selection and uncertainty quantification, providing tools for flexible time-series modeling without pre-specifying the number of hidden states.

Use Cases

Best For

  • Unsupervised segmentation of time-series data into hidden states
  • Modeling sequences with variable-duration hidden states (semi-Markov processes)
  • Bayesian nonparametric inference for automatic model complexity selection
  • Research in hierarchical Dirichlet process extensions for Markov models
  • Educational exploration of Gibbs sampling for HSMMs and HMMs
  • Analyzing multidimensional sequential data with unknown state persistence
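The variable-duration use case can be pictured with a small generative sketch of an explicit-duration HSMM: pick a state, draw how long it lasts, emit that many observations, then jump to a different state. The code below uses only the standard library. The shifted Poisson duration, the Gaussian emissions, and every name in it (`sample_hsmm`, `sample_poisson`, `dur_rates`) are illustrative assumptions, not pyhsmm's interface.

```python
import math
import random

def sample_poisson(rate, rng):
    """Draw a Poisson variate with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_hsmm(n_steps, trans, dur_rates, means, stds, rng):
    """Generate (states, observations) from a toy explicit-duration HSMM.

    trans[i][j] is the probability of moving from state i to state j at a
    segment boundary. Self-transitions are zero because persistence is
    modeled by the duration distribution instead.
    """
    n_states = len(trans)
    state = rng.randrange(n_states)
    states, obs = [], []
    while len(states) < n_steps:
        # Shifted Poisson so that every segment lasts at least one step.
        duration = 1 + sample_poisson(dur_rates[state], rng)
        # Truncate the final segment at the end of the sequence.
        duration = min(duration, n_steps - len(states))
        for _ in range(duration):
            states.append(state)
            obs.append(rng.gauss(means[state], stds[state]))
        state = rng.choices(range(n_states), weights=trans[state])[0]
    return states, obs

rng = random.Random(1)
trans = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
states, obs = sample_hsmm(
    50, trans, dur_rates=[8.0, 3.0, 5.0],
    means=[-4.0, 0.0, 4.0], stds=[0.5, 0.5, 0.5], rng=rng)
print(states)
```

Inference runs this story in reverse: given only the observations, it recovers plausible segment boundaries, states, and duration parameters.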

Not Ideal For

  • Production systems requiring actively maintained and well-supported libraries
  • Projects with large-scale or real-time data needing fast, optimized inference
  • Teams without expertise in Bayesian statistics or comfort with compiling C++ dependencies
  • Applications prioritizing user-friendly APIs and extensive documentation over low-level customization

Pros & Cons

Pros

Automatic State Inference

Implements the HDP-HMM and HDP-HSMM to infer the number of hidden states without specifying it in advance. In the basic example, Nmax only caps the number of states; how many are actually used is learned from the data.

Scalable Nonparametric Methods

Uses weak-limit approximations, which truncate the HDP at a finite maximum number of states, to make Bayesian nonparametric inference computationally feasible without fixing the state count in advance.
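As a rough illustration of the weak-limit idea, the sketch below draws a transition prior for a truncated HDP-HMM: a shared weight vector over `n_max` states is drawn from a symmetric Dirichlet, and each transition row is then drawn from a Dirichlet centered on those shared weights. This follows the textbook form of the approximation, not pyhsmm's source code. The names `gamma`, `alpha`, and `n_max` and the small concentration floor are assumptions made for this example.

```python
import random

def sample_dirichlet(concentrations, rng):
    """Sample from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentrations]
    total = sum(draws)
    return [d / total for d in draws]

def sample_weak_limit_transitions(n_max, gamma, alpha, rng):
    """Draw (beta, rows) from a weak-limit HDP-HMM transition prior.

    beta   ~ Dirichlet(gamma / n_max, ..., gamma / n_max)
    row_i  ~ Dirichlet(alpha * beta_1, ..., alpha * beta_n_max)
    """
    beta = sample_dirichlet([gamma / n_max] * n_max, rng)
    # The floor keeps every concentration strictly positive, which
    # random.gammavariate requires; it is a numerical guard for this sketch.
    row_concentrations = [max(alpha * b, 1e-9) for b in beta]
    rows = [sample_dirichlet(row_concentrations, rng) for _ in range(n_max)]
    return beta, rows

rng = random.Random(0)
beta, rows = sample_weak_limit_transitions(n_max=25, gamma=6.0, alpha=6.0, rng=rng)

# Most of the prior mass typically lands on a handful of states,
# which is how the truncation level can exceed the effective state count.
heavy = [k for k, b in enumerate(beta) if b > 0.05]
print(len(heavy), "states carry more than 5% of the shared weight")
```

Because every row shares the same `beta`, states that receive little global weight are rarely visited from anywhere, so the sampler can leave them unused.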

Extensible Architecture

Supports custom observation and duration distributions by implementing interfaces defined in basic/abstractions.py, allowing flexibility for various data types.
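A custom observation distribution for a Gibbs sampler generally needs three things: a way to draw samples, a log-likelihood, and a way to resample its parameters given the data currently assigned to it. The standalone sketch below shows that shape for a Poisson emission with a conjugate Gamma prior. The method names `rvs`, `log_likelihood`, and `resample` are assumptions chosen for illustration and should be checked against basic/abstractions.py; the class does not subclass anything from pyhsmm.

```python
import math
import random

class PoissonObservation:
    """Poisson emission with a conjugate Gamma(shape, rate) prior on its mean."""

    def __init__(self, prior_shape, prior_rate, rng):
        self.prior_shape = prior_shape
        self.prior_rate = prior_rate
        self.rng = rng
        self.resample([])  # start from a draw from the prior

    def log_likelihood(self, x):
        """Log probability of a single non-negative integer count x."""
        return x * math.log(self.lmbda) - self.lmbda - math.lgamma(x + 1)

    def rvs(self, size):
        """Draw `size` counts with Knuth's method (adequate for small means)."""
        threshold = math.exp(-self.lmbda)
        samples = []
        for _ in range(size):
            count, product = 0, self.rng.random()
            while product > threshold:
                count += 1
                product *= self.rng.random()
            samples.append(count)
        return samples

    def resample(self, data):
        """Gibbs update: draw the mean from its Gamma posterior given `data`."""
        shape = self.prior_shape + sum(data)
        rate = self.prior_rate + len(data)
        # random.gammavariate takes a scale parameter, hence 1 / rate.
        self.lmbda = self.rng.gammavariate(shape, 1.0 / rate)

dist = PoissonObservation(prior_shape=1.0, prior_rate=1.0, rng=random.Random(0))
dist.resample([4] * 200)  # pretend 200 counts of 4 were assigned to this state
print(round(dist.lmbda, 2), dist.log_likelihood(4))
```

A duration distribution would follow the same pattern, with support restricted to positive integers.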

Multiple Sequence Learning

Allows learning from multiple observation sequences by adding each to the model, useful for aggregated time-series data analysis.
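One way to see why multiple sequences can share a model: when parameters are resampled, the sufficient statistics are simply pooled across sequences, with no transition counted across a sequence boundary. The sketch below does this for transition counts and then draws each transition row from its Dirichlet posterior. It assumes the state sequences have already been sampled; the helper names and the symmetric prior are hypothetical and not taken from pyhsmm.

```python
import random

def count_transitions(state_sequences, n_states):
    """Pool transition counts over several state sequences."""
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in state_sequences:
        # zip stops at the end of each sequence, so boundaries are never bridged.
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def resample_transition_rows(counts, prior_concentration, rng):
    """Draw each row from Dirichlet(prior + counts) via normalized Gamma draws."""
    rows = []
    for row_counts in counts:
        draws = [rng.gammavariate(prior_concentration + c, 1.0) for c in row_counts]
        total = sum(draws)
        rows.append([d / total for d in draws])
    return rows

sequences = [[0, 0, 1], [1, 1, 0, 0]]
counts = count_transitions(sequences, n_states=2)
rows = resample_transition_rows(counts, prior_concentration=1.0, rng=random.Random(0))
print(counts)
print(rows)
```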

Cons

Abandoned Maintenance

The README warns that the package is no longer maintained, posing risks for bugs, compatibility issues with newer Python versions, and lack of updates.

Complex Installation

Requires Cython and a C++11 compiler such as GCC 4.7 or later, as noted in the installation instructions, which makes setup non-trivial and error-prone on modern systems.

Sparse Documentation

Advanced features, such as faster message passing methods for durations, are mentioned but not documented, hindering optimization and usability.

Performance Limitations

Relies on Gibbs sampling for inference, which can be computationally intensive and slow for large or high-dimensional datasets, limiting scalability.

Quick Stats

Stars: 578
Forks: 178
Contributors: 0
Open Issues: 39
Last commit: 1 year ago
Created: 2012

Tags

#probabilistic-modeling #python-library #hidden-markov-models #bayesian-inference #time-series #machine-learning #unsupervised-learning

Built With

Cython
Python
NumPy

Included in

Machine Learning (72.2k) · Data Science (3.4k)
Auto-fetched 1 day ago
