A pure Python library for survival analysis, modeling time-to-event data with censoring.
Lifelines is a Python library for survival analysis, which models time-to-event data in which some observations may be censored (the event had not yet occurred when observation ended). It helps answer questions such as why some events occur sooner than others, for example when measuring lifetimes or time to a first action. Originally developed for medical and actuarial applications, it is now also used in SaaS, sociology, and A/B testing.
Data scientists, researchers, and analysts working with time-to-event data in fields like healthcare, SaaS, sociology, or inventory management who need to handle censored observations.
Lifelines provides a pure Python implementation of survival analysis with an intuitive API, comprehensive documentation, and support for various censoring types, making it easier to apply these methods outside traditional domains.
Survival analysis in Python
As a pure Python implementation, lifelines integrates seamlessly into Python data science workflows, avoiding the need to shell out to R or other languages; the project's stated philosophy emphasizes exactly this.
It handles right-censored, left-censored, and interval-censored data out of the box, which is crucial for accurate survival analysis in various domains like SaaS and inventory management.
The library provides detailed documentation and tutorials, including an intro to survival analysis, making it easier for newcomers to learn and apply the methods, as highlighted in the README.
Lifelines is designed for applications beyond traditional fields, with examples in SaaS, sociology, and A/B testing, broadening its utility for modern data science problems.
While lifelines includes key models like Cox proportional hazards, it lacks some advanced parametric and semi-parametric models available in R's survival ecosystem, which may limit specialized research.
Being pure Python, it may suffer from performance bottlenecks with very large datasets compared to optimized libraries written in C or C++, potentially requiring workarounds like sampling.
Installation requires numpy, pandas, and scipy, which can be cumbersome for lightweight or embedded environments and adds complexity to deployment.