An R package for detecting statistically significant breakpoints in time series using robust energy statistics.
BreakoutDetection is an R package that detects statistically significant breakouts (changes in mean or in distribution) in time series data. It uses the E-Divisive with Medians (EDM) algorithm, a non-parametric method based on energy statistics, to identify one or more breakpoints without assuming that the data are normally distributed. This makes it useful for locating where a series shifts to a new level or distribution, including series that are skewed or contain outliers, across a range of domains.
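To make "based on energy statistics" concrete, here is a minimal Python sketch of the classical E-Divisive split statistic (with alpha = 1) that EDM builds on. For each candidate split tau it compares the average absolute difference between the two sides with the average differences within each side, scales the result by m·n/(m+n), and keeps the split with the largest value. This is an illustration under stated assumptions, not the package's code: EDM replaces these mean-based averages with medians for robustness, which the sketch does not reproduce, and `energy_statistic` and `best_split` are hypothetical helper names.

```python
def energy_statistic(x, y):
    """Scaled two-sample E-statistic, alpha = 1 (classical mean-based form).

    Hypothetical helper; EDM's median-based statistic is NOT reproduced here.
    Larger values suggest x and y come from different distributions.
    Requires len(x) >= 2 and len(y) >= 2.
    """
    m, n = len(x), len(y)
    # Average absolute difference across the candidate split.
    between = sum(abs(a - b) for a in x for b in y) / (m * n)
    # Average absolute difference within each side, over distinct pairs.
    within_x = sum(abs(x[i] - x[j]) for i in range(m) for j in range(i + 1, m)) / (m * (m - 1) / 2)
    within_y = sum(abs(y[i] - y[j]) for i in range(n) for j in range(i + 1, n)) / (n * (n - 1) / 2)
    # Scale by m*n/(m+n) as in E-Divisive; the factor is largest when both sides are large.
    return (m * n / (m + n)) * (2 * between - within_x - within_y)


def best_split(series, min_size=2):
    """Scan every admissible split and return (tau, statistic) for the largest.

    tau is the index of the first observation after the candidate breakout;
    both sides keep at least min_size observations.
    """
    if min_size < 2 or len(series) < 2 * min_size:
        raise ValueError("need min_size >= 2 and len(series) >= 2 * min_size")
    best_tau, best_stat = None, float("-inf")
    for tau in range(min_size, len(series) - min_size + 1):
        stat = energy_statistic(series[:tau], series[tau:])
        if stat > best_stat:
            best_tau, best_stat = tau, stat
    return best_tau, best_stat


series = [10.0] * 20 + [15.0] * 20  # hypothetical data: the mean shifts at index 20
print(best_split(series, min_size=5))  # → (20, 100.0)
```

On this hypothetical two-level series the scan recovers the shift at index 20. Because every candidate split recomputes all pairwise differences, this naive scan costs O(n³) and is meant for short toy series only.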
It is aimed at data scientists, statisticians, and researchers who work with time series in fields such as A/B testing, behavioral analysis, econometrics, financial engineering, and the social sciences.
Developers choose BreakoutDetection for its robust, non-parametric approach, which tolerates non-normal real-world data, detects multiple breakouts, and checks each one with a permutation test, all behind a simple R interface.
Breakout Detection via Robust E-Statistics
Uses E-Divisive with Medians (EDM), which does not assume normally distributed data, making it effective on real-world, non-normal time series, as the README highlights.
Identifies more than one statistically significant breakout in a single time series, covering the common case where the data contain several shifts, as the README example with the Scribe dataset demonstrates.
Estimates the significance of each breakout with a permutation test, a core part of the package's methodology, so each reported breakout is backed by a significance estimate.
Easy installation via devtools and straightforward function calls like breakout(), with clear examples provided in the README for quick setup.
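The permutation test and multiple-breakout detection described above can be sketched in Python under explicit assumptions. The sketch shuffles the series `n_perm` times to estimate a p-value for the best split (shuffling destroys time order, so it simulates "no breakout"), and, when that split is significant at level `alpha`, records it and recurses into both halves, which is the "divisive" idea. It uses the classical mean-based energy statistic as a stand-in for EDM's median-based one, all function names are hypothetical, and the package's own multi-breakout search and tuning parameters are not reproduced.

```python
import random


def energy_statistic(x, y):
    """Scaled E-statistic, alpha = 1; mean-based stand-in for EDM's median-based one."""
    m, n = len(x), len(y)
    between = sum(abs(a - b) for a in x for b in y) / (m * n)
    within_x = sum(abs(x[i] - x[j]) for i in range(m) for j in range(i + 1, m)) / (m * (m - 1) / 2)
    within_y = sum(abs(y[i] - y[j]) for i in range(n) for j in range(i + 1, n)) / (n * (n - 1) / 2)
    return (m * n / (m + n)) * (2 * between - within_x - within_y)


def best_split(series, min_size):
    """Return (tau, statistic) for the split that maximizes the statistic."""
    best_tau, best_stat = None, float("-inf")
    for tau in range(min_size, len(series) - min_size + 1):
        stat = energy_statistic(series[:tau], series[tau:])
        if stat > best_stat:
            best_tau, best_stat = tau, stat
    return best_tau, best_stat


def permutation_p_value(series, observed, min_size, n_perm, rng):
    """Share of shuffled series whose best split scores at least `observed`."""
    shuffled = list(series)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random reordering: no breakout by construction
        if best_split(shuffled, min_size)[1] >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)  # +1 counts the observed ordering itself


def detect_breakouts(series, min_size=4, alpha=0.05, n_perm=49, seed=0):
    """Recursive bisection: keep the best split only if significant, then recurse."""
    if min_size < 2:
        raise ValueError("min_size must be at least 2")
    rng = random.Random(seed)  # fixed seed so p-values are reproducible
    found = []

    def recurse(start, end):
        segment = series[start:end]
        if len(segment) < 2 * min_size:
            return  # too short to hold two segments of min_size
        tau, stat = best_split(segment, min_size)
        if permutation_p_value(segment, stat, min_size, n_perm, rng) > alpha:
            return  # best split not significant: stop dividing this segment
        found.append(start + tau)
        recurse(start, start + tau)
        recurse(start + tau, end)

    recurse(0, len(series))
    return sorted(found)


data = [0.0] * 12 + [5.0] * 12 + [10.0] * 12  # hypothetical series with two mean shifts
print(detect_breakouts(data))  # → [12, 24]
```

Every permutation repeats a full scan, so the cost grows linearly with `n_perm` on top of the scan's own cost, which illustrates why permutation-based significance testing can become slow on long series.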
Requires R and devtools for installation, limiting its use in ecosystems based on other programming languages and adding setup overhead for non-R users.
Documentation is primarily via R help functions (e.g., help(breakout)), lacking detailed tutorials or advanced use cases, which may hinder onboarding.
Permutation testing typically reruns the breakout search on many shuffled copies of the data, so significance estimation can be slow on large datasets or when many breakouts are detected; this is inferred from the method rather than measured.