Code and data repository for reproducing the examples in the book 'Evidence-based Software Engineering', using publicly available data.
ESEUR-code-data is a repository containing all the code and datasets used in the book 'Evidence-based Software Engineering: based on the publicly available data'. It provides the complete analytical framework for reproducing the book's examples and exploring software engineering data. The project serves as a practical implementation of evidence-based research methodologies in software engineering.
Intended for software engineering researchers, data scientists in tech, academic instructors teaching software engineering, and practitioners interested in evidence-based approaches to software development.
This repository offers a unique, fully reproducible research package that bridges academic software engineering research with practical data analysis, providing verified datasets and analysis code that saves researchers time and ensures methodological consistency.
Code and data used to create the examples in "Evidence-based Software Engineering based on the publicly available data"
Includes all R scripts and compressed datasets needed to recreate every book example, so the published findings can be verified in full.
Aggregates publicly available software engineering datasets from various sources, pre-validated for analysis, saving time in data sourcing and cleaning.
Annual blog posts link to newly discovered data, keeping the repository current beyond the book's publication, as shown in the README with links from 2022 to 2025.
Provides a gallery of pre-generated plots and figures, offering immediate insights into data patterns without running code, accessible via the linked figures page.
Requires manually installing R packages and configuring environment variables; as the README notes, packages without available binaries can cause dependency problems.
Datasets are compressed, and some filenames are inconsistent (e.g., a missing .xz suffix), so files must be located and decompressed before direct use, which can be error-prone in large-scale analysis.
Heavily relies on the book for context; without it, users may struggle to understand the analysis goals, as the README provides minimal explanatory guidance.
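For the compressed-dataset issue noted above, one possible workaround when exploring the data outside the repository's own R scripts is a small helper that tolerates a missing or extra .xz suffix. The sketch below is in Python and is only an illustration: the function name open_dataset and the example filename are hypothetical, and it assumes the datasets are UTF-8 text files compressed with xz.

```python
import lzma
from pathlib import Path


def open_dataset(path):
    """Open a dataset as text, whether or not it is stored xz-compressed.

    Hypothetical helper: tries the path as given, then the same path with
    the '.xz' suffix added (or removed), and picks the reader from the
    file that actually exists. Assumes UTF-8 text content.
    """
    path = Path(path)
    if path.suffix == ".xz":
        candidates = [path, path.with_suffix("")]
    else:
        candidates = [path, path.with_name(path.name + ".xz")]

    for candidate in candidates:
        if candidate.exists():
            if candidate.suffix == ".xz":
                # lzma.open in text mode decompresses transparently.
                return lzma.open(candidate, mode="rt", encoding="utf-8")
            return open(candidate, mode="r", encoding="utf-8")

    tried = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"no dataset found; tried: {tried}")
```

The returned handle behaves like any text file, so it can be passed to csv.reader or read line by line, for example: with open_dataset("example.csv") as f: header = f.readline(). The filename here is a placeholder, not a file from the repository.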