Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.


Covid-19

Python

A cleaned and normalized time series dataset of global COVID-19 confirmed cases, deaths, and recoveries, updated daily.

1.2k stars · 604 forks · 0 contributors

What is Covid-19?

The COVID-19 dataset is a curated collection of time series data tracking the global spread of Coronavirus disease 2019. It provides daily updated figures on confirmed cases, reported deaths, and reported recoveries, broken down by country and, where available, by subregion. The dataset addresses the need for a reliable, cleaned, and normalized source of pandemic statistics for analysis and research.

Target Audience

Data scientists, researchers, public health analysts, and journalists who need structured, up-to-date COVID-19 data for modeling, visualization, or reporting.

Value Proposition

Developers choose this dataset because it offers cleaned, normalized, and well-documented data derived from authoritative sources like Johns Hopkins University, saving time on data wrangling and ensuring consistency for time-series analysis.

Overview

Novel Coronavirus 2019 time series data on cases

Use Cases

Best For

  • Tracking the global spread of COVID-19 over time
  • Building dashboards or visualizations of pandemic metrics
  • Conducting epidemiological research and modeling
  • Comparing COVID-19 impact across different countries and regions
  • Analyzing trends in cases, deaths, and recoveries
  • Integrating pandemic data into data science projects or reports
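
As a minimal sketch of the trend and cross-country comparisons listed above, the example below derives daily increases from cumulative confirmed counts. It assumes the figures are cumulative totals; the record layout, country names, and values are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical cumulative records: (ISO date, country, confirmed so far).
RECORDS = [
    ("2020-03-01", "Exampleland", 10),
    ("2020-03-02", "Exampleland", 15),
    ("2020-03-03", "Exampleland", 27),
    ("2020-03-01", "Samplestan", 4),
    ("2020-03-02", "Samplestan", 4),
    ("2020-03-03", "Samplestan", 9),
]


def daily_new_cases(records):
    """Convert cumulative counts into per-day increases for each country."""
    by_country = defaultdict(list)
    for date, country, confirmed in records:
        by_country[country].append((date, confirmed))

    result = {}
    for country, series in by_country.items():
        series.sort()  # ISO dates sort chronologically as strings
        increases = []
        previous = None
        for date, confirmed in series:
            if previous is not None:
                # Clamp at zero: an upstream correction can lower a total.
                increases.append((date, max(confirmed - previous, 0)))
            previous = confirmed
        result[country] = increases
    return result


new_cases = daily_new_cases(RECORDS)
for country, series in sorted(new_cases.items()):
    print(country, series)
```

Clamping negative differences to zero is one possible design choice; an analysis that needs to surface upstream corrections might keep the negative values instead.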

Not Ideal For

  • Applications requiring real-time or intra-day data updates for live dashboards
  • Research needing individual-level or highly granular data like patient demographics
  • Projects focused on predictive analytics that require additional metrics such as vaccination rates or mobility data
  • Teams wanting a fully hosted API for immediate querying without local data processing

Pros & Cons

Pros

Daily Updated Time Series

Provides consistent, date-stamped records tracking the pandemic's evolution globally, with data updated daily from authoritative sources like Johns Hopkins University.

Global and Subregional Coverage

Includes data from over 100 countries and territories, with subregional breakdowns where available, enabling detailed geographic analysis and comparisons.

Cleaned and Normalized Data

Raw data is processed to tidy dates, consolidate files, and ensure consistency, saving significant time on data wrangling for researchers and analysts.
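
To give a rough idea of what tidying dates and consolidating into one long table can involve, the sketch below reshapes a small wide-format table (one column per date) into one record per region and date, with ISO-formatted dates. The column names, the `m/d/yy` header format, and the sample values are assumptions made for illustration; this is not the project's actual processing script.

```python
import csv
import io
from datetime import datetime

# Hypothetical wide-format input: one row per region, one column per date.
WIDE_CSV = """Province/State,Country/Region,1/22/20,1/23/20,1/24/20
,Exampleland,0,2,5
North,Samplestan,1,1,3
"""

ID_COLUMNS = ("Province/State", "Country/Region")


def wide_to_long(text):
    """Reshape wide rows into one record per (region, date) with ISO dates."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            # Assumed header format m/d/yy, e.g. "1/22/20" -> "2020-01-22".
            date = datetime.strptime(column, "%m/%d/%y").date().isoformat()
            records.append({
                "Date": date,
                "Country": row["Country/Region"],
                "Province": row["Province/State"] or None,
                "Confirmed": int(value),
            })
    # Sort so each region's series is contiguous and chronological.
    records.sort(key=lambda r: (r["Country"], r["Province"] or "", r["Date"]))
    return records


long_rows = wide_to_long(WIDE_CSV)
```

The long layout keeps one observation per row, which tends to be easier to filter, group, and join than a table that grows a new column every day.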

Multiple Format Availability

Available in CSV and JSON formats, making it easy to integrate into various tools and workflows, such as Python projects or visualization dashboards.
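
The sketch below shows one way the same records could be read from either format using only the Python standard library. The field names and sample values are hypothetical and stand in for whatever the published files actually contain.

```python
import csv
import io
import json

# Hypothetical sample records; field names are illustrative assumptions.
CSV_TEXT = """Date,Country,Confirmed,Recovered,Deaths
2020-03-01,Exampleland,10,1,0
2020-03-02,Exampleland,15,2,1
"""

JSON_TEXT = """[
  {"Date": "2020-03-01", "Country": "Exampleland",
   "Confirmed": 10, "Recovered": 1, "Deaths": 0},
  {"Date": "2020-03-02", "Country": "Exampleland",
   "Confirmed": 15, "Recovered": 2, "Deaths": 1}
]"""

COUNT_FIELDS = ("Confirmed", "Recovered", "Deaths")


def load_csv(text):
    """Parse CSV text, converting the count columns from strings to ints."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for field in COUNT_FIELDS:
            row[field] = int(row[field])
        rows.append(dict(row))
    return rows


def load_json(text):
    """Parse JSON text; numeric fields already arrive as ints."""
    return json.loads(text)


csv_rows = load_csv(CSV_TEXT)
json_rows = load_json(JSON_TEXT)
```

The main practical difference in this sketch is typing: CSV values arrive as strings and need explicit conversion, while JSON preserves numbers.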

Cons

Dependency on Upstream Sources

Data accuracy and timeliness depend on the upstream Johns Hopkins University repository, so delays or errors in the original health agency reports can carry through to this dataset.

Manual Processing Required

Local use requires installing Python dependencies and running scripts, adding setup complexity compared to plug-and-play hosted APIs or datasets.

Limited Metrics Scope

Covers only confirmed cases, deaths, and recoveries. Other key indicators, such as testing rates and hospitalization data, are not included, which limits comprehensive analysis.

Quick Stats

Stars: 1,157
Forks: 604
Contributors: 0
Open Issues: 33
Last commit: 1 month ago
Created: 2020

Tags

#data-cleaning #data-science #time-series-data #pandas #open-data #dataset #public-health

Built With

  • pandas
  • Python

Included in

Data Science · 28.8k