A curated collection of open data sources across government, academic, and private sectors for data science and research.
Data is a curated collection of open data sources compiled for data scientists, researchers, and developers. It provides organized access to datasets from government agencies, academic institutions, and private organizations across various domains. The project helps users discover machine-readable data for analysis, statistical modeling, and research projects.
It is aimed at data scientists, researchers, statisticians, and developers who need quality datasets for analysis, machine learning, or academic research. It is particularly useful for projects that require diverse, publicly available data sources.
This collection saves time by aggregating and categorizing open data sources from multiple domains in one place. Instead of searching scattered repositories, users get a curated, organized directory that follows open data principles and notes licensing and accessibility for its entries.
Open Data Sources
Aggregates sources from government, academia, NGOs, and the private sector, organized into dedicated sections such as Governmental Data, Academic Data, and Non-Governmental Org Data.
Includes editorial lists such as 'Interesting Data Sets for Statisticians' alongside carefully categorized sources, which helps users find relevant, accessible data.
Features international sources such as Data.gov (USA), Africa Open Data, and The World Bank, supporting cross-country research and analysis.
Lists unique datasets like 200,000+ Jeopardy questions and 10,000 annotated cat images, ideal for side projects and learning experiments.
Follows the Open Knowledge Foundation's Open Definition, emphasizing machine-readable formats and the right to reuse data without restrictive licensing.
The README is a static list; linked sources might be deprecated or unavailable, requiring users to manually verify data currency and links.
Only provides links to raw sources, so users must handle all data cleaning, formatting, and integration, which adds overhead for analysis.
Some entries, like the Pew Research Center data, are noted with 'license is not truly open, involves some limitations,' risking compliance issues for commercial use.
The directory is selective and may miss niche or emerging datasets, forcing users to search elsewhere for specialized needs.
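Because the list is static and links can go stale, a small script can automate the first pass of link verification. The sketch below is a minimal example, assuming the list is a Markdown README that uses inline links like `[Data.gov](https://www.data.gov/)`; the function names and the `User-Agent` string are hypothetical and not part of the project. It only checks that each URL still answers over HTTP, using the Python standard library.

```python
import http.client
import re
import urllib.error
import urllib.request

# Assumption: the list is a Markdown README using inline links like
# [Data.gov](https://www.data.gov/). Bare URLs are not matched.
LINK_PATTERN = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")


def extract_links(markdown_text):
    """Return unique http(s) URLs from inline Markdown links, in order of appearance."""
    urls = []
    for url in LINK_PATTERN.findall(markdown_text):
        if url not in urls:
            urls.append(url)
    return urls


def check_link(url, timeout=10.0):
    """Return the HTTP status for url, or None if no HTTP response was received."""
    for method in ("HEAD", "GET"):
        try:
            request = urllib.request.Request(
                url, method=method, headers={"User-Agent": "link-check-sketch"}
            )
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return response.status
        except urllib.error.HTTPError as err:
            # Some servers reject HEAD with 405; retry once with GET.
            if err.code == 405 and method == "HEAD":
                continue
            return err.code
        except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
            # Malformed URL, DNS failure, refused connection, or timeout.
            return None
    return None


def find_broken_links(markdown_text, timeout=10.0):
    """Return (url, status) pairs for links without a 2xx/3xx response."""
    broken = []
    for url in extract_links(markdown_text):
        status = check_link(url, timeout)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken
```

For example, `find_broken_links(open("README.md", encoding="utf-8").read())` would list unreachable entries from a local copy of the README (the filename is an assumption). A successful status only shows that a page still exists; it does not confirm that the dataset behind it is current or that its license still permits your intended use, so manual review is still needed.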