A weekly social data project providing real-world datasets for practicing data tidying, visualization, and analysis.
TidyTuesday is a weekly social data project that provides curated, real-world datasets for people to practice and learn data science skills. It releases new datasets every Monday, encouraging participants to explore the data, create visualizations or models, and share their work with the community using dedicated hashtags. The project emphasizes hands-on learning through data tidying and visualization techniques while cautioning against drawing causal conclusions.
TidyTuesday is aimed at data science learners, students, and practitioners who want to practice data analysis, visualization, and modeling with real-world data in languages like R, Python, or Julia. Academic instructors also use it to bring practical datasets into their courses.
Developers choose TidyTuesday for its consistent, community-driven supply of diverse, real-world datasets that are ready for analysis, its supportive multi-language community for sharing and feedback, and its educational focus on practical skill-building over theoretical exercises.
Official repo for the #tidytuesday project
New, curated real-world datasets are released every Monday, providing regular practice opportunities as highlighted in the weekly schedule table in the README.
Explicitly supports R, Python, and Julia with dedicated hashtags and resources like Posit's PydyTuesday repo, encouraging broad participation across different programming ecosystems.
Used in over 30 courses in 2024, and the README's goals section sets out clear aims for academic use, making it well suited to structured learning environments.
Encourages creating visualizations, models, Quarto reports, or Shiny apps, allowing participants to practice various data science skills without restrictive output requirements.
The README explicitly warns against drawing causal conclusions due to unobserved variables, limiting its use for inferential statistics or serious analytical work beyond practice.
Datasets must be downloaded via links in social media posts or from GitHub repositories. There is no automated API or centralized access point, which can be inconvenient compared to data platforms with direct feeds.
Because datasets are curated from various sources and community submissions, data cleanliness, documentation, and relevance vary from week to week, as hinted by the 2026 goal for better curation tools.
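On the manual-download point above, fetching a week's data from GitHub can be scripted. The following is a minimal Python sketch, not an official client. It assumes the repository is rfordatascience/tidytuesday, that its default branch is `main`, and that each week's files sit under `data/<year>/<date>/`; none of this is stated in the listing, so check the repository before relying on it. The helper names `tidytuesday_csv_url` and `load_rows` are hypothetical, as is the filename in the usage note.

```python
from datetime import date
import csv
import io
import urllib.request

# Assumed repository layout (not confirmed by this listing); adjust if it differs.
RAW_BASE = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data"


def tidytuesday_csv_url(week: date, filename: str) -> str:
    """Build the raw-file URL for one CSV in a given week's folder."""
    return f"{RAW_BASE}/{week.year}/{week.isoformat()}/{filename}"


def load_rows(url: str) -> list[dict[str, str]]:
    """Download a CSV and return its rows as dictionaries keyed by column name."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```

A call such as `load_rows(tidytuesday_csv_url(date(2024, 1, 2), "example.csv"))` would then return the rows of that file, provided a file with that (placeholder) name exists for that week. Building the URL separately from the download keeps the assumed path layout in one place, so it is easy to correct.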