How accurate is the spell and weapon data in dnddata?

The processedSpells and processedWeapons fields are built with heuristic string matching, so they carry some error. About 5-6% of entries fail to match at all, and manual checks found only 1-2 mistakes in a sample of 200 (roughly 0.5-1%). The fields are reliable for trends but should not be treated as exact.
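To make the failure modes concrete, here is a minimal sketch of heuristic string matching using Python's standard difflib module. It is not the method dnddata itself uses; the spell list, the `match_spell` helper, and the 0.8 cutoff are all assumptions chosen for illustration.

```python
from difflib import get_close_matches

# Hypothetical reference list of canonical spell names.
KNOWN_SPELLS = ["Fireball", "Magic Missile", "Cure Wounds", "Shield"]

def match_spell(free_text, cutoff=0.8):
    """Return the closest known spell name, or None when nothing is close enough."""
    lookup = {name.lower(): name for name in KNOWN_SPELLS}
    hits = get_close_matches(free_text.strip().lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None

# Free-text entries as a user might type them (made-up examples).
entries = ["magic missle", "firebal", "my homebrew spell"]
matched = {text: match_spell(text) for text in entries}

# Share of entries that could not be matched to any known spell.
unmatched_rate = sum(value is None for value in matched.values()) / len(entries)
```

In this sketch a typo still resolves to a canonical name, while free text that resembles nothing in the list stays unmatched. Entries of the second kind are what an unmatched share counts, and a loose cutoff trades them for occasional wrong matches.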

Can I use dnddata with Python instead of R?

Yes. The dataset is also published as JSON and TSV files in the data-raw directory, which can be loaded into Python with libraries such as pandas. The examples in the README are written in R.
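A minimal sketch of reading a tab-separated export with only the Python standard library. The column names and sample rows are made up for illustration, and no real file name in data-raw is assumed here. With pandas installed, `pandas.read_csv(path, sep="\t")` is the usual one-line equivalent.

```python
import csv
import io
from collections import Counter

# Stand-in for a TSV file from data-raw; columns and rows are illustrative only.
SAMPLE_TSV = (
    "name\tprocessedRace\tjustClass\tlevel\n"
    "a1\tHuman\tFighter\t3\n"
    "b2\tElf\tWizard\t5\n"
    "c3\tHuman\tRogue|Fighter\t4\n"
)

def read_tsv(handle):
    """Parse a tab-separated file object into a list of dict rows."""
    return list(csv.DictReader(handle, delimiter="\t"))

rows = read_tsv(io.StringIO(SAMPLE_TSV))

# Count characters per race as a first sanity check on the loaded data.
race_counts = Counter(row["processedRace"] for row in rows)
print(race_counts.most_common())
```

For a real file, replace `io.StringIO(SAMPLE_TSV)` with `open(path, newline="", encoding="utf-8")`, where `path` points at the downloaded TSV.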

How does dnddata compare to datasets from D&D Beyond or Roll20?

dnddata is free and open source, and it covers only character stats submitted to specific character sheet apps, whereas data from those platforms is proprietary or covers broader gameplay logs. It is best suited to community-sourced demographic analysis rather than commercial tools.

How do I account for multiclass characters in a dnddata analysis?

The class fields list a multiclass character's classes and levels separated by |. The README provides R code examples that weight multiclass characters by level, such as the co-occurrence matrix plot of race and class distributions.
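One way to weight multiclass characters can be sketched in Python. This is not necessarily how the README's R code does it. The sketch assumes each |-separated entry has the form `Class Level` (for example `Fighter 3|Wizard 2`), which should be checked against the actual data, and `class_weights` is a hypothetical helper, not something dnddata provides.

```python
def class_weights(class_field):
    """Split a 'Class Level|Class Level' string into per-class level fractions."""
    levels = {}
    for part in class_field.split("|"):
        # rsplit keeps multi-word class names such as "Blood Hunter" intact.
        name, level = part.strip().rsplit(" ", 1)
        levels[name] = levels.get(name, 0) + int(level)
    total = sum(levels.values())
    return {name: lvl / total for name, lvl in levels.items()}

weights = class_weights("Fighter 3|Wizard 2")
```

With this weighting, a level 3 Fighter / level 2 Wizard counts as 0.6 of a Fighter and 0.4 of a Wizard when tallying class distributions, so every character still contributes a total weight of 1.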

What biases should I watch out for in dnddata?

Key biases include selection bias from Reddit-sourced submissions and app users, plus potential overrepresentation of test characters, as detailed in the caveats section on selection bias.

How often is dnddata updated and can I contribute new data?

It's updated weekly with new submissions from the web apps, but contributions are indirect via app usage; there's no direct mechanism for submitting corrections or additions to the dataset itself.

dnddata

License: MIT, Language: R

A weekly updated dataset of Dungeons & Dragons characters submitted to character sheet web applications, with over 7,900 entries and standardized fields.

GitHub: 122 stars, 20 forks, 0 contributors

What is dnddata?

dnddata is an open-source dataset of Dungeons & Dragons characters collected from submissions to character sheet web applications. It provides a large, standardized collection of character attributes—such as race, class, abilities, and spells—for analysis and research. The dataset addresses the need for accessible, clean data on D&D character trends and demographics.

Target Audience

Data analysts, researchers, and D&D enthusiasts interested in exploring character statistics, trends, and demographics within the D&D community. R users and data scientists working with gaming datasets will find it particularly useful.

Value Proposition

Analysts choose dnddata for its large sample, which is updated weekly, its standardized fields that smooth over free-text inconsistencies, and its availability in multiple formats (R, JSON, TSV). It offers a community-sourced dataset that is not readily available elsewhere.

Overview

A dataset of D&D characters submitted to https://oganm.com/shiny/printSheetApp and https://oganm.com/shiny/interactiveSheet. It is a superset of the characters used in oganm/dndstats.

Use Cases

Best For

Analyzing D&D character race and class distribution trends
Researching player preferences in spells, weapons, and feats
Studying alignment and ability score patterns in character creation
Teaching data analysis with a fun, accessible gaming dataset
Building visualizations of D&D character demographics
Comparing character statistics across different levels and backgrounds

Not Ideal For

Academic research requiring a statistically representative, unbiased sample of the global D&D player base
Applications needing real-time, error-free character data for live gameplay tools
Projects focused on character narratives, backstories, or role-playing elements beyond statistical attributes
Analysis of other tabletop RPG systems or D&D editions beyond 5th edition

Pros & Cons

Pros

Large, Growing Dataset

With over 7,900 characters and weekly automatic updates, it provides a substantial sample size for trend analysis, as noted in the README's examples and feature list.

Multiple Data Formats

Available as R data frames, JSON, and TSV files in the data-raw directory, facilitating easy integration with various programming languages and tools beyond R.

Standardized for Analysis

Includes processed fields like processedRace and processedSpells that clean up free-text inputs using heuristics, ensuring consistency for demographic studies.

Community-Inspired Credibility

Correlates with external analyses like FiveThirtyEight's D&D article, supporting reproducible research and validation of trends, as mentioned in the README.

Cons

Inherent Selection Bias

Data is sourced from niche web apps advertised on Reddit communities, skewing towards a specific subset of D&D players and potentially overrepresenting test characters.

Heuristic Data Reliability Issues

Processed fields like processedSpells rely on string matching with acknowledged error rates (e.g., 2 mistakes in a checked sample of 200), so they are not accurate enough for applications that need every entry to be correct.

Simplistic Unique Detection

Filtering for unique characters relies on heuristics based only on name and class, which may incorrectly exclude valid entries or include duplicates, as cautioned in the caveats.
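To illustrate why a key built only from name and class can misfire, here is a minimal sketch of that kind of filter. The field names and records are hypothetical, and this is not the dataset's actual implementation.

```python
def unique_by_name_and_class(records):
    """Keep the first record seen for each (name, class) pair."""
    seen = set()
    kept = []
    for rec in records:
        # Normalize case and whitespace so trivially different spellings collide.
        key = (rec["name"].strip().lower(), rec["class"].strip().lower())
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept

records = [
    {"name": "Aria", "class": "Bard", "race": "Elf"},
    {"name": "Aria", "class": "Bard", "race": "Human"},       # possibly a different character
    {"name": "Aria ", "class": "Bard|Rogue", "race": "Elf"},  # possibly the first one, later on
]
kept = unique_by_name_and_class(records)
```

Here the second record is dropped even though it may be a distinct character, and the third is kept even though it may be a later version of the first. Both outcomes are the kind of error the caveat describes.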
