A practical guide for researchers on how to properly structure and share data with statisticians to ensure efficient analysis.
Data Sharing is a guide created by the Leek group to help researchers and collaborators share data effectively with statisticians. It outlines a structured approach to delivering raw data, tidy datasets, code books, and processing instructions to avoid common pitfalls and speed up analysis. The guide focuses on reproducibility, clear documentation, and efficient collaboration between data collectors and analysts.
The guide is aimed at researchers, students, postdocs, and collaborators across disciplines who need to share data with statisticians or data scientists for analysis, especially those unfamiliar with best practices in data preparation.
It provides a concrete, step-by-step framework that reduces delays in data analysis by emphasizing tidy data principles, reproducibility, and clear communication, which are often overlooked in ad-hoc data sharing.
The Leek group guide to data sharing
Promotes Hadley Wickham's tidy data principles, ensuring each variable is a column and each observation a row, which simplifies analysis and reduces errors.
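The tidy-data idea can be shown with a small reshape. This sketch uses pandas with invented column names and values; it is an illustration of the principle, not data or code from the guide itself.

```python
import pandas as pd

# Untidy "wide" layout: one row per subject, one column per
# treatment arm (names here are invented for illustration).
wide = pd.DataFrame({
    "subject": ["s1", "s2"],
    "treatment_a": [4.1, 5.0],
    "treatment_b": [6.2, 7.3],
})

# Tidy layout: each variable is a column (subject, treatment,
# value) and each observation is a row.
tidy = wide.melt(id_vars="subject", var_name="treatment", value_name="value")
print(tidy)
```

In the tidy form, a statistician can group, filter, or model on `treatment` directly instead of writing code that knows about specific column names.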
Recommends providing explicit scripts or pseudocode for data processing, enabling others to replicate analyses from raw to tidy data, as highlighted in the reproducibility section.
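A raw-to-tidy script of the kind the guide recommends might look like the following minimal sketch. The column names, unit conversion, and cleaning steps are assumptions chosen for illustration; the point is that every transformation is written down so a collaborator can rerun it from the raw data.

```python
import pandas as pd

def raw_to_tidy(raw: pd.DataFrame) -> pd.DataFrame:
    """Deterministically turn a raw table into a tidy one.

    Each step is explicit so the tidy dataset can be
    regenerated from the raw data at any time.
    """
    tidy = raw.rename(columns=str.lower)           # normalize header case
    tidy = tidy.dropna(subset=["subject_id"])      # drop rows missing an ID (assumed column)
    # Record unit conversions in code, not by hand-editing cells.
    tidy["weight_kg"] = tidy["weight_lb"] * 0.45359237
    return tidy.sort_values("subject_id").reset_index(drop=True)

# Example run on a tiny in-memory "raw" table (invented data).
raw = pd.DataFrame({
    "Subject_ID": ["s2", "s1", None],
    "Weight_LB": [150.0, 180.0, 170.0],
})
tidy = raw_to_tidy(raw)
print(tidy)
```

Shipping the script alongside the raw and tidy files, as the guide suggests, means the tidy dataset never becomes an unexplained artifact.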
Includes detailed requirements for code books covering variables, units, and study design, which minimizes misunderstandings between collaborators.
Acknowledges that statisticians should still receive the raw data, while advocating pre-processing into a tidy dataset to speed up collaboration, based on the Leek group's real-world experience.
Focuses on principles without providing concrete examples beyond R and Excel, leaving users unsure how to implement steps with modern data tools.
The guide requires extensive manual documentation and tidying, which can be burdensome for large datasets or fast-paced projects needing quick insights.
Does not address how to handle big data or automate sharing processes, limiting its usefulness for teams with advanced data engineering needs.