A dataset of millions of news articles labeled by credibility type for training fake news detection algorithms.
Fake News Corpus is an open-source dataset of millions of news articles labeled by credibility type, created by scraping domains from OpenSources.co and supplementing with reliable sources. It's designed specifically for training deep learning algorithms to recognize fake news and other types of unreliable content. The dataset includes 11 categories ranging from fake news and satire to credible sources, formatted as CSV with rich metadata fields.
Machine learning researchers and data scientists working on fake news detection, natural language processing, and credibility analysis algorithms. Academic institutions and organizations developing content moderation or media literacy tools.
Provides a large-scale, pre-labeled dataset specifically curated for fake news detection research, saving researchers the immense effort of collecting and categorizing news articles manually. The inclusion of balanced classes and multiple credibility types makes it more suitable for training robust classification models than smaller or less diverse datasets.
Contains 9.4 million articles from 745 domains (per the project README), giving deep learning models ample training data.
Labels each article with one of 11 types, such as fake, satire, bias, and reliable, drawn from OpenSources.co's taxonomy, offering a broad spectrum of classes for classification tasks.
Supplements the unreliable domains with credible articles from The New York Times and WebHose, improving class balance for more robust model training.
Formatted as CSV with fields like content, authors, keywords, and summary, facilitating easy integration into machine learning pipelines.
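Because the corpus ships as a flat CSV with a per-article type label, loading it for a binary fake-vs-reliable classifier is mostly a matter of streaming rows and collapsing the 11 types into two classes. The sketch below uses only the standard library; the column names (`type`, `content`) follow the corpus's documented schema, but the two-row sample and the exact grouping of types into "fake-like" versus "credible" are illustrative choices, not part of the dataset.

```python
import csv
import io

# Hypothetical two-row sample mimicking the corpus's CSV layout; the real
# file has many more metadata columns (authors, keywords, summary, etc.).
SAMPLE = """type,title,content,authors,keywords,summary
fake,Example headline,Body text A,Jane Doe,politics,Short summary A
reliable,Another headline,Body text B,John Roe,science,Short summary B
"""

# One illustrative way to collapse the 11 credibility types into a binary
# label; other groupings are equally defensible depending on the task.
FAKE_LIKE = {"fake", "satire", "bias", "conspiracy", "junksci",
             "hate", "clickbait", "unreliable"}
CREDIBLE = {"reliable", "political"}

def load_binary_examples(fp):
    """Yield (text, label) pairs, skipping rows with unmapped types."""
    for row in csv.DictReader(fp):
        t = row["type"]
        if t in FAKE_LIKE:
            yield row["content"], 1
        elif t in CREDIBLE:
            yield row["content"], 0

examples = list(load_binary_examples(io.StringIO(SAMPLE)))
print(examples)  # [('Body text A', 1), ('Body text B', 0)]
```

Streaming with `csv.DictReader` rather than loading the whole file matters here: at 9.4 million articles the corpus will not fit comfortably in memory on most machines.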
Labels are assigned per domain rather than per article, with no manual filtering, so individual articles can be mislabeled and skew model performance, a limitation the creator acknowledges.
The creator does not plan to update the corpus after finalization, so it will age quickly for applications that need recent news.
Only about 80% of the data has been cleaned and published, and some URLs may not point to actual articles, complicating reliable analysis.
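The domain-level labeling above has a practical consequence for evaluation: if articles from the same domain appear in both training and test sets, a model can score well just by memorizing domain-specific artifacts. One common mitigation is to hold out whole domains. The sketch below, using only the standard library, shows that idea; the row structure and the `domain` key are assumptions for illustration, not the corpus's exact schema.

```python
import random

def split_by_domain(rows, test_frac=0.2, seed=0):
    """Hold out entire domains so the test set shares no domains with training."""
    domains = sorted({r["domain"] for r in rows})
    rng = random.Random(seed)
    rng.shuffle(domains)
    n_test = max(1, int(len(domains) * test_frac))
    test_domains = set(domains[:n_test])
    train = [r for r in rows if r["domain"] not in test_domains]
    test = [r for r in rows if r["domain"] in test_domains]
    return train, test

# Toy data: two articles from each of five hypothetical domains.
rows = [{"domain": d, "content": f"article from {d}"}
        for d in ["a.com", "b.com", "c.com", "d.com", "e.com"]
        for _ in range(2)]
train, test = split_by_domain(rows)
```

A domain-disjoint split gives a more honest estimate of how a detector generalizes to unseen outlets, which is usually the deployment scenario.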