TensorFlow implementation and pre-trained models for BERT, a bidirectional Transformer for language understanding.
BERT is a deep bidirectional Transformer pre-trained for natural language understanding. It learns contextual representations from unlabeled text through two objectives, masked language modeling and next-sentence prediction, and achieves state-of-the-art results when fine-tuned on tasks such as question answering, sentiment analysis, and named entity recognition. The project provides TensorFlow code and pre-trained checkpoints for researchers and developers.
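The masked-language-modeling objective can be sketched in a few lines: roughly 15% of tokens are selected, and of those, 80% are replaced with `[MASK]`, 10% with a random token, and 10% kept unchanged, while the model is trained to recover the originals. The helper below is a simplified stand-in for the repo's `create_pretraining_data.py` (the function name, toy vocabulary, and whole-word granularity are illustrative assumptions, not the repo's API):

```python
import random

MASK = "[MASK]"
VOCAB = ["cat", "dog", "sat", "ran", "the", "a"]  # toy vocabulary (assumption)

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels): labels[i] holds the original token
    at masked positions and None elsewhere. Hypothetical helper mirroring
    BERT's 80/10/10 masking recipe at whole-token granularity."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            r = rng.random()
            if r < 0.8:                      # 80%: replace with [MASK]
                masked.append(MASK)
            elif r < 0.9:                    # 10%: replace with a random token
                masked.append(rng.choice(VOCAB))
            else:                            # 10%: keep the original token
                masked.append(tok)
            labels.append(tok)               # the model must predict this
        else:
            masked.append(tok)
            labels.append(None)              # position not scored
    return masked, labels
```

Because the model sees corrupted input but is scored against the originals, it must use both left and right context to fill each gap, which is what makes the pre-training deeply bidirectional.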
NLP researchers, machine learning engineers, and developers building applications that require language understanding, such as chatbots, search engines, or text-analysis tools. It is particularly valuable for those with limited computational resources, thanks to the availability of smaller pre-trained models.
BERT offers a powerful, open-source alternative to proprietary NLP models, with pre-trained checkpoints that can be fine-tuned quickly for specific tasks. Its deep bidirectional architecture and extensive model variants (including multilingual and compact sizes) provide flexibility and high accuracy across diverse NLP benchmarks.
TensorFlow code and pre-trained models for BERT
BERT conditions each token's representation on both its left and right context, unlike earlier unidirectional language models, yielding state-of-the-art results on benchmarks such as SQuAD with over 90% F1.
Includes 24 smaller models ranging from BERT-Tiny to BERT-Base, plus multilingual and Chinese variants, enabling deployment in resource-constrained settings, as highlighted in the 'Smaller BERT Models' update.
Pre-trained models can be fine-tuned in a few hours on a GPU or tens of minutes on a Cloud TPU, and small tasks like MRPC train in just a few minutes, making adaptation efficient.
Offers a multilingual cased model covering 104 languages that preserves case and accents rather than normalizing them away, ideal for cross-lingual tasks, as noted in the November 2018 update.
BERT-Large runs out of memory on GPUs with 12GB-16GB of RAM, forcing reduced batch sizes that can hurt accuracy, as acknowledged in the README's 'Out-of-memory issues' section.
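One standard mitigation for this trade-off is gradient accumulation: sum gradients over several small micro-batches and apply a single averaged update, simulating a larger effective batch size at constant memory. The sketch below is framework-free and purely illustrative (the 1-D least-squares loss and the function name are assumptions, not code from the repo):

```python
def sgd_accumulated(batches, w=0.0, lr=0.1, accum_steps=2):
    """Minimal sketch of gradient accumulation on the toy loss
    L(w) = (w*x - y)**2, updated once per `accum_steps` micro-batches."""
    grad_sum, count = 0.0, 0
    for x, y in batches:
        grad_sum += 2 * (w * x - y) * x      # dL/dw for this micro-batch
        count += 1
        if count == accum_steps:
            w -= lr * grad_sum / accum_steps  # one averaged update
            grad_sum, count = 0.0, 0
    return w
```

The same idea applies to a Transformer: each micro-batch fits in GPU memory, while the averaged update behaves like training with a batch `accum_steps` times larger.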
Designed for understanding tasks, not generation: without an added decoder it cannot perform sequence-to-sequence tasks such as translation, which limits its scope.
Tasks like SQuAD require semi-complex pre- and post-processing to align character-level answer annotations with WordPiece tokens, adding implementation overhead, as described in the README.
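To illustrate the alignment problem: SQuAD answers are given as character offsets into the passage, so the pipeline must map them onto token indices before training and back again for prediction. A minimal sketch, using whitespace tokens as a stand-in for WordPiece and hypothetical helper names:

```python
def char_to_token_spans(text, tokens):
    """Map each token to its (start, end) character span in `text`.
    Whitespace tokens here stand in for BERT's WordPiece pieces."""
    spans, cursor = [], 0
    for tok in tokens:
        start = text.index(tok, cursor)      # next occurrence after cursor
        spans.append((start, start + len(tok)))
        cursor = start + len(tok)
    return spans

def answer_token_span(text, tokens, ans_start, ans_end):
    """Smallest token range covering characters [ans_start, ans_end)."""
    spans = char_to_token_spans(text, tokens)
    tok_start = next(i for i, (s, e) in enumerate(spans) if e > ans_start)
    tok_end = next(i for i, (s, e) in reversed(list(enumerate(spans)))
                   if s < ans_end)
    return tok_start, tok_end
```

The real pipeline must additionally handle sub-word pieces, document striding for long passages, and mapping predicted token spans back to the original text, which is where most of the overhead lies.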