Question 1

How do I load dbmdz BERT models in Python?

Accepted Answer

Load them through the Hugging Face Transformers library with AutoTokenizer and AutoModel, passing the model identifier as shown in the README. For example, for German BERT: tokenizer = AutoTokenizer.from_pretrained('dbmdz/bert-base-german-cased'); model = AutoModel.from_pretrained('dbmdz/bert-base-german-cased').
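A slightly fuller sketch of the same call, wrapped in functions so that nothing is downloaded until they are called. The helper names load_dbmdz and embed_sentence are illustrative and not part of the README; only AutoTokenizer, AutoModel, and the model identifier come from the answer above. It assumes the transformers and torch packages are installed.

```python
MODEL_ID = "dbmdz/bert-base-german-cased"  # identifier from the answer above


def load_dbmdz(model_id=MODEL_ID):
    """Return (tokenizer, model) for a dbmdz checkpoint on the Hugging Face Hub."""
    # Imported here so that defining the helper does not require transformers.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model


def embed_sentence(text, model_id=MODEL_ID):
    """Run one sentence through the encoder and return the last hidden states."""
    import torch

    tokenizer, model = load_dbmdz(model_id)
    model.eval()
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    # Shape: (batch, number of tokens, hidden size)
    return outputs.last_hidden_state
```

Calling embed_sentence("Das ist ein Beispielsatz.") downloads the weights on first use and returns one vector per token. Any other dbmdz identifier can be passed as model_id.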

Question 2

What's the difference between BERTurk and DistilBERTurk?

Accepted Answer

BERTurk is a full-sized BERT model for Turkish, while DistilBERTurk is a distilled version that is smaller and faster but may sacrifice some accuracy. DistilBERTurk was trained by knowledge distillation with BERTurk as the teacher model, as described in the README.
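One way to see the size difference for yourself is to count parameters in both checkpoints. This is a minimal sketch, not something taken from the README: the two model identifiers are assumed Hub names for the cased variants, and count_parameters and compare_sizes are hypothetical helper names. It assumes transformers and torch are installed.

```python
# Assumed Hub identifiers for the cased models; check the README for the exact names.
BERTURK = "dbmdz/bert-base-turkish-cased"
DISTILBERTURK = "dbmdz/distilbert-base-turkish-cased"


def count_parameters(model):
    """Total number of scalar parameters in a PyTorch-style model."""
    return sum(p.numel() for p in model.parameters())


def compare_sizes(model_ids=(BERTURK, DISTILBERTURK)):
    """Download each checkpoint and return {model_id: parameter count}."""
    # Lazy import so that count_parameters can be used without transformers.
    from transformers import AutoModel

    return {mid: count_parameters(AutoModel.from_pretrained(mid)) for mid in model_ids}
```

compare_sizes() downloads both models, so the first call can take a while. The distilled model should report the smaller count.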

Question 3

Can I use these models for named entity recognition (NER)?

Accepted Answer

Yes, but NER performance figures are not in the main README. Refer to the external repositories linked for each language, such as https://github.com/stefan-it/fine-tuned-berts-seq for German results.
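To fine-tune one of the base models for NER yourself, a common starting point is to load it with a token-classification head. The sketch below is an assumption about a typical setup, not the procedure used in the linked repositories: the tag set is a hypothetical CoNLL-style example, and build_label_maps and load_for_ner are illustrative names.

```python
# Hypothetical CoNLL-style tag set; replace with the labels of your NER dataset.
LABELS = ("O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC")


def build_label_maps(labels):
    """Return (id2label, label2id) dictionaries for a sequence of tag names."""
    id2label = dict(enumerate(labels))
    label2id = {label: i for i, label in id2label.items()}
    return id2label, label2id


def load_for_ner(model_id="dbmdz/bert-base-german-cased", labels=LABELS):
    """Load a dbmdz encoder with an untrained token-classification head."""
    # Lazy import so that build_label_maps works without transformers installed.
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    id2label, label2id = build_label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForTokenClassification.from_pretrained(
        model_id,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

The classification head returned by load_for_ner is randomly initialized, so the model needs fine-tuning on labeled NER data before its predictions mean anything.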

Question 4

Are TensorFlow versions available for these models?

Accepted Answer

No, currently only PyTorch weights are provided. The README asks users to raise an issue if they need TensorFlow checkpoints, so getting them may take time and is not guaranteed.

Question 5

How were the historic language models trained?

Accepted Answer

They were trained on corpora such as Europeana newspapers and the Delpher Corpus, with details in external repositories. For example, the Historic Dutch model uses 21GB of text from Dutch newspapers dating from 1618 to 1879.

Question 6

Is there a GPT-2 model for Ukrainian?

Accepted Answer

No. The project includes an ELECTRA model for Ukrainian, but no GPT-2 model. There is a GPT-2 model for German, but the other languages focus on BERT-based architectures.

dbmdz BERT models
