Question 1

How does TensorFlow FastText compare to Facebook's FastText for text classification?

Accepted Answer

TensorFlow FastText reimplements the core FastText ideas in TensorFlow, which makes it easier to integrate with TensorFlow Serving and distributed training. It lacks hierarchical softmax and separate embedding training (embeddings are only learned as part of classifier training), so the original FastText remains more feature-complete for research or for very large label spaces.
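
To make "core FastText ideas" concrete, here is a minimal pure-Python sketch of the classifier's forward pass as it is generally described: average the embeddings of the input tokens, apply a linear layer, and take a softmax. The function name and the toy weights are hypothetical and are not taken from either code base.

```python
import math


def fasttext_forward(token_ids, embeddings, weights, biases):
    """Toy forward pass: mean of token embeddings -> linear layer -> softmax.

    All arguments are plain Python lists; the values used below are
    illustrative placeholders, not trained parameters.
    """
    dim = len(embeddings[0])
    # Average the embedding vectors of all tokens in the document.
    hidden = [
        sum(embeddings[t][d] for t in token_ids) / len(token_ids)
        for d in range(dim)
    ]
    # One logit per label: dot product with that label's weight row plus bias.
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(weights, biases)
    ]
    # Numerically stable softmax over the label logits.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Two tokens, two-dimensional embeddings, two labels (all hypothetical).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0]]
toy_weights = [[1.0, 0.0], [0.0, 1.0]]
toy_biases = [0.0, 0.0]
probs = fasttext_forward([0, 1], toy_embeddings, toy_weights, toy_biases)
print(probs)
```

The flat softmax in this sketch costs time proportional to the number of labels, which is why the missing hierarchical softmax matters mostly for large label spaces.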

Question 2

How do I deploy a TensorFlow FastText model with TensorFlow Serving?

Accepted Answer

Export the model by running classifier.py with the --export_dir flag, then serve the exported model with tensorflow_model_server on a port such as 9000. To get predictions, use the provided predictor_client.py, which makes gRPC calls to the server. The TensorFlow Serving section of the README has the details.
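
As a rough illustration of the serving step, the sketch below assembles a tensorflow_model_server command line. The --port, --model_name, and --model_base_path flags are standard TensorFlow Serving flags; the model name "fasttext" and the export path are hypothetical placeholders, so check the README for the values the project actually expects.

```python
import shlex


def model_server_command(export_dir, model_name="fasttext", port=9000):
    """Build the argv list for tensorflow_model_server.

    export_dir should be the absolute path passed to --export_dir at export
    time; model_name is a hypothetical label chosen for this example.
    """
    return [
        "tensorflow_model_server",
        f"--port={port}",                    # gRPC port
        f"--model_name={model_name}",        # name clients use in requests
        f"--model_base_path={export_dir}",   # directory holding version subdirs
    ]


# Print the command instead of running it, so the sketch has no side effects.
print(shlex.join(model_server_command("/tmp/fasttext_export")))
```

TensorFlow Serving looks for numbered version subdirectories under the base path, so point --model_base_path at the parent directory rather than at a specific version.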

Question 3

Does TensorFlow FastText support character ngrams and are they worth it?

Accepted Answer

Yes, it supports character ngrams via the --ngrams option in process_input.py, but the README notes that they make training much slower and give only marginal gains for English text. For other languages, or for tasks such as language identification where they raise accuracy noticeably, the trade-off may be worthwhile.
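
To show what character ngrams look like, and why they inflate the number of features per word, here is a small sketch that extracts them from a single word. The "<" and ">" boundary markers follow the convention of the original FastText; whether process_input.py uses the same markers is an assumption, and the function name is hypothetical.

```python
def char_ngrams(word, sizes=(2, 3, 4), boundary=True):
    """Return all character ngrams of the given sizes for one word.

    boundary=True wraps the word in "<" and ">" first, as the original
    FastText does (assumed here, not confirmed for this project).
    """
    token = f"<{word}>" if boundary else word
    grams = []
    for n in sizes:
        # Slide a window of width n across the token.
        for start in range(len(token) - n + 1):
            grams.append(token[start:start + n])
    return grams


print(char_ngrams("cat", sizes=(2,)))  # ['<c', 'ca', 'at', 't>']
print(len(char_ngrams("language")))    # number of extra features for one word
```

Every word contributes many additional features this way, which is consistent with the slower training the README reports.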

Question 4

Can I use pre-trained word embeddings with TensorFlow FastText?

Accepted Answer

Not currently. The project does not support preloading embedding tables, so embeddings are learned during classification training. The README lists this as not implemented, although embeddings can be exported after training.

Question 5

What accuracy can I expect for language identification with TensorFlow FastText?

Accepted Answer

With word embeddings alone it reaches about 96% accuracy, rising to nearly 99% when character ngrams are added (--ngrams=2,3,4), according to the language identification section of the README. This matches the performance described in the FastText blog post.

Question 6

Is TensorFlow FastText suitable for distributed training across multiple servers?

Accepted Answer

Yes, it supports distributed training across multiple GPUs on one or more servers using Horovod with MPI. The README provides examples with mpirun and notes close to linear scaling in performance, making it scalable for large datasets.
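
"Close to linear scaling" can be quantified as scaling efficiency: the measured multi-worker throughput divided by the ideal throughput (number of workers times the single-worker rate). The sketch below computes it; the function name is hypothetical and the numbers are made-up placeholders, not measurements from the project.

```python
def scaling_efficiency(single_rate, multi_rate, workers):
    """Return multi-worker throughput as a fraction of ideal linear scaling.

    single_rate: examples/sec with one worker
    multi_rate:  examples/sec with `workers` workers
    A value of 1.0 means perfectly linear scaling.
    """
    if workers < 1 or single_rate <= 0:
        raise ValueError("workers must be >= 1 and single_rate must be > 0")
    return multi_rate / (workers * single_rate)


# Placeholder numbers for illustration only.
print(scaling_efficiency(1000.0, 3800.0, 4))  # 0.95
```

Measuring this on your own hardware is a quick way to check whether adding GPUs or servers is still paying off.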

Question 7

How do I preprocess data for TensorFlow FastText training?

Accepted Answer

Use process_input.py to convert your data into TensorFlow Records. It accepts either Facebook FastText format or text files with accompanying labels, and options such as --ngrams add character ngrams. The README shows an example for each input type.
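
For reference, the Facebook FastText supervised format puts one example per line, with each label written as a token prefixed by `__label__`. The sketch below splits such a line into labels and text. It only illustrates the input format and is not how process_input.py is implemented; the function name is hypothetical.

```python
LABEL_PREFIX = "__label__"


def parse_fasttext_line(line):
    """Split one FastText-format line into (labels, text).

    Tokens starting with "__label__" are treated as labels; everything
    else is rejoined with single spaces as the example text.
    """
    labels, words = [], []
    for token in line.split():
        if token.startswith(LABEL_PREFIX):
            labels.append(token[len(LABEL_PREFIX):])
        else:
            words.append(token)
    return labels, " ".join(words)


print(parse_fasttext_line("__label__en hello world"))
```

Running a check like this over a sample of your file before conversion can catch lines with missing labels early.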
