A Torch deep learning implementation of the VIS+LSTM model for answering natural language questions about images.
neural-vqa is a Torch-based deep learning model for Visual Question Answering (VQA): it answers natural language questions about images. It implements the VIS+LSTM architecture from Ren et al.'s 2015 paper "Exploring Models and Data for Image Question Answering," combining convolutional neural network (CNN) image features with a long short-term memory (LSTM) network so the model can reason over both visual and textual input.
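For intuition, here is a minimal sketch of how a VIS+LSTM forward pass can be assembled in Torch: the image's CNN feature is projected into the word-embedding space and fed to the LSTM as the first "token," the question words follow, and the final hidden state is classified into an answer. This is not the repository's actual code; the module choices (FastLSTM from the Element-Research rnn package) and the dimensions (4096-d VGG fc7 features, 512-d embeddings, 1000 answer classes) are assumptions for illustration.

```lua
require 'nn'
require 'rnn'  -- Element-Research rnn package (assumed installed via luarocks)

-- Assumed sizes: 4096-d VGG-19 fc7 features, 512-d embeddings, 1000 answer classes.
local vocab_size, embed_dim, hidden_dim, num_answers = 10000, 512, 512, 1000

-- Project the CNN image feature into the word-embedding space.
local img_embed = nn.Sequential()
  :add(nn.Linear(4096, embed_dim))
  :add(nn.Tanh())

-- Embed question word indices.
local word_embed = nn.LookupTable(vocab_size, embed_dim)

-- LSTM that reads the image embedding as the first "token",
-- followed by the embedded question words.
local lstm = nn.Sequencer(nn.FastLSTM(embed_dim, hidden_dim))

-- Map the final hidden state to answer scores.
local classifier = nn.Sequential()
  :add(nn.Linear(hidden_dim, num_answers))
  :add(nn.LogSoftMax())

-- Forward pass for one example: img_feat is a 4096-d Tensor,
-- question is a 1-D LongTensor of word ids.
local function answer_scores(img_feat, question)
  local tokens = { img_embed:forward(img_feat) }
  local q_emb = word_embed:forward(question)   -- (question length x embed_dim)
  for t = 1, q_emb:size(1) do
    table.insert(tokens, q_emb[t])
  end
  local states = lstm:forward(tokens)          -- table of hidden states, one per step
  return classifier:forward(states[#states])   -- log-probabilities over candidate answers
end
```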
Researchers and students in machine learning, computer vision, and NLP who want to experiment with or understand multimodal AI models for visual question answering.
It offers a clean, documented implementation of a published VQA model with pre-trained checkpoints, making it easier to reproduce research results or build upon the architecture for new experiments.
:grey_question: Visual Question Answering in Torch
Faithfully replicates the VIS+LSTM architecture from the 2015 paper, providing a reproducible baseline for academic study and experimentation.
Includes downloadable checkpoints (e.g., vqa_epoch23.26_0.4610.t7) for immediate inference, saving time on training from scratch (see the loading sketch after this list).
Seamlessly works with MSCOCO and VQA datasets using provided scripts, aligning with common benchmarks in the field.
Configurable for both GPU and CPU execution via the gpuid option, allowing experimentation on varied hardware setups.
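As a rough, hypothetical sketch of how a pre-trained checkpoint and a gpuid-style option are typically wired up in Torch code (the repository's actual flag handling and checkpoint contents may differ, so consult its README for exact commands):

```lua
require 'torch'
require 'nn'

-- Hypothetical option table; in the repository, gpuid is passed on the command line.
local opt = { gpuid = 0, checkpoint = 'vqa_epoch23.26_0.4610.t7' }

if opt.gpuid >= 0 then
  -- GPU path: needs the CUDA-enabled Torch packages.
  require 'cutorch'
  require 'cunn'
  cutorch.setDevice(opt.gpuid + 1)  -- Torch GPU ids are 1-indexed
else
  -- CPU path: plain float tensors, no CUDA dependencies.
  torch.setdefaulttensortype('torch.FloatTensor')
end

-- Load the downloaded checkpoint; what it contains (model weights, vocabulary, etc.)
-- is repository-specific.
local checkpoint = torch.load(opt.checkpoint)
```

The convention of a negative gpuid meaning "run on CPU" is common in Torch projects but is an assumption here; verify it against the repository's training and evaluation scripts.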
Built on the Lua-based Torch framework, which has been largely superseded by PyTorch, leading to compatibility issues and a shrinking ecosystem.
Requires manual download of large datasets and the VGG-19 model, and LuaJIT's memory limits necessitate workarounds such as running Torch with plain Lua 5.1.
Based on older research that lacks modern techniques, it achieves ~46% accuracy on the VQA validation set, below current state-of-the-art models.