Image-to-text models generate text from an image. Depending on the model, the generated text is either a caption for the image or an answer to a question about the image. This demo provides two types of models: image captioning models, which generate a caption for the image, and visual question answering models, which generate an answer to a question about the image. I recommend using the quantized versions of the models: they are much smaller than the full-precision models, and their output quality is usually close to that of the full-precision versions.
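Most of the size saving comes from storing each weight in fewer bytes. The sketch below illustrates this with simple symmetric per-tensor int8 quantization of a random weight vector. This scheme is an assumption for illustration only; the demo's quantized models may use a different method. A float32 weight takes 4 bytes and an int8 weight takes 1 byte, so the quantized copy is roughly 4 times smaller, and each restored weight is within half a quantization step of the original. A small per-weight error does not by itself guarantee the same output quality, so it is still worth comparing outputs for your model.

```python
import random

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (illustrative assumption): w ~ scale * q."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale for all-zero input
    scale = max_abs / 127.0
    # Round to the nearest step and clamp to the int8 range used here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Map int8 values back to approximate float weights."""
    return [scale * v for v in q]

random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]  # synthetic weights, not a real model
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

fp32_bytes = 4 * len(weights)  # float32: 4 bytes per weight
int8_bytes = 1 * len(q) + 4    # int8: 1 byte per weight, plus one float32 scale
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print(f"float32: {fp32_bytes} bytes, int8: {int8_bytes} bytes")
print(f"max abs error: {max_err:.6f} (half a step: {scale / 2:.6f})")
```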

How to use the demo:

  1. Select the model.
  2. Load the image.
  3. Enter the prefix. For an image captioning model, the prefix is the starting text that the generated caption continues from, and it is optional. For a visual question answering model, the prefix is the question about the image.
  4. Click the "Process" button.
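The way the prefix is interpreted in step 3 can be sketched as a small dispatch on the model type. This is a hypothetical illustration, not the demo's code: the model-type labels, the `prepare_prefix` function, and the returned dictionary keys are invented for this example. It also assumes, since the prefix is described as optional only for captioning, that a visual question answering model requires a non-empty question.

```python
from typing import Optional

# Hypothetical model-type labels; the demo's actual identifiers are not documented.
CAPTIONING = "image-captioning"
VQA = "visual-question-answering"

def prepare_prefix(model_type: str, prefix: Optional[str]) -> dict:
    """Hypothetical sketch of how the step-3 prefix is used by each model type."""
    text = (prefix or "").strip()
    if model_type == CAPTIONING:
        # Optional prefix: the caption continues from it, or starts from scratch if empty.
        return {"task": "caption", "start_text": text}
    if model_type == VQA:
        # The prefix is the question; assumed to be required for this model type.
        if not text:
            raise ValueError("visual question answering models need a question")
        return {"task": "answer", "question": text}
    raise ValueError(f"unknown model type: {model_type!r}")

print(prepare_prefix(CAPTIONING, None))
print(prepare_prefix(VQA, "What color is the car?"))
```

Keeping the validation in one place means an empty question is rejected before any model is run, while an empty caption prefix simply falls back to unconditional captioning.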
