Web AI Next.js image to text demo

Image to text

Image to text models generate text from images. Depending on the model, the generated text is either a caption for the image or an answer to a question about the image. This demo provides two types of models: image captioning models, which generate a caption for the image, and visual question answering (VQA) models, which answer a question about the image. I recommend using the quantized versions of the models: they are much smaller but produce results of almost the same quality as the full-precision models.
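To see why quantized models are so much smaller, here is a back-of-the-envelope estimate in TypeScript. It is a rough sketch that assumes full-precision weights are stored as 32-bit floats and quantized weights as 8-bit integers, and it uses a hypothetical 100-million-parameter model rather than any model from the demo. Real model files also contain metadata and sometimes unquantized tensors, so actual size ratios will differ somewhat.

```typescript
// Rough weight-storage size in megabytes: parameters × bits per weight ÷ 8 bits per byte.
// Assumption: only weights are counted; metadata and other tensors are ignored.
function weightSizeMB(parameterCount: number, bitsPerWeight: number): number {
  return (parameterCount * bitsPerWeight) / 8 / 1_000_000;
}

const params = 100_000_000; // hypothetical parameter count, not taken from the demo
const fp32 = weightSizeMB(params, 32); // assumed full precision: 32-bit floats
const int8 = weightSizeMB(params, 8); // assumed quantization: 8-bit integers
console.log(`fp32: ${fp32} MB, int8: ${int8} MB, ratio: ${fp32 / int8}x`);
// prints "fp32: 400 MB, int8: 100 MB, ratio: 4x"
```

Under these assumptions, 8-bit quantization cuts the download roughly by a factor of four, which matters in a browser demo where the model is fetched before it can run.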

How to use the demo:

1. Select the model.
2. Load the image.
3. Enter the prefix. For an image captioning model, the prefix is optional and serves as the beginning of the caption. For a visual question answering model, the prefix is the question about the image.
4. Click the "Process" button.
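The prefix rule in the steps above can be sketched in TypeScript as follows. This is a minimal illustration, not the demo's code: the `ModelKind` and `Prompt` types and the `buildPrompt` function are hypothetical names, and the way the demo actually hands the prefix to a model is not shown here.

```typescript
// Hypothetical model categories matching the two model types described above.
type ModelKind = "captioning" | "vqa";

// Hypothetical shape of what gets passed on to the model.
type Prompt =
  | { task: "caption"; captionStart: string | null } // null: caption generated from scratch
  | { task: "answer"; question: string };

function buildPrompt(kind: ModelKind, prefix: string): Prompt {
  const text = prefix.trim();
  if (kind === "captioning") {
    // For captioning, the prefix is optional and becomes the start of the caption.
    return { task: "caption", captionStart: text.length > 0 ? text : null };
  }
  // For visual question answering, the prefix is the question, so it is required.
  if (text.length === 0) {
    throw new Error("A visual question answering model needs a question as the prefix.");
  }
  return { task: "answer", question: text };
}

// Usage: an empty and a non-empty captioning prefix, then a VQA question.
const examples: Array<[ModelKind, string]> = [
  ["captioning", ""],
  ["captioning", "a photo of"],
  ["vqa", "What color is the car?"],
];
for (const [kind, prefix] of examples) {
  console.log(kind, JSON.stringify(buildPrompt(kind, prefix)));
}
```

Using a discriminated union keeps the two cases apart, so code that consumes a `Prompt` cannot mistake an optional caption start for a required question.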
