Hugging Face image-to-text

Description

Note

More information about the service specification can be found in the Core concepts > Service documentation.

This service uses Hugging Face's inference API to query image-to-text AI models.

You can choose any model available through the inference API on the Hugging Face Hub that takes an image as input and outputs text (JSON). The model must take only one image as input.
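
For reference, a raw query to the inference API for such a model can be sketched as follows (a minimal sketch using the requests library; the image file name and the exact shape of the JSON answer are illustrative and depend on the model you pick):

import requests

# Illustrative values: replace with your own token and model URL.
API_URL = "https://api-inference.huggingface.co/models/Salesforce/blip-image-captioning-base"
HEADERS = {"Authorization": "Bearer your_token"}

# The inference API expects the raw image bytes as the request body.
with open("example.jpg", "rb") as f:
    response = requests.post(API_URL, headers=HEADERS, data=f.read())

# For an image captioning model the answer is typically a JSON list such as:
# [{"generated_text": "a cat sitting on a sofa"}]
print(response.json())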

This service takes two input files:

  • A JSON file that defines the model you want to use, your access token and, optionally, a specific field of the JSON answer to return as the output. If no field is specified, the whole JSON answer is returned.
  • The image file used as input.

json_description.json example:

{
    "api_token": "your_token",
    "api_url": "https://api-inference.huggingface.co/models/Salesforce/blip-image-captioning-base",
    "desired_output": "generated_text"
}

In this example, the model "Salesforce/blip-image-captioning-base" is used for image captioning.
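
Putting the two input files together, the service logic can be sketched roughly as follows (a sketch only, assuming the configuration above is stored in json_description.json and the input image is named input.png; the actual service implementation may differ):

import json
import requests

# Load the configuration describing the model, the access token and the optional output field.
with open("json_description.json") as f:
    config = json.load(f)

headers = {"Authorization": f"Bearer {config['api_token']}"}

# Forward the raw image bytes to the configured model on the inference API.
with open("input.png", "rb") as f:
    answer = requests.post(config["api_url"], headers=headers, data=f.read()).json()

# If desired_output is set, keep only that field of the JSON answer; otherwise return everything.
field = config.get("desired_output")
if field and isinstance(answer, list):
    answer = [item[field] for item in answer]

print(answer)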


The API documentation for this service is automatically generated by FastAPI using the OpenAPI standard. A user-friendly interface provided by Swagger is available under the /docs route, where the endpoints of the service are described.

Environment variables

Check the Core concepts > Service > Environment variables documentation for more details.

Run the tests with Python

Check the Core concepts > Service > Run the tests with Python documentation for more details.

Start the service locally

Check the Core concepts > Service > Start the service locally documentation for more details.