Hugging Face text-to-audio

Description

Note

More information about the service specification can be found in the Core concepts > Service documentation.

This service uses Hugging Face's inference API to query text-to-audio AI models.

You can choose any model available through the inference API on the Hugging Face Hub that takes text (wrapped in JSON) as input and outputs audio.

It must have the following JSON input structure:

{
    "inputs" : "your input text"
}

This service takes two input files:

  • A JSON file that defines the model you want to use and your access token.
  • A text file.

JSON example:

{
    "api_token": "your_token",
    "api_url": "https://api-inference.huggingface.co/models/facebook/musicgen-small"
}

This example model is a text-to-music model capable of generating music samples conditioned on text descriptions.

input_text example:

liquid drum and bass, atmospheric synths, airy sounds

This service creates the JSON payload from the input text and queries the given model. The generated audio is returned in the Ogg format.
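The query flow described above can be sketched in plain Python. This is a minimal, hedged example assuming the two input files are named `config.json` and `input.txt` (the file names and output path are illustrative, not part of the service); it builds the `{"inputs": ...}` payload, sends it to the configured model URL with the bearer token, and writes the returned audio bytes to disk.

```python
import json
import urllib.request


def build_payload(text: str) -> bytes:
    """Encode the input text in the JSON structure the model expects."""
    return json.dumps({"inputs": text}).encode("utf-8")


if __name__ == "__main__":
    # Load the two input files described above (paths are illustrative).
    with open("config.json") as f:
        config = json.load(f)
    with open("input.txt") as f:
        text = f.read().strip()

    req = urllib.request.Request(
        config["api_url"],
        data=build_payload(text),
        headers={
            "Authorization": f"Bearer {config['api_token']}",
            "Content-Type": "application/json",
        },
    )

    # The response body is the raw generated audio (Ogg for this model).
    with urllib.request.urlopen(req) as resp, open("output.ogg", "wb") as out:
        out.write(resp.read())
```

The actual service wraps this same round trip behind its FastAPI endpoints, so you normally do not call the inference API directly; the sketch only illustrates the payload and authentication shape.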

The API documentation for this service is automatically generated by FastAPI using the OpenAPI standard. A user-friendly interface provided by Swagger is available under the /docs route, where the endpoints of the service are described.

Environment variables

Check the Core concepts > Service > Environment variables documentation for more details.

Run the tests with Python

Check the Core concepts > Service > Run the tests with Python documentation for more details.

Start the service locally

Check the Core concepts > Service > Start the service locally documentation for more details.