The image-to-text endpoint allows you to analyze an image and ask questions about its content via the WetroCloud API.
This endpoint supports two response formats to suit different use cases:

  • Free Text Response: A natural language answer to your query.
  • Structured Output Response: A structured JSON output (Coming Soon…).

Each response type has its own request and response format, described in detail on its respective page.

Free Text Response

A free text response gives a natural, conversational answer to your query. It suits general Q&A and cases that call for a narrative or contextual explanation. Unlike structured output, it does not require additional parameters such as json_schema and json_schema_rules.

Request Example

curl --request POST \
  --url https://api.wetrocloud.com/v1/image-to-text/ \
  --header 'Authorization: Token <api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "image_url": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTQBQcwHfud1w3RN25Wgys6Btt_Y-4mPrD2kg&s",
    "request_query": "What animal is this?"
  }'
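
The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names build_request and ask_about_image are hypothetical, the 30-second timeout is an arbitrary choice, and the URL, headers, and body fields are copied from the curl example above.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.wetrocloud.com/v1/image-to-text/"


def build_request(api_key: str, image_url: str, request_query: str) -> urllib.request.Request:
    """Build the POST request without sending it (hypothetical helper)."""
    payload = {"image_url": image_url, "request_query": request_query}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask_about_image(api_key: str, image_url: str, request_query: str,
                    timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON body (hypothetical helper)."""
    req = build_request(api_key, image_url, request_query)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx status codes.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling ask_about_image("<api-key>", image_url, "What animal is this?") should return the parsed JSON body as a dictionary. Keeping request construction separate from sending makes it possible to inspect the payload and headers without making a network call.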

Response Example: Free Text

{
    "response": "This is a dog, specifically a Labrador Retriever.",
    "tokens": 1594,
    "success": true
}

Field      Description
response   Conversational response to the query.
tokens     Number of tokens used for processing.
success    Indicates whether the query was successful.
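
A response with these three fields could be handled along the following lines. This is a sketch under assumptions: FreeTextResult and parse_free_text_response are hypothetical names, and because the shape of a failed response is not documented here, the code simply raises an error whenever success is missing or false.

```python
import json
from dataclasses import dataclass


@dataclass
class FreeTextResult:
    """Fields of a successful free text response (hypothetical container)."""
    response: str
    tokens: int


def parse_free_text_response(body: str) -> FreeTextResult:
    """Decode a JSON response body and check the success flag."""
    data = json.loads(body)
    # Assumption: a missing or false "success" is treated as a failure.
    if not data.get("success"):
        raise RuntimeError(f"image-to-text query failed: {data!r}")
    return FreeTextResult(response=data["response"], tokens=data["tokens"])


# Usage with the sample response shown above.
sample = '''{
    "response": "This is a dog, specifically a Labrador Retriever.",
    "tokens": 1594,
    "success": true
}'''
result = parse_free_text_response(sample)
print(result.response)
print(result.tokens)
```

Checking success before reading the other fields avoids a confusing KeyError if a failed response omits response or tokens.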