The Wetrocloud Data Extraction API allows you to extract specific information from any web page and receive it in a structured JSON format that matches your requirements. Simply provide a URL, describe what you want to extract, and define your desired output schema.

How It Works

The API uses advanced AI to:
  1. Load and analyze the content from the provided URL
  2. Understand your extraction requirements from the prompt
  3. Extract the relevant data
  4. Format the results according to your JSON schema

Endpoint

POST https://api.wetrocloud.com/v1/extract/

Example Request (Structured Output)

This example shows how to extract data in a structured JSON format by providing a json_schema:
import requests
import json

url = "https://api.wetrocloud.com/v1/extract/"

headers = {
    "Content-Type": "application/json",
    "Authorization": "Token <api_key>"
}

payload = {
    "link": "https://theweek.com/news/people/954994/billionaires-richest-person-in-the-world",
    "prompt": "Extract the names and net worth of billionaires in the article",
    "json_schema": [
        {"name": "string"},
        {"networth": "number"}
    ],
    "delay": 2
}

response = requests.post(url, headers=headers, data=json.dumps(payload))
print(response.json())
Response:
{
  "response": [
    {
      "name": "Elon Musk",
      "networth": "$462 billion"
    },
    {
      "name": "Larry Ellison",
      "networth": "$340 billion"
    },
    {
      "name": "Mark Zuckerberg",
      "networth": "$258 billion"
    },
    {
      "name": "Jeff Bezos",
      "networth": "$244 billion"
    },
    {
      "name": "Larry Page",
      "networth": "$221 billion"
    },
    {
      "name": "Sergey Brin",
      "networth": "$207 billion"
    },
    {
      "name": "Bernard Arnault",
      "networth": "$197 billion"
    },
    {
      "name": "Steve Ballmer",
      "networth": "$179 billion"
    },
    {
      "name": "Jensen Huang",
      "networth": "$158 billion"
    },
    {
      "name": "Michael Dell",
      "networth": "$156 billion"
    }
  ],
  "success": true
}
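
In the sample response above, the networth values come back as formatted strings such as "$462 billion", even though the schema asks for a number. If downstream code needs numeric values, a small normalization step may help. The following is a minimal sketch that assumes the "$<amount> <scale>" shape seen in the sample; parse_networth is a hypothetical helper and not part of the Wetrocloud API.

```python
import re

# Multipliers for scale words. Only "billion" appears in the sample response;
# the other entries are assumptions added for illustration.
_SCALES = {"thousand": 1e3, "million": 1e6, "billion": 1e9, "trillion": 1e12}

# Optional "$", a number with optional commas/decimals, then an optional scale word.
_PATTERN = re.compile(r"^\s*\$?\s*([\d,]+(?:\.\d+)?)\s*([a-zA-Z]+)?\s*$")


def parse_networth(value):
    """Convert a value like "$462 billion" (or a plain number) to a float in dollars."""
    if isinstance(value, bool) or not isinstance(value, (int, float, str)):
        raise ValueError(f"Unsupported net worth value: {value!r}")
    if isinstance(value, (int, float)):
        return float(value)
    match = _PATTERN.match(value)
    if match is None:
        raise ValueError(f"Unrecognized net worth format: {value!r}")
    amount = float(match.group(1).replace(",", ""))
    scale = match.group(2)
    if scale is None:
        return amount
    if scale.lower() not in _SCALES:
        raise ValueError(f"Unknown scale word: {scale!r}")
    return amount * _SCALES[scale.lower()]
```

Applied to each item in the response list, this would yield numeric values that can be sorted or summed.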

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| link | String | Yes | The URL of the website to extract data from |
| prompt | String | Yes | Instructions describing what data to extract |
| json_schema | Array | No | The structure defining your desired output format. If not provided, returns plain text |
| delay | Integer | No | Delay in seconds before extraction (useful for dynamic content). Default: 0 |
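
The parameters above can be assembled and checked on the client before sending a request. This is a minimal sketch; build_extract_payload is a hypothetical helper, and rejecting negative delays is an assumption, since the documentation only states the type and default.

```python
def build_extract_payload(link, prompt, json_schema=None, delay=0):
    """Build a request body from the documented parameters.

    link and prompt are required; json_schema and delay are optional.
    """
    if not isinstance(link, str) or not link.strip():
        raise ValueError("link is required and must be a non-empty string")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required and must be a non-empty string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(delay, bool) or not isinstance(delay, int) or delay < 0:
        raise ValueError("delay must be a non-negative integer (assumed constraint)")

    payload = {"link": link, "prompt": prompt}
    if json_schema is not None:
        payload["json_schema"] = json_schema
    if delay:
        # Omit delay when it equals the documented default of 0.
        payload["delay"] = delay
    return payload
```

The returned dictionary can be passed to requests.post in the same way as the payload in the example request.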

The json_schema Parameter (Optional)

The json_schema parameter defines the structure of the data you want to extract. It’s an array of objects where each object represents a field in your output.
When to use:
  • Use json_schema when you want structured data in a specific format
  • Omit json_schema when you want a plain text response
Format:
[
  {"field_name": "data_type"},
  {"another_field": "data_type"}
]
Supported data types:
  • "string" - Text values
  • "number" - Numeric values
  • "boolean" - True/false values
Example schemas:
Single object extraction:
[
  {"title": "string"},
  {"price": "number"},
  {"in_stock": "boolean"}
]
Multiple items extraction (same schema, returns array):
[
  {"name": "string"},
  {"networth": "number"}
]
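
Because a malformed schema is listed among the common errors, it may be worth checking a schema locally before sending it. The sketch below assumes, based on the format shown above, that each array item is an object with exactly one field name mapped to one of the three supported data types; validate_json_schema is a hypothetical helper, and the server may apply different rules.

```python
SUPPORTED_TYPES = {"string", "number", "boolean"}


def validate_json_schema(schema):
    """Return a list of problems found; an empty list means the schema looks valid."""
    if not isinstance(schema, list):
        return ["schema must be an array (Python list)"]

    problems = []
    for index, field in enumerate(schema):
        # Each item is expected to look like {"field_name": "data_type"}.
        if not isinstance(field, dict) or len(field) != 1:
            problems.append(f"item {index}: expected an object with exactly one field")
            continue
        (name, data_type), = field.items()
        if not isinstance(name, str) or not name:
            problems.append(f"item {index}: field name must be a non-empty string")
        if not isinstance(data_type, str) or data_type not in SUPPORTED_TYPES:
            problems.append(f"item {index}: unsupported data type {data_type!r}")
    return problems
```

For example, a schema that uses "float" instead of "number" would be reported as a problem before any request is made.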

The prompt Parameter

The prompt tells the AI what information to extract. Be specific and clear about what you want.
Good prompts:
  • “Extract the names and net worth of all billionaires mentioned in the article”
  • “Get all product names, prices, and ratings from this page”
  • “Find all email addresses and phone numbers in the contact section”
Tips for better prompts:
  • Be specific about what data you want
  • Mention if you want all instances or just specific ones
  • Indicate any filtering criteria

The delay Parameter

Use the delay parameter when extracting from dynamic websites that load content via JavaScript. The delay gives the page time to fully load before extraction begins.
When to use:
  • Single-page applications (SPAs)
  • Pages with lazy-loaded content
  • Dynamic dashboards
  • Sites with JavaScript-rendered content
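
When the right delay for a page is not known in advance, one possible approach is to retry with progressively longer delays. The sketch below rests on two assumptions: that an empty response suggests the content had not finished loading, and that the delay values 0, 2, and 5 are reasonable to try. The send argument is a hypothetical callable that posts the payload to the endpoint and returns the decoded JSON body.

```python
def extract_with_increasing_delay(send, payload, delays=(0, 2, 5)):
    """Call send(payload) with each delay in turn and return the first non-empty result.

    If every attempt comes back empty or unsuccessful, the last body is returned
    so the caller can inspect it.
    """
    last_body = None
    for delay in delays:
        attempt = dict(payload, delay=delay)  # copy the payload, overriding delay
        last_body = send(attempt)
        if last_body.get("success") and last_body.get("response"):
            return last_body
    return last_body
```

In practice, send could wrap requests.post with your headers; keeping it as a parameter makes the retry logic easy to test without network access.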

More Examples

Extract Billionaire Data (Plain Text Response)

When you omit the json_schema, you get a plain text response:
import requests
import json

url = "https://api.wetrocloud.com/v1/extract/"

headers = {
    "Content-Type": "application/json",
    "Authorization": "Token <api_key>"
}

payload = {
    "link": "https://theweek.com/news/people/954994/billionaires-richest-person-in-the-world",
    "prompt": "Extract the names and net worth of billionaires in the article"
}

response = requests.post(url, headers=headers, data=json.dumps(payload))
print(response.json())
Response:
{
  "response": "Here are the names and net worths of the billionaires listed in the article \"Who are the 10 richest people in the world in 2025?\":\n\n1.  **Elon Musk:** $462 billion\n2.  **Larry Ellison:** $340 billion\n3.  **Mark Zuckerberg:** $258 billion\n4.  **Jeff Bezos:** $244 billion\n5.  **Larry Page:** $221 billion\n6.  **Sergey Brin:** $207 billion\n7.  **Bernard Arnault:** $197 billion\n8.  **Steve Ballmer:** $179 billion\n9.  **Jensen Huang:** $158 billion\n10. **Michael Dell:** $156 billion",
  "success": true
}

Extract Product Information

import requests
import json

url = "https://api.wetrocloud.com/v1/extract/"

headers = {
    "Content-Type": "application/json",
    "Authorization": "Token <api_key>"
}

payload = {
    "link": "https://example-store.com/products",
    "prompt": "Extract all product information including name, price, and availability",
    "json_schema": [
        {"product_name": "string"},
        {"price": "number"},
        {"available": "boolean"}
    ]
}

response = requests.post(url, headers=headers, data=json.dumps(payload))
print(response.json())

Extract Article Metadata

import requests
import json

url = "https://api.wetrocloud.com/v1/extract/"

headers = {
    "Content-Type": "application/json",
    "Authorization": "Token <api_key>"
}

payload = {
    "link": "https://example-blog.com/article",
    "prompt": "Extract the article title, author, publication date, and read time",
    "json_schema": [
        {"title": "string"},
        {"author": "string"},
        {"date": "string"},
        {"read_time": "number"}
    ],
    "delay": 1
}

response = requests.post(url, headers=headers, data=json.dumps(payload))
print(response.json())

Response Format

All successful requests return a JSON object with the following structure:
With json_schema (Structured Output):
{
  "response": [...],
  "success": true
}
Without json_schema (Plain Text):
{
  "response": "Plain text response...",
  "success": true
}
| Field | Type | Description |
| --- | --- | --- |
| response | Array or String | Extracted data matching your JSON schema (array) or plain text (string) if no schema provided |
| success | Boolean | Indicates whether the extraction was successful |
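
Since the response field can be either an array or a string, client code may want to branch on its type. This is a minimal sketch; split_response is a hypothetical helper name.

```python
def split_response(body):
    """Return (records, text) for a decoded response body.

    records is a list when a json_schema was supplied, otherwise None.
    text is a string when no schema was supplied, otherwise None.
    """
    data = body.get("response")
    if isinstance(data, list):
        return data, None
    if isinstance(data, str):
        return None, data
    raise TypeError(f"Unexpected response type: {type(data).__name__}")
```

A caller can then iterate over records for structured output or print text for plain output, without checking types throughout the code.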

Error Handling

If the request fails, you’ll receive an error response:
{
  "error": "Error message describing what went wrong",
  "success": false
}
Common errors:
  • Invalid API key
  • Malformed JSON schema
  • Inaccessible URL
  • Invalid URL format
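
One way to act on the success field is to turn a failed response into an exception. The sketch below assumes the error body shape shown above; ExtractionError and unwrap_response are hypothetical names, not part of any Wetrocloud client library.

```python
class ExtractionError(Exception):
    """Raised when the API reports success: false."""


def unwrap_response(body):
    """Return the extracted data, or raise ExtractionError with the API's error message."""
    if body.get("success") is True:
        return body["response"]
    raise ExtractionError(body.get("error", "Unknown error"))
```

Typical usage would be unwrap_response(response.json()) inside a try/except block, so that failures such as an invalid API key surface with the message the API returned.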

Best Practices

  1. Be specific in your prompts: Clear, detailed prompts produce better results
  2. Use appropriate delays: Add a delay for JavaScript-heavy websites
  3. Design clear schemas: Use descriptive field names and appropriate data types
  4. Handle errors gracefully: Always check the success field in responses
  5. Test your schemas: Start with simple schemas and iterate based on results

Use Cases

The Data Extraction API is well suited to:
  • Price monitoring: Track competitor pricing across multiple websites
  • Lead generation: Extract contact information from business directories
  • Content aggregation: Gather articles, blogs, or news from various sources
  • Market research: Collect product data, reviews, and ratings
  • Data migration: Extract data from old systems or websites
  • Real estate: Gather property listings and details
  • Job boards: Collect job postings and requirements

Authentication

All requests require an API key in the Authorization header:
Authorization: Token <your_api_key>
Get your API key from the Wetrocloud Console. If you need help obtaining your API key, refer to this guide.
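
To avoid hard-coding the key in source files, the headers can be built from an environment variable. This is a sketch only: the variable name WETROCLOUD_API_KEY and the helper build_headers are assumptions chosen for illustration.

```python
import os


def build_headers(api_key=None):
    """Build request headers using the documented "Token <api_key>" scheme.

    Falls back to the WETROCLOUD_API_KEY environment variable (a hypothetical name)
    when no key is passed explicitly.
    """
    key = api_key or os.environ.get("WETROCLOUD_API_KEY")
    if not key:
        raise RuntimeError("No API key provided and WETROCLOUD_API_KEY is not set")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Token {key}",
    }
```

The result can replace the literal headers dictionary used in the request examples.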

Rate Limits

Please refer to your plan details in the Wetrocloud Console for rate limit information.
