How It Works
The API uses advanced AI to:
- Load and analyze the content from the provided URL
- Understand your extraction requirements from the prompt
- Extract the relevant data
- Format the results according to your JSON schema
Endpoint
Example Request (Structured Output)
This example shows how to extract data in a structured JSON format by providing a json_schema:
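A minimal Python sketch of such a request follows. The endpoint URL is a placeholder, and the POST method, JSON body, Token authorization scheme, and one-object-per-field schema shape are all assumptions; check them against the Endpoint and Authentication sections before use. The function only assembles the request and does not send it.

```python
import json
import urllib.request

# Assumption: placeholder URL; substitute the real one from the Endpoint section.
ENDPOINT = "https://api.example.com/data-extraction/"


def build_request(api_key: str) -> urllib.request.Request:
    """Assemble (but do not send) a structured-output extraction request."""
    payload = {
        "link": "https://example.com/billionaires",
        "prompt": "Extract the names and net worth of all billionaires "
                  "mentioned in the article",
        # Assumed shape: one object per output field, mapping name to type.
        "json_schema": [{"name": "string"}, {"networth": "number"}],
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: the "Token" scheme is a guess, not confirmed here.
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; keeping assembly separate from sending makes the request easy to inspect first.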
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| link | String | Yes | The URL of the website to extract data from |
| prompt | String | Yes | Instructions describing what data to extract |
| json_schema | Array | No | The structure defining your desired output format. If not provided, the API returns plain text |
| delay | Integer | No | Delay in seconds before extraction (useful for dynamic content). Default: 0 |
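The parameter rules in the table can be checked on the client before a request is sent. The sketch below is one way to do that; build_payload is a hypothetical helper, and the URL check is a local precaution that may not match the server's own validation.

```python
from typing import Any, Optional
from urllib.parse import urlparse


def build_payload(
    link: str,
    prompt: str,
    json_schema: Optional[list] = None,
    delay: int = 0,
) -> dict[str, Any]:
    """Build a request body, enforcing the required/optional rules above."""
    parsed = urlparse(link)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"link must be an absolute http(s) URL, got {link!r}")
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if delay < 0:
        raise ValueError("delay must be zero or a positive number of seconds")

    payload: dict[str, Any] = {"link": link, "prompt": prompt}
    # Optional parameters are only included when they differ from the default.
    if json_schema is not None:
        payload["json_schema"] = json_schema
    if delay:
        payload["delay"] = delay
    return payload
```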
The json_schema Parameter (Optional)
The json_schema parameter defines the structure of the data you want to extract. It’s an array of objects where each object represents a field in your output.
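The array-of-objects structure described above can be sketched in Python as follows. The exact keys of each object are an assumption here (each object is taken to map one field name to a type), and make_schema is a hypothetical helper that only accepts the string, number, and boolean types this section lists.

```python
ALLOWED_TYPES = {"string", "number", "boolean"}


def make_schema(fields: dict[str, str]) -> list[dict[str, str]]:
    """Turn {"field": "type"} into a list with one object per field."""
    schema = []
    for name, type_name in fields.items():
        if type_name not in ALLOWED_TYPES:
            raise ValueError(
                f"unsupported type {type_name!r} for field {name!r}; "
                f"expected one of {sorted(ALLOWED_TYPES)}"
            )
        # Assumed shape: a single-key object mapping field name to type.
        schema.append({name: type_name})
    return schema
```

Validating the type names locally catches typos such as "integer" before they turn into a malformed-schema error from the API.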
When to use:
- Use json_schema when you want structured data in a specific format
- Omit json_schema when you want a plain text response

Supported field types:
- "string": Text values
- "number": Numeric values
- "boolean": True/false values
The prompt Parameter
The prompt tells the AI what information to extract. Be specific and clear about what you want.
Good prompts:
- “Extract the names and net worth of all billionaires mentioned in the article”
- “Get all product names, prices, and ratings from this page”
- “Find all email addresses and phone numbers in the contact section”
Tips for writing prompts:
- Be specific about what data you want
- Mention if you want all instances or just specific ones
- Indicate any filtering criteria
The delay Parameter
Use the delay parameter when extracting from dynamic websites that load content via JavaScript. The delay gives the page time to fully load before extraction begins.
When to use:
- Single-page applications (SPAs)
- Pages with lazy-loaded content
- Dynamic dashboards
- Sites with JavaScript-rendered content
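When the right delay for a page is not known in advance, one option is to start without a delay and retry with longer ones only if nothing comes back. The sketch below illustrates that idea; the extract callable is a stand-in for whatever function sends the request with a given delay, and the delay values are arbitrary examples rather than recommendations from the API.

```python
from typing import Any, Callable, Iterable


def extract_with_escalating_delay(
    extract: Callable[[int], dict[str, Any]],
    delays: Iterable[int] = (0, 3, 8),
) -> dict[str, Any]:
    """Call extract(delay) with growing delays until it returns data."""
    result: dict[str, Any] = {"success": False, "response": None}
    for delay in delays:
        result = extract(delay)
        # Stop at the first successful, non-empty extraction.
        if result.get("success") and result.get("response"):
            return result
    # Every attempt came back empty or failed; return the last result.
    return result
```

Each retry is a separate API call, so this trades extra requests for not having to guess a delay up front.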
More Examples
Extract Billionaire Data (Plain Text Response)
When you omit the json_schema, you get a plain text response:
Extract Product Information
Extract Article Metadata
Response Format
All successful requests return a JSON object with the following structure:
With json_schema (Structured Output):
Without json_schema (Plain Text):
| Field | Type | Description |
|---|---|---|
| response | Array or String | Extracted data matching your JSON schema (array), or plain text (string) if no schema was provided |
| success | Boolean | Indicates whether the extraction was successful |
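Because the response field can be either an array or a string, client code usually needs to branch on its type. The following Python sketch shows one way to read it; read_extraction is a hypothetical helper, and the sample bodies in the usage note are illustrative, not real API output.

```python
import json
from typing import Any, Union


def read_extraction(body: str) -> Union[list, str]:
    """Return the response field: a list when a schema was sent, else text."""
    data: dict[str, Any] = json.loads(body)
    response = data["response"]
    if not isinstance(response, (list, str)):
        raise TypeError(
            f"expected an array or string, got {type(response).__name__}"
        )
    return response
```

For example, a body such as {"response": [{"name": "A"}], "success": true} yields a list, while {"response": "plain", "success": true} yields a string.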
Error Handling
If the request fails, you’ll receive an error response. Common causes:
- Invalid API key
- Malformed JSON schema
- Inaccessible URL
- Invalid URL format
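A small guard around the success field keeps failed extractions from being processed as data. This is a sketch only: ExtractionError and ensure_success are hypothetical names, and since the exact shape of the error body is not shown here, the whole payload is attached to the exception rather than a specific message field.

```python
from typing import Any


class ExtractionError(Exception):
    """Raised when the API reports an unsuccessful extraction."""


def ensure_success(data: dict[str, Any]) -> dict[str, Any]:
    """Return data unchanged if success is true, otherwise raise."""
    if data.get("success") is not True:
        # The error body's fields are not assumed; include the full payload.
        raise ExtractionError(f"extraction failed: {data!r}")
    return data
```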
Best Practices
- Be specific in your prompts: Clear, detailed prompts produce better results
- Use appropriate delays: Add a delay for JavaScript-heavy websites
- Design clear schemas: Use descriptive field names and appropriate data types
- Handle errors gracefully: Always check the success field in responses
- Test your schemas: Start with simple schemas and iterate based on results
Use Cases
The Data Extraction API is well suited to:
- Price monitoring: Track competitor pricing across multiple websites
- Lead generation: Extract contact information from business directories
- Content aggregation: Gather articles, blogs, or news from various sources
- Market research: Collect product data, reviews, and ratings
- Data migration: Extract data from old systems or websites
- Real estate: Gather property listings and details
- Job boards: Collect job postings and requirements
Authentication
All requests require an API key in the Authorization header.
Rate Limits
Please refer to your plan details in the Wetrocloud Console for rate limit information.
Need Help?
- Email us at [email protected]
- Check out the API Reference for more details