# Nabz Search

Search the web, scrape pages, map site URLs, and extract structured data — all through a single unified API.

Results are cached automatically to improve response times for repeated requests.

---

## Search

Search the web and get structured results.

`GET /api/nabzsearch/search?query=your+search+query`

### Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `query` | string | Yes | The search query (max 500 characters). |
| `limit` | integer | No | Maximum results (1-10, default: 5). |
| `format` | string | No | Content format for scraped results: `markdown` (default), `html`, `rawHtml`, `links`. Setting this automatically enables scraping. |
| `sources` | string | No | Source type: `web` (default), `images`, or `news`. |
| `scrape` | boolean | No | If `true`, fetches page content for each result (default: `false`). |
| `location` | string | No | Geo-targeting location (e.g., `"Germany"`). |
| `country` | string | No | Two-letter ISO country code (e.g., `US`, `DE`). |
| `max_content_length` | integer | No | Truncate scraped content to this many characters (100-100,000). Only applies when `scrape` is enabled. Useful for keeping responses small when feeding results into an LLM. |

### Example Request

```bash
curl -X GET "https://developer.nabzclan.vip/api/nabzsearch/search?query=laravel+sanctum&limit=3" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Accept: application/json"
```

### Response

```json
{
  "success": true,
  "query": "laravel sanctum",
  "result_count": 3,
  "results": [
    {
      "url": "https://laravel.com/docs/sanctum",
      "title": "Laravel Sanctum - Laravel Documentation",
      "description": "Laravel Sanctum provides a featherweight authentication system..."
    }
  ]
}
```
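
In Python, the same request can be wrapped in a small helper. This is an illustrative sketch, not an official client: `build_search_params` and its clamping behavior simply mirror the limits in the parameter table above.

```python
# Sketch of a search helper based on the parameter table above.
API = "https://developer.nabzclan.vip/api/nabzsearch"

def build_search_params(query, limit=5, scrape=False, max_content_length=None):
    """Build query-string parameters for GET /search, enforcing the documented limits."""
    if not query or len(query) > 500:
        raise ValueError("query must be 1-500 characters")
    params = {"query": query, "limit": max(1, min(int(limit), 10))}
    if scrape:
        params["scrape"] = "true"
        if max_content_length is not None:
            # Only meaningful when scraping; clamp to the documented 100-100,000 range.
            params["max_content_length"] = max(100, min(int(max_content_length), 100_000))
    return params

def search(query, token, **kwargs):
    import requests  # imported here so the helper above stays dependency-free
    r = requests.get(
        f"{API}/search",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        params=build_search_params(query, **kwargs),
        timeout=30,
    )
    r.raise_for_status()
    return r.json()
```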

---

## Scrape

Extract content from a single URL as markdown, HTML, or other formats. Supports JavaScript-rendered pages.

`POST /api/nabzsearch/scrape`

### Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `url` | string | Yes | The URL to scrape (max 2000 characters). Must be a public URL (no localhost or private IPs). |
| `format` | string | No | Output format: `markdown` (default), `html`, `rawHtml`, `links`, `images`, `screenshot`, `summary`. |
| `only_main_content` | boolean | No | Strip navigation/sidebars and return only main content (default: `false`). |
| `wait_for` | integer | No | Wait time in ms before scraping (0-10000). Useful for JS-rendered pages. |
| `country` | string | No | Two-letter ISO country code for geo-targeted scraping. |
| `max_content_length` | integer | No | Truncate content to this many characters (100-100,000). Useful for keeping payloads small. |

### Example Request

```bash
curl -X POST "https://developer.nabzclan.vip/api/nabzsearch/scrape" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "format": "markdown", "only_main_content": true}'
```

### Response

```json
{
  "success": true,
  "url": "https://example.com",
  "results": {
    "markdown": "Example Domain\n==============\n\nThis domain is for use in documentation examples...",
    "metadata": {
      "title": "Example Domain",
      "language": "en",
      "statusCode": 200,
      "url": "https://example.com"
    }
  }
}
```
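
A Python sketch of building the request body. The private-host check here only mirrors the documented restriction for illustration; the server performs the authoritative validation.

```python
# Sketch of a scrape payload builder; validation mirrors the parameter table above.
from urllib.parse import urlparse

def build_scrape_payload(url, format="markdown", only_main_content=False, wait_for=0):
    """Validate and assemble the POST body for /scrape."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or (parsed.hostname or "") in ("localhost", "127.0.0.1"):
        raise ValueError("url must be a public http(s) URL")
    if not 0 <= wait_for <= 10_000:
        raise ValueError("wait_for must be 0-10000 ms")
    payload = {"url": url, "format": format}
    if only_main_content:
        payload["only_main_content"] = True
    if wait_for:
        # Give JavaScript-rendered pages time to settle before capture.
        payload["wait_for"] = wait_for
    return payload

def scrape(payload, token):
    import requests  # imported here so the builder above stays dependency-free
    r = requests.post(
        "https://developer.nabzclan.vip/api/nabzsearch/scrape",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        json=payload,
        timeout=60,
    )
    r.raise_for_status()
    return r.json()["results"]
```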

---

## Map

Discover all URLs on a website. Useful for sitemaps, link analysis, or finding specific pages before scraping.

`GET /api/nabzsearch/map?url=https://example.com`

### Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `url` | string | Yes | The website URL to map (max 2000 characters). Must be a public URL. |
| `limit` | integer | No | Maximum URLs to discover (1-500, default: 100). |
| `search` | string | No | Filter discovered URLs by search query. |
| `include_subdomains` | boolean | No | Include subdomains in results (default: `false`). |

### Example Request

```bash
curl -X GET "https://developer.nabzclan.vip/api/nabzsearch/map?url=https://example.com&limit=50" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Accept: application/json"
```

### Response

```json
{
  "success": true,
  "url": "https://example.com",
  "result_count": 3,
  "results": [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/contact"
  ]
}
```
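
The `search` parameter filters server-side; if you already have a fetched URL list, the equivalent client-side filter is a one-liner. `filter_urls` below is a hypothetical helper, not part of the API.

```python
# Client-side equivalent of the `search` filter for an already-fetched URL list.
def filter_urls(urls, term):
    """Keep URLs that contain the term (case-insensitive)."""
    term = term.lower()
    return [u for u in urls if term in u.lower()]
```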

---

## Extract <span class="badge">AI</span>

AI-powered structured data extraction. Two modes:

1. **From URLs** — provide specific URLs and a prompt to extract structured data from those pages.
2. **Web Search mode** — send a prompt with `web_search: true` and the AI searches the web, finds relevant sources, and returns a structured answer. No URLs needed.

**Note:** This endpoint requires a VIP plan and uses your AI request quota.

`POST /api/nabzsearch/extract`

### Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `urls` | array | Conditional | Array of URLs to extract data from (max 10). Required unless `web_search` is `true`. |
| `prompt` | string | Yes | Natural language description of what data to extract (min 5, max 2000 characters). |
| `schema` | object | No | JSON schema for structured output. Improves accuracy and guarantees output shape. |
| `web_search` | boolean | No | If `true`, the AI searches the web to answer your prompt and `urls` becomes optional (default: `false`). |

### Example: Extract from URLs

```bash
curl -X POST "https://developer.nabzclan.vip/api/nabzsearch/extract" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"],
    "prompt": "Extract the page title and main description",
    "schema": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "description": {"type": "string"}
      }
    }
  }'
```

### Example: Web Search mode

No URLs — just ask a question and get structured data back from the web.

```bash
curl -X POST "https://developer.nabzclan.vip/api/nabzsearch/extract" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "web_search": true,
    "prompt": "What is the current price of Bitcoin in USD?",
    "schema": {
      "type": "object",
      "properties": {
        "price_usd": {"type": "string"},
        "source": {"type": "string"},
        "timestamp": {"type": "string"}
      }
    }
  }'
```

### Response

```json
{
  "success": true,
  "prompt": "What is the current price of Bitcoin in USD?",
  "results": {
    "price_usd": "$83,721.00",
    "source": "CoinMarketCap",
    "timestamp": "2026-04-05"
  }
}
```
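
The conditional rules above (`urls` required unless `web_search` is true, prompt of 5-2000 characters, at most 10 URLs) can be checked client-side before spending AI quota. `build_extract_payload` is an illustrative helper based on the parameter table, not an official client.

```python
# Sketch of client-side validation for POST /extract, per the parameter table above.
def build_extract_payload(prompt, urls=None, schema=None, web_search=False):
    """Validate and assemble the POST body for /extract."""
    if not 5 <= len(prompt) <= 2000:
        raise ValueError("prompt must be 5-2000 characters")
    if not web_search and not urls:
        raise ValueError("urls is required unless web_search is true")
    if urls and len(urls) > 10:
        raise ValueError("at most 10 urls per request")
    payload = {"prompt": prompt}
    if urls:
        payload["urls"] = list(urls)
    if schema:
        # A JSON schema improves accuracy and guarantees the output shape.
        payload["schema"] = schema
    if web_search:
        payload["web_search"] = True
    return payload
```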

---

## Use Cases

### AI Agents & Tool Use

Give your AI agent access to the web. Use search as a tool so the agent can look things up in real time, scrape to read full page content, and extract to pull structured data from any site.

```python
# Example: AI agent tool that searches the web
import requests

def web_search(query):
    r = requests.get(
        "https://developer.nabzclan.vip/api/nabzsearch/search",
        headers={"Authorization": "Bearer YOUR_TOKEN"},
        params={"query": query, "limit": 5, "scrape": True},
        timeout=30,
    )
    r.raise_for_status()  # surface 4xx/5xx instead of failing on a missing key
    return r.json()["results"]
```

### Web Scraping Pipelines

Scrape pages at scale in your Python or Node scripts. Use map to discover all pages on a site first, then scrape each one.

```python
# Map a site, then scrape every page
import requests

API = "https://developer.nabzclan.vip/api/nabzsearch"
HEADERS = {"Authorization": "Bearer YOUR_TOKEN"}

# Step 1: Discover all pages on the site
pages = requests.get(
    f"{API}/map", headers=HEADERS,
    params={"url": "https://example.com", "limit": 50},
    timeout=30,
).json()["results"]

# Step 2: Scrape each page as markdown
for url in pages:
    r = requests.post(
        f"{API}/scrape", headers=HEADERS,
        json={"url": url, "format": "markdown", "only_main_content": True},
        timeout=60,
    )
    r.raise_for_status()
    print(r.json()["results"]["markdown"])
```

### Lead Generation & Data Collection

Extract structured data from company websites, job boards, product pages, or directories. Use a JSON schema to get consistent output every time.

```bash
curl -X POST "https://developer.nabzclan.vip/api/nabzsearch/extract" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com/about"],
    "prompt": "Extract the company name, email, and phone number",
    "schema": {
      "type": "object",
      "properties": {
        "company": {"type": "string"},
        "email": {"type": "string"},
        "phone": {"type": "string"}
      }
    }
  }'
```

### SEO & Competitor Research

Map competitor sites to see their full page structure, scrape their content to analyze keywords, or search for how they rank on specific queries.

### Monitoring & Alerts

Run periodic searches or scrapes from a cron job to track price changes, new content, or keyword rankings. Built-in caching keeps frequent polling efficient.

### RAG (Retrieval-Augmented Generation)

Feed real-time web content into your LLM. Search for relevant pages, scrape them as markdown, and pass the content as context to your model for up-to-date answers.
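
One way to assemble scraped markdown into a prompt context is a simple character budget. `build_context` below is a sketch: the 8,000-character default is an arbitrary choice, and the `(url, markdown)` pairs would come from the search and scrape endpoints shown earlier.

```python
# Sketch: pack (url, markdown) pairs into one LLM context string within a budget.
def build_context(docs, budget=8000):
    """Concatenate documents, labeling each with its source URL, up to `budget` characters."""
    parts, used = [], 0
    for url, md in docs:
        piece = f"Source: {url}\n{md}"
        take = piece[: max(0, budget - used)]
        if not take:
            break  # budget exhausted
        parts.append(take)
        used += len(take) + 2  # account for the joining blank line
    return "\n\n".join(parts)
```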

---

## Caching

Search, scrape, and map results are cached automatically. Cached responses include `"cached": true` in the response body. Cache durations:

| Endpoint | Cache Duration |
|----------|---------------|
| Search | 5 minutes |
| Scrape | 10 minutes |
| Map | 15 minutes |
| Extract | Not cached (AI-generated) |
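
To tell a fresh result from a cache hit in client code, check the flag described above. `is_cached` is a trivial hypothetical helper, not part of the API.

```python
# Check the documented "cached": true marker on a decoded response body.
def is_cached(body):
    """True when the response was served from the API's cache."""
    return body.get("cached") is True
```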

---

## Error Responses

| Status | Description |
|--------|-------------|
| `422` | Invalid parameters, blocked URL, or no results. |
| `502` | Search engine unavailable or returned an error. |
| `503` | Search service is not configured on the server. |

### Notes

URLs must be publicly accessible (`http` or `https`). Internal and private network addresses are blocked.
