# Text to Speech

The Text to Speech API converts text into spoken audio using Capa TTS voices. It supports raw audio downloads for direct playback and a JSON/base64 mode for apps that need to store or transport audio inside JSON.

> [!IMPORTANT]
> This endpoint uses Capa Text to Speech and counts toward your **AI request limit**.
>
> Voice IDs use the Capa model format, for example `v1-capa-tts-nabzclan/en-AU-NatashaNeural`.

## Endpoint

**POST** `https://developer.nabzclan.vip/api/ai/v1/audio/speech`

## Authentication

Include your Developer API token in the `Authorization` header:

```text
Authorization: Bearer YOUR_API_TOKEN
```
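
To keep the token out of source code, the header can be built from an environment variable. This is a minimal sketch: the variable name `NABZCLAN_API_TOKEN` and the helper `auth_headers` are illustrative choices, not names defined by the API.

```python
import os


def auth_headers(env_var: str = "NABZCLAN_API_TOKEN") -> dict:
    """Build request headers from a token stored in an environment variable.

    The variable name is a hypothetical convention; use whatever your
    deployment already provides.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your Developer API token")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


# Usage: headers = auth_headers()
```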

## When to Use It

- **App narration**: Read articles, stories, guides, onboarding screens, or generated AI responses aloud.
- **Accessibility**: Add spoken output for users who prefer audio or need screen-reader-style content.
- **Customer support**: Generate voice replies for chatbots, support flows, IVR menus, and help center content.
- **Education**: Create pronunciation examples, language-learning clips, flashcards, and lesson audio.
- **Creator tools**: Generate voiceovers for reels, shorts, podcasts, announcements, and product demos.
- **Notifications**: Turn alerts, status changes, moderation results, or monitoring messages into audio.

## Request Parameters

### Required

| Field | Type | Description |
|-------|------|-------------|
| `model` | string | Voice/model ID to use, such as `v1-capa-tts-nabzclan/en-AU-NatashaNeural`. |
| `input` | string | Text to convert into speech. Maximum 50,000 characters. |

### Optional

| Field | Type | Description |
|-------|------|-------------|
| `response_format` | string | Output format: `mp3`, `wav`, `opus`, `aac`, `flac`, `pcm`, or `json`. Use `json` to receive base64-encoded audio wrapped in JSON. The examples on this page pass it in the query string. |
| `voice` | string | Voice override for advanced voice routing. Usually not needed for `v1-capa-tts-nabzclan/...` models. |
| `speed` | number | Playback speed from `0.25` to `4`. Use `1` for normal speed. |
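
Because failed requests still count toward the AI request limit, it can help to check values against the documented limits before sending. The sketch below is one way to do that; `build_request` is a hypothetical helper, and treating an empty `model` or `input` as invalid is an assumption about how the server validates required fields. It returns `response_format` as a query parameter, mirroring the examples on this page.

```python
from typing import Optional, Tuple

# Limits copied from the parameter tables above.
ALLOWED_FORMATS = {"mp3", "wav", "opus", "aac", "flac", "pcm", "json"}
MAX_INPUT_CHARS = 50_000
MIN_SPEED, MAX_SPEED = 0.25, 4


def build_request(
    model: str,
    text: str,
    speed: float = 1.0,
    response_format: Optional[str] = None,
) -> Tuple[dict, dict]:
    """Return (json_body, query_params), raising ValueError on out-of-range values."""
    if not model:
        raise ValueError("model is required")
    if not text:
        raise ValueError("input is required")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input has {len(text)} characters; the maximum is {MAX_INPUT_CHARS}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    if response_format is not None and response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")

    body = {"model": model, "input": text, "speed": speed}
    params = {"response_format": response_format} if response_format else {}
    return body, params


# Usage with the requests library (not executed here):
#   body, params = build_request(voice_id, "Hello", speed=1.25, response_format="wav")
#   requests.post(url, headers=headers, json=body, params=params, timeout=60)
```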

## Response Modes

### Raw Audio

By default, the API returns audio bytes. Save the response body to a file, stream it to a media player, or send it directly to a client.

```bash
curl -X POST https://developer.nabzclan.vip/api/ai/v1/audio/speech \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "v1-capa-tts-nabzclan/en-AU-NatashaNeural",
    "input": "Hello, this is a text to speech test."
  }' \
  --output speech.mp3
```

### JSON Base64

Pass `response_format=json` in the query string to receive a JSON response that carries the generated audio as base64 data instead of raw bytes.

```bash
curl -X POST "https://developer.nabzclan.vip/api/ai/v1/audio/speech?response_format=json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "v1-capa-tts-nabzclan/en-AU-NatashaNeural",
    "input": "Hello, this is a text to speech test."
  }'
```

Use JSON/base64 mode when:

- You are calling from a serverless function that prefers JSON payloads.
- You need to store the audio inside a database record or queue message.
- Your client cannot easily handle binary responses.
- You want a single API response containing metadata and audio data.
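
As a rough illustration of the storage and queue cases above, the sketch below wraps base64 audio in a JSON message and recovers the bytes later. The field names `voice_id` and `audio_base64` are made up for this example and are not fields of the API response, and stand-in bytes replace a real API call so the snippet runs offline.

```python
import base64
import json


def to_queue_message(audio_base64: str, voice_id: str) -> str:
    """Serialize base64 audio plus a little metadata into a JSON string."""
    # Illustrative field names, not part of the Capa TTS response.
    return json.dumps({"voice_id": voice_id, "audio_base64": audio_base64})


def audio_from_queue_message(message: str) -> bytes:
    """Recover the raw audio bytes from a message built by to_queue_message."""
    record = json.loads(message)
    return base64.b64decode(record["audio_base64"])


# Stand-in bytes so the sketch runs without calling the API.
fake_audio = b"\x00\x01\x02" * 1000
encoded = base64.b64encode(fake_audio).decode("ascii")

message = to_queue_message(encoded, "v1-capa-tts-nabzclan/en-AU-NatashaNeural")
assert audio_from_queue_message(message) == fake_audio

# base64 text is about 4/3 the size of the raw bytes.
print(len(fake_audio), len(encoded))  # prints "3000 4000"
```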

## Voice IDs

Voice IDs should be sent in the `model` field. The current format is:

```text
v1-capa-tts-nabzclan/{locale}-{VoiceName}Neural
```

Examples:

| Voice ID | Locale | Style |
|----------|--------|-------|
| `v1-capa-tts-nabzclan/en-AU-NatashaNeural` | English Australia | Female, natural general-purpose voice |
| `v1-capa-tts-nabzclan/en-US-GuyNeural` | English United States | Male, confident general-purpose voice |
| `v1-capa-tts-nabzclan/en-US-EmmaMultilingualNeural` | English United States | Female, multilingual voice |
| `v1-capa-tts-nabzclan/es-MX-JorgeNeural` | Spanish Mexico | Male Spanish voice |
| `v1-capa-tts-nabzclan/es-AR-ElenaNeural` | Spanish Argentina | Female Spanish voice |
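
The format above can be assembled or checked with a small helper. This is a sketch only: `build_voice_id` and `parse_voice_id` are hypothetical names, and the pattern assumes a two-letter language plus two-letter region, which matches every voice listed on this page but may not hold for future voices.

```python
import re
from typing import Tuple

PREFIX = "v1-capa-tts-nabzclan/"

# Locale such as en-AU, then the voice name, then the "Neural" suffix.
VOICE_ID_PATTERN = re.compile(r"^v1-capa-tts-nabzclan/([a-z]{2}-[A-Z]{2})-([A-Za-z]+)Neural$")


def build_voice_id(locale: str, voice_name: str) -> str:
    """Assemble a voice ID from a locale and a voice name."""
    return f"{PREFIX}{locale}-{voice_name}Neural"


def parse_voice_id(voice_id: str) -> Tuple[str, str]:
    """Split a voice ID into (locale, voice_name), raising ValueError if it does not fit."""
    match = VOICE_ID_PATTERN.match(voice_id)
    if match is None:
        raise ValueError(f"not a Capa TTS voice ID: {voice_id!r}")
    return match.group(1), match.group(2)


print(build_voice_id("en-AU", "Natasha"))
print(parse_voice_id("v1-capa-tts-nabzclan/en-US-EmmaMultilingualNeural"))
```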

## Available Voices

This list shows the currently available Capa TTS voices. Send these IDs in the `model` field using the `v1-capa-tts-nabzclan/...` format.

Need another voice or locale? Open a ticket at [helpdesk.nabzclan.vip](https://helpdesk.nabzclan.vip/) with the voice you want, and the Nabzclan team will review it and add it when possible.

Each voice includes a sample generated with this phrase:

```text
Hello, this is a voice test generated with the Capa text to speech model by nabzclan.
```

### English

| Voice ID | Region | Suggested use | Sample |
|----------|--------|---------------|--------|
| `v1-capa-tts-nabzclan/en-AU-NatashaNeural` | Australia | Australian female voice for apps, narration, and support | [Play MP3](/audio/tts-voice-tests/en-AU-NatashaNeural.mp3) |
| `v1-capa-tts-nabzclan/en-CA-ClaraNeural` | Canada | Canadian female voice for product and support audio | [Play MP3](/audio/tts-voice-tests/en-CA-ClaraNeural.mp3) |
| `v1-capa-tts-nabzclan/en-CA-LiamNeural` | Canada | Canadian male voice for narration and announcements | [Play MP3](/audio/tts-voice-tests/en-CA-LiamNeural.mp3) |
| `v1-capa-tts-nabzclan/en-HK-YanNeural` | Hong Kong | Hong Kong English female voice | [Play MP3](/audio/tts-voice-tests/en-HK-YanNeural.mp3) |
| `v1-capa-tts-nabzclan/en-HK-SamNeural` | Hong Kong | Hong Kong English male voice | [Play MP3](/audio/tts-voice-tests/en-HK-SamNeural.mp3) |
| `v1-capa-tts-nabzclan/en-IN-NeerjaExpressiveNeural` | India | Expressive Indian English female voice | [Play MP3](/audio/tts-voice-tests/en-IN-NeerjaExpressiveNeural.mp3) |
| `v1-capa-tts-nabzclan/en-IN-PrabhatNeural` | India | Indian English male voice | [Play MP3](/audio/tts-voice-tests/en-IN-PrabhatNeural.mp3) |
| `v1-capa-tts-nabzclan/en-NZ-MitchellNeural` | New Zealand | New Zealand male voice | [Play MP3](/audio/tts-voice-tests/en-NZ-MitchellNeural.mp3) |
| `v1-capa-tts-nabzclan/en-NZ-MollyNeural` | New Zealand | New Zealand female voice | [Play MP3](/audio/tts-voice-tests/en-NZ-MollyNeural.mp3) |
| `v1-capa-tts-nabzclan/en-US-MichelleNeural` | United States | US female voice for assistants and narration | [Play MP3](/audio/tts-voice-tests/en-US-MichelleNeural.mp3) |
| `v1-capa-tts-nabzclan/en-US-EmmaMultilingualNeural` | United States | US multilingual female voice for mixed-language content | [Play MP3](/audio/tts-voice-tests/en-US-EmmaMultilingualNeural.mp3) |
| `v1-capa-tts-nabzclan/en-US-GuyNeural` | United States | US male voice for explainers and announcements | [Play MP3](/audio/tts-voice-tests/en-US-GuyNeural.mp3) |
| `v1-capa-tts-nabzclan/en-US-ChristopherNeural` | United States | US male voice for product and support audio | [Play MP3](/audio/tts-voice-tests/en-US-ChristopherNeural.mp3) |

### Spanish

| Voice ID | Region | Suggested use | Sample |
|----------|--------|---------------|--------|
| `v1-capa-tts-nabzclan/es-AR-ElenaNeural` | Argentina | Argentine Spanish female voice | [Play MP3](/audio/tts-voice-tests/es-AR-ElenaNeural.mp3) |
| `v1-capa-tts-nabzclan/es-MX-JorgeNeural` | Mexico | Mexican Spanish male voice | [Play MP3](/audio/tts-voice-tests/es-MX-JorgeNeural.mp3) |
| `v1-capa-tts-nabzclan/es-PR-KarinaNeural` | Puerto Rico | Puerto Rican Spanish female voice | [Play MP3](/audio/tts-voice-tests/es-PR-KarinaNeural.mp3) |
| `v1-capa-tts-nabzclan/es-ES-AlvaroNeural` | Spain | Spanish (Spain) male voice for narration | [Play MP3](/audio/tts-voice-tests/es-ES-AlvaroNeural.mp3) |
| `v1-capa-tts-nabzclan/es-DO-RamonaNeural` | Dominican Republic | Dominican Spanish female voice | [Play MP3](/audio/tts-voice-tests/es-DO-RamonaNeural.mp3) |
| `v1-capa-tts-nabzclan/es-DO-EmilioNeural` | Dominican Republic | Dominican Spanish male voice | [Play MP3](/audio/tts-voice-tests/es-DO-EmilioNeural.mp3) |

## Choosing a Voice

- **For apps and assistants**: Use clear, neutral voices such as `v1-capa-tts-nabzclan/en-AU-NatashaNeural`, `v1-capa-tts-nabzclan/en-US-MichelleNeural`, or `v1-capa-tts-nabzclan/en-US-GuyNeural`.
- **For expressive narration**: Pick voices with stronger delivery such as `v1-capa-tts-nabzclan/en-IN-NeerjaExpressiveNeural` or `v1-capa-tts-nabzclan/en-US-ChristopherNeural`.
- **For multilingual content**: Prefer multilingual voices when available, such as `v1-capa-tts-nabzclan/en-US-EmmaMultilingualNeural`.
- **For regional products**: Match the locale to your user base, for example `en-AU` for Australia, `en-CA` for Canada, `en-HK` for Hong Kong, or `es-MX` for Mexico.
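
Locale matching can be automated with a lookup table. The sketch below uses voice IDs from the tables above; the choice of one voice per locale, the fallback order (exact locale, then same language, then a default), and the `pick_voice` helper are all assumptions made for illustration.

```python
# One voice per locale, chosen arbitrarily from the Available Voices tables.
VOICES_BY_LOCALE = {
    "en-AU": "v1-capa-tts-nabzclan/en-AU-NatashaNeural",
    "en-CA": "v1-capa-tts-nabzclan/en-CA-ClaraNeural",
    "en-HK": "v1-capa-tts-nabzclan/en-HK-YanNeural",
    "en-US": "v1-capa-tts-nabzclan/en-US-MichelleNeural",
    "es-MX": "v1-capa-tts-nabzclan/es-MX-JorgeNeural",
    "es-ES": "v1-capa-tts-nabzclan/es-ES-AlvaroNeural",
}

# Used when neither the locale nor the language is in the table.
DEFAULT_VOICE = "v1-capa-tts-nabzclan/en-US-MichelleNeural"


def pick_voice(user_locale: str) -> str:
    """Return a voice for the exact locale, else one sharing its language, else the default."""
    if user_locale in VOICES_BY_LOCALE:
        return VOICES_BY_LOCALE[user_locale]
    language = user_locale.split("-")[0]
    # Dicts keep insertion order, so the first listed voice for a language wins.
    for locale, voice in VOICES_BY_LOCALE.items():
        if locale.split("-")[0] == language:
            return voice
    return DEFAULT_VOICE


print(pick_voice("en-CA"))
print(pick_voice("es-CL"))
print(pick_voice("fr-FR"))
```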

## Examples

### Generate an MP3 File

```bash
curl -X POST https://developer.nabzclan.vip/api/ai/v1/audio/speech \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "v1-capa-tts-nabzclan/en-AU-NatashaNeural",
    "input": "Welcome to Nabzclan Developer. Your audio is ready."
  }' \
  --output speech.mp3
```

### Generate JSON/Base64 Audio

```bash
curl -X POST "https://developer.nabzclan.vip/api/ai/v1/audio/speech?response_format=json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "v1-capa-tts-nabzclan/en-AU-NatashaNeural",
    "input": "Hello, this is a text to speech test."
  }'
```

### Change Audio Format

```bash
curl -X POST "https://developer.nabzclan.vip/api/ai/v1/audio/speech?response_format=wav" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "v1-capa-tts-nabzclan/en-US-MichelleNeural",
    "input": "This file will be returned as WAV audio."
  }' \
  --output speech.wav
```

### Python: Save MP3

```python
import requests

url = "https://developer.nabzclan.vip/api/ai/v1/audio/speech"
headers = {
    "Authorization": "Bearer YOUR_API_TOKEN",
    "Content-Type": "application/json",
}
payload = {
    "model": "v1-capa-tts-nabzclan/en-AU-NatashaNeural",
    "input": "Hello, this is a text to speech test.",
}

# Set a timeout so a stalled connection cannot hang the script.
response = requests.post(url, headers=headers, json=payload, timeout=60)
response.raise_for_status()

with open("speech.mp3", "wb") as audio_file:
    audio_file.write(response.content)
```

### Python: Decode JSON/Base64

Check the response keys, then decode the base64 audio field returned by Capa TTS.

```python
import base64
import requests

url = "https://developer.nabzclan.vip/api/ai/v1/audio/speech?response_format=json"
headers = {
    "Authorization": "Bearer YOUR_API_TOKEN",
    "Content-Type": "application/json",
}
payload = {
    "model": "v1-capa-tts-nabzclan/en-AU-NatashaNeural",
    "input": "Hello, this is a text to speech test.",
}

# Set a timeout so a stalled connection cannot hang the script.
response = requests.post(url, headers=headers, json=payload, timeout=60)
response.raise_for_status()
data = response.json()

# Common names are audio, data, base64, or b64_json.
audio_base64 = data.get("audio") or data.get("data") or data.get("base64") or data.get("b64_json")
if not audio_base64:
    raise ValueError(f"No base64 audio field found. Response keys: {list(data.keys())}")

with open("speech.mp3", "wb") as audio_file:
    audio_file.write(base64.b64decode(audio_base64))
```

### JavaScript: Browser Playback

```javascript
const response = await fetch("https://developer.nabzclan.vip/api/ai/v1/audio/speech", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_TOKEN",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "v1-capa-tts-nabzclan/en-US-EmmaMultilingualNeural",
    input: "This audio plays directly in the browser.",
  }),
});

if (!response.ok) {
  throw new Error(`TTS failed: ${response.status}`);
}

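// Wrap the audio bytes in an object URL the browser can play.
const audioBlob = await response.blob();
const audioUrl = URL.createObjectURL(audioBlob);
const audio = new Audio(audioUrl);
// Release the object URL once playback ends so the blob can be freed.
audio.addEventListener("ended", () => URL.revokeObjectURL(audioUrl), { once: true });
// Browsers may reject play() unless it follows a user gesture such as a click.
await audio.play();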
```

### Node.js: Save Audio

```javascript
import fs from "node:fs/promises";

const response = await fetch("https://developer.nabzclan.vip/api/ai/v1/audio/speech", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_TOKEN",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "v1-capa-tts-nabzclan/en-US-ChristopherNeural",
    input: "Your generated narration is ready.",
  }),
});

if (!response.ok) {
  throw new Error(await response.text());
}

const audio = Buffer.from(await response.arrayBuffer());
await fs.writeFile("speech.mp3", audio);
```

## Best Practices

- Keep text clean: strip markup from the input unless your chosen voice supports it.
- Split very long scripts into smaller chunks for easier retrying and editing.
- Cache repeated audio so you do not regenerate the same phrase every time.
- Use JSON/base64 only when needed: base64 encoding adds roughly 33% to the payload size, and raw audio is simpler for direct downloads.
- Match the locale of the voice to the language and region of the text.
- Test multiple voices for long-form content because pacing and tone can change listener comfort.
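
The chunking and caching advice above could be implemented along these lines. This is a sketch under stated assumptions: `split_script` breaks on sentence-ending punctuation with an arbitrary 2,000-character default, and `cache_key` hashes the request settings with SHA-256 so identical requests map to the same cached file. Neither helper is part of the API.

```python
import hashlib
import json
import re
from typing import List


def split_script(text: str, max_chars: int = 2000) -> List[str]:
    """Split text into chunks of at most max_chars, breaking after sentence endings."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than max_chars is cut at the limit.
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks


def cache_key(model: str, text: str, speed: float, response_format: str) -> str:
    """Return a stable hex digest for one combination of request settings."""
    settings = json.dumps([model, text, speed, response_format], ensure_ascii=False)
    return hashlib.sha256(settings.encode("utf-8")).hexdigest()


print(split_script("One. Two. Three.", max_chars=9))  # prints ['One. Two.', 'Three.']
```

Each chunk can then be sent as its own request, with the digest used as a file name or database key for the cached audio.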

## Errors

| Status | Meaning | Fix |
|--------|---------|-----|
| `401` | Missing or invalid API token | Send `Authorization: Bearer YOUR_API_TOKEN`. |
| `422` | Validation failed | Check `model`, `input`, `response_format`, and `speed`. |
| `429` | AI rate limit exceeded | Wait for the limit to reset or upgrade your plan. |
| `500` | Internal server error | Retry later; contact support if it continues. |
| `502` | Voice generation temporarily unavailable | Retry later; contact support if it continues. |
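
One way to act on the "retry later" guidance is a small retry loop with exponential backoff. The sketch below is an assumption-heavy illustration: the page does not specify reset times or a recommended backoff schedule, `post_with_retry` is a hypothetical helper, and `send` is any zero-argument callable returning an object with a `status_code` attribute (for example a lambda wrapping `requests.post`). Keep `max_attempts` small, since failed requests also count toward the AI request limit.

```python
import time
from typing import Any, Callable

# Statuses the error table suggests retrying; 401 and 422 need a fix, not a retry.
RETRYABLE_STATUSES = {429, 500, 502}


def post_with_retry(
    send: Callable[[], Any],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], Any] = time.sleep,
) -> Any:
    """Call send() until it returns a non-retryable status or attempts run out."""
    response = None
    for attempt in range(1, max_attempts + 1):
        response = send()
        if response.status_code not in RETRYABLE_STATUSES or attempt == max_attempts:
            return response
        # Wait 1x, 2x, 4x, ... the base delay between attempts.
        sleep(base_delay * 2 ** (attempt - 1))
    return response


# Usage with the requests library (not executed here):
#   response = post_with_retry(
#       lambda: requests.post(url, headers=headers, json=payload, timeout=60)
#   )
```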

## Rate Limiting

Text to Speech is an AI endpoint. Every authenticated request, whether it succeeds or fails, counts as an AI API request toward your plan's AI request limit.
