# Voice Agents

> AI-powered voice agents for inbound and outbound calls — locked stack, multi-tenant, Premium tier.

> **Rolling out:** Voice agents are currently in limited general availability for **Premium** and **Enterprise** plans. Free / Starter / Pro plans do not include voice features. Voice changes ship to the `staging` branch first; production deployment is gated behind the `voiceAgents` feature flag per organization.

Kaanha AI voice agents are full-duplex, multi-turn AI callers — they answer your inbound numbers, place outbound calls, run on a fixed AI provider stack with predictable unit economics, and share the same prompt + tools system as your text chatbots.

## Provider Stack (Locked April 24, 2026)

The AI provider stack is **locked**. There is no per-tenant BYOK on AI; the platform owns the provider relationships and charges per-minute markup. Customers can pick **Sarvam** (Indic languages) or **Deepgram Aura** (English) for TTS — every other layer is fixed.

| Layer                    | Provider | Model                              | Cost            |
| ------------------------ | -------- | ---------------------------------- | --------------- |
| **STT** (speech-to-text) | Sarvam   | `saarika:v2.5`                     | \$0.0055 / min  |
| **LLM** (reasoning)      | Google   | Gemini 1.5 Flash                   | \$0.0004 / call |
| **TTS — Indic**          | Sarvam   | `bulbul:v2` (anushka, abhilash, …) | \$0.018 / call  |
| **TTS — English**        | Deepgram | `aura-2-*`                         | \$0.030 / call  |

**Cost per 3-minute call: \$0.035–0.050. Margin at \$0.12 / min: 65–75%.**

> Why locked? Predictable unit economics. The platform eats the AI cost so per-minute pricing stays flat for every customer. See `src/lib/voice/agent-builder.js` for the rationale comment.

### Engine

The deployed voice engine is **Bolna 0.10.1** (PyPI) wrapped by a custom `server.py` (\~1,028 lines) that adds exactly two things on top of stock Bolna:

1. **TwilioOutputHandler chunking patch** — splits TTS output into 20 ms (160-byte) mu-law frames before sending to Twilio. Without this, large welcome-message audio chunks play as choppy / breaking audio.
2. **Welcome message pre-synthesis** — calls the TTS provider at agent-create time, base64-encodes the audio, and caches it in Redis under `welcome_audio_cache:v3:{hash}` with a 30-day TTL. Supports all three TTS providers the engine can route to: **Sarvam** and **Deepgram Aura** (the customer-selectable locked stack), plus **ElevenLabs** (`voicePremium`, admin-grant-only).
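
The frame arithmetic behind the chunking patch can be sketched as follows. This is a minimal illustration, not the code in `server.py`: it assumes the audio is already 8 kHz, 8-bit mu-law (so 20 ms is 160 bytes), and the function name `chunk_mulaw` and the silence padding of the final short frame are assumptions made for the example.

```python
FRAME_BYTES = 160          # 20 ms of 8 kHz, 8-bit mu-law audio
MULAW_SILENCE = b"\xff"    # mu-law encoding of a zero-amplitude sample


def chunk_mulaw(audio: bytes, pad_last: bool = True) -> list[bytes]:
    """Split mu-law audio into fixed 160-byte (20 ms) frames."""
    frames = [audio[i:i + FRAME_BYTES] for i in range(0, len(audio), FRAME_BYTES)]
    if pad_last and frames and len(frames[-1]) < FRAME_BYTES:
        # Assumption: pad the trailing partial frame with silence so every
        # frame sent downstream has the same duration.
        missing = FRAME_BYTES - len(frames[-1])
        frames[-1] = frames[-1] + MULAW_SILENCE * missing
    return frames


# 26,006 bytes is the mu-law payload size of a typical welcome message.
frames = chunk_mulaw(bytes(26006))
print(len(frames), len(frames) * 20 / 1000)  # 163 3.26
```

A 26,006-byte payload therefore becomes 163 frames, or about 3.3 seconds of audio.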

## What Voice Agents Do

* **Answer inbound calls** routed from a Twilio number you assign to the agent.
* **Place outbound calls** from the dialer or via the API, subject to TCPA consent and DNC scrubbing.
* **Multi-turn conversation** — full STT → LLM → TTS loop with interruption handling, end-of-utterance detection, and configurable silence timeouts.
* **Tool calls** — invoke functions you define (e.g. `book_appointment`, `lookup_order`, `escalate_to_human`) mid-call. Tool calls are HMAC-verified.
* **Real-time transcription** — every utterance persisted to the database; visible in `/chat` once the call ends.
* **Multi-language** — Sarvam for Hindi + Indic languages, Deepgram Aura for English. Auto-routed by your voice selection.

## Setting Up a Voice Agent

Navigate to `/voice-agents` and click **+ New Voice Agent**. The wizard walks you through:

1. **Name + description** — internal labels, not spoken to the caller.
2. **Welcome message** — the first thing the agent says when the call connects (e.g. "Hi, this is Riya from Acme. How can I help?"). Pre-synthesized and cached on save.
3. **Voice picker** — choose **Sarvam** (Indic) or **Deepgram Aura** (English). The picker auto-routes to the right TTS provider.
4. **Voice ID** — for Sarvam: `anushka`, `abhilash`, etc. For Deepgram: `aura-2-thalia-en`, `aura-2-asteria-en`, etc. Voices are previewable in the UI.
5. **System prompt** — the LLM behavior contract. Same format as your text chatbots; see [AI Behavior & System Prompts](#ai-behavior--system-prompts).
6. **Tools (optional)** — function definitions the agent can call mid-conversation.
7. **Save** — the platform writes the `VoiceAgent` row to the database, then HTTP-POSTs the merged config to the Bolna engine, which caches it in Redis under the agent ID.

The engine is now ready to handle calls for that agent.

## Phone Number Assignment

Voice agents need a phone number to receive inbound calls or to place outbound calls.

1. Go to `/phone-numbers`.
2. Click **Buy Number** — the platform purchases a Twilio number on your behalf (subject to your plan's allowance).
3. Assign the number to a voice agent from the dropdown.
4. Twilio's voice webhook is auto-configured to route to the engine.

You can reassign or release numbers at any time. Released numbers are deprovisioned at Twilio.

## Inbound Calls

When a caller dials your assigned Twilio number:

```
Caller's Phone
   ↓
Twilio
   ↓ POST /api/webhooks/twilio/voice (signature-verified)
Next.js (resolves AccountSid → TelephonyConnection → org → assigned VoiceAgent)
   ↓ TwiML <Stream> response
Bolna voice engine (WebSocket)
   ↓ STT → LLM → TTS loop (20 ms mu-law frames)
Caller hears the agent
```

Twilio signature verification is **fail-closed** on the inbound webhook. Calls from unknown sources are rejected.
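
For reference, Twilio's published signature scheme for form-encoded webhooks is an HMAC-SHA1 over the full request URL followed by every POST parameter (name then value, sorted by name), base64-encoded and sent in the `X-Twilio-Signature` header. The sketch below shows a fail-closed check in that style; it is an illustration, not the platform's route handler, and the token, URL, and parameter values are placeholders.

```python
import base64
import hashlib
import hmac
from typing import Optional


def twilio_signature(auth_token: str, url: str, params: dict[str, str]) -> str:
    """Compute the value Twilio sends in X-Twilio-Signature for a form POST."""
    payload = url + "".join(name + params[name] for name in sorted(params))
    digest = hmac.new(auth_token.encode(), payload.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def is_valid_twilio_request(
    auth_token: str, url: str, params: dict[str, str], header: Optional[str]
) -> bool:
    """Fail closed: a missing or mismatched signature rejects the request."""
    if not header:
        return False
    expected = twilio_signature(auth_token, url, params)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), header.encode())


# Placeholder values for illustration only.
AUTH_TOKEN = "example-auth-token"
URL = "https://app.example.com/api/webhooks/twilio/voice"
PARAMS = {"AccountSid": "ACexample", "CallSid": "CAexample", "To": "+14155550100"}

good = twilio_signature(AUTH_TOKEN, URL, PARAMS)
print(is_valid_twilio_request(AUTH_TOKEN, URL, PARAMS, good))  # True
print(is_valid_twilio_request(AUTH_TOKEN, URL, PARAMS, None))  # False
```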

## Outbound Calls

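Outbound calls require prior consent from the recipient and DNC scrubbing of the destination number. Both are currently the customer's responsibility; see [Compliance Considerations](#compliance-considerations).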

1. Open a voice agent at `/voice-agents/[id]`.
2. Click **Place Call** in the dialer panel.
3. Enter the destination number in E.164 format (e.g. `+14155550100`).
4. Optionally pre-load context (contact ID, deal ID) so the agent has memory of the relationship.
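
If you collect destination numbers programmatically, an E.164 check along these lines can catch formatting mistakes before a call is placed. It is a minimal sketch: the helper name `is_e164` is an assumption, and the pattern only checks the shape of the number (a `+`, a non-zero leading digit, at most 15 digits in total), not whether the number is assigned or reachable.

```python
import re

# "+" followed by 2 to 15 digits, the first of which is not zero.
E164_PATTERN = re.compile(r"\+[1-9][0-9]{1,14}")


def is_e164(number: str) -> bool:
    """Return True if the string has the shape of an E.164 phone number."""
    return E164_PATTERN.fullmatch(number) is not None


print(is_e164("+14155550100"))     # True
print(is_e164("4155550100"))       # False: missing "+"
print(is_e164("+1 415 555 0100"))  # False: contains spaces
```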

> **TCPA — required reading.** Outbound calls in the US require **prior express written consent** from the recipient. The platform records consent metadata on each contact, but it is the customer's legal responsibility to verify consent before dialing. Consent records survive contact archival; see [Compliance Considerations](#compliance-considerations).

Bulk outbound campaigns are not yet exposed in the UI — they are on the [Roadmap](#roadmap).

## AI Behavior & System Prompts

Voice agents share Kaanha AI's text-chatbot prompt format. A good voice prompt:

* States the agent's **identity** ("You are Riya, a sales rep for Acme Inc.").
* Defines the **task scope** (Meta's January 2026 policy requires task-specific agents — generic "do anything" prompts are not compliant on WhatsApp; same principle applies to voice).
* Lists **what the agent must NOT do** (don't promise refunds, don't quote prices, etc.).
* Sets **escalation triggers** ("If the caller asks for a human, immediately call the `escalate_to_human` tool.").

Example:

```
You are Riya, the AI receptionist for Acme Plumbing. Your only job is
to qualify the caller's plumbing emergency and book an appointment via
the `book_appointment` tool.

Do NOT quote prices, offer discounts, or make promises about technician
arrival times. If the caller asks anything outside scope, politely
redirect or call `escalate_to_human`.

Speak in short, natural sentences. Pause for the caller to respond.
```

Debug a misbehaving agent by reading the call transcript at `/voice-agents/[id]/calls/[callId]` — every STT input, LLM token stream, and tool invocation is logged.

## Voice Selection

| Family                  | Use for                                                                                                | Voices                                                        |
| ----------------------- | ------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------- |
| **Sarvam `bulbul:v2`**  | Hindi + Indic languages (Tamil, Telugu, Bengali, Marathi, Gujarati, Punjabi, Kannada, Malayalam, Odia) | `anushka`, `abhilash`, plus per-language defaults             |
| **Deepgram `aura-2-*`** | English (US, UK, Australian)                                                                           | `aura-2-thalia-en`, `aura-2-asteria-en`, `aura-2-orion-en`, … |

The voice picker UI auto-suggests the right provider based on the agent's primary language. Switching language updates the available voice IDs without losing the rest of your config.

## Welcome Message Pre-Synthesis

When you save a voice agent, the engine pre-synthesizes the welcome message and caches it in Redis (30-day TTL). The caller then hears a clean, immediate greeting while the live STT/LLM/TTS loop spins up. Pre-synth works with all three supported TTS providers, including the admin-grant-only ElevenLabs option:

| TTS Provider              | Pre-synth | Notes                          |
| ------------------------- | --------- | ------------------------------ |
| Sarvam `bulbul:v2`        | ✅         | Hindi + Indic languages        |
| Deepgram Aura-2           | ✅         | English                        |
| ElevenLabs (voicePremium) | ✅         | Multilingual; admin-grant-only |

Verified clean \~3-second playback in production logs:

```
[CHUNKED] handle() sent 163 frame(s) format=pcm input_bytes=52012
mulaw_bytes=26006 category=agent_welcome_message
```
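
The caching step can be pictured with the sketch below. Only the key prefix `welcome_audio_cache:v3:`, the base64 encoding, and the 30-day TTL come from this page. What the engine actually hashes is not documented here, so hashing the provider, voice ID, and welcome text with SHA-256 is an assumption, as are the helper names.

```python
import base64
import hashlib

WELCOME_CACHE_TTL_SECONDS = 30 * 24 * 60 * 60  # 30 days


def welcome_cache_key(provider: str, voice_id: str, text: str) -> str:
    """Build a cache key that changes whenever provider, voice, or text changes."""
    # Assumption: the hash input is provider + voice + text; the real engine
    # may hash different fields.
    material = "\x1f".join([provider, voice_id, text]).encode("utf-8")
    return "welcome_audio_cache:v3:" + hashlib.sha256(material).hexdigest()


def encode_for_cache(audio: bytes) -> str:
    """Base64-encode synthesized audio so it can be stored as a string value."""
    return base64.b64encode(audio).decode("ascii")


key = welcome_cache_key("sarvam", "anushka", "Hi, this is Riya from Acme.")
value = encode_for_cache(b"\x00\x01\x02")  # stand-in for synthesized audio bytes

# With redis-py, the write would then look like:
#   redis_client.set(key, value, ex=WELCOME_CACHE_TTL_SECONDS)
print(key[:23], WELCOME_CACHE_TTL_SECONDS)
```

Deriving the key from the inputs means an edited welcome message or a changed voice lands under a new key rather than replaying stale audio.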

## Tool Calls

Tools let the agent invoke real backend functions mid-conversation. Define each tool with a JSON schema (same format as OpenAI / Anthropic function calling):

```json
{
  "name": "book_appointment",
  "description": "Book a service appointment for the caller.",
  "parameters": {
    "type": "object",
    "properties": {
      "service_type": {
        "type": "string",
        "enum": ["leak", "blockage", "installation", "inspection"]
      },
      "preferred_window": {
        "type": "string",
        "description": "ISO 8601 datetime range, e.g. 2026-04-27T09:00/2026-04-27T11:00"
      },
      "callback_phone": { "type": "string" }
    },
    "required": ["service_type", "preferred_window"]
  }
}
```

When the LLM calls a tool, the engine HTTP-POSTs to `/api/voice-agents/tool-call` with an HMAC signature derived from `ENGINE_WS_SECRET`. The route verifies the signature, looks up the tool handler for that organization, executes it, and returns the result to the engine — which feeds it back into the LLM context.
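
A signature check of this kind typically looks like the sketch below. The exact scheme is not documented on this page, so treat every detail as an assumption: HMAC-SHA256 over the raw request body, hex-encoded, with `ENGINE_WS_SECRET` as the key. The function names are hypothetical.

```python
import hashlib
import hmac
from typing import Optional


def sign_body(secret: str, raw_body: bytes) -> str:
    """Sign the raw request body (assumed scheme: HMAC-SHA256, hex digest)."""
    return hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()


def verify_tool_call(secret: str, raw_body: bytes, signature: Optional[str]) -> bool:
    """Reject the call unless the signature matches the body exactly."""
    if not secret or not signature:
        return False  # fail closed when either side is missing
    expected = sign_body(secret, raw_body)
    return hmac.compare_digest(expected.encode(), signature.encode())


secret = "example-engine-secret"  # placeholder for ENGINE_WS_SECRET
body = b'{"tool": "book_appointment", "arguments": {"service_type": "leak"}}'
signature = sign_body(secret, body)

print(verify_tool_call(secret, body, signature))         # True
print(verify_tool_call(secret, body + b" ", signature))  # False
```

Signing the raw bytes rather than a re-serialized JSON object avoids mismatches caused by key ordering or whitespace differences between the engine and the app.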

Tools are scoped to the organization. There is no cross-tenant tool execution.

## Conversation Persistence

Every call generates a transcript and (optionally) a recording.

| Asset                      | Default retention | Where to find it                                                                           |
| -------------------------- | ----------------- | ------------------------------------------------------------------------------------------ |
| Transcript (per-utterance) | 90 days           | `/chat` → conversation thread (linked to the contact's phone) + `/voice-agents/[id]/calls` |
| Call recording (audio)     | 90 days           | Same as above, when recording is enabled on the agent                                      |
| Tool-call audit log        | 365 days          | `AuditLog` table; visible to org admins via `/audit`                                       |

Retention is configurable per organization for Enterprise plans.
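
As a worked example of the defaults above, the sketch below computes when an asset created on a given date would become eligible for purging. The asset keys and the `overrides` parameter (standing in for an Enterprise per-organization setting) are hypothetical names chosen for the example.

```python
from datetime import date, timedelta
from typing import Optional

# Default retention periods from the table above, in days.
DEFAULT_RETENTION_DAYS = {"transcript": 90, "recording": 90, "tool_call_audit": 365}


def purge_date(asset: str, created: date, overrides: Optional[dict] = None) -> date:
    """Return the first date on which the asset is past its retention period."""
    days = {**DEFAULT_RETENTION_DAYS, **(overrides or {})}[asset]
    return created + timedelta(days=days)


created = date(2026, 4, 24)
print(purge_date("transcript", created))                      # 2026-07-23
print(purge_date("tool_call_audit", created))                 # 2027-04-24
print(purge_date("transcript", created, {"transcript": 30}))  # 2026-05-24
```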

## AI Disclosure & Human Escalation

Per Meta's January 2026 policy and the EU AI Act's "limited risk" classification, voice agents must:

* **Play a pre-AI disclosure** at call start (e.g. "This call may be handled by an AI assistant."). This is enforced platform-wide and cannot be disabled.
* **Honor `HUMAN` keyword routing** — if the caller says "human", "agent", "representative", or any other configured escalation phrase, the agent calls the escalation tool and the call is transferred to a configured human destination (or queued, if no human is available).
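
To make the keyword rule concrete, here is a minimal sketch of matching a transcribed utterance against the configured escalation phrases. It is an illustration only: this page describes the agent itself calling the escalation tool, so the deterministic matcher and the name `wants_human` are assumptions, not the platform's mechanism.

```python
import re

DEFAULT_PHRASES = ("human", "agent", "representative")


def wants_human(utterance: str, phrases: tuple = DEFAULT_PHRASES) -> bool:
    """Return True if the utterance contains any phrase as a whole word."""
    for phrase in phrases:
        # Word boundaries keep "human" from matching inside "inhumane".
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, utterance, flags=re.IGNORECASE):
            return True
    return False


print(wants_human("Can I talk to a human?"))  # True
print(wants_human("That was inhumane."))      # False
```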

See [`/ai-disclosure`](/ai-disclosure) in your dashboard for the exact disclosure text and to customize it.

## Content Safety

Voice agents pass every LLM input/output through the same OpenAI omni-moderation filter used by your text chatbots. If a flagged classification is returned:

* The agent emits a safe fallback line ("Let me get a colleague to help with that.") instead of the LLM's response.
* The conversation is automatically routed to a human handler.
* The flagged content is logged in `AuditLog` for compliance review.
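
The three steps above can be sketched as a single decision function. The moderation call itself is left out; the sketch takes its result (`flagged`, `categories`) as inputs. The `SafetyDecision` type and the audit-entry field names are hypothetical.

```python
from dataclasses import dataclass, field

FALLBACK_LINE = "Let me get a colleague to help with that."


@dataclass
class SafetyDecision:
    spoken_text: str       # what the agent actually says
    route_to_human: bool   # whether to hand the conversation to a person
    audit_entries: list = field(default_factory=list)


def apply_moderation(llm_response: str, flagged: bool, categories: list) -> SafetyDecision:
    """Replace a flagged response with the fallback line and record why."""
    if not flagged:
        return SafetyDecision(llm_response, route_to_human=False)
    audit = {
        "event": "content_flagged",  # hypothetical audit event name
        "categories": list(categories),
        "original_text": llm_response,
    }
    return SafetyDecision(FALLBACK_LINE, route_to_human=True, audit_entries=[audit])


ok = apply_moderation("Your appointment is booked.", False, [])
blocked = apply_moderation("(flagged text)", True, ["harassment"])
print(ok.spoken_text, ok.route_to_human)
print(blocked.spoken_text, blocked.route_to_human)
```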

## Compatibility Checker

Some Bolna features require bundled audio assets that aren't shipped with every voice. The platform's compatibility resolver (`resolveCompatibility(agent)`) auto-disables:

* `backchanneling` (umm, hmm, yeah inserts)
* `use_fillers` (uh, like, you know)
* `ambient_noise` (background office sounds)

…when the selected voice / provider can't safely support them. The override is **transparent** — surfaced in the UI form, not silently swallowed — so you always know which features are active.
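
The resolver's behavior can be approximated as below. This is a sketch in Python rather than the platform's JavaScript `resolveCompatibility(agent)`. The capability table and voice IDs are invented placeholders, because this page does not list which voices ship the required audio assets. Returning the list of disabled features is what lets the UI surface the override.

```python
GATED_FEATURES = ("backchanneling", "use_fillers", "ambient_noise")

# Hypothetical capability table: which gated features each voice supports.
VOICE_CAPABILITIES = {
    "example-voice-with-assets": {"backchanneling", "use_fillers", "ambient_noise"},
    "example-voice-without-assets": set(),
}


def resolve_compatibility(agent: dict) -> tuple:
    """Return (effective_config, disabled_features) without mutating the input."""
    supported = VOICE_CAPABILITIES.get(agent.get("voice_id"), set())
    effective = dict(agent)
    disabled = []
    for feature in GATED_FEATURES:
        if effective.get(feature) and feature not in supported:
            effective[feature] = False
            disabled.append(feature)  # reported back so the UI can show it
    return effective, disabled


requested = {
    "voice_id": "example-voice-without-assets",
    "backchanneling": True,
    "use_fillers": False,
    "ambient_noise": True,
}
effective, disabled = resolve_compatibility(requested)
print(disabled)  # ['backchanneling', 'ambient_noise']
```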

## Multi-Tenant Isolation

Every `VoiceAgent` row is scoped to `organizationId`. The full pipeline enforces this:

* API routes use `requireAuth` + org-scoped Prisma queries.
* The engine HTTP API authenticates each agent operation against the org-scoped agent ID.
* Twilio webhooks resolve `AccountSid → TelephonyConnection → organizationId` before routing.
* HMAC verification on engine → app webhooks (`/api/voice-agents/tool-call`, `/api/voice-agents/webhook`) prevents cross-tenant spoofing.

There is no shared agent state across organizations.
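
The webhook resolution chain can be illustrated with in-memory stand-ins for the database tables. The dictionaries, IDs, and the function name `resolve_agent` are hypothetical. The point of the sketch is that the agent lookup is keyed by the organization resolved from `AccountSid`, so a number assigned in one organization never resolves for another.

```python
from typing import Optional

# Hypothetical stand-ins for the TelephonyConnection and VoiceAgent tables.
TELEPHONY_CONNECTIONS = {"AC_org_a": "org_a", "AC_org_b": "org_b"}
VOICE_AGENTS = {("org_a", "+14155550100"): "agent_1"}  # keyed by (org, number)


def resolve_agent(account_sid: str, to_number: str) -> Optional[str]:
    """Resolve AccountSid -> organization -> assigned agent, or None."""
    org_id = TELEPHONY_CONNECTIONS.get(account_sid)
    if org_id is None:
        return None  # unknown Twilio account: reject
    # The organization is part of the lookup key, so the query is org-scoped.
    return VOICE_AGENTS.get((org_id, to_number))


print(resolve_agent("AC_org_a", "+14155550100"))  # agent_1
print(resolve_agent("AC_org_b", "+14155550100"))  # None
```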

## Compliance Considerations

| Regulation                                                   | What it requires                                             | How Kaanha AI handles it                                                 |
| ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------------------ |
| **TCPA (US)**                                                | Prior express written consent for outbound AI calls          | Customer responsibility; consent metadata stored per contact             |
| **2-party consent (CA, FL, IL, MD, MA, MT, NV, NH, PA, WA)** | Both parties must consent to call recording                  | Disclosure played at call start; recording is opt-in per-agent           |
| **DNC (US National Do Not Call list)**                       | Scrub against DNC before outbound                            | Customer must verify; integration with Telnyx/Federal DNC API on Roadmap |
| **EU AI Act (limited risk)**                                 | Disclosure that user is interacting with AI                  | Pre-AI disclosure played at call start (mandatory)                       |
| **Meta Jan 2026 (text + voice)**                             | Task-specific agents only — no generic "do anything" prompts | Enforced via system prompt review; flagged in audits                     |

## Cost & Margin

| Metric                              | Value                                       |
| ----------------------------------- | ------------------------------------------- |
| Cost per 3-min call (Sarvam TTS)    | \$0.035                                     |
| Cost per 3-min call (Deepgram Aura) | \$0.050                                     |
| Customer billing                    | \$0.12 / min (Premium), custom (Enterprise) |
| Margin at \$0.12 / min              | 65–75%                                      |

Voice usage is metered separately from text AI tokens. Each minute of call time deducts from your plan's voice allowance; overage is billed at the per-minute rate.
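
The per-call figures can be reproduced from the per-layer prices in the Provider Stack table. The sketch below is a worked check, not billing code. It assumes the STT price scales with call minutes while the LLM and TTS prices are flat per call, as the table's units state. With those inputs the AI-only cost comes to about \$0.035 with Sarvam TTS and about \$0.047 with Deepgram Aura, which the table above rounds to \$0.050. The AI-only margin it prints is higher than the 65–75% quoted above, so the quoted margin presumably also covers costs this page does not itemize, such as telephony; that reading is an assumption.

```python
STT_PER_MIN = 0.0055   # Sarvam saarika:v2.5, USD per minute
LLM_PER_CALL = 0.0004  # Gemini, USD per call
TTS_PER_CALL = {"sarvam": 0.018, "deepgram": 0.030}  # USD per call
PRICE_PER_MIN = 0.12   # Premium customer price, USD per minute


def ai_cost_per_call(tts: str, minutes: float = 3.0) -> float:
    """AI provider cost for one call, using the units stated in the table."""
    return STT_PER_MIN * minutes + LLM_PER_CALL + TTS_PER_CALL[tts]


def ai_only_margin(cost: float, minutes: float = 3.0) -> float:
    """Margin counting AI provider costs only (no telephony or hosting)."""
    revenue = PRICE_PER_MIN * minutes
    return (revenue - cost) / revenue


for tts in TTS_PER_CALL:
    cost = ai_cost_per_call(tts)
    print(f"{tts}: cost ${cost:.4f}, AI-only margin {ai_only_margin(cost):.0%}")
```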

## Outbound SIP Trunking — External Carrier Integration

If you have phone numbers from an external carrier (Vobiz, Plivo, Exotel, Vonage, or any SIP provider), configure the carrier to route calls **outbound to the platform's Twilio SIP domain**. No carrier API credentials are required on the platform side.

**How it works:**

When a caller dials your carrier's DID, the carrier routes the call outbound via SIP to your organization's dedicated Twilio SIP domain (the **SIP Server** value generated during setup), authenticating with the generated SIP username and password. Twilio receives the call, verifies the credentials, and forwards it to the platform webhook and on to the Bolna voice engine, the same path as natively purchased Twilio numbers.

**Setup:**

1. Go to `/phone-numbers` → **Connect External Number**.
2. The platform generates a per-org isolated SIP domain and credentials.
3. Copy the three values into your carrier's **SIP trunk / outbound route** config:
   * **SIP Server**: `k-{subSidSuffix}.sip.twilio.com`
   * **SIP Username**: shown in the modal
   * **SIP Password**: shown in the modal (copy once — you can regenerate at any time)
4. Click **I've configured my carrier →** to complete setup.
5. Assign the number to a voice agent from the phone numbers list.

**Credential management:** credentials are stored encrypted and scoped to your organization's isolated Twilio subaccount. Use **Revoke** in the modal to rotate or remove credentials; the SIP domain and infrastructure stay provisioned so regeneration is instant.

## Current Limitations

* **24/7 calling not yet available** — calls are restricted to your configured business hours. Off-hours inbound calls play a configurable voicemail message.
* **Bulk outbound campaigns** not exposed in UI yet — single-call outbound only.
* **Live agent transfer** (warm transfer to a human via SIP) is on the roadmap.
* **Voicemail drop** (pre-recorded outbound to detected voicemail) is on the roadmap.
* **`voiceAgents` feature flag** must be enabled per organization. Premium plans get it on by default during the rollout window; Enterprise requires explicit provisioning.

## Roadmap

| Item                                          | Status                                  | Target  |
| --------------------------------------------- | --------------------------------------- | ------- |
| ~~Multi-provider TTS welcome pre-synth~~      | ✅ Done (Sarvam + Deepgram + ElevenLabs) | —       |
| ~~Per-org isolated SIP domain + credentials~~ | ✅ Done                                  | —       |
| 24/7 calling with off-hours routing rules     | Planned                                 | Q2 2026 |
| Outbound dialer queue (bulk campaigns)        | Planned                                 | Q3 2026 |
| Live agent transfer (warm + cold)             | Planned                                 | Q3 2026 |
| Voicemail drop                                | Planned                                 | Q3 2026 |
| Per-agent custom voice IDs (Enterprise)       | Planned                                 | Q3 2026 |

## Troubleshooting

| Symptom                                     | Likely cause                                                                      | Fix                                                                                                                                                    |
| ------------------------------------------- | --------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Welcome message silent**                  | Welcome text is empty, or TTS provider key is missing                             | Check voice agent welcome message field; verify TTS provider API key on the voice engine service                                                       |
| **Choppy / breaking audio**                 | Pre-chunking patch not loaded                                                     | Already fixed in production via the TwilioOutputHandler chunking patch — if you see this on a fresh deploy, redeploy the voice engine                  |
| **"Compatibility error" in UI**             | Selected voice doesn't support `backchanneling` / `use_fillers` / `ambient_noise` | Pick a different voice, or accept the auto-disable (surfaced transparently in the form)                                                                |
| **HMAC verification failed** on tool call   | Engine and app have different `ENGINE_WS_SECRET`                                  | Re-sync the secret on both Railway services and redeploy                                                                                               |
| **Agent answers but doesn't follow prompt** | Pre-lock legacy provider config in DB                                             | Open the agent in `/voice-agents`, click Save (forces locked-stack providers), or run `node scripts/resync-voice-engine-agent.mjs --all`               |
| **Inbound call rings but never connects**   | Twilio number not assigned, or webhook not configured                             | Re-assign the number at `/phone-numbers`; the platform re-writes the webhook automatically                                                             |
| **Engine returns 500 on save**              | Bolna Pydantic schema rejection (unsupported field for the chosen provider)       | Check Next.js server logs for the `bolna-config.js` payload; the per-provider registry should filter unsupported fields, but custom additions can leak |

For deeper diagnosis, use the built-in slash commands: `/voice-logs`, `/voice-diagnose`, `/voice-parity`, `/voice-resync`.

## Soft-Migration of Legacy Agents

Voice agents created before the April 24, 2026 stack lock continue to work — the Bolna registry still ships every provider file. There are two paths to bring legacy agents onto the locked stack:

1. **Natural migration** — when a user opens a legacy agent in the UI and clicks Save, the picker only offers the locked-stack options (Sarvam for STT, Google for the LLM, Sarvam or Deepgram for TTS), so the locked values are written.
2. **One-shot script (recommended)** — run `DATABASE_URL=... node scripts/resync-voice-engine-agent.mjs --all` to set every `VoiceAgent` row to the locked stack and re-sync the Bolna config in Redis.

Path 2 is the safer default — it forces engine-level consistency and avoids drift between the DB and the Redis-cached agent config.

## API Reference

Voice agents are managed via:

* `GET /api/voice-agents` — list agents for the org
* `POST /api/voice-agents` — create an agent (triggers engine sync)
* `PUT /api/voice-agents/[id]` — update an agent (re-syncs to engine)
* `DELETE /api/voice-agents/[id]` — delete an agent (deprovisions on engine)
* `GET /api/voice-agents/[id]/calls` — list call history
* `POST /api/voice-agents/tool-call` — engine → app tool invocation (HMAC-verified)
* `POST /api/voice-agents/webhook` — engine → app callback for transcript / call-end events (HMAC-verified)
* `POST /api/webhooks/twilio/voice` — Twilio → app inbound call webhook (signature-verified)

SIP trunking credentials:

* `GET /api/voice-phone-numbers/sip-credentials` — returns (or lazily provisions) the org's SIP credentials: `{ username, password, sipDomain, isNew }`. OWNER/ADMIN only. Rate-limited 5/min/org.
* `DELETE /api/voice-phone-numbers/sip-credentials` — revokes the org's SIP credentials from Twilio and clears them from the DB. SIP domain and credential list remain (regenerate at any time).
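
A client consuming the `GET` response might map it onto the three values a carrier's trunk configuration asks for, as sketched below. Only the response field names (`username`, `password`, `sipDomain`, `isNew`) come from this page. The `SipCredentials` type, the helper names, and the sample values are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SipCredentials:
    username: str
    password: str
    sip_domain: str
    is_new: bool


def parse_sip_credentials(body: str) -> SipCredentials:
    """Parse the JSON body returned by the sip-credentials endpoint."""
    data = json.loads(body)
    return SipCredentials(
        username=data["username"],
        password=data["password"],
        sip_domain=data["sipDomain"],
        is_new=bool(data["isNew"]),
    )


def carrier_trunk_settings(creds: SipCredentials) -> dict:
    """Map the credentials onto the labels used in the setup modal."""
    return {
        "SIP Server": creds.sip_domain,
        "SIP Username": creds.username,
        "SIP Password": creds.password,
    }


# Sample body with placeholder values.
sample = (
    '{"username": "example-user", "password": "example-pass", '
    '"sipDomain": "example.sip.twilio.com", "isNew": true}'
)
creds = parse_sip_credentials(sample)
print(carrier_trunk_settings(creds)["SIP Server"])  # example.sip.twilio.com
```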

Full request/response schemas are in [`/api-reference`](./api-reference).

## Related

* [AI Chatbots](./ai-chatbots) — text counterpart with shared prompt + tools system
* [Phone Numbers](/phone-numbers) — purchase + assign Twilio numbers
* [Channels](/channels) — voice as a channel alongside WhatsApp, SMS, Slack, etc.
* [Billing & Plans](./billing-plans) — voice availability per plan
* [SMS Channel](./sms-channel) — Twilio SMS shares the same number pool
* [AI Disclosure](/ai-disclosure) — pre-AI disclosure configuration
* [Security & Compliance](./security-compliance) — TCPA, DNC, AI Act, multi-tenant isolation
