D DigitalCallers

Module · AI Conversations

A voice your customer
doesn’t hang up on.

The conversation engine inside DigitalCallers. Sub-second response, Indian-language native, multi-engine — pick the right voice for the right campaign without rebuilding your stack.

Live transcript

Pranav K · 2:14 · Suresh ● LIVE

Lead · 14:02 · Bhaiya, plot ka rate kya hai?
Suresh · 14:02 · 1300 rupaye per square feet sir, 30x40 ka plot 15.6 lakh.
Lead · 14:02 · Document clear hai?
Suresh · 14:03 · Bilkul sir, NA aur KJP cleared, title clear hai. Original papers site par dikha denge.

Engine snapshot

ENGINE · primary
VOICE · India-tuned, M
TTFT · 920 ms
LANGUAGE · Hi-En code-switch
SIP TRUNK · our SIP carrier (primary)
FAILOVER · backup-trunk-01
RECORDING · stereo, on
PII REDACTION · enabled

Multi-engine

A multi-engine voice stack you don’t have to think about.

We benchmark every voice engine that ships in this market — proprietary, open-source, multilingual, English-only — and pick the right one for your campaign. You see one product. Behind the scenes, we route each call to the engine that gets the best result for that customer, that language and that use-case.
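The per-call routing described above can be sketched roughly as follows. This is a hypothetical illustration of the idea, not DigitalCallers’ actual routing table; the engine names, fields, and rules are all assumptions.

```python
# Hypothetical per-call engine routing. Engine names and routing rules
# are illustrative only, not the production benchmark results.
from dataclasses import dataclass

@dataclass
class CallContext:
    language: str          # e.g. "hi-en", "kn", "en"
    campaign_type: str     # e.g. "real_estate", "b2b_saas"
    emotional_load: bool   # long, hesitant-buyer conversations

def pick_engine(ctx: CallContext) -> str:
    """Route each call to the engine that benchmarks best for it."""
    if ctx.language == "en" and ctx.campaign_type == "b2b_saas":
        return "english-premium"      # fastest, Western voice catalog
    if ctx.emotional_load:
        return "precision-mode"       # slower, emotion-aware delivery
    return "india-first-default"      # sub-second, code-switch native

print(pick_engine(CallContext("hi-en", "real_estate", False)))
# india-first-default
```

The point of the single dispatch function is the “you see one product” claim: campaigns never name an engine, they describe a call, and the router decides.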

DEFAULT

India-first conversational voice

Our default for every campaign. Sub-second response, Hindi-English code-switching that doesn’t crash, native Kannada/Marathi/Tamil/Telugu/Bengali. The voice your lead doesn’t hang up on.

PRECISION MODE

Naturalness-first voice

A slightly slower engine we route long, emotionally-loaded conversations to — site visits with hesitant buyers, post-discharge healthcare calls, family-decision sales. Emotion-aware delivery, sighs, soft acknowledgements.

ENGLISH ONLY

English-only premium

For pure-English B2B / SaaS / global client conversations only. Fastest response time available, with a Western voice catalog. We never route Indian-language calls to this engine.

PRIVATE / ON-PREM

Indic self-hosted stack

For enterprise & regulated industries that require zero-data-egress. Runs in our private Indian-region environment with a fine-tuning loop that learns from your own conversation history.

We also share our internal list of vetoed engines with customers, so you know what we won’t deploy: American-accented TTS on Hindi calls, and Western multilingual engines that mispronounce loanwords like “EMI” and “RERA”. We tested them. They don’t make it past our internal benchmark for India.

Speed of speech

Sub-second TTFT, without sounding robotic.

Time-to-first-token is the moment the lead stops waiting. Above 1.5 s and they assume the line dropped. Below 700 ms and the AI starts talking over them. We tune for the 800-1000 ms sweet spot — natural conversational rhythm.

  • Sliding-window context — long calls don’t blow up token counts
  • Affective dialog on Native engine — sighs, “hmm”, soft acknowledgements
  • Interruption handling — if the lead cuts in, the AI yields cleanly
  • Build-version stamping — every prompt change is auditable from a runtime banner
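The sliding-window context from the list above can be sketched like this. It is a minimal illustration under assumptions we are stating openly: the 4-characters-per-token estimate and the 2048-token budget are placeholders, not the engine’s real values.

```python
# Illustrative sliding-window context: keep only the most recent turns
# under a token budget so long calls don't blow up the prompt.
# The chars-per-token estimate and default budget are assumptions.
def sliding_window(turns: list[str], max_tokens: int = 2048) -> list[str]:
    estimate = lambda t: max(1, len(t) // 4)  # rough token estimate
    kept, used = [], 0
    for turn in reversed(turns):              # walk newest-first
        cost = estimate(turn)
        if used + cost > max_tokens:
            break                             # budget exhausted
        kept.append(turn)
        used += cost
    return list(reversed(kept))               # restore chronological order
```

Dropping oldest turns first keeps the live exchange intact, which is what matters mid-call; anything evicted can still live in the call record.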

Audio waveform — stereo capture

CHANNEL 0 · LEAD
CHANNEL 1 · AGENT (SURESH)
2:14 · 16-bit PCM · 16 kHz · ~1.8 MB · stored locally
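The stereo capture format above (channel 0 = lead, channel 1 = agent, 16-bit PCM at 16 kHz) can be written with nothing but the Python standard library. A minimal sketch, assuming samples arrive as plain integer lists; the function name is ours, not an API in the product.

```python
# Minimal stereo WAV writer matching the capture format described above:
# channel 0 = lead, channel 1 = agent, interleaved 16-bit PCM at 16 kHz.
import struct
import wave

def write_stereo_wav(path, lead_samples, agent_samples, rate=16000):
    assert len(lead_samples) == len(agent_samples)
    with wave.open(path, "wb") as w:
        w.setnchannels(2)       # ch 0 = lead, ch 1 = agent
        w.setsampwidth(2)       # 16-bit PCM
        w.setframerate(rate)    # 16 kHz
        frames = b"".join(
            struct.pack("<hh", lead, agent)
            for lead, agent in zip(lead_samples, agent_samples)
        )
        w.writeframes(frames)
```

Keeping the two speakers on separate channels is what makes per-speaker transcription and QA scoring trivial downstream: no diarization pass needed.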

Reliability

Multi-trunk SIP failover, dispatch pacing.

Real calling networks are messy. Carriers rate-limit. Trunks return 486 Busy. Our infrastructure assumes failure as a normal operating mode and routes around it.

  • Auto-retry on 486 / 429 — second trunk picks up inside 2 s
  • Dispatch pacing — INVITEs stagger across the second to respect per-second carrier caps
  • Idempotency guard — one click in the UI dispatches exactly one call, even with worker auto-reload
  • Webhook signing — HMAC-SHA256 between worker and Flask; replay-safe
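The HMAC-SHA256 webhook signing from the last bullet can be sketched as below. This is an illustration of the pattern, not the production wire format: the header layout, shared secret, and 5-minute replay window are assumptions.

```python
# Sketch of replay-safe HMAC-SHA256 webhook signing between worker and
# receiver. Secret, message layout, and replay window are illustrative.
import hashlib
import hmac
import time

SECRET = b"shared-webhook-secret"   # assumed shared secret

def sign(body: bytes, ts: int) -> str:
    """Sign timestamp + body so neither can be tampered with."""
    msg = str(ts).encode() + b"." + body
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def verify(body: bytes, ts: int, signature: str, max_age: int = 300) -> bool:
    if abs(time.time() - ts) > max_age:         # reject stale replays
        return False
    expected = sign(body, ts)
    return hmac.compare_digest(expected, signature)  # constant-time compare
```

Binding the timestamp into the signed message is what makes the scheme replay-safe: an attacker cannot re-send an old, validly signed payload outside the window, nor move the timestamp without breaking the signature.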
Dispatch flow · 1 click

1. Click “Call” → idempotency guard · +0 ms
2. Worker dispatch → primary SIP trunk · +120 ms
3. Primary returns 486 Busy · +340 ms
4. Auto-retry on backup-trunk-01 · +520 ms
5. Lead picks up · agent.start_session() · +1.4 s
6. First word spoken (920 ms TTFT) · +2.3 s total
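The auto-retry across trunks in that flow can be sketched as follows. A hedged illustration only: the trunk names mirror the flow above, but the `dial` callable and status handling are hypothetical, not the worker’s real API.

```python
# Illustrative multi-trunk failover: try the primary, retry on the backup
# when the carrier returns 486 Busy or 429. The dial() callable is a
# hypothetical stand-in for the real SIP dispatch.
RETRYABLE = {486, 429}   # carrier-busy / rate-limit responses

def dispatch(dial, trunks=("primary-trunk", "backup-trunk-01")):
    for trunk in trunks:
        status = dial(trunk)             # returns a SIP status code
        if status == 200:
            return trunk                 # call connected on this trunk
        if status not in RETRYABLE:
            raise RuntimeError(f"{trunk} failed with {status}")
    raise RuntimeError("all trunks busy")
```

Non-retryable statuses fail fast on purpose: a 404 or auth error on the primary almost always means the backup would fail the same way, so retrying only burns the per-second dispatch budget.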

Code switching

Hindi-English the way Indians actually speak it.

Most TTS engines treat Hindi-English as two separate languages and switch awkwardly at sentence boundaries. Our voice engine switches mid-sentence, in one voice, with one prosody.

Generic engines

“Sir, plot. ka. [switch to English] rate is one thousand three hundred rupees per square foot.”

Sentence-level switch. Awkward pause at the language boundary. The “rate” sounds like a different speaker.

DigitalCallers · our voice engine

“Sir, plot ka rate 1300 rupaye per square feet hai, total 15.6 lakh.”

One voice. One prosody. Fluid mid-sentence code-switching. Numbers spoken in Hindi number-words, currency in lakhs.
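The “currency in lakhs” rendering in the sample above can be sketched as a tiny normalization step. Illustrative only; the function is ours and says nothing about the engine’s real text-normalization pipeline.

```python
# Indian-style currency rendering: amounts spoken in lakhs, not millions.
# A formatting sketch, not the engine's actual normalizer.
def to_lakh(amount_rupees: int) -> str:
    lakh = amount_rupees / 100_000
    text = f"{lakh:.1f}".rstrip("0").rstrip(".")   # 15.6, 13, 1 ...
    return f"{text} lakh"

print(to_lakh(1_560_000))   # 15.6 lakh
```

A Western-tuned normalizer would say “one point five six million rupees”, which instantly marks the voice as foreign; lakh/crore units are part of sounding local, not a nicety.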

The receipts

Real numbers from real calls.

P50 TTFT · 920 ms · across 12,400 prod calls
P95 TTFT · 1.4 s · network-bound
Connect rate · 62% · vs. 51% on bare trunk
Recording loss · 0% · since stereo + finally-cleanup

FAQ

AI conversation engine, answered.

Can we run different voice engines for different agents?
Yes. Each AI agent has its own voice profile and engine assignment, controlled from your dashboard. Switching takes effect on the next dispatched call. We pick the right combination for your use-case during onboarding so you don’t need to think about it day-to-day.
What happens if the primary voice engine fails mid-call?
It automatically falls back to a stable engine on session-start failure. Mid-session crashes drop the call cleanly and the dispatch pipeline marks the call as failed with a reason; your CRM gets a call.failed webhook. The next dispatched call starts fresh.
Can I use my own SIP trunk?
Yes. We default to our SIP carrier because it has the cleanest Indian routing, but you can plug in any standards-compliant SIP trunk. Multi-trunk failover supports up to four trunks ranked by priority.
How are recordings stored?
Stereo WAV (channel 0 = lead, channel 1 = agent), stored locally on the DigitalCallers host with HMAC-signed URLs for external CRM consumption. Default retention is 90 days, configurable per account. Transcripts are PII-redacted before they touch disk; only the raw audio is stored unredacted.
How small a model can I run for cost control?
Our Native voice engine is already the “small” production engine. The next step down is the planned Indic stack with our own quantized LLM; that’s targeted for v2 and designed for self-hosting on a single A10 / L4 GPU.

Hear it on your own number.

A 20-minute demo where we dial your phone live and you score the conversation against your top human caller.