
Building a Hindi Voice Banking Bot With Sarvam AI

Voice AI for 22 Indian languages — IVR, customer support, and regional outreach

AutoKaam Editorial·Apr 13, 2026·10 min read

Sarvam AI is among India's most credible foundation-model efforts for the 22 scheduled Indian languages. General-purpose models from OpenAI and Anthropic handle Hindi text well, but Sarvam's models are trained specifically on Indic languages, so voice quality, accent handling, and code-switching (Hindi with English words mixed in) all sound more native.

Why Sarvam Over OpenAI For Regional Languages

Quick comparison on Hindi voice:

Task	OpenAI Voice	Sarvam Voice
Hindi pronunciation	Decent but American accent leaks	Native Indian
Code-switching	Okay	Excellent
Regional (Tamil, Telugu)	Weak	Native
Latency from India	400-600ms	150-300ms
Pricing	$0.03/min	~Rs 1.50/min

For Indian consumer-facing use cases, Sarvam wins on authenticity.

Setup — API Access

Sign up at sarvam.ai
Dashboard → API Keys → generate
Free tier: 1,000 minutes/month
Paid: ~Rs 1.50/min (voice)

Environment variable:

export SARVAM_API_KEY="sk-sarvam-..."
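
The snippets in this guide refer to a SARVAM_KEY variable without showing where it comes from. One way to load it is sketched here; load_sarvam_key is a hypothetical helper name, and the only assumption is the SARVAM_API_KEY variable exported above.

```python
import os
from typing import Mapping


def load_sarvam_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Sarvam API key from the environment, failing early if absent."""
    key = env.get("SARVAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError("SARVAM_API_KEY is not set; export it before starting the bot")
    return key


# At startup (left commented so the sketch imports cleanly without a key):
# SARVAM_KEY = load_sarvam_key()
```

Failing at startup is usually easier to debug than a 401 in the middle of a live call.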

Use Case: Banking Balance Inquiry Voice Bot

An Indian customer calls, speaks in Hindi, and wants their balance.

Architecture

[Phone (Exotel/Twilio)]
         ↓
  [Your webhook (FastAPI)]
         ↓
  [Sarvam STT: audio → Hindi text]
         ↓
  [Intent classifier (Sarvam-30B)]
         ↓
  [Banking API (mock)]
         ↓
  [Sarvam TTS: response text → Hindi audio]
         ↓
  [Response to phone]

Step 1 — STT (Speech-To-Text)

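import os
import requests

# Read the key exported above; raises KeyError if it is not set.
SARVAM_KEY = os.environ["SARVAM_API_KEY"]

def hindi_stt(audio_file_path: str) -> str:
    """Transcribe a recorded Hindi audio file with Sarvam STT."""
    with open(audio_file_path, "rb") as f:
        response = requests.post(
            "https://api.sarvam.ai/speech-to-text",
            headers={"API-Subscription-Key": SARVAM_KEY},
            files={"file": f},
            data={"language_code": "hi-IN", "model": "saarika:v2"},
            timeout=30,
        )
    # Surface HTTP errors here instead of a confusing KeyError below.
    response.raise_for_status()
    return response.json()["transcript"]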

# Example:
# Input audio: "Mera balance kya hai?"
# Output text: "मेरा balance क्या है?"

Note: Sarvam handles code-switching (Hindi + English) automatically.

Step 2 — Intent Classification

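import json

def classify_intent(text: str) -> dict:
    """Return the parsed intent, e.g. {"type": "balance"}."""
    response = requests.post(
        "https://api.sarvam.ai/chat/completions",
        headers={"Authorization": f"Bearer {SARVAM_KEY}"},
        json={
            "model": "sarvam-30b",
            "messages": [
                {"role": "system", "content": 'You are a banking assistant. Classify user queries into intents: balance, transfer, statement, help. Reply only with JSON of the form {"type": "<intent>"}.'},
                {"role": "user", "content": text}
            ],
            "response_format": {"type": "json_object"}
        },
        timeout=30,
    )
    response.raise_for_status()
    # The model's JSON reply arrives as a string inside the first choice;
    # parse it so callers can read intent["type"] directly.
    return json.loads(response.json()["choices"][0]["message"]["content"])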
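
Model output is not guaranteed to be well-formed JSON or to stay within the four intents. A minimal sketch of a defensive parser follows, assuming the response has the choices[0].message.content shape used in Step 4 and that the model replies with a JSON object whose "type" key holds the intent (the key the full pipeline reads). extract_intent and ALLOWED_INTENTS are hypothetical names.

```python
import json

ALLOWED_INTENTS = {"balance", "transfer", "statement", "help"}


def extract_intent(chat_response: dict) -> str:
    """Pull one intent label out of a chat-completions style response.

    Falls back to "help" whenever the payload is missing, malformed,
    or names an intent outside the allowed set.
    """
    try:
        content = chat_response["choices"][0]["message"]["content"]
        parsed = json.loads(content)
    except (KeyError, IndexError, TypeError, json.JSONDecodeError):
        return "help"
    intent = parsed.get("type") if isinstance(parsed, dict) else None
    if isinstance(intent, str) and intent in ALLOWED_INTENTS:
        return intent
    return "help"
```

Routing unknown output to "help" keeps a bad classification from triggering a banking action the caller did not ask for.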

Step 3 — Banking Action (Mock)

def get_balance(user_id: str) -> dict:
    # In production: call the actual banking API
    return {"balance": 12500, "currency": "INR", "account": "XXXX1234"}

Step 4 — Response Generation

def generate_response(intent: dict, banking_data: dict) -> str:
    response = requests.post(
        "https://api.sarvam.ai/chat/completions",
        headers={"Authorization": f"Bearer {SARVAM_KEY}"},
        json={
            "model": "sarvam-30b",
            "messages": [
                {
                    "role": "system",
                    "content": "You are a polite Hindi banking assistant. Respond in 2-3 sentences. Use formal 'aap' form. Amounts in rupees with 'rupay'."
                },
                {
                    "role": "user",
                    "content": f"User asked: {intent}. Banking data: {banking_data}. Respond in Hindi."
                }
            ]
        }
    )
    return response.json()["choices"][0]["message"]["content"]
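
The user message above embeds Python dict reprs through an f-string. A possible alternative, sketched under the assumption that the model copes equally well with JSON in the prompt, is to serialise the banking data explicitly so the prompt text is stable and easy to log. build_response_messages and SYSTEM_PROMPT are hypothetical names; the system prompt text is the one from Step 4.

```python
import json
from typing import Dict, List

SYSTEM_PROMPT = (
    "You are a polite Hindi banking assistant. Respond in 2-3 sentences. "
    "Use formal 'aap' form. Amounts in rupees with 'rupay'."
)


def build_response_messages(intent_type: str, banking_data: dict) -> List[Dict[str, str]]:
    """Build the messages list for the response-generation request."""
    # ensure_ascii=False keeps any Devanagari text readable in the prompt.
    data_json = json.dumps(banking_data, ensure_ascii=False, sort_keys=True)
    user_content = (
        f"User intent: {intent_type}. "
        f"Banking data (JSON): {data_json}. Respond in Hindi."
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_content},
    ]
```

The returned list can be passed as the "messages" value in the request body.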

Step 5 — TTS (Text-To-Speech)

import base64

def hindi_tts(text: str) -> bytes:
    """Synthesise Hindi speech and return the decoded audio bytes."""
    response = requests.post(
        "https://api.sarvam.ai/text-to-speech",
        headers={"API-Subscription-Key": SARVAM_KEY},
        json={
            "text": text,
            "target_language_code": "hi-IN",
            "speaker": "meera",  # Indian female voice
            "pitch": 0,
            "pace": 1.0,
            "loudness": 1.5,
            "speech_sample_rate": 22050,
            "enable_preprocessing": True,
            "model": "bulbul:v2"
        },
        timeout=30,
    )
    response.raise_for_status()
    # The API returns base64 text; decode it to match the bytes return type.
    return base64.b64decode(response.json()["audios"][0])
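
The telephony side usually needs the audio as a file or URL rather than an in-memory value. A small sketch for persisting the TTS output follows; save_tts_audio is a hypothetical helper, and it accepts either the base64 string found in the "audios" field or already-decoded bytes.

```python
import base64
from pathlib import Path
from typing import Union


def save_tts_audio(audio: Union[str, bytes], out_path: str) -> int:
    """Write TTS audio to disk and return the number of bytes written.

    A str argument is treated as base64 text and decoded first;
    a bytes argument is written unchanged.
    """
    data = base64.b64decode(audio) if isinstance(audio, str) else audio
    Path(out_path).write_bytes(data)
    return len(data)
```

The saved file can then be served from whatever static host or object store the webhook exposes to the telephony provider.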

Full Pipeline

def handle_call(audio_input_path: str) -> bytes:
    user_text = hindi_stt(audio_input_path)
    intent = classify_intent(user_text)
    if intent["type"] == "balance":
        data = get_balance(user_id="USER_001")
    else:
        # ... handle other intents; an empty dict keeps `data` defined
        data = {}
    response_text = generate_response(intent, data)
    audio_response = hindi_tts(response_text)
    return audio_response
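
As more intents are added, the if/else chain in the pipeline can grow unwieldy. One common pattern, shown here as a sketch, is a dispatch table that maps intent names to handler functions. get_balance mirrors the mock from Step 3; get_statement, HANDLERS, and run_intent are hypothetical additions for illustration only.

```python
from typing import Callable, Dict


def get_balance(user_id: str) -> dict:
    # Same mock as Step 3; production code would call the real banking API.
    return {"balance": 12500, "currency": "INR", "account": "XXXX1234"}


def get_statement(user_id: str) -> dict:
    # Hypothetical mock for illustration only.
    return {"account": "XXXX1234", "transactions": []}


HANDLERS: Dict[str, Callable[[str], dict]] = {
    "balance": get_balance,
    "statement": get_statement,
}


def run_intent(intent_type: str, user_id: str) -> dict:
    """Look up and run the handler for an intent, or report it as unsupported."""
    handler = HANDLERS.get(intent_type)
    if handler is None:
        return {"error": "unsupported_intent", "intent": intent_type}
    return handler(user_id)
```

With this in place, handle_call would only need a single run_intent call, and adding an intent means adding one entry to HANDLERS.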

Phone Integration (Exotel)

Exotel is a widely used Indian cloud telephony provider. You can point its call flow at your webhook:

# Exotel flow
- Record user audio (up to 30 sec)
- POST to your webhook with the audio URL
- Your webhook downloads audio, calls handle_call()
- Returns the response audio URL
- Exotel plays the response to the caller

Typical round-trip: 3-5 seconds. Acceptable for a banking use case.
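
To see where those seconds go, it helps to time each stage separately. A minimal sketch using only the standard library follows; timed is a hypothetical helper and the stage names are placeholders.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(stage: str, timings: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of a block under timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the stage raises, so failures still show up.
        timings[stage] = time.perf_counter() - start


# Usage inside a webhook handler (stage bodies elided):
# timings = {}
# with timed("stt", timings):
#     user_text = hindi_stt(path)
# with timed("tts", timings):
#     audio = hindi_tts(reply)
# print(timings)
```

Logging these per-call numbers makes it clear whether STT, the LLM calls, or TTS dominates the round trip.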

Cost Analysis (1000 calls/day)

Average call: 2 minutes
STT: 2 min × 1000 = 2000 min × Rs 0.50/min = Rs 1,000
TTS: 1 min × 1000 = 1000 min × Rs 1.00/min = Rs 1,000
LLM: 2 API calls × 1000 = 2,000 × Rs 0.10 = Rs 200
Exotel: Rs 0.50/call × 1000 = Rs 500

Total: Rs 2,700/day (~Rs 81,000/month for 30k calls)

Versus a human call centre: Rs 15-20/call = Rs 15,000-20,000/day. At Rs 2,700/day, AI is roughly 5.5-7.5x cheaper.
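
The arithmetic above can be kept in a small model so the estimate is easy to rerun when volumes or prices change. This is a sketch: the default rates are the figures quoted in this section, not authoritative pricing, and CallCostModel is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class CallCostModel:
    """Per-call cost model in rupees, using the figures from this section."""
    calls_per_day: int = 1000
    stt_minutes_per_call: float = 2.0
    tts_minutes_per_call: float = 1.0
    llm_calls_per_call: int = 2
    stt_rate: float = 0.50        # Rs per STT minute
    tts_rate: float = 1.00        # Rs per TTS minute
    llm_rate: float = 0.10        # Rs per LLM API call
    telephony_rate: float = 0.50  # Rs per call (Exotel)

    def per_call(self) -> float:
        return (
            self.stt_minutes_per_call * self.stt_rate
            + self.tts_minutes_per_call * self.tts_rate
            + self.llm_calls_per_call * self.llm_rate
            + self.telephony_rate
        )

    def daily(self) -> float:
        return self.per_call() * self.calls_per_day

    def monthly(self, days: int = 30) -> float:
        return self.daily() * days


model = CallCostModel()
print(round(model.daily()))    # 2700
print(round(model.monthly()))  # 81000
```

Dividing the human call-centre range of Rs 15,000-20,000/day by model.daily() gives about 5.6 to 7.4.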

Regional Languages

The same code works for:

Tamil: language_code: "ta-IN"
Telugu: te-IN
Bengali: bn-IN
Marathi: mr-IN
Gujarati: gu-IN
Kannada: kn-IN
Malayalam: ml-IN
Punjabi: pa-IN
Odia: od-IN

All 22 scheduled languages are supported.
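
In practice, switching language means changing the code passed to the STT and TTS calls. A sketch of a lookup that builds the STT form fields follows; LANGUAGE_CODES and stt_form_fields are hypothetical names, and the table holds only a few of the languages discussed here.

```python
from typing import Dict

# A few of the languages discussed in this guide; extend as needed.
LANGUAGE_CODES: Dict[str, str] = {
    "hindi": "hi-IN",
    "tamil": "ta-IN",
    "telugu": "te-IN",
    "bengali": "bn-IN",
}


def stt_form_fields(language: str, model: str = "saarika:v2") -> Dict[str, str]:
    """Return the form fields for an STT request in the given language."""
    try:
        code = LANGUAGE_CODES[language.strip().lower()]
    except KeyError:
        raise ValueError(f"No language code configured for {language!r}") from None
    return {"language_code": code, "model": model}
```

The result can replace the hard-coded data dict in Step 1, for example data=stt_form_fields("tamil").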

Common Pitfalls

Numbers spoken vs written: "12,500" should be read aloud as "baarah hazaar paanch sau". Sarvam TTS handles this when enable_preprocessing is set to true (True in the Python request body)
Long text latency: break responses into <200 chars for faster TTS
Accent drift: test with real users — some regional speakers may still hit recognition errors
Silence detection: include VAD in your phone integration
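
For the long-text pitfall, one approach is to split a response at sentence boundaries and send each piece to TTS separately. A minimal sketch follows; chunk_for_tts is a hypothetical helper. It treats the Devanagari danda (।) as a sentence end alongside ".", "!" and "?", and caps each chunk at max_chars, with a default of 200 taken from the tip above.

```python
import re
from typing import List


def chunk_for_tts(text: str, max_chars: int = 200) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = [s for s in re.split(r"(?<=[।.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single oversized sentence is cut at the last space that fits,
        # or hard-cut at max_chars when it contains no space at all.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars)
            if cut <= 0:
                cut = max_chars
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:cut].strip())
            sentence = sentence[cut:].strip()
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can be passed to hindi_tts in order, and the first audio segment can start playing while later ones are still being synthesised.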

Beyond Banking

Government helplines (PMKVY, Ayushman Bharat)
Telco customer support (Jio, Airtel)
Agri-tech advisory (crop prices in regional languages)
Healthcare triage

The Sarvam ecosystem is growing fast. Their recent $350M raise should bring more capabilities soon.
