
Building a Hindi Voice Banking Bot With Sarvam AI

Voice AI for 22 Indian languages — IVR, customer support, and regional outreach

AutoKaam Editorial · 10 min read

Sarvam AI is one of India's most credible foundation-model companies for the 22 scheduled Indian languages. OpenAI and Anthropic models handle Hindi well, but Sarvam's models are trained specifically for Indic languages, so voice quality, accent handling, and code-switching (Hindi with English words mixed in) all feel native.

Why Sarvam Over OpenAI For Regional Languages

Quick comparison on Hindi voice:

Task                        OpenAI Voice                        Sarvam Voice
Hindi pronunciation         Decent, but American accent leaks   Native Indian
Code-switching              Okay                                Excellent
Regional (Tamil, Telugu)    Weak                                Native
Latency from India          400-600 ms                          150-300 ms
Pricing                     $0.03/min                           ~Rs 1.50/min

For Indian consumer-facing use cases, Sarvam wins on authenticity.

Setup — API Access

  1. Sign up at sarvam.ai
  2. Dashboard → API Keys → generate
  3. Free tier: 1,000 minutes/month
  4. Paid: ~Rs 1.50/min (voice)

Environment variable:

export SARVAM_API_KEY="sk-sarvam-..."
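
On the Python side, the key can be read once at startup so a missing variable fails early with a readable message. This is a minimal sketch: `load_sarvam_key` is a hypothetical helper name, not part of any Sarvam SDK.

```python
import os


def load_sarvam_key(env=os.environ) -> str:
    """Return the API key from SARVAM_API_KEY, or raise a clear error."""
    key = env.get("SARVAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SARVAM_API_KEY is not set; export it before starting the bot"
        )
    return key
```

The snippets that follow refer to a module-level `SARVAM_KEY`; assigning `SARVAM_KEY = load_sarvam_key()` at startup is one way to populate it.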

Use Case: Banking Balance Inquiry Voice Bot

An Indian customer calls, speaks in Hindi, and wants their balance.

Architecture

[Phone (Exotel/Twilio)]
         ↓
  [Your webhook (FastAPI)]
         ↓
  [Sarvam STT: audio → Hindi text]
         ↓
  [Intent classifier (Sarvam-30B)]
         ↓
  [Banking API (mock)]
         ↓
  [Sarvam TTS: response text → Hindi audio]
         ↓
  [Response to phone]

Step 1 — STT (Speech-To-Text)

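import os

import requests

# Read the key exported in the setup step.
SARVAM_KEY = os.environ["SARVAM_API_KEY"]

def hindi_stt(audio_file_path: str) -> str:
    """Transcribe a Hindi audio file and return the transcript text."""
    with open(audio_file_path, "rb") as f:
        response = requests.post(
            "https://api.sarvam.ai/speech-to-text",
            headers={"API-Subscription-Key": SARVAM_KEY},
            files={"file": f},
            data={"language_code": "hi-IN", "model": "saarika:v2"},
            timeout=30,
        )
    response.raise_for_status()  # surface HTTP errors instead of a KeyError below
    return response.json()["transcript"]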

# Example:
# Input audio: "Mera balance kya hai?"
# Output text: "मेरा balance क्या है?"

Note: Sarvam handles code-switching (Hindi + English) automatically.
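
If you want to log how often callers actually mix languages, a transcript like the one above can be checked for both scripts. The sketch below is illustrative only: `scripts_in` and `is_code_switched` are hypothetical helpers. It only detects English written in Latin letters next to Devanagari, so English words transcribed in Devanagari will not be counted.

```python
def scripts_in(text: str) -> set[str]:
    """Return which scripts ('devanagari', 'latin') appear in the text."""
    found = set()
    for ch in text:
        if "\u0900" <= ch <= "\u097f":  # Devanagari Unicode block
            found.add("devanagari")
        elif ch.isascii() and ch.isalpha():
            found.add("latin")
    return found


def is_code_switched(transcript: str) -> bool:
    """True when the transcript mixes Devanagari and Latin letters."""
    return scripts_in(transcript) == {"devanagari", "latin"}
```

For the example transcript, `is_code_switched("मेरा balance क्या है?")` is true because "balance" stays in Latin script.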

Step 2 — Intent Classification

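import json

def classify_intent(text: str) -> dict:
    """Return the caller's intent as a dict such as {"type": "balance"}."""
    response = requests.post(
        "https://api.sarvam.ai/chat/completions",
        headers={"Authorization": f"Bearer {SARVAM_KEY}"},
        json={
            "model": "sarvam-30b",
            "messages": [
                {"role": "system", "content": 'You are a banking assistant. Classify the user query into one intent: balance, transfer, statement, or help. Reply with JSON of the form {"type": "<intent>"}.'},
                {"role": "user", "content": text}
            ],
            "response_format": {"type": "json_object"}
        },
        timeout=30,
    )
    response.raise_for_status()
    # The model's JSON arrives as a string inside the first choice.
    return json.loads(response.json()["choices"][0]["message"]["content"])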
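
Model output is not guaranteed to be valid JSON or to name a known intent, so a defensive parser for the raw completion is worth having. The sketch below makes two assumptions: the response has the `choices[0].message.content` shape that Step 4 reads, and the model replies with a JSON object containing a `type` key (the key the full pipeline checks). `parse_intent` and `ALLOWED_INTENTS` are hypothetical names.

```python
import json

ALLOWED_INTENTS = {"balance", "transfer", "statement", "help"}


def parse_intent(completion: dict) -> dict:
    """Extract {'type': <intent>} from a chat completion, defaulting to 'help'."""
    try:
        content = completion["choices"][0]["message"]["content"]
        parsed = json.loads(content)
        intent_type = str(parsed.get("type", "")).strip().lower()
    except (KeyError, IndexError, TypeError, ValueError, AttributeError):
        # Missing fields, non-JSON text, or a non-object payload.
        return {"type": "help"}
    if intent_type not in ALLOWED_INTENTS:
        return {"type": "help"}
    return {"type": intent_type}
```

Falling back to `help` keeps the call going when classification fails; the caller hears a menu rather than silence.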

Step 3 — Banking Action (Mock)

def get_balance(user_id: str) -> dict:
    # In production: call the actual banking API
    return {"balance": 12500, "currency": "INR", "account": "XXXX1234"}

Step 4 — Response Generation

def generate_response(intent: dict, banking_data: dict) -> str:
    response = requests.post(
        "https://api.sarvam.ai/chat/completions",
        headers={"Authorization": f"Bearer {SARVAM_KEY}"},
        json={
            "model": "sarvam-30b",
            "messages": [
                {
                    "role": "system",
                    "content": "You are a polite Hindi banking assistant. Respond in 2-3 sentences. Use formal 'aap' form. Amounts in rupees with 'rupay'."
                },
                {
                    "role": "user",
                    "content": f"User asked: {intent}. Banking data: {banking_data}. Respond in Hindi."
                }
            ]
        }
    )
    return response.json()["choices"][0]["message"]["content"]

Step 5 — TTS (Text-To-Speech)

import base64

def hindi_tts(text: str) -> bytes:
    """Synthesize Hindi speech and return the decoded audio bytes."""
    response = requests.post(
        "https://api.sarvam.ai/text-to-speech",
        headers={"API-Subscription-Key": SARVAM_KEY},
        json={
            "text": text,
            "target_language_code": "hi-IN",
            "speaker": "meera",  # Indian female voice
            "pitch": 0,
            "pace": 1.0,
            "loudness": 1.5,
            "speech_sample_rate": 22050,
            "enable_preprocessing": True,
            "model": "bulbul:v2"
        },
        timeout=30,
    )
    response.raise_for_status()
    # The API returns base64 text; decode it so the function really returns bytes.
    return base64.b64decode(response.json()["audios"][0])

Full Pipeline

def handle_call(audio_input_path: str) -> bytes:
    user_text = hindi_stt(audio_input_path)
    intent = classify_intent(user_text)
    if intent.get("type") == "balance":
        data = get_balance(user_id="USER_001")
    else:
        # ... handle other intents; until then, give the response step something to say
        data = {"message": "intent not supported yet"}
    response_text = generate_response(intent, data)
    audio_response = hindi_tts(response_text)
    return audio_response
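
As more intents are added, the if-chain in the pipeline can be replaced by a lookup table. The following is a sketch under assumptions: `get_help`, `HANDLERS`, and `fetch_banking_data` are hypothetical names, and `get_balance` is repeated from Step 3 only to keep the block self-contained.

```python
from typing import Callable, Dict


def get_balance(user_id: str) -> dict:
    # Mock from Step 3; a real deployment would call the bank's API.
    return {"balance": 12500, "currency": "INR", "account": "XXXX1234"}


def get_help(user_id: str) -> dict:
    # Hypothetical fallback payload so the response step always has data.
    return {"message": "supported requests: balance, transfer, statement"}


HANDLERS: Dict[str, Callable[[str], dict]] = {
    "balance": get_balance,
    "help": get_help,
}


def fetch_banking_data(intent: dict, user_id: str) -> dict:
    """Look up the handler for the intent; unknown intents fall back to help."""
    handler = HANDLERS.get(intent.get("type"), get_help)
    return handler(user_id)
```

With this in place, supporting `transfer` or `statement` means adding one function and one dictionary entry, and an unrecognized intent can never leave the banking data undefined.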

Phone Integration (Exotel)

Exotel is a widely used Indian cloud telephony provider. You can point its call flow at your webhook:

# Exotel flow
- Record user audio (up to 30 sec)
- POST to your webhook with the audio URL
- Your webhook downloads audio, calls handle_call()
- Returns the response audio URL
- Exotel plays the response to the caller

Typical round-trip: 3-5 seconds. Acceptable for a banking use case.
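
The webhook side of that flow can be sketched as a plain function with the I/O steps passed in, which keeps it testable without a telephony account. Everything Exotel-specific here is an assumption to verify against Exotel's documentation: the `RecordingUrl` field name and the shape of the returned dict are illustrative, and `download`, `run_pipeline`, and `publish` are hypothetical callables you would supply.

```python
from typing import Callable


def handle_recording(
    payload: dict,
    download: Callable[[str], str],
    run_pipeline: Callable[[str], bytes],
    publish: Callable[[bytes], str],
) -> dict:
    """Turn one recorded utterance into a URL the telephony provider can play."""
    audio_url = payload.get("RecordingUrl")  # assumed field name
    if not audio_url:
        return {"status": "error", "reason": "missing recording URL"}
    local_path = download(audio_url)           # fetch the caller audio to disk
    response_audio = run_pipeline(local_path)  # e.g. handle_call from the pipeline
    return {"status": "ok", "play_url": publish(response_audio)}
```

In a FastAPI route, the handler would parse the incoming request into `payload` and pass real implementations for the three callables.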

Cost Analysis (1000 calls/day)

  • Average call: 2 minutes
  • STT: 2 min × 1000 = 2000 min × Rs 0.50/min = Rs 1,000
  • TTS: 1 min × 1000 = 1000 min × Rs 1.00/min = Rs 1,000
  • LLM: 2 API calls × 1000 = 2,000 × Rs 0.10 = Rs 200
  • Exotel: Rs 0.50/call × 1000 = Rs 500

Total: Rs 2,700/day (~Rs 81,000/month for 30k calls)

Versus a human call centre: Rs 15-20/call = Rs 15,000-20,000/day. AI is roughly 5.5-7.5x cheaper (15,000 / 2,700 ≈ 5.6; 20,000 / 2,700 ≈ 7.4).
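
The arithmetic above can be wrapped in a small function so you can plug in your own volumes. The rates are the article's illustrative figures, not quoted prices; they are held in paise so the integer arithmetic stays exact. `daily_cost_rupees` is a hypothetical helper name.

```python
# Rates in paise (1 rupee = 100 paise), taken from the breakdown above.
STT_PAISE_PER_MIN = 50
TTS_PAISE_PER_MIN = 100
LLM_PAISE_PER_REQUEST = 10
TELEPHONY_PAISE_PER_CALL = 50


def daily_cost_rupees(
    calls: int, stt_min: int = 2, tts_min: int = 1, llm_requests: int = 2
) -> float:
    """Daily cost in rupees for a given number of calls per day."""
    per_call_paise = (
        stt_min * STT_PAISE_PER_MIN
        + tts_min * TTS_PAISE_PER_MIN
        + llm_requests * LLM_PAISE_PER_REQUEST
        + TELEPHONY_PAISE_PER_CALL
    )
    return calls * per_call_paise / 100
```

`daily_cost_rupees(1000)` returns 2700.0, which matches the total above; multiplying by 30 gives the monthly figure of Rs 81,000.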

Regional Languages

The same code works for other languages. Change language_code (STT) and target_language_code (TTS) to one of:

  • Tamil: ta-IN
  • Telugu: te-IN
  • Bengali: bn-IN
  • Marathi: mr-IN
  • Gujarati: gu-IN
  • Kannada: kn-IN
  • Malayalam: ml-IN
  • Punjabi: pa-IN
  • Odia: od-IN

All 22 scheduled languages are supported.
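
To avoid hard-coding "hi-IN" in the STT helper, the language can be a parameter. The sketch below covers only a few of the codes listed above; `LANGUAGE_CODES` and `stt_params` are hypothetical names, and the default model string is the one used in Step 1.

```python
LANGUAGE_CODES = {
    "hindi": "hi-IN",
    "tamil": "ta-IN",
    "telugu": "te-IN",
    "bengali": "bn-IN",
    "marathi": "mr-IN",
}


def stt_params(language: str, model: str = "saarika:v2") -> dict:
    """Build the form fields for a speech-to-text request in the given language."""
    key = language.strip().lower()
    if key not in LANGUAGE_CODES:
        raise ValueError(f"unsupported language: {language!r}")
    return {"language_code": LANGUAGE_CODES[key], "model": model}
```

The returned dict can be passed as the `data` argument of the STT request in place of the hard-coded one.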

Common Pitfalls

  1. Numbers spoken vs written: "12,500" should be read as "barah hazaar paanch sau". Sarvam TTS handles this when enable_preprocessing is set to True
  2. Long text latency: break responses into chunks under 200 characters for faster TTS
  3. Accent drift: test with real users — some regional speakers may still hit recognition errors
  4. Silence detection: include VAD in your phone integration
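
For pitfall 2, one way to keep chunks short without cutting words is to split at sentence ends, including the Hindi danda (।). This is a minimal sketch with a hypothetical helper name. A single sentence longer than the limit is kept whole rather than cut mid-word.

```python
import re


def chunk_for_tts(text: str, limit: int = 200) -> list[str]:
    """Group sentences into chunks shorter than `limit` characters."""
    # Split after ., ?, ! or the danda when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.?!।])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) < limit or not current:
            current = candidate  # still fits, or first sentence of a chunk
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent to the TTS step separately, so the first audio is ready to play while later chunks are still being synthesized.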

Beyond Banking

  • Government helplines (PMKVY, Ayushman Bharat)
  • Telco customer support (Jio, Airtel)
  • Agri-tech advisory (crop prices in regional languages)
  • Healthcare triage

The Sarvam ecosystem is growing fast. Their recent $350M raise should bring more capabilities soon.
