
Sarvam Vision Sets New OCR Benchmarks

Sarvam Vision, Sarvam AI’s OCR-focused model, has delivered standout results on internationally recognised benchmarks. According to data shared by co-founder Pratyush Kumar, the model scored 84.3 percent on olmOCR-Bench, surpassing Google Gemini 3 Pro, DeepSeek OCR v2, and other popular solutions. ChatGPT ranked considerably lower on the same benchmark.
The model also excelled on OmniDocBench v1.5, a test designed to evaluate how effectively AI systems can read and interpret real-world documents. Sarvam Vision recorded an overall score of 93.28 percent, showing particular strength in handling complex layouts, dense tables, and mathematical formulas, areas that traditionally challenge OCR systems.
Global Praise and Changing Perceptions
The impressive results have triggered widespread discussion among global AI researchers, developers, and investors. Sarvam AI, once questioned for its focus on Indic-language models, is now receiving recognition for addressing a gap largely ignored by major global AI labs.
Tech commentator Deedy Das publicly acknowledged underestimating the startup’s approach. In a post on X, he praised Sarvam’s OCR and speech models, noting that they offer unmatched value for Indian languages and are priced competitively for real-world deployment.
Bulbul V3 Strengthens AI Voice Capabilities
Alongside its OCR breakthrough, Sarvam AI has introduced Bulbul V3, an advanced text-to-speech model tailored for Indian languages. Designed to produce natural and expressive voices, Bulbul V3 aims to compete with global leaders such as ElevenLabs while focusing on India-specific use cases.
Bulbul V3 currently supports more than 35 voices across 11 Indian languages, with plans to expand coverage to all 22 scheduled languages. The company says the model prioritises accuracy, emotional nuance, and speech stability, making it suitable for education, media, governance, and customer service applications.
Industry Adoption and Real-World Impact
Bulbul V3 has already found acceptance among Indian startups and developers. Pratik Desai, founder of KissanAI, described Bulbul as his preferred text-to-speech solution for Indic use cases, citing continuous improvement and cost efficiency compared to international alternatives.
Users testing Sarvam’s tools have also shared positive feedback, highlighting the models’ accuracy and natural output. Such responses underline the growing demand for AI systems that understand India’s linguistic diversity at a foundational level.
India’s Push Towards Sovereign AI
Sarvam AI’s rise aligns with India’s broader push toward digital sovereignty and indigenous technology development. Policymakers and industry leaders have increasingly stressed the need for AI systems trained on Indian data, languages, and societal contexts. Official initiatives and research efforts supported by the Government of India aim to reduce dependence on foreign platforms while encouraging local innovation.
A Turning Point for Indian AI
The success of Sarvam Vision and Bulbul V3 represents more than strong benchmark scores. It signals a turning point for Indian AI development, suggesting that globally competitive models can be built within the country to serve local needs at scale.
As Sarvam AI continues to refine its models and expand language coverage, its work could set new standards for inclusive, region-specific artificial intelligence and strengthen India’s position in the global AI landscape.
