India’s Sovereign AI Gamble
Building Foundations Beyond Silicon Valley
R Kannan
As the global race for Artificial Intelligence enters a
high-stakes infrastructure phase, India is pivoting away from mere adoption
toward a "sovereign capability" model. A new white paper from the
Office of the Principal Scientific Adviser reveals a roadmap designed to break
structural dependence on foreign models by building a domestic full-stack AI
ecosystem. This strategy is driven by the recognition that relying solely on
external models risks the under-representation of Indian languages and cultural
contexts, which could cause biases to cascade across all downstream
applications.
The Infrastructure Backbone
At the heart of this strategy is the IndiaAI Mission,
backed by a ₹10,371.92 crore investment. Unlike the private-sector-heavy
approach of the West, India is treating AI as public infrastructure. The
government's primary vehicle, the IndiaAI Compute Portal, has already
onboarded over 38,000 GPUs to provide subsidized
"compute-as-a-service" to 114 researchers, 47 startups, and 58
government entities as of early 2026. Complementing this is AIKosh, a
unified national platform hosting 10,021 datasets and 279 AI
models to reduce duplication and improve training quality.
The Shift to "Linguistic Inclusion"
The report argues that developing indigenous foundation
models is a strategic priority to strengthen technological autonomy amid a
globally competitive ecosystem. To counter the limitations of foreign systems,
the government is funding a diverse portfolio of models:
- Frontier Scale: Projects like Sarvam-105B and Soket AI’s 120B-parameter
"Project EKA" are being trained from scratch on Indic datasets to maximize
national capability.
- Efficiency First: There is a strategic emphasis on Small Language Models
(SLMs) like Tech Mahindra’s Project Indus (8B) and Zoho’s Zia LLM, which
focus on dialect-heavy regions and edge deployment for MSMEs.
- Domain Mastery: New specialized systems are emerging, such as Vaidya 2.0
for medical reasoning and BrahmAI for scientific computing and industrial
innovation.
- Multimodal Voices: The BharatGen initiative, led by IIT Bombay, is
releasing models like Shrutam for speech and Patram for document
comprehension across all 22 scheduled Indian languages.
A "Middle Path" for Governance
India is also carving out a unique regulatory identity
through the India AI Governance Guidelines (2025), which emphasize
accountability across the entire value chain. While the EU leans toward rigid
mandates, India is proposing a "hybrid model" for Intellectual
Property. This framework would grant AI developers a "blanket
license" to use lawfully accessed data for training, with royalties
becoming payable to creators only upon commercialization of the AI tools.
Furthermore, as the Digital Personal Data Protection
(DPDP) Act mandates strict safeguards for personal data in training sets,
the government is introducing formal requirements for "synthetically
generated information". These include mandatory labelling and embedded
metadata to enhance transparency, along with a requirement that "Significant
Social Media Intermediaries" validate user declarations regarding the
authenticity of AI-generated content.
Objective Evaluation
To move beyond high-level principles, India is
institutionalizing evaluation through the Bhashini ecosystem and the Bureau
of Indian Standards (BIS). New benchmarks like Indic-Bias and MILU
(covering 42 subjects in 11 languages) are meant to ensure that models are
tested against Indian social identities and regional examination standards
rather than just English-centric metrics. By tethering massive public compute power to a bespoke
legal and evaluation framework, India is attempting to ensure that the
"intelligence" of its future economy is homegrown, inclusive, and
culturally aligned.