Vietnamese Language Models and AI Sovereignty: PhoGPT, VinaLLaMA, KiLM, and the Race to Master the Mother Tongue

An overview of large language models (LLMs) for Vietnamese — PhoGPT, VinaLLaMA, VNG's KiLM, models from Viettel, GreenMind — and the role of domestic models, Vietnamese data, and digital sovereignty in the AI era.

Updated 2026-06-13 · English

Vietnamese Language Models and AI Sovereignty

Vietnamese is spoken by nearly 100 million people, yet in AI it is still treated as a “low-resource language” compared with English or Chinese. Developing large language models (LLMs) that deeply understand Vietnamese is not just a technical challenge; it is also a matter of digital sovereignty, meaning national autonomy over data, culture, and knowledge infrastructure. This article reviews prominent Vietnamese models and explains why domestic models matter.

1. Why Does Vietnamese Need Its Own Models?

International models like GPT, Gemini, and Llama all support Vietnamese, but Vietnamese typically constitutes only a small fraction of their training data. The consequences include:

Limited cultural context understanding: Vietnamese idioms, history, laws, and customs are easily misunderstood or hallucinated.
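Complex tones and orthography: Vietnamese writes both vowel quality and tone with diacritics, so a model trained on too little data can easily misplace a mark and change the meaning. For example, ma (ghost), má (mother), mà (but), mả (grave), mã (code), and mạ (rice seedling) differ only in their tone marks.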
Reliance on foreign infrastructure: Using third-party APIs raises questions about data privacy, cost, and long-term stability.

These factors are the impetus for Vietnam to develop “made in Vietnam” LLMs.
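The tone-mark point above can be made concrete with a few lines of Python. This is a minimal sketch that uses only the standard library `unicodedata` module. The word list and the helper names `strip_marks` and `same_text` are illustrative and do not come from any of the models discussed here. It shows two things: removing diacritics collapses six different words into one, and the same visible character can be stored as different code point sequences unless text is normalized first.

```python
import unicodedata

# Six Vietnamese words that differ only in their tone mark.
WORDS = {
    "ma": "ghost",
    "má": "mother; cheek",
    "mà": "but",
    "mả": "grave",
    "mã": "code; horse",
    "mạ": "rice seedling",
}


def strip_marks(text: str) -> str:
    """Remove combining marks (tones and vowel diacritics) from text."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # The letter d-with-stroke is a separate letter, not base + mark, so map it by hand.
    return stripped.replace("đ", "d").replace("Đ", "D")


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Without marks, all six words become the same string.
collapsed = {strip_marks(word) for word in WORDS}
print(collapsed)

# The same visible character in two encodings.
precomposed = "\u1ea3"   # a with hook above, one code point
decomposed = "a\u0309"   # 'a' followed by COMBINING HOOK ABOVE
print(precomposed == decomposed)            # raw comparison
print(same_text(precomposed, decomposed))   # comparison after NFC
```

Any pipeline that strips or mangles marks before training or evaluation discards exactly the information that separates these words, which is one reason Vietnamese text is usually normalized to a single Unicode form early on.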

2. Prominent Vietnamese Language Models

PhoGPT (VinAI Research)

PhoGPT-4B is a monolingual Vietnamese language model, pre-trained from scratch on a Vietnamese corpus of approximately 102 billion tokens, with a context length of 8,192 tokens. VinAI released PhoGPT as an open-source research project, one of the first systematic efforts to build a Vietnamese LLM from the ground up. (Note: VinAI’s generative AI division was acquired by Qualcomm in April 2025.)

VinaLLaMA (Independent Research Group)

VinaLLaMA is an open-weight model built upon LLaMA-2 and further trained on an additional 800 billion tokens. The VinaLLaMA-7B-chat version, trained on 1 million high-quality synthetic samples, achieved leading results on benchmarks such as VLSP, VMLU, and the Vietnamese version of the Vicuna Benchmark. VinaLLaMA’s strengths are its fluency in Vietnamese and its understanding of Vietnamese culture.

KiLM (VNG / Zalo)

VNG developed KiLM from scratch, placing Vietnam among the Southeast Asian nations that have their own LLMs. The 7B-parameter KiLM model was launched in late 2023 at the Zalo AI Summit. By late 2024, the 13B-parameter version was reported to score higher than several international models (GPT-4, Gemma2-9B, Phi-3-small) on Vietnamese tasks in the VMLU evaluation framework, trailing only Meta’s Llama-70B. KiLM serves as the foundation for Zalo’s Kiki voice assistant.
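Rankings like the VMLU result above usually come down to accuracy on a fixed set of questions. The following is a minimal sketch of that kind of scoring, assuming a multiple-choice format in which each item has lettered options and one gold answer. The item layout, the two sample questions, and the `always_b` baseline are hypothetical and are not taken from VMLU itself.

```python
from typing import Callable, Iterable

# Hypothetical item layout: question text, lettered options, gold letter.
Item = dict


def accuracy(items: Iterable[Item], predict: Callable[[Item], str]) -> float:
    """Fraction of items where the predicted letter matches the gold letter."""
    items = list(items)
    if not items:
        raise ValueError("no items to score")
    correct = sum(predict(it).strip().upper() == it["answer"] for it in items)
    return correct / len(items)


SAMPLE = [
    {
        "question": "Thủ đô của Việt Nam là gì?",
        "options": {"A": "Huế", "B": "Hà Nội", "C": "Đà Nẵng", "D": "Hải Phòng"},
        "answer": "B",
    },
    {
        "question": "Tiếng Việt có bao nhiêu thanh điệu?",
        "options": {"A": "4", "B": "5", "C": "6", "D": "7"},
        "answer": "C",
    },
]


def always_b(item: Item) -> str:
    """Trivial baseline standing in for a real model call."""
    return "B"


score = accuracy(SAMPLE, always_b)
print(score)  # 0.5
```

In a real evaluation, `predict` would wrap a model call, and the reported score would depend on prompt format and answer extraction, so leaderboard numbers are only comparable when those details match.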

Models from Viettel and GreenMind

Viettel AI developed VT-Super-120B-A12B (~120 billion parameters), which reportedly leads its segment in accuracy, and the Llama 3 ViettelSolution 8B model, which uses data cleaned with NVIDIA NeMo Curator. GreenNode’s GreenMind-Medium-14B-R1 became the first open-source Vietnamese reasoning LLM integrated with NVIDIA NIM. It can run on a single NVIDIA H100 GPU, which makes it suitable for enterprise assistants, chatbots, and Vietnamese document retrieval.
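The single-GPU claim can be sanity-checked with rough arithmetic on weight memory. The sketch below assumes 16-bit weights (2 bytes per parameter), the 80 GB H100 variant, and an arbitrary 80% headroom factor to leave room for activations and the KV cache. The parameter counts are read off the model names rather than taken from official specifications, and the function names are illustrative.

```python
H100_MEMORY_BYTES = 80 * 10**9  # 80 GB H100 variant (assumption)


def weight_bytes(n_params: int, bytes_per_param: int = 2) -> int:
    """Memory needed for the weights alone, at the given precision."""
    return n_params * bytes_per_param


def fits_on_one_gpu(
    n_params: int,
    bytes_per_param: int = 2,
    gpu_bytes: int = H100_MEMORY_BYTES,
    headroom: float = 0.8,  # assumed share of memory usable for weights
) -> bool:
    """Rough check: do the weights fit in the usable share of GPU memory?"""
    return weight_bytes(n_params, bytes_per_param) <= gpu_bytes * headroom


PARAMS_14B = 14_000_000_000    # from the name GreenMind-Medium-14B-R1
PARAMS_120B = 120_000_000_000  # from the name VT-Super-120B-A12B

print(weight_bytes(PARAMS_14B) / 1e9)   # 28.0 (GB)
print(fits_on_one_gpu(PARAMS_14B))      # True
print(weight_bytes(PARAMS_120B) / 1e9)  # 240.0 (GB)
print(fits_on_one_gpu(PARAMS_120B))     # False
```

Under these assumptions a 14B model needs about 28 GB for weights, well within one card, while a 120B model at the same precision would need several GPUs or a lower-precision format.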

ViGPT (VinBigData)

VinBigData’s ViGPT-1.6B-v1 model is among the notable Vietnamese models, aimed at virtual assistant applications and language processing within the Vingroup ecosystem.

3. The Role of International Models

Global LLMs remain important for Vietnamese users. GPT (OpenAI) and Gemini (Google) offer relatively good Vietnamese support thanks to their massive data scale, and they are popular tools for everyday tasks. Meta’s open-weight Llama family has become a base that many Vietnamese teams fine-tune rather than train from scratch, which saves significant cost. Vietnam’s practical strategy is therefore a hybrid one: use international open models as a base, then fine-tune them with local data and knowledge.
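To give a sense of why adapting an existing base can be so much cheaper, here is a small worked calculation. The article does not say which fine-tuning method Vietnamese teams use. Low-rank adaptation (LoRA) is used here only as one widely known example of a parameter-efficient method, and the matrix size and rank are illustrative values, not figures from any model named above.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for a LoRA adapter on one weight matrix.

    LoRA learns two small matrices, B (d_out x rank) and A (rank x d_in),
    while the original matrix stays frozen.
    """
    return rank * (d_in + d_out)


D = 4096   # illustrative hidden size
RANK = 8   # illustrative adapter rank

full = full_params(D, D)
lora = lora_params(D, D, RANK)
print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%
```

For this one matrix the adapter trains well under 1% of the parameters that full fine-tuning would update. Actual savings depend on which layers are adapted, the rank chosen, and how much data is used.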

4. Vietnamese Data — The “Oil” of Domestic AI

The quality of LLMs directly depends on the quality of their data. This is both a bottleneck and a strategic advantage:

Scarcity of large-scale clean data: High-quality digitized Vietnamese texts (books, newspapers, legal documents, conversations) are still scarce compared to English.
Data cleaning tools: Viettel’s use of NVIDIA NeMo Curator to curate Vietnamese data indicates that data processing is being standardized.
Population-scale data: In 2026, NVIDIA announced the development of a population-scale dataset with FPT — a significant step for national data infrastructure.

Whoever controls high-quality Vietnamese data will have a decisive advantage in model development.
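The data-cleaning step mentioned above can be pictured with a small standard-library sketch. This is not the NeMo Curator API. It is an assumed, simplified pipeline that normalizes Unicode and whitespace, drops very short documents, filters with a crude "looks Vietnamese" heuristic, and removes exact duplicates by hash. All function names and thresholds are hypothetical.

```python
import hashlib
import re
import unicodedata


def normalize(text: str) -> str:
    """Normalize to NFC and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def looks_vietnamese(text: str, min_ratio: float = 0.05) -> bool:
    """Crude heuristic: share of letters that carry a diacritic or are d-with-stroke.

    Other diacritic-heavy languages would also pass; a real pipeline would
    use a proper language identifier.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    marked = sum(
        1 for ch in letters
        if ch in "đĐ" or unicodedata.normalize("NFD", ch) != ch
    )
    return marked / len(letters) >= min_ratio


def clean_corpus(docs: list[str], min_chars: int = 20) -> list[str]:
    """Return normalized, filtered, exact-deduplicated documents in order."""
    seen: set[str] = set()
    kept: list[str] = []
    for doc in docs:
        text = normalize(doc)
        if len(text) < min_chars or not looks_vietnamese(text):
            continue
        digest = hashlib.sha256(text.lower().encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(text)
    return kept


docs = [
    "Hà Nội là thủ đô của Việt Nam.",
    "Hà  Nội là thủ đô   của Việt Nam.",            # duplicate after normalization
    "The quick brown fox jumps over the lazy dog.",  # not Vietnamese
    "ok",                                            # too short
]
cleaned = clean_corpus(docs)
print(len(cleaned), cleaned)
```

Production pipelines typically add near-duplicate detection, quality classifiers, and removal of personal data, which is where dedicated tooling becomes useful.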

5. AI Sovereignty and Digital Sovereignty

“Sovereign AI” is a central concept in Vietnam’s strategic direction: achieving autonomy over models, data, and computing infrastructure rather than being entirely dependent on foreign entities. In 2026, Vietnam emerged as a focal point in NVIDIA’s sovereign AI strategy, with FPT and Viettel participating. Viettel AI is confirmed to be developing a national legal AI application on open model infrastructure — a prime example of an application requiring absolute data sovereignty.

AI sovereignty carries multi-layered significance: protecting citizen data, preserving Vietnamese cultural and historical values within machine knowledge, and ensuring security for sensitive applications (national defense, law, healthcare). This is why domestic models are not merely a technical choice but a national strategic imperative.

Conclusion

From PhoGPT and VinaLLaMA to KiLM, Viettel’s models, and GreenMind, Vietnam has shown that it can build its own competitive Vietnamese LLMs. The path forward involves consolidating high-quality Vietnamese data, investing in computing infrastructure, and training high-level research talent. Mastering the mother tongue in AI means mastering part of the nation’s digital sovereignty in the 21st century.
