Nvidia Launches Open-Source Nemotron-Nano-9B-v2 AI Model

Summary
– Nvidia released Nemotron-Nano-9B-v2, a small language model (SLM) with 9 billion parameters, designed to fit on a single Nvidia A10 GPU and offering toggleable AI reasoning capabilities.
– The model combines transformer and Mamba architectures, enabling efficient handling of long sequences at lower memory and compute cost than pure transformer models.
– Nemotron-Nano-9B-v2 supports multiple languages and tasks, including instruction following and code generation, and is available on Hugging Face and Nvidia’s model catalog.
– The model achieves competitive benchmark scores, with features like runtime “thinking budget” management to balance accuracy and latency in applications like customer support.
– Released under Nvidia’s permissive Open Model License, the model is commercially usable with conditions focused on safety, compliance, and attribution, but no usage-based fees.

Nvidia has unveiled its latest open-source AI model, Nemotron-Nano-9B-v2, designed to deliver high performance while maintaining efficiency for enterprise applications. This compact language model stands out with its hybrid architecture and unique reasoning capabilities, offering developers a versatile tool for tasks ranging from instruction following to code generation.
The 9-billion-parameter model represents a streamlined version of Nvidia’s earlier 12B variant, optimized to run efficiently on a single Nvidia A10 GPU. According to Oleksii Kuchaiev, Nvidia’s Director of AI Model Post-Training, this reduction in size was intentional, enabling faster processing and larger batch sizes while maintaining accuracy. Compared to traditional transformer models, Nemotron-Nano-9B-v2 can be up to six times faster, making it well suited to real-world deployments where speed matters.
Unlike many large language models (LLMs) that rely solely on transformer architecture, Nvidia’s latest offering blends transformer and Mamba-based layers. This hybrid approach leverages selective state space models (SSMs), which excel at handling long sequences without the heavy computational overhead of standard self-attention mechanisms. The result is a model that scales efficiently, processing extended contexts with up to three times higher throughput than conventional designs.
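To see why state-space layers keep long contexts cheap, consider a minimal sketch. This is not Nvidia’s actual Mamba implementation (which uses learned, input-dependent parameters and hardware-optimized scans); it is a toy one-dimensional linear recurrence that illustrates the key property: one pass over the sequence, constant state, so cost grows linearly with length rather than quadratically as with full self-attention.

```python
def ssm_scan(x, a=0.9, b=0.1):
    """Toy linear state-space recurrence: h_t = a*h_{t-1} + b*x_t.

    A single pass with a fixed-size hidden state gives O(n) time and
    O(1) extra memory in sequence length n, unlike self-attention,
    which materializes an n x n interaction matrix.
    """
    h = 0.0
    out = []
    for x_t in x:
        h = a * h + b * x_t  # decay the old state, mix in the new input
        out.append(h)
    return out

# Each output summarizes the entire prefix through the running state;
# an impulse at position 0 decays geometrically through later steps.
print(ssm_scan([1.0, 0.0, 0.0]))
```

Real Mamba layers replace the fixed scalars `a` and `b` with learned, input-selective parameters, but the linear-scan structure, and hence the throughput advantage on long sequences, is the same.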
One of the standout features of Nemotron-Nano-9B-v2 is its adjustable reasoning capability. Users can toggle self-checking behavior on or off using simple control tokens like `/think` or `/no_think`. Additionally, developers can set a “thinking budget” to limit the tokens allocated for internal reasoning, a feature that helps balance accuracy with response speed in applications like customer support or autonomous agents.
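The toggle-and-budget pattern can be sketched as follows. The message format follows the common Hugging Face chat convention; the exact placement of `/think` and `/no_think` in the system message, and the `</think>` closing tag used here, are assumptions for illustration rather than confirmed details of Nvidia’s chat template.

```python
def build_messages(user_prompt, reasoning=True):
    """Prepend the reasoning control token as the system message.

    Assumes the /think and /no_think tokens go in the system turn;
    consult the model card for the exact template.
    """
    control = "/think" if reasoning else "/no_think"
    return [
        {"role": "system", "content": control},
        {"role": "user", "content": user_prompt},
    ]

def apply_thinking_budget(generated_tokens, budget, end_think="</think>"):
    """Cap the internal reasoning trace at `budget` tokens.

    If the model closed its reasoning within the budget, keep the output
    as-is; otherwise truncate and force the close tag so the model moves
    on to its final answer. The end_think marker is a hypothetical stand-in.
    """
    if end_think in generated_tokens[:budget]:
        return generated_tokens  # reasoning finished within budget
    return generated_tokens[:budget] + [end_think]

msgs = build_messages("Summarize this support ticket.", reasoning=False)
print(msgs[0]["content"])  # /no_think
```

Capping the reasoning trace this way is what lets a deployment trade a little accuracy for predictable latency, which matters in interactive settings like customer support.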
Benchmark results highlight the model’s competitive edge. In reasoning mode, it achieved 72.1% accuracy on AIME25, 97.8% on MATH500, and 71.1% on LiveCodeBench. Instruction-following and long-context tests also showed strong performance, surpassing comparable models like Qwen3-8B. Nvidia’s data suggests that optimizing the reasoning budget can further enhance efficiency without sacrificing quality.
Training involved a mix of curated, web-sourced, and synthetic datasets, including general text, code, and specialized documents in fields like law and finance. Synthetic reasoning traces generated by larger models were also incorporated to improve complex problem-solving abilities.
Licensed under Nvidia’s Open Model License Agreement, the model is free for commercial use without restrictive scaling clauses. Enterprises can deploy it immediately, provided they adhere to guidelines on safety, redistribution, and compliance. Unlike some open licenses, there are no hidden fees or usage thresholds, just clear requirements for ethical and legal deployment.
With Nemotron-Nano-9B-v2, Nvidia is catering to developers who need a balance of power and efficiency. Available on Hugging Face and Nvidia’s model catalog, this release underscores the company’s commitment to advancing AI tools that prioritize both performance and practicality. By refining hybrid architectures and introducing flexible reasoning controls, Nvidia continues to push the boundaries of what smaller, specialized models can achieve.
(Source: VentureBeat)
