8 Key Questions Revealing NVIDIA’s Vision for AI’s Future

The following FAQ aims to provide a deeper understanding of NVIDIA’s vision for the future, as outlined by the company’s CEO, Jensen Huang, during his keynote speech at the NVIDIA GTC 2025 conference.
1. What is the significance of “tokens” in the context of AI, according to Jensen Huang’s keynote?
Tokens are described as the fundamental building blocks of AI, acting as a new frontier for intelligence creation. They are the units that AI models generate, and their applications are vast and transformative. Tokens can convert images into scientific data for exploring alien atmospheres, turn raw data into predictive foresight, decode the laws of physics, identify diseases early, unravel the language of life, and connect data points to protect ecosystems. They also enable robots to interact more naturally and bring more utility to human life. The generation of tokens is presented as a new kind of “factory” for intelligence.
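To make the idea of a "token" concrete, here is a minimal sketch of tokenization: text is mapped to discrete units that a model consumes and produces. Real systems use learned subword vocabularies (e.g. byte-pair encoding); the toy whitespace tokenizer and made-up vocabulary below are illustrative assumptions, not anything from the keynote.

```python
# Toy illustration: a "token" is a discrete unit a model reads and writes.
# Real tokenizers use learned subword vocabularies; this one is a stand-in.
toy_vocab = {"exploring": 101, "alien": 102, "atmospheres": 103, "<unk>": 0}

def tokenize(text: str) -> list[int]:
    """Map each whitespace-separated word to an integer token ID."""
    return [toy_vocab.get(word, toy_vocab["<unk>"]) for word in text.lower().split()]

print(tokenize("Exploring alien atmospheres"))  # -> [101, 102, 103]
```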
2. What is “agentic AI” and how does it represent a fundamental advance in the field?
Agentic AI is a major breakthrough characterized by AI that possesses agency. This means it can perceive and understand the context of a situation, reason about how to answer or solve a problem through step-by-step processes (like Chain of Thought and consistency checking), and plan and take action using tools. Unlike previous generative AI that primarily translated between modalities or retrieved and augmented information, agentic AI can reason and generate solutions based on understanding and planning, fundamentally changing how computing is done.
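The perceive-reason-act loop described above can be sketched in a few lines. This is a hedged, conceptual sketch only: the `llm()` stub and the tool registry are illustrative stand-ins, not NVIDIA's or any vendor's API.

```python
# Hedged sketch of an agentic loop: perceive the task, reason step by step,
# plan a tool call, act, and check the result for consistency.
def llm(prompt: str) -> str:
    """Stand-in for a call to a large language model."""
    return f"[model response to: {prompt[:40]}...]"

def run_agent(task: str, tools: dict) -> str:
    context = f"Task: {task}"
    # Reason: elicit a step-by-step plan (chain of thought).
    plan = llm(f"{context}\nThink step by step and name a tool if one is needed.")
    # Act: dispatch to any tool named in the plan, then verify the result.
    for name, tool in tools.items():
        if name in plan:
            observation = tool(task)
            return llm(f"{context}\n{name} returned {observation}. Check it and answer.")
    return llm(f"{context}\nNo tool was needed. Answer directly.")
```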
3. What are the three fundamental matters involved in enabling each wave and phase of AI, as highlighted in the keynote?
The three core challenges are:

* Solving the data problem: AI is data-driven and requires vast amounts of digital experience to learn and gain knowledge. The question is how to acquire and utilize this data effectively.
* Solving the training problem without human-in-the-loop: To achieve superhuman learning rates and scale, AI needs to be trained without being limited by the speed and scale of human demonstration.
* Scaling the model with resources: Finding algorithms where increased computational resources directly lead to a smarter AI, governed by resilient and hyper-accelerated scaling laws (sketched below).
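The scaling-law idea in the third point is often modeled as a smooth power law: more compute reliably buys better performance. The form and constants below are common modeling assumptions chosen for illustration, not figures from the keynote.

```python
# Toy scaling law: loss falls as a power law in compute, c * compute**(-alpha).
# The constants are arbitrary placeholders for illustration only.
def toy_loss(compute_pf_days: float, c: float = 10.0, alpha: float = 0.05) -> float:
    """Return the modeled loss for a given compute budget (in petaflop-days)."""
    return c * compute_pf_days ** (-alpha)

for pf_days in (1e2, 1e4, 1e6):
    print(f"{pf_days:>9.0e} PF-days -> toy loss {toy_loss(pf_days):.2f}")
```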
4. Why is the computational demand for AI, particularly reasoning-based AI, significantly higher than previously anticipated?
Agentic AI, with its ability to reason step by step, generates substantially more tokens compared to previous one-shot models. For each step of reasoning, a sequence of tokens is produced, and the output of one step becomes the input for the next. This leads to a hundredfold or more increase in token generation for a single task. Additionally, to maintain responsiveness with more complex models generating more tokens, the speed of computation (flops) needs to increase proportionally, resulting in a dramatic overall rise in computational requirements for both training and inference.
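A back-of-the-envelope version of that token math is below. All of the specific numbers are illustrative assumptions, not keynote figures; the point is only that step-by-step reasoning multiplies token counts, and holding latency steady then multiplies the required compute again.

```python
# Illustrative arithmetic for why reasoning inflates compute.
one_shot_tokens = 1_000        # tokens for a direct, one-shot answer (assumed)
reasoning_steps = 50           # chain-of-thought steps for the same task (assumed)
tokens_per_step = 2_000        # tokens per step; each step's output feeds the next

reasoning_tokens = reasoning_steps * tokens_per_step
token_inflation = reasoning_tokens / one_shot_tokens
print(f"Reasoning generates ~{token_inflation:.0f}x more tokens per task")

# To keep the response time acceptable, tokens per second (and therefore FLOPS)
# must rise by roughly the same factor, compounding training and inference cost.
```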
5. How has NVIDIA addressed the challenge of scaling up AI infrastructure, and what is the significance of NVLink 72 and liquid cooling?
NVIDIA has moved from an integrated NVLink system to a disaggregated architecture with the NVLink switch at the center, enabling every GPU to communicate with every other GPU at full bandwidth simultaneously. This disaggregated NVLink design, embodied in the NVL72 configuration that links 72 GPUs into a single system, allows for massive scale-up. Coupled with full liquid cooling, which significantly improves energy efficiency and allows compute nodes to be packed more densely, NVIDIA can now deliver roughly one exaflop of computing power in a single Blackwell rack. This scale-up is crucial before scaling out to larger data centers.
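Dividing the rack-level figure by the GPU count gives a rough sense of per-GPU capability. The result is a derived estimate at the quoted (low) precision, not an official specification.

```python
# Simple division of the quoted rack-level numbers; derived, not a spec.
rack_exaflops = 1.0     # ~1 exaflop per Blackwell NVL72-class rack (keynote figure)
gpus_per_rack = 72      # all 72 GPUs linked through the NVLink switch
per_gpu_petaflops = rack_exaflops * 1_000 / gpus_per_rack   # 1 EF = 1,000 PF
print(f"~{per_gpu_petaflops:.0f} petaflops per GPU at the quoted precision")
```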
6. What is NVIDIA Dynamo, and what role does it play in the operation of AI factories?
NVIDIA Dynamo is described as the operating system of an AI factory. It is an open-source software layer that manages the complex operations of running large language models for inference on massive GPU clusters like the NVLink 72-based Blackwell systems. Dynamo handles tasks such as workload management, pipeline and tensor parallelism, expert parallelism, in-flight batching, and the routing and management of the KV cache across the distributed GPUs. Its goal is to optimize the AI factory for both high throughput (tokens per second for the entire factory) and low latency (fast response time for individual users), adapting to different workloads by dynamically allocating resources for prefill (context processing) and decode (token generation).
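The disaggregated-serving idea behind that last point can be sketched as a simple allocator: size the prefill and decode GPU pools in proportion to the pending work in each phase. This is a conceptual sketch under assumed names and numbers, not the Dynamo API.

```python
# Conceptual sketch (not the NVIDIA Dynamo API) of disaggregated serving:
# prefill (context processing) and decode (token generation) run on separate
# GPU pools, sized to the current workload mix.
from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int   # work for the prefill pool
    output_tokens: int   # work for the decode pool

def split_gpus(requests: list[Request], total_gpus: int) -> tuple[int, int]:
    """Allocate GPUs to prefill vs. decode in proportion to pending work."""
    prefill_work = sum(r.prompt_tokens for r in requests)
    decode_work = sum(r.output_tokens for r in requests)
    share = prefill_work / (prefill_work + decode_work)
    prefill_gpus = max(1, round(total_gpus * share))
    return prefill_gpus, total_gpus - prefill_gpus

# Long prompts with short answers skew GPUs toward prefill; chatty decoding
# skews them toward decode. The KV cache built in prefill is handed to decode.
print(split_gpus([Request(8_000, 500), Request(4_000, 1_000)], total_gpus=72))
```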
7. What are NVIDIA’s key product roadmap announcements for AI infrastructure beyond Blackwell, and what are their anticipated benefits?
NVIDIA has outlined an annual roadmap for AI infrastructure:

* Blackwell Ultra (second half of 2025): An upgrade to Blackwell with 1.5x more FLOPS, increased memory and bandwidth, and a new instruction for attention. It offers an incremental performance boost within the same architecture.
* Vera Rubin (second half of 2026): A completely new architecture featuring a new CPU (2x performance), a new GPU, CX9 networking, NVLink 6, and HBM4 memory. It aims for a significant leap in performance while managing infrastructure risk.
* Rubin Ultra (second half of 2027): An extreme scale-up architecture with NVLink 576, targeting 15 exaflops and 4,600 TB/s of scale-up bandwidth in a 600 kW rack. It promises a dramatic increase in computational power and efficiency.

This roadmap gives customers a multi-year view for planning their AI infrastructure investments, anticipating substantial performance gains and reductions in TCO over time.
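Taking the two Rubin Ultra numbers quoted above at face value gives a rough efficiency figure; it is derived from the stated rack figures, not an announced specification, and ignores differences in numeric precision or format.

```python
# Quick arithmetic on the quoted Rubin Ultra figures; derived, not a spec.
exaflops = 15            # stated compute per rack
rack_kw = 600            # stated rack power
tflops_per_watt = exaflops * 1e6 / (rack_kw * 1e3)   # 1 EF = 1e6 TFLOPS
print(f"~{tflops_per_watt:.0f} TFLOPS per watt at the quoted rack figures")
```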
8. How is NVIDIA approaching the robotics industry, and what key technologies and collaborations were announced?
NVIDIA sees robotics as a major future industry driven by physical AI. Its approach involves a continuous loop of simulation, training, testing, and real-world experience, enabled by three types of NVIDIA computers. Key technologies include:

* Omniverse and Cosmos: Omniverse is the operating system for physical AI, and Cosmos is a generative model, conditioned by Omniverse, that creates massive amounts of diverse synthetic data for robot training.
* Isaac Lab: Used for post-training robot policies with augmented data through imitation and reinforcement learning.
* Omniverse for software- and hardware-in-the-loop testing: Simulating robot policies in digital twins with realistic physics and sensor simulation.
* Mega: An Omniverse blueprint for testing fleets of robots working together in simulated environments.

A significant announcement was NVIDIA Isaac GR00T N1, a generalist foundation model for humanoid robots featuring a dual "fast and slow thinking" system, which has been open-sourced. Furthermore, a collaboration with DeepMind and Disney Research, called Newton, was announced, introducing a high-performance, GPU-accelerated physics engine designed for fine-grained rigid and soft body simulation, essential for training robots with tactile feedback and fine motor skills.
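The dual "fast and slow thinking" design can be pictured as two loops running at different rates: a slow, deliberative planner that sets goals and a fast, reactive controller that tracks them. The rates, function names, and control logic below are illustrative assumptions for a sketch, not details of GR00T N1 itself.

```python
# Hedged sketch of a fast/slow split: a slow planner chooses a goal at a low
# rate while a fast controller tracks it at a high rate. Numbers are assumed.
import time

def slow_planner(observation: dict) -> dict:
    """Deliberate: interpret the scene and choose a goal (runs at ~1 Hz)."""
    return {"goal": "reach", "target": observation.get("object_xyz", (0.0, 0.0, 0.0))}

def fast_controller(goal: dict, joint_state: list[float]) -> list[float]:
    """React: nudge each joint toward the current goal (runs at ~100 Hz)."""
    return [0.1 * (target - q) for q, target in zip(joint_state, goal["target"])]

goal = slow_planner({"object_xyz": (0.4, 0.1, 0.3)})
for _ in range(3):                 # a few fast ticks per slow update
    command = fast_controller(goal, [0.0, 0.0, 0.0])
    time.sleep(0.01)               # stand-in for the fast-loop control period
print(command)
```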