AI Terms You Nodded Along To, Explained

▼ Summary
– The article defines AGI (Artificial General Intelligence) as AI more capable than the average human at most tasks, with definitions varying between companies like OpenAI and Google DeepMind.
– An AI agent is an autonomous tool that performs multistep tasks like booking tickets or writing code, but its specific meaning and supporting infrastructure are still evolving.
– Chain-of-thought reasoning improves LLM accuracy by breaking problems into intermediate steps, though it takes longer to produce answers.
– Hallucination refers to AI models generating incorrect information due to gaps in training data, posing risks and driving development of specialized models.
– Tokens are the basic units of text processed by LLMs, and token throughput measures how much work an AI system can handle at once, influencing cost and user response speed.
Artificial intelligence is reshaping the world, and along the way, it has created an entirely new vocabulary to describe how it’s doing so. Spend even a short time reading about AI, and you’ll encounter LLMs, RAG, RLHF, and a host of other terms that can leave even seasoned tech professionals feeling out of their depth. This glossary aims to change that. We update it frequently as the field evolves, so consider it a living document, much like the AI systems it covers.
Artificial general intelligence, or AGI, remains a fuzzy concept. Broadly, it refers to AI that surpasses the average human in many, if not most, tasks. OpenAI CEO Sam Altman once described AGI as the “equivalent of a median human that you could hire as a co-worker.” OpenAI’s own charter defines it as “highly autonomous systems that outperform humans at most economically valuable work.” Google DeepMind takes a slightly different view, seeing AGI as “AI that’s at least as capable as humans at most cognitive tasks.” Confused? Don’t worry. Even the experts leading AI research are still debating the exact definition.
An AI agent is a tool that uses AI technologies to handle a series of tasks on your behalf, going beyond what a basic AI chatbot can do. This could include filing expenses, booking tickets or a restaurant table, or even writing and maintaining code. However, as we have noted before, this emerging space is full of moving parts, so the term “AI agent” can mean different things to different people. The necessary infrastructure to fully deliver on its promised capabilities is still being built. Still, the core idea is an autonomous system that might draw on multiple AI systems to complete multistep tasks.
Think of API endpoints as hidden “buttons” on the back of a piece of software. Other programs can press these buttons to make the software do things. Developers use these interfaces to build integrations, for example, allowing one application to pull data from another or enabling an AI agent to control third-party services directly without a human manually operating each interface. Most smart home devices and connected platforms have these hidden buttons available, even if ordinary users never see them. As AI agents become more capable, they are increasingly able to find and use these endpoints on their own, opening up powerful and sometimes unexpected possibilities for automation.
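To make the "hidden buttons" picture concrete, here is a minimal sketch that simulates endpoints as entries in a lookup table. The smart lamp, its routes, and the `call_endpoint` helper are all hypothetical and stand in for a real HTTP API; no actual device or library behaves exactly like this.

```python
# Hypothetical smart lamp state; the device and its routes are illustrative only.
lamp_state = {"on": False, "brightness": 50}

def get_status(_payload):
    """Return a copy of the lamp's current state."""
    return dict(lamp_state)

def set_power(payload):
    """Turn the lamp on or off, then return the new state."""
    lamp_state["on"] = bool(payload["on"])
    return dict(lamp_state)

# Each (method, path) pair is one "button" another program can press.
ENDPOINTS = {
    ("GET", "/lamp"): get_status,
    ("POST", "/lamp/power"): set_power,
}

def call_endpoint(method, path, payload=None):
    """Look up the requested endpoint and run its handler."""
    handler = ENDPOINTS.get((method, path))
    if handler is None:
        return {"status": 404, "body": None}  # no such button
    return {"status": 200, "body": handler(payload or {})}

# A program (or an AI agent) presses the "power" button without any UI.
response = call_endpoint("POST", "/lamp/power", {"on": True})
print(response)
```

A real integration would send these requests over the network, but the idea is the same: a named address, an action, and a structured reply.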
A human brain can answer a simple question, like “which animal is taller, a giraffe or a cat?”, without much thought. But in many cases, you need a pen and paper to work through the intermediate steps. For example, if a farmer has chickens and cows, and together they have 40 heads and 120 legs, you might write down a simple equation to find the answer (20 chickens and 20 cows). In the AI world, chain-of-thought reasoning for large language models means breaking a problem into smaller, intermediate steps to improve the quality of the final result. This approach usually takes longer to produce an answer, but the answer is more likely to be correct, especially in logic or coding tasks. Reasoning models are developed from traditional large language models and optimized for chain-of-thought thinking through reinforcement learning.
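The farmer puzzle can be worked through step by step, which is the spirit of chain-of-thought reasoning. The sketch below writes out the intermediate steps as comments; the function name is our own, and this illustrates the arithmetic rather than how a reasoning model works internally.

```python
def solve_heads_and_legs(heads: int, legs: int) -> tuple[int, int]:
    """Return (chickens, cows) for the given head and leg totals."""
    # Step 1: every animal has one head, so chickens + cows = heads.
    # Step 2: chickens have 2 legs and cows have 4,
    #         so 2 * chickens + 4 * cows = legs.
    # Step 3: substitute chickens = heads - cows into step 2:
    #         2 * (heads - cows) + 4 * cows = legs
    #         which simplifies to cows = (legs - 2 * heads) / 2.
    extra_legs = legs - 2 * heads
    if extra_legs < 0 or extra_legs % 2 != 0:
        raise ValueError("no whole-number solution for these totals")
    cows = extra_legs // 2
    chickens = heads - cows
    if chickens < 0:
        raise ValueError("no whole-number solution for these totals")
    return chickens, cows

chickens, cows = solve_heads_and_legs(heads=40, legs=120)
print(chickens, cows)  # 20 20
```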
A coding agent is a more specific concept than an “AI agent,” which is a program that can take actions on its own, step by step, to complete a goal. A coding agent is a specialized version applied to software development. Rather than just suggesting code for a human to review and paste, a coding agent can write, test, and debug code autonomously. It handles the kind of iterative, trial-and-error work that typically consumes a developer’s day. These agents can operate across entire codebases, spotting bugs, running tests, and pushing fixes with minimal human oversight. Think of it like hiring a very fast intern who never sleeps and never loses focus. However, as with any intern, a human still needs to review the work.
Although somewhat of a multivalent term, compute generally refers to the vital computational power that allows AI models to operate. This type of processing fuels the AI industry, giving it the ability to train and deploy its powerful models. The term is often a shorthand for the kinds of hardware that provide the computational power, things like GPUs, CPUs, TPUs, and other forms of infrastructure that form the bedrock of the modern AI industry.
A subset of self-improving machine learning, deep learning involves AI algorithms designed with a multi-layered, artificial neural network (ANN) structure. This allows them to make more complex correlations compared to simpler machine learning systems, such as linear models or decision trees. The structure of deep learning algorithms draws inspiration from the interconnected pathways of neurons in the human brain. Deep learning AI models can identify important characteristics in data themselves, rather than requiring human engineers to define these features. The structure also supports algorithms that can learn from errors and, through repetition and adjustment, improve their own outputs. However, deep learning systems require a lot of data points to yield good results, often millions or more. They also typically take longer to train compared to simpler machine learning algorithms, so development costs tend to be higher.
Diffusion is the technology at the heart of many art-, music-, and text-generating AI models. Inspired by physics, diffusion systems slowly “destroy” the structure of data, such as photos, songs, and text, by adding noise until nothing is left. In physics, diffusion is spontaneous and irreversible; sugar diffused in coffee cannot be restored to cube form. But diffusion systems in AI aim to learn a sort of “reverse diffusion” process to restore the destroyed data, gaining the ability to recover the data from noise.
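The "destroy" half of diffusion is easy to sketch. The example below blends a clean signal with Gaussian noise at increasing levels and measures how much of the original structure survives. The square-root mixing rule is one common formulation and is an assumption here, as are the function names; the learned reverse process, which is the hard part, is not shown.

```python
import math
import random

def add_noise(signal, noise_level, rng):
    """Blend the signal with Gaussian noise; noise_level runs from 0 to 1."""
    keep = math.sqrt(1.0 - noise_level)  # how much of the original survives
    mix = math.sqrt(noise_level)         # how much noise is blended in
    return [keep * x + mix * rng.gauss(0.0, 1.0) for x in signal]

def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

rng = random.Random(0)
# A sine wave stands in for structured data such as a photo or a song.
signal = [math.sin(2 * math.pi * i / 50) for i in range(200)]

similarity_by_level = {}
for noise_level in (0.0, 0.5, 1.0):
    noisy = add_noise(signal, noise_level, rng)
    similarity_by_level[noise_level] = correlation(signal, noisy)

for level, similarity in similarity_by_level.items():
    print(f"noise level {level:.1f}: correlation with original {similarity:.2f}")
```

At noise level 1.0 the output is pure noise, which is the starting point a diffusion model learns to work backward from.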
Distillation is a technique used to extract knowledge from a large AI model using a ‘teacher-student’ model. Developers send requests to a teacher model and record the outputs. Answers are sometimes compared with a dataset to check for accuracy. These outputs are then used to train the student model, which is trained to approximate the teacher’s behavior. Distillation can create a smaller, more efficient model based on a larger one with minimal loss. This is likely how OpenAI developed GPT-4 Turbo, a faster version of GPT-4. While all AI companies use distillation internally, some may have also used it to catch up with frontier models. Distillation from a competitor usually violates the terms of service of AI APIs and chat assistants.
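The teacher-student loop can be illustrated with a toy. In the sketch below, the "teacher" is just an opaque function we can query, and the "student" is a much simpler straight-line model fitted to the teacher's recorded outputs. Every name and number here is hypothetical; real distillation trains a neural network on a large model's outputs, not a two-parameter line.

```python
import math

def teacher(x):
    """Stand-in for a large model: an opaque function we can only query."""
    return 2.0 * x + 1.0 + 0.05 * math.sin(5.0 * x)

def fit_student(inputs, targets):
    """Fit y = slope * x + intercept to the recorded outputs (least squares)."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(targets) / n
    sxx = sum((x - mean_x) ** 2 for x in inputs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, targets))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Step 1: send requests to the teacher and record its outputs.
prompts = [i / 10 for i in range(21)]
recorded_outputs = [teacher(x) for x in prompts]

# Step 2: train the smaller student to approximate the teacher's behavior.
slope, intercept = fit_student(prompts, recorded_outputs)

def student(x):
    return slope * x + intercept

# Step 3: measure how much was lost in the process.
mse = sum((student(x) - y) ** 2 for x, y in zip(prompts, recorded_outputs)) / len(prompts)
print(f"student: y = {slope:.3f} * x + {intercept:.3f}, mean squared error = {mse:.5f}")
```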
Fine-tuning refers to the further training of an AI model to optimize performance for a more specific task or area than was previously a focus of its training. This is typically done by feeding in new, specialized data. Many AI startups take large language models as a starting point to build a commercial product. They aim to increase utility for a target sector or task by supplementing earlier training cycles with fine-tuning based on their own domain-specific knowledge and expertise.
A GAN, or Generative Adversarial Network, is a type of machine learning framework that underpins some important developments in generative AI for producing realistic data, including deepfake tools. GANs involve a pair of neural networks. One network draws on its training data to generate an output, which is passed to the other model to evaluate. The two models are programmed to try to outdo each other. The generator tries to get its output past the discriminator, while the discriminator works to spot artificially generated data. This structured contest can optimize AI outputs to be more realistic without additional human intervention. GANs work best for narrower applications, such as producing realistic photos or videos, rather than for general-purpose AI.
Hallucination is the AI industry’s preferred term for AI models making things up, literally generating information that is incorrect. Obviously, this is a huge problem for AI quality. Hallucinations produce GenAI outputs that can be misleading and could even lead to real-life risks, with potentially dangerous consequences, such as a health query that returns harmful medical advice. The problem of AIs fabricating information is thought to arise from gaps in training data. Hallucinations are contributing to a push toward more specialized and vertical AI models (domain-specific AIs that require narrower expertise) as a way to reduce the likelihood of knowledge gaps and shrink disinformation risks.
Inference is the process of running an AI model. It involves setting a model loose to make predictions or draw conclusions from previously seen data. To be clear, inference cannot happen without training. A model must learn patterns in a set of data before it can effectively extrapolate from that training data. Many types of hardware can perform inference, ranging from smartphone processors to beefy GPUs to custom-designed AI accelerators. However, not all of them can run models equally well. Very large models would take ages to make predictions on a laptop versus a cloud server with high-end AI chips.
Large language models, or LLMs, are the AI models used by popular AI assistants such as ChatGPT, Claude, Google’s Gemini, Meta’s Llama, Microsoft Copilot, or Mistral’s Le Chat. When you chat with an AI assistant, you interact with a large language model that processes your request directly or with the help of different tools, such as web browsing or code interpreters. LLMs are deep neural networks made of billions of numerical parameters, or weights, that learn the relationships between words and phrases and create a representation of language, a sort of multidimensional map of words. These models are created by encoding the patterns they find in billions of books, articles, and transcripts. When you prompt an LLM, the model generates the most likely pattern that fits the prompt.
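The idea of "generating the most likely pattern" can be shown at a tiny scale. The sketch below counts which word follows which in a made-up three-sentence corpus, then predicts the most frequent follower. This is a simple bigram counter, not a neural network, and the corpus and function names are our own; a real LLM learns far richer relationships across billions of parameters.

```python
from collections import Counter, defaultdict

# A made-up miniature "training set".
corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."

def train_bigram(text):
    """Count, for every word, which words follow it and how often."""
    counts = defaultdict(Counter)
    words = text.split()
    for current_word, next_word in zip(words, words[1:]):
        counts[current_word][next_word] += 1
    return counts

def most_likely_next(counts, word):
    """Return the word seen most often after `word`, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

model = train_bigram(corpus)
print(most_likely_next(model, "the"))  # cat
print(most_likely_next(model, "sat"))  # on
```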
Memory cache refers to an important process that boosts inference, which is the process by which AI generates a response to a user’s query. In essence, caching is an optimization technique designed to make inference more efficient. AI is driven by high-octane mathematical calculations, and every time those calculations are made, they use more power. Caching is designed to cut down on the number of calculations a model might need to run by saving particular calculations for future user queries and operations. There are different kinds of memory caching, although one of the more well-known is KV, or key-value, caching. KV caching works in transformer-based models and increases efficiency, driving faster results by reducing the time and algorithmic labor needed to generate answers.
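The saving from KV caching can be sketched by counting work. In the toy below, `project` stands in for the per-token key and value calculation in a transformer layer; the class, its placeholder outputs, and the token list are hypothetical. Without a cache, every generation step recomputes the projection for all earlier tokens; with one, each token is projected once and reused.

```python
class ToyLayer:
    """Counts how many key/value projections are computed."""

    def __init__(self):
        self.projections_computed = 0

    def project(self, token):
        """Stand-in for computing one token's key and value."""
        self.projections_computed += 1
        return (len(token), token.upper())  # placeholder "key" and "value"

def run_without_cache(tokens):
    layer = ToyLayer()
    for step in range(1, len(tokens) + 1):
        # Recompute keys and values for every token seen so far.
        keys_values = [layer.project(t) for t in tokens[:step]]
    return layer.projections_computed

def run_with_cache(tokens):
    layer = ToyLayer()
    kv_cache = []
    for token in tokens:
        # Only the newest token is projected; earlier results are reused.
        kv_cache.append(layer.project(token))
    return layer.projections_computed

tokens = ["tokens", "bridge", "that", "gap", "."]
print(run_without_cache(tokens))  # 15
print(run_with_cache(tokens))     # 5
```

The gap widens quickly: without a cache the work grows roughly with the square of the sequence length, while with a cache it grows linearly, at the price of the memory needed to hold the cache.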
A neural network refers to the multi-layered algorithmic structure that underpins deep learning and, more broadly, the whole boom in generative AI tools following the emergence of large language models. Although the idea of taking inspiration from the densely interconnected pathways of the human brain as a design structure for data processing algorithms dates back to the 1940s, it was the much more recent rise of graphics processing units (GPUs) via the video game industry that truly unlocked the power of this theory. These chips proved well suited to training algorithms with many more layers than was possible in earlier eras, enabling neural network-based AI systems to achieve far better performance across many domains, including voice recognition, autonomous navigation, and drug discovery.
Open source refers to software or, increasingly, AI models where the underlying code is made publicly available for anyone to use, inspect, or modify. In the AI world, Meta’s Llama family of models is a prominent example. Linux is the famous historical parallel in operating systems. Open source approaches allow researchers, developers, and companies around the world to build on top of one another’s work, accelerating progress and enabling independent safety audits that closed systems cannot easily provide. Closed source means the code is private. You can use the product but not see how it works, as is the case with OpenAI’s GPT models. This distinction has become one of the defining debates in the AI industry.
Parallelization means doing many things at the same time instead of one after another, like having 10 employees working on different parts of a project simultaneously instead of one employee doing everything sequentially. In AI, parallelization is fundamental to both training and inference. Modern GPUs are specifically designed to perform thousands of calculations in parallel, which is a big reason they became the hardware backbone of the industry. As AI systems grow more complex and models grow larger, the ability to parallelize work across many chips and many machines has become one of the most important factors in determining how quickly and cost-effectively models can be built and deployed. Research into better parallelization strategies is now a field of study in its own right.
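The 10-employees analogy maps directly onto a worker pool. The sketch below runs ten simulated tasks one after another and then with ten threads at once. The task and its 0.05-second delay are made up, and Python threads waiting on a timer are only an analogy for how GPUs parallelize arithmetic, but the effect on wall-clock time is the same in spirit.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def simulated_task(part_id):
    """Pretend to work on one part of a project, then return a result."""
    time.sleep(0.05)  # stand-in for real work
    return part_id * part_id

parts = list(range(10))

# One employee doing everything sequentially.
start = time.perf_counter()
sequential_results = [simulated_task(p) for p in parts]
sequential_seconds = time.perf_counter() - start

# Ten employees working on different parts at the same time.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    parallel_results = list(pool.map(simulated_task, parts))
parallel_seconds = time.perf_counter() - start

print(f"sequential: {sequential_seconds:.2f}s, parallel: {parallel_seconds:.2f}s")
```

Both runs produce the same results; only the elapsed time differs.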
RAMageddon is the fun new term for a not-so-fun trend sweeping the tech industry: an ever-increasing shortage of random access memory, or RAM chips, which power pretty much all the tech products we use daily. As the AI industry has blossomed, the biggest tech companies and AI labs, all vying to have the most powerful and efficient AI, are buying so much RAM to power their data centers that there is not much left for the rest of us. This supply bottleneck means that what is left is getting more and more expensive. This affects industries like gaming, where major companies have had to raise prices on consoles because it is harder to find memory chips for their devices. It also impacts consumer electronics, where the memory shortage could cause the biggest dip in smartphone shipments in more than a decade, and general enterprise computing, because those companies cannot get enough RAM for their own data centers. The surge in prices is only expected to stop after the shortage ends, but unfortunately, there is not much sign that will happen anytime soon.
Like AGI, recursive self-improvement (RSI) is a threshold for how smart AI can get and how little it may rely on humans. In the RSI scenario, AI models start improving themselves without human intervention, leading to a huge acceleration in capabilities and autonomy. In some tellings, this would be a cataclysmic moment akin to the singularity, a moment when AI models become immune to outside intervention. But RSI also describes a basic capability (can an AI model design its own successor?), which makes it much easier for engineers to try to build it. A number of recent AI startups have set out to build recursively self-improving models, but most of them dismiss the apocalyptic implications, presenting RSI as simply the next frontier for research.
Reinforcement learning is a way of training AI where a system learns by trying things and receiving rewards for correct answers. It is like training a pet with treats, except the “pet” is a neural network and the “treat” is a mathematical signal indicating success. Unlike supervised learning, where a model is trained on a fixed dataset of labeled examples, reinforcement learning lets a model explore its environment, take actions, and continuously update its behavior based on the feedback it receives. This approach has proven especially powerful for training AI to play games, control robots, and, more recently, sharpen the reasoning ability of large language models. Techniques like reinforcement learning from human feedback, or RLHF, are now central to how leading AI labs fine-tune their models to be more helpful, accurate, and safe.
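The treats-for-a-pet loop can be sketched with a classic toy problem: a learner that must discover which of three slot-machine arms pays out most often. The payout probabilities, the 10% exploration rate, and the function name are all assumptions chosen for illustration; this is a simple epsilon-greedy bandit, far smaller than anything used to train a language model.

```python
import random

def train_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Learn each action's value purely from trial and reward."""
    rng = random.Random(seed)
    estimates = [0.0] * len(reward_probs)  # current guess of each action's value
    pulls = [0] * len(reward_probs)        # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(estimates)), key=estimates.__getitem__)
        # The environment hands back a "treat" (1.0) or nothing (0.0).
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        pulls[action] += 1
        # Nudge the estimate toward the observed reward (running average).
        estimates[action] += (reward - estimates[action]) / pulls[action]
    return estimates, pulls

estimates, pulls = train_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=estimates.__getitem__)
print("estimated values:", [round(e, 2) for e in estimates])
print("preferred action:", best_action)
```

Nobody tells the learner which arm is best. It works that out from the feedback alone, which is the key difference from supervised learning on labeled examples.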
When it comes to human-machine communication, there are obvious challenges. People communicate using human language, while AI programs execute tasks through complex algorithmic processes informed by data. Tokens bridge that gap. They are the basic building blocks of human-AI communication, representing discrete segments of data that have been processed or produced by an LLM. They are created through a process called tokenization, which breaks down raw text into bite-sized units a language model can digest, similar to how a compiler translates human-readable source code into binary code a computer can understand. In enterprise settings, tokens also determine cost. Most AI companies charge for LLM usage on a per-token basis, meaning the more a business uses, the more it pays. Tokens are the small chunks of text, often parts of words rather than whole ones, that AI language models break language into before processing it. They are roughly analogous to “words” for the purposes of understanding AI workloads.
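Here is a rough sketch of tokenization and per-token billing. The tokenizer below simply splits on words and punctuation, which is a simplification: production tokenizers typically split text into subword pieces. The price is a made-up figure, not any provider's actual rate.

```python
import re

def toy_tokenize(text):
    """Split text into word and punctuation tokens (a simplification)."""
    return re.findall(r"\w+|[^\w\s]", text)

def usage_cost(token_count, price_per_1k_tokens):
    """Cost of processing token_count tokens at a per-1,000-token price."""
    return token_count / 1000 * price_per_1k_tokens

HYPOTHETICAL_PRICE_PER_1K = 0.002  # made-up price, in dollars

tokens = toy_tokenize("Tokens bridge that gap.")
print(tokens)  # ['Tokens', 'bridge', 'that', 'gap', '.']
print(f"{len(tokens)} tokens cost ${usage_cost(len(tokens), HYPOTHETICAL_PRICE_PER_1K):.6f}")
```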
Throughput refers to how much can be processed in a given period of time, so token throughput is essentially a measure of how much AI work a system can handle at once. High token throughput is a key goal for AI infrastructure teams, since it determines how many users a model can serve simultaneously and how quickly each of them receives a response. AI researcher Andrej Karpathy has described feeling anxious when his AI subscriptions sit idle, echoing the feeling he had as a grad student when expensive computer hardware was not being fully utilized. This sentiment captures why maximizing token throughput has become something of an obsession in the field.
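The arithmetic behind token throughput is simple enough to write down. The numbers below (12,000 tokens in 4 seconds, 30 tokens per second per user) are invented for illustration, and the helper names are our own.

```python
def tokens_per_second(total_tokens, elapsed_seconds):
    """Throughput: how many tokens the system processed per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_tokens / elapsed_seconds

def concurrent_users_supported(system_tps, per_user_tps):
    """How many users can each be served at per_user_tps simultaneously."""
    return int(system_tps // per_user_tps)

# Made-up measurement: the system generated 12,000 tokens in 4 seconds.
system_tps = tokens_per_second(12_000, 4.0)
# Assume each user needs about 30 tokens per second for a responsive stream.
users = concurrent_users_supported(system_tps, 30.0)
print(system_tps, users)  # 3000.0 100
```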
Developing machine learning AIs involves a process known as training. In simple terms, this means feeding data into the model so it can learn from patterns and generate useful outputs. Essentially, it is the process of the system responding to characteristics in the data that enables it to adapt outputs toward a sought-for goal, whether that is identifying images of cats or producing a haiku on demand. Training can be expensive because it requires lots of inputs, and the volumes required have been trending upwards. This is why hybrid approaches, such as fine-tuning a rules-based AI with targeted data, can help manage costs without starting entirely from scratch.
Transfer learning is a technique where a previously trained AI model is used as the starting point for developing a new model for a different but typically related task. Transfer learning allows knowledge gained in previous training cycles to be reapplied. It can drive efficiency savings by shortcutting model development. It can also be useful when data for the task the model is being developed for is somewhat limited. However, it is important to note that the approach has limitations. Models that rely on transfer learning to gain generalized capabilities will likely require training on additional data in order to perform well in their domain of focus.
Validation loss is a number that tells you how well an AI model is learning during training, and lower is better. Researchers track it closely as a kind of real-time report card, using it to decide when to stop training, when to adjust hyperparameters, or whether to investigate a potential problem. One of the key concerns it helps flag is overfitting, a condition in which a model memorizes its training data rather than truly learning patterns it can generalize to new situations. Think of it as the difference between a student who genuinely understands the material and one who simply memorized last year’s exam. Validation loss helps reveal which one your model is becoming.
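One common way to act on validation loss is early stopping: halt training once the number stops improving. The sketch below applies that rule to made-up loss values; the `patience` setting and the function name are assumptions, and real training frameworks offer their own versions of this logic.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, stop_epoch) given per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Still improving: remember this epoch as the best so far.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training here.
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

# Made-up numbers: training loss keeps falling, validation loss turns upward.
train_losses = [0.95, 0.70, 0.50, 0.38, 0.29, 0.22, 0.17]
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.58, 0.66]

best_epoch, stop_epoch = early_stopping_epoch(val_losses, patience=2)
print(best_epoch, stop_epoch)  # 3 5
```

The widening gap between the two curves after epoch 3 is the signature of overfitting: the model keeps getting better at its training data while getting worse at data it has not seen.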
Weights are core to AI training, as they determine how much importance, or weight, is given to different features, or input variables, in the data used for training the system. This shapes the AI model’s output. Put another way, weights are numerical parameters that define what is most salient in a dataset for the given training task. They achieve their function by applying multiplication to inputs. Model training typically begins with weights that are randomly assigned, but as the process unfolds, the weights adjust as the model seeks to arrive at an output that more closely matches the target. For example, an AI model for predicting housing prices that is trained on historical real estate data for a target location could include weights for features such as the number of bedrooms and bathrooms, whether a property is detached or semi-detached, and whether it has parking or a garage. Ultimately, the weights the model attaches to each of these inputs reflect how much they influence the value of a property, based on the given dataset.
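The housing example can be turned into a tiny trainable model. The sketch below starts with random weights and adjusts them by gradient descent until predictions match the targets. The six listings, the prices (in thousands), the learning rate, and the two-feature setup are all invented for illustration.

```python
import random

# Made-up listings: (bedrooms, has_garage) -> price in thousands.
listings = [
    ((1, 0), 150.0), ((2, 0), 200.0), ((3, 0), 250.0),
    ((2, 1), 230.0), ((3, 1), 280.0), ((4, 1), 330.0),
]

rng = random.Random(0)
# Training begins with randomly assigned weights.
w_bedrooms = rng.uniform(-1.0, 1.0)
w_garage = rng.uniform(-1.0, 1.0)
bias = rng.uniform(-1.0, 1.0)

def predict(bedrooms, has_garage):
    """Weights act by multiplying the inputs; the bias is a base price."""
    return w_bedrooms * bedrooms + w_garage * has_garage + bias

learning_rate = 0.05
for _ in range(5000):
    grad_bedrooms = grad_garage = grad_bias = 0.0
    for (bedrooms, has_garage), price in listings:
        error = predict(bedrooms, has_garage) - price
        # Gradient of the mean squared error with respect to each parameter.
        grad_bedrooms += 2 * error * bedrooms / len(listings)
        grad_garage += 2 * error * has_garage / len(listings)
        grad_bias += 2 * error / len(listings)
    # Adjust each weight so the output moves closer to the target.
    w_bedrooms -= learning_rate * grad_bedrooms
    w_garage -= learning_rate * grad_garage
    bias -= learning_rate * grad_bias

print(f"bedroom weight {w_bedrooms:.1f}, garage weight {w_garage:.1f}, base {bias:.1f}")
print(f"5 bedrooms with garage: {predict(5, 1):.1f}")
```

In this invented dataset, each bedroom adds 50 and a garage adds 30 to a base of 100, and the learned weights settle on those values. That is the sense in which weights reflect how much each input influences the output.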
This article is updated regularly with new information.
(Source: TechCrunch)




