Foundation models, tokens, context windows, agents, RAG, multimodal — the vocabulary of modern AI can feel like a wall of jargon. This guide breaks down what actually matters, in plain language, so you can understand the tools you're being sold.
A few years ago, "AI" meant a narrow program that did one job — flag spam, recommend a film, recognize a face. Modern AI is different. Today the phrase usually points to a handful of very large, very general systems that can write, code, analyze, and converse across almost any subject. Understanding a few core ideas is enough to make sense of nearly every AI tool on the market.
At the center of modern AI are foundation models — systems trained on enormous amounts of text, code, images, and audio. Because they've absorbed so much, one model can handle many tasks instead of being built for a single purpose. When people say "the AI wrote this" or "the AI fixed my code," a foundation model is usually the engine underneath.
Crucially, a model is not the same as a product. The model is the engine; the app you use is the car built around it. This is why two very different-feeling tools can run on similar underlying technology.
AI models don't read words exactly the way we do. They break text into tokens — small chunks, often pieces of words. This matters for two practical reasons: pricing is frequently measured per token, and every model has a limit on how many tokens it can consider at once.
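To make the token idea concrete, here is a minimal sketch in Python. It is a toy: real tokenizers learn their word pieces from data, whereas the hypothetical `toy_tokenize` below simply cuts each word into chunks of up to four characters. The `estimate_cost` helper and the price passed to it are also assumptions for illustration, not any vendor's actual rate.

```python
import re


def toy_tokenize(text: str, max_piece: int = 4) -> list[str]:
    """Split text into pieces of at most `max_piece` characters.

    Toy stand-in only: real tokenizers learn their pieces from data.
    """
    tokens = []
    # \w+ matches a run of letters or digits; [^\w\s] matches one punctuation mark.
    for chunk in re.findall(r"\w+|[^\w\s]", text):
        for start in range(0, len(chunk), max_piece):
            tokens.append(chunk[start:start + max_piece])
    return tokens


def estimate_cost(token_count: int, price_per_million: float) -> float:
    """Cost of `token_count` tokens at a (hypothetical) price per million tokens."""
    return token_count * price_per_million / 1_000_000


tokens = toy_tokenize("Tokenization matters.")
print(tokens)       # ['Toke', 'niza', 'tion', 'matt', 'ers', '.']
print(len(tokens))  # 6
```

Two words and a full stop become six tokens here. The exact count depends on the tokenizer, which is why token counts rarely match word counts.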
That limit is the context window: the amount of text, counting both what you send and what the model writes back, that it can "hold in mind" at once. A larger context window means you can feed it a long document, a whole codebase, or a lengthy chat history and have it work with all of it together. When a tool "forgets" what you said earlier, you have often run past its context window.
A simple analogy: think of the context window as the model's desk. A bigger desk lets you spread out more papers at once. But if you pile on more than fits, the papers at the edges fall off, and the model stops "seeing" the earliest parts of your conversation.
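The desk analogy can be sketched in a few lines of Python. This assumes one simple policy, keeping the newest messages and dropping the oldest once the budget is spent. Real tools handle overflow in different ways. The `fit_to_window` name is hypothetical, and counting one token per word is a deliberate simplification.

```python
def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose combined size fits the budget."""
    kept = []
    used = 0
    # Walk backwards from the newest message so the oldest ones drop first.
    for message in reversed(messages):
        size = len(message.split())  # crude assumption: one token per word
        if used + size > max_tokens:
            break
        kept.append(message)
        used += size
    kept.reverse()  # restore chronological order
    return kept


history = [
    "My name is Dana.",               # 4 words
    "I am planning a trip to Oslo.",  # 7 words
    "What should I pack?",            # 4 words
]
visible = fit_to_window(history, max_tokens=12)
print(visible)  # ['I am planning a trip to Oslo.', 'What should I pack?']
```

With a budget of 12, the first message no longer fits, so a model given only `visible` would not know the user's name.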
Early chat AI handled only text. Multimodal models work across formats — reading images, listening to audio, and sometimes producing pictures or speech. In practice this means you can show a model a screenshot and ask about it, or hand it a chart and get an explanation. When choosing a tool, it's worth checking which formats it genuinely supports, not just which it advertises.
A plain chatbot answers your question. An agent goes further: it can break a goal into steps and take actions — searching the web, running code, editing files, or calling other software — with less step-by-step direction from you. Instead of "tell me how to do X," an agent tries to do X.
Agents are powerful but less predictable. They can complete impressive multi-step tasks and also confidently go wrong, which is why the useful ones keep a human in the loop to review what they did.
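As a rough illustration of keeping a human in the loop, the sketch below runs a list of planned steps and asks an approval function before each action. Everything here is hypothetical: in a real agent the plan would come from a model, usually one step at a time, and the tools would do real work rather than return canned strings.

```python
from typing import Callable


def run_agent(
    steps: list[tuple[str, str]],
    tools: dict[str, Callable[[str], str]],
    approve: Callable[[str, str], bool],
) -> list[str]:
    """Run each planned (tool_name, argument) step, asking for approval first."""
    log = []
    for tool_name, argument in steps:
        if tool_name not in tools:
            log.append(f"skipped {tool_name}: unknown tool")
            continue
        if not approve(tool_name, argument):
            log.append(f"skipped {tool_name}: not approved")
            continue
        result = tools[tool_name](argument)
        log.append(f"{tool_name} -> {result}")
    return log


# Stand-in tools that only describe what they would do.
tools = {
    "search": lambda query: f"3 results for '{query}'",
    "delete_file": lambda path: f"deleted {path}",
}

# A fixed plan standing in for what a model might propose.
plan = [("search", "train times to Oslo"), ("delete_file", "notes.txt")]


def approve_safe_only(tool_name: str, argument: str) -> bool:
    """Reviewer policy for this example: block anything that deletes files."""
    return tool_name != "delete_file"


log = run_agent(plan, tools, approve_safe_only)
for entry in log:
    print(entry)
```

The search runs, while the file deletion is logged as skipped because the reviewer did not approve it.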
Foundation models only know what they learned during training, so they can be out of date or simply wrong about specifics. Retrieval-augmented generation (RAG) addresses this by looking up relevant information (from your documents, a database, or the web) and handing it to the model before it answers. The answer can then draw on real sources rather than on the model's memory alone.
If you've used an AI tool that answers questions about your files or cites its sources, you've likely used RAG. It's one of the most reliable ways to reduce confident-but-wrong answers.
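The look-up-then-answer pattern can be sketched as follows. This is a minimal illustration under stated assumptions: the hypothetical `retrieve` function ranks documents by how many words they share with the question, whereas real systems generally use more capable search. The final step of sending the prompt to a model is left out, and the documents are made-up examples.

```python
import re


def words(text: str) -> set[str]:
    """Lowercase the text and return its distinct words and numbers."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return the `top_k` documents sharing the most words with the question."""
    question_words = words(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & words(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, sources: list[str]) -> str:
    """Place the retrieved sources in front of the question."""
    context = "\n".join(f"- {source}" for source in sources)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"


# Made-up documents standing in for your own files.
documents = [
    "The refund window is 30 days from delivery.",
    "Support is open on weekdays from 9 to 5.",
    "Shipping takes three to five business days.",
]
question = "How many days is the refund window?"
sources = retrieve(question, documents)
prompt = build_prompt(question, sources)
print(prompt)
```

The refund document is selected because it shares the most words with the question, so the model would see the relevant fact before answering.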
You don't need to track every model release to choose well. A few plain-language principles go a long way:
Modern AI is a small number of general foundation models, wrapped in products, that read text as tokens within a limited context window, increasingly work across formats (multimodal), can take actions (agents), and answer more reliably when given the right facts (RAG). Learn those ideas and the marketing gets a lot easier to see through.
Modern AI usually refers to large foundation models — systems trained on huge amounts of text, code, images, and audio that can generate and reason across many tasks. Instead of one narrow program per job, a single model powers writing, coding, analysis, and more, often wrapped in tools and agents.
A model is the underlying engine that generates text, code, or images. An AI tool is the product built around it: the interface, features, and workflow. Many tools can run on the same model, which is why two apps can feel very different even when the same engine sits underneath.
Start from the task, not the hype. Decide what you want done, try a free tier on real work, and judge it on quality, reliability, and how much editing or review it saves you. The best tool is the one that fits your workflow.