This past year has moved so fast that it’s worth taking a moment for a retrospective.
Looking back at 2025, what was the single major breakthrough that allowed us to have such powerful Large Language Models (LLMs) and agents?
The answer is simple, yet it has fundamentally changed everything: reasoning.
The reasoning revolution: How models learned to ‘think’
Excitement about emergent reasoning capabilities was already bubbling as the year began. In 2025, however, OpenAI did something that changed the game. Instead of relying on clever prompting to coax step-by-step thinking out of a model, they decided to build it in from the ground up. The approach shifted from training a model purely to predict the next word to training it to reason by default before it answers.
This was a revolution.
Suddenly, reasoning wasn't an afterthought; it became a given. All the big models that defined 2025 - o3, GPT-5, the Claude family, Gemini 2.5 Pro - came with reasoning ability baked in. In some cases, like with Gemini 2.5 Pro, you can't even disable the reasoning. This capability naturally extended to the open-source world, where the most powerful models now also think by default.
How does this work? How do you "teach" a model to reason? Better yet, how can it learn to reason on its own, without being shown explicit reasoning traces in its training data?
The answer is programmable reinforcement learning: learning by doing, through task execution. This represents a paradigm shift from traditional supervised fine-tuning and Reinforcement Learning from Human Feedback (RLHF) toward a more autonomous form of reinforcement learning. The comparison below contrasts the two training regimes; a minimal sketch of the reward logic follows it.
| | Next-word prediction | Reinforcement learning for reasoning |
|---|---|---|
| Process | The model is given a context (prompt) and shown the next correct word. | The model is given a problem and the final correct answer, then must figure out the reasoning path itself. |
| Data | Requires massive, curated datasets of step-by-step instructions. | Requires only problems and final answers; data collection is cheaper and easier. |
| Learning | The model learns a statistical pattern for next-word prediction. | The model learns to generate a long chain of thought; chains that lead to the correct answer are rewarded. |
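To make the "Learning" row concrete, here is a minimal, self-contained sketch of outcome-based rewards in the spirit of GRPO-style training. Everything in it is illustrative: the "####" answer delimiter, the toy completions, and the bare reward-minus-mean advantage are assumptions for the sketch, not any lab's actual pipeline.

```python
import re

def outcome_reward(gold_answer, completion):
    """Score only the final answer (after '####'), never the reasoning steps."""
    match = re.search(r"####\s*(\S+)\s*$", completion.strip())
    return 1.0 if match and match.group(1) == gold_answer else 0.0

def group_advantages(rewards):
    """GRPO-style: each sampled chain is scored relative to the group mean,
    so only chains that beat their siblings get a positive learning signal."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

# Toy example: four sampled chains of thought for "17 * 24 = ?", gold answer 408.
completions = [
    "17*24 = 17*20 + 17*4 = 340 + 68 #### 408",   # correct
    "17*24 is roughly 400 #### 400",               # wrong
    "24*17 = 240 + 168 #### 408",                  # correct
    "#### 480",                                    # wrong
]
rewards = [outcome_reward("408", c) for c in completions]
print(group_advantages(rewards))  # [0.5, -0.5, 0.5, -0.5]
```

A policy-gradient update would then upweight the tokens of the two successful chains. Notice that no one ever wrote down which reasoning path to take; only the outcome was graded.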
This new method is incredibly powerful. The model learns the "how" by itself. An AI model trained this way achieved a gold medal standard at the International Math Olympiad (IMO) - a feat that requires being in approximately the top 8% of the world’s brightest young mathematicians, depending on the year.
Alexander Wei, a research scientist at OpenAI who competed in the IMO in 2015, highlighted the most significant part of this achievement: the model was not specifically trained for that competition. The reasoning capability emerged out of other tasks.
From models to products: The rise of agents
This newfound reasoning power is the engine behind the explosion of AI agents. An agent, at its core, follows a simple loop:
1. It observes its environment.
2. It reasons about what to do next.
3. It acts.
That action changes the environment, and the loop begins again. This is a concept well-established in computational agent theory.
LLMs have always been great at the "observe" part; they are fantastic world models.
With the reasoning breakthrough, they mastered the "reason" part.
The final piece of the puzzle was "act."
By applying the same reinforcement learning techniques used for reasoning to the task of using tools (APIs, web browsers, etc.), we enabled models to act on their environment. The combination of these three abilities - observe, reason, act - is what defines the powerful agents we have today.
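To illustrate, the whole loop fits in a few lines of Python. The `llm` callable and the tool set here are stand-ins assumed for the sketch; the point is the control flow, not the plumbing.

```python
def run_agent(llm, tools, observation, max_steps=5):
    """Minimal observe-reason-act loop. `llm` is any text-in/text-out model
    (a stand-in here); `tools` maps action names to callables."""
    history = []
    for _ in range(max_steps):
        # 1. Observe: fold the latest observation into the context.
        history.append(f"OBSERVATION: {observation}")
        # 2. Reason: ask the model for the next action, e.g. "search cat facts".
        decision = llm("\n".join(history) + "\nNext action ('tool arg') or FINISH:")
        history.append(f"ACTION: {decision}")
        if decision.startswith("FINISH"):
            return decision
        # 3. Act: run the chosen tool; its output is the next observation.
        name, _, arg = decision.partition(" ")
        observation = tools.get(name, lambda a: f"unknown tool: {name}")(arg)
    return observation
```

Each pass through the loop changes the environment, and the changed environment feeds the next observation - exactly the cycle described above.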
What does this mean for 2026? We're already seeing the next evolution in agents through quality-of-life (QoL) improvements. Two potential futures are coming into view:
Memory for agents: I work with Copilot a lot. I’d love for it to remember a helper function I wrote last week instead of re-implementing it from scratch when I work on a related task. Persistent memory will make agents true partners rather than amnesiac assistants.
Sub-agents for task delegation: Complex tasks like web scraping involve many sub-problems: unblocking, page discovery, data extraction, and monitoring. It’s hard for a single agent to manage all of that. The future lies in a main agent that can delegate these tasks to specialized sub-agents. Microsoft is already exploring this with Agent HQ, which allows a primary agent to manage and orchestrate other agents without polluting its own context. This modular approach is a game-changer.
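A sketch of that delegation pattern (the specialist names are invented for illustration, and this is the shape of the idea rather than Agent HQ's actual API): each sub-agent keeps its own working context, and only a compact result flows back to the orchestrator.

```python
class SubAgent:
    """A specialist with its own isolated context window."""
    def __init__(self, role):
        self.role, self.context = role, []

    def run(self, task):
        self.context.append(task)             # working history stays local
        return f"[{self.role}] done: {task}"  # stub for a real model call

class Orchestrator:
    """Main agent: routes sub-tasks and sees only the compact results."""
    def __init__(self):
        self.specialists = {
            "discovery": SubAgent("page discovery"),
            "extraction": SubAgent("data extraction"),
            "monitoring": SubAgent("monitoring"),
        }
        self.context = []

    def delegate(self, kind, task):
        result = self.specialists[kind].run(task)
        self.context.append(result)  # the summary, not the sub-agent's noise
        return result

print(Orchestrator().delegate("extraction", "pull product prices"))
```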
A shifting landscape: New players and market dynamics
For a long time, the LLM space was dominated by one name: OpenAI. Many people mocked Google for falling behind, despite its pioneering research and proprietary TPU hardware.
Then, in 2025, everything changed. Google released Gemini 2.5 Pro, a groundbreaking model that quickly became the go-to for the hardest problems. At the same time, new players like xAI with Grok entered the top tier, creating more competition than ever.
The leap in intelligence has been matched by a dramatic drop in cost.
In late 2024, a model like GPT-4o cost around $4 per million tokens.
By mid-2025, new open-source models offering nearly double the capability were available for just three cents per million tokens - a drop of more than 100x in price per token.
Interestingly, the open-source frontier is now being led by China. While the top-tier closed-source models largely come from the US, an explosion of powerful open-source models has emerged from Chinese companies, filling a vacuum left as others focused on proprietary systems.
[Table: leading open-source models ranked by benchmark score and parameter count - entries from MiniMax and DeepSeek among others, with scores between 56 and 62 and sizes from roughly 10B to 288B parameters.]
The impact of a high-quality, open-source release can be immense. When DeepSeek launched its model, it proved so disruptive that the day was dubbed "The Day DeepSeek Turned Tech and Wall Street Upside Down": a staggering $1 trillion was wiped from the stock market as investors re-evaluated the moats of incumbent players.
This naturally leads to the question: are we in an AI bubble? A bubble involves overvaluation based on hyped expectations that can't be met. We saw this with the dot-com bust - but is this time different?
The sheer scale of investment is immense. Project Stargate, a joint data center initiative, is projected to cost hundreds of billions of dollars. In the first six months of 2025 alone, investment in AI infrastructure contributed more to the growth of the US economy than all consumer spending combined. This level of growth can't be sustained forever.
User behavior and the new web
This technological shift is fundamentally changing how we interact with information. The consumer of web data is shifting from humans to AI bots. Google's "AI Overview" provides direct answers by scraping and synthesizing information from websites. This means fewer human visits to the source websites themselves.
This creates a new layer on top of traditional SEO. It's no longer just about your rank on a results page; it's about how an LLM interprets your content and whether it features you in its synthesized answer.
Projections show a stark trend: traffic from traditional organic search is set to decline, while traffic from LLM/AI sources will skyrocket. The question for web data providers becomes: do we adapt to serve these new AI consumers, or do we block them? Companies like Cloudflare are already building a business model around "AIndependence" - allowing websites to block AI bots with a single click.
This has also created a negative sentiment among many users. The internet feels different, cluttered with low-quality, AI-generated content - a phenomenon now widely known as "AI slop." There's a growing feeling that "the pure internet is gone."
What comes next?
Reasoning models are the new default, and the pace of change is only accelerating.
The most important takeaway is to remain model-agnostic and agent-agnostic. The market is fiercely competitive; the leading model next month might come from a provider that's new on the scene. Building systems that can easily swap out underlying models and agents is key to staying relevant.
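In practice, staying model-agnostic can be as simple as hiding every provider behind one narrow interface. A minimal sketch, with invented adapter names standing in for real vendor SDKs:

```python
from typing import Protocol

class ChatModel(Protocol):
    """The one interface the application is allowed to depend on."""
    def complete(self, prompt: str) -> str: ...

# Hypothetical adapters: in real code, each would wrap one vendor's SDK.
class ProviderA:
    def complete(self, prompt: str) -> str:
        return f"(provider A answer to: {prompt})"

class ProviderB:
    def complete(self, prompt: str) -> str:
        return f"(provider B answer to: {prompt})"

def answer(model: ChatModel, question: str) -> str:
    # Application code sees only ChatModel, so swapping vendors is a
    # one-line change wherever the model is constructed.
    return model.complete(question)

print(answer(ProviderA(), "What changed in 2025?"))
```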
The huge growth in model capabilities will likely slow down at some point, but the shift in how we consume information is permanent.
For anyone building products in this space, one thing has become more important than ever: objective evaluation. As use cases become more complex and the number of models explodes, having a robust, automated way to evaluate which model performs best for your specific task is no longer a nice-to-have. It is a necessity for survival.
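An evaluation harness can start very small: a fixed task dataset, a scoring rule, and the same loop run over every candidate. A sketch reusing the hypothetical ChatModel adapters from above (exact-match scoring is an assumption; real tasks usually need richer graders):

```python
def evaluate(model, dataset):
    """Fraction of (prompt, expected) pairs the model answers exactly."""
    correct = sum(model.complete(p).strip() == want for p, want in dataset)
    return correct / len(dataset)

dataset = [("2 + 2 =", "4"), ("Capital of France?", "Paris")]
candidates = {"provider-a": ProviderA(), "provider-b": ProviderB()}
scores = {name: evaluate(m, dataset) for name, m in candidates.items()}
print(max(scores, key=scores.get), scores)  # pick the winner for this task
```

Because the harness depends only on the shared interface, every new model release drops straight into the same comparison.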
