
The Conventional Wisdom (And Why It's Wrong)
When NVIDIA's stock price began its meteoric rise and data center GPUs became the new gold rush, the investment thesis seemed straightforward enough. We were told that artificial intelligence would require massive computational resources to train increasingly large language models. The bigger the model, the better the performance, and thus the greater the demand for training compute. Simple enough.
This narrative had an appealing elegance to it. Training was a one-time, intensive burst of computation—like building a factory. Once built, the factory (the trained model) could produce widgets (responses) relatively cheaply. The real money was in the construction phase, not the operation phase.
What We're Actually Seeing
The reality unfolding before us tells a fundamentally more complex story. Training workloads are indeed continuing to grow rapidly, driven by what researchers call "scaling laws"—empirical relationships that show predictable performance improvements as you increase model size, data, and compute¹. These aren't just theoretical constructs; they're the ROI logic that justifies billion-dollar infrastructure buildouts. When 10x more compute yields measurably better performance, the investment case is compelling.
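To make the intuition concrete, scaling laws typically take a power-law form: predicted loss falls smoothly as training compute grows. The sketch below uses made-up constants purely for illustration—real exponents come from large empirical studies, not these numbers.

```python
# Illustrative power-law scaling curve: loss(C) = a * C**(-b).
# The constants a and b are invented for this sketch; actual fits
# are estimated empirically by the labs running the experiments.

def predicted_loss(compute, a=10.0, b=0.05):
    """Predicted model loss as a power law in training compute."""
    return a * compute ** (-b)

# Each 10x jump in compute buys a predictable (if diminishing) improvement:
for c in (1e21, 1e22, 1e23):
    print(f"compute={c:.0e}  predicted loss={predicted_loss(c):.3f}")
```

The key property is predictability: because the curve is smooth, an extra order of magnitude of compute can be budgeted against an expected quality gain in advance—which is exactly what makes the investment case legible.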
But training isn't a one-time event. What we call "training" encompasses pre-training (the headline-grabbing runs), supervised fine-tuning, reinforcement learning from human feedback (RLHF), and countless iterative R&D runs. Each generation of models requires dozens or hundreds of smaller experimental runs before the flagship run, so the total compute cost of bringing a model to market far exceeds the headline training bill.
The Inference Revolution
Simultaneously, something profound is happening on the inference side. Consider what's actually occurring when you interact with today's advanced AI systems.
You pose a question, and rather than receiving an instant response, the system goes away to "think" for hours. Not minutes. Hours. This isn't a bug; it's a feature enabled by the scaling laws that show “thinking” longer yields dramatically better results. But this thinking is computationally expensive. Very expensive.
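One simple way to see why longer "thinking" helps: if a single reasoning attempt cracks a hard problem with probability p, then sampling n independent attempts and verifying the answers lifts the success rate to 1 − (1 − p)ⁿ. The per-attempt probability below is an assumed toy number, not a benchmark result—but the shape of the curve is the point.

```python
# Toy model of test-time compute scaling: one reasoning attempt succeeds
# with probability p, so n independent verified attempts succeed with
# probability 1 - (1 - p)**n. The value of p is illustrative only.

def best_of_n_success(p, n):
    """Probability that at least one of n independent attempts succeeds."""
    return 1 - (1 - p) ** n

p = 0.2  # assumed per-attempt success rate on a hard problem
for n in (1, 4, 16, 64):
    print(f"attempts={n:2d}  success rate={best_of_n_success(p, n):.3f}")
```

Success climbs steeply with more attempts—but every additional attempt is a full forward pass through the model, which is precisely why this kind of "thinking" is so computationally expensive.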
And then there are the “agents”. These aren't passive question-answering systems but active digital entities that can plan, execute tasks, learn from feedback, and operate semi-autonomously. An agent helping you plan a vacation doesn't just spit out a pre-computed itinerary. It researches current prices, checks weather patterns, evaluates your preferences against available options, perhaps even negotiates bookings on your behalf. Each of these steps is a fresh set of computations².
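The compute-per-task implication is easiest to see in the loop structure itself. The sketch below is entirely hypothetical—`plan_next_step` and `run_tool` are stubs, not any real agent API—but it shows why every tool step implies at least one more model invocation, i.e., more inference compute.

```python
# Sketch of an agent loop: every planning step is a fresh model
# invocation, so inference compute scales with task complexity.
# All names here (plan_next_step, run_tool) are hypothetical stubs.

model_calls = 0

def plan_next_step(goal, history):
    """Stub for a model call that decides the agent's next action."""
    global model_calls
    model_calls += 1  # each decision burns inference compute
    actions = ["search_flights", "check_weather", "compare_hotels", "done"]
    return actions[min(len(history), len(actions) - 1)]

def run_tool(action):
    """Stub for executing a tool and returning an observation."""
    return f"result of {action}"

history = []
while True:
    action = plan_next_step("plan a vacation", history)
    if action == "done":
        break
    history.append(run_tool(action))

# Three tool steps cost four model calls (one extra to decide "done").
print(f"tool steps: {len(history)}, model calls: {model_calls}")
```

Contrast this with a classic chatbot, which answers in a single forward pass: the agent's loop turns one user request into many model calls, each drawing on live context rather than a pre-computed answer.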
The Hardware Disruption Hidden in Plain Sight
Underpinning all of this are the semiconductors themselves. The GPUs—primarily NVIDIA's H100s and similar architectures—are optimized for the large matrix multiplications and parallel processing patterns that characterize model training.
Inference has distinct computational needs: lower-precision arithmetic, different memory access patterns, and a premium on low latency and power efficiency rather than the raw throughput that training demands. This has opened the door for specialized inference accelerators from several companies, creating potential new leaders beyond NVIDIA and AMD.
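"Lower-precision arithmetic" is concrete, not hand-waving. The sketch below shows symmetric int8 quantization—the family of tricks inference hardware is built around—though real deployments use per-channel scales, calibration data, and hardware-specific kernels that this toy version omits.

```python
# Rough sketch of symmetric int8 quantization: the kind of
# lower-precision arithmetic that inference accelerators optimize for.
# Production systems use per-channel scales, calibration, etc.

def quantize_int8(values):
    """Map floats to int8 range [-127, 127] with one symmetric scale."""
    scale = max(abs(v) for v in values) / 127.0
    return [round(v / scale) for v in values], scale

def dequantize(q_values, scale):
    """Recover approximate floats from quantized ints."""
    return [q * scale for q in q_values]

weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)   # 8-bit ints: 4x smaller than fp32
approx = dequantize(q, scale)       # close to, not equal to, originals
print(q)
print([f"{x:.3f}" for x in approx])
```

Storing and multiplying 8-bit integers instead of 32-bit floats cuts memory traffic and energy per operation—exactly the latency and power-efficiency axis on which inference chips compete.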
Where Does This All Lead Us?
On the one hand, training demand continues to grow explosively. Foundation model companies are locked into an arms race driven by scaling laws that practically guarantee ROI from additional compute investment.
At the same time, we are seeing inference demand growing even faster. Every question asked, every agent task executed, every hour spent "thinking" through a complex problem requires fresh computational resources.
Investment Implications
The market seems to be building two distinct infrastructure layers—one optimized for training, another for inference—with different hardware, different geographic requirements, and different operators. This has interesting investment implications.
First, the foundation model companies aren't just buying compute; they're buying guaranteed performance improvements via scaling laws. This means predictable, compounding demand for chips that's not as cyclical as the traditional semiconductor market.
Second, the winners won't just be the chip makers but the companies that can build and operate the infrastructure efficiently. This means data center operators, cooling specialists, and power infrastructure companies that can handle sustained high utilization rather than periodic spikes.
Third, as inference demand grows, proximity to users becomes critical. This creates opportunities for edge computing infrastructure and regional data centers in population centers rather than just cheap-power remote locations.
Investors who understand this dual nature of AI compute demand, and position themselves accordingly, may be best placed for the next phase of the AI boom.