
The release of DeepSeek’s AI models, V3 and R1, has caused shockwaves across the technology and investment world. The initial market reaction has hit semiconductor stocks, a few of the Magnificent Seven stocks, and (somewhat surprisingly) the energy stocks that were poised to benefit from the power demands of AI.
A few things are worth mentioning from an investment perspective. But to do so, it is important to understand the background and genesis of DeepSeek.
Founded as an offshoot of a Quant Hedge Fund, DeepSeek harnessed patriotic PhD talent and resource-efficient strategies to develop cutting-edge AI despite U.S. export controls.
Initially set up as the research arm of a Quant Hedge Fund called High-Flyer, DeepSeek was born out of founder Liang Wenfeng’s scientific curiosity. According to Liang, when he put together DeepSeek’s research team, he was not looking for experienced engineers to build a consumer-facing product. Instead, he focused on PhD students from China’s top universities who were eager to prove themselves.
The hiring strategy helped create a collaborative company culture where people were free to use ample computing resources to pursue unorthodox research projects. The fact that these young researchers were almost entirely educated in China added to their drive. “This younger generation embodies a sense of patriotism, particularly as they navigate US restrictions and choke points in critical hardware and software technologies,” according to one researcher.
Because of US government export controls, DeepSeek had to come up with more efficient methods to train its models. “They optimized their model architecture using a battery of engineering tricks—custom communication schemes between chips, reducing the size of fields to save memory, and innovative use of the mix-of-models approach,” says Wendy Chang, a policy analyst at the Mercator Institute for China Studies. “Many of these approaches aren’t new ideas, but combining them successfully to produce a cutting-edge model is a remarkable feat.”
This is very different from the approach thus far in the U.S., where cheap memory and compute power (funded through money being thrown at the problem, we might add) made it feasible to use brute-force methods for fitting and forecasting problems. Thus what we have are highly inefficient models using huge computational capacity to do amazing things.
To be fair, the first iteration of anything is fantastically expensive. Then the cost cutting gets to work. And in a way, DeepSeek and the Chinese researchers have taken us back to the previous generations of computer programming, where one had to be really careful about resources, using efficient algorithms, and as little memory and compute power as possible.
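To make the memory point above concrete, here is a back-of-the-envelope sketch (ours, not DeepSeek’s) of why “reducing the size of fields” matters: storing a model’s weights in 8-bit rather than 32-bit numbers cuts memory roughly fourfold. The parameter count used below is hypothetical, purely for illustration.

```python
# Back-of-the-envelope: memory needed to store model weights at
# different numeric precisions. Illustrative only; the parameter
# count is hypothetical, not DeepSeek's actual figure.

def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Gigabytes required to hold num_params weights."""
    return num_params * bytes_per_param / 1e9

params = 100_000_000_000  # a hypothetical 100-billion-parameter model

fp32 = weight_memory_gb(params, 4)  # 32-bit floats: 4 bytes each
fp8 = weight_memory_gb(params, 1)   # 8-bit floats: 1 byte each

print(f"FP32: {fp32:.0f} GB")            # 400 GB
print(f"FP8:  {fp8:.0f} GB")             # 100 GB
print(f"Savings: {fp32 / fp8:.0f}x")     # 4x
```

The same arithmetic applies to the memory moved between chips during training, which is why lower precision and custom communication schemes compound into large efficiency gains.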
DeepSeek’s new V3 and R1 models rival OpenAI and Anthropic in performance, earning praise for real-world usability and achieving 45x training efficiency through groundbreaking optimizations.
For an investor who wants to better understand what is under the hood, without getting into the technical details, a few things are worth knowing.
DeepSeek has released two new models with world-competitive performance on par with the best models from OpenAI and Anthropic (blowing past the Meta Llama 3 models and other smaller open-source players such as Mistral). These models are called DeepSeek-V3 (essentially their answer to GPT-4o and Claude 3.5 Sonnet) and DeepSeek-R1 (their answer to OpenAI's o1 model). For reference, GPT-4o is the typical ChatGPT web query model, while the o1/R1 models are the newer reasoning models (also called chain-of-thought models) – something for which OpenAI was recently charging $200 per month.
Furthermore, two other important facets are worth sharing (which, for lack of our own technical know-how, we take from the blogosphere)1:
DeepSeek's innovation under constraints signals a shift in AI development toward efficient algorithms and clever engineering, challenging resource-heavy models.
DeepSeek's breakthrough demonstrates how constraints can drive innovation in unexpected ways. Their success, achieved despite limited access to advanced chips, has shown that efficient model architecture and clever engineering can compete with approaches that rely primarily on massive computing power. This development is likely to democratize AI development, accelerate innovation through open-sourcing, and shift focus toward more efficient algorithmic approaches. While this may disrupt current market leaders and cause short-term market volatility, it ultimately points toward a more sustainable and accessible future for AI development. The industry is entering a new phase where success will depend not just on computational resources, but on innovative approaches to model design and training efficiency.
The key takeaway is that the AI race is far from over; it's merely entering a new, more nuanced phase where multiple paths to advancement exist, and where clever engineering may prove as valuable as raw computing power.
1. The Short Case for Nvidia Stock | YouTube Transcript Optimizer