“DeepSeek’s AI training only cost $6 million!!” Ah, no. More like $1.3 billion.

Silicon Valley takes out its abacus and computes.

 

7 February 2025 (San Francisco, CA) – When I first began writing about DeepSeek last month, I noted that one of the pieces of the puzzle we might never verify is the $5.6 million cost to train the model – because the only source for that figure was DeepSeek.

But many of the other pieces of the puzzle we did know – for instance, the team’s tactic of cutting down on the data processing needed to train the models, plus some very innovative inventions and techniques of their own, many adopted by similarly constrained Chinese AI companies. We knew them because these factors had been written about in verifiable AI journals and news reports over the past year.

So we knew the Chinese thinking: the LLM training process was time-consuming and expensive, and they were looking at other techniques, such as “mixture of experts”, which allowed them to delegate questions to a roster of experts in specific fields, such as fiction, periodicals and cooking. Each expert needs less training, easing the demand on chips to do everything at once. And the Chinese were writing about it – but being fluffed off by the “U.S. AI Industrial Complex”.
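For readers who want to see the “roster of experts” idea in miniature, here is a toy sketch of top-k routing. It is an illustration under stated assumptions, not DeepSeek’s implementation: the four experts, the gate scores, and the function names (`route_top_k`, `moe_forward`) are all hypothetical, and a real model learns its gate scores rather than having them hard-coded.

```python
import math

def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalise over the chosen experts
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, gate_scores, k=2):
    """Run only the selected experts and blend their outputs by gate weight."""
    routed = route_top_k(gate_scores, k)
    output = sum(w * experts[i](x) for i, w in routed)
    return output, [i for i, _ in routed]

# Hypothetical toy "experts": each is just a simple function of the input.
experts = [
    lambda x: x + 1.0,
    lambda x: x * 2.0,
    lambda x: x - 3.0,
    lambda x: x * x,
]

# Hypothetical gate scores for one input; a trained gate would compute these.
gate_scores = [0.1, 2.0, -1.0, 2.0]

output, used = moe_forward(3.0, experts, gate_scores, k=2)
print(f"experts used: {used}, output: {output}")
```

The point of the sketch: only two of the four experts do any work for this input, which is where the savings in compute come from.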

But analysts figured if the Chinese could develop an approach that required less time and power before the question is asked, then, all things considered, DeepSeek’s shortcuts could help it train AI at a fraction of the cost of competing models. What that meant in $$$ terms nobody could say.

But when DeepSeek said its V3 model was trained at a reported cost of $5.58 to $5.6 million, yet somehow managed to outperform every LLM in the “U.S. AI Industrial Complex” arsenal, everybody fell out of their chairs. And we had no way to verify the $ figures.

Well, until somebody decided to check the math.

And that somebody was SemiAnalysis, an independent research and analysis company specializing in the semiconductor and AI industries which I have noted and quoted in previous posts. They have been tracking DeepSeek for well over a year, via published news articles, research papers, and “HUMINT” – human intelligence, people in China.

I refer you to “Research exposes Deepseek’s AI Training Cost Is Not $6M, It’s a Staggering $1.3B”.

The assertions in the write-up are interesting and closer to the actual cost of the DeepSeek open source smart software. Let’s take a look at the allegedly accurate and verifiable information. Then I want to point out a few costs not included in the estimated cost of DeepSeek.

The article explains that the analysis puts the cost closer to $1.3 billion. I am not sure if this estimate is on the money, but a higher cost is certainly understandable given the billions of $$$ burned in LLM/chat bot activities by outfits like Amazon, Facebook (Meta), Google, Microsoft, and OpenAI, among others. The authors note:

In its latest report, SemiAnalysis, an independent research company, has spotlighted Deepseek, a rising player in the AI landscape. The SemiAnalysis challenges some of the prevailing narratives surrounding Deepseek’s costs and compares them to competing technologies in the market. One of the most prominent claims in circulation is that Deepseek V3 incurs a training cost of around $6 million.

One important point is that building and making available for free a smart software system incurs many costs. The consulting firm has narrowed its focus to training costs. The authors report:

The $6 million estimate primarily considers GPU pre-training expenses, neglecting the significant investments in research and development, infrastructure, and other essential costs accruing to the company. The report highlights that Deepseek’s total server capital expenditure (CapEx) amounts to an astonishing $1.3 billion. Much of this financial commitment is directed toward operating and maintaining its extensive GPU clusters, the backbone of its computational power.

But “astonishing”? Nope. Sam AI-Man has tossed around numbers in the trillions. Read the full article to see how/what they calculated.
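For a sense of scale, here is the back-of-the-envelope arithmetic as a short sketch. The GPU-hour figures and the $2 per GPU hour rental rate are the ones DeepSeek itself published in its V3 technical report; the $1.3 billion is the server CapEx figure quoted above. The variable names are mine. Keep in mind the two numbers measure different things (rented GPU time for one training run versus total spending on servers), so the ratio is a rough yardstick, not an audit.

```python
# H800 GPU hours, in thousands, as reported in DeepSeek's V3 technical report.
gpu_hours_k = {
    "pre_training": 2664,
    "context_extension": 119,
    "post_training": 5,
}

# Rental price per GPU hour assumed by DeepSeek in the same report.
rental_rate_usd = 2.0

total_gpu_hours = sum(gpu_hours_k.values()) * 1_000
reported_cost_usd = total_gpu_hours * rental_rate_usd

# Server CapEx estimate attributed to SemiAnalysis in the article quoted above.
server_capex_usd = 1.3e9

ratio = server_capex_usd / reported_cost_usd

print(f"Total GPU hours:        {total_gpu_hours:,}")
print(f"Reported training cost: ${reported_cost_usd:,.0f}")
print(f"CapEx / reported cost:  about {ratio:.0f}x")
```

That is where the “$5.58 to $5.6 million” comes from: GPU hours multiplied by an assumed rental rate, and nothing else.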

I am not sure we will ever know how much Amazon, Facebook, Google, and Microsoft – to name just 4 outfits – have spent in the push to win the AI war, get a new monopoly, and control everything from baby cams to zebra protection in South Africa.

I do agree that the low ball number thrown out by DeepSeek was fantastically low, but I think the pitch for this low ball was a tactic designed to see what a Chinese-backed AI product could do to the U.S. financial markets. As Ars Technica and others have noted, there seemed to be a “lot-more-than-usual” fiddling with/in the U.S. stock markets and the U.S. option markets by hedge funds and others before the DeepSeek news hit the wires.

Plus there are some costs that neither the SemiAnalysis outfit nor the Interesting Engineering website has considered.

First, if you take a look at the authors of the DeepSeek arXiv papers you will see a lot of names. A lot. Cross-check them (we had someone do that for us) and most of these individuals are affiliated with Chinese universities. How are these costs handled? My hunch is that the costs were paid by the Chinese government and the authors of the papers did what was necessary to figure out how to come up with a “do-more-for-less” system.

The idea is that China, hampered by U.S. export restrictions, is better at AI than the mythological Silicon Valley. Okay, that’s a good intelligence operation: test destabilization with a reasonably believable free software gilded with AI sparklies. And having worked at Apple for 6 years and having dealt with China, I can tell you China uses its financial and scientific muscle to manipulate technologies and markets in a manner that not only upsets economic systems, but poses huge risks to global security.

But the costs? Staff, overhead, and whatever perks go with being a wizard at a Chinese university have to be counted, multiplied by the time required to get the system to work mostly, and then included in the statement of accounts. These steps have not been taken, but I know a few companies doing that “math” so we should see some analysis shortly.

Second, what was the cost of the social media campaign that made DeepSeek more visible than Taylor Swift during the warm up to the Super Bowl? Stories were planted everywhere, and then amplified, and then “forced viral”. That cost has not been considered. Someone should grind through the posts, count the authors or their handles, and produce an estimate. As far as I know, there is no information out there about who is a paid promoter of DeepSeek. But obviously there are many.

As I have noted, the cost, complexity, and tentacles of undercover social media campaigns go (almost) unnoticed in the commercial and financial world. Social media in commerce and finance is about sociology and psychology more than technology, just as it is in the more obvious political use.

Third, how much did the electricity cost to get DeepSeek to do its tricks? We must not forget the power at the universities, the research labs, and the laptops. MIT Technology Review has some thoughts along this power line.

Finally, what’s the cost of the overhead? I am thinking about the planning time, the lunches, the meetings, and the back and forth needed to get DeepSeek on track to coincide with Trump’s push to “make China not so great again” and completely gut his White House AI performance the day before. I have no numbers, other than knowing such planning was certainly on China’s bingo card. They are the most political of political animals.

BOTTOM LINE?

The DeepSeek operation worked. The recriminations, the allegations, and the explanations are in full swing. But I am not sure they will have much impact on the “China smart, U.S. dumb” strategy that China is successfully running.

I think China is ahead of the U.S. in the AI game … and pretty much every other game in town.
