Two issues with this. One, it's profitable assuming you just keep serving the same model forever, which is not realistic in this market. A given model has a shelf-life, which these days is measured in months, not years. Which means that trying to separate the cost of training the model from the cost of serving it doesn't make much business sense. And two, for providers that provide inference only via open weight models, the margins quickly move to commoditization. The "someday" when frontier model providers can enjoy their current high inference margins without the burden of significant training costs is never going to arrive.
Commoditization means there's price competition. From a consumer perspective, that's good. You want it to be a low-margin, high volume, competitive business.
Although from a business perspective, it can end up being ruinous competition, like solar panels or airlines. A stable equilibrium with prices neither too low nor too high isn't guaranteed; it depends on market structure.
It's anyone's guess whether this reaches an equilibrium or not, but I still expect that there will be companies like OpenRouter and Fireworks that offer inference at reasonable prices.
>Commoditization means there's price competition. From a consumer perspective, that's good. You want it to be a low-margin, high volume, competitive business.
You and I may want it to be a low-margin, high-volume business. But the valuations of OpenAI, Anthropic, and much of the rest of the AI industry are not based on that assumption. They are based on the assumption that there will be a couple of winners, like in the smartphone wars, and that those winners will be able to maintain good margins.
There are low-end and premium smartphones, with different profit margins. Similarly, it seems like open weights models and high-end models could co-exist?
I don’t think there’s any way of knowing what the market structure will be in the end.
> A given model has a shelf-life, which these days is measured in months, not years.
Not all new models are trained from scratch. ChatGPT 5.3 to 5.4 (and likely 5.5) was basically the same model, but probably trained a bit more, not a new model from scratch.
> The "someday" when frontier model providers can enjoy their current high inference margins without the burden of significant training costs is never going to arrive.
That is debatable. I believe the moat for the frontier model providers is the compute. At the level of 10 trillion parameters (that Fable/Mythos are rumored to have), you need serious compute to serve inference, and you also need serious compute to train. Will DeepSeek, Qwen, Kimi, or GLM come up with a new 10T model anytime soon? I doubt that. People keep saying that the Chinese labs are catching up to the US big 3, and that the gap is now only 4-6 months. I doubt a Chinese version of Fable/Mythos will be released in the next 12 months.
Agree fully.
The smaller models will only get better, which extends the usefulness of older GPUs.
How is this sustainable in the long run tho? Major enterprises are already making headlines as they cut licenses and reduce token consumption. The conversation is moving a lot toward return on token... is automating this activity with an agent giving me a higher return than having a human run it? Plus the advent of local AI and open source models is creating new opportunities to reduce token costs.
Profitable on a per-token basis is meaningless. Ed Zitron doesn’t argue that it is impossible to offer profitable inference; he argues that the business as it stands today is deeply unprofitable and only getting worse because it isn’t a high-margin inference business.
Play out the most likely pessimist’s scenario: LLMs are useful but frontier models are overkill so businesses just use dirt cheap open weight models on their own hardware and/or they rent hardware instead of paying per token. Then what for OpenAI and Anthropic?
OpenAI’s business collapses if customers are happy with an LLM that costs $0.10 per million tokens even if it only costs OpenAI $0.05 in inference per million tokens. The insane bonkers claim from Garry Tan that in 2 years we will be using 90,000x as many tokens as today is… well, obviously not true.
The fixed costs that OpenAI and Anthropic have created need inference demand far beyond what is plausible.
edit: and hand waving away the vast losses of companies like OpenAI because of “training” is ridiculous. Anthropic are spending a billion dollars per month to rent additional capacity from xAI for inference, not training. The models don’t need to get better: if there is a case for LLMs to change business forever, GPT 5.4 is just as capable of achieving it as GPT 5.5.
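To put rough numbers on the per-token margin point above, here is a back-of-the-envelope sketch. The $0.10 price and $0.05 inference cost per million tokens come from the comment; the fixed-cost figure and the function name are purely hypothetical placeholders, not reported numbers for any company.

```python
def breakeven_million_tokens(price_per_m: float, cost_per_m: float,
                             fixed_costs: float) -> float:
    """Millions of tokens that must be sold for gross margin to cover fixed costs."""
    margin = price_per_m - cost_per_m
    if margin <= 0:
        raise ValueError("no positive per-token margin, so there is no break-even volume")
    return fixed_costs / margin


PRICE_PER_M = 0.10   # dollars per million tokens (from the comment above)
COST_PER_M = 0.05    # dollars of inference cost per million tokens (from the comment above)
FIXED_COSTS = 10e9   # ASSUMPTION: hypothetical annual fixed costs in dollars, for illustration only

millions = breakeven_million_tokens(PRICE_PER_M, COST_PER_M, FIXED_COSTS)
print(f"break-even volume: {millions * 1e6:.2e} tokens per year")
```

The point is only that a thin positive margin per token still implies an enormous volume once fixed costs are large; swap in whatever fixed-cost figure you find credible.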
What is the actual Return on Equity of Anthropic and OpenAI?