Microsoft AI CEO: Scale will define AI winners

Written by Anchal Verma

The future of artificial intelligence (AI) may depend less on building smarter models and more on the ability to run them at scale. Mustafa Suleyman, chief executive of Microsoft AI, has outlined a clear view of where the industry is heading. He says the next phase of AI will be shaped by access to inference compute, not just model intelligence. In simple terms, the companies that can afford to serve AI to millions of users in real time are likely to move ahead, while others may struggle to keep pace.

Inference compute becomes key constraint

For several years, the AI sector focused heavily on training larger and more advanced models. The challenge now lies in running those models efficiently at scale. This process of serving a trained model's responses to users, known as inference, has become the main cost driver.

According to The Times of India, inference accounts for nearly two thirds of total AI compute spending in 2026. At the same time, supply remains tight. Graphics processing units (GPUs) continue to face delivery timelines of close to a year. High-bandwidth memory, which is essential for AI workloads, is largely sold out through the year. Data centre expansion is also slower than expected, with only a fraction of announced capacity currently under construction.

This imbalance between supply and demand has created a situation where compute resources are limited and expensive. As a result, companies must decide how much they can afford to spend to deliver AI services at speed and scale.

Margins decide who can scale AI

Suleyman argues that profit margins now play a central role in AI competition. Companies with high-margin products can afford to pay more for inference compute. This allows them to offer faster responses and more reliable performance to users.
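As a rough illustration of why margins matter, consider the unit economics of serving a model. All figures below are hypothetical assumptions for the sketch, not numbers from Suleyman or Microsoft:

```python
# Hypothetical unit economics: can a product absorb its per-query inference cost?
# Every number here is an illustrative assumption, not a reported figure.

def monthly_margin_per_user(price, gross_margin, queries_per_month, cost_per_query):
    """Gross margin earned per user per month, minus that user's inference bill."""
    return price * gross_margin - queries_per_month * cost_per_query

# A paid enterprise seat vs. a cheap consumer subscription, same model, same usage.
enterprise = monthly_margin_per_user(price=30.0, gross_margin=0.8,
                                     queries_per_month=600, cost_per_query=0.01)
consumer = monthly_margin_per_user(price=2.0, gross_margin=0.5,
                                   queries_per_month=600, cost_per_query=0.01)

print(enterprise)  # 18.0 -> headroom to pay for faster, more reliable inference
print(consumer)    # -5.0 -> every active user deepens the loss
```

Under these assumed numbers, the same inference bill that a high-margin enterprise product absorbs comfortably pushes a low-priced consumer product underwater, which is the dynamic Suleyman describes.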

Lower latency improves user experience and increases engagement. When users return more often, they generate valuable data. This data can then be used to refine models and improve outputs. Over time, this creates a cycle of continuous improvement.

This cycle, often referred to as a data flywheel, strengthens products that already have strong financial backing. As these products improve, they attract more users, which in turn generates more data and revenue.

Enterprise AI gains an advantage

High-margin enterprise applications are well positioned in this environment. Tools used in legal services, healthcare software, and productivity platforms such as Microsoft 365 Copilot can absorb higher compute costs.

Microsoft’s own data highlights this trend. Paid Copilot users reached 15 million in the second quarter of the 2026 financial year, marking a significant increase from the previous year. Despite this growth, the figure still represents a small portion of the total commercial user base, indicating further room for expansion.

These enterprise products benefit from stable revenue streams, which support ongoing investment in AI infrastructure and performance improvements.

Startups and consumer apps face pressure

In contrast, consumer-focused AI applications and early-stage startups face a more difficult path. Limited budgets restrict their ability to spend on inference compute. This can lead to slower response times and reduced user engagement.

Without consistent usage, these platforms struggle to gather the data needed to improve their models. As a result, the improvement cycle slows down or fails to begin altogether.

Some industry participants suggest that advances in open-source models or on-device AI could reduce reliance on expensive infrastructure. However, these approaches are still evolving and may not immediately address the current supply constraints.

Short-term shift with long-term impact

Suleyman’s outlook focuses on the next two to three years, a period during which compute scarcity is expected to remain a defining factor. During this time, companies with the financial capacity to invest heavily in infrastructure may build a lasting advantage.

With major players committing significant capital to AI development, the ability to fund large-scale operations is becoming as important as innovation itself. For now, the balance of power in AI appears to be shifting towards those who can afford to run it efficiently at scale.