Qualcomm AI200/250: AI accelerators for data centers

Lisa Ernst · 27.10.2025 · Technology · 7 min

Qualcomm is entering the data center AI inference market with its AI200 and AI250 systems. The focus is on large memory capacity per card and efficient rack-scale operation, with availability planned for 2026 (AI200) and 2027 (AI250). Central features include up to 768 GB of LPDDR memory per accelerator card, direct liquid cooling, PCIe for scale-up within the rack, and Ethernet for scale-out between racks. A 200 MW deployment deal with the Saudi Arabian startup Humain, starting in 2026, has already been announced.

Qualcomm AI Inference

Qualcomm positions the AI200 and AI250 as accelerator cards and as complete racks for AI inference in data centers. Inference here means that already trained models answer requests rather than being retrained. This is a cost-driven, continuous workload in which memory capacity, memory bandwidth, and energy efficiency are decisive. The new systems are based on Qualcomm's Hexagon NPU, scaled up from the mobile domain for data center workloads. Each AI200 card is expected to carry up to 768 GB of LPDDR memory. The systems use direct liquid cooling, PCIe for scale-up within the rack, and Ethernet for scale-out between racks. The goal is a lower total cost of ownership (TCO) through high memory density and efficiency. The AI250 additionally relies on a near-memory architecture with more than ten times the effective memory bandwidth, which is particularly relevant for large transformer models and long contexts.
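To make the memory argument concrete, here is a rough sizing sketch. The 768 GB per card is taken from Qualcomm's announcement; the model size, precision, and KV-cache parameters are purely illustrative assumptions, not Qualcomm specifications.

```python
# Rough sizing sketch: how much of a 768 GB card a model's weights and
# KV cache would occupy. All model and workload numbers are illustrative
# assumptions, not Qualcomm specifications.

CARD_MEMORY_GB = 768  # LPDDR capacity per AI200 card, per Qualcomm's announcement

def model_footprint_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weight memory in GB for a given parameter count and precision."""
    return params_billion * 1e9 * bytes_per_param / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_len: int, bytes_per_value: float) -> float:
    """KV cache per sequence in GB: 2 (keys and values) * layers * heads
    * head dimension * context length * bytes per value."""
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_value / 1e9

# Hypothetical 120B-parameter model served at 8-bit precision with a 128k context
weights = model_footprint_gb(params_billion=120, bytes_per_param=1.0)          # ~120 GB
cache = kv_cache_gb(layers=80, kv_heads=8, head_dim=128,
                    context_len=128_000, bytes_per_value=1.0)                   # ~21 GB per sequence
print(f"weights: {weights:.0f} GB, KV cache: {cache:.0f} GB, "
      f"card utilization: {(weights + cache) / CARD_MEMORY_GB:.0%}")
```

Under these assumptions, the weights and one long-context KV cache occupy well under a quarter of a single card, leaving room for batching or larger models without sharding across devices.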

Background & Context

Qualcomm's move into the data center market for AI inference is part of a diversification strategy: the company wants to become less dependent on the smartphone cycle and to expand into markets with sustained AI capital expenditure. Many operators see inference as a larger cost driver than training, because it runs around the clock and scales with end-user demand. High memory capacity per card can reduce model sharding, minimize communication overhead, and thus lower latency and cost per response. The entry of a mobile chip giant into rack-scale AI draws attention because it targets established players such as Nvidia and AMD in their core market and comes with a commitment to an annual product roadmap.

Source: computerworld.ch

The Qualcomm Cloud AI 100 chip, a predecessor of the AI200/AI250 series, demonstrates Qualcomm's commitment in the field of AI accelerators for data centers.

The underlying technology, Qualcomm's Hexagon NPU, is known from the mobile space and has been scaled up for data center workloads. This enables Qualcomm to build on existing know-how while opening up new markets. The strategic significance of this step is underscored by the collaboration with Humain, a Saudi Arabian startup that, with support from the Public Investment Fund (PIF), aims to cover the entire AI value chain, including data centers and large Arabic language models.

The announcement of the AI200 and AI250 comes in a context where demand for efficient and high-performance AI inference solutions is steadily growing. Companies are looking for ways to reduce operating costs for AI applications while increasing performance at the same time. Qualcomm's approach of combining high memory density with energy efficiency could offer a competitive advantage here.

Source: YouTube

The CNBC clip provides further context on Qualcomm's data center strategy and explains how the new systems integrate with the AI stacks from previous announcements.

Current Status & Development

The development around Qualcomm's data center initiative has accelerated in recent months:

- October 2025: Qualcomm announces the AI200 and AI250 accelerator cards and complete rack systems, with availability planned for 2026 (AI200) and 2027 (AI250).
- Humain, a Saudi Arabian startup backed by the Public Investment Fund (PIF), plans to deploy 200 MW of Qualcomm rack systems starting in 2026.
- Both products build on the Hexagon NPU and on the earlier Cloud AI 100 data center accelerator.

These events show a clear strategy and rapid progress in implementing Qualcomm's data center ambitions. The partnership with Humain is an early and concrete sign of market acceptance and confidence in the new products.

Analysis & Implications

Qualcomm's entry into the AI inference market for data centers is strategically motivated. The company aims to become less dependent on the smartphone cycle and to participate in the growing market for AI infrastructure spending. Inference is identified as the larger cost driver in data centers, since it runs 24/7 and is closely tied to end users. The high memory capacity per card (up to 768 GB of LPDDR) is intended to reduce model sharding, minimize interconnect traffic, and thus lower both latency and cost per response. This is particularly relevant for large language models (LLMs) and long contexts; the sketch below illustrates the effect on card count.
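A minimal sketch of this effect: the number of cards needed just to hold a model's weights shrinks as per-card memory grows, and with it the amount of cross-card traffic during inference. The model footprints, the smaller comparison card, and the 80 percent usable-memory factor are illustrative assumptions.

```python
# Sketch: minimum number of accelerator cards needed just to hold a model's
# weights, as a function of per-card memory. Fewer cards per model means less
# cross-card communication during inference. All sizes are illustrative.

import math

def cards_needed(model_gb: float, card_gb: float, usable_fraction: float = 0.8) -> int:
    """Minimum cards required to fit the weights, leaving headroom for
    KV cache and activations (usable_fraction is an assumption)."""
    return math.ceil(model_gb / (card_gb * usable_fraction))

for model_gb in (200, 500, 1000):      # hypothetical model footprints in GB
    for card_gb in (128, 768):         # a smaller HBM-class card vs. a 768 GB LPDDR card
        print(f"{model_gb:>5} GB model on {card_gb:>3} GB cards: "
              f"{cards_needed(model_gb, card_gb)} card(s)")
```

In this toy comparison, a 1 TB model footprint would need ten 128 GB cards but only two 768 GB cards, which reduces the interconnect traffic that tensor or pipeline parallelism would otherwise generate.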

Source: heise.de

Qualcomm's comprehensive AI strategy integrates edge computing with cloud solutions, with the AI200/AI250 accelerators playing a central role in the data center infrastructure.

The AI250 relies on a near-memory architecture that promises more than ten times the effective memory bandwidth. This is a crucial factor for large transformer models and long contexts, because autoregressive inference is typically bandwidth-bound rather than compute-bound. Media reaction to Qualcomm's announcement has been largely positive: the entry of a mobile-chip giant into the rack-scale AI market challenges Nvidia and AMD in their core market. Qualcomm also plans an annual product roadmap, signaling a long-term commitment.
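A roofline-style estimate shows why bandwidth matters so much: in autoregressive decoding, each generated token requires streaming the weights (and the KV cache) from memory, so single-stream throughput is roughly bounded by bandwidth divided by bytes moved per token. Qualcomm has not published bandwidth figures for the AI200/AI250, so all numbers below are illustrative assumptions.

```python
# Roofline-style sketch: in memory-bound autoregressive decoding, an upper
# bound on single-stream tokens/s is memory bandwidth / bytes moved per token.
# All values are illustrative assumptions, not published Qualcomm figures.

def decode_tokens_per_s(bandwidth_gb_s: float, weight_gb: float, kv_gb: float) -> float:
    """Upper bound on tokens/s for one sequence in the bandwidth-bound regime."""
    bytes_per_token_gb = weight_gb + kv_gb   # weights plus KV cache read per token
    return bandwidth_gb_s / bytes_per_token_gb

weights_gb = 120.0   # hypothetical 120B-parameter model at 8-bit precision
kv_gb = 20.0         # hypothetical KV cache read per token at long context

base = decode_tokens_per_s(bandwidth_gb_s=500, weight_gb=weights_gb, kv_gb=kv_gb)
near_mem = decode_tokens_per_s(bandwidth_gb_s=5_000, weight_gb=weights_gb, kv_gb=kv_gb)
print(f"baseline: ~{base:.1f} tok/s, with ~10x effective bandwidth: ~{near_mem:.1f} tok/s")
```

The absolute numbers are meaningless without real specifications, but the linear relationship explains why a tenfold increase in effective bandwidth translates almost directly into decode throughput for bandwidth-bound workloads.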

For data center operators, the new systems potentially mean a lower total cost of ownership (TCO) and better energy efficiency per query. Direct liquid cooling, PCIe for scale-up within the rack, and Ethernet for scale-out between racks are technical features aimed at efficient operation. The 200 MW deployment deal with Humain, starting in 2026, is a strong signal of market acceptance and confidence in Qualcomm's solutions.
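How such efficiency translates into operating cost can be approximated with a simple model: electricity cost per million generated tokens as a function of rack power, PUE, electricity price, and sustained throughput. All values in the sketch are illustrative assumptions; Qualcomm has not published power or throughput figures for these racks.

```python
# Sketch: electricity cost per million generated tokens for one rack, derived
# from rack power, PUE, electricity price, and sustained throughput.
# All numbers are illustrative assumptions, not Qualcomm specifications.

def cost_per_million_tokens(rack_kw: float, pue: float, price_per_kwh: float,
                            tokens_per_s: float) -> float:
    """Electricity cost to generate one million tokens on a fully utilized rack."""
    seconds = 1e6 / tokens_per_s                   # time to produce 1M tokens
    energy_kwh = rack_kw * pue * seconds / 3600    # facility energy incl. cooling overhead
    return energy_kwh * price_per_kwh

# Hypothetical rack: 160 kW IT load, PUE 1.2, $0.08/kWh, 50,000 tokens/s sustained
cost = cost_per_million_tokens(rack_kw=160, pue=1.2, price_per_kwh=0.08, tokens_per_s=50_000)
print(f"~${cost:.3f} electricity cost per 1M tokens")
```

Such a model only captures energy; a full TCO comparison would also include hardware depreciation, software, networking, and operations, which is exactly where independent benchmarks are still missing.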

It is important to note that the AI200 and AI250 are explicitly designed for inference and not for training. This is a key difference from many other AI accelerators on the market and underscores Qualcomm's focus on the operational deployment of AI models. Qualcomm's challenge will be to compete against Nvidia's established ecosystem and to provide comparable software support.

For procurement teams, it is advisable to check early on the expected supply readiness in 2026/2027, the integration into existing CNI and network topologies, and the availability of confidential computing features. Media reports should always be cross-checked against primary sources and technical reviews in order to separate hype from reliable data.

Source: YouTube

The video provides background on the concept of AI factories and helps put rack-scale inference into an economic context.

Open Questions & Conclusion

Despite the promising announcements, some questions remain open. The concrete performance per watt and per dollar of the AI200/AI250 compared with current Nvidia and AMD racks, as measured by standardized benchmarks such as MLPerf Inference, is still unclear. Qualcomm has so far published neither MLPerf Inference results nor tokens-per-second figures, so the relative performance remains an open question in numerical terms. It will also be decisive how quickly the AI250's near-memory architecture matures under real workloads and how mature Qualcomm's inference software stack is at launch. Detailed documentation or whitepapers from Qualcomm on bandwidths, latencies, and orchestration are still pending.

In summary, Qualcomm's move into rack-scale inference is clearly outlined: a lot of memory per card, efficient cooling, and a roadmap rolling out from 2026. Architectural goals, memory design, and a large initial customer are already in place. What is still missing is hard benchmark data from real-world deployments. For companies planning for 2026/2027, it is advisable to evaluate options now, review software porting paths, and align procurement and energy planning with the new parameters.
