Alphabet’s Google is in advanced discussions with Marvell Technology to co-develop two custom chips designed to make artificial intelligence inference significantly more efficient and cost-effective.
According to sources familiar with the matter, the partnership would focus on AI inference — the high-volume phase where trained models deliver real-time responses, search results, recommendations, and generative content. The first chip is a specialized memory processing unit (MPU) engineered to pair directly with Google’s existing Tensor Processing Units (TPUs) and ease memory bottlenecks. The second is an entirely new TPU architecture built from the ground up for inference workloads.
This potential collaboration would put Marvell in a design-services role, similar to the one MediaTek recently played on Google’s latest Ironwood TPU. The goal is to tackle the persistent power-consumption and memory-bandwidth challenges that drive up the massive operational cost of running AI at hyperscale.
Google has been aggressively expanding its in-house silicon strategy since launching the first TPU in 2016. By adding Marvell alongside longtime partner Broadcom, the company aims to diversify its supply chain, speed up innovation, and gain greater control over the hardware that powers everything from Gemini to YouTube and enterprise AI tools.
Marvell Technology, already a major player in high-speed networking and custom ASICs, has seen strong growth in its AI-related business. A deal with Google would further establish the company as a key supplier in the booming custom-AI-chip market.
The talks are ongoing and have not yet resulted in a signed contract. Sources indicate the companies aim to finalize the design of the memory processing unit as early as 2027, with test production to follow.
Neither Google nor Marvell has issued an official comment on the reports.
The development reflects a broader industry shift: major cloud providers are racing to build proprietary accelerators that deliver better performance per watt than general-purpose GPUs, especially now that inference dominates real-world AI compute demand.
