As competition among generative AI models intensifies, the high cost of inference and the compute bottlenecks that come with it have become the chief pain points for large AI companies. According to a recent report by The Information, the AI startup Anthropic is planning to procure hardware from Fractile, a young British semiconductor firm, in a bid to sharply reduce the cost of AI inference.
Fractile claims that its architecture can deliver 100 times the Large Language Model (LLM) inference performance of NVIDIA’s offerings at one-tenth the cost. The company expects to bring its silicon to market by 2027. Founded in 2022 by Dr. Walter Goodwin of Oxford University, Fractile has recruited engineers from established chip companies such as Graphcore, NVIDIA, and Imagination Technologies. According to Tom’s Hardware, after securing $15 million in seed funding in 2024, the firm is now pursuing a $200 million funding round, aiming for a “unicorn” valuation of $1 billion.
The architecture used in NVIDIA’s GPUs separates compute units from memory (such as HBM or DRAM). During AI inference, data must shuttle constantly between the two, which adds latency and consumes substantial power, a problem known in the industry as the “Memory Wall.”
To break through this barrier, Fractile uses a chip architecture based on the RISC-V instruction set, paired with SRAM (Static Random-Access Memory), and places compute logic and memory on a single die. The company maintains that this “near-memory” design, by eliminating data-movement bottlenecks, lets LLM inference run 100 times faster than on NVIDIA GPUs at only 10% of the operating cost.
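The memory-wall argument can be made concrete with a back-of-envelope estimate. The sketch below assumes a simplified model in which generating each token requires reading every model weight once, so single-stream decode speed is capped by memory bandwidth. The function name and all numbers are illustrative assumptions, not published specifications from Fractile or NVIDIA.

```python
def decode_tokens_per_second(num_params: float,
                             bytes_per_param: float,
                             mem_bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode speed when memory bandwidth is the only limit.

    Assumes every weight is read exactly once per generated token and
    ignores compute time, caching, and batching.
    """
    bytes_per_token = num_params * bytes_per_param
    return mem_bandwidth_bytes_per_s / bytes_per_token


# Hypothetical workload: a 70-billion-parameter model stored in 16-bit weights.
NUM_PARAMS = 70e9
BYTES_PER_PARAM = 2

# Hypothetical bandwidths, chosen only to show the scaling.
OFF_CHIP_BW = 2e12    # 2 TB/s, an assumed off-chip memory figure
ON_CHIP_BW = 20e12    # 20 TB/s, an assumed on-die SRAM figure

off_chip_rate = decode_tokens_per_second(NUM_PARAMS, BYTES_PER_PARAM, OFF_CHIP_BW)
on_chip_rate = decode_tokens_per_second(NUM_PARAMS, BYTES_PER_PARAM, ON_CHIP_BW)

print(f"off-chip bound: {off_chip_rate:.1f} tokens/s")
print(f"on-chip bound:  {on_chip_rate:.1f} tokens/s")
print(f"speedup:        {on_chip_rate / off_chip_rate:.1f}x")
```

Under this model the speedup equals the bandwidth ratio, which is why moving weights closer to the compute logic is attractive for inference. Real systems also depend on batching, SRAM capacity, and interconnect, none of which this estimate captures.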
Anthropic’s interest in an early-stage hardware startup is driven by the acute “compute anxiety” accompanying its explosive growth. By the close of 2025, Anthropic’s annualized revenue surged to $30 billion. Unlike OpenAI or xAI, which are building proprietary data centers, Anthropic maintains a “cloud-neutral” stance, relying on a mix of third-party platforms including NVIDIA GPUs, Amazon’s Trainium, and Google’s TPUs. As its operations scale, inference costs have become a major liability. Analysts suggest the move is meant not only to secure more efficient and cheaper compute but also to “diversify supply chain risk,” reducing over-reliance on NVIDIA’s architecture in the ongoing race for compute.
Anthropic’s rumored partnership with Fractile reflects a pivotal trend in the AI market: hardware requirements for “training” and “inference” are diverging at an accelerating pace. For instance, Google recently split its 8th-generation TPU into the TPU 8t (optimized for training) and the TPU 8i (tailored for inference).
In the training phase, NVIDIA’s dominance remains secure, bolstered by its expansive CUDA software ecosystem and tight hardware integration. However, once a model is trained and deployed for practical use (the inference stage), enterprises prioritize latency and cost-effectiveness above all else. This shift has created an opening for challengers like Groq (recently acquired), Cerebras, and Fractile, all of which champion SRAM or near-memory architectures to push the limits of AI performance.