Fractile raises $220M to build the next generation of inference hardware
May 13, 2026 -- Fractile was founded in 2022 on the bet that, eventually, the world’s most capable AI systems would be limited in their impact by the amount of time they take to produce useful outputs. We bet everything on the logical conclusion: that the only way to truly unlock this latent value, to make speed viable at scale, was to radically re-invent the hardware that we run our frontier AI models on. Ever since, we have been building chips and systems that tackle this problem.
In the years since, raw AI capability has reached the point where time from query to output is the key limit on frontier capabilities. As models have improved, so has their ability to be orchestrated over increasingly long output sequences. The toughest problems demand generating many tens of millions of tokens, and we see continual capability returns to generating longer outputs. At the same time, the unit economics of inference have become a brutal constraint. Inference is both the revenue engine of the AI industry and the rate-limiting factor on expanding it.
The positive correlation between performance and the amount of compute deployed at inference time has been a longstanding hallmark of frontier AI systems. DeepMind’s AlphaGo achieved superhuman performance not just by running a neural network once to pick a particular next move, but by running a tree search over many possible futures, each explored through repeated, sequential inference of a neural network. The emergence of reasoning models in 2024 made clear that similar principles apply to LLMs. What we are seeing now, though, with some of the most valuable applications of AI consuming many millions of tokens, also reflects a fundamental property of hard work: serious intellectual work involves many sequential steps, each dependent on the last.
For very hard work, these sequential steps can sum to an extraordinary body of intermediate output, yet lead to incredibly valuable outcomes when those outputs are synthesised. After years of work on Fermat’s Last Theorem, Andrew Wiles realised that the approach he was working on that day looked like a dead end, but fit perfectly to resolve an approach he had explored three years earlier. (A film taken shortly afterwards illustrates it perfectly, showing reams of paper filled with working, dead ends and fruitful directions, spilling across his desk.) The ability to operate over long context, exploring different directions in sequence – and the enormous stack of papers Wiles accumulated – is what we are starting to push frontier LLMs towards doing as we throw them at our hardest problems.
Today’s LLMs are already producing up to 100 million tokens in pursuit of these hard problems. At the ~40 tokens per second at which these models tend to run on existing chips, a single output of this length takes a month to complete. The technical and economic limits on inference speed, above all from memory bandwidth that has failed to scale on current architectures, are what is constraining progress. To compress that month into a day, we will need to generate output at ~1,200 tokens per second, while handling the complexity and capacity challenges of operating large models at very long contexts. This is exactly the problem Fractile has been building from the ground up to tackle.
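A quick back-of-envelope check of the figures above (the 100-million-token output and ~40 tokens-per-second rate are taken from the text; everything else is simple arithmetic):

```python
# Sanity-check the throughput numbers: how long does a 100M-token
# output take at today's decode rates, and what rate would be needed
# to finish it in a single day?
tokens = 100_000_000
current_rate = 40            # tokens/second, typical on existing chips
seconds_per_day = 24 * 3600  # 86,400

days_at_current_rate = tokens / current_rate / seconds_per_day
print(f"At {current_rate} tok/s: {days_at_current_rate:.0f} days")  # -> 29 days, about a month

one_day_rate = tokens / seconds_per_day
print(f"One-day target: {one_day_rate:.0f} tok/s")  # -> 1157 tok/s, i.e. ~1,200
```

The ~1,200 tokens-per-second target is thus roughly a 30x speedup over the status quo, before accounting for the memory-capacity challenges of serving very long contexts.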
However, what is most exciting about the hardware moonshot is not accelerating the workloads of today, but rather the entirely new workloads that we will enable. Compressing a month of work into a day, a weekend of lab computation into a coffee break, will make all that work happen radically faster, but it will also make far more ambitious AI use cases economically viable. Agentic coding is only the start of the story. The defining work of the 21st century will be marked by the engine of inference delivering immense and diffuse chains of intellectual inquiry, in drug discovery, in software engineering, in materials discovery, in any field where humanity will benefit from sheer intellectual work to resolve complex problems. As with any technological revolution, those who drive this progress fastest, who push the frontier furthest, will capture the greatest share of the value. The workloads that push to the limits of the current frontier are already transformational. The ones that lie beyond that frontier, that we are about to break open, will stretch our imaginations and redefine the entire economy. Fractile is seeking to increase the clock speed of global progress, one chip at a time.
Making this possible begins with people. Since our founding, we have been working across the full stack, from foundational AI research to foundry process innovation to chip micro-architecture, aggressively chasing the most promising solutions and developing systems that break the inference Pareto frontier of cost versus latency and chart a course to changing what we can do with the world’s best AI models.
Today, we are delighted to share that we have raised $220M to accelerate the path to getting our first chips and systems into customers’ hands, in a financing round led by Accel, Factorial Funds, and Founders Fund, with participation from Conviction, Gigascale, O1A, Felicis, Buckley Ventures and 8VC, investing alongside our brilliant existing backers.
Fractile's journey has only just begun, and the most important work lies ahead. We are hiring across the UK (London, Bristol), the US (San Francisco), and Taiwan (Taipei). If you are looking for the opportunity to join us on a singularly ambitious, hard and consequential mission, we want to hear from you.