AI-Accelerated Vulnerability Discovery: An Emergency in Software Security. Is Hardware Next?

Frontier AI models have crossed a major capability threshold in security research. They now identify vulnerabilities in production software at a speed, scale, and cost efficiency that no human team can match. Anthropic’s Claude Mythos and OpenAI’s GPT-5.5-Cyber are two prominent examples, both recently released under deliberately gated distribution programs because their developers judged a public release too risky. Just this month, Google’s Threat Intelligence Group disclosed the first confirmed case of a threat actor using an AI-developed zero-day exploit, signaling that the same capability is now being weaponized.

The most striking single result so far comes from the Mythos disclosure. Beyond the headline figure of thousands of zero-day vulnerabilities affecting every major operating system and browser, Mythos identified a 27-year-old integer overflow in OpenBSD, a system known for the rigor of its security review. The bug had survived more than twenty-five years of human review and millions of automated tests. Mythos found it quickly, despite not being trained for vulnerability discovery.

This article summarizes what frontier AI has exposed about software security, considers whether hardware will follow, and discusses what might be done about it. In this context, the term “hardware” refers specifically to semiconductor chips, which are the parts where security exposure is most relevant.

Frontier AI capabilities and their impact on cybersecurity

In what follows, Anthropic’s Claude Mythos serves as the primary example because, as of now, its cyber capabilities are better documented. Mythos sits a tier above Opus and was released as a research preview in April 2026. Alongside the model, Anthropic published a System Card, which is the standard public document used across the industry to describe a model’s capabilities, evaluations, and known risks.

Mythos achieves 93.9% on SWE-bench, the standard benchmark of real-world software-engineering tasks drawn from open-source GitHub issues. It achieves 73% on the “expert-level cybersecurity tasks” on which every prior LLM scored zero. It is the first model to complete the UK AI Security Institute’s 32-step “The Last Ones” network-takeover range end-to-end.

Mythos’s security findings span several categories of software. It found bugs in OpenBSD and the Linux kernel. It found 271 zero-day bugs in the popular Firefox web browser. In cryptographic libraries, it found bugs in TLS, SSH, and AES-GCM. In firmware, it found chains that allowed root access on smartphones. In virtualization, it found a memory-corruption bug in a production memory-safe virtual machine monitor that breaks the virtualization boundary.

Both Anthropic and OpenAI have responded with gated-distribution programs, an inversion of their usual practice of public release. Anthropic has distributed Mythos through Project Glasswing, which gives priority access to the upstream maintainers of the affected software under defensive-only terms. The coalition includes Apple, Google, Microsoft, NVIDIA, Broadcom, AWS, the Linux Foundation, and three dozen other organizations. The access priority establishes a patch window in which defenders can fix the bugs before attackers find them. A public release would have collapsed that window to zero. Similarly, OpenAI runs Trusted Access for Cyber for GPT-5.5 and GPT-5.5-Cyber, with named partners including Cisco, Intel, SentinelOne, and Snyk.

Equally notable are the categories absent from the disclosure. Neither Mythos nor GPT-5.5-Cyber has produced any Spectre/Meltdown-class CPU side channels. Nor have they reported GPU shader-compiler exploits, baseband modem bugs, UEFI or BIOS vulnerabilities, or any other silicon-level findings. The phrase “every major operating system and every major web browser” describes the boundaries of the demonstrated capability, and so far those boundaries stop at the software/hardware line.

Why can’t AI do the same for hardware (yet)?

Three structural asymmetries between software and hardware are worth naming, and they help explain why frontier AI models have, to date, surfaced no notable hardware vulnerabilities.

The first is the history. Software security has been accumulating defenders, tooling, and conventions for nearly four decades. The Morris worm in 1988 set the starting line, while the CVE program standardized vulnerability tracking in 1999. Hardware as a recognized security category is barely older than Spectre and Meltdown, both of which landed in January 2018. It is therefore fair to say that the hardware-security community is still early in a maturation cycle that software security has been working through for more than three decades.

The second is the openly available corpus. Open-source software from which LLMs can learn includes 3.1 million npm packages, 820,000 PyPI packages, and 395 million public GitHub repositories, among many others. In contrast, open-source hardware is orders of magnitude smaller: under 900 archived OpenCores cores, a few dozen production-grade open RISC-V designs, and other similar initiatives. The ratio is roughly four orders of magnitude. There is dramatically less hardware design data available for large-scale model training.
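As a rough sanity check on the size gap, the figures quoted above can be compared directly. The sketch below is illustrative only: it treats "under 900" OpenCores cores as an upper bound of 900, and it compares packages and repositories against cores, which are not equivalent units. The names `software`, `opencores`, and `orders_of_magnitude` are introduced here for illustration.

```python
import math

# Corpus sizes as quoted in the paragraph above.
software = {
    "npm packages": 3_100_000,
    "PyPI packages": 820_000,
    "public GitHub repositories": 395_000_000,
}
opencores = 900  # "under 900" in the text, so this is an upper bound


def orders_of_magnitude(larger: float, smaller: float) -> float:
    """Return the base-10 logarithm of the ratio larger / smaller."""
    return math.log10(larger / smaller)


# Gap between each software corpus and the OpenCores archive.
gaps = {name: orders_of_magnitude(n, opencores) for name, n in software.items()}

for name, gap in gaps.items():
    print(f"{name}: about {gap:.1f} orders of magnitude larger")
```

Depending on which software corpus is chosen, the gap ranges from about three to more than five and a half orders of magnitude, which brackets the rough figure of four given above.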

The third is the usage gap. The 2026 Black Duck OSSRA report found open source in 98% of 947 audited commercial codebases, averaging 911 components per codebase. A security flaw in a widely used component instantly becomes a vulnerability in tens of thousands of products at once. Recent examples include Heartbleed, Log4Shell, libwebp, and xz-utils. In stark contrast, open hardware is used mostly in research while production silicon is built almost entirely from proprietary IP, sourced from third-party vendors or developed in-house. Therefore, the blast radius that lets a single open-source software bug propagate through tens of thousands of products has no analogue in hardware.

How AI might trigger a hardware security emergency

The historical record on hardware vulnerabilities suggests an industry already in the early phase of the same trajectory the software industry has followed for more than fifteen years.

For most of the CVE era, hardware barely registered. In 2010, the NVD listed only a single CVE tagged with a hardware security CWE. These numbers grew to 2, 115, 265, and 566 in 2015, 2020, 2022, and 2025, respectively. Hardware CVEs have been growing about four times faster than software, albeit starting from a much smaller base.
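To put the growth in these counts into perspective, a compound annual growth rate can be derived from the numbers quoted above. This is a minimal sketch: the helper names `cagr` and `growth_between` are introduced here for illustration, and because the early counts are tiny (1 and 2), rates anchored on those years are very sensitive to a single CVE more or less.

```python
# Hardware-CWE CVE counts per year, as quoted in the paragraph above.
counts = {2010: 1, 2015: 2, 2020: 115, 2022: 265, 2025: 566}


def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.5 means 50% per year)."""
    return (end_value / start_value) ** (1 / years) - 1


def growth_between(first_year: int, last_year: int) -> float:
    """CAGR of the quoted CVE counts between two of the listed years."""
    return cagr(counts[first_year], counts[last_year], last_year - first_year)


for first, last in [(2010, 2025), (2020, 2025), (2022, 2025)]:
    print(f"{first}-{last}: {growth_between(first, last):.0%} per year")
```

Over the full 2010 to 2025 span the rate works out to slightly above 50% per year, while the more recent intervals, which start from larger counts, give lower rates. The comparison with software growth stated above cannot be reproduced from these figures alone, since the corresponding software counts are not given.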

In 2018, Spectre and Meltdown fundamentally overturned the assumption that hardware was inherently more secure than software. They prompted Intel, AMD, Arm, Apple, and others to establish formal hardware security teams to address security assurance across a broad range of challenges. Two classical examples of the challenges needing constant attention are ensuring that a hardware root of trust (HRoT) does not leak encryption keys and preventing side channels caused by complex microarchitectural weaknesses.

AI is likely to transform hardware security in a different way than it has transformed software security. Here are four components to monitor closely:

Learning from open-source RTL. As mentioned above, there is only a small corpus of open-source hardware designs available. Security-relevant examples include OpenTitan, Caliptra, OpenSPARC, and various designs in the OpenCores archive, from which a model could learn concrete security issues. More importantly, these designs can train an LLM on general architectural and design patterns and their potential security weaknesses. Because similar patterns are used in proprietary designs, these weakness candidates can feed the attack experiments described below. Examples include encryption key handling, speculative execution in processors, timing and power side channels, hardware race conditions that can violate access control checks, and so on.

Absorbing published research. The bottleneck on academic hardware security work has historically been how many papers, errata, and patents researchers could read, digest, and then apply to find new attacks. An LLM can ingest the full body of openly available hardware security resources, including every paper from major conferences and journals such as USENIX Security, CHES, Black Hat, and S&P; the entirety of the hardware CWE and CVE databases; and every hardware security patent ever published. With this rich body of knowledge, the labor-intensive bottleneck disappears.

Accelerating tedious manual work. Hardware security research today is mostly manual and includes studying vendor documentation, reverse-engineering errata, building fuzzing harnesses, and large amounts of trial and error. AI is well suited for automating the high-effort steps and broadening the search to thousands if not millions of variations, compressing months of work into hours.

Automating synthesis and execution of live attacks. AI could plausibly perform a massive number of attack experiments by rapidly generating candidate scripts and executing them automatically in parallel. Using the above-mentioned resources, it can combine known and zero-day software and hardware vulnerabilities into complex attack scenarios and then apply them against connected devices, servers, and data centers. Following a strategy similar to the one AI used to beat the best human players in complex games (e.g., DeepMind’s AlphaGo), every partial success of an attack experiment can add lessons to the model’s knowledge corpus, making it increasingly powerful. Today this is predominantly the work of nation-state teams, as the first reported AI-orchestrated cyber espionage campaign has demonstrated. An advanced AI-based capability puts it within reach of a much wider set of actors.

Economic impact of accelerated discovery of hardware security issues

While the software and hardware bug fixing lifecycles are quite similar during the development process, they vastly differ once the product is shipped. For example, a software vulnerability in OpenSSL can be resolved and distributed in days, if not hours. Bugs in device software may take a little longer since they must go through a product update cycle. In contrast, vulnerabilities in silicon are substantially more difficult and costly to remediate. Sometimes they can be mitigated by microcode, firmware, or compiler updates though this may carry a performance penalty or loss of functionality. Still, many silicon bugs cannot be patched at all without replacing the chip. Hardware vendors have therefore been conservative about what they confirm, what they fix, and on what timeline. So far that conservatism has been an advantage as it has slowed the disclosure cycle and given the industry time to react. If AI-based capabilities begin surfacing hardware bugs with security implications at software-discovery rates, the conservatism becomes a liability. Security vulnerabilities could accumulate faster than vendors can confirm or remediate them, with potentially significant economic impact.

How to prepare for accelerated hardware security discoveries

Preparing for an acceleration in hardware security discoveries means treating security assurance as a first-class business objective, on par with innovation, quality, time-to-market, and budget allocation. Specifically, security must be funded, resourced, and executed with the same rigor applied to general quality assurance. Below are four areas to pay increased attention to.

  1. Proactive prevention. Chip security verification is too often treated as an afterthought. Going forward, security assurance must be addressed as early as possible in the design flow, a process known as “shift left”, reducing cost, uncertainty, and the risk of late discoveries that impact time-to-market. Security verification should run in lockstep with functional verification, from block to subsystem to chip level and full system including firmware. Two classes of technology are already available and in use. Formal verification platforms, including offerings from Cadence, Synopsys, Siemens, and others, address specific block-level concerns. Information-flow-based methods, for example applied in Arteris Radix, scale from block to full system and deliver crisp coverage metrics for “security signoff” before tape-out. Asset-based and CWE-based methodologies (such as those described in this guide) complement these tools by turning high-level security goals into verifiable requirements.
  2. Comprehensive incident response. In the unavoidable case that a hardware security vulnerability has escaped and is discovered in the field, a fast response is critical for minimizing any impact on customers and end-users, as well as avoiding legal and financial repercussions. A hardware-security incident response capability is required that is comparable to what software organizations have built over many years. Engineering, legal, and customer-communication teams must be ready to act quickly and decisively. Root-cause analysis, issue mitigation and validation, and customer notification all need to compress from quarters to weeks if not days. A key component for such fast turnaround is the availability of legacy chip models and verification environments to quickly replay and analyze an attack and then solidly check its resolution.
  3. Upstream and downstream supply-chain visibility. Chip companies need supply-chain visibility in two directions. Upstream, they must understand which third-party components are inside any given chip, including IP blocks and firmware modules, and their security posture within the larger system to address any incidents that involve them. Downstream, they need to know which products and which customers include any given chip, and a mechanism that can quickly target impacted customers and products with an update. In general, the industry needs to establish a complete bill of materials (BOM) for every electronic system, covering software and hardware components and their security postures. A security-annotated BOM should include an HBOM (hardware bill of materials) alongside the more established SBOM (software bill of materials). Without a complete and constantly updated inventory, reacting to security incidents at scale will remain slow and inadequate for an anticipated acceleration of hardware security discoveries.
  4. Standards and regulatory compliance. A growing set of standards, certifications, and regulations defines the minimum security baseline expected of hardware products. Compliance adds real cost and overhead, but it gives organizations a structured way to build and operate a comprehensive hardware security program. Recognized frameworks also provide a measure of legal and financial protection: in the event of a breach, demonstrating compliance can reduce regulatory penalty exposure. Horizontal frameworks include Common Criteria (ISO/IEC 15408), FIPS 140-3, IEC 62443, and PSA Certified. Sector-specific regulation is catching up: ISO/SAE 21434 (automotive), FDA Section 524B (medical), UN-ECE R155/R156 (vehicle type-approval), the EU Cyber Resilience Act (CRA), the NIS2 Directive, and more. AI-accelerated vulnerability discovery will increase the compliance burden by drastically compressing the timeline for the incident response process.
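The two-directional supply-chain lookup described in item 3 can be sketched as a small data model. This is an illustrative assumption, not a standard HBOM format: the `Chip` and `Product` classes, the function names, and all component, chip, product, and customer names below are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    components: frozenset[str]  # upstream: IP blocks and firmware modules inside the chip


@dataclass(frozen=True)
class Product:
    name: str
    customer: str
    chips: frozenset[str]  # downstream: chips built into this product


def chips_containing(component: str, chips: list[Chip]) -> set[str]:
    """Upstream query: which chips include the given component?"""
    return {chip.name for chip in chips if component in chip.components}


def impacted_products(
    component: str, chips: list[Chip], products: list[Product]
) -> list[tuple[str, str]]:
    """Downstream query: (customer, product) pairs that contain an affected chip."""
    hit = chips_containing(component, chips)
    return sorted((p.customer, p.name) for p in products if p.chips & hit)


# Hypothetical inventory, for illustration only.
chips = [
    Chip("soc-a", frozenset({"crypto-ip-1.2", "boot-fw-3.0"})),
    Chip("soc-b", frozenset({"crypto-ip-2.0", "boot-fw-3.0"})),
]
products = [
    Product("gateway-x", "customer-1", frozenset({"soc-a"})),
    Product("camera-y", "customer-2", frozenset({"soc-b"})),
    Product("router-z", "customer-1", frozenset({"soc-a", "soc-b"})),
]

print(impacted_products("crypto-ip-1.2", chips, products))
```

With an inventory of this shape, a vulnerability report against one IP block resolves to a list of customers and products to notify. A real HBOM would also need versioning, security annotations, and continuous updates, as the text notes.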

Conclusion

Recent frontier AI models have triggered a wake-up call in the software security industry. The drastic acceleration of vulnerability discoveries is changing the entire dynamic of the software development and delivery lifecycle.

Semiconductor executives have been alarmed by recent AI cybersecurity news. While notable hardware vulnerabilities discovered or exploited by AI models have yet to be reported, it’s reasonable to assume they will occur soon, though not necessarily in the same way as in software.

AI can learn architectural patterns from open-source hardware projects. It can ingest the openly published body of hardware-security research, and it can automate the vast volumes of tedious manual work that has historically gated discovery of hardware security issues. The impact will be significant. Hardware mitigations range from limited firmware updates to the need to physically replace entire chips and chiplets, all more complex and costly than software patches.

There are multiple opportunities to prepare for such a scenario. Most important, semiconductor companies need to treat security assurance as a first-class business objective, on par with innovation, quality, and time-to-market. This includes instituting rigorous security verification with crisp security signoff during the chip design phase, establishing a comprehensive incident response program, maintaining upstream and downstream supply-chain visibility supported by a security-annotated HBOM, and meeting regulatory compliance requirements.

The risk on the horizon is that sophisticated AI-automated attacks combining known and zero-day software and hardware vulnerabilities could impact the industry in ways it has not seen before. Preparing now is significantly cheaper than reacting after the first headline-grabbing hardware security emergency strikes.
