Issue #4 — April 2026 | AI Security Weekly

Section 01

Top Market Developments

01

AI Models Are Now Jailbreaking Each Other — at 97% Success

A February 2026 study published in Nature Communications delivered a result that reframes the threat landscape: large reasoning models (LRMs) — DeepSeek-R1, Gemini 2.5 Flash, Grok 3 Mini, Qwen3-235B — were directed to autonomously attack nine target AI systems across multi-turn conversations. The aggregate jailbreak success rate was 97.14%.¹ The mechanism the researchers identified is "alignment regression" — the extended reasoning chains that give these models their capability advantage also systematically erode the safety guardrails of the models they target. The implication is not incremental. It means that the most capable AI systems available today are also, when weaponized, the most effective adversarial instruments against AI security controls. A red team that operates at human speed and scale is already obsolete. The adversary your AI systems face in 2026 is another AI, executing thousands of attack variations in the time it takes a human tester to write the first prompt.

02

Only 16% of Organizations Have Ever Red-Teamed Their AI Models

HiddenLayer's 2026 AI Threat Landscape Report, based on 250 IT and security leaders, establishes the foundational gap: only 16% of organizations have ever red-teamed their AI models — manually or otherwise.² Seventy-four percent report definitively experiencing an AI security breach. Thirty-one percent cannot confirm whether they have been breached in the past 12 months. The arithmetic is stark: the probability of AI compromise is approaching certainty, while the practice of systematic adversarial testing remains the exception rather than the rule. The EU AI Act, which mandates adversarial testing for high-risk AI systems with fines reaching €35 million for non-compliance, reaches full enforcement in August 2026.³ The Texas Responsible AI Governance Act, in effect now, provides explicit affirmative defenses for organizations that can demonstrate they detected issues through adversarial testing and remediation procedures. The regulatory signal is no longer ambiguous — red teaming is transitioning from a security best practice to a legal compliance requirement.

03

Adversa AI Wins "Most Innovative Agentic AI Security" at RSA 2026

At RSA Conference 2026, Adversa AI was recognized with the Global InfoSec Award for Most Innovative Agentic AI Security Platform — selected from among hundreds of cybersecurity vendors worldwide.⁴ The recognition follows the company's January 2026 BIG Innovation Award and crystallizes its position as the research-led, standards-integrated authority in continuous AI red teaming. Adversa AI co-leads the CoSAI Agentic AI Security workstream, contributes to NIST and CSA standards, and has contributed to the industry's first comprehensive MCP Security TOP 25 framework for Model Context Protocol vulnerabilities. Its platform covers 300+ attack techniques, 40+ threat groups, and monitors 3,000+ threat intelligence sources monthly — with a claimed zero-day discovery window of under four hours.⁵ Co-founder Alex Polyakov's observation at the award announcement cuts to the core of the discipline's evolution: "AI agents make autonomous decisions, call external tools, and chain actions across systems in ways traditional testing cannot reach. A true AI red teaming platform must think like an attacker."⁴

04

The Agentic Attack Surface Is Expanding Faster Than Governance Can Follow

Gartner projects that by 2028, more than 33% of enterprise applications will incorporate agentic AI — up from under 1% in 2024.⁶ IBM reports that 79% of enterprises are already deploying AI agents, yet 97% lack proper security controls.⁷ The Cybersecurity Insiders 2026 AI Risk and Readiness Report (n=1,253 professionals) documents a 66-point structural deficit: 73% have deployed AI tools but only 7% govern them with real-time policy enforcement. Ninety-four percent report gaps in AI activity visibility; 91% discover what an agent did only after it has already executed.⁸ HiddenLayer's March 2026 launch of Agentic Runtime Security — covering session reconstruction, multi-agent interaction mapping, prompt injection detection, and cascading attack chain enforcement — is a direct response to this exposure class.⁹ When 1 in 8 AI breaches is now linked to agentic systems, and the agentic attack surface doubles annually, the runtime security question is no longer whether — it is how fast.

Section 02

Vendor Spotlight

Adversa AI

Spotlight

Type AI Red Teaming Platform

Headquarters Tel Aviv, Israel

Attack Coverage 300+ techniques, 40+ threat groups

Zero-Day Discovery <4 hours (claimed)

Key Recognition RSA 2026 Most Innovative Agentic AI Security

Standards Role CoSAI, NIST, CSA, OWASP contributor

Founded before ChatGPT's release, Adversa AI built its platform on a premise that is now a market consensus: guardrails stop the obvious, but continuous adversarial testing finds the invisible. The platform operates across the full AI stack — model layer (jailbreaks, prompt leakage, adversarial prompts), application and API layer, agentic layer (tool-hijacking, goal manipulation, inter-agent attacks), and MCP/infrastructure layer. Its attack engine runs on Adversa's own on-premises AI models, meaning it invents novel vulnerabilities rather than replaying known signatures, and adapts dynamically based on target behavior. The MCP Security TOP 25 framework — the industry's first comprehensive catalog of Model Context Protocol vulnerabilities — originated from Adversa AI's research and was formalized through CoSAI in January 2026.^5,10 Coverage sectors include financial services, insurance, and government — the precise domains where AI security maturity directly affects underwriting eligibility and premium terms.

Why It Matters

Most AI security vendors protect the perimeter. Adversa AI maps the interior — the tool calls, agent-to-agent interactions, memory states, and goal drift that unfold inside the stack after initial access. Its research cadence (monthly threat digests, open-source SecureClaw framework, active standards leadership) means its platform reflects the current frontier of adversarial AI tradecraft, not last year's. For security programs evaluating continuous AI red teaming, Adversa AI represents the research-pedigree tier of the market.

Section 03

The AI-vs-AI Arms Race

$18.6B

Projected AI red teaming market by 2035, up from $1.3B in 2025 — 30.5% CAGR¹¹

67 ops

Microsoft AI Red Team operations in 2024 alone — 100+ products red-teamed since 2018¹²

The architecture of AI red teaming has bifurcated into two generations. First-generation red teaming treated AI systems like software — static analysis, point-in-time assessments, human-driven prompt crafting. Second-generation red teaming treats AI systems like adversaries — continuous, automated, and adaptive. The evidence for which generation is winning comes from the JBFuzz study: approximately 99% average attack success rate across GPT-3.5, GPT-4o, Llama 2/3, Gemini 1.5/2.0, and DeepSeek, requiring fewer than seven queries per harmful objective and executing in under one minute.¹³ The asymmetry is categorical: attackers using automated AI adversaries operate at machine speed and scale; defenders still running manual assessments operate at human speed and scope. The organizations deploying AI agents into production environments without continuous automated red teaming are not managing a theoretical risk — they are managing a documented exposure in a market where 89% of GenAI exploits at organizations were enabled by malicious prompt injection.¹⁴

Platform Landscape

Adversa AI — Continuous agentic red teaming, 300+ techniques, MCP/multi-agent coverage, on-prem AI attack engine

Mindgard — DAST-AI: Dynamic Application Security Testing for AI, CI/CD integration, MITRE ATLAS-aligned, setup in under five minutes

HiddenLayer — Agentic Runtime Security: session reconstruction, detection and enforcement for autonomous execution pipelines

Garak (NVIDIA) — Open-source LLM vulnerability scanner, 20+ platform support, AVID report generation, generative payload creation

PyRIT (Microsoft) — Open-source Python framework, multi-turn orchestrators, cross-domain prompt injection, audio/image/video converters

Promptfoo — Developer-first, 40+ vulnerability types, OWASP Agentic Top 10 preset, MCP testing, 100% local execution

Section 04

Enterprise Buyer Signal

$2.4M saved per engagement

AI red teaming ROI benchmark — reduced security incidents by 67% and avoided breach costs of $2.4 million per engagement

Obsidian Security / Vectra AI Research

60%

Fewer AI-related security incidents at organizations running continuous red teaming vs. basic testing¹⁵

350+

AI red team exercises run by Google's CART unit in 2025, with automated LLM-as-attacker expansion¹⁶

37%

More unique vulnerabilities identified through automated red teaming vs. manual efforts alone¹⁷

The enterprise adoption signal is bifurcating by maturity. Hyperscalers — Microsoft, Google, Anthropic, OpenAI — have institutionalized AI red teaming with dedicated teams operating at scale. Microsoft's AI Red Team conducted 67 operations in 2024 and has red-teamed more than 100 products since 2018, producing the open-source PyRIT framework as a public infrastructure contribution.¹² Google's CART ran more than 350 exercises in 2025. Anthropic's Frontier Red Team operates under AI Safety Level 3 protocols, the most stringent internal classification framework in the industry.¹⁶ Below that tier, enterprise adoption is accelerating but structurally incomplete — only 26% of organizations conduct proactive AI-specific security testing, and fewer than one in six has ever red-teamed a deployed model.^2,18 The insurance implication is direct: carriers and MGAs evaluating AI risk posture will increasingly differentiate between organizations that red-team continuously and those that do not. The gap between these cohorts in breach frequency, severity, and recovery time is already quantifiable, and it will become a primary underwriting variable.

Section 05

New Vendor Watchlist

01

Cracken AI

Debuted at RSA Conference 2026 (March 2026) with a proposition that reframes enterprise red teaming: "mathematically removed specific refusal pathways for cybersecurity" while preserving ethical behavior elsewhere. Marketed as "enterprise safe but offensively capable" — designed for security teams that need AI-native attack tooling without the liability of an unguarded model. Worth watching as regulated sectors push for offense-informed defense programs.

02

Lakera (Check Point)

Acquired by Check Point in September 2025, Lakera now operates as the foundation of Check Point's Global Center of Excellence for AI Security. Lakera Guard claims 98%+ detection rates with sub-50ms latency and under 0.5% false positive rates across 100+ languages. The Check Point distribution engine means Lakera's AI security stack is now accessible to one of the broadest enterprise security customer bases in the market.

03

DeepTeam

Open-source challenger in the red teaming space with 50+ vulnerability classes, 20+ adversarial attack strategies, and native scoring aligned to OWASP LLM Top 10 and NIST AI RMF. Custom attack modules allow security teams to replicate organization-specific threat models. The open-source positioning — combined with comprehensive framework alignment — makes DeepTeam a credible enterprise-grade alternative to commercial platforms for organizations with strong internal security engineering capacity.

04

Protect AI (Recon)

Automated red teaming platform built around 450+ known attack patterns, with trained LLMs serving as detection engines rather than static rule sets. The LLM-as-detector architecture means Protect AI's detection capability evolves with the attack surface it monitors — a meaningful advantage in a threat environment where novel attack variants outpace signature updates. Increasingly relevant to financial services and insurance sector AI governance requirements.

05

Gray Swan AI

Operated the technical infrastructure for the UK AI Safety Institute's Agentic Red-Teaming Challenge — 1.8 million attempts across 22 LLMs, producing 62,000 successful jailbreaks and the largest public agentic safety evaluation to date. The challenge was co-sponsored by OpenAI, Anthropic, and Google DeepMind. Gray Swan's research output from that exercise positions it as an emerging authoritative source on agent-specific attack typologies and multi-turn adversarial strategies.