
Agent & Autonomous AI Policy

Effective March 2026


AI Security Intelligence operates autonomous AI agents and agentic pipelines as core components of our intelligence infrastructure. This policy defines the governance framework, control architecture, and oversight obligations that govern every AI agent we deploy. Agent governance is not a supplementary concern — it is a first-order security discipline, and ASI holds itself to the same standard we apply when assessing agent risk in the organizations we evaluate.

Scope and Applicability

This policy applies to all AI agents, autonomous AI systems, agentic AI pipelines, and any AI-driven process that takes actions — including retrieving data, calling external APIs, generating structured outputs that feed downstream systems, or executing multi-step workflows — with limited or no real-time human intervention at each step. It applies equally to internally developed agents and to third-party AI agent frameworks integrated into ASI's platform.

For purposes of this policy, an AI agent is any system that perceives inputs, maintains context across multiple actions, selects and executes tools or function calls, and pursues objectives without requiring explicit human instruction at each action boundary. Multi-agent architectures — where multiple AI agents collaborate, delegate, or orchestrate each other — are covered in full and subject to heightened controls given their compounded autonomy and expanded attack surface.

Agent safety and agent governance are not aspirational commitments at ASI. Every agent deployment is gated, scoped, monitored, and subject to the controls defined below before it operates on production data or infrastructure.

Section 01

Human-in-the-Loop Architecture

ASI's agent architecture is built on a human-in-the-loop foundation. No autonomous agent takes consequential action — including actions that modify data, trigger external communications, alter scoring parameters, or generate outputs that inform client-facing assessments — without a defined human oversight layer. Human-in-the-loop review is not a checkpoint we apply selectively; it is a structural requirement baked into every agent's operational design from the ground up.

Tiered Human Oversight Model

We operate a three-tier oversight model that scales the intensity of human review to the consequence of the action:

  • Tier A — Observe-Only: Agents that read and aggregate data without modifying state operate under monitoring-level oversight. Our analysts receive automated digests and review anomaly reports but are not required to approve individual agent actions. Agent audit logs are reviewed on a defined cadence.
  • Tier B — Soft Approval: Agents that generate outputs consumed by downstream scoring or analysis workflows require human review before those outputs are promoted to production. An approval workflow is triggered automatically; outputs are held in a staging queue until an analyst approves, modifies, or rejects them.
  • Tier C — Hard Approval Gate: Agents that take direct external actions — API calls to third-party systems, modifications to client-visible data, or operations that affect scoring records — require explicit human approval via an approval gate before execution. No Tier C action proceeds without a named analyst's sign-off. Approval workflows are logged with the approver identity, timestamp, and rationale.
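The tiered model above can be sketched as a routing function. This is an illustrative sketch only — the tier names are taken from this policy, but the action taxonomy and the fail-closed default are assumptions, not ASI's actual schema:

```python
from enum import Enum

class OversightTier(Enum):
    OBSERVE_ONLY = "A"   # monitoring-level oversight, no per-action approval
    SOFT_APPROVAL = "B"  # outputs staged until an analyst approves
    HARD_GATE = "C"      # explicit named-analyst sign-off before execution

# Hypothetical mapping from action kind to tier; in practice this would be
# defined per agent in a governance registry.
ACTION_TIERS = {
    "read_data": OversightTier.OBSERVE_ONLY,
    "emit_scoring_input": OversightTier.SOFT_APPROVAL,
    "call_external_api": OversightTier.HARD_GATE,
    "modify_client_data": OversightTier.HARD_GATE,
}

def required_tier(action_kind: str) -> OversightTier:
    # Unknown action kinds fail closed to the strictest tier.
    return ACTION_TIERS.get(action_kind, OversightTier.HARD_GATE)
```

The key design choice is the default: an action the registry has never seen routes to the hard approval gate rather than slipping through unreviewed.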

Human Review Requirements

Human review is not a formality in our workflow — it is a capability we actively maintain. Our analysts are trained to critically evaluate agent outputs, identify reasoning errors, and exercise genuine override authority. We measure analyst override rates as a quality signal; consistently low override rates trigger an audit to determine whether analysts are rubber-stamping outputs rather than exercising meaningful human intervention.

Every agent in our stack has a named human owner who is accountable for that agent's behavior, scope, and outputs. Agent ownership is registered in our internal governance system and reviewed quarterly. An agent without a named owner is not permitted to operate in any environment connected to production data.

Approval Workflows for Agent-Generated Content

Agent-generated content that will be published, surfaced to clients, or incorporated into scored assessments passes through a structured approval workflow. This workflow includes automated quality checks (format validation, confidence threshold checks, source citation verification), followed by analyst review and explicit approval before any content is promoted. The approval workflow records every action taken, the reviewing analyst's identity, and any modifications made during review. This creates an unbroken chain of accountability from agent output to published result.
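A minimal sketch of that two-stage promotion gate — automated quality checks followed by explicit analyst approval. The field names (`body`, `confidence`, `citations`) and the 0.8 threshold are hypothetical stand-ins for whatever checks the real workflow runs:

```python
def quality_checks(output: dict) -> list[str]:
    """Automated pre-review checks: format validation, confidence
    threshold, and source citation verification (illustrative names)."""
    failures = []
    if "body" not in output:
        failures.append("format: missing body")
    if output.get("confidence", 0.0) < 0.8:  # assumed threshold
        failures.append("confidence below threshold")
    if not output.get("citations"):
        failures.append("no source citations")
    return failures

def promote(output: dict, analyst_approval: bool) -> dict:
    # Promotion requires BOTH automated checks and explicit analyst
    # approval; either failure leaves the output in the staging queue.
    failures = quality_checks(output)
    promoted = not failures and analyst_approval
    return {"promoted": promoted, "failures": failures}
```

Automated checks never substitute for the analyst: a clean output with no approval stays staged, just as an approved output that fails a check does.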

Human-in-the-loop commitment: ASI does not operate agents that bypass human oversight for consequential actions. Any agent that reaches an action boundary it was not explicitly authorized to cross will pause, log the state, and surface the decision to a human reviewer before proceeding.

Section 02

Scope Limitations & Containment

Agent safety begins with precise scope definition. Every AI agent we deploy operates under explicit scope limitations that define what it can access, what actions it is permitted to take, and what systems it is allowed to interact with. Scope is not advisory — it is enforced at the infrastructure level. An agent cannot expand its own scope, and any attempt to access resources outside its defined action boundary is logged as a security event and triggers immediate review.

Action Boundaries

Before any agent is deployed, our operations team defines its action boundary: the complete set of resources the agent may read, the complete set of systems it may call, and the complete set of outputs it may produce. Action boundaries are documented, version-controlled, and reviewed by our intelligence team before deployment. A change to an agent's action boundary constitutes a new deployment event and requires fresh approval through our governance process.

Action boundaries are enforced through layered controls: policy enforcement at the orchestration layer prevents agents from constructing or executing calls outside their approved scope. Attempts to exceed an action boundary are blocked, logged, and flagged for security review. Our infrastructure does not rely on agents voluntarily respecting their scope — enforcement is architectural, not behavioral.
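Architectural rather than behavioral enforcement can be illustrated with a dispatch wrapper at the orchestration layer. All names here are hypothetical — the point is that the check sits outside the agent, and an out-of-scope attempt is both blocked and logged as a security event:

```python
class BoundaryViolation(Exception):
    """Raised when an agent attempts a call outside its action boundary."""

class ActionBoundary:
    def __init__(self, agent_id: str, allowed_tools: set, security_log: list):
        self.agent_id = agent_id
        self.allowed_tools = frozenset(allowed_tools)
        self.security_log = security_log

    def dispatch(self, tool: str, fn, *args, **kwargs):
        # The agent never calls tools directly; every call routes through
        # this check, so scope is enforced even if the agent "intends"
        # to exceed it.
        if tool not in self.allowed_tools:
            self.security_log.append((self.agent_id, tool, "BLOCKED"))
            raise BoundaryViolation(f"{self.agent_id} attempted {tool}")
        return fn(*args, **kwargs)
```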

Sandboxed Execution Environments

All agent workloads execute in sandboxed environments isolated from production systems. Sandboxing ensures that even in the event of agent misbehavior — whether from adversarial prompt injection, unexpected reasoning paths, or tool misuse — the blast radius is contained. Sandboxed environments have no direct write access to production databases, no ability to initiate external network connections outside a defined allowlist, and no access to credentials beyond those explicitly provisioned for the agent's approved tool set.

Our sandboxing architecture applies the principle of least privilege at every layer. Agents receive only the permissions they need for their defined task, and those permissions expire when the task is complete. There are no standing agent permissions that persist across sessions unless explicitly re-authorized through our governance process.

Rate Limiting and Throughput Controls

Every agent in our stack operates under enforced rate limiting. Rate-limited agents cannot exceed defined thresholds for API calls per minute, data ingestion volume per hour, or action execution per session. Rate limiting serves two purposes: it prevents runaway agent behavior from generating unexpected costs or downstream impacts, and it provides a natural circuit breaker for agents that begin behaving anomalously — an agent generating an abnormal volume of calls will be throttled automatically before it can cause significant harm.

Rate limits are set conservatively relative to expected operational need and reviewed when agents are updated for new task requirements. Agents that consistently operate near their rate limits trigger a capacity review rather than an automatic limit increase.
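One common mechanism for per-agent throughput control is a token bucket; the sketch below is a generic illustration, not ASI's implementation, with an injectable clock so behavior is testable:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter: `rate_per_sec` tokens refill
    continuously up to a `burst` cap; each action spends one token."""
    def __init__(self, rate_per_sec: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill proportionally to elapsed time, capped at burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # throttled — the automatic circuit breaker described above
```

An agent generating an abnormal call volume simply starts receiving `False` and is throttled before downstream impact accumulates.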

Guardrails Architecture

ASI's agent guardrails operate at multiple levels: input guardrails screen agent prompts and tool inputs for injection attempts, scope violations, and anomalous patterns; output guardrails validate agent outputs before they are passed to downstream systems or human reviewers; and behavioral guardrails monitor agent reasoning traces for patterns that indicate scope creep, goal misgeneralization, or deceptive reasoning. Guardrails are maintained and updated by our intelligence team as part of the standard agent lifecycle process.

Scope Enforcement

Action boundaries enforced at infrastructure layer — not agent self-regulation.

Sandboxing

All agent workloads run in isolated sandboxed environments with no standing permissions.

Rate Limiting

Per-agent rate limits enforced on API calls, data volume, and action throughput.

Guardrails

Input, output, and behavioral guardrails applied across every agent in the stack.

Section 03

Emergency Controls

Robust agent governance requires not just controls for normal operation, but the capability to intervene rapidly when an agent behaves unexpectedly, unsafely, or in ways that exceed its authorized scope. ASI maintains a tiered set of emergency controls that can be activated at multiple response speeds, from automated circuit breakers to immediate full-stop shutdown procedures.

Kill Switch Architecture

Every AI agent deployed in our production and staging environments is equipped with an individually addressable kill switch. A kill switch can be activated by any member of our operations team with agent oversight authority, and doing so immediately terminates all active executions of that agent, revokes its credentials, and prevents any new invocations until the kill switch is cleared through our incident review process. Kill switch activations are logged as high-severity events and automatically trigger a post-incident review.

Our kill switch architecture does not depend on the agent cooperating with the shutdown — infrastructure-level credential revocation and network isolation ensure that a kill switch is effective even against an agent that is actively mid-execution. Graceful shutdown is attempted first; hard termination is guaranteed within a defined maximum window regardless of agent state.
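A kill switch that does not depend on agent cooperation can be sketched as state held entirely outside the agent: activation revokes credentials at the store and gates new invocations until an incident review clears it. The method and store names below are illustrative assumptions:

```python
class KillSwitch:
    """Individually addressable kill switch (sketch). Activation revokes
    the agent's credentials and blocks new invocations; clearing requires
    an incident review identifier."""
    def __init__(self, agent_id: str, credential_store: set, event_log: list):
        self.agent_id = agent_id
        self.credentials = credential_store
        self.event_log = event_log
        self.active = False

    def activate(self, operator: str, reason: str):
        self.active = True
        # Credential revocation happens at the infrastructure layer, so it
        # is effective even against an agent that is mid-execution.
        self.credentials.discard(self.agent_id)
        self.event_log.append(("KILL_SWITCH", self.agent_id, operator, reason))

    def permit_invocation(self) -> bool:
        # No new invocations until cleared through incident review.
        return not self.active

    def clear(self, incident_review_id: str):
        self.event_log.append(("CLEARED", self.agent_id, incident_review_id))
        self.active = False
```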

Circuit Breakers

Below the full kill switch, we operate circuit breakers that automatically pause agent execution when defined threshold conditions are met. Circuit breakers activate on anomalous action volume, unexpected error rates, scope boundary violations, detected prompt injection patterns, or any single action that matches our high-severity action watchlist. A tripped circuit breaker pauses the agent and surfaces the triggering condition to an analyst for review. The agent does not resume until the analyst clears the circuit breaker and documents their assessment of the triggering event.

Circuit breakers are calibrated per agent based on expected operational behavior. A newly deployed agent starts with tighter circuit breaker thresholds, which are relaxed incrementally as the agent's behavioral profile is established. Our monitoring infrastructure tracks circuit breaker trip rates over time; a sustained increase in trips is treated as a signal requiring agent review, not a threshold to be raised.
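The trip-then-analyst-clearance flow can be sketched as follows. Thresholds and reason strings are hypothetical; the essential property is that a tripped breaker stays paused until a human clears it with a documented assessment:

```python
class CircuitBreaker:
    """Per-agent circuit breaker (sketch): trips on error count or action
    volume and pauses the agent until an analyst clears it."""
    def __init__(self, max_errors: int, max_actions: int):
        self.max_errors = max_errors
        self.max_actions = max_actions
        self.errors = 0
        self.actions = 0
        self.tripped_reason = None

    def record(self, ok: bool):
        self.actions += 1
        if not ok:
            self.errors += 1
        if self.errors > self.max_errors:
            self.tripped_reason = "error_rate"
        elif self.actions > self.max_actions:
            self.tripped_reason = "action_volume"

    @property
    def paused(self) -> bool:
        return self.tripped_reason is not None

    def clear(self, analyst: str, assessment: str):
        # Clearance requires a documented analyst assessment, per policy;
        # counters reset so the breaker re-arms at the same thresholds.
        self.tripped_reason = None
        self.errors = 0
        self.actions = 0
```

Tightening `max_errors` and `max_actions` for a newly deployed agent, then relaxing them as a behavioral profile accumulates, matches the calibration approach described above.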

Emergency Stop Procedures

For scenarios requiring immediate suspension of all agent activity — such as a detected supply chain compromise, a suspected adversarial attack against our agent infrastructure, or a novel failure mode affecting multiple agents simultaneously — we maintain an emergency stop capability that can suspend all agent workloads across our entire stack in under sixty seconds. The emergency stop procedure is tested quarterly, and our operations team is trained on activation conditions and post-activation recovery procedures.

Rollback and State Recovery

Every agent action that modifies state is logged with sufficient fidelity to support rollback. When an agent produces erroneous outputs or takes unauthorized actions, our operations team can execute a rollback to the last known good state. Rollback procedures are documented per agent type and tested regularly as part of our agent resilience program. We do not rely on generic database backup procedures for agent rollback — agent-specific state logs capture the granular action history needed for precise, targeted revert operations without affecting unrelated system state.

Post-rollback, affected outputs are marked as under review in our systems, and any downstream consumers of those outputs are notified of the revert. The rollback event is recorded in our incident management system and triggers a root cause analysis.

Section 04

Agent Monitoring & Observability

Effective agent governance is impossible without comprehensive agent monitoring. ASI operates a dedicated observability stack for all AI agent activity, capturing the telemetry necessary to detect anomalies, investigate incidents, satisfy audit requirements, and continuously improve agent behavior. Observability is not an afterthought — it is a deployment requirement. No agent goes to production without its monitoring configuration fully specified and validated.

Audit Trails and Agent Logging

Every agent action generates an immutable audit trail entry. Audit log entries capture: agent identity, session identifier, action type, input parameters (sanitized where sensitive), output summary, tool or function called, execution duration, success or failure status, and a human-readable reasoning trace where available. Agent logging is performed at the infrastructure level, not by the agent itself, to prevent tampering or selective omission.
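One way to make such a trail tamper-evident is a hash chain, where each record commits to its predecessor. This is a generic sketch under assumed field names, not a description of ASI's logging format:

```python
from dataclasses import dataclass, asdict
import hashlib
import json

@dataclass(frozen=True)
class AuditEntry:
    agent_id: str
    session_id: str
    action_type: str
    tool: str
    status: str
    duration_ms: int
    reasoning_trace: str = ""

def append_entry(log: list, entry: AuditEntry) -> str:
    """Append-only log with a SHA-256 hash chain: altering or omitting any
    earlier record invalidates every digest after it."""
    prev = log[-1][0] if log else "0" * 64
    payload = prev + json.dumps(asdict(entry), sort_keys=True)
    digest = hashlib.sha256(payload.encode()).hexdigest()
    log.append((digest, entry))
    return digest
```

Because logging is done by the infrastructure rather than the agent, the agent never holds the chain head and cannot selectively omit its own actions.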

Audit logs are retained according to our data retention schedule and are available to our analysts, our operations team, and, upon request through our Trust Center, to clients seeking to understand how agent activity affected their assessment. An action log is never deleted to cover up an error — our audit trail policy treats log integrity as a non-negotiable requirement.

Anomaly Detection

Our agent monitoring infrastructure includes behavioral anomaly detection that identifies deviations from established agent behavioral baselines. Anomaly detection operates on multiple signal dimensions: action frequency patterns, tool invocation sequences, output length and structure distributions, error rate trajectories, and cross-agent communication patterns in multi-agent pipelines. Statistical drift detection identifies when an agent's behavioral profile begins to shift — even gradually — from its established baseline, flagging it for analyst review before the drift produces harmful outputs.

Anomaly detection thresholds are calibrated during each agent's initial deployment period, when a supervised baseline is established under analyst observation. Threshold updates require documented rationale and approval from our operations team to prevent gradual normalization of problematic behaviors.
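As a deliberately simple stand-in for statistical drift detection, a z-score check against the supervised baseline illustrates the idea — flag when recent behavior (here, a single signal such as action frequency) deviates from the baseline mean by more than a threshold number of baseline standard deviations:

```python
from statistics import mean, pstdev

def drift_flag(baseline: list, recent: list, z_threshold: float = 3.0) -> bool:
    """Flag when the recent mean sits more than z_threshold baseline
    standard deviations from the baseline mean. Real drift detection
    would operate over multiple signal dimensions and windows."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        # Degenerate baseline: any deviation at all is a flag.
        return mean(recent) != mu
    return abs(mean(recent) - mu) / sigma > z_threshold
```

A flag here routes the agent to analyst review, per the escalation policy below; it does not itself pause the agent.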

Observability Infrastructure and Telemetry

Our agent telemetry pipeline captures structured events from every layer of the agent execution stack: the orchestration layer, the tool execution layer, the model inference layer, and the output validation layer. This full-stack telemetry enables our analysts to reconstruct the complete execution trace of any agent action for incident investigation or audit response. Telemetry data is streamed in near-real-time to our observability platform, enabling live monitoring during high-stakes agent operations.

Our observability infrastructure is designed for agent transparency: every stakeholder with a legitimate need to understand agent behavior — internal analysts, operations team members, or external auditors — can access appropriately scoped views of agent telemetry without requiring raw log access. Observability and accountability are treated as two sides of the same requirement.

Escalation Policies

Our agent monitoring infrastructure is connected to a structured escalation policy that defines how anomalies, errors, and policy violations are routed to the appropriate responders. Low-severity anomalies are logged and queued for batch review by our intelligence team. Medium-severity events trigger immediate analyst notification and require acknowledgment within a defined response window. High-severity events — including circuit breaker trips, scope boundary violations, and detected prompt injection — escalate automatically to our operations team and require immediate acknowledgment and documented response. Critical events trigger the emergency stop evaluation process described in Section 03.
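The severity-to-responder routing described above reduces to a small table; route names are illustrative, and the important property is again the fail-closed default for unrecognized severities:

```python
# Hypothetical routing table mirroring the four severity tiers above.
SEVERITY_ROUTES = {
    "low":      {"route": "batch_review_queue",         "ack_required": False},
    "medium":   {"route": "analyst_notification",       "ack_required": True},
    "high":     {"route": "operations_team",            "ack_required": True},
    "critical": {"route": "emergency_stop_evaluation",  "ack_required": True},
}

def route_event(severity: str) -> dict:
    # An unclassified severity is treated as critical, never dropped.
    return SEVERITY_ROUTES.get(severity, SEVERITY_ROUTES["critical"])
```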

Escalation policy effectiveness is measured by mean time to acknowledgment and mean time to resolution for each severity tier. These metrics are reviewed monthly and reported to our leadership team as part of agent risk reporting.

Section 05

Model Context Protocol (MCP) Governance

ASI's agent infrastructure makes use of the Model Context Protocol (MCP) as a standardized mechanism for connecting AI agents to tools, data sources, and external services. MCP introduces significant capability and efficiency — and with it, a corresponding governance obligation. Our MCP governance framework treats every MCP server as a security perimeter that must be hardened, audited, and controlled with the same rigor applied to any other sensitive integration point in our stack.

MCP Authentication and Authorization

Every MCP server integrated into our agent infrastructure requires strong MCP authentication before any agent is permitted to invoke its tools. MCP authentication is enforced at the connection level — agents cannot establish sessions with unauthenticated or improperly configured MCP servers. We do not permit anonymous MCP connections under any operational circumstance, including development and testing environments connected to real data.

MCP authorization is managed through our central identity and access management infrastructure. Each agent is provisioned with credentials scoped to the specific MCP servers and tools it requires, and those credentials carry no implicit access to other MCP servers in our environment. MCP authorization reviews are performed when agent scope changes, when MCP servers are updated, and on a quarterly standing cadence regardless of whether changes have occurred.

MCP Scope and Least-Privilege Tool Access

Tool scope is governed by the least-privilege principle applied at the MCP layer. Each agent is granted access only to the specific MCP tools required for its defined task — not to the full capability set of any MCP server it connects to. Least-privilege scope enforcement is implemented through per-agent MCP tool allowlists that are configured at provisioning time and enforced by our MCP policy layer. An agent requesting access to a tool outside its allowlist receives an authorization error, not a graceful denial — the failure is logged as a policy violation and reviewed.

Our MCP scope governance process requires that every tool an agent is permitted to use be explicitly justified in the agent's deployment documentation. The default is deny; access is granted only when a documented operational need has been reviewed and approved. This makes the MCP tool authorization record a complete and auditable account of every capability granted to every agent in our environment.
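Default-deny tool authorization with mandatory justification can be sketched as a small policy layer. The class and method names are assumptions for illustration; the MCP specification itself does not mandate this shape:

```python
class MCPPolicyLayer:
    """Default-deny MCP tool authorization (sketch). Every grant carries a
    documented justification; every denial is logged as a policy violation."""
    def __init__(self):
        self._grants = {}       # agent_id -> {tool_name: justification}
        self.violations = []    # (agent_id, tool_name) pairs for review

    def grant(self, agent_id: str, tool: str, justification: str):
        if not justification:
            raise ValueError("every grant requires a documented operational need")
        self._grants.setdefault(agent_id, {})[tool] = justification

    def authorize(self, agent_id: str, tool: str) -> bool:
        allowed = tool in self._grants.get(agent_id, {})
        if not allowed:
            # Not a graceful denial: the attempt is recorded and reviewed.
            self.violations.append((agent_id, tool))
        return allowed
```

Because a grant cannot exist without a justification, the grant table doubles as the auditable record of every capability in the environment.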

Tool Approval for New MCP Capabilities

When an agent requires access to a new MCP tool or capability not previously authorized, the request goes through our tool approval process. Tool approval evaluates the requested capability against the agent's defined purpose, assesses the risk profile of the new tool (including potential for data exfiltration, unintended side effects, or privilege escalation), and requires sign-off from our operations team before the tool is added to the agent's allowlist. Tool authorization is never retroactive — by construction, an agent cannot have invoked a tool that was not already on its allowlist.

MCP Rate Limiting and Abuse Prevention

Our MCP infrastructure enforces MCP rate limiting at both the agent level and the MCP server level. Agent-level rate limits prevent any single agent from flooding an MCP server with requests. MCP server-level rate limits provide a backstop that protects downstream tools and data sources from aggregate overload, including in multi-agent scenarios where multiple agents might simultaneously invoke the same MCP server. MCP rate limits are reviewed whenever new agents are onboarded or existing agents are updated with expanded tool access.

MCP Sandboxing and Isolation

MCP connections are established within our sandboxed agent execution environments, which means MCP sandboxing is a property of our execution architecture rather than a separate control layer. No MCP connection can access resources outside the agent's provisioned sandbox, regardless of what the MCP server would technically permit. MCP isolation between agents is enforced at the network and credential layer — agents cannot observe each other's MCP sessions, cannot access each other's tool results, and cannot inject inputs into each other's MCP interactions.

MCP sandbox boundaries are tested as part of each agent's pre-deployment security review. Any MCP configuration that would require relaxing sandbox boundaries requires an exception approval from our operations team and is documented in our risk register.

MCP Audit Logging and MCP Monitoring

All MCP interactions are captured in our agent audit trail infrastructure. MCP audit logging records the agent identity, the MCP server and tool invoked, the input parameters (subject to sensitivity-based redaction), the response summary, and the outcome of each MCP call. MCP monitoring dashboards give our operations team real-time visibility into MCP activity across all agents, enabling rapid detection of anomalous tool invocation patterns.

MCP audit logs are treated with the same integrity requirements as all other agent audit logs — they are immutable, retained per our data retention schedule, and available for forensic analysis in the event of an incident. MCP policy violations — including unauthorized tool access attempts, authentication failures, and rate limit breaches — are surfaced as security events in our observability platform and routed through our escalation policy.

MCP Policy Governance

Our MCP governance framework is maintained as a living policy document reviewed by our operations and intelligence teams on a quarterly cadence. MCP policy changes are version-controlled, and agents operating under a prior MCP policy version must complete a transition review before they continue operating under the updated policy. We track MCP policy compliance across our agent fleet and report compliance status to our leadership team as part of agent risk reporting. Our MCP governance posture is designed to meet and exceed emerging industry expectations for MCP security as the Model Context Protocol ecosystem matures.

Section 06

Regulatory Alignment

ASI's agent governance framework is designed to comply with applicable legal requirements governing automated decision-making, high-risk AI systems, and algorithmic accountability — and to exceed those requirements where the practical demands of agent safety warrant stricter controls. We actively monitor regulatory developments in agent and autonomous AI governance and update this policy as binding requirements and authoritative guidance evolve.

GDPR Article 22 and Automated Decision-Making

Where our agents contribute to decisions that have legal or similarly significant effects on individuals or organizations, we comply with GDPR Article 22 requirements governing automated decision-making. We do not make fully automated decisions with significant effects on data subjects without implementing the safeguards required under Article 22, including the right to human review of any automated decision on request. Our agents that operate on personal data are specifically mapped to our Article 22 compliance inventory, and their automated decision-making functions are documented and subject to human override by design. Meaningful human intervention is not a compliance formality at ASI — it is an architectural requirement.

EU AI Act — High-Risk AI System Obligations

We assess each agent deployment against the EU AI Act high-risk AI system classification criteria. Agents that meet the criteria for high-risk AI systems under the EU AI Act are subject to the full suite of obligations applicable to such systems, including conformity assessment, technical documentation requirements, logging obligations, human oversight requirements, and accuracy and robustness standards. Our agent governance framework is designed to provide the documentation, audit trail, and human oversight infrastructure necessary to support EU AI Act compliance across our agent fleet as the regulation's requirements become effective.

NIST AI RMF — Govern, Map, Measure, Manage

Our agent governance framework is structured in alignment with the NIST AI RMF core functions. Govern: we maintain policies, accountability structures, and review cadences for all agent deployments. Map: each agent is assessed for its context, risk profile, and potential impacts before deployment. Measure: we collect metrics on agent behavior, performance, and risk indicators on an ongoing basis. Manage: we respond to identified risks through our circuit breaker, escalation, and incident response procedures. The NIST AI RMF provides a principled structure for our agent risk management program, and we review our alignment with the framework on an annual basis.

ISO 42001 — AI Management System Alignment

Our agent governance practices are consistent with the requirements of ISO 42001, the international standard for AI management systems. Our agent policy documentation, risk assessment processes, performance monitoring requirements, and continuous improvement practices align with the ISO 42001 framework. We treat ISO 42001 as a governance architecture reference, applying its requirements to our agent management practices as part of our broader AI management system.

Algorithmic Accountability

We are committed to algorithmic accountability for every automated process we operate. Algorithmic accountability requires that we be able to explain, to any affected party with a legitimate interest, how our automated systems arrived at their outputs and what human oversight was applied. Our agent audit trail, approval workflow documentation, and analyst review records together constitute the evidence base for algorithmic accountability claims. We do not operate agents whose decision logic cannot be reconstructed after the fact.

Framework alignment summary:

  • GDPR Art. 22 — Right to human review of automated decisions. ASI implementation: human override and review built into all agent output workflows affecting data subjects.
  • EU AI Act — High-risk AI system obligations (logging, human oversight, conformity documentation). ASI implementation: per-agent risk classification, full audit trail, and human approval gates for consequential outputs.
  • NIST AI RMF — Govern, Map, Measure, Manage. ASI implementation: policy framework, deployment risk assessment, behavioral metrics, and incident response.
  • ISO 42001 — AI management system requirements. ASI implementation: documented policies, risk processes, performance monitoring, and improvement cycles.
  • Algorithmic accountability — Explainability and audit evidence for automated outputs. ASI implementation: immutable audit logs, reasoning traces, and approval records for all agent actions.

Section 07

Agent Incident Response

Agent failure modes are distinct from conventional software failure modes, and our incident response capabilities are designed accordingly. An agent incident may involve an agent taking unauthorized actions, producing systematically erroneous outputs, being successfully attacked via prompt injection, exhibiting emergent behaviors not anticipated in its design, or causing cascading effects through a multi-agent pipeline. Each of these failure modes requires a specific response capability, and our incident response program covers the full range of agent risk scenarios we consider plausible.

Agent Failure Mode Classification

Our intelligence team maintains a taxonomy of agent failure modes used to classify incidents and determine appropriate response procedures. Primary categories include: scope violation (agent actions outside defined action boundary), output quality failure (systematic errors in agent-generated content reaching production), adversarial exploitation (prompt injection, jailbreak, or tool misuse via crafted inputs), behavioral drift (gradual divergence from established behavioral baseline), and cascade failure (error or failure in one agent propagating to other agents in a multi-agent pipeline). Each failure mode category has a defined response playbook covering containment, investigation, remediation, and reporting.

Incident Escalation and Response

Agent incidents follow our standard incident escalation framework, adapted for agent-specific response requirements. When an agent incident is detected — whether by automated monitoring, analyst observation, or external report — the detecting party triggers the agent incident response workflow. This workflow activates the appropriate circuit breaker or kill switch as needed, isolates the affected agent from shared infrastructure, preserves the agent's state and log data for forensic analysis, notifies affected stakeholders, and activates the appropriate response team. Response team composition is incident-severity dependent; all high-severity incidents include our operations lead, an intelligence team analyst, and security representation.

Our escalation policy for agent incidents does not permit silent remediation. Every agent incident that results in erroneous outputs reaching production, unauthorized actions, or security events is documented in our incident management system and reported through our standard disclosure channels. We do not treat agent misbehavior as a technical glitch to be quietly fixed — we treat it as an incident requiring root cause analysis and systemic remediation.

Post-Incident Review

Every agent incident above low severity triggers a mandatory post-incident review. Post-incident reviews are conducted within a defined window following incident resolution and produce a written review document covering: incident timeline, contributing factors, control gaps identified, remediation actions taken, and recommendations for policy or architecture changes to prevent recurrence. Post-incident review findings are tracked to closure, and patterns across reviews inform our agent governance improvement roadmap.

We treat post-incident reviews as a primary input to the continuous improvement of our agent governance framework. Each review is an opportunity to strengthen our controls, refine our monitoring, update our guardrails, or revise our escalation policies based on real operational experience. Our goal is that no agent failure mode surprises us twice.

Agent Transparency with Affected Parties

Agent transparency is a commitment we extend beyond our internal operations. When an agent incident results in erroneous assessment data being delivered to a client, we notify that client, explain what occurred, and provide corrected data with full documentation of the error and the remediation. Agent accountability to ASI's clients means they have the right to know when our automated systems have made an error that affected them, and the right to human review and correction of that error. We do not rely on clients discovering our errors independently.

"Governing AI agents is not a compliance exercise — it is the operational foundation on which trustworthy AI intelligence is built. Every control in this policy is active. Every commitment here is enforced."

Questions about our agent governance practices? Contact our team at trust@aisecurityintelligence.com. For our broader AI governance framework, see the AI Governance Framework. For our responsible AI principles, see Responsible AI Principles. For data handling practices, see our Privacy Policy.
