Claude Managed Agents for DevOps — Complete Guide 2026


There’s a class of DevOps task that’s too complex for a simple script but not quite worth paging a human at 2am. Infrastructure drift detection. Kubernetes cost analysis across 15 namespaces. Post-incident runbook execution. Nightly security scans that generate a report and open a ticket if something looks wrong.

Until recently, you had two options: build a brittle custom automation stack and babysit it, or accept that someone always needs to be in the loop.

Claude Managed Agents is a third option — cloud-hosted AI agents that can run these tasks autonomously, resume after interruption, coordinate with subagents for parallel work, and get better at your specific environment over time. Launched in public beta in April 2026, it’s already being used in production by Notion, Asana, and Rakuten.

This guide covers what Managed Agents actually is, how it works, and three real DevOps automation patterns you can build with it today.

What Is Claude Managed Agents?

Claude Managed Agents is a hosted infrastructure layer for running AI agents at scale. Instead of managing containers, state, checkpointing, tool orchestration, and error recovery yourself, you leave all of that to Anthropic. You define the agent (model, system prompt, tools, MCP servers, skills), and the service handles everything required to run it reliably.

The infrastructure layer covers: secure sandboxing so the agent can’t touch systems outside its scope, long-running sessions that survive network interruptions, scoped permissions controlling what tools the agent can invoke, and full tracing of every action the agent takes.

Sessions are long-running, resume cleanly after pauses, and store conversation history, sandbox state, and outputs server-side. The agent definition is created once and then referenced by ID across sessions.

Pricing: Standard Claude API token rates plus $0.08 per session-hour. A two-hour audit adds $0.16 in session-hour charges on top of its token usage, which is small next to the engineer time the same task would otherwise take.
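To make the pricing concrete, here is a minimal sketch of a per-session cost estimate. Only the $0.08 session-hour rate comes from this guide. The function name and the token rates passed in the example are placeholders chosen to keep the arithmetic readable, not real prices, so substitute the current rates for the model you use.

```python
SESSION_HOUR_RATE_USD = 0.08  # per-session-hour charge quoted in this guide


def estimate_session_cost(
    session_hours: float,
    input_tokens: int,
    output_tokens: int,
    input_usd_per_mtok: float,
    output_usd_per_mtok: float,
) -> float:
    """Return the estimated cost in USD of one session.

    Token rates are parameters because they depend on the model you pick.
    """
    runtime_cost = session_hours * SESSION_HOUR_RATE_USD
    token_cost = (
        input_tokens / 1_000_000 * input_usd_per_mtok
        + output_tokens / 1_000_000 * output_usd_per_mtok
    )
    return runtime_cost + token_cost


# Placeholder token rates (NOT real prices), used only for illustration.
example = estimate_session_cost(
    session_hours=2,
    input_tokens=1_000_000,
    output_tokens=100_000,
    input_usd_per_mtok=10.0,
    output_usd_per_mtok=50.0,
)
print(f"${example:.2f}")  # $15.16 with the placeholder rates above
```

With these placeholder numbers the session-hour charge is $0.16 of the total, which suggests that token usage, not runtime, is usually the figure to watch.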

Current status: Public beta. All Managed Agents endpoints require the managed-agents-2026-04-01 beta header — the SDK sets this automatically. Two features — multiagent coordination and Dreaming — are in research preview and require separate access requests.

How It Differs from Claude Code

If you’ve already been using Claude Code with Skills (covered in our Claude Skills for DevOps guide), you might wonder where Managed Agents fits. Here’s the practical distinction:

Claude Code is your interactive development partner. You’re in the loop, it’s running in your terminal, it handles tasks during your working session. Great for: writing Terraform, reviewing manifests, debugging pipelines while you’re actively working.

Managed Agents runs autonomously in the cloud. You trigger it via API, it executes a long-running task, you get the result — you don’t babysit it. Great for: scheduled infrastructure audits, automated incident triage, nightly jobs, anything that should run without a human watching.

Claude Code fits work where an engineer is present, or where you want to embed Claude directly into an existing pipeline. Managed Agents fits tasks that need to run headlessly, asynchronously, and at scale.

Think of it this way: Claude Code is your pair programmer. Managed Agents is your on-call automation engineer that works while you sleep.

The Three Core Concepts

Before building anything, understand how Managed Agents structures work.

1. The Agent Definition

An agent is a reusable configuration you create once:

import anthropic

client = anthropic.Anthropic()

agent = client.beta.managed_agents.agents.create(
    name="k8s-cost-analyzer",
    model="claude-opus-4-6",
    system_prompt="""You are a Kubernetes cost optimization agent.
    When given a namespace or cluster context, you:
    1. Analyze resource requests vs actual usage
    2. Identify over-provisioned workloads
    3. Generate a prioritized list of right-sizing recommendations
    4. Estimate monthly savings for each recommendation
    Always output findings in structured JSON followed by a human-readable summary.""",
    tools=[{"type": "bash_20250124", "name": "bash"}],
    betas=["managed-agents-2026-04-01"]
)

print(f"Agent ID: {agent.id}")  # Save this — reference it for every session

Create the agent once. Reference it by ID forever. The agent definition is immutable — update it by creating a new version.

2. Sessions

A session is a single task execution. You start a session, pass it a task, and it runs to completion — or you can poll it, stream events from it, or subscribe to webhooks for async notification.

session = client.beta.managed_agents.sessions.create(
    agent_id="agent_01abc...",
    input="Analyze the production namespace in our EKS cluster and identify the top 5 cost reduction opportunities",
    betas=["managed-agents-2026-04-01"]
)

# Stream the session output in real time
for event in client.beta.managed_agents.sessions.stream(session.id):
    if event.type == "text":
        print(event.text, end="", flush=True)

Sessions are stateful — they checkpoint automatically. If a network interruption hits during a 2-hour infrastructure audit, the session resumes where it left off, not from scratch.
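If you poll rather than stream, a backoff loop keeps the request volume low during a long audit. The sketch below is deliberately independent of the SDK: `get_status` is a hypothetical callable you would wrap around whatever status lookup the API exposes, and the terminal status names are assumptions rather than documented values.

```python
import time
from typing import Callable

# Hypothetical status names; check the API reference for the real ones.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_session(
    get_status: Callable[[], str],
    timeout_s: float = 7200.0,
    initial_delay_s: float = 2.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() with exponential backoff until a terminal state."""
    waited = 0.0
    delay = initial_delay_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"session still {status!r} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        # Double the wait each round, capped so long sessions are still checked.
        delay = min(delay * 2, max_delay_s)
```

Passing `sleep` as a parameter makes the loop testable without real waiting, which is useful when the sessions themselves run for hours.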

3. Vaults — Credential Management

DevOps agents need credentials: kubeconfig, AWS access keys, GitHub tokens. Vaults store these securely and inject them into agent sessions without you passing secrets through the API.

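# Load secret values from your local environment
# (assumes KUBECONFIG points to a single kubeconfig file)
import os
from pathlib import Path

kubeconfig_content = Path(os.environ["KUBECONFIG"]).read_text()
aws_key = os.environ["AWS_ACCESS_KEY_ID"]
aws_secret = os.environ["AWS_SECRET_ACCESS_KEY"]

# Store credentials once in a vault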
vault = client.beta.managed_agents.vaults.create(
    name="production-k8s-credentials",
    secrets={
        "KUBECONFIG": kubeconfig_content,
        "AWS_ACCESS_KEY_ID": aws_key,
        "AWS_SECRET_ACCESS_KEY": aws_secret
    },
    betas=["managed-agents-2026-04-01"]
)

# Reference the vault when starting a session — no raw secrets in API calls
session = client.beta.managed_agents.sessions.create(
    agent_id="agent_01abc...",
    vault_id=vault.id,
    input="Run the weekly cost analysis",
    betas=["managed-agents-2026-04-01"]
)

Vault secrets are injected as environment variables inside the agent’s sandboxed environment. The agent can use them, but they never appear in session logs or event streams.
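Because the secrets arrive as environment variables, a script the agent runs inside the sandbox can check for them up front instead of failing halfway through a `kubectl` call. The sketch below assumes the three variable names used in the vault example above. The `redact` helper is an extra precaution for logs you write yourself and is not part of the Managed Agents service.

```python
from typing import Mapping, Sequence

REQUIRED_SECRETS = ("KUBECONFIG", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_secrets(
    env: Mapping[str, str], required: Sequence[str] = REQUIRED_SECRETS
) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


def redact(
    text: str, env: Mapping[str, str], required: Sequence[str] = REQUIRED_SECRETS
) -> str:
    """Replace any secret value that appears in text with a <NAME> placeholder."""
    for name in required:
        value = env.get(name)
        if value:
            text = text.replace(value, f"<{name}>")
    return text
```

In practice you would call `missing_secrets(os.environ)` at the start of a script and stop early with a clear message if the list is not empty.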

Three Real DevOps Automation Patterns

Pattern 1: Weekly Kubernetes Cost Analysis

The problem: You want a weekly report of over-provisioned workloads across all namespaces, with right-sizing recommendations and estimated savings. Nobody wants to run this manually.

Before building this agent, make sure your cluster’s Kubernetes RBAC is properly scoped — the service account the agent uses should have read-only access to pods, deployments, and metrics, nothing more.

The agent setup:

cost_agent = client.beta.managed_agents.agents.create(
    name="k8s-weekly-cost-report",
    model="claude-opus-4-6",
    system_prompt="""You are a Kubernetes FinOps agent. Your job is weekly cost analysis.

    For each namespace provided:
    1. Run kubectl top pods to get actual CPU/memory usage
    2. Compare against resource requests in each Deployment/StatefulSet
    3. Flag any workload where requests exceed actual usage by more than 50%
    4. Calculate estimated monthly waste (over-provisioned resources × node cost)
    5. Recommend specific new resource request values based on p95 usage

    Output format:
    - JSON summary with total_estimated_waste_monthly_usd
    - Per-namespace findings with specific kubectl patch commands to apply fixes
    - Priority ranking: high (>$100/month waste), medium ($20-100), low (<$20)

    Be conservative — never recommend reducing limits below 2x actual p95 usage.""",
    tools=[{"type": "bash_20250124", "name": "bash"}],
    betas=["managed-agents-2026-04-01"]
)

The GitHub Actions trigger (runs every Monday at 08:00 UTC, since GitHub Actions evaluates cron schedules in UTC):

name: Weekly K8s Cost Report

on:
  schedule:
    - cron: '0 8 * * 1'

jobs:
  cost-analysis:
    runs-on: ubuntu-latest
    steps:
      - name: Trigger Managed Agent
        run: |
          SESSION=$(curl -s -X POST https://api.anthropic.com/v1/beta/managed-agents/sessions \
            -H "x-api-key: ${{ secrets.ANTHROPIC_API_KEY }}" \
            -H "anthropic-version: 2023-06-01" \
            -H "anthropic-beta: managed-agents-2026-04-01" \
            -H "content-type: application/json" \
            -d '{
              "agent_id": "${{ secrets.COST_AGENT_ID }}",
              "vault_id": "${{ secrets.K8S_VAULT_ID }}",
              "input": "Run full cost analysis for namespaces: production, staging, data-platform. Output report and open a GitHub issue with findings if total waste exceeds $500/month."
            }')
          # Quote the variable so the JSON response reaches jq unmodified
          echo "Session ID: $(echo "$SESSION" | jq -r .id)"

If you’re new to GitHub Actions schedules and triggers, our introduction to GitHub Actions covers the fundamentals before you wire this up.

The agent runs, generates the report, and opens a GitHub issue automatically if the waste threshold is hit. Zero human involvement until there’s something actionable.
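The flagging and priority rules in the cost agent's system prompt are simple enough to express in plain Python, which is handy for spot-checking the agent's report. This is a sketch under assumptions: the `Workload` fields, the use of p95 usage as the "actual usage" figure, and the per-core monthly cost are all illustrative and not part of any API.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    cpu_request_cores: float
    cpu_p95_cores: float
    usd_per_core_month: float  # assumed blended node cost, supplied by you


def is_over_provisioned(w: Workload) -> bool:
    """True when the request exceeds observed usage by more than 50%."""
    return w.cpu_request_cores > 1.5 * w.cpu_p95_cores


def monthly_waste_usd(w: Workload) -> float:
    """Over-provisioned cores multiplied by the per-core monthly cost."""
    return max(w.cpu_request_cores - w.cpu_p95_cores, 0.0) * w.usd_per_core_month


def priority(waste_usd: float) -> str:
    """high above $100/month, medium for $20-100, low below $20."""
    if waste_usd > 100:
        return "high"
    if waste_usd >= 20:
        return "medium"
    return "low"


def min_safe_limit(w: Workload) -> float:
    """Floor for any recommended limit: twice the observed p95 usage."""
    return 2 * w.cpu_p95_cores
```

For example, a workload requesting 4 cores with a p95 of 1 core at an assumed $30 per core-month is flagged, wastes about $90 per month, ranks as medium, and should never be given a limit below 2 cores.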

Pattern 2: Automated Incident Triage Agent

The problem: An alert fires at 3am. Before waking anyone up, you want an agent to gather context, check the obvious causes, and either resolve it automatically or produce a concise brief for the on-call engineer.

Pair this agent with solid Kubernetes Network Policies — if a compromised pod triggered the alert, network isolation limits the blast radius while the agent investigates.

triage_agent = client.beta.managed_agents.agents.create(
    name="incident-triage",
    model="claude-opus-4-6",
    system_prompt="""You are an incident triage agent. When triggered with an alert, you:

    PHASE 1 — Gather context (always do this first):
    - kubectl describe the affected resource
    - Check recent events: kubectl get events --sort-by='.lastTimestamp'
    - Pull last 100 lines of pod logs
    - Check resource usage: kubectl top pods in the affected namespace
    - Query recent deployments: check ArgoCD or kubectl rollout history

    PHASE 2 — Check common causes in order:
    1. OOMKilled? → Check memory limits vs usage, recommend limit increase
    2. CrashLoopBackOff? → Check logs for error pattern, check liveness probe config
    3. Pending pods? → Check node capacity, PVC availability, image pull errors
    4. High latency? → Check HPA status, replica count vs load

    PHASE 3 — Take action or escalate:
    - If cause is clear AND fix is low-risk (restart, scale up): execute fix, document it
    - If cause requires code change or is ambiguous: do NOT act, produce escalation brief
    - Escalation brief format: Alert → Root cause → Evidence → Recommended next step → Severity (P1/P2/P3)

    Never delete resources. Never modify production config without the escalation path.""",
    tools=[{"type": "bash_20250124", "name": "bash"}],
    betas=["managed-agents-2026-04-01"]
)
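The Phase 3 rule in the prompt above (act only when the cause is clear and the fix is low-risk, otherwise escalate) can also be enforced in code around the agent, so the guardrail does not depend on the prompt alone. The sketch below is one possible shape. The `Finding` structure and the action names are hypothetical, and the brief follows the field order given in the system prompt.

```python
from dataclasses import dataclass

# Actions the prompt treats as low-risk; everything else must be escalated.
LOW_RISK_ACTIONS = frozenset({"restart", "scale_up"})


@dataclass
class Finding:
    alert: str
    root_cause: str
    evidence: str
    proposed_action: str  # e.g. "restart", "scale_up", "code_change"
    cause_is_clear: bool
    severity: str  # "P1", "P2" or "P3"


def should_auto_remediate(f: Finding) -> bool:
    """Allow automatic action only for clear causes with allowlisted fixes."""
    return f.cause_is_clear and f.proposed_action in LOW_RISK_ACTIONS


def escalation_brief(f: Finding) -> str:
    """Render the brief in the order the system prompt specifies."""
    return (
        f"Alert: {f.alert}\n"
        f"Root cause: {f.root_cause}\n"
        f"Evidence: {f.evidence}\n"
        f"Recommended next step: {f.proposed_action}\n"
        f"Severity: {f.severity}"
    )
```

An allowlist is used rather than a blocklist so that any action the agent proposes that you did not anticipate falls through to escalation by default.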

Wire this to your alerting stack via webhook — Alertmanager, PagerDuty, or Grafana can all POST to an endpoint that starts an agent session. The agent gathers context and either resolves the issue or hands off a complete brief to the on-call engineer. P3 alerts stop waking people up.
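The webhook endpoint's main job is turning an alert payload into the text the session receives as input. Here is a minimal sketch for Alertmanager's webhook format, which delivers an `alerts` list whose entries carry `status`, `labels`, and `annotations`. The `namespace`, `severity`, and `summary` keys are common conventions rather than guaranteed fields, and the function name is hypothetical.

```python
def build_triage_input(payload: dict) -> str:
    """Turn an Alertmanager webhook payload into a session input string.

    Returns an empty string when nothing is firing, so the caller can skip
    starting a session for resolved-only notifications.
    """
    firing = [a for a in payload.get("alerts", []) if a.get("status") == "firing"]
    if not firing:
        return ""
    lines = ["Triage the following firing alerts:"]
    for alert in firing:
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        lines.append(
            f"- {labels.get('alertname', 'unknown')} "
            f"(namespace={labels.get('namespace', 'unknown')}, "
            f"severity={labels.get('severity', 'unknown')}): "
            f"{annotations.get('summary', 'no summary')}"
        )
    return "\n".join(lines)
```

Filtering to firing alerts at this stage avoids paying for a session every time Alertmanager reports that something has resolved.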

Pattern 3: Pull Request Infrastructure Review

The problem: Your team opens PRs that modify Terraform, Kubernetes manifests, or Helm values. The review often misses subtle issues — missing resource limits, open security groups, IAM policies that are too permissive.

This agent works best alongside the ArgoCD GitOps workflow — catch issues in the PR before ArgoCD ever gets a chance to sync them to the cluster.

pr_review_agent = client.beta.managed_agents.agents.create(
    name="infra-pr-reviewer",
    model="claude-opus-4-6",
    system_prompt="""You are an infrastructure PR review agent. Review changed files and produce a structured review.

    For Terraform changes:
    - Flag any resource without required tags (Project, Environment, ManagedBy, Owner)
    - Flag security group rules with 0.0.0.0/0 ingress on non-80/443 ports
    - Flag IAM policies with wildcard actions or resources
    - Flag missing encryption on storage resources (S3, RDS, EBS)
    - Check for hardcoded values that should be variables

    For Kubernetes manifest changes:
    - Flag missing resource limits or requests
    - Flag containers running as root (no runAsNonRoot)
    - Flag missing liveness/readiness probes
    - Flag :latest image tags
    - Flag privileged containers

    For Helm values changes:
    - Validate values against known security baseline
    - Flag disabled security features (networkPolicy: false, podSecurityContext missing)

    Output format:
    - BLOCKING: issues that must be fixed before merge
    - WARNINGS: issues that should be addressed but won't block
    - SUGGESTIONS: best practice improvements
    - APPROVED: true/false

    Be specific — include the file path and line context for every finding.""",
    tools=[
        {"type": "bash_20250124", "name": "bash"},
    ],
    mcp_servers=[{
        "type": "url",
        "url": "https://mcp.github.com/sse",
        "name": "github"
    }],
    betas=["managed-agents-2026-04-01"]
)
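Some of the checks in the prompt above are deterministic, and running them as a cheap pre-check gives you a baseline to compare the agent's findings against. The sketch below covers only the image tag rule, using a line-based regular expression rather than a YAML parser, so it is an approximation. It also flags untagged images, which goes slightly beyond the prompt, because an image reference without a tag resolves to `latest`.

```python
import re

# Matches lines such as "image: nginx:1.25" or "- image: 'repo/app'".
IMAGE_LINE = re.compile(r"^\s*-?\s*image:\s*[\"']?(?P<image>[^\s\"'#]+)")


def find_unpinned_images(manifest_text: str) -> list:
    """Return (line_number, image) pairs that use :latest or no tag at all."""
    findings = []
    for lineno, line in enumerate(manifest_text.splitlines(), start=1):
        match = IMAGE_LINE.match(line)
        if not match:
            continue
        image = match.group("image")
        if "@" in image:
            continue  # pinned by digest, so the tag does not matter
        # Only look after the last "/" so a registry port is not read as a tag.
        last_segment = image.rsplit("/", 1)[-1]
        tag = last_segment.split(":", 1)[1] if ":" in last_segment else None
        if tag is None or tag == "latest":
            findings.append((lineno, image))
    return findings
```

Returning line numbers matches the prompt's requirement that every finding include file and line context.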

Trigger this from a GitHub Actions workflow on every PR that touches Terraform files (.tf), Kubernetes manifests (.yaml), or Helm values.yaml files. The agent posts a review comment directly on the PR via the GitHub MCP server.
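If the workflow decides in a script whether a PR is worth a session, the filter can be as small as the sketch below. The function name is hypothetical, and including `.yml` alongside `.yaml` is an assumption about how your repositories name manifests.

```python
from pathlib import PurePosixPath

# File suffixes that indicate infrastructure changes (".yml" is an assumption).
INFRA_SUFFIXES = {".tf", ".yaml", ".yml"}


def infra_files(changed_files: list) -> list:
    """Return the changed paths that should be sent to the review agent."""
    return [
        path
        for path in changed_files
        if PurePosixPath(path).suffix.lower() in INFRA_SUFFIXES
    ]
```

An empty result means the PR touches no infrastructure files, so the workflow can exit without starting a session.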

Dreaming — Agents That Improve Over Time

Dreaming is a scheduled process that reviews your agent sessions and memory stores, extracts patterns, and curates memories so your agents improve over time. You decide how much control you want: dreaming can update memory automatically, or you can review changes before they land.

For DevOps agents, this is genuinely useful. Your incident triage agent can learn which alert patterns in your environment usually turn out to be false positives. Your cost analysis agent can learn which teams tend to ignore recommendations and adjust how it prioritizes findings. Your PR review agent can learn which warnings your team consistently overrides and stop flagging them.

Dreaming is in research preview — request access via the Anthropic console. When it’s available to you, enable it with a dreaming config block in your agent definition.

Multiagent Coordination

A lead agent can run an investigation while subagents fan out through deploy history, error logs, metrics, and support tickets.

In multi-agent orchestration, a coordinator model assigns tasks and manages dependencies, multiple specialist agents run in parallel or in sequence, and results are aggregated, validated, and delivered as a unified output.

For DevOps, the pattern looks like this for a post-incident review:

Lead agent: "Run a full post-incident analysis for the outage on May 28"
├── Subagent 1: Pull and analyze all logs from the affected timeframe
├── Subagent 2: Check all deployments and config changes in the 24hrs before
├── Subagent 3: Pull metrics from Prometheus for the affected services
└── Subagent 4: Search Slack for any related discussion or alerts

Lead agent: Synthesize all findings → Generate RCA document → Open Jira ticket

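Four investigations that would take an engineer 2–3 hours by hand can finish in minutes when run in parallel, because the wall-clock time is bounded by the slowest investigation rather than the sum of all four. Multiagent coordination is in research preview, so request access separately.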
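The fan-out and synthesis shape of that diagram can be illustrated locally with a thread pool. To be clear, this is an analogy for the control flow and not the Managed Agents multiagent API: each "subagent" is a plain Python callable you supply, and all names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_post_incident_review(
    incident: str,
    investigations: Dict[str, Callable[[str], str]],
) -> str:
    """Run every investigation in parallel, then merge the findings."""
    findings = {}
    with ThreadPoolExecutor(max_workers=max(len(investigations), 1)) as pool:
        futures = {
            name: pool.submit(fn, incident) for name, fn in investigations.items()
        }
        for name, future in futures.items():
            try:
                findings[name] = future.result()
            except Exception as exc:
                # One failed investigation should not sink the whole review.
                findings[name] = f"FAILED: {exc}"
    sections = [f"[{name}]\n{text}" for name, text in findings.items()]
    return f"Post-incident findings for {incident}\n\n" + "\n\n".join(sections)
```

Recording a failed branch in the report, instead of raising, mirrors what you want from the lead agent: a synthesis that states which evidence is missing.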

Honest Limitations

This is a public beta, so it is worth setting expectations correctly.

Not eligible for ZDR or HIPAA BAA. Because sessions are stateful and data is stored server-side, Managed Agents is not currently eligible for Zero Data Retention or HIPAA Business Associate Agreement coverage. If your compliance requirements mandate ZDR, this isn’t the right fit yet. You can delete sessions and files via the API at any time.

Token costs add up fast. Running Claude agents autonomously against large codebases at high frequency generates substantial token usage. Teams that don’t actively monitor headless usage can run into unexpected bills. Set up usage alerts in the Anthropic console before going to production.

Dreaming and multiagent are research preview. The two most compelling features for complex DevOps workflows require separate access requests and aren’t available to everyone yet. Design your initial agents to work without them — treat these as future upgrades.

Behavior may change. Agent behavior may be refined between releases, so pin your agent definitions and test after updates, the same way you pin Terraform provider versions.

Best Practices

Start with read-only agents. Before you give an agent kubectl exec or terraform apply permissions, run it in read-only mode for two weeks. Understand how it behaves, what it gets right, what it gets wrong. Earn trust before granting write access.

Use vaults for all credentials. Never pass secrets directly in session inputs or system prompts. Vaults exist precisely for this — use them from day one.

Write explicit system prompts with guardrails. The incident triage agent example above explicitly says “Never delete resources. Never modify production config without the escalation path.” Be this explicit. Agents follow instructions precisely — assume nothing is implied.

Log every session. Stream session events to your observability stack. You want a full audit trail of what every agent did and why. This is non-negotiable for production agents with write permissions.
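One simple way to build that audit trail is to write each event as a JSON line that your log shipper can pick up. The sketch below assumes events can be represented as dictionaries with a `type` key, which is an assumption about the event shape and not a documented schema.

```python
import json
from datetime import datetime, timezone


def audit_record(session_id: str, event: dict) -> str:
    """Serialize one session event as a single JSON line for log shipping."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "event_type": event.get("type", "unknown"),
        "event": event,
    }
    # default=str keeps the logger from crashing on non-JSON values.
    return json.dumps(record, sort_keys=True, default=str)
```

Keeping the full event next to a few indexed fields lets you filter by session or type now and still reconstruct exactly what the agent did later.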

Test failure modes. What happens if the agent’s bash commands fail? What if kubectl returns an unexpected error? Build your system prompts to handle these cases explicitly — specify what the agent should do when it hits an error, not just the happy path.
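If the agent calls helper scripts you maintain, those scripts can make failures explicit instead of letting them surface as confusing output. The wrapper below is a sketch using only the standard library. The `CommandResult` shape is an illustrative choice, and the example command simply runs the Python interpreter so the snippet works without `kubectl` installed.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CommandResult:
    ok: bool
    returncode: Optional[int]  # None when the command never ran to completion
    stdout: str
    stderr: str


def run_command(argv: List[str], timeout_s: float = 30.0) -> CommandResult:
    """Run a command and report failure as data instead of raising."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except FileNotFoundError as exc:
        return CommandResult(False, None, "", f"command not found: {exc}")
    except subprocess.TimeoutExpired:
        return CommandResult(False, None, "", f"timed out after {timeout_s}s")
    return CommandResult(proc.returncode == 0, proc.returncode, proc.stdout, proc.stderr)


# Simulate a failing tool by exiting with a non-zero status.
result = run_command([sys.executable, "-c", "import sys; sys.exit(3)"])
print(result.ok, result.returncode)  # False 3
```

Distinguishing "not found", "timed out", and "non-zero exit" gives the system prompt three concrete cases to give instructions for.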

Combine with Skills. Managed Agents and Skills work together. You can attach Skills to an agent definition — the same SKILL.md files from the Claude Skills for DevOps guide work here too. Your Kubernetes manifest validation skill runs inside your PR review agent automatically.

What’s Next

You now have both layers of the Claude DevOps stack:

Skills — teach Claude your team’s conventions, standards, and procedures
Managed Agents — run long autonomous tasks in the cloud with those conventions baked in

The natural next step is connecting these agents to your actual infrastructure via MCP. In an upcoming post, we’ll cover MCP for DevOps — how to set up remote MCP servers that give your agents live access to your Kubernetes cluster, AWS account, GitHub repos, and observability stack, and the security model for doing it safely.

If you’re ready to start building now, the official Managed Agents quickstart gets you a working agent in under 10 minutes.

DevToolHub
