A New Discipline

Agent Engineering

The iterative process of refining non-deterministic LLM systems into reliable production experiences

If you've built an agent, you know that the delta between "it works on my machine" and "it works in production" can be huge. Traditional software assumes you mostly know the inputs and can define the outputs. Agents give you neither: users can say literally anything, and the space of possible behaviors is wide open. That's why they're powerful and why they can also go a little sideways in ways you didn't see coming.

Production Momentum is Real

57%

Have agents in production

30%

Actively developing with production plans

89%

Have implemented observability

52%

Running evaluations

What is Agent Engineering?

Agent engineering is the iterative process of refining non-deterministic LLM systems into reliable production experiences. It is a cyclical process: build, test, ship, observe, refine, repeat.

The key here is that shipping isn't the end goal. It's just the way you keep moving to get new insights and improve your agent. To make improvements that matter, you need to understand what's happening in production. The faster you move through this cycle, the more reliable your agent becomes.

The Agent Engineering Cycle

Step 1

Build

Design your agent's foundation, whether it's a simple LLM call with tools or a complex multi-agent system

Step 2

Test

Test your agent against example scenarios to catch obvious issues with prompts, tool definitions, and workflows

Step 3

Ship

Ship to see real-world behavior. You'll immediately start seeing inputs you hadn't considered

Step 4

Observe

Trace every interaction to see the full conversation, every tool called, and the exact context that informed each decision

Step 5

Refine

Once you've identified patterns in what's failing, refine by editing prompts and modifying tool definitions

Step 6

Repeat

Ship your improvements and watch what's changing in production. Each cycle teaches you something new
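
To make the Test and Observe steps concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: `toy_agent` is a keyword rule standing in for a real LLM call, `lookup_order` is a hypothetical tool, and `Trace` is an invented record type, not the API of any tracing product. It shows the two ideas together: run the agent against example scenarios, and keep a trace of every tool call so you can see why a scenario passed or failed.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Trace:
    """Everything one agent run did, kept for later inspection."""
    user_input: str
    tool_calls: list[tuple[str, str]] = field(default_factory=list)  # (tool name, argument)
    output: str = ""


def lookup_order(order_id: str) -> str:
    """Hypothetical tool; a real one would query an order system."""
    return f"Order {order_id} has shipped."


TOOLS: dict[str, Callable[[str], str]] = {"lookup_order": lookup_order}


def toy_agent(user_input: str) -> Trace:
    """Stand-in for an LLM agent: a keyword rule decides whether to call a tool."""
    trace = Trace(user_input=user_input)
    words = user_input.replace("?", "").split()
    order_ids = [w for w in words if w.isdigit()]
    if "order" in user_input.lower() and order_ids:
        trace.tool_calls.append(("lookup_order", order_ids[0]))
        trace.output = TOOLS["lookup_order"](order_ids[0])
    else:
        trace.output = "Could you share your order number?"
    return trace


# Example scenarios: (input, name of the tool we expect to be called, or None)
SCENARIOS = [
    ("Where is order 1234?", "lookup_order"),
    ("Hi there", None),
]


def run_scenarios() -> list[bool]:
    """Return one pass/fail flag per scenario, based on the first tool called."""
    results = []
    for user_input, expected_tool in SCENARIOS:
        trace = toy_agent(user_input)
        called = trace.tool_calls[0][0] if trace.tool_calls else None
        results.append(called == expected_tool)
    return results


if __name__ == "__main__":
    print(run_scenarios())
```

In a real system the scenario list would grow from production traces: each surprising input you observe becomes a new test case for the next cycle.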

Three Skillsets Working Together

Product Thinking

Defines scope and shapes agent behavior

  • Writing prompts that drive agent behavior (often hundreds or thousands of lines)
  • Deeply understanding the 'job to be done' that the agent replicates
  • Defining evaluations that test whether the agent performs as intended

Engineering

Builds infrastructure that makes agents production-ready

  • Writing tools for agents to use
  • Developing UI/UX for agent interactions (with streaming, interrupt handling, etc.)
  • Creating robust runtimes that handle durable execution, human-in-the-loop pauses, and memory management

Data Science

Measures and improves agent performance over time

  • Building systems (evals, A/B testing, monitoring, etc.) to measure agent performance and reliability
  • Analyzing usage patterns and errors
  • Identifying opportunities for improvement based on production data
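
As a rough illustration of the measurement work listed above, the sketch below compares two prompt variants by pass rate and counts failures by tag to surface patterns. The records, the variant names "A" and "B", and the failure tags are all made up for the example; with samples this small the comparison is not statistically meaningful, so treat it as a shape for the analysis rather than a method.

```python
from collections import Counter

# Hypothetical evaluation records: (prompt variant, passed?, failure tag or None)
RECORDS = [
    ("A", True, None),
    ("A", False, "wrong_tool"),
    ("A", False, "off_tone"),
    ("A", True, None),
    ("B", True, None),
    ("B", True, None),
    ("B", False, "wrong_tool"),
    ("B", True, None),
]


def pass_rate(records, variant):
    """Fraction of evaluated runs that passed for one prompt variant."""
    outcomes = [passed for v, passed, _ in records if v == variant]
    return sum(outcomes) / len(outcomes)


def failure_counts(records):
    """Count failing runs by tag so the most common failure mode stands out."""
    return Counter(tag for _, passed, tag in records if not passed)


if __name__ == "__main__":
    for variant in ("A", "B"):
        print(variant, pass_rate(RECORDS, variant))
    print(failure_counts(RECORDS).most_common())
```

Grouping failures by tag is what turns a list of bad runs into something you can act on: the most frequent tag is usually the first prompt or tool definition to revisit.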

Leading Agent Use Cases

26.5%

Customer Service

Agents handling customer inquiries, support tickets, and personalized interactions

24.4%

Research & Data Analysis

Synthesizing large volumes of information, reasoning across sources, and accelerating knowledge-intensive tasks

18%

Internal Workflow Automation

Boosting employee efficiency through automated internal processes and workflows

Biggest Barriers to Production

Quality

32%

Accuracy, relevance, consistency, and an agent's ability to maintain the right tone and adhere to brand or policy guidelines

Latency

20%

Response time becomes critical as agents move into customer-facing use cases, forcing a tradeoff between quality and speed

Cost

Less concern

Cost concerns have dropped from previous years as teams find value in production agents

Why Agent Engineering is Different

Inputs

Traditional

Mostly known inputs with defined outputs

Agent Engineering

Every input is an edge case. Users can say literally anything in natural language

Debugging

Traditional

Debug code logic and data flow

Agent Engineering

Inspect each decision and tool call. Small prompt tweaks can create huge shifts in behavior

Working State

Traditional

Binary: working or broken

Agent Engineering

An agent can have 99.99% uptime while still being off the rails. 'Working' isn't binary

Development Cycle

Traditional

Test exhaustively, then ship

Agent Engineering

Test reasonably, ship to learn what actually matters. Shipping is how you learn
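
The Working State contrast above can be shown with simple arithmetic. The sketch below uses invented numbers (10,000 requests, one failed request, 7,000 responses judged acceptable) and a hypothetical `Interaction` record. It computes uptime and response quality as separate metrics to show how the first can look excellent while the second does not.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    responded: bool   # the service returned a response without error
    acceptable: bool  # a reviewer or evaluator judged the response acceptable


def uptime(interactions):
    """Fraction of requests that got any response at all."""
    return sum(i.responded for i in interactions) / len(interactions)


def quality(interactions):
    """Fraction of delivered responses that were judged acceptable."""
    answered = [i for i in interactions if i.responded]
    return sum(i.acceptable for i in answered) / len(answered)


# Illustrative log: 7,000 good answers, 2,999 poor answers, 1 failed request
LOG = (
    [Interaction(True, True)] * 7000
    + [Interaction(True, False)] * 2999
    + [Interaction(False, False)]
)

if __name__ == "__main__":
    print(f"uptime:  {uptime(LOG):.2%}")
    print(f"quality: {quality(LOG):.2%}")
```

With these made-up figures, uptime is 99.99% while only about 70% of delivered responses are acceptable, which is why infrastructure monitoring alone cannot tell you whether an agent is working.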

The IMPACT Framework

Key components for orchestrating and architecting intelligent AI agents

I

Integrated LLMs

Seamlessly embedded language models for intelligent reasoning

M

Meaningful Intent & Goals

Clear objectives and constraints that guide agent behavior

P

Plan-driven Control Flows

Strategic planning and execution pathways

A

Adaptive Planning Loops

Dynamic adjustment based on feedback and outcomes

C

Centralized Persistent Memory

Unified memory management for context retention

T

Trust & Observability

Monitoring, validation, and safety mechanisms

How Roles Are Evolving into Agent Engineering

Agent engineering isn't a new job title. Instead, it's a set of responsibilities that existing teams take on when building systems that reason, adapt, and behave unpredictably.

Software Engineer / ML Engineer → Agent Engineer

Traditional Focus

Writes deterministic code for fixed logic and builds ML models

Agent Engineering Responsibilities

Writes prompts and builds tools for agents to use, traces why an agent made specific tool calls, and refines the underlying models. Designs agent scaffolds with tools, memory, and reflection loops.

Key Tasks
  • Write prompts that drive agent behavior (often hundreds or thousands of lines)
  • Build tools and APIs for agents to interact with
  • Trace agent decision-making and tool call sequences
  • Refine models and prompts based on production insights

Product Manager → Agent Engineer

Traditional Focus

Manages user stories, backlogs, and product roadmaps

Agent Engineering Responsibilities

Writes prompts, defines agent scope, and ensures the agent solves the right problem. Deeply understands the 'job to be done' that the agent replicates and defines evaluations that test whether the agent performs as intended.

Key Tasks
  • Write prompts that shape agent behavior and scope
  • Define high-level intent and goal specifications
  • Ensure the agent solves the right problem
  • Define evaluations that test agent performance

Platform Engineer → Agent Engineer

Traditional Focus

Manages CI/CD pipelines, uptime, and infrastructure

Agent Engineering Responsibilities

Builds agent infrastructure and robust runtimes that handle durable execution, human-in-the-loop pauses, and memory management.

Key Tasks
  • Build agent infrastructure for durable execution
  • Design human-in-the-loop workflow systems
  • Create robust runtimes with memory management
  • Develop UI/UX for agent interactions with streaming and interrupt handling
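
One way to picture durable execution with a human-in-the-loop pause is a workflow whose state can be saved, restored, and resumed after approval. The sketch below is a simplified assumption, not the design of any particular runtime: the step names, the `RunState` record, and the JSON checkpoint format are all hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field

# Hypothetical workflow: each step is (name, needs_human_approval)
STEPS = [("draft_refund", False), ("send_refund", True), ("notify_customer", False)]


@dataclass
class RunState:
    next_step: int = 0
    done: list[str] = field(default_factory=list)
    paused: bool = False


def advance(state: RunState, approved: bool = False) -> RunState:
    """Run steps until the workflow finishes or a step needs approval."""
    state.paused = False
    while state.next_step < len(STEPS):
        name, needs_approval = STEPS[state.next_step]
        if needs_approval and not approved:
            state.paused = True  # stop here and wait for a human decision
            return state
        state.done.append(name)  # a real runtime would execute the step here
        state.next_step += 1
        approved = False  # an approval covers one step only
    return state


def save(state: RunState) -> str:
    """Serialize the state so the run survives a process restart."""
    return json.dumps(asdict(state))


def load(blob: str) -> RunState:
    """Rebuild the state from a saved checkpoint."""
    return RunState(**json.loads(blob))


if __name__ == "__main__":
    checkpoint = save(advance(RunState()))    # pauses before send_refund
    resumed = advance(load(checkpoint), approved=True)
    print(resumed.done)
```

Because the checkpoint is plain data, the pause can last minutes or days: the process that resumes the run does not need to be the one that started it.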

Data Scientist → Agent Engineer

Traditional Focus

Builds ML models, analyzes data, and creates predictive insights

Agent Engineering Responsibilities

Measures agent reliability and identifies opportunities for improvement. Builds systems (evals, A/B testing, monitoring) to measure agent performance and reliability, and analyzes usage patterns and errors.

Key Tasks
  • Build evaluation systems to measure agent performance
  • Run A/B tests and monitor agent reliability
  • Analyze usage patterns and errors
  • Identify opportunities for improvement based on production data

These teams embrace rapid iteration. You'll often see software engineers tracing errors and handing off to PMs to tweak prompts based on those insights, or PMs identifying scope issues that require new tools from engineers. Each role recognizes that the real work of hardening an agent happens through the cycle of observing production behavior and systematically refining based on what the team learns.

The Future of Agent Engineering

Agent engineering is emerging because the opportunity demands it. Agents can now handle workflows that previously required human judgment, but only if you can make them reliable enough to trust. There is no shortcut, just the systematic work of iteration. The question isn't whether agent engineering will become standard practice. It's how quickly your team can adopt it to unlock what agents can do.

The teams shipping reliable agents today share one thing: they've stopped trying to perfect agents before launch and started treating production as their primary teacher. In practice, that means tracing every decision, evaluating at scale, and shipping improvements in days instead of quarters.