If you have used AI coding tools like Cursor, Windsurf, or Claude Code, you have probably wondered: how do they actually work? How does an LLM edit files, run commands, and iterate on its own output across multiple turns?
The answer is surprisingly straightforward. At their core, these tools are multi-turn AI agents — and the fundamental architecture can be built in under 400 lines of code. Understanding this pattern is essential for anyone building AI-powered automation, including the kind of enterprise modernization work we do at CloudHedge.
What Is an Agent?
An AI agent is not a chatbot. A chatbot takes your input, generates a response, and stops. An agent operates in a continuous loop — observing its environment, deciding what to do, executing actions, and repeating until the task is complete.
The agent loop in one sentence: observe the current state, decide what tool to use, execute that tool, feed the result back to the model, and repeat until the model says it is done.
This is sometimes called the observe-decide-execute pattern, and it is the same fundamental loop that powers everything from robotic process automation to autonomous vehicles. In the LLM context, the "observation" is the conversation history plus tool results, the "decision" is the model's next response, and the "execution" is running whatever tool the model selected.
The Architecture
A multi-turn agent has exactly four components:
- A system prompt that tells the model who it is and what tools are available
- A conversation history (array of messages) that grows each turn
- A set of tools the model can invoke (functions with defined schemas)
- A while loop that keeps calling the model until it produces a final response without tool calls
That is the entire architecture. No frameworks, no orchestration layers, no vector databases. Just a loop, a prompt, and tools.
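To make the first three components concrete, here is a minimal sketch in Python. The names (`SYSTEM_PROMPT`, the `tools` dict) are illustrative choices, not a fixed API; the fourth component, the loop, is shown in the next section.

```python
# Illustrative sketch of the first three components of an agent.
SYSTEM_PROMPT = (
    "You are a coding agent. You can read files, write files, "
    "and run shell commands via the tools provided."
)

# Conversation history: grows by one or more messages every turn.
messages = [{"role": "system", "content": SYSTEM_PROMPT}]

# Tool registry: the model refers to tools by name; we dispatch to functions.
tools = {
    "read_file": lambda path: open(path).read(),
    "write_file": lambda path, content: open(path, "w").write(content),
}
```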
The Agent Loop
Here is the core loop in pseudocode:
```
while True:
    response = call_llm(messages)
    if response.has_tool_calls:
        messages.append(response.message)  # record the assistant's tool calls
        for tool_call in response.tool_calls:
            result = execute_tool(tool_call)
            messages.append(tool_result(result))
    else:
        # Model gave a final text response — we're done
        print(response.text)
        break
```
Every "multi-turn" interaction is just this loop running multiple iterations. The model calls a tool, gets the result appended to the conversation, and then decides whether to call another tool or deliver a final answer. The model itself controls the flow.
The Three Core Tools
A code-editing agent needs surprisingly few tools to be effective. Three are sufficient to handle the vast majority of tasks:
1. Read File
The read_file tool takes a file path and returns its contents. This is how the agent inspects existing code, configuration files, or documentation. Without the ability to read, the agent is working blind.
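A minimal sketch of this tool in Python might look like the following; note that errors are returned as text rather than raised, so the model can see what went wrong and adjust:

```python
from pathlib import Path

def read_file(path: str) -> str:
    """Return a file's contents, or an error message the model can act on."""
    try:
        return Path(path).read_text()
    except OSError as e:
        # Return errors as text so the agent can recover (e.g. try another path)
        return f"ERROR: {e}"
```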
2. Write File
The write_file tool takes a file path and content, then writes (or overwrites) the file. This is the agent's primary mechanism for making changes. Some implementations use a more granular "edit" tool that applies diffs, but a simple write is sufficient for a working agent.
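A simple implementation, again returning errors as text the model can react to (creating parent directories is a convenience choice, not a requirement):

```python
from pathlib import Path

def write_file(path: str, content: str) -> str:
    """Write (or overwrite) a file, creating parent directories as needed."""
    try:
        p = Path(path)
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_text(content)
        return f"Wrote {len(content)} characters to {path}"
    except OSError as e:
        return f"ERROR: {e}"
```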
3. Execute Command
The run_command tool executes a shell command and returns its output (stdout and stderr). This gives the agent the ability to run tests, install dependencies, check git status, compile code, and verify its own work. This is the most powerful tool because it connects the agent to the entire operating system.
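One way to sketch this tool with Python's `subprocess` module, including the timeout discussed in the guardrails section below:

```python
import subprocess

def run_command(command: str, timeout: int = 30) -> str:
    """Run a shell command and return its exit code, stdout, and stderr as text."""
    try:
        proc = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
        return (
            f"exit code: {proc.returncode}\n"
            f"stdout: {proc.stdout}"
            f"stderr: {proc.stderr}"
        )
    except subprocess.TimeoutExpired:
        # Report timeouts as text so the agent can adapt instead of crashing
        return f"ERROR: command timed out after {timeout}s"
```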
Why three tools are enough: Read gives the agent eyes. Write gives it hands. Execute gives it the ability to verify and interact with the world. Together, they form a complete feedback loop.
System Prompt Design
The system prompt is where you define the agent's behavior, capabilities, and constraints. A well-designed system prompt includes:
- Identity and role: What the agent is and what it specializes in
- Available tools: JSON schemas describing each tool's parameters
- Guidelines: Rules about when and how to use each tool
- Constraints: What the agent should never do (e.g., delete production databases)
- Output format: How the agent should structure its final responses
The quality of the system prompt directly determines the quality of the agent. A vague prompt produces a vague agent. A precise prompt that explicitly handles edge cases produces a reliable one.
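As an example of the tool schemas mentioned above, here is what a schema for `read_file` might look like. The exact wrapper keys vary by provider, so treat this shape as illustrative:

```python
# Hypothetical JSON schema describing the read_file tool to the model.
READ_FILE_SCHEMA = {
    "name": "read_file",
    "description": "Read the contents of a file at the given relative path.",
    "parameters": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Relative path to the file"}
        },
        "required": ["path"],
    },
}
```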
Guardrails
Running arbitrary code is inherently dangerous. Production agents need guardrails:
- Command allowlists: Only permit specific commands or command patterns
- File path restrictions: Limit which directories the agent can read from or write to
- Confirmation prompts: Require human approval for destructive operations
- Timeout limits: Kill commands that run longer than expected
- Sandboxing: Run the agent in a container or VM with limited permissions
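The first two guardrails can be sketched in a few lines of Python. The allowlist contents and the `/workspace` root are example values; a real deployment would pull these from policy configuration (note `Path.is_relative_to` requires Python 3.9+):

```python
import shlex
from pathlib import Path

ALLOWED_COMMANDS = {"ls", "cat", "git", "pytest", "npm"}  # example allowlist
WORKSPACE = Path("/workspace")  # example sandbox root

def command_allowed(command: str) -> bool:
    """Allow only commands whose executable is on the allowlist."""
    parts = shlex.split(command)
    return bool(parts) and parts[0] in ALLOWED_COMMANDS

def path_allowed(path: str) -> bool:
    """Allow only paths that resolve inside the workspace (blocks ../ escapes)."""
    return (WORKSPACE / path).resolve().is_relative_to(WORKSPACE.resolve())
```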
At CloudHedge, CHAI's agent architecture applies these same principles at enterprise scale. When CHAI Flow decomposes a monolith into microservices, every transformation step is validated, tested, and reversible. The agents operate within strict guardrails defined by the organization's policies.
Conversation Memory and Context
The conversation history is the agent's working memory. Every tool call and result is appended to the message array, giving the model full context of what it has done and what happened. This is what makes agents "multi-turn" — the model can reference previous steps, learn from errors, and build on earlier work.
However, conversation history grows with each turn, and LLMs have finite context windows. Production agents need strategies to manage this:
- Summarization: Periodically summarize older messages to compress history
- Sliding window: Keep only the most recent N messages
- Selective retention: Keep tool results that are still relevant, drop those that are not
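The sliding-window strategy is the simplest of the three and can be sketched in a few lines; the one subtlety is that the system prompt must always survive the trim:

```python
def trim_history(messages: list[dict], max_recent: int = 20) -> list[dict]:
    """Sliding window: always keep the system prompt, plus the N newest messages."""
    system, rest = messages[:1], messages[1:]
    return system + rest[-max_recent:]
```

Summarization and selective retention build on the same idea but replace the dropped span with a compressed stand-in rather than discarding it outright.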
Chatbot vs. Agent: A Comparison
| Dimension | Chatbot | Agent |
|---|---|---|
| Interaction | Single request-response | Multi-turn loop |
| Tools | None (text only) | Read, Write, Execute, and more |
| Flow control | User drives every step | Model drives autonomously |
| Error handling | User must retry manually | Agent retries and self-corrects |
| Verification | None | Runs tests, checks output |
| Context | Single message | Full conversation + tool results |
| Complexity | Simple API call | Loop + tools + state management |
Multi-Agent Systems
Once you have one agent working, the natural next step is orchestrating multiple agents. This is where things get interesting for enterprise use cases. A director agent can break a large task into subtasks and delegate each to a specialized worker agent.
This is exactly how CHAI works at scale. CHAI Universe acts as the discovery agent, mapping the entire application landscape. CHAI DART acts as the assessment agent, analyzing each application's architecture and dependencies. CHAI Flow acts as the execution agent, performing the actual modernization, containerization, and deployment. Each agent is specialized, and they coordinate through a shared understanding of the application portfolio.
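The director/worker pattern itself is a small extension of the single-agent loop. In this hypothetical sketch, `run_agent` is stubbed out; a real worker would run the full observe-decide-execute loop shown earlier:

```python
def run_agent(role: str, task: str) -> str:
    # Placeholder: a real worker would run the full agent loop for this task
    return f"[{role}] completed: {task}"

def director(goal: str, subtasks: list[str]) -> list[str]:
    """Delegate each subtask to a worker agent and collect the results."""
    # A real director would derive the subtasks from the goal via the LLM;
    # here they are passed in to keep the sketch self-contained.
    return [run_agent("worker", task) for task in subtasks]
```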
Conclusion
The core architecture of a multi-turn AI agent is remarkably simple: a while loop that calls an LLM, executes tools, and repeats. The complexity comes not from the loop itself but from the system prompt design, the tool implementations, the guardrails, and the orchestration of multiple agents for larger tasks.
At CloudHedge, we have taken this pattern and applied it to one of the hardest problems in enterprise software: legacy application modernization. CHAI's agentic architecture — with specialized agents for discovery, assessment, and transformation — handles the kind of complex, multi-step work that previously required armies of consultants and years of effort.
Understanding how agents work is the first step toward building with them. The second step is putting them to work on problems that matter.