State of agent development (September 2025)

Author's note. This article reflects only the author's personal opinions and experience. There are far more advanced nerds exploring these topics with greater depth and rigor. What follows is simply a summary of the author's own hands-on journey: entirely subjective, practical, and incomplete by design.

Slides available.

The evolution of development tools

The journey to modern agentic coding tools has been rapid.

GitHub Copilot (2022*): introduced AI-powered autocompletion and local hints, fundamentally changing how developers write code by providing intelligent suggestions in real time.

Cursor (2024*): elevated the entire IDE to an agentic interface, enabling complex tasks like refactoring and leveraging full project context for more intelligent assistance.

Devin (no personal experience): the first public attempt at a fully autonomous software engineer — impressive concept, but not yet part of the author's daily toolkit.

Claude Code (June 2025*): represents the latest evolution with long context windows, agent mode, memory systems, hooks and MCP integration — creating a comprehensive agentic coding environment.

*Dates mark when the author personally began using each tool in real projects.

Claude Code: a deep dive

Claude Code introduces several powerful features that distinguish it from previous generations of coding assistants.

CLI-first approach

Claude Code operates as a command-line tool, which means it can run anywhere bash is available. This universality lets developers maintain a single workflow across multiple devices and environments. It offers three operational modes:

Safe mode: confirms each step before execution.
YOLO mode: executes autonomously with minimal intervention (all of my own work relies on this mode).
Planning mode: focuses on task breakdown before implementation.

Memory systems

Claude Code implements a dual-memory architecture.

Project memory (CLAUDE.md)

Root folder — use as a reference guide: code standards, project overview, module lists. Should remain concise and focused on essential information.
Module-specific (folder-scoped): component manifests, DB schemas, procedural steps.

User memory

Personal preferences and guidelines (DOs and DON'Ts).
Testing strategies (e.g. "fail tests first").
Code-style preferences (e.g. "avoid comments as much as possible").
Conventional commits format.
Report-generation preferences.
Whenever the user praises the agent's work, summarize what was appreciated and why. This reinforces effective behaviors in future interactions.
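A preference such as the conventional commits format can also be checked mechanically, so the agent gets immediate feedback instead of a reminder in prose. The sketch below is a simplified check, not the full specification: the list of allowed types and the scope character set are assumptions, and the function name is hypothetical.

```python
import re

# Simplified pattern for a conventional-commit header: a type, an optional
# scope in parentheses, an optional "!" for breaking changes, then ": " and
# a description. The allowed types are an assumption; teams often adjust them.
COMMIT_HEADER = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
    r"(\([a-z0-9_./-]+\))?"
    r"!?"
    r": \S.*$"
)


def is_conventional(message: str) -> bool:
    """Return True if the first line of the message matches the pattern."""
    lines = message.splitlines()
    return bool(lines) and COMMIT_HEADER.match(lines[0]) is not None
```

For example, is_conventional("feat(api): add retry logic") passes, while is_conventional("updated stuff") does not.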

Hooks and commands

Hooks

Enable workflow automation through event-driven shell commands:

Sound or desktop notifications for wait/error states.
Automatic commits upon task completion.
Custom integration with existing development workflows.
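As an illustration of the notification idea, here is a minimal sketch of a script that a hook could invoke. It assumes the hook runner passes one JSON object describing the event, and that the object carries "hook_event_name" and "message" keys; both the contract and the key names are assumptions to verify against the hooks documentation, and the function names are hypothetical.

```python
import json


def parse_event(raw: str) -> dict:
    """Parse the JSON payload; empty input becomes an empty event."""
    return json.loads(raw) if raw.strip() else {}


def format_notification(event: dict) -> str:
    """Build a one-line summary from a hook event payload.

    The keys "hook_event_name" and "message" are assumptions about the
    payload shape; missing keys fall back to placeholders.
    """
    name = event.get("hook_event_name", "unknown-event")
    message = event.get("message", "(no message)")
    return f"[{name}] {message}"


# As a hook script, one would finish with something like:
#   import sys
#   print(format_notification(parse_event(sys.stdin.read())))
# and replace print with a sound or desktop notifier of choice.
```

Keeping the formatting separate from the I/O makes the script easy to test without a running agent.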

Commands

Provide quick access to common operations, such as a requirements-builder that can scaffold project specifications. Learn more in the slash commands documentation.

Worktrees

Worktrees enable parallel development workflows, allowing you to:

Run multiple iterations simultaneously.
Test different prompting strategies.
Use different tool configurations in parallel.

This is particularly valuable for exploring multiple solution paths without context switching. See common workflows for implementation details.
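A small sketch of how parallel experiments might be set up: it only builds the `git worktree add` command lines, one per experiment, and leaves running them to the caller. The branch prefix "exp/", the "../worktrees" folder and the function name are illustrative assumptions, not part of any tool.

```python
from pathlib import PurePosixPath


def worktree_commands(experiments: list[str], base: str = "main",
                      root: str = "../worktrees") -> list[list[str]]:
    """Return one `git worktree add` argv per experiment name.

    Each worktree gets its own branch (exp/<name>) created from `base`.
    """
    commands = []
    for name in experiments:
        path = str(PurePosixPath(root) / name)
        branch = f"exp/{name}"
        commands.append(["git", "worktree", "add", "-b", branch, path, base])
    return commands
```

Each returned list can be passed to subprocess.run(cmd, check=True) from inside the repository, after which a separate agent session can be started in each folder.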

Subagents

One of Claude Code's most powerful features is its subagent system, which enables:

Context-window management: breaking large tasks into smaller, focused subtasks reduces token consumption and improves response quality.

Role specialization: different subagents can assume specific roles. A few examples:

Data Science Expert: handles analysis, code optimization and compute runs.
Pragmatic Critic: evaluates solutions based on metrics, cost, deadlines and simplicity.

Learn more about subagents in the official documentation.

Case study: PCA

Guess which solution was provided by the Data Science Expert.

Solution 1:

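import numpy.linalg as LA

# daily_returns is assumed to be a pandas DataFrame of per-asset returns
means = daily_returns.mean()
stds = daily_returns.std()
standardized_data = (daily_returns - means) / stds
std_data_cov = standardized_data.cov()

# Eigen-decompose the covariance matrix and sort components by variance
eigenvalues, eigenvectors = LA.eig(std_data_cov)
idx = eigenvalues.argsort()[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]
# Project the standardized data onto the sorted principal axes
pca = standardized_data.dot(eigenvectors)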

Solution 2:

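from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Standardize, then project onto principal components in one pipeline
pca_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('pca', PCA())
])
pca_result = pca_pipeline.fit_transform(daily_returns)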

Model Context Protocol (MCP)

The Model Context Protocol (MCP) is a standardization effort for how AI agents interact with external systems and tools. Key considerations include:

Safety and verification

Before integrating any MCP provider, verify its trustworthiness. The protocol enables powerful integrations, but this power comes with security responsibilities.

Integration examples

MCP supports diverse integrations:

Database connections: PostgreSQL and other database systems.
Browser automation: Puppeteer for web interactions.
External services: CI/CD pipelines, AWS, Docker.
Custom tools: the Sequential Thinking MCP server for enhanced reasoning, useful to break down complex tasks.

Find more in the MCP servers directory.

MCP hell

MCP hell arises from over-integration. Like dependency hell, too many MCP connections create maintenance nightmares, and every extra integration spends tokens that the LLM could otherwise use on the task itself. Choose integrations deliberately and maintain strict controls.
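The token cost can be made visible with a rough estimate. The sketch below assumes that each connected server's tool descriptions end up in the model's context and that one token is about four characters; both are simplifying assumptions, and the function name and the sample data are hypothetical.

```python
def estimate_tool_overhead(servers: dict[str, list[str]],
                           chars_per_token: float = 4.0) -> dict[str, int]:
    """Estimate context tokens consumed by each server's tool descriptions.

    Uses a crude characters-per-token heuristic, not a real tokenizer.
    """
    return {
        name: round(sum(len(text) for text in descriptions) / chars_per_token)
        for name, descriptions in servers.items()
    }


# Hypothetical descriptions, only to show the shape of the input.
sample = {
    "database": ["Run a read-only SQL query." * 4, "List tables." * 4],
    "browser": ["Open a URL and return the page text." * 4],
}
overhead = estimate_tool_overhead(sample)
```

Sorting the result by value gives a quick shortlist of integrations to disable when they are not needed for the current task.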

Two core ideas shaping agentic development

1. Closed loops

Modern agentic systems operate in closed loops where agents:

Invoke other agents as needed.
Validate their own outputs.
Iterate based on self-critique and self-play.

This self-correcting mechanism dramatically improves output quality and reduces the need for human intervention.
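The loop described above can be sketched in a few lines. This is a schematic under stated assumptions, not how any particular tool implements it: `generate` and `validate` stand in for agent calls, and the toy functions at the bottom exist only to make the example runnable.

```python
from typing import Callable, Optional


def closed_loop(generate: Callable[[Optional[str]], str],
                validate: Callable[[str], Optional[str]],
                max_rounds: int = 3) -> tuple[str, bool]:
    """Generate, validate, and retry with the critique until it passes.

    `validate` returns None on success or a critique string on failure.
    Returns the last candidate and whether it passed validation.
    """
    feedback: Optional[str] = None
    candidate = ""
    for _ in range(max_rounds):
        candidate = generate(feedback)
        feedback = validate(candidate)
        if feedback is None:
            return candidate, True
    return candidate, False


# Toy stand-ins: the first draft is wrong, the revision fixes it.
def toy_generate(feedback: Optional[str]) -> str:
    return "return a + b" if feedback else "return a - b"


def toy_validate(candidate: str) -> Optional[str]:
    return None if "a + b" in candidate else "expected addition, got subtraction"


result, passed = closed_loop(toy_generate, toy_validate)
```

The `max_rounds` cap matters in practice: without it, a validator that can never be satisfied keeps the loop, and the token bill, running.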

2. Roles and contextual leadership

Everyone can lead, but not everyone should (stochasticlifestyle.com).

In traditional teams, we understand that different people lead in different contexts. The same principle applies to AI agents. A Data Science Expert should lead statistical analysis tasks, while a Pragmatic Critic should evaluate architectural decisions. Success comes from correct role routing — ensuring the right agent leads the right task.
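Role routing can be as simple as a lookup. The sketch below is a deliberately naive keyword router: the keyword lists, the role identifiers and the "generalist" fallback are all illustrative assumptions, and a real setup would more likely let the main agent choose a subagent from its description.

```python
# Hypothetical keyword lists per role; tune them to your own tasks.
ROLE_KEYWORDS = {
    "data-science-expert": ("statistic", "regression", "pca", "dataset"),
    "pragmatic-critic": ("architecture", "cost", "deadline", "trade-off"),
}


def route(task: str, default: str = "generalist") -> str:
    """Pick the role whose keywords appear most often in the task text."""
    text = task.lower()
    scores = {
        role: sum(text.count(word) for word in words)
        for role, words in ROLE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to the default when no keyword matched at all.
    return best if scores[best] > 0 else default
```

With these lists, "Run PCA on the returns dataset" goes to the data-science-expert, and a task with no matching keyword falls back to the default role.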

Use cases for agentic coding

A non-exhaustive list of what the author has successfully implemented using agents:

API integrations: provide an API specification URL and let agents iterate through implementation, testing and review until all checks pass.
Language migrations: "write once, run anywhere" — agents can translate codebases between languages while maintaining functionality.
Server monitoring: cron jobs combined with CLI tools can monitor server logs and provide fix instructions for trivial issues automatically.
Execution verification: automated checking of task completion and correctness.
Git search and analysis: deep repository analysis and historical code investigation.
Jupyter notebook assistance: all agent types can now work with individual cells, enabling more focused research assistance without loading entire files into memory.
Advanced document work: report generation and document analysis.
Service metrics: integration of monitoring systems (Prometheus and Grafana) into an existing system.

SDLC-inspired recipes

We need to adapt existing industry practices for agents. I call it "SDLC for agents". The topic is described in detail in Claude Code Best Practices. My adaptation of it:

Plan: use requirements-builder and planning mode to structure work.
Code: implement with a test-first approach, utilizing parallel worktrees.
Commit: create pull requests automatically and trigger a reviewer agent.
Review: provide feedback via GitHub PRs and trigger the implementer agent.
Iterate: loop back based on review comments.

This workflow mirrors traditional software development, but with agents handling much of the mechanical work.
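The recipe above is essentially a small state machine, which can be sketched as follows. The DONE state and the `review_approved` flag are additions for illustration; the original recipe only says to loop back based on review comments.

```python
from enum import Enum


class Stage(Enum):
    PLAN = "plan"
    CODE = "code"
    COMMIT = "commit"
    REVIEW = "review"
    DONE = "done"  # terminal state, added for this sketch


def next_stage(stage: Stage, review_approved: bool = False) -> Stage:
    """Advance the workflow; a rejected review loops back to CODE."""
    if stage is Stage.PLAN:
        return Stage.CODE
    if stage is Stage.CODE:
        return Stage.COMMIT
    if stage is Stage.COMMIT:
        return Stage.REVIEW
    if stage is Stage.REVIEW:
        # Iterate: review comments send the work back to implementation.
        return Stage.DONE if review_approved else Stage.CODE
    return Stage.DONE
```

Writing the transitions down explicitly makes it clear which step triggers which agent, and where a human can step in.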

Trade-offs

Claude Code has a 25k-token limit for reading files.
Claude Code cannot execute synchronous tasks that take longer than 2 minutes. It can run such tasks in a daemon process, but cannot always terminate that process cleanly.
Complex refactoring remains a fragile area and is often better handled manually.
Boring part 1: you will need to review lots of code. As you master a tool, code quality will increase.
Boring part 2: to get a result, you should invest in "coaching".

Safety and risk management

A "minimal safe profile" for production use:

Database backups: always maintain backups before agent operations.
Whitelisted domains: restrict browser-tool access to approved domains.
Secret management: use proper secret stores, never commit credentials.
Comprehensive audit logs: track all agent actions for accountability.
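The whitelisted-domains item is easy to get subtly wrong with plain substring checks. A minimal sketch of a stricter check, assuming the browser tool exposes the URL before each request; the domain list and the function name are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical allow-list; replace with the domains you actually approve.
ALLOWED_DOMAINS = frozenset({"example.com", "docs.example.org"})


def is_allowed(url: str, allowed: frozenset[str] = ALLOWED_DOMAINS) -> bool:
    """Allow only http(s) URLs whose host is an approved domain or subdomain."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    # Match the exact domain or a true subdomain, never a lookalike suffix.
    return any(host == domain or host.endswith("." + domain) for domain in allowed)
```

Comparing the parsed host, rather than searching the raw URL string, is what rejects lookalikes such as "evil-example.com" or "example.com.evil.io".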

Comparing tools: pros and cons

Cursor ($20/month)

Note: I know it comes with agents too as of today, but I haven't tried them.

Pros: code completion, inline quick edits, LLM-vendor agnostic, affordable.

Cons: context limits, requires purchasing Claude models separately for best performance.

Claude Code ($100–$200/month)

Pros: long context windows, role-based subagents, strong MCP integrations, active community.

Cons: higher cost, requires disciplined memory and hook management.

OpenAI Codex ($20/month, included in ChatGPT Plus)

Pros: GPT-5 Codex occasionally surprises with quality, included in existing subscriptions.

Cons: lacks subagents, weaker community support.

Beyond coding: agents in other domains

The agentic paradigm extends beyond software development:

Claude Chrome extension: brings agentic capabilities to web browsing.
Comet browser: a browser built around agentic workflows.

These tools suggest a future where agents assist with all knowledge work, not just coding.

Conclusion

The state of agent development in September 2025 can be summarized with a key insight: everyone can be a team lead.

Success in agentic development comes from two factors:

Correct role routing: ensuring the right specialized agent handles each task.
Closed feedback loops: allowing agents to self-critique and iterate.

The Claude Code / MCP stack provides a reference implementation for practical agentic workflows. As these tools mature, the distinction between "writing code" and "managing code-writing agents" will become increasingly important. Developers who master role routing and closed-loop thinking will have a significant advantage in this new paradigm.

References and resources

Essential reading

A Guide to Gen AI / LLM Vibecoding for Expert Programmers
Claude Code: Best practices for agentic coding (Anthropic documentation)
Model Context Protocol documentation (official MCP specification)

Community and learning

Simon Willison's Substack — regular insights on AI and development.
@anthropic-ai — official Anthropic channel.
@aiDotEngineer — AI engineering insights.
@AndrejKarpathy — deep learning and AI fundamentals.
