Self-Evolving AI: Teaching Edward to Rewrite Himself
Here's a question I couldn't stop thinking about: what if your AI assistant could improve itself? Not through fine-tuning or manual updates, but by actually writing and deploying its own code changes.
Edward's evolution system does exactly that. It's a self-coding pipeline that creates branches, writes code, runs validation, and merges changes — all without human intervention.
The Pipeline
Evolution runs as a managed cycle with clear stages:
- Branch — creates a feature branch from main
- Code — Claude Code writes the implementation based on an objective
- Validate — runs linting, type checking, and basic sanity checks
- Test — executes the test suite against the changes
- Review — a separate Claude instance reviews the diff for quality
- Merge — clean changes get merged to main
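The staged cycle above can be sketched as an ordered pipeline that aborts at the first failing stage, so a broken change never reaches the merge step. This is a minimal illustration, not Edward's actual implementation; the stage functions here are stand-in stubs.

```python
# Sketch of the evolution cycle: run stages in order, abort on first failure.
# Stage names mirror the post; the callables are hypothetical stubs standing
# in for the real branch/code/validate/test/review/merge steps.
from typing import Callable

Stage = tuple[str, Callable[[], bool]]

def run_cycle(stages: list[Stage]) -> tuple[bool, list[str]]:
    """Run each stage; stop at the first failure so nothing merges."""
    completed: list[str] = []
    for name, step in stages:
        if not step():
            return False, completed  # abort: merge never runs
        completed.append(name)
    return True, completed

stages: list[Stage] = [
    ("branch",   lambda: True),
    ("code",     lambda: True),
    ("validate", lambda: True),
    ("test",     lambda: False),  # simulate a failing test suite
    ("review",   lambda: True),
    ("merge",    lambda: True),
]

ok, completed = run_cycle(stages)
print(ok, completed)  # → False ['branch', 'code', 'validate']
```

The early-abort shape is the important part: review and merge simply never execute when validation or tests fail.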
When a merge hits `main`, `uvicorn --reload` picks up the changes automatically. Edward is running the new code within seconds.
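The hot-reload step is stock uvicorn behavior: the `--reload` flag watches the source tree and restarts the worker process when a file changes. The module and app names below are assumptions for illustration, not Edward's real entry point.

```shell
# --reload restarts the server whenever watched source files change,
# so a merged commit goes live without a manual restart.
uvicorn edward.app:app --reload --host 127.0.0.1 --port 8000
```

Note that `--reload` is intended for development; in a hardened deployment you would typically pair the same idea with a process supervisor instead.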
Why Not Just Ship Updates Manually?
Because I wanted to see what happens when the feedback loop is tight enough. Edward can identify patterns in how it's being used — tools that fail often, edge cases in memory retrieval, missing capabilities — and propose fixes for itself.
It's not AGI. It's closer to a CI pipeline where the developer is also an AI. The constraints are important: evolution operates within defined boundaries, changes go through validation, and there's a rollback mechanism if something breaks.
The Safety Model
Giving an AI write access to its own codebase sounds dangerous. In practice, the guardrails make it manageable:
- Changes happen on branches, not directly on main
- Validation must pass before merge
- A separate review step catches issues the author might miss
- One-click rollback reverts to the previous known-good state
- The evolution config controls what kinds of changes are allowed
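One way the last guardrail might look in practice is a diff gate that rejects changes touching protected paths or exceeding a size budget. The field names and limits below are assumptions; the post only says the config controls what kinds of changes are allowed.

```python
# Hypothetical evolution-config gate: reject a proposed diff if it is too
# large, touches a forbidden path (e.g. the evolution system itself), or
# strays outside the allowlisted areas. All paths/limits are illustrative.
EVOLUTION_CONFIG = {
    "allowed_paths": ["edward/tools/", "edward/memory/"],
    "forbidden_paths": ["edward/evolution/", "deploy/"],
    "max_files_changed": 10,
}

def change_allowed(changed_files: list[str], config: dict) -> bool:
    """Return True only if every changed file passes the config's rules."""
    if len(changed_files) > config["max_files_changed"]:
        return False
    for path in changed_files:
        if any(path.startswith(p) for p in config["forbidden_paths"]):
            return False
        if not any(path.startswith(p) for p in config["allowed_paths"]):
            return False
    return True

print(change_allowed(["edward/tools/search.py"], EVOLUTION_CONFIG))    # → True
print(change_allowed(["edward/evolution/core.py"], EVOLUTION_CONFIG))  # → False
```

Keeping the evolution machinery itself on the forbidden list is the design choice that matters: the AI can rewrite its capabilities, but not the guardrails that constrain the rewriting.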
What I've Learned
The most interesting result isn't the code it writes — it's the feedback cycle. Edward surfaces its own limitations through usage, proposes improvements, implements them, and then operates with the improvements in place. It's a closed loop between operation and development.
Is it perfect? No. The code quality varies. Some proposed changes are brilliant; others get caught in review. But the system improves over time, and that's the whole point.