Building a Multi-Agent Workflow for Design and Coding
From repeated, manual checks to structured collaboration

In my work with coding agents, I’ve run into a recurring frustration: I kept doing the same jobs over and over.
- Checking that we use dependency injection instead of ad-hoc mocks.
- Ensuring we write tests (and don’t weaken or remove them).
- Spotting code duplication.
- Enforcing the same quality rules that matter to me.
These aren’t glamorous tasks, but they’re critical. And they felt like the kind of things I should be able to delegate. That became the seed for my multi-agent workflow — a system where multiple agents, each representing a perspective I care about, collaborate to analyze features, design solutions, and create implementation plans.
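To make the first check concrete, here is a minimal sketch of what "dependency injection instead of ad-hoc mocks" can look like in Python. All names (`CommentSource`, `FakeCommentSource`, `CommentSummarizer`) are hypothetical and are not taken from my repository; the point is only that the test double is passed in through the constructor rather than patched in.

```python
from typing import Protocol


class CommentSource(Protocol):
    """The seam: anything that can return the comments of a PR."""

    def fetch(self, pr_number: int) -> list[str]: ...


class FakeCommentSource:
    """Hand-written fake for tests; no patching of internals required."""

    def __init__(self, comments: dict[int, list[str]]) -> None:
        self._comments = comments

    def fetch(self, pr_number: int) -> list[str]:
        return list(self._comments.get(pr_number, []))


class CommentSummarizer:
    """Receives its dependency through the constructor (dependency injection)."""

    def __init__(self, source: CommentSource) -> None:
        self._source = source

    def count_questions(self, pr_number: int) -> int:
        """Count comments that end with a question mark."""
        comments = self._source.fetch(pr_number)
        return sum(1 for text in comments if text.rstrip().endswith("?"))


fake = FakeCommentSource(
    {7: ["Why a new table?", "Looks good.", "Can we reuse the index?"]}
)
summarizer = CommentSummarizer(fake)
questions = summarizer.count_questions(7)
```

Because the summarizer only knows the `CommentSource` interface, a production implementation can replace the fake without touching the tests.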
The Agent Team
Instead of a single coding agent, I spun up four agents with distinct roles:
Fast Iterating Developer
- Goal: Ship small, working increments quickly; prefer straightforward paths to progress.
- Biases: Rapid feedback over polish; accepts tactical debt if it’s explicitly documented and ticketed.
- Responsibilities: Propose the minimal viable changes, outline a short iteration plan, flag any corners cut.
- Quality guardrails: Must honor existing interfaces and dependency injection boundaries, and pass current tests before proposing new ones.
- Link: Fast Iterating Dev prompt
Test-Conscious Developer
- Goal: Maintain and improve test rigor without slowing development to a crawl.
- Biases: Favors integration tests for behavioral guarantees; resists weakening or deleting tests without justification.
- Responsibilities: Identify risk areas, propose test additions (unit/integration), define acceptance checks and regression traps.
- Quality guardrails: Avoids mocking concrete implementations when DI makes seams available; documents coverage deltas.
- Link: Test-Conscious Dev Prompt
Senior Engineer
- Goal: Keep code simple and expressive; enforce consistency and thoughtful application of patterns.
- Biases: Favors clarity over cleverness; removes accidental complexity; pushes for consistent naming, module boundaries, and API shape.
- Responsibilities: Refactor proposals for readability/maintainability, highlight duplication, suggest idiomatic patterns already used in the repo.
- Quality guardrails: Upholds dependency injection over ad-hoc mocks; aligns proposals with existing conventions captured in the codebase analysis.
- Link: Senior Engineer prompt
Architect
- Goal: Preserve a coherent, scalable architecture; avoid the “bag of parts.”
- Biases: Extends the system minimally to fit new needs; discourages “just one more component” proliferation.
- Responsibilities: Validate boundaries, data flow, ownership, and failure modes; ensure new design fits long-term direction.
- Quality guardrails: Explicitly calls out coupling risks, migration/compatibility concerns, and cross-cutting concerns (observability, auth, config).
- Link: Architect prompt
Each one encodes a mindset I normally apply (or remind others to apply) during reviews — with particular emphasis on the aspects that are important to me, like dependency injection or avoiding test shortcuts.
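The four charters share the same shape (goal, biases, responsibilities, guardrails), so one way to keep them consistent is to treat a charter as data and render the prompt from it. The sketch below assumes a hypothetical `AgentCharter` dataclass and prompt layout; the actual prompts in my repository are free-form text and may be organized differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCharter:
    """Hypothetical container for one persona; field names mirror the list above."""

    name: str
    goal: str
    biases: str
    responsibilities: str
    guardrails: str

    def to_prompt(self) -> str:
        """Render the charter as a five-line system prompt."""
        return "\n".join(
            [
                f"You are the {self.name}.",
                f"Goal: {self.goal}",
                f"Biases: {self.biases}",
                f"Responsibilities: {self.responsibilities}",
                f"Quality guardrails: {self.guardrails}",
            ]
        )


architect = AgentCharter(
    name="Architect",
    goal="Preserve a coherent, scalable architecture.",
    biases="Extend the system minimally; discourage component proliferation.",
    responsibilities="Validate boundaries, data flow, ownership, and failure modes.",
    guardrails="Call out coupling risks, migration concerns, and cross-cutting concerns.",
)
prompt = architect.to_prompt()
```

Keeping the charter structured makes it easy to spot when one persona is missing a guardrail that the others have.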
Orchestrating the Workflow
The workflow is structured, but produces a surprisingly rich set of artifacts. Here’s how it works:
Codebase Analysis (Step 0): The Senior Engineer analyzes the codebase and produces a document that describes the directory structure, coding patterns, and conventions. This ensures every agent begins from the same shared context.
Analysis (Step 1): Each agent produces an analysis document. This results in:
- task_specification.md (feature description)
- task_metadata.json (if derived from a PRD)
- codebase_analysis.md (repository overview from Step 0)
- One analysis document per agent (developer, tester, senior, architect)
- context_pr.json (workflow state)
Design Consolidation (Step 2): The agents review each other’s analyses, highlight conflicts, and generate:
- consolidated_design.md (unified design)
- conflict_resolution.md (how disagreements were settled)
Design Finalization (Step 3): I review the consolidated design in a GitHub PR. My feedback is ingested and used to generate:
- finalized_design.md (production-ready design)
- feedback_incorporation_summary.md (what changed and why)
Ready for Development (Step 4): A bundle of documents (final design, codebase analysis, task spec, workflow context) becomes the blueprint for implementation.
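The steps above can be sketched as a small table of which step produces which artifact, plus a check for what is still missing before moving on. This is an illustration only: the `Step` type and both helper functions are hypothetical, the per-agent file names are made up (the list above only says there is one analysis document per agent), and the optional task_metadata.json is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    number: int
    name: str
    produces: tuple[str, ...]


# Hypothetical file names for the per-agent analysis documents.
AGENT_ANALYSES = tuple(
    f"{role}_analysis.md" for role in ("developer", "tester", "senior", "architect")
)

WORKFLOW = (
    Step(0, "Codebase Analysis", ("codebase_analysis.md",)),
    Step(1, "Analysis", ("task_specification.md", "context_pr.json") + AGENT_ANALYSES),
    Step(2, "Design Consolidation", ("consolidated_design.md", "conflict_resolution.md")),
    Step(3, "Design Finalization", ("finalized_design.md", "feedback_incorporation_summary.md")),
    Step(4, "Ready for Development", ()),  # bundles earlier artifacts, adds none
)


def artifacts_through(step_number: int) -> list[str]:
    """All artifacts that should exist once the given step has finished."""
    names: list[str] = []
    for step in WORKFLOW:
        if step.number <= step_number:
            names.extend(step.produces)
    return names


def missing_artifacts(step_number: int, existing: set[str]) -> list[str]:
    """Artifacts expected by the end of a step that are not present yet."""
    return [name for name in artifacts_through(step_number) if name not in existing]


# Example: Step 2 has written the consolidated design but not the conflict notes.
existing = set(artifacts_through(1)) | {"consolidated_design.md"}
missing = missing_artifacts(2, existing)
```

A check like this is a cheap guard against starting a step while an upstream document was never written.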
Here’s the document flow diagram:
The richness of this process surprised me — by the time development starts, I already have a codebase analysis, per-agent perspectives, a consolidated design, conflict notes, a feedback summary, and a finalized design doc.
The Glue: MCP + PR Feedback
This workflow wouldn’t work without my MCP server. It:
- reads PR comments, processes them, and feeds them back to the agents.
- posts replies directly into GitHub PRs.
- runs custom tools (e.g., parsing build logs, surfacing lint errors, or reading test outputs).
- manages state and context across steps.
Essentially, it acts as the orchestration backbone — ensuring agents don’t just generate documents in isolation, but participate in a continuous, feedback-driven process.
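As a rough idea of what a "surfacing lint errors" tool might do, the sketch below parses lines in the `path:line:col: CODE message` shape that flake8 prints by default and turns them into structured records an agent can act on. The names `LintIssue` and `parse_lint_output` are hypothetical, the sample log is invented for illustration, and this is not the code of my MCP server.

```python
import re
from dataclasses import dataclass

# Matches e.g. "app/store.py:12:80: E501 line too long (91 > 79 characters)"
LINT_LINE = re.compile(
    r"^(?P<path>[^:\s]+):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<code>[A-Z]+\d+) (?P<message>.+)$"
)


@dataclass(frozen=True)
class LintIssue:
    path: str
    line: int
    col: int
    code: str
    message: str


def parse_lint_output(log: str) -> list[LintIssue]:
    """Extract lint issues from a log, skipping lines that are not issues."""
    issues: list[LintIssue] = []
    for raw in log.splitlines():
        match = LINT_LINE.match(raw.strip())
        if match is None:
            continue
        issues.append(
            LintIssue(
                path=match["path"],
                line=int(match["line"]),
                col=int(match["col"]),
                code=match["code"],
                message=match["message"],
            )
        )
    return issues


# Invented sample output, mixed with unrelated log lines.
sample_log = """\
running lint...
app/store.py:12:80: E501 line too long (91 > 79 characters)
app/store.py:3:1: F401 'os' imported but unused
done
"""
issues = parse_lint_output(sample_log)
```

Handing agents structured issues instead of raw logs keeps the context small and removes one source of misreading.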
The Collaboration Problem
One challenge I haven’t solved is how to create a truly collaborative environment. In a real-world team, discussions are messy: people jump in with half-baked ideas, questions, and counterpoints in no particular order. I couldn’t figure out how to reproduce this with agents. Instead, my workflow is serialized: one agent at a time, in a fixed order. That comes with trade-offs:
- Bias: the first agent to write sets the tone for everyone else.
- Stopping criteria: when is enough iteration enough?
This serialization works, but it feels more rigid than human collaboration.
Context Engineering: The Real Challenge
Another hard problem was context engineering (see https://www.promptingguide.ai/guides/context-engineering-guide). The key wasn’t just writing prompts — it was carefully deciding what information each agent should see, when.
Some of the pitfalls I ran into:
- Agents generating giant design docs with entire chunks of code embedded.
- Agents producing inconsistent or even empty documents.
The coding agent that I used to build this workflow kept embedding concrete details (like class names or script names) directly into prompts — I had to keep reminding it: extract details from the repo, don’t hardcode them, keep it generic.
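Both pitfalls above (documents bloated with code, and documents that come back empty) can be caught by a cheap check before an artifact is passed to the next agent. The sketch below is one way to do that; the function names and the thresholds (`min_lines`, `max_code_fraction`) are hypothetical and would need tuning.

```python
FENCE = "`" * 3  # triple backtick, built here to keep this example readable


def code_fraction(markdown: str) -> float:
    """Share of non-blank lines that sit inside fenced code blocks."""
    in_fence = False
    code_lines = 0
    total = 0
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            in_fence = not in_fence  # fence markers themselves are not counted
            continue
        if not stripped:
            continue
        total += 1
        if in_fence:
            code_lines += 1
    return code_lines / total if total else 0.0


def check_document(
    markdown: str, min_lines: int = 5, max_code_fraction: float = 0.3
) -> list[str]:
    """Return a list of problems; an empty list means the document passes."""
    problems: list[str] = []
    non_blank = [line for line in markdown.splitlines() if line.strip()]
    if len(non_blank) < min_lines:
        problems.append("document is empty or too short")
    if code_fraction(markdown) > max_code_fraction:
        problems.append("too much embedded code")
    return problems


# A design doc where three of the five content lines are code.
bloated = "\n".join(
    ["# Design", "Some prose.", FENCE + "python", "x = 1", "y = 2", "z = 3", FENCE]
)
bloated_problems = check_document(bloated)
empty_problems = check_document("")
```

A failing check can trigger a retry with a sharper instruction instead of letting a bad document contaminate the next step.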
And one more practical problem: Sourcegraph AMP lets agents persist, but their local state is tied to the filesystem. I had to figure out how to:
- Start each agent in its own scratch location.
- Switch them into the actual codebase directory without losing context.
Sounds simple — but too often, the response I’d get was: “I can’t find any code.” Solving that required a surprising amount of trial and error.
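One way to think about the two steps is to track both locations explicitly and to hand the agent an absolute repository path when switching, so nothing depends on where the process happened to start. The sketch below only models that bookkeeping. `AgentSession`, `start_session`, and `switch_to_repo` are hypothetical names, nothing here calls or describes the Sourcegraph AMP API, and whether an explicit path is enough to avoid the "I can't find any code" response is an assumption.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class AgentSession:
    """Hypothetical record of where an agent started and where the code lives."""

    name: str
    scratch_dir: Path
    repo_dir: Optional[Path] = None


def start_session(name: str) -> AgentSession:
    """Give each agent its own freshly created scratch directory."""
    scratch = Path(tempfile.mkdtemp(prefix=f"agent-{name}-"))
    return AgentSession(name=name, scratch_dir=scratch)


def switch_to_repo(session: AgentSession, repo_dir: Path) -> str:
    """Record the repository path and build an instruction that names it."""
    resolved = repo_dir.resolve()
    if not resolved.is_dir():
        raise FileNotFoundError(f"repository directory not found: {resolved}")
    session.repo_dir = resolved
    return (
        f"The code lives in {resolved}. "
        "Use absolute paths under that directory for all file operations."
    )


session = start_session("architect")
instruction = switch_to_repo(session, Path("."))
```

Validating the directory before sending the instruction at least turns a vague agent complaint into an immediate, explicit error.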
A Concrete Example
To test the workflow, I used a relatively simple feature: store in a database whether I already replied to a PR comment (to avoid duplicate replies).
Here’s how the agents responded:
- Fast Developer: proposed a quick schema and plan.
- Test-Conscious Developer: raised consistency concerns and suggested integration tests.
- Senior Engineer: simplified the schema, reduced unnecessary complexity, and enforced consistency.
- Architect: confirmed the new table aligned with the broader DB model.
The final design was more balanced and higher quality than any single agent’s output — and I could trace its evolution across the analysis, consolidation, and finalization stages.
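For a sense of the feature's size, here is a minimal sketch of recording replied-to comments with Python's built-in `sqlite3` module. The table name `comment_replies`, its columns, and the helper functions are hypothetical; this is not the schema the agents settled on.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and make sure the (hypothetical) table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comment_replies ("
        " comment_id INTEGER PRIMARY KEY,"
        " replied_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def mark_replied(conn: sqlite3.Connection, comment_id: int) -> bool:
    """Record a reply; return False if this comment was already recorded."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO comment_replies (comment_id) VALUES (?)",
        (comment_id,),
    )
    conn.commit()
    return cursor.rowcount == 1  # 0 rows changed means the insert was ignored


def has_replied(conn: sqlite3.Connection, comment_id: int) -> bool:
    """Check whether a reply to this comment has been recorded."""
    row = conn.execute(
        "SELECT 1 FROM comment_replies WHERE comment_id = ?", (comment_id,)
    ).fetchone()
    return row is not None


conn = open_store()
first_attempt = mark_replied(conn, 101)   # recorded
second_attempt = mark_replied(conn, 101)  # duplicate, ignored
```

Using the comment ID as the primary key lets the database enforce "at most one reply" instead of relying on a check-then-insert sequence in application code.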
Artifacts
Some of the most useful artifacts that emerge from this workflow:
- Agent Prompts — charters defining each persona.
- Workflow Diagram — the document flow in Mermaid.
- Example Outputs — individual analysis vs. consolidated design.
All the code for this workflow is here: 👉 github.com/MarksStuff/github-agent/tree/main/multi_agent_workflow
Reflections
This multi-agent workflow is still experimental, but I’ve learned a few things:
- Agents focused on aspects you care about (dependency injection, testing discipline, scalability) can save you from repeating the same checks.
- Human in the loop is critical — PR comments act as arbitration between agents.
- Serialized workflows are a compromise — structured, but biased by order.
- Context engineering is the real bottleneck — harder and more fragile than prompt writing.
- State management is tricky — keeping agents “alive” while pointing them at the real codebase isn’t trivial.
Closing
I don’t think the future of AI in coding is “one agent that does everything.” It’s multi-agent workflows: diverse perspectives, structured collaboration, and better outcomes than any single agent can produce, similar to how we humans build software.
This workflow is my first attempt at that. It’s messy and imperfect, but it’s working.
👉 I’d love to hear: if you built your own multi-agent workflow, what roles did you use? And did you find a way to make it truly collaborative?



