DEMICON Insights

Mastering large-scale refactorings: How Atlassian uses AI with Rovo Dev for code cleanup

Written by DEMICON | Jan 8, 2026 11:49:33 AM

Anyone who has ever had to clean up feature flags or obsolete code components at scale in a monorepo architecture knows that this is not a task to be done on the side. It requires a clear strategy, automation - and in Atlassian's case: the targeted use of artificial intelligence with Rovo Dev.

In this article, we, as an Atlassian partner, provide an insight into how Atlassian engineers successfully implemented a complex refactoring project with the help of AI-driven workflows - including lessons learned for anyone planning something similar.

 

1. Challenge: Cleaning up feature flags on a large scale

In the course of introducing a new navigation system ("Nav4"), numerous feature flags (FGs) and mocks had accumulated in Atlassian's frontend monorepo, spread across more than 100 packages and 1,400 files. The goal: remove all FGs and mocks completely, with no loss of quality and no added risk.

The challenge:

  • Heterogeneous tools and package structures
  • Different syntax variants (see the example below)
  • Technical dependencies and legacy issues
  • Minimum downtime, maximum code quality
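
To give a feel for what such syntax variants look like in practice, here is a small, purely illustrative sketch: the flag name, the isFlagEnabled helper and the test mock are hypothetical stand-ins, not Atlassian's actual APIs.

```typescript
// Purely illustrative: three syntax variants of the same feature flag check
// that a large-scale cleanup has to recognize. Helper and flag names are hypothetical.

const enabledFlags = new Set<string>(['nav4-sidebar']);
const isFlagEnabled = (name: string): boolean => enabledFlags.has(name);

// Variant 1: guard clause inside a function
export function getSidebar(): string {
  if (isFlagEnabled('nav4-sidebar')) {
    return 'new-sidebar';
  }
  return 'old-sidebar';
}

// Variant 2: ternary expression at module level
export const sidebar = isFlagEnabled('nav4-sidebar') ? 'new-sidebar' : 'old-sidebar';

// Variant 3: a test mock that forces the flag on, as often found in unit tests
export const isFlagEnabledMock = (_name: string): boolean => true;
```

A cleanup that only matches one of these patterns leaves residue behind in the other two, which is exactly what the discovery step described in the next section is meant to prevent.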

 

2. The workflow: How AI was integrated in a meaningful way

The Atlassian teams relied on an iterative approach with built-in safeguards:

  • First, discovery prompts were used to systematically identify all FG and mock occurrences (see the sketch after this list).
  • A memory file provided the AI (Rovo Dev) with the necessary context for targeted changes.
  • Based on this context, Rovo Dev generated automated suggestions for code cleanup - from bash scripts to pull requests.
  • The changes were tested in small packages, refined and rolled out step by step - including human code reviews.
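
As a rough illustration of what the discovery step produces, the following sketch walks a packages/ directory and builds a per-package inventory of remaining flag references and test mocks. The directory layout, the isFlagEnabled helper from the earlier example and the regex patterns are assumptions; Atlassian's actual discovery was driven by prompts to Rovo Dev rather than by this exact script.

```typescript
// Sketch of a discovery pass: walk the monorepo, record every file that still
// references the flag check or its test mocks, grouped by package.
// The patterns and the packages/<name>/... layout are assumptions.
import { readdirSync, readFileSync, statSync } from 'node:fs';
import { join } from 'node:path';

const PATTERNS = [/isFlagEnabled\(\s*['"]nav4-sidebar['"]/, /isFlagEnabledMock/];

function walk(dir: string, hits: Map<string, string[]>): void {
  for (const entry of readdirSync(dir)) {
    const path = join(dir, entry);
    if (statSync(path).isDirectory()) {
      if (entry === 'node_modules') continue; // skip installed dependencies
      walk(path, hits);
    } else if (/\.(ts|tsx)$/.test(entry)) {
      const source = readFileSync(path, 'utf8');
      if (PATTERNS.some((pattern) => pattern.test(source))) {
        // Group hits by package, assuming a packages/<name>/... layout.
        const pkg = path.split('/').slice(0, 2).join('/');
        hits.set(pkg, [...(hits.get(pkg) ?? []), path]);
      }
    }
  }
}

const hits = new Map<string, string[]>();
walk('packages', hits);
for (const [pkg, files] of hits) {
  console.log(`${pkg}: ${files.length} file(s) still reference the flag`);
}
```

An inventory like this is exactly the kind of structured backlog that the best practices below recommend producing before any automated change is made.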

 

3. Technical building blocks: Rovo Dev, prompts, memory files

Rovo Dev is Atlassian's AI coding agent, characterized by the following capabilities:

  • Code processing across multiple languages and tools (Bash, Python, APIs, Git)
  • Analysis of error messages and suggestion of concrete fixes
  • Reusable transformations at package level
  • Support for parallel refactorings in multiple DevBoxes

The combination of persistent AI context (the memory file) and precise prompts made it possible to work reliably, efficiently and without "AI hallucinations".
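
To illustrate what a reusable, package-level transformation can look like, here is a minimal codemod sketch based on jscodeshift. It replaces calls to the hypothetical isFlagEnabled('nav4-sidebar') check with true so that the now-dead branch can then be removed; the helper name, flag name and parser details are assumptions, not Atlassian's actual implementation.

```typescript
// Sketch of a reusable, package-level transformation: a jscodeshift codemod that
// replaces the hypothetical isFlagEnabled('nav4-sidebar') call with `true`,
// so the dead else branch can be cleaned up afterwards.
import type { API, FileInfo } from 'jscodeshift';

export default function removeNav4Flag(file: FileInfo, api: API): string {
  const j = api.jscodeshift;
  const root = j(file.source);

  root
    .find(j.CallExpression, {
      callee: { type: 'Identifier', name: 'isFlagEnabled' },
    })
    .filter((path) => {
      const arg = path.node.arguments[0];
      // Only touch the one flag this codemod is responsible for.
      return arg != null && arg.type === 'StringLiteral' && arg.value === 'nav4-sidebar';
    })
    .replaceWith(() => j.booleanLiteral(true)); // flag is fully rolled out

  return root.toSource();
}
```

A codemod like this can be applied package by package (for example with jscodeshift -t remove-nav4-flag.ts packages/<pkg>/src --extensions=ts,tsx), which fits the iterative, small-batch rollout described above.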

 

4. What worked - and what didn't

What went well:

  • Systematic mapping of FG variants with discovery prompts
  • Reproducible package-based transformations
  • Iterative expansion of the solution steps ("build simple → refine → scale")
  • Parallelization through DevBoxes & Rovo Dev

What did not work:
A purely "AI → Script → Auto-Cleanup" approach without human intervention led to erroneous changes. Code quality could only be ensured through targeted review and correction loops.

 

5. Best practices for safe, automated refactoring

  1. Proceed iteratively: Small packages, frequent CI runs, close feedback.
  2. Define AI context precisely: Memory files as a central source of knowledge.
  3. Identify dead code in a targeted manner: AI can recognize connections that simple scripts miss.
  4. Safety first: Type checks, integration tests and manual reviews remain mandatory (see the sketch after this list).
  5. Discovery first: Transfer unclear tasks into a structured backlog before automating.
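
As a minimal sketch of what "safety first" can look like in an automated loop, the script below runs a type check and the tests for every package touched by a cleanup before the change goes to human review. The package names, the yarn workspaces setup and the 'typecheck'/'test' script names are assumptions for illustration.

```typescript
// Sketch of the "safety first" loop: run a type check and the tests for every
// package touched by the cleanup before the change goes to human review.
// Package names, the yarn workspaces setup and script names are assumptions.
import { execSync } from 'node:child_process';

const changedPackages = ['@demo/navigation', '@demo/sidebar']; // e.g. derived from a git diff

let failures = 0;
for (const pkg of changedPackages) {
  for (const script of ['typecheck', 'test']) {
    try {
      execSync(`yarn workspace ${pkg} run ${script}`, { stdio: 'inherit' });
    } catch {
      console.error(`${pkg} failed '${script}', sending it back into the correction loop`);
      failures += 1;
    }
  }
}

process.exit(failures > 0 ? 1 : 0);
```

Anything that fails here goes back into a correction loop instead of straight into a pull request.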

 

6. Conclusion: AI as a partner in the engineering process

The greatest added value of AI in large-scale refactoring does not lie in complete automation - but in intelligent assistance:

  • Faster identification of legacy issues
  • Greater consistency in code changes
  • Significantly reduced manual effort
  • Improved reusability of patterns

Atlassian has shown: With the right setup, AI becomes a reliable sparring partner - even for demanding cleanups.

 


 

Frequently asked questions about the use of AI for large-scale refactoring