Visualizing Refactors, Rewrites, and Software Evolution

We’ve all been there. You inherit a codebase that’s horribly architected, riddled with technical debt, and coughs up blood if you so much as look at it. Or perhaps there’s still life in your software, but it needs some heavy physical therapy and medication to get healthy again. In either situation, the words refactor and rewrite will inevitably surface. What causes a refactor or rewrite? Are refactors always bad? What happens if large refactors are needed but never get done? How can big refactors and rewrites be avoided in the first place?

Refactors and Rewrites

Before we begin, let’s define the terms refactor and rewrite precisely. A refactor is a modification to a troubled codebase or module that repairs major portions of the software in order to move it closer to an ideal state. A rewrite, on the other hand, means throwing the entire codebase or module into the trashcan and creating a new one from scratch. If your codebase were a car, replacing the transmission and painting it a different color would be a refactor. Crushing the car into a tiny metal cube and then buying a new car would be a rewrite.

Obviously, neither of these scenarios brings new value to customers and users, so managers and product folks tend to see them as highly undesirable changes. Hence, a conflict of interest inevitably arises: engineers might lobby for a refactor or rewrite, while managers and product folks might lobby for salvaging the current codebase. What is the right decision? To answer this question, we first need to understand the different types of codebases and codebase states.

Landfills and Amoebas

In my experience, there are two types of codebases: Landfills and Amoebas.

The first, and more common, type is the Landfill codebase. A Landfill codebase is built over time by piling new code on top of existing code, whether that existing code is good or bad, and repeating the process until you have layers upon layers of fragility. Eventually, the Landfill codebase becomes so massive and so fragile that it is nearly impossible to build anything new on top of it without major portions collapsing in on themselves. At this point, the team decides to rewrite the system and create a newer, better world. That is, of course, until the newer, better world becomes a pile of debris as well, and the cycle continues. A Landfill codebase is typically the result of the absence of a strong architect on the team, or simply of a high volume of bad code changes streaming into it faster than they can be repaired, i.e., an avalanche of technical debt. Landfill codebases typically last for six months to a year, sometimes a little longer.

The second type of codebase, which is far rarer, is the Amoeba codebase. An Amoeba codebase constantly evolves in response to changing conditions, and therefore stays in a perpetually healthy state. Rather than piling new code on top of existing code to add functionality, or piling on conditional statements to handle edge cases, the team changes the architecture itself as needed. This is achieved through many incremental improvements and refactors throughout the software life cycle. This is the type of codebase that engineers dream of working on; it is the Mecca of codebases. An Amoeba codebase is typically built by a team that is led by a strong architect and rarely introduces technical debt. Amoeba codebases can last for many years.
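The contrast between piling on conditional statements and changing the structure itself can be illustrated with a small, hypothetical Python sketch. It is not taken from any real codebase: the shipping-cost domain, the function names, the `Rate` type, and all the numbers are invented for illustration. The first function grows Landfill-style, with each new edge case bolted on as another branch; the second shows one incremental refactor in the Amoeba spirit, where the structure changes so that a new case becomes a new data entry rather than another `elif`.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical example: every name and rate here is invented for illustration.


# Landfill style: each new requirement is piled on as another branch.
def shipping_cost_landfill(region: str, weight_kg: float, express: bool) -> float:
    if region == "domestic":
        cost = 5.0 + 1.0 * weight_kg
        if express:
            cost += 10.0
    elif region == "international":
        cost = 15.0 + 2.5 * weight_kg
        if express:
            cost += 25.0
    elif region == "remote":  # edge case bolted on later
        cost = 20.0 + 3.0 * weight_kg
        if express:
            raise ValueError("express not available for remote regions")
    else:
        raise ValueError(f"unknown region: {region}")
    return cost


# Amoeba style: the structure changes so that each region is described by data.
@dataclass(frozen=True)
class Rate:
    base: float
    per_kg: float
    express_surcharge: Optional[float]  # None means express is unavailable


RATES: Dict[str, Rate] = {
    "domestic": Rate(base=5.0, per_kg=1.0, express_surcharge=10.0),
    "international": Rate(base=15.0, per_kg=2.5, express_surcharge=25.0),
    "remote": Rate(base=20.0, per_kg=3.0, express_surcharge=None),
}


def shipping_cost_refactored(region: str, weight_kg: float, express: bool) -> float:
    rate = RATES.get(region)
    if rate is None:
        raise ValueError(f"unknown region: {region}")
    cost = rate.base + rate.per_kg * weight_kg
    if express:
        if rate.express_surcharge is None:
            raise ValueError(f"express not available for {region} regions")
        cost += rate.express_surcharge
    return cost
```

Both functions return the same results; for example, `shipping_cost_refactored("domestic", 2.0, True)` gives 17.0 (5.0 + 1.0 * 2.0 + 10.0). The refactor changes the structure without changing behavior, and adding a new region now means adding one `Rate` entry instead of another branch.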

The Healthy Codebase

[Diagram: a healthy codebase]

Here is a diagram that illustrates a healthy codebase state. The ideal state is the dotted line at the top. For a healthy codebase to even be possible, the team must have a strong architect on board who knows what the ideal state should be. Although this state can never actually be reached, because it represents the perfect codebase, teams should strive to keep the gap between the ideal state and the actual state as small as possible. In a healthy codebase, continual improvements and small refactors keep that gap small.

The Suffering Codebase

[Diagram: a suffering codebase]

If development of your codebase feels rocky and taxing, you are most likely dealing with a suffering codebase. Suffering codebases typically have a competent architect who knows what the ideal state should be, but for one reason or another, the gap between the ideal state and the actual state grows fairly large during the development life cycle. This can happen if product requirements are unstable and change drastically within short periods of time; if the team consists mostly of junior developers, so that the volume of low-quality code outweighs the volume of high-quality code; or if the team lets more technical debt into the system than it can realistically pay down later. In all of these scenarios, the team is forced to make drastic refactors in an effort to save the codebase from a rewrite.

The Dead Codebase

[Diagram: a dead codebase]

Here we have a dead codebase, the final chapter of a Landfill. This type of codebase typically results from the absence of an architect, or of a competent one. In this scenario, the team as a whole either doesn’t know what the ideal state should be or doesn’t know how to get there. Thus, over time, the codebase slowly drifts toward an increasingly unhealthy state, and the gap between the ideal state and the actual state keeps growing. Eventually, the codebase becomes so disjointed, fragile, and unmaintainable that the team is forced to throw the whole thing away and start over, i.e., a rewrite. Without a competent architect, the cycle will inevitably repeat.

Tips for a Healthy Codebase

  • have at least one competent architect on the team
  • continually imagine what the ideal state should be
  • senior developers should outnumber junior developers
  • religiously resist technical debt if at all possible
  • constantly make small improvements and small refactors to close the gap between the ideal state and the actual state

Eric Rowell

Hi there, my name's Eric Rowell. I'm the founder of Coder Lifestyle, founder of Html5CanvasTutorials, the creator of the KineticJS library, author of HTML5 Canvas Cookbook, and the creator of BigOCheatSheet. I've worked in the tech industry for about five years at Yahoo, LinkedIn, and Platfora, and I've also worked in the transportation industry for two years at BNSF Railway. I'm currently leading the data visualization efforts at Platfora.
