First, some context. FirstRoot is a fantastic collection of hard-working, agile developers who genuinely believe in our mission: creating greater financial literacy and civic engagement through Participatory Budgeting in schools. As I’ve documented in substantial detail on Scaled Agile’s website, FirstRoot is using SAFe® to guide our efforts and prepare our company for scale.
No, We’re Not Perfect
All that said, we’re not perfect. No agile team or SAFe Agile Release Train (ART) is. Sometimes teams tackle backlog items that turn out to be a lot harder than expected or estimated. Not surprisingly, this is often associated with architectural changes and/or new technology, where the team has little experience to guide critical decisions. And sometimes, precisely because these same teams are composed of committed, caring, responsible developers, they simply ‘dig in.’ They work even harder, but then find it difficult to change course or ask for help.
So, I want to share a series of events about a backlog item that derailed us and how we’ve come together to get back on track. This article will reveal:
- What happens when a refactor unexpectedly becomes a re-architecture
- How to differentiate between learning curves and real trouble ahead
- When to pull the Andon Cord
- How we responded, refining or overhauling many of our practices
- How we regrouped, improved communication, and embraced the new normal
- What high-performance athletes have in common with good software dev teams
My hope is that this helps everyone remember that #SAFe is not just about delivering a reliable stream of value. It also provides the tools and practices that teams can leverage to get a derailed value-delivery train or ART back on track.
What derailed our train?
Our super-focused, fast-moving train of awesomeness got derailed. What happened? It’s a familiar, age-old story in software development: What started as a ‘refactoring’ turned into a ‘re-architecture.’ And although the plot is familiar, I’m happy that SAFe gives us a chance to transform a potential tragedy into just another agile-life vignette.
Our story starts with Flutter, the tech we have chosen for our client. Although it had been working well, the reality is that Flutter development was new to our team. But that was OK: They’d proven they had the experience to rapidly learn Flutter and deploy a fantastic app to our customers in record time (even by agile-team standards)!
As our app has matured, we realized that some of our earlier design choices were no longer meeting the needs of future requirements. This is NOT technical debt. This is LEARNING. And yeah, I do RANT against organizations framing legitimate learning and evolution as technical debt. It isn’t. But that story is not this story.
The team developed a plan (of course, we plan in agile) and set to work. We decided that most of the work would be done by one member of the team, with support from others. While we normally prefer the entire team to be working together to deliver a Feature, this kind of refactoring is often done alone, with frequent check-ins. (Like ice cream, pairing is good, except when it isn’t.)
The first sign of trouble was a stream of missed deliverables. The second was the solo developer working more hours than normal. A sprint review helped us realize that there was a bigger problem. The tell was the entire team spending a good 10+ minutes debating the implementation of a non-critical Feature because the re-architecture blocked other, higher-priority Features.
It was time to pull the agile Andon Cord.
How did we respond?
We realized that what started as a client-side refactor had morphed into a full-blown re-architecture. So, we responded by pulling the agile Andon Cord. What did that look like?
We stopped development on anything unrelated to the re-architecture.
This was the simplest, most obvious, and most important decision. Get everyone to stop talking and start behaving in sync with the concept, ‘Stop Starting. Start Finishing.’
We restructured tasks to allow every member to work on the re-architecture.
That may seem odd, as the server team doesn’t know Flutter as well as the client team does. And while you never want to break Brooks’s Law, “adding human resources to a late software project makes it later,” sometimes you need to bend it in your favor. So, to help the team become productive, our main developer focused on increasing the documentation of the existing and proposed solutions and helping others get up to speed (think: lots of diagrams and ‘how to build this’ conversations).
We changed our meeting rhythm, adding more meetings to maintain communication.
A natural response to stress is to communicate less frequently. In a time of crisis it is critical to break this pattern and communicate more often. So, we decided not to just change the daily stand-up (DSU) during a time of crisis. We threw it out and established an entirely new, more frequent sequence of meetings. Corollary: Make sure your team reviews and confirms your trunk-based dev practices.
We re-architected our test infrastructure and tests.
When you decide to re-architect and reset your architectural runway, remember to consider re-architecting your test infrastructure, too. Sometimes you’ll be OK with your existing test infrastructure. Most of the time, you’ll find that re-architecting your test infrastructure will accelerate the re-architecture.
We decided not to re-estimate until we were truly ready to.
Because markets define optimal release windows for products and services, we are date-driven agilists. That means we need delivery estimates to manage the business, except when estimates don’t help. And in the middle of a refactor, estimates are not helpful because the team can’t meet them. This is where #SAFe Lean Portfolio Management (LPM) shines: Funding dev value streams frees us from project-cost funding. We’ll create estimates again when we’re ready.
Where are we now?
Good news! The concrete actions we took to get our train back on track are working.
We started with the most basic step: confirming that everyone on the team can build and run the client on a full-stack, local instance. As expected, this helped shake out a few assumptions while improving our documentation and creating a foundation for shared work. (Can every member of your dev team start from scratch and build a working system?)
We then reframed the tasks so that every team member could contribute to the goal of the re-architecture. In the process, we were able to identify and more fully leverage some team members’ experience. This feels like adding a new superpower!
We re-learned the important lesson: Premature optimization is the root of all evil. Although I dislike using the word ‘evil,’ we had to admit that some of the problems we experienced were self-inflicted! We made some premature optimization choices. By removing them in the re-architecture, we’re now creating something leaner, faster, better.
As a result, we’re closer to a strategic goal of open-sourcing our client application. While not the purpose of this re-architecture, we’re glad it is helping us realize this.
Lastly, we’re increasingly confident that Flutter remains the right choice for our client-side development and that we’re going to realize substantial development benefits in the future.
Recovery and the launching of a new normal
Now that our train is back on track to achieve greatness in schools, what’s next?
We could just celebrate the re-architecture, return to our normal sprint structures, and keep going. To understand why this is a bad choice, I have to share some insights from my experience as a high-performance athlete. Most athletes prepare for a significant competition through a series of consistent training programs and smaller competitions. The SAFe equivalent would be a major release being (partially) created through a series of minor updates or features.
What’s interesting is what happens after the significant competition. You might think that athletes simply stop, that we take a long break. As a former world-class athlete, I can assure you that’s pretty rare. Oh sure, we may take a week or so off to recover mentally and physically. But we’re not going to be completely idle for weeks or months. Actually, it’s bad to ‘hard stop’ a body trained for high performance.
Instead, elite athletes take a break and then return to training. Not as intensely, and typically with a bit more playfulness. But we return to the gym and prepare for the next significant competition.
As of TODAY, our agile team is well on its way to completing a significant re-architecture. We have at least another week of work, maybe two. When finished, we need to close the work with a ceremony, such as a lessons-learned retrospective, then move into a recovery phase. Yes, the team will come to work, and yes, they’ll enjoy the benefits of their new architecture. But we won’t hold them to quite the same expected velocity. We’ll be giving them time to grow into their new normal. And then we’ll watch as velocity naturally improves because the system has been improved.
So, there you have it: Agile doesn’t mean you’re perfect. Instead, it means that you acquire the data needed to make better decisions. Sometimes a refactor becomes a re-architecture. And if you do pull the Andon Cord, you need to focus on ONE THING. When you’re finished, close, recover, and return. If you’ve done it well, the new system will create a new velocity of greatness.