On the different nature of work
By failing to prepare, you are preparing to fail. — Benjamin Franklin
A few days ago I was talking with another CTO about how they work in his organization: what difficulties they find with Scrum and how they could solve them.
He told me that they were trying a mix of Scrum and Kanban. They made this decision because urgent work could come up in the middle of a Sprint, so they implemented a pipeline to track this unplanned work and prioritize it constantly.
However, there was a trade-off. Their developers were frustrated with the constant change of tasks. They never got a feeling of fulfillment at the end of the Sprint: more tasks were added, while the tasks they had originally planned were never started.
Scrum proposes working in small iterative cycles, called Sprints. A Sprint usually lasts two weeks, but that can change depending on the team and organization.
One of the benefits of working in these small cycles is being able to reprioritize the work every time a new cycle starts. This reduces the risk of investing time in things that aren't relevant anymore.
This view is in opposition to the Waterfall methodology, which proposes long cycles of work, usually quarters or years. When teams finally deliver what they originally planned, they often realize it wasn't what customers expected.
But working in two-week Sprints doesn't prevent unplanned, urgent work from emerging in the middle of one. New reactive tasks can appear at any time:
- A bug with huge impact might have appeared after a recent release.
- A small improvement might be requested by an important customer.
- An operational issue on the infrastructure might be causing downtime on the platform.
- A small change might be requested by the marketing team for their new campaign launch.
- Etc.
All these tasks can arrive without warning. Some of them can be delayed to the next Sprint; others, sadly, can't.
Scrum, as it is commonly practiced, has a way to handle this problem: every task is estimated in story points. The planned tasks are estimated during Sprint planning, and the reactive ones as soon as they appear. So if you add a new reactive task to the Sprint plan, you should remove another one of similar size, ideally keeping the "velocity" constant. The velocity is the total number of story points a team can deliver in each Sprint. I have a lot of criticisms of story points and of this velocity metric, but that is a topic for another time.
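To make the swap concrete, here is a minimal sketch in Python. The `Task` class, the `swap_in` helper, the task names and the point values are all hypothetical, made up for illustration; real teams usually make this call in conversation rather than by formula.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    points: int            # story point estimate (hypothetical values)
    reactive: bool = False


def swap_in(sprint, urgent):
    """Add an urgent task and drop the planned task closest to it in size."""
    planned = [t for t in sprint if not t.reactive]
    if not planned:
        # Nothing left to trade away: the Sprint simply grows.
        return sprint + [urgent]
    victim = min(planned, key=lambda t: abs(t.points - urgent.points))
    return [t for t in sprint if t is not victim] + [urgent]


sprint = [
    Task("checkout redesign", 8),
    Task("export to CSV", 3),
    Task("audit log", 5),
]
updated = swap_in(sprint, Task("hotfix: login bug", 3, reactive=True))

total_before = sum(t.points for t in sprint)
total_after = sum(t.points for t in updated)
print(total_before, total_after, [t.name for t in updated])
```

In this example the total stays the same because a planned task of equal size exists. When no close match exists, the Sprint total drifts, which is one reason the velocity only stays constant "ideally".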
Still, we have another problem. If we add and remove tasks during a cycle, we are in constant reprioritization.
Constant reprioritization leads to what are known in computing as scheduling problems. If the team only pays attention to the reactive work, the originally planned, strategically important work for the business never gets done. This is called "resource starvation".
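The starvation effect can be shown with a toy strict-priority scheduler. This is only a sketch under loud assumptions: every task costs exactly one unit of capacity, urgent work always goes first, and the `simulate` function and all its numbers are invented for the example.

```python
from collections import deque


def simulate(days, capacity_per_day, reactive_per_day, planned_backlog):
    """Count how many planned tasks get done when urgent work always wins."""
    reactive = deque()
    planned = deque(range(planned_backlog))
    planned_done = 0
    for _ in range(days):
        # New urgent tasks arrive at the start of each day.
        reactive.extend(["urgent"] * reactive_per_day)
        for _ in range(capacity_per_day):
            if reactive:
                reactive.popleft()      # urgent work is always served first
            elif planned:
                planned.popleft()       # planned work only gets leftover capacity
                planned_done += 1
    return planned_done


starved = simulate(days=10, capacity_per_day=3, reactive_per_day=3, planned_backlog=20)
healthy = simulate(days=10, capacity_per_day=3, reactive_per_day=1, planned_backlog=20)
print(starved, healthy)
```

When urgent arrivals match the team's capacity, the planned queue never moves at all. With fewer arrivals it drains normally.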
Another problem is that the planned work is usually very different in nature from the reactive work. Planned work takes more thinking time, as it usually involves bigger and more complex features. Different questions (business, technical, design, etc.) need to be answered before the team can start building it.
The reactive tasks are of a different species. They don't require too many decisions. A bug might need some deep research. But once you find the root cause, you just have to decide how to solve it and then implement it. Bonus points if you also implement an automatic regression test.
This difference in the nature of the tasks causes a constant context-switching problem for the team. And the more complex the planned work, the more time is wasted on each context switch.
Sadly, these problems don't get solved with the Kanban methodology. If we, as a thought experiment, reduce the Scrum Sprint cycle to the minimum (hourly, daily, etc.), then we are in a Kanban style of work. But we are still mixing tasks of a different nature, with the same constant reprioritization and context switching.
One valid objection is that if a task is urgent, then it is more important than what was originally planned. But urgent and important are not the same thing: some work matters to the business in the (not so) long term, even though urgent things need to be solved in the short term too.
If you can't allocate time to what's important, it will never happen. The strategic work gets drowned out by the urgent floods.
My suggestion to the CTO for solving this problem was to divide the team into two tracks.
One track has a squad assigned to the proactive, planned work only. This squad can use whatever methodology it wants. I prefer Shape Up to Scrum, but the important thing is that they only work on the strategic projects.
The other track has a squad assigned to the reactive work only. This track should work with a Kanban methodology and constant reprioritization. The frequency of this reprioritization depends on the organization.
The two squads can rotate their people after each cycle, if needed, so that nobody gets bored working only on bugs and small improvements.
How you distribute people between the squads depends heavily on your business, on how much reactive work you have during each cycle, and on how much proactive work you need to tackle to meet the business expectations. This distribution can also change with each cycle; it shouldn't be rigid.
With this structure you ensure that each cycle the team will be working on what is strategically important for the business, and you will also have capacity to attack the reactive tasks that can emerge.
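As a rough illustration of the arrangement, the toy model below gives each squad its own fixed capacity. It is a sketch, not a planning tool: the `two_track` function, the one-unit task size and every number are assumptions invented for the example.

```python
def two_track(days, proactive_capacity, reactive_capacity, reactive_per_day):
    """Simulate two squads with separate, fixed daily capacities.

    Returns (planned tasks done, reactive tasks still waiting, idle reactive slots).
    """
    planned_done = 0
    reactive_backlog = 0
    idle_slots = 0
    for _ in range(days):
        # The proactive squad is never interrupted by urgent work.
        planned_done += proactive_capacity
        # The reactive squad handles as much urgent work as it can.
        reactive_backlog += reactive_per_day
        handled = min(reactive_backlog, reactive_capacity)
        reactive_backlog -= handled
        idle_slots += reactive_capacity - handled
    return planned_done, reactive_backlog, idle_slots


busy = two_track(days=10, proactive_capacity=2, reactive_capacity=1, reactive_per_day=3)
quiet = two_track(days=10, proactive_capacity=2, reactive_capacity=1, reactive_per_day=0)
print(busy, quiet)
```

In both runs the planned work advances at the same pace. In the busy run the reactive backlog grows, which suggests the split should be resized for the next cycle. In the quiet run the reactive squad has idle slots.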
There are two valid objections that quickly come to my mind with this two-track arrangement.
The first one is that if you don't have any reactive work during the cycle, the time of the squad pre-allocated for the reactive work will be wasted.
But this is a trade-off.
The pre-allocation can be seen as slack or as insurance. The slack capacity is the cost you pay so that your proactive squad can work smoothly. It is similar to your car or home insurance: if something happens, you are glad you paid for it.
Besides, the reactive squad can work on minor refactors and minor improvements when they don't have anything urgent to work on.
The other valid objection, which I discussed with the other CTO, is that sometimes the developer who has all the domain knowledge needed to solve a bug or another reactive task is assigned to the proactive squad, not the reactive one.
My point here is that the reactive squad isn't working alone. They are part of the team, and they can have a short call or meeting with the domain expert to guide them on how to solve the problem.
This isn't a huge cost in context switching for the domain expert. There are more things that are involved in building software than just thinking about what the solution should be (altering the code, submitting the PR, testing, etc.). Defining the solution is a small part. Having the ownership and ensuring that the task gets done is where the effort resides. And that's the responsibility of the developer working in the reactive squad, not the external domain expert. Also, there is a plus, as this is an opportunity for the domain experts to transfer their knowledge.
Running these two tracks in parallel is the way we work in my organization.
We implement this dual track with a reactive squad, which we call the "pivot role", that rotates each cycle. And we have the proactive squads, each assigned to a project that they deliver with the Shape Up methodology.
I don't see this reactive squad as a waste. My personal view is that it is more than insurance. It's the defensive wall that protects the strategic work, which is important for customers and the business, from disruptions that prevent it from getting done.