I might drink this beer


I am a jack of some trades and definitely a master of none. That doesn't mean I haven't had some experience and a handful of opinions to go with it. All of the opinions expressed here are my own and do not reflect the views of my employer


21 March 2022

Fragile to Agile: What to do

by Mark J Menger

“The future ain’t what it used to be.”

-Yogi Berra

Real progress in any technical field requires advancements that include people, processes, and technology. The golden triad rarely moves in unison, however. The Agile and DevOps movements that have transformed software development emerged in an environment where technology outpaced people and processes. On the technology front, instant access to compute resources offered by public and private platform services opened up new possibilities for rapidly deploying digital services. Yet, for years, traditional software development processes and ways of working remained aligned to an earlier era in which deployment of new services was measured in months or years. This opened a window of competitive opportunity for small, nimble start-ups, unencumbered by outdated processes and siloed work structures, to out-innovate and disrupt their lumbering rivals.

Unfortunately, not all of the start-ups were engaged in legitimate commerce. Many criminal organizations adopted the same Agile and DevOps practices, which let them adapt faster than their targets' security postures could change.

Agile and DevOps took aim at the culture and processes that held organizations back, bringing software development speed back in line with technical capabilities.

Theory of constraints

Processes that worked admirably when delivery cycles were measured in months were still slavishly adhered to when technology allowed delivery in minutes.

A myopic view of time to value, often driven by naive net present value calculations, led organizations to loosen controls. The result was short-term gains in solution delivery at the cost of quality. Mean time between failures (MTBF) and mean time to resolution (MTTR) both headed in the wrong direction: failures came more often and took longer to fix.
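For readers who want the two measures pinned down, here is a minimal sketch of how MTBF and MTTR can be computed. The incident log, the observation window, and the function name `mtbf_and_mttr` are all hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (failure start, service restored) pairs.
incidents = [
    (datetime(2022, 1, 3, 9, 0), datetime(2022, 1, 3, 10, 0)),
    (datetime(2022, 1, 17, 14, 0), datetime(2022, 1, 17, 17, 0)),
    (datetime(2022, 1, 29, 2, 0), datetime(2022, 1, 29, 4, 0)),
]
# Hypothetical 30-day observation window.
window_start = datetime(2022, 1, 1)
window_end = datetime(2022, 1, 31)


def mtbf_and_mttr(incidents, window_start, window_end):
    """Return (MTBF, MTTR) as timedeltas for one observation window."""
    if not incidents:
        raise ValueError("need at least one incident to compute a mean")
    downtime = sum((end - start for start, end in incidents), timedelta())
    uptime = (window_end - window_start) - downtime
    count = len(incidents)
    # MTBF: average running time per failure. MTTR: average repair time.
    return uptime / count, downtime / count


mtbf, mttr = mtbf_and_mttr(incidents, window_start, window_end)
print(f"MTBF: {mtbf}, MTTR: {mttr}")
```

A healthy trend is MTBF rising while MTTR falls. The situation described above is the opposite on both counts.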

Under the banner of Fail Fast, supposed agile advocates took a similarly short-sighted view, treating failure as a virtue in its own right. Once again

Now there is new friction slowing progress. In most organizations today, developers can move fast and deploy at least some new services in the cloud with or without IT support. But the loosening of controls that has made that speed possible has created a new problem: the technology ecosystem that supports those applications is buckling under overwhelming complexity. Many environments simply have too many different tools, deployed and managed with different systems.

IT complexity has many impacts. It impedes business growth by bogging down resources that could be used for innovation in unproductive “keeping the lights on” activities. Information sharing and employee mobility decrease because skills and knowledge do not translate easily across teams. Complexity also degrades security on multiple fronts: it expands the threat landscape, reduces the capacity of operators to identify and respond to new threats, and makes the mitigation process slow and risky.

My experience with product CVEs brought home for me the alarming impact of complexity on security response. CVEs are an unfortunate fact of life. Many vendors employ dedicated teams of security researchers to identify and share threat data. That doesn’t make us immune to vulnerabilities, but ideally, it puts us in a better position to respond rapidly and effectively. As soon as a new CVE is identified, we shift into high gear, working to identify and implement appropriate remediation to protect our environments. In an ideal situation, once any organization publishes the CVE and its remediation, we address the issue by the end of the business day. It’s never a happy situation, but it is our best defense.

Despite this process of identifying and publishing remediations, I have been repeatedly dismayed to discover systems still unpatched months after a remediation was published. Organizations have been living with a known security vulnerability for months. As I dug in to understand why, I realized the slow response was not because customers failed to understand or did not care about the risks. The root problem was rigid IT architectures and inelastic IT processes that simply didn’t allow a faster response.

This needs to change.

The existence of complex systems is not going to change. However, complexity’s nature, magnitude, and impact are within an organization’s control. Too many of the complex systems we build and operate today are fragile. In other words, they are systems operated from a position of fear – in which change represents a threat. Change breaks fragile systems. In a world where change is a constant and necessary presence, fragility presents an unacceptable existential risk.

To respond, we need to create anti-fragile systems and processes that withstand the adverse impacts of change, or even strengthen and improve as a result of it.

The good news is that some organizations are figuring this out, and many are sharing their experiences to help others improve. If you haven’t read it already, a great place to start is the book Accelerate by Dr. Nicole Forsgren, Jez Humble, and Gene Kim, which distills multiple years of research published in the State of DevOps reports. They identify six key practices that differentiate the handful of high-performing organizations from low performers. These are:

They also propose metrics to identify whether the organization is succeeding. It’s important to note that these metrics need to improve together. Increasing the frequency of delivery doesn’t count if nine out of 10 deployments fail. Fundamental transformation occurs when performance improves across all of these.
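The metrics themselves are not listed above. As an assumption, the sketch below uses the four delivery measures described in Accelerate (deployment frequency, lead time for changes, time to restore service, and change failure rate) to show why they have to be read together. The `Deployment` record and the `delivery_metrics` function are hypothetical names invented for this illustration, not part of any real tool.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional


@dataclass
class Deployment:
    lead_time: timedelta                         # commit to running in production
    failed: bool                                 # did the change fail in production?
    time_to_restore: Optional[timedelta] = None  # only set when failed


def delivery_metrics(deployments: List[Deployment], days: int) -> dict:
    """Summarize the four measures for one observation window of `days` days."""
    if not deployments:
        raise ValueError("no deployments in the window")
    failures = [d for d in deployments if d.failed]
    total = len(deployments)
    return {
        "deploys_per_day": total / days,
        "change_failure_rate": len(failures) / total,
        "mean_lead_time": sum((d.lead_time for d in deployments), timedelta()) / total,
        "mean_time_to_restore": (
            sum((d.time_to_restore for d in failures), timedelta()) / len(failures)
            if failures
            else timedelta()
        ),
    }


# Hypothetical team: ten deployments in five days, nine of which failed.
frequent_but_fragile = (
    [Deployment(timedelta(hours=1), failed=True, time_to_restore=timedelta(hours=4))] * 9
    + [Deployment(timedelta(hours=1), failed=False)]
)
metrics = delivery_metrics(frequent_but_fragile, days=5)
print(metrics)
```

Judged on frequency and lead time alone, this hypothetical team looks excellent. The failure rate and restore time in the same summary tell the real story, which is why a single number in isolation is not evidence of improvement.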

Moving from a position of fragility to one of anti-fragility, adaptability, and agility is not an overnight proposition. I’ve worked with organizations that have managed this successfully, and they invariably and necessarily approach it as a multi-year journey. They see benefits from technology, people, and process improvements along the way, but essential transformation takes time. One large e-commerce customer achieved orders-of-magnitude improvements across these metrics over a two-year period, which resulted in significant business benefits as they increased their rate of innovation, improved customer satisfaction, and saw their share price rise.

I recommend that customers approach IT system decisions from a systems mindset that prioritizes overall business outcomes over isolated performance measurements. Success should be reflected in how well we enable:

rapid detection and neutralization of security threats

improved business and application performance and resilience

timely deployments of new apps

easy, unified policy across environments

In addition to the practices recommended by Dr. Forsgren and the team, I also have a perspective on a better framework for making architectural decisions, one that focuses on eliminating unnecessary complexity and better leveraging automation and emerging AI capabilities to create and adopt an adaptive system over time.

[Diagram]

It is crucial to consider architectural service tiers, looking for shared services at each tier and consolidating where possible.

It is also essential to commit to automating as much of the delivery pipeline as possible, with automated guardrails enforcing security and network policies.
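To make the idea of an automated guardrail concrete, here is a minimal sketch of a policy check that a pipeline could run before every deployment. Everything in it is an assumption made for illustration: the service description format, the field names `tls_enabled` and `admin_allowed_cidrs`, the two example policies, and the `check_guardrails` function. Real pipelines typically delegate this to a dedicated policy engine.

```python
from typing import Callable, List, Optional

# A policy inspects a (hypothetical) service description and returns a
# violation message, or None when the service complies.
Policy = Callable[[dict], Optional[str]]


def require_tls(service: dict) -> Optional[str]:
    """Security policy: traffic to the service must be encrypted."""
    if not service.get("tls_enabled", False):
        return "TLS must be enabled"
    return None


def restrict_admin_access(service: dict) -> Optional[str]:
    """Network policy: the admin interface must not be reachable from anywhere."""
    if "0.0.0.0/0" in service.get("admin_allowed_cidrs", []):
        return "admin interface must not be open to 0.0.0.0/0"
    return None


POLICIES: List[Policy] = [require_tls, restrict_admin_access]


def check_guardrails(service: dict, policies: List[Policy] = POLICIES) -> List[str]:
    """Return every policy violation; an empty list means the deploy may proceed."""
    return [msg for policy in policies if (msg := policy(service)) is not None]


# Hypothetical service that a developer is about to deploy.
candidate = {
    "name": "checkout-api",
    "tls_enabled": True,
    "admin_allowed_cidrs": ["0.0.0.0/0"],
}
violations = check_guardrails(candidate)
if violations:
    print("deploy blocked:", "; ".join(violations))
else:
    print("deploy allowed")
```

Because the policies are plain functions in one list, the same rules apply to every team and every environment, and adding a rule does not slow down any deployment that already complies.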

The shift from thinking of IT services in discrete silos to improving and optimizing the service delivery architecture as a whole puts us all on a path towards a better, more adaptive future – in which change can represent a genuine opportunity for improvement and differentiation rather than a threat to a fragile status quo.

tags: fragile2agile - devops - lean - complexity