Eric Ries has just published a post entitled "Continuous
deployment for mission-critical applications." In this post he takes a very clear
stand on the suitability of continuous deployment to mission-critical
applications, as follows:
"I want to directly challenge the belief that continuous deployment leads to lower quality software. I just don't believe it. Continuous deployment offers three significant advantages over large batch development systems. Some of these benefits are shared by agile systems which have continuous integration but large batch releases, but others are unique to continuous deployment.
- Faster (and better) feedback... Engineers
working in a continuous deployment environment are much more likely to get individually tailored feedback
about their work.
- More automation... Continuous deployment requires living the mantra: 'have every problem only once.'
- Monitoring of real-world metrics... There are huge classes of
bugs that "work as designed" but cause catastrophic changes in
customer behavior. My favorite: changing the checkout button in an
e-commerce flow to appear white on a white background. No automated test
is going to catch that, but it still will drive revenue to zero.
Continuous deployment teams will get burned by that class of bug only
once.
- Better handling of intermittent bugs... For
example, consider a bug that happens only one-time-in-a-million uses.
Traditional QA teams are never going to find a reproduction path for that
bug. It will never show up in the lab. But for a product with millions of
customers, it's happening (and being reported to customer service)
multiple times a day! Continuous deployment teams are much better able to
find and fix these bugs.
- Smaller batches... Continuous deployment tends to drive the batch size of work down to an optimal level, whereas traditional deployment systems tend to drive it up."
I could not agree more - continuous deployment is very effective as a software quality improvement strategy. Whether you do BSM, ERP, transaction management or any other mission-critical application, thoughtful continuous deployment is an excellent way to go. The laws of software engineering apply to any kind of application you might be developing and deploying.
I believe, however,
we might have a metrics problem on our hands. What often happens is that
continuous deployment flies in the face of the 'Man in the Dock' theory. When
a major disruption happens, we look for a single
point of accountability instead of deciphering the complex pathways to the
disruption. Such a theory in use, of course, leads to less frequent deployments,
which in the long run adversely affects software quality.
A major task for Agile Business Service Management is the development of metrics that take us away from the 'Man in the Dock' mindset. These metrics need to satisfy two criteria:
- Map software quality to customer value.
- Help us realize that service disruptions are systemic. They are a matter of complicated pathways, not of the incompetence of one individual or another.
Israel, your conclusions are spot on re: metrics focused on customer value (…moving beyond ‘man in dock’ premise) and service disruptions being a “matter of complicated pathways.” Much, much too often the attitude towards IT Management is “who/what caused this and get it fixed now” which generates a huge degree of CYA or inefficient finger-pointing. How often does the business side of BSM actually acknowledge the reality of complicated pathways on their side of the discussion? Conversely (…and as the primary focus of BSM) how often do IT organizations focus on metrics of business value instead of cause of disruption and complicated pathways? One more example of how BSM needs effective communication from both IT and business sides.
The pendulum actually swings back and forth between 'The Man in the Dock' theory and 'The Man on the Couch' theory. Neither is too useful IMHO.
I have some preliminary thoughts on an analytical framework for getting a handle on the pathways to disruption. To test its usefulness, I would need a very well-documented case study in which development, IT and the business ended up with a major disruption on their hands. It needs to be sufficiently detailed to make deductions at both the conceptual and operational levels and to figure out how A led to B, which begat C, which caused D, etc. Any suggestions for such a case study?!
Israel
You are asking for a company to volunteer the details of how they got to the fiasco. Without a prior relationship with such a company, it is hard to find one willing to expose their mistakes to outsiders for purposes of analysis and documentation. I'd keep asking anyone who might have such a lead ...and somewhere, at some time, there will be someone willing to engage.