Can you afford not to do continuous deployment?
Continuous deployment is the practice of regularly (more than daily) deploying updated software to production.
Arguments in favour continuous deployment often focus on how it enables us to continually, regularly, and rapidly deliver value to the business, allowing us to move fast. It’s also often discussed how it reduces release-risk by making deployments an everyday event – with smaller, less risky changes, which are fully automated.
I want to consider another reason
Unforeseen Production Issues
It can be tempting to reduce the frequency of deployments in response to risk. If a deployment with a bug can result in losing a significant amount of money or catastrophic reputation damage, it’s tempting to shy away from the risk and do it less often. Why not plan to release once a month instead of every day? Then we’re only taking the risk monthly.
Let’s leave aside the many reasons why doing things less often is unlikely to reduce the risk.
There will always be things that can break your system in production that are outside your control, and are unlikely to be caught by your pre-deployment testing. If you hit one of these issues then you will need to perform a deployment to fix the issue.
How often do we run into bugs in software due to failing to anticipate some detail of time and dates. We hear stories of systems breaking on 29/02 nearly every leap year. Have you anticipated backwards time corrections from NTP? Are you sure you handle leap-seconds? What about every framework and third party service you are using?
Yes, most of these can be tested for in advance with sufficiently rigorous testing, but we always miss test cases.
Time means that we cannot be sure that software that worked when it was deployed will continue to work.
If you production system is overloaded due to unforeseen bottlenecks you may need to fix them and re-deploy. Even if you do capacity planning you may find the software needs to scale in ways that you did not anticipate when you wrote your capacity tests.
When you find an unexpected bottleneck in a production system, you’re going to need to make a change and deploy. How quickly depends on how much early warning your monitoring software gave you.
Third Party dependencies
That service that you depend on goes down for an extended period of time, suddenly you need to change your system to not rely on it.
Do you handle all the infrastructure related issues that can go wrong? What happens if DNS is intermittently returning incorrect results? What if there’s a routing problem between two services in your system?
Third Party Influence
Monitor Monitor Monitor