One of the more interesting questions that came up at Pipeline Conference was:
“How can we mitigate the risk of releasing a change that damages our data?”
When we have a database holding data that may be updated and deleted, as well as inserted/queried, then there’s a risk of releasing a change that causes the data to be mutated destructively. We could lose valuable data, or worse – have incorrect data upon which we make invalid business decisions.
Point in time backups are insufficient. For many organisations, simply being unable to access the missing or damaged data for an extended period of time while a backup is restored, would have an enormous cost. Invalid data used to make business decisions could also result in a large loss.
Worse, with most kinds of database it’s much harder to roll back the database to a point in time, than it is to roll-back our code. It’s also hard to or isolate and roll-back the bad data, while retaining the good data inserted since a change.
How can we avoid release-paralysis when there’s risk of catastrophic data damage if we release bad changes?
Practices like having good automated tests and pair programming may reduce the risk of releasing a broken change – but in the worst-case scenario where they don’t catch a destructive bug, how can we mitigate its impact?
Here’s some techniques I think can help.
Release more Frequently
This may sound counter-intuitive. If every release we make has a risk of damaging our data, surely releasing more frequently increases that risk?
There has been lots written about this. The reality seems to be that the more frequent our releases, the smaller they are, which means the chances of one causing a problem are reduced.
We are able to reason about the impact of a tiny change more easily than a huge change. This helps us to think through potential problems when reviewing before deployment.
We’re also more easily able to confirm that a small change is behaving as expected in production. Which means we should notice any undesirable behaviour more quickly. Especially if we are practising monitoring driven development,
Attempting to release more frequently will likely force you to think about the risks involved in releasing your system, and consider other ways to mitigate them. Such as…
Group Data by Importance
Not all data is equally important. You probably care a lot that financial transactions are not lost or damaged, but you may not care quite so much whether you know when a user last logged into your system.
If every change you release is theoretically able to both update the user’s last logged in date, and modify financial transactions, then there’s some level of risk that it does the latter when you intended it to do the former.
Simply using different credentials and permissions to control which parts of your system can modify what data, can increase your confidence in changing less …