Why your backup process won’t always save you

Julien

Production Engineer
Backpacker when production isn't down.
Curious, I enjoy "doing stuff" tech-related with my friends.
If you annoy me I automate you in Bash.

Do you know when we start caring about backups?

When a project begins? When we build an infrastructure? Almost every week?

Nope. Really, the trigger is usually one’s first data loss. Fortunately, we can now store our data on someone else’s computer (the cloud), which reduces the odds of really losing it.

How about when the production server dies?

Ouch.

I’ve been working in IT for a few years, and I’ve always heard how important backups are. It’s a very hot topic: look at how big the backup market is and you’ll get it. Look also at the expected growth of “Disaster recovery as a service” businesses.

Still, considering backups important is a good thing. People now know that they are a key factor in their business.

So what’s wrong with backups?

Well, in IT we have data storage issues. Seriously, backups take a lot of space. So some people started to ask themselves what does and does not need to be saved. This is actually smart: if your server runs a fresh Debian 8 install with nothing but a webserver, why would you back up the entire system? What matters to you is what differs from the basic install. You then need to save your webserver’s configuration and your website, and that’s it. Instead of saving an entire system, you save a few files. Good.

We can apply that way of thinking to an entire infrastructure. Storage would no longer be an issue if you backed up only the important files.
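To make the "only save what differs from the basic install" idea concrete, here is a minimal Python sketch. It assumes you can take a snapshot of file checksums right after the fresh install and compare it with a later one. The function names (`snapshot`, `files_to_back_up`) are made up for this illustration and do not come from any backup tool.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def files_to_back_up(baseline: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return the files that are new or modified compared to the baseline."""
    return sorted(
        path for path, digest in current.items()
        if baseline.get(path) != digest
    )
```

You would call `snapshot()` once on the fresh install, keep the result somewhere safe, and later pass it to `files_to_back_up()` together with a new snapshot. Anything the function returns was added or changed since the install, so it is a candidate for the backup. It is only a starting point: it says nothing about which of those files actually matter.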

Easy to say, but how do I know what an important file is?

Well… It’s hard. If you have no idea, I strongly recommend backing up your entire system. It’s safer. You could ask the experts working in Operations; they will probably have ideas about what is important and what is not. Overall, communicate: ask your colleagues and tell them about the project. If in doubt, don’t play the odds; you will lose against the IT God.

Well, at my place we regularly back up everything. How can our process be wrong?

I can’t tell knowing only that. But I’m pretty sure it is as wrong as everyone else’s.

Why is that?

Because having backups is fine. Having restorable backups is good. Having a process including backup and restoration testing is perfect.

Resto-what?

Restorable! It means knowing whether your backup can actually bring you back to the state you were in before the problem or the data loss (and yes, this will happen).

Do you know about Schrödinger’s cat? If not, see the Wikipedia article listed in the references.

There’s a link between this experiment and backups: we don’t know the state of a backup, dead or alive, until we try to restore it.

It sounds simple, but it is a fact: we don’t do restoration testing. And trust me, it is hard to discover that your backups aren’t restorable while you are trying to save your production servers after an incident.

It takes what it takes. You can’t do only half of the work for something as important as backups. They are your safety net, and you can’t afford not to use it. That means more work and time for proper testing, but it is worth it.
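A restoration test does not have to be complicated. The sketch below is one possible shape for it, not a recipe: it uses Python’s standard `tarfile` module as a stand-in for the real backup tool, restores the archive into a scratch directory and compares checksums with the source. The helper names (`make_backup`, `restore_test`) are invented for the example.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's relative path to the SHA-256 digest of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def make_backup(source: Path, archive: Path) -> None:
    """Stand-in for the real backup tool: pack source into a gzip tarball."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=".")


def restore_test(source: Path, archive: Path) -> list[str]:
    """Restore the archive into a scratch directory and compare it to source.

    Returns the relative paths that are missing or different after the
    restore. An empty list means everything came back intact.
    """
    expected = tree_digests(source)
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive, "r:gz") as tar:
            # Use the safer "data" extraction filter where Python supports it.
            kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            tar.extractall(scratch, **kwargs)
        restored = tree_digests(Path(scratch))
    return sorted(p for p, d in expected.items() if restored.get(p) != d)
```

On a live server the source keeps changing after the backup runs, so in practice you would probably record the checksums at backup time and compare the restored files against that record instead of against the current source. The point stays the same: the test only passes if a real restore actually happened.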

Do as I say, not as I do!

Haha, right. I’m talking and giving advice, but how do I deal with my own servers?

First, outside of work I only own one server. I share it with a friend and it is our go-to server: we have a website, email clients, storage… When we started installing and configuring it, we wrote down every single package we installed, where every configuration file lives and what the key files of each service are. It’s a bit of work, but thanks to this document we know exactly what is installed.

We use Borg to back up our data. It is simple to use and pretty good.

Alright, documentation and a tool, but what else?

SSH and Python. That’s it, actually. Think of the server as a Lego tower: the bottom of the tower is the operating system, and the top is what we have installed over it. Now ask yourself: to retrieve or build the exact same tower, should I copy everything as it is? Or would it be better to get the bottom of the tower (the OS) and rebuild the top automatically?

Second answer it is. That is the main idea of Infrastructure as Code (IaC). Instead of copying everything, you deploy the packages, files and binaries you need on a fresh install. That is how we handle our server, and it is powered by Ansible.

If a major problem hits us, instead of trying to fix a burning house, we simply let Ansible rebuild our server from a fresh install.
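To give a feel for the "rebuild the top of the tower" idea, here is a toy Python sketch. It is not Ansible and not our actual setup. It only illustrates the principle of describing the desired state once and applying it as many times as needed. The `DESIRED_FILES` mapping and the `apply_state` function are hypothetical names made up for this example.

```python
import tempfile
from pathlib import Path

# Hypothetical desired state: relative path -> expected file content.
DESIRED_FILES = {
    "etc/nginx/nginx.conf": "server { listen 80; }\n",
    "var/www/index.html": "<h1>Hello</h1>\n",
}


def apply_state(root: Path, desired: dict[str, str]) -> list[str]:
    """Make the files under root match `desired`; return the paths changed."""
    changed = []
    for rel_path, content in desired.items():
        target = root / rel_path
        if target.is_file() and target.read_text() == content:
            continue  # already in the desired state, nothing to do
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
        changed.append(rel_path)
    return changed
```

Run against an empty directory (the "fresh install"), `apply_state` creates everything. Run a second time, it changes nothing. Run after something broke, it only repairs what drifted. Real tools such as Ansible apply the same principle to packages, services and users, not just files.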

Why Ansible?

Simply because it works. There are other tools (Puppet, Chef, SaltStack), but I’ll let you find your own way. 🙂

Anyway, it is only one among many solutions to a simple problem: you have backups but you don’t know if they work. Try them! You don’t want to do only half of the work. 😉

References:

  1. Wikipedia: Schrödinger’s Cat
  2. Borg: BorgBackup
  3. Ansible: Ansible

Icons made by Freepik from Flaticon is licensed by CC 3.0 BY
