Backup to S3 with Cronjobs

Feb 14, 2022

Backing up your data is one of those things everyone knows is important but never actually gets around to doing… at least not until it’s too late. I’m trying to avoid that last part, so here I’ll show you how to set up a simple backup workflow using cron jobs and AWS S3.

What I want to backup

I’m self-hosting an instance of GoatCounter, an alternative to Google Analytics, to collect metrics for this blog. GC is simple and fast, and all the data it collects is stored in a single SQLite3 file.

I don’t want to lose my blog’s metrics if something happens to the server hosting it, so I’ll use it as the example for this backup workflow.

When planning a backup, there are three important questions to answer:

How often do you want to back up your data?
How long do you want to keep the backups?
Do you know how to restore from the stored data?

For my use case, the answers are straightforward: daily backups, kept for a month, should be enough. To restore GoatCounter after a problem with the server, I just need to move the backup file into the correct folder.

AWS Setup

Despite the AWS Console’s reputation for a complex interface, what we need to do is quite simple. First, we’ll create the S3 bucket where the backup files will be uploaded (for simplicity, you can read “bucket” as “folder”). Then, we’ll create an IAM policy that gives access to that bucket. Finally, we’ll create a user, with that policy attached, to authenticate to the AWS account when using the official CLI. Now, let’s look at each of those steps in detail.

Creating the S3 bucket

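On the Create bucket page, you just need to define the bucket name. All the other options can keep their default values, except Bucket Versioning, which you should enable: later on, we rely on S3 keeping older versions of the backup file. I’m going to use rgth-backup as the bucket name in this tutorial.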

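Be aware of the bucket naming rules. Among other things, bucket names must be unique across all AWS accounts, so you’ll need to pick your own.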
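To catch obvious mistakes before reaching the console, a rough local check of the basic naming rules could look like the sketch below. It only covers the core constraints (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit; no two adjacent dots), not every rule AWS enforces, and it cannot tell you whether a name is already taken. The function name is hypothetical.

```shell
#!/usr/bin/env bash

# Rough check of the basic S3 bucket naming rules (not exhaustive).
is_valid_bucket_name() {
  local name="$1"
  # 3-63 chars: first and last must be a lowercase letter or digit,
  # the middle may also contain dots and hyphens.
  local re='^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$'
  [[ $name =~ $re ]] && [[ $name != *..* ]]
}

for candidate in rgth-backup My_Backups ab; do
  if is_valid_bucket_name "$candidate"; then
    echo "$candidate: looks valid"
  else
    echo "$candidate: breaks a naming rule"
  fi
done
```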

Creating the IAM policy

Our IAM policy will grant permission to upload files to the new S3 bucket. On the Create policy page, you can define the policy with the visual editor or in JSON. Let’s use the visual editor:

Select S3 in the Service section.
The only action the user will perform is uploading files to the bucket, so select just PutObject.
In the Resources section, it’s good (and safer) practice to be as specific as possible. Click Add ARN, enter the bucket name, and select Any for the object name.
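For reference, the steps above should produce a policy roughly equivalent to the JSON below (the visual editor may add extra fields such as a statement ID). This sketch writes it to a local file, assuming the bucket is named rgth-backup; the file name is only an illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Allow s3:PutObject on any object inside the rgth-backup bucket, and nothing else.
cat > rgth-backup-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::rgth-backup/*"
    }
  ]
}
EOF
```

You could paste the same JSON into the policy editor’s JSON tab instead of using the visual editor, or create the policy with `aws iam create-policy --policy-name rgth-backup-policy --policy-document file://rgth-backup-policy.json`, run with credentials that are allowed to manage IAM (not the backup user we’re about to create).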

After that, you can skip the Tags section; in the last step, give the policy a name. I’m using rgth-backup-policy as the policy name in the rest of the tutorial.

See the s3:PutObject documentation for more details.

Creating the IAM user

Creating a new IAM user is also a quick process. On the IAM Users page, click Add user, give the user a name, and select Access key - Programmatic access as the access type, since this user will only upload files to S3 programmatically, not sign in to the website.

In the Permissions section, open the Attach existing policies directly tab and select the policy we just created.

The Tags section can be skipped. In the Review section, confirm that you selected Programmatic access as the access type and attached the correct policy.

If everything goes right, the console shows a success message and, most importantly, the user’s Access key ID and Secret access key. We’ll need both in the next step, when configuring the AWS CLI. The secret access key is only shown once, so copy it (or download the .csv file) before leaving the page.

Server-side

The first step on the server side is to install the AWS CLI, which we’ll use to send files to S3. On 64-bit x86 Linux, you can download and install it with the following commands:

$ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
$ unzip awscliv2.zip
$ sudo ./aws/install

Take a look at the official guide for more information on other platforms.

Once that’s done, we need to tell the AWS CLI to use the credentials of the IAM user we created. There are a couple of ways to configure it; a quick one is to create the file ~/.aws/credentials with the following content:

[default]
aws_access_key_id=<replace-with-access-key-id>
aws_secret_access_key=<replace-with-secret-access-key>
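Since that file holds a secret, it’s worth making it readable only by your user. A minimal sketch that creates it with owner-only permissions is below; the two values are still placeholders for the keys from the IAM console, and note that it overwrites any existing ~/.aws/credentials file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# New files and directories get owner-only permissions.
umask 077
mkdir -p "$HOME/.aws"

# Placeholders: replace with the keys shown when the IAM user was created.
cat > "$HOME/.aws/credentials" <<'EOF'
[default]
aws_access_key_id=<replace-with-access-key-id>
aws_secret_access_key=<replace-with-secret-access-key>
EOF

# Make the permissions explicit in case the file already existed.
chmod 600 "$HOME/.aws/credentials"
```

Once the real keys are in place, running `aws sts get-caller-identity` is a quick way to confirm the CLI picks them up: it should print the new user’s ARN.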

See the AWS CLI documentation for other ways to configure it.

With the AWS CLI installed and configured, we can upload a file to the bucket using the command below:

$ aws s3 cp /path/to/file s3://bucket-name/folder/file

Scheduling backup

The last missing piece is running that command automatically on a schedule. Most, if not all, Linux distributions come with a tool for that: cron.

Cron is a job scheduler available on Linux that runs commands periodically at fixed times, dates, or intervals. To add a new job, run crontab -e; it opens your user’s cron configuration (the crontab) in the server’s default text editor.

Each line of the configuration file represents a job, and it looks like this:

* * * * * /absolute/path/to/cli --options

Each * is a separate field, and replacing it with a value restricts when the command runs:

┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of the month (1 - 31)
│ │ │ ┌───────────── month (1 - 12)
│ │ │ │ ┌───────────── day of the week (0 - 6) (Sunday to Saturday)
│ │ │ │ │
│ │ │ │ │
* * * * * <command to execute>

From Wikipedia.
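A few more schedules in the same five-field format may help when adapting the rule to your own needs. These are crontab lines rather than shell commands, and the command part is the same placeholder used earlier. Comments go on their own lines, because cron treats anything after the fifth field as part of the command.

```
# Every day at 02:30
30 2 * * * /absolute/path/to/cli --options
# Every 6 hours, on the hour (00:00, 06:00, 12:00, 18:00)
0 */6 * * * /absolute/path/to/cli --options
# Every Sunday at midnight
0 0 * * 0 /absolute/path/to/cli --options
# At midnight on the 1st of every month
0 0 1 * * /absolute/path/to/cli --options
```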

I want to send the backup file to S3 every day at midnight (in the server’s time zone), so this is what I configured:

0 0 * * * /usr/local/bin/aws s3 cp /sites/db.sqlite3 s3://rgth-backup/daily.db.sqlite3

Once you save the configuration file, cron will run the command the next time it matches the new rule.

Keep in mind that cron runs jobs with a minimal PATH, which is why the command uses the absolute path of the AWS CLI. That path might be different on your machine: on my server, it was installed in /usr/local/bin/. To check yours, run:

$ which aws
/usr/local/bin/aws
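One thing to watch out for: copying an SQLite file while the application is writing to it can capture a half-written state. A possible refinement, assuming the sqlite3 command-line tool is installed on the server, is a small wrapper that takes a consistent snapshot with sqlite3’s .backup command and uploads that instead. The script name and its location under ~/bin are hypothetical; the database path, bucket, and AWS CLI path are the ones from this post.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical location for the wrapper script; any path works.
SCRIPT="$HOME/bin/backup-goatcounter.sh"
mkdir -p "$(dirname "$SCRIPT")"

# The quoted 'EOF' keeps the variables below from being expanded now.
cat > "$SCRIPT" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

DB=/sites/db.sqlite3
DEST=s3://rgth-backup/daily.db.sqlite3
AWS=/usr/local/bin/aws

# Snapshot first: .backup uses SQLite's online backup API, so the copy is
# consistent even if GoatCounter writes to the database meanwhile.
SNAPSHOT="$(mktemp)"
trap 'rm -f "$SNAPSHOT"' EXIT
sqlite3 "$DB" ".backup '$SNAPSHOT'"

# Upload the snapshot under the same key as before.
"$AWS" s3 cp "$SNAPSHOT" "$DEST"
EOF

chmod +x "$SCRIPT"
```

The crontab entry would then call the script by its absolute path instead of calling the AWS CLI directly, for example 0 0 * * * /absolute/path/to/backup-goatcounter.sh.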

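Also, since the file name sent to S3 is always the same, you might think each upload overwrites the previous one. That’s only avoided when versioning is enabled on the bucket: S3 then keeps every upload as a new version of the same object. Versioning is disabled by default, so if you didn’t enable it when creating the bucket, turn on Bucket Versioning in the bucket’s Properties tab.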
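That also answers the “do you know how to use the stored data?” question. With versioning enabled, restoring an older backup could look like the commands below. The backup user can only upload, so these need credentials with read access (s3:ListBucketVersions and s3:GetObjectVersion), and `<version-id>` is a placeholder for an ID taken from the first command’s output.

```shell
# List the stored versions of the backup file
$ aws s3api list-object-versions --bucket rgth-backup --prefix daily.db.sqlite3 \
    --query 'Versions[].[VersionId,LastModified]' --output table

# Download one specific version to a local db.sqlite3 file
$ aws s3api get-object --bucket rgth-backup --key daily.db.sqlite3 \
    --version-id <version-id> db.sqlite3
```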

With all of that done, I can see a new version of the database reaching S3 every day. Once you get through creating the AWS account, everything else should be quick. Here are a couple of resources that were useful to me:

crontab.guru - super helpful when creating non-trivial rules on cron.
Guide on how to set up a lifecycle configuration on a bucket - for automatically deleting old versions of files.
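To match the “keep a month of daily backups” goal, a lifecycle rule can expire each version 30 days after it stops being the current one. The sketch below only writes the rule to a local file; the rule ID and file name are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Delete a version 30 days after a newer upload makes it noncurrent.
# An empty prefix applies the rule to every object in the bucket.
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-old-backups",
      "Filter": { "Prefix": "" },
      "Status": "Enabled",
      "NoncurrentVersionExpiration": { "NoncurrentDays": 30 }
    }
  ]
}
EOF
```

It could then be applied with `aws s3api put-bucket-lifecycle-configuration --bucket rgth-backup --lifecycle-configuration file://lifecycle.json`, again using credentials other than the upload-only backup user. The current version is never expired by this rule, so the latest backup always stays in the bucket.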

\o/