by James Palmer | Dec 09, 2019

In a world where your database is the core of your business, your backups are business critical. In a world where automation is becoming more and more important, creating a process that is easily maintainable, robust and reliable has
never been easier.
Automation tools like Ant and Jenkins are not just the playground of your deployment teams. Their power can be leveraged to greatly improve the robustness of all your business-critical needs.
In this article we will describe how to implement a backup strategy that you can easily maintain; one that can be scheduled with ease; and one that will provide reliable reporting of failures should these arise.
In the past, backups have often been based on shell scripts, executed using Cron or Task Scheduler, with increasingly complex mechanisms for error reporting and logging. Additionally, with the trend of moving from on-premises to the cloud, a lot of database administrators are finding it necessary to move from Windows to Linux. As a result, a platform-independent solution is really important, as you can implement it anywhere.
Creating a strategy that protects your business against the risk of system failure has traditionally been complex. Let’s look at what we can do to improve that situation.
There are many differing opinions on what constitutes a reliable backup. In fact, the implementation of a good strategy will vary greatly from situation to situation. To make things easier, we will use the so-called 3-2-1 Strategy for implementing our solution. It should be trivial to adapt this to your specific architecture.
The 3-2-1 strategy is very simple: 3 copies of the data, on 2 types of storage, with 1 copy off-site.
This is a bare minimum strategy. You can, of course, keep more copies than it suggests.
Your production system is the first copy of your data – constantly being updated by the system. Then you need to have 2 other copies of your data – backups.
It doesn’t matter which types of storage you choose, but you’re going to want to have 2 types of storage in play, in case one fails entirely. We’re talking HDD/SSD, external storage, network storage, cloud storage, etc. here.
The off-site copy is your final level of insurance, in case of a complete system failure or environmental disaster at your primary location. You still need to keep one of your backups close to the server, because getting a backup to the server from a cloud storage location could be an expensive and time-consuming job. But if your entire data centre goes down, you need to have something to fall back on.
Being able to recover your database is a good step, but if you lose your entire system, you need to be able to recover that too, and quickly. So, your backup needs to include steps to recover your operating system configuration, your parameter files, your source code: everything needed to return your system to a fully working state.
Just taking a backup of your system is not enough. You need to be able to test that you can use the backups to provide a working environment. And you need to be able to test this regularly.
This step not only ensures confidence in your strategy, it also enforces a good backup of everything involved in your system. If something is missing from the backups you will know about it very quickly in the testing process.
This blog article is aimed at the OpenEdge Database side of things. So, we will assume that your source code, application specific items, and OS are backed up through other means. In most situations this will be the case nowadays
anyway, with virtualization being more and more prevalent.
There are all manner of tools for providing backups, but we have found the easiest in terms of configuration and reliability are Ant and Jenkins. They also make the platform independence of your backups so much easier.
Apache Ant is a Java library and command line build system. Because it runs on Java, it is natively platform-independent. Ant takes a build file and executes it. It has built-in hooks which allow it to terminate gracefully when errors are encountered. This ensures the robustness of your strategy, particularly as you can ensure destructive steps such as clearing up old backups only happen when you know you have a good backup.
Ant build files are XML based which takes a little getting used to, but it does mean there isn’t a massive learning curve for getting started with it. There are also vast resources on the internet to help you solve all manner of problems with it. Out
of the box, Ant provides platform independent file manipulation tools and zip/unzip capabilities which are critical for a backup solution.
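As a rough illustration of that fail-fast behaviour, here is a minimal build file sketch. The project, the property `backup.dir` and the executable `my-backup-command` are hypothetical placeholders, not part of any real script. It shows two standard Ant mechanisms: `failonerror="true"` on `exec`, and a `depends` attribute so that the destructive step only runs after the backup target has succeeded.

```xml
<project name="backup-sketch" default="cleanup" basedir=".">
  <!-- Hypothetical names: backup.dir and my-backup-command are placeholders -->
  <property name="backup.dir" location="backups"/>

  <target name="backup">
    <mkdir dir="${backup.dir}"/>
    <!-- failonerror="true": a non-zero exit code fails the build at this point -->
    <exec executable="my-backup-command" failonerror="true">
      <arg value="${backup.dir}"/>
    </exec>
  </target>

  <!-- depends="backup": Ant runs backup first and only continues if it succeeded -->
  <target name="cleanup" depends="backup">
    <echo message="Backup succeeded; old backups can now be removed safely."/>
  </target>
</project>
```

Running `ant` in the directory containing this file executes `backup` and then `cleanup`; if the backup command fails, `cleanup` is never reached.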
Jenkins is a Java-based automation engine. Again, it’s platform-independent. It’s an industry standard, and is well maintained and robust. It has many advantages over Cron/Task Scheduler, in particular that it can be monitored and configured via a web interface from any remote machine that can reach the Jenkins server. It can be locked down easily with security to ensure nobody can access it maliciously. It’s also very simple to implement email functionality to report on success or failure.
A quick aside here, as we see more and more businesses relying on 3rd party backup tools and virtual host backups for their entire system. These tools are great for ensuring your OS and config are backed up, but they are not a substitute
for a database backup. Taking a copy of a running database, any transactional database, not just Progress, is a huge risk. 9 times out of 10 it will probably work, but you wouldn’t want the one failure to be when you need
it most.
Yes, you can set a quiet point on the database, but there are still risks involved.
Probkup, the tool provided by Progress for database backups, is still the most robust and reliable tool for ensuring you can recover your databases.
We have provided a basic solution in the form of an Ant script for backing up databases. You can take a look at it and take a copy from our GitHub repository. This script is available under the MIT license. It is intended as a starting point for the process.
It is not a complete out-of-the-box solution.
Full documentation is available on GitHub, but here is a brief outline of the various steps of the script in the order of execution. If any of these steps fail, we will halt the continuation of the script and report an error.
Bill of Materials. Not at all necessary, but incredibly useful for problem determination in future. Particularly if you’re running this on a schedule and it fails. It provides a listing of any properties and settings at runtime.
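A bill of materials of this kind can be sketched with Ant's `echoproperties` task, which dumps every property known at that point in the build. This is only an illustration, and the script in the repository may do it differently. The property names (`db.name`, `log.dir`, `run.stamp`) and their values are assumptions made for the example.

```xml
<project name="bom-sketch" default="bom" basedir=".">
  <!-- Hypothetical property names and values, for illustration only -->
  <property name="db.name" value="sports2000"/>
  <property name="log.dir" location="logs"/>

  <target name="bom">
    <mkdir dir="${log.dir}"/>
    <!-- Time stamp used to give each run its own listing -->
    <tstamp>
      <format property="run.stamp" pattern="yyyyMMdd-HHmmss"/>
    </tstamp>
    <!-- Writes every property known to Ant at this point to a file -->
    <echoproperties destfile="${log.dir}/bom-${run.stamp}.properties"/>
  </target>
</project>
```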
The backup itself is the most important section. It uses Probkup to take a time-stamped backup of the database to a local disk. Note that we have taken the logic of the probkup batch file and reproduced it rather than calling probkup directly. This is because Ant is not very good at capturing return codes from Windows batch files: you have to wrap the execution in some logic to ensure an error causes a stop. So, it’s easier here just to reproduce the logic, as it’s not complicated. This issue is not present on other operating systems.
The -com switch means empty blocks are not backed up. It doesn’t compress the backup in the tar/zip sense, but it does make the backup smaller in that empty space is missed.
The -Bp switch on the online backup instructs the process to reserve and use a set number of buffers in the primary buffer pool, rather than using the full primary buffer pool for the job. Because the whole database is read in a backup, the whole buffer pool would otherwise be sullied by the backup. With the -Bp parameter, only that number of buffers is used by the backup. A value of 10 is sufficient in most cases, but we allow you to configure this in the properties file.
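To make the switches concrete, here is a minimal sketch of an online backup target for a non-Windows host, where the return-code problem described above does not apply. It is not the script from the repository. Every property name, path and the `.bak` extension are assumptions for illustration, and it calls the `probkup` script directly rather than reproducing its logic.

```xml
<project name="probkup-sketch" default="backup" basedir=".">
  <!-- All property names and values below are hypothetical placeholders -->
  <property name="dlc" location="/usr/dlc"/>
  <property name="db.dir" location="/db/prod"/>
  <property name="db.name" value="sports2000"/>
  <property name="backup.dir" location="/backup/local"/>
  <property name="backup.bp" value="10"/>

  <target name="backup">
    <!-- Time stamp that becomes part of the backup file name -->
    <tstamp>
      <format property="backup.stamp" pattern="yyyyMMdd-HHmmss"/>
    </tstamp>
    <mkdir dir="${backup.dir}"/>
    <!-- failonerror="true" stops the build if probkup returns a non-zero code -->
    <exec executable="${dlc}/bin/probkup" dir="${db.dir}" failonerror="true">
      <env key="DLC" value="${dlc}"/>
      <arg value="online"/>
      <arg value="${db.name}"/>
      <arg value="${backup.dir}/${db.name}-${backup.stamp}.bak"/>
      <!-- -com: skip empty blocks; -Bp: number of buffers the backup may use -->
      <arg value="-com"/>
      <arg value="-Bp"/>
      <arg value="${backup.bp}"/>
    </exec>
  </target>
</project>
```

Keeping the -Bp value in a property means it can be overridden from a properties file or on the command line, for example `ant -Dbackup.bp=20`.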
Next, the script runs a partial verify of the backup files, which checks their block-level integrity. It is not a substitute for testing the backups periodically, but it is a good indicator that they are OK.
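One way a partial verify could be wired into Ant is sketched below. It assumes the verify is done with the `prorest` utility and its `-vp` (partial verify) option; the article does not name the command, so check this against the documentation for your OpenEdge version. The property names and paths are hypothetical.

```xml
<project name="verify-sketch" default="verify" basedir=".">
  <!-- All property names and values below are hypothetical placeholders -->
  <property name="dlc" location="/usr/dlc"/>
  <property name="db.dir" location="/db/prod"/>
  <property name="db.name" value="sports2000"/>
  <property name="backup.file" location="/backup/local/sports2000.bak"/>

  <target name="verify">
    <!-- Assumption: prorest with -vp performs the partial verify of the backup -->
    <exec executable="${dlc}/bin/prorest" dir="${db.dir}" failonerror="true">
      <env key="DLC" value="${dlc}"/>
      <arg value="${db.name}"/>
      <arg value="${backup.file}"/>
      <arg value="-vp"/>
    </exec>
  </target>
</project>
```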
The script then takes the backup we created and copies it to an offsite location. You could easily wrap a zip command around this step to compress the backups, and/or change this to use some sort of cloud storage solution.
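The offsite copy and the optional compression both map onto core Ant tasks. The sketch below assumes the offsite location is reachable as a mounted path; `backup.file` and `offsite.dir` are hypothetical property names, and a cloud storage target would need a different mechanism.

```xml
<project name="offsite-sketch" default="offsite" basedir=".">
  <!-- Hypothetical property names and paths -->
  <property name="backup.file" location="/backup/local/sports2000.bak"/>
  <property name="offsite.dir" location="/mnt/offsite/backups"/>

  <!-- Plain copy; by default copy fails the build if the source file is missing -->
  <target name="offsite">
    <mkdir dir="${offsite.dir}"/>
    <copy file="${backup.file}" todir="${offsite.dir}"/>
  </target>

  <!-- Alternative: compress the backup while writing it offsite -->
  <target name="offsite-zipped">
    <mkdir dir="${offsite.dir}"/>
    <zip destfile="${offsite.dir}/sports2000.bak.zip">
      <fileset file="${backup.file}"/>
    </zip>
  </target>
</project>
```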
The script also copies a comma-delimited list of important config files to the onsite and offsite backup locations. This would usually include the structure of the database: the .st file. Without this, any restore could be difficult to get going. Any .pf files you use to start the database should also be backed up with your database.
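A comma-delimited list fits naturally into Ant's `filelist` type, whose `files` attribute accepts names separated by commas. The following is a sketch under assumed names only; `config.files`, the directory properties and the file names are hypothetical.

```xml
<project name="config-sketch" default="config" basedir=".">
  <!-- Hypothetical property names, paths and file names -->
  <property name="db.dir" location="/db/prod"/>
  <property name="config.files" value="sports2000.st,sports2000.pf"/>
  <property name="onsite.dir" location="/backup/local/config"/>
  <property name="offsite.dir" location="/mnt/offsite/config"/>

  <target name="config">
    <!-- Each name in config.files is resolved relative to db.dir -->
    <copy todir="${onsite.dir}">
      <filelist dir="${db.dir}" files="${config.files}"/>
    </copy>
    <copy todir="${offsite.dir}">
      <filelist dir="${db.dir}" files="${config.files}"/>
    </copy>
  </target>
</project>
```

Unlike a pattern-based fileset, a filelist names each file explicitly, which suits a short, fixed list of files that must all be present.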
Copies After Image Archive files offsite. In reality you would probably have this step in its own script, executing much more frequently. After Image files should be copied offsite as soon as possible. This section here is to demonstrate
how easy it is to do with Ant.
Removes old After Image files in the Archive Directory based on a retention policy you define. You want to keep old After Image files, but not for too long as they will end up taking up space.
As with After Image files, you want to remove old backups based on some retention policy.
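A retention policy like this can be sketched with a `tstamp` offset and a `date` selector. The example below deletes backup files last modified before a cutoff 14 days in the past. The 14-day window, the property names, the path and the `*.bak` pattern are assumptions for illustration; the same approach would apply to an After Image archive directory.

```xml
<project name="retention-sketch" default="prune" basedir=".">
  <!-- Hypothetical property names and values; -14 means "14 days ago" -->
  <property name="backup.dir" location="/backup/local"/>
  <property name="retention.offset.days" value="-14"/>

  <target name="prune">
    <!-- Compute the cutoff date and time relative to now -->
    <tstamp>
      <format property="prune.cutoff" pattern="yyyy-MM-dd HH:mm"
              offset="${retention.offset.days}" unit="day"/>
    </tstamp>
    <!-- Delete only files last modified before the cutoff -->
    <delete>
      <fileset dir="${backup.dir}" includes="*.bak">
        <date datetime="${prune.cutoff}" pattern="yyyy-MM-dd HH:mm" when="before"/>
      </fileset>
    </delete>
  </target>
</project>
```

In a full script this target would depend on the backup and verify targets, so that old backups are only removed once a new, verified backup exists.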
Error Messages and Alerting
Jenkins is able to natively execute Ant scripts, so it handles errors from your script easily. As a result, building in some sort of error reporting mechanism is trivial. We would recommend only reporting errors, though. Do not report successes: after a while this leads to alert fatigue, meaning real alerts get missed.