Alongside the commercial products aimed at backup for Kubernetes clusters – Kasten from Veeam, Trilio and Pure’s Portworx – the open source project Velero aims to become a standard.
“Velero’s ambition is to cover as many scenarios as possible,” said Shubham Pampattiwar, chief engineer at Red Hat, who oversees contributions to Velero, and who met ComputerWeekly’s sister publication LeMagIT at a recent IT Press Tour event.
“We have developed, for example, hooks or modules that quiesce an activity for the time needed to back up its data so as not to save it with inconsistencies,” said Pampattiwar.
“But also modules that will carry out asynchronous backups so that data is backed up without stopping production. And an engine that parallelises several backup processes and/or restores so that activity can be restored as quickly as possible in case of cyber attack.”
It’s quite probable that the likes of Dell, Veritas or IBM could integrate Velero as a Kubernetes extension to their backup products, while Red Hat and VMware could enhance their Kubernetes offerings, namely OpenShift and Tanzu, with a native backup function.
An assembler of storage processes
Initially called Heptio Ark, Velero offers just three functions for now: backup scheduling, backup and recovery. These take the form of CRDs – Kubernetes CustomResourceDefinitions – that is, functional extensions to Kubernetes whose configuration is stored in etcd, the registry that holds the configuration of the entire cluster.
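As a minimal sketch of what those three custom resources can look like, the Python below builds a Backup, a Schedule and a Restore object as plain dictionaries and prints them as JSON, which kubectl accepts as a manifest format. The resource names, the “shop” namespace and the cron expression are hypothetical, and the “velero” namespace assumes a default installation.

```python
import json

API_VERSION = "velero.io/v1"
NAMESPACE = "velero"  # assumption: Velero installed in its default namespace


def _resource(kind, name, spec):
    """Wrap a spec in the common Kubernetes object envelope."""
    return {
        "apiVersion": API_VERSION,
        "kind": kind,
        "metadata": {"name": name, "namespace": NAMESPACE},
        "spec": spec,
    }


def backup(name, namespaces):
    """A one-off backup of the listed namespaces."""
    return _resource("Backup", name, {"includedNamespaces": list(namespaces)})


def schedule(name, cron, namespaces):
    """A recurring backup; 'template' holds the same fields as a Backup spec."""
    spec = {"schedule": cron, "template": {"includedNamespaces": list(namespaces)}}
    return _resource("Schedule", name, spec)


def restore(name, backup_name):
    """A restore that replays a previously taken backup."""
    return _resource("Restore", name, {"backupName": backup_name})


if __name__ == "__main__":
    # All names and values below are illustrative only.
    for obj in (
        backup("shop-manual", ["shop"]),
        schedule("shop-nightly", "0 1 * * *", ["shop"]),
        restore("shop-restore", "shop-manual"),
    ):
        print(json.dumps(obj, indent=2))
```

Because each function is the same envelope around a different spec, the sketch shows why scheduling, backup and recovery fit naturally into the CRD model.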
Velero doesn’t implement backup functionality as such – rather, it acts as an engine to manage functionality on the cluster that can carry out backup. It manages, for example, the snapshots offered through the CSI drivers that vendors provide for block storage, Restic or Kopia for file backup, or the Kubernetes API for emergency volumes in object storage mode.
“The availability of a growing number of modules that can integrate with surrounding infrastructure is the advantage of Velero,” said Pampattiwar. “For example, it is due to this that we can connect via API to cloud hosts to protect their resources. For the user, everything is transparent. They schedule their backup or carry out restores without having to be concerned with the underlying infrastructure.”
Drilling down, the user runs or schedules the running of commands such as “velero backup create <name of backup>” and that launches the right scripts for the right APIs in the underlying infrastructure.
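To make this concrete, here is a short Python sketch that assembles the command lines for the three operations. It only builds and prints the argument lists and does not execute them. The backup names, namespace and cron expression are hypothetical; the sub-commands and flags shown are those of the Velero CLI as commonly documented and should be checked against the installed version.

```python
import shlex


def backup_create(name, namespaces=()):
    """Argument list for a one-off backup, optionally limited to namespaces."""
    argv = ["velero", "backup", "create", name]
    if namespaces:
        argv += ["--include-namespaces", ",".join(namespaces)]
    return argv


def schedule_create(name, cron, namespaces=()):
    """Argument list for a recurring backup driven by a cron expression."""
    argv = ["velero", "schedule", "create", name, "--schedule", cron]
    if namespaces:
        argv += ["--include-namespaces", ",".join(namespaces)]
    return argv


def restore_create(backup_name):
    """Argument list for restoring from an existing backup."""
    return ["velero", "restore", "create", "--from-backup", backup_name]


if __name__ == "__main__":
    # Illustrative values only; nothing is executed here.
    for argv in (
        backup_create("shop-manual", ["shop"]),
        schedule_create("shop-nightly", "0 1 * * *", ["shop"]),
        restore_create("shop-manual"),
    ):
        print(shlex.join(argv))
```

Keeping the commands as lists rather than strings means they could later be passed to subprocess.run without shell quoting problems.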
The infrastructure that hosts the backups can be an S3 cloud volume or a file volume. All that is needed is a module that defines the destination storage, which the admin can then point to in backup settings. The admin can enter, for example, “--provider aws” or “--provider Portworx” followed by necessary details such as the name of the volume or access credentials.
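The sketch below, again in Python and again only building argument lists, shows one way a destination could be declared and then referenced from a backup. It assumes the Velero CLI’s backup-location sub-command with its provider, bucket and config flags; the location name, bucket and region are hypothetical, and credential handling is deliberately left out.

```python
import shlex


def backup_location_create(name, provider, bucket, config=None):
    """Argument list that declares a backup destination for a given provider."""
    argv = [
        "velero", "backup-location", "create", name,
        "--provider", provider,
        "--bucket", bucket,
    ]
    if config:
        # Provider-specific settings are passed as comma-separated key=value pairs.
        pairs = ",".join(f"{key}={value}" for key, value in sorted(config.items()))
        argv += ["--config", pairs]
    return argv


def backup_to_location(backup_name, location):
    """Argument list for a backup that targets a named destination."""
    return ["velero", "backup", "create", backup_name, "--storage-location", location]


if __name__ == "__main__":
    # Hypothetical destination: an S3 bucket in one region.
    create = backup_location_create(
        "offsite", "aws", "example-velero-backups", {"region": "eu-west-1"}
    )
    print(shlex.join(create))
    print(shlex.join(backup_to_location("shop-manual", "offsite")))
```

The split into two commands mirrors the text: the destination is defined once, and each backup simply points to it by name.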
In the same way, the hooks are maintenance scripts adapted to run before or after backup functions.
For example, to back up the files of a running pod on a Linux system, it’s enough to define a “pre-hook” during pod deployment that asks a container to run the command “/sbin/fsfreeze --freeze”, which stops access before the backup, then a “post-hook” that runs “/sbin/fsfreeze --unfreeze” to reactivate I/O. In the same way, it can act on requests formulated by application APIs, sent in JSON format, at restore time, for example.
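In Velero, such hooks are typically declared as annotations on the pod, with the command written as a JSON array. The Python sketch below generates a pair of freeze and unfreeze annotations. The container name and mount path are hypothetical, and the container that runs fsfreeze is assumed to have sufficient privileges and to mount the volume in question.

```python
import json

FSFREEZE = "/sbin/fsfreeze"


def fsfreeze_hook_annotations(container, mount_path):
    """Pod annotations that freeze a filesystem before backup and thaw it after."""
    return {
        # Pre-hook: suspend writes so the backup sees a consistent filesystem.
        "pre.hook.backup.velero.io/container": container,
        "pre.hook.backup.velero.io/command": json.dumps(
            [FSFREEZE, "--freeze", mount_path]
        ),
        # Post-hook: resume I/O once the backup step has finished.
        "post.hook.backup.velero.io/container": container,
        "post.hook.backup.velero.io/command": json.dumps(
            [FSFREEZE, "--unfreeze", mount_path]
        ),
    }


if __name__ == "__main__":
    # Hypothetical sidecar container and data path.
    annotations = fsfreeze_hook_annotations("fsfreeze", "/var/lib/data")
    print(json.dumps({"metadata": {"annotations": annotations}}, indent=2))
```

The printed fragment can be merged into the metadata of a pod template, which is what “during pod deployment” amounts to in practice.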
A plug-in engine for backup
Pampattiwar admits that executing all backup functions from the command line can be complex for system administrators. But according to him, that is a minor detail.
“The attraction of Velero is that it is open source,” said Pampattiwar. “That means it is useable by all storage vendors who have developed a CSI driver for their solution that is recognised by Kubernetes. Now vendors can integrate Velero functionality into a graphical console to administer systems.”
“The challenge for Velero is to focus on the mechanisms,” said Pampattiwar. “It’s a platform that’s sufficiently raw that anyone can come and propose improvements that will be useful for lots of people. It is up to everyone how they want to implement the final product so that it’s easy for their customers to use.”
Pampattiwar explained that storage vendors can only offer local backups with their CSI drivers. The first issue here is that these backups are potentially unusable because they do not ensure that data is consistent before and after backup operations. The second is that these backups cannot be restored to a different infrastructure. Integrating Velero into their products would offer reliable, multicloud protection.
“The commercial backup products are pre-packaged. Velero doesn’t have that ambition for now. Instead Velero wants to be a plug-in engine, capable of backup for everything, useable by new storage products as well as the historic incumbents,” said Pampattiwar, who promised regular arrivals of new functionality to smooth backup processes.