Backup and Restore

GBAR - Graph Backup and Restore

Version 2.1.3 to 2.3. Copyright © 2019 TigerGraph. All Rights Reserved.

GBAR (Graph Backup And Restore) is an integrated tool for backing up and restoring the data and data dictionary (schema, loading jobs, and queries) of a single TigerGraph node. In Backup mode, it packs TigerGraph data and configuration information into a backup archive on local disk or in a remote AWS S3 bucket. Multiple backup archives can be kept. Later, you can use the Restore mode to roll back the system to any backup point. The tool can also be integrated easily with Linux cron to perform periodic backup jobs.
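
For example, a minimal sketch of a crontab entry that runs a backup every night at 1:00 AM (the tag name, schedule, and install path are illustrative assumptions, not prescribed values):

# Nightly GBAR backup at 1:00 AM; -y suppresses any prompts for unattended runs
0 1 * * * /home/tigergraph/tigergraph/bin/gbar backup -y -t daily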

Introduction and Syntax

The current version of GBAR is intended for restoring the same machine that was backed up. For help with cloning a database (i.e., backing up machine A and restoring the database to machine B), please contact support@tigergraph.com .

Synopsis
Usage: gbar backup [options] -t <backup_tag>
       gbar restore [options] <backup_tag>
       gbar config
       gbar list

Options:
  -h, --help     Show this help message and exit
  -v             Run with debug info dumped
  -vv            Run with verbose debug info dumped
  -y             Run without prompt
  -t BACKUP_TAG  Tag for backup file, required on backup

The -y option forces GBAR to skip interactive prompt questions by selecting the default answer. There is currently one interactive question:

  • At the start of restore, GBAR will always ask if it is okay to stop and reset the TigerGraph services: (y/N)? The default answer is yes.
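
For example, an unattended restore can pre-approve that question with -y (the archive name below is a placeholder):

$ gbar restore -y <backup_tag>-<timestamp>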

Changes between v2.0 and v2.1

Config

  • For S3 configuration, if the AWS access key and secret are not provided, GBAR will use the instance's attached IAM role.

  • You can specify the number of parallel processes for backup and restore.

  • If GSQL authentication is enabled, you must provide a username and password.

Backup

  • A backup archive is stored as several files in a folder, rather than as a single file.

  • Distributed backup performance is improved.

Restore

  • To select a backup archive to restore, the full backup name must be specified.

  • Restore asks fewer interactive questions than before:

    • The user must provide a full archive name; there is no option to select the latest from a set of archives.

    • GBAR restore does not estimate the uncompressed data size or check whether there is sufficient disk space.

Config

gbar config

GBAR Config must be run before using GBAR's backup/restore functionality. GBAR Config opens the following configuration template interactively in a text editor. Using the comments as a guide, edit the template to set the configuration parameters according to your needs.

Synopsis
 # Configuration file for GBAR
 # You can specify the storage method as either local or S3.

 # Assign True if you want to store backup files on local disk.
 # Assign False otherwise; in that case there is no need to set path.
 store_local: False
 path: PATH_TO_BACKUP_REPOSITORY

 # Assign True if you want to store backup files on AWS S3.
 # Assign False otherwise; in that case there is no need to set the AWS key and bucket.
 # The AWS access key and secret are optional. If not specified, GBAR will use
 # the attached IAM role of the instance.
 store_s3: False
 aws_access_key_id:
 aws_secret_access_key:
 bucket: YOUR_BUCKET_NAME

 # The maximum timeout value to wait for the core modules (GPE/GSE) on backup.
 # As a rough estimate, GPE & GSE backup throughput is about 2 GB per minute on HDD.
 # You can set this value according to your gstore size.
 # The interval string uses the format 1h2m3s, meaning 1 hour 2 minutes 3 seconds;
 # 200m means 200 minutes.
 # Set this to 0 to wait indefinitely.
 backup_core_timeout: 5h

 # The number of processes to create when compressing the backup archive.
 # Compressing in parallel improves performance.
 # The same number of processes will be spawned for decompression on restore.
 compress_process_number: 8

 # Provide the GSQL username and password here if GSQL authentication is on
 gsql_user:
 gsql_passwd:
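
For illustration, a completed configuration for storing backups on local disk might look like the following (the path and process count are example values chosen for this sketch, not recommendations):

 store_local: True
 path: /data/tigergraph_backups

 store_s3: False
 aws_access_key_id:
 aws_secret_access_key:
 bucket:

 backup_core_timeout: 5h
 compress_process_number: 8

 gsql_user:
 gsql_passwd: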

Backup

gbar backup -t <backup_tag>

The backup_tag acts as a prefix for the archive name. The full name of the backup archive will be <backup_tag>-<timestamp>, which is a subfolder of the backup repository. If store_local is true, the folder is a local folder on every node in a cluster, to avoid moving massive amounts of data across nodes. If store_s3 is true, every node uploads the data located on that node to the S3 repository; therefore, every node in a cluster needs access to Amazon S3. If an IAM role is used for authentication, the role must be attached to every node in the cluster.
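
Before the first S3 backup, one quick sanity check is to list the configured bucket from every node in the cluster (this assumes the AWS CLI is installed on each node; the bucket name is a placeholder):

$ aws s3 ls s3://YOUR_BUCKET_NAME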

GBAR Backup performs a live backup, meaning that normal operations may continue while the backup is in progress. When GBAR backup starts, it sends a request to gadmin , which then requests the GPE and GSE to create snapshots of their data. Per the request, the GPE and GSE store their data under GBAR's own working directory. GBAR also directly contacts the Dictionary and obtains a dump of its system configuration information. In addition, GBAR records the TigerGraph system version. GBAR then compresses each of these data and configuration files into tgz format and stores them in the <backup_tag>-<timestamp> subfolder on each node. As the last step, GBAR copies those files to local storage or AWS S3, according to the Config settings, and removes all temporary files generated during the backup.

The current version of GBAR Backup takes snapshots quickly, making it very likely that all the components (GPE, GSE, and Dictionary) are in a consistent state, but it does not fully guarantee consistency. It is highly recommended that no data update be in progress when the backup command is issued. A no-write period of about 5 seconds is sufficient.
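
A minimal sketch of that practice, assuming updates flow through an ingestion pipeline you can pause (the pause and resume steps are placeholders for whatever your pipeline provides):

$ # pause your data-loading pipeline here
$ sleep 5        # allow in-flight updates to settle
$ gbar backup -t daily
$ # resume your data-loading pipeline here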

Backup does not save input message queues for REST++ or Kafka.

List Backup Files

gbar list

This command lists all backup archives in the storage location configured by the user. For each archive, it shows the archive's full tag, its size in human-readable format, and its creation time.

Restore

gbar restore <archive_name>

Restore is an offline operation, requiring the data services to be temporarily shut down. The user must specify the full archive name ( <backup_tag>-<timestamp> ) to be restored. When GBAR restore begins, it first searches for a backup archive exactly matching the archive_name supplied on the command line. It then decompresses the backup files to a working directory. Next, GBAR compares the TigerGraph system version in the backup archive with the current system's version, to make sure the backup archive is compatible with the current system. It then shuts down the TigerGraph servers (GSE, RESTPP, etc.) temporarily. As a precaution, GBAR makes a copy of the current graph data. Next, GBAR copies the backup graph data into the GPE and GSE and notifies the Dictionary to load the configuration data. When these actions are all done, GBAR restarts the TigerGraph servers.

The primary purpose of GBAR is to save snapshots of the data and configuration of a TigerGraph system, so that the same system can later be rolled back (restored) to one of the saved states. A key assumption is that Backup and Restore are performed on the same machine, and that the file structure of the TigerGraph software has not changed. Specific requirements are listed below.
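
In practice, a rollback is two documented commands: list the available archives, then restore one by its full name (the archive name below is taken from the detailed example later in this section):

$ gbar list
$ gbar restore daily-20180607232159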

Restore Requirements and Limitations

Restore is supported if the TigerGraph system has had only minor version updates since the backup.

  • TigerGraph version numbers have the format X.Y[.Z], where X is the major version number and Y is the minor version number.

  • Restore is supported if the backup archive and the current system have the same major version number AND the current system has a minor version number that is greater than or equal to the backup archive minor version number.

  • Backup archives from a 0.8.x system cannot be restored to a 1.x system.

  • Examples:

    • A backup taken on v2.1 can be restored on a v2.1 or v2.2 system (same major version, equal or higher minor version).

    • A backup taken on v2.2 cannot be restored on a v2.1 system (lower minor version), and a backup from v1.x cannot be restored on a v2.x system (different major version).
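
To make the rule concrete, here is a small illustrative shell function (not part of GBAR; the name and logic are ours) that applies it to two X.Y[.Z] version strings:

compatible() {
    # $1 = backup archive version, $2 = current system version (format X.Y or X.Y.Z)
    local b_major=${1%%.*} c_major=${2%%.*}
    local b_minor=${1#*.} c_minor=${2#*.}
    b_minor=${b_minor%%.*}
    c_minor=${c_minor%%.*}
    # Same major version AND current minor >= backup minor
    [ "$b_major" = "$c_major" ] && [ "$c_minor" -ge "$b_minor" ]
}

compatible 2.1 2.3 && echo "restore supported"      # same major, higher minor
compatible 2.3 2.1 || echo "restore not supported"  # lower minor on current system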

Restore needs enough free space to accommodate both the old gstore and the gstore to be restored.

GBAR Detailed Example

The following is a real example, showing the actual commands, the expected output, and the amount of time and disk space used for a given set of graph data. For this example, an Amazon EC2 instance was used, with the following specifications:

Single instance with 32 vCPU + 244GB memory + 2TB HDD.

Naturally, backup and restore time will vary depending on the hardware used.

GBAR Backup Operational Details

To run a daily backup, we tell GBAR to back up with the tag name daily .

$ gbar backup -t daily
[23:21:46] Retrieve TigerGraph system configuration
[23:21:51] Start workgroup
[23:21:59] Snapshot GPE/GSE data
[23:33:50] Snapshot DICT data
[23:33:50] Calc checksum
[23:37:19] Compress backup data
[23:46:43] Pack backup data
[23:53:18] Put archive daily-20180607232159 to repo-local
[23:53:19] Terminate workgroup
Backup to daily-20180607232159 finished in 31m33s.

The total backup process took about 31 minutes, and the generated archive is about 49 GB. Dumping the GPE + GSE data to disk took 12 minutes. Compressing the files took another 20 minutes.

GBAR Restore Operational Details

To restore from a backup archive, the full archive name must be provided, such as daily-20180607232159 . By default, restore asks the user to approve before continuing. If you want to pre-approve these actions, use the -y option, and GBAR will make the default choice for you.

$ gbar restore daily-20180607232159
[23:57:06] Retrieve TigerGraph system configuration
GBAR restore needs to reset TigerGraph system.
Do you want to continue?(y/N):y
[23:57:13] Start workgroup
[23:57:22] Pull archive daily-20180607232159, round #1
[23:57:57] Pull archive daily-20180607232159, round #2
[00:01:00] Pull archive daily-20180607232159, round #3
[00:01:00] Unpack cluster data
[00:06:39] Decompress backup data
[00:17:32] Verify checksum
[00:18:30] gadmin stop gpe gse
[00:18:36] Snapshot DICT data
[00:18:36] Restore cluster data
[00:18:36] Restore DICT data
[00:18:36] gadmin reset
[00:19:16] gadmin start
[00:19:41] reinstall GSQL queries
[00:19:42] recompiling loading jobs
[00:20:01] Terminate workgroup
Restore from daily-20180607232159 finished in 22m55s.
Old gstore data saved under /home/tigergraph/tigergraph/gstore with suffix -20180608001836, you need to remove them manually.

For our test, GBAR restore took about 23 minutes. Most of the time (20 minutes) was spent decompressing the backup archive.

Note that after the restore is done, GBAR reports where the pre-restore graph data (gstore) has been saved. After you have verified that the restore was successful, you may want to delete the old gstore files to free up disk space.
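
For instance, given the log line above, the pre-restore copies carry the suffix -20180608001836 and could be removed as follows (a sketch only; verify the paths on your system before deleting anything):

$ rm -rf /home/tigergraph/tigergraph/gstore/*-20180608001836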

Performance Summary of Example

To summarize the example above: on the test instance, backup produced a 49 GB archive in 31m33s, and restore from that archive completed in 22m55s.
