1 of 6

System Management

Admin Portal, gamin utility, GBAR backup and restore

File Output Policy

GSQL restricts where a query can produce output to files through a file output policy. The policy consists of a whitelist and a blacklist.

GSQL queries must only output to the directories and their descendants or the files indicated by paths in the whitelist.
GSQL queries cannot output to the directories and their descendants or the files indicated by paths in the blacklist. The blacklist takes precedence over the whitelist.

By default, the file output policy allows outputs to all files.

`GSQL.FileOutputPolicy`

The file output policy is implemented through the system configuration parameterGSQL.FileOutputPolicy, which is a JSON array of strings that represents a list of paths. If there is an exclamation mark (!) preceding a path, the path is on the blacklist. If there is no exclamation mark preceding a path, the path is on the whitelist.

Example

For example, if the value for GSQL.FileOutputPolicy is ["/home/tigergraph", "!/home/tigergraph/documents", "!/home/tigergraph/desktop"], then below are the paths on the white list and on the black list:

Whitelist: /home/tigergraph and all its descendants
Blacklist: /home/tigergraph/documents, /home/tigergraph/desktop and all their descendants.

Since the blacklist takes precedence, GSQL will allow queries to write to all files and directories under /home/tigergraph except the documents and destktop folders.

Edit the file output policy

To edit the file policy, ensure that you are logged in as the TigerGraph Linux user, and run the following command:
```
$ gadmin config entry GSQL.FileOutputPolicy
```

In the prompt, enter the new value for the parameter:

GSQL.FileOutputPolicy [ ["/"] ]: The policy to control file outputs in GSQL queries
New: ["/home/tigergraph", "!/home/tigergraph/app"]
# Whitelist: /home/tigergraph and all its descendants
# Blacklist: /home/tigergraph/app and all its decendants
# Effect: GSQL can output to /home/tigergraph and all its decendants except /home/tigergraph/app

Apply the new configurations and restart GSQL

$ gadmin config apply
$ gadmin restart gsql

After implementing the file output policy, queries that write to paths that are not on the whitelist are forbidden:

GSQL > BEGIN
GSQL > CREATE QUERY fileOutput() FOR GRAPH tpc_graph { 
GSQL >   FILE f ("/home/documents/data.txt");
GSQL > }
GSQL > END

Semantic Check Error in query fileOutput (SEM-2502): line 2, col 7
The path '/home/documents/data.txt' is not allowed by the file output policy.
For more info, please check log at node 'm2': /home/tigergraph/tigergraph/log/gsql/log.ERROR
Failed to create queries: [fileOutput].

If a FILE object is defined with an empty string, it is regarded as a null file. The file output policy will not block the definition of the FILE object, but writing to a null file would cause a runtime error.

Additionally, queries that write to paths on the whitelist, but also on the blacklist are also forbidden:

GSQL > BEGIN
GSQL > CREATE QUERY fileOutput() FOR GRAPH tpc_graph { 
GSQL >   FILE f ("/home/tigergraph/app/data.txt");
GSQL > }
GSQL > END

Semantic Check Error in query fileOutput (SEM-2502): line 3, col 7
The path '/home/tigergraph/app/data.txt' is not allowed by the file output
policy.
For more info, please check log at node 'm2': /home/tigergraph/tigergraph/log/gsql/log.ERROR
Failed to create queries: [fileOutput].

Backup and Restore

GBAR - Graph Backup and Restore

Graph Backup And Restore (GBAR), is an integrated tool for backing up and restoring the data and data dictionary (schema, loading jobs, and queries) of a TigerGraph instance or cluster.

The backup feature packs TigerGraph data and configuration information into a directory on the local disk or a remote AWS S3 bucket. Multiple backup files can be archived. Later, you can use the restore feature to roll back the system to any backup point. This tool can also be integrated easily with Linux cron to perform periodic backup jobs.

Syntax

The current version of GBAR is intended for restoring the same machine that was backed up. For help with cloning a database (i.e., backing up machine A and restoring the database to machine B), please contact support@tigergraph.com.

Synopsis

Usage: gbar backup [options] -t <backup_tag>
		      gbar restore [options] <backup_tag>
		      gbar list [backup_tag] [-j]
		      gbar remove|rm <backup_tag>
		      gbar cleanup
		      gbar expand [-a] <new_nodes>
		           New nodes must be written in <name>:<host> pairs separated by comma
		           Example:
		               m1:192.168.1.2,m2:192.168.1.3,m3:192.168.1.4

		Options:
		 -h, --help     	Show this help message and exit
		 -v             	Run with debug info dumped
		 -vv            	Run with verbose debug info dumped
		 -y             	Run without prompt
		 -j            		Print gbar list as JSON
		 -t BACKUP_TAG  	Tag for backup file, required on backup
		 -a, --advanced    Enable advanced mode for node expansion

The -y option forces GBAR to skip interactive prompt questions by selecting the default answer. There is currently one interactive question:

At the start of restore, GBAR will always ask if it is okay to stop and reset the TigerGraph services: (y/N)? The default answer is yes.

Configure GBAR

Before using the backup or the restore feature, GBAR must be configured.

Run gadmin config entry system.backup. At each prompt, enter the appropriate values for each config parameter.

$ gadmin config entry system.backup
 
System.Backup.TimeoutSec [ 18000 ]: The backup timeout in seconds
New: 18000
 
System.Backup.CompressProcessNumber [ 8 ]: The number of concurrent process for compression during backup
New: 8
 
System.Backup.Local.Enable [ true ]: Backup data to local path
New: true
 
System.Backup.Local.Path [ /tmp/backup ]: The path to store the backup files
New: /data/backup
 
System.Backup.S3.Enable [ false ]: Backup data to S3 path
New: false
 
System.Backup.S3.AWSAccessKeyID [ <masked> ]: The path to store the backup files
New:
 
System.Backup.S3.AWSSecretAccessKey [ <masked> ]: The path to store the backup files
New:
 
System.Backup.S3.BucketName [  ]: The path to store the backup files
New:

After entering the configuration values, run the following command to apply the new configurations
```
gadmin config apply -y
```

Note:

You can specify the number of parallel processes for backup and restore.
You must provide username and password using GSQL_USERNAME and GSQL_PASSWORD environment variables.
```
$ GSQL_USERNAME=tigergraph GSQL_PASSWORD=tigergraph gbar backup -t daily
```

Perform a backup

To perform a backup, run the following command as the TigerGraph Linux user:

gbar backup -t <backup_tag>

Depending on your configuration settings, your backup archive will be output to your local backup path and/or your AWS S3 bucket. If you are running a cluster, there will be a backup archive on every node in the same path.

A backup archive is stored as several files in a folder, rather than as a single file. The backup tag acts like a filename prefix for the archive filename. The full name of the backup archive will be <backup_tag>-<timestamp>, which is a subfolder of the backup repository.

If System.Backup.Local.Enable is set to true, the folder is a local folder on every node in a cluster, to avoid massive data moving across nodes in a cluster.
If System.Backup.S3.Enable is set to true, every node will upload data located on the node to the s3 repository. Therefore, every node in a cluster needs access to Amazon S3.

GBAR Backup performs a live backup, meaning that normal operations may continue while the backup is in progress. When GBAR backup starts, GBAR will check if there are running loading jobs. If there are, it will pause loading for 1 minute to generate a snapshot and then continue the backup process. You can specify the loading pausing interval by the environment variable PAUSE_LOADING.

GBAR then sends a request to the admin server, which then requests the GPE and GSE to create snapshots of their data. Per the request, the GPE and GSE store their data under GBAR’s own working directory. GBAR also directly contacts the Dictionary and obtains a dump of its system configuration information. In addition, GBAR gathers the TigerGraph system version and customized information including user-defined functions, token functions, schema layouts and user-uploaded icons. Then, GBAR compresses each of these data and configuration information files in tgz format and stores them in the <backup_tag>-<timestamp> subfolder on each node. As the last step, GBAR copies that file to local storage or AWS S3, according to the Config settings, and removes all temporary files generated during backup.

The current version of GBAR Backup takes snapshots quickly to make it very likely that all the components (GPE, GSE, and Dictionary) are in a consistent state, but it does not fully guarantee consistency.

Backup does not save input message queues for REST++ or Kafka.

List Backup Files

gbar list

This command lists all generated backup files in the storage place configured by the user. For each file, it shows the file’s full tag, its size in human-readable format, and its creation time.

Restore from a backup archive

Before restoring a backup, you should ensure that the backup you are restoring from is in the same exact version as your current version of TigerGraph.

To restore a backup, run the following command:

gbar restore <archive_name>

If GBAR can verify that the backup archive exists and that the backup's system version is compatible with the current system version, GBAR will shut down the TigerGraph servers temporarily as it restores the backup. After completing the restore, GBAR will restart the TigerGraph servers. If you are running a cluster, and you have copied the backup files to each individual node in the cluster, running gbar restore on any node will restore the entire cluster.

Restore is an offline operation, requiring the data services to be temporarily shut down. The user must specify the full archive name ( <backup_tag>-<timestamp> ) to be restored. When GBAR restore begins, it first searches for a backup archive exactly matching the archive name supplied in the command line. Then it decompresses the backup files to a working directory. Next, GBAR will compare the TigerGraph system version in the backup archive with the current system's version, to make sure that the backup archive is compatible with that current system. It will then shut down the TigerGraph servers (GSE, RESTPP, etc.) temporarily. Then, GBAR makes a copy of the current graph data, as a precaution. Next, GBAR copies the backup graph data into the GPE and GSE and notifies the Dictionary to load the configuration data. Also, GBAR will notify the GST to load backup user data and copy the backup user-defined token/functions to the right location. When these actions are all done, GBAR will restart the TigerGraph servers.

Note: GBAR restore does not estimate the uncompressed data size and check whether there is sufficient disk space.

The primary purpose of GBAR is to save snapshots of the data configuration of a TigerGraph system, so that in the future the same system can be rolled back (restored) to one of the saved states. A key assumption is that Backup and Restore are performed on the same machine, and that the file structure of the TigerGraph software has not changed.

Restore needs enough free space to accommodate both the old gstore and the gstore to be restored.

Remove a backup

To remove a backup, run the gbar remove command:

$ gbar remove <backup_tag>

The command removes a backup from the backup storage path. To retrieve the tag of a backup, you can use the gbar list command.

Clean up temporary files

Run gbar cleanup to delete the temporary files created during backup or restore operations:

$ gbar cleanup

GBAR Detailed Example

The following example describes a real example, to show the actual commands, the expected output, and the amount of time and disk space used, for a given set of graph data. For this example, an Amazon EC2 instance was used, with the following specifications:

Single instance with 32 vCPU + 244GB memory + 2TB HDD.

Naturally, backup and restore time will vary depending on the hardware used.

GBAR Backup Operational Details

To run a daily backup, we tell GBAR to backup with the tag name daily.

$ gbar backup -t daily
[23:21:46] Retrieve TigerGraph system configuration
[23:21:51] Start workgroup
[23:21:59] Snapshot GPE/GSE data
[23:33:50] Snapshot DICT data
[23:33:50] Calc checksum
[23:37:19] Compress backup data
[23:46:43] Pack backup data
[23:53:18] Put archive daily-20180607232159 to repo-local
[23:53:19] Terminate workgroup
Backup to daily-20180607232159 finished in 31m33s.

The total backup process took about 31 minutes, and the generated archive is about 49 GB. Dumping the GPE + GSE data to disk took 12 minutes. Compressing the files took another 20 minutes.

GBAR Restore Operational Details

To restore from a backup archive, a full archive name needs to be provided, such as daily-20180607232159. By default, restore will ask the user to approve to continue. If you want to pre-approve these actions, use the "-y" option. GBAR will make the default choice for you.

$ gbar restore daily-20180607232159
[23:57:06] Retrieve TigerGraph system configuration
GBAR restore needs to reset TigerGraph system.
Do you want to continue?(y/N):y
[23:57:13] Start workgroup
[23:57:22] Pull archive daily-20180607232159, round #1
[23:57:57] Pull archive daily-20180607232159, round #2
[00:01:00] Pull archive daily-20180607232159, round #3
[00:01:00] Unpack cluster data
[00:06:39] Decompress backup data
[00:17:32] Verify checksum
[00:18:30] gadmin stop gpe gse
[00:18:36] Snapshot DICT data
[00:18:36] Restore cluster data
[00:18:36] Restore DICT data
[00:18:36] gadmin reset
[00:19:16] gadmin start
[00:19:41] reinstall GSQL queries
[00:19:42] recompiling loading jobs
[00:20:01] Terminate workgroup
Restore from daily-20180607232159 finished in 22m55s.
Old gstore data saved under /home/tigergraph/tigergraph/gstore with suffix -20180608001836, you need to remove them manually.

For our test, GBAR restore took about 23 minutes. Most of the time (20 minutes) was spent decompressing the backup archive.

Note that after the restore is done, GBAR informs you were the pre-restore graph data (gstore) has been saved. After you have verified that the restore was successful, you may want to delete the old gstore files to free up disk space.

Performance Summary of Example

Database Import/Export

Export/Import is a complement to Backup/Restore, not a substitute.

The GSQL EXPORT and IMPORT commands perform a logical backup and restore. A database export contains the database's data, and optionally some types of metadata, which can be subsequently imported in order to recreate the same database, in the original or in a different TigerGraph platform instance.

To import an exported database, ensure that the export files are from a database that was running the exact same version of TigerGraph as the database that you are importing into.

Known Issues (Updated Feb 16th):

User-defined loading jobs containingDELETE statements are not exported correctly.
If a graph contains vertex or edge types with a composite key, the graph data is exported in a nonstandard format that cannot be reimported.

EXPORT GRAPH ALL

The EXPORT GRAPH ALL command reads the data and metadata for all graphs in the TigerGraph system and writes the information to a zip file in the designated folder. If no options are specified, then a full backup is performed, including schema, data, template information, and user profiles.

Required privilege

EXPORT_GRAPH

Synopsis

EXPORT GRAPH ALL [<export_options>] TO "<directory_name>"

exportOptions ::=
(-S | --SCHEMA | -T | --TEMPLATE | -D | --DATA | -U | --USERS | -P | --PASSWORD pwd)

    -S, --SCHEMA        Only Schema will be exported
    -T, --TEMPLATE      Only Schema, Queries, Loading Jobs, UDFs
    -D, --DATA          Only Data Sources will be exported
    -U, --USERS         Includes Permissions, Secrets, and Tokens
    -P, --PASSWORD      Encrypt with password. User will be prompted

The export directory should be empty before running the command because all contents are zipped and compressed.

Parameters

Options

Output

The EXPORT GRAPH command exports all graphs in the database.

The export contains four categories of files:

Data files in CSV format, one file for each type of vertex and each type of edge.
GSQL DDL command files created by the export command. The import command uses these files to recreate the graph schema(s) and reload the data.
Copies of the database's queries, loading jobs, and user-defined functions.
GSQL command files used to recreate the users and their privileges.

The following files are created in the specified directory when exporting and are then zipped into a single file called ExportedGraphs.zip.

If the file is password-protected, it can only be unzipped using GSQL IMPORT. The security feature prevents users from directly unzipping it.

A DBImportExport_<graphName>.gsql for each graph called <graphName> in a multigraph database that contains a series of GSQL DDL statements that do the following:
- Create the exported graph, along with its local vertex, edge, and tuple types,
- Create the loading jobs from the exported graphs
- Create data source file objects
- Create queries
A graph_<graphName>/ folder for each graph in a multigraph database containing data for local vertex/edge types in <graphName>. For each vertex or edge type called <type>, there is one of the following two data files:
- vertex_<type>.csv
- edge_<type>.csv
global.gsql - DDL job to create all global vertex and edge types, and data sources.
tuple.gsql - DDL job to create all User Defined Tuples.
Exported data and jobs used to restore the data:
- GlobalTypes/ - folder containing data for global vertex/edge types
  - vertex_name.csv
  - edge_name.csv
- run_loading_jobs.gsql - DDL created by the export command which will be used during import:
  - Temporary global schema change job to add user-defined indexes. This schema job is dropped after it is has run.
  - Loading jobs to load data for global and local vertex/edges.
Database's saved queries, loading jobs, and schema change jobs
- SchemaChangeJob/ - folder containing DDL for schema change jobs. See section "Schema Change Jobs" for more information
  - Global_Schema_Change_Jobs.gsql contains all global schema change jobs
  - graphName_Schema_Change_Jobs.gsql contains schema change jobs for each graph "graphName"
  Tokenbank.cpp - copy of <tigergraph.root.dir>/app/<VERSION_NUM>/dev/gdk/gsql/src/TokenBank/TokenBank.cpp
- ExprFunctions.hpp - copy of <tigergraph.root.dir>/app/<VERSION_NUM>dev/gdk/gsql/src/QueryUdf/ExprFunctions.hpp
- ExprUtil.hpp - copy of <tigergraph.root.dir>/app/<VERSION_NUM>/dev/gdk/gsql/src/QueryUdf/ExprUtil.hpp
Users:
- users.gsql - DDL to create all exported users and import Secrets and Tokens, and grant permissions.

Example

GSQL > EXPORT GRAPH ALL TO "/tmp/export_graphs/"

Insufficient Disk Space

If not enough disk space is available for the data to be exported, the system returns an error message indicating not all data has been exported. Some data may have already been written to disk. If an insufficient disk error occurs, the files will not be zipped, due to the possibility of corrupted data which would then corrupt the zip file. The user should clear enough disk space, including deleting the partially exported data, before reattempting the export.

It is possible for all the files to be written to disk and then to run out of disk space during the zip operation. If that is the case, the system will report this error. The unzipped files will be present in the specified export directory.

Default Timeout and Session Parameter export_timeout

If the timeout limit is reached during export, the system returns an error message indicating not all data has been exported. Some data may have already been written to disk. If a timeout error occurs, the files will not be zipped. The user should delete the export files, increase the timeout limit and then rerun the export.

The timeout limit is controlled by the session parameter export_timeout. The default timeout is ~138 hours. To change the timeout limit, use the command:

GSQL > SET EXPORT_TIMEOUT = <timeout_in_ms>

IMPORT GRAPH ALL

The IMPORT GRAPH ALL command unzips the .zip file ExportedGraph.zip located in the designated folder, unzips it, and then runs the GSQL command files within to import the graph.

Required privilege

WRITE_SCHEMA, WRITE_QUERY, WRITE_LOADINGJOB, EXECUTE_LOADINGJOB, DROP ALL, WRITE_USERS

Synopsis

IMPORT GRAPH ALL [import_options] FROM "<filename>"

importOptions ::= [-P | --PASSWORD ] [ (-KU | -- keep-users]
    -P,  --PASSWORD     Decrypt with password. User will be prompted.
    -KU, --KEEP-USERS   Do not delete user identities before importing

Parameters

Options

Example

GSQL > IMPORT GRAPH ALL FROM "/tmp/export_graphs/"

IMPORT GRAPH looks for specific filenames. If either the zip file or any of its contents are renamed by the user, IMPORT GRAPH may fail.

IMPORT GRAPH erases the current database (equivalent to running DROP ALL). The current version does not support incremental or supplemental changes to an existing database (except for the --keep-users option)

Loading Jobs

There are two sets of loading jobs:

Those that were in the catalog of the database which was exported. These are embedded in the file DBImportExport_graphName.gsql
Those that are created by EXPORT GRAPH and are used to assist with the import process. These are embedded in the file run_loading_jobs,gsql.

The catalog loading jobs are not needed to restore the data. They are included for archival purposes.

Some special rules apply to importing loading jobs. Some catalog loading jobs will not be imported.

If a catalog loading job contains DEFINE FILENAME F = "/path/to/file/", the path will be removed and the imported loading job will only contain DEFINE FILENAME F. This is to allow a loading job to still be imported even though the file may no longer exist or the path may be different due to moving to another TigerGraph instance.
If a specific file path is used directly in the LOAD statement, and the file cannot be found, the loading job cannot be created and will be skipped. For example, LOAD "/path/to/file" to vertex v1 cannot be created if /path/to/file does not exist.
Any file path using $sys.data_root will be skipped. This is because the value of $sys.data_root is not retained from export. During import, $sys.data_root is set to the root folder of the import location.

Schema Change Jobs

There are two sets of schema change jobs:

Those that were in the catalog of the database which was exported. These are stored in the folder /SchemaChangeJobs.
Those that were created by EXPORT GRAPH and are used to assist with the import process. These are in the run_loading_jobs.gsql command file. The jobs are dropped after the import command is finished with them.

The database's schema change jobs are not executed during the import process. This is because if a schema change job had been run before the export, then the exported schema already reflects the result of the schema change job. The directory /SchemaChangeJobs contains these files:

Global_Schema_Change_Jobs.gsql contains all global schema change jobs
<graphName>_Schema_Change_Jobs.gsql contains schema change jobs for each graph <graphName>.

Cluster import/export

In the current version, importing and exporting clusters is not fully automated. The database can be exported and imported by following some additional steps.

Export from a Cluster

Rather than creating a single export zip file, export will create a file for each machine. Before exporting, specific folders must be created on each server using the following commands:

$ grun all "mkdir -p /path/to/export_directory/GlobalTypes/"
$ grun all "mkdir -p /path/to/export_directory/graph_<graphName>/"

Then run the export command on one server. The EXPORT command does not bundle all the files to one server, and it does not compress each server's files to one zip. Some files, including the data files, will be exported to each server, to the folders created above. Some files will be only on the local server where EXPORT GRAPH was run.

Import to a Cluster

1. Place the files on the import servers

You may only import to a cluster that has the same number and configuration of servers as the data from which the export originated. Transfer the files from one export server to a corresponding import server. That is, copy the files from export_server_n:/path/to/export_directory to import_server_n:/path/to/import/directory

2. Manually modify the loading jobs

On the main server, edit the run_loading_jobs.gsql files as follows.

Find the line(s) of the form: LOAD "sys.data_root/.../<vertex_or_edge_type>.csv" Close to it should be similar line that is commented out which have the "all:" data source directive: #LOAD "all:sys.data_root/.../<vertex_or_edge_type>.csv"

See the example below:

LOAD "$sys.data_root/graph_graph1/localE.csv"
#If running on a cluster, check that the file exists on all nodes then uncomment the line below and comment the line above.
#LOAD "all:$sys.data_root/graph_graph1/localE.csv"
    TO EDGE localE VALUES ($"from", $"to") USING SEPARATOR = "^]", HEADER = "true";

Comment out the LOAD line and uncomment the LOAD all: line. Be sure that you do this for all data source files.

3. Run the IMPORT GRAPH command from the main server (e.g., the one that corresponds to the server where EXPORT GRAPH was run).

Advanced Platform-layer Commands

This page documents a list of advanced Linux commands that simplify platform operations that are performed often during debugging, especially on high availability (HA) clusters. Only the TigerGraph platform owner - the Linux user created during installation has access to the commands on this page.

Users are advised to use these commands only at the guidance and recommendation of TigerGraph support.

Connection between nodes

Connect to another node via SSH

$ gssh <node_name>

This command allows you to connect to another node in your cluster via SSH.

Example:

# Originally on m1
[tigergraph@ip-172-31-88-111 ~]$ gssh m3
Last login: Fri Apr 23 18:24:27 2021
# Now connected to m3 via ssh
[[tigergraph@ip-172-31-93-187 ~]$

Loading data

Offline loading

$ gautoloading.sh (-g <graph_name> -j <loading_job_name> | path_to_config_file)

With huge data volumes, data loading can be time-consuming. If you find yourself often loading huge volumes of data into an empty graph, and your data volume is so large that your loading jobs are taking hours to complete, you might consider using offline loading to speed up data loading.

In order to use offline loading, all the filename variables in the loading job must take an initial path value. After creating the loading job and ensuring that all the data files are referenced correctly in the loading job, use the options -g and -j to specify the graph and loading job to run. During offline loading, your database is focused on loading data and will not be able to handle requests and queries.

Offline loading deletes all existing graph data before it starts. Back up your data before using offline loading.

Options

-g <graph_name>: Name of the graph whose loading job to run
-j <loading_job_name>: Name of the loading job to run

Example

The following command runs the loading job load_ldbc_snb on the graph ldbc_snb:

$ gautoloading.sh -g ldbc_snb -j load_ldbc_snb

You can also provide the graph name and the loading job name with a config file written in Bash:

~/example_config

# the name of the graph for the initial loading
GRAPH_NAME="tpc_graph"

# the name of loading jobs separated by white space
LOADING_JOBS=("load_test")

Once you have the config file, you can run gautoloading.sh with the config file instead of the -g and -j options:

$ gautoloading.sh ~/example_config

File operations

Copy files on the specified nodes

$ gscp <all|component_name|node_list> <source_path> <target_dir>

This command allows you to copy files from the current node to target folders on multiple nodes at the same time. The file or directory on the current node specified by the source path will be copied into the target folder on every node. If the target folder does not exist at the path given, the target folder will be created automatically. You can also specify multiple source files or directories, in which case, the source paths need to be absolute paths, put in quotes, and separated by space.

You can specify the nodes where you want the copy operation to occur in the following ways:

gscp all <source_path> <target_dir> will execute the command on all nodes
gscp <component_name> <source_path> <target_dir> will execute the command on nodes where the component you specified is running
gscp <node_list> <source_path> <target_dir> will execute the command on the nodes you specify in the node list

Example

$ gscp all /tmp/gscp_test /tmp/gscp_test_folder

### Connecting to local  server 172.31.91.54 ...

### Connecting to remote server 172.31.88.179 ...

### Connecting to remote server 172.31.91.208 ...

// A copy of gscp_test is on every node
$ grun all 'ls /tmp/gscp_text_folder'

### Connecting to local  server 172.31.91.54 ...
gscp_test

### Connecting to remote server 172.31.88.179 ...
gscp_test

### Connecting to remote server 172.31.91.208 ...
gscp_test

// Copy file to the target folder only on nodes where GPE is running
$ gscp gpe /tmp/gscp_test1 /tmp/gscp_test_folder

// Copy file to a specified list of nodes
$ gscp m1,m2 /tmp/gscp_test3 /tmp/gscp_test_folder

$ gscp all "/tmp/gscp_test1 /tmp/gscp_test2" /tmp/gscp_test_folder

### Connecting to local  server 172.31.91.54 ...

### Connecting to remote server 172.31.88.179 ...

### Connecting to remote server 172.31.91.208 ...

// Copies of both files are on every node
$ grun all 'ls /tmp/gscp_text_folder'

### Connecting to local  server 172.31.91.54 ...
gscp_test1 gscp_test2

### Connecting to remote server 172.31.88.179 ...
gscp_test1 gscp_test2

### Connecting to remote server 172.31.91.208 ...
gscp_test1 gscp_test2

Download file from another node

$ gfetch <all|component_name|node_list> <source_path> <target_dir>

This command downloads a file or directory from every specified node to the target directory on the current node.

Example

$ gfetch all ~/test.txt ~/test_folder

### Connecting to local  server 172.31.91.54 ...

### Connecting to remote server 172.31.88.179 ...

### Connecting to remote server 172.31.91.208 ...
scp: /home/tigergraph/test.txt: No such file or directory

// Nothing is downloaded if the file does not exist on a node
$ ls ~/test_folder
test.txt_m1  test.txt_m2

Run commands on multiple nodes

Run commands sequentially

$ grun <all|component_name|node_list> '<command>'

This command allows you to run commands on a specified list of nodes in your cluster one by one, and the output from every node will be visible to the terminal. grun will wait for the command to finish running on one node before executing the command on the next node.

You can specify which nodes to run commands on in the following ways:

grun all '<command>' will execute the command on all nodes
grun <component_name> '<command>' will execute the command on nodes where the component you specified is running
grun <node_list> '<command>' will execute the commands on the nodes you specify in the node list

Example

grun all 'echo "hello world"'

### Connecting to local  server 172.31.91.54 ...
hello world

### Connecting to remote server 172.31.88.179 ...
hello world

### Connecting to remote server 172.31.91.208 ...
hello world

# Run 'echo "hello world"' on every node where GPE is running
grun gpe 'echo "hello world"'

### Connecting to local  server 172.31.91.54 ...
hello world

### Connecting to remote server 172.31.88.179 ...
hello world

### Connecting to remote server 172.31.91.208 ...
hello world

grun m1,m3 'echo "hello world"'

### Connecting to local  server 172.31.91.54 ...
hello world

### Connecting to remote server 172.31.91.208 ...
hello world

Run commands in parallel

$ grun_p <all|component_name|node_list> '<command>'

This command allows you to run commands on a specified list of nodes in your cluster in parallel, and the output will be visible to the terminal where the grun_p command was run. You can specify the nodes to run commands on in the following ways:

grun_p all '<command>' will execute the command on all nodes
grun_p <component_name> '<command>' will execute the command on nodes where the component you specified is running
grun_p <node_list> '<command>' will execute the commands on the nodes you specify in the node list. The list of nodes should be separated by a comma, e.g.: m1,m2

$ grun_p all 'echo "hello world"'

### Connecting to local  server 172.31.91.54 ...

### Connecting to remote server 172.31.88.179 ...

### Connecting to remote server 172.31.91.208 ...

### ---- (m1)_172.31.91.54 ---0--
hello world

### ---- (m2)_172.31.88.179 ---0--
hello world

### ---- (m3)_172.31.91.208 ---0--
hello world

$ grun_p gpe 'echo "hello world"'

### Connecting to local  server 172.31.91.54 ...

### Connecting to remote server 172.31.88.179 ...

### Connecting to remote server 172.31.91.208 ...

### ---- (m1)_172.31.91.54 ---0--
hello world

### ---- (m2)_172.31.88.179 ---0--
hello world

### ---- (m3)_172.31.91.208 ---0--
hello world

$ grun_p m1,m3 'echo "hello world"'

### Connecting to local  server 172.31.91.54 ...

### Connecting to remote server 172.31.91.208 ...

### ---- (m1)_172.31.91.54 ---0--
hello world

### ---- (m3)_172.31.91.208 ---0--
hello world

Display cluster information

Show current node IP

$ gmyip

This command returns the private IP address of your current node.

Example:

$ gmyip
172.31.93.187 # Current node IP address

Show current node number and servers

$ ghostname

This command returns your current node number as well as all servers that are running on the current node.

Example

In this example, m1 is the current node number, and ADMIN#1, admin#1 etc. are all servers that are running on m1.

$ ghostname

m1 ADMIN#1 admin#1 CTRL#1 ctrl#1 DICT#1 dict#1 ETCD#1 etcd#1 EXE_1 exe_1 GPE_1#1 gpe_1#1 GSE_1#1 gse_1#1 GSQL#1 gsql#1 GUI#1 gui#1 IFM#1 ifm#1 KAFKA#1 kafka#1 KAFKACONN#1 kafkaconn#1 KAFKASTRM-LL_1 kafkastrm-ll_1 NGINX#1 nginx#1 RESTPP#1 restpp#1 TS3_1 ts3_1 TS3SERV#1 ts3serv#1 ZK#1 zk#1

Show deployment information

$ gssh

The gssh command, when used without arguments, outputs information about server deployments in your cluster. The output contains the names and IP addresses of every node. For each node, the output shows the full list of servers that are running on the node. For each server, the output shows the full list of the nodes that the server is running on.

Example

$ gssh

Usage: gssh m1|gpe_1#1|gse1_1#1|...
Usage: ----------------Available hosts--------------
Host *
    IdentityFile /home/tigergraph/.ssh/tigergraph_rsa
    Port 22

Host m1 ADMIN#1 admin#1 CTRL#1 ctrl#1 DICT#1 dict#1 ETCD#1 etcd#1 EXE_1 exe_1 GPE_1#1 gpe_1#1 GSE_1#1 gse_1#1 GSQL#1 gsql#1 GUI#1 gui#1 IFM#1 ifm#1 KAFKA#1 kafka#1 KAFKACONN#1 kafkaconn#1 KAFKASTRM-LL_1 kafkastrm-ll_1 NGINX#1 nginx#1 RESTPP#1 restpp#1 TS3_1 ts3_1 TS3SERV#1 ts3serv#1 ZK#1 zk#1
    HostName 172.31.91.54

Host m2 ADMIN#2 admin#2 CTRL#2 ctrl#2 DICT#2 dict#2 ETCD#2 etcd#2 EXE_2 exe_2 GPE_2#1 gpe_2#1 GSE_2#1 gse_2#1 GSQL#2 gsql#2 GUI#2 gui#2 IFM#2 ifm#2 KAFKA#2 kafka#2 KAFKACONN#2 kafkaconn#2 KAFKASTRM-LL_2 kafkastrm-ll_2 NGINX#2 nginx#2 RESTPP#2 restpp#2 TS3_2 ts3_2 ZK#2 zk#2
    HostName 172.31.88.179

Host m3 ADMIN#3 admin#3 CTRL#3 ctrl#3 DICT#3 dict#3 ETCD#3 etcd#3 EXE_3 exe_3 GPE_3#1 gpe_3#1 GSE_3#1 gse_3#1 GSQL#3 gsql#3 GUI#3 gui#3 IFM#3 ifm#3 KAFKA#3 kafka#3 KAFKACONN#3 kafkaconn#3 KAFKASTRM-LL_3 kafkastrm-ll_3 NGINX#3 nginx#3 RESTPP#3 restpp#3 TS3_3 ts3_3 ZK#3 zk#3
    HostName 172.31.91.208

#cluster.nodes: m1:172.31.91.54,m2:172.31.88.179,m3:172.31.91.208
#admin.servers: m1,m2,m3
#ctrl.servers: m1,m2,m3
#dict.servers: m1,m2,m3
#etcd.servers: m1,m2,m3
#exe.servers: m1,m2,m3
#gpe.servers: m1,m2,m3
#gse.servers: m1,m2,m3
#gsql.servers: m1,m2,m3
#gui.servers: m1,m2,m3
#ifm.servers: m1,m2,m3
#kafka.servers: m1,m2,m3
#kafkaconn.servers: m1,m2,m3
#kafkastrm-ll.servers: m1,m2,m3
#nginx.servers: m1,m2,m3
#restpp.servers: m1,m2,m3
#ts3.servers: m1,m2,m3
#ts3serv.servers: m1
#zk.servers: m1,m2,m3
#log.root: /home/tigergraph/tigergraph/log
#app.root: /home/tigergraph/tigergraph/app/3.1.1
#data.root: /home/tigergraph/tigergraph/data

Show graph status

$ gstatusgraph

This command returns the size of your data, the number of existing vertices and edges, as well as the number of deleted and skipped vertices on every node in your cluster. If you are running TigerGraph on a single node, it will return the same information that one node.

Single-node example

$ gstatusgraph
=== graph ===
[GRAPH  ] Graph was loaded (/home/tigergraph/tigergraph/data/gstore/0/part/):
[m1     ] Partition size: 437MiB, IDS size: 103MiB, Vertex count: 3181724, Edge count: 34512076, NumOfDeletedVertices: 0 NumOfSkippedVertices: 0
[WARN   ] Above vertex and edge counts are for internal use which show approximate topology size of the local graph partition. Use DML to get the correct graph topology information

Cluster example

$ gstatusgraph
=== graph ===
[GRAPH  ] Graph was loaded (/home/tigergraph/tigergraph/data/gstore/0/part/):
[m1     ] Partition size: 246MiB, IDS size: 31MiB, Vertex count: 1152822, Edge count: 10908545, NumOfDeletedVertices: 0 NumOfSkippedVertices: 0
[m2     ] Partition size: 248MiB, IDS size: 31MiB, Vertex count: 1157325, Edge count: 11004342, NumOfDeletedVertices: 0 NumOfSkippedVertices: 0
[m3     ] Partition size: 225MiB, IDS size: 29MiB, Vertex count: 1049883, Edge count: 10009479, NumOfDeletedVertices: 0 NumOfSkippedVertices: 0
[WARN   ] Above vertex and edge counts are for internal use which show approximate topology size of the local graph partition. Use DML to get the correct graph topology information

Managing with gadmin

Managing TigerGraph Servers with gadmin

Introduction

TigerGraph Graph Administrator (gadmin) is a tool for managing TigerGraph servers. It has a self-contained help function and a man page, whose output is shown below for reference. If you are unfamiliar with the TigerGraph servers, please see GET STARTED with TigerGraph.

To see a listing of all the options or commands available for gadmin, run any of the following commands:

$ gadmin -h
$ gadmin --help

After changing a configuration setting, it is generally necessary to run gadmin config apply. Some commands invoke config apply automatically. If you are not certain, just run gadmin config apply

List of commands

Below is the man page for gadmin. Most of the commands are self-explanatory. Common examples are provided with each command.

NOTE: Some commands have changed in v3.0. In particular, gadmin set <config | license> has changed to gadmin <config | license> set

GADMIN(1)                         User Commands                                GADMIN(1)

NAME
       gadmin - manual page for TigerGraph Administrator.

SYNOPSIS
       gadmin [flags]
       gadmin [command]

DESCRIPTION
       gadmin is a tool for managing TigerGraph servers

OPTIONS
       Available Commands:
         autocomplete Generate autocomplete script
         config       Manage the configuration for the TigerGraph system
         help         Help about any command
         init         Init the whole cluster or given service
         license      Manage TigerGraph license
         log          List log files of the given services or all services
         reset        Reset the whole init or given service with its data
         restart      Restart services by service id
         start        Start services by service id
         status       Show current status of service
         stop         Stop services by service id
         version      Show the version information

Flags:
      --debug   enable debug log output to stdout
  -h, --help    help for gadmin

Use "gadmin [command] --help" for more information about a command.
GADMIN(1)

gadmin autocomplete

Gadmin autocomplete is more of a feature than a command. It is an auto-complete feature that allows you to see all possible entries of a specific configuration. You can press tab when typing a command to either print out all possible entries, or auto-complete the entry you are currently typing.

$ gadmin autocomplete -h 

Generate autocomplete script

Usage:
  gadmin autocomplete <bash|zsh> [flags]

Description:
  If you want to make this automatic, add ". <(gadmin autocomplete bash)" to your
  .bashrc file.

Flags:
  -h, --help   help for autocomplete

Global Flags:
      --debug   enable debug log output to stdout

The example below shows an example of the autocomplete for the command gadmin status.

tigergraph@ubuntu:~$ gadmin status
admin         exe           ifm           nginx
all           gpe           infra         restpp
ctrl          gse           kafka         ts3
dict          gsql          kafkaconn     ts3serv
etcd          gui           kafkastrm-ll  zk

gadmin config

gadmin config commands are used to manage the configuration for the TigerGraph system. To get a complete list of configuration parameters that are available, see Configuration Parameters.

gadmin config has many sub-entries as well, they will be listed below.

$ gadmin config -h

Manage the configuration for the TigerGraph system

Usage:
  gadmin config [flags]
  gadmin config [command]

Description:
  You can specify local config file to turn on file mode, which no remote
  connection will be made and the configs are read/write from/to the file.Note: Not
  all config commands work on file mode.

Available Commands:
  apply       Apply the config changes in staging state
  diff        Show the differences between staging and applied configs
  discard     Discard the staging config changes
  dump        Dump the staging system config in json format
  entry       Configure the entries with given substring patterns interactively
  get         Get the config value of given entry name non-interactivly
  group       Configure the entries of given groups interactively
  init        Initialize configuration
  list        List the available config entries or groups
  set         Configure the entry of given config entry name in a non-interactive manner

Flags:
      --file string   specify config file path
  -h, --help          help for config

Global Flags:
      --debug   enable debug log output to stdout

Use "gadmin config [command] --help" for more information about a command.

Example: Change the retention size of the kafka queue to 10GB:

$ gadmin config set Kafka.RetentionSizeGB 10

gadmin config apply

$ gadmin config apply -h
Apply the config changes in staging state

Usage:
  gadmin config apply [flags]

Flags:
  -y, --confirm              confirm to apply
  -f, --force                force components to apply new config
  -h, --help                 help for apply
      --initial              config apply with the initial configuration when the remote config (ETCD) is empty
      --with-config string   the input config file used to config apply, will overwrite both local and remote(ETCD)

Global Flags:
      --debug         enable debug log output to stdout
      --file string   specify config file path

gadmin config diff

Show what configuration changes were made.

$ gadmin config diff -h
Show the differences between staging and applied configs

Usage:
  gadmin config diff [flags]

Flags:
  -h, --help   help for diff

Global Flags:
      --debug         enable debug log output to stdout
      --file string   specify config file path

gadmin config discard

Discard the configuration changes without applying them.

$ gadmin config discard -h
Discard the staging config changes

Usage:
  gadmin config discard [flags]

Flags:
  -h, --help   help for discard

Global Flags:
      --debug         enable debug log output to stdout
      --file string   specify config file path

gadmin config dump

Display all configuration entries.

$ gadmin config dump -h
Dump the staging system config in json format

Usage:
  gadmin config dump [flags]

Flags:
  -h, --help   help for dump

Global Flags:
      --debug         enable debug log output to stdout
      --file string   specify config file path

gadmin config entry

Change a configuration entry.

$ gadmin config entry -h
Configure the entries with given substring patterns interactively

Usage:
  gadmin config entry [EntryName] [flags]

Description:
  You may use `config entry system` to go through all the system related entries.

Flags:
  -a, --all     configure all entries
      --basic   configure basic entries only
  -h, --help    help for entry

Global Flags:
      --debug         enable debug log output to stdout
      --file string   specify config file path

gadmin config get

Get the value of a specific configuration entry.

$ gadmin config get -h
Get the config value of given entry name non-interactivly

Usage:
  gadmin config get [EntryName] [flags]

Flags:
  -h, --help   help for get

Global Flags:
      --debug         enable debug log output to stdout
      --file string   specify config file path

gadmin config group

Configure entries for a specific service group. e.g. KAFKA, GPE, ZK

$ gadmin config group -h
Configure the entries of given groups interactively

Usage:
  gadmin config group [GroupName] [flags]

Description:
  You may use `gadmin config list group` to see all the groups.

Flags:
  -h, --help   help for group

Global Flags:
      --debug         enable debug log output to stdout
      --file string   specify config file path

gadmin config init

Initialize your configuration.

$ gadmin config init -h
Initialize configuration

Usage:
  gadmin config init [flags]

Flags:
      --all            display every configurable entry
      --expert         display node assignment entries
      --ha             enable HA for init
  -h, --help           help for init
  -i, --input string   provide an input file name and init the configuration silently with the provided input file
      --template       show the template for init initialization

Global Flags:
      --debug         enable debug log output to stdout
      --file string   specify config file path

gadmin config list

List all configurable entries or entry groups.

$ gadmin config list -h
List the available config entries or groups

Usage:
  gadmin config list <group|entry> [flags]

Description:
  List prints out the available config groups or config entries, which can be used
  in entry/group commands.

Flags:
      --basic   list basic entries only
  -h, --help    help for list

Global Flags:
      --debug         enable debug log output to stdout
      --file string   specify config file path

gadmin config set

$ gadmin config set -h
Configure the entry of given config entry name in a non-interactive manner

Usage:
  gadmin config set [EntryName] [EntryValue] [flags]

Description:
  [EntryName] [EntryValue] must be provided in pairs, and use space to separate
  each pair.

Flags:
  -h, --help   help for set

Global Flags:
      --debug         enable debug log output to stdout
      --file string   specify config file path

gadmin init

$ gadmin init -h

Init the whole cluster or given service

Usage:
  gadmin init [flags]
  gadmin init [command]

Description:
  Init command initializes the cluster/kafka. When "cluster" is specified,
  a config path is required.

Available Commands:
  cluster     Init the whole cluster
  kafka       Init the KAFKA

Flags:
  -h, --help   help for init

Global Flags:
      --debug   enable debug log output to stdout

Use "gadmin init [command] --help" for more information about a command.

gadmin license

Options for configuring your license.

$ gadmin license -h

Manage TigerGraph license

Usage:
  gadmin license [flags]
  gadmin license [command]

Available Commands:
  seed        Collects host signature and generates seed file for issuing license
  set         Set new license
  status      Display license status and info

Flags:
  -h, --help   help for license

Global Flags:
      --debug   enable debug log output to stdout

Use "gadmin license [command] --help" for more information about a command.

Generating a license seed

To generate a license seed, use the following command:

$ gadmin license seed <host_signature_type>
# host_signature_type: [aws|azure|gcp|hardware|node-id]

Depending on your host machine, you need to choose the appropriate host signature type. If you are generating the seed from a cloud instance, choose the corresponding cloud provider as your signature type.

If you are generating the seed from your own machine, choose either node-id or hardware.

The hardware option tells gadmin to collect information from your machine's hardware as the host signature to generate the license seed. A signature produced by using this parameter will not be altered by software changes on the machine, including OS reinstalls. This is the usual choice.
node-id refers to the machine ID in the machine-id file located at /etc/machine-id and is a unique signature for the OS that identifies your machine. A reinstall of the OS may change the machine ID.

Applying a new license key

Example flow for applying a new license (which may be replacing an existing license key):

$ gadmin license set <new_license_key>
[   Info] Configuration has been changed. 
Please use 'gadmin config apply' to persist the changes.

$ gadmin config apply
[Warning] No difference from staging config, config apply is skipped.
[   Info] Successfully applied configuration change. Please restart services to make it effective immediately.

$ gadmin restart
[   Note] Restart the service(s)? (y/N)y
[   Info] Stopping DICT ADMIN GSE NGINX GPE RESTPP KAFKASTRM-LL KAFKACONN TS3SERV GSQL TS3 GUI
[   Info] Starting ZK ETCD DICT KAFKA ADMIN GSE NGINX GPE RESTPP KAFKASTRM-LL KAFKACONN TS3SERV GSQL TS3 GUI

Once the license has been set and config has been applied, you can run gadmin license status to view the details of your license, including the expiration date and time.

$ gadmin license status
[Warning] License will expire in 6 days

   Issuer: TigerGraph Inc.
 Audience: tigergraph user
IssueTime: 2020-06-12 17:45:10 +0000 UTC
  EndTime: 2020-06-30 17:45:10 +0000 UTC
  Edition: Enterprise

Host:
	MaxCPUCore: 1024
	MaxPhysicalMemorySize: 1073741824.00MB
	MaxClusterNodeNumber: 1024

Topology:
	MaxVertexNumber: 9.007199254740991e+15
	MaxEdgeNumber: 9.007199254740991e+15
	MaxGraphNumber: 1000
	MaxTopologySize: 8.00MB

RuntimeMemory:
	MaxUserResidentSetSize: 1073741824.00MB

gadmin log

The gadmin log command will reveal the location of all commonly checked log files for the TigerGraph system.

$ gadmin log -h

List log files of the given services or all services

Usage:
  gadmin log [service name...] [flags]

Description:
  Service name should be a valid TigerGraph service name, for example, GSE, RESTPP
  or GPE.

Flags:
  -h, --help   help for log

Global Flags:
      --debug   enable debug log output to stdout

$ gadmin log
ADMIN  : /home/tigergraph/tigergraph/log/admin/ADMIN#1.out
ADMIN  : /home/tigergraph/tigergraph/log/admin/ADMIN.INFO
CTRL   : /home/tigergraph/tigergraph/log/controller/CTRL#1.log
CTRL   : /home/tigergraph/tigergraph/log/controller/CTRL#1.out
DICT   : /home/tigergraph/tigergraph/log/dict/DICT#1.out
DICT   : /home/tigergraph/tigergraph/log/dict/DICT.INFO
ETCD   : /home/tigergraph/tigergraph/log/etcd/ETCD#1.out
EXE    : /home/tigergraph/tigergraph/log/executor/EXE_1.log
EXE    : /home/tigergraph/tigergraph/log/executor/EXE_1.out
GPE    : /home/tigergraph/tigergraph/log/gpe/GPE_1#1.out
GSE    : /home/tigergraph/tigergraph/log/gse/GSE_1#1.out
GSE    : /home/tigergraph/tigergraph/log/gse/log.INFO
GSQL   : /home/tigergraph/tigergraph/log/gsql/GSQL#1.out
GSQL   : /home/tigergraph/tigergraph/log/gsql/log.INFO
GUI    : /home/tigergraph/tigergraph/log/gui/GUI#1.out
IFM    : /home/tigergraph/tigergraph/log/informant/IFM#1.log
IFM    : /home/tigergraph/tigergraph/log/informant/IFM#1.out
KAFKA  : /home/tigergraph/tigergraph/log/kafka/controller.log
KAFKA  : /home/tigergraph/tigergraph/log/kafka/kafka-request.log
KAFKA  : /home/tigergraph/tigergraph/log/kafka/kafka.log
KAFKA  : /home/tigergraph/tigergraph/log/kafka/server.log
KAFKA  : /home/tigergraph/tigergraph/log/kafka/state-change.log
KAFKACONN: /home/tigergraph/tigergraph/log/kafkaconn/KAFKACONN#1.out
KAFKACONN: /home/tigergraph/tigergraph/log/kafkaconn/kafkaconn.log
KAFKASTRM-LL: /home/tigergraph/tigergraph/log/kafkastrm-ll/KAFKASTRM-LL_1.out
KAFKASTRM-LL: /home/tigergraph/tigergraph/log/kafkastrm-ll/kafkastrm-ll.log
NGINX  : /home/tigergraph/tigergraph/log/nginx/logs/NGINX#1.out
NGINX  : /home/tigergraph/tigergraph/log/nginx/logs/error.log
NGINX  : /home/tigergraph/tigergraph/log/nginx/logs/nginx.access.log
NGINX  : /home/tigergraph/tigergraph/log/nginx/logs/nginx.error.log
RESTPP : /home/tigergraph/tigergraph/log/restpp/RESTPP#1.out
RESTPP : /home/tigergraph/tigergraph/log/restpp/log.INFO
TS3    : /home/tigergraph/tigergraph/log/ts3/TS3_1.log
TS3    : /home/tigergraph/tigergraph/log/ts3/TS3_1.out
TS3SERV: /home/tigergraph/tigergraph/log/ts3serv/TS3SERV#1.out
ZK     : /home/tigergraph/tigergraph/log/zk/ZK#1.out
ZK     : /home/tigergraph/tigergraph/log/zk/zookeeper.log

gadmin reset

$ gadmin reset -h

Reset the whole init or given service with its data

Usage:
  gadmin reset [service name...] [flags]

Description:
  Service name should be a valid TigerGraph service name, for example, GSE, RESTPP
  or GPE.

Flags:
  -y, --confirm   confirm to reset service
  -h, --help      help for reset

Global Flags:
      --debug   enable debug log output to stdout

gadmin restart

The gadmin restart command is used to restart one, many, or all TigerGraph services. You will need to confirm the restarting of services by either entering y (yes) or n (no). To bypass this prompt, you can use the -y flag to force confirmation.

$ gadmin restart -h

Restart services by service id

Usage:
  gadmin restart [serviceID...] [flags]

Description:
  ServiceID should be [serviceName][_partition][#replica], e.g., GSE_1#3. Leave
  replica field empty(e.g. GSE_1) to either refer to all replicas of given
  partition, or if the service has no replicas(e.g. EXE_1). Same for parititons.

Flags:
  -y, --confirm   confirm to restart service
  -h, --help      help for restart
      --no-dep    restart service without dependency

Global Flags:
      --debug   enable debug log output to stdout

$ gadmin restart all -y
[   Info] Stopping ZK ETCD DICT KAFKA ADMIN GSE NGINX GPE RESTPP KAFKASTRM-LL KAFKACONN TS3SERV GSQL TS3 IFM GUI
[   Info] Stopping CTRL
[   Info] Stopping EXE
[   Info] Starting EXE
[   Info] Starting CTRL
[   Info] Starting ZK ETCD DICT KAFKA ADMIN GSE NGINX GPE RESTPP KAFKASTRM-LL KAFKACONN TS3SERV GSQL TS3 IFM GUI

gadmin start

The gadmin start command can be used to start one, many, or all services.

$ gadmin start -h

Start services by service id

Usage:
  gadmin start [serviceID...] [flags]

Description:
  ServiceID should be [serviceName][_partition][#replica], e.g., GSE_1#3. Leave
  replica field empty(e.g. GSE_1) to either refer to all replicas of given
  partition, or if the service has no replicas(e.g. EXE_1). Same for parititons.
  If no serviceID is specified, it only starts services excluding the
  infrastructure. Use 'gadmin start all' to start all services.

Flags:
      --dry-run              dry run and output command to start the service
  -h, --help                 help for start
      --no-dep               start service without dependency
      --with-config string   start with given config file and dump it to each node (only for executor)
      --ignore-error         starting services with ignore-error will start all possible services, and ensure that the platform can still start and run normally when some services or some nodes are down

Global Flags:
      --debug   enable debug log output to stdout

$ gadmin start all
[   Info] Starting EXE
[   Info] Starting CTRL
[   Info] Starting ZK ETCD DICT KAFKA ADMIN GSE NGINX GPE RESTPP KAFKASTRM-LL KAFKACONN TS3SERV GSQL TS3 IFM GUI

gadmin status

Check the status of TigerGraph component servers:

$ gadmin status -h

Show current status of service

Usage:
  gadmin status [serviceID...] [flags]

Description:
  ServiceID should be [serviceName][_partition][#replica], e.g., GSE_1#3. Leave
  replica field empty(e.g. GSE_1) to either refer to all replicas of given
  partition, or if the service has no replicas(e.g. EXE_1). Same for parititons.
  If no serviceID is specified, it will show all service status

Flags:
  -h, --help      help for status
  -v, --verbose   report service status in detail

Global Flags:
      --debug   enable debug log output to stdout

Use gadmin status to report whether each of the main component servers is running (up) or stopped (off). The example below shows the normal status when the graph store is empty and a graph schema has not been defined:

$ gadmin status

+--------------------+-------------------------+-------------------------+
|    Service Name    |     Service Status      |      Process State      |
+--------------------+-------------------------+-------------------------+
|       ADMIN        |         Online          |         Running         |
|        CTRL        |         Online          |         Running         |
|        DICT        |         Online          |         Running         |
|        ETCD        |         Online          |         Running         |
|        GPE         |         Online          |         Running         |
|        GSE         |         Online          |         Running         |
|        GSQL        |         Online          |         Running         |
|        GUI         |         Online          |         Running         |
|        IFM         |         Online          |         Running         |
|       KAFKA        |         Online          |         Running         |
|     KAFKACONN      |         Online          |         Running         |
|    KAFKASTRM-LL    |         Online          |         Running         |
|       NGINX        |         Online          |         Running         |
|       RESTPP       |         Online          |         Running         |
|        TS3         |         Online          |         Running         |
|      TS3SERV       |         Online          |         Running         |
|         ZK         |         Online          |         Running         |
+--------------------+-------------------------+-------------------------+

You can also check the status of each instance using the verbose flag : gadmin status -v or gadmin status --verbose. This will show each machine's status. See example below

$ gadmin status -v GPE

+--------------------+-------------------------+-------------------------+
|    Service Name    |     Service Status      |      Process State      |
+--------------------+-------------------------+-------------------------+
|      GPE_1#1       |         Warmup          |         Running         |
|      GPE_1#2       |         Warmup          |         Running         |
+--------------------+-------------------------+-------------------------+

Here are the most common service and process status states you might see from running the gadmin status command :

Service Status Definitions

Online - The service is online and ready.
Warmup - The service is processing the graph information and will be online soon.
Stopping - The service has received a stop command and will be down soon.
Offline - The service is not available.
Down - The service has been stopped or crashed.
StatusUnknown - The valid status of the service is not tracked.

Process State Status Definitions

Init - Process is initializing and will be in the running state soon.
Running - The process is running and available.
Zombie - There is a leftover process from a previous instance.
Stopped - The process has been stopped or crashed.
StatusUnknown - The valid status of the process is not tracked.

gadmin stop

The gadmin stop command can be used to stop one, many, or all TigerGraph services. You will need to confirm the restarting of services by either entering y (yes) or n (no). To bypass this prompt, you can use the -y flag to force confirmation.

$ gadmin stop -h

Stop services by service id

Usage:
  gadmin stop [serviceID...] [flags]

Description:
  ServiceID should be [serviceName][_partition][#replica], e.g., GSE_1#3. Leave
  replica field empty(e.g. GSE_1) to either refer to all replicas of given
  partition, or if the service has no replicas(e.g. EXE_1). Same for parititons.
  If no serviceID is specified, it only stops services excluding the
  infrastructure. Use 'gadmin stop all' to stop all services.

Flags:
  -y, --confirm   confirm to stop service
  -h, --help      help for stop
  --ignore-error  stopping services with ignore-error will stop all possible services, and ensure that the platform can still stop and run normally when some services or some nodes are down

Global Flags:
      --debug   enable debug log output to stdout

$ gadmin stop gsql
[   Note] Stop the service(s)? (y/N)y
[   Info] Stopping GSQL

gadmin version

$ gadmin version -h

Show the version information

Usage:
  gadmin version [flags]

Description:
  Show version information of all TigerGraph components, including repo name,
  version, git commit number, git commit datetime.

Flags:
  -h, --help   help for version

Global Flags:
      --debug   enable debug log output to stdout

More examples

Configuring memory use thresholds

TigerGraph offers two levels of memory thresholds using the following configuration settings:

SysAlertFreePct and SysMinFreePct

SysAlertFreePct setting indicates that the memory usage has crossed a threshold where the system will start throttling Queries to allow long-running queries to finish and release the memory.

SysMinFreePct setting indicates that the memory usage has crossed a critical threshold and the Queries will start aborting automatically to prevent GPE crash and system stability.

By default, SysMinFreePct is set at 10%, at which point Queries will be aborted.

$ gadmin config entry GPE.BasicConfig.Env

GPE.BasicConfig.Env [ LD_PRELOAD=$LD_PRELOAD; LD_LIBRARY_PATH=$LD_LIBRARY_PATH; ]: The runtime environment variables, separated by ';'
✔ New: LD_PRELOAD=$LD_PRELOAD; LD_LIBRARY_PATH=$LD_LIBRARY_PATH;

Add this line to the existing config :
SysMinFreePct=20;SysAlertFreePct=70;

Your config line should now look like this :

GPE.BasicConfig.Env [ LD_PRELOAD=$LD_PRELOAD; LD_LIBRARY_PATH=$LD_LIBRARY_PATH; ]: The runtime environment variables, separated by ';'
✔ New: LD_PRELOAD=$LD_PRELOAD; LD_LIBRARY_PATH=$LD_LIBRARY_PATH;SysMinFreePct=20;SysAlertFreePct=30;

$ gadmin restart gpe -y

Example:

SysAlertFreePct=30 means when the system memory consumption is over 70% of the memory, the system will enter an alert state, and Graph updates will start to slow down.

SysMinFreePct=20 means 20% of the memory is required to be free. When memory consumption enters a critical state (over 80% memory consumption) queries will be aborted. automatically.

Configuring Nginx configuration template

Follow the steps documented in this support article to update the Nginx configurations of your TigerGraph instance.

Advanced Platform-layer Commands

Users are advised to use these commands only at the guidance and recommendation of TigerGraph support.

Connection between nodes

Connect to another node via SSH

$ gssh <node_name>

This command allows you to connect to another node in your cluster via SSH.

Example:

# Originally on m1
[tigergraph@ip-172-31-88-111 ~]$ gssh m3
Last login: Fri Apr 23 18:24:27 2021
# Now connected to m3 via ssh
[[tigergraph@ip-172-31-93-187 ~]$

Loading data

Offline loading

$ gautoloading.sh (-g <graph_name> -j <loading_job_name> | path_to_config_file)

Offline loading deletes all existing graph data before it starts. Back up your data before using offline loading.

Options

-g <graph_name>: Name of the graph whose loading job to run
-j <loading_job_name>: Name of the loading job to run

Example

The following command runs the loading job load_ldbc_snb on the graph ldbc_snb:

$ gautoloading.sh -g ldbc_snb -j load_ldbc_snb

You can also provide the graph name and the loading job name with a config file written in Bash:

~/example_config

# the name of the graph for the initial loading
GRAPH_NAME="tpc_graph"

# the name of loading jobs separated by white space
LOADING_JOBS=("load_test")

Once you have the config file, you can run gautoloading.sh with the config file instead of the -g and -j options:

$ gautoloading.sh ~/example_config

File operations

Copy files on the specified nodes

$ gscp <all|component_name|node_list> <source_path> <target_dir>

You can specify the nodes where you want the copy operation to occur in the following ways:

gscp all <source_path> <target_dir> will execute the command on all nodes
gscp <component_name> <source_path> <target_dir> will execute the command on nodes where the component you specified is running
gscp <node_list> <source_path> <target_dir> will execute the command on the nodes you specify in the node list

Example

$ gscp all /tmp/gscp_test /tmp/gscp_test_folder

### Connecting to local  server 172.31.91.54 ...

### Connecting to remote server 172.31.88.179 ...

### Connecting to remote server 172.31.91.208 ...

// A copy of gscp_test is on every node
$ grun all 'ls /tmp/gscp_text_folder'

### Connecting to local  server 172.31.91.54 ...
gscp_test

### Connecting to remote server 172.31.88.179 ...
gscp_test

### Connecting to remote server 172.31.91.208 ...
gscp_test

// Copy file to the target folder only on nodes where GPE is running
$ gscp gpe /tmp/gscp_test1 /tmp/gscp_test_folder

// Copy file to a specified list of nodes
$ gscp m1,m2 /tmp/gscp_test3 /tmp/gscp_test_folder

$ gscp all "/tmp/gscp_test1 /tmp/gscp_test2" /tmp/gscp_test_folder

### Connecting to local  server 172.31.91.54 ...

### Connecting to remote server 172.31.88.179 ...

### Connecting to remote server 172.31.91.208 ...

// Copies of both files are on every node
$ grun all 'ls /tmp/gscp_text_folder'

### Connecting to local  server 172.31.91.54 ...
gscp_test1 gscp_test2

### Connecting to remote server 172.31.88.179 ...
gscp_test1 gscp_test2

### Connecting to remote server 172.31.91.208 ...
gscp_test1 gscp_test2

Download file from another node

$ gfetch <all|component_name|node_list> <source_path> <target_dir>

This command downloads a file or directory from every specified node to the target directory on the current node.

Example

$ gfetch all ~/test.txt ~/test_folder

### Connecting to local  server 172.31.91.54 ...

### Connecting to remote server 172.31.88.179 ...

### Connecting to remote server 172.31.91.208 ...
scp: /home/tigergraph/test.txt: No such file or directory

// Nothing is downloaded if the file does not exist on a node
$ ls ~/test_folder
test.txt_m1  test.txt_m2

Run commands on multiple nodes

Run commands sequentially

$ grun <all|component_name|node_list> '<command>'

You can specify which nodes to run commands on in the following ways:

grun all '<command>' will execute the command on all nodes
grun <component_name> '<command>' will execute the command on nodes where the component you specified is running
grun <node_list> '<command>' will execute the commands on the nodes you specify in the node list

Example

grun all 'echo "hello world"'

### Connecting to local  server 172.31.91.54 ...
hello world

### Connecting to remote server 172.31.88.179 ...
hello world

### Connecting to remote server 172.31.91.208 ...
hello world

# Run 'echo "hello world"' on every node where GPE is running
grun gpe 'echo "hello world"'

### Connecting to local  server 172.31.91.54 ...
hello world

### Connecting to remote server 172.31.88.179 ...
hello world

### Connecting to remote server 172.31.91.208 ...
hello world

grun m1,m3 'echo "hello world"'

### Connecting to local  server 172.31.91.54 ...
hello world

### Connecting to remote server 172.31.91.208 ...
hello world

Run commands in parallel

$ grun_p <all|component_name|node_list> '<command>'

grun_p all '<command>' will execute the command on all nodes
grun_p <component_name> '<command>' will execute the command on nodes where the component you specified is running
grun_p <node_list> '<command>' will execute the commands on the nodes you specify in the node list. The list of nodes should be separated by a comma, e.g.: m1,m2

$ grun_p all 'echo "hello world"'

### Connecting to local  server 172.31.91.54 ...

### Connecting to remote server 172.31.88.179 ...

### Connecting to remote server 172.31.91.208 ...

### ---- (m1)_172.31.91.54 ---0--
hello world

### ---- (m2)_172.31.88.179 ---0--
hello world

### ---- (m3)_172.31.91.208 ---0--
hello world

$ grun_p gpe 'echo "hello world"'

### Connecting to local  server 172.31.91.54 ...

### Connecting to remote server 172.31.88.179 ...

### Connecting to remote server 172.31.91.208 ...

### ---- (m1)_172.31.91.54 ---0--
hello world

### ---- (m2)_172.31.88.179 ---0--
hello world

### ---- (m3)_172.31.91.208 ---0--
hello world

$ grun_p m1,m3 'echo "hello world"'

### Connecting to local  server 172.31.91.54 ...

### Connecting to remote server 172.31.91.208 ...

### ---- (m1)_172.31.91.54 ---0--
hello world

### ---- (m3)_172.31.91.208 ---0--
hello world

Display cluster information

Show current node IP

$ gmyip

This command returns the private IP address of your current node.

Example:

$ gmyip
172.31.93.187 # Current node IP address

Show current node number and servers

$ ghostname

This command returns your current node number as well as all servers that are running on the current node.

Example

In this example, m1 is the current node number, and ADMIN#1, admin#1 etc. are all servers that are running on m1.

$ ghostname

m1 ADMIN#1 admin#1 CTRL#1 ctrl#1 DICT#1 dict#1 ETCD#1 etcd#1 EXE_1 exe_1 GPE_1#1 gpe_1#1 GSE_1#1 gse_1#1 GSQL#1 gsql#1 GUI#1 gui#1 IFM#1 ifm#1 KAFKA#1 kafka#1 KAFKACONN#1 kafkaconn#1 KAFKASTRM-LL_1 kafkastrm-ll_1 NGINX#1 nginx#1 RESTPP#1 restpp#1 TS3_1 ts3_1 TS3SERV#1 ts3serv#1 ZK#1 zk#1

Show deployment information

$ gssh

Example

$ gssh

Usage: gssh m1|gpe_1#1|gse1_1#1|...
Usage: ----------------Available hosts--------------
Host *
    IdentityFile /home/tigergraph/.ssh/tigergraph_rsa
    Port 22

Host m1 ADMIN#1 admin#1 CTRL#1 ctrl#1 DICT#1 dict#1 ETCD#1 etcd#1 EXE_1 exe_1 GPE_1#1 gpe_1#1 GSE_1#1 gse_1#1 GSQL#1 gsql#1 GUI#1 gui#1 IFM#1 ifm#1 KAFKA#1 kafka#1 KAFKACONN#1 kafkaconn#1 KAFKASTRM-LL_1 kafkastrm-ll_1 NGINX#1 nginx#1 RESTPP#1 restpp#1 TS3_1 ts3_1 TS3SERV#1 ts3serv#1 ZK#1 zk#1
    HostName 172.31.91.54

Host m2 ADMIN#2 admin#2 CTRL#2 ctrl#2 DICT#2 dict#2 ETCD#2 etcd#2 EXE_2 exe_2 GPE_2#1 gpe_2#1 GSE_2#1 gse_2#1 GSQL#2 gsql#2 GUI#2 gui#2 IFM#2 ifm#2 KAFKA#2 kafka#2 KAFKACONN#2 kafkaconn#2 KAFKASTRM-LL_2 kafkastrm-ll_2 NGINX#2 nginx#2 RESTPP#2 restpp#2 TS3_2 ts3_2 ZK#2 zk#2
    HostName 172.31.88.179

Host m3 ADMIN#3 admin#3 CTRL#3 ctrl#3 DICT#3 dict#3 ETCD#3 etcd#3 EXE_3 exe_3 GPE_3#1 gpe_3#1 GSE_3#1 gse_3#1 GSQL#3 gsql#3 GUI#3 gui#3 IFM#3 ifm#3 KAFKA#3 kafka#3 KAFKACONN#3 kafkaconn#3 KAFKASTRM-LL_3 kafkastrm-ll_3 NGINX#3 nginx#3 RESTPP#3 restpp#3 TS3_3 ts3_3 ZK#3 zk#3
    HostName 172.31.91.208

#cluster.nodes: m1:172.31.91.54,m2:172.31.88.179,m3:172.31.91.208
#admin.servers: m1,m2,m3
#ctrl.servers: m1,m2,m3
#dict.servers: m1,m2,m3
#etcd.servers: m1,m2,m3
#exe.servers: m1,m2,m3
#gpe.servers: m1,m2,m3
#gse.servers: m1,m2,m3
#gsql.servers: m1,m2,m3
#gui.servers: m1,m2,m3
#ifm.servers: m1,m2,m3
#kafka.servers: m1,m2,m3
#kafkaconn.servers: m1,m2,m3
#kafkastrm-ll.servers: m1,m2,m3
#nginx.servers: m1,m2,m3
#restpp.servers: m1,m2,m3
#ts3.servers: m1,m2,m3
#ts3serv.servers: m1
#zk.servers: m1,m2,m3
#log.root: /home/tigergraph/tigergraph/log
#app.root: /home/tigergraph/tigergraph/app/3.1.1
#data.root: /home/tigergraph/tigergraph/data

Show graph status

$ gstatusgraph

Single-node example

$ gstatusgraph
=== graph ===
[GRAPH  ] Graph was loaded (/home/tigergraph/tigergraph/data/gstore/0/part/):
[m1     ] Partition size: 437MiB, IDS size: 103MiB, Vertex count: 3181724, Edge count: 34512076, NumOfDeletedVertices: 0 NumOfSkippedVertices: 0
[WARN   ] Above vertex and edge counts are for internal use which show approximate topology size of the local graph partition. Use DML to get the correct graph topology information

Cluster example

$ gstatusgraph
=== graph ===
[GRAPH  ] Graph was loaded (/home/tigergraph/tigergraph/data/gstore/0/part/):
[m1     ] Partition size: 246MiB, IDS size: 31MiB, Vertex count: 1152822, Edge count: 10908545, NumOfDeletedVertices: 0 NumOfSkippedVertices: 0
[m2     ] Partition size: 248MiB, IDS size: 31MiB, Vertex count: 1157325, Edge count: 11004342, NumOfDeletedVertices: 0 NumOfSkippedVertices: 0
[m3     ] Partition size: 225MiB, IDS size: 29MiB, Vertex count: 1049883, Edge count: 10009479, NumOfDeletedVertices: 0 NumOfSkippedVertices: 0
[WARN   ] Above vertex and edge counts are for internal use which show approximate topology size of the local graph partition. Use DML to get the correct graph topology information

Database Import/Export

Export/Import is a complement to Backup/Restore, not a substitute.

To import an exported database, ensure that the export files are from a database that was running the exact same version of TigerGraph as the database that you are importing into.

Known Issues (Updated Feb 16th):

User-defined loading jobs containingDELETE statements are not exported correctly.
If a graph contains vertex or edge types with a composite key, the graph data is exported in a nonstandard format that cannot be reimported.

EXPORT GRAPH ALL

Required privilege

EXPORT_GRAPH

Synopsis

EXPORT GRAPH ALL [<export_options>] TO "<directory_name>"

exportOptions ::=
(-S | --SCHEMA | -T | --TEMPLATE | -D | --DATA | -U | --USERS | -P | --PASSWORD pwd)

    -S, --SCHEMA        Only Schema will be exported
    -T, --TEMPLATE      Only Schema, Queries, Loading Jobs, UDFs
    -D, --DATA          Only Data Sources will be exported
    -U, --USERS         Includes Permissions, Secrets, and Tokens
    -P, --PASSWORD      Encrypt with password. User will be prompted

The export directory should be empty before running the command because all contents are zipped and compressed.

Parameters

Options

Output

The EXPORT GRAPH command exports all graphs in the database.

The export contains four categories of files:

Data files in CSV format, one file for each type of vertex and each type of edge.
GSQL DDL command files created by the export command. The import command uses these files to recreate the graph schema(s) and reload the data.
Copies of the database's queries, loading jobs, and user-defined functions.
GSQL command files used to recreate the users and their privileges.

The following files are created in the specified directory when exporting and are then zipped into a single file called ExportedGraphs.zip.

If the file is password-protected, it can only be unzipped using GSQL IMPORT. The security feature prevents users from directly unzipping it.

A DBImportExport_<graphName>.gsql for each graph called <graphName> in a multigraph database that contains a series of GSQL DDL statements that do the following:
- Create the exported graph, along with its local vertex, edge, and tuple types,
- Create the loading jobs from the exported graphs
- Create data source file objects
- Create queries
A graph_<graphName>/ folder for each graph in a multigraph database containing data for local vertex/edge types in <graphName>. For each vertex or edge type called <type>, there is one of the following two data files:
- vertex_<type>.csv
- edge_<type>.csv
global.gsql - DDL job to create all global vertex and edge types, and data sources.
tuple.gsql - DDL job to create all User Defined Tuples.
Exported data and jobs used to restore the data:
- GlobalTypes/ - folder containing data for global vertex/edge types
  - vertex_name.csv
  - edge_name.csv
- run_loading_jobs.gsql - DDL created by the export command which will be used during import:
  - Temporary global schema change job to add user-defined indexes. This schema job is dropped after it is has run.
  - Loading jobs to load data for global and local vertex/edges.
Database's saved queries, loading jobs, and schema change jobs
- SchemaChangeJob/ - folder containing DDL for schema change jobs. See section "Schema Change Jobs" for more information
  - Global_Schema_Change_Jobs.gsql contains all global schema change jobs
  - graphName_Schema_Change_Jobs.gsql contains schema change jobs for each graph "graphName"
  Tokenbank.cpp - copy of <tigergraph.root.dir>/app/<VERSION_NUM>/dev/gdk/gsql/src/TokenBank/TokenBank.cpp
- ExprFunctions.hpp - copy of <tigergraph.root.dir>/app/<VERSION_NUM>dev/gdk/gsql/src/QueryUdf/ExprFunctions.hpp
- ExprUtil.hpp - copy of <tigergraph.root.dir>/app/<VERSION_NUM>/dev/gdk/gsql/src/QueryUdf/ExprUtil.hpp
Users:
- users.gsql - DDL to create all exported users and import Secrets and Tokens, and grant permissions.

Example

GSQL > EXPORT GRAPH ALL TO "/tmp/export_graphs/"

Insufficient Disk Space

Default Timeout and Session Parameter export_timeout

The timeout limit is controlled by the session parameter export_timeout. The default timeout is ~138 hours. To change the timeout limit, use the command:

GSQL > SET EXPORT_TIMEOUT = <timeout_in_ms>

IMPORT GRAPH ALL

The IMPORT GRAPH ALL command unzips the .zip file ExportedGraph.zip located in the designated folder, unzips it, and then runs the GSQL command files within to import the graph.

Required privilege

WRITE_SCHEMA, WRITE_QUERY, WRITE_LOADINGJOB, EXECUTE_LOADINGJOB, DROP ALL, WRITE_USERS

Synopsis

IMPORT GRAPH ALL [import_options] FROM "<filename>"

importOptions ::= [-P | --PASSWORD ] [ (-KU | -- keep-users]
    -P,  --PASSWORD     Decrypt with password. User will be prompted.
    -KU, --KEEP-USERS   Do not delete user identities before importing

Parameters

Options

Example

GSQL > IMPORT GRAPH ALL FROM "/tmp/export_graphs/"

IMPORT GRAPH looks for specific filenames. If either the zip file or any of its contents are renamed by the user, IMPORT GRAPH may fail.

Loading Jobs

There are two sets of loading jobs:

Those that were in the catalog of the database which was exported. These are embedded in the file DBImportExport_graphName.gsql
Those that are created by EXPORT GRAPH and are used to assist with the import process. These are embedded in the file run_loading_jobs,gsql.

The catalog loading jobs are not needed to restore the data. They are included for archival purposes.

Some special rules apply to importing loading jobs. Some catalog loading jobs will not be imported.

If a catalog loading job contains DEFINE FILENAME F = "/path/to/file/", the path will be removed and the imported loading job will only contain DEFINE FILENAME F. This is to allow a loading job to still be imported even though the file may no longer exist or the path may be different due to moving to another TigerGraph instance.
If a specific file path is used directly in the LOAD statement, and the file cannot be found, the loading job cannot be created and will be skipped. For example, LOAD "/path/to/file" to vertex v1 cannot be created if /path/to/file does not exist.
Any file path using $sys.data_root will be skipped. This is because the value of $sys.data_root is not retained from export. During import, $sys.data_root is set to the root folder of the import location.

Schema Change Jobs

There are two sets of schema change jobs:

Those that were in the catalog of the database which was exported. These are stored in the folder /SchemaChangeJobs.
Those that were created by EXPORT GRAPH and are used to assist with the import process. These are in the run_loading_jobs.gsql command file. The jobs are dropped after the import command is finished with them.

Global_Schema_Change_Jobs.gsql contains all global schema change jobs
<graphName>_Schema_Change_Jobs.gsql contains schema change jobs for each graph <graphName>.

Cluster import/export

In the current version, importing and exporting clusters is not fully automated. The database can be exported and imported by following some additional steps.

Export from a Cluster

Rather than creating a single export zip file, export will create a file for each machine. Before exporting, specific folders must be created on each server using the following commands:

$ grun all "mkdir -p /path/to/export_directory/GlobalTypes/"
$ grun all "mkdir -p /path/to/export_directory/graph_<graphName>/"

Import to a Cluster

1. Place the files on the import servers

2. Manually modify the loading jobs

On the main server, edit the run_loading_jobs.gsql files as follows.

See the example below:

LOAD "$sys.data_root/graph_graph1/localE.csv"
#If running on a cluster, check that the file exists on all nodes then uncomment the line below and comment the line above.
#LOAD "all:$sys.data_root/graph_graph1/localE.csv"
    TO EDGE localE VALUES ($"from", $"to") USING SEPARATOR = "^]", HEADER = "true";

Comment out the LOAD line and uncomment the LOAD all: line. Be sure that you do this for all data source files.

3. Run the IMPORT GRAPH command from the main server (e.g., the one that corresponds to the server where EXPORT GRAPH was run).

Backup and Restore

GBAR - Graph Backup and Restore

Graph Backup And Restore (GBAR), is an integrated tool for backing up and restoring the data and data dictionary (schema, loading jobs, and queries) of a TigerGraph instance or cluster.

Syntax

Synopsis

Usage: gbar backup [options] -t <backup_tag>
		      gbar restore [options] <backup_tag>
		      gbar list [backup_tag] [-j]
		      gbar remove|rm <backup_tag>
		      gbar cleanup
		      gbar expand [-a] <new_nodes>
		           New nodes must be written in <name>:<host> pairs separated by comma
		           Example:
		               m1:192.168.1.2,m2:192.168.1.3,m3:192.168.1.4

		Options:
		 -h, --help     	Show this help message and exit
		 -v             	Run with debug info dumped
		 -vv            	Run with verbose debug info dumped
		 -y             	Run without prompt
		 -j            		Print gbar list as JSON
		 -t BACKUP_TAG  	Tag for backup file, required on backup
		 -a, --advanced    Enable advanced mode for node expansion

The -y option forces GBAR to skip interactive prompt questions by selecting the default answer. There is currently one interactive question:

At the start of restore, GBAR will always ask if it is okay to stop and reset the TigerGraph services: (y/N)? The default answer is yes.

Configure GBAR

Before using the backup or the restore feature, GBAR must be configured.

Run gadmin config entry system.backup. At each prompt, enter the appropriate values for each config parameter.

$ gadmin config entry system.backup
 
System.Backup.TimeoutSec [ 18000 ]: The backup timeout in seconds
New: 18000
 
System.Backup.CompressProcessNumber [ 8 ]: The number of concurrent process for compression during backup
New: 8
 
System.Backup.Local.Enable [ true ]: Backup data to local path
New: true
 
System.Backup.Local.Path [ /tmp/backup ]: The path to store the backup files
New: /data/backup
 
System.Backup.S3.Enable [ false ]: Backup data to S3 path
New: false
 
System.Backup.S3.AWSAccessKeyID [ <masked> ]: The path to store the backup files
New:
 
System.Backup.S3.AWSSecretAccessKey [ <masked> ]: The path to store the backup files
New:
 
System.Backup.S3.BucketName [  ]: The path to store the backup files
New:

After entering the configuration values, run the following command to apply the new configurations
```
gadmin config apply -y
```

Note:

You can specify the number of parallel processes for backup and restore.
You must provide username and password using GSQL_USERNAME and GSQL_PASSWORD environment variables.
```
$ GSQL_USERNAME=tigergraph GSQL_PASSWORD=tigergraph gbar backup -t daily
```

Perform a backup

To perform a backup, run the following command as the TigerGraph Linux user:

gbar backup -t <backup_tag>

If System.Backup.Local.Enable is set to true, the folder is a local folder on every node in a cluster, to avoid massive data moving across nodes in a cluster.
If System.Backup.S3.Enable is set to true, every node will upload data located on the node to the s3 repository. Therefore, every node in a cluster needs access to Amazon S3.

Backup does not save input message queues for REST++ or Kafka.

List Backup Files

gbar list

This command lists all generated backup files in the storage place configured by the user. For each file, it shows the file’s full tag, its size in human-readable format, and its creation time.

Restore from a backup archive

Before restoring a backup, you should ensure that the backup you are restoring from is in the same exact version as your current version of TigerGraph.

To restore a backup, run the following command:

gbar restore <archive_name>

Note: GBAR restore does not estimate the uncompressed data size and check whether there is sufficient disk space.

Restore needs enough free space to accommodate both the old gstore and the gstore to be restored.

Remove a backup

To remove a backup, run the gbar remove command:

$ gbar remove <backup_tag>

The command removes a backup from the backup storage path. To retrieve the tag of a backup, you can use the gbar list command.

Clean up temporary files

Run gbar cleanup to delete the temporary files created during backup or restore operations:

$ gbar cleanup

GBAR Detailed Example

Single instance with 32 vCPU + 244GB memory + 2TB HDD.

Naturally, backup and restore time will vary depending on the hardware used.

GBAR Backup Operational Details

To run a daily backup, we tell GBAR to backup with the tag name daily.

$ gbar backup -t daily
[23:21:46] Retrieve TigerGraph system configuration
[23:21:51] Start workgroup
[23:21:59] Snapshot GPE/GSE data
[23:33:50] Snapshot DICT data
[23:33:50] Calc checksum
[23:37:19] Compress backup data
[23:46:43] Pack backup data
[23:53:18] Put archive daily-20180607232159 to repo-local
[23:53:19] Terminate workgroup
Backup to daily-20180607232159 finished in 31m33s.

The total backup process took about 31 minutes, and the generated archive is about 49 GB. Dumping the GPE + GSE data to disk took 12 minutes. Compressing the files took another 20 minutes.

GBAR Restore Operational Details

$ gbar restore daily-20180607232159
[23:57:06] Retrieve TigerGraph system configuration
GBAR restore needs to reset TigerGraph system.
Do you want to continue?(y/N):y
[23:57:13] Start workgroup
[23:57:22] Pull archive daily-20180607232159, round #1
[23:57:57] Pull archive daily-20180607232159, round #2
[00:01:00] Pull archive daily-20180607232159, round #3
[00:01:00] Unpack cluster data
[00:06:39] Decompress backup data
[00:17:32] Verify checksum
[00:18:30] gadmin stop gpe gse
[00:18:36] Snapshot DICT data
[00:18:36] Restore cluster data
[00:18:36] Restore DICT data
[00:18:36] gadmin reset
[00:19:16] gadmin start
[00:19:41] reinstall GSQL queries
[00:19:42] recompiling loading jobs
[00:20:01] Terminate workgroup
Restore from daily-20180607232159 finished in 22m55s.
Old gstore data saved under /home/tigergraph/tigergraph/gstore with suffix -20180608001836, you need to remove them manually.

For our test, GBAR restore took about 23 minutes. Most of the time (20 minutes) was spent decompressing the backup archive.