Cluster Scale-Out
Adding machines to a TigerGraph cluster, for distributed data and/or HA
Version 2.2 - 2.3 Copyright © 2019 TigerGraph. All Rights Reserved.
Introduction
Cluster expansion adds new machine nodes to an existing cluster and redistributes the data across the enlarged cluster. The entire system is offline during expansion.
Prerequisites
The current TigerGraph system must be installed in cluster mode, not single-node mode.
The total graph data storage space of the expanded cluster should be at least 3 times the current Gstore disk usage. During expansion, a backup copy of all graph data files is created, and additional working space is also needed.
To check your existing gstore disk space:
The new nodes must be available.
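The 3x storage rule above can be checked with a small script before you start. The following Python sketch is only an illustration, not a GBAR feature. The Gstore location (assumed here to be ~/tigergraph/gstore) and the way you obtain each node's storage capacity are assumptions that you should adapt to your installation.

```python
import os

# Rule from this guide: the expanded cluster needs at least 3x the
# current Gstore disk usage (backup copy plus working space).
REQUIRED_FACTOR = 3


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of regular files under `path`, skipping symlinks.

    A missing directory yields 0 because os.walk ignores errors by default.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            if not os.path.islink(full_path):
                total += os.path.getsize(full_path)
    return total


def has_enough_space(gstore_used_bytes: int, node_capacity_bytes: list[int]) -> bool:
    """Return True if the expanded cluster's total storage is >= 3x usage."""
    return sum(node_capacity_bytes) >= REQUIRED_FACTOR * gstore_used_bytes


if __name__ == "__main__":
    # Hypothetical Gstore path; run on each existing node and add the results.
    used = dir_size_bytes(os.path.expanduser("~/tigergraph/gstore"))
    print(f"Gstore usage on this node: {used / 1024**3:.2f} GiB")
```

Run dir_size_bytes on every existing node and add the results to get the cluster-wide usage. Then pass that total and the storage you plan to dedicate to graph data on each node of the expanded cluster to has_enough_space.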
Cluster Expansion Workflow
Configure GBAR
The GBAR utility is used for cluster expansion. If this is your first time using GBAR, you must first run gbar config (see the Backup and Restore guide). For a large system, one of the key parameters is backup_core_timeout, whose default value is 5 hours. The config script gives guidance on estimating an appropriate value.
Set Up Environment in New Nodes
From the command line, switch to the <tigergraph_root_dir>/pkg_pool/syspre_pkg directory (~/tigergraph/pkg_pool/syspre_pkg by default). This directory contains a utility script, set_syspre.sh, that sets up the environment on the new nodes.
Run ./set_syspre.sh -h to see its usage.
For example, to set up the environment on a new node 192.168.1.6 with a sudo user named ubuntu and the login key ubuntu_rsa, run set_syspre.sh with those three values, using the options listed in its -h output.
Firewall check
The firewall configuration on each new node must match the configuration on the existing nodes. Otherwise, the TigerGraph instances on the new nodes may not work properly.
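One way to compare firewall configurations is to dump the rules on an existing node and on each new node, then diff the dumps. The following Python sketch assumes that the nodes use iptables and that you saved each node's rules with iptables-save (for example, sudo iptables-save > rules_m6.txt). It is not part of TigerGraph. If your nodes use another firewall tool, adapt the normalization step.

```python
import difflib
import re

# Chain-policy lines in iptables-save output carry packet/byte counters
# such as "[1234:56789]" that differ between hosts; strip them.
_COUNTERS = re.compile(r"\s*\[\d+:\d+\]")


def normalize_rules(text: str) -> list[str]:
    """Return rule lines without comments, blank lines, or counters."""
    lines = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip "# Generated by ..." timestamps and blank lines
        lines.append(_COUNTERS.sub("", line))
    return lines


def firewall_diff(existing: str, new: str) -> list[str]:
    """Return unified-diff lines; an empty list means the rules match."""
    return list(
        difflib.unified_diff(
            normalize_rules(existing),
            normalize_rules(new),
            fromfile="existing_node",
            tofile="new_node",
            lineterm="",
        )
    )
```

Rule order is significant in iptables, so the sketch compares the normalized rules as ordered lists rather than as sets.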
If you are using TigerGraph 2.2 on Ubuntu, you must comment out the following block at the beginning of the .bashrc file in the tigergraph user's home directory, on every node.
When the script finishes, the environment for the TigerGraph system, including system prerequisites and SSH keys, is set up on the new nodes.
Add New Nodes to Cluster
To expand the cluster, run gbar expand with a list of the new nodes, giving an alias and an IP address for each node.
For example, consider adding two new nodes with the aliases m6 and m7. GBAR redistributes the data across all nodes, including m6 and m7, so that each node holds about the same amount of data.
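To estimate how much data each node will hold after the redistribution, divide the current total by the new node count. The following Python sketch assumes a perfectly even split and an unchanged total, which GBAR only approximates ("about the same amount of data"). The node names m1 to m5 and their sizes are hypothetical.

```python
def expected_share_per_node(per_node_usage: list[float], new_node_count: int) -> float:
    """Approximate data per node after an even redistribution.

    Assumes the total amount of stored data stays the same and is
    spread evenly over the old and new nodes.
    """
    if new_node_count < 0:
        raise ValueError("new_node_count must be non-negative")
    total = sum(per_node_usage)
    return total / (len(per_node_usage) + new_node_count)


# Hypothetical example: five existing nodes (m1..m5) holding 70 GB each,
# expanded with m6 and m7.
print(expected_share_per_node([70, 70, 70, 70, 70], 2))  # 50.0
```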
GBAR will run the following checks for each new node:
The number of new nodes must be an integer multiple of max(gpe.replicas, gse.replicas).
Each new node alias must be a valid identifier.
Each new node's IP address must be accessible via SSH from the node where gbar expand is being run.
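You can run similar checks yourself before calling gbar expand. The following Python sketch is a hypothetical pre-check, not GBAR's own validation. Its alias rule (a letter or underscore followed by letters, digits, or underscores) is an assumption about what counts as a "valid identifier." Its SSH probe only confirms that TCP port 22 accepts connections, not that key-based login works.

```python
import ipaddress
import re
import socket

# Hypothetical alias rule; GBAR's exact definition of a valid identifier may differ.
_ALIAS = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_new_nodes(nodes: list[tuple[str, str]], gpe_replicas: int, gse_replicas: int) -> list[str]:
    """Return a list of problems; an empty list means these offline checks passed.

    `nodes` holds (alias, ip) pairs; replica counts are assumed to be >= 1.
    """
    problems = []
    factor = max(gpe_replicas, gse_replicas)
    if not nodes:
        problems.append("no new nodes given")
    elif len(nodes) % factor != 0:
        problems.append(f"node count {len(nodes)} is not a multiple of {factor}")
    for alias, ip in nodes:
        if not _ALIAS.fullmatch(alias):
            problems.append(f"invalid alias: {alias!r}")
        try:
            ipaddress.ip_address(ip)
        except ValueError:
            problems.append(f"invalid IP address for {alias}: {ip!r}")
    return problems


def ssh_port_open(ip: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Rough reachability probe: can a TCP connection to the SSH port be opened?"""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, check_new_nodes([("m6", "192.168.1.6"), ("m7", "192.168.1.7")], 2, 1) returns an empty list, while passing only m6 with the same replica counts reports that 1 is not a multiple of 2.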
Error Handling
Should any errors occur, GBAR rolls back to the state before the expansion started. As a failsafe, a backup copy of the data is kept until the expansion either succeeds or finishes rolling back.
If the system has no schema or data, GBAR reports a data integrity check error. In that case, you can ignore the message.
Advanced Expansion Mode
Advanced expansion configuration options are available. Contact TigerGraph Support for guidance.