Cluster Expansion


Adding machines to a cluster

Version 2.2 Copyright © 2018 TigerGraph. All Rights Reserved.

Cluster expansion lets you add new machine nodes to an existing cluster and redistribute the data across all nodes. The entire system is offline while expansion runs.

Prerequisites

  1. The current TigerGraph system must be installed in cluster mode, not single-node mode.

  2. The total graph data storage space for the expanded cluster should be at least 3 times as large as the current gstore disk usage.

    1. During the expansion process, a backup copy of all the graph data files is created, and additional working space is also needed.

    2. To check your existing gstore disk space:

  3. The new nodes are available.
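The storage requirement in prerequisite 2 can be estimated with a short shell sketch. The function name min_storage_kb and the GSTORE_DIR variable are hypothetical; set GSTORE_DIR to wherever your installation keeps its gstore data, since that path is not stated here.

```shell
# Hypothetical helper: given the current gstore usage in KB, print the
# minimum total storage (3x usage) the expanded cluster should provide.
min_storage_kb() {
  echo $(( $1 * 3 ))
}

# Assumption: GSTORE_DIR points at your gstore data directory.
# It defaults to the current directory only so the sketch runs as-is.
GSTORE_DIR=${GSTORE_DIR:-.}
used_kb=$(du -sk "$GSTORE_DIR" | awk '{print $1}')
echo "gstore usage: ${used_kb} KB; minimum after expansion: $(min_storage_kb "$used_kb") KB"
```

This measures a single node only. For a cluster-wide figure, run it on each existing node and add up the results.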

Cluster Expansion Workflow

Configure GBAR

The GBAR utility is used for cluster expansion. If this is your first time using GBAR, you must first run gbar config (see the Backup and Restore guide). For a large system, one of the key parameters is backup_core_timeout, which defaults to 5 hours. The config script gives guidance on estimating an appropriate value.

Set Up Environment in New Nodes

From the command line, switch to the pkg_pool/syspre_pkg directory under the TigerGraph root directory (<tigergraph_root_dir>/pkg_pool/syspre_pkg, which is ~/tigergraph/pkg_pool/syspre_pkg by default). This directory contains a utility script, set_syspre.sh, which sets up the environment on a new node.

Run ./set_syspre.sh -h to see the usage:

./set_syspre.sh -h
Usage:
./set_syspre.sh -i <IP address/host name> -u <sudo user> (-P <password> | -K <ssh key>) [-p <tigergraph user password>]
./set_syspre.sh -h
Options:
-h -- show the help
-i -- the IP address of the new machine
-u -- sudo user [default: $USER]
-P -- sudo user password [default: empty]
-K -- sudo user ssh key [default: empty]
-p -- tigergraph user password [default: tigergraph]
[NOTE ]: This script must be run under tigergraph user.

For example, to set up the environment on a new node 192.168.1.6 with sudo user called ubuntu and login key ubuntu_rsa, run the following command:

./set_syspre.sh -i 192.168.1.6 -u ubuntu -K ~/.ssh/ubuntu_rsa

When done, the environment for the TigerGraph system, including system prerequisites and ssh keys, is set up on the new node.
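When several nodes are being added, the same command can be generated once per node. The following is a minimal sketch, assuming every new node shares the same sudo user and ssh key; the function name syspre_commands is hypothetical, and it only prints the commands so they can be reviewed before being run as the tigergraph user.

```shell
# Hypothetical helper: print (do not run) one set_syspre.sh command per node.
# Arguments: <sudo user> <ssh key path> <ip> [<ip> ...]
syspre_commands() {
  sudo_user=$1
  key=$2
  shift 2
  for ip in "$@"; do
    echo "./set_syspre.sh -i $ip -u $sudo_user -K $key"
  done
}

# Example: the two new nodes used later in this guide.
syspre_commands ubuntu ~/.ssh/ubuntu_rsa 192.168.1.6 192.168.1.7
```

Only the -i, -u, and -K options from the usage text above are used; substitute -P if you authenticate with a password instead of a key.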

If you are running TigerGraph 2.2 on Ubuntu, you must comment out the following block, found near the beginning of .bashrc in the tigergraph user's home directory, on every node.

# If not running interactively, don't do anything
case $- in
*i*) ;;
*) return;;
esac
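One way to make this change is with sed. This is a cautious sketch, assuming the block appears in .bashrc as shown above, with case and esac each starting at the beginning of a line. The function name comment_out_guard is hypothetical, and it prints the edited file to standard output rather than modifying anything in place.

```shell
# Hypothetical helper: print the given file with every line from
# "case $- in" through the next "esac" prefixed by '#'.
comment_out_guard() {
  sed '/^case \$- in/,/^esac/ s/^/#/' "$1"
}

# Suggested use (review the result before replacing the original):
#   comment_out_guard ~/.bashrc > ~/.bashrc.new
#   diff ~/.bashrc ~/.bashrc.new
```

Writing to a separate file first keeps the original .bashrc intact until you have confirmed that only the intended lines changed.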

Add New Nodes to Cluster

To expand the cluster, run gbar expand with a list of new nodes in the following format:

gbar expand <node_alias_1>:<ip_1>,<node_alias_2>:<ip_2>,...,<node_alias_n>:<ip_n>

For example, the following command adds two nodes to the cluster:

gbar expand m6:192.168.1.6,m7:192.168.1.7

The command above will redistribute the data on all nodes including m6 and m7, so that each node has about the same amount of data.
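For longer node lists, the comma-separated argument can be assembled from individual alias:ip pairs. The helper name expand_arg below is hypothetical; the sketch only prints the resulting gbar expand command and does not run it.

```shell
# Hypothetical helper: join alias:ip pairs with commas, producing the
# argument format that gbar expand expects.
expand_arg() {
  # In a subshell, "$*" joins the arguments with the first character of IFS.
  (IFS=,; echo "$*")
}

echo "gbar expand $(expand_arg m6:192.168.1.6 m7:192.168.1.7)"
```

The IFS change happens inside a subshell, so it does not affect the rest of the shell session.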

GBAR runs the following checks on the new nodes:

  1. The number of new nodes must be an integer multiple of max(gpe.replicas, gse.replicas).

  2. Each new node alias must be a valid identifier.

  3. Each new node's IP address must be accessible via ssh from the node where gbar expand is being run.
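The three checks above can be approximated ahead of time with a rough shell sketch. This is not GBAR's own implementation. All function names are hypothetical, the replica counts must be supplied by you, and the definition of a valid identifier (letters, digits, and underscores, not starting with a digit) is an assumption, since the exact rule is not stated here.

```shell
# Check 1: node count is an integer multiple of max(gpe replicas, gse replicas).
# Arguments: <number of new nodes> <gpe replicas> <gse replicas>
count_ok() {
  max=$2
  if [ "$3" -gt "$max" ]; then max=$3; fi
  [ $(( $1 % max )) -eq 0 ]
}

# Check 2 (assumed rule): alias uses only letters, digits, and underscores,
# and does not start with a digit.
alias_ok() {
  case $1 in
    ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Check 3: the node accepts a non-interactive ssh login from this machine.
ssh_ok() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

count_ok 2 2 1 && alias_ok m6 && echo "count and alias checks passed"
```

Call ssh_ok with each new node's IP address from the node where gbar expand will be run; BatchMode makes ssh fail instead of prompting for a password.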

If the system does not have a schema or data, GBAR reports a data integrity check error. In that case, you may ignore the error.

Advanced Expansion Mode

Advanced expansion configuration options are possible. Contact TigerGraph Support for guidance.

Error Handling

Should any errors occur, GBAR will roll back to the state before node expansion started. As a failsafe, a backup copy of the data is kept until expansion either succeeds or finishes rolling back.