Cluster Expansion

Expanding a cluster adds more nodes to the cluster. During the expansion, the cluster’s data is redistributed across the expanded cluster. You can also change replication factor during an expansion.

Expanding a cluster may make sense for several reasons:

to improve query performance
to improve system availability
to improve fault tolerance

Cluster expansion requires several minutes of cluster downtime. The exact amount of downtime varies depending on the size of your cluster.

If your TigerGraph cluster uses a seeded license, please contact TigerGraph support to obtain an unseeded temporary license before expanding your cluster.

1. Before you begin

You must adhere to all of the following requirements.

The new nodes must already have exactly the same version of TigerGraph software installed as the nodes you are expanding from.
The new nodes must not have any data you wish to keep.
Ensure that the cluster is not using shared storage. Cluster expansion does not support shared storage.
Ensure that no loading jobs, queries, or REST requests are running on the original or the expansion target.
If the original cluster is a single node installation, make sure the IP used is not a local loopback address such as 127.0.0.1.

You can check the IP address of your original cluster using gadmin config get System.HostList. If it is a loopback IP, you can update it to be the internal IP using the following script:

$ gadmin config set System.HostList '[{"Hostname":"'$(ip a | grep "inet " | awk 'FNR == 2 {print $2}' | awk -F "/" '{print $1}')'","ID":"m1","Region":""}]'
$ gadmin config apply -y
$ gadmin restart all -y

Advice and Precautions

Obtain a few key measures for the state of your data before the expansion, such as vertex counts/edge counts or certain query results. This will be useful in verifying data integrity after the expansion completes.
Perform a full backup of your existing system before performing the expansion.

2. Procedure

2.1. Identify new cluster replication and partition

Before running any commands to expand a cluster, make sure you have a clear idea of how the new cluster should be distributed. You should have the following information:

The new replication factor of the cluster
The new partitioning factor of the cluster
The IP addresses of the new nodes to be added to the cluster

2.2. Expand the cluster

To expand the cluster, run the gadmin cluster expand command as shown. If the expansion involves changing the replication factor, use the --ha option to indicate the new replication factor:

$ gadmin cluster expand node_ip_list [--ha <replication_factor>]

node_ip_list is the machine aliases of the nodes you are adding to the cluster mapped to their IP addresses with a colon(:), and separated by a comma. Below is an example:

$ gadmin cluster expand m3:10.128.0.81,m4:10.128.0.82 --ha 1

We suggest naming the new nodes following the convention of m<count>, such as m1, m2, and m3 for a 3-node cluster. If you are adding a fourth node, then the fourth node would be named m4. If you decide to name them differently, make sure that all names are unique within the cluster.

2.2.1. Supply a staging location

Extra disk space is required during cluster expansion. If more space is not available on the existing disk, you can supply a staging location on a different disk to hold temporary data:

$ gadmin cluster expand m3:192.168.1.3,m4:192.168.1.4 --staging-path /tmp/

When you specify --staging-path, the path must:

Exist on every node in the cluster (both existing nodes and newly added nodes).
Be accessible and writable by the TigerGraph Linux user on all nodes.

TigerGraph uses the staging path on each node during expansion, so the operation will fail if the path is missing or not writable on any node.

The overall amount of space required on each node is:

(1 + ceiling(oldPartition / newPartition)) * dataRootSize

where:

oldPartition = partitioning factor before expansion
newPartition = partitioning factor after expansion
dataRootSize = size of the data root folder on that node

For example, expanding from a 6-node cluster with replication factor 2 and partitioning factor 3 to a 10-node cluster with replication factor 2 and partitioning factor 5, with a 50 GB data root folder, requires more than:

(1 + ceiling(3 / 5)) * 50 = 100 GB

of free space on the staging path of each node.

2.3. Verify success and delete temporary files

When the expansion completes, you should see a message confirming the completion of the cluster change. The message will also include the location of the temporary files created during the expansion.

Verify data integrity by comparing vertex/edge counts or query results. After confirming a successful expansion, delete the temporary files to free up disk space.