Cluster Shrinking

Shrinking a cluster removes nodes from the cluster. The data stored on those nodes will be redistributed to the remaining nodes.

Shrinking a cluster can make sense for a few reasons:

Your workload has changed and you can operate a cluster with fewer resources.
Your data volume is lower than projected.

Shrinking an operational cluster is different from removing a failed node. If a node has failed in your cluster and needs to be removed, see Removal of Failed Nodes.

Cluster shrinking requires several minutes of cluster downtime. The exact amount of downtime varies depending on the size of your cluster.

1. Before you begin

Requirements

Ensure that no loading jobs, queries, or REST requests are running on the new node or the cluster.

Advice and Precautions

Obtain a few key measures for the state of your data before the expansion, such as vertex counts/edge counts or certain query results. This will be useful in verifying data integrity after the expansion completes.
Perform a full backup of your existing system before performing the expansion.

2. Procedure

2.1. Identify new cluster replication and partition

Before running any commands to shrink a cluster, make sure you have a clear idea of how the new cluster should be distributed. You should have the following information:

The new replication factor of the cluster
The new partitioning factor of the cluster
The names and IP addresses of the nodes to be removed from the cluster

2.2. Shrink the cluster

To shrink the cluster, run the gadmin cluster shrink command like below. If the shrinking involves changing the replication factor, use the --ha <replicattion_factor> option to indicate the new replication factor:

$ gadmin cluster shrink node_ip_list [--ha <replication_factor>]

node_ip_list is the list of nodes you are removing from the cluster mapped to their IP addresses with a colon(:), and separated by a comma. Below is an example:

$ gadmin cluster shrink m3:10.128.0.81,m4:10.128.0.82 --ha 1

2.2.1. Supply a staging location

Extra disk space is required during cluster shrinking. If more space is not available on the same disk, you can supply a staging location on a different disk to hold temporary data:

$ gadmin cluster shrink m3:192.168.1.3,m4:192.168.1.4 --stagingPath /tmp/

If you choose to supply a staging location, make sure that the TigerGraph Linux user has write permission to the path you provide. The overall amount of space required for cluster shrinking on each node is (1 + ceiling(oldPartition/newPartition) ) * dataRootSize. oldPartition and newPartition stand for the partitioning factors of the cluster before and after shrinking, respectively; dataRootSize stands for the size of the data root folder on the node.

For example, assume you are shrinking from a 10-node cluster with a replication factor of 2 and a partitioning factor of 5, to a 6-node cluster with a replication factor of 2 and a partitioning factor of 3, and the size of the data root folder on a node is 50GB. You would need more than (1 + ceiling(5/3)) * 50) = 150 GB of free space on the staging path.

2.3. Verify success and delete temporary files

When shrinking completes, you should see a message confirming the completion of the cluster change. The message will also include the location of the temporary files created during the operation.

Verify data integrity by comparing vertex/edge counts or query results to what they were before the shrinking. After confirming a successful cluster contraction, delete the temporary files to free up disk space.

2.4. Uninstall TigerGraph on the removed nodes

Uninstall TigerGraph from the removed nodes by removing all root directories (app, data and tmp) except the log directory of TigerGraph.

For security reasons, guninstall is disabled on the removed nodes. Therefore, you need to delete TigerGraph by manually deleting the root directories.