HA Cluster Configuration
Version 2.2 - 2.3 Copyright © 2019 TigerGraph. All Rights Reserved.
A TigerGraph system with High Availability (HA) is a cluster of server machines that uses replication to provide continuous service when one or more servers are unavailable or when some service components fail. TigerGraph HA service provides load balancing when all components are operational, as well as automatic failover in the event of a service disruption. One TigerGraph server consists of several components (e.g., GSE, GPE, RESTPP). The default HA configuration has a replication factor of 2, meaning that a fully functioning system maintains two copies of the data, stored on separate machines. In an advanced HA setup, users can set a higher replication factor.

System Requirements

  • An HA cluster needs at least 3 server machines. Machines can be physical or virtual. This is true even if the system has only one graph partition.
  • For a distributed system with N partitions (where N > 1), the system must have at least 2N machines.
  • The same version of the TigerGraph software package is installed on each machine.
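For example, under the default replication factor of 2, a distributed graph with 4 partitions needs at least 4 × 2 = 8 machines, while a single-partition (non-distributed) HA system still needs at least 3 machines.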

Limitations

  1. HA configuration should be done immediately after system installation and before deploying the system for database use.
  2. To convert a non-HA system to an HA system, the current version of TigerGraph requires that all the data and metadata be cleared, and all TigerGraph services be stopped. This limitation will be removed in a future release.

Workflow

Starting from version 2.1, configuring an HA cluster is integrated into platform installation; please check the document TigerGraph Platform Installation Guide for details.

(A) Install TigerGraph

Follow the instructions in the document TigerGraph Platform Installation Guide to install the TigerGraph system in your cluster.
In the instructions below, all the commands need to be run as the tigergraph OS user, on the machine designated "m1" during the cluster installation.

(B) Stop the TigerGraph Service

Be sure you are logged in as the tigergraph OS user on machine "m1". Before setting up HA or changing HA configuration, the current TigerGraph system must be fully stopped. If the system has any graph data, clear out the data (e.g., with "gsql DROP ALL").
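A minimal sketch of clearing existing graph data from the command line, assuming the gsql client is on the tigergraph user's PATH and quoting the statement so the shell passes it as a single argument:
gsql 'DROP ALL'
Run this before stopping the services below, since the GSQL shell needs the system to be running.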
Stopping all TigerGraph services:
gadmin stop ts3 -fy
gadmin stop all -fy
gadmin stop admin -fy

(C) Enable HA

After the cluster installation, create an HA configuration using the following command:
gadmin --enable ha
This command will automatically generate a configuration for a distributed (partitioned) database with an HA system replication factor of 2. Some individual components may have a higher replication factor.
Sample output:
Successful HA configuration
$ gadmin --enable ha
[FAB ][m3,m2] mkdir -p ~/.gium
[FAB ][m3,m2] scp -r -P 22 ~/.gium ~/
[FAB ][m3,m2] mkdir -p ~/.gsql
[FAB ][m3,m2] scp -r -P 22 ~/.gsql ~/
[FAB ][m3,m2] mkdir -p ~/.venv
[FAB ][m3,m2] scp -r -P 22 ~/.venv ~/
[FAB ][m3,m2] cd ~/.gium; ./add_to_path.sh
[RUN ] /home/tigergraph/.gsql/gpe_auto_start_add2cron.sh
[FAB ][m3,m2] mkdir -p /home/tigergraph/.gsql/
[FAB ][m3,m2] scp -r -P 22 /home/tigergraph/.gsql/all_log_cleanup /home/tigergraph/.gsql/
[FAB ][m3,m2] mkdir -p /home/tigergraph/.gsql/
[FAB ][m3,m2] scp -r -P 22 /home/tigergraph/.gsql/all_log_cleanup_add2cron.sh /home/tigergraph/.gsql/
[FAB ][m1,m3,m2] /home/tigergraph/.gsql/all_log_cleanup_add2cron.sh
[FAB ][m1,m3,m2] rm -rf /home/tigergraph/tigergraph_coredump
[FAB ][m1,m3,m2] mkdir -p /home/tigergraph/tigergraph/logs/coredump
[FAB ][m1,m3,m2] ln -s /home/tigergraph/tigergraph/logs/coredump /home/tigergraph/tigergraph_coredump
If the HA configuration fails, e.g., if the cluster doesn't satisfy the HA requirements, then the command will stop running with a warning.
HA configuration failure
$ gadmin --enable ha
Detect config change. Please run 'gadmin config-apply' to apply.
ERROR:root: To enable HA configuration, you need at least 3 machines.
Enable HA configuration failed.
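If the failure output also reports a pending configuration change (as in the first line of the sample above), one possible recovery path, offered as a suggestion once the cluster meets the 3-machine minimum, is to apply the pending configuration and then re-run the command:
gadmin config-apply
gadmin --enable ha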

(D) [Optional] Configure Advanced HA

In this optional step, advanced users can run several "gadmin --set" commands to control the replication factor and to manually specify the host machines for each TigerGraph component. The table below shows the recommended settings for each component; see the Examples section below for different configuration cases.
Component           Configuration Key      Suggested Number of Hosts   Suggested Number of Replicas
ZooKeeper           zk.servers             3 or 5                      -
Dictionary Server   dictserver.servers     3 or 5                      -
Kafka               kafka.servers          same as GPE                 -
                    kafka.num.replicas     -                           2 or 3
GSE                 gse.servers            every host                  -
                    gse.replicas           -                           2
GPE                 gpe.servers            every host                  -
                    gpe.replicas           -                           2
REST                restpp.servers         every host                  -
Example: a 3-machine cluster with machines m1, m2, and m3. ZooKeeper and the Dictionary Server run on all three machines, while Kafka, GPE, GSE, and RESTPP all run on m1 and m2 with replication factor 2. This is a non-distributed graph HA setup.
Example: 3-machine non-distributed HA cluster
gadmin --set zk.servers m1,m2,m3
gadmin --set dictserver.servers m1,m2,m3
gadmin --set dictserver.base_ports 17797,17797,17797
gadmin --set kafka.servers m1,m2
gadmin --set kafka.num.replicas 2
gadmin --set gse.replicas 2
gadmin --set gpe.replicas 2
gadmin --set gse.servers m1,m2
gadmin --set gpe.servers m1,m2
gadmin --set restpp.servers m1,m2
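For comparison, here is a sketch of the analogous settings for a distributed HA cluster. Assume four machines m1, m2, m3, and m4 holding two graph partitions with replication factor 2, so Kafka, GSE, GPE, and RESTPP run on every host; the machine names and the reuse of port 17797 are illustrative assumptions, not requirements.
Example (sketch): 4-machine distributed HA cluster
gadmin --set zk.servers m1,m2,m3
gadmin --set dictserver.servers m1,m2,m3
gadmin --set dictserver.base_ports 17797,17797,17797
gadmin --set kafka.servers m1,m2,m3,m4
gadmin --set kafka.num.replicas 2
gadmin --set gse.servers m1,m2,m3,m4
gadmin --set gse.replicas 2
gadmin --set gpe.servers m1,m2,m3,m4
gadmin --set gpe.replicas 2
gadmin --set restpp.servers m1,m2,m3,m4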

(E) Install Package

Once the HA configuration is done, proceed to install the package from the first machine (named “m1” in the cluster installation configuration).
gadmin pkg-install reset -fy
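After the package installation completes, a quick sanity check, offered here as a suggestion rather than a required step, is to verify that all components report as running on every machine:
gadmin status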

Examples

The following shows how to set up the most common configurations. Note that when converting the system from another configuration, you must first stop the old TigerGraph system.
In the entries below, X is the number of servers in the cluster, and A, B, C, etc. refer to the steps in the sections above.
Non-distributed graph with HA
Cluster configuration: each server machine holds the complete graph.
  • For both initial installation and reconfiguration: (A) → B → C → D → E. In step D, set all replicas to X, e.g., gpe.replicas = X, gse.replicas = X, restpp.replicas = X, ...
  • Note: (A) means step A is needed only for the initial installation.
Distributed graph without HA
Cluster configuration: the graph is partitioned among all the cluster servers.
  • Note: no HA is equivalent to replication factor 1.
  • For the initial installation, skip B, C, D, and E.
  • For reconfiguration: B → C → D → E. In step D, set all replicas to 1, e.g., gpe.replicas = 1, gse.replicas = 1, restpp.replicas = 1, ...
Distributed graph with HA
Cluster configuration: the graph is partitioned with replication factor N; the number of partitions Y equals X/N.
  • For both initial installation and reconfiguration: (A) → B → C → D → E. In step D, set all replicas to N, e.g., gpe.replicas = N, gse.replicas = N, ...
  • Note: (A) means step A is needed only for the initial installation.
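As a concrete, hypothetical instance of the last case (distributed graph with HA): with X = 6 servers and replication factor N = 2, the graph has Y = 6/2 = 3 partitions, and step D would set each component's replica count to 2, for example:
gadmin --set gpe.replicas 2
gadmin --set gse.replicas 2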