Installation, Cluster Configuration and Scale-out, License Activation
This page describes the steps to upgrade an existing installation of TigerGraph to TigerGraph 3.1.x.
This page lists all the versions you can upgrade from through self-service. If you are trying to upgrade a production system to 3.1.x from a different version than those listed on this page, please first follow the corresponding guide to upgrade to one of the versions listed on this page, and then upgrade to 3.1.x from that version, or contact TigerGraph Support.
Upgrading to v3.1.0 - 3.1.4 is not recommended. Please upgrade to 3.1.5 or 3.1.6 instead.
Follow the steps described in Advisory: Update to TigerGraph 3.1.5+ to ensure your existing installation passes the schema validation.
Any 3.x version of TigerGraph can be upgraded to another 3.x version by running the installation script with the upgrade (-U) flag.
Download the latest version of TigerGraph to your system.
Extract the tarball.
Run the install script that was extracted from the tarball, as the Linux user created during installation, with the upgrade flag (-U):
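For example, a minimal upgrade run might look like this (the folder name follows the tigergraph-<version>-offline pattern and should match your package):
  $ cd tigergraph-<version>-offline
  $ ./install.sh -U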
If you are upgrading from 3.0.x, follow the procedures in Enable GUI and GSQL HA after upgrading to 3.1 to enable High Availability for the GSQL server as well as the GUI server.
If you are upgrading from 3.1.x, no further actions are necessary.
If you are running a production system, please contact TigerGraph support for upgrading from 2.x.
Follow the steps described in 2.6.x to 3.x upgrade flow to upgrade from 2.6 to 3.1.5/3.1.6.
To uninstall TigerGraph, open the command line of the Linux server and switch to the TigerGraph user, which is created during installation:
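For example, assuming the default username tigergraph:
  $ su - tigergraph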
Then as the TigerGraph user, run the following Linux command:
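That command is guninstall:
  $ guninstall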
If you have the TigerGraph platform installed on a multi-node cluster, running the guninstall command on a single node in the cluster gives you the option to uninstall TigerGraph from all of the nodes in the cluster or from just that single node.
This guide covers two advanced license issues:
Activating a System-Specific License
Usage limits enforced by certain license keys
This section provides step-by-step instructions for activating or renewing a TigerGraph license by generating and installing a license key unique to that TigerGraph system. This document applies to both non-distributed and distributed systems. In this document, a cluster acting cooperatively as one TigerGraph database is considered one system.
A valid license key activates the TigerGraph system for normal operation. A license key has a built-in expiration date and is valid on only one system. Some license keys may apply other restrictions, depending on your contract. Without a valid license key, a TigerGraph system can perform certain administrative functions, but database operations will not work. To activate a new license, a user first configures their TigerGraph system. The user then collects the fingerprint of the TigerGraph system (license seed) using a TigerGraph-provided utility program. Then the collected materials are sent to TigerGraph or an authorized agent via email or web form. TigerGraph certifies the license based on the collected materials and sends a license key back to the user. The user then installs the license key on their system using another TigerGraph command. A new license key (e.g., one with a later expiration) can be installed on a live system that already has a valid license; the installation process does not disrupt database operations.
If your system is currently using an older string-based license key that does not use a license seed, please contact support@tigergraph.com for the procedure to upgrade to the new system-specific license type.
Before beginning the license activation process, the TigerGraph package must be installed on each server, and the TigerGraph system must be configured with gadmin.
Collect the fingerprint of the whole TigerGraph system using the command gadmin license seed <host_signature_type>, which can be executed on any machine in the system. The command packs all the collected data to generate the license seed and writes it to a file. When the command has completed successfully, it outputs the path of the file to the console.
Depending on the host machine, the user needs to choose the appropriate type of host signature for gadmin to collect. The options are: aws, azure, gcp, hardware, and node-id. If you are generating the seed on a cloud instance, choose the corresponding cloud provider for the host signature type. If you are generating the seed on your own machines, choose either hardware or node-id. Signatures generated with the hardware parameter use unique hardware information that persists through software changes, while signatures generated with node-id use a unique machine ID that may change during an OS reinstall. Most users installing their own instances should use the hardware option.
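For example, to generate the seed on self-managed hardware (run as the TigerGraph Linux user):
  $ gadmin license seed hardware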
Send the license seed file to TigerGraph, either through our license activation web portal (preferred) or by email to license@tigergraph.com. If using email, please include the following information:
Company/Organization name
Contract number. If you do not know your contract number, please contact your sales representative or sales@tigergraph.com.
A new license key file will be certified and sent back to you.
Copy the license key file to a directory on the TigerGraph system where the TigerGraph Linux user has read permission.
Run the following three commands to install the license key:
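A sketch of the typical sequence, assuming the key file was saved to /home/tigergraph/license.txt (a hypothetical path); verify the exact syntax against the gadmin reference for your version:
  $ gadmin license set @/home/tigergraph/license.txt
  $ gadmin config apply
  $ gadmin restart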
If the installation completes successfully, the message "install license successfully" will be displayed in the console.
After a license key has been installed successfully on a TigerGraph system, the information of the installed license is available via the CLI command gadmin license status or via the following REST API:
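A sketch of the call, assuming the /showlicenseinfo endpoint on the default REST++ port 9000:
  $ curl -s http://localhost:9000/showlicenseinfo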
Some license keys include a limit on the graph size, or on the number and size of machines which may be used, or restrict the use of certain optional features. In the case of a memory usage or graph size limit, when a TigerGraph system reaches its license's limit, additional data will not be loaded into the graph. You may still query the graph and delete data. To check whether or not you have exceeded your license limits, use the command gstatusgraph and collect the VertexCount, EdgeCount, and Partition Size. Compare this information to the limits established for your license.
The output may include a warning message such as the following:
A TigerGraph system with High Availability (HA) is a cluster of server machines which uses replication to provide continuous service when one or more servers are not available or when some service components fail. TigerGraph HA service provides load balancing when all components are operational, as well as automatic failover in the event of a service disruption. The replication factor is the number of copies of data. In contrast, the partitioning factor is the number of machines across which one copy of the database is distributed.
If the replication factor is 2, a fully-functioning system maintains two copies of the data, stored on separate machines. Users can choose a higher replication factor for greater query throughput and greater system resiliency.
The total cluster size should be (partitioning factor) X (replication factor).
The smallest possible distributed database with HA is 2 x 2 = 4 machines.
The smallest possible non-distributed database with HA is 1 x 3 = 3 machines.
There is no upper limit for either partitioning factor or replication factor.
The same version of the TigerGraph software package is installed on each machine.
Starting from version 3.0, configuring an HA cluster is part of platform installation. See the Installation Guide page for details.
HA configuration can only be done at the time of system installation and before deploying the system for database use. HA configuration change after installation is not supported. Converting a non-HA system to an HA cluster would require reinstalling all the TigerGraph components and rebuilding the database from the start.
During TigerGraph platform installation, specify the replication factor. The default value for replication factor is 1, which means there is no HA setup for the cluster. The user does not explicitly set the partitioning factor. Instead, the TigerGraph system will set
partitioning factor = (number of machines / replication factor)
If the division does not produce an integer, some machines will be left unused.
Example: If you install a 7-node cluster with replication factor = 2, the resulting configuration will be 2-way HA for a database with a partitioning factor of 3. One machine will be unused.
TigerGraph supports native HA functionality for its application server, which serves the APIs for TigerGraph's GUI - GraphStudio and Admin Portal. The application server follows the active-active architecture, in which the server is always on m1 and all replicas of m1. If one server falls offline, you can use the other servers without any loss of functionality.
When you deploy TigerGraph in a cluster with multiple replicas, it is ideal to set up load balancing to distribute network traffic evenly across the different servers. This page discusses what to do when a server dies when you haven't set up load balancing, and the steps needed to set up load balancing for the application server.
When a server dies, users can proceed to the next available server within the cluster to resume operations. For example, assume the TigerGraph cluster has application servers on m1 and m2. If the server on m1 dies, users can access m2 to use GraphStudio and Admin Portal.
To find out which node hosts the application server, run the gssh command in the bash terminal of any active node in the cluster. The output will show you which nodes are hosting a GUI server.
Keep in mind that any long-running operation that is currently in process when the server dies will be lost.
When you deploy TigerGraph in a cluster with multiple replicas, it is ideal to set up load balancing to distribute network traffic evenly across the different servers.
One possible choice for setting up load balancing is through the use of Nginx.
Here is an example Nginx configuration for the upstream and server directives:
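A minimal sketch, using ip_hash for session persistence and hypothetical node addresses; adjust addresses, ports, and TLS to your environment:
  upstream tigergraph_gui {
      ip_hash;
      server 10.0.0.1:14240;
      server 10.0.0.2:14240;
  }
  server {
      listen 80;
      location / {
          proxy_pass http://tigergraph_gui;
      }
  }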
The server directives should specify the addresses of the nodes you want to load balance. Since TigerGraph requires session persistence, the load-balancing methods are limited to ip_hash or hash unless you have access to Nginx Plus, in which case any load-balancing method may be used together with session persistence: https://docs.nginx.com/nginx/admin-guide/load-balancer/http-load-balancer/#sticky
An active health check can be set on the /api/ping endpoint if using Nginx Plus.
Otherwise, only a passive health check is available. See Nginx documentation for more information: https://docs.nginx.com/nginx/admin-guide/load-balancer/http-health-check/
If your applications are provisioned on AWS, another choice for load balancing is through the use of an Application Load Balancer.
To create an application load balancer, follow AWS's guide to create an application load balancer. The following configurations apply as you follow the guide:
When creating or using an existing security group in Step 3, make sure it allows requests from the load balancer to port 14240 of the instances in the target group.
In Step 4, set the health check URL to /api/ping
In Step 5, enter 14240 for the port of your instances.
After following the steps and creating your load balancer, enable sticky sessions in your target group.
After successfully creating your load balancer, you should now be able to access GraphStudio through the load balancer's DNS name. The DNS name can be found under the "Description" tab of your load balancer in the Amazon EC2 console.
If your instances are provisioned on Azure, you can set up an Application Gateway. Follow the steps for setting up an Application Gateway outlined here: Quickstart: Direct web traffic using the portal - Azure Application Gateway
Some TigerGraph-specific settings are required during Application Gateway setup:
Under the section “Configuration Tab”
For step 5, where it states to use port 80 for the backend port, use port 14240 instead.
In the same window, enable “Cookie-based affinity”.
After the Application Gateway setup is complete, we need to create a custom health probe to check the health of our application servers. Follow the steps outlined here: Create a custom probe using the portal - Azure Application Gateway. When filling out the health probe information, the fields below should have the following values:
Pick port from backend HTTP settings: yes
Path: /api/ping
HTTP Settings: The HTTP settings associated with the backend pool created during the Application Gateway setup
After successfully creating the Application Gateway, you should now be able to access GraphStudio from the frontend IP associated with the Application Gateway.
If your instances are provisioned on Google Cloud, you can set up an external HTTP(S) load balancer. You can follow Google’s steps in their documentation for setup here: Setting up an external HTTPS load balancer | Identity-Aware Proxy
When creating the instance group:
Click “Specify port name mapping”, and use 14240 for the port
When setting up the health check:
For the port, use 14240.
For the path, use /api/ping.
Lastly, we need to set up session affinity for our load balancer. This is outlined in GCP documentation here: External HTTP(S) Load Balancing overview | Google Cloud
After successfully creating the load balancer, you should now be able to access GraphStudio from the frontend IP associated with the load balancer.
By design, TigerGraph has built-in HA for all the internal critical components, including the GPE, GSE, and REST API servers. However, the user-facing applications (GSQL and GraphStudio) were previously left to customers to set up based on their High Availability (HA) needs, which included building solutions using non-TigerGraph components. With the 3.1 release, TigerGraph supports native HA functionality for user-facing applications as well. This simplifies and streamlines HA deployment for users. For operations personnel, it reduces operational overhead while enhancing availability for end users.
Before we elaborate on the design, we need to understand the topology of how TigerGraph services are deployed in a cluster. TigerGraph nodes in a cluster are organized as ‘m1’, ‘m2’, and so on. Although all nodes in the cluster serve the same function (storing data and participating in query execution), m1 is a special node: the GSQL server runs on this node to provide critical services such as storing client metadata and managing connections between client and server. With this feature, m1 will no longer be the only node serving GSQL server connections. In the new design, other nodes will run standby GSQL servers to provide high availability for client connections.
In the 3.1 release, the primary GSQL server will continue to perform all the tasks handled by the GSQL server prior to the 3.1 release. These include:
Processing client connections
Handling query requests from GSQL clients
Handling user management requests, including token management
In addition to these, when the Primary fails, a standby server will switch to become the Primary server, and when the old Primary server is back to normal function, it will become a GSQL Standby server.
Redirect requests to the Primary server
Help the Primary server check for source data file existence and parse file headers (if ANY is chosen)
There is no change in how GSQL Client works.
Users store the following data, which is needed for query execution, on the m1 node:
GSQL loader's Token functions
ExprFunctions
ExprUtil
This is part of the user source code that the TigerGraph system uses for compilation. Prior to the 3.1 release, this information was available to the GSQL server only on the m1 node, and users could typically modify these files directly on that machine. With HA, however, the Primary GSQL server may not be on m1 and can switch to any other machine at any time. Users have to make sure all the machines have the same content whenever the files are updated. This is a new requirement for users.
The GSQL server will retrieve the user source code files in the following priority order when it needs them:
Via GitHub/GitHub Enterprise (if configured)
Files uploaded via PUT
Default files that are shipped with the product
This requires public network access or GitHub Enterprise server access. Users need to provide the following gadmin configuration:
Example:
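A sketch of the configuration; the parameter names below are assumptions, so confirm the exact GSQL.Github* entry names in the configuration reference for your version:
  $ gadmin config set GSQL.GithubUserAcessToken <token>
  $ gadmin config set GSQL.GithubRepository <owner>/<repository>
  $ gadmin config set GSQL.GithubBranch main
  $ gadmin config set GSQL.GithubPath src/udf
  $ gadmin config apply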
When the GSQL server needs to compile the files, it retrieves them from GitHub if GitHub access is configured as above. It retries 3 times, with a 5-second timeout each time. If the connection fails, it falls back to the next priority level, i.e., files uploaded via PUT.
We are introducing new GSQL commands to address this need. These commands will allow users to upload and download the user source files.
Upload source code
Example:
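A sketch, uploading a file from a hypothetical local path:
  PUT ExprFunctions FROM "/home/tigergraph/ExprFunctions.hpp"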
Download source code
Example:
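A sketch, downloading to a hypothetical local directory:
  GET ExprFunctions TO "/home/tigergraph/udf_backup/"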
The uploaded files will be saved to all nodes. Users need either the ‘superuser’ or ‘global_designer’ role to have sufficient privileges to run the PUT/GET commands.
When calling the GET command, the user downloads the corresponding file from the Primary node to a local directory on the current cluster node.
When calling the PUT command, the local file is copied to all of the cluster nodes, including the current one.
An example usage scenario for updating the files is as follows:
For each cluster node, TokenBank.cpp is stored at:
ExprFunctions.hpp and ExprUtil.hpp files are stored at:
A full path, including the file name, should be provided for PUT/GET, e.g.:
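A sketch with hypothetical paths:
  PUT ExprFunctions FROM "/home/tigergraph/tmp/ExprFunctions.hpp"
  GET TokenBank TO "tmp/TokenBank.cpp"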
Notice that the first command uses an absolute path, while the second command uses a relative path. Both are supported, but “~” is not supported (e.g., “~/tmp/x.hpp”).
Additionally, users can use the commands in the following ways:
Use a folder name, and the default file name will be appended automatically. For example:
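A sketch of the PUT form:
  PUT ExprFunctions FROM "/home/path/tmp"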
It will use ExprFunctions.hpp under the directory "/home/path/tmp" for PUT.
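And a sketch of the corresponding GET form:
  GET TokenBank TO "/home/path/tmp"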
It will create/overwrite the file "/home/path/tmp/TokenBank.cpp".
If the file name is given in the path, its file extension must be consistent with the corresponding file. For example:
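A hypothetical command such as:
  PUT ExprFunctions FROM "/home/path/tmp/ExprFunctions.cpp"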
This is not allowed, since PUT/GET for ExprFunctions must use “.hpp” as the file extension.
If the corresponding file is not found, the GSQL Primary server will use the default file in the package. These default files are at:
In the pre-3.1 design, the file path used in loading jobs refers to the file on m1, unless the user specifies a machine name before the path (ALL, ANY, m1, m2, …). In the new HA design, the Primary server can be running on any machine and can be switched, which means the GSQL server may or may not find the file. To be backward compatible, a machine name is prefixed automatically if the client is in the TigerGraph cluster.
Users can specify the node ID before the path using: ALL, ANY, m1, m2 and so forth. Declaring ALL or ANY as host ID will load files from every cluster node.
Users can use a form like “m1|m3|m4” to specify a combination of several nodes.
If the hosts are not specified, GSQL looks for the host ID of the current node that is running the loading job (by searching the nodes in $(gadmin config get GSQL.BasicConfig.Nodes)). If it is not found, node “m1” is used by default.
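For example, a run statement like the following (job and file names are hypothetical) loads the file from nodes m1, m3, and m4:
  RUN LOADING JOB load_person USING f1="m1|m3|m4:/data/persons.csv"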
Data source can be created and used with a file path or a JSON string, same as above.
The GSQL client can connect to the GSQL server in different ways, with the following priority order:
Users can specify the ip and port when calling GSQL client using “gsql -i” or “gsql -ip”. For example:
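A sketch with hypothetical addresses; the port may be omitted:
  gsql -ip 192.168.1.1:14240,192.168.1.2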
The GSQL client will try these IPs and ports one by one. Note that the port is optional; it defaults to 14240, which is the default port for the GSQL server.
If “gsql -i” or “gsql -ip” is not used, the GSQL client will search for the file gsql_server_ip_config in the directory where the GSQL client is run. The file gsql_server_ip_config should be a one-line file such as the one shown below. The GSQL client will traverse the IPs and ports in the file in order.
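For example, the file might contain a single line like this (hypothetical addresses):
  192.168.1.1:14240,192.168.1.2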
Similarly, the port number is also optional, using 14240 by default.
If “gsql -i” or “gsql -ip” are not used, and the file gsql_server_ip_config does not exist where “gsql” is called, GSQL client will try to connect to the local server (127.0.0.1:8123).
Use gadmin config to get/set the following configurations related to GSQL High Availability.
The first is the heartbeat interval in milliseconds. The second (“max misses”) is the number of heartbeat intervals that make up the total timeout for switching the Primary server; it must be at least 2 to allow 1 heartbeat miss.
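A sketch of setting these values; the parameter names shown are assumptions, so verify them in the configuration reference for your version:
  $ gadmin config set GSQL.HAHeartBeatIntervalMS 2000
  $ gadmin config set GSQL.HAHeartBeatMaxMiss 4
  $ gadmin config apply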
For example, if we use “IntervalMS = 2000” and “max misses = 4” as shown above, then the total timeout is 2s×4 = 8 seconds. So the current Primary server will be switched if its heartbeat has stopped for more than 8 seconds.
Installing Single-machine and Multi-machine systems
This guide describes how to install the TigerGraph platform either as a single node or as a multi-node cluster, interactively or non-interactively.
If you signed up for the Enterprise Free license, you also have access to the TigerGraph platform as a Docker image or a virtual machine (VirtualBox) image. Follow the instructions in Getting started to start up TigerGraph in a Docker container or with VirtualBox.
Before you can install the TigerGraph system, you need the following:
One or more machines that meet the minimum Hardware and Software Requirements.
A sudo user with the same username and login credential on every machine.
If sudo privilege is not available, please contact TigerGraph support for workarounds.
A license key provided by TigerGraph (not applicable to Enterprise Free)
A TigerGraph system package
If you do not yet have a TigerGraph system package, you can request one at the following address: https://www.tigergraph.com/get-tigergraph
If you are installing a cluster, ensure that every machine has the same SSH port and the port stays open during installation
TigerGraph's installation script supports both single-node and cluster installation, and the user can choose to install either interactively or non-interactively.
The following describes the procedure to install TigerGraph on Linux interactively. The filename of your package may vary, depending on the product edition and version. For the examples here, we use the filename tigergraph-<version>.tar.gz, which should be replaced by the actual filename of your package.
Extract the package by running the following command. A folder named tigergraph-<version>-offline will be created.
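For example:
  $ tar -xzf tigergraph-<version>.tar.gz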
Navigate to the tigergraph-<version>-offline folder and run the script install.sh with the following commands:
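A sketch of the commands (the installer is typically run with sudo):
  $ cd tigergraph-<version>-offline
  $ sudo ./install.sh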
The installer will ask for the following information, for which you may choose to hit Enter to skip and use the system default or enter a new value:
Your agreement to the License Terms and Conditions
Your license key (not applicable for Enterprise Free)
Username for the Linux user who will own and manage the TigerGraph platform
The installer creates a Linux user with this username; this user is the only user authorized to run gadmin commands to manage the TigerGraph platform.
Password for the Linux user who will own and manage the TigerGraph platform
Path to where the installation folder will be
Path to where the data folder will be
Path to where the log folder will be
Path to where the temp folder will be
The SSH port for your machine
To see what the default settings are, read the Installation options section below.
License keys are long (over 100 characters). If you copy and paste the license key, be careful not to accidentally include an end-of-line character.
TigerGraph cluster configuration enables the graph database to be partitioned and distributed across multiple server nodes in a local network. After you have answered the questions described in the previous step, the installation script will ask for the following to complete cluster configuration:
The number of nodes in your cluster. Each node will be given an alias following the input (m1, m2, m3, etc.).
If this is a single-node installation, enter 1
The IP address of each node
Username and credentials information of the sudo user
Every machine in the cluster must have a sudo user with the same username and password or SSH key.
Permission to set up NTP time synchronization
Permission to set firewall rules among the cluster nodes
In TigerGraph 3.x, the installation machine can be within or outside the cluster. If outside the cluster, the installation machine still needs to be a Linux machine.
After all the questions are answered, the script will proceed to installation. A screenshot of the interactive installation is shown below:
After installation is complete, you can switch to the Linux user who owns the platform (created in Step 2) with the following command:
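For example, assuming the default username tigergraph:
  $ su - tigergraph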
At the prompt, enter the password that was set in Step 2.
After switching to the correct user, you now have access to gadmin commands. Confirm successful installation by running gadmin status. If the system is installed correctly and the license is activated, the command should report that all services are up and ready. Since there is no graph data loaded yet, GSE and GPE will show "Warmup".
The following describes the procedure to install TigerGraph on Linux non-interactively.
Extract the package by running the following command. A folder named tigergraph-<version>-offline will be created.
Navigate to the tigergraph-<version>-offline folder. Inside the folder, there is a file named install_conf.json. For non-interactive installation, the user must review and modify all the settings in install_conf.json before running the installer.
Below is an example of the install_conf.json file:
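A sketch of the file with hypothetical IP addresses and credentials; adjust every value for your environment:
  {
    "BasicConfig": {
      "TigerGraph": {
        "Username": "tigergraph",
        "Password": "tigergraph",
        "SSHPort": 22,
        "PrivateKeyFile": "",
        "PublicKeyFile": ""
      },
      "RootDir": {
        "AppRoot": "/home/tigergraph/tigergraph/app",
        "DataRoot": "/home/tigergraph/tigergraph/data",
        "LogRoot": "/home/tigergraph/tigergraph/log",
        "TempRoot": "/home/tigergraph/tigergraph/tmp"
      },
      "License": "<your_license_key>",
      "NodeList": [
        "m1: 192.168.1.1",
        "m2: 192.168.1.2",
        "m3: 192.168.1.3"
      ]
    },
    "AdvancedConfig": {
      "ClusterConfig": {
        "LoginConfig": {
          "SudoUser": "ubuntu",
          "Method": "P",
          "P": "<sudo_user_password>",
          "K": ""
        },
        "ReplicationFactor": 1
      }
    }
  }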
Here is a description of all the fields in the config file:
"BasicConfig"
"TigerGraph"
: Information about the Linux user that will be created by the installer who owns and manages the TigerGraph platform.
"Username"
: Username of the Linux user.
"Password"
: Password of the Linux user.
"SSHPort"
: Port number used to establish SSH connections.
"PrivateKeyFile"
(optional): Absolute path to a valid private key file. If left empty, TigerGraph will generate one named tigergraph.rsa
automatically.
"PublicKeyFile"
(optional): Absolute path to a valid public key file. If left empty, TigerGraph will generate one named tigergraph.pub
automatically.
"RootDir"
"AppRoot"
: Absolute path to where application folder will be.
"DataRoot
": Absolute path to where the data folder will be.
"LogRoot"
: Absolute path to where the log folder will be.
"TempRoot"
: Absolute path to where the temp folder will be.
"License"
: Your TigerGraph license string.
"Node List"
: A JSON array of the nodes in the cluster. Each machine in the cluster is defined as a key-value pair, where the key is a machine alias (m1, m2, m3, etc) and the value is the IP address of the node.
"AdvancedConfig"
"ClusterConfig"
: Cluster configurations
"LoginConfig"
: Login configurations
"SudoUser"
: Username of the sudo user who will be used to execute the installation on all nodes.
"Method"
: Authentication method for SSH. Enter "P"
to use password authentication and "K"
to use key-based authentication.
"P"
: Password of the sudo user.
"K"
: Absolute path to the SSH key to be used to authenticate the sudo user.
"ReplicationFactor"
: Replication factor of the cluster.
If you would like to enable the High Availability (HA) feature, please make sure you have at least 3 nodes in the cluster and set the replication factor to be greater than 1. For example, if your cluster has 6 nodes, you could set the replication factor to be 2 or 3. If you set the replication factor to be 2, then the partitioning factor will be 6 / 2 = 3. Therefore, 3 nodes will be used for one copy of the data, and the other 3 nodes will be used as a replica copy of the data.
Ensure that the total number of nodes can be fully divided by the replication factor. Otherwise, some nodes may not be utilized as parts of the HA cluster.
Start the non-interactive installation process by running the install.sh script with the -n option:
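For example:
  $ sudo ./install.sh -n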
The following default settings will be applied if no parameters are specified:
The installer will create a Linux user with username tigergraph and password tigergraph. This user will be the only user authorized to run gadmin commands to manage the TigerGraph platform and services.
If there is already a user named tigergraph, this user will be designated as the platform owner and no other user will be created.
The default root directory for the installation is /home/tigergraph/tigergraph, with the App/Data/Log/Temp folders within it:
App Path: /home/tigergraph/tigergraph/app
Data Path: /home/tigergraph/tigergraph/data
Log Path: /home/tigergraph/tigergraph/log
Temp Path: /home/tigergraph/tigergraph/tmp
The root directory for the installation (referred to as <tigerGraph_root_dir>) is a folder called tigergraph located in the tigergraph user's home directory, i.e., /home/tigergraph/tigergraph.
The installation can be customized by passing command-line options to the install.sh script:
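The full list of options is not reproduced here; assuming the script supports a help flag (typical for the installer), you can list them with:
  $ ./install.sh -h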