HA (High Availability) is a generic term used to describe a computer system which has been architected to deliver higher levels of operational performance through enhanced uptime and throughput than would be expected from a traditional single server node.
With Continuous Availability, TigerGraph goes beyond the standard scope and definition of High Availability by providing the following functionality:
Fault tolerance against the loss of database server(s)
Automated recovery of services after intra-cluster failures
Full native HA support for user-facing applications, with seamless automatic client reconnection to standby GSQL and GraphStudio servers
Failover to a remote cluster for disaster recovery
Improved ROI through additional replicas:
Enhanced query throughput
Increased concurrency for operational workloads
In short, TigerGraph Continuous Availability not only keeps business applications running with no noticeable downtime, it also delivers a higher return on investment.
TigerGraph’s architecture relies on active-active replication to keep multiple copies of the data in sync. This is transparent to the user. The underlying principle of uniform data distribution is applied automatically no matter how many replicas are stored. Additionally, the placement of replicas is infrastructure-aware, to tolerate hardware failures. Continuous Availability is a production configuration that customers can select at cluster installation time. Customers have the flexibility to place replicas in specific availability zones or data centers, based on their infrastructure requirements.
TigerGraph’s Continuous Availability design provides the following:
Throughput: Each replica is always up to date and handles its share of read requests. This provides higher query concurrency and throughput.
If a server goes offline for planned or unplanned reasons, TigerGraph's HA design with Automatic Failover will reroute work to the replica nodes of that server, maintaining continuous operation.
Higher levels of replication provide more throughput and resilience.
TigerGraph is based on an MPP architecture. All services are distributed uniformly across the cluster. This requires data to be distributed across the cluster. There are two key concepts in the cluster design:
Replication Factor: The replication factor is a characteristic of the cluster design that determines how many copies of the data are stored in the cluster. It is configurable, and customers choose it at installation time.
Partitioning Factor: The partitioning factor is an internal characteristic of a TigerGraph cluster that determines how the data in the database is distributed. Based on the cluster size (number of nodes), TigerGraph automatically picks a partitioning factor, taking the replication factor into account.
In short, a TigerGraph cluster can be seen as data spread across P partitions, with R replicated copies of each partition.
Some Key Cluster Design Considerations:
Any cluster size is allowed, except 1x2
The minimum number of servers needed for Continuous Availability is three, due to the ZooKeeper quorum dependency.
TigerGraph services are based on a distributed, masterless architecture: all replicas are equal and can service both read and write requests. This key differentiator ensures that no single node is a single point of failure.
Write Operations: In order to keep all replicas in sync for full consistency of all data sets, all write operations are sent to all replicas synchronously by default. A write operation is considered complete only if all the replicas acknowledge that the writes are successful.
Read Operations: As all replicas are guaranteed to be in sync for all write operations, Read requests can be sent to any replica with no need to verify the data consistency with other replica copies. This optimizes the read performance for read-heavy analytical queries.
Example:
In the following example, the data sets in the cluster are spread across 5 partitions with 2 replicas of each data set, i.e., a partitioning factor of 5 and a replication factor of 2.
All writes go to all replicas (e.g. both 1A and 1B).
Reads can be from any one replica (e.g. either 1A or 1B).
Distributed queries can read from a mix of replicas (e.g. {1A, 2B, 3B, 4A, 5B}).
TigerGraph's design ensures automatic failover for Continuous Availability. If a server goes down (hardware or software, planned or unplanned), incoming database operations can continue. Requests are automatically routed around the unavailable server. The TigerGraph Database scheduler tracks server availability in real time and routes each request to the right servers.
Example:
In the event of server failure:
If any single server is unavailable (expected or unexpected):
When it fails to respond after a certain number of tries, requests will automatically divert to another replica (e.g. 3B is unavailable, so use 3A)
If it fails in the middle of a transaction, that transaction might be aborted.
The system continues to operate, with reduced throughput, until the server is restored.
Beginning with Version 3.1, Vertex-Level Access Control (VLAC) greatly improves the granularity of data access control, from partitioning data at the type level (MultiGraph) down to the individual vertex level.
One TigerGraph instance can manage multiple graphs, each with its own set of user privileges. This first-of-its-kind capability, dubbed MultiGraph, is available as an optional service in the TigerGraph platform.
MultiGraph enables several powerful use cases:
Multiple Tenancy: Use one TigerGraph instance to support several completely separate data sets, each with its own set of users. Each user community cannot see the other user communities or other data sets.
Fine-grained privileges on the same set of data: Role-based access control, available on single graphs, grants the privilege to run queries (including data modification queries). In a single-graph scheme, there is no way to say "Query X can be run by some users but not by others." Using multiple graphs defined over the same set of data, each graph can have its own set of queries and its own set of users, in effect customizing who can run which queries.
Overlapping graphs: Graphs can partially overlap, to enable a combination of shared and private data.
Hierarchical subgraphs: Graph X can be defined to cover the domains of Graphs Y and Z, that is, Graph X = (Graph Y) ∪ (Graph Z). This provides an interesting way to describe a data partitioning or parent-child structure. (This is not the same as defining sub-classes of data types; data types are still independent.)
If you implement only one graph now, you can upgrade to MultiGraph and add additional graphs at any time, without having to redo your existing design.
A graph is defined as a set of vertex types and edge types. More precisely, it is all the vertices and edges of that collection of types. The domain of a graph is its set of vertex types and edge types. Each graph contains its own data loading jobs and queries, which do not affect and are not visible to other graphs.
MultiGraph Principles
A TigerGraph instance with a basic license key can have one graph. A TigerGraph instance with a MultiGraph license key can create multiple graphs.
A superuser or globaldesigner can define one or more graphs. The domains of any two graphs may be completely separate, may overlap, or may coincide exactly.
A vertex type or edge type created by a superuser is a global type.
A superuser or globaldesigner can include a global vertex or edge type in one or more graphs. Global types can be shared among multiple graphs.
Users with the admin or designer role for a particular graph can add local vertex types and edge types to their own graph. Local types cannot be shared among multiple graphs, as the sketch below illustrates.
The TigerGraph system includes several predefined roles. Each role is a fixed and logical set of privileges to perform operations. In order to access a graph, a user must be granted a role on that graph. Without a role, a user has no meaningful access.
Role-Based MultiGraph Access Control
User roles are granted or revoked on a per-graph basis. Each GRANT or REVOKE statement specifies not only a role but also a graph.
The GRANT/REVOKE privilege is reserved for superuser and admin users.
A superuser can grant a role to any user on any graph.
A superuser can designate a user to be an admin on a particular graph. The admin can then manage user privileges on their graph, as in the example below.
A user may be granted different roles on different graphs.
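As an illustration, the following GSQL statements grant and revoke roles on a per-graph basis. The user names (tommy, alice) and the graph name Social are hypothetical; the available role names are listed in User Privileges and Authentication.

    # A superuser makes tommy the admin of the Social graph.
    GRANT ROLE admin ON GRAPH Social TO tommy
    # tommy (or a superuser) can now grant alice a read-only query role on Social.
    GRANT ROLE queryreader ON GRAPH Social TO alice
    # Roles are revoked per graph as well.
    REVOKE ROLE queryreader ON GRAPH Social FROM alice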
For details about managing users, privileges, and roles, see User Privileges and Authentication. There you will find a chart describing each of the roles in detail.
A user must set their working graph in order to access that graph, either by using the -g flag when invoking GSQL, or by using the USE GRAPH command.
Users who have privileges on more than one graph (including superusers) may work with only one graph at a time. The GLOBAL SCHEMA_CHANGE JOB stretches this rule.
Note that the CREATE commands for queries, loading jobs, and schema change jobs require that the graph name be specified, even on systems with only one graph, as shown in the sketch below.
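For example, assuming the hypothetical Social graph used above, a session could set its working graph and create a query as follows (a sketch only; the schema and query are illustrative):

    # From the operating system shell, start GSQL with a working graph:
    #   gsql -g Social
    # Or, inside the GSQL shell, set the working graph explicitly:
    USE GRAPH Social

    # CREATE commands name the graph, even on a system with only one graph.
    CREATE QUERY friend_count(VERTEX<Person> p) FOR GRAPH Social {
      start = {p};
      result = SELECT t FROM start:s -(Friendship:e)-> Person:t;
      PRINT result.size() AS numFriends;
    }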
RESTPP Endpoints: Endpoints that pertain to graph data must include the name of the graph in the request URL (for example, GET /graph/{graph_name}/vertices/{vertex_type}/{vertex_id}). See the RESTPP API User Guide.
User Authentication secrets and tokens: Our commands and procedures follow OAuth standards. See Managing User Privileges and Authentication.
There are many other details about using the MultiGraph feature, especially if your application has multiple users with different roles. In the documentation, the MultiGraph logo is placed next to relevant topics.
This document describes the transactional support provided by the TigerGraph platform. A TigerGraph transaction is a sequence of operations which acts as a single logical unit of work. A read-only operation in TigerGraph does not change any vertex/edge attribute value and does not insert any new or delete any existing vertex/edge. An update operation in TigerGraph either changes some vertex/edge attribute value, inserts some new vertex/edge, or deletes some existing vertex/edge.
The TigerGraph system provides full ACID transactions with sequential consistency. Transactions are defined as follows:
Each GSQL query is a transaction. Each query may have multiple read or write operations.
Each REST++ GET, POST, or DELETE operation (which may have multiple update operations within it) is a transaction.
A transaction with update operations may insert/delete multiple vertices/edges or update the attribute values of multiple vertices/edges. Such update requests are “all or nothing”: either all changes succeed, or none of them do (see the sketch below).
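As a minimal sketch of this all-or-nothing behavior, consider a GSQL update query run against a hypothetical Social graph whose Person vertex type has a BOOL attribute is_active and which has an undirected Friendship edge type (the names are illustrative). All of the attribute updates made by one run of the query commit together, or none of them do:

    CREATE QUERY deactivate_circle(VERTEX<Person> p) FOR GRAPH Social {
      start = {p};
      # Deactivate every friend of p ...
      friends = SELECT t FROM start:s -(Friendship:e)-> Person:t
                POST-ACCUM t.is_active = false;
      # ... and p itself; both updates belong to the same transaction.
      start = SELECT s FROM start:s
              POST-ACCUM s.is_active = false;
    }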
The TigerGraph system provides traditional ACID consistency: A transaction can include data validation rules. The data validation rules can ensure any transaction will bring the system from one valid state to another.
The TigerGraph system also provides distributed system Sequential Consistency: every replica of the data performs the same operations in the same order. This is one of the strongest forms of consistency available, stronger than causal consistency, for example.
TigerGraph supports the Serializable isolation level, the strongest form of isolation. Internally, TigerGraph uses MVCC to implement the isolation. MVCC, or Multi-Version Concurrency Control, makes use of multiple snapshots of portions of the database state in order to support isolated concurrent operations. In principle, there can be one snapshot per read or write operation.
A read-only transaction R1 will not see any changes made by an uncommitted update transaction, whether that update transaction was submitted before or after R1 was submitted to the system.
Repeated reads of the same data within a single transaction T1 will return the same results, even if there are update transactions which change vertex or edge attribute values read by T1 during T1’s duration.
Repeated reads within a single read-only transaction T1 will return the same results, even if there are update transactions which delete or insert vertices or edges read by T1 during T1’s duration.
Committed transactions are written to disk (SSD or HDD). The TigerGraph platform implements write-ahead logging (WAL) to provide durability.
The TigerGraph platform uses Snapshot/MVCC (Multi-Version Concurrency Control) to implement isolation of concurrent operations. At a high level, the platform can temporarily maintain multiple versions or snapshots of the graph data. When a transaction T1 is submitted to the system, it works on the last consistent snapshot of the graph, which has all the changes made by transactions committed before T1 was submitted but no changes made by any transaction not yet committed when T1 was submitted. The version of the graph T1 is working on will not be changed by any transaction other than T1, even if other transactions commit before T1 is finished.
Let us examine a few transaction processing scenarios.
A read-only transaction R1 is running. Before R1 finishes, an update transaction W2 comes in. W2 might finish before R1 is finished, but R1 will not see the changes made by W2 before W2 is committed (no dirty reads). Even if W2 is committed before R1 is finished, if R1 reads the same part of the graph multiple times, it will not see the changes made by W2 (repeatable reads). There are no phantom reads either. This is because the graph version R1 is working on cannot be changed by any W2 transaction of the kind described above. Bottom line: if W2 starts while R1 is not yet committed, R1 will see results as though W2 did not exist.
An update transaction W1 is running. Before W1 is committed, a read-only transaction R2 comes in. R2 will not wait for W1 to finish and will be executed as if there were no W1. Later, even if W1 finishes and commits before R2 is finished, R2 will not see any changes made by W1. This is because the graph version R2 works on is fixed at the time R2 is submitted and will not include the changes to be made by W1. Bottom line: if R2 starts while W1 is not yet committed, R2 will see results as though W1 did not exist.
An update transaction W1 is running. Before W1 finishes, a new update request W2 comes in. W2 will wait for W1 to finish before it is executed. When multiple update transactions come in, they will be executed sequentially by the system according to the time they are received by the system.
As the world’s first and only Native Parallel Graph (NPG) system, TigerGraph is a complete, distributed, graph analytics platform supporting web-scale data analytics in real time. The TigerGraph NPG is built around both local storage and computation, supports real-time graph updates, and works like a parallel computation engine. These capabilities provide the following unique advantages:
Fast data loading speed to build graphs - able to load 50 to 150 GB of data per hour, per machine
Fast execution of parallel graph algorithms - able to traverse hundreds of millions of vertices/edges per second per machine
Real-time updates and inserts using REST - able to stream 2B+ daily events in real time to a graph with 100B+ vertices and 600B+ edges on a cluster of only 20 commodity machines
Ability to unify real-time analytics with large scale offline data processing - the first and only such system
See the Resources section of our main website www.tigergraph.com to find white papers and other technical reports about the TigerGraph system.
The TigerGraph Platform runs on standard, commodity-grade Linux servers. The core components (GSE and GPE) are implemented in C++ for optimal performance. The TigerGraph system is designed to fit into your existing environment with a minimum of fuss.
Data Sources: The platform includes a flexible, high-performance data loader which can stream in tabular or semi-structured data while the system is online.
Infrastructure: The platform is available for on-premises, cloud, or hybrid use.
Integration: REST APIs are provided to integrate TigerGraph with your existing enterprise data infrastructure and workflow.
The figure below takes a closer look at the TigerGraph platform itself:
Within the TigerGraph system, a message-passing design is used to coordinate the activities of the components. RESTPP, an enhanced RESTful server, is central to the task management. Users can choose how they wish to interact with the system:
GSQL client. One TigerGraph instance can support multiple GSQL clients, on remote nodes.
GraphStudio - our graphical user interface, which provides most of the basic GSQL functionality, with a graphical and intuitive interface.
REST API. Enterprise applications which need to run the same queries many times can maximize their efficiency by communicating directly with RESTPP.
gadmin is used for system administration.
This document compares what is included in the Enterprise Edition and TigerGraph Cloud editions of the TigerGraph platform.
To see what has been added or changed in different releases (versions) of TigerGraph, see Change Log.
Glossary
DDL: Data Definition Language, a generic term for a set of commands used to define a database schema. The GSQL Language includes DDL commands. In GraphStudio, this is the Design Schema function.
Dictionary (DICT): The shared storage space for metadata about the graph store's configuration and state, including the catalog (graph schema, loading jobs, and queries).
DML: Data Manipulation Language, a generic term for a set of commands used to add, modify, and delete data from a database. Query commands are often considered a part of DML, even though a pure query statement does not manipulate (change) the data. The GSQL Language includes full DML capability for query, add (insert), delete, and modify (update) commands.
gadmin: The system utility for configuring and managing the TigerGraph system. Analogous to mysqladmin.
gbar: Graph Backup and Restore. TigerGraph's utility program for backing up and restoring system data.
GPE: Graph Processing Engine. The server component which accepts requests from the REST++ server for querying and updating the graph store and which returns data.
Graph Store: The component which logically and physically stores the graph data and provides access to the data in a fast and memory-efficient way. We use the term graph store to distinguish it from conventional graph databases.
GraphStudio UI: The browser-based user interface that enables the user to interact with the TigerGraph system in a visual and intuitive way, as an alternative to the GSQL Shell. The GraphStudio UI includes the following components: Schema Designer, Data Mapper, Data Loader, Graph Explorer, and Query Editor.
GSE: Graph Storage Engine. The processing component which manages the Graph Store.
GSQL: The user program which interprets and executes graph processing operations, including (a) schema definition, (b) data loading, (c) data updates, and (d) data queries.
GSQL Language: The language used to instruct and communicate with the GSQL program.
GSQL Shell: The interactive command shell which may be used when running the GSQL program.
HA: High Availability, a generic term describing a computer system which has been architected for a higher level of operational performance (e.g., throughput and uptime) than would be expected from a traditional single-server node.
IDS: ID Service. A subcomponent of the GSE which translates between user (external) IDs for data objects and graph store (internal) IDs.
Kafka: A free, open-source "high-throughput distributed messaging system" from the Apache Software Foundation. Our distributed system architecture is based on message passing/queuing. Kafka is automatically included during TigerGraph system installation as one implementation of message passing. https://kafka.apache.org/
MultiGraph: A graph architecture and feature set which enables one global graph to be viewed as multiple logical subgraphs, each with its own set of user privileges. The subgraphs can overlap, meaning each subgraph can support both shared and private data.
Native Parallel Graph: An architecture and technology which provides inherently highly parallel and highly scalable graph data storage and analytics. The use of vertex-level data+compute functionality is a key component of Native Parallel Graph design.
Nginx: A free, open-source, high-performance HTTP server and reverse proxy. Nginx is automatically included during TigerGraph system installation. https://nginx.org/en/
REST++ or RESTPP: A server component which accepts RESTful requests from clients, validates the requests, invokes the GPE, and sends responses back to the client. Additionally, REST++ provides a zero-coding interface for users to define RESTful endpoints. REST++ offers easy-to-use APIs for customizing the logic of handling requests and processing responses.
Single Sign-On (SSO): A user authentication service that permits a user to use one set of login credentials to access multiple applications.
TigerGraph Platform: The TigerGraph real-time graph data analytics software system. The TigerGraph Platform offers complete functionality for creating and managing a graph database and for performing data queries and analyses. The platform includes the Graph Store, GSE, GPE, REST++, GSQL, and GraphStudio, plus some third-party components, such as Apache Kafka and ZooKeeper.
TigerGraph System: The TigerGraph platform and its languages. Based on context, the term may also include additional optional TigerGraph components which have been installed.
TS3: TigerGraph System Service State, a TigerGraph subsystem which helps monitor the TigerGraph system. It serves as the backend of the TigerGraph Admin Portal.
Zookeeper: A free, open-source program from the Apache Software Foundation, providing "a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services." Used for running the TigerGraph system on a cluster or other distributed system. ZooKeeper is automatically included during TigerGraph system installation. https://zookeeper.apache.org/
TigerGraph Cloud vs. Enterprise Edition
Licensing (TigerGraph Cloud): Pay as you go by the minute; annual contracts available.
Licensing (Enterprise Edition): A free license is available with up to 50 GB of storage after compression; contact us for a paid version without storage limitations.
Includes (TigerGraph Cloud): The power of TigerGraph, as a Service, plus Instant Deployment, Automatic Backups, Scaling Out & Replication, Security, and Pay for What You Use pricing.
Includes (Enterprise Edition): All TigerGraph Database and Enterprise features, including Distributed Graph, MultiGraph, Security, and User Management.
Support: The free version includes support from the Community Forum. Contact us to get professional support.
Feature
TigerGraph Cloud
Enterprise Edition
Native MPP Graph
Real-Time Deep Link Analytics
Ultra-Fast Loading and Updates
GSQL Query and Loading Language (SQL-like syntax and built-in parallelism)
Graph Size: Unlimited (TigerGraph Cloud and Enterprise Edition)
Compressed Data Store
In-Memory Processing; ACID Transactions
Distributed, Auto-Partitioned Graph
MultiGraph
Dynamic Schema Change
GraphStudio Visual SDK and UI: Design, Load, Explore, Query, Visualize
Admin Portal
Feature
TigerGraph Cloud
Enterprise Edition
Automated Deployment
No
Automated Backups
No
Backup and Restore
Multiple Users
Continuous Availability
HA Replication
Cross-region replication
No
Cluster resizing
No
Feature
TigerGraph Cloud
Enterprise Edition
Data Encryption At Rest and In Motion
Enterprise User Management LDAP and SSO
Role-Based Access Control
User-defined roles
Audit compliance: Coming soon
Cloud Security (TigerGraph Cloud): VPC for each account (a built-in cloud feature; N/A for the Enterprise Edition), access through the GSQL web shell, SOC 2 Type 1 and Type 2