If an error occurs in one of the TigerGraph system's components, it may issue an error code and/or an error message. If the system was handling a user request, the error code and message may be in the JSON response (see GSQL Query Output Format). The error information may be in a log file.
When the GSQL Server completes a request, it returns an exit code:
0 - No Error
211 - Syntax Error
212 - Runtime Error
213 - No Graph
255 - Unknown Error
When the GSQL Client completes a request, it returns an exit code:
0 - No Error
41 - Login or Authentication Error
201 - Wrong Argument Error
202 - Connection Error
203 - Compatibility Error
204 - Session Timeout
212 - Runtime Error
255 - Unknown Error
If you have a problem with the procedure described in the TigerGraph Platform Installation Guide, please contact support@tigergraph.com and summarize your issue in the email subject.
Use the following command:
$ gsql --version
To see the version numbers of individual components of the platform:
$ gadmin version
Each release comes with documentation addressing how to perform an upgrade. Contact support@tigergraph.com for help in your specific situation. As of this writing (May 2018), the TigerGraph Platform is releasing version 2.0, which is a major revision.
If you correctly installed the system and are now logged in as the TigerGraph system user, you should be able to enter the GSQL shell by typing the gsql command from an operating system prompt. If this command has never worked, then the installation probably was not successful. If it works but you are not sure what to do next, please see the GSQL Demo Examples guide.
If you believe you have installed the system correctly (e.g., you followed the TigerGraph Platform Installation Guide and received no errors, and the gsql and gadmin commands are now recognized), then please contact support@tigergraph.com and summarize your issue in the email subject.
Different servers are needed for different purposes, but the TigerGraph system should automatically turn services on and off as needed. Please be sure that the Dictionary (dict) server is on when using the TigerGraph system.
To check the status of servers:
$ gadmin status
Yes. For the GSQL shell and language, first enter the shell (type gsql from an operating system prompt). Then type the help command, e.g.,
HELP
This gives you a short list of commands. Note that "help" itself is one of the listed commands; there are help options to get more details about BASIC and QUERY commands. For example,
HELP QUERY
lists the command syntax for queries. See the "System Basics" section of the GSQL Language Reference, Part 1: Defining Graphs and Loading Data. The gadmin administration tool also has a help menu and a manual page:
$ gadmin -h
$ man gadmin
User-defined identifiers are case-sensitive. For example, the names User and user are different. The GSQL language keywords (e.g., CREATE, LOAD, VERTEX) are not case-sensitive, but in our documentation examples, we generally show keywords in ALL CAPS to make them easy to distinguish.
An identifier consists of letters, digits, and the underscore. Identifiers may not begin with a digit and are case-sensitive. Special naming rules apply to accumulators (see the Query section).
The general rule is that string literals within the GSQL language are enclosed in double quotation marks. For data that is to be imported (not yet in the GSQL data store), the GSQL loading language lets you specify how data fields are delimited within your input files. The loading language has an option to specify whether single quotes or double quotes are used to mark strings. For more help on loading, see the "Loading Data" section of this document or of the GSQL Language Reference, Part 1: Defining Graphs and Loading Data.
Yes. You can create a text file containing a sequence of GSQL commands and then execute that file. To execute from outside the shell:
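For example, assuming the commands are saved in a file named command_file.gsql (the filename is illustrative), pass the filename to the gsql program:
$ gsql command_file.gsql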
To execute the command file from within the shell:
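One way, using the same illustrative filename, is to prefix the filename with @ at the GSQL prompt (a sketch; your installed version's syntax may differ slightly):
GSQL> @command_file.gsql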
See also the "Language Basics" and "System Basics" sections of the GSQL Language Reference, Part 1: Defining Graphs and Loading Data document.
Yes. Normally, an end-of-line character triggers execution of a line. You can use the BEGIN and END keywords to mark off a multi-line block of text that should not be executed until END is encountered.
This is an example of a loading statement split into multiple lines using BEGIN and END:
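A minimal sketch (the job, graph, and vertex names are illustrative, and the job form mirrors the ONLINE_POST syntax shown later in this section):
BEGIN
CREATE ONLINE_POST JOB load_people FOR GRAPH My_Graph {
  LOAD
    TO VERTEX Person VALUES ($0, $1);
}
END
Nothing is executed until the END line is entered.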
When a license limit has been reached, your system will be placed in a read-only mode, incapable of loading any more data. You will still be able to delete data and view the graph.
A TigerGraph graph schema consists of (A) one or more vertex types, (B) one or more edge types, and (C) a graph type. Each edge type is defined to be either DIRECTED or UNDIRECTED. The graph type is simply the list of vertex types and edge types which may exist in the graph. For more, see the section "Defining a Graph Schema" in the GSQL Language Reference, Part 1: Defining Graphs and Loading Data. Below is an example of a graph schema containing two vertex types, one edge type, and one graph type:
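A minimal sketch (the vertex, edge, attribute, and graph names are illustrative, not taken from the original):
CREATE VERTEX Customer (PRIMARY_ID customer_id STRING, name STRING)
CREATE VERTEX Product (PRIMARY_ID product_id STRING, price FLOAT)
CREATE DIRECTED EDGE Customer_Bought_Product (FROM Customer, TO Product, purchase_date DATETIME)
CREATE GRAPH Store_Graph (Customer, Product, Customer_Bought_Product)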
Alternatively, a generic CREATE GRAPH statement can be used:
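For example (graph name illustrative), the asterisk includes all previously defined vertex and edge types:
CREATE GRAPH Store_Graph (*)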
Property graphs can model data fields ("properties") either as a property of a vertex or edge or as a vertex linked to other vertices. If your property relates to an edge, it should be an attribute of that edge (for example, a Date attribute of a CustomerBoughtProduct edge). If your property relates to a vertex, you have a choice. The optimal choice depends on how you will typically use this attribute in your application. If you will frequently search or filter based on that data, we suggest you treat it as a separate vertex type. Otherwise, we recommend modeling this data as an attribute of the principal vertex.
Each attribute of a vertex or edge has an assigned data type. v0.8 of the TigerGraph platform added support for several more attribute types: DATETIME, UDT, and the container types LIST, SET, and MAP. The following is an abbreviated list. For a complete list and description, see the section "Attribute Data Types" of the GSQL Language Reference, Part 1: Defining Graphs and Loading Data.
Discontinued Feature
The UINT_SET and STRING_SET COMPRESS types have been discontinued, since equivalent functionality is now available from the more general SET type (e.g., SET&lt;UINT&gt; and SET&lt;STRING&gt;).
The TigerGraph MultiGraph service, an add-on option, supports logical partitions of one unified global graph. Each partition is treated as an independent local graph, with its own set of user privileges. Local graphs can overlap, to create a shared data space.
For performance reasons, we recommend keeping the number of different vertex and edge types under 5,000. The upper limit for the number of different vertex and edge types is approximately 10,000, depending on the complexity of the types.
From within the GSQL Shell, the ls command lists the catalog: the vertex type, edge type, and graph type definitions, job definitions, query definitions, and some system configuration settings. If you have not set your active graph, then ls will show only items which have global scope. To see graph-specific items (including loading jobs and queries), you must define an active graph.
The GSQL language includes ADD, ALTER, and DROP commands. See the section "Update Your Data" in the GSQL Demo Examples or the section "Modifying a Graph Schema" in the GSQL Language Reference, Part 1: Defining Graphs and Loading Data for details. Note that altering the graph schema will invalidate your old data loading and query jobs. You should create and install new loading and query jobs.
To delete your entire catalog, containing not just your vertex, edge, and graph type definitions, but also your loading job and query definitions, use the following command:
GSQL> DROP ALL
To delete just your graph schema, use the DROP GRAPH command:
GSQL> DROP GRAPH g1
Deleting the graph schema also erases the contents of the graph store. To erase the graph store without deleting the graph schema, use the following command:
GSQL> CLEAR GRAPH STORE
See also "How do I erase all data?"
See the GSQL Demo Examples for introductory examples. See GSQL Language Reference, Part 1: Defining Graphs and Loading Data for the complete specifications. We also have a cheatsheet; go to doc.tigergraph.com.
Which loading method should I use?
Beginning with v2.0, the TigerGraph platform introduces an extended syntax for defining and running loading jobs which offers several advantages:
The TigerGraph platform can handle concurrent loading jobs, which can greatly increase throughput.
The data file locations can be specified at compile time or at run time. Run-time settings override compile-time settings.
A loading job definition can include several input files. When running the job, the user can choose to run only part of the job by specifying only some of the input files.
Loading jobs can be monitored, aborted, and restarted.
The syntax for pre-v2.0 online loading and offline loading will be supported through v2.x, but they are deprecated.
Why has Offline loading been deprecated?
Online loading is preferred. Online loading can do everything that offline loading can do, plus it has the following advantages:
Can run while other graph operations are in progress.
Uses multithreaded execution for faster performance.
Does not need to turn the GPE off, which saves time.
Its data source is specified at run time rather than at compile time.
Can add data to an existing graph.
Offline loading is deprecated and is now being emulated by online loading. Therefore, there is no performance difference.
Here are the main differences between the styles. Note that v2.0 is superficially similar to old offline loading, but it also supports run-time filenames. The actual behavior and performance of v2.0 loading is online loading with concurrency.
See the offline2online command, described in GSQL Language Reference, Part 1: Defining Graphs and Loading Data .
The GSQL data loader reads text files organized in tabular or JSON format. Each field may represent numeric, boolean, string, or binary data. Each data field may contain a single value or a list of values (see How do I split a data field containing a list of values into separate vertices and edges?).
Each tabular input data file should be structured as a table, in which each line represents a row, and each row is a sequence of data fields, or columns. A data field can contain string or numeric data. To represent boolean values, 0 or 1 is expected. A header line may be included, to associate a name with each column. A designated character separates columns. For example, if the designated separator character is the comma, this format is commonly called CSV, for Comma-Separated Values. Below is an example of a CSV file with a header. The uid column is int type, name is string type, avg_score is float type, and is_member is boolean type. See simple examples in Real-Life Data Loading and Querying Examples and a complete specification in the section "Creating a Loading Job" in GSQL Language Reference, Part 1: Defining Graphs and Loading Data.
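An illustrative file (the column names come from the description above; the data values are assumed for illustration):
uid,name,avg_score,is_member
100,"Tom Lee",48.5,1
101,"Jane Wu",33.9,0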
The loader does not filter out extra white space (spaces or tabs). The user should filter out extra white space from the files before loading into the TigerGraph system.
The data field (or token ) separator can be any single ASCII character, including one of the non-printing characters. The separator is specified with the SEPARATOR phrase in the USING clause. For example, to specify the semicolon as the separator:
USING SEPARATOR=";"
To specify the tab character, use \t. To specify any ASCII character, use \nn, where nn is the character's ASCII code, in decimal. For example, to specify ASCII 30, the Record Separator (RS):
USING SEPARATOR="\30"
TigerGraph does not require fields to be enclosed in quotation marks, but it is recommended for string fields. If the QUOTE option is enabled, and if the loader finds a pair of quotation marks, then the loader treats the text within the quotation marks as one value, regardless of any separator characters that may occur in the value. The user must specify whether strings are marked by single quotation marks or double quotation marks.
USING QUOTE="single"
or
USING QUOTE="double"
For example, if SEPARATOR="," and QUOTE="double" are set, then when the following data are read,
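100,"Lee, Tom",48.5,1
(The values surrounding the quoted name are illustrative.)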
"Lee, Tom" will be read as a single field. The comma between Lee and Tom will not separate the field.
No. You must specify either QUOTE="single" or QUOTE="double".
The following three parameters (SEPARATOR, HEADER, and QUOTE) should be considered for every loading job from a tabular input file. The next two parameters, FILENAME and EOL, are required if the job is an ONLINE_POST job. All five parameters are combined into one USING clause with a list of parameter/value pairs, and the parameters may appear in any order. (The parameter tables near the end of this section describe each one.)
The location of the USING clause depends on whether the job is an offline loading job or an online loading job. For offline loading, the USING clause appears at the end of the LOAD statement. For example:
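A sketch, mirroring the syntax summary table later in this section (file, vertex, and field references are illustrative):
LOAD "myFile" TO VERTEX User VALUES ($0, $1, $2, $3) USING SEPARATOR=",", HEADER="true", QUOTE="double";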
For online loading, the USING clause appears at the end of the RUN statement. For example:
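A sketch, again mirroring the syntax summary table (job and file names are illustrative):
RUN JOB myJob USING FILENAME="myFile", SEPARATOR=",", EOL="\n", QUOTE="double"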
You can define a header line (a sequence of column names) within a loading job using a DEFINE HEADER statement, such as the following:
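For example (the header name and column names are illustrative):
DEFINE HEADER h1 = "uid", "name", "avg_score", "is_member";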
This statement must appear before the LOAD statement that wishes to use the header definition. Then, the LOAD statement must set the USER_DEFINED_HEADER parameter in the USING clause. A brief example is shown below:
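A minimal sketch (job, graph, file, and vertex names are illustrative; the column names match the CSV example earlier in this section):
CREATE LOADING JOB load_users FOR GRAPH My_Graph {
  DEFINE HEADER h1 = "uid", "name", "avg_score", "is_member";
  LOAD "user_file.csv"
    TO VERTEX User VALUES ($"uid", $"name", $"avg_score", $"is_member")
    USING USER_DEFINED_HEADER="h1", SEPARATOR=",";
}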
Input data fields can always be referenced by position. They can also be referenced by name, if a header has been defined.
Position-based reference: The leftmost field is $0, the next one is $1, and so on.
Name-based reference: $"name", where name is one of the header column names.
For example, if the header is
abc,def,ghi
then the third field can be referred to as either $2 or $"ghi".
First, to clarify the task, consider a graph schema with two vertex types, Book and Genre, and one edge type, book_genre:
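A sketch of such a schema (the vertex and edge type names come from the text; the attribute names and types are illustrative):
CREATE VERTEX Book (PRIMARY_ID bookcode STRING, title STRING)
CREATE VERTEX Genre (PRIMARY_ID genre_id STRING, genre_name STRING)
CREATE UNDIRECTED EDGE book_genre (FROM Book, TO Genre)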
Further, each row of the input data file contains three fields: bookcode, title, and genres, where genres is a list of strings associated with the book. For example, the first few lines of the data file could be the following:
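An illustrative pair of data lines (the "|" main separator and the second book's title are assumptions; the bookcode and genre values match the TEMP_TABLE rows shown near the end of this section):
101|"Harry Potter and the Philosopher's Stone"|"fiction,adventure,fantasy,young adult"
102|"A Sample Sci-Fi Novel"|"fiction,science fiction,Chinese"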
The data line for bookcode 101 should generate one Book instance ("Harry Potter and the Philosopher's Stone"), four Genre instances ("fiction", "adventure", "fantasy", "young adult"), and four book_genre instances, connecting the Book instance to each of the Genre instances. This process of creating multiple instances from a list field (e.g., the genres field) is called flattening.
To flatten the data, we use a two-step load. The first LOAD statement uses the flatten() function to split the multi-value field and stores the results in a TEMP_TABLE. The second LOAD statement takes the TEMP_TABLE contents and writes them to the final edge type.
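A sketch of what such a two-step loading job might look like (the job name, graph name, file name, and column references are illustrative, not the original's; exact syntax may vary by version):
CREATE LOADING JOB load_book_genre FOR GRAPH Book_rating {
  LOAD "book_genre.dat"
    TO VERTEX Book VALUES ($0, $1),
    TO TEMP_TABLE t1 (bookcode, genre) VALUES ($0, flatten($2, ",", 1))
    USING SEPARATOR="|", QUOTE="double";
  LOAD TEMP_TABLE t1
    TO VERTEX Genre VALUES ($"genre", $"genre"),
    TO EDGE book_genre VALUES ($"bookcode", $"genre");
}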
The flatten function has three arguments: (field_to_split, separator, number_of_parts_in_one_field). In this example, we want to split $2 (genres), the separator is the comma, and each field has only one part. So, the flatten function is called with the following arguments: flatten($2, ",", 1).
Using the example data file, TEMP_TABLE t1 will then contain the bookcode/genre rows shown in the table near the end of this section.
The second LOAD statement uses TEMP_TABLE t1 to generate Genre vertex instances and book_genre edge instances. While there are 7 rows shown in the sample TEMP_TABLE, only 6 Genre vertices will be generated, because there are only 6 unique values; "fiction" appears twice. Seven book_genre edges will be generated, one for each row in the TEMP_TABLE.
There is another version of the flatten function which has four arguments and which supports a two-level grouping. That is, the field contains a list of groups, each group composed of N subfields. The arguments are (field_to_split, group_separator, sub_field_separator, number_of_parts_in_one_group). For example, suppose the data line were organized this way instead:
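A hypothetical data line (the numeric subfield and the colon sub-separator are assumptions for illustration; each comma-separated group holds a genre name and a score):
101|"Harry Potter and the Philosopher's Stone"|"fiction:9,adventure:8,fantasy:9,young adult:10"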
Then the following loading statements would be appropriate:
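A sketch under the assumptions above (job, file, and column names are illustrative; the four-argument flatten fills the last two TEMP_TABLE columns):
CREATE LOADING JOB load_book_genre2 FOR GRAPH Book_rating {
  LOAD "book_genre2.dat"
    TO TEMP_TABLE t2 (bookcode, genre, score) VALUES ($0, flatten($2, ",", ":", 2))
    USING SEPARATOR="|", QUOTE="double";
  LOAD TEMP_TABLE t2
    TO VERTEX Genre VALUES ($"genre", $"genre"),
    TO EDGE book_genre VALUES ($"bookcode", $"genre");
}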
Yes. Use online loading. Specifically, online loading lets you define a general loading process without naming the data source. Every time you call an online loading job, you name the source file. It can be a different file each time, or it can be the same file, if the contents of the file are changing over time. Also, if the loader re-reads a data line that it has encountered before, it will simply reload the data. The exception is container attributes (e.g., a LIST attribute loaded with a reduce() loading function), where re-reading a data line has an accumulative effect.
The GSQL loading language includes some built-in token functions (a token is one column or field of a data input line). A user can also define custom token functions. Please see the section "Built-In Loader Token Functions" in the GSQL Language Reference, Part 1: Defining Graphs and Loading Data.
No. One of the advantages of the TigerGraph loading system is the flexible relationship between input files and resulting vertex and edge instances. In general, there is a many-to-many relationship: one input file can generate many vertex and edge types.
From the LOAD statement perspective for an online loading job:
Each LOAD statement refers to one input file.
Each LOAD statement can have one or more resulting vertex types and one or more resulting edge types.
Hence, one LOAD statement can potentially describe the one-to-many mapping from one input file to many resulting vertex and edge types.
It is not necessary for every input line to always generate the same set of vertex types and edge types. The WHERE clause in each TO VERTEX | TO EDGE clause can be used to selectively choose and filter which input lines generate which resulting types.
This is not an error. There can be only one instance of a given edge type between any given pair of vertices, so the most recently loaded edge data will be the edge that you see in the graph.
If there is already data in the graph store and you wish to insert more data, you have a few options. First, if you have bulk data stored in a file (local disk, remote or distributed storage), you can use Online Loading.
Second, if you have a few specific insertions, you can use the Upsert data command in the RESTPP API User Guide. For Upsert, the data must be formatted in JSON format.
Third, you can write a query containing INSERT statements. The syntax is similar to SQL INSERT. (See GSQL Language Reference Part 2 - Querying.) The advantage of query-based INSERT is that the details (id values and attribute values) can be determined at run time and can even be based on an exploration and analysis of the existing graph. The disadvantage is that the query-insert job must be compiled first, and data values must either be hardcoded or supplied as input parameters.
You can modify the schema in several ways:
Add new vertex or edge types
Drop existing vertex or edge types
Add or drop attributes from an existing vertex or edge type
Any schema change can invalidate existing loading jobs and queries.
See the section "Modifying a Graph Schema" in GSQL Language Reference Part 1 - Defining Graphs and Loading Data .
To make a known modification of a known vertex or edge:
Option 1) Make a RESTPP endpoint request to the POST /graph or DELETE /graph endpoint. See the RESTPP API User Guide.
Option 2) The loading language includes an upsert command. The UPSERT statement performs a combined modify-or-add operation, depending on whether the indicated vertex or edge already exists. Examples of UPSERT are described in the GSQL Demo Examples document. The GSQL Language Reference Part 1 - Defining Graphs and Loading Data provides a full specification.
Option 3) The query language now includes an UPDATE statement which enables sophisticated selection of which vertices and edges to update and how to update them. Likewise, there is an INSERT statement in the query language. See the GSQL Language Reference Part 2 - Querying .
You can write a query which selects vertices or edges to be deleted. See the DELETE subsections of the "Data Modification Statements" section in GSQL Language Reference Part 2 - Querying .
If you wish to completely clear all the data in the graph store, use the CLEAR GRAPH STORE -HARD command. Be very careful using this command; deleted data cannot be restored (except from a backup). Note that clearing the data does not erase the catalog definitions of vertex, edge, and graph types. See also "How do I delete my entire graph schema?"
-HARD must be in all capital letters.
Yes. The GSQL Query Language is a full-featured graph query-and-data-computation language. In addition, there is a small, lightweight set of built-in query commands that can inspect the set of stored vertices and edges, but these built-in commands do not support graph traversal (moving from one vertex to another via edges). We refer to this as the Standard Data Manipulation API or the Built-in Query Language (described in the RESTPP API User Guide and the GSQL Demo Examples).
For a first-time user: See the documents GSQL Demo Examples and then GSQL Language Reference Part 2 - Querying . For users with some experience, a reference card is now available: GSQL Query Language Reference Card.
The GSQL Query Language supports powerful graph querying, but it is also designed to perform powerful computations. GSQL is Turing-complete, so it can be considered a programming language. It can be used for simple SQL-like queries, but it also features control flow (IF, WHILE, FOREACH), procedural calls, local and global variables, complex data types, and accumulators to enable much more sophisticated use.
Three new types were introduced in v0.8: GroupByAccum, BitwiseAndAccum, and BitwiseOrAccum. Version 0.8.1. added ArrayAccum. This is a quick summary. For a more detailed explanation, see the "Accumulator Types" section of GSQL Language Reference Part 2 - Querying .
In the following table, baseType means any of the following: INT, UINT, FLOAT, DOUBLE, STRING, BOOL, VERTEX, EDGE, JSONARRAY, JSONOBJECT, DATETIME
See the section "Accumulators" in the GSQL Language Reference Part 2 - Querying document.
Vertex and edge IDs (i.e., the unique identifier for each vertex or edge) are treated differently than user-defined attributes. Special keywords must be used to refer to the PRIMARY_ID, FROM, or TO id fields.
Vertices :
In a CREATE VERTEX statement, the PRIMARY_ID is required and is always listed first. User-defined attributes are optional and come after the required ID fields.
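A minimal sketch (the vertex name and data types are illustrative; pid and title match the discussion below):
CREATE VERTEX movie (PRIMARY_ID pid UINT, title STRING)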
In a built-in query, if you wish to select vertices by specifying an attribute value, you use the attribute name (e.g., title):
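For example (the vertex name and title value are illustrative):
SELECT * FROM movie WHERE title=="The Matrix"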
In contrast, if you wish to reference vertices by the id value, the lowercase keyword primary_id must be used. Note that the query does not use the id name pid.
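For example:
SELECT * FROM movie WHERE primary_id=="101"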
Edges :
In a CREATE EDGE statement, the FROM and TO vertex identifiers are required and are always listed first. The FROM and TO values should match the PRIMARY_ID values of a source vertex and a target vertex. In the example below, rating and date_time are user-defined optional attributes.
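A minimal sketch (assumes a user vertex type also exists; the edge name, direction, and attribute types are illustrative):
CREATE UNDIRECTED EDGE rate (FROM user, TO movie, rating FLOAT, date_time UINT)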
In a query, if you wish to select edges by specifying their FROM or TO vertex values, you must use the lowercase keywords from_id or to_id. For example:
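A sketch (the vertex, edge, and id values are illustrative):
SELECT * FROM user-(rate)->movie WHERE from_id=="101"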
The data are in JSON format. See the section "Output Statements" in the GSQL Language Reference Part 2 - Querying .
Yes. The maximum output size for a query is 2GB. If the result of a query would be larger than 2GB, the system may return no data. No error message is returned.
Also, for built-in queries (using the Standard Data Manipulation REST API), queries return at most 10240 vertices or edges.
INSTALL QUERY query_name is required for each GSQL query, after its initial CREATE QUERY query_name statement and before using RUN QUERY query_name. After INSTALL QUERY has been executed, RUN QUERY can be used.
Anytime after INSTALL QUERY, another statement, INSTALL QUERY -OPTIMIZE can be executed once. This operation optimizes all previously installed queries, reducing their run times by about 20%.
Optimize a query if query run time is more important to you than query installation time. The initial INSTALL QUERY operation runs quickly. This is good for the development phase.
The optional additional operation INSTALL QUERY -OPTIMIZE will take more time, but it will speed up query run time. This makes sense for production systems.
Legal:
Illegal:
In short, yes. They will not be executed at the same time, but the installations will be queued by the order in which they were received.
Yes. A ListAccum is like an array, a 1-dimensional array. If you nest ListAccums as the elements within an outer ListAccum, you have effectively made a 2-dimensional array. Please read Section "Nested Accumulators" in the GSQL Language Reference Part 2 - Querying for more details. Here is an example:
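A minimal sketch (the accumulator name is illustrative):
ListAccum<ListAccum<INT>> @@my_2d_list;
Each element of @@my_2d_list is itself a ListAccum<INT> (one "row"), and @@my_2d_list.get(i).get(j) reads a single element.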
Yes, please read the section "Nested Accumulators" in the GSQL Language Reference Part 2 - Querying for more details. There are seven types of container accumulators: ListAccum, SetAccum, BagAccum, MapAccum, ArrayAccum, HeapAccum, and GroupByAccum. Here are the allowed combinations:
ListAccum can contain ListAccum.
MapAccum and GroupByAccum can contain any container accumulator except HeapAccum.
ArrayAccum is always nested.
Here is an example:
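A minimal sketch of two legal nestings (the accumulator names are illustrative):
ListAccum<ListAccum<INT>> @@list_of_lists;
MapAccum<STRING, ListAccum<INT>> @@scores_by_name;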
Please check out the Troubleshooting Guide for more information regarding TigerGraph system log files, and what to look for.
To write a loading job, you must know the format of the input data files, so that you can describe to GSQL how to parse each data line and convert it into vertex and edge attributes. To validate a loading job, that is, to check that the actual input data meet your expectations, and that they produce the expected vertices and edges, you can use two features of the RUN JOB command: the -DRYRUN option and loading a specified range of data lines.
The full syntax for an (offline) loading job is the following:
RUN JOB [-DRYRUN] [-n [first_line_num,] last_line_num] job_name
The -DRYRUN option will read input files and process data as instructed by the job, but it does not store data in the graph store.
The -n option limits the loading job to processing only a range of lines of each input data file. The selected data will be stored in the graph store, so the user can check the results. The -n flag accepts one or two arguments. For example,
-n 50 means read lines 1 to 50.
-n 10,50 means read lines 10 to 50.
The special symbol $ is interpreted as "last line", so -n 10,$ means read from line 10 to the end.
The following command lists the locations of the log files:
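$ gadmin log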
If the platform has been installed with default file locations, so that <TigerGraph_root_dir> = /home/tigergraph/tigergraph, then the output would be the following:
As of v2.4, the GSQL log files have been moved in order to keep all logs in a standard directory.
GPE: general system performance logs.
GSE: graph services logs.
RESTPP: REST API call logs.
GSQL: general GSQL logs.
Each loading run creates a log file, stored in the folder <TigerGraph_root_dir>/dev/gdk/gsql/output. The filename load_output.log is a link to the most recent log file. This file contains summary statistics on the number of lines read, the vertices created, and various types of errors encountered. Alternatively, you can run the shell command gadmin log to find log paths.
The log files record detailed internal operations and state information in response to user actions. They provide vital information for diagnosing and debugging your system. All log files can be found in the /home/tigergraph/tigergraph/logs directory. By typing the command gadmin log, you will see the file paths of the most commonly used log files.
GPE Logs - Graph Processing Engine Logs
GSE Logs - Graph Storage Engine Logs
GSQL Logs - System & Query Logs
RESTPP Logs - API Call Logs
NGINX Logs - HTTP Request Logs
VIS Logs - GraphStudio Logs
One possible explanation is that you have reached a capacity limit controlled by your product license. To check if this is the case, run the command gadmin status. If the limit has been reached, there will be a warning message, such as the following:
In Limited Capacity mode, additional data may not be inserted. Data may be queried and deleted.
The Troubleshooting Guide teaches you how to check on the status of your TigerGraph system and, when needed, how to find the log files in order to get a better understanding of why certain errors are occurring. This section covers log file debugging for data loading and querying.
Before any deeper investigation, always run these general system checks:
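At minimum, the checks referenced throughout this guide cover service status, memory, and disk:
$ gadmin status
$ free -g
$ df -lh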
The following command reveals the location of the log files:
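$ gadmin log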
You will be presented with a list of log files. The left side of the resulting file paths is the component for which the respective log file is logging information. The majority of the time, these files will contain what you are looking for. You may notice that there are multiple files for each TigerGraph component.
The .out file extension is for errors. The .INFO file extension is for normal behaviors.
In order to diagnose an issue for a given component, you'll want to check the .out log file extension for that component.
Other log files that are not listed by the gadmin log command are those for Zookeeper and Kafka, which can be found here:
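Per the gcollect file list below, these are typically ~/tigergraph/kafka/kafka.out and ~/tigergraph/zk/zookeeper.out.* (the exact paths depend on your installation directory).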
To aid in the effort of system debugging, there is a tool you can use to collect all relevant log files from around the time of a system malfunction or error. Collection of these files greatly improves the efficiency of the support process, as this minimizes the need to access a customer environment to diagnose issues remotely. This will also avoid delayed restart of services.
Here is the relevant information from the TigerGraph servers that will be collected when running the gcollect command:
OS version (cat /etc/*release)
TigerGraph version (gadmin version)
Services status (gadmin status)
Services status history (From ts3)
Free disk space (df -lh)
Free memory (free -g)
OOM information (dmesg | grep -i oom)
gstore yaml files (gstore/0/part/config.yaml, gstore/0/1/ids/graph_config.yaml)
kafka log files (kafka/kafka.out)
zk log files (zk/zookeeper.out.*)
nginx log files (logs/nginx/nginx_*.access.log)
restpp log files (logs/RESTPP_*_1/log.INFO, logs/restpp/restpp_*_1.out)
restpp loader log files (logs/RESTPP-LOADER_*_1/log.INFO)
gpe log files (logs/GPE_*_1/log.INFO, logs/gpe/gpe_*_1.out)
gse log files (logs/GSE_*_1/log.INFO, logs/gse/gse_*_1.out)
gsql log (~/tigergraph/logs/gsql_server_log/GSQL_LOG)
fab logs (~/tigergraph/.gsql/fab_dir/cmd_logs)
ium config (gadmin --dump-config)
GSQL catalog (mkdir catalog && gadmin __pullfromdict catalog)
admin server (logs/admin_server/gadmin_server.out, logs/admin_server/gadmin_server.INFO)
gdict logs (logs/dict/gdict_client_.INFO, logs/dict/gdict_server.INFO)
tsar log (/var/log/tsar.data*)
ts3 DB (~/tigergraph/data/ts3.db)
restpp loader logs (logs/restpp/restpp_loader_logs)
GST logs (logs/gui/gui_INFO.log)
kafka connector (kafka/kafka-connect.out)
kafka stream (kafka/kafka-stream.out)
All the log files will be placed in the output directory specified when running the gcollect command, and each node has a subdirectory. Each component will have one or two log files.
The installation will quit if there are any missing dependency packages, and output a message. Please run bash install_tools.sh to install all missing packages. You will need an internet connection to install the missing dependencies.
Using the -x flag during installation will show you the detailed shell commands being run during installation.
bash -x install.sh
The /home directory requires at least 200MB of free space, or the installation will fail with an out-of-disk message. This usage is temporary, during installation only; the files will be moved to the root directory once installation is complete.
The /tmp directory requires at least 1GB of free space, or the installation will fail with an out-of-disk message.
The directory in which you choose to install TigerGraph requires at least 20GB of free space; otherwise, the installation will report an error and exit.
If your firewall blocks all ports not defined for use, we recommend opening up internal ports 1000-50000.
If you are using a cloud instance, you will need to configure the firewall rules through the respective console (e.g., Amazon AWS or Microsoft Azure).
If you are managing a local machine, you can manage your open ports using the iptables command. Please refer to the example below to help with your firewall configuration.
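A minimal sketch of an iptables rule that opens TCP ports 1000-50000 (your interface, rule ordering, and persistence mechanism may differ):
$ sudo iptables -A INPUT -p tcp --dport 1000:50000 -j ACCEPT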
To better help you understand the flow of a query within the TigerGraph system, we've provided the diagram below with arrows showing the direction of information flow. We'll walk through the execution of a typical query to show you how to observe the information flow as recorded in the log files.
From calling a query to returning the result, here is how the information flows:
1. Nginx receives the request.
2. Nginx sends the request to Restpp.
3. Restpp sends an ID translation task to GSE and a query request to GPE.
4. GSE sends the translated ID to GPE, and GPE starts to process the query.
5. GPE sends the query result to Restpp, and sends a translation task to GSE, which then sends the translation result to Restpp.
6. Restpp sends the result back to Nginx.
7. Nginx sends the response.
Multiple situations can lead to slower than expected query performance:
Insufficient Memory
When a query begins to use too much memory, the engine will start to put data onto the disk, and memory swapping will also kick in. Use the Linux command free -g to check available memory and swap status. To combat this, you can either optimize the data structure used within the query or increase the physical memory size on the machine.
GSQL Logic
Usually, a single server machine can process up to 20 million edges per second. If the actual processing rate is much lower, most of the time it is due to inefficient query logic; that is, the query logic is not following the natural execution of GSQL. You will need to optimize your query to tune the performance.
Disk IO
When the query writes the result to the local disk, the disk IO may be the bottleneck for the query's performance. Disk performance can be checked with this Linux command: sar 1 10.
If you are writing (PRINT) one line at a time and there are many lines, storing the data in one data structure before printing may improve the query performance.
Huge JSON Response
If the JSON response size of a query is too massive, it may take longer to compose and transfer the JSON result than to actually traverse the graph. To see if this is the cause, check the GPE log.INFO file. If the query execution has already completed in GPE but the result has not been returned, and CPU usage is at about 200%, this is the most probable cause. If possible, please reduce the size of the JSON being printed.
Memory Leak
This is a very rare issue. The query will progressively become slower and slower, while GPE's memory usage increases over time. If you experience these symptoms on your system, please report this to the TigerGraph team.
Network Issues
When there are network issues during communication between servers, the query can be slowed down drastically. To identify that this is the issue, you can check the CPU usage of your system along with the GPE log.INFO file. If the CPU usage stays at a very low level and GPE keeps printing ???, this means network IO is very high.
Frequent Data Ingestion in Small Batches
Small batches of data can increase the data loading overhead and query processing workload. Please increase the batch size to prevent this issue.
When a query hangs, or seems to run forever, it can be attributed to these possibilities :
Services are down
Please check that TigerGraph services are online and running. Run gadmin status and possibly check the logs for any issues that you find from the status check.
Query infinite loop
To verify this is the issue, check the GPE log.INFO file to see if graph iteration log lines are continuing to be produced. If they are, and the edgeMaps log the same number of edges every few iterations, you have an infinite loop in your query.
If this is the case, please restart GPE to stop the query: gadmin restart gpe -y.
Then refine your query and make sure the loops within the query are able to terminate.
GraphStudio Error
If you are running the query from GraphStudio, the loading bar may continue spinning as if the query has not finished running. You can right-click the page and select Inspect -> Console (in the Google Chrome browser) and try to find any suspicious errors there.
If a query runs and does not return a result, it could be due to two reasons:
1. Data is not loaded. From the Load Data page in GraphStudio, you can check the number of loaded vertices and edges, as well as the number of each vertex or edge type. Please ensure that all the vertices and edges needed for the query are loaded.
2. Properties are not loaded. The number of vertices and edges traversed can be observed in the GPE log.INFO file. If for one of the iterations you see activated 0 vertices, this means no target vertex satisfied your searching condition. For example, the query can fail to pass a WHERE clause or a HAVING clause. If you see 0 vertex reduces while the edge map number is not 0, that means that all edges have been filtered out by the WHERE clause, and that no vertices have entered into the POST-ACCUM phase. If you see more than 0 vertex reduces, but activated 0 vertices, this means all the vertices were filtered out by the HAVING clause.
To confirm the reasoning within the log file, use GraphStudio to pick a few vertices or edges that should have satisfied the conditions and check their attributes for any unexpected errors.
Query installation may fail for a handful of reasons. If a query fails to install, please check the GSQL log file. The default location for the GSQL log is here:
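Per the gcollect file list above, this is typically ~/tigergraph/logs/gsql_server_log/GSQL_LOG.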
If you have a c++ user-defined function error, your query will fail to install, even if it does not utilize the UDF.
The following example shows the system free memory is 69%
Using GraphStudio, you are able to see, at a high level, the errors that may have occurred during loading. This is accessible from the Load Data page. Click on one of your data sources, then click on the second tab of the graph statistics chart. There, you will be able to see the status of the data source loading, the number of loaded lines, the number of lines missing, and lines that may have an incorrect number of columns. (Refer to the picture below.)
If you see there are a number of issues from the GraphStudio Load Data page, you can dive deeper to find the cause of the issue by examining the log files. Check the loading log located here:
Open up the latest .log file and you will be able to see details about each data source. The picture below is an example of a correctly loaded data file.
Here is an example of a loading job with errors :
From this log entry, you are able to see the errors being marked as lines with invalid attributes. The log will provide you the line number from the data source which contains the loading error, along with the attribute it was attempting to load to.
Normally, a single server running TigerGraph will be able to load from 100K to 1M lines per second, or 100GB to 200GB of data per hour. This can be impacted by any of the following factors:
Loading Logic: How many vertices/edges are generated from each line loaded?
Data Format: Is the data formatted as JSON or CSV? Are multi-level delimiters in use? Does the loading job intensively use temp tables?
Hardware Configuration: Is the machine set up with HDD or SSD? How many CPU cores are available on this machine?
Network Issues: Is this machine doing local loading or remote POST loading? Are there any network connectivity issues?
Size of Files: How large are the files being loaded? Many small files may decrease the performance of the loading job.
High-Cardinality Values Loaded to a STRING COMPRESS Attribute: How diverse is the set of data being loaded to the STRING COMPRESS attribute?
To combat the issue of slow loading, there are also multiple methods:
If the computer has many cores, consider increasing the number of Restpp load handlers.
Separate ~/tigergraph/kafka from ~/tigergraph/gstore and store them on separate disks.
Do distributed loading.
Do offline batch loading.
Combine many small files into one larger file.
When a loading job seems to be stuck, here are things to check for :
GPE is DOWN
You can check the status of GPE with this command: gadmin status gpe
If GPE is down, you can find the necessary logs with this command: gadmin log -v gpe
Memory is full
Run this command to check memory usage on the system: free -g
Disk is full
Check disk usage on the system: df -lh
Kafka is DOWN
You can check the status of Kafka with this command: gadmin status kafka
If it is down, take a look at the log with this command: vim ~/tigergraph/kafka/kafka.out
Multiple Loading Jobs
By default, the Kafka loader is configured to allow a single loading job. If you execute multiple loading jobs at once, they will run sequentially.
If the loading job completes, but data is not loaded, there may be issues with the data source or your loading job. Here are things to check for:
Are there any invalid lines in the data source file? Check the log file for any errors. If an input value does not match the vertex or edge type, the corresponding vertex or edge will not be created.
Does your loading job load edges in the incorrect order? When you defined the graph schema, the FROM and TO vertex order affects the way you write the loading job. If you wrote the loading job with the order reversed, the edges will not be created, possibly also affecting the population of vertices.
First, check the logs for important clues.
Are you reaching and reading all the data sources (paths and permissions)?
Is the data mapping correct?
Are your data fields correct? In particular, check data types. For strings, check for unwanted extra strings. Leading spaces are not removed unless you apply an optional token function to trim the extra spaces.
If you know what data you expect to see (number of vertices and edges, and attribute values), but the loaded data does not meet your expectations, there are a number of possible causes to investigate:
Possible causes of a loading job failure are:
Loading job timed out
If a loading job hangs for 600 seconds, it will automatically time out.
Port occupied
Loading jobs require port 8500. Please ensure that this port is open.
Understanding what happens behind the scenes during a schema change.
DSC (Dynamic Schema Change) Drain - Stops the flow of traffic to RESTPP and GPE. If GPE receives a DRAIN command, it will wait 1 minute for existing running queries to finish. If the queries do not finish within this time, the DRAIN step will fail, causing the schema change to fail.
DSC Validation - Verification that no queries are still running.
DSC Apply - Actual step where the schema is being changed.
DSC Resume - Traffic resumes after schema change is completed. Resume will automatically happen if a schema change fails. RESTPP comes back online. All buffered query requests will go through after RESTPP resumes, and will use the new updated schema.
Schema changes are not recommended for production environments. Even if attributes are deleted, TigerGraph's engine will still scan all previous attributes. We recommend limiting schema changes to dev environments.
Schema changes are all or nothing. If a schema change fails in the middle, changes will not be made to the schema.
Failure when creating a graph
Global Schema Change Failure
Local Schema Change Failure
Dropping a graph fails
If GPE or RESTPP fail to start due to YAML error, please report this to TigerGraph.
If you encounter a failure, please take a look at the GSQL log file: gadmin log gsql. Look for these error codes:
Error code 8 - The engine is not ready for the snapshot. Either the pre-check failed or snapshot was stopped. The system is in critical non-auto recoverable error state. Manual resolution is required. Please contact TigerGraph support.
Error code 310 - Schema change job failed and the proposed change has not taken effect. This is the normal failure error code. Please see next section for failure reasons.
Another schema change or a loading job is running. This will cause the schema change to fail right away.
GPE is busy. Potential reasons include :
Long running query.
Loading job is running.
Rebuild process is taking a long time.
Service is down. (RESTPP/GPE/GSE)
Cluster system clocks are not in sync. Schema change job will think the request is stale, causing this partition's schema change to fail.
Config Error. If the system is shrunk manually, schema change will fail.
You will need to check the logs in this order : GSQL log, admin_server log, service log.
Admin_server log files can be found here: ~/tigergraph/logs/admin_server/. You will want to take a look at the INFO file.
The service log belongs to each of the services respectively; gadmin log <service_name> will show you the location of these log files.
In this case, we see that RESTPP failed at the DRAIN stage. We need to first check whether the RESTPP services are all up. Then, verify that the time on each machine is the same. If all of these are fine, we need to look at the RESTPP log to see why it fails. Again, use the "DSC" keyword to navigate the log.
If the GSE process fails to start, it is usually attributed to a license issue. Please check these factors:
License Expiration
gadmin status license
This command will show you the expiration date of your license.
Single Node License on a Cluster
If you are on a TigerGraph cluster, but using a license key intended for a single machine, this will cause issues. Please check with your point of contact to see which license type you have.
Graph Size Exceeds License Limit
Two cases may apply for this reason. The first is that you have multiple graphs but your license only allows for a single graph. The second is that your graph size exceeds the memory size that was agreed upon for the license. Please check with your point of contact to verify this information.
Usually in this state, GSE is warming up. This process can take quite some time depending on the size of your graph.
Very rarely, this will be a ZEROMQ issue. Restarting TigerGraph should resolve this issue:
gadmin restart -y
GSE crashes are likely due to an Out Of Memory (OOM) issue. Use the dmesg -T command to check for any errors.
If GSE crashes, and there are no reports of OOM, please reach out to TigerGraph support.
If your system has unexpectedly high memory usage, here are possible causes :
ID Strings Are Too Long
GSE will automatically deny IDs with a length longer than 16K. Memory issues could also arise if an ID string is too long (> 500). One proposed solution is to hash the string.
Too Many Vertex Types
Check the number of unique vertex types in your graph schema. If your graph schema requires more than 200 unique vertex types, please contact TigerGraph support.
If your browser crashes or freezes, please refresh your browser.
If you suspect GraphStudio has crashed, first check gadmin status to verify all the components are in good shape. Two known causes of GraphStudio crashes are:
Huge JSON response
User-written queries often return very large JSON responses. There is a JSON size limiter, but this could still potentially cause an issue.
This issue can be mitigated by editing the maximum response size in this file: ~/tigergraph/visualization/server/src/config/local.json
The value you want to change is :
"responseSizeLimit": 33554432,
Very Dense Graph Visualization
On the Explore Graph page, the "Show All Paths" query on a very dense graph is known to cause a crash.
To find the location of GraphStudio log files, use this command : gadmin log vis
Enabling GraphStudio DEBUG mode will print more information to the log files. To enable DEBUG mode, please edit the following file: /home/tigergraph/tigergraph/visualization/server/src/config/local.json
After editing the file, run gadmin restart vis -y to restart the GraphStudio service. Follow the log file to see what is happening: tail -f /home/tigergraph/tigergraph/logs/gui/gui_INFO.log
Repeat the error inducing operations in GraphStudio and view the logs.
Query is still running, it is just slow
If you have a very large graph, please be patient. Ensure that there is no infinite loop in your query, and refer to the section above on slow queries for possible causes.
Scroll down to the last error message; it will point you to the problem. This will show you any query errors that could be causing the failed installation. If you have created a user-defined function (UDF), you could potentially have a C++ compilation error.
Using quotes in the data file may interfere with the tokenization of elements in the data file. Please check the GSQL Language Reference documentation on loading jobs; look for the QUOTE parameter to see how you should set up your loading job.
Do you have duplicate IDs, resulting in the same vertex or edge being loaded more than once? Is this intended or unintended? TigerGraph's default loading semantics is UPSERT. Check the loading documentation to make sure you understand the semantics in detail.
This section only covers debugging schema change jobs; for more information about schema changes, please read the schema change documentation.
To check the status of GSE, and all other processes, run gadmin status to show the status of key TigerGraph processes. As with all other processes, you can find the log file locations for GSE with the gadmin log command. Refer to the log file descriptions above for more information about which files to check.
There is a list of known GraphStudio issues.
If after taking these actions you cannot solve the issue, please reach out to support@tigergraph.com to request assistance. Please refer to the gcollect section above to gather information about the system and increase the efficiency of the support process.
Attribute data types (abbreviated list):
Primitive Types: INT, UINT, FLOAT, DOUBLE, BOOL, STRING
Advanced Types: STRING COMPRESS, DATETIME, User-Defined Tuple (UDT)
Complex Types: LIST, SET, MAP
Syntax details for the three loading styles:

CREATE JOB statement
- v2.0 Loading: Keyword LOADING is used: CREATE LOADING JOB job_name FOR GRAPH graph_name {...
- Online: Keyword ONLINE_POST is used: CREATE ONLINE_POST JOB job_name FOR GRAPH graph_name {...
- Offline: Keyword LOADING is used: CREATE LOADING JOB job_name FOR GRAPH graph_name {...

Input source filename
- v2.0 Loading: The LOAD statement must refer to a valid filepath or to a file variable: LOAD "myFile1" TO ... LOAD fileVar2 TO ... The filepath can specify one or more machines in a cluster. An optional DEFINE FILENAME statement defines a file variable. File variables can be set or overridden at run time: RUN myJob USING f1="myFile1", f2="myFile2"
- Online: The filename appears in the USING clause of the RUN statement, so one JOB can handle only one input file: RUN myJob USING FILENAME="myFile"
- Offline: The filename appears in the LOAD statement, so one JOB can handle multiple input files if it has multiple LOAD statements: LOAD "myFile1" TO ... LOAD "myFile2" TO ...

HEADER, SEPARATOR, EOL, QUOTE parameters
- v2.0 Loading: These parameters appear at the end of the LOAD statement: LOAD "myFile" TO ... USING SEPARATOR=",", QUOTE="double"
- Online: These parameters appear at the end of the RUN statement: RUN myJob USING FILENAME="myFile", SEPARATOR=",", QUOTE="double"
- Offline: These parameters appear at the end of the LOAD statement: LOAD "myFile" TO ... USING SEPARATOR=",", QUOTE="double"
USING clause parameters for tabular loading:

SEPARATOR
- Meaning of value: specifies the special character that separates tokens (columns) in the data file
- Allowed values: any single ASCII character
- Comments: Required. "\t" for tab; "\nn" for ASCII decimal code nn

HEADER
- Meaning of value: whether the data file's first line is a header line which assigns names to the columns. In offline loading, the Loader reads the header line to obtain mnemonic names for the columns. In online loading, the Loader just skips the header line.
- Allowed values: "true", "false"
- Comments: Default = "false"

QUOTE
- Meaning of value: specifies whether strings are enclosed in single quotation marks ('a string') or double quotation marks ("a string")
- Allowed values: "single", "double"
- Comments: Optional; no default value.
Additional USING clause parameters for ONLINE_POST jobs:

FILENAME
- Meaning of value: name of input data file
- Allowed values: any valid path to a data file
- Comments: Required for online loading. Not allowed for offline loading.

EOL
- Meaning of value: the end-of-line character
- Allowed values: any ASCII sequence
- Comments: Default = "\n" (system-defined newline character or character sequence)
TEMP_TABLE t1 contents from the flatten example:
bookcode | genre
101 | fiction
101 | adventure
101 | fantasy
101 | young adult
102 | fiction
102 | science fiction
102 | Chinese
Accumulators and their permitted data types:
SumAccum: INT, UINT, FLOAT, DOUBLE, STRING
MaxAccum, MinAccum: INT, UINT, FLOAT, DOUBLE, VERTEX
AvgAccum: INT, UINT, FLOAT, DOUBLE (output is DOUBLE)
AndAccum, OrAccum: BOOL
BitwiseAndAccum, BitwiseOrAccum: INT (acting as a sequence of bits)
ListAccum, SetAccum, BagAccum: baseType, TUPLE, STRING COMPRESS
ArrayAccum: accumulator, other than MapAccum, HeapAccum, or GroupByAccum
MapAccum: key: baseType, TUPLE, STRING COMPRESS; value: baseType, TUPLE, STRING COMPRESS, ListAccum, SetAccum, BagAccum, MapAccum, HeapAccum
HeapAccum<tuple_type>(heapSize, sortKey [, sortKey_i]*): TUPLE
GroupByAccum: key: baseType, TUPLE, STRING COMPRESS; accumulator: ListAccum, SetAccum, BagAccum, MapAccum