In a distributed graph (where the data are spread across multiple machines), the default execution plan is as follows:
One machine will be selected as the execution hub, regardless of the number or the distribution of starting point vertices.
All the computation work for the query will take place at the execution hub. The vertex and edge data from other machines will be copied to the hub machine for processing.
TigerGraph Enterprise Edition offers a Distributed Query mode which provides a more optimized execution plan for queries which are likely to start at several machines and continue their traversal across several machines.
A set of machines representing one full copy of the entire graph will participate in the query. If the cluster has a replication factor of 2 (so there are two copies of each piece of data), then half the machines will participate.
The query executes in parallel across all the machines which have source vertex data for a given hop in the query. That is, each SELECT statement defines a 1-hop traversal from a set of source vertices to a set of target vertices. Unlike the default mode where all the needed data are brought to one machine, in Distributed Query mode, the computation moves across the cluster, following the traversal pattern of the query.
The output results will be gathered at one machine.
To invoke Distributed Query Mode, insert the keyword DISTRIBUTED before QUERY in the query definition:
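For example, a distributed query could be defined as follows (the query and graph names here are hypothetical):

```gsql
CREATE DISTRIBUTED QUERY distributed_example() FOR GRAPH Social_Net {
  // Starts from all vertices, so many machines hold source vertex data
  Start = {ANY};
  PRINT Start.size();
}
```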
The basic trade-off between distributed query mode and default mode is greater parallelism for the given query vs. using more system resources, which reduces the potential for concurrency with other operations. Each machine has a certain number of workers available for concurrent execution of queries. A query in default mode uses only one worker out of the whole system. (This one worker will have multiple threads for processing edge traversals in parallel.) However, a query in distributed mode uses one query worker per machine. This means this query can run faster, but it leaves fewer workers for other queries running concurrently.
In general, Distributed Query Mode is likely to improve the performance of a query if the query:
Starts at a very large set of starting point vertices.
Performs many hops.
For example, algorithms that compute a value for every vertex or one value for the entire graph should use Distributed Query Mode. This includes PageRank, Centrality, and Connected Component algorithms.
For applications where the same query (queries with the same logic but different input parameters) will be run many times in production, the application designer is encouraged to try both modes during development and choose the one which works better for their use case and data.
The following GSQL features are not supported in Distributed Query Mode:
Functions
Evaluate()
Accumulator nesting limitations
Accumulator methods are not supported if the accumulator is nested inside another accumulator
This section describes the data types that are native to and supported by the GSQL Query Language. Most of the data objects used in queries come from one of three sources:
The query's input parameters
The vertices, edges, and their attributes which are encountered when traversing the graph
The variables defined within the query to assist in the computational work of the query
This section covers the following subset of the EBNF language definitions:
An identifier is the name for an instance of a language element. In the GSQL query language, identifiers are used to name elements such as a query, a variable, or a user-defined function. In the EBNF syntax, an identifier is referred to as name. It can be a sequence of letters, digits, or underscores ("_"). Other punctuation characters are not supported. The initial character can only be a letter or an underscore.
Different types of data can be used in different contexts. The EBNF syntax defines several classes of data types. The most basic is called base type (baseType). The other independent types are FILE and STRING COMPRESS. The remaining types are either compound data types built from the independent data types, or supersets of other types. The table below gives an overview of their definitions and their uses.
The query language supports the following base types, which can be declared and assigned anywhere within their scope. Any of these base types may be used when defining a global variable, a local variable, a query return value, a parameter, part of a tuple, or an element of a container accumulator. Accumulators are described in detail in a later section.
The default value of each base type is shown in the table below. The default value is the initial value of a base type variable (see Section "Variable Types" for more details), or the default return value for some functions (see Section "Operators, Functions, and Expressions" for more details).
The first seven types (INT, UINT, FLOAT, DOUBLE, BOOL, STRING, and DATETIME) are the same ones mentioned in the "Attribute Data Types" section of GSQL Language Reference, Part 1.
FLOAT and DOUBLE input values must be in fixed point d.dddd format, where d is a digit. Output values will be printed in either fixed point or exponential notation, whichever is more compact.
The GSQL Loader can read FLOAT and DOUBLE values with exponential notation (e.g., 1.25E-7).
Vertex and edge are the two types of objects which form a graph. A query parameter or variable can be declared as either of these two types. In addition, the schema for the graph defines specific vertex and edge types. The parameter or variable type can be restricted by giving the vertex/edge type in angle brackets <> after the keyword VERTEX or EDGE. A vertex or edge variable declared without a specifier is called a generic type. Below are examples of generic and typed vertex and edge variable declarations:
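For instance (person and friendship are hypothetical schema type names):

```gsql
VERTEX anyVertex;        // generic vertex variable
VERTEX<person> v1;       // restricted to vertex type person
EDGE anyEdge;            // generic edge variable
EDGE<friendship> e1;     // restricted to edge type friendship
```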
The following table maps vertex or edge attribute types in the Data Definition Language (DDL) to GSQL query language types. If an attribute of a vertex or edge is referenced in a GSQL query, it is automatically converted to the corresponding data type in the GSQL query language.
SET and LIST literals
In the GSQL query language, one cannot declare a variable of SET (vertex sets are an exception), LIST, or MAP type. However, one can still use SET and LIST literals to update the value of a vertex attribute of type SET or LIST, insert a vertex or edge with attributes of type SET or LIST, and initialize an accumulator.
Currently, GSQL query language syntax does not support MAP literals.
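As a hedged sketch of the literal syntax (assuming parentheses for SET literals and square brackets for LIST literals when initializing accumulators):

```gsql
SetAccum<INT> @@example_set = (1, 2, 3);        // initialized with a SET literal
ListAccum<STRING> @@example_list = ["a", "b"];  // initialized with a LIST literal
```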
JSONOBJECT and JSONARRAY
These two base types allow users to pass a complex data object or to write output in a customized format. These types follow the industry-standard definition of JSON. A JSONOBJECT instance's external representation (as input and output) is a string, starting and ending with curly braces ({}) which enclose an unordered list of key-value pairs. A JSONARRAY is represented as a string, starting and ending with square brackets ([]) which enclose an ordered list of values. Since a value can be an object or an array, JSON supports hierarchical, nested data structures.
More details are introduced in the Section JSONOBJECT and JSONARRAY Functions.
A JSONOBJECT or JSONARRAY value is immutable. No operator is allowed to modify its value.
A tuple is a user-defined data structure consisting of a fixed sequence of base type variables. Tuple types can be created and named using a TYPEDEF statement. Tuples must be defined first, before any other statements in a query.
A tuple can also be defined in a graph schema and then can be used as a vertex or edge attribute type. A tuple type that has been defined in the graph schema does not need to be re-defined in a query.
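A minimal sketch of a tuple type definition (the type and field names here are hypothetical):

```gsql
// Must appear before any other statement in the query.
TYPEDEF TUPLE <STRING name, INT age> Person_Info;
```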
The vertex type person contains two complex attributes:
secretInfo of type SECRET_INFO, which is a user-defined tuple
portfolio of type MAP<STRING, DOUBLE>
The query below reads both the SECRET_INFO tuple and the portfolio MAP. The tuple type SECRET_INFO does not need to be redefined in the query. To read and save the map, we define a MapAccum with the same types for key and value as the portfolio attribute. In addition, the query creates a new tuple type, ORDER_RECORD.
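A hedged sketch of such a query (the query name, graph name, ORDER_RECORD fields, and exact accumulation syntax are assumptions, not the manual's original example):

```gsql
CREATE QUERY read_complex_attr(VERTEX<person> p) FOR GRAPH My_Graph {
  TYPEDEF TUPLE <STRING order_name, DOUBLE price> ORDER_RECORD;  // new tuple type
  SetAccum<SECRET_INFO> @@info;              // schema tuple: no re-definition needed
  MapAccum<STRING, DOUBLE> @@portfolio_map;  // same key/value types as portfolio

  Start = {p};
  S = SELECT v FROM Start:v
      ACCUM @@info += v.secretInfo,
            @@portfolio_map += v.portfolio;
  PRINT @@info, @@portfolio_map;
}
```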
STRING COMPRESS
STRING COMPRESS is an integer type encoded by the system to represent string values. STRING COMPRESS uses less memory than STRING. The STRING COMPRESS type is designed to act like STRING: data are loaded and printed just as string data, and most functions and operators which take STRING input can also take STRING COMPRESS input. The difference is in how the data are stored internally. A STRING COMPRESS value can be obtained from a STRING_SET COMPRESS or STRING_LIST COMPRESS attribute or from converting a STRING value.
Using STRING COMPRESS instead of STRING is a trade-off: smaller storage vs. slower access times. The storage space will only be smaller if (1) the original strings are long, and (2) there are only a small number of different strings. Performance will always be slower; the slowdown is greater if the STRING COMPRESS attributes are accessed more often.
We recommend performing comparison tests for both performance and memory usage before settling on STRING COMPRESS.
The STRING COMPRESS type is beneficial for sets of string values when the same values are used multiple times. In practice, STRING COMPRESS is most useful for container accumulators like ListAccum<STRING COMPRESS> or SetAccum<STRING COMPRESS>.
An accumulator containing STRING COMPRESS values stores the dictionary when it is assigned an attribute value or the value of another accumulator containing STRING COMPRESS. An accumulator containing STRING COMPRESS values can store multiple dictionaries. A STRING value can be converted to a STRING COMPRESS value only if the value is in the dictionaries. If the STRING value is not in the dictionaries, the original string value is saved. A STRING COMPRESS value can be automatically converted to a STRING value.
When a STRING COMPRESS value is output (e.g., by a PRINT statement), it is shown as a STRING.
STRING COMPRESS is not a base type.
FILE Object
A FILE object is a sequential data storage object, associated with a text file on the local machine.
When referring to a FILE object, we always capitalize the word FILE to distinguish it from ordinary files.
When a FILE object is declared and associated with a particular text file, any existing content in the text file is erased. During the execution of the query, content written to the FILE object is appended to it. When the query in which the FILE object was declared finishes running, the FILE contents are saved to the text file.
A FILE object can be passed as a parameter to another query. When a query receives a FILE object as a parameter, it can append data to that FILE object, as can every other query which receives this FILE object as a parameter.
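A minimal FILE object sketch (the graph name and file path are placeholders):

```gsql
CREATE QUERY file_example(STRING file_path) FOR GRAPH Social_Net {
  FILE f (file_path);        // any existing content of the text file is erased here
  f.println("id", "score");  // written content is appended to the FILE object
}                            // contents are saved to the text file at query end
```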
Input parameters to a query can be of base type (except EDGE, JSONARRAY, or JSONOBJECT). A parameter can also be a SET or BAG which uses a base type (except EDGE, JSONARRAY, or JSONOBJECT) as the element type. A FILE object can also be a parameter. Within the query, SET and BAG are converted to SetAccum and BagAccum, respectively.
A query parameter is immutable. It cannot be assigned a new value within the query.
The FILE object is a special case. It is passed by reference, meaning that the receiving query gets a link to the original FILE object. The receiving query can write to the FILE object.
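Putting the parameter rules together, a query signature might look like this sketch (all names are hypothetical):

```gsql
// seeds arrives as a SetAccum<VERTEX<person>>; all parameters are immutable,
// except that the FILE object may be written to.
CREATE QUERY param_example(SET<VERTEX<person>> seeds, INT depth, FILE out_file)
FOR GRAPH Social_Net {
  Start = {seeds};
  out_file.println(Start.size());
}
```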
This page lists the methods of a JSONARRAY object. Methods can be accessed via the dot (.) operator.
getBool()
jsonarray.getBool( idx )
Returns the boolean value at a specified index.
BOOL
getDouble()
jsonarray.getDouble( idx )
Returns the double at a specified index.
DOUBLE
getInt()
jsonarray.getInt( idx )
Returns the integer value at a specified index.
INT
getJsonArray()
jsonarray.getJsonArray( idx )
Returns the JSONARRAY value at a specified index.
JSONARRAY
getJsonObject()
jsonarray.getJsonObject( idx )
Returns the JSONOBJECT value at a specified index.
JSONOBJECT
getString()
jsonarray.getString( idx )
Returns the string value at a specified index.
STRING
size()
jsonarray.size()
Returns the size of the array.
INT
None.
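The methods above can be sketched as follows (assuming a JSONARRAY built with parse_json_array() and 0-based indices):

```gsql
JSONARRAY jarr = parse_json_array("[1, \"two\", true]");
PRINT jarr.getInt(0);     // 1
PRINT jarr.getString(1);  // "two"
PRINT jarr.getBool(2);    // true
PRINT jarr.size();        // 3
```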
This page lists DATETIME functions that are available in the GSQL query language. Every function on this page either takes a DATETIME object as its argument or returns a DATETIME object.
datetime_add()
datetime_add( date, INTERVAL int_value time_unit )
Calculates a new DATETIME by adding a time interval (a specified number of a specified time unit) to a given DATETIME value. INTERVAL is a keyword that must be entered exactly as shown. time_unit is one of the keywords YEAR, MONTH, DAY, HOUR, MINUTE, or SECOND.
DATETIME
datetime_diff()
datetime_diff( date1, date2 )
Calculates the difference in seconds between two DATETIME values.
INT
datetime_format()
datetime_format(date[, str])
Prints a DATETIME value in a specific format indicated by a string.
STRING
datetime_sub()
datetime_sub(date, INTERVAL int_value time_unit)
Calculates a new DATETIME by subtracting a time interval (a specified number of a specified time unit) from a given DATETIME value. INTERVAL is a keyword that must be entered exactly as shown. time_unit is one of the keywords YEAR, MONTH, DAY, HOUR, MINUTE, or SECOND.
DATETIME
datetime_to_epoch()
datetime_to_epoch( date )
Converts a DATETIME value to epoch time.
INT
day()
day( date )
Returns the day of the month of a DATETIME value.
INT
epoch_to_datetime()
epoch_to_datetime(int_value)
Converts an epoch time value to a DATETIME value.
DATETIME
hour()
hour(date)
Extracts the hour of the day from a DATETIME value.
INT
minute()
minute(date)
Extracts the minute of the hour from a DATETIME value.
INT
month()
month(date)
Extracts the month of the year from a DATETIME value.
INT
now()
now()
Returns the current time as a DATETIME value.
DATETIME
None.
second()
second(date)
Extracts the second from a DATETIME value.
INT
year()
year(date)
Extracts the year from a DATETIME value.
INT
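A few of the functions above, sketched together (output comments assume the stated input value):

```gsql
DATETIME dt = to_datetime("2001-02-03 04:05:06");
PRINT year(dt), month(dt), day(dt);             // 2001, 2, 3
PRINT hour(dt), minute(dt), second(dt);         // 4, 5, 6
PRINT datetime_add(dt, INTERVAL 1 MONTH);       // one month later
PRINT datetime_diff(dt, epoch_to_datetime(0));  // seconds since 1970-01-01
```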
| Type | Default value | Literal |
| INT | 0 | A signed integer: -3 |
| UINT | 0 | An unsigned integer: 5 |
| FLOAT | 0 | A decimal: 3.14159 |
| DOUBLE | 0 | A decimal with greater precision than FLOAT |
| BOOL | false | TRUE or FALSE |
| STRING | "" | Characters enclosed by double quotes: "Hello" |
| DATETIME | 1970-01-01 00:00:00 | No literal. Can be converted from a correctly formatted string with to_datetime(). |
| VERTEX | "Unknown" | No literal. |
| EDGE | No edge: {} | No literal. |
| JSONOBJECT | An empty object: {} | No literal. Can be converted from a correctly formatted string with parse_json_object(). |
| JSONARRAY | An empty array: [] | No literal. Can be converted from a correctly formatted string with parse_json_array(). |
| DDL | GSQL Query |
| INT | INT |
| UINT | UINT |
| FLOAT | FLOAT |
| DOUBLE | DOUBLE |
| BOOL | BOOL |
| STRING | STRING |
| STRING COMPRESS | STRING |
| SET< type > | SetAccum< type > |
| LIST< type > | ListAccum< type > |
| MAP <key_type, value_type> | MapAccum <key_type, value_type> |
| DATETIME | DATETIME |
getBool(), getDouble(), getInt(), getJsonArray(), getJsonObject(), getString()
| Parameter | Description | Data type |
| idx | The index of the value to return | INT |

datetime_add(), datetime_sub()
| Parameter | Description | Data type |
| date | The DATETIME value to add to or subtract from | DATETIME |
| int_value | An integer value | INT |

datetime_diff()
| Parameter | Description | Data type |
| date1 | A DATETIME value | DATETIME |
| date2 | A DATETIME value | DATETIME |

datetime_format()
| Parameter | Description | Data type |
| date | A DATETIME value | DATETIME |
| str | A string pattern expressing the format to print (optional; has a default value) | STRING |

datetime_to_epoch(), day(), hour(), minute(), month(), second(), year()
| Parameter | Description | Data type |
| date | A DATETIME value | DATETIME |

epoch_to_datetime()
| Parameter | Description | Data type |
| int_value | An epoch time value | INT |
| EBNF term | Description | Use Case |
| baseType | INT, UINT, FLOAT, DOUBLE, STRING, BOOL, DATETIME, VERTEX, EDGE, JSONOBJECT, or JSONARRAY | Global variable; query return value |
| tupleType | Sequence of base types | User-defined tuple |
| accumType | Family of specialized data objects which support accumulation operations | Accumulate and aggregate data when traversing a set of vertices or edges (details are in the Query Lang Spec - Accumulators chapter) |
| FILE | FILE object | Global sequential data object, linked to a text file |
| parameterType | baseType (except EDGE or JSONOBJECT), a set or bag of baseType, or a FILE object | Query parameter |
| STRING COMPRESS | STRING COMPRESS | More compact storage of STRING, if there is a limited number of different values and the values are rarely accessed; otherwise, it may use more memory |
| elementType | baseType, STRING COMPRESS, or identifier | Element for most types of container accumulators: SetAccum, BagAccum, GroupByAccum, key of a MapAccum element |
| type | baseType, STRING COMPRESS, identifier, or accumType | Element of a ListAccum, value of a MapAccum element; local variable |
This page lists the mathematical functions that are available in the GSQL query language. They are divided into three categories:
General
Logarithmic
Trigonometric
abs()
abs( num )
Returns the absolute value of a number.
Number
ceil()
ceil(num)
Rounds a number up to the smallest integer that's greater than or equal to the number.
INT
exp()
exp(num)
Returns the base-e exponential of a number.
FLOAT
float_to_int()
float_to_int (num)
Converts a floating-point number to an integer by truncating the fractional part.
INT
floor()
floor(num)
Rounds a number down to the biggest integer that is smaller than or equal to the number.
INT
fmod()
fmod(numer, denom)
Returns the floating-point remainder of numer divided by denom.
FLOAT
ldexp()
ldexp(x, exp)
Returns x multiplied by 2 raised to the power of exp.
FLOAT
pow()
pow(base, exp)
Returns base raised to the power of exp.
FLOAT
sqrt()
sqrt(num)
Returns the square root of a number
FLOAT
log()
log(num)
Returns the natural logarithm of a number (base e).
FLOAT
log10()
log10(num)
Returns the common logarithm of a number (base 10).
FLOAT
acos()
acos(num)
Returns the arc cosine of a number.
FLOAT
asin()
asin(num)
Returns the arc sine of a number.
FLOAT
atan()
atan(num)
Returns the arctangent of a number.
FLOAT
atan2()
atan2(y, x)
Returns the arctangent of a fraction.
FLOAT
cos()
cos(num)
Returns the cosine of a number.
FLOAT
cosh()
cosh(num)
Returns the hyperbolic cosine of a number.
FLOAT
sin()
sin(num)
Returns the sine of a number.
FLOAT
sinh()
sinh(num)
Returns the hyperbolic sine of a number.
FLOAT
tan()
tan(num)
Returns the tangent of a number.
FLOAT
tanh()
tanh(num)
Returns the hyperbolic tangent of a number.
FLOAT
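A few of the general functions above, sketched (output comments follow from the definitions):

```gsql
PRINT abs(-3);        // 3
PRINT ceil(3.2);      // 4
PRINT floor(3.7);     // 3
PRINT pow(2, 10);     // 1024
PRINT fmod(10.5, 3);  // 1.5
```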
In GSQL, users can supplement the language by defining their own query user-defined functions (query UDFs). Query UDFs can be called in queries and subqueries to perform a set of defined actions and return a value, like the built-in functions.
This page introduces the process of defining a query UDF. Once defined, the new functions are added to GSQL automatically the next time GSQL is executed.
Below are the steps to add a Query UDF to GSQL:
Use the GET ExprFunctions command in GSQL to download the current UDF file to any location on your machine. The file and the directories will be created if they do not exist, and the file must end with the file extension .hpp:
If your query UDF requires a user-defined struct or helper function, also use the GET ExprUtil command to download the current ExprUtil file:
Define the C++ function in the UDIMPL namespace of the UDF file you downloaded in Step 1. The definition of the function should include the keyword inline. Only bool, int, float, double, and string (NOT std::string) are allowed as the return value type and the function argument types. However, any C++ type is allowed inside the function body.
If the function requires a user-defined struct or helper function, define it in the ExprUtil file you downloaded in Step 1.
Below is an example of a query UDF definition:
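The original example is not reproduced here; as a hedged stand-in, a minimal UDF might look like the following (the function name and the standalone `string` alias are assumptions for illustration):

```cpp
#include <string>

namespace UDIMPL {
  // In the real ExprFunctions.hpp the engine headers supply the `string`
  // type; this alias stands in for it so the sketch compiles on its own.
  typedef std::string string;

  // Hypothetical example UDF: returns the reverse of the input string.
  inline string str_reverse(string input) {
    return string(input.rbegin(), input.rend());
  }
}
```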
If any code in ExprFunctions.hpp or ExprUtil.hpp causes a compilation error, GSQL cannot install any GSQL query, even if the query doesn't call any query UDF. Therefore, please test each new query UDF after adding it. One way of testing a function is to create a new file test.cpp and compile it:
> g++ test.cpp
> ./a.out
You might need to remove the include header #include <gle/engine/cpplib/headers.hpp> in ExprFunctions.hpp and ExprUtil.hpp in order to compile.
After you have defined the function, use the PUT command to upload the files you modified.
The PUT command will automatically upload the files to all nodes in a cluster. Once the files are uploaded, you will be able to call the query UDF the next time GSQL is executed. This includes the next time you start the GSQL shell or execute GSQL scripts from a bash shell.
Suppose you are working in a distributed environment and want to add a function that returns a random double between 0 and 1.
Start by downloading the current UDF file with the GET command:
In the downloaded file, add the function definition for the function rng and add the necessary include directives at the top:
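A sketch of such a definition (this uses the C standard library generator for brevity; production code may prefer the facilities in <random>):

```cpp
#include <cstdlib>

namespace UDIMPL {
  // Hypothetical rng UDF: returns a pseudo-random double in the range [0, 1].
  inline double rng() {
    return (double) std::rand() / RAND_MAX;
  }
}
```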
Lastly, use the PUT command to upload the file. This will upload the file to all nodes in the cluster:
The UDF has now been added to GSQL and you can start using the function in GSQL queries.
abs()
| Parameter | Description | Data type |
| num | The number to return the absolute value for | Number |

ceil()
| Parameter | Description | Data type |
| num | The number to round up from | Number |

exp()
| Parameter | Description | Data type |
| num | The exponent | Number |

float_to_int()
| Parameter | Description | Data type |
| num | The floating-point number to convert to an integer | FLOAT |

floor()
| Parameter | Description | Data type |
| num | The number to round down from | Number |

fmod()
| Parameter | Description | Data type |
| numer | The dividend | Number |
| denom | The divisor | Number |

ldexp()
| Parameter | Description | Data type |
| x | The base | Number |
| exp | The exponent of 2 | Number |

pow()
| Parameter | Description | Data type |
| base | The base | Number |
| exp | The exponent | Number |

sqrt()
| Parameter | Description | Data type |
| num | The number to get the square root for | Number |

log()
| Parameter | Description | Data type |
| num | The number to compute the natural logarithm for | Number |

log10()
| Parameter | Description | Data type |
| num | The number to compute the common logarithm for | Number |

acos()
| Parameter | Description | Data type |
| num | The number to compute arccosine for | Number |

asin()
| Parameter | Description | Data type |
| num | The number to compute arcsine for | Number |

atan()
| Parameter | Description | Data type |
| num | The number to compute arctangent for | Number |

atan2()
| Parameter | Description | Data type |
| y | The dividend of the fraction to compute arctangent for | Number |
| x | The divisor of the fraction to compute arctangent for | Number |

cos()
| Parameter | Description | Data type |
| num | The number to compute cosine for | Number |

cosh()
| Parameter | Description | Data type |
| num | The number to compute hyperbolic cosine for | Number |

sin()
| Parameter | Description | Data type |
| num | The number to compute sine for | Number |

sinh()
| Parameter | Description | Data type |
| num | The number to compute hyperbolic sine for | Number |

tan()
| Parameter | Description | Data type |
| num | The number to compute tangent for | Number |

tanh()
| Parameter | Description | Data type |
| num | The number to compute hyperbolic tangent for | Number |
No computer can store all floating point numbers (i.e., non-integers) with perfect precision. The float data type offers about 7 decimal digits of precision; the double data type offers about 15 decimal digits of precision. Comparing two float or double values by using operators involving exact equality (==, <=, >=, BETWEEN ... AND ...) might lead to unexpected behavior. If the GSQL language parser detects that the user is attempting an exact equivalence test with float or double data types, it will display a warning message and suggestion. For example, if there are two float variables v and v2, the expression v == v2 causes the following warning message:
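One common workaround (a sketch, not mandated by GSQL) is to compare against a small tolerance instead of testing exact equality:

```gsql
DOUBLE epsilon = 0.0001;
BOOL nearly_equal = (abs(v - v2) < epsilon);  // instead of v == v2
```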
Response to Non-existent vertex ID
If a query has a vertex parameter (VERTEX or VERTEX<vType>), and if the ID for a nonexistent vertex is given when running the query, an error message is shown, and the query won't run. This is also the response when calling a function to convert a single vertex ID string to a vertex:
to_vertex(): See Section "Miscellaneous Functions".
However, if the parameter is a vertex set (SET<VERTEX> or SET<VERTEX<vType>>), and one or more nonexistent IDs are given when running the query, a warning message is shown, but the query still runs, ignoring those nonexistent IDs. Therefore, if all given IDs are nonexistent, the parameter becomes an empty set. This is also the response when calling a function to convert a set of vertex IDs to a set of vertices:
to_vertex_set(): See Section "Miscellaneous Functions".
SelectVertex(): See Section "Miscellaneous Functions".