Version 2.2. This work is licensed under a Creative Commons Attribution 4.0 International License.
The PRINT statement specifies output data. Each execution of a PRINT statement adds a JSON object to the results array which will be part of the query output. A PRINT statement can appear anywhere that query-body statements are permitted.
A PRINT statement does not trigger immediate output. The full set of data from all PRINT statements is delivered at one time, when the query concludes.
Each PRINT statement contains a list of expressions for output data. The optional WHERE clause filters the output. If the condition is false for any items, then those items are excluded from the output.
Each printExpr contributes one key-value pair to the PRINT statement's JSON object result. The optional AS clause sets the key for the expression, overriding the default key (explained below).
Each printExpr may be one of the following:
A literal value
A global or local variable (including VERTEX and EDGE variables)
An attribute of a vertex variable, e.g., Person.name
A global accumulator
An expression whose terms are among the types above. The following operators may be used:
Parentheses can be used for controlling order of precedence.
A vertex set variable
A vertex expression set vExprSet (only available if the output API is set to "v2"). Vertex expression sets are explained in a separate section below.
In output API v2, the print expression list can be a mixed list of any of the expression types. In output API v1, vertex set variables cannot be on the same PRINT statement with other types of expressions.
If a printExpr includes the optional AS name clause, then the name sets the key for that expression in the JSON output. Otherwise, the following rules determine the key: If the expression is simply a single variable (local variable, global variable, global accumulator, or vertex set variable), then the key is the variable name. Also, for a vertex expression set, the key is the vertex set variable name. Otherwise, the key is the entire expression, represented as a string.
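A sketch of these key-naming rules (the variable names below are hypothetical):

```gsql
INT x = 5;
SumAccum<INT> @@total;
PRINT x;                     # key is "x" (single variable)
PRINT x + @@total AS mySum;  # AS clause: key is "mySum"
PRINT x * 10;                # no AS clause: key is the string "x * 10"
```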
Each data type has a distinct output format.
Simple numeric, string, and boolean data types follow JSON standards.
Lists, sets, bags, and arrays are printed as JSON arrays (i.e., a list enclosed in square brackets).
Maps and tuples are printed as JSON objects (i.e., a list of key:value pairs enclosed in curly braces).
Vertices and edges have a custom JSON object, shown below.
A vertex set variable is treated as a list of vertices.
Accumulator output format is determined by the accumulator's return type. For example, an AvgAccum outputs a DOUBLE value, and a BitwiseAndAccum outputs an INT value. For container accumulators, simply consider whether the output is a list, set, bag, or map.
ListAccum, SetAccum, BagAccum, ArrayAccum: list
MapAccum: map
HeapAccum, GroupByAccum: list of tuples
Full details of vertices are printed only when part of a vertex set variable or vertex expression set. When a single vertex is printed (from a variable or accumulator whose data type happens to be VERTEX), only the vertex id is printed.
Vertex (when not part of a vertex set variable)
The output is just the vertex id as a string:
Vertex (as part of a vertex set variable)
Edge
List, Set or Bag
Map
Tuple
Vertex Set Variable
A vertex expression set is a list of expressions which is applied to each vertex in a vertex set variable. The expression list is used to compute an alternative set of values to display in the "attributes" field of each vertex.
The easiest way to understand this is to consider examples containing only one term and then consider combinations. Consider the following example query. C is a vertex set variable containing the set of all company vertices. Furthermore, each vertex has a vertex-attached accumulator @count.
If we print the full vertex set, the "attributes" field of each vertex will contain 3 fields: "id", "country", and "@count". Now consider some simple vertex expression sets:
PRINT C[C.country]
prints the vertex set variable C, except that the "attributes" field will contain only "country", instead of 3 fields.
PRINT C[C.@count]
prints the vertex set variable C, except that the "attributes" field will contain only "@count", instead of 3 fields.
PRINT C[C.id, C.@count]
prints the vertex set variable C, except that the "attributes" field will contain only "id" and "@count".
PRINT C[C.id+"_ex", C.@count+1]
prints the vertex set variable C, except that the "attributes" field contains the following:
One field consists of each vertex's id value, with the string "_ex" appended to it.
Another field consists of the @count value incremented by 1. Note: the value of @count itself has not changed, only the displayed value is incremented.
The last example illustrates the general format for a vertex expression set:
The vertex expression set begins with the name of a vertex set variable. It is followed by a list of attribute expressions, enclosed in square brackets. Each attribute expression follows the same rules described earlier in the Print Expressions section. That is, each attribute expression may refer to one or more attributes or vertex-attached accumulators of the current vertices, as well as literals, local or global variables, and global accumulators. The allowed operators (for numeric, string, or set operations) are the same ones mentioned above.
The key for the vertex expression set is the vertex set variable name.
The value for the vertex expression set is a modified vertex set variable, where the regular "attributes" value for each vertex is replaced with a set of key:value pairs corresponding to the set of attribute expressions given in the print expression.
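Putting these pieces together, the general form can be sketched as follows (using the hypothetical vertex set variable C from the examples above):

```gsql
# General form:  vertexSetVarName [ attrExpr1, attrExpr2, ... ]
PRINT C[C.id + "_ex", C.@count + 1];  # key is "C"; each vertex's
                                      # "attributes" field holds the
                                      # two computed values
```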
An example which shows all of the cases described above, in combination, is shown below.
Note how the results of the six PRINT statements are grouped in the JSON "results" field below:
Each of the six PRINT statements is represented as one JSON object with the "results" array.
When a PRINT statement has more than one expression (like the first one), the expressions may appear in the output in a different order than on the PRINT statement.
The 2nd PRINT statement shows a key that is generated from the expression itself.
The 3rd and 4th PRINT statements show a set of vertices (different than a vertex set variable) and a map, respectively.
The 5th PRINT statement shows the vertex set variable A, including its vertex-attached accumulators (PRINT A).
The 6th PRINT statement shows a vertex set expression for A, customized to include only one static attribute plus a newly computed attribute.
Instead of printing output in JSON format, output can be written to a FILE object in comma-separated values (CSV) format. To select this option, at the end of the PRINT statement, include the keyword TO_CSV followed by the FILE object name:
The angle bracket > is no longer supported for directing output to a file or FILE object. You must use the keyword TO_CSV.
Each execution of the PRINT statement appends one line to the FILE. If the PRINT statement includes multiple expressions, then each printed value is separated from its neighbor by a comma. If an expression evaluates to a set or list, then the collection's values are delimited by single spaces. Due to the simpler format of CSV vs. JSON, the TO_CSV feature only supports data with a simple one- or two-dimension structure.
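A minimal sketch of this usage (the file path and variable names are assumptions, not taken from the text):

```gsql
CREATE QUERY csvExample() FOR GRAPH socialNet {
  FILE f ("/home/tigergraph/output.csv");  # hypothetical absolute path
  INT x = 10;
  STRING s = "hello";
  PRINT x, s TO_CSV f;  # appends one line: 10,hello
}
```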
Limitations of printing to a FILE object
Printing a full Vertex set variable is not supported.
If a vertex is printed, only its ID value is printed.
If printing a vertex set's vertex-attached accumulator or attribute, the result is a list of values, one for each vertex, separated by newlines.
The syntax for printing a vertex set expression is currently different when printing to a file than when printing to standard output. Compare:
PRINT A[A.gender]; # with brackets
PRINT A.gender TO_CSV file1; # without brackets
Writing to FILE objects is optimized for parallel processing. Consequently, the order in which data is written to the FILE is not guaranteed. Therefore, it is strongly recommended that the user design their queries such that one of these conditions is satisfied:
The query prints only one set of data, and the order of the set is not important.
Each line of data to print to a file includes a label which can be used to identify the data.
Instead of printing CSV output to a FILE object, data can be written to a regular file.
This feature is deprecated because printing to a FILE object covers the same functionality.
The table below shows the differences between printing TO_CSV <FILE object> vs. TO_CSV <filepath>.
One of the two ways to write data to a FILE object is with the FILE println statement. (The other way is with the PRINT statement's TO_CSV option.)
println is a method (function) of a FILE object variable. The println statement can be used either at the query-body level or as a DML-sub-statement, e.g., within the ACCUM clause of a SELECT block. Each time println is called, it adds one new line of values to the FILE object, and then to the corresponding file.
The println function can print most of the expressions handled by PRINT. Note, however, that this does not include vertex expression sets (vExprSet). If the println statement has a list of expressions to print, then this will produce a comma-separated list of values. If an expression refers to a list or set, then the output will be a list of values separated by spaces, the same format produced by TO_CSV.
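A hedged sketch of println at both levels (the graph, vertex type, and attribute names are assumptions):

```gsql
CREATE QUERY printlnExample(FILE f) FOR GRAPH socialNet {
  Start = {person.*};               # "person" vertex type assumed
  f.println("id", "friend_count");  # query-body level: header line
  S = SELECT v FROM Start:v
      POST-ACCUM f.println(v.id, v.outdegree());  # one line per vertex;
                                                  # order not guaranteed
}
```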
The data from query-body level FILE print statements (either TO_CSV or println) will appear in their original order. However, due to the parallel processing of statements in an ACCUM block, the order in which println statements at the DML-sub-statement level are processed cannot be guaranteed. Moreover, the output from println statements in an ACCUM block can be interspersed with the query-body statements.
All of the PRINT statements in this example use the TO_CSV option, so there is no JSON output to the console.
All the output in this case goes to the FILE object. In the query definition, the footer is the last FILE statement, but the println statements from the SELECT block happen to be delayed and are printed AFTER the footer line.
A FILE Object can be passed from one query to a subquery. The subquery can then also write to the FILE object.
The LOG statement is another means to output data. It works as a function that outputs information to a log file.
The first argument of the LOG statement is a boolean condition that enables or disables logging. This allows logging to be easily turned on/off, for uses such as debugging. After the condition, LOG takes one or more expressions (separated by commas). These expressions are evaluated and output to the log file.
Unlike the PRINT statement, which can only be used as a query-body statement, the LOG statement can be used as both a query-body statement and a DML-sub-statement.
The values will be recorded in the GPE log. To find the log file after the query has completed, open a Linux shell and use the command "gadmin log gpe". It may show you more than one log file name; use the one ending in "INFO". Search this file for "UDF_".
The RETURN statement specifies data that a sub-query passes back to the outer query that called it. In order for a query to be used as a sub-query, its initial CREATE QUERY statement must include the optional RETURNS clause, and its body must end with a RETURN statement. Exactly one type is allowed in the RETURNS clause, and thus the RETURN statement can only return one expression. The returned expression must have the same type as the RETURNS clause indicates. A sub-query must be created before its corresponding super-query, and must be installed either before, or in the same INSTALL QUERY command as, its super-query.
The return type can be any base type or any accumulator type, except GroupByAccum and any accumulator containing any tuple type. For the purposes of return type, SetAccum is equivalent to SET, and BagAccum is equivalent to BAG. A vertex set variable can be returned if SET<VERTEX<type>> or SetAccum<VERTEX<type>> (<type> is optional) is used in the RETURNS clause.
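A minimal sub-query/super-query sketch under these rules (the graph, vertex, and edge type names are assumptions):

```gsql
# Sub-query: the RETURNS clause declares the single return type.
CREATE QUERY countFriends(VERTEX<person> seed) FOR GRAPH socialNet RETURNS (INT) {
  SumAccum<INT> @@n;
  Start = {seed};
  S = SELECT v FROM Start:s -(friend:e)- person:v
      ACCUM @@n += 1;
  RETURN @@n;  # must match the RETURNS type
}

# Super-query: created after the sub-query; installed with or after it.
CREATE QUERY showFriendCount(VERTEX<person> seed) FOR GRAPH socialNet {
  INT n = countFriends(seed);
  PRINT n;
}
```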
See also Section 5.11 - Queries as Functions.
Numeric
Arithmetic: + - * / %
Bit: << >> & |
String
Concatenation: +
Set
UNION INTERSECT MINUS
                              TO_CSV <FILE object>                      TO_CSV <filepath>
When filepath is specified    Run-time or compile-time, depending on    Compile-time
                              how the user chooses to write the query
Vertex IDs                    Displayed correctly                       Displayed as TigerGraph internal ID codes
Append or overwrite           Appends, but a FILE object declaration    Always appends
                              resets the FILE
Filepath: absolute/relative   Currently only absolute                   Absolute or relative
No computer can store all floating point numbers (i.e., non-integers) with perfect precision. The float data type offers about 7 decimal digits of precision; the double data type offers about 15 decimal digits of precision. Comparing two float or double values by using operators involving exact equality (==, <=, >=, BETWEEN ... AND ...) might lead to unexpected behavior. If the GSQL language parser detects that the user is attempting an exact equivalence test with float or double data types, it will display a warning message and suggestion. For example, if there are two float variables v and v2, the expression v == v2 causes the following warning message:
Response to Non-existent vertex ID
If a query has a vertex parameter (VERTEX or VERTEX<vType>), and if the ID for a nonexistent vertex is given when running the query, an error message is shown, and the query won't run. This is also the response when calling a function to convert a single vertex ID string to a vertex:
to_vertex(): See Section "Miscellaneous Functions".
However, if the parameter is a vertex set (SET<VERTEX> or SET<VERTEX<vType>>), and one or more nonexistent IDs are given when running the query, a warning message is shown, but the query still runs, ignoring those nonexistent IDs. Therefore, if all given IDs are nonexistent, the parameter becomes an empty set. This is also the response when calling a function to convert a set of vertex IDs to a set of vertices:
to_vertex_set(): See Section "Miscellaneous Functions".
SelectVertex(): See Section "Miscellaneous Functions".
This is the definition for the GSQL Query Language syntax. It is defined as a set of rules expressed in EBNF notation.
This defines the EBNF notation used to describe the syntax. Rules contain terminal and non-terminal symbols. A terminal symbol is a base-level symbol which expresses literal output. All symbols in single or double quotes (e.g., '+', "=", ")", "10") are terminal symbols. A non-terminal symbol is defined as some combination of terminal and non-terminal symbols. The left-hand side of a rule is always a non-terminal; this rule defines the non-terminal. The example rule below defines assignmentStmt (that is, an Assignment Statement) to be a name followed by an equal sign followed by an expression, operator, and expression with a terminating semicolon. assignmentStmt, name, and expr are all non-terminals. Additionally, all KEYWORDS are in all-capitals and are terminal symbols. The ":=" is part of EBNF and states that the left-hand side can be expanded to the right-hand side.
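Based on that description, the example rule can be reconstructed as:

```ebnf
assignmentStmt := name "=" expr op expr ";"
```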
A vertical bar | in EBNF indicates choice. Choose either the symbol on the left or on the right. A sequence of vertical bars means choose any one of the symbols in the sequence.
Square brackets [ ] indicate an optional part or group of symbols. Parentheses ( ) group symbols together. The rule below defines a constant to be one, two, or three digits preceded by an optional plus or minus sign.
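A reconstruction of the rule described (one to three digits preceded by an optional sign):

```ebnf
constant := ["+" | "-"] digit [digit] [digit]
```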
Star * and plus + are symbols in EBNF for closure. Star means zero or more occurrences, and plus means one or more occurrences. The following defines intConstant to be an optional plus or minus followed by one or more digits. It also defines floatConstant to be an optional plus or minus followed by zero or more digits followed by a decimal followed by one or more digits. The star and plus also can be applied to groups of symbols as in the definition of list. The non-terminal list is defined as a parenthesized list of comma-separated expressions (expr). The list has at least one expression which can be followed by zero or more comma-expression pairs.
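The definitions described above can be sketched as:

```ebnf
intConstant   := ["+" | "-"] digit+
floatConstant := ["+" | "-"] digit* "." digit+
list          := "(" expr ("," expr)* ")"
```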
Curly braces { } enclose an optional group of symbols which are repeated zero or more times. Therefore, curly braces are equivalent to square brackets or parentheses followed by a star * to indicate zero or more repetitions. All of the following expressions are equivalent:
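That is, all of the following denote zero or more repetitions of expr:

```ebnf
{ expr }
[ expr ]*
( expr )*
```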
For brevity, the literal comma is sometimes shown without quotation marks:
An expression is a combination of fixed values, variables, operators, function calls, and groupings which specify a computation, resulting in a data value. This section of the specification describes the literals (fixed values), operators, and functions available in the GSQL query language. It covers the subset of the EBNF definitions shown below. However, more so than in other sections of the specification, syntax alone is not an adequate description. The semantics (functionality) of the particular operators and functions are an essential complement to the syntax.
Each primitive data type supports constant values:
GSQL_UINT_MAX = 2 ^ 64 - 1 = 18446744073709551615
GSQL_INT_MAX = 2 ^ 63 - 1 = 9223372036854775807
GSQL_INT_MIN = -2 ^ 63 = -9223372036854775808
An operator is a keyword token which performs a specific computational function to return a resulting value, using the adjacent expressions (its operands) as input values. An operator is similar to a function in that both compute a result from inputs, but syntactically they are different. The most familiar operators are the mathematical operators for addition + and subtraction - .
Tip: The operators listed in this section are designed to behave like the operators in MySQL.
We support the following standard mathematical operators and meanings. The latter four ("<<" | ">>" | "&" | "|") are for bitwise operations. See the section below: "Bit Operators".
Operator precedences are shown in the following list, from highest precedence to the lowest. Operators that are shown together on a line have the same precedence:
We support the standard Boolean operators and standard order of precedence: AND, OR, NOT
Bit operators (<<, >>, &, and |) operate on integers and return an integer.
Operator + can be used for concatenating strings.
The fields of the tuple can be accessed using the dot operator.
A condition is an expression which evaluates to a boolean value of either true or false. One type of condition uses the familiar comparison operators. A comparison operator compares two numeric values.
The expression expr1 BETWEEN expr2 AND expr3 is true if the value expr1 is in the range from expr2 to expr3, including the endpoint values. Each expression must be numeric.
"expr1 BETWEEN expr2 AND expr3" is equivalent to "expr1 <= expr3 AND expr1 >= expr2".
IS NULL and IS NOT NULL can be used for checking whether an optional parameter is given any value.
Every attribute value stored in GSQL is a valid value, so IS NULL and IS NOT NULL are effective only for query parameters.
The LIKE operator is used for string pattern matching. The expression
string1 LIKE string_pattern
evaluates to boolean true if string1 matches the pattern in string_pattern; otherwise it is false. Both operands must be strings. LIKE may be used only in WHERE clauses. Additionally, string_pattern supports the following wildcard and other symbols, in order to express a pattern:
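For instance, assuming the standard "%" wildcard (matching any sequence of characters):

```gsql
S = SELECT v FROM Start:v
    WHERE v.name LIKE "Jo%";  # matches names beginning with "Jo"
```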
There are a number of built-in functions which act on either an accumulator, a base type, or vertex variable. The accumulator function calls are discussed in detail in the "Accumulators" section.
Below is a list of built-in functions which act on either INT, FLOAT, or DOUBLE value(s).
The following built-in functions are provided for text processing. Note that these functions do not modify the input parameter. They each return a new string.
In the syntax for trim(), the words in bold (LEADING, TRAILING, BOTH, and FROM) are keywords which should appear exactly as shown.
STRING is just an indicator of the datatype; it is not an explicit keyword.
The trim() function has the following options:
By using one of the keywords LEADING, TRAILING, or BOTH, the user can specify that characters are to be removed from the left end, right end, or both ends of the string, respectively. If none of these keywords is used, the function removes characters from both ends.
removal_char is a single character. The function will remove consecutive instances of removal_char until it encounters a different character. If removal_char is not specified, then trim() removes whitespace (spaces, tabs, and newlines).
Notes about the trim() function:
The following functions convert from/to DATETIME to/from other types.
The following function converts a DATETIME value into a string format specified by the user:
The following are other functions related to DATETIME:
JSONOBJECT and JSONARRAY are base types, meaning they can be used as a parameter type, an element type for most accumulators, or a return type. This enables the input and output of complex, customized data structures. For input and output, a string representation of the JSON is used. Hence, the GSQL query language offers several functions to convert a formatted string into JSON and then to search and access the components of a JSON structure.
Data Conversion Functions
The following parsing functions convert a string into a JSONOBJECT or a JSONARRAY:
Both functions generate a run-time error if the input string cannot be converted into a JSON object or a JSON array. To be properly formatted, besides having the proper nesting and matching of curly braces { } and brackets [ ], each value field must be one of the following: a string (in double quotes "), a number, a boolean (true or false), or a JSONOBJECT or JSONARRAY. Each key of a key:value pair must be a string in double quotes.
See examples below.
Data Access Methods
JSONOBJECT and JSONARRAY are object classes, each class supporting a set of data access methods, using dot notation:
jsonVariable.functionName(parameter_list)
The following methods (class functions) can act on a JSONOBJECT variable:
The above get<type>(STRING keyStr) functions generate a run-time error if
The key keyStr doesn't exist, or
The function's return type is different than the stored value type. See the next note about numeric data.
Pure JSON stores "numbers" without distinguishing between INT and DOUBLE, but for TigerGraph, if the input value is all digits, it will be stored as INT. Other numeric values are stored as DOUBLE. The getDouble function can read an INT and return its equivalent DOUBLE value, but it is an error to call getInt for a DOUBLE value.
The following methods can act on a JSONARRAY variable:
Similar to the methods of JSONOBJECT, the above get<type>(INT idx) functions generate a run-time error if
idx is out of bounds, or
The function's return type is different than the stored value type. See the next note about numeric data.
As with the JSONOBJECT methods, getDouble can read an INT and return its equivalent DOUBLE value, but it is an error to call getInt for a DOUBLE value.
Below is an example of using these functions and methods:
Attributes on vertices or edges are defined in the graph schema. Additionally, each vertex and edge has a built-in STRING attribute called type which represents the user-defined type of that edge or vertex. These attributes, including type , can be accessed for a particular edge or vertex with the dot operator.
For example, the following code snippet shows two different SELECT statements which produce equivalent results. The first uses the dot operator on the vertex variable v to access the "subject" attribute, which is defined in the graph schema. The FROM clause in the first SELECT statement requires that any target vertices be of type "post" (also defined in the graph schema). The second SELECT statement checks that the vertex variable v's type is a "post" vertex by using the dot operator to access the built-in type attribute.
Below is a list of built-in functions that can be accessed by vertex aliases, using the dot operator:
Currently, these functions are only available for vertex aliases (defined in the FROM clause); vertex variables do not have these functions.
Note that in order to calculate outdegree by edge type, the graph schema must be defined such that vertices keep track of their edge types using WITH STATS="OUTDEGREE_BY_EDGETYPE" (however, "OUTDEGREE_BY_EDGETYPE" is now the default STATS option).
The optional .FILTER(condition) clause offers an additional filter for selecting which elements are added to the output set of the neighbor, neighborAttribute, and edgeAttribute functions. The condition is evaluated for each element. If the condition is true, the element is added to the output set; if false, it is not. An example is shown below:
Below are the built-in functions that can be accessed by edge aliases, using the dot operator. Edge functions follow the same general rules as vertex functions (see above).
Accumulator functions for each accumulator type are illustrated at the "Accumulator Type" section.
SELECT blocks take an input vertex set and perform various selection and filtering operations to produce an output set. Therefore, set/bag expressions and their operators are a useful and powerful part of the GSQL query language. A set/bag expression can use either SetAccum or BagAccum.
The operators are straightforward: when both operands are sets, the result is a set. When at least one operand is a bag, the result is a bag. If one operand is a bag and the other is a set, the operator treats the set operand as a bag containing one instance of each value.
The result of these operators is another set or bag, so these operations can be nested and chained to form more complex expressions, such as
For example, suppose setBagExpr_A is ("a", "b", "c")
The IN and NOT IN operators support all base types on the left-hand side, and any set/bag expression on the right-hand side. The base type must be the same as the accumulator's element type. IN and NOT IN return a BOOL value.
The following example uses NOT IN to exclude neighbors that are on a blacklist.
The aggregation functions each take a set/bag expression as their input parameter and return one value or element.
count() : Returns the size (INT) of the set.
sum() : Returns the sum of all elements. This is only applicable to a set/bag expression with numeric type.
min() : Returns the member with minimum value. This is only applicable to a set/bag expression with numeric type.
max() : Returns the member with maximum value. This is only applicable to a set/bag expression with numeric type.
avg() : Returns the average of all elements. This is only applicable to a set/bag expression with numeric type. The average is INT if the element type of the set/bag expression is INT.
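A brief sketch of these functions applied to a hypothetical set accumulator:

```gsql
SetAccum<INT> @@ages;
@@ages += 30;
@@ages += 40;
@@ages += 50;
PRINT count(@@ages);  # 3
PRINT sum(@@ages);    # 120
PRINT avg(@@ages);    # 40 (INT result, since the elements are INT)
PRINT max(@@ages);    # 50
```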
SelectVertex() reads a data file which lists particular vertices of the graph and returns the corresponding vertex set. This function can only be used in a vertex set variable declaration statement, as a seed set. The data file must be organized as a table with one or more columns. One column must be for vertex id. Optionally, another column is for vertex type. SelectVertex() has five parameters, explained in the table below: filePath, vertexIdColumn, vertexTypeColumn, separator, and header. The rules for column separators and column headings are the same as for the GSQL Loader.
One vertex set variable declaration statement can have multiple SelectVertex() function calls. However, if a declaration statement has multiple SelectVertex() calls referring to the same file, they must use the same separator and header parameters. If any row of the file contains an invalid vertex type, a run-time error occurs; if any row of the file contains a nonexistent vertex id, a warning message is shown with the count of nonexistent ids.
Below is a query example using SelectVertex calls, reading from the data file selectVertexInput.csv.
to_vertex() and to_vertex_set() convert a string or a string set into a vertex or a vertex set, respectively, of a given vertex type. These two functions are useful when the vertex id(s) are obtained and only known at run-time.
Running these functions requires real-time conversion of an external id to a GSQL internal id, which is a relatively slow process. Therefore,
If the user can always know the id before running the query, define the query with VERTEX or SET<VERTEX> parameters instead of STRING or SET<STRING> parameters, and avoid calling to_vertex() or to_vertex_set().
Calling to_vertex_set() one time is much faster than calling to_vertex() multiple times. Use to_vertex_set() instead of to_vertex() as much as possible.
The first parameter of to_vertex() is the vertex id string. The first parameter of to_vertex_set() is a string set representing vertex ids. The second parameter of both functions is the vertex type string.
to_vertex_set can accept a bag of vertices as input, but the function will reduce the bag to a set by eliminating duplicate items.
If the vertex id or the vertex type doesn't exist, to_vertex() will have a run-time error, as shown below. However, to_vertex_set() will have a run-time error only if the vertex type doesn't exist. If one or more vertex ids are nonexistent, to_vertex_set() will display a warning message but will still run, converting all valid ids and skipping nonexistent vertex ids. If the user wants an error instead of a warning when a nonexistent id is given while converting a string set to a vertex set, the user can use to_vertex() inside a FOREACH loop instead of to_vertex_set(). See the example below.
The getvid(v) function returns the internal ID number of the given vertex v. The internal ID is not the primary_id which the user assigned when creating the vertex. However, there is a 1-to-1 mapping between the external ID (primary_id) and internal ID. The engine can access the internal ID faster than accessing the external ID, so if a query needs unique values for a large number of vertices, but doesn't care about particular values, getvid() can be a useful option.
For example, in many community detection algorithms, we start by assigning every vertex a unique community ID. Then, as the algorithm progresses, some vertices will join the community of one of their neighbors, giving up their current community ID and copying the IDs of their neighbors.
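A minimal sketch of that initialization step (the accumulator and type names are hypothetical):

```gsql
MinAccum<INT> @communityId;
Start = {person.*};
S = SELECT v FROM Start:v
    POST-ACCUM v.@communityId = getvid(v);  # unique internal ID as the
                                            # initial community ID
```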
The COALESCE function evaluates each argument value in order, and returns the first value which is not NULL. This evaluation is the same as that used for IS NULL and IS NOT NULL. The COALESCE function requires all its arguments have the same data type (BOOL, INT, FLOAT, DOUBLE, STRING, or VERTEX). The only exception is that different numeric types can be used together. In this case, all values are converted into the first argument type.
The COALESCE function is useful when multiple optional parameters are allowed, and one of them must be chosen if available. For example,
The COALESCE function's parameter list should have a default value as the last argument. Otherwise, if all values are NULL, the default value of the data type is returned.
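A hedged sketch, assuming two optional INT parameters:

```gsql
CREATE QUERY coalesceExample(INT customAge, INT defaultAge) FOR GRAPH socialNet {
  # Use customAge if supplied; otherwise defaultAge; otherwise the literal 0.
  INT age = COALESCE(customAge, defaultAge, 0);
  PRINT age;
}
```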
The function evaluate() takes a string argument and interprets it as an expression which is evaluated during run-time. This enables users to create a general purpose query instead of separate queries for each specific computation.
The evaluate() function has two parameters: expressionStr is the expression string, and typeStr is a string literal indicating the type of expression. This function returns a value whose type is typeStr and whose value is the evaluation of expressionStr. The following rules apply:
evaluate() can only be used inside a SELECT block, and only inside a WHERE clause, ACCUM clause, POST-ACCUM clause, HAVING clause, or ORDER BY clause. It cannot be used in a LIMIT clause or outside a SELECT block.
The result type must be specified at query installation time: typeStr must be a string literal for a primitive data type, e.g., one of "int", "float", "double", "bool", "string" (case insensitive). The default value is "bool".
In expressionStr, identifiers can refer only to vertex or edge aliases, vertex-attached accumulators, global accumulators, parameters, or scalar function calls involving the above variables. The expression may not refer to local variables, global variables, or to FROM clause vertices or edges by type.
Any accumulators in the expression must be scalar accumulators (e.g., MaxAccum) for primitive-type data. Container accumulators (e.g., SetAccum) or scalar accumulators with non-primitive type (e.g. VERTEX, EDGE, DATETIME) are not supported. Container type attributes are not supported.
evaluate() cannot be nested.
The following situations generate a run-time error:
The expression string expressionStr cannot be compiled (unless the error is due to a non-existent vertex or edge attribute).
The result type of the expression does not match the parameter typeStr.
Silent failure conditions
If any of the following conditions occur, the query may continue running, but the entire clause or statement in which the evaluate() function resides will fail, without producing a run-time error message. For conditional clauses (WHERE, HAVING), a failing evaluate() clause is treated as if the condition is false. An assignment statement with a failing evaluate() will not execute, and an ORDER BY clause with a failing evaluate() will not sort.
The expression references a non-existent attribute of a vertex or edge alias.
The expression applies an operator to incompatible operands, for example, 123 == "xyz".
The following example employs dynamic expressions in both the WHERE condition and the accumulator value in the POST-ACCUM clause.
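A sketch of such a query (the parameter names and the example expression strings in the comments are illustrative):

```gsql
# Hypothetical sketch: the filter condition and the accumulated value are
# both supplied as expression strings at run time.
CREATE QUERY evaluateEx (STRING whereCond, STRING accumExpr) FOR GRAPH workNet {
  MaxAccum<DOUBLE> @val;
  Start = {person.*};
  S = SELECT v FROM Start:v
      WHERE evaluate(whereCond)                           # e.g. "v.locationId == \"us\""
      POST-ACCUM v.@val = evaluate(accumExpr, "double");  # e.g. "getvid(v) * 1.0"
  PRINT S[S.@val];
}
```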
A query that has been defined (with a CREATE QUERY ... RETURNS statement) can be treated as a callable function. A query can call itself recursively.
The following limitations apply to queries calling queries:
Each parameter of the called query may be one of the following types:
Primitives: INT, UINT, FLOAT, DOUBLE, STRING, BOOL
VERTEX
A Set or Bag of primitive or VERTEX elements
The return value may be one of the following types. See also the "Return Statement" section.
Primitives: INT, UINT, FLOAT, DOUBLE, STRING, BOOL
VERTEX
A vertex set (e.g., the result of a SELECT statement)
An accumulator of primitive types. GroupByAccum and accumulators containing tuples are not supported.
A query which returns a SetAccum or BagAccum may be called with a Set or Bag argument, respectively.
The order of definition matters. A query cannot call a query which has not yet been defined.
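For example, a main query can call a helper query that was defined earlier. A hypothetical sketch (the query names are illustrative; locationId is a person attribute in the workNet example graph):

```gsql
# The called query must be defined before any query that calls it.
CREATE QUERY personCount (STRING loc) FOR GRAPH workNet RETURNS (INT) {
  Start = {person.*};
  S = SELECT v FROM Start:v WHERE v.locationId == loc;
  RETURN S.size();
}

CREATE QUERY callerEx (STRING loc) FOR GRAPH workNet {
  INT n;
  n = personCount(loc);   # the earlier query is treated as a function call
  PRINT n;
}
```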
Users can define their own expression functions in C++ in <tigergraph.root.dir>/dev/gdk/gsql/src/QueryUdf/ExprFunctions.hpp. Only bool, int, float, double, and string (NOT std::string) are allowed as the return value type and the function argument types. However, any C++ type is allowed inside a function body. Once defined, the new functions will be added into GSQL automatically the next time GSQL is started.
If a user-defined struct or a helper function needs to be defined, define it in <tigergraph.root.dir>/dev/gdk/gsql/src/QueryUdf/ExprUtil.hpp.
Here is an example:
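A minimal sketch of two user-defined expression functions (the function names str_reverse and approx_equal are illustrative, not functions shipped with the product; in the actual ExprFunctions.hpp, "string" is a typedef supplied by the header rather than std::string, which this standalone sketch imitates):

```cpp
#include <string>
#include <algorithm>

// In ExprFunctions.hpp the engine provides this typedef; we imitate it here
// so the sketch compiles on its own.
typedef std::string string;

// Hypothetical UDF: returns the input string reversed.
// Once installed, callable from GSQL as str_reverse(someStringExpr).
inline string str_reverse(string str) {
  std::reverse(str.begin(), str.end());
  return str;
}

// Hypothetical UDF: true if x is within tolerance of y.
inline bool approx_equal(double x, double y, double tolerance) {
  return (x - y <= tolerance) && (y - x <= tolerance);
}
```

Note that both functions use only the permitted parameter and return types (string, double, bool), even though any C++ type may be used internally.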
If any code in ExprFunctions.hpp or ExprUtil.hpp causes a compilation error, GSQL cannot install any GSQL query, even if the query doesn't call any user-defined function. Therefore, please test each new user-defined expression function after adding it. One way of testing a function is to put it in a new file test.cpp, compile it with g++ test.cpp, and run ./a.out. You might need to remove the include header #include <gle/engine/cpplib/headers.hpp> from ExprFunctions.hpp and ExprUtil.hpp in order to compile.
Below is a list of examples of expressions. Note that ( argList ) is a set/bag expression, while [ argList ] is a list expression.
Data Type | Constant | Examples |
Numeric types (INT, UINT, FLOAT, DOUBLE) | | 123, -5, 45.67, 2.0e-0.5 |
UINT | GSQL_UINT_MAX | |
INT | GSQL_INT_MAX, GSQL_INT_MIN | |
boolean | TRUE, FALSE | |
string | | |
character or syntax | meaning |
% | matches zero or more characters |
_ (underscore) | matches any single character |
[charlist] | matches any character in charlist. charlist is a concatenated character set, with no separators. |
[^charlist] | matches any character NOT in charlist |
[!charlist] | matches any character NOT in charlist |
α-β (special syntax within charlist) | matches a character in the range from α to β. A charlist can have multiple ranges. |
\\ (special syntax within charlist) | matches the character \ |
\\] (special syntax within charlist) | matches the character ]. No special treatment is needed for [ inside a charlist. |
function name and parameters (NUM means INT, FLOAT, or DOUBLE) | description | return type |
abs(NUM num) | Returns the absolute value of num | Same as parameter type |
sqrt(NUM num) | Returns the square root of num | FLOAT |
pow(NUM base, NUM exp) | Returns base raised to the power exp | If base and exp are both INT → INT; otherwise → FLOAT |
acos(NUM num) | arc cosine | FLOAT |
asin(NUM num) | arc sine | FLOAT |
atan(NUM num) | arc tangent | FLOAT |
atan2(NUM y, NUM x) | arc tangent of y/x | FLOAT |
ceil(NUM num) | rounds upward | INT |
cos(NUM num) | cosine | FLOAT |
cosh(NUM num) | hyperbolic cosine | FLOAT |
exp(NUM num) | base-e exponential | FLOAT |
floor(NUM num) | rounds downward | INT |
fmod(NUM numer, NUM denom) | floating-point remainder of numer/denom | FLOAT |
ldexp(NUM x, NUM exp) | x * 2^exp | FLOAT |
log(NUM num) | natural logarithm | FLOAT |
log10(NUM num) | common (base-10) logarithm | FLOAT |
sin(NUM num) | sine | FLOAT |
sinh(NUM num) | hyperbolic sine | FLOAT |
tan(NUM num) | tangent | FLOAT |
tanh(NUM num) | hyperbolic tangent | FLOAT |
to_string(NUM num) | Converts num to a STRING value | STRING |
float_to_int(FLOAT num) | Converts num to an INT value by truncating the fractional part | INT |
str_to_int(STRING str) | Converts str to an INT value. If str is a floating-point number, the fractional part is truncated; if str is not a numerical value, returns 0. | INT |
function name and parameters | description | return type |
to_string(NUM num) | Converts num to a STRING value | STRING |
float_to_int(FLOAT num) | Converts num to an INT value by truncating the fractional part | INT |
str_to_int(STRING str) | Converts str to an INT value. If str is a floating-point number, the fractional part is truncated; if str is not a numerical value, returns 0. | INT |
function name and parameters | description | return type |
lower(STRING str) | Converts str to all lowercase letters | STRING |
upper(STRING str) | Converts str to all uppercase letters | STRING |
trim([[LEADING | TRAILING | BOTH] [STRING removal_char] FROM] STRING str) | Trims characters from the leading and/or trailing ends of str | STRING |
function name and parameters | description | return type |
to_datetime(STRING str) | Converts str to a DATETIME value | DATETIME |
epoch_to_datetime(INT int_value) | Converts int_value to a DATETIME value by epoch time conversion | DATETIME |
datetime_to_epoch(DATETIME date) | Converts date to epoch time | INT |
function name and parameters | description | return type |
datetime_format(DATETIME date [, STRING str]) | Prints date in the format that str indicates. The "%" character is required before each format specifier character: %Y (year), %m (month), %d (day of month), %H (hour), %M (minute), %S (second). If str is not given, "%Y-%m-%d %H:%M:%S" is used. | STRING |
function name and parameters | description | return type |
now() | Returns the current time as a DATETIME value. | DATETIME |
year(DATETIME date) | Extracts the year of date. | INT |
month(DATETIME date) | Extracts the month of date. | INT |
day(DATETIME date) | Extracts the day of month of date. | INT |
hour(DATETIME date) | Extracts the hour of date. | INT |
minute(DATETIME date) | Extracts the minute of date. | INT |
second(DATETIME date) | Extracts the second of date. | INT |
datetime_add(DATETIME date, INTERVAL int_value time_unit) | INTERVAL is a keyword; time_unit is one of the keywords YEAR, MONTH, DAY, HOUR, MINUTE, or SECOND. The function returns the DATETIME value which is int_value units later than date. For example, datetime_add(now(), INTERVAL 1 MONTH) returns a DATETIME value which is 1 month from now. | DATETIME |
datetime_sub(DATETIME date, INTERVAL int_value time_unit) | Same as datetime_add, except that the returned value is int_value units earlier than date. | DATETIME |
datetime_diff(DATETIME date1, DATETIME date2) | Returns the difference in seconds of these two DATETIME values: (date1 - date2). | INT |
function name | description | return type |
parse_json_object(STRING str ) | Converts str into a JSON object | JSONOBJECT |
parse_json_array( STRING str ) | Converts str into a JSON array | JSONARRAY |
method name | description | return type |
containsKey(STRING keyStr ) | Returns a boolean value indicating whether the JSON object contains the key keyStr . | BOOL |
getInt(STRING keyStr) | Returns the numeric value associated with key keyStr as an INT. | INT |
getDouble(STRING keyStr) | Returns the numeric value associated with key keyStr as a DOUBLE. | DOUBLE |
getString(STRING keyStr) | Returns the string value associated with key keyStr. | STRING |
getBool(STRING keyStr) | Returns the boolean value associated with key keyStr. | BOOL |
getJsonObject(STRING keyStr) | Returns the JSONOBJECT associated with key keyStr. | JSONOBJECT |
getJsonArray(STRING keyStr) | Returns the JSONARRAY associated with key keyStr. | JSONARRAY |
method name | description | return type |
size() | Returns the size of this array. | INT |
getInt( INT idx ) | Returns the numeric value at position idx as an INT. | INT |
getDouble( INT idx ) | Returns the numeric value at position idx as a DOUBLE. | DOUBLE |
getString( INT idx ) | Returns the string value at position idx . | STRING |
getBool( INT idx ) | Returns the bool value at position idx . | BOOL |
getJsonObject( INT idx ) | Returns the JSONOBJECT value at position idx . | JSONOBJECT |
getJsonArray( INT idx ) | Returns the JSONARRAY value at position idx . | JSONARRAY |
function name | description | return type |
outdegree([STRING edgeType]) | Returns the number of outgoing or undirected edges connected to the vertex. If the optional STRING argument edgeType is given, then only edges of the given edgeType are counted. | INT |
neighbors([STRING edgeType]) | Returns the set of ids for the vertices which are out-neighbors or undirected neighbors of the vertex. If the optional STRING argument edgeType is given, then only those neighbors reachable by edges of the given edgeType are included. | BagAccum<VERTEX> |
neighborAttribute(STRING edgeType, STRING targetVertexType, STRING attribute) | From the given vertex, traverses the given edgeType to the given targetVertexType, and returns the set of values for the given attribute. edgeType can only be a string literal. | BagAccum<attributeType> |
edgeAttribute(STRING edgeType, STRING attribute) | From the given vertex, traverses the given edgeType, and returns the set of values for the given edge attribute. edgeType can only be a string literal. | BagAccum<attributeType> |
function name | description | return type |
isDirected () | Returns a boolean value indicating whether this edge is directed or undirected. | BOOL |
parameter name | type | description |
filePath | string | The absolute file path of the input file to be read. A relative path is not supported. |
vertexIdColumn | $num, or $"column_name" if header is true | The vertex id column position. |
vertexTypeColumn | $num, $"column_name" if header is true, or a vertex type | The vertex type column position or a specific vertex type. |
separator | single-character string | The column separator character. |
header | bool | Whether this file has a header. |
The following words are reserved for use by the GSQL query language. This includes words which are currently keywords (such as GRAPH), as well as words which might be used in the future (such as EXTERN). For Data Definition & Loading, there exists a different list of reserved keywords found here.
The GSQL® Query Language is a language for the exploration and analysis of large-scale graphs. The high-level language makes it easy to perform powerful graph traversal queries in the TigerGraph system. By combining features familiar to database users and programmers with highly expressive new capabilities, the GSQL query language offers both easy authoring and powerful execution. A GSQL query contains one or more SELECT statements, where each SELECT statement describes a traversal over a set of vertices and edges in the graph or describes a selection of a subset of vertices. By combining multiple SELECT statements, the user can map out query patterns to answer a virtually unlimited set of real-life data questions.
This document focuses on the formal specification for the GSQL Query Language. It includes example queries which demonstrate the language, each of which works on one of the following six graphs: workNet, socialNet, friendNet, computerNet, minimalNet, and investmentNet. Their schemas are shown below. Appendix D lists the full command and data files to create and load these graphs with small sets of data (~10 to 20 vertices). The data sets are small so that you can understand the result of each query example. The tarball file gsql_ref_examples_2.0.tar.gz contains all of the graph schemas, data files, and queries.
Schemas for Example Graphs
In a distributed graph (where the data are spread across multiple machines), the default execution plan is as follows:
One machine will be selected as the execution hub, regardless of the number or distribution of starting point vertices.
All the computation work for the query will take place at the execution hub. The vertex and edge data from other machines will be copied to the hub machine for processing.
TigerGraph Enterprise Edition offers a Distributed Query mode which provides a more optimized execution plan for queries which are likely to start at several machines and continue their traversal across several machines.
A set of machines representing one full copy of the entire graph will participate in the query. If the cluster has a replication factor of 2 (so there are two copies of each piece of data), then half the machines will participate.
The query executes in parallel across all the machines which have source vertex data for a given hop in the query. That is, each SELECT statement defines a 1-hop traversal from a set of source vertices to a set of target vertices. Unlike the default mode where all the needed data are brought to one machine, in Distributed Query mode, the computation moves across the cluster, following the traversal pattern of the query.
The output results will be gathered at one machine.
To invoke Distributed Query mode, simply insert the keyword "DISTRIBUTED" before "QUERY" in a query definition:
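For example (the query name, parameters, and body are illustrative placeholders):

```gsql
# Default mode:
CREATE QUERY pageRankEx (FLOAT maxChange, FLOAT damping) FOR GRAPH socialNet {
  # query body
}

# Distributed Query mode: the same query, with the DISTRIBUTED keyword added
CREATE DISTRIBUTED QUERY pageRankEx (FLOAT maxChange, FLOAT damping) FOR GRAPH socialNet {
  # query body
}
```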
The basic trade-off between distributed query mode and default mode is greater parallelism for the given query vs. using more system resources, which reduces the potential for concurrency with other operations. Each machine has a certain number of workers available for concurrent execution of queries. A query in default mode uses only one worker out of the whole system. (This one worker will have multiple threads for processing edge traversals in parallel.) However, a query in distributed mode uses one query worker per machine. This means this query can run faster, but it leaves fewer workers for other queries running concurrently.
We suggest the following guidelines for deciding whether to use default mode or distributed mode.
Queries with one or a few starting point vertices and which take only a few hops → default mode is better.
Queries which start at a very large set of starting point vertices and which traverse many hops → distributed mode is better. For example, algorithms which either compute a value for every vertex or one value for the entire graph should use Distributed Mode. This includes PageRank, Centrality, and Connected Component algorithms.
For applications where the same query (same logic but with different input parameters) will be run many times in production, the application designer can simply try both modes during development and choose the one which works better for their use case and data.
Currently, Distributed Query mode cannot be used for all queries. Please note the limitations carefully. In most cases, the GSQL parser and compiler will report an error if you try to write a Distributed Query using an unsupported feature.
Support was added for many features in TigerGraph 2.2. The table below has changed significantly between v2.1 and v2.2.
The following GSQL features are not supported in Distributed Query mode:
(1) Items in the Supported column are listed only for clarity, so you can compare to the Unsupported column. If a feature which is supported in non-distributed queries is not mentioned in either column, then it is supported in Distributed Query mode .
(2) If the query contains "LIMIT N", and if the number of GPEs working on this query is G, then the output size will be N +/- (G-1). In a conventional cluster configuration, there is one GPE per machine. For example, if N=10 and the graph is distributed across 4 machines, then the output size will be between 7 and 13, inclusive.
A GSQL query is a compiled data retrieval-and-computation task. Users can write queries to explore a data graph however they like, to read and make computations on the graph data along the way, to update the graph, and to deliver resulting data. A query is analogous to a user-defined procedure or function: it can have one or more input parameters, and it can produce output in two ways: by returning a value or by printing. Using a query is a three-step procedure:
CREATE QUERY: define the functionality of the query
INSTALL QUERY: compile the query
RUN QUERY: execute the query with input values
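Putting the three steps together in the GSQL shell (the query shown is a minimal illustrative sketch; locationId is a person attribute in the workNet example graph):

```gsql
# 1. CREATE QUERY: define the functionality
CREATE QUERY usPersonCount () FOR GRAPH workNet {
  SumAccum<INT> @@cnt;
  Start = {person.*};
  S = SELECT v FROM Start:v
      WHERE v.locationId == "us"
      ACCUM @@cnt += 1;
  PRINT @@cnt;
}

# 2. INSTALL QUERY: compile the query
INSTALL QUERY usPersonCount

# 3. RUN QUERY: execute the query with input values
RUN QUERY usPersonCount()
```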
Query Action Privileges
Users with querywriter role or greater (architect, admin, and superuser) can create, install and drop queries.
Any user with queryreader role or greater for a given graph can run the queries for that graph.
To implement fine-grained control over which queries can be executed by which sets of users:
Group your queries into your desired privilege groups.
Define a graph for each privilege group. These graphs can all have the same domain if you wish.
Create your queries, assigning each to its appropriate privilege group.
CREATE QUERY defines the functionality of a query on a given graph schema.
A query has a name, a parameter list, the name of the graph being queried, an optional RETURNS type (see the section "RETURN Statement" for more details), an optional specifier for the output API, and a body. The body consists of an optional sequence of typedefs, followed by an optional sequence of declarations, then followed by one or more statements. The body defines the behavior of the query.
The DISTRIBUTED option applies only to installations where the graph has been distributed across a cluster . If specified, the query will run with a different execution model which may give better performance for queries which traverse a large portion of the cluster. Not all GSQL query language features are supported in DISTRIBUTED mode. For details, see the separate document: Distributed Query Mode.
OR REPLACE is deprecated
If the optional keywords OR REPLACE are included, then this query definition, if error-free, will replace a previous definition with the same query name. However, if there are any errors in this query definition, then the previous query definition will be maintained. If the OR REPLACE option is not used, then GSQL will reject a CREATE QUERY command that uses an existing name.
Typedefs allow the programmer to define custom types for use within the body. The declarations support the definition of accumulators (see the chapter "Accumulators" for more details) and global/local variables. All accumulators and global variables must be declared before any statements. There are various types of statements that can be used within the body. Typically, the core statements in the body of a query are one or more SELECT, UPDATE, INSERT, or DELETE statements. The language supports conditional statements such as IF as well as looping constructs such as WHILE and FOREACH. It also supports calling functions, assigning variables, printing, and modifying the graph data.
The query body may include calls to other queries. That is, the other queries are treated as subquery functions. See the subsection on "Queries as Functions".
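The overall shape of a query body can be sketched as follows (the names are illustrative; locationId is a person attribute in the workNet example graph):

```gsql
CREATE QUERY bodyShape (STRING loc) FOR GRAPH workNet {
  # typedefs come first (optional)
  TYPEDEF TUPLE<STRING companyName, INT headCount> CompanyInfo;
  # accumulator and global variable declarations come before any statements
  SumAccum<INT> @@matchCount;
  # statements
  Start = {person.*};
  S = SELECT v FROM Start:v
      WHERE v.locationId == loc
      ACCUM @@matchCount += 1;
  PRINT @@matchCount;
}
```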
This table lists the supported data types for input parameters and return values.
A statement is a standalone instruction that expresses an action to be carried out. The most common statements are data manipulation language (DML) statements. DML statements include the SELECT, UPDATE, INSERT INTO, DELETE FROM, and DELETE statements.
A GSQL query has two levels of statements. The upper-level statement type is called query-body-level statement , or query-body statement for short. This statement type is part of either the top-level block or a query-body control flow block. For example, each of the statements at the top level directly under CREATE QUERY is a query-body statement. If one of the statements is a CASE statement with several THEN blocks, each of the statements in the THEN blocks is also a query-body statement. Each query-body statement ends with a semicolon.
The lower-level statement type is called a DML-sub-level statement, or DML-sub-statement for short. This statement type is used inside certain query-body DML statements, to define particular data manipulation actions. DML-sub-statements are comma-separated; there is no comma or semicolon after the last DML-sub-statement in a block. For example, if one of the top-level statements is a SELECT statement, each of the statements in its ACCUM clause is a DML-sub-statement. If one of those DML-sub-statements is a CASE statement, each of the statements in its THEN blocks is a DML-sub-statement.
There is some overlap in the types. For example, an assignStmt can be used either at the query-body level or the DML-sub-level.
Guidelines for understanding statement type hierarchy:
Top-level statements are Query-Body type (each statement ending with a semicolon).
The statements within a DML statement are DML-sub statements (comma-separated list).
The blocks within a Control Flow statement have the same type as the entire Control Flow statement itself.
Here is a descriptive list of query-body statements:
Here is a descriptive list of DML-sub-statements:
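The two levels can be seen together in a single query (an illustrative sketch using the workNet example graph): the declarations, assignments, SELECT, and PRINT below are query-body statements ending in semicolons, while the statements inside the ACCUM clause are comma-separated DML-sub-statements.

```gsql
CREATE QUERY statementLevels () FOR GRAPH workNet {
  SumAccum<INT> @@edgeCount;               # query-body statement
  SumAccum<INT> @perPerson;                # query-body statement
  Start = {person.*};                      # query-body statement
  S = SELECT v                             # the whole SELECT is one query-body statement
      FROM Start:v -(worksFor:e)- company:t
      ACCUM @@edgeCount += 1,              # DML-sub-statement, comma-separated
            v.@perPerson += 1;             # last DML-sub-statement: no trailing comma;
                                           # the semicolon ends the whole SELECT statement
  PRINT @@edgeCount;                       # query-body statement
}
```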
A query must be installed before it can be executed. The INSTALL QUERY command will install the queries listed:
INSTALL QUERY queryName1, queryName2, ...
It can also install all uninstalled queries, using either of the following commands:
INSTALL QUERY *
INSTALL QUERY ALL
Note: Installing takes several seconds for each query. The current version does not support concurrent installation and running of queries. Other concurrent graph operations will be delayed until the installation finishes.
The following options are available:
Reinstall the query even if the system indicates the query is already installed. This is useful for overwriting an installation that is corrupted or otherwise outdated, without having to drop and then recreate the query. If this option is not used, the GSQL shell will refuse to re-install a query that is already installed.
During standard installation, the user-defined queries are dynamically linked to the GSQL language code. Anytime after INSTALL QUERY has been performed, another statement, INSTALL QUERY -OPTIMIZE can be executed. The names of the individual queries are not needed. This operation optimizes all previously installed queries, reducing their run times by about 20%. Optimize a query if query run time is more important to you than query installation time.
Legal: INSTALL QUERY -OPTIMIZE (no query names are given; all previously installed queries are optimized).
Illegal: INSTALL QUERY -OPTIMIZE queryName1 (the -OPTIMIZE option cannot be combined with individual query names).
If you have a distributed database deployment, installing the query in DISTRIBUTED mode can increase performance for single queries, using a single worker from each available machine to yield results. Certain cases may benefit more from this option than others; more detailed information is available on the next page: Distributed Query Mode.
Installing a query creates a REST++ endpoint. Once a query is installed, there are two ways to execute it. One way is through the GSQL shell: RUN QUERY query_name(parameterValues).
Query output size limitation
There is a maximum size limit of 2GB for the result set of a SELECT block. A SELECT block is the main component of a query which searches for and returns data from the graph. If the result of the SELECT block is larger than 2GB, the system will return no data. No error message is produced.
The query response time can be reduced by directly submitting an HTTP request to the REST++ server: send a GET request to http://server_ip:9000/query/graphname/queryname. If the REST++ server is local, then server_ip is localhost. The query parameter values are either included directly in the query string of the HTTP request's URL or supplied using a data payload.
Starting with TigerGraph v1.2, the graph name is now part of the GET /query URL.
The following two curl commands are each equivalent to the RUN QUERY command above. The first gives the parameter values in the query string in a URL. This example illustrates the simple format for primitive data types such as INT, DOUBLE, and STRING. The second gives the parameter values through the curl command's data payload -d option.
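For a hypothetical query ex1 on graph workNet taking parameters age (INT) and name (STRING), the two forms might look like this (a sketch; the query, parameter names, and values are illustrative):

```shell
# Parameter values in the query string of the URL:
curl -X GET "http://localhost:9000/query/workNet/ex1?age=25&name=Tom"

# Parameter values supplied via the data payload option:
curl -X GET -d @RunQueryExPara.dat "http://localhost:9000/query/workNet/ex1"
```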
where RunQueryExPara.dat has the exact string as the query string in the first URL.
To see a list of the parameter names and types for the user-installed GSQL queries, run the following REST++ request:
curl -X GET "http://localhost:9000/endpoints?dynamic=true"
By using the data payload option, the user can avoid using a long and complex URL. In fact, to call the same query but with different parameters, only the data payload file contents need to be changed; the HTTP request can be the same. The file loader loads the entire file, appends multiple lines into one, and uses the resulting string as the URL query string. If both a query string and a data payload are given (which we strongly discourage), both are included, where the URL query string's parameter values overwrite the values given in the data payload.
This subsection describes how to format the complex type parameter values when executing a query by RUN QUERY or curl command. More details about all parameter types are described in Section "Query Parameter Types"
When square brackets are used in a curl URL, the -g option or escape characters must be adopted. If the parameters are given by data payload (either by file or data payload string), the -g option is not needed and escape characters should not be used.
Below are examples.
This data payload option can accept a file up to 128MB by default. To increase this limit to xxx MB, use the following command:
The upper limit of this setting is 1024 MB. Raising the size limit for the data payload buffer reduces the memory available for other operations, so be cautious about increasing this limit.
For more detailed information about REST++ endpoints and requests, see the RESTPP API User Guide .
The following options are available when running a query:
Some queries select all or almost all vertices in their SELECT statements, e.g., the PageRank algorithm. In this case, the graph processing engine can run much more efficiently in all-vertex mode. In all-vertex mode, all vertices are always selected, and the following actions become ineffective:
Filtering with selected vertices or vertex types. The source vertex set must be all vertices.
Filtering with the WHERE clause.
Filtering with the HAVING clause.
Assigning a designated vertex or a designated vertex type as the source set, e.g., X = {vertex_type.*}.
To run the query in all-vertex mode, use the -av option in shell mode or include __GQUERY__USING_ALL_ACTIVE_MODE=true in the query string of an HTTP request.
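For example (the query name and parameter values are illustrative):

```shell
# Shell mode:
RUN QUERY -av pageRankEx(0.001, 0.85)

# HTTP request:
curl -X GET "http://localhost:9000/query/socialNet/pageRankEx?maxChange=0.001&damping=0.85&__GQUERY__USING_ALL_ACTIVE_MODE=true"
```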
The diagnose option can be turned on in order to produce a diagnostic monitoring log, which contains the processing time of each SELECT block. To turn on the monitoring log, use the -d option in shell mode or include __GQUERY__monitor=true in the query string of an HTTP request.
The path of the generated log file will be shown as a part of output message. An example log is shown below:
The standard output of GSQL queries is in industry-standard JSON format. A JSON object is an unordered set of key:value pairs, enclosed in curly braces. Among the acceptable data types for a JSON value are array and object. A JSON array is an ordered list of values, enclosed in square brackets. Since values can be objects or arrays, JSON supports hierarchical, nested structures. Strings are enclosed in double quotation marks. We also use the term field to refer to a key (or a key:value pair) of a given object.
At the top level of the JSON structure are three required fields: "error", "message", and "results". If a query is successful, the value of "error" will be "false", the "message" value will be empty, and the "results" value will be the intended output of the query. If an error or exception occurred during query execution, the "error" value will be "true", the "message" value will be a string message describing the error condition, and the "results" field will be empty.
Beginning with version 2 (v2) of the output specification, an additional top-level field is required: "version". The "version" value is an object with the following fields:
Other top-level objects, such as "code" may appear in certain circumstances. Note that the top-level objects are enclosed in curly braces, meaning that they form an unordered set. They may appear in any order.
Below is an example of the output of a successful query:
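A sketch of the structure (the content of "results" and the exact fields inside "version" depend on the query and the platform release; the values here are illustrative):

```json
{
  "version": {"edition": "developer", "api": "v2", "schema": 0},
  "error": false,
  "message": "",
  "results": [
    {"@@totalCount": 4}
  ]
}
```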
The value of the "results" key-value pair is a sequential list of the data objects specified by the PRINT statements of the query. The list order follows the order of PRINT execution. The detailed format of the PRINT statement results is described in the Chapter "Output Statements".
For backward compatibility, TigerGraph platforms whose principal output API is v2 can also produce output with API v1.
The following GSQL statement can be used to set the JSON output API configuration.
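The statement has the following form (the value is a string literal):

```gsql
SET json_api = "<version_string>"
```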
Currently, the legal values for <version_string> are "v1" and "v2". This statement sets a persistent system parameter. Each version of the TigerGraph platform is pre-configured to the latest output API available at the time of release. For example, platform version 1.1 is configured so that each query will produce v2 output by default.
To show the GSQL text of a query, run "SHOW QUERY query_name ". Additionally, the "ls" GSQL command lists all created queries and identifies which queries have been installed.
As of v2.3, the query_name argument can now use * or ? wildcards from Linux globbing, or it can be a regular expression, when preceded by -r. See SHOW: View Parts of the Catalog
To drop a query, run "DROP QUERY query_name". The query will be uninstalled (if it has been installed) and removed from the dictionary. The GSQL language will refuse to drop an installed query Q if another installed query R calls query Q. That is, all calling queries must be dropped before or at the same time that their called subqueries are dropped.
To drop all queries, use either of the following commands: DROP QUERY ALL or DROP QUERY *
The scope of ALL depends on the user's current scope. If the user has set a working graph, then DROP QUERY ALL removes all the queries for that graph. If a superuser has set their scope to be global, then DROP QUERY ALL removes all queries across all graph spaces.
The GSQL language provides full support for vertex and edge insertion, deletion, and attribute update. Therefore, the language is more than just a "query" language.
Each query is considered one transaction. Therefore, modifications to the graph data do not take effect until the entire query is completed (committed). Accordingly, a data modification statement does not affect the results of any other statement inside the same query.
The query-body DELETE statement deletes a given set of edges or vertices. This statement can only be used as a query-body statement. (Deletion at the DML-sub level is served by the DML-sub DELETE statement, described next.)
The vertexSet and edgeSet terms in the FROM clause follow the same rules as those in the FROM clause of a SELECT statement. The WHERE clause can filter the items in the vertexSet or edgeSet. Below are two examples, one for deleting vertices and one for deleting edges.
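Based on the behavior described in the surrounding text, the two delete queries might look like the following sketch. The workNet schema details (person, company, worksFor, locationId) are assumed from context, so this is an approximation rather than the exact catalog text.

```gsql
CREATE QUERY deleteEx() FOR GRAPH workNet {
  // Delete all person vertices located in the US
  S = {person.*};
  DELETE s FROM S:s
    WHERE s.locationId == "us";
}

CREATE QUERY deleteEx2() FOR GRAPH workNet {
  // Delete all worksFor edges whose source person is in the US
  S = {person.*};
  DELETE e FROM S:s -(worksFor:e)-> company:t
    WHERE s.locationId == "us";
}
```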
The following query can be used to observe the effect of the delete statements. This query counts the person vertices who work in the US ("us") and the worksFor edges for persons in the US. When the initial workNet test data is loaded, there are 5 persons and 9 worksFor edges for locationId = "us". If query deleteEx2 is run, the worksAtUS query will then find the 5 persons but 0 worksFor edges. Next, if the deleteEx query is run, the worksAtUS query will then find 0 persons and 0 worksFor edges.
For example, the following sequence of countAtLocation, deleteEx2, and deleteEx queries
will produce the following result:
DML-sub DELETE is a DML-substatement which deletes one vertex or edge each time it is called. (Deletion at the query-body level is served by the Query-body DELETE statement described above.) In practice, this statement resides within the body of a SELECT...ACCUM/POST-ACCUM clause, so it is called once for each member of a selected vertex set or edge set.
The ACCUM clause iterates over an edge set, which can encounter the same vertex multiple times. If you wish to delete a vertex, it is best practice to place the DML-sub DELETE statement in the POST-ACCUM clause rather than in the ACCUM clause.
The following example uses and modifies the graph data for socialNet.
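A DML-sub DELETE query along the lines described above might look like this sketch. The socialNet schema details (person, post, posted) are assumed from context; the POST-ACCUM placement follows the best practice stated earlier.

```gsql
CREATE QUERY deletePosts(VERTEX<person> seed) FOR GRAPH socialNet {
  Start = {seed};
  // Delete each post made by the given person.
  // The DELETE is in POST-ACCUM so each target vertex is deleted only once.
  posts = SELECT p
          FROM Start:s -(posted:e)-> post:p
          POST-ACCUM DELETE(p);
}
```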
For example, the following sequence of selectUserPosts and deletePosts queries
will produce the following result:
The INSERT INTO statement adds edges or vertices to the graph. However, if the ID value(s) for the inserted vertex/edge match those of an existing vertex/edge, then the new values will overwrite the old values. To insert an edge, its endpoint vertices must already exist, either prior to running the query or inserted earlier in that query. The INSERT INTO statement can be used as a query-body-level statement or a DML-substatement.
The formal syntax is complex because it encompasses several options, and even so, it requires additional explanation. The first name symbol is the vertex type or edge type. The user then has two options:
1) Provide a value for the ID(s) and then each attribute, in the canonical order for the vertex or edge type. This format is similar to that of a LOAD statement. In this case, it is not necessary to explicitly name the attributes, since it is assumed that every one is being referenced, in order.
2) Name the specific attributes to be set, and then provide a corresponding list of values. The attributes can be in any order, with the exception that the IDs must come first. That is, to insert a vertex, the first attribute name must be PRIMARY_ID. To insert an edge, the first two attribute names must be FROM and TO.
For each attribute value, provide either an expression expr or "_", which means the default value for that attribute type. The optional name which follows the first two (id) values specifies the source vertex type and target vertex type, if the edge type had been defined with wildcard vertex types.
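The two options can be sketched as follows. The attribute names and values are illustrative, assuming a company vertex with a country attribute and a worksFor edge with startYear, startMonth, and fullTime attributes, as in the workNet examples.

```gsql
// Option 1: values in canonical order; attribute names are not listed
INSERT INTO company VALUES ("gsql", "us")

// Option 2: name the attributes; the ID(s) must come first
INSERT INTO company (PRIMARY_ID, country) VALUES ("gsql_jp", "jp")

// Edge insert with named attributes: FROM and TO must come first;
// "_" takes the default value for that attribute type
INSERT INTO worksFor (FROM, TO, startYear, startMonth, fullTime)
  VALUES ("tic", "gsql", 2017, _, true)
```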
The query insertEx illustrates query-body-level INSERT statements: it inserts new company vertices and worksFor edges into the workNet graph.
The query whoWorksForCompany can be used to check the effect of query insertEx. Prior to running insertEx, running whoWorksForCompany("gsql") will find 0 companies called "gsql" and 0 worksFor edges for company "gsql". If we then run the query insertEx("tic", "tac", "toe", "gsql"), then whoWorksForCompany("gsql") will find a company called "gsql" and another one called "gsql_jp". Moreover, it will find 3 edges, for tic, tac, and toe, with different values for the startMonth, startYear, and fullTime parameters.
The following example shows a DML-sub-level INSERT. Because the statement applies to allCompanies, several vertices will be inserted.
Example: Add a child company in Japan to US-based company company3. List all the Japan-based companies before and after the insertion.
The UPDATE statement updates the attribute of each vertex or edge in a vertex set or edge set, respectively, with new attribute values.
The set of vertices or edges to update is described in the FROM clause, following the same rules as the FROM clause in a SELECT block. In the SET clause, the DMLSubStmtList may contain assignment statements to update the attributes of a vertex or edge. Both simple base type attributes and collection type attributes can be updated. These assignment statements use the vertex or edge aliases declared in the FROM clause. The optional WHERE clause supports boolean conditions to filter the items in the vertexSet or edgeSet.
The UPDATE statement can only be used as a query-body-level statement. However, DML-sub-level updates are still possible by using other statement types. A vertex attribute's value can be updated within the POST-ACCUM clause of a SELECT block by using the assignment operator (=); an edge attribute's value can be updated within the ACCUM clause of a SELECT block in the same way. In fact, the UPDATE statement is equivalent to a SELECT statement with ACCUM and/or POST-ACCUM clauses that update the vertex or edge attribute values. Below is an example.
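The equivalence can be sketched as follows. The vertex set persons and the locationId attribute are assumed from the workNet examples; both statements change "us" locations to "USA".

```gsql
// UPDATE version
UPDATE s FROM persons:s
SET s.locationId = "USA"
WHERE s.locationId == "us";

// Equivalent SELECT version: the attribute assignment is a
// DML-substatement in the POST-ACCUM clause
updated = SELECT s FROM persons:s
          WHERE s.locationId == "us"
          POST-ACCUM s.locationId = "USA";
```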
Updating a vertex's attribute value in an ACCUM clause is not allowed, because the update can occur multiple times in parallel and possibly result in a non-deterministic value. If the vertex attribute value update depends on an edge attribute value, use a vertex-attached accumulator to save the value, and then update the vertex attribute's value in the POST-ACCUM clause.
The query below uses the SELECT statement instead of the UPDATE statement, but is functionally similar to the query above. Query updateEx2 reverses the locationId change made by updateEx (changing the location back to "us" from "USA").
Below is an example of an edge update with two attribute changes, including an incremental change (e.startYear = e.startYear + 1):
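Such an edge update might look like the following sketch. The edge set template and the fullTime attribute are assumed from the workNet examples; note that an edge attribute may appear on both sides of its own assignment.

```gsql
UPDATE e FROM persons:s -(worksFor:e)-> company:t
SET e.fullTime = false,
    e.startYear = e.startYear + 1   // incremental change
WHERE t.id == "company1";
```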
In addition to the above UPDATE statement and SELECT statement, a simple assignment statement at the query-body level can be used to update the attribute value of a single vertex/edge, if the vertex/edge has been assigned to a variable or parameter.
This section describes the data types that are native to and are supported by the GSQL Query Language. Most of the data objects used in queries come from one of three sources: (1) the query's input parameters, (2) the vertices, edges, and their attributes which are encountered when traversing the graph, or (3) variables defined within the query that are used to assist in the computational work of the query.
This section covers the following subset of the EBNF language definitions:
An identifier is the name for an instance of a language element. In the GSQL query language, identifiers are used to name elements such as a query, a variable, or a user-defined function. In the EBNF syntax, an identifier is referred to as a name. It can be a sequence of letters, digits, or underscores ("_"). Other punctuation characters are not supported. The initial character must be a letter or an underscore.
Different types of data can be used in different contexts. The EBNF syntax defines several classes of data types. The most basic is called baseType. The other independent types are FILE and STRING COMPRESS. The remaining types are either compound data types built from the independent data types, or supersets of other types. The table below gives an overview of their definitions and their uses.
The query language supports the following base types, which can be declared and assigned anywhere within their scope. Any of these base types may be used when defining a global variable, a local variable, a query return value, a parameter, part of a tuple, or an element of a container accumulator. Accumulators are described in detail in a later section.
The default value of each base type is shown in the table below. The default value is the initial value of a base type variable (see Section "Variable Types" for more details), or the default return value for some functions (see Section "Operators, Functions, and Expressions" for more details).
The first seven types (INT, UINT, FLOAT, DOUBLE, BOOL, STRING, and DATETIME) are the same ones mentioned in the "Attribute Data Types" section of the GSQL Language Reference, Part 1.
FLOAT and DOUBLE input values must be in fixed-point d.dddd format, where d is a digit. Output values will be printed in either fixed-point or exponential notation, whichever is more compact.
The GSQL Loader can read FLOAT and DOUBLE values with exponential notation (e.g., 1.25 E-7).
VERTEX and EDGE are the two types of objects which form a graph. A query parameter or variable can be declared as either of these two types. In addition, the schema for the graph defines specific vertex and edge types (e.g., CREATE VERTEX person). The parameter or variable type can be restricted by giving the vertex/edge type in angle brackets < > after the keyword VERTEX/EDGE. A VERTEX or EDGE variable declared without a specifier is called a generic type. Below are examples of generic and typed vertex and edge variable declarations:
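The declarations might look like the following sketch, assuming the schema defines person and worksFor types:

```gsql
VERTEX anyVertex;            // generic: may hold any vertex type
VERTEX<person> onePerson;    // restricted to person vertices
EDGE anyEdge;                // generic: may hold any edge type
EDGE<worksFor> oneWorksFor;  // restricted to worksFor edges
```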
The following table maps vertex or edge attribute types in the Data Definition Language (DDL) to GSQL query language types. Accumulators are introduced in Section "Accumulators".
More details are introduced in the Section entitled "JSONOBJECT and JSONARRAY Functions".
A JSONOBJECT or JSONARRAY value is immutable. No operator is allowed to modify its value.
A tuple is a user-defined data structure consisting of a fixed sequence of baseType variables. Tuple types can be created and named using a TYPEDEF statement. Tuples must be defined first, before any other statements in a query.
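A tuple type definition might look like the following sketch; the type and field names are illustrative:

```gsql
// Must appear before any other statements in the query
TYPEDEF TUPLE <STRING ename, INT age, BOOL fullTime> EMPLOYEE_RECORD;
```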
A tuple can also be defined in a graph schema and then can be used as a vertex or edge attribute type. A tuple type which has been defined in the graph schema does not need to be re-defined in a query.
The graph schema investmentNet contains two complex attributes:
user-defined tuple SECRET_INFO, which is used for the secret_info attribute in the person vertex.
a portfolio MAP<STRING, DOUBLE> attribute, also in the person vertex.
The query below reads both the SECRET_INFO tuple and the portfolio MAP. The query does not need to redefine the SECRET_INFO tuple type. To read and save the map, we define a MapAccum with the same key:value type as the original portfolio map. (The "Accumulators" chapter has more information about accumulators.) In addition, the query creates a new tuple type, ORDER_RECORD.
STRING COMPRESS is an integer type encoded by the system to represent string values. STRING COMPRESS uses less memory than STRING. The STRING COMPRESS type is designed to act like STRING: data are loaded and printed just as string data, and most functions and operators which take STRING input can also take STRING COMPRESS input. The difference is in how the data are stored internally. A STRING COMPRESS value can be obtained from a STRING_SET COMPRESS or STRING_LIST COMPRESS attribute or from converting a STRING value.
STRING COMPRESS type is beneficial for sets of string values when the same values are used multiple times. In practice, STRING COMPRESS is most useful for container accumulators like ListAccum<STRING COMPRESS> or SetAccum<STRING COMPRESS>.
An accumulator (introduced in the "Accumulators" section) containing STRING COMPRESS stores the dictionary when it is assigned a value from an attribute or from another accumulator containing STRING COMPRESS. An accumulator containing STRING COMPRESS can store multiple dictionaries. A STRING value can be converted to a STRING COMPRESS value only if the value is in one of the dictionaries; if the STRING value is not in the dictionaries, the original string value is saved. A STRING COMPRESS value can always be automatically converted to a STRING value.
When a STRING COMPRESS value is output (e.g., by a PRINT statement), it is shown as a STRING.
STRING COMPRESS is not a base type.
A FILE object is a sequential data storage object, associated with a text file on the local machine.
When referring to a FILE object, we always capitalize the word FILE, to distinguish it from ordinary files.
When a FILE object is declared and associated with a particular text file, any existing content in the text file will be erased. During the execution of the query, content written to the FILE will be appended to the FILE. When the query in which the FILE was declared finishes running, the FILE contents are saved to the text file.
A FILE object can be passed as a parameter to another query. When a query receives a FILE object as a parameter, it can append data to that FILE, as can every other query which receives this FILE object as a parameter.
Input parameters to a query can be of any base type (except EDGE, JSONARRAY, or JSONOBJECT). A parameter can also be a SET or BAG which uses a base type (except EDGE) as the element type. A FILE object can also be a parameter. Within the query, SET and BAG are converted to SetAccum and BagAccum, respectively (see Section "Accumulators" for more details).
A query parameter is immutable. It cannot be assigned a new value within the query.
The FILE object is a special case. It is passed by reference, meaning that the receiving query gets a link to the original FILE object. The receiving query can write to the FILE.
Previous sections focused on the lowest level building blocks of queries: data types (Section 3), operators, functions, and expressions (Section 5), and a special section devoted to accumulators (Section 4). We now begin to look at the types of statements available in GSQL queries. This section focuses on declaration and assignment statements. Later sections will provide a closer look at the all-important SELECT statement, control flow statements and data modification statements. Furthermore, some types of statements can be nested within SELECT, UPDATE, or control flow statements.
This section covers the following subset of the EBNF syntax:
There are six types of variable declarations in a GSQL query:
Accumulator
Global baseType variable
Local baseType variable
Vertex set
File object
Vertex or Edge aliases
The first five types each have their own declaration statement syntax and are covered in this section. Aliases are declared implicitly in a SELECT statement.
Accumulator declaration is discussed in Section 4: "Accumulators".
After accumulator declarations, base type variables can be declared as global variables. The scope of a global variable is from the point of declaration until the end of the query.
A global variable can be accessed (read) anywhere in the query; however, there are restrictions on where it can be updated. See the subsection below on "Assignment Statements".
Multiple global variables of the same type can be declared and initialized on the same line, as in the example below:
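For example (variable names are illustrative):

```gsql
INT vertexCount = 0, edgeCount = 0;
STRING homeLocation = "us", altLocation = "jp";
```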
A local variable can be declared only in an ACCUM, POST-ACCUM, or UPDATE SET clause, and its scope is limited to that clause. Local variables can only be of base types (e.g., INT, FLOAT, DOUBLE, BOOL, STRING, VERTEX). A local variable must be declared and initialized in the same statement.
Within a local variable's scope, another local variable with the same name cannot be declared at the same level. However, a new local variable with the same name can be declared at a lower level (i.e., within a nested SELECT or UPDATE statement). The lower declaration takes precedence at the lower level.
In a POST-ACCUM clause, each local variable may only be used in source vertex statements or target vertex statements, not both.
Vertex set variables play a special role within GSQL queries. They are used for both the input and output of SELECT statements. Therefore, before the first SELECT statement in a query, a vertex set variable must be declared and initialized. This initial vertex set is called the seed set.
The query below lists all ways of assigning a vertex set variable an initial set of vertices (that is, forming a seed set).
a vertex parameter, untyped (S1) or typed (S2)
a vertex set parameter, untyped (S3) or typed (S4)
a global SetAccum<VERTEX> accumulator, untyped (S5) or typed (S6)
all vertices of any type (S7, S9) or of one type (S8)
a list of vertex ids in an external file (S10)
copy of another vertex set (S11)
a combination of individual vertices, vertex set parameters, or global variables (S12)
union of vertex set variables (S13)
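A partial sketch of several of these seed set assignments follows. The parameter names and the workNet person type are illustrative; the labels echo the list above.

```gsql
CREATE QUERY seedExamples(VERTEX v1, SET<VERTEX<person>> p4) FOR GRAPH workNet {
  S1 = {v1};            // from an untyped vertex parameter
  S4 = p4;              // from a typed vertex set parameter
  S7 = {ANY};           // all vertices of any type
  S8 = {person.*};      // all vertices of one type
  S11 = S8;             // copy of another vertex set
  S13 = S1 UNION S8;    // union of vertex set variables
  PRINT S13;
}
```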
When declaring a vertex set variable, a set of vertex types can optionally be specified for the vertex set variable. If the vertex set variable's type is not specified explicitly, the system determines the type implicitly from the vertex set value. The type can be ANY, _ (equivalent to ANY), or any explicit vertex type(s). See the EBNF grammar rule vertexEdgeType.
Declaration syntax difference: vertex set variable vs. base type variable
In a vertex set variable declaration, the type specifier follows the variable name and is surrounded by parentheses: vSetName (type). This differs from a base type variable declaration, where the type specifier comes before the variable name: type varName.
After a vertex set variable is declared, the vertex type of the vertex set variable is immutable. Every assignment (e.g. SELECT statement) to this vertex set variable must match the type. The following is an example in which we must declare the vertex set variable type.
In the above example, the query returns the set of vertices after a 5-step traversal from the input "person" vertex. If we declare the vertex set variable S without explicitly giving a type, then because the type of vertex parameter m1 is "person", the GSQL engine will implicitly assign S to be "person"-type. However, if S is assigned to "person"-type, the SELECT statement inside the WHILE loop causes a type-checking error, because the SELECT block will generate all connected vertices, including non-"person" vertices. Therefore, S must be declared as an ANY-type vertex set variable.
A FILE object is a sequential text storage object, associated with a text file on the local machine.
When referring to a FILE object, we always capitalize the word FILE, to distinguish it from ordinary files.
When a FILE object is declared and associated with a particular text file, any existing content in the text file will be erased. During the execution of the query, content written or printed to the FILE will be appended to the FILE. When the query in which the FILE was declared finishes running, the FILE contents are saved to the text file.
Note that the declaration statement is invoking the FILE object constructor. The syntax for the constructor includes parentheses surrounding the filepath parameter.
Currently, the filePath must be an absolute path.
Assignment statements are used to set or update the value of a variable, after it has been declared. This applies to baseType variables, vertex set variables, and accumulators. Accumulators also have the special += accumulate statement, which was discussed in the Accumulator section. Assignment statements can use expressions (expr) to define the new value of the variable.
In general, assignment statements can take place anywhere after the variable has been declared. However, there are some restrictions. These restrictions apply to "inner level" statements which are within the body of a higher-level statement:
The ACCUM or POST-ACCUM clause of a SELECT statement
The SET clause of an UPDATE statement
The body of a FOREACH statement
Global accumulator assignment "=" is not permitted within the body of SELECT or UPDATE statements
Global variable assignment is permitted in ACCUM or POST-ACCUM clauses, but the change in value will not take place until exiting the clause. Therefore, if there are multiple assignment statements for the same variable, only the final one will take effect.
Vertex attribute assignment "=" is not permitted in an ACCUM clause. However, edge attribute assignment is permitted. This is because the ACCUM clause iterates over an edge set.
There are additional restrictions within FOREACH loops for the loop variable. See the Data Modification section.
LOADACCUM() can initialize a global accumulator by loading data from a file. LOADACCUM() has 3+n parameters explained in the table below: (filePath, fieldColumn_1, ..., fieldColumn_n, separator, header), where n is the number of fields in the accumulator. One assignment statement can have multiple LOADACCUM() function calls. However, every LOADACCUM() referring to the same file in the same assignment statement must use the same separator and header parameter values.
Any accumulator using generic VERTEX as an element type cannot be initialized by LOADACCUM().
Below is an example with an external file
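A LOADACCUM() assignment along these lines might look like the following sketch. The file path, the column layout ($0 holds the key, $1 the value), and the accumulator name are assumptions for illustration.

```gsql
// Suppose /home/tigergraph/loadAccumInput.csv has lines like "us,3"
MapAccum<STRING, INT> @@countryCount;
@@countryCount = LOADACCUM("/home/tigergraph/loadAccumInput.csv",
                           $0, $1, ",", false);
```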
Typically, a function call returns a value and so is part of an expression (see Section 5 - Operators, Functions and Expressions). In some cases, however, the function does not return a value (i.e., returns VOID) or the return value can be ignored, so the function call can be used as an entire statement. This is a Function Call Statement.
The GSQL Query Language includes a comprehensive set of control flow statements to empower sophisticated graph traversal and data computation: IF/ELSE, CASE, WHILE, and FOREACH.
Note that any of these statements can be used as a query-body statement or as a DML-sub level statement.
If the control flow statement is at the query-body level, then its block(s) of statements are query-body statements (queryBodyStmts). In a queryBodyStmts block, each individual statement ends with a semicolon, so there is always a semicolon at the end.
If the control flow statement is at the DML-sub level, then its block(s) of statements are DML-sub statements ( DMLSubStmtList ). In a DMLSubStmtList block, a comma separates statements, but there is no punctuation at the end.
The "Statement Types" subsection in the chapter on "CREATE / INSTALL / RUN / SHOW / DROP QUERY" has a more detailed general example of the difference between queryBodyStmts and DMLSubStmts.
If a particular IF condition is not true, then the flow proceeds to the next ELSE IF condition. When a true condition is encountered, its corresponding block of statements is executed, and then the IF statement terminates (skipping any remaining ELSE-IF or ELSE clauses). If an ELSE-clause is present, its block of statements are executed if none of the preceding conditions are true. Overall, the functionality can be summarized as "execute the first block of statements whose conditional test is true."
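The flow described above can be sketched as follows (variable names are illustrative):

```gsql
INT x = 5;
STRING result = "";
IF x == 0 THEN
  result = "zero";
ELSE IF x > 0 THEN
  result = "positive";   // first true condition: this block runs
ELSE
  result = "negative";   // runs only if no condition above is true
END;
```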
One CASE statement contains one or more WHEN-THEN clauses, each WHEN presenting one expression. The CASE statement may also have one ELSE clause whose statements are executed if none of the preceding conditions are true.
There are two syntaxes of the CASE statement: one equivalent to an if-else statement, and the other is structured like a switch statement. The if-else version evaluates the boolean condition within each WHEN-clause and executes the first block of statements whose condition is true. The optional concluding ELSE-clause is executed only if all WHEN-clause conditions are false.
The switch version evaluates the expression following the keyword WHEN and compares its value to the expression immediately following the keyword CASE. These expressions do not need to be boolean; the CASE statement compares pairs of expressions to see if their values are equal. The first WHEN-THEN clause to have an expression value equal to the CASE expression value is executed; the remaining clauses are skipped. The optional ELSE-clause is executed only if no WHEN-clause expression has a value matching the CASE value.
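The two CASE forms can be sketched side by side (variable names are illustrative):

```gsql
INT x = 1;
STRING result = "";

// If-else version: each WHEN has a boolean condition
CASE
  WHEN x == 0 THEN result = "zero";
  WHEN x > 0  THEN result = "positive";
  ELSE             result = "negative";
END;

// Switch version: each WHEN value is compared to the CASE expression
CASE x
  WHEN 0 THEN result = "zero";
  WHEN 1 THEN result = "one";
  ELSE        result = "other";
END;
```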
The WHILE statement iterates over its body (queryBodyStmts or DMLSubStmtList) until the condition evaluates to false or until the iteration limit is met. A condition is any expression that evaluates to a boolean. The condition is evaluated before each iteration. CONTINUE statements can be used to change the control flow within the while block. BREAK statements can be used to exit the while loop.
A WHILE statement may have an optional LIMIT clause. The LIMIT clause takes a constant positive integer value or an integer variable to constrain the maximum number of loop iterations. The example below demonstrates how the LIMIT behaves.
If a limit value is not specified, it is possible for a WHILE loop to iterate infinitely. It is the responsibility of the query author to design the condition logic so that it is guaranteed to eventually be true (or to set a limit).
Below are a number of examples that demonstrate the use of WHILE statements.
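A minimal sketch of the LIMIT behavior:

```gsql
INT i = 0;
// The condition alone would allow 10 iterations,
// but LIMIT caps the loop at 5 iterations
WHILE i < 10 LIMIT 5 DO
  i = i + 1;
END;
// i is 5 here
```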
The formal syntax for forEachControl appears complex. It can be broken down into the following cases:
name IN setBagExpr
tuple IN setBagExpr
name IN RANGE [ expr, expr ]
name IN RANGE [ expr, expr ].STEP ( expr )
Note that setBagExpr includes container accumulators and explicit sets.
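The cases above can be sketched as follows. The accumulator names are assumptions for illustration.

```gsql
SetAccum<INT> @@numSet;
MapAccum<STRING, DOUBLE> @@portfolioTotals;
SumAccum<INT> @@total;
ListAccum<INT> @@evens;

// name IN setBagExpr
FOREACH n IN @@numSet DO
  @@total += n;
END;

// tuple IN setBagExpr: (key, value) pairs from a MapAccum
FOREACH (k, v) IN @@portfolioTotals DO
  PRINT k, v;
END;

// name IN RANGE, with an optional STEP
FOREACH i IN RANGE[0, 10].STEP(2) DO
  @@evens += i;   // 0, 2, 4, 6, 8, 10
END;
```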
The FOREACH statement has the following restrictions:
In a DML-sub level FOREACH, it is never permissible to update the loop variable (the variable declared before IN, e.g., var in "FOREACH var IN setBagExpr").
In a query-body level FOREACH, in most cases it is not permissible to update the loop variable. The following exceptions apply:
If the iteration is over a ListAccum, its values can be updated.
If the iteration is over a MapAccum, its values can be updated, but its keys cannot.
If the iteration is over a set of vertices, it is not permissible to access (read or write) their vertex-attached accumulators.
A query-body-level FOREACH cannot iterate over a set or bag of constants. For example, FOREACH i in (1,2,3) is not supported. However, DML-sub FOREACH does support this.
The FOREACH statement has an optional RANGE clause RANGE[expr, expr], which can be used to define the iteration collection. Optionally, the range may specify a step size: RANGE[expr, expr].STEP(expr)
Each expr must evaluate to an integer. Any of the integers may be negative, but the step expr may not be 0.
The clause RANGE[a,b].STEP(c) produces the sequence of integers from a to b, inclusive, with step size c. That is, (a, a+c, a+2*c, a+3*c, ... a+k*c), where k = the largest integer such that |k*c| ≤ |b-a|.
If the .STEP method is not given, then the step size c = 1.
The step value can be positive for an ascending range or negative for a descending range. If the step has the wrong polarity, then the loop has zero iterations; that is, the exit condition is already satisfied.
CONTINUE and BREAK Statements
The CONTINUE and BREAK statements can only be used within a block of a WHILE or FOREACH statement. The CONTINUE statement branches control flow to the end of the loop, skipping any remaining statements in the current iteration, and proceeding to the next iteration. That is, everything in the loop block after the CONTINUE statement will be skipped, and then the loop will continue as normal.
The BREAK statement branches control flow out of the loop, i.e., it will exit the loop and stop iteration.
Below are a number of examples that demonstrate the use of BREAK and CONTINUE.
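A minimal sketch of both statements in one loop:

```gsql
ListAccum<INT> @@visited;
INT i = 0;
WHILE i < 10 DO
  i = i + 1;
  IF i == 3 THEN CONTINUE; END;  // skip 3, go to the next iteration
  IF i == 7 THEN BREAK;    END;  // leave the loop when i reaches 7
  @@visited += i;                // collects 1, 2, 4, 5, 6
END;
```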
This section discusses the SELECT statement in depth and covers the following EBNF syntax:
The SELECT block selects a set of vertices FROM a vertex set or edge set. There are a number of optional clauses that define and/or refine the selection by constraining the vertex or edge set or the result set. There are two types of SELECT, vertex-induced and edge-induced. Both result in a vertex set, known as the result set.
Size limitation
There is a maximum size limit of 2GB for the result set of a SELECT block. If the result of the SELECT block is larger than 2GB, the system will return no data. No error message is produced.
The SELECT statement is an assignment statement with a SELECT block on the right hand side. The SELECT block has many possible clauses, which fit together in a logical flow. Overall, the SELECT block starts from a source set of vertices and returns a result set that is either a subset of the source vertices or a subset of their neighboring vertices. Along the way, computations can be performed on the selected vertices and edges. The figure below graphically depicts the overall SELECT data flow. While the ACCUM and POST-ACCUM clauses do not directly affect which vertices are included in the result set, they affect the data (accumulators) which are attached to those vertices.
There are two options for the FROM clause: vertexSet or edgeSet. If vertexSet is used, then the query will be a vertex-induced selection. If edgeSet is used, then the query is an edge-induced selection.
A vertex-induced selection takes an input set of vertices and produces a result set, which is a subset of the input set. The FROM argument has the form Source:s, where Source is a vertex set. Source is optionally followed by :s, where s is a vertex alias which represents any vertex in the set Source.
This statement can be interpreted as "Select all vertices s from the vertex set Source." The result is a vertex set.
Below is a simple example of a vertex-induced selection.
Multiple types can also be specified by using delimiter "|". Additionally, the keywords "_" or "ANY" can be used for denoting a set which can include any vertex or edge type.
An edge-induced selection starts from a set of vertices, defines a set of edges incident to that set, and produces a result set of vertices that are also incident to those edges. Typically, this is used to traverse from a set of source vertices over a specific edge type to a set of target vertices. The FROM clause argument (defined formally by the EBNF edgeSet rule) is structured as an edge template: Source:s -(eType:e)-> tType:t. The edge template has three parts: the source vertex set (Source), the edge type or types (eType), and the target vertex type or types (tType). Both s and t are vertex aliases, and e is the edge alias. The template defines a pattern s → e → t, from source vertex s, across eType edges, to tType target vertices. The edge alias e represents any edge that fits the complete pattern. Likewise, s and t are aliases that represent any source vertices and target vertices, respectively, that fit the complete pattern.
Either the source vertex set (s) or target vertex set (t) can be used as the SELECT argument, which determines the result of the SELECT statement. Note the small difference in the two SELECT statements below.
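Using the template names from above, the two variants might look like this sketch:

```gsql
// Select the source-end vertices of the matching edges
resultSet1 = SELECT s FROM Source:s -(eType:e)-> tType:t;

// Select the target-end vertices of the matching edges
resultSet2 = SELECT t FROM Source:s -(eType:e)-> tType:t;
```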
resultSet1 is based on the source end of the selected edges. resultSet2 is based on the target end of the selected edges. However, resultSet1 is NOT identical to the Source vertex set. It contains only those members of Source which connect to an eType edge and then to a tType vertex. Other clauses (presented later in this "SELECT Statement" section) can do additional filtering of the Source set.
We strongly suggest that an alias should be declared with every vertex and edge in the FROM clause, as there are several functions and features which are only available to vertex and edge aliases.
The FROM clause chooses edges and target vertices by type. The EBNF symbol vertexEdgeType describes the options:
Note that eType and tType are optional. If eType/tType is omitted (or if ANY or _ is used), then the SELECT will seek out any edge or target vertex that is valid (i.e., there exists a valid path between two vertices over an edge). For the example below, if V1 and V2 are the only possible reachable vertex types via eType, we can omit the target vertex type, making all of the following SELECT statements equivalent. The system will infer the target vertex type at run time.
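For instance, under the assumption that V1 and V2 are the only vertex types reachable from Source via eType, the following sketches are equivalent:

```
r1 = SELECT t FROM Source:s -(eType:e)-> :t;          // target type omitted
r2 = SELECT t FROM Source:s -(eType:e)-> _:t;         // wildcard
r3 = SELECT t FROM Source:s -(eType:e)-> ANY:t;       // wildcard keyword
r4 = SELECT t FROM Source:s -(eType:e)-> (V1|V2):t;   // explicit list
```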
It is legal to declare an alias without explicitly stating an edge or target type. See the examples below.
Type inference is used whenever possible for the edge set and target vertex set to prune ineligible edges and thereby optimize performance. The vertex type in Source is checked against the graph schema to find all incident edge types. The knowledge of the graph schema is combined with the selection's explicit type conditions given by eType and tType, as well as explicit and implicit type conditions in the WHERE clause to determine a final set of eligible edge sets which match the pattern Source → eType → tType. With type inference, the user has the freedom to express only as much as necessary to select edges.
Similarly, the GSQL engine will infer the edge type at run time. For example, if E1, E2, and E3 are the only possible edge types that can be traversed to reach vertices of type tType, we can omit specifying the edge type, making the following SELECT statements equivalent.
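As a sketch, assuming E1, E2, and E3 are the only edge types leading to tType vertices:

```
r1 = SELECT t FROM Source:s -(:e)-> tType:t;              // edge type omitted
r2 = SELECT t FROM Source:s -(_:e)-> tType:t;             // wildcard
r3 = SELECT t FROM Source:s -(ANY:e)-> tType:t;           // wildcard keyword
r4 = SELECT t FROM Source:s -((E1|E2|E3):e)-> tType:t;    // explicit list
```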
The following are a set of queries that demonstrate edge-induced SELECT blocks. The allPostsLiked and allPostsMade queries show how the target vertex type can be omitted. The allPostsLikedOrMade query uses the "|" operator to select multiple types of edges.
This example is another edge selection that uses the "|" operator to select edges that have target vertices of multiple types.
Vertex and edge aliases are declared within the FROM clause of a SELECT block, by using the character ":", followed by the alias name. Aliases can be accessed anywhere within the same SELECT block. They are used to reference a single selected vertex or edge of a set. It is through the vertex or edge aliases that attributes of these vertices or edges can be accessed.
For example, the following code snippets show two different SELECT statements. The first SELECT statement starts from a vertex set called allVertices, and the vertex alias v can access each individual vertex from allVertices. The second SELECT statement selects a set of edges. It can use the vertex alias s to reference the source vertices, or the alias t to reference the target vertices.
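The two snippets might look like this (the liked edge type and post vertex type are assumed from the socialNet example graph):

```
// vertex selection: v refers to each vertex in allVertices
result1 = SELECT v FROM allVertices:v;

// edge selection: s refers to source vertices, t to target vertices
result2 = SELECT t FROM allVertices:s -(liked:e)-> post:t;
```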
The following example shows an edge-based SELECT statement, declaring aliases for all three parts of the edge. In the ACCUM clause, the e and t aliases are assigned to local edge and vertex variables.
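A sketch of that pattern (the liked edge and post vertex types are assumed from socialNet; the local variable names are illustrative):

```
result = SELECT t
         FROM start:s -(liked:e)-> post:t
         ACCUM
           EDGE likedEdge = e,           // local edge variable from alias e
           VERTEX<post> likedPost = t;   // local vertex variable from alias t
```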
We strongly suggest that an alias should be declared with every vertex and edge in the FROM clause, as there are several functions and features which are only available to vertex and edge aliases.
The SAMPLE clause is an optional clause that selects a uniform random sample from the population of edges or vertices specified in the FROM argument. To be clear, the edge population consists of those edges which satisfy all three parts – source set, edge type, and target type – of the FROM clause. The SAMPLE clause is intended to provide a representative sample of the distribution of edges (or vertices) connected to hub vertices, instead of dealing with all edges. A hub vertex is a vertex with a relatively high degree. (The degree of a vertex is the number of edges which connect to it. If edges are directional, one can distinguish between indegree and outdegree.)
Note
Currently, the WHEN condition that can be used with a SAMPLE clause is limited strictly to checking whether the result of a function call on a vertex is greater than (>) or greater than or equal to (>=) some number.
The expression following SAMPLE specifies the sample size, either an absolute number or a percentage of the population. The expression in sampleClause must evaluate to a positive integer. There are two sampling methods. One is sampling based on edge id. The other is based on target vertex id: if a target vertex id is sampled, all edges from this source vertex to the sampled target vertex are sampled.
Given that the sampling is random, some of the details of each of the example queries may change each time they are run.
The following query displays two modes of sampling: an absolute number of edges from a source vertex and a percentage of edges from a source vertex. We use the computerNet graph (see Appendix D). In computerNet, there are 31 vertices and 43 edges, but only 7 vertices are source vertices. Moreover, c1, c12, and c23 are hub vertices, with at least 10 outgoing edges each. For the absolute count case, we set the size to 1 edge per source vertex, which is equivalent to a random walk. We expect exactly 7 edges to be selected. For the percentage sampling case, we sample 33% of the edges for vertices which have 3 or more outgoing edges. We expect about 15 edges, but the number may vary.
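A hedged sketch of such a query, assuming computerNet uses a computer vertex type:

```
CREATE QUERY sampleSketch() FOR GRAPH computerNet {
  start = {computer.*};
  // absolute count: 1 edge per source vertex (a one-step random walk)
  absSet = SELECT t FROM start:s -(:e)-> :t
           SAMPLE 1 EDGE WHEN s.outdegree() >= 1;
  // percentage: 33% of edges, only for sources with 3+ outgoing edges
  pctSet = SELECT t FROM start:s -(:e)-> :t
           SAMPLE 33% EDGE WHEN s.outdegree() >= 3;
  PRINT absSet, pctSet;
}
```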
Below is an example of using SELECT to only traverse one edge for each source vertex. The vertex-attached accumulators @timesTraversedNoSample and @timesTraversedWithSample are used to keep track of the number of times an edge is traversed to reach the target vertex. Without using sampling, this occurs once for each edge; thus @timesTraversedNoSample has the same number as the in-degree of the vertex. With sampling edges, the number of edges is restricted. This is reflected in the @timesTraversedWithSample accumulator. Notice the difference in the result set. Because only one edge per source vertex is traversed when the SAMPLE clause is used, not all target vertices are reached. The vertex company3 has 3 incident edges, but in one instance of the query execution, it is never reached. Additionally, company2 has 6 incident edges, but only 4 source vertices sampled an edge incident to company2 .
Since the PRINT statements are placed at the end of the query, the two vertex sets beforeSample and afterSample are almost identical, showing the final values of both accumulators @timesTraversedNoSample and @timesTraversedWithSample. There is one difference: company3 is not included in afterSample because none of the sample-selected edges reached company3.
The WHERE clause is an optional clause that constrains edges and vertices specified in the FROM and SAMPLE clauses.
The WHERE clause uses a boolean condition to test each vertex or edge in the FROM set (or the sampled vertex and edge sets, if the SAMPLE clause was used).
If the expression evaluates to false for vertex/edge X, then X is excluded from further consideration in the result set. The expression may use constants or any variables or parameters within the scope of the SELECT, arithmetic operators (+, -, *, /, %), comparison operators (==, !=, <, <=, >, >=), boolean operators (AND, OR, NOT), set operators (IN, NOT IN), and parentheses to enforce precedence. The WHERE conditional expression may use any of the variables within its scope: global accumulators, vertex set variables, query input parameters, the FROM clause's vertex and edge sets (or their vertex and edge aliases), or any of the attributes or accumulators of the vertex/edge sets. For a more formal explanation of condition, see the EBNF definitions of condition and expr.
Using built-in vertex and edge attributes and functions, such as .type and .neighbors(), the WHERE clause can be used to implement sophisticated selection rules for the edge traversal. In the following example, the selection conditions are completely specified in the WHERE clause, with no edge types or vertex types mentioned in the FROM clause.
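A sketch of this style, with the liked edge type and post vertex type assumed from the socialNet graph:

```
result = SELECT t
         FROM start:s -(:e)-> :t        // no types named in FROM
         WHERE e.type == "liked"        // filter by edge type...
           AND t.type == "post";        // ...and by target vertex type
```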
The following examples demonstrate using the WHERE clause to limit the resulting vertex set based on a vertex attribute.
WHERE NOT limitations
The NOT operator may not be used in combination with the .type attribute selector. To check if an edge or vertex type is not equal to a given type, use the != operator. See the example below.
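For instance (edge type name assumed):

```
// Legal: exclude a type with !=
r1 = SELECT t FROM start:s -(:e)-> :t
     WHERE e.type != "liked";

// Not legal: NOT combined with the .type attribute selector
// r2 = SELECT t FROM start:s -(:e)-> :t
//      WHERE NOT (e.type == "liked");
```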
The following example shows the equivalence of using WHERE as a type filter as well as its limitations.
The following example uses edge attributes to determine which workers are registered as full time for some company.
If multiple edge types are specified in edge-induced selection, the WHERE clause should use OR to separate each edge type or each target vertex type. For example,
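One sketch of this pattern (the liked and friend edge types and the gender attribute are assumptions from the surrounding discussion):

```
result = SELECT t
         FROM start:s -((liked|friend):e)-> :t
         WHERE (e.type == "liked")                            // liked branch
            OR (e.type == "friend" AND t.gender == "Male");   // friend branch
```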
The above query is compilable. However, if we use line 5 as the WHERE clause instead, the query is not compilable. The edge-type conflict check detects an error, because it uses attributes from both "liked" edges and "friend" edges without separating them out by OR.
The optional ACCUM and POST-ACCUM clauses enable sophisticated aggregation and other computations across the set of vertices or edges selected by the preceding FROM, SAMPLE, and WHERE clauses. A query can contain one or both of these clauses. The statements in an ACCUM clause are applied for every edge in an edge-induced selection or every vertex in a vertex-induced selection.
If there is more than one statement in the ACCUM clause, the statements are separated by commas and executed sequentially for each selected element. However, the TigerGraph system uses parallelism to improve performance. Within an ACCUM clause, each edge is handled by a separate process. As such, there is no fixed order in which the edges are processed within the ACCUM clause, and the edges should not be treated as executing sequentially. The accumulators are mutex variables shared among these processes. The results of any accumulation within the ACCUM clause are not complete until all edges are traversed. Any inspection of an intermediate result within the ACCUM is incomplete and may not be meaningful.
The statements within the ACCUM clause are executed sequentially for a given vertex or edge. However, there is no fixed order in which a vertex set or edge set is processed.
The optional POST-ACCUM clause enables aggregation and other computations across the set of vertices (but not edges) selected by the preceding clauses. POST-ACCUM can be used without ACCUM. If it is preceded by an ACCUM clause, then it can be used for 2-stage accumulative computation: a first stage in ACCUM followed by a second stage in POST-ACCUM.
As of v1.1, the keyword POST-ACCUM may also be spelled with an underscore: POST_ACCUM.
Each statement within the POST-ACCUM clause can refer to either source vertices or target vertices but not both.
In an edge-induced selection, since the ACCUM clause iterates over edges, and often two edges will connect to the same source vertex or to the same target vertex, the ACCUM statements can execute multiple times for one vertex.
Operations that are to be performed exactly once per vertex should be performed in the POST-ACCUM clause.
The primary purpose of the ACCUM or POST-ACCUM clause is to collect information about the graph by updating accumulators (via += or =). See the "Accumulator" section for details on the += operation. However, other kinds of statements (e.g., branching, iteration, local assignments) are permitted to support more complex computations or to log activity. The EBNF syntax below defines the allowable kinds of statements that can occur within an ACCUM or POST-ACCUM. The DMLSubStmt list is similar to the queryBodyStmt list which applies to statements outside of a SELECT block; it is important to note the differences. Each of these statement types is discussed in one of the main sections of this reference document.
Note that DML-sub-statements include the global accumulator accumulation statement (gAccumAccumStmt) but not the global accumulator assignment statement (gAccumAssignStmt). Within these clauses, global accumulators may perform accumulation (+=) but not assignment (=).
There are additional restrictions on DML-sub level statements:
Global variable assignment is permitted in ACCUM or POST-ACCUM clauses, but the change in value will not take place until the query completes. Therefore, if there are multiple assignment statements for the same variable, only the final one will take effect.
Vertex attribute assignment "=" is not permitted in an ACCUM clause. However, edge attribute assignment is permitted, because the ACCUM clause iterates over an edge set. Vertex attribute assignment is permitted in the POST-ACCUM clause. Like all updates, the change in value does not take place until the query completes.
To reference each element of the selected set, use the aliases defined in the FROM clause. For example, assume that we have the following aliases:
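For instance, an edge-induced FROM clause declaring all three aliases:

```
FROM Source:s -(eType:e)-> tType:t
```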
Let (V1, V2, ..., Vn) be the vertices in the vertex-induced selection. The following pseudocode emulates ACCUM clause behavior.
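One way to express that emulation (pseudocode, not runnable GSQL):

```
// conceptually parallel; shown sequentially for clarity
FOREACH Vi IN (V1, V2, ..., Vn) DO
    execute the ACCUM statements, with the vertex alias referring to Vi
END
```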
Let E = (E1, E2, ..., En) be the edges in the edge-induced selected set. Further, let S = (S1, S2, ..., Sn) and T = (T1, T2, ..., Tn) be the multisets (bags) of source vertices and target vertices which correspond to the edge set. S and T are bags, because they can contain repeated elements.
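The corresponding pseudocode for the edge-induced case:

```
// conceptually parallel; shown sequentially for clarity
FOREACH i IN (1, 2, ..., n) DO
    execute the ACCUM statements, with s referring to Si,
    e referring to Ei, and t referring to Ti
END
```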
Note that any reference to the source alias s or target alias t is for the endpoint vertices of the current edge.
Similarly, the POST-ACCUM clause acts like a FOREACH loop on the vertex result set specified in the SELECT clause (e.g., either S or T).
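In pseudocode:

```
// each selected vertex is visited exactly once
FOREACH v IN selectedVertexSet DO    // selectedVertexSet is S or T
    execute the POST-ACCUM statements, with the vertex alias referring to v
END
```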
If multiple edge types are specified in an edge-induced selection, each statement in the ACCUM clause is checked for edge-type conflicts. If a statement is valid for only a subset of the edge types, it is not executed on the other edge types. For example:
In the above example, line 6 is only executed on "liked" edges, because "actionTime" is an attribute of "liked" edges only. Similarly, line 7 is only executed on "friend" edges, because "gender" is an attribute of "person" only, and only "friend" edges use "person" as the target vertex. However, line 8 causes a compilation error, because it uses multiple edge types where some cannot support part of the statement: "liked" edges do not have t.gender, and "friend" edges do not have e.actionTime.
We strongly suggest that if multiple edge types are specified in an edge-induced selection, the ACCUM clause should use a CASE statement (see the "Control Flow Statements" section for details) to separate the operations for each edge type or each target vertex type (or combination of target vertex type and edge type). The edge-type conflict check then checks the ACCUM statements inside each THEN/ELSE block based on the condition. For example,
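A sketch of the recommended CASE structure (the edge types, attributes, and global accumulators are assumptions; accumulator declarations are omitted):

```
result = SELECT t
         FROM start:s -((liked|friend):e)-> :t
         ACCUM CASE
             WHEN e.type == "liked" THEN
               @@latestAction += e.actionTime   // valid only on liked edges
             WHEN t.type == "person" THEN
               @@genders += t.gender            // valid only on friend edges
           END;
```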
The above query is compilable. However, if we switch lines 8 and 10, the edge-type conflict check generates errors because "liked" edges do not support t.gender and "friend" edges do not support e.actionTime.
Similarly, if multiple source or target vertex types are specified in an edge-induced selection and the POST-ACCUM clause accesses a source or target vertex, each statement in the POST-ACCUM clause is checked for vertex-type conflicts. If a statement is valid for only a subset of the source/target vertex types, it is not executed on the other types.
As with the ACCUM clause, we strongly suggest that in this situation the POST-ACCUM clause use a CASE statement (see the "Control Flow Statements" section for details) to separate the operations for each source/target vertex type. The vertex-type conflict check then checks the statements inside each THEN/ELSE block based on the condition.
Prior to v1.0, a vertex-attached accumulator could only be updated in an ACCUM or POST-ACCUM clause and only if its vertex was selected for by the preceding FROM-SAMPLE-WHERE clauses.
Beginning in v1.0, there are additional circumstances where a vertex-attached accumulator may be updated. Vertices which are referenced via a vertex-attached accumulator of a selected vertex may have their own vertex-attached accumulators updated in the ACCUM clause (but not in the POST-ACCUM clause). That is, a vertex referenced by a selected vertex can be updated, with some limitations explained below. Some examples will help to illustrate this more complex condition.
Suppose a query declares a vertex-attached accumulator which holds vertex information. We call this a vertex-holding accumulator. This could take several forms:
A scalar accumulator, e.g., MaxAccum<VERTEX> @maxV;
A collection accumulator, e.g., ListAccum<VERTEX> @listV;
An accumulator containing tuple(s), where the tuple type contains a VERTEX field.
If a vertex V is selected, then not only can V's accumulators be updated, but the vertices stored in its vertex-holding accumulators can also be updated, in the ACCUM clause.
Before these indirectly referenced vertices can be used, they need to be activated. There are two ways to activate an indirect vertex:
A vertex from a vertex-holding accumulator is first assigned to a local vertex variable. The vertex can now be updated through the local vertex variable.
A FOREACH loop can iterate on a vertex-holding collection accumulator. The vertices can now be updated through the loop variable.
The following uses are NOT supported by the new rules:
Indirectly activated vertices may not be updated in the POST-ACCUM clause or outside of a SELECT statement.
Passing a vertex into the query as an input parameter is not a route to activation.
Using a global vertex-holding accumulator is not a route to activation.
If a vertex is being indirectly activated by assigning it to a local variable (e.g., a variable declared in ACCUM or POST-ACCUM), note the following rule, which always applies to all local variables:
A local variable can be declared and initialized in an ACCUM block once. It cannot be redeclared or reassigned later in the ACCUM block.
The following query demonstrates updates to indirectly activated vertices.
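A cautious sketch of the technique, assuming the socialNet graph (person and post vertex types, liked edges):

```
CREATE QUERY indirectUpdate() FOR GRAPH socialNet {
  SumAccum<INT> @likeCount;
  ListAccum<VERTEX> @likedPosts;       // vertex-holding accumulator
  start = {person.*};

  // stage 1: each person collects the posts it liked
  s1 = SELECT v FROM start:v -(liked:e)-> post:t
       ACCUM v.@likedPosts += t;

  // stage 2: activate the stored vertices via a FOREACH loop variable
  s2 = SELECT v FROM start:v
       ACCUM FOREACH p IN v.@likedPosts DO
               p.@likeCount += 1       // update an indirectly activated vertex
             END;

  allPosts = {post.*};
  PRINT allPosts;                      // shows the @likeCount values
}
```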
We now show several examples. This example demonstrates how ACCUM or POST-ACCUM can be used to count the number of vertices in the given set.
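A sketch of the counting pattern (graph and vertex type names are assumptions):

```
CREATE QUERY countMembers() FOR GRAPH socialNet {
  SumAccum<INT> @@memberCount;
  start = {person.*};
  result = SELECT v FROM start:v
           ACCUM @@memberCount += 1;   // one increment per selected vertex
  PRINT @@memberCount;
}
```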
This example uses ACCUM to find all the subjects a user posted about.
This example shows each person's posted vertices and each person's like behaviors (liked edges).
This example counts the total number of times each topic is used.
This is an example of using ACCUM and POST-ACCUM in conjunction. The ACCUM traverses the graph and finds all people who live and work in the same country. After this is determined, POST-ACCUM examines each vertex (person) to see if they work where they live.
This is an example of a POST-ACCUM-only query that counts the number of people with a particular gender.
The optional HAVING clause provides constraints on the result set of the SELECT. The constraints are applied after ACCUM and POST-ACCUM actions. This differs from the WHERE clause, which is applied before the ACCUM and POST-ACCUM actions.
A HAVING clause can only be used if there is an ACCUM or POST-ACCUM clause. The condition is applied to each vertex in the SELECT set (either source or target vertices) which also fulfilled the FROM and WHERE conditions. The HAVING clause is intended to test one or more of the accumulator variables that were updated in the ACCUM or POST-ACCUM clause, though the condition may be anything that evaluates to a boolean value. If the condition is false for a particular vertex, then that vertex is excluded from the result set.
The following example demonstrates using the HAVING clause to constrain a result set based on the vertex accumulator variable which was updated during the ACCUM clause.
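The example might be sketched as follows (the liked and posted edge types and the socialNet schema are assumptions):

```
CREATE QUERY activeMembers(INT activityThreshold) FOR GRAPH socialNet {
  SumAccum<INT> @activityAmount;
  start = {person.*};
  result = SELECT v
           FROM start:v -((liked|posted):e)-> :t
           ACCUM v.@activityAmount += 1              // count each activity edge
           HAVING v.@activityAmount >= activityThreshold;
  PRINT result;
}
```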
If the activityThreshold parameter is set to 3, the query returns 5 vertices:
If the activityThreshold parameter is set to 2, the query would return 8 vertices. With activityThreshold = 4, the query would return no vertices.
The following example demonstrates the equivalence of a SELECT statement in which the condition for the HAVING clause is always true.
The following shows an example of equivalent result sets from using WHERE vs. HAVING. Recall that the WHERE clause is evaluated before the ACCUM and that the HAVING clause is evaluated after the ACCUM. Both constrain the result set based on a condition that vertices must meet.
The following example has a compilation error because the result set is taken from the source vertices, but the HAVING condition is checking the target vertices.
The optional ORDER BY clause sorts the result set.
ASC specifies ascending order (least value first), and DESC specifies descending order (greatest value first). If neither is specified, then ascending order is used. Each expr must refer to the attributes or accumulators of a member of the result set, and the expr must evaluate to a sortable value (e.g., a number or a string). ORDER BY offers hierarchical sorting by allowing a comma-separated list of expressions, sorting first by the leftmost expr. It uses the next expression only to sort items where the current sort expr results in identical values. Any items in the result set which cannot be sorted (because the sort expressions do not pertain to them) will appear at the end of the set, after the sorted items.
The following example demonstrates the use of ORDER BY with multiple expressions. The returned vertex set is first ordered by the number of friends of the vertex, and then ordered by the number of coworkers of that vertex.
The optional LIMIT clause sets constraints on the number and ranking of items included in the final result set.
Each of the expr must evaluate to a nonnegative integer. To understand LIMIT, note that the tentative result set is held in the computer as a list of vertices. If the query has an ORDER BY clause, the order is specified; otherwise the list order is unknown. Assume we number the vertices as v_1, v_2, ..., v_n. The LIMIT clause specifies a range of vertices, starting from a lower position in the list to an upper position.
There are three forms:
Case 1: LIMIT k
When a single expr is provided, LIMIT returns the first k elements from the tentative result set. If there are fewer than k elements available, then all elements will be returned in the result set. If k=5 and the tentative result set has at least 5 items, then the final result list will be [v_1, v_2, v_3, v_4, v_5].
Case 2: LIMIT j, k
When a comma separates two expressions, LIMIT treats the first expression j as an offset. That is, it skips the first j items in the list. The second expr k tells the maximum number of items to include. If the list has at least 7 items, then LIMIT 2, 5 would return [v_3, v_4, v_5, v_6, v_7].
Case 3: LIMIT k OFFSET j
The behavior of Case 3 is the same as that of Case 2, except that the syntax is different. The keyword OFFSET separates the two expressions, and the count comes before the offset, rather than vice versa. If the list has at least 7 items, then LIMIT 5 OFFSET 2 would return [v_3, v_4, v_5, v_6, v_7].
If any of the expressions evaluate to a negative integer, the results are undefined.
OFFSET is intended for result sets which are in a known order. It is a compile time error to use OFFSET without the ORDER BY clause.
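The three forms side by side (a sketch; the id attribute is an assumption):

```
r1 = SELECT v FROM start:v ORDER BY v.id LIMIT 5;            // first 5
r2 = SELECT v FROM start:v ORDER BY v.id LIMIT 2, 5;         // skip 2, take up to 5
r3 = SELECT v FROM start:v ORDER BY v.id LIMIT 5 OFFSET 2;   // same as r2
```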
The following examples demonstrate the various forms of the LIMIT clause.
The first example shows the LIMIT clause when used as an upper limit. It returns a result set with a maximum size of 4 elements in the set.
The following example shows how to use the LIMIT clause with an offset.
The following example shows the alternative syntax for a result size limit with an offset. This time we try larger values for offset and size. In a large data set, limitTest(5,20) might return 20 vertices, but since we don't have 25 vertices in the original data, the output is fewer than 20 vertices.
These two base types allow users to pass a complex data object or to write output in a customized format. These types follow the industry-standard definition of JSON. A JSONOBJECT instance's external representation (as input and output) is a string, starting and ending with curly braces "{" and "}", which enclose an unordered list of string:value pairs. A JSONARRAY is represented as a string, starting and ending with square brackets "[" and "]", which enclose an ordered list of values. Since a value can be an object or an array, JSON supports hierarchical, nested data structures.
The IF statement provides conditional branching: execute a block of statements (queryBodyStmts or DMLSubStmtList) only if a given condition is true. The IF statement allows for zero or more ELSE-IF clauses, followed by an optional ELSE clause. The IF statement can be used either at the query-body level or at the DML-sub-statement level.
The CASE statement provides conditional branching: execute a block of statements only if a given condition is true. CASE statements can be used as query-body statements or DML-sub-statements.
The WHILE statement provides unbounded iteration over a block of statements. WHILE statements can be used as query-body statements or DML-sub-statements.
The FOREACH statement provides bounded iteration over a block of statements. FOREACH statements can be used as query-body statements or DML-sub-statements.
Feature
Not Supported
Supported as of v2.2 (1)
General
User-defined exceptions
Data update to the graph
Access to target vertex's values in ACCUM
Query calling a distributed query
Statement Types
LOADACCUM
FOREACH, WHILE, UPDATE, INSERT, DELETE
SELECT clauses
SAMPLE clause
exact count for LIMIT clause (2)
Data types
LIST, SET, BAG
JSONOBJECT, JSONARRAY
ArrayAccum
SET<> parameter, GroupByAccum
Operations and Operators
Any data update to the graph, including assignment statements to vertex attributes
vertex and edge functions
.neighbors(), .neighborAttribute(), .edgeAttribute()
isDirected()
.outdegree()
accumulator and collection functions
reallocate()
size(), get(), top(), pop(), update(), remove(), removeAll(), clear(), contains(), containsKey(), resize()
Other functions
selectVertex(), to_vertex(), to_vertex_set()
COALESCE(), EVALUATE()
sum(), count(), min(), max(), avg()
Parameter Types
any baseType (except EDGE, JSONOBJECT, JSONARRAY): INT, UINT, FLOAT, DOUBLE, STRING, BOOL, VERTEX
SET<baseType>, BAG<baseType>
Exception: EDGE type is not supported, either as a primitive parameter or as part of a complex type.
Return Types
any baseType (including EDGE): INT, UINT, FLOAT, DOUBLE, STRING, BOOL, VERTEX, EDGE, JSONOBJECT, JSONARRAY
any accumulator type, except GroupByAccum
EBNF term
Common Name
Description
assignStmt
Assignment Statement
See Chapter 6: "Declaration and Assignment Statements"
vSetVarDeclStmt
Vertex Set Variable Declaration Statement
See Chapter 6: "Declaration and Assignment Statements"
gAccumAssignStmt
Global Accumulator Assignment Statement
See Chapter 6: "Declaration and Assignment Statements"
gAccumAccumStmt
Global Accumulator Accumulation Statement
See Chapter 6: "Declaration and Assignment Statements"
funcCallStmt
Functional Call or Query Call Statement
See Chapter 6: "Declaration and Assignment Statements"
selectStmt
SELECT Statement
See Chapter 7: "SELECT Statement"
queryBodyCaseStmt
query-body CASE statement
See Chapter 8: "Control Flow Statements"
queryBodyIfStmt
query-body IF statement
See Chapter 8: "Control Flow Statements"
queryBodyWhileStmt
query-body WHILE statement
See Chapter 8: "Control Flow Statements"
queryBodyForEachStmt
query-body FOREACH statement
See Chapter 8: "Control Flow Statements"
updateStmt
UPDATE Statement
See Chapter 9: "Data Modification Statements"
insertStmt
INSERT INTO statement
See Chapter 9: "Data Modification Statements"
queryBodyDeleteStmt
Query-body DELETE Statement
See Chapter 9: "Data Modification Statements"
printStmt
PRINT Statement
See Chapter 10: "Output Statements"
logStmt
LOG Statement
See Chapter 10: "Output Statements"
returnStmt
RETURN Statement
See Chapter 10: "Output Statements"
raiseStmt
RAISE Statement
See Chapter 11: "Exception Statements"
tryStmt
TRY Statement
See Chapter 11: "Exception Statements"
EBNF term
Common Name
Description
assignStmt
Assignment Statement
See Chapter 6: "Declaration and Assignment Statements"
funcCallStmt
Functional Call Statement
See Chapter 6: "Declaration and Assignment Statements"
gAccumAccumStmt
Global Accumulator Accumulation Statement
See Chapter 6: "Declaration and Assignment Statements"
vAccumFuncCall
Vertex-attached Accumulator Function Call Statement
See Chapter 6: "Declaration and Assignment Statements"
localVarDeclStmt
Local Variable Declaration Statement
See Chapter 7: "SELECT Statement"
insertStmt
INSERT INTO Statement
See Chapter 9: "Data Modification Statements"
DMLSubDeleteStmt
DML-sub DELETE Statement
See Chapter 9: "Data Modification Statements"
DMLSubcaseStmt
DML-sub CASE statement
See Chapter 8: "Control Flow Statements"
DMLSubIfStmt
DML-sub IF statement
See Chapter 8: "Control Flow Statements"
DMLSubForEachStmt
DML-sub FOREACH statement
See Chapter 8: "Control Flow Statements"
DMLSubWhileStmt
DML-sub WHILE statement
See Chapter 8: "Control Flow Statements"
logStmt
LOG Statement
See Chapter 10: "Output Statements"
Parameter type | RUN QUERY | Query string for GET /query HTTP Request |
SET or BAG of primitives | Square brackets enclose the collection of values. Example: a set p1 of integers: [1,5,10] | Assign multiple values to the same parameter name. Example: a set p1 of integers: p1=1&p1=5&p1=10 |
VERTEX<type> | If the vertex type is specified in the query definition, then the vertex argument is simply vertex_id. Example: vertex type is person and desired id is person2: "person2" | parameterName=vertex_id. Example: vertex type is person and desired id is person2: vp=person2 |
VERTEX (type not pre-specified) | If the type is not defined in the query definition, then the argument must provide both the id and type in parentheses: (vertex_id, vertex_type). Example: a vertex va with id="person1" and type="person": ("person1","person") | parameterName=vertex_id&parameterName.type=vertex_type. Example: parameter vertex va when type="person" and id="person1": va=person1&va.type=person |
SET or BAG of VERTEX<type> | Same as a SET or BAG of primitives, where the primitive type is vertex_id. Example: ["person3","person4"] | Same as a SET or BAG of primitives, where the primitive type is vertex_id. Example: vp=person3&vp=person4 |
SET or BAG of VERTEX (type not pre-specified) | Square brackets enclose a comma-separated list of vertex (id, type) pairs. Mixed types are permitted. Example: [("person1","person"), ("11","post")] | The SET or BAG must be treated like an array, specifying the first, second, etc. elements with indices [0], [1], etc. Example (same input as the RUN QUERY example): vp[0]=person1&vp[0].type=person&vp[1]=11&vp[1].type=post |
"version" field
value
api
A string specifying the output API version. Values are specified as follows:
"v1": Output API used in TigerGraph platform v0.8 through v1.0. If the output does not have a "version" field, the JSON format is presumed to be v1.
"v2": Output API introduced in TigerGraph platform v1.1. This is the latest API. (Note: for backward compatibility, TigerGraph platforms which support the v2 output api can be configured to produce either v1 or v2 output.)
schema
An integer representing which version of the user's graph schema is currently in use. When a CREATE GRAPH statement is executed, the version is initialized to 0. Each time a SCHEMA_CHANGE JOB is run, the schema value is incremented (e.g., 1, 2, etc.).
type | default value |
INT, UINT, FLOAT, DOUBLE (see note below) | 0 |
BOOL | false |
STRING | "" |
DATETIME | 1970-01-01 00:00:00 |
VERTEX | "Unknown" |
EDGE | No edge: {} |
JSONOBJECT | An empty object: {} |
JSONARRAY | An empty array: [] |
DDL | GSQL Query |
INT | INT |
UINT | UINT |
FLOAT | FLOAT |
DOUBLE | DOUBLE |
BOOL | BOOL |
STRING | STRING |
STRING COMPRESS | STRING |
SET< type > | SetAccum< type > |
LIST< type > | ListAccum< type > |
DATETIME | DATETIME |
parameter name | type | description |
filePath | string | The absolute file path of the input file to be read. A relative path is not supported. |
accumField1, ..., accumFieldN | $num, or $"column_name" if header is true | The column position(s) or column name(s) of the data file which supply data values to each field of the accumulator. |
separator | single-character string | The separator of columns. |
header | bool | Whether this file has a header. |
type pattern | accepted vertex/edge types
_ | any type |
ANY | any type |
name | the given vertex/edge type |
name | name ... | any of the vertex/edge types listed |
A comment is a section of text that is ignored by the language parser; its purpose is to provide information to human readers. The comment markers follow the conventions used in C++ and SQL:
Single-line or partial-line comments begin with either # or // and end at the end of the line (with the newline character).
Multi-line comment blocks begin with /* and end with */
EBNF term | Description | Use Case |
baseType | INT, UINT, FLOAT, DOUBLE, STRING, BOOL, DATETIME, VERTEX, EDGE, JSONOBJECT, or JSONARRAY | |
tupleType | sequence of baseType | |
accumType | family of specialized data objects which support accumulation operations | |
FILE | FILE object | |
parameterType | baseType, a SET or BAG of baseType, or FILE object | |
STRING COMPRESS | STRING COMPRESS | |
elementType | baseType, STRING COMPRESS, or identifier | |
type | baseType, STRING COMPRESS, identifier, or accumType | |
This section describes how the GSQL language responds to exceptions and supports user-defined exception handling. An exception is a run-time error. The GSQL language supports both built-in system exceptions and user-defined exceptions. Built-in exceptions include GSQL language exceptions (such as an out-of-range value, a wrong data type, or an illegal operation) and errors arising in other TigerGraph components or from the operating system.
The GSQL query language also supports user-defined exception responses, also known as exception handling. This section covers the following syntax for user-defined exception behavior:
When an exception occurs during the execution of a query, the default response is the following:
The query will not execute any more statements; it will exit.
If the query was run using the RUN QUERY command, then an error message will be displayed.
If the query was run by invoking the GET /query REST++ endpoint, then the output will be a simple JSON object. Some errors have an error "code" field; others do not:
The example below shows two common errors: wrong data type and divide-by-zero. First, we define a simple query that divides 100.0 by the query's input parameter.
We then test three cases:
A valid input (such as n1 = 7)
Wrong data type (n1 = "A")
Divide by zero (n1 = 0)
First, we test using the GSQL interface. When the query runs without error, the output is in JSON format. When there is a built-in exception, however, only an error message is displayed.
The situation is a little different when running the query as a REST++ endpoint. The output is always in JSON format.
As of TigerGraph v1.2, the format for the GET /query endpoint has changed. The graph name must now be specified after /query:
/query/{graph_name}/{query_name}
A query author can specify what should be the response if a particular type of exception occurs within a particular specified block of statements.
The following statement types are available to specify a user-defined exception condition or a user-defined exception response.
The EXCEPTION Declaration Statement names a user-defined exception.
The RAISE Statement indicates that one of the user-defined exceptions has occurred.
The TRY…EXCEPTION Statement is used to define and apply user-defined exception handling to a block of query-body statements. This can be used with or without preceding user-defined EXCEPTION and RAISE statements.
Built-in exceptions always take precedence over user-defined exceptions. Therefore, user-defined exceptions can only be used to catch conditions that would not be caught by a built-in exception. This means that user-defined exceptions are best used to capture situations which are legal according to the general syntax and semantics of the GSQL query language but which are illegal or undesirable for a particular user application.
To use a user-defined exception, it must first be declared. An exception declaration statement declares a user-defined exception type, assigning a name and identification number. The id number errorCode must be greater than 40,000. Numbers 40,000 and lower are reserved for system exceptions. Exception declaration statements must be placed before any query-body statements, after accumulator declaration statements. A query can declare multiple exception types.
The RAISE statement announces that a user-defined exception has just occurred. The exceptVarName must match one of the exceptions that was previously declared. An optional error message can be specified. Once the RAISE statement is executed, the flow of execution changes. If the RAISE statement is not within a TRY clause, then the query ends with the default exception response, using the error code and error message defined by the exception type and RAISE statements. If the RAISE is within a TRY statement, then execution jumps to the EXCEPTION handling clause of the TRY statement.
A RAISE statement itself does not include the conditions that define the exception. Typically, the user will use an IF…THEN statement and place the RAISE statement within the THEN clause.
In the current version, a RAISE statement can only be used as a query-body-statement. It cannot be used as a DML-sub-statement. In particular, you cannot RAISE an exception inside a SELECT statement.
The example below defines and checks for two types of exceptions: an empty input set (40001) and no matching edges (40002). Remember that the minimum allowed code number is 40001.
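As a language-neutral illustration of the error-code scheme, here is a Python model (not GSQL; the class and function names are invented for this sketch) of declaring an exception with a code above 40,000 and raising it when an input set is empty:

```python
# Model of a user-defined GSQL exception: each carries a name and a
# code; codes 40,000 and lower are reserved for system exceptions.
class GsqlUserException(Exception):
    def __init__(self, name, code, message=""):
        assert code > 40000, "codes <= 40000 are reserved for system exceptions"
        super().__init__(message)
        self.name, self.code = name, code

# Analogous to: EXCEPTION emptyInput (40001); then IF ... THEN RAISE
def check_input(vertex_ids):
    if not vertex_ids:
        raise GsqlUserException("emptyInput", 40001, "input vertex set is empty")
    return vertex_ids

try:
    check_input([])
except GsqlUserException as e:
    print(e.code)  # 40001
```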
The TRY…EXCEPTION Statement is used to define and apply user-defined exception handling to a block of query-body statements. A TRY...EXCEPTION statement can be nested within a TRY block or EXCEPTION block.
The current version of GSQL does not support custom handling of built-in exceptions. Therefore, if a built-in exception occurs, the query ignores any TRY...EXCEPTION blocks, applies the default handling, and aborts. In future updates, we plan to support custom handling of both user-defined (RAISE) and built-in exceptions with the TRY...EXCEPTION block.
The TRY…EXCEPTION Statement is a compound statement containing two blocks. The first block (TRY) consists of the query-body statements for which custom error handling should be applied. The second block (EXCEPTION) contains a series of WHEN…THEN exception handling clauses. Each exception handling clause names an exception type and specifies what actions to take in the event of the exception. An optional ELSE clause contains handling statements for all other exceptions. The following text and visual flowchart details how the TRY... EXCEPTION block handles an exception.
When an exception occurs within a TRY block, the flow of execution skips the remainder of the TRY block and jumps to the EXCEPTION block, where GSQL seeks to match the exception type with a handler. After executing the handling statements in the matching WHEN...THEN or ELSE clause, the flow skips the remainder of the EXCEPTION block and continues with the statement following the END statement. If there is no matching WHEN or ELSE handler, however, then the exception is propagated; that is, the raised exception remains active after exiting the EXCEPTION block. If the TRY...EXCEPTION block is nested inside another TRY block, then the handling process is repeated at this upper level, and so on, until either the exception is handled or there are no more enclosing TRY...EXCEPTION blocks.
Finally, if the unhandled exception is not within a TRY block, then the query is aborted, and the default exception response is the output.
Case 1: If cond1 is true in the outer TRY block,
RAISE A and jump to the outer EXCEPTION block.
Handled by the ELSE clause: handStmtsZ.
Case 2: If cond2 is true in the inner TRY block,
RAISE A and jump to the inner EXCEPTION block.
Handled by handStmtsX.
Case 3: If cond3 is true in the inner TRY block,
RAISE B and jump to the inner EXCEPTION block. There is no matching handler here, so propagate the exception. Jump to the outer EXCEPTION block. Handled by handStmtsY.
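The cases above can be modeled with nested try/except blocks in Python (a sketch of the control flow only; names mirror the case descriptions):

```python
# Model of TRY...EXCEPTION propagation: an inner EXCEPTION block that
# does not match the raised type lets the exception propagate outward.
class ExcA(Exception): pass
class ExcB(Exception): pass

def run(exc):
    try:                    # outer TRY
        try:                # inner TRY
            raise exc
        except ExcA:        # inner WHEN A THEN handStmtsX
            return "handStmtsX"
        # an ExcB raised above is not matched here, so it propagates
    except ExcB:            # outer WHEN B THEN handStmtsY
        return "handStmtsY"

print(run(ExcA()))  # handStmtsX  (Case 2)
print(run(ExcB()))  # handStmtsY  (Case 3)
```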
Custom Handling Example:
The following example is a modified shortest path query. It looks for all paths from a source to a target in a computer network. It uses breadth-first search and stops at depth N when it has found at least one path at depth N, or it has searched the entire graph. There are three conditions which will cause it to RAISE an exception and abort the search:
Seeing an edge with a negative connection speed (because the graph has bad data).
Seeing an edge with a very slow connection speed (again because the graph has bad data).
Finding no path in the graph (the search is already over, but we skip printing results).
Note that cases 1 and 2 do NOT mean that a negative or slow speed edge is actually on a shortest path, only that the query noticed a bad edge during its search. Also, because we cannot RAISE within the SELECT block, we use a workaround: set an integer variable with an error code. Immediately after the SELECT block, test the integer variable and RAISE exceptions as needed.
As the data in Appendix D show:
Any search passing through c1 will see negative edges.
Any search passing through c12 will see negative and slow edges.
Any search passing through c14 will see negative edges.
The results for 5 cases are shown: 1 valid search plus each of the 3 exception conditions. The 5th case is the same as the 4th, but exception handling is not enabled.
The flowchart below summarizes all the cases for triggering and handling exceptions, both user-defined and built-in.
Below is the listing of the graph create-and-load command files and data files used to generate the six example graphs used in this document: workNet, socialNet, friendNet, computerNet, minimalNet, and investmentNet. The tar-gzip file gsql_ref_examples_2.0.gz contains all of these files. Each graph has its own folder. To create a particular graph, go into its folder and run the following command: gsql graph_create.gsql
There is no loading job or data for minimalNet (hence, "minimal.")
Accumulators are special types of variables that accumulate information about the graph during its traversal and exploration. Because they are a unique and important feature of the GSQL query language, we devote a separate section to their introduction, but additional detail on their usage is covered in other sections, the "SELECT Statement" section in particular. This section covers the following subset of the EBNF language definitions:
There are a number of different types of accumulators, each providing specific accumulation functions. Accumulators are declared to have one of two types of association: global or vertex-attached.
More technically, accumulators are mutable mutex variables shared among all the graph computation threads exploring the graph within a given query. To improve performance, the graph processing engine employs multithreaded processing. Modification of accumulators is coordinated at run-time so the accumulation operator works correctly (i.e., mutually exclusively) across all threads. This is particularly relevant in the ACCUM clause. During traversal of the graph, the selected set of edges or vertices is partitioned among a group of threads. These threads have shared mutually exclusive access to the accumulators.
All accumulator variables must be declared at the beginning of a query, immediately after any typedefs, and before any other type of statement. The scope of the accumulator variables is the entire query.
The name of a vertex-attached accumulator begins with a single "@". The name of a global accumulator begins with "@@". Additionally, a global accumulator may be declared to be static.
Vertex-attached accumulators are mutable state variables that are attached to each vertex in the graph for the duration of the query's lifetime. They act as run-time attributes of a vertex. They are shared, mutually exclusively, among all of the query's processes. Vertex-attached accumulators can be set to a value with the = operator. Additionally, an accumulate operator += can be used to update the state of the accumulator; the function of += depends on the accumulator type. In the example below, there are two accumulators attached to each vertex. The initial value of an accumulator of a given type is predefined; however, it can be changed at declaration, as in the accumulator @weight below. All vertex-attached accumulator names have a single leading at-sign "@".
If there is a graph with 10 vertices, then there is an instance of @neighbors and @weight for each vertex (hence 10 of each, and 20 total accumulator instances). These are accessed via the dot operator on a vertex variable or a vertex alias (e.g., v.@neighbors). The accumulator operator += only affects the accumulator for the specific vertex being referenced. A statement such as v1.@neighbors += 1 will only affect v1's @neighbors and not the @neighbors of other vertices.
Vertex-attached accumulators can only be accessed or updated (via = or +=) in an ACCUM or POST-ACCUM clause within a SELECT block. The only exception to this rule is that vertex-attached accumulators can be referenced in a PRINT statement, as the PRINT has access to all information attached to a vertex set.
Edge-attached accumulators are not supported.
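The per-vertex behavior can be modeled outside GSQL: each vertex gets its own accumulator instance, and an update through one vertex leaves the others untouched. A Python sketch (the 0.5 initial value for @weight is an assumption for illustration, not from the source):

```python
# Model of vertex-attached accumulators: every vertex carries its own
# instance, so updating v1's accumulator leaves the others unchanged.
vertices = ["v1", "v2", "v3"]
neighbors = {v: 0 for v in vertices}    # like SumAccum<INT> @neighbors;
weight    = {v: 0.5 for v in vertices}  # like SumAccum<FLOAT> @weight = 0.5;

neighbors["v1"] += 1                    # v1.@neighbors += 1
print(neighbors)  # {'v1': 1, 'v2': 0, 'v3': 0}
```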
A global accumulator is a single mutable accumulator that can be accessed or updated within a query. The names of global accumulators start with a double at-sign "@@".
Global accumulators can only be assigned (using the = operator) outside a SELECT block (i.e., not within an ACCUM or POST-ACCUM clause). Global accumulators can be accessed or updated via the accumulate operator += anywhere within a query, including inside a SELECT block.
It is important to note that the accumulation operation for global accumulators in an ACCUM clause executes once for each process. That is, if the FROM clause uses an edge-induced selection (introduced in Section "SELECT Statement"), the ACCUM clause executes one process for each edge in the selected edge set. If the FROM clause uses a vertex-induced selection (introduced in Section "SELECT Statement"), the ACCUM clause executes one process for each vertex in the selected vertex set. Since global accumulators are shared in a mutually exclusive manner among processes, they behave very differently than a non-accumulator variable (see Section "Variable Types" for more details) in an ACCUM clause. Take the following code example. The global accumulator @@globalRelationshipCount is accumulated for every worksFor edge traversed, since it is shared among processes. Conversely, relationshipCount appears to have been incremented only once. This is because a non-accumulator variable is not shared among processes. Each process has its own separate, unshared copy of relationshipCount and increments the original value by one. (E.g., each process increments relationshipCount from 0 to 1.) There is no accumulation, and the final value is one.
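The contrast can be modeled in Python: treat each loop iteration as one ACCUM process that shares the global accumulator but receives its own private copy of the ordinary variable (the edge list is invented for this sketch):

```python
# Model: the ACCUM clause runs once per selected edge. A global
# accumulator is shared across those runs; an ordinary variable is
# copied into each run, so every run increments its own copy of 0.
edges = [("a", "b"), ("a", "c"), ("b", "c")]

global_count = 0          # like @@globalRelationshipCount (shared)
local_start = 0           # relationshipCount before the SELECT block

results = []
for _ in edges:           # one "process" per edge
    global_count += 1     # accumulates: shared among processes
    local_copy = local_start + 1   # private copy: 0 -> 1 every time
    results.append(local_copy)

print(global_count)  # 3
print(results[-1])   # 1  -- the non-accumulator ends at 1, not 3
```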
A static global accumulator retains its value after the execution of a query. To declare a static global accumulator, include the STATIC keyword at the beginning of the declaration statement. For example, if a static global accumulator is incremented by 1 each time a query is executed, then its value is equal to the number of times the query has been run, since the query was installed. Each static global accumulator belongs to the particular query in which it is declared; it cannot be shared among different queries. The value only persists in the context of running the same query multiple times. The value will reset to the default value when the GPE is restarted.
There is no command to deallocate a static global accumulator. If a static global accumulator is a collection accumulator and it is no longer needed, it should be cleared to minimize memory usage.
The following are the accumulator types currently supported. Each type of accumulator supports one or more data types.
The accumulators fall into two major groups:
Scalar Accumulators store a single value:
SumAccum
MinAccum, MaxAccum
AvgAccum
AndAccum, OrAccum
BitwiseAndAccum, BitwiseOrAccum
Collection Accumulators store a set of values:
ListAccum
SetAccum
BagAccum
MapAccum
ArrayAccum
HeapAccum
GroupByAccum
The details of each accumulator type are summarized in the table below. The Accumulation Operation column explains how the accumulator accumName is updated when the statement accumName += newVal is executed. Following the table are example queries for each accumulator type.
Table Ac1: Accumulator Types and Their Accumulation Behavior
The SumAccum type computes and stores the cumulative sum of numeric values or the cumulative concatenation of text values. The output of a SumAccum is a single numeric or string value. SumAccum variables operate on values of type INT, UINT, FLOAT, DOUBLE, or STRING only.
The += operator updates the accumulator's state. For the numeric types, += arg performs a numeric addition; for the STRING value type, += arg concatenates arg to the current value of the SumAccum.
The MinAccum and MaxAccum types calculate and store the cumulative minimum or the cumulative maximum of a series of values. The output of a MinAccum or a MaxAccum is a single value. MinAccum and MaxAccum variables operate on values of type INT, UINT, FLOAT, DOUBLE, and VERTEX (optionally with a specific vertex type) only.
For MinAccum, += arg compares the current value with arg and stores the smaller of the two. MaxAccum behaves the same, except that it stores the greater of the two.
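A minimal Python model of this behavior (not GSQL; MinAccum is the same sketch with min in place of max):

```python
# Model of MaxAccum: += keeps the greater of the stored value and the
# new argument.
class MaxAccum:
    def __init__(self, start=float("-inf")):
        self.value = start
    def accum(self, arg):            # models the += operator
        self.value = max(self.value, arg)

mx = MaxAccum()
for x in (3, 9, 4):
    mx.accum(x)
print(mx.value)  # 9
```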
MinAccum and MaxAccum operating on the VERTEX type have a special comparison. They do not compare vertex ids but TigerGraph internal ids, which might not be in the same order as the external ids. Comparing internal ids is much faster, so MinAccum/MaxAccum<VERTEX> provide an efficient way to compare and select vertices. This is helpful for some graph algorithms that require the vertices to be numbered and sortable. For example, the following query returns one post from each person. The returned vertex is not necessarily the vertex with the alphabetically largest id.
The AvgAccum type calculates and stores the cumulative mean of a series of numeric values. Internally, its state information includes the sum of all inputs and a count of how many input values it has accumulated. The output is the mean value; the sum and the count values are not accessible to the user. The data type of an AvgAccum variable is not declared; all AvgAccum accumulators accept inputs of type INT, UINT, FLOAT, and DOUBLE. The output is always of type DOUBLE.
The += arg operation updates the AvgAccum variable's state to be the mean of all the previous arguments along with the current argument. The = arg operation clears all the previously accumulated state and sets the new state to be arg with a count of one.
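The hidden sum-and-count state can be modeled in Python (a sketch, not GSQL; the `value` property stands in for reading the accumulator):

```python
# Model of AvgAccum: internally keeps a running sum and count; the
# readable output is always their double-precision ratio.
class AvgAccum:
    def __init__(self):
        self.total, self.count = 0.0, 0   # hidden state
    def accum(self, arg):                 # models += arg
        self.total += arg
        self.count += 1
    def assign(self, arg):                # models = arg: reset to one sample
        self.total, self.count = float(arg), 1
    @property
    def value(self):
        return self.total / self.count if self.count else 0.0

a = AvgAccum()
for x in (1, 2, 6):
    a.accum(x)
print(a.value)  # 3.0
a.assign(10)
print(a.value)  # 10.0
```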
The AndAccum and OrAccum types calculate and store the cumulative result of a series of boolean operations. The output of an AndAccum or an OrAccum is a single boolean value (True or False). AndAccum and OrAccum variables operate on boolean values only. The data type does not need to be declared.
For AndAccum, += arg updates the state to be the logical AND between the current boolean state and arg . OrAccum behaves the same, with the exception that it stores the result of a logical OR operation.
The BitwiseAndAccum and BitwiseOrAccum types calculate and store the cumulative result of a series of bitwise boolean operations, storing the resulting bit sequences. BitwiseAndAccum and BitwiseOrAccum operate on INT only. The data type does not need to be declared.
Fundamental to understanding and using bitwise operations is the knowledge that integers are stored in base-2 representation as a 64-bit sequence of 1s and 0s. "Bitwise" means that each bit is treated as a separate boolean value, with 1 representing true and 0 representing false. Hence, an integer is equivalent to a sequence of boolean values. Computing the bitwise AND of two numbers A and B means computing the bit sequence C where the j-th bit of C, denoted C_j, is equal to (A_j AND B_j).
For BitwiseAndAccum, += arg updates the accumulator's state to be the bitwise AND of the current state and arg. BitwiseOrAccum behaves the same, except that it computes a bitwise OR.
Bitwise Operations and Negative Integers
Most computer systems represent negative integers using "2's complement" format, where the uppermost bit has special significance. Bitwise operations which change the uppermost bit may therefore cross the boundary between positive and negative numbers.
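A Python model of the two bitwise accumulators (not GSQL), using their identity values (-1 for AND, 0 for OR) as the default initial state, matching the defaults listed in Table Ac1:

```python
# Model of BitwiseAndAccum / BitwiseOrAccum over integers.
# The AND identity is -1 (all bits set, in 2's complement); the OR
# identity is 0 (all bits clear).
from functools import reduce

def bitwise_and_accum(values):
    return reduce(lambda acc, v: acc & v, values, -1)

def bitwise_or_accum(values):
    return reduce(lambda acc, v: acc | v, values, 0)

print(bitwise_and_accum([0b1110, 0b0111]))  # 6  (0b0110)
print(bitwise_or_accum([0b1000, 0b0001]))   # 9  (0b1001)
print(bitwise_and_accum([-1, 6]))           # 6: -1 is a sequence of 1 bits
```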
The ListAccum type maintains a sequential collection of elements. The output of a ListAccum is a list of values in the order the elements were added. The element type can be any base type, tuple, or STRING COMPRESS. Additionally, a ListAccum can contain a nested collection of type ListAccum. Nesting of ListAccums is limited to a depth of three.
The += arg operation appends arg to the end of the list; arg may be either a single element or another ListAccum.
ListAccum supports two additional operations:
@list1 + @list2 creates a new ListAccum, which contains the elements of @list1 followed by the elements of @list2. The two ListAccums must have identical data types.
Change in "+" definition
The pre-v2.0 definition of the ListAccum "+" operator ( @list + arg : Add arg to each member of @list) is no longer supported.
@list1 * @list2 (STRING data only) generates a new list of strings consisting of all permutations of an element of the first list followed by an element of the second list.
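A Python model of the two list operators (not GSQL; plain functions standing in for + and *):

```python
# Model of ListAccum "+" and "*": "+" concatenates two lists; "*"
# (string lists only) pairs every element of the first list with every
# element of the second, first-list order outermost.
def list_plus(l1, l2):
    return l1 + l2

def list_times(l1, l2):
    return [a + b for a in l1 for b in l2]

print(list_plus([1, 2], [3]))              # [1, 2, 3]
print(list_times(["a", "b"], ["x", "y"]))  # ['ax', 'ay', 'bx', 'by']
```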
ListAccum also supports the following class functions.
Functions which modify the ListAccum (mutator functions) can be used only under the following conditions:
Mutator functions of global accumulators may only be used at the query-body level.
Mutator functions of vertex-attached accumulators may only be used in a POST-ACCUM clause.
The SetAccum type maintains a collection of unique elements. The output of a SetAccum is a list of elements in arbitrary order. A SetAccum instance can contain values of one type. The element type can be any base type, tuple, or STRING COMPRESS.
For SetAccum, the += arg operation adds a non-duplicate element or set of elements to the set. If an element is already represented in the set, then the SetAccum state does not change.
SetAccum also can be used with the three canonical set operators: UNION, INTERSECT, and MINUS (see Section "Set/Bag Expression and Operators" for more details).
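Python's built-in set models SetAccum closely (a sketch, not GSQL); the operators |, &, and - stand in for UNION, INTERSECT, and MINUS:

```python
# Model of SetAccum: += inserts an element (or a whole set); duplicates
# are ignored, so inserting an existing element changes nothing.
s = {1, 2}
s |= {2, 3}          # += a set: 2 is already present, only 3 is added
s.add(1)             # += a duplicate element: no change
print(sorted(s))     # [1, 2, 3]

print(sorted({1, 2} | {2, 3}))   # UNION      -> [1, 2, 3]
print(sorted({1, 2} & {2, 3}))   # INTERSECT  -> [2]
print(sorted({1, 2} - {2, 3}))   # MINUS      -> [1]
```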
SetAccum also supports the following class functions.
Functions which modify the SetAccum (mutator functions) can be used only under the following conditions:
Mutator functions of global accumulators may only be used at the query-body level.
Mutator functions of vertex-attached accumulators may only be used in a POST-ACCUM clause.
The BagAccum type maintains a collection of elements with duplicated elements allowed. The output of a BagAccum is a list of elements in arbitrary order. A BagAccum instance can contain values of one type. The element type can be any base type, tuple, or STRING COMPRESS.
For BagAccum, the += arg operation adds an element or bag of elements to the bag.
BagAccum also supports the + operator:
@bag1 + @bag2 creates a new BagAccum, which contains the elements of @bag1 and the elements of @bag2. The two BagAccums must have identical data types.
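In Python, collections.Counter models a bag (multiset) and its "+" merge (a sketch, not GSQL):

```python
# Model of BagAccum: a multiset, so duplicates are kept. Counter tracks
# how many times each element occurs; "+" merges two bags.
from collections import Counter

bag1 = Counter(["a", "b"])
bag1.update(["b"])                 # += an element: now two "b"s
bag2 = Counter(["b", "c"])
merged = bag1 + bag2               # models @bag1 + @bag2
print(sorted(merged.elements()))   # ['a', 'b', 'b', 'b', 'c']
```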
BagAccum also supports the following class functions.
Functions which modify the BagAccum (mutator functions) can be used only under the following conditions:
Mutator functions of global accumulators may only be used at the query-body level.
Mutator functions of vertex-attached accumulators may only be used in a POST-ACCUM clause.
The MapAccum type maintains a collection of (key -> value) pairs. The output of a MapAccum is a set of key and value pairs in which the keys are unique.
The key type of a MapAccum can be all base types or tuples. If the key type is VERTEX, then only the vertex's id is stored and displayed.
The value type of a MapAccum can be all base types, tuples, or any type of accumulator, except for HeapAccum.
For MapAccum, the += (key -> val) operation adds a key-value pair to the collection if key is not yet used in the MapAccum. If the MapAccum already contains key, then val is accumulated to the current value, where the accumulation operation depends on the data type of val. (Strings are concatenated, lists are appended, numeric values are added, etc.)
MapAccum also supports the + operator:
@map1 + @map2 creates a new MapAccum, which contains the (key,value) pairs of @map2 added to the (key,value) pairs of @map1. The two MapAccums must have identical data types.
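A Python model of the MapAccum merge rule (not GSQL), using numeric addition as the per-value accumulation; other value types would concatenate or append instead:

```python
# Model of MapAccum: inserting an existing key accumulates into the
# stored value; "+" merges two maps the same way, key by key.
def map_accum(m, key, val):
    m[key] = m.get(key, 0) + val   # numeric values add
    return m

def map_plus(m1, m2):              # models @map1 + @map2
    out = dict(m1)
    for k, v in m2.items():
        map_accum(out, k, v)
    return out

m = {}
map_accum(m, "bonus", 100)
map_accum(m, "bonus", 50)          # key exists: values accumulate
print(m)                           # {'bonus': 150}
print(map_plus({"a": 1}, {"a": 2, "b": 3}))  # {'a': 3, 'b': 3}
```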
MapAccum also supports the following class functions.
Functions which modify the MapAccum (mutator functions) can be used only under the following conditions:
Mutator functions of global accumulators may only be used at the query-body level.
Mutator functions of vertex-attached accumulators may only be used in a POST-ACCUM clause.
The ArrayAccum type maintains an array of accumulators. An array is a fixed-length sequence of elements, with direct access to elements by position. The ArrayAccum has these particular characteristics:
The elements are accumulators, not primitive or base data types. All accumulators, except HeapAccum, MapAccum, and GroupByAccum, can be used.
An ArrayAccum instance can be multidimensional. There is no limit to the number of dimensions.
The size can be set at run-time (dynamically).
There are operators which update the entire array efficiently.
When an ArrayAccum is declared, the instance name should be followed by a pair of brackets for each dimension. The brackets may either contain an integer constant to set the size of the array, or they may be empty. In that case, the size must be set with the reallocate function before the ArrayAccum can be used.
Because each element of an ArrayAccum itself is an accumulator, the operators =, +=, and + can be used in two contexts: accumulator-level and element-level.
If @A is an ArrayAccum of length 6, then @A[0] and @A[5] refer to its first and last elements, respectively. Referring to an ArrayAccum element is like referring to an accumulator of that type. For example, given the following definitions:
then @@Sums[0], @@Sums[1], and @@Sums[2] each refer to an individual SumAccum<INT>, and @@Lists[0] and @@Lists[1] each refer to a ListAccum<STRING>, supporting all the operations for those accumulator and data types.
The operators =, +=, and + have special meanings when applied to an ArrayAccum as a whole. These operations efficiently update an entire ArrayAccum. All of the ArrayAccums involved must have the same element type.
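The two levels of += can be modeled in Python (a sketch, not GSQL; elements are shown as plain ints whose addition stands in for SumAccum accumulation):

```python
# Model of ArrayAccum: accumulator-level += merges two arrays element
# by element; element-level += touches a single position only.
a = [1, 2, 3]        # like ArrayAccum<SumAccum<INT>> @@A[3]
b = [10, 20, 30]

a = [x + y for x, y in zip(a, b)]  # whole-array +=: per-position sums
print(a)             # [11, 22, 33]

a[0] += 5            # element-level: like @@A[0] += 5
print(a)             # [16, 22, 33]
```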
ArrayAccum also supports the following class functions.
Functions which modify the ArrayAccum (mutator functions) can be used only under the following conditions:
Mutator functions of global accumulators may only be used at the query-body level.
Mutator functions of vertex-attached accumulators may only be used in a POST-ACCUM clause.
The HeapAccum type maintains a sorted collection of tuples and enforces a maximum number of tuples in the collection. The output of a HeapAccum is a sorted collection of tuple elements. The += arg operation adds a tuple to the collection in sorted order. If the HeapAccum is already at maximum capacity when the += operator is applied, then the tuple which is last in the sorted order is dropped from the HeapAccum. Sorting of tuples is performed on one or more defined tuple fields ordered either ascending or descending. Sorting precedence is performed based on defined tuple fields from left to right.
The declaration of a HeapAccum is more complex than for most other accumulators, because the user must define a custom tuple type, set the maximum capacity of the HeapAccum, and specify how the HeapAccum should be sorted. The declaration syntax is outlined in the figure below:
First, the HeapAccum declaration must be preceded by a TYPEDEF statement which defines the tuple type. At least one of the fields (field_1, ..., field_n) must be of a data type that can be sorted.
In the declaration of the HeapAccum itself, the keyword "HeapAccum" is followed by the tuple type in angle brackets < >. This is followed by a parenthesized list of two or more parameters. The first parameter is the maximum number of tuples that the HeapAccum may store. This parameter must be a positive integer. The subsequent parameters are a subset of the tuple's fields, which are used as sort keys. The sort key hierarchy is from left to right, with the leftmost key being the primary sort key. The keywords ASC and DESC indicate ascending (lowest value first) or descending (highest value first) sort order. Ascending order is the default.
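The capacity-and-sort behavior can be modeled in Python (a sketch, not GSQL; the tuple fields name/score, the capacity 2, and the DESC sort key are all invented for this illustration):

```python
# Model of a HeapAccum with capacity 2, sorted by score descending:
# inserting past capacity drops the tuple that sorts last.
class HeapAccum:
    def __init__(self, capacity, key):
        self.capacity, self.key, self.items = capacity, key, []
    def accum(self, tup):                  # models += tup
        self.items.append(tup)
        self.items.sort(key=self.key)      # keep in sorted order
        if len(self.items) > self.capacity:
            self.items.pop()               # drop the last-sorted tuple

h = HeapAccum(2, key=lambda t: -t[1])      # primary key: score, DESC
for name, score in [("tic", 70), ("tac", 95), ("toe", 80)]:
    h.accum((name, score))
print(h.items)  # [('tac', 95), ('toe', 80)]
```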
HeapAccum also supports the following class functions.
Functions which modify the HeapAccum (mutator functions) can be used only under the following conditions:
Mutator functions of global accumulators may only be used at the query-body level.
Mutator functions of vertex-attached accumulators may only be used in a POST-ACCUM clause.
The GroupByAccum is a compound accumulator: an accumulator of accumulators. At the top level, it is a MapAccum where both the key and the value can have multiple fields. Moreover, each of the value fields is an accumulator type.
In the EBNF above, the type terms form the key set, and the accumType terms form the map's value. Since they are accumulators, they perform a grouping. Like a MapAccum, if we try to store a (key->value) whose key has already been used, then the new value will accumulate to the data which is already stored. In this case, each field of the multiple-field value has its own accumulation function. One way to think about GroupByAccum is that each unique key is a group ID.
In GroupByAccum, the key types can be base type, tuple, or STRING COMPRESS. The accumulators are used for aggregating group values. Each accumulator type can be any type except HeapAccum. Each base type and each accumulator type must be followed by an alias. Below is an example declaration.
To add new data to this GroupByAccum, the data should be formatted as (key1, key2 -> value1, value2) .
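A Python model of a GroupByAccum with a two-field key and two accumulator value fields (not GSQL; a MaxAccum-like max and a ListAccum-like list, with all field names and data invented for this sketch):

```python
# Model of GroupByAccum: each unique key tuple is a group; each value
# field accumulates with its own rule (here: max, list-append).
groups = {}

def group_accum(key, max_arg, name_arg):   # models += (k1, k2 -> v1, v2)
    maxval, names = groups.get(key, (float("-inf"), []))
    groups[key] = (max(maxval, max_arg), names + [name_arg])

group_accum(("us", "ny"), 3, "ann")
group_accum(("us", "ny"), 7, "bob")        # same key: fields accumulate
group_accum(("ca", "on"), 5, "cam")
print(groups[("us", "ny")])  # (7, ['ann', 'bob'])
```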
GroupByAccum also supports the following class functions.
Functions which modify the GroupByAccum (mutator functions) can be used only under the following conditions:
Mutator functions of global accumulators may only be used at the query-body level.
Mutator functions of vertex-attached accumulators may only be used in a POST-ACCUM clause.
Certain collection accumulators may be nested. That is, an accumulator may contain a collection of elements where the elements themselves are accumulators. For example:
Only ListAccum, ArrayAccum, MapAccum, and GroupByAccum can contain other accumulators. However, not all combinations of collection accumulators are allowed. The following constraints apply:
1. ListAccum: ListAccum is the only accumulator type which can be nested within ListAccum, up to a depth of 3:
2. MapAccum: All accumulator types, except for HeapAccum, can be nested within MapAccum as the value type. For example,
3. GroupByAccum: All accumulator types, except for HeapAccum, can be nested within GroupByAccum as the accumulator type. For example:
4. ArrayAccum: Unlike the other accumulators in this list, where nesting is optional, nesting is mandatory for ArrayAccum. See the ArrayAccum section above.
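The four constraints above can be sketched as declarations (all names are illustrative):

```
// 1. ListAccum within ListAccum, up to depth 3:
ListAccum<ListAccum<ListAccum<INT>>> @@depth3;

// 2. MapAccum with an accumulator (other than HeapAccum) as its value type:
MapAccum<STRING, SumAccum<INT>> @@wordCount;

// 3. GroupByAccum whose aggregated field is itself a collection accumulator:
GroupByAccum<STRING dept, ListAccum<STRING> names> @@byDept;

// 4. ArrayAccum, whose element type must always be an accumulator:
ArrayAccum<SumAccum<DOUBLE>> @@weights[10];
```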
It is legal to define nested ListAccums to form a multi-dimensional array. Note the declaration statements and the nested [ bracket ] notation in the example below:
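A sketch of a 2-dimensional list built this way (accumulator names are illustrative):

```
ListAccum<ListAccum<INT>> @@grid;   // a list of lists of INT

// Each += with [ bracket ] notation appends one inner list:
@@grid += [1, 2, 3];
@@grid += [4, 5, 6];
// @@grid is now [[1, 2, 3], [4, 5, 6]];
// @@grid.get(1).get(2) retrieves the element 6.
```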
Accumulators accumulate and aggregate data when traversing a set of vertices or edges (details are given earlier in this chapter). The table below summarizes the accumulator types.
| Accumulator Type (case sensitive) | Default Initial Value | Accumulation Operation (result of accumName += newVal) |
| --- | --- | --- |
| SumAccum<INT> | 0 | accumName plus newVal |
| SumAccum<FLOAT or DOUBLE> | 0.0 | accumName plus newVal |
| SumAccum<STRING> | empty string | String concatenation of accumName and newVal |
| MaxAccum<INT> | INT_MIN | The greater of newVal and accumName |
| MaxAccum<FLOAT or DOUBLE> | FLOAT_MIN or DOUBLE_MIN | The greater of newVal and accumName |
| MaxAccum<VERTEX> | the vertex with internal id 0 | The vertex with the greater internal id, either newVal or accumName |
| MinAccum<INT> | INT_MAX | The lesser of newVal and accumName |
| MinAccum<FLOAT or DOUBLE> | FLOAT_MAX or DOUBLE_MAX | The lesser of newVal and accumName |
| MinAccum<VERTEX> | unknown | The vertex with the lesser internal id, either newVal or accumName |
| AvgAccum | 0.0 (double precision) | Double-precision average of newVal and all values previously accumulated to accumName |
| AndAccum | True | Boolean AND of newVal and accumName |
| OrAccum | False | Boolean OR of newVal and accumName |
| BitwiseAndAccum | -1 (INT) = 64-bit sequence of 1s | Bitwise AND of newVal and accumName |
| BitwiseOrAccum | 0 (INT) = 64-bit sequence of 0s | Bitwise OR of newVal and accumName |
| ListAccum<type> (ordered collection of elements) | empty list | List with newVal appended to the end of accumName. newVal can be a single value or a list. If accumName is [2, 4, 6], then accumName += 4 produces accumName equal to [2, 4, 6, 4] |
| SetAccum<type> (unordered collection of elements; duplicate items not allowed) | empty set | Set union of newVal and accumName. newVal can be a single value or a set/bag. If accumName is (2, 4, 6), then accumName += 4 produces accumName equal to (2, 4, 6) |
| BagAccum<type> (unordered collection of elements; duplicate items allowed) | empty bag | Bag union of newVal and accumName. newVal can be a single value or a set/bag. If accumName is (2, 4, 6), then accumName += 4 produces accumName equal to (2, 4, 4, 6) |
| MapAccum<type, type> (unordered collection of (key, value) pairs) | empty map | Adds or updates a key:value pair in the accumName map. If accumName is [("red",3), ("green",4), ("blue",2)], then accumName += ("black" -> 5) produces accumName equal to [("red",3), ("green",4), ("blue",2), ("black",5)] |
| ArrayAccum<accumType> | empty list | See the ArrayAccum section below for details. |
| HeapAccum<tuple>(heapSize, sortKey [, sortKey_i]*) (sorted collection of tuples) | empty heap | Inserts newVal into the accumName heap, maintaining the heap in sorted order according to the sortKey(s) and size limit declared for this HeapAccum |
| GroupByAccum<type [, type]*, accumType [, accumType]*> | empty group-by map | Adds or updates a key:value pair in accumName. See the "GroupByAccum" section for more details. |
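A brief sketch of these += semantics for a few of the types above (accumulator names are illustrative):

```
SumAccum<INT> @@total;
MaxAccum<DOUBLE> @@highest;
ListAccum<INT> @@seq;

@@total += 5;
@@total += 7;        // @@total is now 12
@@highest += 3.5;
@@highest += 2.0;    // @@highest remains 3.5
@@seq += 2;
@@seq += [4, 6];     // @@seq is now [2, 4, 6]
```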
| Function (T is the element type) | Return Type | Accessor / Mutator | Description |
| --- | --- | --- | --- |
| size() | INT | Accessor | Returns the number of elements in the list. |
| contains( T val ) | BOOL | Accessor | Returns true if the list contains val; false otherwise. |
| get( INT idx ) | T | Accessor | Returns the value at the given index position in the list. Indexing begins at 0. If the index is out of bounds (including any negative value), the default value of the element type is returned. |
| clear() | VOID | Mutator | Clears the list so it becomes empty with size 0. |
| update( INT index, T value ) | VOID | Mutator | Assigns value to the list element at position index. |
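A sketch of these ListAccum functions in use (names are illustrative):

```
ListAccum<INT> @@nums;
@@nums += [10, 20, 30];
@@nums.update(1, 25);    // @@nums becomes [10, 25, 30]
PRINT @@nums.size();     // 3
PRINT @@nums.get(0);     // 10 (index is 0-based)
PRINT @@nums.get(99);    // out of bounds: default INT value, 0
```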
| Function (T is the element type) | Return Type | Accessor / Mutator | Description |
| --- | --- | --- | --- |
| size() | INT | Accessor | Returns the number of elements in the set. |
| contains( T value ) | BOOL | Accessor | Returns true if the set contains value; false otherwise. |
| remove( T value ) | VOID | Mutator | Removes value from the set. |
| clear() | VOID | Mutator | Clears the set so it becomes empty with size 0. |
| Function (T is the element type) | Return Type | Accessor / Mutator | Description |
| --- | --- | --- | --- |
| size() | INT | Accessor | Returns the number of elements in the bag. |
| contains( T value ) | BOOL | Accessor | Returns true if the bag contains value; false otherwise. |
| clear() | VOID | Mutator | Clears the bag so it becomes empty with size 0. |
| remove( T value ) | VOID | Mutator | Removes one instance of value from the bag. |
| removeAll( T value ) | VOID | Mutator | Removes all instances of value from the bag. |
| Function (KEY is the key type) | Return Type | Accessor / Mutator | Description |
| --- | --- | --- | --- |
| size() | INT | Accessor | Returns the number of elements in the map. |
| containsKey( KEY key ) | BOOL | Accessor | Returns true if the map contains key; false otherwise. |
| get( KEY key ) | value type | Accessor | Returns the value which the map associates with key. If the map does not contain key, the return value is undefined. |
| clear() | VOID | Mutator | Clears the map so it becomes empty with size 0. |
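A sketch of these MapAccum functions in use (names are illustrative):

```
MapAccum<STRING, SumAccum<INT>> @@counts;
@@counts += ("red" -> 3);
@@counts += ("red" -> 2);      // value accumulates: "red" -> 5
IF @@counts.containsKey("red") THEN
  PRINT @@counts.get("red");   // 5
END;
```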
| Operator | Description | Example |
| --- | --- | --- |
| = | Sets the ArrayAccum on the left equal to the ArrayAccum on the right. The two ArrayAccums must have the same element type, but the left-side ArrayAccum will change its size and dimensions to match the one on the right side. | @A = @B; |
| + | Performs element-by-element addition of two ArrayAccums of the same type and size. The result is a new ArrayAccum of the same size. | @C = @A + @B; // @A and @B must be the same size |
| += | Performs element-by-element accumulation (+=) from the right-side ArrayAccum to the left-side ArrayAccum. They must be the same type and size. | @A += @B; // @A and @B must be the same size |
| Function | Return Type | Accessor / Mutator | Description |
| --- | --- | --- | --- |
| size() | INT | Accessor | Returns the total number of elements in the (multi-dimensional) array. For example, the size of an ArrayAccum declared as @A[3][4] is 12. |
| reallocate( INT, ... ) | VOID | Mutator | Discards the previous ArrayAccum instance and creates a new ArrayAccum with the size(s) given. An N-dimensional ArrayAccum requires N integer parameters. The reallocate function cannot be used to change the number of dimensions. |
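A sketch of an ArrayAccum declaration and reallocation (names are illustrative):

```
ArrayAccum<SumAccum<INT>> @@cells[2][3];  // 2 x 3, so @@cells.size() is 6
@@cells[0][1] += 10;
@@cells.reallocate(4, 5);  // now 4 x 5; previous contents are discarded
```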
| Function | Return Type | Accessor / Mutator | Description |
| --- | --- | --- | --- |
| size() | INT | Accessor | Returns the number of elements in the heap. |
| top() | tupleType | Accessor | Returns the top tuple. If the heap is empty, returns a tuple with each element equal to the default value. |
| pop() | tupleType | Mutator | Returns the top tuple and removes it from the heap. If the heap is empty, returns a tuple with each element equal to the default value. |
| resize( INT ) | VOID | Mutator | Changes the maximum capacity of the heap. |
| clear() | VOID | Mutator | Clears the heap so it becomes empty with size 0. |
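A sketch of a bounded HeapAccum (the tuple type and names are illustrative):

```
TYPEDEF TUPLE<STRING name, INT score> Entry;
HeapAccum<Entry>(2, score DESC) @@top2;  // capacity 2, highest scores first

@@top2 += Entry("a", 10);
@@top2 += Entry("b", 30);
@@top2 += Entry("c", 20);  // ("a", 10) falls out due to the size limit
PRINT @@top2.top();        // the highest-scoring tuple, ("b", 30)
@@top2.resize(5);          // the heap can now hold up to 5 tuples
```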
| Function (KEY1..KEYn are the key types) | Return Type | Accessor / Mutator | Description |
| --- | --- | --- | --- |
| size() | INT | Accessor | Returns the number of groups in the accumulator. |
| get( KEY1 key_value1, KEY2 key_value2, ... ) | element type(s) of the accumulator(s) | Accessor | Returns the values from each accumulator in the group associated with the given key(s). If the key(s) do not exist, returns the default value(s) of the accumulator type(s). |
| containsKey( KEY1 key_value1, KEY2 key_value2, ... ) | BOOL | Accessor | Returns true if the accumulator contains the given key(s); false otherwise. |
| clear() | VOID | Mutator | Clears the accumulator so it becomes empty with size 0. |
| remove( KEY1 key_value1, KEY2 key_value2, ... ) | VOID | Mutator | Removes the group associated with the given key(s). |
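A sketch of these GroupByAccum functions, assuming a hypothetical accumulator declared as GroupByAccum<INT age, STRING gender, SumAccum<INT> cnt> @@grp (field access via the alias cnt is an assumption for illustration):

```
IF @@grp.containsKey(30, "F") THEN
  PRINT @@grp.get(30, "F").cnt;  // the cnt value for group (30, "F")
END;
@@grp.remove(30, "F");           // deletes that whole group
```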