# Operators, Functions, and Expressions

An expression is a combination of fixed values, variables, operators, function calls, and groupings which specify a computation, resulting in a data value. This section of the specification describes the literals (fixed values), operators, and functions available in the GSQL query language. It covers the subset of the EBNF definitions shown below. However, more so than in other sections of the specification, syntax alone is not an adequate description. The semantics (functionality) of the particular operators and functions are an essential complement to the syntax.

# Constants

Each primitive data type supports constant values:

GSL_UINT_MAX = 2 ^ 64 - 1 = 18446744073709551615

GSQL_INT_MAX = 2 ^ 63 - 1 = 9223372036854775807

GSQL_INT_MIN = -2 ^ 63 = -9223372036854775808

# Operators

An operator is a keyword token which performs a specific computational function to return a resulting value, using the adjacent expressions (its operands) as input values. An operator is similar to a function in that both compute a result from inputs, but syntactically they are different. The most familiar operators are the mathematical operators for addition + and subtraction - .

## Mathematical Operators and Expressions

We support the following standard mathematical operators and meanings. The latter four ("<<" | ">>" | "&" | "|") are for bitwise operations. See the section below: "Bit Operators".

Operator precedences are shown in the following list, from highest precedence to the lowest. Operators that are shown together on a line have the same precedence:

## Boolean Operators

We support the standard Boolean operators and standard order of precedence: AND, OR, NOT

## Bit Operators

Bit operators (<<, >>, &, and |) operate on integers and return an integer.

## String Operators

Operator + can be used for concatenating strings.

## Tuple Fields

The fields of the tuple can be accessed using the dot operator.

# Comparison Operators and Conditions

A condition is an expression which evaluates to a boolean value of either true or false. One type of condition uses the familiar comparison operators. A comparison operator compares two numeric or string values.

Strings are compared based on standard lexicographical ordering: (space) < (digit) < (uppercase_letter) < (lowercase_letter).

## BETWEEN expr AND expr

The expression expr1 BETWEEN expr2 AND expr3 is true if the value expr1 is in the range from expr2 to expr3, including the endpoint values. Each expression must be numeric.

" expr1 BETWEEN expr2 AND expr3 " is equivalent to " expr1 <= expr3 AND expr1 >= expr2".

## IS NULL, IS NOT NULL

IS NULL and IS NOT NULL can be used for checking whether an optional parameter is given any value.

## LIKE

The LIKE operator is used for string pattern matching. The expression

string1 LIKE string_pattern

evaluates to boolean true if string1 matches the pattern in string_pattern ; otherwise it is false. Both operands must be strings. LIKE may be used only in WHERE clauses. Additionally, string_patternsupports the following wildcard and other symbols, in order to express a pattern:

# Mathematical Functions

There are a number of built-in functions which act on either an accumulator, a base type, or vertex variable. The accumulator function calls are discussed in detail in the "Accumulators" section.

Below is a list of built-in functions which act on either INT, FLOAT, or DOUBLE value(s).

# String Functions

The following built-in functions are provided for text processing. Note that these functions do not modify the input parameter. They each return a new string.

• In the syntax for trim(), the words in bold ( LEADING, TRAILING, BOTH, and FROM ) are keywords which should appear exactly as shown.

• STRING is just an indicator of the datatype; it is not an explicit keyword.

• The trim() function have the following options:

• By using one of the keywords LEADING, TRAILING, or BOTH, the user can specify that characters are to be removed from the left end, right end, or both ends of the string, respectively. If none of these keywords is used, the function will removed from both ends.

• removal_char is a single character. The function will remove consecutive instances of removal_char , until it encounters a different character. If removal_char is not specified, then trim() removes whitespace (spaces, tabs, and newlines).

# Datetime Functions

The following functions convert from/to DATETIME to/from other types.

The following function converts a DATETIME value into a string format specified by the user:

The followings are other functions related to DATETIME :

# JSONOBJECT and JSONARRAY Functions

JSONOBJECT and JSONARRAY are base types, meaning they can be used as a parameter type, an element type for most accumulators, or a return type. This enables the input and output of complex, customized data structures. For input and output, a string representation of the JSON is used. Hence, the GSQL query language offers several functions to convert a formatted string into JSON and then to search and access the components of a JSON structure.

Data Conversion Functions

The following parsing functions convert a string into a JSONOBJECT or a JSONARRAY:

Both functions generate a run-time error if the input string cannot be converted into a JSON object or a JSON array. To be properly formatted, besides having the proper nesting and matching of curly braces { } and brackets [ ], each value field must be one of the following: a string (in double quotes "), a number, a boolean ( true or false ), or a JSONOBJECT or JSONARRAY. Each key of a key:value pair must be a string in double quotes.

See examples below.

Data Access Methods

JSONOBJECT and JSONARRAY are object classes, each class supporting a set of data access methods, using dot notation:

jsonVariable.functionName(parameter_list)

The following methods (class functions) can act on a JSONOBJECT variable:

The above getType(STRING keyStr ) function generates a run-time error if

1. The key keyStr doesn't exist, or

2. The function's return type is different than the stored value type. See the next note about numeric data.

3. Pure JSON stores "numbers" without distinguishing between INT and DOUBLE, but for TigerGraph, if the input value is all digits, it will be stored as INT. Other numeric values are stored as DOUBLE. The getDouble function can read an INT and return its equivalent DOUBLE value, but it is an error to call getINT for a DOUBLE value.

The following methods can act on a JSONARRAY variable:

Similar to the methods of JSONOBJECT, the above getType(INT idx ) function generates a run-time error if

1. idx is out of bounds, or

2. The function's return type is different than the stored value type. See the next note about numeric data.

3. Pure JSON stores "numbers" without distinguishing between INT and DOUBLE, but for TigerGraph, if the input value is all digits, it will be stored as INT. Other numeric values are stored as DOUBLE. The getDouble function can read an INT and return its equivalent DOUBLE value, but it is an error to call getINT for a DOUBLE value.

Below is an example of using these functions and methods :

# Vertex, Edge, and Accumulator Functions and Attributes

## Accessing attributes

Attributes on vertices or edges are defined in the graph schema. Additionally, each vertex and edge has a built-in STRING attribute called type which represents the user-defined type of that edge or vertex. These attributes, including type, can be accessed for a particular edge or vertex with the dot operator:

For example, the following code snippet shows two different SELECT statements which produce equivalent results. The first uses the dot operator on the vertex variable v to access the "subject" attribute, which is defined in the graph schema. The FROM clause in the first SELECT statement necessitates that any target vertices will be of type "post" (also defined in the graph schema). The second SELECT schema checks that the vertex variable v's type is a "post" vertex by using the dot operator to access the built-in type attribute.

## Vertex Functions

Below is a list of built-in functions that can be accessed by vertex aliases, using the dot operator:

Currently, these functions are only available for vertex aliases (defined in the FROM clause); vertex variables do not have these functions.

Note that in order to calculate outdegree by edge type, the graph schema must be defined such that vertices keep track of their edge types using WITH STATS="OUTDEGREE_BY_EDGETYPE" (however, "OUTDEGREE_BY_EDGETYPE" is now the default STATS option).

## FILTER

The optional .FILTER(condition) clause offers an additional filter for selecting which elements are added to the output set of the neighbor, neighborAttribute and edgeAttribute functions. The condition is evaluated for each element . If the condition is true, the element is added to the output set; if false, it is not. An example is shown below:

## Edge Functions

Below are the built-in functions that can be accessed by edge aliases, using the dot operator. Edge functions follow the same general rules as vertex functions (see above).

## Accumulator Functions

This section describes functions which all to all or most accumulators. Other accumulator functions for each accumulator type are illustrated at the "Accumulator Type" section.

### Previous value of accumulator

The tick operator ( ' ) can be used to read the value of an accumulator as it was at the start an ACCUM clause, before any changes that took place within the ACCUM clause. It can only be used in the POST-ACCUM clause. A typical use is to compare the value of the accumulator before and after the ACCUM clause. The PageRank algorithm provides a good example:

In the last line, we compute @@max_diff as the absolute value of the difference between the post-ACCUM score (s.@score) and the pre-ACCUM score (s.@score').

# Set/Bag Expression and Operators

SELECT blocks take an input vertex set and perform various selection and filtering operations to produce an output set. Therefore, set/bag expressions and their operators are a useful and powerful part of the GSQL query language. A set/bag expression can use either SetAccum or BagAccum.

## Set/Bag Expression Operators - UNION, INTERSECT, MINUS

The operators are straightforward, when two operands are both sets, the result expression is a set. When at least one operant is a bag, the result expression is a bag. If one operant is a bag and the other is a set, the operator treats the set operant as a bag containing one of each value.

The result of these operators is another set or bag, so these operations can be nested and chained to form more complex expressions, such as

## Set/Bag Expression Membership Operators

For example , suppose setBagExpr_A is ("a", "b", "c")

The IN and NOT IN operators support all base types on the left-hand side, and any set/bag expression on the right-hand side. The base type must be the same as the accumulator's element type. IN and NOT IN return a BOOL value.

The following example uses NOT IN to exclude neighbors that are on a blocked list.

## Aggregation Functions - COUNT, SUM, MIN, MAX, AVG

The aggregation functions take a set/bag expression as its input parameter and return one value or element.

• count() : Returns the size (INT) of the set.

• sum() : Returns the sum of all elements. This is only applicable to a set/bag expression with numeric type.

• min() : Returns the member with minimum value. This is only applicable to a set/bag expression with numeric type.

• max() : Returns the member with maximum value. This is only applicable to a set/bag expression with numeric type.

• avg() : Returns the average of all elements. This is only applicable to a set/bag expression with numeric type. The average is INT if the element type of the set/bag expression is INT.

# Miscellaneous Functions

## SelectVertex()

SelectVertex() reads a data file which lists particular vertices of the graph and returns the corresponding vertex set. This function can only be used in a vertex set variable declaration statement as a seed set. The data file must be organized as a table with one or more columns. One column must be for vertex id. Optionally, another column is for vertex type. SelectVertex() has five parameters explained in the below table: filePath, vertexIdColumn, vertexTypeColumn, separator, and header. The rules for column separators and column headings are the same as for the GSQL Loader.

One vertex set variable declaration statement can have multiple SelectVertex() function calls. However, if a declaration statement has multiple SelectVertex() calls referring to the same file, they must use the same separator and header parameters. If any row of the file contains an invalid vertex type, a run time error occurs; if any row of the file contains an nonexistent vertex id, a warning message is shown with the count of nonexistent ids.

Below is a query example using SelectVertex calls, reading from the data file selectVertexInput.csv.

## to_vertex() and to_vertex_set()

to_vertex() and to_vertex_set() convert a string or a string set into a vertex or a vertex set, respectively, of a given vertex type. These two functions are useful when the vertex id(s) are obtained and only known at run-time.

The first parameter of to_vertex() is the vertex id string. The first parameter of to_vertex_set() is a string set representing vertex ids. The second parameter of both functions is the vertex type string.

to_vertex_set can accept a bag of vertices as input, but the function will reduce the bag to a set by eliminating duplicate items.

If the vertex id or the vertex type doesn't exist, to_vertex() will have a run-time error, as shown below. However, to_vertex_set() will have a run-time error only if the vertex type doesn't exist. If one or more vertex ids are nonexistent, to_vertex_set() will display a warning message but will still run, converting all valid ids and skipping nonexistent vertex ids. If the user wants an error instead of a warning if a nonexistent id is given when converting a string set to a vertex set, the user can use to_vertex() inside a FOREACH loop, instead of to_vertex_set(). See the example below .

## getvid()

The getvid(v) function returns the internal ID number of the given vertex v. The internal ID is not the primary_id which the user assigned when creating the vertex. However, there is a 1-to-1 mapping between the external ID (primary_id) and internal ID. The engine can access the internal ID faster than accessing the external ID, so if a query needs unique values for a large number of vertices, but doesn't care about particular values, getvid() can be a useful option.

For example, in many community detection algorithms, we start by assigning every vertex a unique community ID. Then, as the algorithm progresses, some vertices will join the community of one of their neighbors, giving up their current community ID and copying the IDs of their neighbors.

## COALESCE()

The COALESCE function evaluates each argument value in order, and returns the first value which is not NULL. This evaluation is the same as that used for IS NULL and IS NOT NULL. The COALESCE function requires all its arguments have the same data type (BOOL, INT, FLOAT, DOUBLE, STRING, or VERTEX). The only exception is that different numeric types can be used together. In this case, all values are converted into the first argument type.

The COALESCE function is useful when multiple optional parameters are allowed, and one of them must be chosen if available. For example,

## Dynamic Expressions with EVALUATE()

The function evaluate() takes a string argument and interprets it as an expression which is evaluated during run-time. This enables users to create a general purpose query instead of separate queries for each specific computation.

The evaluate() function has two parameters: expressionStr is the expression string, and typeStr is a string literal indicating the type of expression. This function returns a value whose type is typeStr and whose value is the evaluation of expressionStr. The following rules apply:

1. evaluate() can only be used inside a SELECT block, and only inside a WHERE clause, ACCUM clause, POST-ACCUM clause, HAVING clause, or ORDER BY clause. It cannot be used in a LIMIT clause or outside a SELECT block.

2. The result type must be specified at query installation time: typeStr must be a string literal for a primitive data type, e.g., one of "int", "float", "double", "bool", "string" (case insensitive). The default value is "bool".

3. In expressionStr, identifiers can refer only to a vertex or edge aliases, vertex-attached accumulators, global accumulators, parameters, or scalar function calls involving the above variables. The expression may not refer to local variables, global variables, or to FROM clause vertices or edges by type.

4. Any accumulators in the expression must be scalar accumulators (e.g., MaxAccum) for primitive-type data. Container accumulators (e.g., SetAccum) or scalar accumulators with non-primitive type (e.g. VERTEX, EDGE, DATETIME) are not supported. Container type attributes are not supported.

5. evaluate() cannot be nested.

The following situations generate a run-time error:

1. The expression string expressionStr cannot be compiled (unless the error is due to a non-existent vertex or edge attribute).

2. The result type of the expression does not match the parameter typeStr.

The following example employs dynamic expressions in both the WHERE condition and the accumulator value in the POST-ACCUM clause.

# Queries as Functions

A query that has been defined (with a CREATE QUERY ... RETURNS statement) can be treated as a callable function. A query can call itself recursively.

The following limitations apply to queries calling queries:

1. Each parameter of the called query may be one of the following types:

1. Primitives: INT, UINT, FLOAT, DOUBLE, STRING, BOOL

2. VERTEX

3. A Set or Bag of primitive or VERTEX elements

2. The return value may be one of the following types. See also the "Return Statement" section.

1. Primitives: INT, UINT, FLOAT, DOUBLE, STRING, BOOL

2. VERTEX

3. a vertex set (e.g., the result of a SELECT statement)

4. An accumulator of primitive types. GroupByAccum and accumulators containing tuples are not supported.

3. A query which returns a SetAccum or BagAccum may be called with a Set or Bag argument, respectively.

4. The order of definition matters. A query cannot call a query which has not yet been defined. However, recursive queries are supported.

Recursive Query example: Given a set of persons as starting points, find all the friends within a given distance. Note: while recursive queries may look simpler to write, GSQL parallel processing and flow control usually makes iterative queries more efficient than recursive queries.

Test cases: Starting from person1, search to a depth of 2 and a depth of 3.

# User-Defined Functions

Users can define their own expression functions in C++ in <tigergraph.root.dir>/dev/gdk/gsql/src/QueryUdf/ExprFunctions.hpp. Only bool, int, float, double, and string (NOT std::string) are allowed as the return value type and the function argument type. However, any C++ type is allowed inside a function body. Once defined, the new functions will be added into GSQL automatically next time GSQL is executed.

Here is an example:

# Examples of Expressions

Below is a list of examples of expressions. Note that ( argList ) is a set/bag expression, while [ argList ] is a list expression.