Version 2.2. This work is licensed under a Creative Commons Attribution 4.0 International License.
The PRINT statement specifies output data. Each execution of a PRINT statement adds a JSON object to the results array which will be part of the query output. A PRINT statement can appear anywhere that query-body statements are permitted.
A PRINT statement does not trigger immediate output. The full set of data from all PRINT statements is delivered at one time, when the query concludes.
Each PRINT statement contains a list of expressions for output data. The optional WHERE clause filters the output. If the condition is false for any items, then those items are excluded from the output.
Each printExpr contributes one key-value pair to the PRINT statement's JSON object result. The optional AS clause sets the key for the expression, overriding the default key (explained below).
Each printExpr may be one of the following:
A literal value
A global or local variable (including VERTEX and EDGE variables)
An attribute of a vertex variable, e.g., Person.name
A global accumulator
An expression whose terms are among the types above. The following operators may be used:
Parentheses can be used for controlling order of precedence.
A vertex set variable
A vertex expression set vExprSet (only available if the output API is set to "v2"). Vertex expression sets are explained in a separate section below.
In output API v2, the print expression list can be a mixed list of any of the expression types. In output API v1, vertex set variables cannot be on the same PRINT statement with other types of expressions.
If a printExpr includes the optional AS name clause, then the name sets the key for that expression in the JSON output. Otherwise, the following rules determine the key: If the expression is simply a single variable (local variable, global variable, global accumulator, or vertex set variable), then the key is the variable name. Also, for a vertex expression set, the key is the vertex set variable name. Otherwise, the key is the entire expression, represented as a string.
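A sketch of these key-naming rules (the variable names below are hypothetical):

```gsql
INT x = 5;
SumAccum<INT> @@total;
PRINT x;                     # key is "x" (single variable)
PRINT x + @@total AS mySum;  # AS clause: key is "mySum"
PRINT x * 10;                # no AS clause: key is the string "x * 10"
```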
Each data type has a distinct output format.
Simple numeric, string, and boolean data types follow JSON standards.
Lists, sets, bags, and arrays are printed as JSON arrays (i.e., a list enclosed in square brackets).
Maps and tuples are printed as JSON objects (i.e., a list of key:value pairs enclosed in curly braces).
Vertices and edges have a custom JSON object, shown below.
A vertex set variable is treated as a list of vertices.
Accumulator output format is determined by the accumulator's return type. For example, an AvgAccum outputs a DOUBLE value, and a BitwiseAndAccum outputs an INT value. For container accumulators, simply consider whether the output is a list, set, bag, or map.
ListAccum, SetAccum, BagAccum, ArrayAccum: list
MapAccum: map
HeapAccum, GroupByAccum: list of tuples
Full details of vertices are printed only when part of a vertex set variable or vertex expression set. When a single vertex is printed (from a variable or accumulator whose data type happens to be VERTEX), only the vertex id is printed.
Vertex (when not part of a vertex set variable)
The output is just the vertex id as a string:
Vertex (as part of a vertex set variable)
Edge
List, Set or Bag
Map
Tuple
Vertex Set Variable
A vertex expression set is a list of expressions which is applied to each vertex in a vertex set variable. The expression list is used to compute an alternative set of values to display in the "attributes" field of each vertex.
The easiest way to understand this is to consider examples containing only one term and then consider combinations. Consider the following example query. C is a vertex set variable containing the set of all company vertices. Furthermore, each vertex has a vertex-attached accumulator @count.
If we print the full vertex set, the "attributes" field of each vertex will contain 3 fields: "id", "country", and "@count". Now consider some simple vertex expression sets:
PRINT C[C.country]
prints the vertex set variable C, except that the "attributes" field will contain only "country", instead of 3 fields.
PRINT C[C.@count]
prints the vertex set variable C, except that the "attributes" field will contain only "@count", instead of 3 fields.
PRINT C[C.id, C.@count]
prints the vertex set variable C, except that the "attributes" field will contain only "id" and "@count".
PRINT C[C.id+"_ex", C.@count+1]
prints the vertex set variable C, except that the "attributes" field contains the following:
One field consists of each vertex's id value, with the string "_ex" appended to it.
Another field consists of the @count value incremented by 1. Note: the value of @count itself has not changed, only the displayed value is incremented.
The last example illustrates the general format for a vertex expression set:
The vertex expression set begins with the name of a vertex set variable. It is followed by a list of attribute expressions, enclosed in square brackets. Each attribute expression follows the same rules described earlier in the Print Expressions section. That is, each attribute expression may refer to one or more attributes or vertex-attached accumulators of the current vertices, as well as literals, local or global variables, and global accumulators. The allowed operators (for numeric, string, or set operations) are the same ones mentioned above.
The key for the vertex expression set is the vertex set variable name.
The value for the vertex expression set is a modified vertex set variable, where the regular "attributes" value for each vertex is replaced with a set of key:value pairs corresponding to the set of attribute expressions given in the print expression.
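Putting these pieces together, the general form can be sketched as follows (using the hypothetical vertex set variable C from the examples above):

```gsql
# General form:  vertexSetVarName [ attrExpr1, attrExpr2, ... ]
PRINT C[C.id + "_ex", C.@count + 1];  # key is "C"; each vertex's
                                      # "attributes" field holds the
                                      # two computed values
```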
An example which shows all of the cases described above, in combination, is shown below.
Note how the results of the six PRINT statements are grouped in the JSON "results" field below:
Each of the six PRINT statements is represented as one JSON object with the "results" array.
When a PRINT statement has more than one expression (like the first one), the expressions may appear in the output in a different order than on the PRINT statement.
The 2nd PRINT statement shows a key that is generated from the expression itself.
The 3rd and 4th PRINT statements show a set of vertices (different than a vertex set variable) and a map, respectively.
The 5th PRINT statement shows the vertex set variable A, including its vertex-attached accumulators (PRINT A).
The 6th PRINT statement shows a vertex set expression for A, customized to include only one static attribute plus a newly computed attribute.
Instead of printing output in JSON format, output can be written to a FILE object in comma-separated values (CSV) format. To select this option, at the end of the PRINT statement, include the keyword TO_CSV followed by the FILE object name:
The angle bracket > is no longer supported for directing output to a file or FILE object. You must use the keyword TO_CSV.
Each execution of the PRINT statement appends one line to the FILE. If the PRINT statement includes multiple expressions, then each printed value is separated from its neighbor by a comma. If an expression evaluates to a set or list, then the collection's values are delimited by single spaces. Due to the simpler format of CSV vs. JSON, the TO_CSV feature only supports data with a simple one- or two-dimension structure.
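A minimal sketch of this usage (the file path and variable names are assumptions, not taken from the text):

```gsql
CREATE QUERY csvExample() FOR GRAPH socialNet {
  FILE f ("/home/tigergraph/output.csv");  # hypothetical absolute path
  INT x = 10;
  STRING s = "hello";
  PRINT x, s TO_CSV f;  # appends one line: 10,hello
}
```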
Limitations of printing to a FILE object
Printing a full Vertex set variable is not supported.
If a vertex is printed, only its ID value is printed.
If printing a vertex set's vertex-attached accumulator or attribute, the result is a list of values, one for each vertex, separated by newlines.
The syntax for printing a vertex set expression is currently different when printing to a file than when printing to standard output. Compare:
PRINT A[A.gender]; # with brackets
PRINT A.gender TO_CSV file1; # without brackets
Writing to FILE objects is optimized for parallel processing. Consequently, the order in which data is written to the FILE is not guaranteed. Therefore, it is strongly recommended that the user design their queries such that one of these conditions is satisfied:
The query prints only one set of data, and the order of the set is not important.
Each line of data to print to a file includes a label which can be used to identify the data.
Instead of printing CSV output to a FILE object, data can be written to a regular file.
This feature is deprecated because printing to a FILE object covers the same functionality.
The table below shows the differences between printing TO_CSV <FILE object> vs. TO_CSV <filepath>.
One of the two ways to write data to a FILE object is with the FILE println statement. (The other way is with the PRINT statement's TO_CSV option.)
println is a method (function) of a FILE object variable. The println statement can be used either at the query-body level or as a DML-sub-statement, e.g., within the ACCUM clause of a SELECT block. Each time println is called, it adds one new line of values to the FILE object, and then to the corresponding file.
The println function can print most of the expressions handled by PRINT. Note, however, that this does not include vertex expression sets (vExprSet). If the println statement has a list of expressions to print, then this will produce a comma-separated list of values. If an expression refers to a list or set, then the output will be a list of values separated by spaces, the same format produced by TO_CSV.
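A hedged sketch of println at both levels (the graph, vertex type, and attribute names are assumptions):

```gsql
CREATE QUERY printlnExample(FILE f) FOR GRAPH socialNet {
  Start = {person.*};               # "person" vertex type assumed
  f.println("id", "friend_count");  # query-body level: header line
  S = SELECT v FROM Start:v
      POST-ACCUM f.println(v.id, v.outdegree());  # one line per vertex;
                                                  # order not guaranteed
}
```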
The data from query-body level FILE print statements (either TO_CSV or println) will appear in their original order. However, due to the parallel processing of statements in an ACCUM block, the order in which println statements at the DML-sub-statement level are processed cannot be guaranteed. Moreover, the output from println statements in an ACCUM block can be interspersed with the query-body statements.
All of the PRINT statements in this example use the TO_CSV option, so there is no JSON output to the console.
All the output in this case goes to the FILE object. In the query definition, the footer is the last FILE statement, but the println statements from the SELECT block happen to be delayed and are printed AFTER the footer line.
A FILE Object can be passed from one query to a subquery. The subquery can then also write to the FILE object.
The LOG statement is another means to output data. It works as a function that outputs information to a log file.
The first argument of the LOG statement is a boolean condition that enables or disables logging. This allows logging to be easily turned on/off, for uses such as debugging. After the condition, LOG takes one or more expressions (separated by commas). These expressions are evaluated and output to the log file.
Unlike the PRINT statement, which can only be used as a query-body statement, the LOG statement can be used as both a query-body statement and a DML-sub-statement.
The values will be recorded in the GPE log. To find the log file after the query has completed, open a Linux shell and use the command "gadmin log gpe". It may show you more than one log file name; use the one ending in "INFO". Search this file for "UDF_".
The RETURN statement specifies data that a sub-query passes back to the outer query that called it. In order for a query to be used as a sub-query, its initial CREATE QUERY statement must include the optional RETURNS clause, and its body must end with a RETURN statement. Exactly one type is allowed in the RETURNS clause, and thus the RETURN statement can only return one expression. The returned expression must have the same type as the RETURNS clause indicates. A sub-query must be created before its corresponding super-query, and must be installed either before, or in the same INSTALL QUERY command as, its super-query.
The return type can be any base type or any accumulator type, except GroupByAccum and any accumulator containing any tuple type. For the purposes of return type, SetAccum is equivalent to SET, and BagAccum is equivalent to BAG. A vertex set variable can be returned if SET<VERTEX<type>> or SetAccum<VERTEX<type>> (<type> is optional) is used in the RETURNS clause.
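A minimal sub-query/super-query sketch under these rules (the graph, vertex, and edge type names are assumptions):

```gsql
# Sub-query: the RETURNS clause declares the single return type.
CREATE QUERY countFriends(VERTEX<person> seed) FOR GRAPH socialNet RETURNS (INT) {
  SumAccum<INT> @@n;
  Start = {seed};
  S = SELECT v FROM Start:s -(friend:e)- person:v
      ACCUM @@n += 1;
  RETURN @@n;  # must match the RETURNS type
}

# Super-query: created after the sub-query; installed with or after it.
CREATE QUERY showFriendCount(VERTEX<person> seed) FOR GRAPH socialNet {
  INT n = countFriends(seed);
  PRINT n;
}
```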
See also Section 5.11 - Queries as Functions.
Numeric
Arithmetic: + - * / %
Bit: << >> & |
String
Concatenation: +
Set
UNION INTERSECT MINUS
                              TO_CSV <FILE object>                      TO_CSV <filepath>
When filepath is specified    Run-time or compile-time, depending on    Compile-time
                              how the user chooses to write the query
Vertex IDs                    Displayed correctly                       Displayed as TigerGraph internal ID codes
Append or overwrite           Appends, but a FILE object declaration    Always appends
                              resets the FILE
Filepath: absolute/relative   Currently only absolute                   Absolute or relative
No computer can store all floating point numbers (i.e., non-integers) with perfect precision. The float data type offers about 7 decimal digits of precision; the double data type offers about 15 decimal digits of precision. Comparing two float or double values by using operators involving exact equality (==, <=, >=, BETWEEN ... AND ...) might lead to unexpected behavior. If the GSQL language parser detects that the user is attempting an exact equivalence test with float or double data types, it will display a warning message and suggestion. For example, if there are two float variables v and v2, the expression v == v2 causes the following warning message:
Response to Non-existent vertex ID
If a query has a vertex parameter (VERTEX or VERTEX<vType>), and if the ID for a nonexistent vertex is given when running the query, an error message is shown, and the query won't run. This is also the response when calling a function to convert a single vertex ID string to a vertex:
to_vertex(): See Section "Miscellaneous Functions".
However, if the parameter is a vertex set (SET<VERTEX> or SET<VERTEX<vType>>), and one or more nonexistent IDs are given when running the query, a warning message is shown, but the query still runs, ignoring those nonexistent IDs. Therefore, if all given IDs are nonexistent, the parameter becomes an empty set. This is also the response when calling a function to convert a set of vertex IDs to a set of vertices:
to_vertex_set(): See Section "Miscellaneous Functions".
SelectVertex(): See Section "Miscellaneous Functions".
This is the definition for the GSQL Query Language syntax. It is defined as a set of rules expressed in EBNF notation.
This defines the EBNF notation used to describe the syntax. Rules contain terminal and non-terminal symbols. A terminal symbol is a base-level symbol which expresses literal output. All symbols in single or double quotes (e.g., '+', "=", ")", "10") are terminal symbols. A non-terminal symbol is defined as some combination of terminal and non-terminal symbols. The left-hand side of a rule is always a non-terminal; this rule defines the non-terminal. The example rule below defines assignmentStmt (that is, an Assignment Statement) to be a name followed by an equal sign followed by an expression, operator, and expression with a terminating semicolon. assignmentStmt, name, and expr are all non-terminals. Additionally, all KEYWORDS are in all-capitals and are terminal symbols. The ":=" is part of EBNF and states that the left-hand side can be expanded to the right-hand side.
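Based on that description, the example rule can be reconstructed as:

```ebnf
assignmentStmt := name "=" expr op expr ";"
```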
A vertical bar | in EBNF indicates choice. Choose either the symbol on the left or on the right. A sequence of vertical bars means choose any one of the symbols in the sequence.
Square brackets [ ] indicate an optional part or group of symbols. Parentheses ( ) group symbols together. The rule below defines a constant to be one, two, or three digits preceded by an optional plus or minus sign.
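A reconstruction of the rule described (one to three digits preceded by an optional sign):

```ebnf
constant := ["+" | "-"] digit [digit] [digit]
```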
Star * and plus + are symbols in EBNF for closure. Star means zero or more occurrences, and plus means one or more occurrences. The following defines intConstant to be an optional plus or minus followed by one or more digits. It also defines floatConstant to be an optional plus or minus followed by zero or more digits followed by a decimal followed by one or more digits. The star and plus also can be applied to groups of symbols as in the definition of list. The non-terminal list is defined as a parenthesized list of comma-separated expressions (expr). The list has at least one expression which can be followed by zero or more comma-expression pairs.
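The definitions described above can be sketched as:

```ebnf
intConstant   := ["+" | "-"] digit+
floatConstant := ["+" | "-"] digit* "." digit+
list          := "(" expr ("," expr)* ")"
```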
Curly braces { } enclose an optional group of symbols which are repeated zero or more times. Therefore, curly braces are equivalent to square brackets or parentheses followed by a star * to indicate zero or more repetitions. All of the following expressions are equivalent:
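That is, all of the following denote zero or more repetitions of expr:

```ebnf
{ expr }
[ expr ]*
( expr )*
```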
For brevity, the literal comma is sometimes shown without quotation marks:
An expression is a combination of fixed values, variables, operators, function calls, and groupings which specify a computation, resulting in a data value. This section of the specification describes the literals (fixed values), operators, and functions available in the GSQL query language. It covers the subset of the EBNF definitions shown below. However, more so than in other sections of the specification, syntax alone is not an adequate description. The semantics (functionality) of the particular operators and functions are an essential complement to the syntax.
Each primitive data type supports constant values:
GSQL_UINT_MAX = 2 ^ 64 - 1 = 18446744073709551615
GSQL_INT_MAX = 2 ^ 63 - 1 = 9223372036854775807
GSQL_INT_MIN = -2 ^ 63 = -9223372036854775808
An operator is a keyword token which performs a specific computational function to return a resulting value, using the adjacent expressions (its operands) as input values. An operator is similar to a function in that both compute a result from inputs, but syntactically they are different. The most familiar operators are the mathematical operators for addition + and subtraction - .
Tip: The operators listed in this section are designed to behave like the operators in MySQL.
We support the following standard mathematical operators and meanings. The latter four ("<<" | ">>" | "&" | "|") are for bitwise operations. See the section below: "Bit Operators".
Operator precedences are shown in the following list, from highest precedence to the lowest. Operators that are shown together on a line have the same precedence:
We support the standard Boolean operators and standard order of precedence: AND, OR, NOT
Bit operators (<<, >>, &, and |) operate on integers and return an integer.
Operator + can be used for concatenating strings.
The fields of the tuple can be accessed using the dot operator.
A condition is an expression which evaluates to a boolean value of either true or false. One type of condition uses the familiar comparison operators. A comparison operator compares two numeric values.
The expression expr1 BETWEEN expr2 AND expr3 is true if the value expr1 is in the range from expr2 to expr3, including the endpoint values. Each expression must be numeric.
"expr1 BETWEEN expr2 AND expr3" is equivalent to "expr1 <= expr3 AND expr1 >= expr2".
IS NULL and IS NOT NULL can be used for checking whether an optional parameter is given any value.
Every attribute value stored in GSQL is a valid value, so IS NULL and IS NOT NULL are effective only for query parameters.
The LIKE operator is used for string pattern matching. The expression
string1 LIKE string_pattern
evaluates to boolean true if string1 matches the pattern in string_pattern; otherwise it is false. Both operands must be strings. LIKE may be used only in WHERE clauses. Additionally, string_pattern supports the following wildcard and other symbols, in order to express a pattern:
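For instance, assuming the standard "%" wildcard (matching any sequence of characters):

```gsql
S = SELECT v FROM Start:v
    WHERE v.name LIKE "Jo%";  # matches names beginning with "Jo"
```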
There are a number of built-in functions which act on either an accumulator, a base type, or vertex variable. The accumulator function calls are discussed in detail in the "Accumulators" section.
Below is a list of built-in functions which act on either INT, FLOAT, or DOUBLE value(s).
The following built-in functions are provided for text processing. Note that these functions do not modify the input parameter. They each return a new string.
In the syntax for trim(), the words in bold (LEADING, TRAILING, BOTH, and FROM) are keywords which should appear exactly as shown.
STRING is just an indicator of the datatype; it is not an explicit keyword.
The trim() function has the following options:
By using one of the keywords LEADING, TRAILING, or BOTH, the user can specify that characters are to be removed from the left end, right end, or both ends of the string, respectively. If none of these keywords is used, the function removes characters from both ends.
removal_char is a single character. The function will remove consecutive instances of removal_char until it encounters a different character. If removal_char is not specified, then trim() removes whitespace (spaces, tabs, and newlines).
Notes about the trim() function:
The following functions convert from/to DATETIME to/from other types.
The following function converts a DATETIME value into a string format specified by the user:
The following are other functions related to DATETIME:
JSONOBJECT and JSONARRAY are base types, meaning they can be used as a parameter type, an element type for most accumulators, or a return type. This enables the input and output of complex, customized data structures. For input and output, a string representation of the JSON is used. Hence, the GSQL query language offers several functions to convert a formatted string into JSON and then to search and access the components of a JSON structure.
Data Conversion Functions
The following parsing functions convert a string into a JSONOBJECT or a JSONARRAY:
Both functions generate a run-time error if the input string cannot be converted into a JSON object or a JSON array. To be properly formatted, besides having the proper nesting and matching of curly braces { } and brackets [ ], each value field must be one of the following: a string (in double quotes "), a number, a boolean (true or false), or a JSONOBJECT or JSONARRAY. Each key of a key:value pair must be a string in double quotes.
See examples below.
Data Access Methods
JSONOBJECT and JSONARRAY are object classes, each class supporting a set of data access methods, using dot notation:
jsonVariable.functionName(parameter_list)
The following methods (class functions) can act on a JSONOBJECT variable:
The above get<type>(STRING keyStr) functions generate a run-time error if
The key keyStr doesn't exist, or
The function's return type is different than the stored value type. See the next note about numeric data.
Pure JSON stores "numbers" without distinguishing between INT and DOUBLE, but for TigerGraph, if the input value is all digits, it will be stored as INT. Other numeric values are stored as DOUBLE. The getDouble function can read an INT and return its equivalent DOUBLE value, but it is an error to call getInt for a DOUBLE value.
The following methods can act on a JSONARRAY variable:
Similar to the methods of JSONOBJECT, the above get<type>(INT idx) functions generate a run-time error if
idx is out of bounds, or
The function's return type is different than the stored value type. See the next note about numeric data.
As with the JSONOBJECT methods, getDouble can read an INT and return its equivalent DOUBLE value, but it is an error to call getInt for a DOUBLE value.
Below is an example of using these functions and methods:
Attributes on vertices or edges are defined in the graph schema. Additionally, each vertex and edge has a built-in STRING attribute called type which represents the user-defined type of that edge or vertex. These attributes, including type , can be accessed for a particular edge or vertex with the dot operator.
For example, the following code snippet shows two different SELECT statements which produce equivalent results. The first uses the dot operator on the vertex variable v to access the "subject" attribute, which is defined in the graph schema. The FROM clause in the first SELECT statement requires that any target vertices be of type "post" (also defined in the graph schema). The second SELECT statement checks that the vertex variable v's type is a "post" vertex by using the dot operator to access the built-in type attribute.
Below is a list of built-in functions that can be accessed by vertex aliases, using the dot operator:
Currently, these functions are only available for vertex aliases (defined in the FROM clause); vertex variables do not have these functions.
Note that in order to calculate outdegree by edge type, the graph schema must be defined such that vertices keep track of their edge types using WITH STATS="OUTDEGREE_BY_EDGETYPE" (however, "OUTDEGREE_BY_EDGETYPE" is now the default STATS option).
The optional .FILTER(condition) clause offers an additional filter for selecting which elements are added to the output set of the neighbor, neighborAttribute, and edgeAttribute functions. The condition is evaluated for each element. If the condition is true, the element is added to the output set; if false, it is not. An example is shown below:
Below are the built-in functions that can be accessed by edge aliases, using the dot operator. Edge functions follow the same general rules as vertex functions (see above).
Accumulator functions for each accumulator type are illustrated at the "Accumulator Type" section.
SELECT blocks take an input vertex set and perform various selection and filtering operations to produce an output set. Therefore, set/bag expressions and their operators are a useful and powerful part of the GSQL query language. A set/bag expression can use either SetAccum or BagAccum.
The operators are straightforward: when both operands are sets, the result is a set. When at least one operand is a bag, the result is a bag. If one operand is a bag and the other is a set, the operator treats the set operand as a bag containing one instance of each value.
The result of these operators is another set or bag, so these operations can be nested and chained to form more complex expressions, such as
For example, suppose setBagExpr_A is ("a", "b", "c")
The IN and NOT IN operators support all base types on the left-hand side, and any set/bag expression on the right-hand side. The base type must be the same as the accumulator's element type. IN and NOT IN return a BOOL value.
The following example uses NOT IN to exclude neighbors that are on a blacklist.
The aggregation functions each take a set/bag expression as their input parameter and return one value or element.
count() : Returns the size (INT) of the set.
sum() : Returns the sum of all elements. This is only applicable to a set/bag expression with numeric type.
min() : Returns the member with minimum value. This is only applicable to a set/bag expression with numeric type.
max() : Returns the member with maximum value. This is only applicable to a set/bag expression with numeric type.
avg() : Returns the average of all elements. This is only applicable to a set/bag expression with numeric type. The average is INT if the element type of the set/bag expression is INT.
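A brief sketch of these functions applied to a hypothetical set accumulator:

```gsql
SetAccum<INT> @@ages;
@@ages += 30;
@@ages += 40;
@@ages += 50;
PRINT count(@@ages);  # 3
PRINT sum(@@ages);    # 120
PRINT avg(@@ages);    # 40 (INT result, since the elements are INT)
PRINT max(@@ages);    # 50
```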
SelectVertex() reads a data file which lists particular vertices of the graph and returns the corresponding vertex set. This function can only be used in a vertex set variable declaration statement, as a seed set. The data file must be organized as a table with one or more columns. One column must be for vertex id. Optionally, another column is for vertex type. SelectVertex() has five parameters, explained in the table below: filePath, vertexIdColumn, vertexTypeColumn, separator, and header. The rules for column separators and column headings are the same as for the GSQL Loader.
One vertex set variable declaration statement can have multiple SelectVertex() function calls. However, if a declaration statement has multiple SelectVertex() calls referring to the same file, they must use the same separator and header parameters. If any row of the file contains an invalid vertex type, a run-time error occurs; if any row of the file contains a nonexistent vertex id, a warning message is shown with the count of nonexistent ids.
Below is a query example using SelectVertex calls, reading from the data file selectVertexInput.csv.
to_vertex() and to_vertex_set() convert a string or a string set into a vertex or a vertex set, respectively, of a given vertex type. These two functions are useful when the vertex id(s) are obtained and only known at run-time.
Running these functions requires real-time conversion of an external id to a GSQL internal id, which is a relatively slow process. Therefore,
If the user can always know the id before running the query, define the query with VERTEX or SET<VERTEX> parameters instead of STRING or SET<STRING> parameters, and avoid calling to_vertex() or to_vertex_set().
Calling to_vertex_set() one time is much faster than calling to_vertex() multiple times. Use to_vertex_set() instead of to_vertex() as much as possible.
The first parameter of to_vertex() is the vertex id string. The first parameter of to_vertex_set() is a string set representing vertex ids. The second parameter of both functions is the vertex type string.
to_vertex_set can accept a bag of vertices as input, but the function will reduce the bag to a set by eliminating duplicate items.
If the vertex id or the vertex type doesn't exist, to_vertex() will have a run-time error, as shown below. However, to_vertex_set() will have a run-time error only if the vertex type doesn't exist. If one or more vertex ids are nonexistent, to_vertex_set() will display a warning message but will still run, converting all valid ids and skipping nonexistent vertex ids. If the user wants an error instead of a warning when a nonexistent id is given while converting a string set to a vertex set, the user can use to_vertex() inside a FOREACH loop instead of to_vertex_set(). See the example below.
The getvid(v) function returns the internal ID number of the given vertex v. The internal ID is not the primary_id which the user assigned when creating the vertex. However, there is a 1-to-1 mapping between the external ID (primary_id) and internal ID. The engine can access the internal ID faster than accessing the external ID, so if a query needs unique values for a large number of vertices, but doesn't care about particular values, getvid() can be a useful option.
For example, in many community detection algorithms, we start by assigning every vertex a unique community ID. Then, as the algorithm progresses, some vertices will join the community of one of their neighbors, giving up their current community ID and copying the IDs of their neighbors.
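A minimal sketch of that initialization step (the accumulator and type names are hypothetical):

```gsql
MinAccum<INT> @communityId;
Start = {person.*};
S = SELECT v FROM Start:v
    POST-ACCUM v.@communityId = getvid(v);  # unique internal ID as the
                                            # initial community ID
```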
The COALESCE function evaluates each argument value in order, and returns the first value which is not NULL. This evaluation is the same as that used for IS NULL and IS NOT NULL. The COALESCE function requires all its arguments have the same data type (BOOL, INT, FLOAT, DOUBLE, STRING, or VERTEX). The only exception is that different numeric types can be used together. In this case, all values are converted into the first argument type.
The COALESCE function is useful when multiple optional parameters are allowed, and one of them must be chosen if available. For example,
The COALESCE function's parameter list should have a default value as the last argument. Otherwise, if all values are NULL, the default value of the data type is returned.
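A hedged sketch, assuming two optional INT parameters:

```gsql
CREATE QUERY coalesceExample(INT customAge, INT defaultAge) FOR GRAPH socialNet {
  # Use customAge if supplied; otherwise defaultAge; otherwise the literal 0.
  INT age = COALESCE(customAge, defaultAge, 0);
  PRINT age;
}
```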
The function evaluate() takes a string argument and interprets it as an expression which is evaluated during run-time. This enables users to create a general purpose query instead of separate queries for each specific computation.
The evaluate() function has two parameters: expressionStr is the expression string, and typeStr is a string literal indicating the type of expression. This function returns a value whose type is typeStr and whose value is the evaluation of expressionStr. The following rules apply:
evaluate() can only be used inside a SELECT block, and only inside a WHERE clause, ACCUM clause, POST-ACCUM clause, HAVING clause, or ORDER BY clause. It cannot be used in a LIMIT clause or outside a SELECT block.
The result type must be specified at query installation time: typeStr must be a string literal for a primitive data type, e.g., one of "int", "float", "double", "bool", "string" (case insensitive). The default value is "bool".
In expressionStr, identifiers can refer only to vertex or edge aliases, vertex-attached accumulators, global accumulators, parameters, or scalar function calls involving the above variables. The expression may not refer to local variables, global variables, or to FROM clause vertices or edges by type.
Any accumulators in the expression must be scalar accumulators (e.g., MaxAccum) for primitive-type data. Container accumulators (e.g., SetAccum) or scalar accumulators with non-primitive type (e.g. VERTEX, EDGE, DATETIME) are not supported. Container type attributes are not supported.
evaluate() cannot be nested.
The following situations generate a run-time error:
The expression string expressionStr cannot be compiled (unless the error is due to a non-existent vertex or edge attribute).
The result type of the expression does not match the parameter typeStr.
Silent failure conditions
If any of the following conditions occur, the query may continue running, but the entire clause or statement in which the evaluate() function resides will fail, without producing a run-time error message. For conditional clauses (WHERE, HAVING), a failing evaluate() clause is treated as if the condition is false. An assignment statement with a failing evaluate() will not execute, and an ORDER BY clause with a failing evaluate() will not sort.
The expression references a non-existent attribute of a vertex or edge alias.
The expression applies an operator to incompatible operands, for example, 123 == "xyz".
The following example employs dynamic expressions in both the WHERE condition and the accumulator value in the POST-ACCUM clause.
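A sketch of such a query (the parameter names and the example expression strings in the comments are illustrative):

```gsql
# Hypothetical sketch: the filter condition and the accumulated value are
# both supplied as expression strings at run time.
CREATE QUERY evaluateEx (STRING whereCond, STRING accumExpr) FOR GRAPH workNet {
  MaxAccum<DOUBLE> @val;
  Start = {person.*};
  S = SELECT v FROM Start:v
      WHERE evaluate(whereCond)                           # e.g. "v.locationId == \"us\""
      POST-ACCUM v.@val = evaluate(accumExpr, "double");  # e.g. "getvid(v) * 1.0"
  PRINT S[S.@val];
}
```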
A query that has been defined (with a CREATE QUERY ... RETURNS statement) can be treated as a callable function. A query can call itself recursively.
The following limitations apply to queries calling queries:
Each parameter of the called query may be one of the following types:
Primitives: INT, UINT, FLOAT, DOUBLE, STRING, BOOL
VERTEX
A Set or Bag of primitive or VERTEX elements
The return value may be one of the following types. See also the "Return Statement" section.
Primitives: INT, UINT, FLOAT, DOUBLE, STRING, BOOL
VERTEX
A vertex set (e.g., the result of a SELECT statement)
An accumulator of primitive types. GroupByAccum and accumulators containing tuples are not supported.
A query which returns a SetAccum or BagAccum may be called with a Set or Bag argument, respectively.
The order of definition matters. A query cannot call a query which has not yet been defined.
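For example, a main query can call a helper query that was defined earlier. A hypothetical sketch (the query names are illustrative; locationId is a person attribute in the workNet example graph):

```gsql
# The called query must be defined before any query that calls it.
CREATE QUERY personCount (STRING loc) FOR GRAPH workNet RETURNS (INT) {
  Start = {person.*};
  S = SELECT v FROM Start:v WHERE v.locationId == loc;
  RETURN S.size();
}

CREATE QUERY callerEx (STRING loc) FOR GRAPH workNet {
  INT n;
  n = personCount(loc);   # the earlier query is treated as a function call
  PRINT n;
}
```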
Users can define their own expression functions in C++ in <tigergraph.root.dir>/dev/gdk/gsql/src/QueryUdf/ExprFunctions.hpp. Only bool, int, float, double, and string (NOT std::string) are allowed as the return value type and the function argument types. However, any C++ type is allowed inside a function body. Once defined, the new functions will be added into GSQL automatically the next time GSQL is started.
If a user-defined struct or a helper function needs to be defined, define it in <tigergraph.root.dir>/dev/gdk/gsql/src/QueryUdf/ExprUtil.hpp.
Here is an example:
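A minimal sketch of two user-defined expression functions (the function names str_reverse and approx_equal are illustrative, not functions shipped with the product; in the actual ExprFunctions.hpp, "string" is a typedef supplied by the header rather than std::string, which this standalone sketch imitates):

```cpp
#include <string>
#include <algorithm>

// In ExprFunctions.hpp the engine provides this typedef; we imitate it here
// so the sketch compiles on its own.
typedef std::string string;

// Hypothetical UDF: returns the input string reversed.
// Once installed, callable from GSQL as str_reverse(someStringExpr).
inline string str_reverse(string str) {
  std::reverse(str.begin(), str.end());
  return str;
}

// Hypothetical UDF: true if x is within tolerance of y.
inline bool approx_equal(double x, double y, double tolerance) {
  return (x - y <= tolerance) && (y - x <= tolerance);
}
```

Note that both functions use only the permitted parameter and return types (string, double, bool), even though any C++ type may be used internally.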
If any code in ExprFunctions.hpp or ExprUtil.hpp causes a compilation error, GSQL cannot install any GSQL query, even if the query doesn't call any user-defined function. Therefore, please test each new user-defined expression function after adding it. One way of testing a function is to put it in a new file test.cpp, compile it with g++ test.cpp, and run ./a.out. You might need to remove the include header #include <gle/engine/cpplib/headers.hpp> from ExprFunctions.hpp and ExprUtil.hpp in order to compile.
Below is a list of examples of expressions. Note that ( argList ) is a set/bag expression, while [ argList ] is a list expression.
Data Type | Constant | Examples |
Numeric types (INT, UINT, FLOAT, DOUBLE) | | 123, -5, 45.67, 2.0e-0.5 |
UINT | GSQL_UINT_MAX | |
INT | GSQL_INT_MAX, GSQL_INT_MIN | |
boolean | TRUE, FALSE | |
string | | |
character or syntax | meaning |
% | matches zero or more characters |
_ (underscore) | matches any single character |
[charlist] | matches any character in charlist. charlist is a concatenated character set, with no separators. |
[^charlist] | matches any character NOT in charlist |
[!charlist] | matches any character NOT in charlist |
α-β (special syntax within charlist) | matches a character in the range from α to β. A charlist can have multiple ranges. |
\\ (special syntax within charlist) | matches the character \ |
\\] (special syntax within charlist) | matches the character ]. No special treatment is needed for [ inside a charlist. |
function name and parameters (NUM means INT, FLOAT, or DOUBLE) | description | return type |
abs(NUM num) | Returns the absolute value of num | Same as parameter type |
sqrt(NUM num) | Returns the square root of num | FLOAT |
pow(NUM base, NUM exp) | Returns base raised to the power exp | If base and exp are both INT → INT; otherwise → FLOAT |
acos(NUM num) | arc cosine | FLOAT |
asin(NUM num) | arc sine | FLOAT |
atan(NUM num) | arc tangent | FLOAT |
atan2(NUM y, NUM x) | arc tangent of y/x | FLOAT |
ceil(NUM num) | rounds upward | INT |
cos(NUM num) | cosine | FLOAT |
cosh(NUM num) | hyperbolic cosine | FLOAT |
exp(NUM num) | base-e exponential | FLOAT |
floor(NUM num) | rounds downward | INT |
fmod(NUM numer, NUM denom) | floating-point remainder of numer/denom | FLOAT |
ldexp(NUM x, NUM exp) | x * 2^exp | FLOAT |
log(NUM num) | natural logarithm | FLOAT |
log10(NUM num) | common (base-10) logarithm | FLOAT |
sin(NUM num) | sine | FLOAT |
sinh(NUM num) | hyperbolic sine | FLOAT |
tan(NUM num) | tangent | FLOAT |
tanh(NUM num) | hyperbolic tangent | FLOAT |
to_string(NUM num) | Converts num to a STRING value | STRING |
float_to_int(FLOAT num) | Converts num to an INT value by truncating the fractional part | INT |
str_to_int(STRING str) | Converts str to an INT value. If str is a floating-point number, the fractional part is truncated; if str is not a numerical value, returns 0. | INT |
function name and parameters | description | return type |
to_string(NUM num) | Converts num to a STRING value | STRING |
float_to_int(FLOAT num) | Converts num to an INT value by truncating the fractional part | INT |
str_to_int(STRING str) | Converts str to an INT value. If str is a floating-point number, the fractional part is truncated; if str is not a numerical value, returns 0. | INT |
function name and parameters | description | return type |
lower(STRING str) | Converts str to all lowercase letters | STRING |
upper(STRING str) | Converts str to all uppercase letters | STRING |
trim([[LEADING | TRAILING | BOTH] [STRING removal_char] FROM] STRING str) | Trims characters from the leading and/or trailing ends of str | STRING |
function name and parameters | description | return type |
to_datetime(STRING str) | Converts str to a DATETIME value | DATETIME |
epoch_to_datetime(INT int_value) | Converts int_value to a DATETIME value by epoch time conversion | DATETIME |
datetime_to_epoch(DATETIME date) | Converts date to epoch time | INT |
function name and parameters | description | return type |
datetime_format(DATETIME date [, STRING str]) | Prints date in the format that str indicates. The "%" character is required before each format specifier character: %Y (year), %m (month), %d (day of month), %H (hour), %M (minute), %S (second). If str is not given, "%Y-%m-%d %H:%M:%S" is used. | STRING |
function name and parameters | description | return type |
now() | Returns the current time as a DATETIME value. | DATETIME |
year(DATETIME date) | Extracts the year of date. | INT |
month(DATETIME date) | Extracts the month of date. | INT |
day(DATETIME date) | Extracts the day of month of date. | INT |
hour(DATETIME date) | Extracts the hour of date. | INT |
minute(DATETIME date) | Extracts the minute of date. | INT |
second(DATETIME date) | Extracts the second of date. | INT |
datetime_add(DATETIME date, INTERVAL int_value time_unit) | INTERVAL is a keyword; time_unit is one of the keywords YEAR, MONTH, DAY, HOUR, MINUTE, or SECOND. The function returns the DATETIME value which is int_value units later than date. For example, datetime_add(now(), INTERVAL 1 MONTH) returns a DATETIME value which is 1 month from now. | DATETIME |
datetime_sub(DATETIME date, INTERVAL int_value time_unit) | Same as datetime_add, except that the returned value is int_value units earlier than date. | DATETIME |
datetime_diff(DATETIME date1, DATETIME date2) | Returns the difference in seconds of these two DATETIME values: (date1 - date2). | INT |
function name | description | return type |
parse_json_object(STRING str ) | Converts str into a JSON object | JSONOBJECT |
parse_json_array( STRING str ) | Converts str into a JSON array | JSONARRAY |
method name | description | return type |
containsKey(STRING keyStr ) | Returns a boolean value indicating whether the JSON object contains the key keyStr . | BOOL |
getInt(STRING keyStr) | Returns the numeric value associated with key keyStr as an INT. | INT |
getDouble(STRING keyStr) | Returns the numeric value associated with key keyStr as a DOUBLE. | DOUBLE |
getString(STRING keyStr) | Returns the string value associated with key keyStr. | STRING |
getBool(STRING keyStr) | Returns the boolean value associated with key keyStr. | BOOL |
getJsonObject(STRING keyStr) | Returns the JSONOBJECT associated with key keyStr. | JSONOBJECT |
getJsonArray(STRING keyStr) | Returns the JSONARRAY associated with key keyStr. | JSONARRAY |
method name | description | return type |
size() | Returns the size of this array. | INT |
getInt( INT idx ) | Returns the numeric value at position idx as an INT. | INT |
getDouble( INT idx ) | Returns the numeric value at position idx as a DOUBLE. | DOUBLE |
getString( INT idx ) | Returns the string value at position idx . | STRING |
getBool( INT idx ) | Returns the bool value at position idx . | BOOL |
getJsonObject( INT idx ) | Returns the JSONOBJECT value at position idx . | JSONOBJECT |
getJsonArray( INT idx ) | Returns the JSONARRAY value at position idx . | JSONARRAY |
function name | description | return type |
outdegree([STRING edgeType]) | Returns the number of outgoing or undirected edges connected to the vertex. If the optional STRING argument edgeType is given, then only edges of the given edgeType are counted. | INT |
neighbors([STRING edgeType]) | Returns the set of ids for the vertices which are out-neighbors or undirected neighbors of the vertex. If the optional STRING argument edgeType is given, then only those neighbors reachable by edges of the given edgeType are included. | BagAccum<VERTEX> |
neighborAttribute(STRING edgeType, STRING targetVertexType, STRING attribute) | From the given vertex, traverses the given edgeType to the given targetVertexType, and returns the set of values for the given attribute. edgeType can only be a string literal. | BagAccum<attributeType> |
edgeAttribute(STRING edgeType, STRING attribute) | From the given vertex, traverses the given edgeType, and returns the set of values for the given edge attribute. edgeType can only be a string literal. | BagAccum<attributeType> |
function name | description | return type |
isDirected () | Returns a boolean value indicating whether this edge is directed or undirected. | BOOL |
parameter name | type | description |
filePath | string | The absolute file path of the input file to be read. A relative path is not supported. |
vertexIdColumn | $num, or $"column_name" if header is true | The vertex id column position. |
vertexTypeColumn | $num, $"column_name" if header is true, or a vertex type | The vertex type column position or a specific vertex type. |
separator | single-character string | The column separator character. |
header | bool | Whether this file has a header. |
The following words are reserved for use by the GSQL query language. This includes words which are currently keywords (such as GRAPH), as well as words which might be used in the future (such as EXTERN). For Data Definition & Loading, there exists a different list of reserved keywords found here.
The GSQL® Query Language is a language for the exploration and analysis of large-scale graphs. The high-level language makes it easy to perform powerful graph traversal queries in the TigerGraph system. By combining features familiar to database users and programmers with highly expressive new capabilities, the GSQL query language offers both easy authoring and powerful execution. A GSQL query contains one or more SELECT statements, where each SELECT statement describes a traversal over a set of vertices and edges in the graph or describes a selection of a subset of vertices. By combining multiple SELECT statements, the user can map out query patterns to answer a virtually unlimited set of real-life data questions.
This document focuses on the formal specification for the GSQL Query Language. It includes example queries which demonstrate the language, each of which works on one of the following six graphs: workNet, socialNet, friendNet, computerNet, minimalNet, and investmentNet. Their schemas are shown below. Appendix D lists the full command and data files to create and load these graphs with small sets of data (~10 to 20 vertices). The data sets are small so that you can understand the result of each query example. The tarball file gsql_ref_examples_2.0.tar.gz contains all of the graph schemas, data files, and queries.
Schemas for Example Graphs
In a distributed graph (where the data are spread across multiple machines), the default execution plan is as follows:
One machine will be selected as the execution hub, regardless of the number or distribution of starting point vertices.
All the computation work for the query will take place at the execution hub. The vertex and edge data from other machines will be copied to the hub machine for processing.
TigerGraph Enterprise Edition offers a Distributed Query mode which provides a more optimized execution plan for queries which are likely to start at several machines and continue their traversal across several machines.
A set of machines representing one full copy of the entire graph will participate in the query. If the cluster has a replication factor of 2 (so there are two copies of each piece of data), then half the machines will participate.
The query executes in parallel across all the machines which have source vertex data for a given hop in the query. That is, each SELECT statement defines a 1-hop traversal from a set of source vertices to a set of target vertices. Unlike the default mode where all the needed data are brought to one machine, in Distributed Query mode, the computation moves across the cluster, following the traversal pattern of the query.
The output results will be gathered at one machine.
To invoke Distributed Query mode, simply insert the keyword "DISTRIBUTED" before "QUERY" in a query definition:
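For example (the query name, parameters, and body are illustrative placeholders):

```gsql
# Default mode:
CREATE QUERY pageRankEx (FLOAT maxChange, FLOAT damping) FOR GRAPH socialNet {
  # query body
}

# Distributed Query mode: the same query, with the DISTRIBUTED keyword added
CREATE DISTRIBUTED QUERY pageRankEx (FLOAT maxChange, FLOAT damping) FOR GRAPH socialNet {
  # query body
}
```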
The basic trade-off between distributed query mode and default mode is greater parallelism for the given query vs. using more system resources, which reduces the potential for concurrency with other operations. Each machine has a certain number of workers available for concurrent execution of queries. A query in default mode uses only one worker out of the whole system. (This one worker will have multiple threads for processing edge traversals in parallel.) However, a query in distributed mode uses one query worker per machine. This means this query can run faster, but it leaves fewer workers for other queries running concurrently.
We suggest the following guidelines for deciding whether to use default mode or distributed mode.
Queries with one or a few starting point vertices and which take only a few hops → default mode is better.
Queries which start at a very large set of starting point vertices and which traverse many hops → distributed mode is better. For example, algorithms which either compute a value for every vertex or one value for the entire graph should use Distributed Mode. This includes PageRank, Centrality, and Connected Component algorithms.
For applications where the same query (same logic but with different input parameters) will be run many times in production, the application designer can simply try both modes during development and choose the one which works better for their use case and data.
Currently, Distributed Query mode cannot be used for all queries. Please note the limitations carefully. In most cases, the GSQL parser and compiler will report an error if you try to write a Distributed Query using an unsupported feature.
Support was added for many features in TigerGraph 2.2. The table below has changed significantly between v2.1 and v2.2.
The following GSQL features are not supported in Distributed Query mode:
(1) Items in the Supported column are listed only for clarity, so you can compare to the Unsupported column. If a feature which is supported in non-distributed queries is not mentioned in either column, then it is supported in Distributed Query mode .
(2) If the query contains "LIMIT N", and if the number of GPEs working on this query is G, then the output size will be N +/- (G-1). In a conventional cluster configuration, there is one GPE per machine. For example, if N=10 and the graph is distributed across 4 machines, then the output size will be between 7 and 13, inclusive.
A GSQL query is a compiled data retrieval-and-computation task. Users can write queries to explore a data graph however they like, to read and make computations on the graph data along the way, to update the graph, and to deliver resulting data. A query is analogous to a user-defined procedure or function: it can have one or more input parameters, and it can produce output in two ways: by returning a value or by printing. Using a query is a three-step procedure:
CREATE QUERY: define the functionality of the query
INSTALL QUERY: compile the query
RUN QUERY: execute the query with input values
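Putting the three steps together in the GSQL shell (the query shown is a minimal illustrative sketch; locationId is a person attribute in the workNet example graph):

```gsql
# 1. CREATE QUERY: define the functionality
CREATE QUERY usPersonCount () FOR GRAPH workNet {
  SumAccum<INT> @@cnt;
  Start = {person.*};
  S = SELECT v FROM Start:v
      WHERE v.locationId == "us"
      ACCUM @@cnt += 1;
  PRINT @@cnt;
}

# 2. INSTALL QUERY: compile the query
INSTALL QUERY usPersonCount

# 3. RUN QUERY: execute the query with input values
RUN QUERY usPersonCount()
```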
Query Action Privileges
Users with querywriter role or greater (architect, admin, and superuser) can create, install and drop queries.
Any user with queryreader role or greater for a given graph can run the queries for that graph.
To implement fine-grained control over which queries can be executed by which sets of users:
Group your queries into your desired privilege groups.
Define a graph for each privilege group. These graphs can all have the same domain if you wish.
Create your queries, assigning each to its appropriate privilege group.
CREATE QUERY defines the functionality of a query on a given graph schema.
A query has a name, a parameter list, the name of the graph being queried, an optional RETURNS type (see the section "RETURN Statement" for more details), an optional specifier for the output API, and a body. The body consists of an optional sequence of typedefs, followed by an optional sequence of declarations, then followed by one or more statements. The body defines the behavior of the query.
The DISTRIBUTED option applies only to installations where the graph has been distributed across a cluster . If specified, the query will run with a different execution model which may give better performance for queries which traverse a large portion of the cluster. Not all GSQL query language features are supported in DISTRIBUTED mode. For details, see the separate document: Distributed Query Mode.
OR REPLACE is deprecated
If the optional keywords OR REPLACE are included, then this query definition, if error-free, will replace a previous definition with the same query name. However, if there are any errors in this query definition, then the previous query definition will be maintained. If the OR REPLACE option is not used, then GSQL will reject a CREATE QUERY command that uses an existing name.
Typedefs allow the programmer to define custom types for use within the body. The declarations support the definition of accumulators (see the chapter "Accumulators" for more details) and global/local variables. All accumulators and global variables must be declared before any statements. There are various types of statements that can be used within the body. Typically, the core statements in the body of a query are one or more SELECT, UPDATE, INSERT, or DELETE statements. The language supports conditional statements such as IF as well as looping constructs such as WHILE and FOREACH. It also supports calling functions, assigning variables, printing, and modifying the graph data.
The query body may include calls to other queries. That is, the other queries are treated as subquery functions. See the subsection on "Queries as Functions".
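The overall shape of a query body can be sketched as follows (the names are illustrative; locationId is a person attribute in the workNet example graph):

```gsql
CREATE QUERY bodyShape (STRING loc) FOR GRAPH workNet {
  # typedefs come first (optional)
  TYPEDEF TUPLE<STRING companyName, INT headCount> CompanyInfo;
  # accumulator and global variable declarations come before any statements
  SumAccum<INT> @@matchCount;
  # statements
  Start = {person.*};
  S = SELECT v FROM Start:v
      WHERE v.locationId == loc
      ACCUM @@matchCount += 1;
  PRINT @@matchCount;
}
```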
This table lists the supported data types for input parameters and return values.
A statement is a standalone instruction that expresses an action to be carried out. The most common statements are data manipulation language (DML) statements. DML statements include the SELECT, UPDATE, INSERT INTO, DELETE FROM, and DELETE statements.
A GSQL query has two levels of statements. The upper-level statement type is called query-body-level statement , or query-body statement for short. This statement type is part of either the top-level block or a query-body control flow block. For example, each of the statements at the top level directly under CREATE QUERY is a query-body statement. If one of the statements is a CASE statement with several THEN blocks, each of the statements in the THEN blocks is also a query-body statement. Each query-body statement ends with a semicolon.
The lower-level statement type is called a DML-sub-level statement, or DML-sub-statement for short. This statement type is used inside certain query-body DML statements, to define particular data manipulation actions. DML-sub-statements are comma-separated; there is no comma or semicolon after the last DML-sub-statement in a block. For example, if one of the top-level statements is a SELECT statement, each of the statements in its ACCUM clause is a DML-sub-statement. If one of those DML-sub-statements is a CASE statement, each of the statements in its THEN blocks is a DML-sub-statement.
There is some overlap in the types. For example, an assignStmt can be used either at the query-body level or the DML-sub-level.
Guidelines for understanding statement type hierarchy:
Top-level statements are Query-Body type (each statement ending with a semicolon).
The statements within a DML statement are DML-sub statements (comma-separated list).
The blocks within a Control Flow statement have the same type as the entire Control Flow statement itself.
Here is a descriptive list of query-body statements:
Here is a descriptive list of DML-sub-statements:
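The two levels can be seen together in a single query (an illustrative sketch using the workNet example graph): the declarations, assignments, SELECT, and PRINT below are query-body statements ending in semicolons, while the statements inside the ACCUM clause are comma-separated DML-sub-statements.

```gsql
CREATE QUERY statementLevels () FOR GRAPH workNet {
  SumAccum<INT> @@edgeCount;               # query-body statement
  SumAccum<INT> @perPerson;                # query-body statement
  Start = {person.*};                      # query-body statement
  S = SELECT v                             # the whole SELECT is one query-body statement
      FROM Start:v -(worksFor:e)- company:t
      ACCUM @@edgeCount += 1,              # DML-sub-statement, comma-separated
            v.@perPerson += 1;             # last DML-sub-statement: no trailing comma;
                                           # the semicolon ends the whole SELECT statement
  PRINT @@edgeCount;                       # query-body statement
}
```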
A query must be installed before it can be executed. The INSTALL QUERY command will install the queries listed:
INSTALL QUERY queryName1, queryName2, ...
It can also install all uninstalled queries, using either of the following commands:
INSTALL QUERY *
INSTALL QUERY ALL
Note: Installing takes several seconds for each query. The current version does not support concurrent installation and running of queries. Other concurrent graph operations will be delayed until the installation finishes.
The following options are available:
Reinstall the query even if the system indicates the query is already installed. This is useful for overwriting an installation that is corrupted or otherwise outdated, without having to drop and then recreate the query. If this option is not used, the GSQL shell will refuse to re-install a query that is already installed.
During standard installation, the user-defined queries are dynamically linked to the GSQL language code. Anytime after INSTALL QUERY has been performed, another statement, INSTALL QUERY -OPTIMIZE can be executed. The names of the individual queries are not needed. This operation optimizes all previously installed queries, reducing their run times by about 20%. Optimize a query if query run time is more important to you than query installation time.
Legal: INSTALL QUERY -OPTIMIZE (no query names are given; all previously installed queries are optimized).
Illegal: INSTALL QUERY -OPTIMIZE queryName1 (the -OPTIMIZE option cannot be combined with individual query names).
If you have a distributed database deployment, installing the query in DISTRIBUTED mode can increase performance for single queries, using a single worker from each available machine to yield results. Certain cases may benefit more from this option than others; more detailed information is available on the next page: Distributed Query Mode.
Installing a query creates a REST++ endpoint. Once a query is installed, there are two ways to execute it. One way is through the GSQL shell: RUN QUERY query_name(parameterValues).
Query output size limitation
There is a maximum size limit of 2GB for the result set of a SELECT block. A SELECT block is the main component of a query which searches for and returns data from the graph. If the result of the SELECT block is larger than 2GB, the system will return no data. No error message is produced.
The query response time can be reduced by directly submitting an HTTP request to the REST++ server: send a GET request to http://server_ip:9000/query/graphname/queryname. If the REST++ server is local, then server_ip is localhost. The query parameter values are either included directly in the query string of the HTTP request's URL or supplied using a data payload.
Starting with TigerGraph v1.2, the graph name is now part of the GET /query URL.
The following two curl commands are each equivalent to the RUN QUERY command above. The first gives the parameter values in the query string in a URL. This example illustrates the simple format for primitive data types such as INT, DOUBLE, and STRING. The second gives the parameter values through the curl command's data payload -d option.
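For a hypothetical query ex1 on graph workNet taking parameters age (INT) and name (STRING), the two forms might look like this (a sketch; the query, parameter names, and values are illustrative):

```shell
# Parameter values in the query string of the URL:
curl -X GET "http://localhost:9000/query/workNet/ex1?age=25&name=Tom"

# Parameter values supplied via the data payload option:
curl -X GET -d @RunQueryExPara.dat "http://localhost:9000/query/workNet/ex1"
```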
where RunQueryExPara.dat has the exact string as the query string in the first URL.
To see a list of the parameter names and types for the user-installed GSQL queries, run the following REST++ request:
curl -X GET "http://localhost:9000/endpoints?dynamic=true"
By using the data payload option, the user can avoid using a long and complex URL. In fact, to call the same query but with different parameters, only the data payload file contents need to be changed; the HTTP request can be the same. The file loader loads the entire file, appends multiple lines into one, and uses the resulting string as the URL query string. If both a query string and a data payload are given (which we strongly discourage), both are included, where the URL query string's parameter values overwrite the values given in the data payload.
This subsection describes how to format the complex type parameter values when executing a query by RUN QUERY or curl command. More details about all parameter types are described in Section "Query Parameter Types"
When square brackets are used in a curl URL, the -g option or escape characters must be adopted. If the parameters are given by data payload (either by file or data payload string), the -g option is not needed and escape characters should not be used.
Below are examples.
This data payload option can accept a file up to 128MB by default. To increase this limit to xxx MB, use the following command:
The upper limit of this setting is 1024 MB. Raising the size limit for the data payload buffer reduces the memory available for other operations, so be cautious about increasing this limit.
For more detailed information about REST++ endpoints and requests, see the RESTPP API User Guide .
The following options are available when running a query:
Some queries select all or almost all vertices in their SELECT statements, e.g., the PageRank algorithm. In this case, the graph processing engine can run much more efficiently in all-vertex mode. In all-vertex mode, all vertices are always selected, and the following actions become ineffective:
Filtering with selected vertices or vertex types. The source vertex set must be all vertices.
Filtering with the WHERE clause.
Filtering with the HAVING clause.
Assigning a designated vertex or a designated vertex type as the source set, e.g., X = {vertex_type.*}.
To run the query in all-vertex mode, use the -av option in shell mode or include __GQUERY__USING_ALL_ACTIVE_MODE=true in the query string of an HTTP request.
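For example (the query name and parameter values are illustrative):

```shell
# Shell mode:
RUN QUERY -av pageRankEx(0.001, 0.85)

# HTTP request:
curl -X GET "http://localhost:9000/query/socialNet/pageRankEx?maxChange=0.001&damping=0.85&__GQUERY__USING_ALL_ACTIVE_MODE=true"
```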
The diagnose option can be turned on in order to produce a diagnostic monitoring log, which contains the processing time of each SELECT block. To turn on the monitoring log, use the -d option in shell mode or include __GQUERY__monitor=true in the query string of an HTTP request.
The path of the generated log file will be shown as a part of output message. An example log is shown below:
The standard output of GSQL queries is in industry-standard JSON format. A JSON object is an unordered set of key:value pairs, enclosed in curly braces. Among the acceptable data types for a JSON value are array and object. A JSON array is an ordered list of values, enclosed in square brackets. Since values can be objects or arrays, JSON supports hierarchical, nested structures. Strings are enclosed in double quotation marks. We also use the term field to refer to a key (or a key:value pair) of a given object.
At the top level of the JSON structure are three required fields: "error", "message", and "results". If a query is successful, the value of "error" will be "false", the "message" value will be empty, and the "results" value will be the intended output of the query. If an error or exception occurred during query execution, the "error" value will be "true", the "message" value will be a string message describing the error condition, and the "results" field will be empty.
Beginning with version 2 (v2) of the output specification, an additional top-level field is required: "version". The "version" value is an object with the following fields:
Other top-level objects, such as "code" may appear in certain circumstances. Note that the top-level objects are enclosed in curly braces, meaning that they form an unordered set. They may appear in any order.
Below is an example of the output of a successful query:
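A sketch of the structure (the content of "results" and the exact fields inside "version" depend on the query and the platform release; the values here are illustrative):

```json
{
  "version": {"edition": "developer", "api": "v2", "schema": 0},
  "error": false,
  "message": "",
  "results": [
    {"@@totalCount": 4}
  ]
}
```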
The value of the "results" key-value pair is a sequential list of the data objects specified by the PRINT statements of the query. The list order follows the order of PRINT execution. The detailed format of the PRINT statement results is described in the Chapter "Output Statements".
For backward compatibility, TigerGraph platforms whose principal output API is v2 can also produce output with API v1.
The following GSQL statement can be used to set the JSON output API configuration.
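The statement has the following form (the value is a string literal):

```gsql
SET json_api = "<version_string>"
```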
Currently, the legal values for <version_string> are "v1" and "v2". This statement sets a persistent system parameter. Each version of the TigerGraph platform is pre-configured to the latest output API available at the time of release. For example, platform version 1.1 is configured so that each query will produce v2 output by default.
To show the GSQL text of a query, run "SHOW QUERY query_name ". Additionally, the "ls" GSQL command lists all created queries and identifies which queries have been installed.
As of v2.3, the query_name argument can now use * or ? wildcards from Linux globbing, or it can be a regular expression, when preceded by -r. See SHOW: View Parts of the Catalog
To drop a query, run "DROP QUERY query_name". The query will be uninstalled (if it has been installed) and removed from the dictionary. The GSQL language will refuse to drop an installed query Q if another installed query R calls query Q. That is, all calling queries must be dropped before or at the same time that their called subqueries are dropped.
To drop all queries, use either of the following commands: DROP QUERY ALL or DROP QUERY *
The scope of ALL depends on the user's current scope. If the user has set a working graph, then DROP QUERY ALL removes all the queries for that graph. If a superuser has set their scope to be global, then DROP QUERY ALL removes all queries across all graph spaces.
The GSQL language provides full support for vertex and edge insertion, deletion, and attribute update. Therefore, the language is more than just a "query" language.
Each query is considered one transaction. Therefore, modifications to the graph data do not take effect until the entire query is completed (committed). Accordingly, a data modification statement does not affect the results of any other statement inside the same query.
The query-body DELETE statement deletes a given set of edges or vertices. This statement can only be used as a query-body statement. (Deletion at the DML-sub level is served by the DML-sub DELETE statement, described next.)
The vertexSet and edgeSet terms in the FROM clause follow the same rules as those in the FROM clause of a SELECT statement. The WHERE clause can filter the items in the vertexSet or edgeSet. Below are two examples, one for deleting vertices and one for deleting edges.
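Based on the behavior described in the surrounding text, the two delete queries might look like the following sketch. The workNet schema details (person, company, worksFor, locationId) are assumed from context, so this is an approximation rather than the exact catalog text.

```gsql
CREATE QUERY deleteEx() FOR GRAPH workNet {
  // Delete all person vertices located in the US
  S = {person.*};
  DELETE s FROM S:s
    WHERE s.locationId == "us";
}

CREATE QUERY deleteEx2() FOR GRAPH workNet {
  // Delete all worksFor edges whose source person is in the US
  S = {person.*};
  DELETE e FROM S:s -(worksFor:e)-> company:t
    WHERE s.locationId == "us";
}
```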
The following query can be used to observe the effect of the delete statements. This query counts the person vertices who work in the US ("us") and the worksFor edges for persons in the US. When the initial workNet test data is loaded, there are 5 persons and 9 worksFor edges for locationId = "us". If query deleteEx2 is run, the worksAtUS query will then find the 5 persons but 0 worksFor edges. Next, if the deleteEx query is run, the worksAtUS query will then find 0 persons and 0 worksFor edges.
For example, the following sequence of countAtLocation, deleteEx2, and deleteEx queries
will produce the following result:
DML-sub DELETE is a DML-substatement which deletes one vertex or edge each time it is called. (Deletion at the query-body level is served by the Query-body DELETE statement described above.) In practice, this statement resides within the body of a SELECT...ACCUM/POST-ACCUM clause, so it is called once for each member of a selected vertex set or edge set.
The ACCUM clause iterates over an edge set, which can encounter the same vertex multiple times. If you wish to delete a vertex, it is best practice to place the DML-sub DELETE statement in the POST-ACCUM clause rather than in the ACCUM clause.
The following example uses and modifies the graph data for socialNet.
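A DML-sub DELETE query along the lines described above might look like this sketch. The socialNet schema details (person, post, posted) are assumed from context; the POST-ACCUM placement follows the best practice stated earlier.

```gsql
CREATE QUERY deletePosts(VERTEX<person> seed) FOR GRAPH socialNet {
  Start = {seed};
  // Delete each post made by the given person.
  // The DELETE is in POST-ACCUM so each target vertex is deleted only once.
  posts = SELECT p
          FROM Start:s -(posted:e)-> post:p
          POST-ACCUM DELETE(p);
}
```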
For example, the following sequence of selectUserPosts and deletePosts queries
will produce the following result:
The INSERT INTO statement adds edges or vertices to the graph. However, if the ID value(s) for the inserted vertex/edge match those of an existing vertex/edge, then the new values will overwrite the old values. To insert an edge, its endpoint vertices must already exist, either prior to running the query or inserted earlier in that query. The INSERT INTO statement can be used as a query-body-level statement or a DML-substatement.
The formal syntax is complex because it encompasses several options, and even so, it requires additional explanation. The first name symbol is the vertex type or edge type. The user then has two options:
1) Provide a value for the ID(s) and then each attribute, in the canonical order for the vertex or edge type. This format is similar to that of a LOAD statement. In this case, it is not necessary to explicitly name the attributes, since it is assumed that every one is being referenced, in order.
2) Name the specific attributes to be set, and then provide a corresponding list of values. The attributes can be in any order, with the exception that the IDs must come first. That is, to insert a vertex, the first attribute name must be PRIMARY_ID. To insert an edge, the first two attribute names must be FROM and TO.
For each attribute value, provide either an expression expr or "_", which means the default value for that attribute type. The optional name which follows the first two (id) values specifies the source vertex type and target vertex type, if the edge type had been defined with wildcard vertex types.
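The two options can be sketched as follows. The attribute names and values are illustrative, assuming a company vertex with a country attribute and a worksFor edge with startYear, startMonth, and fullTime attributes, as in the workNet examples.

```gsql
// Option 1: values in canonical order; attribute names are not listed
INSERT INTO company VALUES ("gsql", "us")

// Option 2: name the attributes; the ID(s) must come first
INSERT INTO company (PRIMARY_ID, country) VALUES ("gsql_jp", "jp")

// Edge insert with named attributes: FROM and TO must come first;
// "_" takes the default value for that attribute type
INSERT INTO worksFor (FROM, TO, startYear, startMonth, fullTime)
  VALUES ("tic", "gsql", 2017, _, true)
```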
The query insertEx illustrates query-body-level INSERT statements: it inserts new company vertices and worksFor edges into the workNet graph.
The query whoWorksForCompany can be used to check the effect of query insertEx. Prior to running insertEx, running whoWorksForCompany("gsql") will find 0 companies called "gsql" and 0 worksFor edges for company "gsql". If we then run the query insertEx("tic", "tac", "toe", "gsql"), then whoWorksForCompany("gsql") will find a company called "gsql" and another one called "gsql_jp". Moreover, it will find 3 edges, for tic, tac, and toe, with different values for the startMonth, startYear, and fullTime parameters.
The following example shows a DML-sub-level INSERT. Because the statement applies to allCompanies, several vertices will be inserted.
Example: Add a child company in Japan to US-based company company3. List all the Japan-based companies before and after the insertion.
The UPDATE statement updates the attribute of each vertex or edge in a vertex set or edge set, respectively, with new attribute values.
The set of vertices or edges to update is described in the FROM clause, following the same rules as the FROM clause in a SELECT block. In the SET clause, the DMLSubStmtList may contain assignment statements to update the attributes of a vertex or edge. Both simple base type attributes and collection type attributes can be updated. These assignment statements use the vertex or edge aliases declared in the FROM clause. The optional WHERE clause supports boolean conditions to filter the items in the vertexSet or edgeSet.
The UPDATE statement can only be used as a query-body-level statement. However, DML-sub-level updates are still possible by using other statement types. A vertex attribute's value can be updated within the POST-ACCUM clause of a SELECT block by using the assignment operator (=); an edge attribute's value can be updated within the ACCUM clause of a SELECT block in the same way. In fact, the UPDATE statement is equivalent to a SELECT statement with ACCUM and/or POST-ACCUM clauses that update the vertex or edge attribute values. Below is an example.
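The equivalence can be sketched as follows. The vertex set persons and the locationId attribute are assumed from the workNet examples; both statements change "us" locations to "USA".

```gsql
// UPDATE version
UPDATE s FROM persons:s
SET s.locationId = "USA"
WHERE s.locationId == "us";

// Equivalent SELECT version: the attribute assignment is a
// DML-substatement in the POST-ACCUM clause
updated = SELECT s FROM persons:s
          WHERE s.locationId == "us"
          POST-ACCUM s.locationId = "USA";
```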
Updating a vertex's attribute value in an ACCUM clause is not allowed, because the update can occur multiple times in parallel and possibly result in a non-deterministic value. If the vertex attribute value update depends on an edge attribute value, use a vertex-attached accumulator to save the value, and then update the vertex attribute's value in the POST-ACCUM clause.
The query below uses the SELECT statement instead of the UPDATE statement, but is functionally similar to the query above. Query updateEx2 reverses the locationId change made by updateEx (changing the location back to "us" from "USA").
Below is an example of an edge update with two attribute changes, including an incremental change (e.startYear = e.startYear + 1):
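Such an edge update might look like the following sketch. The edge set template and the fullTime attribute are assumed from the workNet examples; note that an edge attribute may appear on both sides of its own assignment.

```gsql
UPDATE e FROM persons:s -(worksFor:e)-> company:t
SET e.fullTime = false,
    e.startYear = e.startYear + 1   // incremental change
WHERE t.id == "company1";
```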
In addition to the above UPDATE statement and SELECT statement, a simple assignment statement at the query-body level can be used to update the attribute value of a single vertex/edge, if the vertex/edge has been assigned to a variable or parameter.
This section describes the data types that are native to and are supported by the GSQL Query Language. Most of the data objects used in queries come from one of three sources: (1) the query's input parameters, (2) the vertices, edges, and their attributes which are encountered when traversing the graph, or (3) variables defined within the query that are used to assist in the computational work of the query.
This section covers the following subset of the EBNF language definitions:
An identifier is the name for an instance of a language element. In the GSQL query language, identifiers are used to name elements such as a query, a variable, or a user-defined function. In the EBNF syntax, an identifier is referred to as a name. It can be a sequence of letters, digits, or underscores ("_"). Other punctuation characters are not supported. The initial character must be a letter or an underscore.
Different types of data can be used in different contexts. The EBNF syntax defines several classes of data types. The most basic is called baseType. The other independent types are FILE and STRING COMPRESS. The remaining types are either compound data types built from the independent data types, or supersets of other types. The table below gives an overview of their definitions and their uses.
The query language supports the following base types, which can be declared and assigned anywhere within their scope. Any of these base types may be used when defining a global variable, a local variable, a query return value, a parameter, part of a tuple, or an element of a container accumulator. Accumulators are described in detail in a later section.
The default value of each base type is shown in the table below. The default value is the initial value of a base type variable (see Section "Variable Types" for more details), or the default return value for some functions (see Section "Operators, Functions, and Expressions" for more details).
The first seven types (INT, UINT, FLOAT, DOUBLE, BOOL, STRING, and DATETIME) are the same ones mentioned in the "Attribute Data Types" section of the GSQL Language Reference, Part 1.
FLOAT and DOUBLE input values must be in fixed-point d.dddd format, where d is a digit. Output values will be printed in either fixed-point or exponential notation, whichever is more compact.
The GSQL Loader can read FLOAT and DOUBLE values with exponential notation (e.g., 1.25 E-7).
VERTEX and EDGE are the two types of objects which form a graph. A query parameter or variable can be declared as either of these two types. In addition, the schema for the graph defines specific vertex and edge types (e.g., CREATE VERTEX person). The parameter or variable type can be restricted by giving the vertex/edge type in angle brackets < > after the keyword VERTEX/EDGE. A VERTEX or EDGE variable declared without a specifier is called a generic type. Below are examples of generic and typed vertex and edge variable declarations:
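The declarations might look like the following sketch, assuming the schema defines person and worksFor types:

```gsql
VERTEX anyVertex;            // generic: may hold any vertex type
VERTEX<person> onePerson;    // restricted to person vertices
EDGE anyEdge;                // generic: may hold any edge type
EDGE<worksFor> oneWorksFor;  // restricted to worksFor edges
```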
The following table maps vertex or edge attribute types in the Data Definition Language (DDL) to GSQL query language types. Accumulators are introduced in Section "Accumulators".
More details are introduced in the Section entitled "JSONOBJECT and JSONARRAY Functions".
A JSONOBJECT or JSONARRAY value is immutable. No operator is allowed to modify its value.
A tuple is a user-defined data structure consisting of a fixed sequence of baseType variables. Tuple types can be created and named using a TYPEDEF statement. Tuples must be defined first, before any other statements in a query.
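A tuple type definition might look like the following sketch; the type and field names are illustrative:

```gsql
// Must appear before any other statements in the query
TYPEDEF TUPLE <STRING ename, INT age, BOOL fullTime> EMPLOYEE_RECORD;
```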
A tuple can also be defined in a graph schema and then can be used as a vertex or edge attribute type. A tuple type which has been defined in the graph schema does not need to be re-defined in a query.
The graph schema investmentNet contains two complex attributes:
user-defined tuple SECRET_INFO, which is used for the secret_info attribute in the person vertex.
a portfolio MAP<STRING, DOUBLE> attribute, also in the person vertex.
The query below reads both the SECRET_INFO tuple and the portfolio MAP. The query does not need to redefine the SECRET_INFO tuple type. To read and save the map, we define a MapAccum with the same key:value type as the original portfolio map. (The "Accumulators" chapter has more information about accumulators.) In addition, the query creates a new tuple type, ORDER_RECORD.
STRING COMPRESS is an integer type encoded by the system to represent string values. STRING COMPRESS uses less memory than STRING. The STRING COMPRESS type is designed to act like STRING: data are loaded and printed just as string data, and most functions and operators which take STRING input can also take STRING COMPRESS input. The difference is in how the data are stored internally. A STRING COMPRESS value can be obtained from a STRING_SET COMPRESS or STRING_LIST COMPRESS attribute or from converting a STRING value.
STRING COMPRESS type is beneficial for sets of string values when the same values are used multiple times. In practice, STRING COMPRESS is most useful for container accumulators like ListAccum<STRING COMPRESS> or SetAccum<STRING COMPRESS>.
An accumulator (introduced in the "Accumulators" section) containing STRING COMPRESS stores the dictionary when it is assigned a value from an attribute or from another accumulator containing STRING COMPRESS. An accumulator containing STRING COMPRESS can store multiple dictionaries. A STRING value can be converted to a STRING COMPRESS value only if the value is in one of the dictionaries; if the STRING value is not in the dictionaries, the original string value is saved. A STRING COMPRESS value can always be automatically converted to a STRING value.
When a STRING COMPRESS value is output (e.g., by a PRINT statement), it is shown as a STRING.
STRING COMPRESS is not a base type.
A FILE object is a sequential data storage object, associated with a text file on the local machine.
When referring to a FILE object, we always capitalize the word FILE, to distinguish it from ordinary files.
When a FILE object is declared and associated with a particular text file, any existing content in the text file will be erased. During the execution of the query, content written to the FILE will be appended to the FILE. When the query in which the FILE was declared finishes running, the FILE contents are saved to the text file.
A FILE object can be passed as a parameter to another query. When a query receives a FILE object as a parameter, it can append data to that FILE, as can every other query which receives this FILE object as a parameter.
Input parameters to a query can be of any base type (except EDGE, JSONARRAY, or JSONOBJECT). A parameter can also be a SET or BAG which uses a base type (except EDGE) as the element type. A FILE object can also be a parameter. Within the query, SET and BAG are converted to SetAccum and BagAccum, respectively (see Section "Accumulators" for more details).
A query parameter is immutable. It cannot be assigned a new value within the query.
The FILE object is a special case. It is passed by reference, meaning that the receiving query gets a link to the original FILE object. The receiving query can write to the FILE.
Previous sections focused on the lowest level building blocks of queries: data types (Section 3), operators, functions, and expressions (Section 5), and a special section devoted to accumulators (Section 4). We now begin to look at the types of statements available in GSQL queries. This section focuses on declaration and assignment statements. Later sections will provide a closer look at the all-important SELECT statement, control flow statements and data modification statements. Furthermore, some types of statements can be nested within SELECT, UPDATE, or control flow statements.
This section covers the following subset of the EBNF syntax:
There are six types of variable declarations in a GSQL query:
Accumulator
Global baseType variable
Local baseType variable
Vertex set
File object
Vertex or Edge aliases
The first five types each have their own declaration statement syntax and are covered in this section. Aliases are declared implicitly in a SELECT statement.
Accumulator declaration is discussed in Section 4: "Accumulators".
After accumulator declarations, base type variables can be declared as global variables. The scope of a global variable is from the point of declaration until the end of the query.
A global variable can be accessed (read) anywhere in the query; however, there are restrictions on where it can be updated. See the subsection below on "Assignment Statements".
Multiple global variables of the same type can be declared and initialized on the same line, as in the example below:
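For example (variable names are illustrative):

```gsql
INT vertexCount = 0, edgeCount = 0;
STRING homeLocation = "us", altLocation = "jp";
```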
A local variable can be declared only in an ACCUM, POST-ACCUM, or UPDATE SET clause, and its scope is limited to that clause. Local variables can only be of base types (e.g., INT, FLOAT, DOUBLE, BOOL, STRING, VERTEX). A local variable must be declared and initialized in the same statement.
Within a local variable's scope, another local variable with the same name cannot be declared at the same level. However, a new local variable with the same name can be declared at a lower level (i.e., within a nested SELECT or UPDATE statement). The lower declaration takes precedence at the lower level.
In a POST-ACCUM clause, each local variable may only be used in source vertex statements or target vertex statements, not both.
Vertex set variables play a special role within GSQL queries. They are used for both the input and output of SELECT statements. Therefore, before the first SELECT statement in a query, a vertex set variable must be declared and initialized. This initial vertex set is called the seed set.
The query below lists all ways of assigning a vertex set variable an initial set of vertices (that is, forming a seed set).
a vertex parameter, untyped (S1) or typed (S2)
a vertex set parameter, untyped (S3) or typed (S4)
a global SetAccum<VERTEX> accumulator, untyped (S5) or typed (S6)
all vertices of any type (S7, S9) or of one type (S8)
a list of vertex ids in an external file (S10)
copy of another vertex set (S11)
a combination of individual vertices, vertex set parameters, or global variables (S12)
union of vertex set variables (S13)
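A partial sketch of several of these seed set assignments follows. The parameter names and the workNet person type are illustrative; the labels echo the list above.

```gsql
CREATE QUERY seedExamples(VERTEX v1, SET<VERTEX<person>> p4) FOR GRAPH workNet {
  S1 = {v1};            // from an untyped vertex parameter
  S4 = p4;              // from a typed vertex set parameter
  S7 = {ANY};           // all vertices of any type
  S8 = {person.*};      // all vertices of one type
  S11 = S8;             // copy of another vertex set
  S13 = S1 UNION S8;    // union of vertex set variables
  PRINT S13;
}
```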
When declaring a vertex set variable, a set of vertex types can optionally be specified for the vertex set variable. If the vertex set variable's type is not specified explicitly, the system determines the type implicitly from the vertex set value. The type can be ANY, _ (equivalent to ANY), or any explicit vertex type(s). See the EBNF grammar rule vertexEdgeType.
Declaration syntax difference: vertex set variable vs. base type variable
In a vertex set variable declaration, the type specifier follows the variable name and is surrounded by parentheses: vSetName (type). This differs from a base type variable declaration, where the type specifier comes before the variable name: type varName.
After a vertex set variable is declared, the vertex type of the vertex set variable is immutable. Every assignment (e.g. SELECT statement) to this vertex set variable must match the type. The following is an example in which we must declare the vertex set variable type.
In the above example, the query returns the set of vertices after a 5-step traversal from the input "person" vertex. If we declare the vertex set variable S without explicitly giving a type, then because the type of vertex parameter m1 is "person", the GSQL engine will implicitly assign S to be "person"-type. However, if S is assigned to "person"-type, the SELECT statement inside the WHILE loop causes a type-checking error, because the SELECT block will generate all connected vertices, including non-"person" vertices. Therefore, S must be declared as an ANY-type vertex set variable.
A FILE object is a sequential text storage object, associated with a text file on the local machine.
When referring to a FILE object, we always capitalize the word FILE, to distinguish it from ordinary files.
When a FILE object is declared and associated with a particular text file, any existing content in the text file will be erased. During the execution of the query, content written or printed to the FILE will be appended to the FILE. When the query in which the FILE was declared finishes running, the FILE contents are saved to the text file.
Note that the declaration statement is invoking the FILE object constructor. The syntax for the constructor includes parentheses surrounding the filepath parameter.
Currently, the filePath must be an absolute path.
Assignment statements are used to set or update the value of a variable, after it has been declared. This applies to baseType variables, vertex set variables, and accumulators. Accumulators also have the special += accumulate statement, which was discussed in the Accumulator section. Assignment statements can use expressions (expr) to define the new value of the variable.
In general, assignment statements can take place anywhere after the variable has been declared. However, there are some restrictions. These restrictions apply to "inner level" statements which are within the body of a higher-level statement:
The ACCUM or POST-ACCUM clause of a SELECT statement
The SET clause of an UPDATE statement
The body of a FOREACH statement
Global accumulator assignment "=" is not permitted within the body of SELECT or UPDATE statements
Global variable assignment is permitted in ACCUM or POST-ACCUM clauses, but the change in value will not take place until exiting the clause. Therefore, if there are multiple assignment statements for the same variable, only the final one will take effect.
Vertex attribute assignment "=" is not permitted in an ACCUM clause. However, edge attribute assignment is permitted. This is because the ACCUM clause iterates over an edge set.
There are additional restrictions within FOREACH loops for the loop variable. See the Data Modification section.
LOADACCUM() can initialize a global accumulator by loading data from a file. LOADACCUM() has 3+n parameters explained in the table below: (filePath, fieldColumn_1, ..., fieldColumn_n, separator, header), where n is the number of fields in the accumulator. One assignment statement can have multiple LOADACCUM() function calls. However, every LOADACCUM() referring to the same file in the same assignment statement must use the same separator and header parameter values.
Any accumulator using generic VERTEX as an element type cannot be initialized by LOADACCUM().
Below is an example with an external file
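A LOADACCUM() assignment along these lines might look like the following sketch. The file path, the column layout ($0 holds the key, $1 the value), and the accumulator name are assumptions for illustration.

```gsql
// Suppose /home/tigergraph/loadAccumInput.csv has lines like "us,3"
MapAccum<STRING, INT> @@countryCount;
@@countryCount = LOADACCUM("/home/tigergraph/loadAccumInput.csv",
                           $0, $1, ",", false);
```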
Typically, a function call returns a value and so is part of an expression (see Section 5 - Operators, Functions and Expressions). In some cases, however, the function does not return a value (i.e., returns VOID) or the return value can be ignored, so the function call can be used as an entire statement. This is a Function Call Statement.
The GSQL Query Language includes a comprehensive set of control flow statements to empower sophisticated graph traversal and data computation: IF/ELSE, CASE, WHILE, and FOREACH.
Note that any of these statements can be used as a query-body statement or as a DML-sub level statement.
If the control flow statement is at the query-body level, then its block(s) of statements are query-body statements (queryBodyStmts). In a queryBodyStmts block, each individual statement ends with a semicolon, so there is always a semicolon at the end.
If the control flow statement is at the DML-sub level, then its block(s) of statements are DML-sub statements ( DMLSubStmtList ). In a DMLSubStmtList block, a comma separates statements, but there is no punctuation at the end.
The "Statement Types" subsection in the chapter on "CREATE / INSTALL / RUN / SHOW / DROP QUERY" has a more detailed general example of the difference between queryBodyStmts and DMLSubStmts.
If a particular IF condition is not true, then the flow proceeds to the next ELSE IF condition. When a true condition is encountered, its corresponding block of statements is executed, and then the IF statement terminates (skipping any remaining ELSE-IF or ELSE clauses). If an ELSE-clause is present, its block of statements are executed if none of the preceding conditions are true. Overall, the functionality can be summarized as "execute the first block of statements whose conditional test is true."
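The flow described above can be sketched as follows (variable names are illustrative):

```gsql
INT x = 5;
STRING result = "";
IF x == 0 THEN
  result = "zero";
ELSE IF x > 0 THEN
  result = "positive";   // first true condition: this block runs
ELSE
  result = "negative";   // runs only if no condition above is true
END;
```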
One CASE statement contains one or more WHEN-THEN clauses, each WHEN presenting one expression. The CASE statement may also have one ELSE clause whose statements are executed if none of the preceding conditions are true.
There are two syntaxes of the CASE statement: one equivalent to an if-else statement, and the other is structured like a switch statement. The if-else version evaluates the boolean condition within each WHEN-clause and executes the first block of statements whose condition is true. The optional concluding ELSE-clause is executed only if all WHEN-clause conditions are false.
The switch version evaluates the expression following the keyword WHEN and compares its value to the expression immediately following the keyword CASE. These expressions do not need to be boolean; the CASE statement compares pairs of expressions to see if their values are equal. The first WHEN-THEN clause to have an expression value equal to the CASE expression value is executed; the remaining clauses are skipped. The optional ELSE-clause is executed only if no WHEN-clause expression has a value matching the CASE value.
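The two CASE forms can be sketched side by side (variable names are illustrative):

```gsql
INT x = 1;
STRING result = "";

// If-else version: each WHEN has a boolean condition
CASE
  WHEN x == 0 THEN result = "zero";
  WHEN x > 0  THEN result = "positive";
  ELSE             result = "negative";
END;

// Switch version: each WHEN value is compared to the CASE expression
CASE x
  WHEN 0 THEN result = "zero";
  WHEN 1 THEN result = "one";
  ELSE        result = "other";
END;
```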
The WHILE statement iterates over its body (queryBodyStmts or DMLSubStmtList) until the condition evaluates to false or until the iteration limit is met. A condition is any expression that evaluates to a boolean. The condition is evaluated before each iteration. CONTINUE statements can be used to change the control flow within the while block. BREAK statements can be used to exit the while loop.
A WHILE statement may have an optional LIMIT clause. The LIMIT clause takes a constant positive integer value or an integer variable to constrain the maximum number of loop iterations. The example below demonstrates how the LIMIT behaves.
If a limit value is not specified, it is possible for a WHILE loop to iterate infinitely. It is the responsibility of the query author to design the condition logic so that it is guaranteed to eventually be true (or to set a limit).
Below are a number of examples that demonstrate the use of WHILE statements.
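A minimal sketch of the LIMIT behavior:

```gsql
INT i = 0;
// The condition alone would allow 10 iterations,
// but LIMIT caps the loop at 5 iterations
WHILE i < 10 LIMIT 5 DO
  i = i + 1;
END;
// i is 5 here
```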
The formal syntax for forEachControl appears complex. It can be broken down into the following cases:
name IN setBagExpr
tuple IN setBagExpr
name IN RANGE [ expr, expr ]
name IN RANGE [ expr, expr ].STEP ( expr )
Note that setBagExpr includes container accumulators and explicit sets.
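The cases above can be sketched as follows. The accumulator names are assumptions for illustration.

```gsql
SetAccum<INT> @@numSet;
MapAccum<STRING, DOUBLE> @@portfolioTotals;
SumAccum<INT> @@total;
ListAccum<INT> @@evens;

// name IN setBagExpr
FOREACH n IN @@numSet DO
  @@total += n;
END;

// tuple IN setBagExpr: (key, value) pairs from a MapAccum
FOREACH (k, v) IN @@portfolioTotals DO
  PRINT k, v;
END;

// name IN RANGE, with an optional STEP
FOREACH i IN RANGE[0, 10].STEP(2) DO
  @@evens += i;   // 0, 2, 4, 6, 8, 10
END;
```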
The FOREACH statement has the following restrictions:
In a DML-sub level FOREACH, it is never permissible to update the loop variable (the variable declared before IN, e.g., var in "FOREACH var IN setBagExpr").
In a query-body level FOREACH, in most cases it is not permissible to update the loop variable. The following exceptions apply:
If the iteration is over a ListAccum, its values can be updated.
If the iteration is over a MapAccum, its values can be updated, but its keys cannot.
If the iteration is over a set of vertices, it is not permissible to access (read or write) their vertex-attached accumulators.
A query-body-level FOREACH cannot iterate over a set or bag of constants. For example, FOREACH i in (1,2,3) is not supported. However, DML-sub FOREACH does support this.
The FOREACH statement has an optional RANGE clause RANGE[expr, expr], which can be used to define the iteration collection. Optionally, the range may specify a step size: RANGE[expr, expr].STEP(expr)
Each expr must evaluate to an integer. Any of the integers may be negative, but the step expr may not be 0.
The clause RANGE[a,b].STEP(c) produces the sequence of integers from a to b, inclusive, with step size c. That is, (a, a+c, a+2*c, a+3*c, ... a+k*c), where k = the largest integer such that |k*c| ≤ |b-a|.
If the .STEP method is not given, then the step size c = 1.
The step value can be positive for an ascending range or negative for a descending range. If the step has the wrong polarity, then the loop has zero iterations; that is, the exit condition is already satisfied.
CONTINUE and BREAK Statements
The CONTINUE and BREAK statements can only be used within a block of a WHILE or FOREACH statement. The CONTINUE statement branches control flow to the end of the loop, skipping any remaining statements in the current iteration, and proceeding to the next iteration. That is, everything in the loop block after the CONTINUE statement will be skipped, and then the loop will continue as normal.
The BREAK statement branches control flow out of the loop, i.e., it will exit the loop and stop iteration.
Below are a number of examples that demonstrate the use of BREAK and CONTINUE.
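A minimal sketch of both statements in one loop:

```gsql
ListAccum<INT> @@visited;
INT i = 0;
WHILE i < 10 DO
  i = i + 1;
  IF i == 3 THEN CONTINUE; END;  // skip 3, go to the next iteration
  IF i == 7 THEN BREAK;    END;  // leave the loop when i reaches 7
  @@visited += i;                // collects 1, 2, 4, 5, 6
END;
```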
This section discusses the SELECT statement in depth and covers the following EBNF syntax:
The SELECT block selects a set of vertices FROM a vertex set or edge set. There are a number of optional clauses that define and/or refine the selection by constraining the vertex or edge set or the result set. There are two types of SELECT, vertex-induced and edge-induced. Both result in a vertex set, known as the result set.
Size limitation
There is a maximum size limit of 2GB for the result set of a SELECT block. If the result of the SELECT block is larger than 2GB, the system will return no data. No error message is produced.
The SELECT statement is an assignment statement with a SELECT block on the right hand side. The SELECT block has many possible clauses, which fit together in a logical flow. Overall, the SELECT block starts from a source set of vertices and returns a result set that is either a subset of the source vertices or a subset of their neighboring vertices. Along the way, computations can be performed on the selected vertices and edges. The figure below graphically depicts the overall SELECT data flow. While the ACCUM and POST-ACCUM clauses do not directly affect which vertices are included in the result set, they affect the data (accumulators) which are attached to those vertices.
There are two options for the FROM clause: vertexSet or edgeSet. If vertexSet is used, then the query will be a vertex-induced selection. If edgeSet is used, then the query is an edge-induced selection.
A vertex-induced selection takes an input set of vertices and produces a result set, which is a subset of the input set. The FROM argument has the form Source:s, where Source is a vertex set. Source is optionally followed by :s, where s is a vertex alias which represents any vertex in the set Source.
This statement can be interpreted as "Select all vertices s from the vertex set Source." The result is a vertex set.
Below is a simple example of a vertex-induced selection.
Multiple types can also be specified by using delimiter "|". Additionally, the keywords "_" or "ANY" can be used for denoting a set which can include any vertex or edge type.
An edge-induced selection starts from a set of vertices, defines a set of edges incident to that set, and produces a result set of vertices that are also incident to those edges. Typically, this is used to traverse from a set of source vertices over a specific edge type to a set of target vertices. The FROM clause argument (defined formally by the EBNF edgeSet rule) is structured as an edge template: Source:s -(eType:e)-> tType:t. The edge template has three parts: the source vertex set (Source), the edge type or types (eType), and the target vertex type or types (tType). Both s and t are vertex aliases, and e is the edge alias. The template defines a pattern s → e → t, from source vertex s, across eType edges, to tType target vertices. The edge alias e represents any edge that fits the complete pattern. Likewise, s and t are aliases that represent any source vertices and target vertices, respectively, that fit the complete pattern.
Either the source vertex set (s) or target vertex set (t) can be used as the SELECT argument, which determines the result of the SELECT statement. Note the small difference in the two SELECT statements below.
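Using the template names from above, the two variants might look like this sketch:

```gsql
// Select the source-end vertices of the matching edges
resultSet1 = SELECT s FROM Source:s -(eType:e)-> tType:t;

// Select the target-end vertices of the matching edges
resultSet2 = SELECT t FROM Source:s -(eType:e)-> tType:t;
```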
resultSet1 is based on the source end of the selected edges. resultSet2 is based on the target end of the selected edges. However, resultSet1 is NOT identical to the Source vertex set. It contains only those members of Source which connect to an eType edge and then to a tType vertex. Other clauses (presented later in this "SELECT Statement" section) can do additional filtering of the Source set.
We strongly suggest that an alias should be declared with every vertex and edge in the FROM clause, as there are several functions and features which are only available to vertex and edge aliases.
The FROM clause chooses edges and target vertices by type. The EBNF symbol vertexEdgeType describes the options:
Note that eType and tType are optional. If eType/tType is omitted (or if ANY or _ is used), then the SELECT will seek out any edge or target vertex that is valid (i.e., there exists a valid path between two vertices over an edge). For the example below, if V1 and V2 are the only possible reachable vertex types via eType, we can omit the target vertex type, making all of the following SELECT statements equivalent. The system will infer the target vertex type at run time.
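For instance, under the assumption that V1 and V2 are the only vertex types reachable from Source via eType, the following sketches are equivalent:

```
r1 = SELECT t FROM Source:s -(eType:e)-> :t;          // target type omitted
r2 = SELECT t FROM Source:s -(eType:e)-> _:t;         // wildcard
r3 = SELECT t FROM Source:s -(eType:e)-> ANY:t;       // wildcard keyword
r4 = SELECT t FROM Source:s -(eType:e)-> (V1|V2):t;   // explicit list
```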
It is legal to declare an alias without explicitly stating an edge or target type. See the examples below.
Type inference is used whenever possible for the edge set and target vertex set to prune ineligible edges and thereby optimize performance. The vertex type in Source is checked against the graph schema to find all incident edge types. The knowledge of the graph schema is combined with the selection's explicit type conditions given by eType and tType, as well as explicit and implicit type conditions in the WHERE clause to determine a final set of eligible edge sets which match the pattern Source → eType → tType. With type inference, the user has the freedom to express only as much as necessary to select edges.
Similarly, the GSQL engine will infer the edge type at run time. For example, if E1, E2, and E3 are the only possible edge types that can be traversed to reach vertices of type tType, we can omit specifying the edge type, making the following SELECT statements equivalent.
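As a sketch, assuming E1, E2, and E3 are the only edge types leading to tType vertices:

```
r1 = SELECT t FROM Source:s -(:e)-> tType:t;              // edge type omitted
r2 = SELECT t FROM Source:s -(_:e)-> tType:t;             // wildcard
r3 = SELECT t FROM Source:s -(ANY:e)-> tType:t;           // wildcard keyword
r4 = SELECT t FROM Source:s -((E1|E2|E3):e)-> tType:t;    // explicit list
```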
The following are a set of queries that demonstrate edge-induced SELECT blocks. The allPostsLiked and allPostsMade queries show how the target vertex type can be omitted. The allPostsLikedOrMade query uses the "|" operator to select multiple types of edges.
This example is another edge selection that uses the "|" operator to select edges that have target vertices of multiple types.
Vertex and edge aliases are declared within the FROM clause of a SELECT block, by using the character ":", followed by the alias name. Aliases can be accessed anywhere within the same SELECT block. They are used to reference a single selected vertex or edge of a set. It is through the vertex or edge aliases that attributes of these vertices or edges can be accessed.
For example, the following code snippets show two different SELECT statements. The first SELECT statement starts from a vertex set called allVertices, and the vertex alias v can access each individual vertex from allVertices. The second SELECT statement selects a set of edges. It can use the vertex alias s to reference the source vertices, or the alias t to reference the target vertices.
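The two snippets might look like this (the liked edge type and post vertex type are assumed from the socialNet example graph):

```
// vertex selection: v refers to each vertex in allVertices
result1 = SELECT v FROM allVertices:v;

// edge selection: s refers to source vertices, t to target vertices
result2 = SELECT t FROM allVertices:s -(liked:e)-> post:t;
```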
The following example shows an edge-based SELECT statement, declaring aliases for all three parts of the edge. In the ACCUM clause, the e and t aliases are assigned to local edge and vertex variables.
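A sketch of that pattern (the liked edge and post vertex types are assumed from socialNet; the local variable names are illustrative):

```
result = SELECT t
         FROM start:s -(liked:e)-> post:t
         ACCUM
           EDGE likedEdge = e,           // local edge variable from alias e
           VERTEX<post> likedPost = t;   // local vertex variable from alias t
```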
We strongly suggest that an alias should be declared with every vertex and edge in the FROM clause, as there are several functions and features which are only available to vertex and edge aliases.
The SAMPLE clause is an optional clause that selects a uniform random sample from the population of edges or vertices specified in the FROM argument. To be clear, the edge population consists of those edges which satisfy all three parts – source set, edge type, and target type – of the FROM clause. The SAMPLE clause is intended to provide a representative sample of the distribution of edges (or vertices) connected to hub vertices, instead of dealing with all edges. A hub vertex is a vertex with a relatively high degree. (The degree of a vertex is the number of edges which connect to it. If edges are directional, one can distinguish between indegree and outdegree.)
Note
Currently, the WHEN condition that can be used with a SAMPLE clause is limited strictly to checking whether the result of a function call on a vertex is greater than (>) or greater than or equal to (>=) some number.
The expression following SAMPLE specifies the sample size, either an absolute number or a percentage of the population. The expression in sampleClause must evaluate to a positive integer. There are two sampling methods. One is sampling based on edge id. The other is based on target vertex id: if a target vertex id is sampled, all edges from this source vertex to the sampled target vertex are sampled.
Given that the sampling is random, some of the details of each of the example queries may change each time they are run.
The following query displays two modes of sampling: an absolute number of edges from a source vertex and a percentage of edges from a source vertex. We use the computerNet graph (see Appendix D). In computerNet, there are 31 vertices and 43 edges, but only 7 vertices are source vertices. Moreover, c1, c12, and c23 are hub vertices, with at least 10 outgoing edges each. For the absolute count case, we set the size to 1 edge per source vertex, which is equivalent to a random walk. We expect exactly 7 edges to be selected. For the percentage sampling case, we sample 33% of the edges for vertices which have 3 or more outgoing edges. We expect about 15 edges, but the number may vary.
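A hedged sketch of such a query, assuming computerNet uses a computer vertex type:

```
CREATE QUERY sampleSketch() FOR GRAPH computerNet {
  start = {computer.*};
  // absolute count: 1 edge per source vertex (a one-step random walk)
  absSet = SELECT t FROM start:s -(:e)-> :t
           SAMPLE 1 EDGE WHEN s.outdegree() >= 1;
  // percentage: 33% of edges, only for sources with 3+ outgoing edges
  pctSet = SELECT t FROM start:s -(:e)-> :t
           SAMPLE 33% EDGE WHEN s.outdegree() >= 3;
  PRINT absSet, pctSet;
}
```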
Below is an example of using SELECT to only traverse one edge for each source vertex. The vertex-attached accumulators @timesTraversedNoSample and @timesTraversedWithSample are used to keep track of the number of times an edge is traversed to reach the target vertex. Without using sampling, this occurs once for each edge; thus @timesTraversedNoSample has the same number as the in-degree of the vertex. With sampling edges, the number of edges is restricted. This is reflected in the @timesTraversedWithSample accumulator. Notice the difference in the result set. Because only one edge per source vertex is traversed when the SAMPLE clause is used, not all target vertices are reached. The vertex company3 has 3 incident edges, but in one instance of the query execution, it is never reached. Additionally, company2 has 6 incident edges, but only 4 source vertices sampled an edge incident to company2 .
Since the PRINT statements are placed at the end of the query, the two vertex sets beforeSample and afterSample are almost identical, showing the final values of both accumulators @timesTraversedNoSample and @timesTraversedWithSample. There is one difference: company3 is not included in afterSample because none of the sample-selected edges reached company3.
The WHERE clause is an optional clause that constrains edges and vertices specified in the FROM and SAMPLE clauses.
The WHERE clause uses a boolean condition to test each vertex or edge in the FROM set (or the sampled vertex and edge sets, if the SAMPLE clause was used).
If the expression evaluates to false for vertex/edge X, then X is excluded from further consideration in the result set. The expression may use constants or any variables or parameters within the scope of the SELECT, arithmetic operators (+, -, *, /, %), comparison operators (==, !=, <, <=, >, >=), boolean operators (AND, OR, NOT), set operators (IN, NOT IN), and parentheses to enforce precedence. The WHERE conditional expression may use any of the variables within its scope: global accumulators, vertex set variables, query input parameters, the FROM clause's vertex and edge sets (or their vertex and edge aliases), or any of the attributes or accumulators of the vertex/edge sets. For a more formal explanation of condition, see the EBNF definitions of condition and expr.
Using built-in vertex and edge attributes and functions, such as .type and .neighbors(), the WHERE clause can be used to implement sophisticated selection rules for the edge traversal. In the following example, the selection conditions are completely specified in the WHERE clause, with no edge types or vertex types mentioned in the FROM clause.
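A sketch of this style, with the liked edge type and post vertex type assumed from the socialNet graph:

```
result = SELECT t
         FROM start:s -(:e)-> :t        // no types named in FROM
         WHERE e.type == "liked"        // filter by edge type...
           AND t.type == "post";        // ...and by target vertex type
```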
The following examples demonstrate using the WHERE clause to limit the resulting vertex set based on a vertex attribute.
WHERE NOT limitations
The NOT operator may not be used in combination with the .type attribute selector. To check if an edge or vertex type is not equal to a given type, use the != operator. See the example below.
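For instance (edge type name assumed):

```
// Legal: exclude a type with !=
r1 = SELECT t FROM start:s -(:e)-> :t
     WHERE e.type != "liked";

// Not legal: NOT combined with the .type attribute selector
// r2 = SELECT t FROM start:s -(:e)-> :t
//      WHERE NOT (e.type == "liked");
```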
The following example shows the equivalence of using WHERE as a type filter as well as its limitations.
The following example uses edge attributes to determine which workers are registered as full time for some company.
If multiple edge types are specified in edge-induced selection, the WHERE clause should use OR to separate each edge type or each target vertex type. For example,
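One sketch of this pattern (the liked and friend edge types and the gender attribute are assumptions from the surrounding discussion):

```
result = SELECT t
         FROM start:s -((liked|friend):e)-> :t
         WHERE (e.type == "liked")                            // liked branch
            OR (e.type == "friend" AND t.gender == "Male");   // friend branch
```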
The above query is compilable. However, if we use line 5 as the WHERE clause instead, the query is not compilable. The edge-type conflict check detects an error, because it uses attributes from both "liked" edges and "friend" edges without separating them out by OR.
The optional ACCUM and POST-ACCUM clauses enable sophisticated aggregation and other computations across the set of vertices or edges selected by the preceding FROM, SAMPLE, and WHERE clauses. A query can contain one or both of these clauses. The statements in an ACCUM clause are applied for every edge in an edge-induced selection or every vertex in a vertex-induced selection.
If there is more than one statement in the ACCUM clause, the statements are separated by commas and executed sequentially for each selected element. However, the TigerGraph system uses parallelism to improve performance. Within an ACCUM clause, each edge is handled by a separate process. As such, there is no fixed order in which the edges are processed within the ACCUM clause, and the edges should not be treated as executing sequentially. The accumulators are mutex variables shared among these processes. The results of any accumulation within the ACCUM clause are not complete until all edges are traversed. Any inspection of an intermediate result within the ACCUM is incomplete and may not be meaningful.
The statements within the ACCUM clause are executed sequentially for a given vertex or edge. However, there is no fixed order in which a vertex set or edge set is processed.
The optional POST-ACCUM clause enables aggregation and other computations across the set of vertices (but not edges) selected by the preceding clauses. POST-ACCUM can be used without ACCUM. If it is preceded by an ACCUM clause, then it can be used for 2-stage accumulative computation: a first stage in ACCUM followed by a second stage in POST-ACCUM.
As of v1.1, the keyword POST-ACCUM may also be spelled with an underscore: POST_ACCUM.
Each statement within the POST-ACCUM clause can refer to either source vertices or target vertices but not both.
In an edge-induced selection, since the ACCUM clause iterates over edges, and often two edges will connect to the same source vertex or to the same target vertex, the ACCUM statements can execute multiple times for one vertex.
Operations that are to be performed exactly once per vertex should be performed in the POST-ACCUM clause.
The primary purpose of the ACCUM or POST-ACCUM clause is to collect information about the graph by updating accumulators (via += or =). See the "Accumulator" section for details on the += operation. However, other kinds of statements (e.g., branching, iteration, local assignments) are permitted to support more complex computations or to log activity. The EBNF syntax below defines the allowable kinds of statements that can occur within an ACCUM or POST-ACCUM. The DMLSubStmt list is similar to the queryBodyStmt list which applies to statements outside of a SELECT block; it is important to note the differences. Each of these statement types is discussed in one of the main sections of this reference document.
Note that DML-sub-statements include the global accumulator accumulation statement (gAccumAccumStmt) but not the global accumulator assignment statement (gAccumAssignStmt). Within these clauses, global accumulators may perform accumulation (+=) but not assignment (=).
There are additional restrictions on DML-sub level statements:
Global variable assignment is permitted in ACCUM or POST-ACCUM clauses, but the change in value will not take place until the query completes. Therefore, if there are multiple assignment statements for the same variable, only the final one will take effect.
Vertex attribute assignment "=" is not permitted in an ACCUM clause. However, edge attribute assignment is permitted, because the ACCUM clause iterates over an edge set. Vertex attribute assignment is permitted in the POST-ACCUM clause. Like all updates, the change in value does not take place until the query completes.
To reference each element of the selected set, use the aliases defined in the FROM clause. For example, assume that we have the following aliases:
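For instance, an edge-induced FROM clause declaring all three aliases:

```
FROM Source:s -(eType:e)-> tType:t
```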
Let (V1, V2, ..., Vn) be the vertices in the vertex-induced selection. The following pseudocode emulates ACCUM clause behavior.
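One way to express that emulation (pseudocode, not runnable GSQL):

```
// conceptually parallel; shown sequentially for clarity
FOREACH Vi IN (V1, V2, ..., Vn) DO
    execute the ACCUM statements, with the vertex alias referring to Vi
END
```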
Let E = (E1, E2, ..., En) be the edges in the edge-induced selected set. Further, let S = (S1, S2, ..., Sn) and T = (T1, T2, ..., Tn) be the multisets (bags) of source vertices and target vertices which correspond to the edge set. S and T are bags, because they can contain repeated elements.
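The corresponding pseudocode for the edge-induced case:

```
// conceptually parallel; shown sequentially for clarity
FOREACH i IN (1, 2, ..., n) DO
    execute the ACCUM statements, with s referring to Si,
    e referring to Ei, and t referring to Ti
END
```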
Note that any reference to the source alias s or target alias t is for the endpoint vertices of the current edge.
Similarly, the POST-ACCUM clause acts like a FOREACH loop on the vertex result set specified in the SELECT clause (e.g., either S or T).
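In pseudocode:

```
// each selected vertex is visited exactly once
FOREACH v IN selectedVertexSet DO    // selectedVertexSet is S or T
    execute the POST-ACCUM statements, with the vertex alias referring to v
END
```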
If multiple edge types are specified in an edge-induced selection, each statement in the ACCUM clause is checked for edge-type conflicts. If a statement is valid for only a subset of the edge types, it is not executed on the other edge types. For example:
In the above example, line 6 is only executed on "liked" edges, because "actionTime" is an attribute of "liked" edges only. Similarly, line 7 is only executed on "friend" edges, because "gender" is an attribute of "person" only, and only "friend" edges use "person" as the target vertex. However, line 8 causes a compilation error, because it uses multiple edge types where some cannot support part of the statement: "liked" edges do not have t.gender, and "friend" edges do not have e.actionTime.
We strongly suggest that if multiple edge types are specified in an edge-induced selection, the ACCUM clause should use a CASE statement (see the "Control Flow Statements" section for details) to separate the operations for each edge type or each target vertex type (or combination of target vertex type and edge type). The edge-type conflict check then checks the ACCUM statements inside each THEN/ELSE block based on the condition. For example,
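A sketch of the recommended CASE structure (the edge types, attributes, and global accumulators are assumptions; accumulator declarations are omitted):

```
result = SELECT t
         FROM start:s -((liked|friend):e)-> :t
         ACCUM CASE
             WHEN e.type == "liked" THEN
               @@latestAction += e.actionTime   // valid only on liked edges
             WHEN t.type == "person" THEN
               @@genders += t.gender            // valid only on friend edges
           END;
```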
The above query is compilable. However, if we switch lines 8 and 10, the edge-type conflict check generates errors because "liked" edges do not support t.gender and "friend" edges do not support e.actionTime.
Similarly, if multiple source or target vertex types are specified in an edge-induced selection and the POST-ACCUM clause accesses a source or target vertex, each statement in the POST-ACCUM clause is checked for vertex-type conflicts. If a statement is valid for only a subset of the source/target vertex types, it is not executed on the other types.
As with the ACCUM clause, we strongly suggest that in this situation the POST-ACCUM clause use a CASE statement (see the "Control Flow Statements" section for details) to separate the operations for each source/target vertex type. The vertex-type conflict check then checks the statements inside each THEN/ELSE block based on the condition.
Prior to v1.0, a vertex-attached accumulator could only be updated in an ACCUM or POST-ACCUM clause and only if its vertex was selected for by the preceding FROM-SAMPLE-WHERE clauses.
Beginning in v1.0, there are additional circumstances where a vertex-attached accumulator may be updated. Vertices which are referenced via a vertex-attached accumulator of a selected vertex may have their own vertex-attached accumulators updated in the ACCUM clause (but not in the POST-ACCUM clause). That is, a vertex referenced by a selected vertex can be updated, with some limitations explained below. Some examples will help to illustrate this more complex condition.
Suppose a query declares a vertex-attached accumulator which holds vertex information. We call this a vertex-holding accumulator. This could take several forms:
A scalar accumulator, e.g., MaxAccum<VERTEX> @maxV;
A collection accumulator, e.g., ListAccum<VERTEX> @listV;
An accumulator containing tuple(s), where the tuple type contains a VERTEX field.
If a vertex V is selected, then not only can V's accumulators be updated, but the vertices stored in its vertex-holding accumulators can also be updated, in the ACCUM clause.
Before these indirectly referenced vertices can be used, they need to be activated. There are two ways to activate an indirect vertex:
A vertex from a vertex-holding accumulator is first assigned to a local vertex variable. The vertex can now be updated through the local vertex variable.
A FOREACH loop can iterate on a vertex-holding collection accumulator. The vertices can now be updated through the loop variable.
The following uses are NOT supported by the new rules:
Indirectly activated vertices may not be updated in the POST-ACCUM clause or outside of a SELECT statement.
Passing a vertex into the query as an input parameter is not a route to activation.
Using a global vertex-holding accumulator is not a route to activation.
If a vertex is being indirectly activated by assigning it to a local variable (e.g., a variable declared in ACCUM or POST-ACCUM), note the following rule, which always applies to all local variables:
A local variable can be declared and initialized in an ACCUM block once. It cannot be redeclared or reassigned later in the ACCUM block.
The following query demonstrates updates to indirectly activated vertices.
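A cautious sketch of the technique, assuming the socialNet graph (person and post vertex types, liked edges):

```
CREATE QUERY indirectUpdate() FOR GRAPH socialNet {
  SumAccum<INT> @likeCount;
  ListAccum<VERTEX> @likedPosts;       // vertex-holding accumulator
  start = {person.*};

  // stage 1: each person collects the posts it liked
  s1 = SELECT v FROM start:v -(liked:e)-> post:t
       ACCUM v.@likedPosts += t;

  // stage 2: activate the stored vertices via a FOREACH loop variable
  s2 = SELECT v FROM start:v
       ACCUM FOREACH p IN v.@likedPosts DO
               p.@likeCount += 1       // update an indirectly activated vertex
             END;

  allPosts = {post.*};
  PRINT allPosts;                      // shows the @likeCount values
}
```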
We now show several examples. This example demonstrates how ACCUM or POST-ACCUM can be used to count the number of vertices in the given set.
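A sketch of the counting pattern (graph and vertex type names are assumptions):

```
CREATE QUERY countMembers() FOR GRAPH socialNet {
  SumAccum<INT> @@memberCount;
  start = {person.*};
  result = SELECT v FROM start:v
           ACCUM @@memberCount += 1;   // one increment per selected vertex
  PRINT @@memberCount;
}
```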
This example uses ACCUM to find all the subjects a user posted about.
This example shows each person's posted vertices and each person's like behaviors (liked edges).
This example counts the total number of times each topic is used.
This is an example of using ACCUM and POST-ACCUM in conjunction. The ACCUM traverses the graph and finds all people who live and work in the same country. After this is determined, POST-ACCUM examines each vertex (person) to see if they work where they live.
This is an example of a POST-ACCUM-only query that counts the number of people with a particular gender.
The optional HAVING clause provides constraints on the result set of the SELECT. The constraints are applied after ACCUM and POST-ACCUM actions. This differs from the WHERE clause, which is applied before the ACCUM and POST-ACCUM actions.
A HAVING clause can only be used if there is an ACCUM or POST-ACCUM clause. The condition is applied to each vertex in the SELECT set (either source or target vertices) which also fulfilled the FROM and WHERE conditions. The HAVING clause is intended to test one or more of the accumulator variables that were updated in the ACCUM or POST-ACCUM clause, though the condition may be anything that evaluates to a boolean value. If the condition is false for a particular vertex, then that vertex is excluded from the result set.
The following example demonstrates using the HAVING clause to constrain a result set based on the vertex accumulator variable which was updated during the ACCUM clause.
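The example might be sketched as follows (the liked and posted edge types and the socialNet schema are assumptions):

```
CREATE QUERY activeMembers(INT activityThreshold) FOR GRAPH socialNet {
  SumAccum<INT> @activityAmount;
  start = {person.*};
  result = SELECT v
           FROM start:v -((liked|posted):e)-> :t
           ACCUM v.@activityAmount += 1              // count each activity edge
           HAVING v.@activityAmount >= activityThreshold;
  PRINT result;
}
```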
If the activityThreshold parameter is set to 3, the query returns 5 vertices:
If the activityThreshold parameter is set to 2, the query would return 8 vertices. With activityThreshold = 4, the query would return no vertices.
The following example demonstrates the equivalence of a SELECT statement in which the condition for the HAVING clause is always true.
The following shows an example of equivalent result sets from using WHERE vs. HAVING. Recall that the WHERE clause is evaluated before the ACCUM and that the HAVING clause is evaluated after the ACCUM. Both constrain the result set based on a condition that vertices must meet.
The following example has a compilation error because the result set is taken from the source vertices, but the HAVING condition is checking the target vertices.
The optional ORDER BY clause sorts the result set.
ASC specifies ascending order (least value first), and DESC specifies descending order (greatest value first). If neither is specified, then ascending order is used. Each expr must refer to the attributes or accumulators of a member of the result set, and the expr must evaluate to a sortable value (e.g., a number or a string). ORDER BY offers hierarchical sorting by allowing a comma-separated list of expressions, sorting first by the leftmost expr. It uses the next expression only to sort items where the current sort expr results in identical values. Any items in the result set which cannot be sorted (because the sort expressions do not pertain to them) will appear at the end of the set, after the sorted items.
The following example demonstrates the use of ORDER BY with multiple expressions. The returned vertex set is first ordered by the number of friends of the vertex, and then ordered by the number of coworkers of that vertex.
The optional LIMIT clause sets constraints on the number and ranking of items included in the final result set.
Each of the expr must evaluate to a nonnegative integer. To understand LIMIT, note that the tentative result set is held in the computer as a list of vertices. If the query has an ORDER BY clause, the order is specified; otherwise the list order is unknown. Assume we number the vertices as v_1, v_2, ..., v_n. The LIMIT clause specifies a range of vertices, starting from a lower position in the list to an upper position.
There are three forms:
Case 1: LIMIT k
When a single expr is provided, LIMIT returns the first k elements from the tentative result set. If there are fewer than k elements available, then all elements will be returned in the result set. If k=5 and the tentative result set has at least 5 items, then the final result list will be [v_1, v_2, v_3, v_4, v_5].
Case 2: LIMIT j, k
When a comma separates two expressions, LIMIT treats the first expression j as an offset. That is, it skips the first j items in the list. The second expr k tells the maximum number of items to include. If the list has at least 7 items, then LIMIT 2, 5 would return [v_3, v_4, v_5, v_6, v_7].
Case 3: LIMIT k OFFSET j
The behavior of Case 3 is the same as that of Case 2, except that the syntax is different. The keyword OFFSET separates the two expressions, and the count comes before the offset, rather than vice versa. If the list has at least 7 items, then LIMIT 5 OFFSET 2 would return [v_3, v_4, v_5, v_6, v_7].
If any of the expressions evaluate to a negative integer, the results are undefined.
OFFSET is intended for result sets which are in a known order. It is a compile time error to use OFFSET without the ORDER BY clause.
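The three forms side by side (a sketch; the id attribute is an assumption):

```
r1 = SELECT v FROM start:v ORDER BY v.id LIMIT 5;            // first 5
r2 = SELECT v FROM start:v ORDER BY v.id LIMIT 2, 5;         // skip 2, take up to 5
r3 = SELECT v FROM start:v ORDER BY v.id LIMIT 5 OFFSET 2;   // same as r2
```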
The following examples demonstrate the various forms of the LIMIT clause.
The first example shows the LIMIT clause when used as an upper limit. It returns a result set with a maximum size of 4 elements in the set.
The following example shows how to use the LIMIT clause with an offset.
The following example shows the alternative syntax for a result size limit with an offset. This time we try larger values for offset and size. In a large data set, limitTest(5,20) might return 20 vertices, but since we don't have 25 vertices in the original data, the output is fewer than 20 vertices.
These two base types allow users to pass a complex data object or to write output in a customized format. These types follow the industry-standard definition of JSON. A JSONOBJECT instance's external representation (as input and output) is a string, starting and ending with curly braces "{" and "}", which enclose an unordered list of string:value pairs. A JSONARRAY is represented as a string, starting and ending with square brackets "[" and "]", which enclose an ordered list of values. Since a value can be an object or an array, JSON supports hierarchical, nested data structures.
The IF statement provides conditional branching: execute a block of statements (queryBodyStmts or DMLSubStmtList) only if a given condition is true. The IF statement allows for zero or more ELSE-IF clauses, followed by an optional ELSE clause. The IF statement can be used either at the query-body level or at the DML-sub-statement level.
The CASE statement provides conditional branching: execute a block of statements only if a given condition is true. CASE statements can be used as query-body statements or DML-sub-statements.
The WHILE statement provides unbounded iteration over a block of statements. WHILE statements can be used as query-body statements or DML-sub-statements.
The FOREACH statement provides bounded iteration over a block of statements. FOREACH statements can be used as query-body statements or DML-sub-statements.
Feature
Not Supported
Supported as of v2.2 (1)
General
User-defined exceptions
Data update to the graph
Access to target vertex's values in ACCUM
Query calling a distributed query
Statement Types
LOADACCUM
FOREACH, WHILE, UPDATE, INSERT, DELETE
SELECT clauses
SAMPLE clause
exact count for LIMIT clause (2)
Data types
LIST, SET, BAG
JSONOBJECT, JSONARRAY
ArrayAccum
SET<> parameter, GroupByAccum
Operations and Operators
Any data update to the graph, including assignment statements to vertex attributes
vertex and edge functions
.neighbors(), .neighborAttribute(), .edgeAttribute()
isDirected()
.outdegree()
accumulator and collection functions
reallocate()
size(), get(), top(), pop(), update(), remove(), removeAll(), clear(), contains(), containsKey(), resize()
Other functions
selectVertex(), to_vertex(), to_vertex_set()
COALESCE(), EVALUATE()
sum(), count(), min(), max(), avg()
Parameter Types
any baseType (except EDGE, JSONOBJECT, JSONARRAY): INT, UINT, FLOAT, DOUBLE, STRING, BOOL, VERTEX
SET<baseType>, BAG<baseType>
Exception: EDGE type is not supported, either as a primitive parameter or as part of a complex type.
Return Types
any baseType (including EDGE): INT, UINT, FLOAT, DOUBLE, STRING, BOOL, VERTEX, EDGE, JSONOBJECT, JSONARRAY
any accumulator type, except GroupByAccum
EBNF term
Common Name
Description
assignStmt
Assignment Statement
See Chapter 6: "Declaration and Assignment Statements"
vSetVarDeclStmt
Vertex Set Variable Declaration Statement
See Chapter 6: "Declaration and Assignment Statements"
gAccumAssignStmt
Global Accumulator Assignment Statement
See Chapter 6: "Declaration and Assignment Statements"
gAccumAccumStmt
Global Accumulator Accumulation Statement
See Chapter 6: "Declaration and Assignment Statements"
funcCallStmt
Functional Call or Query Call Statement
See Chapter 6: "Declaration and Assignment Statements"
selectStmt
SELECT Statement
See Chapter 7: "SELECT Statement"
queryBodyCaseStmt
query-body CASE statement
See Chapter 8: "Control Flow Statements"
queryBodyIfStmt
query-body IF statement
See Chapter 8: "Control Flow Statements"
queryBodyWhileStmt
query-body WHILE statement
See Chapter 8: "Control Flow Statements"
queryBodyForEachStmt
query-body FOREACH statement
See Chapter 8: "Control Flow Statements"
updateStmt
UPDATE Statement
See Chapter 9: "Data Modification Statements"
insertStmt
INSERT INTO statement
See Chapter 9: "Data Modification Statements"
queryBodyDeleteStmt
Query-body DELETE Statement
See Chapter 9: "Data Modification Statements"
printStmt
PRINT Statement
See Chapter 10: "Output Statements"
logStmt
LOG Statement
See Chapter 10: "Output Statements"
returnStmt
RETURN Statement
See Chapter 10: "Output Statements"
raiseStmt
RAISE Statement
See Chapter 11: "Exception Statements"
tryStmt
TRY Statement
See Chapter 11: "Exception Statements"
EBNF term
Common Name
Description
assignStmt
Assignment Statement
See Chapter 6: "Declaration and Assignment Statements"
funcCallStmt
Functional Call Statement
See Chapter 6: "Declaration and Assignment Statements"
gAccumAccumStmt
Global Accumulator Accumulation Statement
See Chapter 6: "Declaration and Assignment Statements"
vAccumFuncCall
Vertex-attached Accumulator Function Call Statement
See Chapter 6: "Declaration and Assignment Statements"
localVarDeclStmt
Local Variable Declaration Statement
See Chapter 7: "SELECT Statement"
insertStmt
INSERT INTO Statement
See Chapter 9: "Data Modification Statements"
DMLSubDeleteStmt
DML-sub DELETE Statement
See Chapter 9: "Data Modification Statements"
DMLSubcaseStmt
DML-sub CASE statement
See Chapter 8: "Control Flow Statements"
DMLSubIfStmt
DML-sub IF statement
See Chapter 8: "Control Flow Statements"
DMLSubForEachStmt
DML-sub FOREACH statement
See Chapter 8: "Control Flow Statements"
DMLSubWhileStmt
DML-sub WHILE statement
See Chapter 8: "Control Flow Statements"
logStmt
LOG Statement
See Chapter 10: "Output Statements"
Parameter type | RUN QUERY | Query string for GET /query HTTP Request |
SET or BAG of primitives | Square brackets enclose the collection of values. Example: a set p1 of integers: [1,5,10] | Assign multiple values to the same parameter name. Example: a set p1 of integers: p1=1&p1=5&p1=10 |
VERTEX<type> | If the vertex type is specified in the query definition, then the vertex argument is simply vertex_id. Example: vertex type is person and desired id is person2: "person2" | parameterName=vertex_id. Example: vertex type is person and desired id is person2: vp=person2 |
VERTEX (type not pre-specified) | If the type is not defined in the query definition, then the argument must provide both the id and type in parentheses: (vertex_id, vertex_type). Example: a vertex va with id="person1" and type="person": ("person1","person") | parameterName=vertex_id&parameterName.type=vertex_type. Example: parameter vertex va when type="person" and id="person1": va=person1&va.type=person |
SET or BAG of VERTEX<type> | Same as a SET or BAG of primitives, where the primitive type is vertex_id. Example: ["person3","person4"] | Same as a SET or BAG of primitives, where the primitive type is vertex_id. Example: vp=person3&vp=person4 |
SET or BAG of VERTEX (type not pre-specified) | Square brackets enclose a comma-separated list of vertex (id, type) pairs. Mixed types are permitted. Example: [("person1","person"), ("11","post")] | The SET or BAG must be treated like an array, specifying the first, second, etc. elements with indices [0], [1], etc. Example (same input as the RUN QUERY example): vp[0]=person1&vp[0].type=person&vp[1]=11&vp[1].type=post |
"version" field
value
api
A string specifying the output API version. Values are specified as follows:
"v1": Output API used in TigerGraph platform v0.8 through v1.0. If the output does not have a "version" field, the JSON format is presumed to be v1.
"v2": Output API introduced in TigerGraph platform v1.1. This is the latest API. (Note: for backward compatibility, TigerGraph platforms which support the v2 output api can be configured to produce either v1 or v2 output.)
schema
An integer representing which version of the user's graph schema is currently in use. When a CREATE GRAPH statement is executed, the version is initialized to 0. Each time a SCHEMA_CHANGE JOB is run, the schema value is incremented (e.g., 1, 2, etc.).
type | default value |
INT, UINT, FLOAT, DOUBLE (see note below) | 0 |
BOOL | false |
STRING | "" |
DATETIME | 1970-01-01 00:00:00 |
VERTEX | "Unknown" |
EDGE | No edge: {} |
JSONOBJECT | An empty object: {} |
JSONARRAY | An empty array: [] |
DDL | GSQL Query |
INT | INT |
UINT | UINT |
FLOAT | FLOAT |
DOUBLE | DOUBLE |
BOOL | BOOL |
STRING | STRING |
STRING COMPRESS | STRING |
SET< type > | SetAccum< type > |
LIST< type > | ListAccum< type > |
DATETIME | DATETIME |
parameter name | type | description |
filePath | string | The absolute file path of the input file to be read. A relative path is not supported. |
accumField1, ..., accumFieldN | $num, or $"column_name" if header is true | The column position(s) or column name(s) of the data file which supply data values to each field of the accumulator. |
separator | single-character string | The separator of columns. |
header | bool | Whether this file has a header. |
type pattern | accepted vertex/edge types
_ | any type |
ANY | any type |
name | the given vertex/edge type |
name | name ... | any of the vertex/edge types listed |
A comment is a section of text that is ignored by the language parser; its purpose is to provide information to human readers. The comment markers follow the conventions used in C++ and SQL:
Single-line or partial-line comments begin with either # or // and end at the end of the line (with the newline character).
Multi-line comment blocks begin with /* and end with */
EBNF term | Description | Use Case |
baseType | INT, UINT, FLOAT, DOUBLE, STRING, BOOL, DATETIME, VERTEX, EDGE, JSONOBJECT, or JSONARRAY | |
tupleType | sequence of baseType | |
accumType | family of specialized data objects which support accumulation operations | |
FILE | FILE object | |
parameterType | baseType, a SET or BAG of baseType, or FILE object | |
STRING COMPRESS | STRING COMPRESS | |
elementType | baseType, STRING COMPRESS, or identifier | |
type | baseType, STRING COMPRESS, identifier, or accumType | |
This section describes how the GSQL language responds to exceptions and supports user-defined exception handling. An exception is a run-time error. The GSQL language supports both built-in system exceptions and user-defined exceptions. Built-in exceptions include GSQL language exceptions (such as an out-of-range value, a wrong data type, or an illegal operation) and errors arising in other TigerGraph components or from the operating system.
The GSQL query language also supports user-defined exception responses, also known as exception handling. This section covers the following syntax for user-defined exception behavior:
When an exception occurs during the execution of a query, the default response is the following:
The query will not execute any more statements; it will exit.
If the query was run using the RUN QUERY command, then an error message will be displayed.
If the query was run by invoking the GET /query REST++ endpoint, then the output will be a simple JSON object. Some errors have an error "code" field; others do not:
The example below shows two common errors: wrong data type and divide-by-zero. First, we define a simple query that divides 100.0 by the query's input parameter.
We then test three cases:
A valid input (such as n1 = 7)
Wrong data type (n1 = "A")
Divide by zero (n1 = 0)
First, we test using the GSQL interface. When the query runs without error, the output is in JSON format. When there is a built-in exception, however, only an error message is displayed.
The situation is a little different when running the query as a REST++ endpoint. The output is always in JSON format.
As of TigerGraph v1.2, the format for the GET /query endpoint has changed. The graph name must now be specified after /query:
/query/{graph_name}/{query_name}
A query author can specify what should be the response if a particular type of exception occurs within a particular specified block of statements.
The following statement types are available to specify a user-defined exception condition or a user-defined exception response.
The EXCEPTION Declaration Statement names a user-defined exception.
The RAISE Statement indicates that one of the user-defined exceptions has occurred.
The TRY…EXCEPTION Statement is used to define and apply user-defined exception handling to a block of query-body statements. This can be used with or without preceding user-defined EXCEPTION and RAISE statements.
Built-in exceptions always take precedence over user-defined exceptions. Therefore, user-defined exceptions can only be used to catch conditions that would not be caught by a built-in exception. This means that user-defined exceptions are best used to capture situations which are legal according to the general syntax and semantics of the GSQL query language but which are illegal or undesirable for a particular user application.
To use a user-defined exception, it must first be declared. An exception declaration statement declares a user-defined exception type, assigning a name and identification number. The id number errorCode must be greater than 40,000. Numbers 40,000 and lower are reserved for system exceptions. Exception declaration statements must be placed before any query-body statements, after accumulator declaration statements. A query can declare multiple exception types.
The RAISE statement announces that a user-defined exception has just occurred. The exceptVarName must match one of the exceptions that was previously declared. An optional error message can be specified. Once the RAISE statement is executed, the flow of execution changes. If the RAISE statement is not within a TRY clause, then the query ends with the default exception response, using the error code and error message defined by the exception type and RAISE statements. If the RAISE is within a TRY statement, then execution jumps to the EXCEPTION handling clause of the TRY statement.
A RAISE statement itself does not include the conditions that define the exception. Typically, the user will use an IF…THEN statement and place the RAISE statement within the THEN clause.
In the current version, a RAISE statement can only be used as a query-body-statement. It cannot be used as a DML-sub-statement. In particular, you cannot RAISE an exception inside a SELECT statement.
The example below defines and checks for two types of exceptions: an empty input set (40001) and no matching edges (40002). Remember that the minimum allowed code number is 40001.
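As a language-neutral illustration of the error-code scheme, here is a Python model (not GSQL; the class and function names are invented for this sketch) of declaring an exception with a code above 40,000 and raising it when an input set is empty:

```python
# Model of a user-defined GSQL exception: each carries a name and a
# code; codes 40,000 and lower are reserved for system exceptions.
class GsqlUserException(Exception):
    def __init__(self, name, code, message=""):
        assert code > 40000, "codes <= 40000 are reserved for system exceptions"
        super().__init__(message)
        self.name, self.code = name, code

# Analogous to: EXCEPTION emptyInput (40001); then IF ... THEN RAISE
def check_input(vertex_ids):
    if not vertex_ids:
        raise GsqlUserException("emptyInput", 40001, "input vertex set is empty")
    return vertex_ids

try:
    check_input([])
except GsqlUserException as e:
    print(e.code)  # 40001
```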
The TRY…EXCEPTION Statement is used to define and apply user-defined exception handling to a block of query-body statements. A TRY...EXCEPTION statement can be nested within a TRY block or EXCEPTION block.
The current version of GSQL does not support custom handling of built-in exceptions. Therefore, if a built-in exception occurs, the query ignores any TRY...EXCEPTION blocks, applies the default handling, and aborts. In future updates, we plan to support custom handling of both user-defined (RAISE) and built-in exceptions with the TRY...EXCEPTION block.
The TRY…EXCEPTION Statement is a compound statement containing two blocks. The first block (TRY) consists of the query-body statements for which custom error handling should be applied. The second block (EXCEPTION) contains a series of WHEN…THEN exception handling clauses. Each exception handling clause names an exception type and specifies what actions to take in the event of the exception. An optional ELSE clause contains handling statements for all other exceptions. The following text and visual flowchart details how the TRY... EXCEPTION block handles an exception.
When an exception occurs within a TRY block, the flow of execution skips the remainder of the TRY block and jumps to the EXCEPTION block, where GSQL seeks to match the exception type with a handler. After executing the handling statements in the matching WHEN...THEN or ELSE clause, the flow skips the remainder of the EXCEPTION block and continues with the statement following the END statement. If there is no matching WHEN or ELSE handler, however, then the exception is propagated; that is, the raised exception remains active after exiting the EXCEPTION block. If the TRY...EXCEPTION block is nested inside another TRY block, then the handling process is repeated at this upper level, and so on, until either the exception is handled or there are no more enclosing TRY...EXCEPTION blocks.
Finally, if the unhandled exception is not within a TRY block, then the query is aborted, and the default exception response is the output.
Case 1: If cond1 is true in the outer TRY block,
RAISE A and jump to the outer EXCEPTION block.
Handled by the ELSE clause: handStmtsZ.
Case 2: If cond2 is true in the inner TRY block,
RAISE A and jump to the inner EXCEPTION block.
Handled by handStmtsX.
Case 3: If cond3 is true in the inner TRY block,
RAISE B and jump to the inner EXCEPTION block. There is no matching handler here, so propagate the exception. Jump to the outer EXCEPTION block. Handled by handStmtsY.
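The cases above can be modeled with nested try/except blocks in Python (a sketch of the control flow only; names mirror the case descriptions):

```python
# Model of TRY...EXCEPTION propagation: an inner EXCEPTION block that
# does not match the raised type lets the exception propagate outward.
class ExcA(Exception): pass
class ExcB(Exception): pass

def run(exc):
    try:                    # outer TRY
        try:                # inner TRY
            raise exc
        except ExcA:        # inner WHEN A THEN handStmtsX
            return "handStmtsX"
        # an ExcB raised above is not matched here, so it propagates
    except ExcB:            # outer WHEN B THEN handStmtsY
        return "handStmtsY"

print(run(ExcA()))  # handStmtsX  (Case 2)
print(run(ExcB()))  # handStmtsY  (Case 3)
```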
Custom Handling Example:
The following example is a modified shortest path query. It looks for all paths from a source to a target in a computer network. It uses breadth-first search and stops at depth N when it has found at least one path at depth N, or it has searched the entire graph. There are three conditions which will cause it to RAISE an exception and abort the search:
Seeing an edge with a negative connection speed (because the graph has bad data).
Seeing an edge with a very slow connection speed (again because the graph has bad data).
Finding no path in the graph (the search is already over, but we skip printing results).
Note that cases 1 and 2 do NOT mean that a negative or slow speed edge is actually on a shortest path, only that the query noticed a bad edge during its search. Also, because we cannot RAISE within the SELECT block, we use a workaround: set an integer variable with an error code. Immediately after the SELECT block, test the integer variable and RAISE exceptions as needed.
As the data in Appendix D show:
Any search passing through c1 will see negative edges.
Any search passing through c12 will see negative and slow edges.
Any search passing through c14 will see negative edges.
The results for 5 cases are shown: 1 valid search plus each of the 3 exception conditions. The 5th case is the same as the 4th, but exception handling is not enabled.
The flowchart below summarizes all the cases for triggering and handling exceptions, both user-defined and built-in.
Below is the listing of the graph create-and-load command files and data files used to generate the six example graphs used in this document: workNet, socialNet, friendNet, computerNet, minimalNet, and investmentNet. The tar-gzip file gsql_ref_examples_2.0.gz contains all of these files. Each graph has its own folder. To create a particular graph, go into its folder and run the following command: gsql graph_create.gsql
There is no loading job or data for minimalNet (hence, "minimal.")
Accumulators are special types of variables that accumulate information about the graph during its traversal and exploration. Because they are a unique and important feature of the GSQL query language, we devote a separate section to their introduction, but additional detail on their usage is covered in other sections, the "SELECT Statement" section in particular. This section covers the following subset of the EBNF language definitions:
There are a number of different types of accumulators, each providing specific accumulation functions. Accumulators are declared to have one of two types of association: global or vertex-attached.
More technically, accumulators are mutable mutex variables shared among all the graph computation threads exploring the graph within a given query. To improve performance, the graph processing engine employs multithreaded processing. Modification of accumulators is coordinated at run-time so the accumulation operator works correctly (i.e., mutually exclusively) across all threads. This is particularly relevant in the ACCUM clause. During traversal of the graph, the selected set of edges or vertices is partitioned among a group of threads. These threads have shared mutually exclusive access to the accumulators.
All accumulator variables must be declared at the beginning of a query, immediately after any typedefs, and before any other type of statement. The scope of the accumulator variables is the entire query.
The name of a vertex-attached accumulator begins with a single "@". The name of a global accumulator begins with "@@". Additionally, a global accumulator may be declared to be static.
Vertex-attached accumulators are mutable state variables that are attached to each vertex in the graph for the duration of the query's lifetime. They act as run-time attributes of a vertex. They are shared, mutually exclusively, among all of the query's processes. Vertex-attached accumulators can be set to a value with the = operator. Additionally, an accumulate operator += can be used to update the state of the accumulator; the function of += depends on the accumulator type. In the example below, there are two accumulators attached to each vertex. The initial value of an accumulator of a given type is predefined; however, it can be changed at declaration, as in the accumulator @weight below. All vertex-attached accumulator names have a single leading at-sign "@".
If there is a graph with 10 vertices, then there is an instance of @neighbors and @weight for each vertex (hence 10 of each, and 20 total accumulator instances). These are accessed via the dot operator on a vertex variable or a vertex alias (e.g., v.@neighbors). The accumulator operator += only affects the accumulator for the specific vertex being referenced. A statement such as v1.@neighbors += 1 will only affect v1's @neighbors and not the @neighbors of other vertices.
Vertex-attached accumulators can only be accessed or updated (via = or +=) in an ACCUM or POST-ACCUM clause within a SELECT block. The only exception to this rule is that vertex-attached accumulators can be referenced in a PRINT statement, as the PRINT has access to all information attached to a vertex set.
Edge-attached accumulators are not supported.
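The per-vertex behavior can be modeled outside GSQL: each vertex gets its own accumulator instance, and an update through one vertex leaves the others untouched. A Python sketch (the 0.5 initial value for @weight is an assumption for illustration, not from the source):

```python
# Model of vertex-attached accumulators: every vertex carries its own
# instance, so updating v1's accumulator leaves the others unchanged.
vertices = ["v1", "v2", "v3"]
neighbors = {v: 0 for v in vertices}    # like SumAccum<INT> @neighbors;
weight    = {v: 0.5 for v in vertices}  # like SumAccum<FLOAT> @weight = 0.5;

neighbors["v1"] += 1                    # v1.@neighbors += 1
print(neighbors)  # {'v1': 1, 'v2': 0, 'v3': 0}
```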
A global accumulator is a single mutable accumulator that can be accessed or updated within a query. The names of global accumulators start with a double at-sign "@@".
Global accumulators can only be assigned (using the = operator) outside a SELECT block (i.e., not within an ACCUM or POST-ACCUM clause). Global accumulators can be accessed or updated via the accumulate operator += anywhere within a query, including inside a SELECT block.
It is important to note that the accumulation operation for global accumulators in an ACCUM clause executes once for each process. That is, if the FROM clause uses an edge-induced selection (introduced in Section "SELECT Statement"), the ACCUM clause executes one process for each edge in the selected edge set. If the FROM clause uses a vertex-induced selection (introduced in Section "SELECT Statement"), the ACCUM clause executes one process for each vertex in the selected vertex set. Since global accumulators are shared in a mutually exclusive manner among processes, they behave very differently than a non-accumulator variable (see Section "Variable Types" for more details) in an ACCUM clause. Take the following code example. The global accumulator @@globalRelationshipCount is accumulated for every worksFor edge traversed, since it is shared among processes. Conversely, relationshipCount appears to have been incremented only once. This is because a non-accumulator variable is not shared among processes. Each process has its own separate, unshared copy of relationshipCount and increments the original value by one. (E.g., each process increments relationshipCount from 0 to 1.) There is no accumulation, and the final value is one.
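The contrast can be modeled in Python: treat each loop iteration as one ACCUM process that shares the global accumulator but receives its own private copy of the ordinary variable (the edge list is invented for this sketch):

```python
# Model: the ACCUM clause runs once per selected edge. A global
# accumulator is shared across those runs; an ordinary variable is
# copied into each run, so every run increments its own copy of 0.
edges = [("a", "b"), ("a", "c"), ("b", "c")]

global_count = 0          # like @@globalRelationshipCount (shared)
local_start = 0           # relationshipCount before the SELECT block

results = []
for _ in edges:           # one "process" per edge
    global_count += 1     # accumulates: shared among processes
    local_copy = local_start + 1   # private copy: 0 -> 1 every time
    results.append(local_copy)

print(global_count)  # 3
print(results[-1])   # 1  -- the non-accumulator ends at 1, not 3
```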
A static global accumulator retains its value after the execution of a query. To declare a static global accumulator, include the STATIC keyword at the beginning of the declaration statement. For example, if a static global accumulator is incremented by 1 each time a query is executed, then its value is equal to the number of times the query has been run, since the query was installed. Each static global accumulator belongs to the particular query in which it is declared; it cannot be shared among different queries. The value only persists in the context of running the same query multiple times. The value will reset to the default value when the GPE is restarted.
There is no command to deallocate a static global accumulator. If a static global accumulator is a collection accumulator and it is no longer needed, it should be cleared to minimize memory usage.
The following are the accumulator types currently supported. Each type of accumulator supports one or more data types.
The accumulators fall into two major groups:
Scalar Accumulators store a single value:
SumAccum
MinAccum, MaxAccum
AvgAccum
AndAccum, OrAccum
BitwiseAndAccum, BitwiseOrAccum
Collection Accumulators store a set of values:
ListAccum
SetAccum
BagAccum
MapAccum
ArrayAccum
HeapAccum
GroupByAccum
The details of each accumulator type are summarized in the table below. The Accumulation Operation column explains how the accumulator accumName is updated when the statement accumName += newVal is executed. Following the table are example queries for each accumulator type.
Table Ac1: Accumulator Types and Their Accumulation Behavior
The SumAccum type computes and stores the cumulative sum of numeric values or the cumulative concatenation of text values. The output of a SumAccum is a single numeric or string value. SumAccum variables operate on values of type INT, UINT, FLOAT, DOUBLE, or STRING only.
The += operator updates the accumulator's state. For the numeric types, += arg performs a numeric addition; for the STRING value type, += arg concatenates arg to the current value of the SumAccum.
The MinAccum and MaxAccum types calculate and store the cumulative minimum or the cumulative maximum of a series of values. The output of a MinAccum or a MaxAccum is a single value. MinAccum and MaxAccum variables operate on values of type INT, UINT, FLOAT, DOUBLE, and VERTEX (optionally with a specific vertex type) only.
For MinAccum, += arg compares the current value with arg and stores the smaller of the two. MaxAccum behaves the same, except that it stores the greater of the two.
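A minimal Python model of this behavior (not GSQL; MinAccum is the same sketch with min in place of max):

```python
# Model of MaxAccum: += keeps the greater of the stored value and the
# new argument.
class MaxAccum:
    def __init__(self, start=float("-inf")):
        self.value = start
    def accum(self, arg):            # models the += operator
        self.value = max(self.value, arg)

mx = MaxAccum()
for x in (3, 9, 4):
    mx.accum(x)
print(mx.value)  # 9
```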
MinAccum and MaxAccum operating on the VERTEX type have a special comparison. They do not compare vertex ids but TigerGraph internal ids, which might not be in the same order as the external ids. Comparing internal ids is much faster, so MinAccum/MaxAccum<VERTEX> provide an efficient way to compare and select vertices. This is helpful for some graph algorithms that require the vertices to be numbered and sortable. For example, the following query returns one post from each person. The returned vertex is not necessarily the vertex with the alphabetically largest id.
The AvgAccum type calculates and stores the cumulative mean of a series of numeric values. Internally, its state information includes the sum of all inputs and a count of how many input values it has accumulated. The output is the mean value; the sum and the count values are not accessible to the user. The data type of an AvgAccum variable is not declared; all AvgAccum accumulators accept inputs of type INT, UINT, FLOAT, and DOUBLE. The output is always of type DOUBLE.
The += arg operation updates the AvgAccum variable's state to be the mean of all the previous arguments along with the current argument. The = arg operation clears all the previously accumulated state and sets the new state to be arg with a count of one.
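The hidden sum-and-count state can be modeled in Python (a sketch, not GSQL; the `value` property stands in for reading the accumulator):

```python
# Model of AvgAccum: internally keeps a running sum and count; the
# readable output is always their double-precision ratio.
class AvgAccum:
    def __init__(self):
        self.total, self.count = 0.0, 0   # hidden state
    def accum(self, arg):                 # models += arg
        self.total += arg
        self.count += 1
    def assign(self, arg):                # models = arg: reset to one sample
        self.total, self.count = float(arg), 1
    @property
    def value(self):
        return self.total / self.count if self.count else 0.0

a = AvgAccum()
for x in (1, 2, 6):
    a.accum(x)
print(a.value)  # 3.0
a.assign(10)
print(a.value)  # 10.0
```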
The AndAccum and OrAccum types calculate and store the cumulative result of a series of boolean operations. The output of an AndAccum or an OrAccum is a single boolean value (True or False). AndAccum and OrAccum variables operate on boolean values only. The data type does not need to be declared.
For AndAccum, += arg updates the state to be the logical AND between the current boolean state and arg . OrAccum behaves the same, with the exception that it stores the result of a logical OR operation.
The BitwiseAndAccum and BitwiseOrAccum types calculate and store the cumulative result of a series of bitwise boolean operations, storing the resulting bit sequences. BitwiseAndAccum and BitwiseOrAccum operate on INT only. The data type does not need to be declared.
Fundamental to understanding and using bitwise operations is the knowledge that integers are stored in base-2 representation as a 64-bit sequence of 1s and 0s. "Bitwise" means that each bit is treated as a separate boolean value, with 1 representing true and 0 representing false. Hence, an integer is equivalent to a sequence of boolean values. Computing the bitwise AND of two numbers A and B means computing the bit sequence C where the j-th bit of C, denoted C_j, is equal to (A_j AND B_j).
For BitwiseAndAccum, += arg updates the accumulator's state to be the bitwise AND of the current state and arg. BitwiseOrAccum behaves the same, except that it computes a bitwise OR.
Bitwise Operations and Negative Integers
Most computer systems represent negative integers using "2's complement" format, where the uppermost bit has special significance. Bitwise operations which change the uppermost bit may therefore cross the boundary between positive and negative numbers.
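A Python model of the two bitwise accumulators (not GSQL), using their identity values (-1 for AND, 0 for OR) as the default initial state, matching the defaults listed in Table Ac1:

```python
# Model of BitwiseAndAccum / BitwiseOrAccum over integers.
# The AND identity is -1 (all bits set, in 2's complement); the OR
# identity is 0 (all bits clear).
from functools import reduce

def bitwise_and_accum(values):
    return reduce(lambda acc, v: acc & v, values, -1)

def bitwise_or_accum(values):
    return reduce(lambda acc, v: acc | v, values, 0)

print(bitwise_and_accum([0b1110, 0b0111]))  # 6  (0b0110)
print(bitwise_or_accum([0b1000, 0b0001]))   # 9  (0b1001)
print(bitwise_and_accum([-1, 6]))           # 6: -1 is a sequence of 1 bits
```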
The ListAccum type maintains a sequential collection of elements. The output of a ListAccum is a list of values in the order the elements were added. The element type can be any base type, tuple, or STRING COMPRESS. Additionally, a ListAccum can contain a nested collection of type ListAccum. Nesting of ListAccums is limited to a depth of three.
The += arg operation appends arg to the end of the list; arg may be either a single element or another ListAccum.
ListAccum supports two additional operations:
@list1 + @list2 creates a new ListAccum, which contains the elements of @list1 followed by the elements of @list2. The two ListAccums must have identical data types.
Change in "+" definition
The pre-v2.0 definition of the ListAccum "+" operator ( @list + arg : Add arg to each member of @list) is no longer supported.
@list1 * @list2 (STRING data only) generates a new list of strings consisting of all permutations of an element of the first list followed by an element of the second list.
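A Python model of the two list operators (not GSQL; plain functions standing in for + and *):

```python
# Model of ListAccum "+" and "*": "+" concatenates two lists; "*"
# (string lists only) pairs every element of the first list with every
# element of the second, first-list order outermost.
def list_plus(l1, l2):
    return l1 + l2

def list_times(l1, l2):
    return [a + b for a in l1 for b in l2]

print(list_plus([1, 2], [3]))              # [1, 2, 3]
print(list_times(["a", "b"], ["x", "y"]))  # ['ax', 'ay', 'bx', 'by']
```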
ListAccum also supports the following class functions.
Functions which modify the ListAccum (mutator functions) can be used only under the following conditions:
Mutator functions of global accumulators may only be used at the query-body level.
Mutator functions of vertex-attached accumulators may only be used in a POST-ACCUM clause.
The SetAccum type maintains a collection of unique elements. The output of a SetAccum is a list of elements in arbitrary order. A SetAccum instance can contain values of one type. The element type can be any base type, tuple, or STRING COMPRESS.
For SetAccum, the += arg operation adds a non-duplicate element or set of elements to the set. If an element is already represented in the set, then the SetAccum state does not change.
SetAccum also can be used with the three canonical set operators: UNION, INTERSECT, and MINUS (see Section "Set/Bag Expression and Operators" for more details).
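Python's built-in set models SetAccum closely (a sketch, not GSQL); the operators |, &, and - stand in for UNION, INTERSECT, and MINUS:

```python
# Model of SetAccum: += inserts an element (or a whole set); duplicates
# are ignored, so inserting an existing element changes nothing.
s = {1, 2}
s |= {2, 3}          # += a set: 2 is already present, only 3 is added
s.add(1)             # += a duplicate element: no change
print(sorted(s))     # [1, 2, 3]

print(sorted({1, 2} | {2, 3}))   # UNION      -> [1, 2, 3]
print(sorted({1, 2} & {2, 3}))   # INTERSECT  -> [2]
print(sorted({1, 2} - {2, 3}))   # MINUS      -> [1]
```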
SetAccum also supports the following class functions.
Functions which modify the SetAccum (mutator functions) can be used only under the following conditions:
Mutator functions of global accumulators may only be used at the query-body level.
Mutator functions of vertex-attached accumulators may only be used in a POST-ACCUM clause.
The BagAccum type maintains a collection of elements with duplicated elements allowed. The output of a BagAccum is a list of elements in arbitrary order. A BagAccum instance can contain values of one type. The element type can be any base type, tuple, or STRING COMPRESS.
For BagAccum, the += arg operation adds an element or bag of elements to the bag.
BagAccum also supports the + operator:
@bag1 + @bag2 creates a new BagAccum, which contains the elements of @bag1 and the elements of @bag2. The two BagAccums must have identical data types.
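In Python, collections.Counter models a bag (multiset) and its "+" merge (a sketch, not GSQL):

```python
# Model of BagAccum: a multiset, so duplicates are kept. Counter tracks
# how many times each element occurs; "+" merges two bags.
from collections import Counter

bag1 = Counter(["a", "b"])
bag1.update(["b"])                 # += an element: now two "b"s
bag2 = Counter(["b", "c"])
merged = bag1 + bag2               # models @bag1 + @bag2
print(sorted(merged.elements()))   # ['a', 'b', 'b', 'b', 'c']
```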
BagAccum also supports the following class functions.
Functions which modify the BagAccum (mutator functions) can be used only under the following conditions:
Mutator functions of global accumulators may only be used at the query-body level.
Mutator functions of vertex-attached accumulators may only be used in a POST-ACCUM clause.
The MapAccum type maintains a collection of (key -> value) pairs. The output of a MapAccum is a set of key and value pairs in which the keys are unique.
The key type of a MapAccum can be all base types or tuples. If the key type is VERTEX, then only the vertex's id is stored and displayed.
The value type of a MapAccum can be all base types, tuples, or any type of accumulator, except for HeapAccum.
For MapAccum, the += (key -> val) operation adds a key-value pair to the collection if key is not yet used in the MapAccum. If the MapAccum already contains key, then val is accumulated to the current value, where the accumulation operation depends on the data type of val. (Strings are concatenated, lists are appended, numeric values are added, etc.)
MapAccum also supports the + operator:
@map1 + @map2 creates a new MapAccum, which contains the (key,value) pairs of @map2 added to the (key,value) pairs of @map1. The two MapAccums must have identical data types.
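A Python model of the MapAccum merge rule (not GSQL), using numeric addition as the per-value accumulation; other value types would concatenate or append instead:

```python
# Model of MapAccum: inserting an existing key accumulates into the
# stored value; "+" merges two maps the same way, key by key.
def map_accum(m, key, val):
    m[key] = m.get(key, 0) + val   # numeric values add
    return m

def map_plus(m1, m2):              # models @map1 + @map2
    out = dict(m1)
    for k, v in m2.items():
        map_accum(out, k, v)
    return out

m = {}
map_accum(m, "bonus", 100)
map_accum(m, "bonus", 50)          # key exists: values accumulate
print(m)                           # {'bonus': 150}
print(map_plus({"a": 1}, {"a": 2, "b": 3}))  # {'a': 3, 'b': 3}
```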
MapAccum also supports the following class functions.
Functions which modify the MapAccum (mutator functions) can be used only under the following conditions:
Mutator functions of global accumulators may only be used at the query-body level.
Mutator functions of vertex-attached accumulators may only be used in a POST-ACCUM clause.
The ArrayAccum type maintains an array of accumulators. An array is a fixed-length sequence of elements, with direct access to elements by position. The ArrayAccum has these particular characteristics:
The elements are accumulators, not primitive or base data types. All accumulators, except HeapAccum, MapAccum, and GroupByAccum, can be used.
An ArrayAccum instance can be multidimensional. There is no limit to the number of dimensions.
The size can be set at run-time (dynamically).
There are operators which update the entire array efficiently.
When an ArrayAccum is declared, the instance name should be followed by a pair of brackets for each dimension. The brackets may either contain an integer constant to set the size of the array, or they may be empty. In that case, the size must be set with the reallocate function before the ArrayAccum can be used.
Because each element of an ArrayAccum itself is an accumulator, the operators =, +=, and + can be used in two contexts: accumulator-level and element-level.
If @A is an ArrayAccum of length 6, then @A[0] and @A[5] refer to its first and last elements, respectively. Referring to an ArrayAccum element is like referring to an accumulator of that type. For example, given the following definitions:
then @@Sums[0], @@Sums[1], and @@Sums[2] each refer to an individual SumAccum<INT>, and @@Lists[0] and @@Lists[1] each refer to a ListAccum<STRING>, supporting all the operations for those accumulator and data types.
The operators =, +=, and + have special meanings when applied to an ArrayAccum as a whole. These operations efficiently update an entire ArrayAccum. All of the ArrayAccums involved must have the same element type.
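The two levels of += can be modeled in Python (a sketch, not GSQL; elements are shown as plain ints whose addition stands in for SumAccum accumulation):

```python
# Model of ArrayAccum: accumulator-level += merges two arrays element
# by element; element-level += touches a single position only.
a = [1, 2, 3]        # like ArrayAccum<SumAccum<INT>> @@A[3]
b = [10, 20, 30]

a = [x + y for x, y in zip(a, b)]  # whole-array +=: per-position sums
print(a)             # [11, 22, 33]

a[0] += 5            # element-level: like @@A[0] += 5
print(a)             # [16, 22, 33]
```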
ArrayAccum also supports the following class functions.
Functions which modify the ArrayAccum (mutator functions) can be used only under the following conditions:
Mutator functions of global accumulators may only be used at the query-body level.
Mutator functions of vertex-attached accumulators may only be used in a POST-ACCUM clause.
The HeapAccum type maintains a sorted collection of tuples and enforces a maximum number of tuples in the collection. The output of a HeapAccum is a sorted collection of tuple elements. The += arg operation adds a tuple to the collection in sorted order. If the HeapAccum is already at maximum capacity when the += operator is applied, then the tuple which is last in the sorted order is dropped from the HeapAccum. Sorting of tuples is performed on one or more defined tuple fields ordered either ascending or descending. Sorting precedence is performed based on defined tuple fields from left to right.
The declaration of a HeapAccum is more complex than for most other accumulators, because the user must define a custom tuple type, set the maximum capacity of the HeapAccum, and specify how the HeapAccum should be sorted. The declaration syntax is outlined in the figure below:
First, the HeapAccum declaration must be preceded by a TYPEDEF statement which defines the tuple type. At least one of the fields (field_1, ..., field_n) must be of a data type that can be sorted.
In the declaration of the HeapAccum itself, the keyword "HeapAccum" is followed by the tuple type in angle brackets < >. This is followed by a parenthesized list of two or more parameters. The first parameter is the maximum number of tuples that the HeapAccum may store. This parameter must be a positive integer. The subsequent parameters are a subset of the tuple's fields, which are used as sort keys. The sort key hierarchy is from left to right, with the leftmost key being the primary sort key. The keywords ASC and DESC indicate ascending (lowest value first) or descending (highest value first) sort order. Ascending order is the default.
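The capacity-and-sort behavior can be modeled in Python (a sketch, not GSQL; the tuple fields name/score, the capacity 2, and the DESC sort key are all invented for this illustration):

```python
# Model of a HeapAccum with capacity 2, sorted by score descending:
# inserting past capacity drops the tuple that sorts last.
class HeapAccum:
    def __init__(self, capacity, key):
        self.capacity, self.key, self.items = capacity, key, []
    def accum(self, tup):                  # models += tup
        self.items.append(tup)
        self.items.sort(key=self.key)      # keep in sorted order
        if len(self.items) > self.capacity:
            self.items.pop()               # drop the last-sorted tuple

h = HeapAccum(2, key=lambda t: -t[1])      # primary key: score, DESC
for name, score in [("tic", 70), ("tac", 95), ("toe", 80)]:
    h.accum((name, score))
print(h.items)  # [('tac', 95), ('toe', 80)]
```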
HeapAccum also supports the following class functions.
Functions which modify the HeapAccum (mutator functions) can be used only under the following conditions:
Mutator functions of global accumulators may only be used at the query-body level.
Mutator functions of vertex-attached accumulators may only be used in a POST-ACCUM clause.
The GroupByAccum is a compound accumulator: an accumulator of accumulators. At the top level, it is a MapAccum where both the key and the value can have multiple fields. Moreover, each of the value fields is an accumulator type.
In the EBNF above, the type terms form the key set, and the accumType terms form the map's value. Since they are accumulators, they perform a grouping. Like a MapAccum, if we try to store a (key->value) whose key has already been used, then the new value will accumulate to the data which is already stored. In this case, each field of the multiple-field value has its own accumulation function. One way to think about GroupByAccum is that each unique key is a group ID.
In GroupByAccum, the key types can be base type, tuple, or STRING COMPRESS. The accumulators are used for aggregating group values. Each accumulator type can be any type except HeapAccum. Each base type and each accumulator type must be followed by an alias. Below is an example declaration.
To add new data to this GroupByAccum, the data should be formatted as (key1, key2 -> value1, value2) .
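A Python model of a GroupByAccum with a two-field key and two accumulator value fields (not GSQL; a MaxAccum-like max and a ListAccum-like list, with all field names and data invented for this sketch):

```python
# Model of GroupByAccum: each unique key tuple is a group; each value
# field accumulates with its own rule (here: max, list-append).
groups = {}

def group_accum(key, max_arg, name_arg):   # models += (k1, k2 -> v1, v2)
    maxval, names = groups.get(key, (float("-inf"), []))
    groups[key] = (max(maxval, max_arg), names + [name_arg])

group_accum(("us", "ny"), 3, "ann")
group_accum(("us", "ny"), 7, "bob")        # same key: fields accumulate
group_accum(("ca", "on"), 5, "cam")
print(groups[("us", "ny")])  # (7, ['ann', 'bob'])
```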
GroupByAccum also supports the following class functions.
Functions which modify the GroupByAccum (mutator functions) can be used only under the following conditions:
Mutator functions of global accumulators may only be used at the query-body level.
Mutator functions of vertex-attached accumulators may only be used in a POST-ACCUM clause.
Certain collection accumulators may be nested. That is, an accumulator may contain a collection of elements where the elements themselves are accumulators. For example:
Only ListAccum, ArrayAccum, MapAccum, and GroupByAccum can contain other accumulators. However, not all combinations of collection accumulators are allowed. The following constraints apply:
1. ListAccum: ListAccum is the only accumulator type which can be nested within ListAccum, up to a depth of 3:
2. MapAccum: All accumulator types, except for HeapAccum, can be nested within MapAccum as the value type. For example,
3. GroupByAccum: All accumulator types, except for HeapAccum, can be nested within GroupByAccum as the accumulator type. For example:
4. ArrayAccum: Unlike the other accumulators in this list, where nesting is optional, nesting is mandatory for ArrayAccum. See the ArrayAccum section above.
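The four constraints above can be sketched as declarations (all names are illustrative):

```
// 1. ListAccum within ListAccum, up to depth 3:
ListAccum<ListAccum<ListAccum<INT>>> @@depth3;

// 2. MapAccum with an accumulator (other than HeapAccum) as its value type:
MapAccum<STRING, SumAccum<INT>> @@wordCount;

// 3. GroupByAccum whose aggregated field is itself a collection accumulator:
GroupByAccum<STRING dept, ListAccum<STRING> names> @@byDept;

// 4. ArrayAccum, whose element type must always be an accumulator:
ArrayAccum<SumAccum<DOUBLE>> @@weights[10];
```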
It is legal to define nested ListAccums to form a multi-dimensional array. Note the declaration statements and the nested [ bracket ] notation in the example below:
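A sketch of a 2-dimensional list built this way (accumulator names are illustrative):

```
ListAccum<ListAccum<INT>> @@grid;   // a list of lists of INT

// Each += with [ bracket ] notation appends one inner list:
@@grid += [1, 2, 3];
@@grid += [4, 5, 6];
// @@grid is now [[1, 2, 3], [4, 5, 6]];
// @@grid.get(1).get(2) retrieves the element 6.
```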
Accumulators accumulate and aggregate data when traversing a set of vertices or edges (details are given earlier in this chapter). The table below summarizes the accumulator types.
| Accumulator Type (case sensitive) | Default Initial Value | Accumulation Operation (result of accumName += newVal) |
| --- | --- | --- |
| SumAccum<INT> | 0 | accumName plus newVal |
| SumAccum<FLOAT or DOUBLE> | 0.0 | accumName plus newVal |
| SumAccum<STRING> | empty string | String concatenation of accumName and newVal |
| MaxAccum<INT> | INT_MIN | The greater of newVal and accumName |
| MaxAccum<FLOAT or DOUBLE> | FLOAT_MIN or DOUBLE_MIN | The greater of newVal and accumName |
| MaxAccum<VERTEX> | the vertex with internal id 0 | The vertex with the greater internal id, either newVal or accumName |
| MinAccum<INT> | INT_MAX | The lesser of newVal and accumName |
| MinAccum<FLOAT or DOUBLE> | FLOAT_MAX or DOUBLE_MAX | The lesser of newVal and accumName |
| MinAccum<VERTEX> | unknown | The vertex with the lesser internal id, either newVal or accumName |
| AvgAccum | 0.0 (double precision) | Double-precision average of newVal and all values previously accumulated to accumName |
| AndAccum | True | Boolean AND of newVal and accumName |
| OrAccum | False | Boolean OR of newVal and accumName |
| BitwiseAndAccum | -1 (INT) = 64-bit sequence of 1s | Bitwise AND of newVal and accumName |
| BitwiseOrAccum | 0 (INT) = 64-bit sequence of 0s | Bitwise OR of newVal and accumName |
| ListAccum<type> (ordered collection of elements) | empty list | List with newVal appended to the end of accumName. newVal can be a single value or a list. If accumName is [2, 4, 6], then accumName += 4 produces accumName equal to [2, 4, 6, 4] |
| SetAccum<type> (unordered collection of elements; duplicate items not allowed) | empty set | Set union of newVal and accumName. newVal can be a single value or a set/bag. If accumName is (2, 4, 6), then accumName += 4 produces accumName equal to (2, 4, 6) |
| BagAccum<type> (unordered collection of elements; duplicate items allowed) | empty bag | Bag union of newVal and accumName. newVal can be a single value or a set/bag. If accumName is (2, 4, 6), then accumName += 4 produces accumName equal to (2, 4, 4, 6) |
| MapAccum<type, type> (unordered collection of (key, value) pairs) | empty map | Adds or updates a key:value pair in the accumName map. If accumName is [("red",3), ("green",4), ("blue",2)], then accumName += ("black" -> 5) produces accumName equal to [("red",3), ("green",4), ("blue",2), ("black",5)] |
| ArrayAccum<accumType> | empty list | See the ArrayAccum section below for details. |
| HeapAccum<tuple>(heapSize, sortKey [, sortKey_i]*) (sorted collection of tuples) | empty heap | Inserts newVal into the accumName heap, maintaining the heap in sorted order according to the sortKey(s) and size limit declared for this HeapAccum |
| GroupByAccum<type [, type]*, accumType [, accumType]*> | empty group-by map | Adds or updates a key:value pair in accumName. See the "GroupByAccum" section for more details. |
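A brief sketch of these += semantics for a few of the types above (accumulator names are illustrative):

```
SumAccum<INT> @@total;
MaxAccum<DOUBLE> @@highest;
ListAccum<INT> @@seq;

@@total += 5;
@@total += 7;        // @@total is now 12
@@highest += 3.5;
@@highest += 2.0;    // @@highest remains 3.5
@@seq += 2;
@@seq += [4, 6];     // @@seq is now [2, 4, 6]
```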
| Function (T is the element type) | Return Type | Accessor / Mutator | Description |
| --- | --- | --- | --- |
| size() | INT | Accessor | Returns the number of elements in the list. |
| contains( T val ) | BOOL | Accessor | Returns true if the list contains val; false otherwise. |
| get( INT idx ) | T | Accessor | Returns the value at the given index position in the list. Indexing begins at 0. If the index is out of bounds (including any negative value), the default value of the element type is returned. |
| clear() | VOID | Mutator | Clears the list so it becomes empty with size 0. |
| update( INT index, T value ) | VOID | Mutator | Assigns value to the list element at position index. |
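A sketch of these ListAccum functions in use (names are illustrative):

```
ListAccum<INT> @@nums;
@@nums += [10, 20, 30];
@@nums.update(1, 25);    // @@nums becomes [10, 25, 30]
PRINT @@nums.size();     // 3
PRINT @@nums.get(0);     // 10 (index is 0-based)
PRINT @@nums.get(99);    // out of bounds: default INT value, 0
```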
| Function (T is the element type) | Return Type | Accessor / Mutator | Description |
| --- | --- | --- | --- |
| size() | INT | Accessor | Returns the number of elements in the set. |
| contains( T value ) | BOOL | Accessor | Returns true if the set contains value; false otherwise. |
| remove( T value ) | VOID | Mutator | Removes value from the set. |
| clear() | VOID | Mutator | Clears the set so it becomes empty with size 0. |
| Function (T is the element type) | Return Type | Accessor / Mutator | Description |
| --- | --- | --- | --- |
| size() | INT | Accessor | Returns the number of elements in the bag. |
| contains( T value ) | BOOL | Accessor | Returns true if the bag contains value; false otherwise. |
| clear() | VOID | Mutator | Clears the bag so it becomes empty with size 0. |
| remove( T value ) | VOID | Mutator | Removes one instance of value from the bag. |
| removeAll( T value ) | VOID | Mutator | Removes all instances of value from the bag. |
| Function (KEY is the key type) | Return Type | Accessor / Mutator | Description |
| --- | --- | --- | --- |
| size() | INT | Accessor | Returns the number of elements in the map. |
| containsKey( KEY key ) | BOOL | Accessor | Returns true if the map contains key; false otherwise. |
| get( KEY key ) | value type | Accessor | Returns the value which the map associates with key. If the map does not contain key, the return value is undefined. |
| clear() | VOID | Mutator | Clears the map so it becomes empty with size 0. |
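A sketch of these MapAccum functions in use (names are illustrative):

```
MapAccum<STRING, SumAccum<INT>> @@counts;
@@counts += ("red" -> 3);
@@counts += ("red" -> 2);      // value accumulates: "red" -> 5
IF @@counts.containsKey("red") THEN
  PRINT @@counts.get("red");   // 5
END;
```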
| Operator | Description | Example |
| --- | --- | --- |
| = | Sets the ArrayAccum on the left equal to the ArrayAccum on the right. The two ArrayAccums must have the same element type, but the left-side ArrayAccum will change its size and dimensions to match the one on the right side. | @A = @B; |
| + | Performs element-by-element addition of two ArrayAccums of the same type and size. The result is a new ArrayAccum of the same size. | @C = @A + @B; // @A and @B must be the same size |
| += | Performs element-by-element accumulation (+=) from the right-side ArrayAccum to the left-side ArrayAccum. They must be the same type and size. | @A += @B; // @A and @B must be the same size |
| Function | Return Type | Accessor / Mutator | Description |
| --- | --- | --- | --- |
| size() | INT | Accessor | Returns the total number of elements in the (multi-dimensional) array. For example, the size of an ArrayAccum declared as @A[3][4] is 12. |
| reallocate( INT, ... ) | VOID | Mutator | Discards the previous ArrayAccum instance and creates a new ArrayAccum with the size(s) given. An N-dimensional ArrayAccum requires N integer parameters. The reallocate function cannot be used to change the number of dimensions. |
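A sketch of an ArrayAccum declaration and reallocation (names are illustrative):

```
ArrayAccum<SumAccum<INT>> @@cells[2][3];  // 2 x 3, so @@cells.size() is 6
@@cells[0][1] += 10;
@@cells.reallocate(4, 5);  // now 4 x 5; previous contents are discarded
```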
| Function | Return Type | Accessor / Mutator | Description |
| --- | --- | --- | --- |
| size() | INT | Accessor | Returns the number of elements in the heap. |
| top() | tupleType | Accessor | Returns the top tuple. If the heap is empty, returns a tuple with each element equal to the default value. |
| pop() | tupleType | Mutator | Returns the top tuple and removes it from the heap. If the heap is empty, returns a tuple with each element equal to the default value. |
| resize( INT ) | VOID | Mutator | Changes the maximum capacity of the heap. |
| clear() | VOID | Mutator | Clears the heap so it becomes empty with size 0. |
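A sketch of a bounded HeapAccum (the tuple type and names are illustrative):

```
TYPEDEF TUPLE<STRING name, INT score> Entry;
HeapAccum<Entry>(2, score DESC) @@top2;  // capacity 2, highest scores first

@@top2 += Entry("a", 10);
@@top2 += Entry("b", 30);
@@top2 += Entry("c", 20);  // ("a", 10) falls out due to the size limit
PRINT @@top2.top();        // the highest-scoring tuple, ("b", 30)
@@top2.resize(5);          // the heap can now hold up to 5 tuples
```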
| Function (KEY1..KEYn are the key types) | Return Type | Accessor / Mutator | Description |
| --- | --- | --- | --- |
| size() | INT | Accessor | Returns the number of groups in the accumulator. |
| get( KEY1 key_value1, KEY2 key_value2, ... ) | element type(s) of the accumulator(s) | Accessor | Returns the values from each accumulator in the group associated with the given key(s). If the key(s) do not exist, returns the default value(s) of the accumulator type(s). |
| containsKey( KEY1 key_value1, KEY2 key_value2, ... ) | BOOL | Accessor | Returns true if the accumulator contains the given key(s); false otherwise. |
| clear() | VOID | Mutator | Clears the accumulator so it becomes empty with size 0. |
| remove( KEY1 key_value1, KEY2 key_value2, ... ) | VOID | Mutator | Removes the group associated with the given key(s). |
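A sketch of these GroupByAccum functions, assuming a hypothetical accumulator declared as GroupByAccum<INT age, STRING gender, SumAccum<INT> cnt> @@grp (field access via the alias cnt is an assumption for illustration):

```
IF @@grp.containsKey(30, "F") THEN
  PRINT @@grp.get(30, "F").cnt;  // the cnt value for group (30, "F")
END;
@@grp.remove(30, "F");           // deletes that whole group
```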