SELECT Statement

GSQL Syntax Versions

Starting with TigerGraph 2.6, there are now two versions of the GSQL Query Language. Both are available in the product. The original version is called GSQL Syntax V1. GSQL Syntax V2 was introduced to provide more flexible and powerful Pattern matching. A query indicates which grammar to use in the SYNTAX clause of the CREATE QUERY header. V1 is the default.

As new features are being added to GSQL, some are available only in V2. In the future, we plan to transition so that Syntax V2 will be the default.

The key differences between the two grammars:

SQL-like Tabular SELECT is only in Syntax V2 (since TigerGraph 3.1)
Multi-hop Pattern Matching is only in Syntax V2 (since TigerGraph 2.6)
- In V2, each SELECT statement can traverse a multi-hop path.
  - The traversal direction is under the control of the query writer, with arrowheads on each edge set to show the direction.
  - There is no arrowhead outside the parentheses:
    
    Start:s -( (ForwardEdge> | <BackwardEdge):e )- Target:t
  - Users can write paths that explicitly show multiple hops, and they can use a Kleene star (*) to indicate repetition. Start:s -(Edge1>:e1)- Middle:m -(<Edge2:e2)- Target:t or Start:s -(Edge*1..3)- Target:t
- In V1 (default), each SELECT statement can traverse one hop (from a set of vertices to their adjacent vertices).
  - To write a multi-hop query, you write a sequence of SELECT statements.
  - The traversal action is from left to right, and the notation uses "ASCII art" to depict a connection, either with a rightward facing arrowhead or no arrowhead: Start:s -( Edges:e )-> Target:t or Start:s -( Edges:e )- Target:t

This section discusses the SELECT statement in-depth and covers the following EBNF syntax. Note that there are now two forms for a SELECT statement, the classic GSQL form gsqlSelectBlock, and the new SQL-like form sqlSelectBlock. Since GSQL has always been similar to SQL in many ways, there are only a few differences to know about.

EBNF for Select Statement

selectStmt  := gsqlSelectBlock
             | sqlSelectBlock

gsqlSelectBlock := gsqlSelectClause
               fromClause
               [sampleClause]
               [whereClause]
               [accumClause]
               [postAccumClause]*
               [havingClause]
               [orderClause]
               [limitClause]

sqlSelectBlock := sqlSelectClause
               fromClause
               [whereClause]
               [groupByClause]
               [havingClause]
               [orderClause]
               [limitClause]

gsqlSelectClause := ertexSetName "=" SELECT vertexAlias
sqlSelectClause := SELECT [DISTINCT] columnExpr ("," columnExpr)*
                   INTO tableName
columnExpr := expr [AS columnName]
            | aggregator "("[DISTINCT] expr ")" [AS columnName]
columnName := name
tableName := name

fromClause := FROM (step | stepV2 | pathPattern ["," pathPattern]*)

step   :=  stepSourceSet ["-" "(" stepEdgeSet ")" ("-"|"->") stepVertexSet]
stepV2 :=  stepVertexSet ["-" "(" stepEdgeSet ")" "-" stepVertexSet]

stepSourceSet := vertexSetName [":" vertexAlias]
stepEdgeSet := [stepEdgeTypes] [":" edgeAlias]
stepVertexSet := [stepVertexTypes] [":" vertexAlias]
alias := (vertexAlias | edgeAlias)
vertexAlias := name
edgeAlias := name

stepEdgeTypes := atomicEdgeType | "(" edgeSetType ["|" edgeSetType]* ")"
atomicEdgeType := "_" | ANY | edgeSetType
edgeSetType := edgeType | paramName | globalAccumName

stepVertexTypes := atomicVertexType | "(" vertexSetType ["|" vertexSetType]* ")"
atomicVertexType := "_" | ANY | vertexSetType
vertexSetType := vertexType | paramName | globalAccumName

#----------# Pattern Matching #----------#
pathPattern := stepVertexSet ["-" "(" pathEdgePattern ")" "-" stepVertexSet]*

pathEdgePattern := atomicEdgePattern
                 | "(" pathEdgePattern ")"
                 | pathEdgePattern "." pathEdgePattern
                 | disjPattern
                 | starPattern

atomicEdgePattern  := atomicEdgeType
        	        | atomicEdgeType ">"
        	        | "<" atomicEdgeType

disjPattern := atomicEdgePattern ("|" atomicEdgePattern)*

starPattern := ([atomicEdgePattern] | "(" disjPattern ")") "*" [starBounds]

starBounds := CONST_INT ".." CONST_INT
            | CONST_INT ".."
            | ".." CONST_INT
            | CONST_INT
#----------------------------------------#
sampleClause := SAMPLE ( expr | expr "%" ) EDGE WHEN condition
              | SAMPLE expr TARGET WHEN condition
              | SAMPLE expr "%" TARGET PINNED WHEN condition

whereClause := WHERE condition

accumClause := [perClauseV2] ACCUM dmlSubStmtList

perClauseV2 := PER "(" alias ["," alias] ")"

postAccumClause := "POST-ACCUM" dmlSubStmtList

dmlSubStmtList := dmlSubStmt ["," dmlSubStmt]*

dmlSubStmt := assignStmt           // Assignment
            | funcCallStmt         // Function Call
            | gAccumAccumStmt      // Assignment
            | lAccumAccumStmt      // Assignment
            | attrAccumStmt        // Assignment
            | vAccumFuncCall       // Function Call
            | localVarDeclStmt     // Declaration
            | dmlSubCaseStmt       // Control Flow
            | dmlSubIfStmt         // Control Flow
            | dmlSubWhileStmt      // Control Flow
            | dmlSubForEachStmt    // Control Flow
            | BREAK                // Control Flow
            | CONTINUE             // Control Flow
            | insertStmt           // Data Modification
            | dmlSubDeleteStmt     // Data Modification
            | printlnStmt          // Output
            | logStmt              // Output

vAccumFuncCall := vertexAlias "." localAccumName ("." funcName "(" [argList] ")")+

groupByClause := GROUP BY groupExpr ("," groupExpr)*
groupExpr := expr

havingClause := HAVING condition

orderClause := ORDER BY expr [ASC | DESC] ["," expr [ASC | DESC]]*

limitClause := LIMIT ( expr | expr "," expr | expr OFFSET expr )

The SELECT block uses a step pattern or path pattern to select some of the graph’s vertices and edges. There are a number of optional clauses that define and/or refine the selection by constraining the vertex or edge set or the result set. The final output of a query is either a vertex set known as the result set or a table.

Size limitation

There is a maximum size limit of 2 GB for the result set of a SELECT block . If the result of the SELECT block is larger than 2 GB, the system will return no data. NO error message is produced.

GSQL SELECT Statement

EBNF for GSQL Select Statement

gsqlSelectBlock := gsqlSelectClause
               fromClause
               [sampleClause]
               [whereClause]
               [accumClause]
               [postAccumClause]*
               [havingClause]
               [orderClause]
               [limitClause]

gsqlSelectClause := vertexSetName "=" SELECT vertexAlias

In classic GSQL, the SELECT statement is an assignment statement with a SELECT block on the right hand side. The initial clause is the SELECT clause: SELECT vertexAlias. Its purpose is to specify which set of vertices from the FROM clause is to become the output. The classic SELECT clause may contain only one item: a vertex alias defined in the FROM clause. As of v3.1, the vertex alias may be from anywhere in a multi-hop pattern, not only an endpoint. GSQL now also supports SQL-like SELECT statements with tabular output.

The FROM Clause defines a path pattern to traverse in the graph, and each vertex in the path pattern can be given a vertexAlias name. Thus, the SELECT clause picks the set of vertices at one of these points in the pattern — the source vertices, the target vertices, or those from an interior point in a multi-hop path — to be the output vertices.

The SELECT block has many optional clauses, which fit together in a logical flow. Overall, the SELECT block starts from a source set of vertices and returns a result set that is either a subset of the source vertices or a subset of their neighboring vertices. Along the way, computations can be performed on the selected vertices and edges. The figure below graphically depicts the overall SELECT data flow. While the ACCUM and POST-ACCUM clauses do not directly affect which vertices are included in the result set, they affect the data (accumulators) which are attached to those vertices.

Basic data flow for a classic 1-hop GSQL SELECT statement

FROM Clause

In classic (Syntax v1) GSQL, the FROM clause described one step or hop pattern. In Pattern Matching (Syntax v2), the pattern can be multiple hops long. Path patterns also have many other options for finer control and greater flexibility.

FROM clause

fromClause := FROM (step | stepV2 | pathPattern ["," pathPattern]*)

A hop or step consists of going from a starting set of vertices, crossing over a set of their edges, to an ending set of vertices. We typically use the names Source and Target for the starting and ending vertex sets: Source -(Edges)-> Target

The step pattern defines constraints for the Source set, the Edge set, and the Target set. The result of the FROM clause can be interpreted as a 3-column virtual table called the match table. Each row is a 3-element tuple: (source vertex, connected edge, target vertex).

Source Vertex Set (SYNTAX v1)

Notice that the edge set and target set are optional: a step can be just source vertices (stepSourceSet).

EBNF for source-only pattern (SYNTAX v1)

step   :=  stepSourceSet ["-" "(" stepEdgeSet ")" ("-"|"->") stepVertexSet]

Rules for Source Vertex Set in Syntax V1:

The source set may only be a vertexSetName. If this the first SELECT statement in the query, then the vertexSetName is generally created as a seedsSet:
The vertexSetName is optionally followed by an alias, which used in subsequent clauses to refer to the source set:
```
stepSourceSet := vertexSetName [":" vertexAlias]
vertexAlias := name
```

For example:

resultSet = SELECT s FROM Source:s;

This statement can be interpreted as "Select all vertices s, from the vertex set Source ." The result is a vertex set. Below is a simple example of a vertex selection.

Query
Results

Vertex SELECT example

# displays all 'post'-type vertices
CREATE QUERY printAllPosts() FOR GRAPH socialNet
{
  start =   {post.*};			   # initialized with all vertices of type 'post'
  results = SELECT s FROM start:s; # select these vertices
  PRINT results;
}

Results of Query printAllPosts

GSQL > RUN QUERY printAllPosts()
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [{"results": [
    {
      "v_id": "0",
      "attributes": {
        "postTime": "2010-01-12 11:22:05",
        "subject": "Graphs"
      },
      "v_type": "post"
    },
    {
      "v_id": "10",
      "attributes": {
        "postTime": "2011-02-04 03:02:31",
        "subject": "cats"
      },
      "v_type": "post"
    },
    {
      "v_id": "2",
      "attributes": {
        "postTime": "2011-02-03 01:02:42",
        "subject": "query languages"
      },
      "v_type": "post"
    },
    {
      "v_id": "4",
      "attributes": {
        "postTime": "2011-02-07 05:02:51",
        "subject": "coffee"
      },
      "v_type": "post"
    },
    {
      "v_id": "9",
      "attributes": {
        "postTime": "2011-02-05 23:12:42",
        "subject": "cats"
      },
      "v_type": "post"
    },
    {
      "v_id": "3",
      "attributes": {
        "postTime": "2011-02-05 01:02:44",
        "subject": "cats"
      },
      "v_type": "post"
    },
    {
      "v_id": "5",
      "attributes": {
        "postTime": "2011-02-06 01:02:02",
        "subject": "tigergraph"
      },
      "v_type": "post"
    },
    {
      "v_id": "7",
      "attributes": {
        "postTime": "2011-02-04 17:02:41",
        "subject": "Graphs"
      },
      "v_type": "post"
    },
    {
      "v_id": "1",
      "attributes": {
        "postTime": "2011-03-03 23:02:00",
        "subject": "tigergraph"
      },
      "v_type": "post"
    },
    {
      "v_id": "11",
      "attributes": {
        "postTime": "2011-02-03 01:02:21",
        "subject": "cats"
      },
      "v_type": "post"
    },
    {
      "v_id": "8",
      "attributes": {
        "postTime": "2011-02-03 17:05:52",
        "subject": "cats"
      },
      "v_type": "post"
    },
    {
      "v_id": "6",
      "attributes": {
        "postTime": "2011-02-05 02:02:05",
        "subject": "tigergraph"
      },
      "v_type": "post"
    }
  ]}]
}

1-Hop Step (SYNTAX v1)

Usually, a FROM clause has a full 1-hop step.

step   :=  stepSourceSet ["-" "(" stepEdgeSet ")" ("-"|"->") stepVertexSet]

The symbols -( and )- enclose the stepEdgeSet and separate the three parts. Each of the three parts may also define an alias, which makes a convenient way to refer to each of the three sets of entities.

stepSourceSet := vertexSetName [":" vertexAlias]
stepEdgeSet := [setEdgeTypes] [":" edgeAlias]
stepVertexSet := [setVertexTypes] [":" vertexAlias]

Below is a simple example:

Person:s -( (Bought|Rented):e )- (Product|Service):t

The Source set is all Persons, but the pattern will not include all Persons. It will only include those Persons who Bought or Rented a Product or Service. Moreover, the result will be a set of matched triples: (s, e, t). For example, if Sam bought a TV and Andy rented a car, the results will include (Sam, Bought, TV) and (Andy, Rented, Car). However, these two facts do not imply (Sam, Rented, Car) or (Andy, Rented, TV).

The GSQL grammar allows the righthand enclosure to be either )- or
)->. Previously, we recommended the arrowhead )-> . However, for better compatibility with the new V2 pattern matching syntax, we now recommend the headless version: )- Syntax v2 has a specific semantic meaning for the arrowheads.

Edge Set and Target Vertex Set Options

Rules for Edge Set and Target Set in Syntax V1:

The edge set (stepEdgeSet) and target vertex set (stepVertexSet) obey very similar rules. Each may be a set specifier (stepEdgeTypes or stepVertexTypes) followed optionally by an alias.

The set specifier can be any of the following:

"_" or "ANY" or <blank>, which means any edge/vertex.
a named type (edgeType or vertexType)
a global SetAcum accumulator containing a set of edges or vertices ("@@"accumName)
a string or string set parameter which names one or more edge or vertex types (paramName) This parameterized type is one aspect of Dynamic Querying.
A list of types, string parameters, or global accumulators, e.g., "(" edgeSetType ["|" edgeSetType] ")"*

The list of types must contain the same type of specifiers.

stepEdgeSet := [setEdgeTypes] [":" edgeAlias]
stepVertexSet := [setVertexTypes] [":" vertexAlias]
alias := (vertexAlias | edgeAlias)
vertexAlias := name
edgeAlias := name

stepEdgeTypes := atomicEdgeType
               | "(" edgeSetType ["|" edgeSetType]* ")"
atomicEdgeType := "_" | ANY | edgeSetType
edgeSetType := edgeType | paramName | globalAccumName

stepVertexTypes := atomicVertexType
                 | "(" vertexSetType ["|" vertexSetType]* ")"
atomicVertexType := "_" | ANY | vertexSetType
vertexSetType := vertexType | paramName | globalAccumName

Notation	Accepted vertex/edge types
(empty)	any type
_	any type
ANY	any type
vertex/edge type name	given type
string parameter holding a vertex/edge type name	given type
global SetAccum containing vertices or edges	given set
(name \| name …)	UNION of two or more collections

Notation

Accepted vertex/edge types

(empty)

any type

ANY

any type

vertex/edge type name

given type

string parameter holding a vertex/edge type name

given type

global SetAccum containing vertices or edges

given set

(name | name …)

UNION of two or more collections

Parentheses are always needed around the edge type in a FROM clause:
FROM Source:s -(eType:e)- Target:t

Parentheses are also needed if a vertexEdgeType is the union of more than one or more individual edge types or vertex types:
FROM Source:s -((eType1 | eType2):e) - Target:t

Note the double set of parentheses for the edge specifier. If these is an edge alias, these parentheses are needed. If however there is no edge alias, it is legal to have just a single set of parentheses:
FROM Source:s -(eType1 | eType2) - Target:t

Either the source vertex set ( s ) or target vertex set ( t ) can be used as the SELECT argument, which determines the result of the SELECT statement. Note the small difference in the two SELECT statements below.

Selecting source or target vertices from edge-induced selection

resultSet1 = SELECT s FROM source:s-(eType:e)-tType:t;   //select from the source set
resultSet2 = SELECT t FROM source:s-(eType:e)-tType:t;   //select from the target set

resultSet1 is based on the source end of the edges. resultSet2 is based on the target end of the selected edges. However, resultSet1 is NOT identical to the Source vertex set. It is only those members of Source which connect to an eType edge and then to a tType vertex. Other clauses (presented later in this "SELECT Statement" section, can do additional filtering of the Source set.

We strongly suggest that an alias should be declared with every vertex and edge in the FROM clause, as there are several functions and features which are only available to vertex and edge aliases.

It is legal to declare an alias without explicitly stating an edge/target type. See the examples below.

Target vertex type inference
Edge type inference

resultSet3 = SELECT v FROM Source:v-(eType:e)->(V1|V2):t;
resultSet4 = SELECT v FROM Source:v-(eType:e)->:t;
resultSet5 = SELECT v FROM Source:v-(eType:e)->ANY:t;
resultSet6 = SELECT v FROM Source:v-(eType:e)->_:t;

resultSet7 = SELECT v FROM Source:v-((E1|E2|E3):e)->tType:t;
resultSet8 = SELECT v FROM Source:v-(:e)->tType:t;
resultSet9 = SELECT v FROM Source:v-(_:e)->tType:t;
resultSet10 = SELECT v FROM Source:v-(ANY:e)->tType:t;

The following are a set of queries that demonstrate edge-induced SELECT blocks. The allPostsLiked and allPostsMade queries show how the target vertex type can be omitted. The allPostsLikedOrMade query uses the "|" operator to select multiple types of edges.

Query
Results

Edge induced SELECT example

# uses various SELECT statements (some of which are equivalent) to print out
# either the posts made by the given user, the posts liked by the given
# user, or the posts made or liked by the given user.
CREATE QUERY printAllPosts2(vertex<person> seed) FOR GRAPH socialNet
{
	start = {seed}; # initialize starting set of vertices


	# --- statements produce equivalent results
	# select all 'post' vertices which can be reached from 'start' in one hop
	# 	using an edge of type 'liked'
	allPostsLiked = SELECT targetVertex FROM start -(liked:e)-> post:targetVertex;

	# select all vertices of any type which can be reached from 'start' in one hop
	# 	using an edge of type 'liked'
	allPostsLiked = SELECT targetVertex FROM start -(liked:e)-> :targetVertex;
	# ----


	# --- statements produce equivalent results
	# start with the vertex set from above, and traverse all edges of type "posted"
  	# 	(locally those edges are just given a name 'e' in case they need accessed)
  	# 	and return all vertices of type 'post' which can be reached within one-hop of 'start' vertices
	allPostsMade = SELECT targetVertex FROM start -(posted:e)-> post:targetVertex;

	# start with the vertex set from above, and traverse all edges of type "posted"
  	# 	(locally those edges are just given a name 'e' in case they need accessed)
  	# 	and return all vertices of any type which can be reached within one-hop of 'start' vertices
	allPostsMade = SELECT targetVertex FROM start -(posted:e)-> :targetVertex;
	# ----


	# --- statements produce equivalent results
	# select all vertices of type 'post' which can be reached from 'start' in one hop
	# 	using an edge of any type
	# not equivalent to any statement. because it doesn't restrict the edge type,
	# 	this will include any vertex connected by 'liked' or 'posted' edge types
	allPostsLikedOrMade = SELECT t FROM start -(:e)-> t;

	# select all vertices of type 'post' which can be reached from 'start' in one hop
	#	using an edge of type either 'posted' or 'liked'
	allPostsLikedOrMade = SELECT t FROM start -((posted|liked):e)-> post:t;

	# select all vertices of any type which can be reached from 'start' in one hop
	#	using an edge of type either 'posted' or 'liked/
	allPostsLikedOrMade = SELECT t FROM start -((posted|liked):e)-> :t;

	#option for simplified parentheses in edge pattern:
	allPostsLikedOrMade = SELECT t FROM start - (posted|liked)-> :t
	# ----

	PRINT allPostsLiked;
    PRINT allPostsMade;
    PRINT allPostsLikedOrMade;
}

Results of Query printAllPosts2

GSQL > RUN QUERY printAllPosts2("person2")
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [
    {"allPostsLiked": [
      {
        "v_id": "0",
        "attributes": {
          "postTime": "2010-01-12 11:22:05",
          "subject": "Graphs"
        },
        "v_type": "post"
      },
      {
        "v_id": "3",
        "attributes": {
          "postTime": "2011-02-05 01:02:44",
          "subject": "cats"
        },
        "v_type": "post"
      }
    ]},
    {"allPostsMade": [{
      "v_id": "1",
      "attributes": {
        "postTime": "2011-03-03 23:02:00",
        "subject": "tigergraph"
      },
      "v_type": "post"
    }]},
    {"allPostsLikedOrMade": [
      {
        "v_id": "0",
        "attributes": {
          "postTime": "2010-01-12 11:22:05",
          "subject": "Graphs"
        },
        "v_type": "post"
      },
      {
        "v_id": "3",
        "attributes": {
          "postTime": "2011-02-05 01:02:44",
          "subject": "cats"
        },
        "v_type": "post"
      },
      {
        "v_id": "1",
        "attributes": {
          "postTime": "2011-03-03 23:02:00",
          "subject": "tigergraph"
        },
        "v_type": "post"
      }
    ]}
  ]
}
GSQL > RUN QUERY printAllPosts2("person6")
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [
    {"allPostsLiked": [{
      "v_id": "8",
      "attributes": {
        "postTime": "2011-02-03 17:05:52",
        "subject": "cats"
      },
      "v_type": "post"
    }]},
    {"allPostsMade": [
      {
        "v_id": "10",
        "attributes": {
          "postTime": "2011-02-04 03:02:31",
          "subject": "cats"
        },
        "v_type": "post"
      },
      {
        "v_id": "5",
        "attributes": {
          "postTime": "2011-02-06 01:02:02",
          "subject": "tigergraph"
        },
        "v_type": "post"
      }
    ]},
    {"allPostsLikedOrMade": [
      {
        "v_id": "10",
        "attributes": {
          "postTime": "2011-02-04 03:02:31",
          "subject": "cats"
        },
        "v_type": "post"
      },
      {
        "v_id": "5",
        "attributes": {
          "postTime": "2011-02-06 01:02:02",
          "subject": "tigergraph"
        },
        "v_type": "post"
      },
      {
        "v_id": "8",
        "attributes": {
          "postTime": "2011-02-03 17:05:52",
          "subject": "cats"
        },
        "v_type": "post"
      }
    ]}
  ]
}

This example is another edge selection that uses the "|" operator to select edges that have target vertices of multiple types.

Query
Results

Edge induced SELECT example

# uses a SELECT statement to print out everything related to a given user
# 	this includes posts that the user liked, posts that the user made, and friends
# 	of the user
CREATE QUERY printAllRelatedItems(vertex<person> seed) FOR GRAPH socialNet
{
	sourceVertex = {seed};

	# -- statements produce equivalent output
	# returns all vertices of type either 'person' or 'post' that can be reached
	# 	from the sourceVertex set using one edge of any type
	everythingRelated = SELECT v FROM sourceVertex -(:e)-> (person|post):v;

	# returns all vertices of any type that can be reached from the sourceVertex
	# 	using one edge of any type
	# this statement is equivalent to the above one because the graph schema only
	#	has vertex types of either 'person' or 'post'. if there were more vertex
	#	types present, these would not be equivalent.
	everythingRelated = SELECT v FROM sourceVertex -(:e)-> :v;
	# --

	PRINT everythingRelated;
}

GSQL > RUN QUERY printAllRelatedItems("person2")
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [{"everythingRelated": [
    {
      "v_id": "0",
      "attributes": {
        "postTime": "2010-01-12 11:22:05",
        "subject": "Graphs"
      },
      "v_type": "post"
    },
    {
      "v_id": "person3",
      "attributes": {
        "gender": "Male",
        "id": "person3"
      },
      "v_type": "person"
    },
    {
      "v_id": "person1",
      "attributes": {
        "gender": "Male",
        "id": "person1"
      },
      "v_type": "person"
    },
    {
      "v_id": "3",
      "attributes": {
        "postTime": "2011-02-05 01:02:44",
        "subject": "cats"
      },
      "v_type": "post"
    },
    {
      "v_id": "1",
      "attributes": {
        "postTime": "2011-03-03 23:02:00",
        "subject": "tigergraph"
      },
      "v_type": "post"
    }
  ]}]
}
GSQL > RUN QUERY printAllRelatedItems("person6")
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [{"everythingRelated": [
    {
      "v_id": "person4",
      "attributes": {
        "gender": "Female",
        "id": "person4"
      },
      "v_type": "person"
    },
    {
      "v_id": "10",
      "attributes": {
        "postTime": "2011-02-04 03:02:31",
        "subject": "cats"
      },
      "v_type": "post"
    },
    {
      "v_id": "5",
      "attributes": {
        "postTime": "2011-02-06 01:02:02",
        "subject": "tigergraph"
      },
      "v_type": "post"
    },
    {
      "v_id": "person8",
      "attributes": {
        "gender": "Male",
        "id": "person8"
      },
      "v_type": "person"
    },
    {
      "v_id": "8",
      "attributes": {
        "postTime": "2011-02-03 17:05:52",
        "subject": "cats"
      },
      "v_type": "post"
    }
  ]}]
}

Vertex and Edge Aliases

Vertex and edge aliases are declared within the FROM clause of a SELECT block, by using the character ":", followed by the alias name. Aliases can be accessed anywhere within the same SELECT block. They are used to reference a single selected vertex or edge of a set. It is through the vertex or edge aliases that attributes of these vertices or edges can be accessed.

For example, the following code snippets show two different SELECT statements. The first SELECT statement starts from a vertex set called allVertices, and the vertex alias name v can access each individual vertex from allVertices. The second SELECT statement selects a set of edges. It can use the vertex alias s to reference the source vertices, or the alias t to reference the target vertices.

Vertex variables

results = SELECT v FROM allVertices:v;
results = SELECT t FROM allVertices:s -()-> :t;

The following example shows an edge-based SELECT statement, declaring aliases for all three parts of the edge. In the ACCUM clause, the e and t aliases are assigned to local vertex and edge variables.

Edge variables

results = SELECT v
          FROM allVertices:s -(:e)-> :t
          ACCUM VERTEX v = t, EDGE eg = e;

We strongly suggest that an alias should be declared with every vertex and edge in the FROM clause, as there are several functions and features only available to vertex and edge aliases.

Pattern Matching (SYNTAX v2)

We give a brief overview of Pattern Matching syntax and semantics in the FROM clause, from the perspective of the formal notation and how the rules can be inferred from that. For a more practical explanation of Pattern Matching, we recommend the Pattern Matching Tutorial.

There are three options for the syntax of the pattern in a FROM clause:

fromClause := FROM (step | stepV2 | pathPattern ["," pathPattern]*)

step pattern, SYNTAX v1. This is the default syntax from classic GSQL.
stepV2 pattern, SYNTAX v2. The query or the GSQL session must specify that SYNTAX v2 is to be used. This is still a 1-Hop query, but there are two differences:
- The source vertex set now has the same flexibility as the target vertex. It is not necessary to create a seedSet. Instead, the source can be one or more vertex types, given either statically or dynamically as string parameters, or the special symbol ANY or _.
- The arrowhead -( )-> should not be used between the edge set and the target set. Simple use a dash -( )-.
One or more pathPattern s.

A pathPattern begins with a stepVertexSet and then has one or more hops across and edge set to a target set: -(pathEdgePattern)-setVertexSet.

pathPattern := stepVertexSet ["-" "(" pathEdgePattern ")" "-" stepVertexSet]*

We have already studied stepVertexSet. Now we will look at the options for pathEdgePattern and their meanings.

pathEdgePattern := atomicEdgePattern
                 | "(" pathEdgePattern ")"
                 | pathEdgePattern "." pathEdgePattern
                 | disjPattern
                 | starPattern

atomicEdgePattern  := atomicEdgeType
        	        | atomicEdgeType ">"
        	        | "<" atomicEdgeType

atomicEdgeType := "_" | ANY | edgeSetType
edgeSetType := edgeType | paramName | globalAccumName

The most basic form for a pathEdgePattern is an atomicEdgePattern. This in turn can be, similar to stepVertexSet, one of the following:

"_" or "ANY"
an edgeType, a string parameter, or a global SetAccum accumulator.

Edge Direction

Moreover, an atomicEdgePattern can have either a left pointer "<" on the left or a right pointer ">" on the right. These indicate edge direction, of course. If no pointer is used, then the edge is undirected. Suppose we have 3 edge types or parameters called A, B, C.

A> is a rightward facing A edge
<B is a leftward facing B edge
C is an undirected C edge. If C is actually a directed edge type, then there is no match.

Disjunction, Repeats, and Dot Concatenation

Looking at the remaining options for pathEdgePattern, we see that we can have parentheses around it, we can use a dot "." between two pathEdgePatterns, or we can have a disjPattern or starPattern.

disjPattern is how we write "either this edge pattern or that edge pattern""

disjPattern := atomicEdgePattern ("|" atomicEdgePattern)*

For example, we can combine the three direction-specfic examples from above: (A> | <B | C)

starPattern explains how the Kleene star and min..max range specifiers can be used to say "repeat this edge pattern from min to max times."

starPattern := ([atomicEdgePattern] | "(" disjPattern ")") "*" [starBounds]

starBounds := CONST_INT ".." CONST_INT
            | CONST_INT ".."
            | ".." CONST_INT
            | CONST_INT

For example,(A> | <B | C)*2..4 means "a series of 2 to 4 adjacent edges, where each edge has type A>, <B, or C. Note that we do not have to use the same edge type for all the repeats. Either iteration can select an option from disjPattern.

The dot operator means concatenate the two edge patterns into one. Naturally, there must be a vertex joining the two edges, but it is omitted from the syntax. The dot operator is a shorthand, when you don’t care about the type of that intermediate vertex.
(A>.<B.C) means a series of 3 edges, having the specifying types and directions.

Conjunctive Pattern Matching

The optional repeating phrase ["," pathPattern]* allows you to have multiple pathPatterns. They form a conjunction, meaning all of them must be satisfied in order to have a valid match result.

fromClause := FROM (step | stepV2 | pathPattern ["," pathPattern]*)

Recall that each step pattern or path pattern forms a match table, one row per matching path in the graph. Each vertex alias or edge alias is one column in the table. When we have a conjunctive path, each path must share at least one vertex alias with another path. This enables the two path sets (and match tables) to be joined. Formally, we make the natural join of the two tables.

This explains all of the syntax for the FROM clause with SYNTAX v2 (Pattern Matching). The other significant area to consider for SYNTAX v2 is the ACCUM and POST-ACCUM clauses.

SAMPLE Clause

The SAMPLE clause is an optional clause that selects a uniform random sample from the population of edges or target vertices specified in the FROM argument.

If you want to sample from a set of vertices directly, not from edges or from neighboring (target) vertices, then the following technique is simpler and faster:

Select k random vertices from a vertex set S

random = SELECT s
         FROM S:s
         LIMIT k;

The SAMPLE clause draws from the edge population consisting of those edges which satisfy all three parts — source set, edge type, and target type — of the FROM clause. The SAMPLE clause is intended to provide a representative sample of the distribution of edges (or vertices) connected to hub vertices, instead of dealing with all edges. A hub vertex is a vertex with a relatively high degree. (The degree of a vertex is the number of edges which connect to it. If edges are directional, one can distinguish between indegree and outdegree.)

EBNF for Sample Clause

sampleClause := SAMPLE ( expr | expr "%" ) EDGE WHEN condition # sample an absolute number (or a percentage) of edges for each source vertex.
              | SAMPLE expr TARGET WHEN condition              # sample an absolute number of edges incident to each target vertex.
              | SAMPLE expr "%" TARGET PINNED WHEN condition   # sample a percentage of edges incident to each target vertex.

The expression following SAMPLE specifies the sample size, either an absolute number or a percentage of the population. The expression in sampleClause must evaluate to a positive integer. There are two sampling methods. One is sampling based on edge id. The other is based on target vertex id: if a target vertex id is sampled, all edges from this source vertex to the sampled target vertex are sampled.

Note: Currently, the WHEN condition that can be used with a SAMPLE clause is limited strictly to checking if the result of a function call on a vertex is greater than or greater than/equal to some number.

Given that the sampling is random, some of the details of each of the example queries may change each time they are run.

The following query displays two modes of sampling: an absolute number of edges from a source vertex and a percentage of edges fro a source vertex. We use the computerNet graph (see Appendix D). In computerNet, there are 31 vertices and 43 edges, but only 7 vertices are source vertices. Moreover, c1, c12, and c23 are hub nodes, with at least 10 outgoing edges each. For the absolute count case, we set the size to 1 edge per source vertex, which is equivalent to a random walk. We expect exactly 7 edges to be selected. For the percentage sampling case, we sample 33% of the edges for vertices which have 3 or more outgoing edges. We expect about 15 edges, but the number may vary.

Query
Results

sampleEx3: SAMPLE based on edges per source vertex

CREATE QUERY sampleEx3() FOR GRAPH computerNet
{
    MapAccum<STRING,ListAccum<STRING>> @@absEdges; // record each selected edge as (src->tgt)
    SumAccum<INT> @@totalAbs;
    MapAccum<STRING,ListAccum<STRING>> @@pctEdges; // record each selected edge as (src->tgt)
    SumAccum<INT> @@totalPct;

    start = {computer.*};

    # Sample one outgoing edge per source vertex = Random Walk
    absSample = SELECT v FROM start:s -(:e)-> :v
             SAMPLE 1 EDGE WHEN s.outdegree() >= 1    # sample 1 target vertex from each source vertex
             ACCUM @@absEdges += (s.id -> v.id),
                   @@totalAbs += 1;
    PRINT @@totalAbs, @@absEdges;

    pctSample = SELECT v FROM start:s -(:e)-> :v
             SAMPLE 33% EDGE WHEN s.outdegree() >= 3  # select ~1/3 of edges when outdegree >= 3
             ACCUM @@pctEdges += (s.id -> v.id),
                   @@totalPct += 1;
    PRINT @@totalPct, @@pctEdges;
}

sampleEx3.json

GSQL > RUN QUERY sampleEx3()
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [
    {
      "@@totalAbs": 7,
      "@@absEdges": {
        "c4": ["c23"],
        "c11": ["c12"],
        "c10": ["c11"],
        "c12": ["c14"],
        "c23": ["c26"],
        "c14": ["c24"],
        "c1": ["c10"]
      }
    },
    {
      "@@totalPct": 13,
      "@@pctEdges": {
        "c4": ["c23"],
        "c11": ["c12"],
        "c10": ["c11"],
        "c12": [
          "c14",
          "c15",
          "c19"
        ],
        "c23": [
          "c29",
          "c25"
        ],
        "c14": [
          "c24",
          "c23"
        ],
        "c1": [
          "c3",
          "c8",
          "c2"
        ]
      }
    }
  ]
}

Below is an example of using SELECT to only traverse one edge for each source vertex. The vertex-attached accumulators @timesTraversedNoSample and @timesTraversedWithSample are used to keep track of the number of times an edge is traversed to reach the target vertex. Without using sampling, this occurs once for each edge; thus @timesTraversedNoSample has the same number as the in-degree of the vertex. With sampling edges, the number of edges is restricted. This is reflected in the @timesTraversedWithSample accumulator. Notice the difference in the result set. Because only one edge per source vertex is traversed when the SAMPLE clause is used, not all target vertices are reached. The vertex company3 has 3 incident edges, but in one instance of the query execution, it is never reached. Additionally, company2 has 6 incident edges, but only 4 source vertices sampled an edge incident to company2 .

Query
Results

example of SAMPLE using an absolute number of edges

CREATE QUERY sampleEx1() FOR GRAPH workNet
{
	SumAccum<INT> @timesTraversedNoSample;
	SumAccum<INT> @timesTraversedWithSample;
	workers = {person.*};

	# the 'beforeSample' result set encapsulates the normal functionality of
	# a SELECT statement, where 'timesTraversedNoSample' vertex accumulator is increased for
	# each edge incident to the vertex.
	beforeSample = SELECT v FROM workers:t -(:e)-> :v
		       ACCUM v.@timesTraversedNoSample += 1;

	# The 'afterSample' result set is formed by those vertices which can be
	# reached when for each source vertex, only one edge is used for traversal.
	# This is demonstrated by the values of 'timesTraversedWithSample' vertex accumulator, which
	# is increased for each edge incident to the vertex which is used in the
	# sample.
	afterSample = SELECT v FROM workers:t -(:e)-> :v
		      SAMPLE 1 EDGE WHEN t.outdegree() >= 1		# only use 1 edge from the source vertex
		      ACCUM v.@timesTraversedWithSample += 1;

	PRINT beforeSample;
	PRINT afterSample;
}

sampleEx1.json

GSQL > RUN QUERY sampleEx1()
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [
    {"beforeSample": [
      {
        "v_id": "company4",
        "attributes": {
          "country": "us",
          "@timesTraversedNoSample": 1,
          "@timesTraversedWithSample": 1,
          "id": "company4"
        },
        "v_type": "company"
      },
      {
        "v_id": "company5",
        "attributes": {
          "country": "can",
          "@timesTraversedNoSample": 1,
          "@timesTraversedWithSample": 1,
          "id": "company5"
        },
        "v_type": "company"
      },
      {
        "v_id": "company3",
        "attributes": {
          "country": "jp",
          "@timesTraversedNoSample": 3,
          "@timesTraversedWithSample": 3,
          "id": "company3"
        },
        "v_type": "company"
      },
      {
        "v_id": "company2",
        "attributes": {
          "country": "chn",
          "@timesTraversedNoSample": 6,
          "@timesTraversedWithSample": 4,
          "id": "company2"
        },
        "v_type": "company"
      },
      {
        "v_id": "company1",
        "attributes": {
          "country": "us",
          "@timesTraversedNoSample": 6,
          "@timesTraversedWithSample": 3,
          "id": "company1"
        },
        "v_type": "company"
      }
    ]},
    {"afterSample": [
      {
        "v_id": "company4",
        "attributes": {
          "country": "us",
          "@timesTraversedNoSample": 1,
          "@timesTraversedWithSample": 1,
          "id": "company4"
        },
        "v_type": "company"
      },
      {
        "v_id": "company5",
        "attributes": {
          "country": "can",
          "@timesTraversedNoSample": 1,
          "@timesTraversedWithSample": 1,
          "id": "company5"
        },
        "v_type": "company"
      },
      {
        "v_id": "company3",
        "attributes": {
          "country": "jp",
          "@timesTraversedNoSample": 3,
          "@timesTraversedWithSample": 3,
          "id": "company3"
        },
        "v_type": "company"
      },
      {
        "v_id": "company2",
        "attributes": {
          "country": "chn",
          "@timesTraversedNoSample": 6,
          "@timesTraversedWithSample": 4,
          "id": "company2"
        },
        "v_type": "company"
      },
      {
        "v_id": "company1",
        "attributes": {
          "country": "us",
          "@timesTraversedNoSample": 6,
          "@timesTraversedWithSample": 3,
          "id": "company1"
        },
        "v_type": "company"
      }
    ]}
  ]
}

Since the PRINT statements are placed at the end of query, the two vertex sets beforeSample and afterSample are almost identical, showing the final values of both accumulators@timesTraversedNoSample and @timesTraversedWithSample. There is one difference: company3 is not included in afterSample because none of the sample-selected edges reached company3.

WHERE Clause

The WHERE clause is an optional clause that constrains edges and vertices specified in the FROM and SAMPLE clauses.

EBNF for Where Clause

whereClause := WHERE condition

The WHERE clause uses a boolean condition to test each vertex or edge in the FROM set (or the sampled vertex and edge sets, if the SAMPLE clause was used).

If the expression evaluates to false for vertex/edge X, then X excluded from further consideration in the result set. The expression may use constants or any variables or parameters within the scope of the SELECT, arithmetic operators (+, -, , /,%), comparison operators (==, !=, <, ⇐, >,>=), boolean operators (AND, OR, NOT), set operators (IN, NOT IN) and parentheses to enforce precedence. The WHERE conditional expression may use any of the variables within its scope (global accumulators, vertex set variables, query input parameters, the FROM clause’s vertex and edge sets (or their vertex and edge aliases), or any of the attributes or accumulators of the vertex/edge sets.) For a more formal explanation of condition, see the EBNF definitions of *condition and expr.

Using built-in vertex and edge attributes and functions, such as .type and .neighbors(), the WHERE clause can be used to implement sophisticated selection rules for the edge traversal. In the following example, the selection conditions are completely specified in the WHERE clause, with no edge types or vertex types mentioned in the FROM clause.

WHERE used as a filter

resultSet1 = SELECT v FROM S:v-((E1|E2|E3):e)->(V1|V2):t;
resultSet2 = SELECT v FROM S:v-(:e)->:t
					 WHERE t.type IN ("V1", "V2") AND
						   t IN v.neighbors("E1|E2|E3")

The following examples demonstrate using the WHERE clause to limit the resulting vertex set based on a vertex attribute.

Query
Results

Basic SELECT WHERE

CREATE QUERY printCatPosts() FOR GRAPH socialNet {
	posts = {post.*};
	catPosts = SELECT v FROM posts:v		# select only those post vertices
                WHERE v.subject == "cats";  # which have a subset of 'cats'
	PRINT catPosts;
}

Results for Query printCatPosts

GSQL > RUN QUERY printCatPosts()
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [{"catPosts": [
    {
      "v_id": "10",
      "attributes": {
        "postTime": "2011-02-04 03:02:31",
        "subject": "cats"
      },
      "v_type": "post"
    },
    {
      "v_id": "9",
      "attributes": {
        "postTime": "2011-02-05 23:12:42",
        "subject": "cats"
      },
      "v_type": "post"
    },
    {
      "v_id": "3",
      "attributes": {
        "postTime": "2011-02-05 01:02:44",
        "subject": "cats"
      },
      "v_type": "post"
    },
    {
      "v_id": "11",
      "attributes": {
        "postTime": "2011-02-03 01:02:21",
        "subject": "cats"
      },
      "v_type": "post"
    },
    {
      "v_id": "8",
      "attributes": {
        "postTime": "2011-02-03 17:05:52",
        "subject": "cats"
      },
      "v_type": "post"
    }
  ]}]
}

Query
Results

SELECT WHERE using IN operator

CREATE QUERY findGraphFocusedPosts() FOR GRAPH socialNet
{
	posts = {post.*};
	results = SELECT v FROM posts:v					# select only post vertices
		WHERE v.subject IN ("Graph", "tigergraph");	# which have a subject of either 'Graph' or 'tigergraph'
	PRINT results;
}

Results for Query findGraphFocusedPosts

GSQL > RUN QUERY findGraphFocusedPosts()
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [{"results": [
    {
      "v_id": "5",
      "attributes": {
        "postTime": "2011-02-06 01:02:02",
        "subject": "tigergraph"
      },
      "v_type": "post"
    },
    {
      "v_id": "1",
      "attributes": {
        "postTime": "2011-03-03 23:02:00",
        "subject": "tigergraph"
      },
      "v_type": "post"
    },
    {
      "v_id": "6",
      "attributes": {
        "postTime": "2011-02-05 02:02:05",
        "subject": "tigergraph"
      },
      "v_type": "post"
    }
  ]}]
}

WHERE NOT limitations

The NOT operator may not be used in combination with the .type attribute selector. To check if an edge or vertex type is not equal to a given type, use the != operator. See the example below.

The following example shows the equivalence of using WHERE as a type filter as well as its limitations.

Query
Results

SELECT WHERE using AND/OR

# finds female person in the social network. all of the following statements
# are equivalent (i.e., produce the same results)
CREATE QUERY findFemaleMembers() FOR GRAPH socialNet
{
	allVertices = {ANY}; # includes all posts and person
	females = SELECT v FROM allVertices:v
		  WHERE v.type   == "person" AND
		  	    v.gender != "Male";

	females = SELECT v FROM allVertices:v
		  WHERE v.type   == "person" AND
		  	    v.gender == "Female";

	females = SELECT v FROM allVertices:v
		  WHERE v.type       == "person" AND
		  	    NOT v.gender == "Male";

	females = SELECT v FROM allVertices:v
		  WHERE v.type       != "post" AND
		  	    NOT v.gender == "Male";

  	# does not compile. cannot use NOT operator in combination with type attribute
	#females = SELECT v FROM allVertices:v
	#	  WHERE NOT v.type   != "person" AND
	#	  	    NOT v.gender == "Male";

  	# does not compile. cannot use NOT operator in combination with type attribute
	#females = SELECT v FROM allVertices:v
	#	  WHERE NOT v.type   == "post" AND
	#	  	    NOT v.gender == "Male";

	personVertices = {person.*};
	females = SELECT v FROM personVertices:v
		   WHERE NOT v.gender == "Male";

	females = SELECT v FROM personVertices:v
		   WHERE v.gender != "Male";

	females = SELECT v FROM personVertices:v
		   WHERE v.gender != "Male" AND true;

	females = SELECT v FROM personVertices:v
		   WHERE v.gender != "Male" OR false;

	PRINT females;
}

Results for Query findFemaleMembers

GSQL > RUN QUERY findFemaleMembers()
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [{"females": [
    {
      "v_id": "person4",
      "attributes": {
        "gender": "Female",
        "id": "person4"
      },
      "v_type": "person"
    },
    {
      "v_id": "person5",
      "attributes": {
        "gender": "Female",
        "id": "person5"
      },
      "v_type": "person"
    },
    {
      "v_id": "person2",
      "attributes": {
        "gender": "Female",
        "id": "person2"
      },
      "v_type": "person"
    }
  ]}]
}

The following example uses edge attributes to determine which workers are registered as full time for some company.

Query
Results

WHERE using edge attributes

# find all workers who are full time at some company
CREATE QUERY fullTimeWorkers() FOR GRAPH workNet
{
	start = {person.*};
	fullTimeWorkers = SELECT v FROM start:v -(worksFor:e)-> company:t
			WHERE e.fullTime;	# fullTime is a boolean attribute on the edge

	PRINT fullTimeWorkers;
}

fullTimeWorkers Results

GSQL > RUN QUERY fullTimeWorkers()
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [{"fullTimeWorkers": [
    {
      "v_id": "person4",
      "attributes": {
        "interestList": ["football"],
        "skillSet": [ 10, 1, 4 ],
        "skillList": [ 4, 1, 10 ],
        "locationId": "us",
        "interestSet": ["football"],
        "id": "person4"
      },
      "v_type": "person"
    },
    {
      "v_id": "person11",
      "attributes": {
        "interestList": [ "sport", "football" ],
        "skillSet": [10],
        "skillList": [10],
        "locationId": "can",
        "interestSet": [ "football", "sport" ],
        "id": "person11"
      },
      "v_type": "person"
    },
    {
      "v_id": "person10",
      "attributes": {
        "interestList": [ "football", "sport" ],
        "skillSet": [3],
        "skillList": [3],
        "locationId": "us",
        "interestSet": [ "sport", "football" ],
        "id": "person10"
      },
      "v_type": "person"
    },
    {
      "v_id": "person1",
      "attributes": {
        "interestList": [ "management", "financial" ],
        "skillSet": [ 3, 2, 1 ],
        "skillList": [ 1, 2, 3 ],
        "locationId": "us",
        "interestSet": [ "financial", "management" ],
        "id": "person1"
      },
      "v_type": "person"
    },
    {
      "v_id": "person6",
      "attributes": {
        "interestList": [ "music", "art" ],
        "skillSet": [ 10, 7 ],
        "skillList": [ 7, 10 ],
        "locationId": "jp",
        "interestSet": [ "art", "music" ],
        "id": "person6"
      },
      "v_type": "person"
    },
    {
      "v_id": "person2",
      "attributes": {
        "interestList": ["engineering"],
        "skillSet": [ 6, 5, 3, 2 ],
        "skillList": [ 2, 3, 5, 6 ],
        "locationId": "chn",
        "interestSet": ["engineering"],
        "id": "person2"
      },
      "v_type": "person"
    },
    {
      "v_id": "person8",
      "attributes": {
        "interestList": ["management"],
        "skillSet": [ 2, 5, 1 ],
        "skillList": [ 1, 5, 2 ],
        "locationId": "chn",
        "interestSet": ["management"],
        "id": "person8"
      },
      "v_type": "person"
    },
    {
      "v_id": "person12",
      "attributes": {
        "interestList": [
          "music",
          "engineering",
          "teaching",
          "teaching",
          "teaching"
        ],
        "skillSet": [ 2, 5, 1 ],
        "skillList": [ 1, 5, 2, 2, 2 ],
        "locationId": "jp",
        "interestSet": [ "teaching", "engineering", "music" ],
        "id": "person12"
      },
      "v_type": "person"
    },
    {
      "v_id": "person3",
      "attributes": {
        "interestList": ["teaching"],
        "skillSet": [ 6, 1, 4 ],
        "skillList": [ 4, 1, 6 ],
        "locationId": "jp",
        "interestSet": ["teaching"],
        "id": "person3"
      },
      "v_type": "person"
    },
    {
      "v_id": "person9",
      "attributes": {
        "interestList": [ "financial", "teaching" ],
        "skillSet": [ 2, 7, 4 ],
        "skillList": [ 4, 7, 2 ],
        "locationId": "us",
        "interestSet": [ "teaching", "financial" ],
        "id": "person9"
      },
      "v_type": "person"
    }
  ]}]
}

If multiple edge types are specified in edge-induced selection, the WHERE clause should use OR to separate each edge type or each target vertex type. For example,

Multiple Edge Type WHERE clause

CREATE QUERY multipleEdgeTypeWhereEx(vertex<person> m1) FOR GRAPH socialNet {
  allUser = {m1};
  FilteredUser = SELECT s
      FROM allUser:s - ((posted|liked|friend):e) -> (post|person):t
      # WHERE e.actionTime > epoch_to_datetime(1) AND t.gender == "Male";
      WHERE ( e.type == "liked" AND e.actionTime > epoch_to_datetime(1) ) OR
            ( e.type == "friend" AND t.gender == "Male" )
            ;
  PRINT FilteredUser;
}

The above query is compilable. However, if we use line 5 as the WHERE clause instead, the query is not compilable. The edge-type conflict checking detects an error, because i t uses attributes from both "liked" edges and "friend" edges without separating them out by OR.

ACCUM and POST-ACCUM Clauses

The optional ACCUM and POST-ACCUM clauses enable sophisticated aggregation and other computations across the set of vertices or edges selected by the preceding FROM, SAMPLE, and WHERE clauses. A query can contain one or both of these clauses. The statements in an ACCUM clause are applied for every edge in an edge-induced selection or every vertex in a vertex-induced selection.

If there is more than one statement in the ACCUM clause, the statements are separated by commas and executed sequentially for each selected element. However, the TigerGraph system uses parallelism to improve performance. Within an ACCUM clause, each edge is handled by a separate process. As such, there is no fixed order in which the edges are processed within the ACCUM clause and the edges should not be treated as executing sequentially. The accumulators are mutex variables shared among each of these processes. The results of any accumulation within the ACCUM clause is not complete until all edges are traversed. Any inspection of an intermediate result within the ACCUM is incomplete and may not be that meaningful.

The statements within the ACCUM clause are executed sequentially for a given vertex or edge. However, there is no fixed order in which a vertex set or edge set is processed.

The optional POST-ACCUM clause enables aggregation and other computations across the set of vertices (but not edges) selected by the preceding clauses. POST-ACCUM can be used without ACCUM. If it is preceded by an ACCUM clause, then it can be used for 2-stage accumulative computation: a first stage in ACCUM followed by a second stage in POST-ACCUM.

Each statement within the POST-ACCUM clause can refer to either source vertices or target vertices but not both.

Since the ACCUM clause iterates over edges, and often two edges will connect to the same source vertex or to the same target vertex, the ACCUM clause can be repeated multiple times for one vertex.

Operations that are to be performed exactly once per vertex should be performed in the POST-ACCUM clause.

The primary purpose of the ACCUM or POST-ACCUM clause is to collect information about the graph by updating accumulators (via += or =). See the "Accumulator" section for details on the += operation. However, other kinds of statements (e.g., branching, iteration, local assignments) are permitted to support more complex computations or to log activity. The EBNF syntax below defines the allowable kinds of statements that can occur within an ACCUM or POST-ACCUM. The dmlSubStmt list is similar to the queryBodyStmt list which applies to statements outside of a SELECT block; it is important to note the differences. Each of these statement types is discussed in one of the main sections of this reference document.

EBNF for ACCUM and POST-ACCUM Clauses

accumClause := [perClauseV2] ACCUM dmlSubStmtList

perClauseV2 := PER "(" alias ["," alias] ")"

postAccumClause := POST-ACCUM dmlSubStmtList

dmlSubStmtList := dmlSubStmt ["," dmlSubStmt]*

dmlSubStmt := assignStmt           // Assignment
            | funcCallStmt         // Function Call
            | gAccumAccumStmt      // Assignment
            | lAccumAccumStmt      // Assignment
            | attrAccumStmt        // Assignment
            | vAccumFuncCall       // Function Call
            | localVarDeclStmt     // Declaration
            | dmlSubCaseStmt       // Control Flow
            | dmlSubIfStmt         // Control Flow
            | dmlSubWhileStmt      // Control Flow
            | dmlSubForEachStmt    // Control Flow
            | BREAK                // Control Flow
            | CONTINUE             // Control Flow
            | insertStmt           // Data Modification
            | dmlSubDeleteStmt     // Data Modification
            | printlnStmt          // Output
            | logStmt              // Output

Note that dml-sub-statements do not include global accumulator assignment statement (gAccumAssignStmt) but global accumulator accumulation statement (gAccumAccumStmt). Global accumulators may perform accumulation += but not assignment "=" within these clauses.

There are additional restrictions on dml-sub level statements:

Global variable assignment is permitted in ACCUM or POST-ACCUM clauses, but the change in value will not take place until the query completes. Therefore, if there are multiple assignment statements for the same variable, only the final one will take effect.
Vertex attribute assignment "=" is not permitted in an ACCUM clause. However, edge attribute assignment is permitted. This is because the ACCUM clause iterates over an edge set. Vertex attribute attribute assignment is permitted in the POST-ACCUM clause. Like all updates, the change in value does not take place until the query completes.

Aliases and ACCUM/POST-ACCUM Iteration Model

To reference each element of the selected set, use the aliases defined in the FROM clause. For example, assume that we have the following aliases:

Example of vertex and edge aliases

FROM source:s -(edgeTypes:e)-> targetTypes:t # edge-induced selection
FROM source:v                                # vertex-induced selection

Let (V1, V2,… Vn) be the vertices in the vertex-induced selection . The following pseudocode emulates ACCUM clause behavior.

Model for ACCUM behavior in vertex-induced selection

FOREACH v in (V1,V2,...Vn) DO # iterations may occur in parallel, in unknown order
  dmlSubStmts referencing v
DONE

Let E = (E1, E2,… En) be the edges in the edge-induced selected set. Further, let S = (S1,S1,…Sn) and T= (T1,T2,…Tn) be the multisets (bags) of source vertices and target vertices which correspond to the edge set. S and T are bags, because they can contain repeated elements.

Model for ACCUM behavior in edge-induced selection

FOREACH i in (1..n) DO # iterations may occur in parallel, in unknown order
  dmlSubStmts referencing e, s, t, which really means e_i, s_i, t_i
DONE

Note that any reference to the source alias s or target alias t is for the endpoint vertices of the current edge.

Similarly, the POST-ACCUM clause acts like a FOREACH loop on the vertex result set specified in the SELECT clause (e.g., either S or T).

Pattern Matching ACCUM and POST-ACCUM

In Pattern Matching (SYNTAX v2), we may have a multi-step pattern, with several vertex and edge aliases. Each statement (dml-sub-statement) can refer to one or more aliases. There are no restrictions, though complex expressions and accessing multiple aliases may degrade performance.

The PER clause is available only in SYNTAX v2. It is an optional prefix to the ACCUM clause, affecting only that clause. The PER clause allows the user to specify that they wish to aggregate the match table, so that there is one row PER <alias>. For more information see PER Clause section in the Pattern Matching tutorial.

Multiple POST-ACCUM clauses are supported in TG 3.0+. In SYNTAX v2, Each POST-ACCUM may refer to only one vertex alias. See the POST-ACCUM section in the Pattern Matching tutorial.

Edge/Vertex Type Inference and Conflict

If multiple edge types are specified in edge-induced selection, each ACCUM statement in ACCUM clause checks whether edge types are conflicted. If only a subset of edge types are effective in an ACCUM statement , this statement is not executed on other edge types. For example:

Multiple Edge Type ACCUM statement check

CREATE QUERY multipleEdgeTypeCheckEx(vertex<person> m1) FOR GRAPH socialNet {
  ListAccum<STRING> @@testList1, @@testList2, @@testList3;
  allUser = {m1};
  allUser = SELECT s
         FROM allUser:s - ((posted|liked|friend):e) -> (post|person):t
         ACCUM @@testList1 += to_string(datetime_to_epoch(e.actionTime))
              ,@@testList2 += t.gender
             #,@@testList3 += to_string(datetime_to_epoch(e.actionTime)) + t.gender # illegal
               ;
  PRINT @@testList1, @@testList2, @@testList3;
}

In the above example, line 6 is only executed on "liked" edges, because "actionTime" is the attribute of "liked" edge only. Similarly, line 7 is only executed on "friend" edges, because "gender" is the attribute of "person" only, and only "friend" edge uses "person" as target vertex. However, line 8 causes a compilation error, because it uses multiple edges where some edges cannot be supported in a part of the statement, i.e., "liked" edges doesn’t have t.gender, "friend" edges doesn’t have e.actionTime.

We strongly suggest that if multiple edge types are specified in edge-induced selection, ACCUM clauses should uses CASE statement (see Section "Control Flow Statements" for more details) to separate the operation on each edge type or each target vertex type (or combination of target vertex type and edge type). The edge-type conflict checking then checks the ACCUM statement inside each THEN/ELSE blocks based on the condition. For example,

Multiple Edge Type ACCUM statement check 2

CREATE QUERY multipleEdgeTypeCheckEx2(vertex<person> m1) FOR GRAPH socialNet {
  ListAccum<STRING> @@testList1;
  allUser = {m1};
  allUser = SELECT s
            FROM allUser:s - ((posted|liked|friend):e) -> (post|person):t
            ACCUM CASE
                    WHEN e.type == "liked" THEN    # for liked edges
                      @@testList1 += to_string(datetime_to_epoch(e.actionTime))
                    WHEN e.type == "friend" THEN   # for friend edges
                      @@testList1 += t.gender
                    ELSE      # For the remained edge type, which is posted edges
                      @@testList1 += to_string(datetime_to_epoch(t.postTime))
                  END
            ;
  PRINT @@testList1;
}

The above query is compilable. However, if we switch line 8 and line 10, the edge-type conflict checking generates errors because "liked" edges doesn’t support t.gender and "friend" edges doesn’t support e.actionTime.

Similar to the ACCUM clause, if multiple source/target vertex types are specified in edge-induced selection and the POST-ACCUM clauses accesses source/target vertex, each ACCUM statement in POST-ACCUM clause checks whether source/target vertex types are conflicted. If only a subset of source/target vertex types are effective in a POST-ACCUM statement, this statement is not executed on other source/target vertex types.

Similar to ACCUM clause, we strongly suggest that if multiple source/target vertex types are specified in edge-induced selection and the POST-ACCUM clauses accesses source/target vertex, POST-ACCUM clauses should uses CASE statement (see Section "Control Flow Statements" for more details) to separate the operation on each source/target vertex type. The vertex type conflict checking then checks the ACCUM statement inside each THEN/ELSE blocks based on the condition.

Rules for Updating Vertex-Attached Accumulators

Prior to v1.0, a vertex-attached accumulator could only be updated in an ACCUM or POST-ACCUM clause and only if its vertex was selected for by the preceding FROM-SAMPLE-WHERE clauses.

Beginning in v1.0, there are additional circumstances where a vertex-attached accumulator may be updated. Vertices which are referenced via a vertex-attached accumulator of a selected vertex may have their vertex-attached accumulators updated in the ACCUM clause (but not in the POST-ACCUM clause). That is, a vertex referenced by an selected vertex can be updated, with some limitations explained below. Some examples will help to illustrate this more complex condition.

Suppose a query declares a vertex-attached accumulator which holds vertex information . We call this a vertex-holding accumulator . This could take several forms:
- A scalar accumulator, e.g., MaxAccum< VERTEX > @maxV;
- A collection accumulator: e.g., ListAccum< VERTEX > @listV;
- An accumulator containing tuple(s), where the tuple type contains a VERTEX field.
If a vertex V is selected, then not only can V’s accumulators be updated, but the vertices stored in its vertex-holding accumulators can also be updated, in the ACCUM clause.
Before these indirectly referenced vertices can be used, they need to be activated . There are two ways to activate an indirect vertex:
- A vertex from a vertex-holding accumulator is first assigned to a local vertex variable. The vertex can now be updated through the local vertex variable.

ACCUM
  VERTEX<person> mx = tgt.@maxV,   # assign to local variable
          mx.@curId += src.id      # access via local variable

A FOREACH loop can iterate on a vertex-holding collection accumulator. The vertices can now be updated through the loop variable.

ACCUM
  FOREACH vtx IN src.@setIds DO   # iterate on collection accumulator
      vtx.@curId += tgt.id        # access via loop variable
  END

The following uses are NOT supported by the new rules:

Indirectly activated vertices may not be updated in the POST-ACCUM clause or outside of a SELECT statement.
Passing a vertex into the query as an input parameter is not a route to activation.
Using a global vertex-holding accumulator is not a route to activation.
If a vertex is being indirectly activated by assigning it to a local variable (e.g., a variable declaring in ACCUM or POST-ACCUM), note the following rule, which always applies to all local variables:
- A local variable can be declared and initialized in an ACCUM block once. It cannot be redeclared or reassigned later in the ACCUM block.

The following query demonstrates updates to indirectly activated vertices.

Query
Results

Updating an Indirectly-Referenced Vertex

CREATE QUERY vUpdateIndirectAccum() FOR GRAPH socialNet {

  SetAccum<VERTEX<person>> @posters;
  SetAccum<VERTEX<person>> @fellows;

   Persons = {person.*};
   # To each post, attach a list of persons who liked the post
   likedPosts = SELECT p
       FROM Persons:src -(liked:e)-> post:p
       ACCUM
       	p.@posters += src;

    # To each person who liked a post, attach a list of everyone
    # who also liked one of this person's liked posts.
	likedPosts = SELECT src
		FROM likedPosts:src
		ACCUM
		  FOREACH v IN src.@posters DO
		    v.@fellows += src.@posters
		  END
        ORDER BY src.subject;

	PRINT Persons[Persons.@fellows];
}

Results from Query vUpdateIndirectAccums

GSQL > RUN QUERY vUpdateIndirectAccess()
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [{"Persons": [
    {
      "v_id": "person4",
      "attributes": {"Persons.@fellows": [
        "person8",
        "person4"
      ]},
      "v_type": "person"
    },
    {
      "v_id": "person3",
      "attributes": {"Persons.@fellows": [ "person2", "person1", "person3" ]},
      "v_type": "person"
    },
    {
      "v_id": "person7",
      "attributes": {"Persons.@fellows": ["person7"]},
      "v_type": "person"
    },
    {
      "v_id": "person1",
      "attributes": {"Persons.@fellows": [ "person2", "person1", "person3" ]},
      "v_type": "person"
    },
    {
      "v_id": "person5",
      "attributes": {"Persons.@fellows": ["person5"]},
      "v_type": "person"
    },
    {
      "v_id": "person6",
      "attributes": {"Persons.@fellows": ["person6"]},
      "v_type": "person"
    },
    {
      "v_id": "person2",
      "attributes": {"Persons.@fellows": [ "person2", "person1", "person3" ]},
      "v_type": "person"
    },
    {
      "v_id": "person8",
      "attributes": {"Persons.@fellows": [ "person8", "person4" ]},
      "v_type": "person"
    }
  ]}]
}

ACCUM and POST-ACCUM Examples

We now show several examples. This example demonstrates how ACCUM or POST-ACCUM can be used to count the number of vertices in the given set.

Query
Results

Accum and PostAccum Semantics

#Show Accum PostAccum Behavior
CREATE QUERY accumPostAccumSemantics() FOR GRAPH workNet {

  SumAccum<INT> @@vertexOnlyAccum;
  SumAccum<INT> @@vertexOnlyPostAccum;

  SumAccum<INT> @@vertexOnlyWhereAccum;
  SumAccum<INT> @@vertexOnlyWherePostAccum;

  SumAccum<INT> @@sourceWithEdgeAccum;
  SumAccum<INT> @@sourceWithEdgePostAccum;

  SumAccum<INT> @@targetWithEdgeAccum;
  SumAccum<INT> @@targetWithEdgePostAccum;

  #Seed start set with all company vertices
  start = {company.*};

  #Select all vertices in source set start
  selectVertexSet = SELECT v from start:v
					#Happens once for each vertex discovered
					ACCUM @@vertexOnlyAccum += 1

					#Happens once for each vertex in the result set "v"
					POST-ACCUM @@vertexOnlyPostAccum += 1;

  #Select all vertices in source set start with a where constraint
  selectVertexSetWhere = SELECT v from start:v WHERE (v.country == "us")
						#Happens once for each vertex discovered that also
						# meets the constraint condition
						ACCUM @@vertexOnlyWhereAccum += 1

						#Happens once for each vertex in the result set "v"
						POST-ACCUM @@vertexOnlyWherePostAccum += 1;

  #Select all source "s" vertices in set start and explore all "worksFor" edge paths
  selectSourceWithEdge = SELECT s from start:s -(worksFor)-> :t
		  		       	 #Happens once for each "worksFor" edge discovered
						 ACCUM @@sourceWithEdgeAccum += 1

						#Happens once for each vertex in result set "s" (source)
						POST-ACCUM @@sourceWithEdgePostAccum += 1;

  #Select all target "t" vertices found from exploring all "worksFor" edge paths from set start
  selectTargetWithEdge = SELECT t from start:s -(worksFor)-> :t
						 #Happens once for each "worksFor" edge discovered
						 ACCUM @@targetWithEdgeAccum += 1

						 #Happens once for each vertex in result set "t" (target)
						 POST-ACCUM @@targetWithEdgePostAccum += 1;

  PRINT @@vertexOnlyAccum;
  PRINT @@vertexOnlyPostAccum;

  PRINT @@vertexOnlyWhereAccum;
  PRINT @@vertexOnlyWherePostAccum;

  PRINT @@sourceWithEdgeAccum;
  PRINT @@sourceWithEdgePostAccum;

  PRINT @@targetWithEdgeAccum;
  PRINT @@targetWithEdgePostAccum;
}

accumPostAccumSemantics Result

GSQL > RUN QUERY accumPostAccumSemantics()
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [
    {"@@vertexOnlyAccum": 5},
    {"@@vertexOnlyPostAccum": 5},
    {"@@vertexOnlyWhereAccum": 2},
    {"@@vertexOnlyWherePostAccum": 2},
    {"@@sourceWithEdgeAccum": 17},
    {"@@sourceWithEdgePostAccum": 5},
    {"@@targetWithEdgeAccum": 17},
    {"@@targetWithEdgePostAccum": 12}
  ]
}

This example uses ACCUM to find all the subjects a user posted about.

Query
Results

Vertex ACCUM Example

# For each person, make a list of all their post subjects
CREATE QUERY userPosts() FOR GRAPH socialNet {
  ListAccum<STRING> @personPosts;
  start = {person.*};

  # Find all user post topics and append them to the vertex list accum
  userPostings = SELECT s FROM start:s -(posted)-> :g
                 ACCUM s.@personPosts += g.subject;

  PRINT userPostings;
}

Results for Query userPosts

GSQL > RUN QUERY userPosts()
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [{"userPostings": [
    {
      "v_id": "person4",
      "attributes": {
        "gender": "Female",
        "@personPosts": ["cats"],
        "id": "person4"
      },
      "v_type": "person"
    },
    {
      "v_id": "person3",
      "attributes": {
        "gender": "Male",
        "@personPosts": ["query languages"],
        "id": "person3"
      },
      "v_type": "person"
    },
    {
      "v_id": "person7",
      "attributes": {
        "gender": "Male",
        "@personPosts": [ "cats", "tigergraph" ],
        "id": "person7"
      },
      "v_type": "person"
    },
    {
      "v_id": "person1",
      "attributes": {
        "gender": "Male",
        "@personPosts": ["Graphs"],
        "id": "person1"
      },
      "v_type": "person"
    },
/*** other vertices omitted ***/
  ]}]
}

This example shows each person’s posted vertices and each person’s like behaviors (liked edges).

Query
Results

ACCUM<VERTEX> and ACCUM<EDGE> Example

# Show each user's post and liked post time
CREATE QUERY userPosts2() FOR GRAPH socialNet {
  ListAccum<VERTEX> @personPosts;
  ListAccum<EDGE> @personLikedInfo;
  start = {person.*};

  # Find all user post topics and append them to the vertex list accum
  userPostings = SELECT s FROM start:s -(posted)-> :g
                 ACCUM s.@personPosts += g;

  userPostings = SELECT s from start:s -(liked:e)-> :g
                 ACCUM s.@personLikedInfo += e;

  PRINT start;
}

Results from Query userPosts2

GSQL > RUN QUERY userPosts2()
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [{"start": [
    {
      "v_id": "person4",
      "attributes": {
        "gender": "Female",
        "@personPosts": ["3"],
        "id": "person4",
        "@personLikedInfo": [{
          "from_type": "person",
          "to_type": "post",
          "directed": true,
          "from_id": "person4",
          "to_id": "4",
          "attributes": {"actionTime": "2010-01-13 03:16:05"},
          "e_type": "liked"
        }]
      },
      "v_type": "person"
    },
    {
      "v_id": "person7",
      "attributes": {
        "gender": "Male",
        "@personPosts": [ "9", "6" ],
        "id": "person7",
        "@personLikedInfo": [{
          "from_type": "person",
          "to_type": "post",
          "directed": true,
          "from_id": "person7",
          "to_id": "10",
          "attributes": {"actionTime": "2010-01-12 11:22:05"},
          "e_type": "liked"
        }]
      },
      "v_type": "person"
    },
    {
      "v_id": "person1",
      "attributes": {
        "gender": "Male",
        "@personPosts": ["0"],
        "id": "person1",
        "@personLikedInfo": [{
          "from_type": "person",
          "to_type": "post",
          "directed": true,
          "from_id": "person1",
          "to_id": "0",
          "attributes": {"actionTime": "2010-01-11 11:32:00"},
          "e_type": "liked"
        }]
      },
      "v_type": "person"
    },
/*** other vertices omitted ***/
  ]}]
}

This example counts the total number of times each topic is used.

Query
Results

Global ACCUM Example

# Show number of total posts by topic
CREATE QUERY userPostsByTopic() FOR GRAPH socialNet {
  MapAccum<STRING, INT> @@postTopicCounts;
  start = {person.*};

  # Append subject and update the appearance count in the global map accum
  posts = SELECT g FROM start -(posted)-> :g
		  ACCUM @@postTopicCounts += (g.subject -> 1);

  PRINT @@postTopicCounts;
}

Results for Query userPostsByTopic

GSQL > RUN QUERY userPostsByTopic()
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [{"@@postTopicCounts": {
    "cats": 5,
    "coffee": 1,
    "query languages": 1,
    "Graphs": 2,
    "tigergraph": 3
  }}]
}

This is an example of using ACCUM and POST-ACCUM in conjunction. The ACCUM traverses the graph and finds all people who live and work in the same country. After this is determined, POST-ACCUM examines each vertex (person) to see if they work where they live.

Query
Results

Vertex POST-ACCUM Example

#Show all person who both work and live in the same country
CREATE QUERY residentEmployees() FOR GRAPH workNet {

  ListAccum<STRING> @company;
  OrAccum @worksAndLives;

  start = {person.*};


  employees = SELECT s FROM start:s -(worksFor)-> :c
              #If a person works for a company in the same country where they live
              # add the company to the list
              ACCUM CASE WHEN (s.locationId == c.country) THEN
                           s.@company += c.id
                         END

              #Check each vertex and see if a person works where they live
              POST-ACCUM CASE WHEN (s.@company.size() > 0) THEN
                           s.@worksAndLives += True
                         ELSE
                           s.@worksAndLives += False
                         END;

  PRINT employees WHERE (employees.@worksAndLives == True);
}

residentEmployees Result

GSQL > RUN QUERY residentEmployees()
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [{"employees": [
    {
      "v_id": "person11",
      "attributes": {
        "interestList": [
          "sport",
          "football"
        ],
        "skillSet": [10],
        "skillList": [10],
        "@worksAndLives": true,
        "locationId": "can",
        "interestSet": [ "football", "sport" ],
        "id": "person11",
        "@company": ["company5"]
      },
      "v_type": "person"
    },
    {
      "v_id": "person10",
      "attributes": {
        "interestList": [ "football", "sport" ],
        "skillSet": [3],
        "skillList": [3],
        "@worksAndLives": true,
        "locationId": "us",
        "interestSet": [ "sport", "football" ],
        "id": "person10",
        "@company": ["company1"]
      },
      "v_type": "person"
    },
    {
      "v_id": "person1",
      "attributes": {
        "interestList": [ "management", "financial" ],
        "skillSet": [ 3, 2, 1 ],
        "skillList": [ 1, 2, 3 ],
        "@worksAndLives": true,
        "locationId": "us",
        "interestSet": [ "financial", "management" ],
        "id": "person1",
        "@company": ["company1"]
      },
      "v_type": "person"
    },
    {
      "v_id": "person2",
      "attributes": {
        "interestList": ["engineering"],
        "skillSet": [ 6, 5, 3, 2 ],
        "skillList": [ 2, 3, 5, 6 ],
        "@worksAndLives": true,
        "locationId": "chn",
        "interestSet": ["engineering"],
        "id": "person2",
        "@company": ["company2"]
      },
      "v_type": "person"
    }
  ]}]
}

This is an example of a POST-ACCUM only that counts the number people with a particular gender.

Query
Results

Global POST-ACCUM Example

#Count the number of person of	a given	gender
CREATE QUERY personGender(STRING gender) FOR GRAPH socialNet {

  SumAccum<INT> @@genderCount;

  start = {ANY};

  # Select all person vertices and check the gender attribute
  friends = SELECT v FROM start:v
            WHERE v.type == "person"

            POST-ACCUM CASE WHEN (start.gender == gender) THEN
                         @@genderCount += 1
                       END;

  PRINT @@genderCount;
}

Results for Query personGender

GSQL > RUN QUERY personGender("Female")
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [{"@@genderCount": 3}]
}

HAVING Clause

The optional HAVING clause provides constraints on the result set of the SELECT. The constraints are applied after ACCUM and POST-ACCUM actions. This differs from the WHERE clause, which is applied before the ACCUM and POST-ACCUM actions.

EBNF for HAVING Clause

havingClause := HAVING condition

The condition in a HAVING clause is applied to each vertex in the SELECT set (either source or target vertices) which also fulfilled the FROM and WHERE conditions. The HAVING clause is intended to test one or more of the accumulator variables that were updated in the ACCUM or POST-ACCUM clause, though the condition may be anything that equates to a boolean value. If the condition is false for a particular vertex, then that vertex is excluded from the result set.

The following example demonstrates using the HAVING clause to constrain a result set based on the vertex accumulator variable which was updated during the ACCUM clause.

Query
Results

Example 1. HAVING

# find all persons meeting a given activityThreshold, based on how many posts or likes a person has made
CREATE QUERY activeMembers(int activityThreshold) FOR GRAPH socialNet
{
        SumAccum<int> @activityAmount;
        start = {person.*};
        result = SELECT v FROM start:v -(:e)-> post:tgt
                          ACCUM v.@activityAmount +=1
	                      HAVING v.@activityAmount >= activityThreshold;
        PRINT result;
}

If the activityThreshold parameter is set to 3, the query returns 5 vertices:

Example 1 Results

GSQL > RUN QUERY activeMembers(3)
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [{"result": [
    {
      "v_id": "person7",
      "attributes": {
        "gender": "Male",
        "@activityAmount": 3,
        "id": "person7"
      },
      "v_type": "person"
    },
    {
      "v_id": "person5",
      "attributes": {
        "gender": "Female",
        "@activityAmount": 3,
        "id": "person5"
      },
      "v_type": "person"
    },
    {
      "v_id": "person6",
      "attributes": {
        "gender": "Male",
        "@activityAmount": 3,
        "id": "person6"
      },
      "v_type": "person"
    },
    {
      "v_id": "person2",
      "attributes": {
        "gender": "Female",
        "@activityAmount": 3,
        "id": "person2"
      },
      "v_type": "person"
    },
    {
      "v_id": "person8",
      "attributes": {
        "gender": "Male",
        "@activityAmount": 3,
        "id": "person8"
      },
      "v_type": "person"
    }
  ]}]
}

If the activityThreshold parameter is set to 2, the query would return 8 vertices. With activityThreshold = 4, the query would return no vertices.

The following example demonstrates the equivalence of a SELECT statement in which the condition for the HAVING clause is always true.

Query
Results

Example 2. HAVING with literal condition

# find all person meeting a given activityThreshold, based on how many posts or likes a person has made
CREATE QUERY printMemberActivity() FOR GRAPH socialNet
{
        SumAccum<int> @activityAmount;
        start = {person.*};

		### --- equivalent statements -----
        result = SELECT v FROM start:v -(:e)-> post:tgt
					      ACCUM v.@activityAmount +=1
					      HAVING true;

		result = SELECT v FROM start:v -(:e)-> post:tgt
					      ACCUM v.@activityAmount +=1;
		### -----

        PRINT result;
}

Results from Query printMemberActivity

GSQL > RUN QUERY printMemberActivity()
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [{"result": [
    {
      "v_id": "person4",
      "attributes": {
        "gender": "Female",
        "@activityAmount": 4,
        "id": "person4"
      },
      "v_type": "person"
    },
    {
      "v_id": "person3",
      "attributes": {
        "gender": "Male",
        "@activityAmount": 4,
        "id": "person3"
      },
      "v_type": "person"
    },
    {
      "v_id": "person7",
      "attributes": {
        "gender": "Male",
        "@activityAmount": 6,
        "id": "person7"
      },
      "v_type": "person"
    },
    {
      "v_id": "person1",
      "attributes": {
        "gender": "Male",
        "@activityAmount": 4,
        "id": "person1"
      },
      "v_type": "person"
    },
    {
      "v_id": "person5",
      "attributes": {
        "gender": "Female",
        "@activityAmount": 6,
        "id": "person5"
      },
      "v_type": "person"
    },
    {
      "v_id": "person6",
      "attributes": {
        "gender": "Male",
        "@activityAmount": 6,
        "id": "person6"
      },
      "v_type": "person"
    },
    {
      "v_id": "person2",
      "attributes": {
        "gender": "Female",
        "@activityAmount": 6,
        "id": "person2"
      },
      "v_type": "person"
    },
    {
      "v_id": "person8",
      "attributes": {
        "gender": "Male",
        "@activityAmount": 6,
        "id": "person8"
      },
      "v_type": "person"
    }
  ]}]
}

The following shows an example of equivalent result sets from using WHERE vs. HAVING. Recall that the WHERE clause is evaluated before the ACCUM and that the HAVING clause is evaluated after the ACCUM. Both constrain the result set based on a condition that vertices must meet.

Query
Results

Example 3. HAVING vs. WHERE

# Compute the total post activity for each male person.
# Because the gender of the vertex does not change, evaluating whether the person vertex
# is male before (WHERE) the ACCUM clause or after (HAVING) the ACCUM clause does not
# change the result. However, if the condition in the HAVING clause could change within
# the ACCUM clause, these statements would produce different results.

CREATE QUERY activeMaleMembers() FOR GRAPH socialNet
{
    SumAccum<INT> @activityAmount;
    start = {person.*};

    ### --- statements produce equivalent results
    result1 = SELECT v FROM start:v -(:e)-> post:tgt
                      WHERE v.gender == "Male"
                      ACCUM v.@activityAmount +=1;

    result2 = SELECT v FROM start:v -(:e)-> post:tgt
                      ACCUM v.@activityAmount +=1
                      HAVING v.gender == "Male";

    PRINT result2[result2.@activityAmount];
    PRINT result2[result2.@activityAmount];
}

Results from Query ActiveMaleMembers

GSQL > RUN QUERY activeMaleMembers()
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [
    {"result2": [
      {
        "v_id": "person3",
        "attributes": {"result2.@activityAmount": 4},
        "v_type": "person"
      },
      {
        "v_id": "person7",
        "attributes": {"result2.@activityAmount": 6},
        "v_type": "person"
      },
      {
        "v_id": "person1",
        "attributes": {"result2.@activityAmount": 4},
        "v_type": "person"
      },
      {
        "v_id": "person6",
        "attributes": {"result2.@activityAmount": 6},
        "v_type": "person"
      },
      {
        "v_id": "person8",
        "attributes": {"result2.@activityAmount": 6},
        "v_type": "person"
      }
    ]},
    {"result2": [
      {
        "v_id": "person3",
        "attributes": {"result2.@activityAmount": 4},
        "v_type": "person"
      },
      {
        "v_id": "person7",
        "attributes": {"result2.@activityAmount": 6},
        "v_type": "person"
      },
      {
        "v_id": "person1",
        "attributes": {"result2.@activityAmount": 4},
        "v_type": "person"
      },
      {
        "v_id": "person6",
        "attributes": {"result2.@activityAmount": 6},
        "v_type": "person"
      },
      {
        "v_id": "person8",
        "attributes": {"result2.@activityAmount": 6},
        "v_type": "person"
      }
    ]}
  ]
}

The following example has a compilation error because the result set is taken from the source vertices, but the HAVING condition is checking the target vertices.

Query
Results

Example 4. HAVING the wrong vertex set

# find all person having a post subject about cats
# This query is illegal because the having condition is testing the wrong vertex set
CREATE QUERY printMemberAboutCats() FOR GRAPH socialNet
{
        start = {person.*};

        result = SELECT v FROM start:v -(:e)-> post:tgt
                          HAVING tgt.subject == "cats";
        PRINT result;
}

Compilation Error for printMemberAboutCats

> gsql printMemberAboutCats.gsql
Semantic Check Error in query printMemberAboutCats (SEM-50): line 8, col 33
The SELECT block selects src, but the HAVING clause uses tgt

ORDER BY Clause

The optional ORDER BY clause sorts the result set.

EBNF for ORDER BY Clause

orderClause := ORDER BY expr [ASC | DESC] ["," expr [ASC | DESC]]*

ASC specifies ascending order (least value first), and DESC specifies descending order (greatest value first). If neither is specified, then ascending order is used. Each expr must refer to the attributes or accumulators of a member of the result set, and the expr must evaluate to a sortable value (e.g., a number or a string). ORDER BY offers hierarchical sorting by allowing a comma-separated list of expressions, sorting first by the leftmost expr. It uses the next expression only to sort items where the current sort expr results in identical values. Any items in the result set which cannot be sorted (because the sort expressions do not pertain to them) will appear at the end of the set, after the sorted items.

In tabular SELECT queries, the ORDER BY expressions may only be SELECT column aliases.

The following example demonstrates the use of ORDER BY with multiple expressions. The returned vertex set is first ordered by the number of friends of the vertex, and then ordered by the number of coworkers of that vertex.

Query
Results

topPopular.gsql: ORDER BY Descending

# find the most popular people, sorting first based on the number as friends
# and then in case of a tie by the number of coworkers
CREATE QUERY topPopular() FOR GRAPH friendNet
{
	SumAccum<INT> @numFriends;
	SumAccum<INT> @numCoworkers;
	start = {person.*};

	result = SELECT v FROM start -((friend|coworker):e)-> person:v
	       	 ACCUM CASE WHEN e.type == "friend" THEN v.@numFriends += 1
		       	    WHEN e.type == "coworker" THEN v.@numCoworkers += 1
		       END
		 ORDER BY v.@numFriends DESC, v.@numCoworkers DESC;

	PRINT result;
}

topPopular.json

GSQL > RUN QUERY topPopular()
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [{"result": [
    {
      "v_id": "person9",
      "attributes": {
        "@numCoworkers": 3,
        "@numFriends": 5,
        "id": "person9"
      },
      "v_type": "person"
    },
    {
      "v_id": "person8",
      "attributes": {
        "@numCoworkers": 1,
        "@numFriends": 4,
        "id": "person8"
      },
      "v_type": "person"
    },
    {
      "v_id": "person12",
      "attributes": {
        "@numCoworkers": 1,
        "@numFriends": 4,
        "id": "person12"
      },
      "v_type": "person"
    },
    {
      "v_id": "person6",
      "attributes": {
        "@numCoworkers": 4,
        "@numFriends": 3,
        "id": "person6"
      },
      "v_type": "person"
    },
    {
      "v_id": "person1",
      "attributes": {
        "@numCoworkers": 3,
        "@numFriends": 3,
        "id": "person1"
      },
      "v_type": "person"
    },
    {
      "v_id": "person4",
      "attributes": {
        "@numCoworkers": 5,
        "@numFriends": 2,
        "id": "person4"
      },
      "v_type": "person"
    },
    {
      "v_id": "person3",
      "attributes": {
        "@numCoworkers": 3,
        "@numFriends": 2,
        "id": "person3"
      },
      "v_type": "person"
    },
    {
      "v_id": "person2",
      "attributes": {
        "@numCoworkers": 3,
        "@numFriends": 2,
        "id": "person2"
      },
      "v_type": "person"
    },
    {
      "v_id": "person10",
      "attributes": {
        "@numCoworkers": 1,
        "@numFriends": 2,
        "id": "person10"
      },
      "v_type": "person"
    },
    {
      "v_id": "person7",
      "attributes": {
        "@numCoworkers": 6,
        "@numFriends": 1,
        "id": "person7"
      },
      "v_type": "person"
    },
    {
      "v_id": "person5",
      "attributes": {
        "@numCoworkers": 5,
        "@numFriends": 1,
        "id": "person5"
      },
      "v_type": "person"
    },
    {
      "v_id": "person11",
      "attributes": {
        "@numCoworkers": 1,
        "@numFriends": 1,
        "id": "person11"
      },
      "v_type": "person"
    }
  ]}]
}

LIMIT Clause

The optional LIMIT clause sets constraints on the number and ranking of items included in the final result set.

EBNF for LIMIT Clause

limitClause := LIMIT ( expr | expr "," expr | expr OFFSET expr )

Each of the expr must evaluate to a nonnegative integer. To understand LIMIT, note that the tentative result set is held in the computer as a list of vertices. If the query has an ORDER BY clause, the order is specified; otherwise the list order is unknown. Assume we number the vertices as v_1 , v_2 , …, v_n . The LIMIT clause specifies a range of vertices, starting from a lower position in the list to an upper position.

There are three forms:

LIMIT scenarios

result = SELECT v FROM S -(:e)-> :v LIMIT k;		  # case 1: k = Count
result = SELECT v FROM S -(:e)-> :v LIMIT j, k; 	  # case 2: j = Offset from the start of the list, k = Count
result = SELECT v FROM S -(:e)-> :v LIMIT k OFFSET j; # case 3: k = Count, j = Offset from the start of the list

Case 1: LIMIT k

When a single expr is provided, LIMIT returns the first k elements from the tentative result set. If there are fewer than k elements available, then all elements will be returned in the result set. If k=5 and the tentative result set has at least 5 items, then the final result list will be [ v_1 , v_2 , v_3 , v_4 , v_5 ].

Case 2: LIMIT j, k

When a comma separates two expressions, LIMIT treats the first expression j as an offset. That is, it skips the first j items in the list. The second expr k tells the maximum number of items items to include. If the list has at least 7 items, then LIMIT 2, 5 would return [ v_3 , v_4 , v_5, v_6 , v_7 ].

Case 3: LIMIT k OFFSET j

The behavior of Case 3 is the same as that of Case 2, except that the syntax is different. The keyword OFFSET separates the two expressions, and the count comes before the offset, rather than vice versa. If the list has at least 7 items, then LIMIT 5 OFFSET 2 would return [ v_3 , v_4 , v_5, v_6 , v_7 ].

If any of the expressions evaluate to a negative integer, the results are undefined.

OFFSET is intended for result sets which are in a known order. It is a compile time error to use OFFSET without the ORDER BY clause.

The following examples demonstrate the various forms of the LIMIT clause.

The first example shows the LIMIT clause when used as an upper limit. It returns a result set with a maximum size of 4 elements in the set.

Query
Results

limitEx1.gsql: LIMIT by some number

CREATE QUERY limitEx1(INT k) FOR GRAPH friendNet
{
        start = {person.*};

        result1 = SELECT v FROM start:v
                ORDER BY v.id
                LIMIT k;

		PRINT result1[result1.id]; // api v2
}

limit1Ex.json Results

GSQL > RUN QUERY limitEx1(4)
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [{"result1": [
    {
      "v_id": "person1",
      "attributes": {"result1.id": "person1"},
      "v_type": "person"
    },
    {
      "v_id": "person10",
      "attributes": {"result1.id": "person10"},
      "v_type": "person"
    },
    {
      "v_id": "person11",
      "attributes": {"result1.id": "person11"},
      "v_type": "person"
    },
    {
      "v_id": "person12",
      "attributes": {"result1.id": "person12"},
      "v_type": "person"
    }
  ]}]
}

The following example shows how to use the LIMIT clause with an offset.

Query
Results

limit2Ex.gsql: LIMIT with lower-bound and size

CREATE QUERY limitEx2(INT j, INT k) FOR GRAPH friendNet
{
        start = {person.*};
        result2 = SELECT v FROM start:v
                ORDER BY v.id
                LIMIT j, k;

        PRINT result2[result2.id]; // api v2
}

limit2Ex.json Results

GSQL > RUN QUERY limitEx2(2,3)
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [{"result2": [
    {
      "v_id": "person11",
      "attributes": {"result2.id": "person11"},
      "v_type": "person"
    },
    {
      "v_id": "person12",
      "attributes": {"result2.id": "person12"},
      "v_type": "person"
    },
    {
      "v_id": "person2",
      "attributes": {"result2.id": "person2"},
      "v_type": "person"
    }
  ]}]
}

The following example shows the alternative syntax for a result size limit with an offset. This time we try larger values for offset and size. In a large data set, limitTest(5,20) might return 20 vertices, but since we don’t have 25 vertices in the original data, the output was fewer than 20 vertices.

Query
Results

limit3Ex.gsql: LIMIT with OFFSET

CREATE QUERY limitEx3(INT j, INT k) FOR GRAPH friendNet
{
        start = {person.*};

        result3 = SELECT v FROM start:v
                ORDER BY v.id
                LIMIT k OFFSET j;

        PRINT result3[result3.id]; // api v2
}

limit3Ex.json Results

GSQL > RUN QUERY limitEx3(5,20)
{
  "error": false,
  "message": "",
  "version": {
    "edition": "developer",
    "schema": 0,
    "api": "v2"
  },
  "results": [{"result3": [
    {
      "v_id": "person3",
      "attributes": {"result3.id": "person3"},
      "v_type": "person"
    },
    {
      "v_id": "person4",
      "attributes": {"result3.id": "person4"},
      "v_type": "person"
    },
    {
      "v_id": "person5",
      "attributes": {"result3.id": "person5"},
      "v_type": "person"
    },
    {
      "v_id": "person6",
      "attributes": {"result3.id": "person6"},
      "v_type": "person"
    },
    {
      "v_id": "person7",
      "attributes": {"result3.id": "person7"},
      "v_type": "person"
    },
    {
      "v_id": "person8",
      "attributes": {"result3.id": "person8"},
      "v_type": "person"
    },
    {
      "v_id": "person9",
      "attributes": {"result3.id": "person9"},
      "v_type": "person"
    }
  ]}]
}