SELECT Statement

NOTE: There are now two versions of the grammar for the SELECT Statement. This section covers both what is common in both versions (which is the majority of the grammar) plus the details for the original V1 syntax. The details of the V2 syntax are described in the Pattern Matching tutorial. A query indicates which grammar to use in the SYNTAX clause of the CREATE QUERY header. V1 is the default.

The key differences between the two grammars:

  1. In V1 (default), each SELECT statement can traverse one hop (from a set of vertices to their adjacent vertices).

    • To write a multi-hop query, you write a sequence of SELECT statements.

    • The traversal action is from left to right, and the notation uses "ASCII art" to depict a connection, either with a rightward facing arrowhead or no arrowhead: Start:s -( Edges:e )-> Target:t or Start:s -( Edges:e )- Target:t

  2. In V2, each SELECT statement can traverse a multi-hop path.

    • The traversal direction is under the control of the query writer, with arrowheads on each edge set to show the direction.

    • There is no arrowhead outside the parentheses:

      Start:s -( (ForwardEdge> | <BackwardEdge):e )- Target:t

    • Users can write paths which explicitly show multiple hops, and they can use a Kleene star (*) to indicate repetition. Start:s -(Edge1>:e1)- Middle:m -(<Edge2:e2)- Target:t or Start:s -(Edge*1..3)- Target:t

This section discusses the SELECT statement in depth and covers the following EBNF syntax:

EBNF for Select Statement
selectStmt := vertexSetName "=" selectBlock
selectBlock := SELECT alias [fromClause]
[sampleClause]
[whereClause]
[accumClause]
[postAccumClause]*
[havingClause]
[orderClause]
[limitClause]
fromClause := FROM (step | stepV2 | pathPattern ["," pathPattern]*)
step := stepSourceSet ["-" "(" stepEdgeSet ")" "-"[">"] stepVertexSet]
stepV2 := stepVertexSet ["-" "(" stepEdgeSet ")" "-" stepVertexSet]
stepSourceSet := vertexSetName [":" vertexAlias]
stepEdgeSet := [stepEdgeTypes] [":" edgeAlias]
stepVertexSet := [stepVertexTypes] [":" vertexAlias]
alias := (vertexAlias | edgeAlias)
vertexAlias := name
edgeAlias := name
stepEdgeTypes := atomicEdgeType | "(" edgeSetType ["|" edgeSetType]* ")"
atomicEdgeType := "_" | ANY | edgeSetType
edgeSetType := edgeType | paramName | "@@"accumName
stepVertexTypes := atomicVertexType | "(" vertexSetType ["|" vertexSetType]* ")"
atomicVertexType := "_" | ANY | vertexSetType
vertexSetType := vertexType | paramName | "@@"accumName
#----------# Pattern Matching #----------#
pathPattern := stepVertexSet ["-" "(" pathEdgePattern ")" "-" stepVertexSet]*
pathEdgePattern := atomicEdgePattern
| "(" pathEdgePattern ")"
| pathEdgePattern "." pathEdgePattern
| disjPattern
| starPattern
atomicEdgePattern := atomicEdgeType
| atomicEdgeType ">"
| "<" atomicEdgeType
disjPattern := atomicEdgePattern ("|" atomicEdgePattern)*
starPattern := ([atomicEdgePattern] | "(" disjPattern ")") "*" [starBounds]
starBounds := CONST_INT ".." CONST_INT
| CONST_INT ".."
| ".." CONST_INT
| CONST_INT
#----------------------------------------#
sampleClause := SAMPLE ( expr | expr "%" ) EDGE WHEN condition
| SAMPLE expr TARGET WHEN condition
| SAMPLE expr "%" TARGET PINNED WHEN condition
whereClause := WHERE condition
accumClause := [perClauseV2] ACCUM dmlSubStmtList
perClauseV2 := PER "(" alias ["," alias] ")"
postAccumClause := POST-ACCUM dmlSubStmtList
dmlSubStmtList := dmlSubStmt ["," dmlSubStmt]*
dmlSubStmt := assignStmt // Assignment
| funcCallStmt // Function Call
| gAccumAccumStmt // Assignment
| lAccumAccumStmt // Assignment
| attrAccumStmt // Assignment
| vAccumFuncCall // Function Call
| localVarDeclStmt // Declaration
| dmlSubCaseStmt // Control Flow
| dmlSubIfStmt // Control Flow
| dmlSubWhileStmt // Control Flow
| dmlSubForEachStmt // Control Flow
| BREAK // Control Flow
| CONTINUE // Control Flow
| insertStmt // Data Modification
| dmlSubDeleteStmt // Data Modification
| printlnStmt // Output
| logStmt // Output
vAccumFuncCall := vertexAlias "." "@"accumName ("." funcName "(" [argList] ")")+
havingClause := HAVING condition
orderClause := ORDER BY expr [ASC | DESC] ["," expr [ASC | DESC]]*
limitClause := LIMIT ( expr | expr "," expr | expr OFFSET expr )

The SELECT block uses a step pattern or path pattern to select some of the graph's vertices and edges. There are a number of optional clauses that define and/or refine the selection by constraining the vertex or edge set or the result set. The final output of a query is a vertex set, known as the result set .

Size limitation

There is a maximum size limit of 2GB for the result set of a SELECT block . If the result of the SELECT block is larger than 2GB, the system will return no data. NO error message is produced.

SELECT Statement Data Flow

The SELECT statement is an assignment statement with a SELECT block on the right hand side. The SELECT block has many possible clauses, which fit together in a logical flow. Overall, the SELECT block starts from a source set of vertices and returns a result set that is either a subset of the source vertices or a subset of their neighboring vertices. Along the way, computations can be performed on the selected vertices and edges. The figure below graphically depicts the overall SELECT data flow. While the ACCUM and POST-ACCUM clauses do not directly affect which vertices are included in the result set, they affect the data (accumulators) which are attached to those vertices.

FROM Clause: Vertex and Edge Sets

In classic (Syntax v1) GSQL, the FROM clause described one step or hop pattern. In Pattern Matching (Syntax v2), the pattern can be multiple hops long. Path patterns also have many other options for finer control and greater flexibility.

FROM clause
fromClause := FROM (step | stepV2 | pathPattern ["," pathPattern]*)

A hop or step consists of going from a starting set of vertices, crossing over a set of their edges, to an ending set of vertices. We typically use the names Source and Target for the starting and ending vertex sets: Source -(Edges)-> Target

The step pattern defines constraints for the Source set, the Edge set, and the Target set. The result of the FROM clause can be interpreted as a 3-column virtual table called the match table. Each row is a 3-element tuple: (source vertex, connected edge, target vertex).

Source Vertex Set (SYNTAX v1)

Notice that the edge set and target set are optional: a step can be just source vertices (stepSourceSet).

EBNF for source-only pattern (SYNTAX v1)
step := stepSourceSet ["-" "(" stepEdgeSet ")" "-"[">"] stepVertexSet]

Rules for Source Vertex Set in Syntax V1:

  1. The source set may only be a vertexSetName. If this the first SELECT statement in the query, then the vertexSetName is generally created as a seedsSet:

  2. The vertexSetName is optionally followed by an alias, which used in subsequent clauses to refer to the source set:

    stepSourceSet := vertexSetName [":" vertexAlias]
    vertexAlias := name

For example:

resultSet = SELECT s FROM Source:s;

This statement can be interpreted as "Select all vertices s, from the vertex set Source ." The result is a vertex set. Below is a simple example of a vertex selection.

Vertex SELECT example
# displays all 'post'-type vertices
CREATE QUERY printAllPosts() FOR GRAPH socialNet
{
start = {post.*}; # initialized with all vertices of type 'post'
results = SELECT s FROM start:s; # select these vertices
PRINT results;
}
Results of Query printAllPosts
GSQL > RUN QUERY printAllPosts()
{
"error": false,
"message": "",
"version": {
"edition": "developer",
"schema": 0,
"api": "v2"
},
"results": [{"results": [
{
"v_id": "0",
"attributes": {
"postTime": "2010-01-12 11:22:05",
"subject": "Graphs"
},
"v_type": "post"
},
{
"v_id": "10",
"attributes": {
"postTime": "2011-02-04 03:02:31",
"subject": "cats"
},
"v_type": "post"
},
{
"v_id": "2",
"attributes": {
"postTime": "2011-02-03 01:02:42",
"subject": "query languages"
},
"v_type": "post"
},
{
"v_id": "4",
"attributes": {
"postTime": "2011-02-07 05:02:51",
"subject": "coffee"
},
"v_type": "post"
},
{
"v_id": "9",
"attributes": {
"postTime": "2011-02-05 23:12:42",
"subject": "cats"
},
"v_type": "post"
},
{
"v_id": "3",
"attributes": {
"postTime": "2011-02-05 01:02:44",
"subject": "cats"
},
"v_type": "post"
},
{
"v_id": "5",
"attributes": {
"postTime": "2011-02-06 01:02:02",
"subject": "tigergraph"
},
"v_type": "post"
},
{
"v_id": "7",
"attributes": {
"postTime": "2011-02-04 17:02:41",
"subject": "Graphs"
},
"v_type": "post"
},
{
"v_id": "1",
"attributes": {
"postTime": "2011-03-03 23:02:00",
"subject": "tigergraph"
},
"v_type": "post"
},
{
"v_id": "11",
"attributes": {
"postTime": "2011-02-03 01:02:21",
"subject": "cats"
},
"v_type": "post"
},
{
"v_id": "8",
"attributes": {
"postTime": "2011-02-03 17:05:52",
"subject": "cats"
},
"v_type": "post"
},
{
"v_id": "6",
"attributes": {
"postTime": "2011-02-05 02:02:05",
"subject": "tigergraph"
},
"v_type": "post"
}
]}]
}

1-Hop Step (SYNTAX v1)

Usually, a FROM clause has a full 1-hop step.

step := stepSourceSet ["-" "(" stepEdgeSet ")" "-"[">"] stepVertexSet]

The symbols -( and )- enclose the stepEdgeSet and separate the three parts. Each of the three parts may also define an alias, which makes a convenient way to refer to each of the three sets of entities.

stepSourceSet := vertexSetName [":" vertexAlias]
stepEdgeSet := [setEdgeTypes] [":" edgeAlias]
stepVertexSet := [setVertexTypes] [":" vertexAlias]

Below is a simple example:

Person:s -( (Bought|Rented):e )- (Product|Service):t

The Source set is all Persons, but the pattern will not include all Persons. It will only include those Persons who Bought or Rented a Product or Service. Moreover, the result will be a set of matched triples: (s, e, t). For example, if Sam bought a TV and Andy rented a car, the results will include (Sam, Bought, TV) and (Andy, Rented, Car). However, these two facts do not imply (Sam, Rented, Car) or (Andy, Rented, TV).

NOTE: The GSQL grammar allows the righthand enclosure to be either )- or )->. Previously, we recommended the arrowhead )-> . However, for better compatibility with the new V2 pattern matching syntax, we now recommend the headless version: )- Syntax v2 has a specific semantic meaning for the arrowheads.

Edge Set and Target Vertex Set Options

Rules for Edge Set and Target Set in Syntax V1:

  1. The edge set (stepEdgeSet) and target vertex set (stepVertexSet) obey very similar rules. Each may be a set specifier (stepEdgeTypes or stepVertexTypes) followed optionally by an alias.

  2. The set specifier can be any of the following:

    • "_" or "ANY" or <blank>, which means any edge/vertex.

    • a named type (edgeType or vertexType)

    • a global SetAcum accumulator containing a set of edges or vertices ("@@"accumName)

    • a string or string set parameter which names one or more edge or vertex types (paramName) This parameterized type is one aspect of Dynamic Querying.

    • A list of types, string parameters, or global accumulators, e.g., "(" edgeSetType ["|" edgeSetType]* ")"

    • The list of types must contain the same type of specifiers.

      stepEdgeSet := [setEdgeTypes] [":" edgeAlias]
      stepVertexSet := [setVertexTypes] [":" vertexAlias]
      alias := (vertexAlias | edgeAlias)
      vertexAlias := name
      edgeAlias := name
      stepEdgeTypes := atomicEdgeType
      | "(" edgeSetType ["|" edgeSetType]* ")"
      atomicEdgeType := "_" | ANY | edgeSetType
      edgeSetType := edgeType | paramName | "@@"accumName
      stepVertexTypes := atomicVertexType
      | "(" vertexSetType ["|" vertexSetType]* ")"
      atomicVertexType := "_" | ANY | vertexSetType
      vertexSetType := vertexType | paramName | "@@"accumName

notation

accepted vertex/edge types

(empty)

any type

_

any type

ANY

any type

vertex/edge type name

given type

string parameter holding a vertex/edge type name

given type

global SetAccum containing vertices or edges

given set

(name | name ...)

UNION of two or more collections

Parentheses are always needed around the edge type in a FROM clause: FROM Source:s -(eType:e)- Target:t

Parentheses are also needed if a vertexEdgeType is the union of more than one or more individual edge types or vertex types: FROM Source:s -((eType1 | eType2):e) - Target:t Note the double set of parentheses for the edge specifier. If these is an edge alias, these parentheses are needed. If however there is no edge alias, it is legal to have just a single set of parentheses: FROM Source:s -(eType1 | eType2) - Target:t

Either the source vertex set ( s ) or target vertex set ( t ) can be used as the SELECT argument, which determines the result of the SELECT statement. Note the small difference in the two SELECT statements below.

Selecting source or target vertices from edge-induced selection
resultSet1 = SELECT s FROM source:s-(eType:e)-tType:t; //select from the source set
resultSet2 = SELECT t FROM source:s-(eType:e)-tType:t; //select from the target set

resultSet1 is based on the source end of the edges. resultSet2 is based on the target end of the selected edges. However, resultSet1 is NOT identical to the Source vertex set. It is only those members of Source which connect to an eType edge and then to a tType vertex. Other clauses (presented later in this "SELECT Statement" section, can do additional filtering of the Source set.

We strongly suggest that an alias should be declared with every vertex and edge in the FROM clause, as there are several functions and features which are only available to vertex and edge aliases.

If is legal to declare an alias without explicitly stating an edge/target type. See the examples below.

Target vertex type inference
resultSet3 = SELECT v FROM Source:v-(eType:e)->(V1|V2):t;
resultSet4 = SELECT v FROM Source:v-(eType:e)->:t;
resultSet5 = SELECT v FROM Source:v-(eType:e)->ANY:t;
resultSet6 = SELECT v FROM Source:v-(eType:e)->_:t;
Edge type inference
resultSet7 = SELECT v FROM Source:v-((E1|E2|E3):e)->tType:t;
resultSet8 = SELECT v FROM Source:v-(:e)->tType:t;
resultSet9 = SELECT v FROM Source:v-(_:e)->tType:t;
resultSet10 = SELECT v FROM Source:v-(ANY:e)->tType:t;

The following are a set of queries that demonstrate edge-induced SELECT blocks. The allPostsLiked and allPostsMade queries show how the target vertex type can be omitted. The allPostsLikedOrMade query uses the "|" operator to select multiple types of edges.

Edge induced SELECT example
# uses various SELECT statements (some of which are equivalent) to print out
# either the posts made by the given user, the posts liked by the given
# user, or the posts made or liked by the given user.
CREATE QUERY printAllPosts2(vertex<person> seed) FOR GRAPH socialNet
{
start = {seed}; # initialize starting set of vertices
# --- statements produce equivalent results
# select all 'post' vertices which can be reached from 'start' in one hop
# using an edge of type 'liked'
allPostsLiked = SELECT targetVertex FROM start -(liked:e)-> post:targetVertex;
# select all vertices of any type which can be reached from 'start' in one hop
# using an edge of type 'liked'
allPostsLiked = SELECT targetVertex FROM start -(liked:e)-> :targetVertex;
# ----
# --- statements produce equivalent results
# start with the vertex set from above, and traverse all edges of type "posted"
# (locally those edges are just given a name 'e' in case they need accessed)
# and return all vertices of type 'post' which can be reached within one-hop of 'start' vertices
allPostsMade = SELECT targetVertex FROM start -(posted:e)-> post:targetVertex;
# start with the vertex set from above, and traverse all edges of type "posted"
# (locally those edges are just given a name 'e' in case they need accessed)
# and return all vertices of any type which can be reached within one-hop of 'start' vertices
allPostsMade = SELECT targetVertex FROM start -(posted:e)-> :targetVertex;
# ----
# --- statements produce equivalent results
# select all vertices of type 'post' which can be reached from 'start' in one hop
# using an edge of any type
# not equivalent to any statement. because it doesn't restrict the edge type,
# this will include any vertex connected by 'liked' or 'posted' edge types
allPostsLikedOrMade = SELECT t FROM start -(:e)-> t;
# select all vertices of type 'post' which can be reached from 'start' in one hop
# using an edge of type either 'posted' or 'liked'
allPostsLikedOrMade = SELECT t FROM start -((posted|liked):e)-> post:t;
# select all vertices of any type which can be reached from 'start' in one hop
# using an edge of type either 'posted' or 'liked/
allPostsLikedOrMade = SELECT t FROM start -((posted|liked):e)-> :t;
#option for simplified parentheses in edge pattern:
allPostsLikedOrMade = SELECT t FROM start - (posted|liked)-> :t
# ----
PRINT allPostsLiked;
PRINT allPostsMade;
PRINT allPostsLikedOrMade;
}
Results of Query printAllPosts2
GSQL > RUN QUERY printAllPosts2("person2")
{
"error": false,
"message": "",
"version": {
"edition": "developer",
"schema": 0,
"api": "v2"
},
"results": [
{"allPostsLiked": [
{
"v_id": "0",
"attributes": {
"postTime": "2010-01-12 11:22:05",
"subject": "Graphs"
},
"v_type": "post"
},
{
"v_id": "3",
"attributes": {
"postTime": "2011-02-05 01:02:44",
"subject": "cats"
},
"v_type": "post"
}
]},
{"allPostsMade": [{
"v_id": "1",
"attributes": {
"postTime": "2011-03-03 23:02:00",
"subject": "tigergraph"
},
"v_type": "post"
}]},
{"allPostsLikedOrMade": [
{
"v_id": "0",
"attributes": {
"postTime": "2010-01-12 11:22:05",
"subject": "Graphs"
},
"v_type": "post"
},
{
"v_id": "3",
"attributes": {
"postTime": "2011-02-05 01:02:44",
"subject": "cats"
},
"v_type": "post"
},
{
"v_id": "1",
"attributes": {
"postTime": "2011-03-03 23:02:00",
"subject": "tigergraph"
},
"v_type": "post"
}
]}
]
}
GSQL > RUN QUERY printAllPosts2("person6")
{
"error": false,
"message": "",
"version": {
"edition": "developer",
"schema": 0,
"api": "v2"
},
"results": [
{"allPostsLiked": [{
"v_id": "8",
"attributes": {
"postTime": "2011-02-03 17:05:52",
"subject": "cats"
},
"v_type": "post"
}]},
{"allPostsMade": [
{
"v_id": "10",
"attributes": {
"postTime": "2011-02-04 03:02:31",
"subject": "cats"
},
"v_type": "post"
},
{
"v_id": "5",
"attributes": {
"postTime": "2011-02-06 01:02:02",
"subject": "tigergraph"
},
"v_type": "post"
}
]},
{"allPostsLikedOrMade": [
{
"v_id": "10",
"attributes": {
"postTime": "2011-02-04 03:02:31",
"subject": "cats"
},
"v_type": "post"
},
{
"v_id": "5",
"attributes": {
"postTime": "2011-02-06 01:02:02",
"subject": "tigergraph"
},
"v_type": "post"
},
{
"v_id": "8",
"attributes": {
"postTime": "2011-02-03 17:05:52",
"subject": "cats"
},
"v_type": "post"
}
]}
]
}

This example is another edge selection that uses the "|" operator to select edges that have target vertices of multiple types.

Edge induced SELECT example
# uses a SELECT statement to print out everything related to a given user
# this includes posts that the user liked, posts that the user made, and friends
# of the user
CREATE QUERY printAllRelatedItems(vertex<person> seed) FOR GRAPH socialNet
{
sourceVertex = {seed};
# -- statements produce equivalent output
# returns all vertices of type either 'person' or 'post' that can be reached
# from the sourceVertex set using one edge of any type
everythingRelated = SELECT v FROM sourceVertex -(:e)-> (person|post):v;
# returns all vertices of any type that can be reached from the sourceVertex
# using one edge of any type
# this statement is equivalent to the above one because the graph schema only
# has vertex types of either 'person' or 'post'. if there were more vertex
# types present, these would not be equivalent.
everythingRelated = SELECT v FROM sourceVertex -(:e)-> :v;
# --
PRINT everythingRelated;
}
Results
GSQL > RUN QUERY printAllRelatedItems("person2")
{
"error": false,
"message": "",
"version": {
"edition": "developer",
"schema": 0,
"api": "v2"
},
"results": [{"everythingRelated": [
{
"v_id": "0",
"attributes": {
"postTime": "2010-01-12 11:22:05",
"subject": "Graphs"
},
"v_type": "post"
},
{
"v_id": "person3",
"attributes": {
"gender": "Male",
"id": "person3"
},
"v_type": "person"
},
{
"v_id": "person1",
"attributes": {
"gender": "Male",
"id": "person1"
},
"v_type": "person"
},
{
"v_id": "3",
"attributes": {
"postTime": "2011-02-05 01:02:44",
"subject": "cats"
},
"v_type": "post"
},
{
"v_id": "1",
"attributes": {
"postTime": "2011-03-03 23:02:00",
"subject": "tigergraph"
},
"v_type": "post"
}
]}]
}
GSQL > RUN QUERY printAllRelatedItems("person6")
{
"error": false,
"message": "",
"version": {
"edition": "developer",
"schema": 0,
"api": "v2"
},
"results": [{"everythingRelated": [
{
"v_id": "person4",
"attributes": {
"gender": "Female",
"id": "person4"
},
"v_type": "person"
},
{
"v_id": "10",
"attributes": {
"postTime": "2011-02-04 03:02:31",
"subject": "cats"
},
"v_type": "post"
},
{
"v_id": "5",
"attributes": {
"postTime": "2011-02-06 01:02:02",
"subject": "tigergraph"
},
"v_type": "post"
},
{
"v_id": "person8",
"attributes": {
"gender": "Male",
"id": "person8"
},
"v_type": "person"
},
{
"v_id": "8",
"attributes": {
"postTime": "2011-02-03 17:05:52",
"subject": "cats"
},
"v_type": "post"
}
]}]
}

Vertex and Edge Aliases

Vertex and edge aliases are declared within the FROM clause of a SELECT block, by using the character ":", followed by the alias name. Aliases can be accessed anywhere within the same SELECT block. They are used to reference a single selected vertex or edge of a set. It is through the vertex or edge aliases that attributes of these vertices or edges can be accessed.

For example, the following code snippets shows two different SELECT statements. The first SELECT statement starts from a vertex set called allVertices, and the vertex alias name v can access each individual vertex from allVertices. The second SELECT statement selects a set of edges. It can use the vertex alias s to reference the source vertices, or the alias t to reference the target vertices.

Vertex variables
results = SELECT v FROM allVertices:v;
results = SELECT t FROM allVertices:s -()-> :t;

The following example shows an edge-based SELECT statement, declaring aliases for all three parts of the edge. In the ACCUM clause, the e and t aliases are assigned to local vertex and edge variables.

Edge variables
results = SELECT v
FROM allVertices:s -(:e)-> :t
ACCUM VERTEX v = t, EDGE eg = e;

We strongly suggest that an alias should be declared with every vertex and edge in the FROM clause, as there are several functions and features which are only available to vertex and edge aliases.

Pattern Matching (SYNTAX v2)

We give a brief overview of Pattern Matching syntax and semantics in the FROM clause, from the perspective of the formal notation and how the rules can be inferred from that. For a more practical explanation of Pattern Matching, we recommend the GSQL102 Pattern Matching tutorial.

There are three options for the syntax of the pattern in a FROM clause:

fromClause := FROM (step | stepV2 | pathPattern ["," pathPattern]*)
  1. step pattern, SYNTAX v1. This is the default syntax from classic GSQL.

  2. stepV2 pattern, SYNTAX v2. The query or the GSQL session must specify that SYNTAX v2 is to be used. This is still a 1-Hop query, but there are two differences:

    • The source vertex set now has the same flexibility as target vertex. It is not necessary to create a seedSet. Instead, the source can be one more more vertex types, given either statically or dynamically as string parameters, or the special symbol ANY or _.

    • The arrowhead -( )-> should not be used between the edge set and the target set. Simple use a dash -( )-.

  3. One or more pathPattern s.

A pathPattern begins with a stepVertexSet and then has one or more hops across and edge set to a target set: -(pathEdgePattern)-setVertexSet.

pathPattern := stepVertexSet ["-" "(" pathEdgePattern ")" "-" stepVertexSet]*

We have already studied stepVertexSet. Now we will look at the options for pathEdgePattern and their meanings.

pathEdgePattern := atomicEdgePattern
| "(" pathEdgePattern ")"
| pathEdgePattern "." pathEdgePattern
| disjPattern
| starPattern
atomicEdgePattern := atomicEdgeType
| atomicEdgeType ">"
| "<" atomicEdgeType
atomicEdgeType := "_" | ANY | edgeSetType
edgeSetType := edgeType | paramName | "@@"accumName

The most basic form for a pathEdgePattern is an atomicEdgePattern. This in turn can be, similar to stepVertexSet, one of the following:

  • "_" or "ANY"

  • an edgeType, a string parameter, or a global SetAccum accumulator.

Edge Direction

Moreover, an atomicEdgePattern can have either a left pointer "<" on the left or a right pointer ">" on the right. These indicate edge direction, of course. If no pointer is used, then the edge is undirected. Suppose we have 3 edge types or parameters called A, B, C.

  • A> is a rightward facing A edge

  • <B is a leftward facing B edge

  • C is an undirected C edge. If C is actually a directed edge type, then there is no match.

Disjunction, Repeats, and Dot Concatenation

Looking at the remaining options for pathEdgePattern, we see that we can have parentheses around it, we can use a dot "." between two pathEdgePatterns, or we can have a disjPattern or starPattern.

disjPattern is how we write "either this edge pattern or that edge pattern""

disjPattern := atomicEdgePattern ("|" atomicEdgePattern)*

For example, we can combine the three direction-specfic examples from above: (A> | <B | C)

starPattern explains how the Kleene star and min..max range specifiers can be used to say "repeat this edge pattern from min to max times."

starPattern := ([atomicEdgePattern] | "(" disjPattern ")") "*" [starBounds]
starBounds := CONST_INT ".." CONST_INT
| CONST_INT ".."
| ".." CONST_INT
| CONST_INT

For example,(A> | <B | C)*2..4 means "a series of 2 to 4 adjacent edges, where each edge has type A>, <B, or C. Note that we do not have to use the same edge type for all the repeats. Either iteration can select an option from disjPattern.

The dot operator means concatenate the two edge patterns into one. Naturally, there must be a vertex joining the two edges, but it is omitted from the syntax. The dot operator is a shorthand, when you don't care about the type of that intermediate vertex. (A>.<B.C) means a series of 3 edges, having the specifying types and directions.

Conjunctive Pattern Matching

The optional repeating phrase ["," pathPattern]* allows you to have multiple pathPatterns. They form a conjunction, meaning all of them must be satisfied in order to have a valid match result.

fromClause := FROM (step | stepV2 | pathPattern ["," pathPattern]*)

Recall that each step pattern or path pattern forms a match table, one row per matching path in the graph. Each vertex alias or edge alias is one column in the table. When we have a conjunctive path, each path must share at least one vertex alias with another path. This enables the two path sets (and match tables) to be joined. Formally, we make the natural join of the two tables.

This explains all of the syntax for the FROM clause with SYNTAX v2 (Pattern Matching). The other significant area to consider for SYNTAX v2 is the ACCUM and POST-ACCUM clauses.

SAMPLE Clause

The SAMPLE clause is an optional clause that selects a uniform random sample from the population of edges or target vertices specified in the FROM argument.

If you want to sample from a set of vertices directly, not from edges or from neighboring (target) vertices, then the following technique is simpler and faster:

Select k random vertices from a vertex set S
Random = SELECT s
FROM S:s
LIMIT k;

The SAMPLE clause draws from the edge population consisting of those edges which satisfy all three parts – source set, edge type, and target type – of the FROM clause. The SAMPLE clause is intended to provide a representative sample of the distribution of edges (or vertices) connected to hub vertices, instead of dealing with all edges. A hub vertex is a vertex with a relatively high degree. (The degree of a vertex is the number of edges which connect to it. If edges are directional, one can distinguish between indegree and outdegree.)

EBNF for Sample Clause
sampleClause := SAMPLE ( expr | expr "%" ) EDGE WHEN condition # Sample an absolute number (or a percentage) of edges for each source vertex.
| SAMPLE expr TARGET WHEN condition # Sample an absolute number of edges incident to each target vertex.
| SAMPLE expr "%" TARGET PINNED WHEN condition # Sample a percentage of edges incident to each target vertex.

The expression following SAMPLE specifies the sample size, either an absolute number or a percentage of the population. The expression in sampleClause must evaluate to a positive integer. There are two sampling methods. One is sampling based on edge id. The other is based on target vertex id: if a target vertex id is sampled, all edges from this source vertex to the sampled target vertex are sampled.

Note: Currently, the WHEN condition that can be used with a SAMPLE clause is limited strictly to checking if the result of a function call on a vertex is greater than or greater than/equal to some number.

Given that the sampling is random, some of the details of each of the example queries may change each time they are run.

The following query displays two modes of sampling: an absolute number of edges from a source vertex and a percentage of edges fro a source vertex. We use the computerNet graph (see Appendix D). In computerNet, there are 31 vertices and 43 edges, but only 7 vertices are source vertices. Moreover, c1, c12, and c23 are hub nodes, with at least 10 outgoing edges each. For the absolute count case, we set the size to 1 edge per source vertex, which is equivalent to a random walk. We expect exactly 7 edges to be selected. For the percentage sampling case, we sample 33% of the edges for vertices which have 3 or more outgoing edges. We expect about 15 edges, but the number may vary.

sampleEx3: SAMPLE based on edges per source vertex
CREATE QUERY sampleEx3() FOR GRAPH computerNet
{
MapAccum<STRING,ListAccum<STRING>> @@absEdges; // record each selected edge as (src->tgt)
SumAccum<INT> @@totalAbs;
MapAccum<STRING,ListAccum<STRING>> @@pctEdges; // record each selected edge as (src->tgt)
SumAccum<INT> @@totalPct;
start = {computer.*};
# Sample one outgoing edge per source vertex = Random Walk
absSample = SELECT v FROM start:s -(:e)-> :v
SAMPLE 1 EDGE WHEN s.outdegree() >= 1 # sample 1 target vertex from each source vertex
ACCUM @@absEdges += (s.id -> v.id),
@@totalAbs += 1;
PRINT @@totalAbs, @@absEdges;
pctSample = SELECT v FROM start:s -(:e)-> :v
SAMPLE 33% EDGE WHEN s.outdegree() >= 3 # select ~1/3 of edges when outdegree >= 3
ACCUM @@pctEdges += (s.id -> v.id),
@@totalPct += 1;
PRINT @@totalPct, @@pctEdges;
}
sampleEx3.json
GSQL > RUN QUERY sampleEx3()
{
"error": false,
"message": "",
"version": {
"edition": "developer",
"schema": 0,
"api": "v2"
},
"results": [
{
"@@totalAbs": 7,
"@@absEdges": {
"c4": ["c23"],
"c11": ["c12"],
"c10": ["c11"],
"c12": ["c14"],
"c23": ["c26"],
"c14": ["c24"],
"c1": ["c10"]
}
},
{
"@@totalPct": 13,
"@@pctEdges": {
"c4": ["c23"],
"c11": ["c12"],
"c10": ["c11"],
"c12": [
"c14",
"c15",
"c19"
],
"c23": [
"c29",
"c25"
],
"c14": [
"c24",
"c23"
],
"c1": [
"c3",
"c8",
"c2"
]
}
}
]
}

Below is an example of using SELECT to only traverse one edge for each source vertex. The vertex-attached accumulators @timesTraversedNoSample and @timesTraversedWithSample are used to keep track of the number of times an edge is traversed to reach the target vertex. Without using sampling, this occurs once for each edge; thus @timesTraversedNoSample has the same number as the in-degree of the vertex. With sampling edges, the number of edges is restricted. This is reflected in the @timesTraversedWithSample accumulator. Notice the difference in the result set. Because only one edge per source vertex is traversed when the SAMPLE clause is used, not all target vertices are reached. The vertex company3 has 3 incident edges, but in one instance of the query execution, it is never reached. Additionally, company2 has 6 incident edges, but only 4 source vertices sampled an edge incident to company2 .

example of SAMPLE using an absolute number of edges
CREATE QUERY sampleEx1() FOR GRAPH workNet
{
SumAccum<INT> @timesTraversedNoSample;
SumAccum<INT> @timesTraversedWithSample;
workers = {person.*};
# The 'beforeSample' result set encapsulates the normal functionality of
# a SELECT statement, where 'timesTraversedNoSample' vertex accumulator is increased for
# each edge incident to the vertex.
beforeSample = SELECT v FROM workers:t -(:e)-> :v
ACCUM v.@timesTraversedNoSample += 1;
# The 'afterSample' result set is formed by those vertices which can be
# reached when for each source vertex, only one edge is used for traversal.
# This is demonstrated by the values of 'timesTraversedWithSample' vertex accumulator, which
# is increased for each edge incident to the vertex which is used in the
# sample.
afterSample = SELECT v FROM workers:t -(:e)-> :v
SAMPLE 1 EDGE WHEN t.outdegree() >= 1 # only use 1 edge from the source vertex
ACCUM v.@timesTraversedWithSample += 1;
PRINT beforeSample;
PRINT afterSample;
}
sampleEx1.json
GSQL > RUN QUERY sampleEx1()
{
"error": false,
"message": "",
"version": {
"edition": "developer",
"schema": 0,
"api": "v2"
},
"results": [
{"beforeSample": [
{
"v_id": "company4",
"attributes": {
"country": "us",
"@timesTraversedNoSample": 1,
"@timesTraversedWithSample": 1,
"id": "company4"
},
"v_type": "company"
},
{
"v_id": "company5",
"attributes": {
"country": "can",
"@timesTraversedNoSample": 1,
"@timesTraversedWithSample": 1,
"id": "company5"
},
"v_type": "company"
},
{
"v_id": "company3",
"attributes": {
"country": "jp",
"@timesTraversedNoSample": 3,
"@timesTraversedWithSample": 3,
"id": "company3"
},
"v_type": "company"
},
{
"v_id": "company2",
"attributes": {
"country": "chn",
"@timesTraversedNoSample": 6,
"@timesTraversedWithSample": 4,
"id": "company2"
},
"v_type": "company"
},
{
"v_id": "company1",
"attributes": {
"country": "us",
"@timesTraversedNoSample": 6,
"@timesTraversedWithSample": 3,
"id": "company1"
},
"v_type": "company"
}
]},
{"afterSample": [
{
"v_id": "company4",
"attributes": {
"country": "us",
"@timesTraversedNoSample": 1,
"@timesTraversedWithSample": 1,
"id": "company4"
},
"v_type": "company"
},
{
"v_id": "company5",
"attributes": {
"country": "can",
"@timesTraversedNoSample": 1,
"@timesTraversedWithSample": 1,
"id": "company5"
},
"v_type": "company"
},
{
"v_id": "company3",
"attributes": {
"country": "jp",
"@timesTraversedNoSample": 3,
"@timesTraversedWithSample": 3,
"id": "company3"
},
"v_type": "company"
},
{
"v_id": "company2",
"attributes": {
"country": "chn",
"@timesTraversedNoSample": 6,
"@timesTraversedWithSample": 4,
"id": "company2"
},
"v_type": "company"
},
{
"v_id": "company1",
"attributes": {
"country": "us",
"@timesTraversedNoSample": 6,
"@timesTraversedWithSample": 3,
"id": "company1"
},
"v_type": "company"
}
]}
]
}

Since the PRINT statements are placed at the end of query, the two vertex sets beforeSample and afterSample are almost identical, showing the final values of both accumulators@timesTraversedNoSample and @timesTraversedWithSample. There is one difference: company3 is not included in afterSample because none of the sample-selected edges reached company3.

WHERE Clause

The WHERE clause is an optional clause that constrains edges and vertices specified in the FROM and SAMPLE clauses.

EBNF for Where Clause
whereClause := WHERE condition

The WHERE clause uses a boolean condition to test each vertex or edge in the FROM set (or the sampled vertex and edge sets, if the SAMPLE clause was used).

If the expression evaluates to false for vertex/edge X, then X excluded from further consideration in the result set. The expression may use constants or any variables or parameters within the scope of the SELECT, arithmetic operators (+, -, *, /,%), comparison operators (==, !=, <, <=, >,>=), boolean operators (AND, OR, NOT), set operators (IN, NOT IN) and parentheses to enforce precedence. The WHERE conditional expression may use any of the variables within its scope (global accumulators, vertex set variables, query input parameters, the FROM clause's vertex and edge sets (or their vertex and edge aliases), or any of the attributes or accumulators of the vertex/edge sets.) For a more formal explanation of condition, see the EBNF definitions of condition and expr.

Using built-in vertex and edge attributes and functions, such as .type and .neighbors(), the WHERE clause can be used to implement sophisticated selection rules for the edge traversal. In the following example, the selection conditions are completely specified in the WHERE clause, with no edge types or vertex types mentioned in the FROM clause.

WHERE used as a filter
resultSet1 = SELECT v FROM S:v-((E1|E2|E3):e)->(V1|V2):t;
resultSet2 = SELECT v FROM S:v-(:e)->:t
WHERE t.type IN ("V1", "V2") AND
t IN v.neighbors("E1|E2|E3")

The following examples demonstrate using the WHERE clause to limit the resulting vertex set based on a vertex attribute.

Basic SELECT WHERE
CREATE QUERY printCatPosts() FOR GRAPH socialNet {
posts = {post.*};
catPosts = SELECT v FROM posts:v # select only those post vertices
WHERE v.subject == "cats"; # which have a subset of 'cats'
PRINT catPosts;
}
Results for Query printCatPosts
GSQL > RUN QUERY printCatPosts()
{
"error": false,
"message": "",
"version": {
"edition": "developer",
"schema": 0,
"api": "v2"
},
"results": [{"catPosts": [
{
"v_id": "10",
"attributes": {
"postTime": "2011-02-04 03:02:31",
"subject": "cats"
},
"v_type": "post"
},
{
"v_id": "9",
"attributes": {
"postTime": "2011-02-05 23:12:42",
"subject": "cats"
},
"v_type": "post"
},
{
"v_id": "3",
"attributes": {
"postTime": "2011-02-05 01:02:44",
"subject": "cats"
},
"v_type": "post"
},
{
"v_id": "11",
"attributes": {
"postTime": "2011-02-03 01:02:21",
"subject": "cats"
},
"v_type": "post"
},
{
"v_id": "8",
"attributes": {
"postTime": "2011-02-03 17:05:52",
"subject": "cats"
},
"v_type": "post"
}
]}]
}
SELECT WHERE using IN operator
CREATE QUERY findGraphFocusedPosts() FOR GRAPH socialNet
{
posts = {post.*};
results = SELECT v FROM posts:v # select only post vertices
WHERE v.subject IN ("Graph", "tigergraph"); # which have a subject of either 'Graph' or 'tigergraph'
PRINT results;
}
Results for Query findGraphFocusedPosts
GSQL > RUN QUERY findGraphFocusedPosts()
{
"error": false,
"message": "",
"version": {
"edition": "developer",
"schema": 0,
"api": "v2"
},
"results": [{"results": [
{
"v_id": "5",
"attributes": {
"postTime": "2011-02-06 01:02:02",
"subject": "tigergraph"
},
"v_type": "post"
},
{
"v_id": "1",
"attributes": {
"postTime": "2011-03-03 23:02:00",
"subject": "tigergraph"
},
"v_type": "post"
},
{
"v_id": "6",
"attributes": {
"postTime": "2011-02-05 02:02:05",
"subject": "tigergraph"
},
"v_type": "post"
}
]}]
}

WHERE NOT limitations

The NOT operator may not be used in combination with the .type attribute selector. To check if an edge or vertex type is not equal to a given type, use the != operator. See the example below.

The following example shows the equivalence of using WHERE as a type filter as well as its limitations.

SELECT WHERE using AND/OR
# finds female person in the social network. all of the following statements
# are equivalent (i.e., produce the same results)
CREATE QUERY findFemaleMembers() FOR GRAPH socialNet
{
allVertices = {ANY}; # includes all posts and person
females = SELECT v FROM allVertices:v
WHERE v.type == "person" AND
v.gender != "Male";
females = SELECT v FROM allVertices:v
WHERE v.type == "person" AND
v.gender == "Female";
females = SELECT v FROM allVertices:v
WHERE v.type == "person" AND
NOT v.gender == "Male";
females = SELECT v FROM allVertices:v
WHERE v.type != "post" AND
NOT v.gender == "Male";
# does not compile. cannot use NOT operator in combination with type attribute
#females = SELECT v FROM allVertices:v
# WHERE NOT v.type != "person" AND
# NOT v.gender == "Male";
# does not compile. cannot use NOT operator in combination with type attribute
#females = SELECT v FROM allVertices:v
# WHERE NOT v.type == "post" AND
# NOT v.gender == "Male";
personVertices = {person.*};
females = SELECT v FROM personVertices:v
WHERE NOT v.gender == "Male";
females = SELECT v FROM personVertices:v
WHERE v.gender != "Male";
females = SELECT v FROM personVertices:v
WHERE v.gender != "Male" AND true;
females = SELECT v FROM personVertices:v
WHERE v.gender != "Male" OR false;
PRINT females;
}
Results for Query findFemaleMembers
GSQL > RUN QUERY findFemaleMembers()
{
"error": false,
"message": "",
"version": {
"edition": "developer",
"schema": 0,
"api": "v2"
},
"results": [{"females": [
{
"v_id": "person4",
"attributes": {
"gender": "Female",
"id": "person4"
},
"v_type": "person"
},
{
"v_id": "person5",
"attributes": {
"gender": "Female",
"id": "person5"
},
"v_type": "person"
},
{
"v_id": "person2",
"attributes": {
"gender": "Female",
"id": "person2"
},
"v_type": "person"
}
]}]
}

The following example uses edge attributes to determine which workers are registered as full time for some company.

WHERE using edge attributes
# find all workers who are full time at some company
CREATE QUERY fullTimeWorkers() FOR GRAPH workNet
{
start = {person.*};
fullTimeWorkers = SELECT v FROM start:v -(worksFor:e)-> company:t
WHERE e.fullTime; # fullTime is a boolean attribute on the edge
PRINT fullTimeWorkers;
}
fullTimeWorkers Results
GSQL > RUN QUERY fullTimeWorkers()
{
"error": false,
"message": "",
"version": {
"edition": "developer",
"schema": 0,
"api": "v2"
},
"results": [{"fullTimeWorkers": [
{
"v_id": "person4",
"attributes": {
"interestList": ["football"],
"skillSet": [ 10, 1, 4 ],
"skillList": [ 4, 1, 10 ],
"locationId": "us",
"interestSet": ["football"],
"id": "person4"
},
"v_type": "person"
},
{
"v_id": "person11",
"attributes": {
"interestList": [ "sport", "football" ],
"skillSet": [10],
"skillList": [10],
"locationId": "can",
"interestSet": [ "football", "sport" ],
"id": "person11"
},
"v_type": "person"
},
{
"v_id": "person10",
"attributes": {
"interestList": [ "football", "sport" ],
"skillSet": [3],
"skillList": [3],
"locationId": "us",
"interestSet": [ "sport", "football" ],
"id": "person10"
},
"v_type": "person"
},
{
"v_id": "person1",
"attributes": {
"interestList": [ "management", "financial" ],
"skillSet": [ 3, 2, 1 ],
"skillList": [ 1, 2, 3 ],
"locationId": "us",
"interestSet": [ "financial", "management" ],
"id": "person1"
},
"v_type": "person"
},
{
"v_id": "person6",
"attributes": {
"interestList": [ "music", "art" ],
"skillSet": [ 10, 7 ],
"skillList": [ 7, 10 ],
"locationId": "jp",
"interestSet": [ "art", "music" ],
"id": "person6"
},
"v_type": "person"
},
{
"v_id": "person2",
"attributes": {
"interestList": ["engineering"],
"skillSet": [ 6, 5, 3, 2 ],
"skillList": [ 2, 3, 5, 6 ],
"locationId": "chn",
"interestSet": ["engineering"],
"id": "person2"
},
"v_type": "person"
},
{
"v_id": "person8",
"attributes"